#217 - Contains Duplicate
Contains Duplicate
- Difficulty: Easy
- Topics: Array, Hash Table, Sorting
- Link: https://leetcode.com/problems/contains-duplicate/
Problem Description
Given an integer array nums
, return true
if any value appears at least twice in the array, and return false
if every element is distinct.
Example 1:
Input: nums = [1,2,3,1]
Output: true
Explanation:
The element 1 occurs at the indices 0 and 3.
Example 2:
Input: nums = [1,2,3,4]
Output: false
Explanation:
All elements are distinct.
Example 3:
Input: nums = [1,1,1,3,3,4,3,2,4,2]
Output: true
Constraints:
1 <= nums.length <= 105
-109 <= nums[i] <= 109
Solution
1. Problem Deconstruction
Technical Restatement
Given an array nums
of integers, determine if there exists at least one element that occurs two or more times. Return true
if duplicates are present, false
otherwise.
Beginner-Friendly Restatement
Imagine you have a list of numbers. Your task is to check if any number in the list repeats. For example, if the list has two 5s, you return “yes” (true). If all numbers are unique, return “no” (false).
Mathematical Restatement
Let ( S ) be a set constructed from the elements of nums
. The problem reduces to evaluating:
[
|S| < |\text{nums}| \implies \text{true}
]
Otherwise, false.
Constraint Analysis
-
1 <= nums.length <= 1e5
:- Solution Limitation: Algorithms with ( O(n^2) ) time complexity (e.g., brute force pairwise checks) are infeasible.
- Edge Case: Single-element arrays trivially return false (no duplicates possible).
-
-1e9 <= nums[i] <= 1e9
:- Solution Limitation: Values span a wide range, but hash-based structures (dictionaries, sets) handle them efficiently.
- Edge Case: Negative numbers and zeros must be hashed correctly (no special handling required in Python).
2. Intuition Scaffolding
Analogies
- Real-World: Checking for duplicate ID cards in a conference. If any attendee shares an ID, the system flags it.
- Gaming: Verifying unique usernames in a multiplayer lobby. Duplicate names trigger a warning.
- Math: Comparing the cardinality of a multiset to its underlying set. A discrepancy implies duplicates.
Common Pitfalls
- Sorting Oversight: Assuming sorted adjacency checks are sufficient without verifying all adjacent pairs.
- Hash Collisions: Misunderstanding that Python’s
set
handles collisions internally. - Early Termination: Not immediately returning
true
upon finding the first duplicate. - Space Neglect: Overlooking memory constraints in embedded systems (irrelevant here due to problem bounds).
- Index Errors: In brute force approaches, comparing an element to itself (i.e.,
i == j
).
3. Approach Encyclopedia
Brute Force (Theoretical, Not Viable)
def containsDuplicate(nums):
for i in range(len(nums)):
for j in range(i + 1, len(nums)):
if nums[i] == nums[j]:
return True
return False
- Time Complexity: ( O(n^2) ). For ( n ) elements, ( \frac{n(n-1)}{2} ) comparisons.
- Space Complexity: ( O(1) ). No additional data structures.
Sorting-Based Approach
def containsDuplicate(nums):
nums.sort()
for i in range(1, len(nums)):
if nums[i] == nums[i-1]:
return True
return False
- Time Complexity: ( O(n \log n) ) from sorting. Adjacent checks are ( O(n) ).
- Space Complexity: ( O(1) ) if sorted in-place (Python’s
sort()
does this).
Hash Set (Optimal)
def containsDuplicate(nums):
seen = set()
for num in nums:
if num in seen:
return True
seen.add(num)
return False
- Time Complexity: ( O(n) ). Each insertion and lookup in a Python set is ( O(1) ) average case.
- Space Complexity: ( O(n) ). Worst case stores all elements.
Visualization (Hash Set)
nums = [1, 2, 3, 1]
Step 1: seen = {1} → not found
Step 2: seen = {1,2} → not found
Step 3: seen = {1,2,3} → not found
Step 4: 1 is in seen → return True
4. Code Deep Dive
Optimal Solution Code
def containsDuplicate(nums):
seen = set() # Initialize an empty hash set
for num in nums: # Iterate through each element
if num in seen: # O(1) membership check
return True # Duplicate found
seen.add(num) # Add to set for future checks
return False # All elements unique
Line-by-Line Breakdown
seen = set()
: Creates an empty set to track encountered elements.for num in nums
: Iterates over each element in the input array.if num in seen
: Checks if current element exists in the set (O(1) average time).return True
: Immediate exit upon finding a duplicate.seen.add(num)
: Adds new elements to the set for subsequent checks.return False
: Executed only if loop completes without duplicates.
Edge Case Handling
- Example 2 (All Unique): Loop completes, returns
False
. - Single Element: Loop runs once, returns
False
. - Large Inputs: Hash operations remain efficient within ( O(n) ).
5. Complexity War Room
Hardware-Aware Analysis
- A 1e5-element array requires ~800KB of memory (assuming 8-byte integers). Python sets have overhead (~20MB for 1e5 elements), but modern RAM handles this easily.
Industry Comparison Table
Approach | Time | Space | Readability | Interview Viability |
---|---|---|---|---|
Brute Force | O(n²) | O(1) | 8/10 | ❌ Fails at scale |
Sorting | O(n log n) | O(1) | 9/10 | ✅ Acceptable |
Hash Set | O(n) | O(n) | 10/10 | ✅ Optimal Choice |
6. Pro Mode Extras
Variants
- Return All Duplicates (LC 442): Modify to collect duplicates in a list.
- Check for Triplicates: Change the condition to count occurrences.
- Two Duplicates Allowed: Track counts with a dictionary.
Interview Cheat Sheet
- First Mention: Always state time/space complexity upfront.
- Tradeoffs: “The hash set approach trades space for linear time, which is optimal here.”
- Follow-Up: If input is sorted, a two-pointer approach reduces space to ( O(1) ).