← ~/lc-grind/posts

#217 - Contains Duplicate

April 22, 2025

Contains Duplicate

Problem Description

Given an integer array nums, return true if any value appears at least twice in the array, and return false if every element is distinct.

Example 1:

Input: nums = [1,2,3,1]

Output: true

Explanation:

The element 1 occurs at the indices 0 and 3.

Example 2:

Input: nums = [1,2,3,4]

Output: false

Explanation:

All elements are distinct.

Example 3:

Input: nums = [1,1,1,3,3,4,3,2,4,2]

Output: true

Constraints:

  • 1 <= nums.length <= 105
  • -109 <= nums[i] <= 109

Solution

1. Problem Deconstruction

Technical Restatement
Given an array nums of integers, determine if there exists at least one element that occurs two or more times. Return true if duplicates are present, false otherwise.

Beginner-Friendly Restatement
Imagine you have a list of numbers. Your task is to check if any number in the list repeats. For example, if the list has two 5s, you return “yes” (true). If all numbers are unique, return “no” (false).

Mathematical Restatement
Let ( S ) be a set constructed from the elements of nums. The problem reduces to evaluating:
[ |S| < |\text{nums}| \implies \text{true} ] Otherwise, false.

Constraint Analysis

  1. 1 <= nums.length <= 1e5:

    • Solution Limitation: Algorithms with ( O(n^2) ) time complexity (e.g., brute force pairwise checks) are infeasible.
    • Edge Case: Single-element arrays trivially return false (no duplicates possible).
  2. -1e9 <= nums[i] <= 1e9:

    • Solution Limitation: Values span a wide range, but hash-based structures (dictionaries, sets) handle them efficiently.
    • Edge Case: Negative numbers and zeros must be hashed correctly (no special handling required in Python).

2. Intuition Scaffolding

Analogies

  1. Real-World: Checking for duplicate ID cards in a conference. If any attendee shares an ID, the system flags it.
  2. Gaming: Verifying unique usernames in a multiplayer lobby. Duplicate names trigger a warning.
  3. Math: Comparing the cardinality of a multiset to its underlying set. A discrepancy implies duplicates.

Common Pitfalls

  1. Sorting Oversight: Assuming sorted adjacency checks are sufficient without verifying all adjacent pairs.
  2. Hash Collisions: Misunderstanding that Python’s set handles collisions internally.
  3. Early Termination: Not immediately returning true upon finding the first duplicate.
  4. Space Neglect: Overlooking memory constraints in embedded systems (irrelevant here due to problem bounds).
  5. Index Errors: In brute force approaches, comparing an element to itself (i.e., i == j).

3. Approach Encyclopedia

Brute Force (Theoretical, Not Viable)

def containsDuplicate(nums):
    for i in range(len(nums)):
        for j in range(i + 1, len(nums)):
            if nums[i] == nums[j]:
                return True
    return False
  • Time Complexity: ( O(n^2) ). For ( n ) elements, ( \frac{n(n-1)}{2} ) comparisons.
  • Space Complexity: ( O(1) ). No additional data structures.

Sorting-Based Approach

def containsDuplicate(nums):
    nums.sort()
    for i in range(1, len(nums)):
        if nums[i] == nums[i-1]:
            return True
    return False
  • Time Complexity: ( O(n \log n) ) from sorting. Adjacent checks are ( O(n) ).
  • Space Complexity: ( O(1) ) if sorted in-place (Python’s sort() does this).

Hash Set (Optimal)

def containsDuplicate(nums):
    seen = set()
    for num in nums:
        if num in seen:
            return True
        seen.add(num)
    return False
  • Time Complexity: ( O(n) ). Each insertion and lookup in a Python set is ( O(1) ) average case.
  • Space Complexity: ( O(n) ). Worst case stores all elements.

Visualization (Hash Set)

nums = [1, 2, 3, 1]
Step 1: seen = {1} → not found
Step 2: seen = {1,2} → not found
Step 3: seen = {1,2,3} → not found
Step 4: 1 is in seen → return True

4. Code Deep Dive

Optimal Solution Code

def containsDuplicate(nums):
    seen = set()  # Initialize an empty hash set
    for num in nums:  # Iterate through each element
        if num in seen:  # O(1) membership check
            return True  # Duplicate found
        seen.add(num)  # Add to set for future checks
    return False  # All elements unique

Line-by-Line Breakdown

  1. seen = set(): Creates an empty set to track encountered elements.
  2. for num in nums: Iterates over each element in the input array.
  3. if num in seen: Checks if current element exists in the set (O(1) average time).
  4. return True: Immediate exit upon finding a duplicate.
  5. seen.add(num): Adds new elements to the set for subsequent checks.
  6. return False: Executed only if loop completes without duplicates.

Edge Case Handling

  • Example 2 (All Unique): Loop completes, returns False.
  • Single Element: Loop runs once, returns False.
  • Large Inputs: Hash operations remain efficient within ( O(n) ).

5. Complexity War Room

Hardware-Aware Analysis

  • A 1e5-element array requires ~800KB of memory (assuming 8-byte integers). Python sets have overhead (~20MB for 1e5 elements), but modern RAM handles this easily.

Industry Comparison Table

Approach Time Space Readability Interview Viability
Brute Force O(n²) O(1) 8/10 ❌ Fails at scale
Sorting O(n log n) O(1) 9/10 ✅ Acceptable
Hash Set O(n) O(n) 10/10 ✅ Optimal Choice

6. Pro Mode Extras

Variants

  1. Return All Duplicates (LC 442): Modify to collect duplicates in a list.
  2. Check for Triplicates: Change the condition to count occurrences.
  3. Two Duplicates Allowed: Track counts with a dictionary.

Interview Cheat Sheet

  • First Mention: Always state time/space complexity upfront.
  • Tradeoffs: “The hash set approach trades space for linear time, which is optimal here.”
  • Follow-Up: If input is sorted, a two-pointer approach reduces space to ( O(1) ).