#26 - Remove Duplicates from Sorted Array
- Difficulty: Easy
- Topics: Array, Two Pointers
- Link: https://leetcode.com/problems/remove-duplicates-from-sorted-array/
Problem Description
Given an integer array `nums` sorted in non-decreasing order, remove the duplicates in-place such that each unique element appears only once. The relative order of the elements should be kept the same. Then return the number of unique elements in `nums`.
Consider the number of unique elements of `nums` to be `k`. To get accepted, you need to do the following things:
- Change the array `nums` such that the first `k` elements of `nums` contain the unique elements in the order they were present in `nums` initially. The remaining elements of `nums`, as well as the size of `nums`, are not important.
- Return `k`.
Custom Judge:
The judge will test your solution with the following code:
```java
int[] nums = [...]; // Input array
int[] expectedNums = [...]; // The expected answer with correct length

int k = removeDuplicates(nums); // Calls your implementation

assert k == expectedNums.length;
for (int i = 0; i < k; i++) {
    assert nums[i] == expectedNums[i];
}
```
If all assertions pass, then your solution will be accepted.
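For local testing, a rough Python equivalent of this judging logic might look like the following sketch (it assumes your `removeDuplicates` implementation is defined in the same file):

```python
def judge(nums, expected_nums):
    # Mirror the custom judge: verify the returned k and the unique prefix.
    k = removeDuplicates(nums)
    assert k == len(expected_nums)
    for i in range(k):
        assert nums[i] == expected_nums[i]

judge([1, 1, 2], [1, 2])
judge([0, 0, 1, 1, 1, 2, 2, 3, 3, 4], [0, 1, 2, 3, 4])
print("All assertions passed.")
```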
Example 1:
Input: nums = [1,1,2]
Output: 2, nums = [1,2,_]
Explanation: Your function should return k = 2, with the first two elements of nums being 1 and 2 respectively.
It does not matter what you leave beyond the returned k (hence they are underscores).
Example 2:
Input: nums = [0,0,1,1,1,2,2,3,3,4]
Output: 5, nums = [0,1,2,3,4,_,_,_,_,_]
Explanation: Your function should return k = 5, with the first five elements of nums being 0, 1, 2, 3, and 4 respectively.
It does not matter what you leave beyond the returned k (hence they are underscores).
Constraints:
- `1 <= nums.length <= 3 * 10^4`
- `-100 <= nums[i] <= 100`
- `nums` is sorted in non-decreasing order.
Solution
1. Problem Deconstruction
Technical Restatement:
Given an array `nums` of length `n` sorted in non-decreasing order (i.e., `nums[i] <= nums[i+1]` for all valid `i`), perform an in-place removal of duplicate elements such that each distinct value appears exactly once in the prefix of the array. The algorithm must preserve the original relative ordering of unique elements. The return value is `k`, the count of unique elements. The modified array must satisfy the condition that the first `k` elements are the unique elements in their original order, while elements beyond index `k-1` are irrelevant. The solution must operate with O(1) auxiliary space (excluding the input array).
Beginner-Friendly Explanation:
You have a list of numbers that are already arranged from smallest to largest. Your task is to clean up this list by removing any repeated numbers, but you must do this within the original list without creating a new one. The unique numbers should appear at the start of the list in the same order they originally did. After moving all unique numbers to the front, you need to return how many unique numbers there are. What’s left in the list after these unique numbers doesn’t matter.
Mathematical Formulation:
Let `A = nums` be the input array of length `n`, sorted such that `A[i] <= A[i+1]` for `0 <= i < n-1`. The goal is to find the integer `k` and rearrange the first `k` elements such that:
- `k` is the cardinality of the set of distinct values, i.e., `k = |{A[0], A[1], ..., A[n-1]}|`.
- The first `k` elements are strictly increasing: `A[i] < A[i+1]` for all `0 <= i < k-1` (each distinct value appears exactly once).
- The sequence `A[0], ..., A[k-1]` is a subsequence of the original array, preserving relative order.
- The operation must be performed in-place, meaning the algorithm may use only O(1) additional memory beyond the input array.
Constraint Analysis:
- `1 <= nums.length <= 3 * 10^4`: The array size is moderate (up to 30,000 elements). This rules out brute-force solutions with O(n^2) time complexity, as roughly 9 * 10^8 operations is borderline in C++ and often too slow in Python. An O(n) solution is necessary.
- `-100 <= nums[i] <= 100`: The limited value range (-100 to 100) implies that duplicates are very common. However, the sorted nature is the key property, not the value range.
- `nums` is sorted in non-decreasing order: This is the most critical constraint. The sortedness guarantees that all duplicates for a given value are adjacent, which allows efficient duplicate skipping in a single linear pass.
Hidden Edge Cases from Constraints:
- Minimal Input: An array with a single element (`n = 1`). There are no duplicates, so `k = 1` and the array remains unchanged.
- All Duplicates: An array where all elements are identical (e.g., `[5,5,5]`). The output should be `k = 1`, with the first element being that value.
- No Duplicates: An array where all elements are distinct (e.g., `[1,2,3]`). The output is `k = n` and the array remains identical.
- Negative and Zero Values: The constraints include negative numbers and zero. The algorithm must handle these correctly (e.g., `[-2,-2,-1,0,0]` should become `[-2,-1,0]` in its first three positions).
2. Intuition Scaffolding
Real-World Metaphor (Conga Line):
Imagine people lined up for a conga line, arranged by height (shortest to tallest). Some people have the same height. The dance organizer wants to remove duplicates so that each height is represented only once, but the line order must otherwise remain the same. The organizer walks along the line (a single pass). They point to the next available unique spot at the front. For each person, if they are a new height, they are moved to that spot. If they are the same height as the last unique person, they are skipped. The number of unique spots filled is the final count.
Gaming Analogy (Sorted Inventory Management):
In a role-playing game, your inventory is sorted by item type (e.g., potions, swords). You want to consolidate duplicate items. You scroll through your sorted inventory list. When you find a new item type, you place it in the next available slot in your “unique items” section. Duplicates are left behind (or overwritten). The number of unique item types is your `k`.
Math Analogy (Uniqueness Filter on a Sorted Sequence):
Given a sorted sequence `a_1 <= a_2 <= ... <= a_n`, the problem reduces to extracting the subsequence of elements `a_i` with `a_i != a_(i-1)` (taking `a_1` unconditionally). This is equivalent to filtering out consecutive duplicates. Uniqueness is trivial to determine by comparing adjacent elements, due to the sortedness.
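As a concrete (though not in-place) illustration of this consecutive-duplicate filter, the unique subsequence can be computed with a one-line comprehension:

```python
nums = [0, 0, 1, 1, 1, 2, 2]
# Keep each element that differs from its predecessor; the first is always kept.
unique = [x for i, x in enumerate(nums) if i == 0 or x != nums[i - 1]]
print(unique)  # [0, 1, 2]
```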
Common Pitfalls:
- Using Extra Space: Creating a new array to store unique elements violates the in-place requirement, even if the custom judge only inspects the returned `k` and the array prefix.
- Not Updating the Array: Simply counting unique elements without modifying the array’s prefix fails the custom judge.
- Overwriting Too Early: If the slow pointer doesn’t advance correctly, you might overwrite values that haven’t been processed yet.
- Off-by-One Errors: Incorrectly handling the indices for the unique-position pointer (often called `k` or `write_index`) can lead to missing the last element or including an extra duplicate.
- Ignoring the Sortedness: Attempting to use a hash set (which would require O(n) space) is unnecessary and space-inefficient, given the sorted property.
3. Approach Encyclopedia
1. Brute Force (Not True In-Place)
- Idea: Create a new list to store unique elements by iterating and checking if the current element is different from the last added element. Then copy back into the original array. This is not in-place but is intuitive.
- Why it fails: Uses O(n) auxiliary space, which violates the problem’s explicit in-place requirement; the extra array defeats the point of the exercise.
- Pseudocode:
```text
unique_list = []
for num in nums:
    if unique_list is empty OR num != last element in unique_list:
        append num to unique_list
copy unique_list to nums[0:len(unique_list)]
return len(unique_list)
```
- Complexity: Time O(n), Space O(n).
- Visualization:
```text
Input: [0,0,1,1,1,2,2]
Step 1: unique_list = [0]  (first element)
Step 2: 0 == 0? Skip.
Step 3: 1 != 0? Add -> [0,1]
Step 4: 1 == 1? Skip.
... until unique_list = [0,1,2]
Then copy back to nums: [0,1,2, ...]
```
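A runnable Python sketch of this brute-force idea (illustrative only, since it allocates the extra list the problem forbids; the function name is hypothetical):

```python
def remove_duplicates_brute(nums):
    # Collect unique values in order by skipping consecutive repeats.
    unique_list = []
    for num in nums:
        if not unique_list or num != unique_list[-1]:
            unique_list.append(num)
    # Copy the unique values back into the front of the original array.
    nums[:len(unique_list)] = unique_list
    return len(unique_list)

nums = [0, 0, 1, 1, 1, 2, 2]
k = remove_duplicates_brute(nums)
print(k, nums[:k])  # 3 [0, 1, 2]
```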
2. Two Pointers (Optimal)
- Idea: Use a slow pointer (`write_index`) to mark the next position for a unique element. Use a fast pointer to traverse the array. Whenever a new unique element is found (i.e., `nums[fast] != nums[fast-1]`), place it at `write_index` and increment `write_index`.
- Why it works: The sortedness ensures duplicates are consecutive. The slow pointer builds the unique prefix in-place without extra space.
- Pseudocode:
```text
if nums is empty: return 0
write_index = 1            // first element is always unique
for fast_index from 1 to len(nums)-1:
    if nums[fast_index] != nums[fast_index-1]:
        nums[write_index] = nums[fast_index]
        write_index += 1
return write_index
```
- Complexity Proof: The algorithm does a single linear scan through the array. Each element is compared once and written at most once, so time complexity is O(n). Only two integer indices are used, so space complexity is O(1).
- Visualization:
```text
Input: [0,0,1,1,1,2,2]
write_index (w) = 1, fast_index (f) starts at 1.
f=1: nums[1]=0 == nums[0]=0 -> equal, skip.
f=2: nums[2]=1 != nums[1]=0 -> set nums[w]=1 -> [0,1,1,1,1,2,2], w=2.
f=3: nums[3]=1 == nums[2]=1 -> equal, skip.
f=4: nums[4]=1 == nums[3]=1 -> equal, skip.
f=5: nums[5]=2 != nums[4]=1 -> set nums[w]=2 -> [0,1,2,1,1,2,2], w=3.
f=6: nums[6]=2 == nums[5]=2 -> equal, skip.
Return w=3. Final array prefix: [0,1,2,...]
```
4. Code Deep Dive
Optimal Solution Code (Python):
```python
def removeDuplicates(nums):
    if not nums: return 0  # Handle empty input edge case
    write_index = 1  # The first element (index 0) is always unique
    for fast_index in range(1, len(nums)):
        if nums[fast_index] != nums[fast_index - 1]:  # Check if current is a new unique
            nums[write_index] = nums[fast_index]  # Write to unique section
            write_index += 1  # Move unique pointer forward
    return write_index  # Number of unique elements
```
Line-by-Line Annotations:
- Line 2: `if not nums: return 0`
  - What: Checks whether the input list is empty.
  - Why: If the array has zero elements, there are zero unique elements; this prevents errors in the loop.
  - Edge Case: The constraint `1 <= nums.length` guarantees a non-empty array on the judge, so this check is purely defensive.
- Line 3: `write_index = 1`
  - What: Initializes the pointer for the next unique element's position.
  - Why: The element at index 0 is always unique (it has no predecessor), so the unique section starts at index 0 and the next available slot is index 1.
- Line 4: `for fast_index in range(1, len(nums)):`
  - What: Iterates over the array from the second element to the last.
  - Why: The first element is already considered unique; each subsequent element is checked against its immediate predecessor.
- Line 5: `if nums[fast_index] != nums[fast_index - 1]:`
  - What: Compares the current element with the previous element.
  - Why: Due to sortedness, duplicates are consecutive, so this condition is true exactly when the current element is a new unique value.
- Line 6: `nums[write_index] = nums[fast_index]`
  - What: Copies the new unique value to the current write position.
  - Why: This builds the unique prefix in-place. Note: this overwrites values, but those values are either duplicates or have already been copied forward.
- Line 7: `write_index += 1`
  - What: Advances the write pointer.
  - Why: The next unique element will be placed at the next index.
- Line 8: `return write_index`
  - What: Returns the count of unique elements.
  - Why: `write_index` effectively counts how many unique elements have been placed.
Edge Case Handling in Code:
- Example 1 (`nums = [1,1,2]`):
  - Initially: `write_index = 1`.
  - `fast_index = 1`: compares `nums[1] = 1` and `nums[0] = 1` → equal, skip.
  - `fast_index = 2`: compares `nums[2] = 2` and `nums[1] = 1` → not equal, so set `nums[1] = 2`, then `write_index = 2`.
  - Returns 2. The array becomes `[1,2,2]` (the last element is irrelevant).
- Example 2 (`nums = [0,0,1,1,1,2,2,3,3,4]`):
  - The `0` at index 0 stays in place. Then at `fast_index = 2` (value 1) the code writes to `write_index = 1`, at `fast_index = 5` (value 2) to `write_index = 2`, at `fast_index = 7` (value 3) to `write_index = 3`, and at `fast_index = 9` (value 4) to `write_index = 4`.
  - Returns 5. The array prefix becomes `[0,1,2,3,4,...]`.
- All Duplicates (`[5,5,5]`):
  - The condition `nums[fast_index] != nums[fast_index - 1]` is never true after index 0, so `write_index` remains 1.
  - Returns 1. The array stays `[5,5,5]` (only the first element matters).
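The walkthroughs above translate directly into a small test harness; here is a minimal sketch whose case list mirrors the edge cases identified earlier:

```python
def test_remove_duplicates():
    cases = [
        ([1, 1, 2], 2, [1, 2]),                                 # Example 1
        ([0, 0, 1, 1, 1, 2, 2, 3, 3, 4], 5, [0, 1, 2, 3, 4]),  # Example 2
        ([5, 5, 5], 1, [5]),                                    # all duplicates
        ([1, 2, 3], 3, [1, 2, 3]),                              # no duplicates
        ([7], 1, [7]),                                          # minimal input
        ([-2, -2, -1, 0, 0], 3, [-2, -1, 0]),                   # negatives and zero
    ]
    for nums, expected_k, expected_prefix in cases:
        k = removeDuplicates(nums)
        assert k == expected_k and nums[:k] == expected_prefix

test_remove_duplicates()
```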
5. Complexity War Room
Hardware-Aware Analysis:
- The algorithm uses a single pass over the array, which has up to 30,000 elements.
- On a modern CPU, iterating over 30,000 elements is negligible (L1 cache is typically 32-64KB; 30,000 4-byte integers occupy about 120KB, which fits comfortably in L2/L3 cache).
- The constant space usage (two integers) is minimal.
Industry Comparison Table:
| Approach | Time Complexity | Space Complexity | Readability | Interview Viability |
|---|---|---|---|---|
| Brute Force (Extra Array) | O(n) | O(n) | High (intuitive) | ❌ Fails in-place requirement |
| Two Pointers (Optimal) | O(n) | O(1) | High | ✅ Ideal solution |
| In-place with Shifts (Bad) | O(n^2) | O(1) | Medium | ❌ Too slow for large n |
Why Two Pointers Wins:
- Meets all problem constraints: in-place, O(n) time, preserves order.
- Readable and easy to explain in interviews.
- Cache-friendly: sequential memory access pattern.
6. Pro Mode Extras
Variants:
- Remove Duplicates from Unsorted Array: Would require O(n) extra space (hash set) or O(n log n) time with sorting, since duplicates are no longer guaranteed to be adjacent.
- Remove Duplicates Allowing At Most k Duplicates (LC 80):
  - Example: Allow at most 2 duplicates. The two-pointer method extends by comparing `nums[fast]` with `nums[write_index - k]`.
  - Code snippet:

    ```python
    def removeDuplicatesK(nums, k=2):
        # Keep at most k copies of each value in the prefix.
        if len(nums) <= k:
            return len(nums)
        write_index = k  # the first k elements are always kept
        for fast_index in range(k, len(nums)):
            if nums[fast_index] != nums[write_index - k]:
                nums[write_index] = nums[fast_index]
                write_index += 1
        return write_index
    ```
- Remove Duplicates from Sorted List (LC 83): Same logic, applied to a linked list; see the sketch below.
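A minimal sketch of the linked-list variant, assuming the usual singly linked `ListNode` definition:

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def deleteDuplicates(head):
    # Walk the sorted list and splice out any node equal to its predecessor.
    current = head
    while current and current.next:
        if current.next.val == current.val:
            current.next = current.next.next  # skip the duplicate node
        else:
            current = current.next
    return head
```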
Interview Cheat Sheet:
- First Mention: Always state time and space complexity upfront.
- Key Insight: “Because the array is sorted, duplicates must be consecutive. This allows us to use a two-pointer approach.”
- Whiteboard: Draw the array with slow and fast pointers. Show how the write index builds the unique prefix.
- Test Cases: Always test with all duplicates, no duplicates, and minimal input.
- Common Mistake: Avoid using `nums[fast_index] != nums[write_index]` as the condition. The slot at `write_index` holds a stale, not-yet-overwritten value that can coincidentally equal the next unique element, causing it to be skipped. Compare with the previous element (`nums[fast_index - 1]`) instead.
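To see the failure concretely, here is an illustrative sketch (the function name is hypothetical): on `[1,2,2]`, the stale value at `write_index` already equals the new unique element, so it is wrongly skipped.

```python
def remove_duplicates_wrong(nums):
    # BROKEN: compares against the stale, not-yet-overwritten slot at write_index.
    write_index = 1
    for fast_index in range(1, len(nums)):
        if nums[fast_index] != nums[write_index]:
            nums[write_index] = nums[fast_index]
            write_index += 1
    return write_index

nums = [1, 2, 2]
print(remove_duplicates_wrong(nums))  # 1, but the correct answer is k = 2
# With the correct condition (nums[fast_index] != nums[fast_index - 1]),
# the function returns 2 and the prefix is [1, 2].
```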