#26 - Remove Duplicates from Sorted Array
- Difficulty: Easy
- Topics: Array, Two Pointers
- Link: https://leetcode.com/problems/remove-duplicates-from-sorted-array/
Problem Description
Given an integer array `nums` sorted in non-decreasing order, remove the duplicates in-place such that each unique element appears only once. The relative order of the elements should be kept the same. Then return the number of unique elements in `nums`.
Consider the number of unique elements of `nums` to be `k`. To get accepted, you need to do the following things:
- Change the array `nums` such that the first `k` elements of `nums` contain the unique elements in the order they were present in `nums` initially. The remaining elements of `nums`, as well as the size of `nums`, are not important.
- Return `k`.
Custom Judge:
The judge will test your solution with the following code:
```java
int[] nums = [...]; // Input array
int[] expectedNums = [...]; // The expected answer with correct length

int k = removeDuplicates(nums); // Calls your implementation

assert k == expectedNums.length;
for (int i = 0; i < k; i++) {
    assert nums[i] == expectedNums[i];
}
```
If all assertions pass, then your solution will be accepted.
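For local testing, a rough Python equivalent of this judging logic might look like the following sketch (it assumes your `removeDuplicates` implementation is defined in the same file):

```python
def judge(nums, expected_nums):
    # Mirror the custom judge: verify the returned k and the unique prefix.
    k = removeDuplicates(nums)
    assert k == len(expected_nums)
    for i in range(k):
        assert nums[i] == expected_nums[i]

judge([1, 1, 2], [1, 2])
judge([0, 0, 1, 1, 1, 2, 2, 3, 3, 4], [0, 1, 2, 3, 4])
print("All assertions passed.")
```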
Example 1:
Input: nums = [1,1,2]
Output: 2, nums = [1,2,_]
Explanation: Your function should return k = 2, with the first two elements of nums being 1 and 2 respectively.
It does not matter what you leave beyond the returned k (hence they are underscores).
Example 2:
Input: nums = [0,0,1,1,1,2,2,3,3,4]
Output: 5, nums = [0,1,2,3,4,_,_,_,_,_]
Explanation: Your function should return k = 5, with the first five elements of nums being 0, 1, 2, 3, and 4 respectively.
It does not matter what you leave beyond the returned k (hence they are underscores).
Constraints:
- `1 <= nums.length <= 3 * 10^4`
- `-100 <= nums[i] <= 100`
- `nums` is sorted in non-decreasing order.
Solution
1. Problem Deconstruction
Technical Restatement:
Given an array `nums` of length `n` sorted in non-decreasing order (i.e., `nums[i] <= nums[i+1]` for all valid `i`), perform an in-place removal of duplicate elements such that each distinct value appears exactly once in the prefix of the array. The algorithm must preserve the original relative ordering of unique elements. The return value is `k`, the count of unique elements. The modified array must satisfy the condition that the first `k` elements are the unique elements in their original order, while elements beyond index `k-1` are irrelevant. The solution must operate with O(1) auxiliary space (excluding the input array).
Beginner-Friendly Explanation:
You have a list of numbers that are already arranged from smallest to largest. Your task is to clean up this list by removing any repeated numbers, but you must do this within the original list without creating a new one. The unique numbers should appear at the start of the list in the same order they originally did. After moving all unique numbers to the front, you need to return how many unique numbers there are. What’s left in the list after these unique numbers doesn’t matter.
Mathematical Formulation:
Let `A = nums` be the input array of length `n`, sorted such that `A[i] <= A[i+1]` for `0 <= i < n-1`. The goal is to find the integer `k` and rearrange the first `k` elements such that:
- `k` is the cardinality of the set of distinct values, i.e., `k = |{A[0], A[1], ..., A[n-1]}|`.
- The first `k` elements are strictly increasing: `A[i] < A[i+1]` for all `0 <= i < k-1` (each distinct value appears exactly once).
- The sequence `A[0], ..., A[k-1]` is a subsequence of the original array, preserving relative order.
- The operation must be performed in-place, meaning the algorithm may use only O(1) additional memory beyond the input array.
Constraint Analysis:
- `1 <= nums.length <= 3 * 10^4`: The array size is moderate (up to 30,000 elements). This rules out brute-force solutions with O(n^2) time complexity, as roughly 9 * 10^8 operations is borderline in C++ and often too slow in Python. An O(n) solution is necessary.
- `-100 <= nums[i] <= 100`: The limited value range (-100 to 100) implies that duplicates are very common. However, the sorted nature is the key property, not the value range.
- `nums` is sorted in non-decreasing order: This is the most critical constraint. The sortedness guarantees that all duplicates for a given value are adjacent, which allows efficient duplicate skipping in a single linear pass.
Hidden Edge Cases from Constraints:
- Minimal Input: An array with a single element (`n = 1`). There are no duplicates, so `k = 1` and the array remains unchanged.
- All Duplicates: An array where all elements are identical (e.g., `[5,5,5]`). The output should be `k = 1`, with the first element being that value.
- No Duplicates: An array where all elements are distinct (e.g., `[1,2,3]`). The output is `k = n` and the array remains identical.
- Negative and Zero Values: The constraints include negative numbers and zero. The algorithm must handle these correctly (e.g., `[-2,-2,-1,0,0]` should become `[-2,-1,0]` in its first three positions).
2. Intuition Scaffolding
Real-World Metaphor (Conga Line):
Imagine people lined up for a conga line, arranged by height (shortest to tallest). Some people have the same height. The dance organizer wants to remove duplicates so that each height is represented only once, but the line order must otherwise remain the same. The organizer walks along the line (a single pass). They point to the next available unique spot at the front. For each person, if they are a new height, they are moved to that spot. If they are the same height as the last unique person, they are skipped. The number of unique spots filled is the final count.
Gaming Analogy (Sorted Inventory Management):
In a role-playing game, your inventory is sorted by item type (e.g., potions, swords). You want to consolidate duplicate items. You scroll through your sorted inventory list. When you find a new item type, you place it in the next available slot in your “unique items” section. Duplicates are left behind (or overwritten). The number of unique item types is your `k`.
Math Analogy (Uniqueness Filter on a Sorted Sequence):
Given a sorted sequence `a_1 <= a_2 <= ... <= a_n`, the problem reduces to extracting the subsequence of elements `a_i` with `a_i != a_(i-1)` (taking `a_1` unconditionally). This is equivalent to filtering out consecutive duplicates. Uniqueness is trivial to determine by comparing adjacent elements, due to the sortedness.
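As a concrete (though not in-place) illustration of this consecutive-duplicate filter, the unique subsequence can be computed with a one-line comprehension:

```python
nums = [0, 0, 1, 1, 1, 2, 2]
# Keep each element that differs from its predecessor; the first is always kept.
unique = [x for i, x in enumerate(nums) if i == 0 or x != nums[i - 1]]
print(unique)  # [0, 1, 2]
```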
Common Pitfalls:
- Using Extra Space: Creating a new array to store unique elements violates the in-place requirement, even if the custom judge only inspects the returned `k` and the array prefix.
- Not Updating the Array: Simply counting unique elements without modifying the array’s prefix fails the custom judge.
- Overwriting Too Early: If the slow pointer doesn’t advance correctly, you might overwrite values that haven’t been processed yet.
- Off-by-One Errors: Incorrectly handling the indices for the unique-position pointer (often called `k` or `write_index`) can lead to missing the last element or including an extra duplicate.
- Ignoring the Sortedness: Attempting to use a hash set (which would require O(n) space) is unnecessary and space-inefficient, given the sorted property.
3. Approach Encyclopedia
1. Brute Force (Not True In-Place)
- Idea: Create a new list to store unique elements by iterating and checking if the current element is different from the last added element. Then copy back into the original array. This is not in-place but is intuitive.
- Why it fails: Uses O(n) auxiliary space, which violates the problem’s explicit in-place requirement; the extra array defeats the point of the exercise.
- Pseudocode:
```text
unique_list = []
for num in nums:
    if unique_list is empty OR num != last element in unique_list:
        append num to unique_list
copy unique_list to nums[0:len(unique_list)]
return len(unique_list)
```
- Complexity: Time O(n), Space O(n).
- Visualization:
```text
Input: [0,0,1,1,1,2,2]
Step 1: unique_list = [0]  (first element)
Step 2: 0 == 0? Skip.
Step 3: 1 != 0? Add -> [0,1]
Step 4: 1 == 1? Skip.
... until unique_list = [0,1,2]
Then copy back to nums: [0,1,2, ...]
```
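A runnable Python sketch of this brute-force idea (illustrative only, since it allocates the extra list the problem forbids; the function name is hypothetical):

```python
def remove_duplicates_brute(nums):
    # Collect unique values in order by skipping consecutive repeats.
    unique_list = []
    for num in nums:
        if not unique_list or num != unique_list[-1]:
            unique_list.append(num)
    # Copy the unique values back into the front of the original array.
    nums[:len(unique_list)] = unique_list
    return len(unique_list)

nums = [0, 0, 1, 1, 1, 2, 2]
k = remove_duplicates_brute(nums)
print(k, nums[:k])  # 3 [0, 1, 2]
```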
2. Two Pointers (Optimal)
- Idea: Use a slow pointer (`write_index`) to mark the next position for a unique element. Use a fast pointer to traverse the array. Whenever a new unique element is found (i.e., `nums[fast] != nums[fast-1]`), place it at `write_index` and increment `write_index`.
- Why it works: The sortedness ensures duplicates are consecutive. The slow pointer builds the unique prefix in-place without extra space.
- Pseudocode:
```text
if nums is empty: return 0
write_index = 1            // first element is always unique
for fast_index from 1 to len(nums)-1:
    if nums[fast_index] != nums[fast_index-1]:
        nums[write_index] = nums[fast_index]
        write_index += 1
return write_index
```
- Complexity Proof: The algorithm does a single linear scan through the array. Each element is compared once and written at most once, so time complexity is O(n). Only two integer indices are used, so space complexity is O(1).
- Visualization:
```text
Input: [0,0,1,1,1,2,2]
write_index (w) = 1, fast_index (f) starts at 1.
f=1: nums[1]=0 == nums[0]=0 -> equal, skip.
f=2: nums[2]=1 != nums[1]=0 -> set nums[w]=1 -> [0,1,1,1,1,2,2], w=2.
f=3: nums[3]=1 == nums[2]=1 -> equal, skip.
f=4: nums[4]=1 == nums[3]=1 -> equal, skip.
f=5: nums[5]=2 != nums[4]=1 -> set nums[w]=2 -> [0,1,2,1,1,2,2], w=3.
f=6: nums[6]=2 == nums[5]=2 -> equal, skip.
Return w=3. Final array prefix: [0,1,2,...]
```
4. Code Deep Dive
Optimal Solution Code (Python):
```python
def removeDuplicates(nums):
    if not nums: return 0  # Handle empty input edge case
    write_index = 1  # The first element (index 0) is always unique
    for fast_index in range(1, len(nums)):
        if nums[fast_index] != nums[fast_index - 1]:  # Check if current is a new unique
            nums[write_index] = nums[fast_index]  # Write to unique section
            write_index += 1  # Move unique pointer forward
    return write_index  # Number of unique elements
```
Line-by-Line Annotations:
- Line 2: `if not nums: return 0`
  - What: Checks whether the input list is empty.
  - Why: If the array has zero elements, there are zero unique elements; this prevents errors in the loop.
  - Edge Case: The constraint `1 <= nums.length` guarantees a non-empty array on the judge, so this check is purely defensive.
- Line 3: `write_index = 1`
  - What: Initializes the pointer for the next unique element's position.
  - Why: The element at index 0 is always unique (it has no predecessor), so the unique section starts at index 0 and the next available slot is index 1.
- Line 4: `for fast_index in range(1, len(nums)):`
  - What: Iterates over the array from the second element to the last.
  - Why: The first element is already considered unique; each subsequent element is checked against its immediate predecessor.
- Line 5: `if nums[fast_index] != nums[fast_index - 1]:`
  - What: Compares the current element with the previous element.
  - Why: Due to sortedness, duplicates are consecutive, so this condition is true exactly when the current element is a new unique value.
- Line 6: `nums[write_index] = nums[fast_index]`
  - What: Copies the new unique value to the current write position.
  - Why: This builds the unique prefix in-place. Note: this overwrites values, but those values are either duplicates or have already been copied forward.
- Line 7: `write_index += 1`
  - What: Advances the write pointer.
  - Why: The next unique element will be placed at the next index.
- Line 8: `return write_index`
  - What: Returns the count of unique elements.
  - Why: `write_index` effectively counts how many unique elements have been placed.
Edge Case Handling in Code:
- Example 1 (`nums = [1,1,2]`):
  - Initially: `write_index = 1`.
  - `fast_index = 1`: compares `nums[1] = 1` and `nums[0] = 1` → equal, skip.
  - `fast_index = 2`: compares `nums[2] = 2` and `nums[1] = 1` → not equal, so set `nums[1] = 2`, then `write_index = 2`.
  - Returns 2. The array becomes `[1,2,2]` (the last element is irrelevant).
- Example 2 (`nums = [0,0,1,1,1,2,2,3,3,4]`):
  - The `0` at index 0 stays in place. Then at `fast_index = 2` (value 1) the code writes to `write_index = 1`, at `fast_index = 5` (value 2) to `write_index = 2`, at `fast_index = 7` (value 3) to `write_index = 3`, and at `fast_index = 9` (value 4) to `write_index = 4`.
  - Returns 5. The array prefix becomes `[0,1,2,3,4,...]`.
- All Duplicates (`[5,5,5]`):
  - The condition `nums[fast_index] != nums[fast_index - 1]` is never true after index 0, so `write_index` remains 1.
  - Returns 1. The array stays `[5,5,5]` (only the first element matters).
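The walkthroughs above translate directly into a small test harness; here is a minimal sketch whose case list mirrors the edge cases identified earlier:

```python
def test_remove_duplicates():
    cases = [
        ([1, 1, 2], 2, [1, 2]),                                 # Example 1
        ([0, 0, 1, 1, 1, 2, 2, 3, 3, 4], 5, [0, 1, 2, 3, 4]),  # Example 2
        ([5, 5, 5], 1, [5]),                                    # all duplicates
        ([1, 2, 3], 3, [1, 2, 3]),                              # no duplicates
        ([7], 1, [7]),                                          # minimal input
        ([-2, -2, -1, 0, 0], 3, [-2, -1, 0]),                   # negatives and zero
    ]
    for nums, expected_k, expected_prefix in cases:
        k = removeDuplicates(nums)
        assert k == expected_k and nums[:k] == expected_prefix

test_remove_duplicates()
```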
5. Complexity War Room
Hardware-Aware Analysis:
- The algorithm uses a single pass over the array, which has up to 30,000 elements.
- On a modern CPU, iterating over 30,000 elements is negligible (L1 cache is typically 32-64KB; 30,000 4-byte integers occupy about 120KB, which fits comfortably in L2/L3 cache).
- The constant space usage (two integers) is minimal.
Industry Comparison Table:
| Approach | Time Complexity | Space Complexity | Readability | Interview Viability |
|---|---|---|---|---|
| Brute Force (Extra Array) | O(n) | O(n) | High (intuitive) | ❌ Fails in-place requirement |
| Two Pointers (Optimal) | O(n) | O(1) | High | ✅ Ideal solution |
| In-place with Shifts (Bad) | O(n^2) | O(1) | Medium | ❌ Too slow for large n |
Why Two Pointers Wins:
- Meets all problem constraints: in-place, O(n) time, preserves order.
- Readable and easy to explain in interviews.
- Cache-friendly: sequential memory access pattern.
6. Pro Mode Extras
Variants:
- Remove Duplicates from Unsorted Array: Would require O(n) extra space (hash set) or O(n log n) time with sorting, since duplicates are no longer guaranteed to be adjacent.
- Remove Duplicates Allowing At Most k Duplicates (LC 80):
  - Example: Allow at most 2 duplicates. The two-pointer method extends by comparing `nums[fast]` with `nums[write_index - k]`.
  - Code snippet:

    ```python
    def removeDuplicatesK(nums, k=2):
        # Keep at most k copies of each value in the prefix.
        if len(nums) <= k:
            return len(nums)
        write_index = k  # the first k elements are always kept
        for fast_index in range(k, len(nums)):
            if nums[fast_index] != nums[write_index - k]:
                nums[write_index] = nums[fast_index]
                write_index += 1
        return write_index
    ```
- Remove Duplicates from Sorted List (LC 83): Same logic, applied to a linked list; see the sketch below.
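A minimal sketch of the linked-list variant, assuming the usual singly linked `ListNode` definition:

```python
class ListNode:
    def __init__(self, val=0, next=None):
        self.val = val
        self.next = next

def deleteDuplicates(head):
    # Walk the sorted list and splice out any node equal to its predecessor.
    current = head
    while current and current.next:
        if current.next.val == current.val:
            current.next = current.next.next  # skip the duplicate node
        else:
            current = current.next
    return head
```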
Interview Cheat Sheet:
- First Mention: Always state time and space complexity upfront.
- Key Insight: “Because the array is sorted, duplicates must be consecutive. This allows us to use a two-pointer approach.”
- Whiteboard: Draw the array with slow and fast pointers. Show how the write index builds the unique prefix.
- Test Cases: Always test with all duplicates, no duplicates, and minimal input.
- Common Mistake: Avoid using `nums[fast_index] != nums[write_index]` as the condition. The slot at `write_index` holds a stale, not-yet-overwritten value that can coincidentally equal the next unique element, causing it to be skipped. Compare with the previous element (`nums[fast_index - 1]`) instead.
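To see the failure concretely, here is an illustrative sketch (the function name is hypothetical): on `[1,2,2]`, the stale value at `write_index` already equals the new unique element, so it is wrongly skipped.

```python
def remove_duplicates_wrong(nums):
    # BROKEN: compares against the stale, not-yet-overwritten slot at write_index.
    write_index = 1
    for fast_index in range(1, len(nums)):
        if nums[fast_index] != nums[write_index]:
            nums[write_index] = nums[fast_index]
            write_index += 1
    return write_index

nums = [1, 2, 2]
print(remove_duplicates_wrong(nums))  # 1, but the correct answer is k = 2
# With the correct condition (nums[fast_index] != nums[fast_index - 1]),
# the function returns 2 and the prefix is [1, 2].
```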