#347 - Top K Frequent Elements
Top K Frequent Elements
- Difficulty: Medium
- Topics: Array, Hash Table, Divide and Conquer, Sorting, Heap (Priority Queue), Bucket Sort, Counting, Quickselect
- Link: https://leetcode.com/problems/top-k-frequent-elements/
Problem Description
Given an integer array nums
and an integer k
, return the k
most frequent elements. You may return the answer in any order.
Example 1:
Input: nums = [1,1,1,2,2,3], k = 2
Output: [1,2]
Example 2:
Input: nums = [1], k = 1
Output: [1]
Constraints:
1 <= nums.length <= 105
-104 <= nums[i] <= 104
k
is in the range[1, the number of unique elements in the array]
.- It is guaranteed that the answer is unique.
Follow up: Your algorithm’s time complexity must be better than O(n log n)
, where n is the array’s size.
Solution
1. Problem Deconstruction
Technical Restatement
Given an array nums
of integers and integer k
, identify the k
most frequently occurring elements. Formally:
- Let be an integer array of length
- Let be the frequency function
- Find set such that and
- Return as an ordered list (order irrelevant)
Beginner Restatement
You have a list of numbers. Count how often each number appears. Find the k
numbers that appear most frequently. Return them in any order.
Mathematical Restatement
For array of size :
where . Uniqueness guarantee implies strict ordering:
where is the -th largest frequency.
Constraint Analysis
1 <= nums.length <= 10^5
- Rules out brute force (e.g., 10$^{10}$ ops at )
- Requires or solution
-10^4 <= nums[i] <= 10^4
- Implies bounded distinct values
- Permits methods (since is constant-bounded)
- Enables frequency bucket arrays of size
k ∈ [1, number of unique elements]
- Ensures non-trivial solution ( invalid)
- Worst-case (requires heap/selection)
- Guaranteed unique answer
- No ties at -th position:
- Simplifies termination in bucket sort (no partial bucket handling)
Hidden Edge Cases
- : Single most frequent element (degenerate heap/sort)
- : All distinct elements must be returned
- : Single-element array trivially satisfies
- Uniform frequencies (e.g., , ): Prevented by uniqueness guarantee
2. Intuition Scaffolding
Real-World Metaphor
A music service counts song plays. With 100,000 plays and 20,001 distinct songs, find the top k
played songs without full sorting (analogous to frequency counts and partial sorting).
Gaming Analogy
In an RPG, enemy types appear with varying frequency. Log all encounters (array) and prioritize defenses for the top `k$ most common enemies. Bounded enemy types (20,001) enable efficient counting.
Math Analogy
Find modes of a multiset. With bounded element domain, reduce to frequency counting and -selection on frequency space.
Common Pitfalls
- Full frequency sort (): Valid due to bounded but violates follow-up if
- Max-heap all elements (): Wasted work vs. min-heap of size
- Ignoring uniqueness guarantee: Unnecessary tie-breaking logic
- Bucket index overflow: Frequencies , so buckets suffice
- Premature optimization: For bounded , simple sort beats complex selection
3. Approach Encyclopedia
Approach 1: Hash Map + Sorting
def topKFrequent(nums, k):
freq = {}
for num in nums: # O(n): Build frequency map
freq[num] = freq.get(num, 0) + 1
unique_items = list(freq.items()) # O(m): Extract (num, count) pairs
# Sort by frequency descending, extract nums
unique_items.sort(key=lambda x: x[1], reverse=True) # O(m log m)
return [item[0] for item in unique_items[:k]] # O(k)
Complexity Proof
- Time:
Since (bounded), , total - Space: for map and list
Visualization
nums = [1,1,1,2,2,3], k=2
freq: {1:3, 2:2, 3:1}
Pairs: [(1,3), (2,2), (3,1)] → Sorted: [(1,3), (2,2), (3,1)]
Output: [1,2] (first k elements)
Approach 2: Min-Heap of Size k
import heapq
def topKFrequent(nums, k):
freq = {}
for num in nums: # O(n): Frequency map
freq[num] = freq.get(num, 0) + 1
heap = []
for num, count in freq.items(): # O(m): Process distinct items
heapq.heappush(heap, (count, num)) # Push (frequency, num)
if len(heap) > k:
heapq.heappop(heap) # Remove smallest if > k elements
return [num for count, num in heap] # O(k)
Complexity Proof
- Time:
, , , so - Space: (map + heap)
Visualization
freq: {1:3, 2:2, 3:1}, k=2
Heap after 1: [(3,1)]
Heap after 2: [(2,2), (3,1)] (min-heap: (2,2) at root)
Add 3: [(1,3), (3,1), (2,2)] → pop (1,3) → [(2,2), (3,1)]
Output: [2,1] (order irrelevant)
Approach 3: Bucket Sort (Optimal)
def topKFrequent(nums, k):
freq = {}
for num in nums: # O(n): Frequency map
freq[num] = freq.get(num, 0) + 1
n = len(nums)
buckets = [[] for _ in range(n+1)] # 0..n frequency buckets
for num, count in freq.items(): # O(m): Assign to buckets
buckets[count].append(num)
res = []
for count in range(n, 0, -1): # Descend frequencies
for num in buckets[count]: # Empty buckets skipped
res.append(num)
if len(res) == k: # Terminate early
return res
return res
Complexity Proof
- Time: (map + bucket fill + bucket traversal)
- Space: (buckets array size , total elements )
Visualization
freq: {1:3, 2:2, 3:1} → Buckets:
index: 0 1 2 3 4 5 6
[ ][3][2][1][ ][ ][ ]
Traverse: index6..4 → skip, index3 → add 1 → res=[1]
index2 → add 2 → res=[1,2] → return
4. Code Deep Dive (Bucket Sort)
def topKFrequent(nums, k):
freq = {} # 1. Frequency dictionary
for num in nums: # 2. O(n) frequency count
freq[num] = freq.get(num, 0) + 1 # get() handles missing keys
n = len(nums)
buckets = [[] for _ in range(n+1)] # 3. Buckets: index = frequency
for num, count in freq.items(): # 4. O(m) bucket assignment
buckets[count].append(num) # Append num to its frequency bucket
res = []
for count in range(n, 0, -1): # 5. Descend from max frequency
for num in buckets[count]: # 6. Process entire bucket
res.append(num) # Append numbers at this frequency
if len(res) == k: # 7. Early termination at k elements
return res # Uniqueness ensures no partial bucket
return res # Fallback (k=0 impossible by constraints)
Edge Case Handling
- Example 1 (k=2):
Line 2:freq
becomes{1:3, 2:2, 3:1}
Line 4:buckets[3]=[1]
,buckets[2]=[2]
,buckets[1]=[3]
Line 7: Loopcount=6→3
: Atcount=3
,res=[1]
count=2
: Append2
→res=[1,2]
→ return - Example 2 (k=1):
freq={1:1}
,buckets[1]=[1]
→res=[1]
at first iteration - k = m (all distinct):
All numbers inbuckets[1]
→ appended in order untilres
reaches size
5. Complexity War Room
Hardware-Aware Analysis
- elements:
- Frequency map: time, $\approx 100$ns/element → 10 ms total
- Buckets: Array of 100,001 pointers ( 800 KB), fits in L2/L3 cache
- Traversal: with cache locality → $\sim$5 ms
- Total memory: (buckets) + (map) 1.2 MB, negligible
Industry Comparison Table
Approach | Time | Space | Readability | Interview Viability |
---|---|---|---|---|
Brute Force | 10 | ❌ (fails ) | ||
Hash Map + Sort | 9 | ✅ (m bounded) | ||
Min-Heap | 8 | ✅ (m bounded) | ||
Bucket Sort | 9 | ✅✅ Optimal |
6. Pro Mode Extras
Variants
- Non-Unique Answer (Ties):
- Problem: Return all elements with frequency
- Solution: In bucket sort, collect entire bucket at after elements
- Streaming Data:
- Maintain frequency map + min-heap of size
- On each element: update map, recompute top- in
- Top K in Sliding Window:
- LC 347 + sliding window constraints
- Use frequency map + heap with window-removals
Interview Cheat Sheet
- First Mention: Time/Space complexity
- Constraints: Leverage bounded or uniqueness
- Verification: Walk through examples
- Optimization Path:
Brute Force → Hash Sort → Heap → Bucket Sort - Edge Guards: , ,