#215 - Kth Largest Element in an Array
Kth Largest Element in an Array
- Difficulty: Medium
- Topics: Array, Divide and Conquer, Sorting, Heap (Priority Queue), Quickselect
- Link: https://leetcode.com/problems/kth-largest-element-in-an-array/
Problem Description
Given an integer array nums and an integer k, return the kth largest element in the array.
Note that it is the kth largest element in the sorted order, not the kth distinct element.
Can you solve it without sorting?
Example 1:
Input: nums = [3,2,1,5,6,4], k = 2
Output: 5
Example 2:
Input: nums = [3,2,3,1,2,4,5,5,6], k = 4
Output: 4
Constraints:
- 1 <= k <= nums.length <= 10⁵
- -10⁴ <= nums[i] <= 10⁴
Solution
1. Problem Deconstruction
Technical Restatement
Given an array nums of n integers and an integer k where 1 ≤ k ≤ n, determine the element that would occupy the k-th position (1-indexed) if the array were sorted in non-increasing order. Formally, return the element x such that at most k-1 elements in nums are strictly greater than x and at least k elements are greater than or equal to x (accounting for duplicates). The solution should avoid an explicit full sort of the array, aiming for better than O(n log n) time complexity in the average case.
Beginner-Friendly Restatement
Imagine you have a list of numbers, and you want to find the number that is the k-th biggest in the list. For example, if k = 2, you want the second biggest number. If multiple copies of the same number exist, they all count separately (so the second biggest in [5,5,4,3] is 5, not 4). The challenge is to find this number without first sorting the entire list from largest to smallest.
Mathematical Restatement
Let S = {x₁, x₂, ..., xₙ} be a multiset of integers. Define the order statistic order(S, k) as the element y ∈ S such that:
|{x ∈ S : x > y}| ≤ k - 1 and |{x ∈ S : x ≥ y}| ≥ k
Equivalently, if we define the rank of an element as its position in a sorted descending list (with ties handled arbitrarily but consistently), then we seek the element with rank k. This can be expressed as:
y = max{ z ∈ S : |{x ∈ S : x ≥ z}| ≥ k }
or more procedurally: find the element that partitions the multiset into two groups: those greater than it (less than k of them) and those less than or equal to it (at least n - k + 1 of them).
Constraint Analysis
- 1 <= k <= nums.length <= 10⁵
  - The array can be large (up to 100,000 elements), ruling out O(n²) algorithms like brute-force pairwise comparisons or bubble-sort-based approaches.
  - The value of k ranges from the smallest (1, meaning we want the maximum) to the largest (n, meaning we want the minimum). This implies that any algorithm must handle both extremes efficiently without special cases.
  - The constraint k ≤ n ensures the problem is well-defined; edge cases where k = n (minimum) and k = 1 (maximum) are valid and must be handled correctly.
- -10⁴ <= nums[i] <= 10⁴
  - The elements are bounded integers, which allows counting sort or radix sort in O(n + range) time, where range = 20001. This could be an alternative to comparison-based methods: n can be as large as 10⁵ and the range is 2×10⁴, so O(n + range) ≈ 10⁵ + 2×10⁴ = 1.2×10⁵ operations, which is feasible. However, the problem’s intent is to explore comparison-based selection algorithms.
  - The presence of negative numbers requires algorithms that handle signed integers correctly; for example, in radix sort we must handle two’s complement or offset the values.
  - The relatively small range suggests that bucket-based approaches could be efficient, but they are not comparison-based and may use extra space proportional to the range.
Hidden Edge Cases
- All elements identical: e.g., nums = [5,5,5,5], k = 2. The 2nd largest is still 5, and algorithms must handle duplicate values without infinite recursion or incorrect partitioning.
- Already sorted arrays (ascending or descending): Algorithms like quickselect can degrade to O(n²) if the pivot selection is poor and the array is sorted.
- k = 1 or k = n: These are the maximum and minimum, which can be found in O(n) with a simple scan, but a general algorithm should still work efficiently.
- Extremely large n with small k: For example, n = 10⁵, k = 5. A min-heap of size k would be very efficient (O(n log k) with small log k), while quickselect might be overkill.
- Array with negative and positive values: Partitioning must correctly handle negative pivots.
2. Intuition Scaffolding
Real-World Metaphor
Imagine you are a coach selecting the k-th best player from a large tryout pool. You don’t have time to rank all players from best to worst (full sorting). Instead, you:
- Quickselect approach: Randomly pick a player as a “benchmark” (pivot), then split the group into those better and worse than the benchmark. If the benchmark ends up being the k-th best, you’re done. Otherwise, you recursively search in the appropriate subgroup.
- Heap approach: Keep a “top k” list (like a leaderboard) that only retains the best k players seen so far. After scanning all players, the weakest player on the leaderboard is the k-th best.
Gaming Analogy
In a battle royale game with 100 players, you want to know the k-th highest score on the leaderboard. Instead of sorting all scores (which would take time), the game server might:
- Use a min-heap of size k to track the top k scores in real time as players finish.
- Or, at the end of the game, use a quickselect algorithm to find the k-th highest score without sorting the entire list, which is faster when the goal is just generating rewards for the top k players.
Math Analogy
Think of the array as a set of points on the number line. The k-th largest element is the point such that exactly k-1 points lie to its right (greater values) and the rest lie to its left (smaller or equal). This is analogous to finding the (n-k+1)-th order statistic in ascending order, which is a classic problem in computational statistics often solved by the selection algorithm.
Common Pitfalls
- Using a full sort: The most straightforward solution is to sort the array and index the k-th largest, which takes O(n log n) time. While acceptable for small n, it is suboptimal for large n and fails to meet the “solve it without sorting” challenge.
- Using a max-heap of all elements: Building a max-heap takes O(n) and extracting the maximum k times takes O(k log n). For k close to n this is O(n log n), which is effectively heapsort in disguise.
- Bubble sort for k iterations: Running k passes of bubble sort to bring the k largest to the top takes O(kn), which for k = n/2 becomes O(n²).
- Assuming distinct elements: The problem states “not the kth distinct element,” so duplicates must be handled correctly. For example, in [3,3,3,2], the 2nd largest is 3, not 2.
- Poor pivot selection in quickselect: Choosing the first or last element as pivot can lead to O(n²) time on sorted or nearly sorted arrays.
- Ignoring k = n or k = 1: Special cases that can be optimized with a single scan (O(n)), but a general algorithm should not slow down for these cases.
3. Approach Encyclopedia
Approach 1: Brute Force with Partial Sorting (Bubble Sort variant)
Idea: Perform k selection passes; each pass finds the maximum of the remaining prefix and swaps it to the end. After k passes, the k largest elements occupy the last k positions, so the k-th largest sits at index n-k.
Pseudocode:
function kthLargest(nums, k):
    n = length(nums)
    for i = 0 to k-1:
        max_index = 0
        for j = 0 to n-1-i:                  # scan the still-unsorted prefix
            if nums[j] > nums[max_index]:
                max_index = j
        swap(nums[max_index], nums[n-1-i])   # park the current maximum at the end of the prefix
    return nums[n-k]
Complexity:
- Time: O(kn) comparisons. In the worst case (k = n), it becomes O(n²).
- Space: O(1) in-place.
Visualization:
Initial: [3,2,1,5,6,4], k=2
Pass 1: Find max (6) and swap with last: [3,2,1,5,4,6]
Pass 2: Find max in first 5 elements (5) and swap with second last: [3,2,1,4,5,6]
Now nums[n-k] = nums[6-2] = nums[4] = 5.
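For reference, a minimal runnable Python sketch of this brute-force variant (the function name here is our own):

```python
def kth_largest_selection_passes(nums, k):
    """Brute force: k selection passes; O(k*n) time, O(1) extra space."""
    arr = list(nums)               # work on a copy so the caller's list is untouched
    n = len(arr)
    for i in range(k):             # each pass parks the current maximum at index n-1-i
        max_index = 0
        for j in range(n - i):
            if arr[j] > arr[max_index]:
                max_index = j
        arr[max_index], arr[n - 1 - i] = arr[n - 1 - i], arr[max_index]
    return arr[n - k]

# kth_largest_selection_passes([3, 2, 1, 5, 6, 4], 2)  # -> 5
```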
Approach 2: Sorting
Idea: Sort the array in descending order and pick the (k-1)-th index.
Pseudocode:
function kthLargest(nums, k):
    sort(nums, descending=True)
    return nums[k-1]
Complexity:
- Time: O(n log n) for comparison sort.
- Space: O(1) for an in-place sort (e.g., heapsort), or O(n) for a merge-based stable sort such as Timsort.
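In Python this approach is essentially a one-liner; a quick sketch:

```python
def kth_largest_by_sorting(nums, k):
    # sorted() builds a new list in O(n log n); the k-th largest sits at index k-1
    return sorted(nums, reverse=True)[k - 1]

# kth_largest_by_sorting([3, 2, 3, 1, 2, 4, 5, 5, 6], 4)  # -> 4
```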
Approach 3: Min-Heap of size k
Idea: Maintain a min-heap of size k. The root of the heap will be the k-th largest element seen so far.
Pseudocode:
function kthLargest(nums, k):
    heap = new MinHeap()
    for num in nums:
        if heap.size() < k:
            heap.push(num)
        else if num > heap.peek():
            heap.pop()
            heap.push(num)
    return heap.peek()
Complexity:
- Time: O(n log k) because each heap operation is O(log k) and we do it for n elements.
- Space: O(k) for the heap.
Visualization:
nums = [3,2,1,5,6,4], k=2
Step 1: heap = [3]
Step 2: heap = [2,3] (min at root=2)
Step 3: 1 < 2, ignore
Step 4: 5 > 2, pop 2, push 5 → heap = [3,5] (root=3)
Step 5: 6 > 3, pop 3, push 6 → heap = [5,6] (root=5)
Step 6: 4 < 5, ignore
Result: root = 5
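A minimal sketch of this approach using Python's heapq module (which implements a min-heap on a plain list):

```python
import heapq

def kth_largest_min_heap(nums, k):
    """Keep a min-heap of the k largest elements seen so far; O(n log k) time, O(k) space."""
    heap = []
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, num)
        elif num > heap[0]:                # heap[0] is the smallest of the current top k
            heapq.heappushpop(heap, num)   # push num and pop the smallest in one O(log k) step
    return heap[0]                         # the k-th largest overall

# Equivalent stdlib shortcut: heapq.nlargest(k, nums)[-1]
```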
Approach 4: Max-Heap of all elements
Idea: Build a max-heap of all elements and extract the maximum k times.
Pseudocode:
function kthLargest(nums, k):
    heap = buildMaxHeap(nums)          # O(n)
    for i = 0 to k-1:
        result = heap.extractMax()     # O(log n) per extraction
    return result
Complexity:
- Time: O(n + k log n). Heap construction is O(n); for k close to n the total approaches O(n log n), which is effectively heapsort.
- Space: O(1) if built in-place.
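Since Python's heapq only provides a min-heap, one common way to sketch this approach is to heapify the negated values:

```python
import heapq

def kth_largest_max_heap(nums, k):
    """Build a max-heap of all elements (via negation) and pop k-1 times; O(n + k log n)."""
    heap = [-num for num in nums]   # negate so the min-heap behaves as a max-heap
    heapq.heapify(heap)             # O(n) bottom-up heap construction
    for _ in range(k - 1):
        heapq.heappop(heap)         # discard the k-1 largest elements
    return -heap[0]                 # the k-th largest (undo the negation)
```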
Approach 5: Quickselect (Hoare’s selection algorithm)
Idea: Use the partition step from quicksort to recursively narrow down the position of the k-th largest.
Pseudocode:
function kthLargest(nums, k):
    left = 0, right = len(nums)-1
    while True:
        pivot_index = partition(nums, left, right)
        if pivot_index == k-1:          // k-th largest sits at index k-1 in descending order
            return nums[pivot_index]
        else if pivot_index < k-1:
            left = pivot_index + 1
        else:
            right = pivot_index - 1

function partition(nums, left, right):
    pivot = nums[right]                 // or choose a random pivot
    i = left
    for j = left to right-1:
        if nums[j] >= pivot:            // descending order: elements >= pivot go left
            swap(nums[i], nums[j])
            i++
    swap(nums[i], nums[right])
    return i
Complexity:
- Time: Average O(n), worst-case O(n²) with poor pivot choices.
- Space: O(1) auxiliary for the iterative version shown above; a recursive implementation adds an O(log n) average call stack.
Visualization (using [3,2,1,5,6,4], k=2; the partition picks nums[right] = 4 as pivot):
Partition around 4 (elements >= 4 move to the left): [5,6,4,3,2,1], pivot at index 2 (0-based).
We want index k-1 = 1. Since pivot_index (2) > 1, narrow the search to indices 0..1.
Partition [5,6] with pivot 6: result [6,5], pivot_index = 0.
Now pivot_index (0) < 1, so narrow to indices 1..1; the next partition returns index 1 = k-1, and we return nums[1] = 5.
Approach 6: Counting Sort (due to bounded input range)
Idea: Use the constraint on element values to count frequencies.
Pseudocode:
function kthLargest(nums, k):
    min_val = -10000, max_val = 10000
    count = array of zeros of size (max_val - min_val + 1)
    for num in nums:
        count[num - min_val]++
    remaining = k
    for val from max_val down to min_val:
        remaining -= count[val - min_val]
        if remaining <= 0:
            return val
Complexity:
- Time: O(n + range) where range = 20001.
- Space: O(range) for the count array.
Approach 7: Binary Search on Value Space
Idea: Use binary search on the value range to find the k-th largest. For a candidate value mid, count how many elements are >= mid. Adjust search based on count.
Pseudocode:
function kthLargest(nums, k):
    left = min(nums), right = max(nums)
    while left < right:
        mid = (left + right + 1) // 2           // round up (bias right) so the range always shrinks
        count = countGreaterOrEqual(nums, mid)
        if count >= k:
            left = mid
        else:
            right = mid - 1
    return left

function countGreaterOrEqual(nums, target):
    count = 0
    for num in nums:
        if num >= target:
            count++
    return count
Complexity:
- Time: O(n log (max_val - min_val)) = O(n log 20001) ≈ 15n operations.
- Space: O(1).
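A compact Python sketch of this value-space binary search (the function name is ours):

```python
def kth_largest_binary_search(nums, k):
    """Binary search on the value range; O(n log(range)) time, O(1) extra space."""
    left, right = min(nums), max(nums)
    while left < right:
        mid = (left + right + 1) // 2                    # round up so the range always shrinks
        count = sum(1 for num in nums if num >= mid)     # elements greater than or equal to mid
        if count >= k:
            left = mid          # mid is still a valid candidate; try larger values
        else:
            right = mid - 1     # too few elements >= mid; the answer must be smaller
    return left

# kth_largest_binary_search([3, 2, 1, 5, 6, 4], 2)  # -> 5
```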
4. Code Deep Dive
We’ll focus on the quickselect approach with a random pivot: average O(n) time and O(1) auxiliary space, plus an O(log n) average recursion stack in the recursive form below.
import random
def findKthLargest(nums, k):
    """
    Returns the k-th largest element in nums using quickselect.
    """
    def partition(left, right, pivot_index):
        """
        Partition the array between left and right (inclusive) around the pivot at pivot_index.
        Returns the final index of the pivot after partition.
        Elements are arranged in descending order.
        """
        pivot_value = nums[pivot_index]
        # Move pivot to the end
        nums[pivot_index], nums[right] = nums[right], nums[pivot_index]
        store_index = left
        for i in range(left, right):
            if nums[i] > pivot_value:  # Note: we want descending order
                nums[store_index], nums[i] = nums[i], nums[store_index]
                store_index += 1
        # Move pivot to its final place
        nums[right], nums[store_index] = nums[store_index], nums[right]
        return store_index

    def select(left, right, k_smallest):
        """
        Returns the element that belongs at index k_smallest (0-indexed) of the
        subarray left..right when the array is arranged in descending order.
        For the k-th largest overall, the initial call uses k_smallest = k - 1.
        """
        if left == right:
            return nums[left]
        # Random pivot
        pivot_index = random.randint(left, right)
        pivot_index = partition(left, right, pivot_index)
        if k_smallest == pivot_index:
            return nums[k_smallest]
        elif k_smallest < pivot_index:
            return select(left, pivot_index - 1, k_smallest)
        else:
            return select(pivot_index + 1, right, k_smallest)

    # k-th largest is the (k-1)-th index in descending order (0-indexed)
    return select(0, len(nums) - 1, k - 1)
Line-by-Line Annotations:
- `def findKthLargest(nums, k):` - function signature; takes the list and the integer k.
- `def partition(left, right, pivot_index):` - helper that partitions a subarray.
- `pivot_value = nums[pivot_index]` - store the pivot value.
- `nums[pivot_index], nums[right] = nums[right], nums[pivot_index]` - move the pivot to the end to simplify the loop.
- `store_index = left` - tracks where the next element greater than the pivot should be placed.
- `for i in range(left, right):` - iterate through the subarray, excluding the pivot parked at the end.
- `if nums[i] > pivot_value:` - in descending order, elements greater than the pivot belong on the left.
- Swap `nums[store_index]` and `nums[i]`, then increment `store_index`.
- After the loop, swap the pivot from `right` to `store_index` so it lands in its final position.
- Return `store_index` (the pivot's final index).
- `def select(left, right, k_smallest):` - recursive selection function.
- `if left == right:` - base case: a single element, return it.
- `pivot_index = random.randint(left, right)` - choose a random pivot to avoid the adversarial worst case.
- `pivot_index = partition(left, right, pivot_index)` - partition and get the pivot's final index.
- If `k_smallest == pivot_index`, we have found the desired element.
- If `k_smallest < pivot_index`, the desired element is in the left partition (indices below the pivot).
- Otherwise, it is in the right partition.
- `return select(0, len(nums) - 1, k - 1)` - initial call: the k-th largest is at index k-1 in descending order.
Edge Case Handling:
- Example 1: nums = [3,2,1,5,6,4], k = 2. The algorithm returns 5.
- Example 2: nums = [3,2,3,1,2,4,5,5,6], k = 4. The algorithm must handle duplicates. Because partition compares with > rather than >=, elements equal to the pivot are not moved left, so duplicates of the pivot may end up on either side. This does not affect correctness: we only care about which element ends up at index k-1, and the algorithm still converges.
- All elements identical: e.g., [5,5,5,5], k = 2. Since partition uses >, nothing is moved and the pivot lands at store_index, which stays at left. Then k_smallest (1) is greater than pivot_index (0), so we search the right part. Each recursion shrinks the range by at least one, so the algorithm terminates correctly, although an all-equal array is the slow case for this two-way partition (up to O(n) recursive calls, O(n²) total work); a three-way partition would avoid this.
- k = 1 or k = n: the algorithm still works. For k = n (the minimum) we look for index n-1 in descending order, and the partition eventually isolates it. (These cases are exercised in the sanity check below.)
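A small, optional sanity check for the findKthLargest implementation above; the expected values come from the problem examples and the edge cases just discussed:

```python
if __name__ == "__main__":
    assert findKthLargest([3, 2, 1, 5, 6, 4], 2) == 5           # Example 1
    assert findKthLargest([3, 2, 3, 1, 2, 4, 5, 5, 6], 4) == 4  # Example 2 (duplicates)
    assert findKthLargest([5, 5, 5, 5], 2) == 5                 # all elements identical
    assert findKthLargest([7], 1) == 7                          # k = 1 = n
    assert findKthLargest([2, 1], 2) == 1                       # k = n (the minimum)
    print("all quickselect checks passed")
```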
5. Complexity War Room
Hardware-Aware Analysis
- For n = 10⁵, quickselect with a random pivot performs a small constant times n comparisons on average (roughly 2n to 3n ≈ 2×10⁵ to 3×10⁵). The array itself (10⁵ four-byte integers ≈ 400KB) fits in a typical CPU L2/L3 cache (256KB to 8MB), so memory access is fast.
- The recursion depth is O(log n) on average, so the stack usage is minimal (a few KB).
- In the worst case (poor pivot choices), time degrades to O(n²) = 10¹⁰ operations, which is unacceptable. But with random pivot, the probability of worst-case is extremely low.
- The min-heap approach uses O(k) memory. For k = 50000, that’s 50000 integers ≈ 200KB (if 4 bytes per int), also fits in cache.
Industry Comparison Table
| Approach | Time Complexity | Space Complexity | Readability | Interview Viability |
|---|---|---|---|---|
| Brute Force (k passes) | O(kn) = O(n²) worst | O(1) | 9/10 (simple loops) | ❌ Fails large n |
| Full Sorting | O(n log n) | O(1) or O(n) | 10/10 (one-liner) | ⚠️ Acceptable but not optimal |
| Min-Heap of size k | O(n log k) | O(k) | 8/10 (requires heap knowledge) | ✅ Good for small k |
| Max-Heap of all | O(n + k log n) | O(1) | 7/10 | ⚠️ Heapifies all n elements; min-heap of size k usually preferred |
| Quickselect | O(n) average, O(n²) worst | O(log n) recursion stack | 6/10 (complex partition) | ✅ Preferred if you explain randomization |
| Counting Sort | O(n + range) | O(range) | 9/10 (simple counting) | ✅ Excellent due to constraints |
| Binary Search on Values | O(n log range) | O(1) | 7/10 (binary search on counts) | ✅ Good for bounded integers |
6. Pro Mode Extras
Variants Section
- K-th Smallest: Change the comparison in partition to ascending order.
- Multiple Queries: If we need to answer many queries for different k on the same array, sorting once and then answering each query in O(1) is better.
- Streaming Data: If the array is a stream (elements arriving one by one), we cannot use quickselect (which requires random access). Use a min-heap of size k to keep the top k elements seen so far (see the sketch after this list).
- K-th Largest in a BST: Use reverse in-order traversal.
- K-th Largest in a Sorted Matrix: Use a min-heap of size k that stores elements from each row.
- K-th Largest with Duplicates and Distinct Requirement: If we want the k-th largest distinct element, we can use a set to remove duplicates first.
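As referenced in the Streaming Data bullet, here is a minimal sketch of a streaming top-k tracker built on a min-heap of size k (the class name and interface are our own, loosely following the "Kth Largest Element in a Stream" problem):

```python
import heapq

class StreamingKthLargest:
    """Keeps a min-heap of the k largest elements seen so far in a stream."""

    def __init__(self, k, initial=()):
        self.k = k
        self.heap = list(initial)
        heapq.heapify(self.heap)
        while len(self.heap) > k:       # keep only the k largest of the initial elements
            heapq.heappop(self.heap)

    def add(self, num):
        """Consume one element from the stream and return the current k-th largest."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, num)
        elif num > self.heap[0]:
            heapq.heappushpop(self.heap, num)
        return self.heap[0]

# tracker = StreamingKthLargest(k=2, initial=[3, 2, 1])
# tracker.add(5)  # -> 3
# tracker.add(6)  # -> 5
```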
Interview Cheat Sheet
- First Mention: Always state the brute force sorting solution (O(n log n)) and then improve.
- Clarify: Ask about duplicates, value range, and whether we can modify the array.
- Quickselect: Explain the algorithm and mention that with random pivot, the average time is O(n) but worst-case is O(n²). Suggest using median-of-medians for guaranteed O(n) but it’s complex.
- Heap Solution: For small k, the min-heap solution is easier to implement and explain.
- Constraints: If the values are bounded, counting sort is the fastest (O(n)).
- Practice: Be ready to write bug-free partition code. Common mistakes: infinite recursion, off-by-one errors in indices, not handling duplicates correctly.
Final Recommendation:
- For interviews, the min-heap solution is safe and demonstrates knowledge of data structures.
- For production with bounded integers, counting sort is robust and fast.
- For a challenge, quickselect with random pivot is elegant and efficient on average.
# Final optimal solution considering constraints (counting sort)
def findKthLargest(nums, k):
    min_val, max_val = -10000, 10000
    count = [0] * (max_val - min_val + 1)
    for num in nums:
        count[num - min_val] += 1
    remaining = k
    for num in range(max_val, min_val - 1, -1):
        remaining -= count[num - min_val]
        if remaining <= 0:
            return num
    return -1  # Should never reach here