#215 - Kth Largest Element in an Array
Kth Largest Element in an Array
- Difficulty: Medium
- Topics: Array, Divide and Conquer, Sorting, Heap (Priority Queue), Quickselect
- Link: https://leetcode.com/problems/kth-largest-element-in-an-array/
Problem Description
Given an integer array nums and an integer k, return the kth largest element in the array.
Note that it is the kth largest element in the sorted order, not the kth distinct element.
Can you solve it without sorting?
Example 1:
Input: nums = [3,2,1,5,6,4], k = 2
Output: 5
Example 2:
Input: nums = [3,2,3,1,2,4,5,5,6], k = 4
Output: 4
Constraints:
- 1 <= k <= nums.length <= 10⁵
- -10⁴ <= nums[i] <= 10⁴
Solution
1. Problem Deconstruction
Technical Restatement
Given an array nums of n integers and an integer k where 1 ≤ k ≤ n, determine the element that would occupy the k-th position (1-indexed) if the array were sorted in non-increasing order. Formally, return the element x such that at most k-1 elements in nums are strictly greater than x and at least k elements are greater than or equal to x (accounting for duplicates). The solution should avoid an explicit full sort of the array, aiming for better than O(n log n) time complexity in the average case.
Beginner-Friendly Restatement
Imagine you have a list of numbers, and you want to find the number that is the k-th biggest in the list. For example, if k = 2, you want the second biggest number. If multiple copies of the same number exist, they all count separately (so the second biggest in [5,5,4,3] is 5, not 4). The challenge is to find this number without first sorting the entire list from largest to smallest.
Mathematical Restatement
Let S = {x₁, x₂, ..., xₙ} be a multiset of integers. Define the order statistic order(S, k) as the element y ∈ S such that:
|{x ∈ S : x > y}| ≤ k - 1 and |{x ∈ S : x ≥ y}| ≥ k
Equivalently, if we define the rank of an element as its position in a sorted descending list (with ties handled arbitrarily but consistently), then we seek the element with rank k. This can be expressed as:
y = max{ z ∈ S : |{x ∈ S : x ≥ z}| ≥ k }
or more procedurally: find the element that partitions the multiset into two groups: those greater than it (less than k of them) and those less than or equal to it (at least n - k + 1 of them).
Constraint Analysis
- 1 <= k <= nums.length <= 10⁵
  - The array can be large (up to 100,000 elements), ruling out O(n²) algorithms like brute-force pairwise comparisons or bubble-sort-based approaches.
  - The value of k ranges from the smallest (1, meaning we want the maximum) to the largest (n, meaning we want the minimum). This implies that any algorithm must handle both extremes efficiently without special cases.
  - The constraint k ≤ n ensures the problem is well-defined; edge cases where k = n (minimum) and k = 1 (maximum) are valid and must be handled correctly.
- -10⁴ <= nums[i] <= 10⁴
  - The elements are bounded integers, which allows counting sort or radix sort in O(n + range) time, where range = 20001. This could be an alternative to comparison-based methods: n can be as large as 10⁵ and the range is 2×10⁴, so O(n + range) ≈ 10⁵ + 2×10⁴ = 1.2×10⁵ operations, which is feasible. However, the problem’s intent is to explore comparison-based selection algorithms.
  - The presence of negative numbers requires algorithms that handle signed integers correctly; for example, in radix sort we must handle two’s complement or offset the values.
  - The relatively small range suggests that bucket-based approaches could be efficient, but they are not comparison-based and may use extra space proportional to the range.
Hidden Edge Cases
- All elements identical: e.g., nums = [5,5,5,5], k = 2. The 2nd largest is still 5, and algorithms must handle duplicate values without infinite recursion or incorrect partitioning.
- Already sorted arrays (ascending or descending): Algorithms like quickselect can degrade to O(n²) if the pivot selection is poor and the array is sorted.
- k = 1 or k = n: These are the maximum and minimum, which can be found in O(n) with a simple scan, but a general algorithm should still work efficiently.
- Extremely large n with small k: For example, n = 10⁵, k = 5. A min-heap of size k would be very efficient (O(n log k) with small log k), while quickselect might be overkill.
- Array with negative and positive values: Partitioning must correctly handle negative pivots.
2. Intuition Scaffolding
Real-World Metaphor
Imagine you are a coach selecting the k-th best player from a large tryout pool. You don’t have time to rank all players from best to worst (full sorting). Instead, you:
- Quickselect approach: Randomly pick a player as a “benchmark” (pivot), then split the group into those better and worse than the benchmark. If the benchmark ends up being the k-th best, you’re done. Otherwise, you recursively search in the appropriate subgroup.
- Heap approach: Keep a “top k” list (like a leaderboard) that only retains the best k players seen so far. After scanning all players, the weakest player on the leaderboard is the k-th best.
Gaming Analogy
In a battle royale game with 100 players, you want to know the k-th highest score on the leaderboard. Instead of sorting all scores (which would take time), the game server might:
- Use a min-heap of size k to track the top k scores in real time as players finish.
- Or, at the end of the game, use a quickselect algorithm to find the k-th highest score without sorting the entire list, which is faster when the goal is just generating rewards for the top k players.
Math Analogy
Think of the array as a set of points on the number line. The k-th largest element is the point such that exactly k-1 points lie to its right (greater values) and the rest lie to its left (smaller or equal). This is analogous to finding the (n-k+1)-th order statistic in ascending order, which is a classic problem in computational statistics often solved by the selection algorithm.
Common Pitfalls
- Using a full sort: The most straightforward solution is to sort the array and index the k-th largest, which takes O(n log n) time. While acceptable for small n, it is suboptimal for large n and fails to meet the “solve it without sorting” challenge.
- Using a max-heap of all elements: Building a max-heap takes O(n) and extracting the maximum k times takes O(k log n). For k close to n this is O(n log n), which is effectively heapsort in disguise.
- Bubble sort for k iterations: Running k passes of bubble sort to bring the k largest to the top takes O(kn), which for k = n/2 becomes O(n²).
- Assuming distinct elements: The problem states “not the kth distinct element,” so duplicates must be handled correctly. For example, in [3,3,3,2], the 2nd largest is 3, not 2.
- Poor pivot selection in quickselect: Choosing the first or last element as pivot can lead to O(n²) time on sorted or nearly sorted arrays.
- Ignoring k = n or k = 1: Special cases that can be optimized with a single scan (O(n)), but a general algorithm should not slow down for these cases.
3. Approach Encyclopedia
Approach 1: Brute Force with Partial Sorting (Bubble Sort variant)
Idea: Perform k selection passes; each pass finds the maximum of the remaining prefix and swaps it to the end. After k passes, the k largest elements occupy the last k positions, so the k-th largest sits at index n-k.
Pseudocode:
function kthLargest(nums, k):
    n = length(nums)
    for i = 0 to k-1:
        max_index = 0
        for j = 0 to n-1-i:                  # scan the still-unsorted prefix
            if nums[j] > nums[max_index]:
                max_index = j
        swap(nums[max_index], nums[n-1-i])   # park the current maximum at the end of the prefix
    return nums[n-k]
Complexity:
- Time: O(kn) comparisons. In the worst case (k = n), it becomes O(n²).
- Space: O(1) in-place.
Visualization:
Initial: [3,2,1,5,6,4], k=2
Pass 1: Find max (6) and swap with last: [3,2,1,5,4,6]
Pass 2: Find max in first 5 elements (5) and swap with second last: [3,2,1,4,5,6]
Now nums[n-k] = nums[6-2] = nums[4] = 5.
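For reference, a minimal runnable Python sketch of this brute-force variant (the function name here is our own):

```python
def kth_largest_selection_passes(nums, k):
    """Brute force: k selection passes; O(k*n) time, O(1) extra space."""
    arr = list(nums)               # work on a copy so the caller's list is untouched
    n = len(arr)
    for i in range(k):             # each pass parks the current maximum at index n-1-i
        max_index = 0
        for j in range(n - i):
            if arr[j] > arr[max_index]:
                max_index = j
        arr[max_index], arr[n - 1 - i] = arr[n - 1 - i], arr[max_index]
    return arr[n - k]

# kth_largest_selection_passes([3, 2, 1, 5, 6, 4], 2)  # -> 5
```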
Approach 2: Sorting
Idea: Sort the array in descending order and pick the (k-1)-th index.
Pseudocode:
function kthLargest(nums, k):
    sort(nums, descending=True)
    return nums[k-1]
Complexity:
- Time: O(n log n) for comparison sort.
- Space: O(1) for an in-place sort (e.g., heapsort), or O(n) for a merge-based stable sort such as Timsort.
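In Python this approach is essentially a one-liner; a quick sketch:

```python
def kth_largest_by_sorting(nums, k):
    # sorted() builds a new list in O(n log n); the k-th largest sits at index k-1
    return sorted(nums, reverse=True)[k - 1]

# kth_largest_by_sorting([3, 2, 3, 1, 2, 4, 5, 5, 6], 4)  # -> 4
```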
Approach 3: Min-Heap of size k
Idea: Maintain a min-heap of size k. The root of the heap will be the k-th largest element seen so far.
Pseudocode:
function kthLargest(nums, k):
    heap = new MinHeap()
    for num in nums:
        if heap.size() < k:
            heap.push(num)
        else if num > heap.peek():
            heap.pop()
            heap.push(num)
    return heap.peek()
Complexity:
- Time: O(n log k) because each heap operation is O(log k) and we do it for n elements.
- Space: O(k) for the heap.
Visualization:
nums = [3,2,1,5,6,4], k=2
Step 1: heap = [3]
Step 2: heap = [2,3] (min at root=2)
Step 3: 1 < 2, ignore
Step 4: 5 > 2, pop 2, push 5 → heap = [3,5] (root=3)
Step 5: 6 > 3, pop 3, push 6 → heap = [5,6] (root=5)
Step 6: 4 < 5, ignore
Result: root = 5
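A minimal sketch of this approach using Python's heapq module (which implements a min-heap on a plain list):

```python
import heapq

def kth_largest_min_heap(nums, k):
    """Keep a min-heap of the k largest elements seen so far; O(n log k) time, O(k) space."""
    heap = []
    for num in nums:
        if len(heap) < k:
            heapq.heappush(heap, num)
        elif num > heap[0]:                # heap[0] is the smallest of the current top k
            heapq.heappushpop(heap, num)   # push num and pop the smallest in one O(log k) step
    return heap[0]                         # the k-th largest overall

# Equivalent stdlib shortcut: heapq.nlargest(k, nums)[-1]
```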
Approach 4: Max-Heap of all elements
Idea: Build a max-heap of all elements and extract the maximum k times.
Pseudocode:
function kthLargest(nums, k):
    heap = buildMaxHeap(nums)          # O(n)
    for i = 0 to k-1:
        result = heap.extractMax()     # O(log n) per extraction
    return result
Complexity:
- Time: O(n + k log n). Heap construction is O(n); for k close to n the total approaches O(n log n), which is effectively heapsort.
- Space: O(1) if built in-place.
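Since Python's heapq only provides a min-heap, one common way to sketch this approach is to heapify the negated values:

```python
import heapq

def kth_largest_max_heap(nums, k):
    """Build a max-heap of all elements (via negation) and pop k-1 times; O(n + k log n)."""
    heap = [-num for num in nums]   # negate so the min-heap behaves as a max-heap
    heapq.heapify(heap)             # O(n) bottom-up heap construction
    for _ in range(k - 1):
        heapq.heappop(heap)         # discard the k-1 largest elements
    return -heap[0]                 # the k-th largest (undo the negation)
```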
Approach 5: Quickselect (Hoare’s selection algorithm)
Idea: Use the partition step from quicksort to recursively narrow down the position of the k-th largest.
Pseudocode:
function kthLargest(nums, k):
    left = 0, right = len(nums)-1
    while True:
        pivot_index = partition(nums, left, right)
        if pivot_index == k-1:          // k-th largest sits at index k-1 in descending order
            return nums[pivot_index]
        else if pivot_index < k-1:
            left = pivot_index + 1
        else:
            right = pivot_index - 1

function partition(nums, left, right):
    pivot = nums[right]                 // or choose a random pivot
    i = left
    for j = left to right-1:
        if nums[j] >= pivot:            // descending order: elements >= pivot go left
            swap(nums[i], nums[j])
            i++
    swap(nums[i], nums[right])
    return i
Complexity:
- Time: Average O(n), worst-case O(n²) with poor pivot choices.
- Space: O(1) auxiliary for the iterative version shown above; a recursive implementation adds an O(log n) average call stack.
Visualization (using [3,2,1,5,6,4], k=2; the partition picks nums[right] = 4 as pivot):
Partition around 4 (elements >= 4 move to the left): [5,6,4,3,2,1], pivot at index 2 (0-based).
We want index k-1 = 1. Since pivot_index (2) > 1, narrow the search to indices 0..1.
Partition [5,6] with pivot 6: result [6,5], pivot_index = 0.
Now pivot_index (0) < 1, so narrow to indices 1..1; the next partition returns index 1 = k-1, and we return nums[1] = 5.
Approach 6: Counting Sort (due to bounded input range)
Idea: Use the constraint on element values to count frequencies.
Pseudocode:
function kthLargest(nums, k):
    min_val = -10000, max_val = 10000
    count = array of zeros of size (max_val - min_val + 1)
    for num in nums:
        count[num - min_val]++
    remaining = k
    for val from max_val down to min_val:
        remaining -= count[val - min_val]
        if remaining <= 0:
            return val
Complexity:
- Time: O(n + range) where range = 20001.
- Space: O(range) for the count array.
Approach 7: Binary Search on Value Space
Idea: Use binary search on the value range to find the k-th largest. For a candidate value mid, count how many elements are >= mid. Adjust search based on count.
Pseudocode:
function kthLargest(nums, k):
    left = min(nums), right = max(nums)
    while left < right:
        mid = (left + right + 1) // 2           // round up (bias right) so the range always shrinks
        count = countGreaterOrEqual(nums, mid)
        if count >= k:
            left = mid
        else:
            right = mid - 1
    return left

function countGreaterOrEqual(nums, target):
    count = 0
    for num in nums:
        if num >= target:
            count++
    return count
Complexity:
- Time: O(n log (max_val - min_val)) = O(n log 20001) ≈ 15n operations.
- Space: O(1).
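A compact Python sketch of this value-space binary search (the function name is ours):

```python
def kth_largest_binary_search(nums, k):
    """Binary search on the value range; O(n log(range)) time, O(1) extra space."""
    left, right = min(nums), max(nums)
    while left < right:
        mid = (left + right + 1) // 2                    # round up so the range always shrinks
        count = sum(1 for num in nums if num >= mid)     # elements greater than or equal to mid
        if count >= k:
            left = mid          # mid is still a valid candidate; try larger values
        else:
            right = mid - 1     # too few elements >= mid; the answer must be smaller
    return left

# kth_largest_binary_search([3, 2, 1, 5, 6, 4], 2)  # -> 5
```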
4. Code Deep Dive
We’ll focus on the quickselect approach with a random pivot: average O(n) time and O(1) auxiliary space, plus an O(log n) average recursion stack in the recursive form below.
import random
def findKthLargest(nums, k):
    """
    Returns the k-th largest element in nums using quickselect.
    """
    def partition(left, right, pivot_index):
        """
        Partition the array between left and right (inclusive) around the pivot at pivot_index.
        Returns the final index of the pivot after partition.
        Elements are arranged in descending order.
        """
        pivot_value = nums[pivot_index]
        # Move pivot to the end
        nums[pivot_index], nums[right] = nums[right], nums[pivot_index]
        store_index = left
        for i in range(left, right):
            if nums[i] > pivot_value:  # Note: we want descending order
                nums[store_index], nums[i] = nums[i], nums[store_index]
                store_index += 1
        # Move pivot to its final place
        nums[right], nums[store_index] = nums[store_index], nums[right]
        return store_index

    def select(left, right, k_smallest):
        """
        Returns the element that belongs at index k_smallest (0-indexed) of the
        subarray left..right when the array is arranged in descending order.
        For the k-th largest overall, the initial call uses k_smallest = k - 1.
        """
        if left == right:
            return nums[left]
        # Random pivot
        pivot_index = random.randint(left, right)
        pivot_index = partition(left, right, pivot_index)
        if k_smallest == pivot_index:
            return nums[k_smallest]
        elif k_smallest < pivot_index:
            return select(left, pivot_index - 1, k_smallest)
        else:
            return select(pivot_index + 1, right, k_smallest)

    # k-th largest is the (k-1)-th index in descending order (0-indexed)
    return select(0, len(nums) - 1, k - 1)
Line-by-Line Annotations:
- `def findKthLargest(nums, k):` - function signature; takes the list and the integer k.
- `def partition(left, right, pivot_index):` - helper that partitions a subarray.
- `pivot_value = nums[pivot_index]` - store the pivot value.
- `nums[pivot_index], nums[right] = nums[right], nums[pivot_index]` - move the pivot to the end to simplify the loop.
- `store_index = left` - tracks where the next element greater than the pivot should be placed.
- `for i in range(left, right):` - iterate through the subarray, excluding the pivot parked at the end.
- `if nums[i] > pivot_value:` - in descending order, elements greater than the pivot belong on the left.
- Swap `nums[store_index]` and `nums[i]`, then increment `store_index`.
- After the loop, swap the pivot from `right` to `store_index` so it lands in its final position.
- Return `store_index` (the pivot's final index).
- `def select(left, right, k_smallest):` - recursive selection function.
- `if left == right:` - base case: a single element, return it.
- `pivot_index = random.randint(left, right)` - choose a random pivot to avoid the adversarial worst case.
- `pivot_index = partition(left, right, pivot_index)` - partition and get the pivot's final index.
- If `k_smallest == pivot_index`, we have found the desired element.
- If `k_smallest < pivot_index`, the desired element is in the left partition (indices below the pivot).
- Otherwise, it is in the right partition.
- `return select(0, len(nums) - 1, k - 1)` - initial call: the k-th largest is at index k-1 in descending order.
Edge Case Handling:
- Example 1: nums = [3,2,1,5,6,4], k = 2. The algorithm returns 5.
- Example 2: nums = [3,2,3,1,2,4,5,5,6], k = 4. The algorithm must handle duplicates. Because partition compares with > rather than >=, elements equal to the pivot are not moved left, so duplicates of the pivot may end up on either side. This does not affect correctness: we only care about which element ends up at index k-1, and the algorithm still converges.
- All elements identical: e.g., [5,5,5,5], k = 2. Since partition uses >, nothing is moved and the pivot lands at store_index, which stays at left. Then k_smallest (1) is greater than pivot_index (0), so we search the right part. Each recursion shrinks the range by at least one, so the algorithm terminates correctly, although an all-equal array is the slow case for this two-way partition (up to O(n) recursive calls, O(n²) total work); a three-way partition would avoid this.
- k = 1 or k = n: the algorithm still works. For k = n (the minimum) we look for index n-1 in descending order, and the partition eventually isolates it. (These cases are exercised in the sanity check below.)
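A small, optional sanity check for the findKthLargest implementation above; the expected values come from the problem examples and the edge cases just discussed:

```python
if __name__ == "__main__":
    assert findKthLargest([3, 2, 1, 5, 6, 4], 2) == 5           # Example 1
    assert findKthLargest([3, 2, 3, 1, 2, 4, 5, 5, 6], 4) == 4  # Example 2 (duplicates)
    assert findKthLargest([5, 5, 5, 5], 2) == 5                 # all elements identical
    assert findKthLargest([7], 1) == 7                          # k = 1 = n
    assert findKthLargest([2, 1], 2) == 1                       # k = n (the minimum)
    print("all quickselect checks passed")
```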
5. Complexity War Room
Hardware-Aware Analysis
- For n = 10⁵, quickselect with a random pivot performs a small constant times n comparisons on average (roughly 2n to 3n ≈ 2×10⁵ to 3×10⁵). The array itself (10⁵ four-byte integers ≈ 400KB) fits in a typical CPU L2/L3 cache (256KB to 8MB), so memory access is fast.
- The recursion depth is O(log n) on average, so the stack usage is minimal (a few KB).
- In the worst case (poor pivot choices), time degrades to O(n²) = 10¹⁰ operations, which is unacceptable. But with random pivot, the probability of worst-case is extremely low.
- The min-heap approach uses O(k) memory. For k = 50000, that’s 50000 integers ≈ 200KB (if 4 bytes per int), also fits in cache.
Industry Comparison Table
| Approach | Time Complexity | Space Complexity | Readability | Interview Viability |
|---|---|---|---|---|
| Brute Force (k passes) | O(kn) = O(n²) worst | O(1) | 9/10 (simple loops) | ❌ Fails large n |
| Full Sorting | O(n log n) | O(1) or O(n) | 10/10 (one-liner) | ⚠️ Acceptable but not optimal |
| Min-Heap of size k | O(n log k) | O(k) | 8/10 (requires heap knowledge) | ✅ Good for small k |
| Max-Heap of all | O(n + k log n) | O(1) | 7/10 | ⚠️ Heapifies all n elements; min-heap of size k usually preferred |
| Quickselect | O(n) average, O(n²) worst | O(log n) recursion stack | 6/10 (complex partition) | ✅ Preferred if you explain randomization |
| Counting Sort | O(n + range) | O(range) | 9/10 (simple counting) | ✅ Excellent due to constraints |
| Binary Search on Values | O(n log range) | O(1) | 7/10 (binary search on counts) | ✅ Good for bounded integers |
6. Pro Mode Extras
Variants Section
- K-th Smallest: Change the comparison in partition to ascending order.
- Multiple Queries: If we need to answer many queries for different k on the same array, sorting once and then answering each query in O(1) is better.
- Streaming Data: If the array is a stream (elements arriving one by one), we cannot use quickselect (which requires random access). Use a min-heap of size k to keep the top k elements seen so far (see the sketch after this list).
- K-th Largest in a BST: Use reverse in-order traversal.
- K-th Largest in a Sorted Matrix: Use a min-heap of size k that stores elements from each row.
- K-th Largest with Duplicates and Distinct Requirement: If we want the k-th largest distinct element, we can use a set to remove duplicates first.
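As referenced in the Streaming Data bullet, here is a minimal sketch of a streaming top-k tracker built on a min-heap of size k (the class name and interface are our own, loosely following the "Kth Largest Element in a Stream" problem):

```python
import heapq

class StreamingKthLargest:
    """Keeps a min-heap of the k largest elements seen so far in a stream."""

    def __init__(self, k, initial=()):
        self.k = k
        self.heap = list(initial)
        heapq.heapify(self.heap)
        while len(self.heap) > k:       # keep only the k largest of the initial elements
            heapq.heappop(self.heap)

    def add(self, num):
        """Consume one element from the stream and return the current k-th largest."""
        if len(self.heap) < self.k:
            heapq.heappush(self.heap, num)
        elif num > self.heap[0]:
            heapq.heappushpop(self.heap, num)
        return self.heap[0]

# tracker = StreamingKthLargest(k=2, initial=[3, 2, 1])
# tracker.add(5)  # -> 3
# tracker.add(6)  # -> 5
```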
Interview Cheat Sheet
- First Mention: Always state the brute force sorting solution (O(n log n)) and then improve.
- Clarify: Ask about duplicates, value range, and whether we can modify the array.
- Quickselect: Explain the algorithm and mention that with random pivot, the average time is O(n) but worst-case is O(n²). Suggest using median-of-medians for guaranteed O(n) but it’s complex.
- Heap Solution: For small k, the min-heap solution is easier to implement and explain.
- Constraints: If the values are bounded, counting sort is the fastest (O(n)).
- Practice: Be ready to write bug-free partition code. Common mistakes: infinite recursion, off-by-one errors in indices, not handling duplicates correctly.
Final Recommendation:
- For interviews, the min-heap solution is safe and demonstrates knowledge of data structures.
- For production with bounded integers, counting sort is robust and fast.
- For a challenge, quickselect with random pivot is elegant and efficient on average.
# Final optimal solution considering constraints (counting sort)
def findKthLargest(nums, k):
    min_val, max_val = -10000, 10000
    count = [0] * (max_val - min_val + 1)
    for num in nums:
        count[num - min_val] += 1
    remaining = k
    for num in range(max_val, min_val - 1, -1):
        remaining -= count[num - min_val]
        if remaining <= 0:
            return num
    return -1  # Should never reach here