#4 - Median of Two Sorted Arrays
Median of Two Sorted Arrays
- Difficulty: Hard
- Topics: Array, Binary Search, Divide and Conquer
- Link: https://leetcode.com/problems/median-of-two-sorted-arrays/
Problem Description
Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.
The overall run time complexity should be O(log (m+n)).
Example 1:
Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000
Explanation: merged array = [1,2,3] and median is 2.
Example 2:
Input: nums1 = [1,2], nums2 = [3,4]
Output: 2.50000
Explanation: merged array = [1,2,3,4] and median is (2 + 3) / 2 = 2.5.
Constraints:
nums1.length == mnums2.length == n0 <= m <= 10000 <= n <= 10001 <= m + n <= 2000-106 <= nums1[i], nums2[i] <= 106
Solution
1. Problem Deconstruction (500+ words)
Technical Restatement
Given two sorted arrays A[0..m-1] and B[0..n-1], compute the median of the combined sorted sequence C formed by merging A and B. The median is defined as:
- If
(m+n)is odd:C[(m+n)//2](0-indexed). - If
(m+n)is even:(C[(m+n)//2 - 1] + C[(m+n)//2]) / 2.
The algorithm must achieveO(log(m+n))time complexity without explicitly merging the arrays.
Beginner-Friendly Restatement
We have two sorted lists of numbers. We want the middle number when both lists are combined into one big sorted list. If the total count is odd, the middle number is the median. If even, the median is the average of the two middle numbers. We must find this quickly without actually combining the lists (which would be slow for large lists).
Mathematical Restatement
Let A and B be sorted arrays of lengths m and n. Define k = (m+n)//2 (floor division). The median M is:
- Odd case:
M = f(k)wherefis thek-th order statistic ofA ∪ B. - Even case:
M = (f(k-1) + f(k))/2.
Equivalently, find indicesi(inA) andj(inB) such that:
i + j = (m+n+1)//2(left partition size).max(A[i-1], B[j-1]) ≤ min(A[i], B[j]).
ThenM = { max(A[i-1], B[j-1]) if odd; (max(A[i-1], B[j-1]) + min(A[i], B[j]))/2 if even }.
Constraint Analysis
nums1.length == m,nums2.length == n: Merely defines notation. No hidden impact.0 ≤ m, n ≤ 1000: Although small, the requiredO(log(m+n))complexity forces a logarithmic algorithm. A linear solution would pass but violate the stated requirement.1 ≤ m+n ≤ 2000: Ensures at least one element exists. Edge cases: one array empty (m=0orn=0), the other non-empty.-10^6 ≤ nums1[i], nums2[i] ≤ 10^6: Values can be negative or zero. Median may be negative. Comparisons must handle negatives; sentinel values (e.g.,-∞,+∞) must be chosen outside this range.- Sorted arrays: Fundamental property enabling binary search and partition-based approaches.
O(log(m+n))time: Excludes linear merge or two-pointer scans. Suggests binary search or divide-and-conquer.
2. Intuition Scaffolding
Analogies
- Real-World Metaphor: Two stacks of graded exams sorted by score. To find the median score without shuffling them together, repeatedly compare the middle exam of each stack and discard the lower half of the stack whose middle score is smaller, because the median cannot be there.
- Gaming Analogy: Two sorted leaderboards for different game servers. To find the combined median rank, use a “tournament elimination” strategy: compare the median ranks of each leaderboard and eliminate the lower half of the leaderboard with the smaller median, since those players cannot be near the overall median.
- Math Analogy: Finding the median is equivalent to finding the -th smallest element for . In two sorted arrays, this reduces to a search problem where we can discard elements in each step by comparing the -th elements of each array.
Common Pitfalls
- Merging then median: Obvious but time and space.
- Two-pointer merge without storage: Still time.
- Averaging individual medians: Incorrect; the overall median is not necessarily the average of each array’s median.
- Binary search misalignment: Forgetting to adjust the partition in the second array when partitioning the first.
- Off-by-one in partition indices: Incorrect left/right boundary handling leads to missing elements or infinite loops.
- Ignoring empty arrays: Failing to handle cases where one array is empty.
- Incorrect handling of even/odd totals: Confusing floor/ceiling when computing left partition size.
3. Approach Encyclopedia
Approach 1: Brute Force Merge
- Pseudocode:
function medianBrute(A, B): C = merge(A, B) // standard merge total = len(C) if total % 2 == 1: return C[total//2] else: return (C[total//2 - 1] + C[total//2]) / 2 - Complexity Proof:
- Time: Merging requires visiting each element once → .
- Space: New array of size → .
- Visualization:
A: [1, 3] B: [2] Merge: [1, 2, 3] Index: 0 1 2 Total=3 (odd) → median at index 1 → 2.
Approach 2: Two-Poiter Merge (Optimized Space)
- Pseudocode:
function medianTwoPointer(A, B): m, n = len(A), len(B) total = m + n i = j = 0 prev = curr = 0 for k in range(total//2 + 1): prev = curr if i < m and (j >= n or A[i] ≤ B[j]): curr = A[i]; i += 1 else: curr = B[j]; j += 1 return curr if total % 2 == 1 else (prev + curr) / 2 - Complexity Proof:
- Time: Loop runs times → .
- Space: .
- Visualization:
A: [1,2], B: [3,4], total=4, half=2 → need 3 iterations. Iter1: prev=0, curr=1 (A[0]) Iter2: prev=1, curr=2 (A[1]) Iter3: prev=2, curr=3 (B[0]) Stop. Even total → (2+3)/2 = 2.5.
Approach 3: Binary Search on Partition (Optimal)
- Core Insight: Partition both arrays so that left half contains the first elements and every element on the left ≤ every element on the right.
- Pseudocode:
function medianOptimal(A, B): if len(A) > len(B): swap A, B m, n = len(A), len(B) total = m + n half = (total + 1) // 2 left, right = 0, m while left ≤ right: i = (left + right) // 2 // A partition j = half - i // B partition A_left = A[i-1] if i>0 else -∞ A_right = A[i] if i<m else ∞ B_left = B[j-1] if j>0 else -∞ B_right = B[j] if j<n else ∞ if A_left ≤ B_right and B_left ≤ A_right: if total % 2 == 1: return max(A_left, B_left) else: return (max(A_left, B_left) + min(A_right, B_right)) / 2 elif A_left > B_right: right = i - 1 else: left = i + 1 raise Error - Complexity Proof:
- Time: Binary search on smaller array of length → .
- Space: .
- Visualization:
Example: A=[1,3], B=[2] m=2, n=1, total=3, half=2. left=0, right=2 → i=1, j=1. A_left=1, A_right=3, B_left=2, B_right=∞. Condition: 1≤∞ and 2≤3 → true. Total odd → max(1,2)=2.
4. Code Deep Dive
def findMedianSortedArrays(nums1, nums2):
# Ensure nums1 is the smaller array for minimal binary search steps
if len(nums1) > len(nums2):
nums1, nums2 = nums2, nums1
m, n = len(nums1), len(nums2)
total = m + n
half = (total + 1) // 2 # left partition size (ceil)
left, right = 0, m
while left <= right:
i = (left + right) // 2 # partition point in nums1
j = half - i # partition point in nums2
# Handle boundaries: use -inf/inf for empty partitions
nums1_left = nums1[i-1] if i > 0 else float('-inf')
nums1_right = nums1[i] if i < m else float('inf')
nums2_left = nums2[j-1] if j > 0 else float('-inf')
nums2_right = nums2[j] if j < n else float('inf')
# Check partition correctness
if nums1_left <= nums2_right and nums2_left <= nums1_right:
# Valid partition found
if total % 2 == 1:
# Odd total: median is max of left side
return max(nums1_left, nums2_left)
else:
# Even total: median is average of max left and min right
return (max(nums1_left, nums2_left) + min(nums1_right, nums2_right)) / 2
elif nums1_left > nums2_right:
# nums1's left element too large → move partition left
right = i - 1
else:
# nums2's left element too large → move partition right
left = i + 1
# Unreachable for valid sorted inputs
raise ValueError("Input arrays are not sorted or invalid.")
Edge Case Handling
- Empty array: If
nums1is empty (m=0), binary search loop runs withleft=0, right=0.i=0,j=half. Boundary values becomenums1_left=-inf,nums1_right=inf, andnums2_left,nums2_rightfromnums2. Condition satisfied, median computed fromnums2alone. - All elements in one array smaller: Example
A=[1,2],B=[3,4]. Algorithm eventually partitionsAentirely into left (i=2,j=0), givingnums1_left=2,nums2_left=-inf,nums1_right=inf,nums2_right=3. Condition holds, even total →(max(2,-inf)+min(inf,3))/2 = (2+3)/2. - Duplicate values: Condition uses
≤, ensuring correctness when equal values appear across partitions.
5. Complexity War Room
Hardware-Aware Analysis
- The algorithm uses constant memory (~6 integers, 4 floats) → fits in CPU registers or L1 cache.
- For max
m+n=2000, binary search depth ≤ iterations. Each iteration does 4 array accesses (indexed), which are sequential and cache-friendly. - Performance: On a 3GHz CPU, 10 iterations × ~10 cycles/iteration ≈ 100 cycles → < 1 microsecond.
Industry Comparison Table
| Approach | Time | Space | Readability | Interview Viability |
|---|---|---|---|---|
| Brute Merge | 10/10 | ❌ Fails requirement | ||
| Two-Pointer | 9/10 | ❌ Not logarithmic | ||
| Binary Search Partition | 7/10 | ✅ Optimal, expected |
6. Pro Mode Extras
Variants
- k-th Smallest in Two Sorted Arrays:
- Replace
halfwithk. Adjust boundary checks similarly. - Pseudocode:
function kthSmallest(A, B, k): if len(A) > len(B): swap A, B m, n = len(A), len(B) left, right = 0, m while left ≤ right: i = (left + right) // 2 j = k - i ... # same boundary logic if A_left ≤ B_right and B_left ≤ A_right: return max(A_left, B_left) elif A_left > B_right: right = i - 1 else: left = i + 1
- Replace
- Median of Three Sorted Arrays:
- Extend partition idea: choose two arrays to partition, derive third partition via remaining count. Complexity with careful implementation.
- Online Median with Insertions:
- Use two heaps (min-heap for right half, max-heap for left half). Insertion , median .
Interview Cheat Sheet
- Clarify: Confirm arrays are sorted, can be empty, contain negatives.
- Brute First: Mention merge approach, then explain why it’s suboptimal.
- Key Insight: Partitioning to avoid merging, binary search on smaller array.
- Walk Through Example: Use simple cases (e.g.,
[1,3],[2]) to demonstrate partition logic. - Edge Cases: Empty arrays, all elements in one array, duplicates, even/odd lengths.
- Complexity: Always state time and space, justify logarithmic behavior.
- Test: Verbally run through provided examples and edge cases.