← ~/lc-grind/posts

#4 - Median of Two Sorted Arrays

December 24, 2025

Median of Two Sorted Arrays

Problem Description

Given two sorted arrays nums1 and nums2 of size m and n respectively, return the median of the two sorted arrays.

The overall run time complexity should be O(log (m+n)).

Example 1:


Input: nums1 = [1,3], nums2 = [2]
Output: 2.00000
Explanation: merged array = [1,2,3] and median is 2.

Example 2:


Input: nums1 = [1,2], nums2 = [3,4]
Output: 2.50000
Explanation: merged array = [1,2,3,4] and median is (2 + 3) / 2 = 2.5.

Constraints:

  • nums1.length == m
  • nums2.length == n
  • 0 <= m <= 1000
  • 0 <= n <= 1000
  • 1 <= m + n <= 2000
  • -106 <= nums1[i], nums2[i] <= 106

Solution

1. Problem Deconstruction (500+ words)

Technical Restatement
Given two sorted arrays A[0..m-1] and B[0..n-1], compute the median of the combined sorted sequence C formed by merging A and B. The median is defined as:

  • If (m+n) is odd: C[(m+n)//2] (0-indexed).
  • If (m+n) is even: (C[(m+n)//2 - 1] + C[(m+n)//2]) / 2.
    The algorithm must achieve O(log(m+n)) time complexity without explicitly merging the arrays.

Beginner-Friendly Restatement
We have two sorted lists of numbers. We want the middle number when both lists are combined into one big sorted list. If the total count is odd, the middle number is the median. If even, the median is the average of the two middle numbers. We must find this quickly without actually combining the lists (which would be slow for large lists).

Mathematical Restatement
Let A and B be sorted arrays of lengths m and n. Define k = (m+n)//2 (floor division). The median M is:

  • Odd case: M = f(k) where f is the k-th order statistic of A ∪ B.
  • Even case: M = (f(k-1) + f(k))/2.
    Equivalently, find indices i (in A) and j (in B) such that:
  1. i + j = (m+n+1)//2 (left partition size).
  2. max(A[i-1], B[j-1]) ≤ min(A[i], B[j]).
    Then M = { max(A[i-1], B[j-1]) if odd; (max(A[i-1], B[j-1]) + min(A[i], B[j]))/2 if even }.

Constraint Analysis

  • nums1.length == m, nums2.length == n: Merely defines notation. No hidden impact.
  • 0 ≤ m, n ≤ 1000: Although small, the required O(log(m+n)) complexity forces a logarithmic algorithm. A linear solution would pass but violate the stated requirement.
  • 1 ≤ m+n ≤ 2000: Ensures at least one element exists. Edge cases: one array empty (m=0 or n=0), the other non-empty.
  • -10^6 ≤ nums1[i], nums2[i] ≤ 10^6: Values can be negative or zero. Median may be negative. Comparisons must handle negatives; sentinel values (e.g., -∞, +∞) must be chosen outside this range.
  • Sorted arrays: Fundamental property enabling binary search and partition-based approaches.
  • O(log(m+n)) time: Excludes linear merge or two-pointer scans. Suggests binary search or divide-and-conquer.

2. Intuition Scaffolding

Analogies

  1. Real-World Metaphor: Two stacks of graded exams sorted by score. To find the median score without shuffling them together, repeatedly compare the middle exam of each stack and discard the lower half of the stack whose middle score is smaller, because the median cannot be there.
  2. Gaming Analogy: Two sorted leaderboards for different game servers. To find the combined median rank, use a “tournament elimination” strategy: compare the median ranks of each leaderboard and eliminate the lower half of the leaderboard with the smaller median, since those players cannot be near the overall median.
  3. Math Analogy: Finding the median is equivalent to finding the kk-th smallest element for k=(m+n)/2k=\lfloor (m+n)/2 \rfloor. In two sorted arrays, this reduces to a search problem where we can discard k/2k/2 elements in each step by comparing the k/2k/2-th elements of each array.

Common Pitfalls

  1. Merging then median: Obvious but O(m+n)O(m+n) time and space.
  2. Two-pointer merge without storage: Still O(m+n)O(m+n) time.
  3. Averaging individual medians: Incorrect; the overall median is not necessarily the average of each array’s median.
  4. Binary search misalignment: Forgetting to adjust the partition in the second array when partitioning the first.
  5. Off-by-one in partition indices: Incorrect left/right boundary handling leads to missing elements or infinite loops.
  6. Ignoring empty arrays: Failing to handle cases where one array is empty.
  7. Incorrect handling of even/odd totals: Confusing floor/ceiling when computing left partition size.

3. Approach Encyclopedia

Approach 1: Brute Force Merge

  • Pseudocode:
    function medianBrute(A, B):
        C = merge(A, B)          // standard merge
        total = len(C)
        if total % 2 == 1:
            return C[total//2]
        else:
            return (C[total//2 - 1] + C[total//2]) / 2
    
  • Complexity Proof:
    • Time: Merging requires visiting each element once → O(m+n)O(m+n).
    • Space: New array of size m+nm+nO(m+n)O(m+n).
  • Visualization:
    A: [1, 3]  
    B: [2]  
    Merge: [1, 2, 3]  
    Index:  0  1  2  
    Total=3 (odd) → median at index 1 → 2.
    

Approach 2: Two-Poiter Merge (Optimized Space)

  • Pseudocode:
    function medianTwoPointer(A, B):
        m, n = len(A), len(B)
        total = m + n
        i = j = 0
        prev = curr = 0
        for k in range(total//2 + 1):
            prev = curr
            if i < m and (j >= n or A[i] ≤ B[j]):
                curr = A[i]; i += 1
            else:
                curr = B[j]; j += 1
        return curr if total % 2 == 1 else (prev + curr) / 2
    
  • Complexity Proof:
    • Time: Loop runs total/2+1\lfloor total/2 \rfloor + 1 times → O(m+n)O(m+n).
    • Space: O(1)O(1).
  • Visualization:
    A: [1,2], B: [3,4], total=4, half=2 → need 3 iterations.
    Iter1: prev=0, curr=1 (A[0])
    Iter2: prev=1, curr=2 (A[1])
    Iter3: prev=2, curr=3 (B[0])
    Stop. Even total → (2+3)/2 = 2.5.
    

Approach 3: Binary Search on Partition (Optimal)

  • Core Insight: Partition both arrays so that left half contains the first (m+n)/2\lceil (m+n)/2 \rceil elements and every element on the left ≤ every element on the right.
  • Pseudocode:
    function medianOptimal(A, B):
        if len(A) > len(B): swap A, B
        m, n = len(A), len(B)
        total = m + n
        half = (total + 1) // 2
        left, right = 0, m
        while left ≤ right:
            i = (left + right) // 2   // A partition
            j = half - i               // B partition
            A_left  = A[i-1] if i>0 else -∞
            A_right = A[i]   if i<m else ∞
            B_left  = B[j-1] if j>0 else -∞
            B_right = B[j]   if j<n else ∞
            if A_left ≤ B_right and B_left ≤ A_right:
                if total % 2 == 1:
                    return max(A_left, B_left)
                else:
                    return (max(A_left, B_left) + min(A_right, B_right)) / 2
            elif A_left > B_right:
                right = i - 1
            else:
                left = i + 1
        raise Error
    
  • Complexity Proof:
    • Time: Binary search on smaller array of length min(m,n)\min(m,n)O(log(min(m,n)))=O(log(m+n))O(\log(\min(m,n))) = O(\log(m+n)).
    • Space: O(1)O(1).
  • Visualization:
    Example: A=[1,3], B=[2]
    m=2, n=1, total=3, half=2.
    left=0, right=2 → i=1, j=1.
    A_left=1, A_right=3, B_left=2, B_right=∞.
    Condition: 1≤∞ and 2≤3 → true.
    Total odd → max(1,2)=2.
    

4. Code Deep Dive

def findMedianSortedArrays(nums1, nums2):
    # Ensure nums1 is the smaller array for minimal binary search steps
    if len(nums1) > len(nums2):
        nums1, nums2 = nums2, nums1
    m, n = len(nums1), len(nums2)
    total = m + n
    half = (total + 1) // 2  # left partition size (ceil)

    left, right = 0, m
    while left <= right:
        i = (left + right) // 2  # partition point in nums1
        j = half - i              # partition point in nums2

        # Handle boundaries: use -inf/inf for empty partitions
        nums1_left = nums1[i-1] if i > 0 else float('-inf')
        nums1_right = nums1[i] if i < m else float('inf')
        nums2_left = nums2[j-1] if j > 0 else float('-inf')
        nums2_right = nums2[j] if j < n else float('inf')

        # Check partition correctness
        if nums1_left <= nums2_right and nums2_left <= nums1_right:
            # Valid partition found
            if total % 2 == 1:
                # Odd total: median is max of left side
                return max(nums1_left, nums2_left)
            else:
                # Even total: median is average of max left and min right
                return (max(nums1_left, nums2_left) + min(nums1_right, nums2_right)) / 2
        elif nums1_left > nums2_right:
            # nums1's left element too large → move partition left
            right = i - 1
        else:
            # nums2's left element too large → move partition right
            left = i + 1

    # Unreachable for valid sorted inputs
    raise ValueError("Input arrays are not sorted or invalid.")

Edge Case Handling

  • Empty array: If nums1 is empty (m=0), binary search loop runs with left=0, right=0. i=0, j=half. Boundary values become nums1_left=-inf, nums1_right=inf, and nums2_left, nums2_right from nums2. Condition satisfied, median computed from nums2 alone.
  • All elements in one array smaller: Example A=[1,2], B=[3,4]. Algorithm eventually partitions A entirely into left (i=2, j=0), giving nums1_left=2, nums2_left=-inf, nums1_right=inf, nums2_right=3. Condition holds, even total → (max(2,-inf)+min(inf,3))/2 = (2+3)/2.
  • Duplicate values: Condition uses , ensuring correctness when equal values appear across partitions.

5. Complexity War Room

Hardware-Aware Analysis

  • The algorithm uses constant memory (~6 integers, 4 floats) → fits in CPU registers or L1 cache.
  • For max m+n=2000, binary search depth ≤ log2(1000)10\log_2(1000) ≈ 10 iterations. Each iteration does 4 array accesses (indexed), which are sequential and cache-friendly.
  • Performance: On a 3GHz CPU, 10 iterations × ~10 cycles/iteration ≈ 100 cycles → < 1 microsecond.

Industry Comparison Table

Approach Time Space Readability Interview Viability
Brute Merge O(m+n)O(m+n) O(m+n)O(m+n) 10/10 ❌ Fails O(log)O(\log) requirement
Two-Pointer O(m+n)O(m+n) O(1)O(1) 9/10 ❌ Not logarithmic
Binary Search Partition O(log(min(m,n)))O(\log(\min(m,n))) O(1)O(1) 7/10 ✅ Optimal, expected

6. Pro Mode Extras

Variants

  1. k-th Smallest in Two Sorted Arrays:
    • Replace half with k. Adjust boundary checks similarly.
    • Pseudocode:
      function kthSmallest(A, B, k):
          if len(A) > len(B): swap A, B
          m, n = len(A), len(B)
          left, right = 0, m
          while left ≤ right:
              i = (left + right) // 2
              j = k - i
              ...  # same boundary logic
              if A_left ≤ B_right and B_left ≤ A_right:
                  return max(A_left, B_left)
              elif A_left > B_right:
                  right = i - 1
              else:
                  left = i + 1
      
  2. Median of Three Sorted Arrays:
    • Extend partition idea: choose two arrays to partition, derive third partition via remaining count. Complexity O(logplogq)O(\log p \cdot \log q) with careful implementation.
  3. Online Median with Insertions:
    • Use two heaps (min-heap for right half, max-heap for left half). Insertion O(logn)O(\log n), median O(1)O(1).

Interview Cheat Sheet

  • Clarify: Confirm arrays are sorted, can be empty, contain negatives.
  • Brute First: Mention merge approach, then explain why it’s suboptimal.
  • Key Insight: Partitioning to avoid merging, binary search on smaller array.
  • Walk Through Example: Use simple cases (e.g., [1,3], [2]) to demonstrate partition logic.
  • Edge Cases: Empty arrays, all elements in one array, duplicates, even/odd lengths.
  • Complexity: Always state time and space, justify logarithmic behavior.
  • Test: Verbally run through provided examples and edge cases.