#373 - Find K Pairs with Smallest Sums
Find K Pairs with Smallest Sums
- Difficulty: Medium
- Topics: Array, Heap (Priority Queue)
- Link: https://leetcode.com/problems/find-k-pairs-with-smallest-sums/
Problem Description
You are given two integer arrays nums1 and nums2 sorted in non-decreasing order and an integer k.
Define a pair (u, v) which consists of one element from the first array and one element from the second array.
Return the k pairs (u1, v1), (u2, v2), ..., (uk, vk) with the smallest sums.
Example 1:
Input: nums1 = [1,7,11], nums2 = [2,4,6], k = 3
Output: [[1,2],[1,4],[1,6]]
Explanation: The first 3 pairs are returned from the sequence: [1,2],[1,4],[1,6],[7,2],[7,4],[11,2],[7,6],[11,4],[11,6]
Example 2:
Input: nums1 = [1,1,2], nums2 = [1,2,3], k = 2
Output: [[1,1],[1,1]]
Explanation: The first 2 pairs are returned from the sequence: [1,1],[1,1],[1,2],[2,1],[1,2],[2,2],[1,3],[1,3],[2,3]
Constraints:
1 <= nums1.length, nums2.length <= 105-109 <= nums1[i], nums2[i] <= 109nums1andnums2both are sorted in non-decreasing order.1 <= k <= 104k <= nums1.length * nums2.length
Solution
1. Problem Deconstruction (500+ words)
Technical Restatement
Given two sorted arrays nums1 (length ) and nums2 (length ) in non-decreasing order, and an integer , we must compute the lexicographically smallest pairs where when ordered by their sum . Formally, output a list of distinct index pairs such that the sequence of sums is minimal and sorted non-decreasingly. If multiple pairs have the same sum, any ordering among them satisfies the problem, though typical implementations yield a deterministic order based on iteration.
Beginner-Friendly Explanation
Imagine you have two lists of numbers, each sorted from smallest to largest. You want to pick one number from the first list and one from the second list to make a pair. The “value” of a pair is just the sum of its two numbers. Your task is to find the pairs with the smallest sums. For example, if the first list is [1,7,11] and the second is [2,4,6], the pair (1,2) has sum 3, which is the smallest possible. The next smallest are (1,4) (sum 5) and (1,6) (sum 7). So for , you return those three pairs.
Mathematical Formulation
Let and be two sorted sequences (, ). Define the Cartesian product . Let be . We seek the first elements of the sequence where sorted by in non-decreasing order, i.e., . If , we take all pairs. The output is the set of pairs (or a list) corresponding to the smallest sums.
Constraint Analysis
-
1 <= nums1.length, nums2.length <= 1e5
This immediately rules out any algorithm with complexity , because , far beyond typical time limits. We need sub‑quadratic solutions, ideally or . -
-1e9 <= nums1[i], nums2[i] <= 1e9
Values can be negative, zero, or positive. However, both arrays are sorted non‑decreasing, so the smallest sums will generally involve the smallest elements from both arrays. Negative values don’t break the logic, but we must ensure our algorithm doesn’t assume positivity. -
nums1andnums2both are sorted in non-decreasing order
This is a critical structural property that allows efficient generation of pairs in increasing sum order. Without sorting, the problem would be much harder (like finding k smallest sums among all pairs from two unsorted arrays, which could be solved with a heap but with higher complexity). The sorted property enables strategies like merging or using a min‑heap on candidate frontiers. -
1 <= k <= 10^4
is relatively small compared to (which can be up to ). This suggests we can design an algorithm that depends more on than on . For example, we can explore only the first few elements of each array, because pairs involving large indices are unlikely to be among the smallest sums. -
k <= nums1.length * nums2.length
Guarantees there are at least pairs, so we don’t need to handle the case where separately (though a robust implementation might still check for early termination). -
Hidden edge cases
- Identical elements – When both arrays contain duplicates, multiple pairs can have the same sum. The output should include all required pairs, but the exact order among equal sums may not matter (as per examples). However, we must avoid duplicate pairs if indices are considered distinct? Actually, pairs are defined by values, not indices. In Example 2,
[1,1]appears twice because there are two 1s innums1and one 1 innums2, leading to two distinct pairs (from indices (0,0) and (1,0)). So we treat pairs as distinct based on their indices, even if values are the same. - All negative numbers – The smallest sum may be very negative. Our algorithm must correctly handle negative sums.
- One very small array – If or , the problem reduces to pairing the single element of one array with the first elements of the other (or all if larger).
- – Just the pair with smallest sum, which is always
(nums1[0], nums2[0])because arrays are sorted.
- Identical elements – When both arrays contain duplicates, multiple pairs can have the same sum. The output should include all required pairs, but the exact order among equal sums may not matter (as per examples). However, we must avoid duplicate pairs if indices are considered distinct? Actually, pairs are defined by values, not indices. In Example 2,
2. Intuition Scaffolding
Analogies Using Problem Tags (Tags: Array, Heap (Priority Queue), Merge)
-
Real‑World Metaphor – Merging Sorted Lists of Distances
Imagine you are a courier company with two sorted lists: list A contains distances from your depot to customers in one city (sorted nearest to farthest), list B contains distances to customers in another city. You need to assign one driver from each city to form a team for a joint delivery. The “cost” of a team is the sum of their distances from the depot. You want the cheapest teams. Naturally, you start by pairing the nearest customers from both cities, then gradually consider slightly farther ones. -
Gaming Analogy – Crafting Items with Two Resources
In a crafting game, you have two types of resources (e.g., wood and iron), each available in sorted qualities (from common to rare). Combining one wood and one iron makes a tool. The tool’s power is the sum of the qualities. You want to craft the weakest tools first (maybe to fulfill a quest). You’d start by combining the lowest‑quality wood with the lowest‑quality iron, then explore combinations that increment either resource. -
Math Analogy – Exploring a Young Tableau
The matrix where is a Young tableau (each row and column is sorted) because both arrays are sorted. Finding the smallest sums is equivalent to selecting the smallest entries from this matrix. This connects to algorithms for extracting elements from a sorted matrix, like using a min‑heap to perform a “k‑way merge” on the rows.
Common Pitfalls (5+ Intuitive‑but‑Wrong Approaches)
-
Brute‑Force All Pairs and Sort
Generating all pairs, sorting by sum, and taking the first is correct but infeasible for large due to time and memory. -
Fix One Array, Iterate the Other
For each element innums1, pair it withnums2[0](the smallest), then somehow try to advance. This misses pairs like(nums1[0], nums2[1])whennums1[1]is much larger. Example:nums1 = [1,100],nums2 = [2,3]. The second smallest sum is(1,3), not(100,2). -
Two‑Pointer “Merge” as in Merging Two Sorted Lists
In classic merging of two sorted lists, we comparea[i]+b[j]witha[i+1]+b[j]anda[i]+b[j+1], but the sums are not necessarily monotonic in both directions simultaneously. A naive two‑pointer that moves the pointer with the smaller next sum can fail because the matrix is not totally ordered along diagonals. Example:nums1 = [1,2],nums2 = [3,4]. Starting at(0,0) = 4. Next candidates:(1,0)=5and(0,1)=5. Which to pick? Both are equal, but after picking one, the other might be skipped if we only move one pointer. -
Using a Heap Without Deduplication
If we push both(i+1, j)and(i, j+1)into the heap without checking if they have been visited, we risk pushing duplicates (when coming from different paths). This can cause exponential heap size. We need a visited set or a strategy to push each candidate only once. -
Assuming Positive Numbers and Pruning
If we assume all numbers are positive, we could limit the search to the first elements of each array, because the sum of the ‑th smallest innums1andnums2might be a bound. However, with negative numbers, the smallest sum could involve elements beyond index . Example:nums1 = [-1000, 1, 2, ..., ],nums2 = [-1000, 1, 2, ...]. The pair(-1000, -1000)is the smallest, but if we only look at first elements (say ), we’d miss it. So pruning must be careful. -
Using Binary Search on the Sum
One might try to binary search on the sum value , count how many pairs have sum , and then extract the exact pairs. Counting pairs with sum can be done in using two pointers (because rows are sorted). However, extracting the smallest pairs after finding the threshold is tricky because we need to output them in sorted order, and there may be many pairs with the same sum. This approach can be made to work but is more complex and less efficient for small .
3. Approach Encyclopedia
We’ll explore four strategies in increasing sophistication:
Approach 1: Brute Force (Naive Cartesian Product)
- Generate all pairs, compute their sums, sort by sum, and take first .
- Pseudocode:
pairs = [] for i in range(len(nums1)): for j in range(len(nums2)): pairs.append( (nums1[i] + nums2[j], nums1[i], nums2[j]) ) sort pairs by first element (sum) return first k pairs (dropping the sum) - Complexity Proof:
- Time: for sorting. The double loop is , dominating when large.
- Space: to store all pairs.
- Visualization:
nums1 = [1,7,11], nums2 = [2,4,6] Matrix of sums: 2 4 6 1 3 5 7 7 9 11 13 11 13 15 17 All 9 pairs generated, sorted by sum: (1,2)=3, (1,4)=5, (1,6)=7, (7,2)=9, ...
Approach 2: Improved Brute Force with Bounded Heap
- Instead of generating all pairs, we can use a min‑heap of size at most to keep the smallest sums seen so far. We iterate over all pairs and push into the heap, maintaining size by popping the largest when size exceeds . This avoids storing all pairs but still visits all pairs.
- Pseudocode:
import heapq max_heap = [] # store (-sum, nums1[i], nums2[j]) for max-heap effect for i in range(len(nums1)): for j in range(len(nums2)): s = nums1[i] + nums2[j] if len(max_heap) < k: heapq.heappush(max_heap, (-s, nums1[i], nums2[j])) else: # if current sum smaller than the largest in heap (i.e., -max_heap[0][0]) if s < -max_heap[0][0]: heapq.heappushpop(max_heap, (-s, nums1[i], nums2[j])) # after loops, max_heap contains k smallest pairs (but stored as negatives) convert to list and sort by sum ascending - Complexity Proof:
- Time: because each pair involves a heap operation of .
- Space: for the heap.
- Visualization:
Same matrix as before, but we only keep a heap of size . After processing first three pairs, heap contains
[(-7,1,6), (-5,1,4), (-3,1,2)](max at top is -3). For next pair (7,2)=9, 9 > 3, so ignore. Continue; only pairs with sum < current max (3) would replace, but none.
Approach 3: Min‑Heap with Frontier (Optimal for General k)
- Leverage sorted property: the smallest pair is always
(nums1[0], nums2[0]). The next smallest could be either(nums1[0], nums2[1])or(nums1[1], nums2[0]). More generally, if we have taken pair(i,j), the next candidates are(i+1, j)and(i, j+1). We can use a min‑heap to store candidate pairs, keyed by sum. To avoid duplicates, we push(i+1, j)only whenj=0(or using a visited set). A common technique: initially push(i,0)for alliinrange(min(k, m))(or vice versa). Then repeatedly pop the smallest, and push(i, j+1)ifj+1exists. - Pseudocode (Standard Solution):
if not nums1 or not nums2: return [] heap = [] # push (nums1[i] + nums2[0], i, 0) for i in range(min(k, len(nums1))) for i in range(min(k, len(nums1))): heapq.heappush(heap, (nums1[i] + nums2[0], i, 0)) result = [] while heap and len(result) < k: s, i, j = heapq.heappop(heap) result.append([nums1[i], nums2[j]]) if j + 1 < len(nums2): heapq.heappush(heap, (nums1[i] + nums2[j+1], i, j+1)) return result - Complexity Proof:
- Time: because we perform at most pops and each pop may push one new candidate (so total heap operations ), each .
- Space: for the heap.
- Visualization:
For
nums1 = [1,7,11], nums2 = [2,4,6], k=3: Initial heap:[(1+2,0,0)=3, (7+2,1,0)=9, (11+2,2,0)=13]. Pop (3,0,0) → result[[1,2]]. Push (0,1):(1+4,0,1)=5→ heap now[(5,0,1), (9,1,0), (13,2,0)]. Pop (5,0,1) → result[[1,2],[1,4]]. Push (0,2):(1+6,0,2)=7→ heap[(7,0,2), (9,1,0), (13,2,0)]. Pop (7,0,2) → result[[1,2],[1,4],[1,6]]. Done.
Approach 4: Two‑Pointer / Merge with Multiple Pointers (Advanced)
- This is similar to merging sorted lists, where each list corresponds to a row of the matrix: row is
nums1[i] + nums2[j]for . Since each row is sorted, we can perform a k‑way merge using a heap of size (or ). That’s essentially Approach 3 with initializing heap with all rows. Alternatively, we can use a two‑pointer technique that moves along diagonals, but that’s more complex and not as efficient for arbitrary .
4. Code Deep Dive
We’ll annotate the optimal solution (Approach 3) line by line.
import heapq
def kSmallestPairs(nums1, nums2, k):
"""
Return the k pairs with the smallest sums from nums1 and nums2.
"""
# Edge case: if either array is empty, no pairs can be formed.
if not nums1 or not nums2:
return []
# Initialize a min-heap. Each element is a tuple (sum, i, j)
# where i is index in nums1, j is index in nums2.
heap = []
# We start by pushing pairs that combine each nums1[i] with the smallest
# element of nums2 (nums2[0]). This ensures we have a candidate from each
# "row" of the conceptual sum matrix.
# However, we only need at most k such initial candidates, because we only
# need k results. If k is smaller than len(nums1), pushing more is wasteful.
for i in range(min(k, len(nums1))):
# sum = nums1[i] + nums2[0], index i, index 0
heapq.heappush(heap, (nums1[i] + nums2[0], i, 0))
result = []
# Continue until we have k pairs or the heap is exhausted.
while heap and len(result) < k:
# Pop the smallest sum pair from the heap.
current_sum, i, j = heapq.heappop(heap)
# Append the actual pair values to the result.
result.append([nums1[i], nums2[j]])
# If there is a next element in nums2 for the same nums1[i],
# push the next candidate from the same row.
if j + 1 < len(nums2):
next_sum = nums1[i] + nums2[j + 1]
heapq.heappush(heap, (next_sum, i, j + 1))
return result
Line‑by‑Line Explanation:
if not nums1 or not nums2:– Guard against empty inputs. No pairs possible.heap = []– Python’s heap is a min‑heap by default. We’ll store(sum, i, j).for i in range(min(k, len(nums1))):– We push at mostkinitial candidates. Why? Because the firstksmallest pairs will never involvenums1[i]fori >= kif we only needkpairs? Actually, that’s not always true when there are negative numbers. However, with non‑decreasing arrays, the smallest sums will prioritize smallnums1elements. It’s safe to limit tokbecause any pair withi >= kwill havenums1[i] >= nums1[k-1], and pairing withnums2[0]gives sum at leastnums1[k-1] + nums2[0]. But there could be smaller sums from earlieriwith largerj. Still, the algorithm remains correct because we only push initial candidates; we might miss some rows, but if a rowi >= kcould produce a sum smaller than some from rows< k, that would contradict sorting ofnums1. Formal proof: The smallest element in rowiisnums1[i] + nums2[0]. Sincenums1is sorted,nums1[i] + nums2[0] >= nums1[k-1] + nums2[0]fori >= k. So the firstkrows already contain the smallest possible “first column” sums. Any pair(i,j)withi >= khas sum>= nums1[i] + nums2[0] >= nums1[k-1] + nums2[0]. Meanwhile, the heap initially contains all rows0..k-1with their first column. As we pop, we push next columns from those rows, which can be smaller thannums1[k-1]+nums2[0]. So we won’t miss any.heapq.heappush(heap, (nums1[i] + nums2[0], i, 0))– Push initial candidate from rowi.while heap and len(result) < k:– Main loop: we stop when we havekpairs or no more candidates.current_sum, i, j = heapq.heappop(heap)– Extract the current smallest sum candidate.result.append([nums1[i], nums2[j]])– Record the pair.if j + 1 < len(nums2):– If there is a next column in the same row, push that candidate. This ensures we explore the row incrementally.heapq.heappush(heap, (next_sum, i, j + 1))– Add the next candidate from rowi.return result– Return the collected pairs.
Edge Case Handling:
- Example 1: Works as illustrated.
- Example 2 with duplicates:
nums1 = [1,1,2], nums2 = [1,2,3], k=2. Initial heap:i=0: (1+1=2,0,0),i=1: (1+1=2,1,0)(sincek=2, we only push first two rows). Heap contains two entries with sum 2. First pop gives(2,0,0)→[1,1]. Push next from row 0:(1+2=3,0,1). Heap now(2,1,0), (3,0,1). Second pop gives(2,1,0)→[1,1]. Done. Correct. - k larger than m*n: The loop will stop when heap empties before we collect k pairs. But constraint says
k <= m*n, so we’re safe. - Negative numbers: Works because sums are compared correctly.
- One very small array: If
nums1has only 1 element, we push only one initial candidate, then keep incrementingjuntil we have k pairs or run out.
5. Complexity War Room
Hardware‑Aware Analysis
- At , the heap contains at most elements. Each heap operation costs comparisons, negligible. Memory for heap: each tuple contains two integers and a sum (Python ints are objects, but overhead is about 28 bytes each). Total ~ bytes ≈ 1 MB, fits in L2/L3 cache. The algorithm is cache‑friendly because we access arrays sequentially.
- For the worst‑case , we never iterate over all pairs, only operations. This is efficient.
Industry Comparison Table
| Approach | Time Complexity | Space Complexity | Readability (1-10) | Interview Viability | Notes |
|---|---|---|---|---|---|
| Brute Force | 9 – simple loops | ❌ Fails large constraints | Only for tiny inputs | ||
| Bounded Max‑Heap | 7 – heap logic | ⚠️ May time out if huge (1e10) | Better but still slow for max bounds | ||
| Min‑Heap Frontier | 8 – requires reasoning about rows | ✅ Optimal for typical interview | Most recommended | ||
| Binary Search + Two Pointers | 6 – complex counting | ⚠️ Overkill, error‑prone | Useful if extremely large relative to |
is the range of sums.
6. Pro Mode Extras
Variants Section
-
K Smallest Sums in Two Unsorted Arrays
If arrays are not sorted, we can sort them first () then apply the same heap method. Alternatively, use a heap with all pairs, but that’s . Better to sort first. -
K Smallest Sums from Three or More Arrays
Generalization to arrays: Use a min‑heap storing tuples of indices and current sum. Start with(0,0,...,0)and generate neighbors by incrementing one index at a time. Complexity but with higher constant due to more dimensions. -
Find K Pairs with Largest Sums
Similar but use max‑heap or invert signs. Or, because arrays are sorted, the largest sums come from the ends. We can start from(m-1, n-1)and move backwards. -
LeetCode 373 vs LeetCode 378 (Kth Smallest Element in a Sorted Matrix)
Problem 378 is a special case where the matrix is square and each row and column sorted. Our problem is similar but the matrix is defined as sums of two arrays, which also yields a sorted‑row‑and‑column matrix. The same heap approach works for 378. -
Allow Repeated Elements (Same Index)?
If pairs could reuse the same element (like(nums1[i], nums1[i])), that’s a different problem. Not applicable here.
Interview Cheat Sheet
- First Step: Clarify constraints and ask about duplicates, negative numbers, and whether can exceed .
- Brute Force Mention: Briefly describe the naive solution to show understanding, then immediately explain why it’s inefficient.
- Key Insight: “Because both arrays are sorted, the pair sums form a sorted matrix (Young tableau), so we can use a min‑heap to traverse it like merging k sorted lists.”
- Initialization Trick: “We can initialize the heap with the first element of each row (or each column) to ensure we have the smallest candidate from each row.”
- Complexity: Always state time and space complexity clearly. For this problem: time, space.
- Edge Cases: Mention empty arrays, , duplicates, negative numbers.
- Testing: Walk through a small example (like given examples) to verify logic.
- Optimization Note: If is very large (close to ), the heap size may become large, but it’s still bounded by (if we initialize with rows). In practice, is at most , so fine.
Final Thought: This problem is a classic example of using a heap to traverse a implicitly defined sorted structure. Mastering it prepares you for many other “k‑smallest” problems.