← ~/lc-grind/posts

#76 - Minimum Window Substring

July 3, 2025

Minimum Window Substring

Problem Description

Given two strings s and t of lengths m and n respectively, return the minimum window substring of s such that every character in t (including duplicates) is included in the window. If there is no such substring, return the empty string "".

The testcases will be generated such that the answer is unique.

Example 1:


Input: s = "ADOBECODEBANC", t = "ABC"
Output: "BANC"
Explanation: The minimum window substring "BANC" includes 'A', 'B', and 'C' from string t.

Example 2:


Input: s = "a", t = "a"
Output: "a"
Explanation: The entire string s is the minimum window.

Example 3:


Input: s = "a", t = "aa"
Output: ""
Explanation: Both 'a's from t must be included in the window.
Since the largest window of s only has one 'a', return empty string.

Constraints:

  • m == s.length
  • n == t.length
  • 1 <= m, n <= 105
  • s and t consist of uppercase and lowercase English letters.

Follow up: Could you find an algorithm that runs in O(m + n) time?

Solution

1. Problem Deconstruction

Technical Restatement:
Given strings s (length m) and t (length n), find the minimal-length contiguous substring w of s such that for every character c in t, the frequency of c in w is ≥ its frequency in t. If no such w exists, return "". The solution must handle m, n ≤ 1e5 efficiently.

Beginner-Friendly Restatement:
You have a long string s and a shorter string t. Find the smallest part of s that contains all letters from t in the same or greater quantities. For example, if t has two 'a’s, the part of s must have at least two 'a’s. If impossible, return an empty string.

Mathematical Formulation:
Let:

  • s[0..m1]s[0..m-1] and t[0..n1]t[0..n-1] be input strings.
  • freqt(c)\text{freq}_t(c) = frequency of cc in tt.
  • freqw(c)\text{freq}_w(c) = frequency of cc in substring s[i..j]s[i..j].
    Find i,ji, j (0ij<m0 \leq i \leq j < m) minimizing (ji+1)(j-i+1) such that:

cΣ,freqw(c)freqt(c)\forall c \in \Sigma, \quad \text{freq}_w(c) \geq \text{freq}_t(c)

If no such i,ji,j exist, return \emptyset.

Constraint Analysis:

  • m, n ≤ 1e5:
    • Limits solutions to O(mlogm)O(m \log m) or better; O(m2)O(m^2) is infeasible (101010^{10} operations).
    • Edge: When n>mn > m, return "" immediately (window cannot contain nn characters).
  • Uppercase/lowercase English letters:
    • 52 distinct characters → use fixed-size arrays (size 128) for O(1)O(1) per-operation frequency tracking.
  • Answer is unique:
    • Simplifies output; no need to handle multiple valid minimal windows.
  • n1,m1n \geq 1, m \geq 1:
    • Edge: n>mn > m → impossible.
    • Edge: tt has duplicates (e.g., tt=“aa”) → window must satisfy character counts, not just presence.

2. Intuition Scaffolding

Analogies:

  1. Real-World Metaphor:
    Collecting rare coins (t) from a conveyor belt (s). Find the shortest belt segment containing all coins in required quantities.
  2. Gaming Analogy:
    Side-scroller game: Pass through a tunnel (s) and collect power-ups (t). Find the minimal tunnel segment containing all power-ups in exact quantities.
  3. Math Analogy:
    Minimize interval [i,j][i,j] where vector freqw\text{freq}_w dominates freqt\text{freq}_t component-wise.

Common Pitfalls:

  1. Ignoring duplicates: Treating tt=“aa” as requiring only one ‘a’.
  2. Suboptimal checks: Verifying window validity by full frequency scan per window (O(52)O(52)) without incremental tracking.
  3. Premature termination: Stopping after the first valid window without shrinking to find smaller ones.
  4. Pointer errors: Shrinking the window beyond validity or moving left incorrectly.
  5. Overlooking n>mn>m: Not handling the trivial impossibility case early.

3. Approach Encyclopedia

Brute Force (Naive)

Pseudocode:

min_len = INF  
for i in range(len(s)):  
    freq_window = [0]*128  
    for j in range(i, len(s)):  
        freq_window[ord(s[j])] += 1  
        if is_valid(freq_window, t_counts):  # Check all c: freq_window[c] >= t_counts[c]  
            if (j-i+1) < min_len:  
                min_len = j-i+1  
                min_start = i  
return s[min_start:min_start+min_len] if min_len != INF else ""  

Complexity Proof:

  • Time: O(m252)O(m^2 \cdot 52)O(m2)O(m^2). For m=1e5m=1e5, m2=1010m^2=10^{10} (infeasible).
  • Space: O(1)O(1) (fixed-size arrays).
    Visualization:
s: A D O B E C O D E B A N C  
t: A B C  
Brute force checks:  
[0:0] "A"       → ✗  
[0:1] "AD"      → ✗  
...  
[0:5] "ADOBEC"  → ✓ (len=6)  
[1:5] "DOBEC"   → ✗  
...  

Optimal: Sliding Window

Pseudocode:

if n > m: return ""  
t_counts = [0]*128; required = 0  
for c in t: t_counts[ord(c)] += 1  
for count in t_counts:  
    if count > 0: required += 1  // Distinct chars in t  
window_counts = [0]*128  
left = formed = 0; min_len = INF  
for right in range(m):  
    c = s[right]; window_counts[ord(c)] += 1  
    if window_counts[ord(c)] == t_counts[ord(c)]:  
        formed += 1  // Satisfy condition for c  
    while formed == required:  // Valid window  
        min_len = min(min_len, right-left+1)  
        if (right-left+1) == min_len: min_left = left  
        d = s[left]; window_counts[ord(d)] -= 1  
        if window_counts[ord(d)] < t_counts[ord(d)]:  
            formed -= 1  // Break condition  
        left += 1  // Shrink window  
return s[min_left:min_left+min_len] if min_len != INF else ""  

Complexity Proof:

  • Time: O(m+n)O(m + n).
    • O(n)O(n): Compute t_counts and required.
    • O(m)O(m): Each character enters/exits window once → 2m2m operations.
  • Space: O(1)O(1) (two size-128 arrays).
    Visualization:
s: A D O B E C O D E B A N C  
t: A B C (t_counts: A:1, B:1, C:1; required=3)  

Step 0: [A]        → formed=1 (A satisfied)  
...  
Step 5: [ADOBEC]   → formed=3 (valid), min_len=6  
Shrink: [DOBEC]    → formed=2 (A missing) → stop  
Step 10: [DOBECODEBA] → formed=2  
Step 11: [DOBECODEBAN] → formed=2  
Step 12: [DOBECODEBANC] → formed=3 (valid)  
Shrink:  
  [OBECODEBANC] → formed=3 → min_len=11  
  ...  
  [BANC]        → formed=3 → min_len=4  
  Remove 'B'    → formed=2 → stop  
Final: "BANC" (len=4)  

4. Code Deep Dive

Optimal Solution (Python):

def minWindow(s: str, t: str) -> str:  
    m, n = len(s), len(t)  
    if n > m:  
        return ""  
    t_counts = [0] * 128  
    for c in t:  
        t_counts[ord(c)] += 1  
    required = sum(1 for count in t_counts if count > 0)  
    window_counts = [0] * 128  
    left = formed = 0  
    min_len = float('inf')  
    min_left = 0  
    for right in range(m):  
        c = s[right]  
        idx = ord(c)  
        window_counts[idx] += 1  
        if window_counts[idx] == t_counts[idx]:  
            formed += 1  
        while formed == required and left <= right:  
            if right - left + 1 < min_len:  
                min_len = right - left + 1  
                min_left = left  
            d = s[left]  
            idx_d = ord(d)  
            window_counts[idx_d] -= 1  
            if window_counts[idx_d] < t_counts[idx_d]:  
                formed -= 1  
            left += 1  
    return "" if min_len == float('inf') else s[min_left:min_left+min_len]  

Line-by-Line Annotations:

  1. if n > m: return "": Early exit if impossible.
  2. t_counts initialization: Stores frequency of each char in t.
  3. required = ...: Counts distinct characters in t.
  4. Sliding window loop (right pointer):
    • window_counts[ord(c)] += 1: Include s[right] in window.
    • if window_counts[...] == t_counts[...]: formed += 1: Tracks when a char’s count is satisfied.
  5. Inner loop (shrinking left pointer):
    • if right-left+1 < min_len: Update minimal valid window.
    • window_counts[ord(d)] -= 1: Remove s[left] from window.
    • if window_counts[...] < t_counts[...]: formed -= 1: Breaks condition if removal invalidates a char.
  6. Return result: Returns minimal window or "".

Edge Case Handling:

  • Example 3 (s="a", t="aa"): n>m → return "" (line 1).
  • Example 2 (s="a", t="a"):
    • right=0: window_counts['a']=1 == t_counts['a']=1formed=1.
    • Enter inner loop: min_len=1, min_left=0.
    • Remove ‘a’: formed=0 → return "a".
  • Duplicate handling: Track exact counts (e.g., t="aa" requires window_counts['a']≥2).

5. Complexity War Room

Hardware-Aware Analysis:

  • At m=1e5m=1e5, n=1e5n=1e5:
    • Time: O(m+n)=2×105O(m+n) = 2 \times 10^5 operations (feasible).
    • Space: Two 128-integer arrays → 1KB (fits in L1 cache).

Industry Comparison Table:

Approach Time Space Readability Interview Viability
Brute Force O(m2)O(m^2) O(1)O(1) 9/10 ❌ (Fails large mm)
Sliding Window O(m+n)O(m+n) O(1)O(1) 8/10 ✅ Optimal

6. Pro Mode Extras

Variants:

  1. At Most kk Distinct Characters:
    Find minimal window containing ≤ kk distinct chars (LC 340). Use formed = count of distinct chars.
  2. With kk Mismatches:
    Minimal window where freqw(c)freqt(c)k|\text{freq}_w(c) - \text{freq}_t(c)| \leq k for all cc.
  3. Multiple Transactions:
    Extend to kk transactions (LC 123, 188).

Interview Cheat Sheet:

  • First Step: Clarify constraints (case sensitivity? empty inputs?).
  • Key Insight: Sliding window + frequency tracking + formed counter.
  • Mention: O(m+n)O(m+n) time, O(1)O(1) space (fixed alphabet).
  • Verify: Check n>mn>m and tt with duplicates.
  • Test: Examples 1-3 during coding.