← ~/lc-grind/posts

#3 - Longest Substring Without Repeating Characters

July 1, 2025

Longest Substring Without Repeating Characters

Problem Description

Given a string s, find the length of the longest substring without duplicate characters.

Example 1:


Input: s = "abcabcbb"
Output: 3
Explanation: The answer is "abc", with the length of 3.

Example 2:


Input: s = "bbbbb"
Output: 1
Explanation: The answer is "b", with the length of 1.

Example 3:


Input: s = "pwwkew"
Output: 3
Explanation: The answer is "wke", with the length of 3.
Notice that the answer must be a substring, "pwke" is a subsequence and not a substring.

Constraints:

  • 0 <= s.length <= 5 * 104
  • s consists of English letters, digits, symbols and spaces.

Solution

1. Problem Deconstruction

Rewrite the Problem

  1. Technical Definition
    Given a string s of length n, identify the maximum length L such that there exists a contiguous substring s[i:j] (where 0 ≤ i < j ≤ n) where all characters in the substring are unique. Formally:

    max{ji | k,l[i,j1], kl    s[k]s[l]}\max \left\{ j - i \ \middle| \ \forall k, l \in [i, j-1], \ k \neq l \implies s[k] \neq s[l] \right\}

  2. Beginner Explanation
    Find the longest continuous segment in a string where no character repeats. For example, in “apple”, “ap” (length=2) has no repeats, but “app” has a repeated ‘p’. The solution must efficiently scan the string even for very long inputs.

  3. Mathematical Formulation
    Let f(i, j) be a predicate indicating all characters in substring s[i..j) are distinct. The goal is to compute:

    max{ji | 0i<jnf(i,j)}\max \left\{ j - i \ \middle| \ 0 \leq i < j \leq n \land f(i, j) \right\}

    Alternatively, define a function g(i) = \min \{ j \mid j > i \land \neg f(i, j) \}. The solution reduces to:

    max0i<n(g(i)i)\max_{0 \leq i < n} \left( g(i) - i \right)

Constraint Analysis

  • 0 <= s.length <= 5 * 10⁴:

    • Limitation: Rules out O(n²) brute-force approaches (25e⁹ operations worst-case). Requires O(n) or O(n log n).
    • Edge Cases:
      • Empty string → Output 0.
      • Length=1 → Output 1 (trivial unique substring).
      • No repeating characters → Entire string length (L = n).
  • Character Set (Letters, Digits, Symbols, Spaces):

    • Limitation: Finite alphabet size (|Σ| ≤ 256 for extended ASCII). Enables O(1) space tracking via fixed-size arrays.
    • Edge Cases:
      • Non-letter characters (e.g., “a b c” has valid substring “a b c” if spaces are unique).
      • Case sensitivity? Problem states “characters”, so ‘A’ ≠ ‘a’.

2. Intuition Scaffolding

Analogies

  1. Real-World Metaphor (Conveyor Belt Manufacturing)
    Imagine inspecting items on a conveyor belt. You need the longest sequence of unique items. When a duplicate appears, you discard all items from the start up to the duplicate. Track the maximum unique sequence observed.

  2. Gaming Analogy (RPG Quest Items)
    Collect consecutive quest items without duplicates. If a duplicate item appears, drop all items collected after its first occurrence. The “inventory size” is the substring length.

  3. Math Analogy (Interval Injectivity)
    Find the longest interval where the character-position mapping is injective. Equivalently, maximize j - i while maintaining injectivity of s restricted to [i, j).

Common Pitfalls

  1. Naive Shrinking: Moving left pointer by 1 (not jump) after duplicate → O(n²) time.
  2. Global Tracking: Storing global character counts ignores window locality → fails for “abba”.
  3. Set-Only Approach: Using a set without indices → cannot compute optimal left jump.
  4. Edge Handling: Forgetting empty string or single-character cases.
  5. Character Encoding: Assuming only lowercase letters → fails for digits/symbols.

3. Approach Encyclopedia

Approach 1: Brute Force (Triple Nested Loop)

  • Pseudocode:
    max_len = 0  
    for start in range(len(s)):  
        seen = set()  
        for end in range(start, len(s)):  
            if s[end] in seen:  
                break  
            seen.add(s[end])  
            max_len = max(max_len, end - start + 1)  
    return max_len  
    
  • Complexity Proof:
    • Time: O(n²) worst-case (no duplicates). Sum of arithmetic series:

      i=0n1(ni)=n(n+1)2Θ(n2)\sum_{i=0}^{n-1} (n - i) = \frac{n(n+1)}{2} \in \Theta(n^2)

    • Space: O(min(n, |Σ|)) for the set.
  • Visualization:
    s = "abc"  
    Start=0: [a] → [a,b] → [a,b,c] → L=3  
    Start=1: [b] → [b,c] → L=2  
    Start=2: [c] → L=1  
    

Approach 2: Sliding Window (Hash Map Tracking)

  • Pseudocode:
    left = max_len = 0  
    last_seen = {}  # char → last index  
    for right, char in enumerate(s):  
        if char in last_seen and last_seen[char] >= left:  
            left = last_seen[char] + 1  # Jump left to after last duplicate  
        last_seen[char] = right  # Update last seen index  
        max_len = max(max_len, right - left + 1)  
    return max_len  
    
  • Complexity Proof:
    • Time: O(n). Each element processed once. last_seen operations O(1).
    • Space: O(|Σ|) for last_seen. Since |Σ| ≤ 256 → O(1).
  • Visualization (s="abac"):
    Step0: left=0, right=0 (a) → last_seen={a:0}, len=1  
    Step1: left=0, right=1 (b) → last_seen={a:0,b:1}, len=2  
    Step2: left=0, right=2 (a) → duplicate! left=1 → last_seen={a:2,b:1}, len=2  
    Step3: left=1, right=3 (c) → last_seen={a:2,b:1,c:3}, len=3  
    

Approach 3: Sliding Window (Fixed-Size Array)

  • Pseudocode:
    left = max_len = 0  
    last_index = [-1] * 256  # Initialize with -1  
    for right in range(len(s)):  
        ascii_val = ord(s[right])  
        if last_index[ascii_val] >= left:  # Duplicate in current window  
            left = last_index[ascii_val] + 1  
        last_index[ascii_val] = right  # Update last index  
        max_len = max(max_len, right - left + 1)  
    return max_len  
    
  • Complexity Proof:
    • Time: O(n) — single pass, constant-time array access.
    • Space: O(1) — fixed-size array (256 integers).
  • Visualization (s="aab"):
    Step0: right=0 (a) → last_index[97]=0, left=0, len=1  
    Step1: right=1 (a) → last_index[97]=0 ≥ left=0 → left=1, update last_index[97]=1, len=1  
    Step2: right=2 (b) → last_index[98]=-1 → len=2  
    

4. Code Deep Dive

Optimal Solution (Array-Based Sliding Window)

def lengthOfLongestSubstring(s: str) -> int:  
    n = len(s)  
    if n == 0:  # Edge: empty string  
        return 0  
      
    last_index = [-1] * 256  # Track last occurrence; -1 = not seen  
    left = max_len = 0  
      
    for right in range(n):  
        char_code = ord(s[right])  
        # If char seen in current window, jump left to last_index[char] + 1  
        if last_index[char_code] >= left:  
            left = last_index[char_code] + 1  
        last_index[char_code] = right  # Update last seen position  
        current_len = right - left + 1  
        if current_len > max_len:  
            max_len = current_len  
    return max_len  

Edge Case Handling

  • Example 1 (s="abcabcbb")
    • At right=3 (‘a’): last_index[97]=0 ≥ left=0left=1.
    • Window “bca” at right=3len=3.
    • Max remains 3 throughout.
  • Example 2 (s="bbbbb")
    • right=0: last_index[98]=-1left=0, len=1.
    • right=1: last_index[98]=0 ≥ left=0left=1, len=1.
    • Output=1.
  • Example 3 (s="pwwkew")
    • right=2 (‘w’): last_index[119]=1 ≥ left=0left=2.
    • At right=4: Window “wke” → len=3 (max).

5. Complexity War Room

Hardware-Aware Analysis

  • Time: O(n) → 5e⁴ operations ≈ 0.1ms (Python), negligible.
  • Space: 256 × 4 bytes (int array) = 1KB → Fits in L1 CPU cache.

Industry Comparison Table

Approach Time Space Readability Interview Viability
Brute Force O(n²) O(1) ★★★★★ ❌ (Fails n=5e4)
Sliding Window (Map) O(n) O( Σ )
Array Sliding O(n) O(1) ★★★★☆ Best

6. Pro Mode Extras

Variants

  1. At Most K Duplicates (LeetCode 340)

    • Track character frequencies. Shrink window when distinct chars > k.
    freq = defaultdict(int)  
    left = distinct = max_len = 0  
    for right, char in enumerate(s):  
        if freq[char] == 0: distinct += 1  
        freq[char] += 1  
        while distinct > k:  
            freq[s[left]] -= 1  
            if freq[s[left]] == 0: distinct -= 1  
            left += 1  
        max_len = max(max_len, right - left + 1)  
    
  2. Longest Substring with At Least K Repeating (LeetCode 395)

    • Divide and conquer on characters with frequency < k.
    if len(s) < k: return 0  
    for char in set(s):  
        if s.count(char) < k:  
            return max(self.longestSubstring(t, k) for t in s.split(char))  
    return len(s)  
    

Interview Cheat Sheet

  • First Mention: Always state time/space complexity upfront.
  • Sliding Window Template:
    1. Initialize left=0, tracking structure, max_len.  
    2. Iterate right over [0, n):  
        a. Update tracking with s[right].  
        b. While window invalid:  
            i. Remove s[left] from tracking.  
            ii. left++.  
        c. Update max_len.  
    
  • Test Rigorously: Empty, all unique, all same, “abba”, “tmmzuxt”.