← ~/lc-grind/posts

#242 - Valid Anagram

July 4, 2025

Valid Anagram

Problem Description

Given two strings s and t, return true if t is an anagram of s, and false otherwise.

Example 1:

Input: s = “anagram”, t = “nagaram”

Output: true

Example 2:

Input: s = “rat”, t = “car”

Output: false

Constraints:

  • 1 <= s.length, t.length <= 5 * 104
  • s and t consist of lowercase English letters.

Follow up: What if the inputs contain Unicode characters? How would you adapt your solution to such a case?

Solution

1. Problem Deconstruction

Rewrite the Problem

  1. Technical:
    Given two strings s and t, determine if t is a permutation of s under these conditions:

    • s=t=n|s| = |t| = n (string lengths are equal)
    • For every character cΣc \in \Sigma (alphabet), the frequency freqs(c)=freqt(c)\text{freq}_s(c) = \text{freq}_t(c)
    • Return true if conditions hold; false otherwise.
  2. Beginner:
    “Check if two words have exactly the same letters in the same quantities, ignoring order. For example, ‘listen’ and ‘silent’ are anagrams because both use 1 L, 1 I, 1 S, 1 E, 1 N, 1 T.”

  3. Mathematical:
    Let s=[s1,s2,,sn]s = [s_1, s_2, \ldots, s_n] and t=[t1,t2,,tn]t = [t_1, t_2, \ldots, t_n] be sequences of characters.
    Define δ(c,s)={i:si=c}\delta(c, s) = |\{i : s_i = c\}| (frequency of cc in ss).
    Then, tt is an anagram of ss iff:

    cΣ,δ(c,s)=δ(c,t)\forall c \in \Sigma, \delta(c, s) = \delta(c, t)

Constraint Analysis

  • 1n5×1041 \leq n \leq 5 \times 10^4:
    • Limits solutions to O(nlogn)O(n \log n) or better (brute force O(n2)O(n^2) fails at n=5e4n=5e425e925e9 operations).
    • Implies edge case: Early termination if st|s| \neq |t|.
  • Lowercase English letters:
    • Alphabet size Σ=26|\Sigma| = 26 → constant-space frequency counters possible.
    • Hidden edge: Non-alphabet characters excluded (simplifies mapping).
  • Follow-up (Unicode):
    • Σ|\Sigma| unbounded → requires dynamic frequency storage (hash tables).
    • Edge: Characters outside BMP (4-byte Unicode) must be handled.

2. Intuition Scaffolding

Analogies

  1. Real-World Metaphor:
    “Like verifying two identical sets of Scrabble tiles. Each tile must appear in both sets with identical counts, regardless of arrangement.”
  2. Gaming Analogy:
    “In RPG inventory management, check if two loot bags contain the same items in identical quantities, ignoring order.”
  3. Math Analogy:
    “Multiset equality: MS(s)=MS(t)\text{MS}(s) = \text{MS}(t), where MS\text{MS} denotes the multiset of characters.”

Common Pitfalls

  1. Length Mismatch Oversight:
    Not checking s=t|s| = |t| first → wasted computation.
  2. Case Sensitivity:
    Forgetting to normalize case (though constraints specify lowercase).
  3. Frequency Underflow:
    Decrementing counts below zero without validation → false negatives.
  4. Non-Existent Key Errors:
    Accessing uninitialized hash table keys for Unicode characters.
  5. Suboptimal Sorting:
    Using O(nlogn)O(n \log n) sort when O(n)O(n) counting suffices for fixed alphabets.

3. Approach Encyclopedia

Approach 1: Brute Force (Naive Check)

  • Pseudocode:
    function isAnagram(s, t):
        if len(s) != len(t): 
            return False
        for char in s:
            if count(char, s) != count(char, t):
                return False
        return True
    
  • Complexity Proof:
    • Time: O(n2)O(n^2) → For each of nn chars in ss, scan ss and tt (O(n)O(n) per char).
    • Space: O(1)O(1) (no auxiliary storage).
  • Visualization:
    s: "abc"    | t: "cba"
    Step 1: Count 'a' in s → 1, in t → 1 → OK
    Step 2: Count 'b' in s → 1, in t → 1 → OK
    Step 3: Count 'c' in s → 1, in t → 1 → OK → True
    

Approach 2: Sorting

  • Pseudocode:
    function isAnagram(s, t):
        return sorted(s) == sorted(t)  // After length check
    
  • Complexity Proof:
    • Time: O(nlogn)O(n \log n) → Dominated by sorting two strings.
    • Space: O(n)O(n) → Storage for sorted strings (non-in-place sort).
  • Visualization:
    s: "nagaram" → sorted: ['a','a','a','g','m','n','r']
    t: "anagram" → sorted: ['a','a','a','g','m','n','r'] → True
    

Approach 3: Fixed-Size Frequency Array (Optimal for English)

  • Pseudocode:
    function isAnagram(s, t):
        if len(s) != len(t): 
            return False
        count = [0] * 26
        for char in s:
            count[ord(char) - ord('a')] += 1
        for char in t:
            index = ord(char) - ord('a')
            count[index] -= 1
            if count[index] < 0:  // Early exit
                return False
        return True
    
  • Complexity Proof:
    • Time: O(n)O(n) → Two linear passes over ss and tt.
    • Space: O(1)O(1)26×sizeof(int)26 \times \text{sizeof(int)} (104 bytes for 26 ints).
  • Visualization:
    s: "rat" → count: [ a:1, r:1, t:1 ]
    t: "car" → 
        'c' → count[2] = 0-1 = -1 → return False
    

Approach 4: Hash Table (Optimal for Unicode)

  • Pseudocode:
    function isAnagram(s, t):
        if len(s) != len(t): 
            return False
        freq = {}
        for char in s:
            freq[char] = freq.get(char, 0) + 1
        for char in t:
            if char not in freq:  // Key exists?
                return False
            freq[char] -= 1
            if freq[char] == 0:
                del freq[char]  // Optional cleanup
        return len(freq) == 0  // All keys exhausted?
    
  • Complexity Proof:
    • Time: O(n)O(n) → Each pass is O(1)O(1) per char (hash table operations).
    • Space: O(k)O(k)kk = unique chars in ss (worst-case O(n)O(n) for Unicode).
  • Visualization:
    s: "😊a" → freq: { '😊':1, 'a':1 }
    t: "a😊" → 
        'a': freq{'😊':1, 'a':0} → 
        '😊': freq{} → len=0 → True
    

4. Code Deep Dive

Optimal Solution (Fixed Alphabet)

def isAnagram(s: str, t: str) -> bool:
    # Edge: Length mismatch (immediate failure)
    if len(s) != len(t):  # O(1) time
        return False
    
    # Initialize frequency array for 'a'-'z'
    count = [0] * 26  # O(26) space
    
    # Build frequency map for s
    for char in s:  # O(n)
        idx = ord(char) - ord('a')  # Map char to 0-25
        count[idx] += 1  # Increment count
    
    # Validate against t
    for char in t:  # O(n)
        idx = ord(char) - ord('a')
        count[idx] -= 1  # Decrement count
        # Underflow check: t has more 'char' than s
        if count[idx] < 0:  # Early exit
            return False
    
    # All counts balanced (no overflow possible due to length check)
    return True

Edge Case Handling

  • Example 1 (s=“anagram”, t=“nagaram”):
    • Lengths equal (7).
    • After processing s: a:3, n:1, g:1, r:1, m:1.
    • Processing t: Decrements preserve counts → no negatives → returns True.
  • Example 2 (s=“rat”, t=“car”):
    • Lengths equal (3).
    • After s: r:1, a:1, t:1.
    • First char in t: 'c'count[2] = 0-1 = -1 → returns False (line 15).

5. Complexity War Room

Hardware-Aware Analysis

  • Fixed-Array (English):
    • n=5e4n=5e4 → 100,000 iterations (2 passes).
    • Memory: 26 ints = 208 bytes (Python int=8 bytes) → fits in L1 cache.
  • Hash Table (Unicode):
    • Worst-case: n=5e4n=5e4 unique Unicode chars → 50,000 key-value pairs.
    • Memory: 50,000×(24 bytes/key+8 bytes/value)=1.6 MB\approx 50,000 \times (24 \text{ bytes/key} + 8 \text{ bytes/value}) = 1.6 \text{ MB} → fits in L3 cache.

Industry Comparison Table

Approach Time Space Readability Interview Viability
Brute Force O(n2)O(n^2) O(1)O(1) 10/10 ❌ Fails n=5e4n=5e4
Sorting O(nlogn)O(n \log n) O(n)O(n) 9/10 ⚠️ Suboptimal
Fixed Array O(n)O(n) O(1)O(1) 8/10 ✅ Optimal
Hash Table O(n)O(n) O(k)O(k) 7/10 ✅ Unicode Ready

6. Pro Mode Extras

Variants

  1. Group Anagrams (LC 49):

    • Problem: Group words by anagram families.
    • Solution: Use sorted word as hash key → groups = { sorted(word): [words] }.
    • Code Snippet:
      groups = {}
      for word in words:
          key = ''.join(sorted(word))
          groups.setdefault(key, []).append(word)
      
  2. Find All Anagrams (LC 438):

    • Problem: Find starting indices of all anagrams of p in s.
    • Solution: Sliding window with frequency array.
      • Initialize frequency array for p.
      • Use two pointers to maintain window window=p\|window\| = \|p\| in s.
      • Update frequency counts incrementally when sliding.

Interview Cheat Sheet

  • First Response:
    “Check lengths → unequal means immediate failure.”
  • Complexity Trade-offs:
    “For English, fixed array is optimal (O(1)O(1) space). For Unicode, hash tables scale.”
  • Edge Cases:
    1. Length mismatch
    2. Empty strings (n=0n=0 → trivially anagrams)
    3. Identical strings (anagrams)
    4. Case sensitivity (if constraints didn’t specify lowercase)
  • Optimization Insight:
    “Early termination on frequency underflow saves unnecessary iterations.”