Valid Anagram

Difficulty: Easy
Topics: Hash Table, String, Sorting
Link: https://leetcode.com/problems/valid-anagram/

Problem Description

Given two strings s and t, return true if t is an anagram of s, and false otherwise.

Example 1:

Input: s = “anagram”, t = “nagaram”

Output: true

Example 2:

Input: s = “rat”, t = “car”

Output: false

Constraints:

1 <= s.length, t.length <= 5 * 104
s and t consist of lowercase English letters.

Follow up: What if the inputs contain Unicode characters? How would you adapt your solution to such a case?

Solution

1. Problem Deconstruction

Rewrite the Problem

Technical:
Given two strings s and t, determine if t is a permutation of s under these conditions:
- $|s| = |t| = n$ (string lengths are equal)
- For every character $c \in \Sigma$ (alphabet), the frequency $\text{freq}_s(c) = \text{freq}_t(c)$
- Return true if conditions hold; false otherwise.
Beginner:
“Check if two words have exactly the same letters in the same quantities, ignoring order. For example, ‘listen’ and ‘silent’ are anagrams because both use 1 L, 1 I, 1 S, 1 E, 1 N, 1 T.”
Mathematical:
Let $s = [s_1, s_2, \ldots, s_n]$ and $t = [t_1, t_2, \ldots, t_n]$ be sequences of characters.
Define $\delta(c, s) = |\{i : s_i = c\}|$ (frequency of $c$ in $s$ ).
Then, $t$ is an anagram of $s$ iff:
$\forall c \in \Sigma, \delta(c, s) = \delta(c, t)$

Constraint Analysis

$1 \leq n \leq 5 \times 10^4$ :
- Limits solutions to $O(n \log n)$ or better (brute force $O(n^2)$ fails at $n=5e4$ → $25e9$ operations).
- Implies edge case: Early termination if $|s| \neq |t|$ .
Lowercase English letters:
- Alphabet size $|\Sigma| = 26$ → constant-space frequency counters possible.
- Hidden edge: Non-alphabet characters excluded (simplifies mapping).
Follow-up (Unicode):
- $|\Sigma|$ unbounded → requires dynamic frequency storage (hash tables).
- Edge: Characters outside BMP (4-byte Unicode) must be handled.

2. Intuition Scaffolding

Analogies

Real-World Metaphor:
“Like verifying two identical sets of Scrabble tiles. Each tile must appear in both sets with identical counts, regardless of arrangement.”
Gaming Analogy:
“In RPG inventory management, check if two loot bags contain the same items in identical quantities, ignoring order.”
Math Analogy:
“Multiset equality: $\text{MS}(s) = \text{MS}(t)$ , where $\text{MS}$ denotes the multiset of characters.”

Common Pitfalls

Length Mismatch Oversight:
Not checking $|s| = |t|$ first → wasted computation.
Case Sensitivity:
Forgetting to normalize case (though constraints specify lowercase).
Frequency Underflow:
Decrementing counts below zero without validation → false negatives.
Non-Existent Key Errors:
Accessing uninitialized hash table keys for Unicode characters.
Suboptimal Sorting:
Using $O(n \log n)$ sort when $O(n)$ counting suffices for fixed alphabets.

3. Approach Encyclopedia

Approach 1: Brute Force (Naive Check)

Pseudocode:

function isAnagram(s, t):
    if len(s) != len(t): 
        return False
    for char in s:
        if count(char, s) != count(char, t):
            return False
    return True

Complexity Proof:
- Time: $O(n^2)$ → For each of $n$ chars in $s$ , scan $s$ and $t$ ( $O(n)$ per char).
- Space: $O(1)$ (no auxiliary storage).

Visualization:

s: "abc"    | t: "cba"
Step 1: Count 'a' in s → 1, in t → 1 → OK
Step 2: Count 'b' in s → 1, in t → 1 → OK
Step 3: Count 'c' in s → 1, in t → 1 → OK → True

Approach 2: Sorting

Pseudocode:

function isAnagram(s, t):
    return sorted(s) == sorted(t)  // After length check

Complexity Proof:
- Time: $O(n \log n)$ → Dominated by sorting two strings.
- Space: $O(n)$ → Storage for sorted strings (non-in-place sort).

Visualization:

s: "nagaram" → sorted: ['a','a','a','g','m','n','r']
t: "anagram" → sorted: ['a','a','a','g','m','n','r'] → True

Approach 3: Fixed-Size Frequency Array (Optimal for English)

Pseudocode:

function isAnagram(s, t):
    if len(s) != len(t): 
        return False
    count = [0] * 26
    for char in s:
        count[ord(char) - ord('a')] += 1
    for char in t:
        index = ord(char) - ord('a')
        count[index] -= 1
        if count[index] < 0:  // Early exit
            return False
    return True

Complexity Proof:
- Time: $O(n)$ → Two linear passes over $s$ and $t$ .
- Space: $O(1)$ → $26 \times \text{sizeof(int)}$ (104 bytes for 26 ints).

Visualization:

s: "rat" → count: [ a:1, r:1, t:1 ]
t: "car" → 
    'c' → count[2] = 0-1 = -1 → return False

Approach 4: Hash Table (Optimal for Unicode)

Pseudocode:

function isAnagram(s, t):
    if len(s) != len(t): 
        return False
    freq = {}
    for char in s:
        freq[char] = freq.get(char, 0) + 1
    for char in t:
        if char not in freq:  // Key exists?
            return False
        freq[char] -= 1
        if freq[char] == 0:
            del freq[char]  // Optional cleanup
    return len(freq) == 0  // All keys exhausted?

Complexity Proof:
- Time: $O(n)$ → Each pass is $O(1)$ per char (hash table operations).
- Space: $O(k)$ → $k$ = unique chars in $s$ (worst-case $O(n)$ for Unicode).

Visualization:

s: "😊a" → freq: { '😊':1, 'a':1 }
t: "a😊" → 
    'a': freq{'😊':1, 'a':0} → 
    '😊': freq{} → len=0 → True

4. Code Deep Dive

Optimal Solution (Fixed Alphabet)

def isAnagram(s: str, t: str) -> bool:
    # Edge: Length mismatch (immediate failure)
    if len(s) != len(t):  # O(1) time
        return False
    
    # Initialize frequency array for 'a'-'z'
    count = [0] * 26  # O(26) space
    
    # Build frequency map for s
    for char in s:  # O(n)
        idx = ord(char) - ord('a')  # Map char to 0-25
        count[idx] += 1  # Increment count
    
    # Validate against t
    for char in t:  # O(n)
        idx = ord(char) - ord('a')
        count[idx] -= 1  # Decrement count
        # Underflow check: t has more 'char' than s
        if count[idx] < 0:  # Early exit
            return False
    
    # All counts balanced (no overflow possible due to length check)
    return True

Edge Case Handling

Example 1 (s=“anagram”, t=“nagaram”):
- Lengths equal (7).
- After processing s: a:3, n:1, g:1, r:1, m:1.
- Processing t: Decrements preserve counts → no negatives → returns True.
Example 2 (s=“rat”, t=“car”):
- Lengths equal (3).
- After s: r:1, a:1, t:1.
- First char in t: 'c' → count[2] = 0-1 = -1 → returns False (line 15).

5. Complexity War Room

Hardware-Aware Analysis

Fixed-Array (English):
- $n=5e4$ → 100,000 iterations (2 passes).
- Memory: 26 ints = 208 bytes (Python int=8 bytes) → fits in L1 cache.
Hash Table (Unicode):
- Worst-case: $n=5e4$ unique Unicode chars → 50,000 key-value pairs.
- Memory: $\approx 50,000 \times (24 \text{ bytes/key} + 8 \text{ bytes/value}) = 1.6 \text{ MB}$ → fits in L3 cache.

Industry Comparison Table

Approach	Time	Space	Readability	Interview Viability
Brute Force	$O(n^2)$	$O(1)$	10/10	❌ Fails $n=5e4$
Sorting	$O(n \log n)$	$O(n)$	9/10	⚠️ Suboptimal
Fixed Array	$O(n)$	$O(1)$	8/10	✅ Optimal
Hash Table	$O(n)$	$O(k)$	7/10	✅ Unicode Ready

6. Pro Mode Extras

Variants

Group Anagrams (LC 49):
- Problem: Group words by anagram families.
- Solution: Use sorted word as hash key → groups = { sorted(word): [words] }.
- Code Snippet:
```
groups = {}
for word in words:
    key = ''.join(sorted(word))
    groups.setdefault(key, []).append(word)
```
Find All Anagrams (LC 438):
- Problem: Find starting indices of all anagrams of p in s.
- Solution: Sliding window with frequency array.
  - Initialize frequency array for p.
  - Use two pointers to maintain window $\|window\| = \|p\|$ in s.
  - Update frequency counts incrementally when sliding.

Interview Cheat Sheet

First Response:
“Check lengths → unequal means immediate failure.”
Complexity Trade-offs:
“For English, fixed array is optimal ( $O(1)$ space). For Unicode, hash tables scale.”
Edge Cases:
1. Length mismatch
2. Empty strings ( $n=0$ → trivially anagrams)
3. Identical strings (anagrams)
4. Case sensitivity (if constraints didn’t specify lowercase)
Optimization Insight:
“Early termination on frequency underflow saves unnecessary iterations.”

#242 - Valid Anagram

Valid Anagram

Problem Description

Solution

1. Problem Deconstruction

Rewrite the Problem

Constraint Analysis

2. Intuition Scaffolding

Analogies

Common Pitfalls

3. Approach Encyclopedia

Approach 1: Brute Force (Naive Check)

Approach 2: Sorting

Approach 3: Fixed-Size Frequency Array (Optimal for English)

Approach 4: Hash Table (Optimal for Unicode)

4. Code Deep Dive

Optimal Solution (Fixed Alphabet)

Edge Case Handling

5. Complexity War Room

Hardware-Aware Analysis

Industry Comparison Table

6. Pro Mode Extras

Variants

Interview Cheat Sheet