Group Anagrams

Difficulty: Medium
Topics: Array, Hash Table, String, Sorting
Link: https://leetcode.com/problems/group-anagrams/

Problem Description

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

Example 1:

Input: strs = ["eat","tea","tan","ate","nat","bat"]

Output: [["bat"],["nat","tan"],["ate","eat","tea"]]

Explanation:

There is no string in strs that can be rearranged to form "bat".
The strings "nat" and "tan" are anagrams as they can be rearranged to form each other.
The strings "ate", "eat", and "tea" are anagrams as they can be rearranged to form each other.

Example 2:

Input: strs = [""]

Output: [[""]]

Example 3:

Input: strs = ["a"]

Output: [["a"]]

Constraints:

1 <= strs.length <= 10^4
0 <= strs[i].length <= 100
strs[i] consists of lowercase English letters.

Solution

1. Problem Deconstruction

Rewriting the Problem

Technical Version:
Given an array strs containing n strings, partition the array into disjoint groups where two strings belong to the same group if and only if they are anagrams (i.e., one can be rearranged into the other via character permutation). Return the groups as a list of string lists, order-agnostic.
Beginner Version:
You have a list of words. Group words that have the exact same letters (like “tea” and “eat”) together. Return a list of these groups. Words with unique letter combinations go into separate groups.
Mathematical Version:
Let $S = [s_1, s_2, \dots, s_n]$ be the input list of strings (duplicates allowed). Define an equivalence relation $\sim$ where $s_i \sim s_j$ iff $|s_i| = |s_j|$ and there is a permutation $\pi$ of the positions $\{1, \dots, |s_i|\}$ with $s_j[p] = s_i[\pi(p)]$ for every position $p$ . The solution is the quotient $S/{\sim}$ :
$\text{Output} = \{ [s_j]_{\sim} \mid s_j \in S \}, \quad [s_j]_{\sim} = \{ s_k \in S \mid s_k \sim s_j \}$
Equivalently, define a canonical representation $c(s_i) = (\text{count}(a), \text{count}(b), \dots, \text{count}(z)) \in \mathbb{Z}^{26}$ . Then:
$\text{Group}(s_i) = \{ s_k \in S \mid c(s_k) = c(s_i) \}$
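
As a minimal sketch of the canonical representation $c(s)$ above, the helpers below compute the 26-entry count vector and compare two strings through it. The names `canonical` and `are_anagrams` are hypothetical (not part of the problem statement), and the code assumes lowercase a-z input as the constraints promise.

```python
from typing import Tuple

def canonical(s: str) -> Tuple[int, ...]:
    """Return c(s): the counts of 'a'..'z' in s (assumes lowercase a-z only)."""
    counts = [0] * 26
    for ch in s:
        counts[ord(ch) - ord('a')] += 1
    return tuple(counts)

def are_anagrams(s: str, t: str) -> bool:
    """s ~ t exactly when their canonical vectors are equal."""
    return canonical(s) == canonical(t)

print(are_anagrams("tea", "eat"))  # True
print(are_anagrams("aab", "abb"))  # False
```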

Constraint Analysis

$1 \leq \texttt{strs.length} \leq 10^4$ :
- Limitation: Makes $O(n^2)$ pairwise comparison impractical. The worst case is about $5 \times 10^7$ pairs, each costing $O(k)$ to compare, well beyond the roughly $10^7$ simple operations per second of pure Python.
- Implication: Aim for $O(n \cdot k)$ or $O(n \cdot k \log k)$ solutions, where $k$ is the maximum string length.
$0 \leq \texttt{strs[i].length} \leq 100$ :
- Limitation: Keeps per-string work cheap, whether $O(k)$ counting or $O(k \log k)$ sorting. Worst-case total characters: $10^4 \times 100 = 10^6$ , manageable for linear methods.
- Edge Cases:
  - Empty string ("") must map to a group.
  - Length-1 strings (e.g., ["a"]) group only with identical copies of themselves.
Lowercase English letters:
- Limitation: Fixed 26-letter alphabet enables $O(1)$ space for frequency vectors (size 26).
- Edge Case: All strings identical (e.g., ["abc","abc"]) → single group.
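
To make the estimates above concrete, the following back-of-the-envelope sketch plugs the upper bounds into each cost model. The variable names and the choice to round $\log_2 k$ up are illustrative assumptions; the results are rough operation counts, not measured runtimes.

```python
import math

n, k = 10**4, 100  # upper bounds taken from the constraints

pairwise = n * (n - 1) // 2 * k               # every pair, O(k) per comparison
sorted_key = n * k * math.ceil(math.log2(k))  # sort each string once
count_key = n * (k + 26)                      # count letters, then build a 26-entry key

print(f"pairwise:   {pairwise:.1e}")
print(f"sorted key: {sorted_key:.1e}")
print(f"count key:  {count_key:.1e}")
```

Under these assumptions the pairwise model lands near $5 \times 10^9$ steps, while both key-based models stay below $10^7$ .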

2. Intuition Scaffolding

Analogies

Real-World Metaphor:
Library Catalog System: Books (strings) with identical ISBN (canonical form) go in the same section. Sorting letters = ISBN generation.
Gaming Analogy:
Pokémon Evolution Stones: Each anagram group is a Pokémon species (e.g., Pikachu family). Different “stones” (character rearrangements) evolve them to the same final form.
Math Analogy:
Vector Space Partition: Strings → 26D frequency vectors. Grouping = clustering identical vectors in $\mathbb{Z}^{26}$ .

Common Pitfalls

Brute-Force Pairwise Checks: $O(n^2 \cdot k)$ → fails at scale.
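Hashing Mutable Lists: Using a list as a dict key raises TypeError (unhashable type); convert it to a tuple first.
Set of Characters: Ignores frequency (e.g., set("aab") == set("abb"), yet "aab" and "abb" are not anagrams).
Length Mismatch: Assuming all strings share one length → incorrect grouping; strings of different lengths are never anagrams.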
Case Sensitivity: Overlooking lowercase constraint → wasted checks.
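
Two of the pitfalls above are easy to reproduce. The snippet below is a small illustrative sketch (the sample keys and words are made up for the demonstration) showing the unhashable-list error and how a character set loses multiplicity.

```python
# Pitfall: a list is mutable, so it cannot be used as a dict key.
groups = {}
try:
    groups[[1, 0, 0]] = ["a"]
except TypeError as exc:
    print("list key rejected:", exc)

groups[(1, 0, 0)] = ["a"]  # a tuple with the same contents is hashable

# Pitfall: a set of characters drops the counts.
print(set("aab") == set("abb"))        # True, although the words are not anagrams
print(sorted("aab") == sorted("abb"))  # False, sorting keeps multiplicity
```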

3. Approach Encyclopedia

Approach 1: Sorted String Key

What: Convert each string to its sorted version (canonical form).
Why: Anagrams have identical sorted forms; efficient grouping via hashing.

How:

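from collections import defaultdict

def group_by_sorted_key(strs):
    groups = defaultdict(list)  # Map: sorted string → list of anagrams
    for s in strs:
        sorted_s = ''.join(sorted(s))  # O(k log k)
        groups[sorted_s].append(s)
    return list(groups.values())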

Complexity Proof:
- Time: $O(n \cdot k \log k)$ (sorting $k$ -chars for $n$ strings).
- Space: $O(n \cdot k)$ (store all sorted strings).

Visualization:

Input: ["eat", "tea", "ate"]
Sorted: 
  "eat" → "aet"
  "tea" → "aet"
  "ate" → "aet"
Groups: {"aet": ["eat", "tea", "ate"]}
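
The same trace can be run on the full Example 1 input. This is a minimal sketch of the sorted-key idea that prints the key chosen for each word; the group order in the result follows dict insertion order, which the problem accepts since any order is allowed.

```python
from collections import defaultdict

strs = ["eat", "tea", "tan", "ate", "nat", "bat"]  # Example 1 input
groups = defaultdict(list)
for s in strs:
    key = "".join(sorted(s))  # canonical form: letters in sorted order
    groups[key].append(s)
    print(f"{s!r} -> key {key!r}")

result = list(groups.values())
print(result)  # [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```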

Approach 2: Frequency Vector Key (Optimal)

What: Represent each string by a 26-element frequency vector (count per char).
Why: Avoids $k \log k$ sorting; $O(k)$ per string.

How:

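from collections import defaultdict

def group_by_frequency_key(strs):
    groups = defaultdict(list)  # Map: frequency tuple → list of anagrams
    for s in strs:
        count = [0] * 26
        for char in s:  # O(k)
            count[ord(char) - ord('a')] += 1
        groups[tuple(count)].append(s)  # Tuple is hashable
    return list(groups.values())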

Complexity Proof:
- Time: $O(n \cdot k)$ ( $n$ strings × $k$ chars each, plus $O(26)$ per string to build the key, which is constant for a fixed alphabet).
- Space: $O(n \cdot 26)$ (keys) + $O(n \cdot k)$ (values) = $O(n \cdot k)$ .

Visualization:

"abc" → [1,1,1,0,0,...] → (1,1,1,0,0,...)
"bac" → [1,1,1,0,0,...] → same tuple → same group.

4. Code Deep Dive

Optimal Solution (Frequency Vector)

from collections import defaultdict
from typing import List

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        groups = defaultdict(list)  # Map: frequency tuple → list of anagrams
        for s in strs:  # O(n)
            count = [0] * 26  # Initialize 26-dim frequency vector
            for char in s:  # O(k)
                # Map 'a'→0, 'b'→1, ..., 'z'→25
                idx = ord(char) - ord('a')
                count[idx] += 1
            # Convert list to tuple for immutability/hashing
            key = tuple(count)
            groups[key].append(s)
        return list(groups.values())  # Extract grouped lists

Edge Case Handling

strs = [""] (Example 2):
- s = "" → loop skipped → count = [0]*26 → key = (0,0,...,0) → group [""].
strs = ["a"] (Example 3):
- s = "a" → count[0] = 1 → key = (1,0,0,...,0) → group ["a"].
Large n (Constraint):
- $O(n \cdot k)$ time handles $n=10^4$ , $k=100$ efficiently ( $10^6$ ops).
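
The edge cases above can be checked mechanically. The sketch below uses a standalone copy of the frequency-vector solution plus a hypothetical `normalize` helper (not part of the original solution) that sorts within and across groups, so the assertions do not depend on output order.

```python
from collections import defaultdict
from typing import List

def group_anagrams(strs: List[str]) -> List[List[str]]:
    """Standalone copy of the frequency-vector solution."""
    groups = defaultdict(list)
    for s in strs:
        count = [0] * 26
        for ch in s:
            count[ord(ch) - ord('a')] += 1
        groups[tuple(count)].append(s)
    return list(groups.values())

def normalize(groups: List[List[str]]) -> List[List[str]]:
    """Hypothetical helper: sort within and across groups so order is ignored."""
    return sorted(sorted(g) for g in groups)

assert normalize(group_anagrams([""])) == [[""]]
assert normalize(group_anagrams(["a"])) == [["a"]]
assert normalize(group_anagrams(["abc", "abc"])) == [["abc", "abc"]]
assert normalize(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"])) == [
    ["ate", "eat", "tea"], ["bat"], ["nat", "tan"]]
print("all edge cases pass")
```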

5. Complexity War Room

Hardware-Aware Analysis

Memory:
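- Keys: $n$ tuples × 26 integers × 4 bytes = $10^4 \times 26 \times 4 \approx 1$ MB as a packed estimate. CPython tuples store 8-byte references plus a header, so expect roughly 2.5 MB (still within a typical CPU L3 cache).
- Values: $O(n \cdot k)$ worst-case (all strings at maximum length) = $10^4 \times 100$ chars × 1 byte ≈ 1 MB of character data, plus per-object overhead in Python.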
Throughput:
- At $10^6$ ops (Python speed ~ $10^7$ ops/sec), runtime ≈ 0.1 sec.
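
These memory figures can be sanity-checked on a given interpreter. The sketch below measures one key tuple and one maximum-length string with `sys.getsizeof`; the exact numbers depend on the Python version and platform, so treat them as an estimate rather than a fixed cost.

```python
import sys

key = tuple([0] * 26)  # one frequency-vector key
word = "a" * 100       # one maximum-length string
n = 10**4

key_bytes = sys.getsizeof(key)    # tuple header + 26 references (not the int objects)
word_bytes = sys.getsizeof(word)  # string header + character data

print(f"one key:  {key_bytes} bytes, {n} keys:  {n * key_bytes / 1e6:.2f} MB")
print(f"one word: {word_bytes} bytes, {n} words: {n * word_bytes / 1e6:.2f} MB")
```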

Industry Comparison Table

Approach	Time	Space	Readability	Interview Viability
Brute Force	$O(n^2 \cdot k)$	$O(n \cdot k)$ (output)	9/10	❌ (Fails $n=10^4$ )
Sorted String Key	$O(n \cdot k \log k)$	$O(n \cdot k)$	10/10	✅ (k small)
Frequency Vector	$O(n \cdot k)$	$O(n \cdot k)$	9/10	✅ Optimal
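
For reference, one possible form of the brute-force row is sketched below. It is an assumed baseline (the function name is hypothetical) that compares each string against one representative per existing group, which is still quadratic when every string forms its own group.

```python
from collections import Counter
from typing import List

def group_anagrams_bruteforce(strs: List[str]) -> List[List[str]]:
    """Baseline sketch: compare each string with one representative per group."""
    groups: List[List[str]] = []
    for s in strs:
        for g in groups:
            if Counter(g[0]) == Counter(s):  # O(k) anagram check
                g.append(s)
                break
        else:  # no existing group matched, so start a new one
            groups.append([s])
    return groups

print(group_anagrams_bruteforce(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```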

6. Pro Mode Extras

Variants

Unicode Support (LC Extended):
- Use frozenset(Counter(s).items()) as key. Handles arbitrary characters.
```
from collections import Counter
key = frozenset(Counter(s).items())  # O(k)
```
Group Shifted Strings (LC 249):
- Key: tuple((ord(c) - ord(s[0])) % 26 for c in s) for cyclic shifts.
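
Both variant keys drop into the same grouping loop. The functions below are a minimal sketch with hypothetical names; the Unicode version assumes each character is a single code point (no normalization is applied), and the shifted version assumes lowercase a-z input.

```python
from collections import Counter, defaultdict
from typing import List

def group_anagrams_unicode(strs: List[str]) -> List[List[str]]:
    """Anagram grouping for arbitrary characters via a frozenset of (char, count)."""
    groups = defaultdict(list)
    for s in strs:
        groups[frozenset(Counter(s).items())].append(s)
    return list(groups.values())

def group_shifted(strs: List[str]) -> List[List[str]]:
    """Group lowercase strings that differ by one uniform cyclic shift."""
    groups = defaultdict(list)
    for s in strs:
        # Offsets relative to the first character; an empty string yields ().
        key = tuple((ord(c) - ord(s[0])) % 26 for c in s)
        groups[key].append(s)
    return list(groups.values())

print(group_anagrams_unicode(["αβγ", "γβα", "αβ"]))      # [['αβγ', 'γβα'], ['αβ']]
print(group_shifted(["abc", "bcd", "xyz", "az", "ba"]))  # [['abc', 'bcd', 'xyz'], ['az', 'ba']]
```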

Interview Cheat Sheet

First Mention: Always state time/space complexity upfront.
Clarify: “Are strings Unicode or [a-z]?” → dictates key strategy.
Optimize: Counting is asymptotically better ( $O(k)$ vs. $O(k \log k)$ ), so it wins when $k$ is large. For small $k$ (here $k \leq 100$ ), the built-in sort may be faster in practice.
Verify: Test with [""], ["a"], and ["abc","cba"] before coding.
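
The sorting-versus-counting trade-off is best settled by measurement. The micro-benchmark below is a rough sketch with made-up workloads and hypothetical function names; timings vary by machine and interpreter. In CPython the sort runs in C while the counting loop runs as bytecode, so the sorted key may well win even at $k = 100$ .

```python
import timeit
from collections import defaultdict
from typing import Callable, List, Tuple

def sorted_key(s: str) -> str:
    return "".join(sorted(s))

def count_key(s: str) -> Tuple[int, ...]:
    count = [0] * 26
    for ch in s:
        count[ord(ch) - ord('a')] += 1
    return tuple(count)

def group_by(strs: List[str], key: Callable) -> List[List[str]]:
    groups = defaultdict(list)
    for s in strs:
        groups[key(s)].append(s)
    return list(groups.values())

# Hypothetical workloads: many short words vs. fewer long words.
short_words = ["tea", "eat", "tan", "nat"] * 2500
long_words = ["abcdefghij" * 10, "jihgfedcba" * 10] * 500

for label, words in [("k=3", short_words), ("k=100", long_words)]:
    for key in (sorted_key, count_key):
        secs = timeit.timeit(lambda: group_by(words, key), number=5)
        print(f"{label:6} {key.__name__:10} {secs:.3f}s")
```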

#49 - Group Anagrams