#49 - Group Anagrams
Group Anagrams
- Difficulty: Medium
- Topics: Array, Hash Table, String, Sorting
- Link: https://leetcode.com/problems/group-anagrams/
Problem Description
Given an array of strings strs
, group the anagrams together. You can return the answer in any order.
Example 1:
Input: strs = [“eat”,“tea”,“tan”,“ate”,“nat”,“bat”]
Output: [[“bat”],[“nat”,“tan”],[“ate”,“eat”,“tea”]]
Explanation:
- There is no string in strs that can be rearranged to form
"bat"
. - The strings
"nat"
and"tan"
are anagrams as they can be rearranged to form each other. - The strings
"ate"
,"eat"
, and"tea"
are anagrams as they can be rearranged to form each other.
Example 2:
Input: strs = [“”]
Output: [[“”]]
Example 3:
Input: strs = [“a”]
Output: [[“a”]]
Constraints:
1 <= strs.length <= 104
0 <= strs[i].length <= 100
strs[i]
consists of lowercase English letters.
Solution
1. Problem Deconstruction
Rewriting the Problem
-
Technical Version:
Given an arraystrs
containingn
strings, partition the array into disjoint groups where two strings belong to the same group if and only if they are anagrams (i.e., one can be rearranged into the other via character permutation). Return the groups as a list of string lists, order-agnostic. -
Beginner Version:
You have a list of words. Group words that have the exact same letters (like “tea” and “eat”) together. Return a list of these groups. Words with unique letter combinations go into separate groups. -
Mathematical Version:
Let be a set of strings. Define an equivalence relation where iff a bijection preserving character counts. The solution is the quotient set :Equivalently, define a canonical representation . Then:
Constraint Analysis
-
:
- Limitation: Rules out solutions (e.g., pairwise comparisons). Worst-case operations exceeds Python’s practical limits (~/sec).
- Edge Case: Large input requires or solutions where is string length.
-
:
- Limitation: Allows per-string operations (e.g., sorting, counting). Worst-case total characters: , manageable for linear methods.
- Edge Cases:
- Empty string (
""
) must map to a group. - Length-1 strings (e.g.,
["a"]
) form singleton groups.
- Empty string (
-
Lowercase English letters:
- Limitation: Fixed 26-letter alphabet enables space for frequency vectors (size 26).
- Edge Case: All strings identical (e.g.,
["abc","abc"]
) → single group.
2. Intuition Scaffolding
Analogies
-
Real-World Metaphor:
Library Catalog System: Books (strings) with identical ISBN (canonical form) go in the same section. Sorting letters = ISBN generation. -
Gaming Analogy:
Pokémon Evolution Stones: Each anagram group is a Pokémon species (e.g., Pikachu family). Different “stones” (character rearrangements) evolve them to the same final form. -
Math Analogy:
Vector Space Partition: Strings → 26D frequency vectors. Grouping = clustering identical vectors in .
Common Pitfalls
- Brute-Force Pairwise Checks: → fails at scale.
- Hashing Mutable Lists: Using unhashable list as dict key → runtime error.
- Set of Characters: Ignores frequency (e.g.,
"aab"
≠"abb"
). - Length Mismatch: Assuming equal-length strings → incorrect grouping.
- Case Sensitivity: Overlooking lowercase constraint → wasted checks.
3. Approach Encyclopedia
Approach 1: Sorted String Key
- What: Convert each string to its sorted version (canonical form).
- Why: Anagrams have identical sorted forms; efficient grouping via hashing.
- How:
groups = defaultdict(list) for s in strs: sorted_s = ''.join(sorted(s)) # O(k log k) groups[sorted_s].append(s) return list(groups.values())
- Complexity Proof:
- Time: (sorting -chars for strings).
- Space: (store all sorted strings).
- Visualization:
Input: ["eat", "tea", "ate"] Sorted: "eat" → "aet" "tea" → "aet" "ate" → "aet" Groups: {"aet": ["eat", "tea", "ate"]}
Approach 2: Frequency Vector Key (Optimal)
- What: Represent each string by a 26-element frequency vector (count per char).
- Why: Avoids sorting; per string.
- How:
groups = defaultdict(list) for s in strs: count = [0] * 26 for char in s: # O(k) count[ord(char) - ord('a')] += 1 groups[tuple(count)].append(s) # Tuple is hashable return list(groups.values())
- Complexity Proof:
- Time: ( strings × chars each).
- Space: (keys) + (values) = .
- Visualization:
"abc" → [1,1,1,0,0,...] → tuple([1,1,1,0,...]) "bac" → [1,1,1,0,0,...] → same tuple → same group.
4. Code Deep Dive
Optimal Solution (Frequency Vector)
from collections import defaultdict
class Solution:
def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
groups = defaultdict(list) # Map: frequency tuple → list of anagrams
for s in strs: # O(n)
count = [0] * 26 # Initialize 26-dim frequency vector
for char in s: # O(k)
# Map 'a'→0, 'b'→1, ..., 'z'→25
idx = ord(char) - ord('a')
count[idx] += 1
# Convert list to tuple for immutability/hashing
key = tuple(count)
groups[key].append(s)
return list(groups.values()) # Extract grouped lists
Edge Case Handling
strs = [""]
(Example 2):s = ""
→ loop skipped →count = [0]*26
→key = (0,0,...,0)
→ group[""]
.
strs = ["a"]
(Example 3):s = "a"
→count[0] = 1
→key = (1,0,0,...,0)
→ group["a"]
.
- Large
n
(Constraint):- time handles , efficiently ( ops).
5. Complexity War Room
Hardware-Aware Analysis
- Memory:
- Keys: tuples × 26 integers × 4 bytes = MB (fits in CPU L3 cache).
- Values: worst-case (all strings unique) = chars × 1 byte ≈ 1 MB.
- Throughput:
- At ops (Python speed ~ ops/sec), runtime ≈ 0.1 sec.
Industry Comparison Table
Approach | Time | Space | Readability | Interview Viability |
---|---|---|---|---|
Brute Force | 9/10 | ❌ (Fails ) | ||
Sorted String Key | 10/10 | ✅ (k small) | ||
Frequency Vector | 9/10 | ✅ Optimal |
6. Pro Mode Extras
Variants
- Unicode Support (LC Extended):
- Use
frozenset(Counter(s).items())
as key. Handles arbitrary characters.
key = frozenset(Counter(s).items()) # O(k)
- Use
- Group Shifted Strings (LC 249):
- Key:
tuple((ord(c) - ord(s[0])) % 26 for c in s)
for cyclic shifts.
- Key:
Interview Cheat Sheet
- First Mention: Always state time/space complexity upfront.
- Clarify: “Are strings Unicode or [a-z]?” → dictates key strategy.
- Optimize: If is large, frequency vector > sorting. If tiny, sorting may be faster.
- Verify: Test with
[""]
,["a"]
, and["abc","cba"]
before coding.