← ~/lc-grind/posts

#49 - Group Anagrams

July 7, 2025

Group Anagrams

Problem Description

Given an array of strings strs, group the anagrams together. You can return the answer in any order.

Example 1:

Input: strs = [“eat”,“tea”,“tan”,“ate”,“nat”,“bat”]

Output: [[“bat”],[“nat”,“tan”],[“ate”,“eat”,“tea”]]

Explanation:

  • There is no string in strs that can be rearranged to form "bat".
  • The strings "nat" and "tan" are anagrams as they can be rearranged to form each other.
  • The strings "ate", "eat", and "tea" are anagrams as they can be rearranged to form each other.

Example 2:

Input: strs = [“”]

Output: [[“”]]

Example 3:

Input: strs = [“a”]

Output: [[“a”]]

Constraints:

  • 1 <= strs.length <= 104
  • 0 <= strs[i].length <= 100
  • strs[i] consists of lowercase English letters.

Solution

1. Problem Deconstruction

Rewriting the Problem

  1. Technical Version:
    Given an array strs containing n strings, partition the array into disjoint groups where two strings belong to the same group if and only if they are anagrams (i.e., one can be rearranged into the other via character permutation). Return the groups as a list of string lists, order-agnostic.

  2. Beginner Version:
    You have a list of words. Group words that have the exact same letters (like “tea” and “eat”) together. Return a list of these groups. Words with unique letter combinations go into separate groups.

  3. Mathematical Version:
    Let S={s1,s2,,sn}S = \{s_1, s_2, \dots, s_n\} be a set of strings. Define an equivalence relation \sim where sisjs_i \sim s_j iff \exists a bijection f:chars(si)chars(sj)f : \text{chars}(s_i) \to \text{chars}(s_j) preserving character counts. The solution is the quotient set S/S/{\sim}:

    Output=[[sj]sjS],[sj]={skSsksj}\text{Output} = \left[ [s_j]_{\sim} \mid \forall s_j \in S \right], \quad [s_j]_{\sim} = \{ s_k \in S \mid s_k \sim s_j \}

    Equivalently, define a canonical representation c(si)=(count(a),count(b),,count(z))Z26c(s_i) = (\text{count}(a), \text{count}(b), \dots, \text{count}(z)) \in \mathbb{Z}^{26}. Then:

    Group(si)={skSc(sk)=c(si)}\text{Group}(s_i) = \{ s_k \in S \mid c(s_k) = c(s_i) \}

Constraint Analysis

  1. 1strs.length1041 \leq \texttt{strs.length} \leq 10^4:

    • Limitation: Rules out O(n2)O(n^2) solutions (e.g., pairwise comparisons). Worst-case 10810^8 operations exceeds Python’s practical limits (~10710^7/sec).
    • Edge Case: Large input requires O(nk)O(n \cdot k) or O(nklogk)O(n \cdot k \log k) solutions where kk is string length.
  2. 0strs[i].length1000 \leq \texttt{strs[i].length} \leq 100:

    • Limitation: Allows O(k)O(k) per-string operations (e.g., sorting, counting). Worst-case total characters: 104×100=10610^4 \times 100 = 10^6, manageable for linear methods.
    • Edge Cases:
      • Empty string ("") must map to a group.
      • Length-1 strings (e.g., ["a"]) form singleton groups.
  3. Lowercase English letters:

    • Limitation: Fixed 26-letter alphabet enables O(1)O(1) space for frequency vectors (size 26).
    • Edge Case: All strings identical (e.g., ["abc","abc"]) → single group.

2. Intuition Scaffolding

Analogies

  1. Real-World Metaphor:
    Library Catalog System: Books (strings) with identical ISBN (canonical form) go in the same section. Sorting letters = ISBN generation.

  2. Gaming Analogy:
    Pokémon Evolution Stones: Each anagram group is a Pokémon species (e.g., Pikachu family). Different “stones” (character rearrangements) evolve them to the same final form.

  3. Math Analogy:
    Vector Space Partition: Strings → 26D frequency vectors. Grouping = clustering identical vectors in Z26\mathbb{Z}^{26}.

Common Pitfalls

  1. Brute-Force Pairwise Checks: O(n2k)O(n^2 \cdot k) → fails at scale.
  2. Hashing Mutable Lists: Using unhashable list as dict key → runtime error.
  3. Set of Characters: Ignores frequency (e.g., "aab""abb").
  4. Length Mismatch: Assuming equal-length strings → incorrect grouping.
  5. Case Sensitivity: Overlooking lowercase constraint → wasted checks.

3. Approach Encyclopedia

Approach 1: Sorted String Key

  • What: Convert each string to its sorted version (canonical form).
  • Why: Anagrams have identical sorted forms; efficient grouping via hashing.
  • How:
    groups = defaultdict(list)
    for s in strs:
        sorted_s = ''.join(sorted(s))  # O(k log k)
        groups[sorted_s].append(s)
    return list(groups.values())
    
  • Complexity Proof:
    • Time: O(nklogk)O(n \cdot k \log k) (sorting kk-chars for nn strings).
    • Space: O(nk)O(n \cdot k) (store all sorted strings).
  • Visualization:
    Input: ["eat", "tea", "ate"]
    Sorted: 
      "eat" → "aet"
      "tea" → "aet"
      "ate" → "aet"
    Groups: {"aet": ["eat", "tea", "ate"]}
    

Approach 2: Frequency Vector Key (Optimal)

  • What: Represent each string by a 26-element frequency vector (count per char).
  • Why: Avoids klogkk \log k sorting; O(k)O(k) per string.
  • How:
    groups = defaultdict(list)
    for s in strs:
        count = [0] * 26
        for char in s:  # O(k)
            count[ord(char) - ord('a')] += 1
        groups[tuple(count)].append(s)  # Tuple is hashable
    return list(groups.values())
    
  • Complexity Proof:
    • Time: O(nk)O(n \cdot k) (nn strings × kk chars each).
    • Space: O(n26)O(n \cdot 26) (keys) + O(nk)O(n \cdot k) (values) = O(nk)O(n \cdot k).
  • Visualization:
    "abc" → [1,1,1,0,0,...] → tuple([1,1,1,0,...])
    "bac" → [1,1,1,0,0,...] → same tuple → same group.
    

4. Code Deep Dive

Optimal Solution (Frequency Vector)

from collections import defaultdict

class Solution:
    def groupAnagrams(self, strs: List[str]) -> List[List[str]]:
        groups = defaultdict(list)  # Map: frequency tuple → list of anagrams
        for s in strs:  # O(n)
            count = [0] * 26  # Initialize 26-dim frequency vector
            for char in s:  # O(k)
                # Map 'a'→0, 'b'→1, ..., 'z'→25
                idx = ord(char) - ord('a')
                count[idx] += 1
            # Convert list to tuple for immutability/hashing
            key = tuple(count)
            groups[key].append(s)
        return list(groups.values())  # Extract grouped lists

Edge Case Handling

  • strs = [""] (Example 2):
    • s = "" → loop skipped → count = [0]*26key = (0,0,...,0) → group [""].
  • strs = ["a"] (Example 3):
    • s = "a"count[0] = 1key = (1,0,0,...,0) → group ["a"].
  • Large n (Constraint):
    • O(nk)O(n \cdot k) time handles n=104n=10^4, k=100k=100 efficiently (10610^6 ops).

5. Complexity War Room

Hardware-Aware Analysis

  • Memory:
    • Keys: nn tuples × 26 integers × 4 bytes = 104×26×4110^4 \times 26 \times 4 \approx 1 MB (fits in CPU L3 cache).
    • Values: O(nk)O(n \cdot k) worst-case (all strings unique) = 104×10010^4 \times 100 chars × 1 byte ≈ 1 MB.
  • Throughput:
    • At 10610^6 ops (Python speed ~10710^7 ops/sec), runtime ≈ 0.1 sec.

Industry Comparison Table

Approach Time Space Readability Interview Viability
Brute Force O(n2k)O(n^2 \cdot k) O(1)O(1) 9/10 ❌ (Fails n=104n=10^4)
Sorted String Key O(nklogk)O(n \cdot k \log k) O(nk)O(n \cdot k) 10/10 ✅ (k small)
Frequency Vector O(nk)O(n \cdot k) O(nk)O(n \cdot k) 9/10 Optimal

6. Pro Mode Extras

Variants

  1. Unicode Support (LC Extended):
    • Use frozenset(Counter(s).items()) as key. Handles arbitrary characters.
    key = frozenset(Counter(s).items())  # O(k)
    
  2. Group Shifted Strings (LC 249):
    • Key: tuple((ord(c) - ord(s[0])) % 26 for c in s) for cyclic shifts.

Interview Cheat Sheet

  • First Mention: Always state time/space complexity upfront.
  • Clarify: “Are strings Unicode or [a-z]?” → dictates key strategy.
  • Optimize: If kk is large, frequency vector > sorting. If kk tiny, sorting may be faster.
  • Verify: Test with [""], ["a"], and ["abc","cba"] before coding.