
#127 - Word Ladder

December 2, 2025

Word Ladder

Problem Description

A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:

  • Every adjacent pair of words differs by a single letter.
  • Every si for 1 <= i <= k is in wordList. Note that beginWord does not need to be in wordList.
  • sk == endWord

Given two words, beginWord and endWord, and a dictionary wordList, return the number of words in the shortest transformation sequence from beginWord to endWord, or 0 if no such sequence exists.

Example 1:


Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: 5
Explanation: One shortest transformation sequence is "hit" -> "hot" -> "dot" -> "dog" -> "cog", which is 5 words long.

Example 2:


Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: 0
Explanation: The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.

Constraints:

  • 1 <= beginWord.length <= 10
  • endWord.length == beginWord.length
  • 1 <= wordList.length <= 5000
  • wordList[i].length == beginWord.length
  • beginWord, endWord, and wordList[i] consist of lowercase English letters.
  • beginWord != endWord
  • All the words in wordList are unique.

Solution

1. Problem Deconstruction

Rewrite the Problem

1. Technical Version
Given strings beginWord and endWord (both length (L)) and a list wordList of unique strings each of length (L), define a graph (G = (V, E)) where:

  • (V = { \text{beginWord} } \cup \text{wordList})
  • An undirected edge ((u, v) \in E) exists iff the Hamming distance (d(u,v) = 1) (i.e., (u) and (v) differ in exactly one character).

A transformation sequence is a path in (G) from beginWord to endWord. The goal is to find the length (number of vertices) of the shortest such path. If endWord ∉ wordList or no path exists, return 0.

2. Beginner Version
You start with a word (e.g., “hit”) and want to reach another word (e.g., “cog”) by changing one letter at a time. Every time you change a letter, the new word must be in a given dictionary. You want to do this in as few steps as possible, counting the starting and ending words. If it’s impossible (the target word isn’t in the dictionary, or you can’t get there), return 0.

3. Mathematical Version
Let (\Sigma = \{a,\dots,z\}), (L) be the word length, and (W \subseteq \Sigma^L) be the set of valid words (wordList). For (u, v \in \Sigma^L), define adjacency as (u \sim v \iff d_H(u,v)=1). Let (V = \{\text{beginWord}\} \cup W). We seek:
[ \min\{ k \in \mathbb{N} : \exists\, v_0,\dots,v_k \in V \text{ with } v_0=\text{beginWord},\ v_k=\text{endWord},\ v_i \sim v_{i+1} \text{ for } 0 \le i < k \} ]
If the set is empty, return 0. Otherwise, return (k+1) (number of vertices).
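
For Example 1, one minimizing sequence is

[ \text{hit} \sim \text{hot} \sim \text{dot} \sim \text{dog} \sim \text{cog} ]

so (k = 4) and the returned value is (k + 1 = 5) words.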

Constraint Analysis

  • (1 \le \text{beginWord.length} \le 10): Small (L) keeps (O(L \cdot 26)) neighbor generation per word cheap (at most 260 candidate strings).
  • (\text{endWord.length} = \text{beginWord.length}): Uniform length ensures Hamming distance is defined; no need to handle variable-length words.
  • (1 \le \text{wordList.length} \le 5000): Moderate size; (O(N^2)) explicit graph construction ((N \approx 5000)) yields ~25M pair comparisons, which may be borderline in Python but manageable in C++. Optimal BFS with neighbor generation scales as (O(N \cdot L \cdot 26) \approx 1.3M) operations, which is safe.
  • All words consist of lowercase English letters: Alphabet size 26 is fixed, simplifying neighbor generation.
  • (\text{beginWord} \ne \text{endWord}): Guarantees a non-trivial search; if the two were equal the answer would trivially be 1, but the constraint rules that case out.
  • All words in wordList are unique: Enables (O(1)) set membership checks.

Hidden Edge Cases

  1. endWord absent from wordList → immediate return 0.
  2. beginWord may already be in wordList; BFS must handle duplicates gracefully.
  3. Words of length 1: changing the single character yields 25 neighbors; BFS still works.
  4. Dictionary may contain unreachable words; BFS will ignore them.
  5. Direct connection: if beginWord and endWord differ by one letter and endWord ∈ wordList, the answer is 2.
  6. Empty wordList → return 0, since endWord cannot be present (the constraints guarantee at least one word, so this is purely defensive).

2. Intuition Scaffolding

Three Analogies

  1. Real-World Metaphor: Imagine a network of airports (words). You can fly from one airport to another if their names differ by one letter. You want the fewest flights from your starting airport to the destination, only stopping at airports that are on the approved list (dictionary).
  2. Gaming Analogy: In a word‑morph puzzle game, you have to transform a starter word into a target word by swapping one tile (letter) at a time, using only valid dictionary words. Each move is a “step,” and you want the minimum steps to beat the level.
  3. Math Analogy: This is a shortest‑path problem in a subgraph of the Hamming graph (H(L,26)), where vertices are strings in (\Sigma^L) and edges connect strings at Hamming distance 1. The given dictionary selects an induced subgraph; we need the geodesic distance between two vertices.
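
A concrete consequence of the Hamming-graph view: every vertex of (H(L, 26)) has exactly (L(|\Sigma| - 1)) neighbors in the full graph,

[ \deg(v) = L \cdot 25 = 250 \text{ for } L = 10, ]

which bounds the branching factor of any search restricted to the dictionary-induced subgraph.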

Common Pitfalls

  1. Using DFS instead of BFS: DFS does not guarantee the shortest path in an unweighted graph.
  2. Forgetting to check endWord ∈ wordList: Missing this early exit leads to wasted computation.
  3. Building the graph explicitly by all‑pairs comparison: (O(N^2 L)) is inefficient for (N=5000).
  4. No visited set: Causes cycles and exponential blow‑up.
  5. Incorrect sequence length: Counting edges (transformations) instead of vertices (words). The answer must include both beginWord and endWord.
  6. Assuming beginWord ∈ wordList: It may not be, but must still be included as a graph node.
  7. Inefficient neighbor generation: Checking every dictionary word for Hamming distance 1 costs (O(N L)) per node; it is better to generate all one‑letter variants ((O(L \cdot 26)) of them) and test set membership, as in the sketch after this list.
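
A minimal sketch of the generate-and-test strategy from pitfall 7 (the helper name one_letter_variants is illustrative, not part of the final solutions below):

import string

def one_letter_variants(word, word_set):
    """Yield dictionary words at Hamming distance 1 from word, using O(L * 26) candidates."""
    for i, ch in enumerate(word):
        for c in string.ascii_lowercase:
            if c == ch:
                continue
            candidate = word[:i] + c + word[i+1:]
            if candidate in word_set:
                yield candidate

# Example: list(one_letter_variants("hot", {"hit", "dot", "lot", "hog"}))
# returns ['dot', 'lot', 'hit', 'hog'] (position-0 variants first, then position 1, then 2).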

3. Approach Encyclopedia

Approach 1: Brute Force BFS with Explicit Graph

Idea: Construct adjacency lists by comparing every pair of words, then run BFS.
Pseudocode:

from collections import defaultdict, deque

def differ_by_one(a, b):
    """Return True if a and b differ in exactly one position."""
    count = 0
    for i in range(len(a)):
        if a[i] != b[i]:
            count += 1
            if count > 1:
                return False
    return count == 1

def ladderLength(beginWord, endWord, wordList):
    if endWord not in wordList:
        return 0
    words = [beginWord] + wordList
    adj = defaultdict(list)
    n = len(words)
    # Build the adjacency list by comparing every pair of words.
    for i in range(n):
        for j in range(i + 1, n):
            if differ_by_one(words[i], words[j]):
                adj[words[i]].append(words[j])
                adj[words[j]].append(words[i])
    # Standard BFS over the explicit graph.
    q = deque([(beginWord, 1)])
    visited = {beginWord}
    while q:
        w, d = q.popleft()
        if w == endWord:
            return d
        for nb in adj[w]:
            if nb not in visited:
                visited.add(nb)      # mark on enqueue to keep duplicates out of the queue
                q.append((nb, d + 1))
    return 0

Complexity Proof:

  • Time: (O(N^2 L)) where (N = |\text{wordList}|+1). Pairwise comparisons: (\binom{N}{2} \cdot L \approx \frac{N^2}{2} L). For (N=5001, L=10), that is ~125M character comparisons. The BFS itself adds (O(N + E)) but is dominated by graph construction.
  • Space: (O(N^2)) worst-case (complete graph), but actual graph is sparse; each word has at most (25L) neighbors.

Visualization:

Graph for ["hit","hot","dot","dog","lot","log","cog"]:

hit ── hot
hot ── dot, lot
dot ── hot, dog, lot
dog ── dot, log, cog
lot ── hot, dot, log
log ── dog, lot, cog
cog ── dog, log

BFS Levels:
Level1: hit (dist=1)
Level2: hot (dist=2)
Level3: dot, lot (dist=3)
Level4: dog, log (dist=4)
Level5: cog (dist=5)

Approach 2: BFS with Implicit Neighbor Generation (Optimal)

Idea: Avoid explicit graph; for each word, generate all one‑letter variants and check membership in a hash set.
Pseudocode:

from collections import deque

def ladderLength(beginWord, endWord, wordList):
    wordSet = set(wordList)
    if endWord not in wordSet:
        return 0
    q = deque([(beginWord, 1)])
    visited = {beginWord}
    while q:
        w, d = q.popleft()
        if w == endWord:
            return d
        # Try every one-letter mutation and keep those that are valid dictionary words.
        for i in range(len(w)):
            for c in 'abcdefghijklmnopqrstuvwxyz':
                if c == w[i]:
                    continue
                nw = w[:i] + c + w[i+1:]
                if nw in wordSet and nw not in visited:
                    visited.add(nw)
                    q.append((nw, d + 1))
    return 0

Complexity Proof:

  • Time: Each word is dequeued at most once. For each of (N) words we generate (L \cdot 26) candidate strings; building and hashing each candidate costs (O(L)), so the total is (O(N \cdot L^2 \cdot 26)) character operations, usually summarized as (O(N \cdot L \cdot 26)) ≈ 1.3M set lookups for (N \le 5000, L \le 10).
  • Space: (O(N)) for visited and wordSet.

Visualization:

BFS frontier expansion:
Initial: [hit]
Generate neighbors of hit (patterns *it, h*t, hi*): only "hot" (from h*t) is in the set.
Level1: [hot]
Generate neighbors of hot (patterns *ot, h*t, ho*): "hit" (visited, from h*t), "dot" and "lot" (from *ot)
Level2: [dot, lot]
... until reaching "cog".

Approach 3: Bidirectional BFS

Idea: Run two BFS searches simultaneously, one from beginWord and one from endWord; when the frontiers meet at a word, the answer is the sum of that word's distances (counted in words) from the two sides minus one, because the meeting word is counted on both sides.
Pseudocode:

from collections import deque

def ladderLength(beginWord, endWord, wordList):
    wordSet = set(wordList)
    if endWord not in wordSet:
        return 0
    if beginWord == endWord:
        return 1
    q_begin, q_end = deque([beginWord]), deque([endWord])
    dist_begin, dist_end = {beginWord: 1}, {endWord: 1}

    def expand(q, dist_this, dist_other):
        # Expand one full BFS level from this side.
        for _ in range(len(q)):
            w = q.popleft()
            for i in range(len(w)):
                for c in 'abcdefghijklmnopqrstuvwxyz':
                    if c == w[i]:
                        continue
                    nw = w[:i] + c + w[i+1:]
                    if nw in dist_this:
                        continue
                    if nw in dist_other:
                        # Frontiers meet at nw; nw is counted once on each side.
                        return dist_this[w] + dist_other[nw]
                    if nw in wordSet:
                        dist_this[nw] = dist_this[w] + 1
                        q.append(nw)
        return None

    while q_begin and q_end:
        res = expand(q_begin, dist_begin, dist_end)
        if res is not None:
            return res
        res = expand(q_end, dist_end, dist_begin)
        if res is not None:
            return res
    return 0

Complexity Proof:

  • Time: Still (O(N \cdot L \cdot 26)) in the worst case, but the explored search space shrinks because two shallow frontiers grow toward each other instead of one deep frontier; in practice this is often substantially faster than one‑directional BFS.
  • Space: (O(N)) for the two distance dictionaries.

Visualization:

Begin side:         End side:
Level1: hit         Level1: cog
Level2: hot         Level2: dog, log
Level3: dot, lot    (end side expands "dog", generates "dot", already seen by the begin side → meet!)
Meeting at "dot": dist_begin[dot]=3, dist_end[dog]=2 → total = 3+2 = 5 words.
Path: hit → hot → dot → dog → cog.

4. Code Deep Dive

Line‑by‑Line Annotations (Bidirectional BFS)

from collections import deque
from typing import List

def ladderLength(beginWord: str, endWord: str, wordList: List[str]) -> int:
    # Convert to set for O(1) membership tests
    wordSet = set(wordList)
    # Early exit if endWord unreachable
    if endWord not in wordSet:
        return 0
    # Trivial case (though constraints forbid equality)
    if beginWord == endWord:
        return 1

    # Two queues for simultaneous BFS
    q_begin = deque([beginWord])
    q_end   = deque([endWord])
    # Distance maps: word -> number of words from start/end to this word
    dist_begin = {beginWord: 1}
    dist_end   = {endWord: 1}

    # Helper to expand one level from a given frontier
    def expand(q, dist_this, dist_other):
        # Process all nodes at current level
        for _ in range(len(q)):
            w = q.popleft()
            # Try all one‑letter mutations
            for i in range(len(w)):
                for c in 'abcdefghijklmnopqrstuvwxyz':
                    if c == w[i]:
                        continue          # Skip same letter
                    nw = w[:i] + c + w[i+1:]   # New candidate
                    # If already visited from this side, skip
                    if nw in dist_this:
                        continue
                    # If visited from the other side, we have a meeting point
                    if nw in dist_other:
                        # Total words = distance from this side to nw (dist_this[w]+1)
                        # plus distance from other side to nw (dist_other[nw])
                        # minus 1 because nw is counted twice.
                        return dist_this[w] + dist_other[nw]
                    # Valid dictionary word? add to frontier
                    if nw in wordSet:
                        dist_this[nw] = dist_this[w] + 1
                        q.append(nw)
        return None   # No meeting at this level

    # Alternate expansions until one frontier empties (no path)
    while q_begin and q_end:
        # Expand begin side
        ans = expand(q_begin, dist_begin, dist_end)
        if ans is not None:
            return ans
        # Expand end side
        ans = expand(q_end, dist_end, dist_begin)
        if ans is not None:
            return ans
    return 0   # No connection

Edge Case Handling

  • Example 1: beginWord="hit", endWord="cog", wordList=["hot","dot","dog","lot","log","cog"]
    • wordSet contains "cog", proceed.
    • Bidirectional BFS meets in the middle; with the expansion order above, the end side, while expanding "dog", generates "dot", which the begin side has already reached: dist_begin[dot]=3, dist_end[dog]=2 → return 3+2=5. (The alternative shortest path through "lot" and "log" has the same length.)
  • Example 2: endWord="cog" not in wordList → immediate return 0.
  • Direct connection: beginWord="hit", endWord="hot" with wordList=["hot"].
    • BFS from begin generates "hot", finds it in dist_end (since end side started with "hot"). dist_begin[hit]=1, dist_end[hot]=1 → return 1+1=2.
  • No path: BFS exhausts all reachable words without meeting; while loop ends, return 0.
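
A quick sanity-check sketch for these cases (check_examples is an illustrative helper, not part of the solution; pass it any of the ladderLength implementations above):

def check_examples(solver):
    # Example 1: a full ladder exists.
    assert solver("hit", "cog", ["hot", "dot", "dog", "lot", "log", "cog"]) == 5
    # Example 2: endWord is missing from the dictionary.
    assert solver("hit", "cog", ["hot", "dot", "dog", "lot", "log"]) == 0
    # Direct connection: begin and end differ by a single letter.
    assert solver("hit", "hot", ["hot"]) == 2
    # endWord is present but unreachable.
    assert solver("hit", "cog", ["hot", "cog"]) == 0
    print("all checks passed")

# Usage (assuming the bidirectional ladderLength defined above is in scope):
# check_examples(ladderLength)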

5. Complexity War Room

Hardware‑Aware Analysis

  • Time: Worst‑case ≈1.3M candidate lookups (5000 words × 10 positions × 26 letters). In compiled code this takes on the order of a few milliseconds; CPython's per‑operation overhead typically pushes it to roughly 0.1 to 1 s, still comfortably within limits (see the benchmark sketch after this list).
  • Memory: wordSet stores ≤5000 strings of length ≤10. Assuming 50 bytes per string (Python overhead), total ~250 KB. Two distance dictionaries and queues add similar overhead, staying under 1 MB. This fits in L3 cache (typical 8‑16 MB), ensuring fast access.
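
A rough, machine-dependent benchmark sketch for the candidate-generation cost claimed above (random 10-letter words stand in for a worst-case dictionary; exact timings will vary):

import random
import string
import time

random.seed(0)
L, N = 10, 5000
# Random dictionary approximating the worst case (collisions are vanishingly unlikely).
words = {''.join(random.choices(string.ascii_lowercase, k=L)) for _ in range(N)}

start = time.perf_counter()
hits = 0
for w in words:
    for i in range(L):
        for c in string.ascii_lowercase:
            if c != w[i] and (w[:i] + c + w[i+1:]) in words:
                hits += 1
elapsed = time.perf_counter() - start
print(f"{len(words)} words, {len(words) * L * 25} candidate lookups, {hits} hits, {elapsed:.2f}s")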

Industry Comparison Table

| Approach | Time Complexity | Space Complexity | Readability | Interview Viability |
| --- | --- | --- | --- | --- |
| Brute force (explicit graph) | (O(N^2 L)) | (O(N^2)) worst case | 7/10 – straightforward but verbose | ❌ Risks time limit for large N |
| BFS with neighbor generation | (O(N L \cdot 26)) | (O(N)) | 9/10 – clean and efficient | ✅ Highly recommended |
| Bidirectional BFS | (O(N L \cdot 26)), faster in practice | (O(N)) | 8/10 – more code but optimal | ✅ Excellent to demonstrate |

6. Pro Mode Extras

Variants Section

  1. Word Ladder II (LeetCode 126) – Return all shortest transformation sequences.
    • Solution: Perform BFS to build a DAG of predecessors (record parent links), then use DFS backtracking from endWord to reconstruct all paths of minimal length.
  2. Minimum Genetic Mutation (LeetCode 433) – Same concept with a gene string (A, C, G, T) and a list of allowed mutations.
  3. Weighted Word Ladder – Each letter change has a cost (e.g., typing distance). Use Dijkstra’s algorithm instead of BFS.
  4. Word Ladder with Wildcards – Preprocess wordList into a map: pattern → list of words, e.g. "h*t" → ["hot", "hit"]. During BFS, generate the (L) wildcard patterns of the current word and retrieve neighbors from the map. This replaces per‑word candidate generation ((O(L \cdot 26)) strings) with (O(L)) pattern lookups, at the cost of preprocessing; see the sketch after this list.
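
A minimal sketch of the pattern-bucket idea from variant 4 (the name ladder_length_patterns is illustrative; it solves the original problem, just with precomputed wildcard buckets):

from collections import defaultdict, deque

def ladder_length_patterns(beginWord, endWord, wordList):
    if endWord not in set(wordList):
        return 0
    L = len(beginWord)
    # Bucket each word under all of its wildcard patterns, e.g. "hot" -> "*ot", "h*t", "ho*".
    buckets = defaultdict(list)
    for w in wordList:
        for i in range(L):
            buckets[w[:i] + '*' + w[i+1:]].append(w)
    q = deque([(beginWord, 1)])
    visited = {beginWord}
    while q:
        w, d = q.popleft()
        if w == endWord:
            return d
        for i in range(L):
            for nb in buckets[w[:i] + '*' + w[i+1:]]:
                if nb not in visited:
                    visited.add(nb)
                    q.append((nb, d + 1))
    return 0

# Usage: ladder_length_patterns("hit", "cog", ["hot","dot","dog","lot","log","cog"]) -> 5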

Interview Cheat Sheet

  • First sentence: “This is an unweighted shortest‑path problem, so BFS is appropriate.”
  • Key insight: “Generate neighbors by changing each character to all other 25 letters and checking a hash set.”
  • Optimization: “Bidirectional BFS grows two shallow frontiers instead of one deep one, which typically shrinks the explored search space substantially.”
  • Edge cases:
    1. endWord not in wordList → return 0 immediately.
    2. beginWord may equal endWord (if allowed).
    3. Dictionary may contain duplicates? Not per constraints.
  • Complexity: Always state both time ((O(N L \cdot 26))) and space ((O(N))).
  • Practice tip: Code the BFS with neighbor generation first; if time permits, explain bidirectional optimization.