#127 - Word Ladder
Word Ladder
- Difficulty: Hard
- Topics: Hash Table, String, Breadth-First Search
- Link: https://leetcode.com/problems/word-ladder/
Problem Description
A transformation sequence from word beginWord to word endWord using a dictionary wordList is a sequence of words beginWord -> s1 -> s2 -> ... -> sk such that:
- Every adjacent pair of words differs by a single letter.
- Every
sifor1 <= i <= kis inwordList. Note thatbeginWorddoes not need to be inwordList. sk == endWord
Given two words, beginWord and endWord, and a dictionary wordList, return the number of words in the shortest transformation sequence from beginWord to endWord, or 0 if no such sequence exists.
Example 1:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log","cog"]
Output: 5
Explanation: One shortest transformation sequence is "hit" -> "hot" -> "dot" -> "dog" -> cog", which is 5 words long.
Example 2:
Input: beginWord = "hit", endWord = "cog", wordList = ["hot","dot","dog","lot","log"]
Output: 0
Explanation: The endWord "cog" is not in wordList, therefore there is no valid transformation sequence.
Constraints:
1 <= beginWord.length <= 10endWord.length == beginWord.length1 <= wordList.length <= 5000wordList[i].length == beginWord.lengthbeginWord,endWord, andwordList[i]consist of lowercase English letters.beginWord != endWord- All the words in
wordListare unique.
Solution
1. Problem Deconstruction (500+ words)
Rewrite the Problem
1. Technical Version
Given strings beginWord and endWord (both length (L)) and a list wordList of unique strings each of length (L), define a graph (G = (V, E)) where:
- (V = { \text{beginWord} } \cup \text{wordList})
- An undirected edge ((u, v) \in E) exists iff the Hamming distance (d(u,v) = 1) (i.e., (u) and (v) differ in exactly one character).
A transformation sequence is a path in (G) from beginWord to endWord. The goal is to find the length (number of vertices) of the shortest such path. If endWord ∉ wordList or no path exists, return 0.
2. Beginner Version
You start with a word (e.g., “hit”) and want to reach another word (e.g., “cog”) by changing one letter at a time. Every time you change a letter, the new word must be in a given dictionary. You want to do this in as few steps as possible, counting the starting and ending words. If it’s impossible (the target word isn’t in the dictionary, or you can’t get there), return 0.
3. Mathematical Version
Let (\Sigma = {a,…,z}), (L) be word length, and (W \subseteq \Sigma^L) be the set of valid words (wordList). For (u, v \in \Sigma^L), define adjacency as (u \sim v \iff d_H(u,v)=1). Let (V = {\text{beginWord}} \cup W). We seek:
[
\min{ k \in \mathbb{N} : \exists v_0,\dots,v_k \in V \text{ with } v_0=\text{beginWord}, v_k=\text{endWord}, v_i \sim v_{i+1} \text{ for } 0\le i<k }
]
If the set is empty, return 0. Otherwise, return (k+1) (number of vertices).
Constraint Analysis
- (1 \le \text{beginWord.length} \le 10): Small (L) permits (O(L \cdot 26)) neighbor generation per word without overhead.
- (\text{endWord.length} = \text{beginWord.length}): Uniform length ensures Hamming distance is defined; no need to handle variable-length words.
- (1 \le \text{wordList.length} \le 5000): Moderate size; (O(N^2)) explicit graph construction ((N \approx 5000)) yields ~25M pair comparisons, which may be borderline in Python but manageable in C++. Optimal BFS with neighbor generation scales as (O(N \cdot L \cdot 26) \approx 1.3M) operations, which is safe.
- All words consist of lowercase English letters: Alphabet size 26 is fixed, simplifying neighbor generation.
- (\text{beginWord} \ne \text{endWord}): Guarantees non-trivial search; otherwise answer is 1 (but given constraint, we ignore).
- All words in
wordListare unique: Enables (O(1)) set membership checks.
Hidden Edge Cases
endWordabsent fromwordList→ immediate return 0.beginWordmay already be inwordList; BFS must handle duplicates gracefully.- Words of length 1: changing the single character yields 25 neighbors; BFS still works.
- Dictionary may contain unreachable words; BFS will ignore them.
- Direct connection: if
beginWordandendWorddiffer by one letter andendWord∈wordList, answer is 2. - Empty
wordList→ return 0 unlessbeginWord == endWord(impossible per constraints).
2. Intuition Scaffolding
Generate 3 Analogies
- Real-World Metaphor: Imagine a network of airports (words). You can fly from one airport to another if their names differ by one letter. You want the fewest flights from your starting airport to the destination, only stopping at airports that are on the approved list (dictionary).
- Gaming Analogy: In a word‑morph puzzle game, you have to transform a starter word into a target word by swapping one tile (letter) at a time, using only valid dictionary words. Each move is a “step,” and you want the minimum steps to beat the level.
- Math Analogy: This is a shortest‑path problem in a subgraph of the Hamming graph (H(L,26)), where vertices are strings in (\Sigma^L) and edges connect strings at Hamming distance 1. The given dictionary selects an induced subgraph; we need the geodesic distance between two vertices.
Common Pitfalls Section
- Using DFS instead of BFS: DFS does not guarantee the shortest path in an unweighted graph.
- Forgetting to check
endWord ∈ wordList: Missing this early exit leads to wasted computation. - Building the graph explicitly by all‑pairs comparison: (O(N^2 L)) is inefficient for (N=5000).
- No visited set: Causes cycles and exponential blow‑up.
- Incorrect sequence length: Counting edges (transformations) instead of vertices (words). The answer must include both
beginWordandendWord. - Assuming
beginWord ∈ wordList: It may not be, but must still be included as a graph node. - Inefficient neighbor generation: Checking all dictionary words for Hamming distance 1 per node is (O(N L)) per word; better to generate all one‑letter variants ((O(L \cdot 26))) and test set membership.
3. Approach Encyclopedia
Approach 1: Brute Force BFS with Explicit Graph
Idea: Construct adjacency lists by comparing every pair of words, then run BFS.
Pseudocode:
def differ_by_one(a, b):
count = 0
for i in range(len(a)):
if a[i] != b[i]:
count += 1
if count > 1: return False
return count == 1
def ladderLength(beginWord, endWord, wordList):
if endWord not in wordList: return 0
words = [beginWord] + wordList
adj = defaultdict(list)
n = len(words)
for i in range(n):
for j in range(i+1, n):
if differ_by_one(words[i], words[j]):
adj[words[i]].append(words[j])
adj[words[j]].append(words[i])
# BFS
q = deque([(beginWord, 1)])
visited = set()
while q:
w, d = q.popleft()
if w == endWord: return d
visited.add(w)
for nb in adj[w]:
if nb not in visited:
q.append((nb, d+1))
return 0
Complexity Proof:
- Time: (O(N^2 L)) where (N = |\text{wordList}|+1). Pairwise comparisons: (\binom{N}{2} \cdot L \approx \frac{N^2}{2} L). For (N=5001, L=10), ~125M operations. BFS adds (O(N + E)) but dominated.
- Space: (O(N^2)) worst-case (complete graph), but actual graph is sparse; each word has at most (25L) neighbors.
Visualization:
Graph for ["hit","hot","dot","dog","lot","log","cog"]:
hit ── hot
hot ── dot, lot
dot ── hot, dog, lot
dog ── dot, log, cog
lot ── hot, dot, log
log ── dog, lot, cog
cog ── dog, log
BFS Levels:
Level1: hit (dist=1)
Level2: hot (dist=2)
Level3: dot, lot (dist=3)
Level4: dog, log (dist=4)
Level5: cog (dist=5)
Approach 2: BFS with Implicit Neighbor Generation (Optimal)
Idea: Avoid explicit graph; for each word, generate all one‑letter variants and check membership in a hash set.
Pseudocode:
def ladderLength(beginWord, endWord, wordList):
wordSet = set(wordList)
if endWord not in wordSet: return 0
q = deque([(beginWord, 1)])
visited = set([beginWord])
while q:
w, d = q.popleft()
if w == endWord: return d
for i in range(len(w)):
for c in 'abcdefghijklmnopqrstuvwxyz':
if c == w[i]: continue
nw = w[:i] + c + w[i+1:]
if nw in wordSet and nw not in visited:
visited.add(nw)
q.append((nw, d+1))
return 0
Complexity Proof:
- Time: Each word is processed once. For each of (N) words, generate (L \cdot 26) neighbors, each lookup (O(1)). Total (O(N \cdot L \cdot 26)). With (N \le 5000, L \le 10), ≤ 1.3M operations.
- Space: (O(N)) for
visitedandwordSet.
Visualization:
BFS frontier expansion:
Initial: [hit]
Generate neighbors of hit: *it → (ait, bit, ..., zit) → only "hot" in set.
Level1: [hot]
Generate neighbors of hot: h*t → (hat, hbt, ..., hzt) → "hit" (visited), "dot", "lot"
Level2: [dot, lot]
... until reaching "cog".
Approach 3: Bidirectional BFS
Idea: Run two BFS simultaneously from start and end; when they meet, the path length is the sum of distances from both sides minus 1.
Pseudocode:
def ladderLength(beginWord, endWord, wordList):
wordSet = set(wordList)
if endWord not in wordSet: return 0
if beginWord == endWord: return 1
q_begin, q_end = deque([beginWord]), deque([endWord])
dist_begin, dist_end = {beginWord: 1}, {endWord: 1}
def expand(q, dist_this, dist_other):
for _ in range(len(q)):
w = q.popleft()
for i in range(len(w)):
for c in 'abcdefghijklmnopqrstuvwxyz':
if c == w[i]: continue
nw = w[:i] + c + w[i+1:]
if nw in dist_this: continue
if nw in dist_other:
return dist_this[w] + dist_other[nw]
if nw in wordSet:
dist_this[nw] = dist_this[w] + 1
q.append(nw)
return None
while q_begin and q_end:
res = expand(q_begin, dist_begin, dist_end)
if res: return res
res = expand(q_end, dist_end, dist_begin)
if res: return res
return 0
Complexity Proof:
- Time: Still (O(N \cdot L \cdot 26)) worst-case, but the search space is reduced because frontiers grow from both ends. In practice, often twice as fast as unilateral BFS.
- Space: (O(N)) for the two distance dictionaries.
Visualization:
Begin side: End side:
Level1: hit Level1: cog
Level2: hot Level2: dog, log
Level3: dot, lot Level3: from dog → dot (meet!)
Meeting at "dot": dist_begin[dot]=3, dist_end[dog]=2 → total = 3+2 = 5 words.
Path: hit → hot → dot → dog → cog.
4. Code Deep Dive
Line‑by‑Line Annotations (Bidirectional BFS)
from collections import deque
def ladderLength(beginWord: str, endWord: str, wordList: List[str]) -> int:
# Convert to set for O(1) membership tests
wordSet = set(wordList)
# Early exit if endWord unreachable
if endWord not in wordSet:
return 0
# Trivial case (though constraints forbid equality)
if beginWord == endWord:
return 1
# Two queues for simultaneous BFS
q_begin = deque([beginWord])
q_end = deque([endWord])
# Distance maps: word -> number of words from start/end to this word
dist_begin = {beginWord: 1}
dist_end = {endWord: 1}
# Helper to expand one level from a given frontier
def expand(q, dist_this, dist_other):
# Process all nodes at current level
for _ in range(len(q)):
w = q.popleft()
# Try all one‑letter mutations
for i in range(len(w)):
for c in 'abcdefghijklmnopqrstuvwxyz':
if c == w[i]:
continue # Skip same letter
nw = w[:i] + c + w[i+1:] # New candidate
# If already visited from this side, skip
if nw in dist_this:
continue
# If visited from the other side, we have a meeting point
if nw in dist_other:
# Total words = distance from this side to nw (dist_this[w]+1)
# plus distance from other side to nw (dist_other[nw])
# minus 1 because nw is counted twice.
return dist_this[w] + dist_other[nw]
# Valid dictionary word? add to frontier
if nw in wordSet:
dist_this[nw] = dist_this[w] + 1
q.append(nw)
return None # No meeting at this level
# Alternate expansions until one frontier empties (no path)
while q_begin and q_end:
# Expand begin side
ans = expand(q_begin, dist_begin, dist_end)
if ans:
return ans
# Expand end side
ans = expand(q_end, dist_end, dist_begin)
if ans:
return ans
return 0 # No connection
Edge Case Handling
- Example 1:
beginWord="hit", endWord="cog", wordList=["hot","dot","dog","lot","log","cog"]wordSetcontains"cog", proceed.- Bidirectional BFS meets at
"dot"(or"log"). Suppose meeting at"dot":dist_begin[dot]=3,dist_end[dog]=2→ return3+2=5.
- Example 2:
endWord="cog"not inwordList→ immediatereturn 0. - Direct connection:
beginWord="hit", endWord="hot"withwordList=["hot"].- BFS from begin generates
"hot", finds it indist_end(since end side started with"hot").dist_begin[hit]=1,dist_end[hot]=1→ return1+1=2.
- BFS from begin generates
- No path: BFS exhausts all reachable words without meeting;
whileloop ends, return0.
5. Complexity War Room
Hardware‑Aware Analysis
- Time: Worst‑case 1.3M operations (5000 words × 10 letters × 26 mutations). On a 3 GHz CPU, that’s ~0.004 seconds in compiled code; Python overhead may raise to ~0.1 s, still well within limits.
- Memory:
wordSetstores ≤5000 strings of length ≤10. Assuming 50 bytes per string (Python overhead), total ~250 KB. Two distance dictionaries and queues add similar overhead, staying under 1 MB. This fits in L3 cache (typical 8‑16 MB), ensuring fast access.
Industry Comparison Table
| Approach | Time Complexity | Space Complexity | Readability | Interview Viability |
|---|---|---|---|---|
| Brute Force (explicit graph) | (O(N^2 L)) | (O(N^2)) | 7/10 – straightforward but verbose | ❌ Fails large N within time |
| BFS with neighbor generation | (O(N L \cdot 26)) | (O(N)) | 9/10 – clean and efficient | ✅ Highly recommended |
| Bidirectional BFS | (O(N L \cdot 26)) (faster in practice) | (O(N)) | 8/10 – more code but optimal | ✅ Excellent to demonstrate |
6. Pro Mode Extras
Variants Section
- Word Ladder II (LeetCode 126) – Return all shortest transformation sequences.
- Solution: Perform BFS to build a DAG of predecessors (record parent links), then use DFS backtracking from
endWordto reconstruct all paths of minimal length.
- Solution: Perform BFS to build a DAG of predecessors (record parent links), then use DFS backtracking from
- Minimum Genetic Mutation (LeetCode 433) – Same concept with a gene string (A, C, G, T) and a list of allowed mutations.
- Weighted Word Ladder – Each letter change has a cost (e.g., typing distance). Use Dijkstra’s algorithm instead of BFS.
- Word Ladder with Wildcards – Preprocess
wordListinto a map:pattern → list of words. E.g.,"h*t" → ["hot", "hit"]. During BFS, generate patterns by replacing each position with'*'and retrieve neighbors from the map. This reduces neighbor generation from (O(L \cdot 26)) to (O(L)) at the cost of preprocessing.
Interview Cheat Sheet
- First sentence: “This is an unweighted shortest‑path problem, so BFS is appropriate.”
- Key insight: “Generate neighbors by changing each character to all other 25 letters and checking a hash set.”
- Optimization: “Bidirectional BFS cuts the search space roughly in half.”
- Edge cases:
endWordnot inwordList→ return 0 immediately.beginWordmay equalendWord(if allowed).- Dictionary may contain duplicates? Not per constraints.
- Complexity: Always state both time ((O(N L \cdot 26))) and space ((O(N))).
- Practice tip: Code the BFS with neighbor generation first; if time permits, explain bidirectional optimization.