#72 - Edit Distance
Edit Distance
- Difficulty: Medium
- Topics: String, Dynamic Programming
- Link: https://leetcode.com/problems/edit-distance/
Problem Description
Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.
You have the following three operations permitted on a word:
- Insert a character
- Delete a character
- Replace a character
Example 1:
Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')
Example 2:
Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')
Constraints:
- 0 <= word1.length, word2.length <= 500
- word1 and word2 consist of lowercase English letters.
Solution
Edit Distance - Comprehensive Solution
1. Problem Deconstruction
Rewrite the Problem
1.1 Technical Version
Given two strings word1 and word2, compute the Levenshtein distance between them—the minimum number of atomic edit operations (single-character insertions, deletions, or substitutions) required to transform word1 into word2. Formally, find a sequence of edit operations $e_1, e_2, \dots, e_k$ such that applying them to word1 yields word2 and $k$ is minimized.
1.2 Beginner Version
Imagine you have two words, like “cat” and “dog.” You want to change the first word into the second by only doing three types of actions:
- Add a letter anywhere.
- Remove a letter from anywhere.
- Replace a letter with another letter.
Each action costs 1 step. Your goal is to find the smallest number of steps needed to change the first word into the second.
1.3 Mathematical Version
Let word1 $= a_1 a_2 \dots a_m$ and word2 $= b_1 b_2 \dots b_n$ be strings over alphabet $\Sigma$. Define $D(i, j)$ as the edit distance between prefixes $a_1 \dots a_i$ and $b_1 \dots b_j$. Then:

$$
D(i, j) =
\begin{cases}
j & \text{if } i = 0 \\
i & \text{if } j = 0 \\
\min\bigl(D(i-1, j) + 1,\ D(i, j-1) + 1,\ D(i-1, j-1) + \delta(a_i, b_j)\bigr) & \text{otherwise}
\end{cases}
$$

where $\delta(a_i, b_j)$ is 1 if the characters differ, else 0. The goal is $D(m, n)$.
Constraint Analysis
| Constraint | Implication | Hidden Edge Cases |
|---|---|---|
| 0 <= word1.length, word2.length <= 500 | O(m·n) DP is acceptable (at most 250,000 cell updates); O(m·n) memory (~1 MB of integers) is acceptable but can be optimized. | Empty strings: the distance equals the length of the non-empty string (if one length is 0, the answer is the other length). |
| Lowercase English letters only | Alphabet size is a constant (26); no Unicode or case-sensitivity concerns. | All characters come from a small set; no special handling needed. |
| No constraints on character frequency | Characters may repeat arbitrarily; there is no guarantee of common subsequences. | Worst case: completely different strings → distance = max(m, n). |
Why Constraints Matter:
- The upper bound of 500 means O(m·n) DP is fine (500 × 500 = 250,000 operations).
- No need for fancier algorithms (e.g., Ukkonen’s O(d·min(m,n)) cut-off algorithm, where d is the distance), though they can be mentioned.
- Memory optimization is possible: O(min(m,n)) space by storing only two rows.
2. Intuition Scaffolding
Three Analogies
- Real-World Metaphor – Text Editing
  Imagine you’re typing a document and need to match a reference text. You can:
  - Delete a typo (cost 1).
  - Insert a missing word (cost 1).
  - Replace a word with the correct one (cost 1).
  The edit distance is the minimal number of keystrokes to transform your draft into the reference.
- Gaming Analogy – Spell Crafting
  In a fantasy game, you have a rune sequence word1 and need to produce the sequence word2 using a magic wand that can:
  - Remove a rune (Delete).
  - Draw a new rune (Insert).
  - Alter a rune’s shape (Replace).
  Each action consumes 1 mana; you want the minimal mana cost.
- Math Analogy – Pathfinding in a Grid
  Imagine an (m+1) × (n+1) grid. Start at (0, 0) and end at (m, n). Moving right (insert) or down (delete) costs 1. Moving diagonally costs 0 if the corresponding characters match, else 1 (replace). The edit distance is the cheapest path cost from the top-left to the bottom-right corner.
Common Pitfalls
- Greedy on First Mismatch
  - Wrong: At the first mismatch, apply whichever operation looks cheapest locally (e.g., always replace).
  - Why it fails: A locally optimal choice is not necessarily globally optimal. In "horse" → "ros", a left-to-right greedy scan cannot tell whether replacing, deleting, or inserting at a given position preserves the best alignment of the remaining suffixes; only comparing all three options at every state, as the DP does, guarantees the minimum.
- Only Using Longest Common Subsequence (LCS)
  - Wrong: Distance = m + n - 2*LCS_length.
  - Why it fails: That formula only allows inserts and deletes. A replacement costs 1, but simulating it with a delete plus an insert costs 2, so the formula can overestimate. Example: "a" → "b" has LCS length 0, so the formula gives 1 + 1 - 0 = 2, while the true edit distance is 1 (a single replace). The LCS formula is an upper bound, not the exact distance.
- Always Prefer Replace Over Delete+Insert
  - Wrong: Whenever characters differ, replace, because one operation beats two.
  - Why it fails: A replace can destroy an alignment that a single delete or insert would preserve. Example: "abc" → "bc". Deleting 'a' costs 1, but replacing at every mismatch ("abc" → "bbc" → "bcc" → "bc") costs 3.
- Ignoring Empty String Cases
  - Wrong: Start the loops from 1 without handling length-0 strings.
  - Why it fails: If one string is empty, the distance is the length of the other. A missing base case leads to wrong answers or index errors.
- Using Global Min-Cost Matching per Character
  - Wrong: For each character in word1, map it to the cheapest matching character in word2 regardless of position.
  - Why it fails: Order matters; a valid edit alignment must preserve the relative order of characters. In "kitten" → "sitting", each 't' must be matched in sequence, not to whichever 't' happens to be nearest.
3. Approach Encyclopedia
3.1 Brute Force (Recursive Exploration)
What: Enumerate all possible sequences of edits via recursion.
Why: Foundation for understanding; exponential complexity shows need for DP.
How:
function dist(i, j):
if i == 0: return j // insert j chars
if j == 0: return i // delete i chars
if word1[i-1] == word2[j-1]:
return dist(i-1, j-1) // match
else:
return 1 + min(
dist(i-1, j), // delete from word1
dist(i, j-1), // insert into word1
dist(i-1, j-1) // replace
)
Call dist(len(word1), len(word2)).
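As a runnable counterpart, here is a direct Python translation of the recursion above; the wrapper name min_distance_bruteforce is an illustrative choice, not part of the original pseudocode.

def min_distance_bruteforce(word1: str, word2: str) -> int:
    def dist(i: int, j: int) -> int:
        if i == 0:
            return j                      # insert the remaining j chars
        if j == 0:
            return i                      # delete the remaining i chars
        if word1[i - 1] == word2[j - 1]:
            return dist(i - 1, j - 1)     # match, no cost
        return 1 + min(
            dist(i - 1, j),               # delete from word1
            dist(i, j - 1),               # insert into word1
            dist(i - 1, j - 1),           # replace
        )

    return dist(len(word1), len(word2))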
Complexity Proof:
- Branching factor 3 (except when match, reduces to 1).
- Worst‑case: no matches, each call branches 3 ways.
- Recurrence: T(i, j) ≈ 3·T(i-1, j-1) + O(1) (simplified). The recursion depth is at most m + n, so O(3^(m+n)) time and O(m+n) space (call stack).
Visualization:
Recursion tree for "ab" → "ba":
dist(2,2) → min( dist(1,2), dist(2,1), dist(1,1) )
Each expands further until base cases (i=0 or j=0).
3.2 Recursive with Memoization (Top‑Down DP)
What: Cache results of dist(i,j) in a 2D array to avoid recomputation.
Why: Reduces exponential time to O(m·n); same recurrence as brute force, but each state is computed only once.
How:
memo = 2D array initialized to -1
function dist(i, j):
if memo[i][j] != -1: return memo[i][j]
// base cases as before
compute result, store in memo[i][j], return
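A compact Python version of the memoized recursion, using functools.lru_cache as the cache instead of an explicit 2D array; an equivalent sketch, not the only way to memoize.

from functools import lru_cache

def min_distance_memo(word1: str, word2: str) -> int:
    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        if word1[i - 1] == word2[j - 1]:
            return dist(i - 1, j - 1)
        return 1 + min(dist(i - 1, j),        # delete
                       dist(i, j - 1),        # insert
                       dist(i - 1, j - 1))    # replace

    # Note: recursion depth can approach m + n (up to ~1000 here),
    # which is close to Python's default recursion limit.
    return dist(len(word1), len(word2))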
Complexity Proof:
- Number of distinct states: (m+1)·(n+1).
- Each state is computed in O(1) (after its recursive calls).
- Time: O(m·n). Space: O(m·n) for the memo plus O(m+n) recursion stack.
Visualization:
Memo table fill order (example for m=3, n=3):
(0,0) (0,1) (0,2) (0,3)
(1,0) (1,1) (1,2) (1,3)
... filled via recursion dependencies (top‑down).
3.3 Bottom‑Up Dynamic Programming
What: Iteratively fill a DP table dp[i][j] from base cases upwards.
Why: Avoids recursion overhead; natural tabulation.
How:
- Create dp of size (m+1) x (n+1).
- Initialize the first row and column: dp[i][0] = i, dp[0][j] = j.
- For i from 1 to m and j from 1 to n:
  - If word1[i-1] == word2[j-1]: dp[i][j] = dp[i-1][j-1].
  - Else: dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]).
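A straightforward Python translation of this tabulation; the name min_distance_table is illustrative, and the space-optimized version appears in Section 4.

def min_distance_table(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    # dp[i][j] = edit distance between word1[:i] and word2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j                      # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]           # match: no cost
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],      # delete
                                   dp[i][j - 1],      # insert
                                   dp[i - 1][j - 1])  # replace
    return dp[m][n]

# min_distance_table("horse", "ros") == 3, matching the dp table shown below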
Complexity Proof:
- Nested loops: m × n iterations with O(1) work each → O(m·n) time.
- Space: O(m·n) for the table.
Visualization:
Example: "horse" (m=5), "ros" (n=3)
dp table (5+1)x(3+1):
Ø r o s
Ø 0 1 2 3
h 1 1 2 3
o 2 2 1 2
r 3 2 2 2
s 4 3 3 2
e 5 4 4 3
Answer: dp[5][3] = 3.
3.4 Space‑Optimized DP (Two‑Rows)
What: Since dp[i][j] depends only on previous row (i-1) and current row’s previous column, keep only two rows.
Why: Reduces space to O(min(m,n)) (if the shorter string indexes the columns).
How:
- Use two arrays prev (size n+1) and curr (size n+1).
- Iterate over the rows, compute curr from prev, then swap.
Complexity Proof:
- Time: O(m·n), unchanged.
- Space: O(n). If m < n, swap the strings to get O(min(m,n)).
Visualization:
For each i (row):
prev: [0,1,2,3] // row i-1
curr: compute from prev and word1[i-1], word2
After compute, prev = curr for next iteration.
4. Code Deep Dive
Optimal Solution (Space‑Optimized DP)
def minDistance(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    # Ensure we use O(min(m,n)) space by making word2 the shorter one
    if m < n:
        word1, word2 = word2, word1
        m, n = n, m
    # Now word1 has length m, word2 has length n, with m >= n
    # prev represents dp[i-1][*], curr represents dp[i][*]
    prev = list(range(n + 1))  # dp[0][j] = j
    for i in range(1, m + 1):
        # curr[0] = i (delete all i chars of word1's prefix)
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                # characters match, carry previous diagonal
                curr[j] = prev[j - 1]
            else:
                # 1 + min(delete, insert, replace)
                curr[j] = 1 + min(prev[j],      # delete from word1
                                  curr[j - 1],  # insert into word1
                                  prev[j - 1])  # replace
        prev = curr  # move to next row
    return prev[n]
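A quick sanity check against the two examples from the problem statement:

# Expected outputs taken from the problem examples
assert minDistance("horse", "ros") == 3
assert minDistance("intention", "execution") == 5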
Line‑by‑Line Annotations
| Line | Code | Explanation |
|---|---|---|
| 1 | def minDistance(...): | Function signature. |
| 2 | m, n = len(word1), len(word2) | Store the lengths. |
| 3-7 | if m < n: swap | Space-optimization trick: by ensuring word2 is the shorter string, the prev/curr arrays have size n+1 where n is the smaller length. Reduces memory when the strings differ greatly in length. |
| 9 | prev = list(range(n + 1)) | Initialize prev as row 0: dp[0][j] = j (insert j chars into an empty string). |
| 10 | for i in range(1, m + 1): | Iterate over rows (prefixes of word1). |
| 12 | curr = [i] + [0] * n | curr[0] = i (delete all i chars); the remaining n entries are computed next. |
| 13 | for j in range(1, n + 1): | Iterate over columns (prefixes of word2). |
| 14-16 | if word1[i-1] == word2[j-1]: curr[j] = prev[j-1] | Match case: no extra cost, take the diagonal value. |
| 17-21 | else: curr[j] = 1 + min(prev[j], curr[j-1], prev[j-1]) | Mismatch: choose the cheapest of prev[j] (delete from word1, i.e. move down in the full table), curr[j-1] (insert into word1, i.e. move left), and prev[j-1] (replace, i.e. move diagonally). |
| 22 | prev = curr | The current row becomes the previous row for the next iteration. |
| 23 | return prev[n] | After the last row, prev holds dp[m][*]; the answer is prev[n]. |
Edge Case Handling
| Example | Code Logic | Outcome |
|---|---|---|
| word1 = "", word2 = "abc" | m=0 < n=3 triggers the swap; prev = [0]; each row just sets curr = [i] with no inner iterations; return prev[0] = 3. | Correct: insert 3 chars. |
| word1 = "horse", word2 = "ros" | As in the visualization, final prev[n] = 3. | Matches the example output. |
| word1 = "abc", word2 = "abc" | All matches → curr[j] = prev[j-1], so the 0 cost propagates along the diagonal. Final prev[n] = 0. | Correct: identical strings. |
| Large strings (length 500 each) | 500 × 500 = 250k iterations; memory ~1002 integers (a few KB). | Well within limits. |
5. Complexity War Room
Hardware‑Aware Analysis
- Time: O(m·n) ≈ 250,000 cell updates. On a modern CPU (≈3 GHz) each update is a few instructions, so a compiled inner loop finishes in roughly 0.1 ms (Python adds interpreter overhead but still runs in milliseconds).
- Space (optimized): two arrays of length n+1 ≤ 501 → about 1002 integers. At 4 bytes each → ~4 KB, which fits in L1 cache (typically 32 KB).
- Cache Behavior: sequential access in the inner loop (over j) → excellent spatial locality; hardware prefetching works well.
Industry Comparison Table
| Approach | Time | Space | Readability | Interview Viability |
|---|---|---|---|---|
| Brute Force Recursion | O(3^(m+n)) | O(m+n) stack | 8/10 (clear logic) | ❌ Fails for m, n > 10 |
| Top-Down DP (Memo) | O(m·n) | O(m·n) memo + O(m+n) stack | 9/10 (easy recursive) | ✅ Acceptable, but may exceed stack depth for large m, n |
| Bottom-Up DP (Full Table) | O(m·n) | O(m·n) | 10/10 (standard) | ✅ Recommended for clarity |
| Space-Optimized DP | O(m·n) | O(min(m,n)) | 8/10 (slightly tricky) | ✅ Optimal choice for interviews |
6. Pro Mode Extras
Variants
- Edit Distance with Different Operation Costs
  Assign costs c_ins, c_del, c_rep and update the DP (a sketch follows this list):
  dp[i][j] = min(dp[i-1][j] + c_del, dp[i][j-1] + c_ins, dp[i-1][j-1] + (0 if match else c_rep))
- Damerau–Levenshtein Distance
  Adds transposition of adjacent characters as a fourth operation. The DP additionally checks word1[i-2] == word2[j-1] and word1[i-1] == word2[j-2] to allow a transposition step.
- Edit Distance with Only Insert/Delete (No Replace)
  Distance = m + n - 2*LCS_length, since every mismatch must be handled by a delete plus an insert.
- Multi-String Edit Distance
  NP-hard for more than two strings; use heuristics or DP over a higher-dimensional table.
- Approximate String Matching (Search in Text)
  Find the substring of a text with minimal edit distance to a pattern. Modify the DP: initialize the first row to 0, and the answer is min(dp[m][j]) over j.
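A minimal Python sketch of the weighted-cost variant described in the first item above; the function name weighted_edit_distance and the default costs of 1 are illustrative assumptions.

def weighted_edit_distance(word1: str, word2: str,
                           c_ins: int = 1, c_del: int = 1, c_rep: int = 1) -> int:
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    # Base cases accumulate the per-operation costs
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + c_del
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + c_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            rep = 0 if word1[i - 1] == word2[j - 1] else c_rep
            dp[i][j] = min(dp[i - 1][j] + c_del,        # delete
                           dp[i][j - 1] + c_ins,        # insert
                           dp[i - 1][j - 1] + rep)      # match/replace
    return dp[m][n]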
Interview Cheat Sheet
Must‑Mention Points:
- Define DP state – dp[i][j] = edit distance between prefixes word1[:i] and word2[:j].
- Recurrence – match, insert, delete, replace cases.
- Base cases – empty-string distances.
- Complexity – O(m·n) time, O(m·n) or O(min(m,n)) space.
- Optimization – two-row DP to save space.
Common Follow‑Ups:
- “How to reconstruct the edit sequence?” – Store backpointers during the DP, or walk the filled table backwards (a sketch follows this list).
- “What if strings are extremely long?” – Use Hirschberg’s divide-and-conquer algorithm to keep O(m·n) time with only O(min(m,n)) space, or Myers’ bit-parallel algorithm (roughly O(m·n / w) for machine word size w) for small alphabets.
- “How to handle Unicode?” – Same logic, but character comparison may need normalization.
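For the first follow-up, a rough sketch of reconstructing one optimal edit sequence by walking the full DP table backwards; the helper name min_distance_with_ops and the operation strings are illustrative, not part of the original solution.

def min_distance_with_ops(word1: str, word2: str):
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    # Walk backwards from (m, n), following whichever neighbor produced dp[i][j]
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and word1[i - 1] == word2[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1                                  # match, no operation
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"replace {word1[i - 1]} -> {word2[j - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(f"delete {word1[i - 1]}")
            i -= 1
        else:
            ops.append(f"insert {word2[j - 1]}")
            j -= 1
    return dp[m][n], ops[::-1]

# Example: min_distance_with_ops("horse", "ros") returns
# (3, ['replace h -> r', 'delete r', 'delete e']) or another equally short sequence.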
Red Flags to Avoid:
- Not handling empty strings.
- Confusing insert vs delete direction.
- Claiming greedy works.