#72 - Edit Distance
Edit Distance
- Difficulty: Medium
- Topics: String, Dynamic Programming
- Link: https://leetcode.com/problems/edit-distance/
Problem Description
Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.
You have the following three operations permitted on a word:
- Insert a character
- Delete a character
- Replace a character
Example 1:
Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation:
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')
Example 2:
Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation:
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')
Constraints:
- 0 <= word1.length, word2.length <= 500
- word1 and word2 consist of lowercase English letters.
Solution
Edit Distance - Comprehensive Solution
1. Problem Deconstruction
Rewrite the Problem
1.1 Technical Version
Given two strings word1 and word2, compute the Levenshtein distance between them—the minimum number of atomic edit operations (single-character insertions, deletions, or substitutions) required to transform word1 into word2. Formally, find a sequence of edit operations $e_1, e_2, \dots, e_k$ such that applying them to word1 yields word2 and $k$ is minimized.
1.2 Beginner Version
Imagine you have two words, like “cat” and “dog.” You want to change the first word into the second by only doing three types of actions:
- Add a letter anywhere.
- Remove a letter from anywhere.
- Replace a letter with another letter.
Each action costs 1 step. Your goal is to find the smallest number of steps needed to change the first word into the second.
1.3 Mathematical Version
Let word1 $= a_1 a_2 \dots a_m$ and word2 $= b_1 b_2 \dots b_n$ be strings over alphabet $\Sigma$. Define $D(i, j)$ as the edit distance between prefixes $a_1 \dots a_i$ and $b_1 \dots b_j$. Then:

$$
D(i, j) =
\begin{cases}
j & \text{if } i = 0 \\
i & \text{if } j = 0 \\
\min\bigl(D(i-1, j) + 1,\ D(i, j-1) + 1,\ D(i-1, j-1) + \delta(a_i, b_j)\bigr) & \text{otherwise}
\end{cases}
$$

where $\delta(a_i, b_j)$ is 1 if the characters differ, else 0. The goal is $D(m, n)$.
Constraint Analysis
| Constraint | Implication | Hidden Edge Cases |
|---|---|---|
| 0 <= word1.length, word2.length <= 500 | O(m·n) DP is acceptable (at most 250,000 cell updates); O(m·n) memory (~1 MB of integers) is acceptable but can be optimized. | Empty strings: the distance equals the length of the non-empty string (if one length is 0, the answer is the other length). |
| Lowercase English letters only | Alphabet size is a constant (26); no Unicode or case-sensitivity concerns. | All characters come from a small set; no special handling needed. |
| No constraints on character frequency | Characters may repeat arbitrarily; there is no guarantee of common subsequences. | Worst case: completely different strings → distance = max(m, n). |
Why Constraints Matter:
- The upper bound of 500 means O(m·n) DP is fine (500 × 500 = 250,000 operations).
- No need for fancier algorithms (e.g., Ukkonen’s O(d·min(m,n)) cut-off algorithm, where d is the distance), though they can be mentioned.
- Memory optimization is possible: O(min(m,n)) space by storing only two rows.
2. Intuition Scaffolding
Three Analogies
- Real-World Metaphor – Text Editing
  Imagine you’re typing a document and need to match a reference text. You can:
  - Delete a typo (cost 1).
  - Insert a missing word (cost 1).
  - Replace a word with the correct one (cost 1).
  The edit distance is the minimal number of keystrokes to transform your draft into the reference.
- Gaming Analogy – Spell Crafting
  In a fantasy game, you have a rune sequence word1 and need to produce the sequence word2 using a magic wand that can:
  - Remove a rune (Delete).
  - Draw a new rune (Insert).
  - Alter a rune’s shape (Replace).
  Each action consumes 1 mana; you want the minimal mana cost.
- Math Analogy – Pathfinding in a Grid
  Imagine an (m+1) × (n+1) grid. Start at (0, 0) and end at (m, n). Moving right (insert) or down (delete) costs 1. Moving diagonally costs 0 if the corresponding characters match, else 1 (replace). The edit distance is the cheapest path cost from the top-left to the bottom-right corner.
Common Pitfalls
- Greedy on First Mismatch
  - Wrong: At the first mismatch, apply whichever operation looks cheapest locally (e.g., always replace).
  - Why it fails: A locally optimal choice is not necessarily globally optimal. In "horse" → "ros", a left-to-right greedy scan cannot tell whether replacing, deleting, or inserting at a given position preserves the best alignment of the remaining suffixes; only comparing all three options at every state, as the DP does, guarantees the minimum.
- Only Using Longest Common Subsequence (LCS)
  - Wrong: Distance = m + n - 2*LCS_length.
  - Why it fails: That formula only allows inserts and deletes. A replacement costs 1, but simulating it with a delete plus an insert costs 2, so the formula can overestimate. Example: "a" → "b" has LCS length 0, so the formula gives 1 + 1 - 0 = 2, while the true edit distance is 1 (a single replace). The LCS formula is an upper bound, not the exact distance.
- Always Prefer Replace Over Delete+Insert
  - Wrong: Whenever characters differ, replace, because one operation beats two.
  - Why it fails: A replace can destroy an alignment that a single delete or insert would preserve. Example: "abc" → "bc". Deleting 'a' costs 1, but replacing at every mismatch ("abc" → "bbc" → "bcc" → "bc") costs 3.
- Ignoring Empty String Cases
  - Wrong: Start the loops from 1 without handling length-0 strings.
  - Why it fails: If one string is empty, the distance is the length of the other. A missing base case leads to wrong answers or index errors.
- Using Global Min-Cost Matching per Character
  - Wrong: For each character in word1, map it to the cheapest matching character in word2 regardless of position.
  - Why it fails: Order matters; a valid edit alignment must preserve the relative order of characters. In "kitten" → "sitting", each 't' must be matched in sequence, not to whichever 't' happens to be nearest.
3. Approach Encyclopedia
3.1 Brute Force (Recursive Exploration)
What: Enumerate all possible sequences of edits via recursion.
Why: Foundation for understanding; exponential complexity shows need for DP.
How:
function dist(i, j):
if i == 0: return j // insert j chars
if j == 0: return i // delete i chars
if word1[i-1] == word2[j-1]:
return dist(i-1, j-1) // match
else:
return 1 + min(
dist(i-1, j), // delete from word1
dist(i, j-1), // insert into word1
dist(i-1, j-1) // replace
)
Call dist(len(word1), len(word2)).
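As a runnable counterpart, here is a direct Python translation of the recursion above; the wrapper name min_distance_bruteforce is an illustrative choice, not part of the original pseudocode.

def min_distance_bruteforce(word1: str, word2: str) -> int:
    def dist(i: int, j: int) -> int:
        if i == 0:
            return j                      # insert the remaining j chars
        if j == 0:
            return i                      # delete the remaining i chars
        if word1[i - 1] == word2[j - 1]:
            return dist(i - 1, j - 1)     # match, no cost
        return 1 + min(
            dist(i - 1, j),               # delete from word1
            dist(i, j - 1),               # insert into word1
            dist(i - 1, j - 1),           # replace
        )

    return dist(len(word1), len(word2))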
Complexity Proof:
- Branching factor 3 (except when match, reduces to 1).
- Worst‑case: no matches, each call branches 3 ways.
- Recurrence: T(i, j) ≈ 3·T(i-1, j-1) + O(1) (simplified). The recursion depth is at most m + n, so O(3^(m+n)) time and O(m+n) space (call stack).
Visualization:
Recursion tree for "ab" → "ba":
dist(2,2) → min( dist(1,2), dist(2,1), dist(1,1) )
Each expands further until base cases (i=0 or j=0).
3.2 Recursive with Memoization (Top‑Down DP)
What: Cache results of dist(i,j) in a 2D array to avoid recomputation.
Why: Reduces exponential time to O(m·n); same recurrence as brute force, but each state is computed only once.
How:
memo = 2D array initialized to -1
function dist(i, j):
if memo[i][j] != -1: return memo[i][j]
// base cases as before
compute result, store in memo[i][j], return
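A compact Python version of the memoized recursion, using functools.lru_cache as the cache instead of an explicit 2D array; an equivalent sketch, not the only way to memoize.

from functools import lru_cache

def min_distance_memo(word1: str, word2: str) -> int:
    @lru_cache(maxsize=None)
    def dist(i: int, j: int) -> int:
        if i == 0:
            return j
        if j == 0:
            return i
        if word1[i - 1] == word2[j - 1]:
            return dist(i - 1, j - 1)
        return 1 + min(dist(i - 1, j),        # delete
                       dist(i, j - 1),        # insert
                       dist(i - 1, j - 1))    # replace

    # Note: recursion depth can approach m + n (up to ~1000 here),
    # which is close to Python's default recursion limit.
    return dist(len(word1), len(word2))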
Complexity Proof:
- Number of distinct states: (m+1)·(n+1).
- Each state is computed in O(1) (after its recursive calls).
- Time: O(m·n). Space: O(m·n) for the memo plus O(m+n) recursion stack.
Visualization:
Memo table fill order (example for m=3, n=3):
(0,0) (0,1) (0,2) (0,3)
(1,0) (1,1) (1,2) (1,3)
... filled via recursion dependencies (top‑down).
3.3 Bottom‑Up Dynamic Programming
What: Iteratively fill a DP table dp[i][j] from base cases upwards.
Why: Avoids recursion overhead; natural tabulation.
How:
- Create dp of size (m+1) x (n+1).
- Initialize the first row and column: dp[i][0] = i, dp[0][j] = j.
- For i from 1 to m and j from 1 to n:
  - If word1[i-1] == word2[j-1]: dp[i][j] = dp[i-1][j-1].
  - Else: dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]).
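A straightforward Python translation of this tabulation; the name min_distance_table is illustrative, and the space-optimized version appears in Section 4.

def min_distance_table(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    # dp[i][j] = edit distance between word1[:i] and word2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                      # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j                      # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]           # match: no cost
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],      # delete
                                   dp[i][j - 1],      # insert
                                   dp[i - 1][j - 1])  # replace
    return dp[m][n]

# min_distance_table("horse", "ros") == 3, matching the dp table shown below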
Complexity Proof:
- Nested loops: m × n iterations with O(1) work each → O(m·n) time.
- Space: O(m·n) for the table.
Visualization:
Example: "horse" (m=5), "ros" (n=3)
dp table (5+1)x(3+1):
Ø r o s
Ø 0 1 2 3
h 1 1 2 3
o 2 2 1 2
r 3 2 2 2
s 4 3 3 2
e 5 4 4 3
Answer: dp[5][3] = 3.
3.4 Space‑Optimized DP (Two‑Rows)
What: Since dp[i][j] depends only on previous row (i-1) and current row’s previous column, keep only two rows.
Why: Reduces space to O(min(m,n)) (if the shorter string indexes the columns).
How:
- Use two arrays prev (size n+1) and curr (size n+1).
- Iterate over the rows, compute curr from prev, then swap.
Complexity Proof:
- Time: O(m·n), unchanged.
- Space: O(n). If m < n, swap the strings to get O(min(m,n)).
Visualization:
For each i (row):
prev: [0,1,2,3] // row i-1
curr: compute from prev and word1[i-1], word2
After compute, prev = curr for next iteration.
4. Code Deep Dive
Optimal Solution (Space‑Optimized DP)
def minDistance(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    # Ensure we use O(min(m,n)) space by making word2 the shorter one
    if m < n:
        word1, word2 = word2, word1
        m, n = n, m
    # Now word1 has length m, word2 has length n, with m >= n
    # prev represents dp[i-1][*], curr represents dp[i][*]
    prev = list(range(n + 1))  # dp[0][j] = j
    for i in range(1, m + 1):
        # curr[0] = i (delete all i chars of word1's prefix)
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                # characters match, carry previous diagonal
                curr[j] = prev[j - 1]
            else:
                # 1 + min(delete, insert, replace)
                curr[j] = 1 + min(prev[j],      # delete from word1
                                  curr[j - 1],  # insert into word1
                                  prev[j - 1])  # replace
        prev = curr  # move to next row
    return prev[n]
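A quick sanity check against the two examples from the problem statement:

# Expected outputs taken from the problem examples
assert minDistance("horse", "ros") == 3
assert minDistance("intention", "execution") == 5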
Line‑by‑Line Annotations
| Line | Code | Explanation |
|---|---|---|
| 1 | def minDistance(...): | Function signature. |
| 2 | m, n = len(word1), len(word2) | Store the lengths. |
| 3-7 | if m < n: swap | Space-optimization trick: by ensuring word2 is the shorter string, the prev/curr arrays have size n+1 where n is the smaller length. Reduces memory when the strings differ greatly in length. |
| 9 | prev = list(range(n + 1)) | Initialize prev as row 0: dp[0][j] = j (insert j chars into an empty string). |
| 10 | for i in range(1, m + 1): | Iterate over rows (prefixes of word1). |
| 12 | curr = [i] + [0] * n | curr[0] = i (delete all i chars); the remaining n entries are computed next. |
| 13 | for j in range(1, n + 1): | Iterate over columns (prefixes of word2). |
| 14-16 | if word1[i-1] == word2[j-1]: curr[j] = prev[j-1] | Match case: no extra cost, take the diagonal value. |
| 17-21 | else: curr[j] = 1 + min(prev[j], curr[j-1], prev[j-1]) | Mismatch: choose the cheapest of prev[j] (delete from word1, i.e. move down in the full table), curr[j-1] (insert into word1, i.e. move left), and prev[j-1] (replace, i.e. move diagonally). |
| 22 | prev = curr | The current row becomes the previous row for the next iteration. |
| 23 | return prev[n] | After the last row, prev holds dp[m][*]; the answer is prev[n]. |
Edge Case Handling
| Example | Code Logic | Outcome |
|---|---|---|
| word1 = "", word2 = "abc" | m=0 < n=3 triggers the swap; prev = [0]; each row just sets curr = [i] with no inner iterations; return prev[0] = 3. | Correct: insert 3 chars. |
| word1 = "horse", word2 = "ros" | As in the visualization, final prev[n] = 3. | Matches the example output. |
| word1 = "abc", word2 = "abc" | All matches → curr[j] = prev[j-1], so the 0 cost propagates along the diagonal. Final prev[n] = 0. | Correct: identical strings. |
| Large strings (length 500 each) | 500 × 500 = 250k iterations; memory ~1002 integers (a few KB). | Well within limits. |
5. Complexity War Room
Hardware‑Aware Analysis
- Time: O(m·n) ≈ 250,000 cell updates. On a modern CPU (≈3 GHz) each update is a few instructions, so a compiled inner loop finishes in roughly 0.1 ms (Python adds interpreter overhead but still runs in milliseconds).
- Space (optimized): two arrays of length n+1 ≤ 501 → about 1002 integers. At 4 bytes each → ~4 KB, which fits in L1 cache (typically 32 KB).
- Cache Behavior: sequential access in the inner loop (over j) → excellent spatial locality; hardware prefetching works well.
Industry Comparison Table
| Approach | Time | Space | Readability | Interview Viability |
|---|---|---|---|---|
| Brute Force Recursion | O(3^(m+n)) | O(m+n) stack | 8/10 (clear logic) | ❌ Fails for m, n > 10 |
| Top-Down DP (Memo) | O(m·n) | O(m·n) memo + O(m+n) stack | 9/10 (easy recursive) | ✅ Acceptable, but may exceed stack depth for large m, n |
| Bottom-Up DP (Full Table) | O(m·n) | O(m·n) | 10/10 (standard) | ✅ Recommended for clarity |
| Space-Optimized DP | O(m·n) | O(min(m,n)) | 8/10 (slightly tricky) | ✅ Optimal choice for interviews |
6. Pro Mode Extras
Variants
- Edit Distance with Different Operation Costs
  Assign costs c_ins, c_del, c_rep and update the DP (a sketch follows this list):
  dp[i][j] = min(dp[i-1][j] + c_del, dp[i][j-1] + c_ins, dp[i-1][j-1] + (0 if match else c_rep))
- Damerau–Levenshtein Distance
  Adds transposition of adjacent characters as a fourth operation. The DP additionally checks word1[i-2] == word2[j-1] and word1[i-1] == word2[j-2] to allow a transposition step.
- Edit Distance with Only Insert/Delete (No Replace)
  Distance = m + n - 2*LCS_length, since every mismatch must be handled by a delete plus an insert.
- Multi-String Edit Distance
  NP-hard for more than two strings; use heuristics or DP over a higher-dimensional table.
- Approximate String Matching (Search in Text)
  Find the substring of a text with minimal edit distance to a pattern. Modify the DP: initialize the first row to 0, and the answer is min(dp[m][j]) over j.
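A minimal Python sketch of the weighted-cost variant described in the first item above; the function name weighted_edit_distance and the default costs of 1 are illustrative assumptions.

def weighted_edit_distance(word1: str, word2: str,
                           c_ins: int = 1, c_del: int = 1, c_rep: int = 1) -> int:
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    # Base cases accumulate the per-operation costs
    for i in range(1, m + 1):
        dp[i][0] = dp[i - 1][0] + c_del
    for j in range(1, n + 1):
        dp[0][j] = dp[0][j - 1] + c_ins
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            rep = 0 if word1[i - 1] == word2[j - 1] else c_rep
            dp[i][j] = min(dp[i - 1][j] + c_del,        # delete
                           dp[i][j - 1] + c_ins,        # insert
                           dp[i - 1][j - 1] + rep)      # match/replace
    return dp[m][n]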
Interview Cheat Sheet
Must‑Mention Points:
- Define DP state – dp[i][j] = edit distance between prefixes word1[:i] and word2[:j].
- Recurrence – match, insert, delete, replace cases.
- Base cases – empty-string distances.
- Complexity – O(m·n) time, O(m·n) or O(min(m,n)) space.
- Optimization – two-row DP to save space.
Common Follow‑Ups:
- “How to reconstruct the edit sequence?” – Store backpointers during the DP, or walk the filled table backwards (a sketch follows this list).
- “What if strings are extremely long?” – Use Hirschberg’s divide-and-conquer algorithm to keep O(m·n) time with only O(min(m,n)) space, or Myers’ bit-parallel algorithm (roughly O(m·n / w) for machine word size w) for small alphabets.
- “How to handle Unicode?” – Same logic, but character comparison may need normalization.
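For the first follow-up, a rough sketch of reconstructing one optimal edit sequence by walking the full DP table backwards; the helper name min_distance_with_ops and the operation strings are illustrative, not part of the original solution.

def min_distance_with_ops(word1: str, word2: str):
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    # Walk backwards from (m, n), following whichever neighbor produced dp[i][j]
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and word1[i - 1] == word2[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1                                  # match, no operation
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"replace {word1[i - 1]} -> {word2[j - 1]}")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(f"delete {word1[i - 1]}")
            i -= 1
        else:
            ops.append(f"insert {word2[j - 1]}")
            j -= 1
    return dp[m][n], ops[::-1]

# Example: min_distance_with_ops("horse", "ros") returns
# (3, ['replace h -> r', 'delete r', 'delete e']) or another equally short sequence.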
Red Flags to Avoid:
- Not handling empty strings.
- Confusing insert vs delete direction.
- Claiming greedy works.