
#72 - Edit Distance

January 16, 2026

Edit Distance

Problem Description

Given two strings word1 and word2, return the minimum number of operations required to convert word1 to word2.

You have the following three operations permitted on a word:

  • Insert a character
  • Delete a character
  • Replace a character

Example 1:


Input: word1 = "horse", word2 = "ros"
Output: 3
Explanation: 
horse -> rorse (replace 'h' with 'r')
rorse -> rose (remove 'r')
rose -> ros (remove 'e')

Example 2:


Input: word1 = "intention", word2 = "execution"
Output: 5
Explanation: 
intention -> inention (remove 't')
inention -> enention (replace 'i' with 'e')
enention -> exention (replace 'n' with 'x')
exention -> exection (replace 'n' with 'c')
exection -> execution (insert 'u')

Constraints:

  • 0 <= word1.length, word2.length <= 500
  • word1 and word2 consist of lowercase English letters.

Solution

Edit Distance - Comprehensive Solution

1. Problem Deconstruction

Rewrite the Problem

1.1 Technical Version

Given two strings word1 and word2, compute the Levenshtein distance between them: the minimum number of atomic edit operations (single-character insertions, deletions, or substitutions) required to transform word1 into word2. Formally, find a sequence of edits $E = e_1, e_2, \dots, e_k$ such that applying $E$ to word1 yields word2 and $k$ is minimized.

1.2 Beginner Version

Imagine you have two words, like “cat” and “dog.” You want to change the first word into the second by only doing three types of actions:

  1. Add a letter anywhere.
  2. Remove a letter from anywhere.
  3. Replace a letter with another letter.

Each action costs 1 step. Your goal is to find the smallest number of steps needed to change the first word into the second.

1.3 Mathematical Version

Let $A = a_1 a_2 \dots a_m$ and $B = b_1 b_2 \dots b_n$ be strings over an alphabet $\Sigma$. Define $d(i, j)$ as the edit distance between the prefixes $A[1..i]$ and $B[1..j]$. Then:

$$d(i, j) = \begin{cases} i & \text{if } j = 0 \\ j & \text{if } i = 0 \\ \min \begin{cases} d(i-1, j) + 1 & \text{(delete)} \\ d(i, j-1) + 1 & \text{(insert)} \\ d(i-1, j-1) + [a_i \neq b_j] & \text{(replace/match)} \end{cases} & \text{otherwise} \end{cases}$$

where $[a_i \neq b_j]$ is 1 if the characters differ and 0 otherwise. The goal is $d(m, n)$.
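
For a tiny worked instance (added here purely for illustration), take $A$ = "a" and $B$ = "b":

$$d(1,1) = \min\{\, d(0,1) + 1,\; d(1,0) + 1,\; d(0,0) + [\text{a} \neq \text{b}] \,\} = \min\{2,\, 2,\, 1\} = 1,$$

i.e. a single replacement.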

Constraint Analysis

  • Constraint: 0 <= word1.length, word2.length <= 500
    • Implication: the total work is bounded by an $O(mn)$ DP; $O(mn)$ time is acceptable (at most ~250k operations) and $O(mn)$ memory (~1 MB of integers) is acceptable, though it can be optimized.
    • Hidden edge cases: either string may have length 0; the distance is then the length of the other string.
  • Constraint: lowercase English letters only
    • Implication: the alphabet is a small constant (26); no Unicode or case-sensitivity concerns, and no special character handling is needed.
  • Constraint: no restriction on character frequency
    • Implication: characters may repeat arbitrarily, and there is no guarantee of common subsequences.
    • Hidden edge case: completely different strings are the worst case, with distance max(m, n).

Why Constraints Matter:

  • Upper bound 500 means an $O(mn)$ DP is fine (500 × 500 = 250,000 operations).
  • No need for fancier algorithms (e.g., Ukkonen's $O(nd)$), though they can be mentioned.
  • Memory optimization possible: $O(\min(m, n))$ space by storing only two rows.

2. Intuition Scaffolding

Generate 3 Analogies

  1. Real‑World Metaphor – Text Editing
    Imagine you’re typing a document and need to match a reference text. You can:

    • Delete a typo (cost 1).
    • Insert a missing word (cost 1).
    • Replace a word with a correct one (cost 1).
      The edit distance is the minimum number of keystrokes needed to transform your draft into the reference.
  2. Gaming Analogy – Spell Crafting
    In a fantasy game, you have a rune sequence word1 and need to produce sequence word2 using a magic wand that can:

    • Remove a rune (Delete).
    • Draw a new rune (Insert).
    • Alter a rune’s shape (Replace).
      Each action consumes 1 mana; you want the minimal mana cost.
  3. Math Analogy – Pathfinding in a Grid
    Imagine an $(m+1) \times (n+1)$ grid. Start at $(0,0)$, end at $(m,n)$. Moving right (insert) or down (delete) costs 1. Moving diagonally costs 0 if the characters match (free), else 1 (replace). The edit distance is the cheapest path cost from top-left to bottom-right; see the short shortest-path sketch after this list.
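
To make the grid analogy literal, here is a short Dijkstra-style sketch over exactly that graph (my own illustration, not part of the original write-up; the function name edit_distance_dijkstra is made up, and the DP approaches below are what you would actually use):

import heapq

def edit_distance_dijkstra(word1: str, word2: str) -> int:
    # Shortest path on the (m+1) x (n+1) alignment grid described above.
    m, n = len(word1), len(word2)
    best = {(0, 0): 0}
    heap = [(0, 0, 0)]                          # (cost so far, i, j)
    while heap:
        d, i, j = heapq.heappop(heap)
        if (i, j) == (m, n):
            return d                            # first pop of the goal is optimal
        if d > best.get((i, j), float("inf")):
            continue                            # stale heap entry
        moves = []
        if i < m:
            moves.append((i + 1, j, 1))         # down  = delete word1[i]
        if j < n:
            moves.append((i, j + 1, 1))         # right = insert word2[j]
        if i < m and j < n:
            step = 0 if word1[i] == word2[j] else 1
            moves.append((i + 1, j + 1, step))  # diagonal = match (free) or replace
        for ni, nj, w in moves:
            nd = d + w
            if nd < best.get((ni, nj), float("inf")):
                best[(ni, nj)] = nd
                heapq.heappush(heap, (nd, ni, nj))
    return best[(m, n)]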

Common Pitfalls Section

  1. Greedy on First Mismatch

    • Wrong: At the first mismatch, apply whichever operation looks cheapest locally (e.g., always replace).
    • Why it fails: Locally optimal ≠ globally optimal. Example: word1 = "ab", word2 = "b". Replacing at the first mismatch gives "bb" and still needs a deletion (2 operations), while simply deleting 'a' costs 1.
  2. Only Using Longest Common Subsequence (LCS)

    • Wrong: Distance = m + n - 2*LCS_length.
    • Why it fails: That formula counts only inserts and deletes, so it is an upper bound, not the exact distance: one replace (cost 1) can do the work of a delete plus an insert (cost 2). Example: "a" → "b": LCS length 0 gives 1 + 1 - 0 = 2, but the true distance is 1 (a single replace). See the small check after this list.
  3. Always Prefer Replace Over Delete+Insert

    • Wrong: If the current characters differ, always replace, because it is one operation instead of two.
    • Why it fails: An insert or delete can realign the remaining characters so they match for free. Example: "ab" → "bab": replacing 'a' still leaves at least one more operation (2 total), while a single insertion of 'b' at the front gives distance 1.
  4. Ignoring Empty String Cases

    • Wrong: Start the loops at 1 without handling length-0 strings.
    • Why it fails: If one string is empty, the distance is the length of the other; a missing base case leads to wrong answers or index errors.
  5. Using Global Min‑Cost for Each Character

    • Wrong: For each character in word1, map it to the cheapest matching character in word2.
    • Why it fails: Order matters; the alignment must preserve the character sequence. Example: "kitten" → "sitting": letters can only be matched left to right, so you cannot pair identical letters out of order.
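
To see pitfall 2 fail on a concrete input, here is a small self-contained check (the helper lcs_length is mine, added purely for illustration):

def lcs_length(a: str, b: str) -> int:
    # Classic O(mn) longest-common-subsequence DP.
    m, n = len(a), len(b)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if a[i - 1] == b[j - 1]:
                dp[i][j] = dp[i - 1][j - 1] + 1
            else:
                dp[i][j] = max(dp[i - 1][j], dp[i][j - 1])
    return dp[m][n]

a, b = "a", "b"
print(len(a) + len(b) - 2 * lcs_length(a, b))  # 2: the insert/delete-only estimate
# The true edit distance is 1 (a single replace), so the LCS formula overestimates.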

3. Approach Encyclopedia

3.1 Brute Force (Recursive Exploration)

What: Enumerate all possible sequences of edits via recursion.
Why: Foundation for understanding; its exponential cost shows the need for DP.
How:

def min_distance_brute(word1: str, word2: str) -> int:
    def dist(i: int, j: int) -> int:
        if i == 0:
            return j                     # insert the remaining j chars of word2
        if j == 0:
            return i                     # delete the remaining i chars of word1
        if word1[i - 1] == word2[j - 1]:
            return dist(i - 1, j - 1)    # match: no cost
        return 1 + min(
            dist(i - 1, j),              # delete from word1
            dist(i, j - 1),              # insert into word1
            dist(i - 1, j - 1),          # replace
        )
    return dist(len(word1), len(word2))

Call min_distance_brute(word1, word2); the wrapper starts the recursion at dist(len(word1), len(word2)).

Complexity Proof:

  • Branching factor 3 (except on a match, which reduces to a single branch).
  • Worst case: no matches, so every call branches 3 ways.
  • Simplified recurrence: $T(i,j) = 3 \cdot T(i-1,j-1) + O(1)$ with depth $\min(i,j)$, giving $O(3^{\min(m,n)})$ time. The call stack can reach depth $m + n$ (each call decreases $i + j$ by at least 1), so space is $O(m + n)$.

Visualization:

Recursion tree for "ab" → "ba":

dist(2,2) → 1 + min( dist(1,2), dist(2,1), dist(1,1) )   since 'b' ≠ 'a'
Each expands further until base cases (i=0 or j=0).

3.2 Recursive with Memoization (Top‑Down DP)

What: Cache the result of each dist(i, j) state (here in a dictionary keyed by (i, j)) to avoid recomputation.
Why: Reduces the exponential time to $O(mn)$; same recurrence as the brute force, but each state is solved once.
How:

memo = {}                                       # cache of computed (i, j) states
def dist(i, j):
    if i == 0: return j                         # base case: insert j chars
    if j == 0: return i                         # base case: delete i chars
    if (i, j) not in memo:
        cost = 0 if word1[i - 1] == word2[j - 1] else 1
        memo[(i, j)] = min(dist(i - 1, j) + 1,         # delete
                           dist(i, j - 1) + 1,         # insert
                           dist(i - 1, j - 1) + cost)  # replace / match
    return memo[(i, j)]

Complexity Proof:

  • Number of distinct states: $(m+1) \times (n+1)$.
  • Each state is computed once, in $O(1)$ time beyond its recursive calls.
  • Time: $O(mn)$. Space: $O(mn)$ for the memo plus an $O(m + n)$ recursion stack.

Visualization:

Memo table fill order (example for m=3, n=3):
(0,0) (0,1) (0,2) (0,3)
(1,0) (1,1) (1,2) (1,3)
... filled via recursion dependencies (top‑down).

3.3 Bottom‑Up Dynamic Programming

What: Iteratively fill a DP table dp[i][j] from base cases upwards.
Why: Avoids recursion overhead; natural tabulation.
How (a runnable sketch follows these steps):

  1. Create dp of size (m+1) x (n+1).
  2. Initialize first row/column: dp[i][0] = i, dp[0][j] = j.
  3. For i from 1 to m, j from 1 to n:
    • If word1[i-1] == word2[j-1]: dp[i][j] = dp[i-1][j-1].
    • Else: dp[i][j] = 1 + min(dp[i-1][j], dp[i][j-1], dp[i-1][j-1]).
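
These steps translate almost line-for-line into Python. The sketch below (the helper name min_distance_table is mine) is the plain tabulation before any space optimization:

def min_distance_table(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    # dp[i][j] = edit distance between word1[:i] and word2[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i                              # delete all i characters
    for j in range(n + 1):
        dp[0][j] = j                              # insert all j characters
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]       # match: free
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],      # delete
                                   dp[i][j - 1],      # insert
                                   dp[i - 1][j - 1])  # replace
    return dp[m][n]

min_distance_table("horse", "ros") returns 3, matching the table in the visualization below.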

Complexity Proof:

  • Nested loops: $m$ iterations × $n$ iterations × $O(1)$ work per cell → $O(mn)$ time.
  • Space: $O(mn)$ for the table.

Visualization:

Example: "horse" (m=5), "ros" (n=3)

dp table (5+1)x(3+1):

   Ø  r  o  s
Ø  0  1  2  3
h  1  1  2  3
o  2  2  1  2
r  3  2  2  2
s  4  3  3  2
e  5  4  4  3

Answer: dp[5][3] = 3.

3.4 Space‑Optimized DP (Two‑Rows)

What: Since dp[i][j] depends only on the previous row (i-1) and the current row's previous column, keep only two rows.
Why: Reduces space to $O(\min(m, n))$ (if the shorter string is laid out horizontally).
How:

  • Use arrays prev (size n+1) and curr (size n+1).
  • Iterate over rows, update curr using prev, then swap.

Complexity Proof:

  • Time: $O(mn)$, unchanged.
  • Space: $O(2(n+1)) = O(n)$. If m < n, swap the strings to get $O(\min(m,n))$.

Visualization:

For each i (row):
prev: [0,1,2,3]  // row i-1
curr: compute from prev and word1[i-1], word2
After computing curr, set prev = curr for the next iteration.

4. Code Deep Dive

Optimal Solution (Space‑Optimized DP)

def minDistance(word1: str, word2: str) -> int:
    m, n = len(word1), len(word2)
    # Ensure we use O(min(m,n)) space by making word2 the shorter one
    if m < n:
        word1, word2 = word2, word1
        m, n = n, m
    # Now word2 length = n, word1 length = m, with m >= n
    # prev represents dp[i-1][*], curr represents dp[i][*]
    prev = list(range(n + 1))  # dp[0][j] = j
    for i in range(1, m + 1):
        # curr[0] = i (delete all i chars of word1 prefix)
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            if word1[i - 1] == word2[j - 1]:
                # characters match, carry previous diagonal
                curr[j] = prev[j - 1]
            else:
                # 1 + min(delete, insert, replace)
                curr[j] = 1 + min(prev[j],       # delete from word1
                                  curr[j - 1],   # insert into word1
                                  prev[j - 1])   # replace
        prev = curr  # move to next row
    return prev[n]
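
A quick sanity check, simply exercising the function above against the problem's examples and the edge cases discussed below:

print(minDistance("horse", "ros"))            # 3
print(minDistance("intention", "execution"))  # 5
print(minDistance("", "abc"))                 # 3 (insert three characters)
print(minDistance("abc", "abc"))              # 0 (identical strings)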

Line‑by‑Line Annotations

  • Line 1 – def minDistance(...): – function signature.
  • Line 2 – m, n = len(word1), len(word2) – store the lengths.
  • Lines 3-7 – if m < n: swap – space-optimization trick: by making word2 the shorter string, the prev/curr arrays have size n + 1 where n is the smaller length, which reduces memory when the strings differ greatly in length.
  • Line 9 – prev = list(range(n + 1)) – initialize prev as row 0: dp[0][j] = j (insert j characters into an empty string).
  • Line 10 – for i in range(1, m + 1): – iterate over rows (prefixes of word1).
  • Line 12 – curr = [i] + [0] * n – curr[0] = i (delete all i characters); the remaining n entries are computed next.
  • Line 13 – for j in range(1, n + 1): – iterate over columns (prefixes of word2).
  • Lines 14-16 – if word1[i-1] == word2[j-1]: curr[j] = prev[j-1] – match case: no extra cost, take the diagonal value.
  • Lines 17-21 – else: curr[j] = 1 + min(prev[j], curr[j-1], prev[j-1]) – mismatch: choose the cheapest of prev[j] (delete from word1; the cell above in the full table), curr[j-1] (insert into word1; the cell to the left), and prev[j-1] (replace; the diagonal).
  • Line 22 – prev = curr – the current row becomes the previous row for the next iteration.
  • Line 23 – return prev[n] – after the last row, prev holds dp[m][*]; the answer is prev[n].

Edge Case Handling

  • word1 = "", word2 = "abc": since m < n the strings are swapped, the outer loop fills curr = [i] with an empty inner loop, and prev ends as [3]; the function returns 3. Correct: insert 3 characters.
  • word1 = "horse", word2 = "ros": as in the visualization, the final prev[n] = 3, matching the example output.
  • word1 = "abc", word2 = "abc": every character matches, so curr[j] = prev[j-1] propagates 0 along the diagonal; the final prev[n] = 0. Correct: identical strings.
  • Both strings of length 500: 500 × 500 = 250,000 iterations and roughly 1,000 stored integers, well within limits.

5. Complexity War Room

Hardware‑Aware Analysis

  • Time: $O(mn) = 500 \times 500 = 250{,}000$ table cells. Each cell is only a few operations, so the DP finishes well under a millisecond in a compiled language and comfortably within limits even in CPython.
  • Space (optimized): two arrays of length n + 1 ≤ 501 → about 1,000 integers, a few KB, which fits in L1 cache (typically 32 KB).
  • Cache behavior: the inner loop (over j) accesses the arrays sequentially → excellent spatial locality; hardware prefetching works well.

Industry Comparison Table

  • Brute Force Recursion – time $O(3^{\min(m,n)})$, space $O(m+n)$ call stack, readability 8/10 (clear logic). Interview viability: ❌ fails for m, n beyond roughly 10.
  • Top-Down DP (Memo) – time $O(mn)$, space $O(mn)$ memo plus recursion stack, readability 9/10 (easy to write recursively). Interview viability: ✅ acceptable, but may hit recursion limits for large m, n.
  • Bottom-Up DP (Full Table) – time $O(mn)$, space $O(mn)$, readability 10/10 (the standard solution). Interview viability: ✅ recommended for clarity.
  • Space-Optimized DP – time $O(mn)$, space $O(\min(m,n))$, readability 8/10 (slightly trickier indexing). Interview viability: ✅ the optimal choice for interviews.

6. Pro Mode Extras

Variants Section

  1. Edit Distance with Different Operation Costs

    • Assign costs c_ins, c_del, c_rep (the base cases scale by the respective costs). Update the DP as below; a sketch follows this list:
      dp[i][j] = min(dp[i-1][j] + c_del, dp[i][j-1] + c_ins, dp[i-1][j-1] + (0 if match else c_rep))
  2. Damerau‑Levenshtein Distance

    • Adds transposition (swapping two adjacent characters) as a fourth operation. The DP additionally checks word1[i-2] == word2[j-1] and word1[i-1] == word2[j-2] and, when both hold, allows dp[i-2][j-2] + 1 (also sketched after this list).
  3. Edit Distance with Only Insert/Delete (No Replace)

    • Without replace, distance = m + n - 2*LCS(word1, word2): every mismatched character must be deleted and the target character re-inserted.
  4. Multi‑String Edit Distance

    • NP‑hard for >2 strings; heuristics or dynamic programming on higher‑dimension table.
  5. Approximate String Matching (Search in text)

    • Find the substring of text with minimal edit distance to the pattern. Modify the DP: initialize the first row to 0 (a match may start anywhere in the text) and take min(dp[m][j]) over j as the answer (sketched below).
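
For variant 1, a minimal sketch with configurable operation costs (my own illustration; the function name weighted_edit_distance and the default costs are assumptions, not part of the problem):

def weighted_edit_distance(word1: str, word2: str,
                           c_ins: int = 1, c_del: int = 1, c_rep: int = 1) -> int:
    # Same DP as before, but each operation carries its own cost.
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i * c_del                     # delete everything in word1[:i]
    for j in range(1, n + 1):
        dp[0][j] = j * c_ins                     # insert everything in word2[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            rep = 0 if word1[i - 1] == word2[j - 1] else c_rep
            dp[i][j] = min(dp[i - 1][j] + c_del,     # delete
                           dp[i][j - 1] + c_ins,     # insert
                           dp[i - 1][j - 1] + rep)   # replace / match
    return dp[m][n]

With all three costs equal to 1 this reduces to the standard Levenshtein DP.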
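
For variant 2, a sketch of the restricted ("optimal string alignment") form that adds adjacent transpositions; the full Damerau–Levenshtein distance needs a slightly more elaborate table, so treat this as illustrative:

def osa_distance(word1: str, word2: str) -> int:
    # Levenshtein DP plus an adjacent-transposition transition.
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if word1[i - 1] == word2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # delete
                           dp[i][j - 1] + 1,          # insert
                           dp[i - 1][j - 1] + cost)   # replace / match
            if (i > 1 and j > 1
                    and word1[i - 1] == word2[j - 2]
                    and word1[i - 2] == word2[j - 1]):
                dp[i][j] = min(dp[i][j], dp[i - 2][j - 2] + 1)  # transpose
    return dp[m][n]

For example, osa_distance("ab", "ba") is 1 (one transposition), while the plain edit distance is 2.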
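
For variant 5, a sketch of approximate substring search (the function and parameter names are mine):

def min_distance_to_substring(pattern: str, text: str) -> int:
    # dp[i][j] = smallest edit distance between pattern[:i] and a substring of text ending at position j.
    m, n = len(pattern), len(text)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        dp[i][0] = i                     # matching the empty substring costs i deletions
    # dp[0][j] stays 0: a match may start at any position in the text
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if pattern[i - 1] == text[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,
                           dp[i][j - 1] + 1,
                           dp[i - 1][j - 1] + cost)
    return min(dp[m])                    # best distance over all possible end positions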

Interview Cheat Sheet

Must‑Mention Points:

  1. Define the DP state – dp[i][j] = edit distance between prefixes word1[:i] and word2[:j].
  2. Recurrence – match, insert, delete, and replace cases.
  3. Base cases – distances involving the empty string.
  4. Complexity – time $O(mn)$, space $O(mn)$ or $O(\min(m,n))$.
  5. Optimization – two-row DP to save space.

Common Follow‑Ups:

  • “How do you reconstruct the edit sequence?” – Keep backpointers (or re-derive the argmin) while filling the DP table, then walk backwards from dp[m][n]; a small sketch follows below.
  • “What if the strings are extremely long?” – Use Hirschberg's divide-and-conquer algorithm for $O(mn)$ time and $O(\min(m,n))$ space, or Myers' bit-parallel algorithm for small alphabets.
  • “How to handle Unicode?” – Same logic, but character comparison may need normalization.
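
For the reconstruction follow-up, a sketch that walks the filled table backwards (my own illustration; it rebuilds the full table because reconstruction needs all of it, so the two-row optimization does not apply here):

def edit_script(word1: str, word2: str) -> list:
    # Returns one optimal sequence of operations converting word1 into word2.
    m, n = len(word1), len(word2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if word1[i - 1] == word2[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1, dp[i][j - 1] + 1, dp[i - 1][j - 1] + cost)
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and word1[i - 1] == word2[j - 1] and dp[i][j] == dp[i - 1][j - 1]:
            i, j = i - 1, j - 1                                   # match: no operation
        elif i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + 1:
            ops.append(f"replace {word1[i - 1]!r} with {word2[j - 1]!r}")
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(f"delete {word1[i - 1]!r}")
            i -= 1
        else:
            ops.append(f"insert {word2[j - 1]!r}")
            j -= 1
    return ops[::-1]                                              # operations in forward order

The number of returned operations always equals dp[m][n]; for "horse" → "ros" it returns three operations.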

Red Flags to Avoid:

  • Not handling empty strings.
  • Confusing insert vs delete direction.
  • Claiming greedy works.