#1143 - Longest Common Subsequence
Longest Common Subsequence
- Difficulty: Medium
- Topics: String, Dynamic Programming
- Link: https://leetcode.com/problems/longest-common-subsequence/
Problem Description
Given two strings text1
and text2
, return the length of their longest common subsequence. If there is no common subsequence, return 0
.
A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.
- For example,
"ace"
is a subsequence of"abcde"
.
A common subsequence of two strings is a subsequence that is common to both strings.
Example 1:
Input: text1 = "abcde", text2 = "ace"
Output: 3
Explanation: The longest common subsequence is "ace" and its length is 3.
Example 2:
Input: text1 = "abc", text2 = "abc"
Output: 3
Explanation: The longest common subsequence is "abc" and its length is 3.
Example 3:
Input: text1 = "abc", text2 = "def"
Output: 0
Explanation: There is no such common subsequence, so the result is 0.
Constraints:
1 <= text1.length, text2.length <= 1000
text1
andtext2
consist of only lowercase English characters.
Solution
1. Problem Deconstruction
Technical Restatement
Given two strings text1
(length m
) and text2
(length n
), determine the maximum length of a subsequence common to both. A subsequence is derived by deleting zero or more characters from a string without altering the order of the remaining characters. The solution must return 0
if no common subsequence exists.
Beginner Explanation
Imagine you have two strings, like “apple” and “apricot”. The goal is to find the longest sequence of letters that appear in both in the same order, but not necessarily consecutively. For example, “ap” is a common subsequence. The answer is the length of the longest such sequence.
Mathematical Formulation
Let S = s₁s₂...sₘ
and T = t₁t₂...tₙ
be two strings. Find the maximum integer k
such that there exist indices i₁ < i₂ < ... < iₖ
(for S
) and j₁ < j₁ < ... < jₖ
(for T
) where s_{i_p} = t_{j_p}
for all 1 ≤ p ≤ k
.
Constraint Analysis
1 <= text1.length, text2.length <= 1000
- Limits solutions to polynomial time (e.g., O(mn) DP is feasible).
- Hidden edge case: strings of equal length with full overlap (e.g., “abc” vs “abc”).
- Lowercase English characters only
- Ensures 26 possible values per character, but does not directly impact logic.
- Edge case: strings with all identical characters (e.g., “aaaaa” vs “aa”).
2. Intuition Scaffolding
Analogies
- Real-World: DNA sequence alignment in bioinformatics – finding shared genetic markers in order.
- Gaming: Matching a hidden treasure map’s path through two forests with obstacles.
- Math: Maximizing alignment between two time series with ordered events.
Common Pitfalls
- Greedy Matching: Taking the first matching character may skip longer sequences (e.g., “abcde” vs “aec” → greedy “ae” vs optimal “ace”).
- Substring Confusion: Assuming characters must be contiguous.
- Global Caching: Storing all possible subsequences (exponential memory).
- Recursion Without Memoization: Leading to O(2^max(m,n)) time.
- Space Inefficiency: Using O(mn) space when O(n) suffices.
3. Approach Encyclopedia
Brute Force (Theoretical)
- Pseudocode:
def lcs(text1, text2, i, j): if i == 0 or j == 0: return 0 if text1[i-1] == text2[j-1]: return 1 + lcs(text1, text2, i-1, j-1) else: return max(lcs(text1, text2, i-1, j), lcs(text1, text2, i, j-1))
- Complexity: O(2^{m+n}) time (exponential), O(1) space (stack overflow likely).
Dynamic Programming (Optimal)
- Pseudocode:
Initialize DP table of size (m+1) x (n+1) For i from 1 to m: For j from 1 to n: If text1[i-1] == text2[j-1]: DP[i][j] = DP[i-1][j-1] + 1 Else: DP[i][j] = max(DP[i-1][j], DP[i][j-1]) Return DP[m][n]
- Complexity Proof:
- Time: O(mn) – nested loops over
m
andn
. - Space: O(mn) for the table. Optimizable to O(n) using a rolling array.
- Time: O(mn) – nested loops over
- Visualization:
a c e 0 0 0 0 a 1 1 1 b 1 1 1 c 1 2 2 d 1 2 2 e 1 2 3
4. Code Deep Dive
Optimal Solution Code
def longestCommonSubsequence(text1: str, text2: str) -> int:
m, n = len(text1), len(text2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if text1[i-1] == text2[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
return dp[m][n]
Line-by-Line Explanation
- Initialization:
dp
table of size(m+1) x (n+1)
to handle empty subsequences. - Loop Indices:
i
andj
start at 1 to avoid out-of-bounds checks. - Character Match: If
text1[i-1] == text2[j-1]
, extend the LCS fromdp[i-1][j-1]
. - No Match: Take the best of excluding
text1[i-1]
ortext2[j-1]
.
Edge Case Handling
- Example 1:
text1="abcde", text2="ace"
dp[5][3]
correctly computes 3 via incremental matches.
- Example 3:
text1="abc", text2="def"
- All
dp[i][j]
remain 0.
- All
5. Complexity War Room
Hardware-Aware Analysis
- For
m = n = 1000
, the DP table uses ~4MB (1001*1001 integers × 4 bytes). Fits in CPU cache.
Industry Comparison Table
Approach | Time | Space | Readability | Interview Viability |
---|---|---|---|---|
Brute Force | O(2ⁿ⁺ᵐ) | O(1) | 5/10 | ❌ |
DP (2D) | O(mn) | O(mn) | 9/10 | ✅ |
DP (Optimized) | O(mn) | O(n) | 7/10 | ✅ |
6. Pro Mode Extras
Variants
- LCS with Three Strings: 3D DP table (
dp[i][j][k]
). - LCS with Edit Operations: Incorporate substitutions (similar to edit distance).
Interview Cheat Sheet
- First Mention: “This is a classic DP problem where overlapping subproblems are solved optimally.”
- Key Insight: “The DP state tracks the LCS length up to indices
i
andj
.” - Optimization Tip: “Use a rolling 1D array to reduce space complexity.”