← ~/lc-grind/posts

#1143 - Longest Common Subsequence

May 14, 2025

Longest Common Subsequence

Problem Description

Given two strings text1 and text2, return the length of their longest common subsequence. If there is no common subsequence, return 0.

A subsequence of a string is a new string generated from the original string with some characters (can be none) deleted without changing the relative order of the remaining characters.

  • For example, "ace" is a subsequence of "abcde".

A common subsequence of two strings is a subsequence that is common to both strings.

Example 1:


Input: text1 = "abcde", text2 = "ace" 
Output: 3  
Explanation: The longest common subsequence is "ace" and its length is 3.

Example 2:


Input: text1 = "abc", text2 = "abc"
Output: 3
Explanation: The longest common subsequence is "abc" and its length is 3.

Example 3:


Input: text1 = "abc", text2 = "def"
Output: 0
Explanation: There is no such common subsequence, so the result is 0.

Constraints:

  • 1 <= text1.length, text2.length <= 1000
  • text1 and text2 consist of only lowercase English characters.

Solution

1. Problem Deconstruction

Technical Restatement
Given two strings text1 (length m) and text2 (length n), determine the maximum length of a subsequence common to both. A subsequence is derived by deleting zero or more characters from a string without altering the order of the remaining characters. The solution must return 0 if no common subsequence exists.

Beginner Explanation
Imagine you have two strings, like “apple” and “apricot”. The goal is to find the longest sequence of letters that appear in both in the same order, but not necessarily consecutively. For example, “ap” is a common subsequence. The answer is the length of the longest such sequence.

Mathematical Formulation
Let S = s₁s₂...sₘ and T = t₁t₂...tₙ be two strings. Find the maximum integer k such that there exist indices i₁ < i₂ < ... < iₖ (for S) and j₁ < j₁ < ... < jₖ (for T) where s_{i_p} = t_{j_p} for all 1 ≤ p ≤ k.

Constraint Analysis

  1. 1 <= text1.length, text2.length <= 1000
    • Limits solutions to polynomial time (e.g., O(mn) DP is feasible).
    • Hidden edge case: strings of equal length with full overlap (e.g., “abc” vs “abc”).
  2. Lowercase English characters only
    • Ensures 26 possible values per character, but does not directly impact logic.
    • Edge case: strings with all identical characters (e.g., “aaaaa” vs “aa”).

2. Intuition Scaffolding

Analogies

  1. Real-World: DNA sequence alignment in bioinformatics – finding shared genetic markers in order.
  2. Gaming: Matching a hidden treasure map’s path through two forests with obstacles.
  3. Math: Maximizing alignment between two time series with ordered events.

Common Pitfalls

  1. Greedy Matching: Taking the first matching character may skip longer sequences (e.g., “abcde” vs “aec” → greedy “ae” vs optimal “ace”).
  2. Substring Confusion: Assuming characters must be contiguous.
  3. Global Caching: Storing all possible subsequences (exponential memory).
  4. Recursion Without Memoization: Leading to O(2^max(m,n)) time.
  5. Space Inefficiency: Using O(mn) space when O(n) suffices.

3. Approach Encyclopedia

Brute Force (Theoretical)

  • Pseudocode:
    def lcs(text1, text2, i, j):
        if i == 0 or j == 0:
            return 0
        if text1[i-1] == text2[j-1]:
            return 1 + lcs(text1, text2, i-1, j-1)
        else:
            return max(lcs(text1, text2, i-1, j), lcs(text1, text2, i, j-1))
    
  • Complexity: O(2^{m+n}) time (exponential), O(1) space (stack overflow likely).

Dynamic Programming (Optimal)

  • Pseudocode:
    Initialize DP table of size (m+1) x (n+1)  
    For i from 1 to m:  
        For j from 1 to n:  
            If text1[i-1] == text2[j-1]:  
                DP[i][j] = DP[i-1][j-1] + 1  
            Else:  
                DP[i][j] = max(DP[i-1][j], DP[i][j-1])  
    Return DP[m][n]  
    
  • Complexity Proof:
    • Time: O(mn) – nested loops over m and n.
    • Space: O(mn) for the table. Optimizable to O(n) using a rolling array.
  • Visualization:
       a c e  
     0 0 0 0  
    a 1 1 1  
    b 1 1 1  
    c 1 2 2  
    d 1 2 2  
    e 1 2 3  
    

4. Code Deep Dive

Optimal Solution Code

def longestCommonSubsequence(text1: str, text2: str) -> int:
    m, n = len(text1), len(text2)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if text1[i-1] == text2[j-1]:
                dp[i][j] = dp[i-1][j-1] + 1
            else:
                dp[i][j] = max(dp[i-1][j], dp[i][j-1])
    return dp[m][n]

Line-by-Line Explanation

  1. Initialization: dp table of size (m+1) x (n+1) to handle empty subsequences.
  2. Loop Indices: i and j start at 1 to avoid out-of-bounds checks.
  3. Character Match: If text1[i-1] == text2[j-1], extend the LCS from dp[i-1][j-1].
  4. No Match: Take the best of excluding text1[i-1] or text2[j-1].

Edge Case Handling

  • Example 1: text1="abcde", text2="ace"
    • dp[5][3] correctly computes 3 via incremental matches.
  • Example 3: text1="abc", text2="def"
    • All dp[i][j] remain 0.

5. Complexity War Room

Hardware-Aware Analysis

  • For m = n = 1000, the DP table uses ~4MB (1001*1001 integers × 4 bytes). Fits in CPU cache.

Industry Comparison Table

Approach Time Space Readability Interview Viability
Brute Force O(2ⁿ⁺ᵐ) O(1) 5/10
DP (2D) O(mn) O(mn) 9/10
DP (Optimized) O(mn) O(n) 7/10

6. Pro Mode Extras

Variants

  1. LCS with Three Strings: 3D DP table (dp[i][j][k]).
  2. LCS with Edit Operations: Incorporate substitutions (similar to edit distance).

Interview Cheat Sheet

  • First Mention: “This is a classic DP problem where overlapping subproblems are solved optimally.”
  • Key Insight: “The DP state tracks the LCS length up to indices i and j.”
  • Optimization Tip: “Use a rolling 1D array to reduce space complexity.”