Is Subsequence

Difficulty: Easy
Topics: Two Pointers, String, Dynamic Programming
Link: https://leetcode.com/problems/is-subsequence/

Problem Description

Given two strings s and t, return true if s is a subsequence of t, or false otherwise.

A subsequence of a string is a new string that is formed from the original string by deleting some (can be none) of the characters without disturbing the relative positions of the remaining characters. (i.e., "ace" is a subsequence of "abcde" while "aec" is not).

Example 1:

Input: s = "abc", t = "ahbgdc"
Output: true

Example 2:

Input: s = "axc", t = "ahbgdc"
Output: false

Constraints:

0 <= s.length <= 100
0 <= t.length <= 104
s and t consist only of lowercase English letters.

Follow up: Suppose there are lots of incoming s, say s1, s2, ..., sk where k >= 109, and you want to check one by one to see if t has its subsequence. In this scenario, how would you change your code?

Solution

1. Problem Deconstruction (500+ words)

Technical Version: Given two strings $s$ and $t$ , determine if there exists an increasing sequence of indices $i_1 < i_2 < \dots < i_k$ in $t$ such that $t[i_j] = s[j]$ for all $j$ from $1$ to $|s|$ , where $|s|$ denotes the length of string $s$ . The algorithm must return a boolean value indicating whether such a sequence exists, with time and space complexity optimized for the given constraints.

Beginner Version: Imagine you have two strings. The first string (s) is like a pattern you’re trying to find in the second string (t). You’re allowed to remove some characters from t, but you must keep the remaining characters in the same order. If you can get exactly s by doing this, then s is a “subsequence” of t. Your task is to write a program that checks if this is possible.

Mathematical Version: Let $s = s_1s_2\dots s_m$ and $t = t_1t_2\dots t_n$ be strings over alphabet $\Sigma$ . We say $s$ is a subsequence of $t$ if there exists a strictly increasing function $f: \{1,\dots,m\} \rightarrow \{1,\dots,n\}$ such that for all $i \in \{1,\dots,m\}$ , we have $s_i = t_{f(i)}$ .

Constraint Analysis:

$0 \leq |s| \leq 100$ : This relatively small upper bound means we can consider algorithms with $O(|s| \times |t|)$ time complexity without performance issues.
$0 \leq |t| \leq 10^4$ : While larger than $|s|$ , this is still manageable for linear or near-linear algorithms. An $O(|t|)$ solution would handle the maximum input size efficiently.
Both strings consist only of lowercase English letters: This 26-character alphabet allows for potential optimizations using character indexing or precomputation.
Hidden edge cases: Empty string $s$ (always a subsequence), $s$ longer than $t$ (impossible), $s$ equals $t$ (trivially true), and all characters of $s$ appearing in $t$ but in wrong order.

2. Intuition Scaffolding

Real-World Metaphor: Imagine you’re a detective trying to verify if a suspect’s alibi matches security camera footage. The alibi (string s) lists specific events in order: “entered store → bought coffee → left store.” The security footage (string t) shows many events throughout the day. You need to check if you can find those specific events in the exact same order in the footage, even if other unrelated events happened in between.

Gaming Analogy: Think of a rhythm game where you must hit specific notes (string s) in the correct sequence during a song (string t). Many other notes appear on screen, but you only care about hitting the target notes in order. If you can tap all the target notes as they appear (in sequence), you succeed. Missing any target note or hitting them out of order means failure.

Math Analogy: Consider two sequences of numbers where we want to determine if the first sequence appears as an “ordered subset” of the second. This is equivalent to checking if we can find an order-preserving injection from the indices of s to the indices of t that preserves element equality.

Common Pitfalls:

Checking character counts only - Verifying that all characters in s appear in t is necessary but not sufficient (order matters)
Using sliding window incorrectly - A fixed window won’t work since matching characters can be arbitrarily far apart
Greedy matching from the end - This can miss valid subsequences that use earlier occurrences
Two-pointer confusion - Moving the wrong pointer when characters don’t match
Forgetting empty string case - The empty string is always a subsequence of any string
Assuming sorted strings - The problem doesn’t require strings to be sorted

3. Approach Encyclopedia

Approach 1: Two-Pointer Iteration (Optimal)

What: Use two pointers to traverse both strings simultaneously, matching characters in order.

Why: Leverages the natural ordering constraint of subsequences while maintaining $O(|t|)$ time complexity.

How:

Initialize s_index = 0, t_index = 0
While s_index < |s| AND t_index < |t|:
    If s[s_index] equals t[t_index]:
        Move s_index forward (found match)
    Always move t_index forward
Return TRUE if s_index reached end of s, else FALSE

Complexity Proof:

Time: $O(|t|)$ - Each character in t is visited at most once
Space: $O(1)$ - Only two integer variables needed

Visualization:

s = "abc", t = "ahbgdc"

Step 0: s_ptr↑(a) | t_ptr↑(a) → MATCH → s_ptr=1, t_ptr=1
Step 1: s_ptr↑(b) | t_ptr↑(h) → NO MATCH → t_ptr=2
Step 2: s_ptr↑(b) | t_ptr↑(b) → MATCH → s_ptr=2, t_ptr=3
Step 3: s_ptr↑(c) | t_ptr↑(g) → NO MATCH → t_ptr=4
Step 4: s_ptr↑(c) | t_ptr↑(d) → NO MATCH → t_ptr=5
Step 5: s_ptr↑(c) | t_ptr↑(c) → MATCH → s_ptr=3 DONE

Approach 2: Dynamic Programming (For Follow-up)

What: Preprocess t to create a next-character table for efficient repeated queries.

Why: Optimizes for the follow-up scenario with many s queries.

How:

Preprocessing:
Create dp[i][c] = next position of char c in t starting from index i
Fill table from right to left

Query:
For each char in s, use table to find next occurrence
If any char not found, return FALSE

Complexity Proof:

Preprocessing: $O(26 \times |t|) = O(|t|)$
Query: $O(|s|)$ per string
Space: $O(26 \times |t|) = O(|t|)$

4. Code Deep Dive

def isSubsequence(s: str, t: str) -> bool:
    # Edge case: empty s is always subsequence
    if not s:
        return True
    
    # Initialize pointers for both strings
    s_ptr, t_ptr = 0, 0
    
    # Traverse both strings
    while s_ptr < len(s) and t_ptr < len(t):
        # Current characters match - move s pointer
        if s[s_ptr] == t[t_ptr]:
            s_ptr += 1
        # Always move t pointer to search for next match
        t_ptr += 1
    
    # Success if we matched all characters in s
    return s_ptr == len(s)

Edge Case Handling:

Empty s (line 3-4): Returns True immediately
s longer than t: Loop terminates by t_ptr condition, returns False
No matches found: s_ptr never reaches end, returns False
Partial matches: Only returns True if ALL characters matched

5. Complexity War Room

Hardware-Aware Analysis:

At maximum |t| = 10,000, the two-pointer approach uses ~80KB for string storage + 24 bytes for pointers
Fits comfortably in L1 cache (typically 32-64KB)
Branch prediction benefits from sequential memory access pattern

Industry Comparison Table:

Approach	Time	Space	Readability	Interview Viability
Two-Pointer	$O(	t	)$	$O(1)$
DP Preprocessing	$O(	t	+ k	s
Recursive DFS	$O(2^{	t	})$	$O(
Binary Search*	$O(	s	\log	t

*After preprocessing character positions

6. Pro Mode Extras

Follow-up Solution for Multiple Queries:

class SubsequenceChecker:
    def __init__(self, t: str):
        self.t = t
        self.n = len(t)
        # Precompute next position for each character from each index
        self.dp = [[-1] * 26 for _ in range(self.n + 1)]
        
        # Fill DP table from right to left
        for i in range(self.n - 1, -1, -1):
            for c in range(26):
                self.dp[i][c] = self.dp[i + 1][c]
            self.dp[i][ord(t[i]) - ord('a')] = i
    
    def is_subsequence(self, s: str) -> bool:
        pos = 0
        for char in s:
            idx = ord(char) - ord('a')
            if pos > self.n or self.dp[pos][idx] == -1:
                return False
            pos = self.dp[pos][idx] + 1
        return True

Variants Section:

Number of Distinct Subsequences: Count how many distinct ways s appears as subsequence in t
Longest Common Subsequence: Find the longest string that is subsequence of both strings
Shortest Uncommon Subsequence: Find shortest string that is subsequence of s but not t

Interview Cheat Sheet:

First Mention: Always state time/space complexity before coding
Edge Cases: Empty strings, s longer than t, duplicate characters
Optimization Path: Two-pointer → Binary search → DP preprocessing
Testing Strategy: Test with examples + empty + full match + no match cases

#392 - Is Subsequence