← ~/lc-grind/posts

#139 - Word Break

May 15, 2025

Word Break

Problem Description

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.

Note that the same word in the dictionary may be reused multiple times in the segmentation.

Example 1:


Input: s = "leetcode", wordDict = ["leet","code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".

Example 2:


Input: s = "applepenapple", wordDict = ["apple","pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
Note that you are allowed to reuse a dictionary word.

Example 3:


Input: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output: false

Constraints:

  • 1 <= s.length <= 300
  • 1 <= wordDict.length <= 1000
  • 1 <= wordDict[i].length <= 20
  • s and wordDict[i] consist of only lowercase English letters.
  • All the strings of wordDict are unique.

Solution

1. Problem Deconstruction

Technical Restatement
Given string s and dictionary wordDict (unique words), determine if s can be partitioned into contiguous substrings where each substring exists in wordDict. Words can be reused.

Beginner-Friendly Version
Can you split the input string into parts where each part is a valid word from the given list? You can use the same word multiple times.

Mathematical Formulation
Let s = s₁s₂...sₙ. Define dp[i] as true iff substring s[1..i] can be segmented using wordDict. Then:

dp[0] = true (base case: empty string)  
dp[i] = ⋁_{j=0}^{i-1} (dp[j] ∧ s[j+1..i] ∈ wordDict) for 1 ≤ i ≤ n  

Return dp[n].

Constraint Analysis

  1. 1 ≤ s.length ≤ 300:
    • Allows O(n²) solutions but rules out exponential brute force.
    • Edge case: Entire string is one dictionary word.
  2. 1 ≤ wordDict.length ≤ 1000:
    • Preprocessing to hash set for O(1) lookups is feasible.
  3. wordDict[i].length ≤ 20:
    • Enables optimization by limiting substring checks to last 20 characters.
  4. All words unique:
    • No need to handle duplicate entries in the dictionary.

2. Intuition Scaffolding

Real-World Analogy
Imagine assembling a jigsaw puzzle where each piece is a word from the dictionary. The puzzle is complete when all string characters are covered without gaps.

Gaming Analogy
Like unlocking levels in a game, where each level (substring) requires a specific key (dictionary word). The game is won if you unlock all levels sequentially.

Math Analogy
Covering the interval [0, n] with subintervals [j, i] where each interval’s label is a dictionary word.

Common Pitfalls

  1. Greedy Matching: Choosing longest word first may block valid splits.
  2. Overlooking Overlaps: Not checking all possible split points before current index.
  3. Ignoring Word Reuse: Assuming each word can be used only once.
  4. Case Sensitivity: Problem states all lowercase, but code must match exact casing.
  5. Empty String Edge Case: While constraints forbid s=“”, code must handle dp[0]=true.

3. Approach Encyclopedia

Brute Force (Recursive)
Check all possible splits recursively:

def wordBreak(s, wordDict):  
    word_set = set(wordDict)  
    def dfs(start):  
        if start == len(s): return True  
        for end in range(start+1, len(s)+1):  
            if s[start:end] in word_set and dfs(end):  
                return True  
        return False  
    return dfs(0)  

Complexity: O(2ⁿ) time (exponential splits), O(n) stack space.

Memoization
Cache results for subproblems:

from functools import lru_cache  
def wordBreak(s, wordDict):  
    word_set = set(wordDict)  
    @lru_cache  
    def dfs(start):  
        if start == len(s): return True  
        for end in range(start+1, len(s)+1):  
            if s[start:end] in word_set and dfs(end):  
                return True  
        return False  
    return dfs(0)  

Complexity: O(n²) time (n states, each checked up to n times), O(n) space.

Dynamic Programming (Optimal)
Iteratively build segmentation validity:

def wordBreak(s, wordDict):  
    word_set = set(wordDict)  
    max_len = max(map(len, word_set)) if word_set else 0  
    n = len(s)  
    dp = [False]*(n+1)  
    dp[0] = True  
    for i in range(1, n+1):  
        start = max(0, i - max_len)  
        for j in range(start, i):  
            if dp[j] and s[j:i] in word_set:  
                dp[i] = True  
                break  
    return dp[n]  

Complexity: O(n * L) time, where L = max word length. Space O(n).

Visualization

s = "catsand" → indices 0 1 2 3 4 5 6  
wordDict = ["cat","cats","sand"]  

DP Array Progress:  
[0]:T → [3]:T (cat) → [4]:T (cats) → [7]:T (sand)  
Final dp[6] = False (need dp[7] for full string).  

4. Code Deep Dive

Line-by-Line Explanation

def wordBreak(s, wordDict):  
    word_set = set(wordDict)  # O(m) conversion for O(1) lookups  
    max_len = max(map(len, word_set)) if word_set else 0  # Handle empty dict  
    n = len(s)  
    dp = [False]*(n+1)  # dp[0..n]  
    dp[0] = True  # Empty string base case  
    for i in range(1, n+1):  # Build dp[1] to dp[n]  
        start = max(0, i - max_len)  # Prune j checks beyond word lengths  
        for j in range(start, i):  
            if dp[j] and s[j:i] in word_set:  
                dp[i] = True  
                break  # Early exit if valid split found  
    return dp[n]  

Edge Case Handling

  • Example 2 (“applepenapple”):
    i=5 checks “apple” (0-5), i=8 checks “pen” (5-8), i=13 checks “apple” (8-13).
  • Example 3 (“catsandog”):
    Fails because “andog” can’t be split into valid words.

5. Complexity War Room

Hardware Considerations

  • For n=300, DP uses ~300 bytes (booleans), fitting in L1 cache.
  • Hash set lookups are O(1), optimized via Python’s built-in sets.

Approach Comparison

Method Time Space Readability Interview Viability
Brute Force O(2ⁿ) O(n) 7/10
Memoization O(n²) O(n) 8/10
DP (Optimal) O(n*L) O(n) 9/10 ✅✅

6. Pro Mode Extras

Variants

  1. Word Break II (LC 140): Return all possible sentences.
  2. Ternary Search Tree: Optimize dictionary lookups for streaming data.
  3. Multi-Language: Handle Unicode characters with trie compression.

Interview Cheat Sheet

  • First Mention: Time/space complexity of your approach.
  • Key Insight: DP[j] represents valid segmentation up to index j.
  • Gotcha: Always check if the dictionary is empty early.