#139 - Word Break
Word Break
- Difficulty: Medium
- Topics: Array, Hash Table, String, Dynamic Programming, Trie, Memoization
- Link: https://leetcode.com/problems/word-break/
Problem Description
Given a string s
and a dictionary of strings wordDict
, return true
if s
can be segmented into a space-separated sequence of one or more dictionary words.
Note that the same word in the dictionary may be reused multiple times in the segmentation.
Example 1:
Input: s = "leetcode", wordDict = ["leet","code"]
Output: true
Explanation: Return true because "leetcode" can be segmented as "leet code".
Example 2:
Input: s = "applepenapple", wordDict = ["apple","pen"]
Output: true
Explanation: Return true because "applepenapple" can be segmented as "apple pen apple".
Note that you are allowed to reuse a dictionary word.
Example 3:
Input: s = "catsandog", wordDict = ["cats","dog","sand","and","cat"]
Output: false
Constraints:
1 <= s.length <= 300
1 <= wordDict.length <= 1000
1 <= wordDict[i].length <= 20
s
andwordDict[i]
consist of only lowercase English letters.- All the strings of
wordDict
are unique.
Solution
1. Problem Deconstruction
Technical Restatement
Given string s
and dictionary wordDict
(unique words), determine if s
can be partitioned into contiguous substrings where each substring exists in wordDict
. Words can be reused.
Beginner-Friendly Version
Can you split the input string into parts where each part is a valid word from the given list? You can use the same word multiple times.
Mathematical Formulation
Let s = s₁s₂...sₙ
. Define dp[i]
as true iff substring s[1..i]
can be segmented using wordDict
. Then:
dp[0] = true (base case: empty string)
dp[i] = ⋁_{j=0}^{i-1} (dp[j] ∧ s[j+1..i] ∈ wordDict) for 1 ≤ i ≤ n
Return dp[n]
.
Constraint Analysis
- 1 ≤ s.length ≤ 300:
- Allows O(n²) solutions but rules out exponential brute force.
- Edge case: Entire string is one dictionary word.
- 1 ≤ wordDict.length ≤ 1000:
- Preprocessing to hash set for O(1) lookups is feasible.
- wordDict[i].length ≤ 20:
- Enables optimization by limiting substring checks to last 20 characters.
- All words unique:
- No need to handle duplicate entries in the dictionary.
2. Intuition Scaffolding
Real-World Analogy
Imagine assembling a jigsaw puzzle where each piece is a word from the dictionary. The puzzle is complete when all string characters are covered without gaps.
Gaming Analogy
Like unlocking levels in a game, where each level (substring) requires a specific key (dictionary word). The game is won if you unlock all levels sequentially.
Math Analogy
Covering the interval [0, n] with subintervals [j, i] where each interval’s label is a dictionary word.
Common Pitfalls
- Greedy Matching: Choosing longest word first may block valid splits.
- Overlooking Overlaps: Not checking all possible split points before current index.
- Ignoring Word Reuse: Assuming each word can be used only once.
- Case Sensitivity: Problem states all lowercase, but code must match exact casing.
- Empty String Edge Case: While constraints forbid s=“”, code must handle dp[0]=true.
3. Approach Encyclopedia
Brute Force (Recursive)
Check all possible splits recursively:
def wordBreak(s, wordDict):
word_set = set(wordDict)
def dfs(start):
if start == len(s): return True
for end in range(start+1, len(s)+1):
if s[start:end] in word_set and dfs(end):
return True
return False
return dfs(0)
Complexity: O(2ⁿ) time (exponential splits), O(n) stack space.
Memoization
Cache results for subproblems:
from functools import lru_cache
def wordBreak(s, wordDict):
word_set = set(wordDict)
@lru_cache
def dfs(start):
if start == len(s): return True
for end in range(start+1, len(s)+1):
if s[start:end] in word_set and dfs(end):
return True
return False
return dfs(0)
Complexity: O(n²) time (n states, each checked up to n times), O(n) space.
Dynamic Programming (Optimal)
Iteratively build segmentation validity:
def wordBreak(s, wordDict):
word_set = set(wordDict)
max_len = max(map(len, word_set)) if word_set else 0
n = len(s)
dp = [False]*(n+1)
dp[0] = True
for i in range(1, n+1):
start = max(0, i - max_len)
for j in range(start, i):
if dp[j] and s[j:i] in word_set:
dp[i] = True
break
return dp[n]
Complexity: O(n * L) time, where L = max word length. Space O(n).
Visualization
s = "catsand" → indices 0 1 2 3 4 5 6
wordDict = ["cat","cats","sand"]
DP Array Progress:
[0]:T → [3]:T (cat) → [4]:T (cats) → [7]:T (sand)
Final dp[6] = False (need dp[7] for full string).
4. Code Deep Dive
Line-by-Line Explanation
def wordBreak(s, wordDict):
word_set = set(wordDict) # O(m) conversion for O(1) lookups
max_len = max(map(len, word_set)) if word_set else 0 # Handle empty dict
n = len(s)
dp = [False]*(n+1) # dp[0..n]
dp[0] = True # Empty string base case
for i in range(1, n+1): # Build dp[1] to dp[n]
start = max(0, i - max_len) # Prune j checks beyond word lengths
for j in range(start, i):
if dp[j] and s[j:i] in word_set:
dp[i] = True
break # Early exit if valid split found
return dp[n]
Edge Case Handling
- Example 2 (“applepenapple”):
i=5
checks “apple” (0-5),i=8
checks “pen” (5-8),i=13
checks “apple” (8-13). - Example 3 (“catsandog”):
Fails because “andog” can’t be split into valid words.
5. Complexity War Room
Hardware Considerations
- For n=300, DP uses ~300 bytes (booleans), fitting in L1 cache.
- Hash set lookups are O(1), optimized via Python’s built-in sets.
Approach Comparison
Method | Time | Space | Readability | Interview Viability |
---|---|---|---|---|
Brute Force | O(2ⁿ) | O(n) | 7/10 | ❌ |
Memoization | O(n²) | O(n) | 8/10 | ✅ |
DP (Optimal) | O(n*L) | O(n) | 9/10 | ✅✅ |
6. Pro Mode Extras
Variants
- Word Break II (LC 140): Return all possible sentences.
- Ternary Search Tree: Optimize dictionary lookups for streaming data.
- Multi-Language: Handle Unicode characters with trie compression.
Interview Cheat Sheet
- First Mention: Time/space complexity of your approach.
- Key Insight: DP[j] represents valid segmentation up to index j.
- Gotcha: Always check if the dictionary is empty early.