Word Break

Hero Image

DT

Dhaval Trivedi

Co-founder, Airtribe

Problem Overview

In this article, we will explore the "Word Break" problem from LeetCode, a classic example that deals with string manipulation and utilizes dynamic programming for efficient solutions. The aim is to determine if a given string can be segmented into a space-separated sequence of one or more dictionary words. Here is the LeetCode link to the problem.

Problem Statement:

Given a string s and a dictionary of strings wordDict, return true if s can be segmented into a space-separated sequence of one or more dictionary words.

Example:

Input: s = "leetcode", wordDict = ["leet", "code"]
Output: true

Input: s = "applepenapple", wordDict = ["apple", "pen"]
Output: true

Understanding the Problem

The task is straightforward: we have to verify if it's possible to break up the input string into smaller strings, each of which must be found in the provided wordDict. The goal is then to return true if such a segmentation is possible, otherwise, return false.

A naive approach might try all possible segmentations, leading to inefficiency. Hence, a systematic, optimized approach using dynamic programming becomes essential.

Key DSA Concepts and Theory

Dynamic Programming

Dynamic Programming (DP) is an optimization technique that solves problems by breaking them down into simpler subproblems and storing the subproblem results to avoid recalculating them. This retains calculated results using either a top-down or bottom-up approach.

Memoization: Storing results of expensive function calls and returning the cached result when the same inputs occur again.

Tabulation: Solving subproblems first and using their solutions to build solutions to bigger subproblems. Often functions with overlapping subproblems and optimal substructure are candidates for dynamic programming.

In the Word Break problem, DP is used to check systematically if a subsection of the string can be formed using dictionary words, building up to solve the complete problem.

Solution Approach

The core idea is to use a boolean array dp where dp[i] is true if the string s up to index i can be segmented into dictionary words. This approach uses tabulation for dynamic programming.

Step-by-step Explanation:

  1. Initialization:

    • Create a boolean array dp of size n+1 (where n is the length of the string s) initialized to false.
    • Set dp[0] to true because an empty string can always be segmented trivially.
  2. Iteration:

    • Iterate over each index i from 1 to n.
    • For each i, iterate over each index j from 0 to i.
    • For each pair (i, j), check if dp[j] is true and the substring s[j:i] exists in wordDict. If yes, set dp[i] to true and break out of the loop as we found a valid segmentation.
  3. Result:

    • The value at dp[n] will give the answer – true if it is possible to segment the entire string, otherwise false.

C++ Code:

#include <iostream>
#include <vector>
#include <unordered_set>
using namespace std;

bool wordBreak(string s, vector<string>& wordDict) {
    unordered_set<string> wordSet(wordDict.begin(), wordDict.end());
    vector<bool> dp(s.size() + 1, false);
    dp[0] = true;
    
    for (int i = 1; i <= s.size(); ++i) {
        for (int j = 0; j < i; ++j) {
            if (dp[j] && wordSet.find(s.substr(j, i - j)) != wordSet.end()) {
                dp[i] = true;
                break;
            }
        }
    }
    return dp[s.size()];
}

Java Code:

import java.util.List;
import java.util.Set;
import java.util.HashSet;

public class WordBreak {
    public boolean wordBreak(String s, List<String> wordDict) {
        Set<String> wordSet = new HashSet<>(wordDict);
        boolean[] dp = new boolean[s.length() + 1];
        dp[0] = true;
        
        for (int i = 1; i <= s.length(); i++) {
            for (int j = 0; j < i; j++) {
                if (dp[j] && wordSet.contains(s.substring(j, i))) {
                    dp[i] = true;
                    break;
                }
            }
        }
        return dp[s.length()];
    }
}

Time and Space Complexity Analysis

  • Time Complexity: O(n^2 * m), where n is the length of the string s and m is the maximum length of a word in wordDict. This results from the nested loop over the string indices and the time taken for substring lookups and dictionary checks.

  • Space Complexity: O(n), due to the dp array used to store solutions to subproblems.

Common Mistakes to Avoid

  • Failing to initialize dp[0]: You must set dp[0] = true to account for the possibility of forming an empty prefix.
  • Overlooking substring bounds: When using s.substr(j, i-j), ensure the indices remain valid.
  • Inefficient lookups: Utilize a hash set for wordDict to enable O(1) average time complexity lookups instead of O(m).

Similar Problems on LeetCode

Additional Resources and References

This article should provide a solid overview of solving the Word Break problem using dynamic programming techniques with an emphasis on understanding the underlying concepts and considerations involved.