Longest Substring without repeat

Hero Image

DT

Dhaval Trivedi

Co-founder, Airtribe

Problem Overview

In this problem, we must find the longest substring without repeating characters from a given input string. A substring is a contiguous sequence of characters within a string. Our task is to determine the length of such a substring, which does not contain any repeating characters.

The problem statement is often phrased as:

Given a string s, find the length of the longest substring without repeating characters.

Understanding the Problem

Before jumping into code, let's break down the problem. We want to identify the maximum length of any substring in a string that doesn't contain duplicate characters. Consider the following example:

  • Input: "abcabcbb"
  • Output: 3
  • Explanation: The answer is "abc", with the length of 3.

Here's another example:

  • Input: "bbbbb"
  • Output: 1
  • Explanation: The answer is "b", with the length of 1.

These examples highlight that the longest substring can be part of different regions of the string and may not include every unique letter of the alphabet.

Key DSA Concepts and Theory

To solve this problem efficiently, we need to leverage the concept of:

  1. Sliding Window Technique: This is a commonly used technique when dealing with substring problems. It involves moving a 'window' over subsets of data to check or analyze those windows without having to rescind what is inside the window repeatedly. This way, we can maintain a subset of elements that potentially could be our answer while moving through the input.

  2. Hashing: Using a data structure like a hash set can help in verifying if an element exists or not in constant time. For this problem, having a quick lookup to check if a character has already been added to our current substrings is essential.

Solution Approach

To solve the problem efficiently, a two-pointer method with a hash map or set will be used, often referred to as the sliding window technique. We'll write the solution in both C++ and Java.

Steps:

  1. Use two pointers: left and right which will define the current window string.
  2. Start with left at 0 and right also at 0. As you expand right, whenever you encounter a duplicate character (one that already exists in the current scope of the window), you shift left to narrow down the window until no duplicates exist.
  3. Use a set to track characters in the current window and easily identify when a character should cause a contraction from the left side.
  4. Move right each time and update the length of the longest substring found if the current window's length exceeds it.

C++ Example

#include <iostream>
#include <unordered_set>
#include <string>

int lengthOfLongestSubstring(std::string s) {
    std::unordered_set<char> charSet;
    int left = 0, maxLen = 0;

    for (int right = 0; right < s.length(); right++) {
        while (charSet.find(s[right]) != charSet.end()) {
            charSet.erase(s[left]);
            left++;
        }
        charSet.insert(s[right]);
        maxLen = std::max(maxLen, right - left + 1);
    }
    return maxLen;
}

int main() {
    std::string s = "abcabcbb";
    std::cout << lengthOfLongestSubstring(s) << std::endl; // Output: 3
    return 0;
}

Java Example

import java.util.HashSet;

public class Solution {
    public int lengthOfLongestSubstring(String s) {
        HashSet<Character> charSet = new HashSet<>();
        int left = 0, maxLen = 0;

        for (int right = 0; right < s.length(); right++) {
            while (charSet.contains(s.charAt(right))) {
                charSet.remove(s.charAt(left));
                left++;
            }
            charSet.add(s.charAt(right));
            maxLen = Math.max(maxLen, right - left + 1);
        }
        return maxLen;
    }

    public static void main(String[] args) {
        Solution sol = new Solution();
        System.out.println(sol.lengthOfLongestSubstring("abcabcbb")); // Output: 3
    }
}

Time and Space Complexity Analysis

  • Time Complexity: O(n), where n is the length of the string. Each character is visited at most twice (once by right and once by left pointers).

  • Space Complexity: O(min(m, n)), where m is the size of the character set (like 26 for only lowercase English characters) because we store characters in a set, and n is the length of the string. This accounts for the characters currently in our window.

Common Mistakes to Avoid

  • Forgetting to move the left pointer forward when encountering a duplicate can lead to an infinite loop.
  • Not updating the maxLen variable with the potential new maximum length found in each iteration.
  • Make sure to check and handle strings with distinct characters and very short strings to prevent index errors.

Similar Problems on LeetCode

Here are some related problems that can enhance understanding of the sliding window technique:

Additional Resources and References

  • LeetCode discussion forum for alternative solutions and peer insights.
  • "Introduction to Algorithms" book by Thomas H. Cormen for foundational concepts related to data structures and algorithms.
  • Software-related StackExchange for elaborating on sliding window and hash set data structures.

By understanding and practicing this approach, one can master solving problems involving finding substrings efficiently in strings.