Woodstock Blog

a tech blog for general algorithmic interview questions

[Amazon] Longest Repeating Substring

Question

link

Finding the longest repeated substring.

Example: “banana” ==> “ana”

Solution

There are 2 solutions: Suffix array, and Suffix tree.

1. Suffix array. Simple code, explained here.

Bentley’s programming pearl book has the simplest implementation (less than 15 lines code) which sort all suffix, and then check common prefix length among adjacent suffix. The time complexity is O(n2logn) for sorting the suffix (which has avg length of O(n)).

A detailed step-by-step explanation:

str = banana, its suffixes are:
banana
anana
nana
ana
na
a

after sort, the suffix array looks like:

a
ana
anana
banana
na
nana

Then for each two adjacent suffixes, check the length of the common prefix.

The answer is “ana” (if overlapping is allowed, otherwise, should be “an”).

2. Suffix tree. Suggest by this post, Or this:

a good solution is to create a suffix tree for the given word and then find the deepest internal node in that tree (node with at least 2 descendants under it)…

For a nice PPT presentation about suffix tree, look here.

Code

Suffix array approach.

public String longestRepeat(String input) {
    int len = input.length();
    String[] suffixArray = new String[len];
    for (int i = 0; i < len; i++) {
        suffixArray[i] = input.substring(i);
    }
    // now sort the suffix array
    Arrays.sort(suffixArray);
    String longest = "";
    // start to compare neighborhood suffixes, and check LCP
    for (int i = 0; i < suffixArray.length - 1; i++) {
        String lcp = longestCommonPrefix(suffixArray[i], suffixArray[i + 1]);
        if (lcp.length() > longest.length()) {
            longest = lcp;
        }
    }
    return longest;
}

private String longestCommonPrefix(String s1, String s2) {
    int p = 0;
    while (p < s1.length() && p < s2.length()) {
        if (s1.charAt(p) != s2.charAt(p)) {
            break;
        }
        p++;
    }
    return s1.substring(0, p);
}