17.2 Variant 2 Compute the longest sequence of characters that is a subsequence of A and B

#1

Hi,

I have problems solving that, I’m able to use Levensthein distance algorithm to show what is the length of sequence but I’m unable to show the array.

int getLevDistDP(std::string const& curr, int currIdx, std::string const& orig, int origIdx, std::vector<std::vector<int>>& cache) {

if (currIdx < 0) {
        return 0;
    } else if (origIdx < 0) {
        return 0;
    }
    if (cache[currIdx][origIdx] == -1) {
        if (curr[currIdx] == orig[origIdx]) {
            cache[currIdx][origIdx] = 1 + getLevDistDP(curr, currIdx - 1, orig, origIdx - 1, cache);
        } else {
            int addToTheEnd = getLevDistDP(curr, currIdx, orig, origIdx - 1, cache);
            int removeFromEnd = getLevDistDP(curr, currIdx - 1, orig, origIdx, cache);
            int substituteLast = getLevDistDP(curr, currIdx - 1, orig, origIdx - 1, cache);
            cache[currIdx][origIdx] = std::max({addToTheEnd, removeFromEnd, substituteLast});
        }
    }
    return cache[currIdx][origIdx];
    }

int getLevensteinDistance(std::string const& current, std::string const& original) {
std::vector<std::vector<int>> cache(original.size(), std::vector<int>(current.size(), -1));
return getLevDistDP(current, current.size() - 1, original, original.size() - 1, cache);
}

Basically I don’t know how to convey the vector or string of characters which would be the result. Any help appreciated.

Thanks,

0 Likes

#2

@Madi

I recommend you pick two strings and manually compute the longest subsequence of characters. Then apply your algorithm to them, and print out the cache. Next, see if you notice a pattern in the cache, with respect to the values it contains for the characters of the manually computed sequence. Also, for each of those characters, see if you notice anything interesting about the values of adjacent positions in the cache.

Before dealing with reconstructing the subsequence though, you may want to first transform your top-down recursive solution into a bottom-up one.

0 Likes

#3

@lgtout Thanks for reply,
Well I can see that going backwards on the path that is optimal will have to go thou equal characters. So I rewrote algorithm to keep decision made (insert, delete, sunstitute), and than I used that to go backward:

enum Decision {
    INSERT,
    DELETE,
    SUB,
    NUL
};

struct decision {
    int val;
    Decision dec;
};

std::vector<char> traverseBackwards(std::string const& curr, std::string const& orig, int currIdx, int origIdx, std::vector<std::vector<decision>>& cache) {
    std::vector<char> result;
    while (currIdx>=0 && origIdx>=0) {
        if (curr[currIdx] == orig[origIdx]) {
                result.push_back(curr[currIdx]);
        }
 
        if (cache[currIdx][origIdx].dec == SUB) {
            --currIdx;--origIdx;

        } else if (cache[currIdx][origIdx].dec == DELETE) {
            --currIdx;
        } else {
            --origIdx;
        }

    }
    return result;
}
decision getLevDistDP(std::string const& curr, int currIdx, std::string const& orig, int origIdx, std::vector<std::vector<decision>>& cache) {

    if (currIdx < 0) {
        return decision{1 + origIdx, INSERT};
    } else if (origIdx < 0) {
        return decision{1 + currIdx, INSERT};
    }
    if (cache[currIdx][origIdx].val == -1) {
        if (curr[currIdx] == orig[origIdx]) {
            cache[currIdx][origIdx] = getLevDistDP(curr, currIdx - 1, orig, origIdx - 1, cache);
        } else {
            decision addToTheEnd = getLevDistDP(curr, currIdx, orig, origIdx - 1, cache);
            decision removeFromEnd = getLevDistDP(curr, currIdx - 1, orig, origIdx, cache);
            decision substituteLast = getLevDistDP(curr, currIdx - 1, orig, origIdx - 1, cache);
            int min = std::min({addToTheEnd.val, removeFromEnd.val, substituteLast.val});
            if (min == addToTheEnd.val) {
                cache[currIdx][origIdx] = decision{1 + min, INSERT};
            } else if ( min == removeFromEnd.val) {
                cache[currIdx][origIdx] = decision{1+min, DELETE};
            } else {
                cache[currIdx][origIdx] = decision{1+min, SUB};
            }
        }
    }
    return cache[currIdx][origIdx];
}

Calling traverseBackwards on computed hash should work, I guess. I this what you thought?

Thanks,

0 Likes

#4

If the solution passes your tests, then you can be satisfied that it works.

However:

(1) I’m not sure that the solution is dynamic programming yet - it’s current design is memoized recursion.

(2) It’s possible to reconstruct the subsequence without introducing a new data structure (decision) i.e. by only traversing the values of the cache. As such, it follows that it’s also possible to delay all computation related to finding the subsequence until after the cache is completely filled.

0 Likes

#5

Thanks for reply,

so maybe my approach is wrong from the very begining, my recursive solution looks like this:
std::vector<char> getLongestSubstringRecursive(std::string const& curr, int currIdx, std::string const& orig, int origIdx, std::vector result) {

    std::vector<char> res = result;
    if (currIdx < 0 || origIdx < 0) {
        return result;
    }


    if (curr[currIdx] == orig[origIdx]) {
        res.emplace(res.begin(), curr[currIdx]);
        return getLongestSubstringRecursive(curr, currIdx - 1, orig, origIdx - 1, res);
    }

    auto a = getLongestSubstringRecursive(curr, currIdx - 1, orig, origIdx - 1, res);
    auto b = getLongestSubstringRecursive(curr, currIdx, orig, origIdx - 1, res);
    auto c = getLongestSubstringRecursive(curr, currIdx-1, orig, origIdx, res);

    if (a.size() > b.size() && a.size() > c.size()){
        return a;
    } else if (b.size() > a.size() && b.size() > c.size()) {
        return b;
    } else {
        return c;
    }
    }

But I cannot switch to DP from it as I would have to memorize the result itself and as you said we should use cache not create additional data. Any hint regarding this first step?

Thanks,

0 Likes

#6

What does the cache store?

0 Likes

#7

sorry shouldn’t be there i edited the post

0 Likes

#8

Np. I just realized that the most recent recursive implementation computes strings whereas the initial one computed subsequence lengths. I didn’t notice the shift earlier because I’m not used to reading C/C++. I think you should return to the initial implementation and revisit the comments in my initial post.

Transform the top-down recursive solution into a bottom up dynamic programming one, manually compute the longest subsequence of any two strings that you select, and examine the values in the cache for some pattern related to the longest subsequence you manually computed.

0 Likes

#9

Ok I have followed your steps and I noticed that we do not need the additional data structure. Using my initial implementation i can just revisit cache going diagonal up when strings are equal and going left or top to the bigger hash value. Basically that is why I have implemented decisions to know If I should go up/left but I spotted that if values are equal it does not matter, as I will just follow one of the solutions, I can print what I have visited “later”, which gives me sequence in right order. So cache computation is the same i.e:

int getLevDistDP(std::string const& curr, int currIdx, std::string const& orig, int origIdx, std::vector<std::vector<int>>& cache) {

if (currIdx < 0) {
        return 0;
    } else if (origIdx < 0) {
        return 0;
    }
    if (cache[currIdx][origIdx] == -1) {
        if (curr[currIdx] == orig[origIdx]) {
            cache[currIdx][origIdx] = 1 + getLevDistDP(curr, currIdx - 1, orig, origIdx - 1, cache);
        } else {
            int addToTheEnd = getLevDistDP(curr, currIdx, orig, origIdx - 1, cache);
            int removeFromEnd = getLevDistDP(curr, currIdx - 1, orig, origIdx, cache);
            int substituteLast = getLevDistDP(curr, currIdx - 1, orig, origIdx - 1, cache);
            cache[currIdx][origIdx] = std::max({addToTheEnd, removeFromEnd, substituteLast});
        }
    }
    return cache[currIdx][origIdx];
}

And later we can just go backward on hash recursively:

void getCommonChar(std::string const& curr, int currIdx, std::string const& orig, int origIdx, std::vector<std::vector<int>>& cache) {
    if (curr[currIdx] == orig[origIdx]) {
        getCommonChar(curr, currIdx - 1, orig, origIdx - 1, cache);
        std::cout << curr[currIdx] << ", ";
    } else if (currIdx > 0 && (origIdx == 0 || (cache[currIdx - 1][origIdx] > cache[currIdx][origIdx - 1]))) {
        getCommonChar(curr, currIdx - 1, orig, origIdx, cache);
    } else if (origIdx > 0 && (currIdx == 0 || (cache[currIdx][origIdx - 1] >= cache[currIdx-1][origIdx]))) {
        getCommonChar(curr, currIdx, orig, origIdx - 1, cache);
    } else {
        return;
    }
}

I just still don’t have better recursive solution besides what I have published earlier, but my god feeling is that it should be done diffrently…

Thanks

0 Likes

#10

You’re on the right path.

Is it possible to reconstruct without recursion?

Why do you feel you need a better recursive solution? For which part do you feel you need a better solution? Is it for reconstructing the subsequence? Or for constructing the cache?

Is it possible to translate getLevDistDP from recursion to bottom-up dynamic programming? It looks like what you have now is bottom-up recursion. Is it possible to replace recursion with a loop?

0 Likes

#11

I think I see that,
DP solution:

void getLevDistDP(std::string const& curr, std::string const& orig, std::vector<std::vector<int>>& cache) {
    for (int currIdx = 1; currIdx <= curr.size(); ++currIdx) {
        for (int origIdx = 1; origIdx <= orig.size(); ++origIdx) {
            if (curr[currIdx-1] == orig[origIdx-1]) {
                cache[currIdx][origIdx] = cache[currIdx-1][origIdx-1] + 1;
            } else {
                cache[currIdx][origIdx] = std::max({cache[currIdx][origIdx-1], cache[currIdx-1][origIdx-1], cache[currIdx-1][origIdx]});
            }
        }
    }
}

Non recursive reading of characters:

std::stack<char> getCommonChar(std::string const& curr, int currIdx, std::string const& orig, int origIdx, std::vector<std::vector<int>>& cache) {
    std::stack<char> s;
    while (cache[currIdx][origIdx]) {
        if (curr[currIdx-1] == orig[origIdx-1]) {
            --currIdx; --origIdx;
            s.push(curr[currIdx]);
        } else if (origIdx > 0 && (currIdx == 0 || (cache[currIdx][origIdx - 1] > cache[currIdx-1][origIdx]))) {
            --origIdx;
        } else if (currIdx > 0 && (origIdx == 0 || (cache[currIdx - 1][origIdx] >= cache[currIdx][origIdx - 1]))) {
            --currIdx;
        }
    }
    return s;
}

Please forget about my concerns about recursive solution, I was thinking about sth different. Do you think this is OK?
It has pretty good complexity O(mn) time and O((m+1)(n+1)) space(m and n are length of strings),

Thanks,

0 Likes

#12

:slight_smile: It looks good to me. If it passes your unit tests, then you’re finished. Well done!

(P.S. Notice how much less code and simpler your final implementation is compared with the first one you posted.)

0 Likes