Alternative to 13.10?

#1

Hi
Regarding the problem 13.10, which is to find a minimum sequential subcover, when the set of string to match is made of distinct elements: what I came up with only uses trivial data structures, unlike the proposed solution which is quite elaborate.

Since I have rather limited faith in my guru-ness I though I’ll post it up here with a test driver and wait to hear what I overlooked. It may be that the authors skipped over this solution because of the string comparison adding a factor O(max len(s)), s in Q to the time complexity (but then again, on a hash hit, the strings are also compared to test for spurious hits, no ?).

Edit 1 : The faulty condition on subcover index update was edited to reflect the discussion further down.
Edit 2 : This solution is incorrect, see discussion further down

Thanks. :wink:

pair<int, int> Prob13_10(const vector<string> &Q, const vector<string> &A) {
	pair<int, int> min_idx(-1, -1);
	int l = -1, r = 0;
	int next_match = 0;

	while (r != A.size()) {
		while (r != A.size() && next_match < Q.size()) {
			bool reduce_left = (Q.size() > 1 && next_match == 1 && A[r] == Q[0]);
			if (A[r] == Q[next_match] ||reduce_left) {
				if (next_match==0 || reduce_left)
					l = r;
				if (!reduce_left)
					next_match++;
			}
			r++;
		}
            /* The next condition is wrong, see explanations further down 
		if (min_idx.first == -1 || r -1- l < min_idx.second - min_idx.first)
			min_idx = make_pair(l, r - 1);*/
            if (next_match==Q.size()&&(min_idx.first == -1 || r -1- l < min_idx.second - min_idx.first))
		min_idx = make_pair(l, r - 1);
		next_match = 0;
		l = r;
	}
	return min_idx;
}

void Prob13_10D() {
	
	string declaration("My paramount object in this struggle is to save the Union , and is not either to save or to destroy slavery . If I could save the Union without freeing any slave I would do it , and if I could save it by freeing all the slaves I would do it ; and if I could save it by freeing some and leaving others alone I would also do that");
	string buf; 
	stringstream ss(declaration); 
	vector<string> A,Q; 
	while (ss >> buf)
		A.push_back(buf);	

	Q.push_back("Union"); Q.push_back("save");
	pair<int, int> min = Prob13_10(Q, A);
	assert(min.first == 10 && min.second == 17);
	A.clear(); Q.clear();
	A.push_back("a"); A.push_back("a"); A.push_back("a"); A.push_back("a"); A.push_back("b"); A.push_back("c");
	Q.push_back("a"); Q.push_back("c"); 
	min = Prob13_10(Q, A);
	assert(min.first == 3 && min.second == 5);
    Q = { "0", "2", "9", "4", "6" };
    A = { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "2", "4", "6", "10", "10", "10", "3", "2", "1", "0" };
    min = Prob13_10(Q, A);
    assert(min.first == 0 && min.second == 12);
}
0 Likes

#2

Hey zebullon,

Would you mind to explain your algorithm as it is really hard to just understand your method by reading a code without comments :frowning:

Think this is an interview scenario, you must either write super clean code with comments or you have to explain it; otherwise, interviewers probably won’t have time to decipher what you write :wink:

0 Likes

#3

Sure, the idea is to process each elements in A, keeping trace of what is the next expected word to match from Q. Elements/words from A can either be

  • an irrelevant element, meaning that it is not a word in Q. We move to the next word in A.
  • the next element to be matched from Q (we need to match words from Q in sequence, so we know at each iteration which word we want to see). In that case we update the next word we want to match.
  • a repetition of the first word in Q, in that case we will achieve a smaller subcover by skipping the repetition. E.g if we want to match Q={ a, bob} , in A={ Hey a a bob}, the growing subcover being in bold. When processing the second “a”, we can achieve a smaller subcover by moving the left idx to the second “a”. This case is flagged by reduce_left . The fact that elements in Q are unique mean that only the leftmost element is involved in that tricky scenario. E.g Q={ a, bob, is} , and A={ Hey a bob bob bob is }, when parsing the second “bob” we cant move the left id past “a” as then the subcover would fail.

Then again from the fact that elements can not be repeated in Q, once a subcover is found we can directly start finding for the next from the right most index. To make this point more convincing, if words in Q can not be repeated a suffix (concatenate words in Q) of Q can not also be prefix at the same time (if words can be repeated, see Q={ A bob is A bob}, the suffix “A bob” is also a prefix “A bob”, thus matches can overlap). So we can resume matching from the rightmost index.

I hope it makes more sense now :wink:

0 Likes

#4

Hey zebullon,

I still had a hard time understanding your algorithm is correct or not, so I use some inputs to test that but it seems not working. Following is my input test case:

Q = {“0”, “2”, “9”, “4”, “6”} and A = {“0”, “1”, “2”, “3”, “4”, “5”, “6”, “7”, “8”, “9”, “2”, “4”, “6”, “10”, “10”, “10”, “3”, “2”, “1”, “0”}.

Please try it and let me know what is going on. Looking forward your reply!

0 Likes

#5

Thanks for the comment !
There was a silly mistake on the index update conditions, the line

if (min_idx.first == -1 || r -1- l < min_idx.second - min_idx.first)
			min_idx = make_pair(l, r - 1);

should be changed to

if (next_match==Q.size()&&(min_idx.first == -1 || r -1- l < min_idx.second - min_idx.first))
			min_idx = make_pair(l, r - 1);

Then the following test passes

Q = { "0", "2", "9", "4", "6" };
A = { "0", "1", "2", "3", "4", "5", "6", "7", "8", "9", "2", "4", "6", "10", "10", "10", "3", "2", "1", "0" };
min = Prob13_10(Q, A);
assert(min.first == 0 && min.second == 12);

I know that my explanation sucks, I have to work on it… I edited the original post for relevance sake.

0 Likes

#6

hi zebullon,

i just spoke with tsung-hsien about your program. to solve the problem, i believe it’s essential to record the most recent occurrences of the keywords in Q.

here’s another failing test case that shows more clearly the need for this:
A = { “A”, “X”, “B”, “A”, “X”, “Y”, “Z”, “B”, “C”}
Q = {“A”, “B”, “C”}

Your program does not record the second A and B, so it reports [0,8] as the optimum subarray, whereas it’s actually [3,8]

Let us know what you think,

Cheers,
Adnan

0 Likes

#7

You’re absolutely right, it is a case that breaks my solution.
More generally, any input of the form “…,[Q1,…,Qp,…,[Q1,…,Qn]]” (n cardinal of Q, p< n, assuming the growing covers Q1~Qp and Q1~Qn are correct) will fail my solution, since I never reconsider the starting point unless it’s the Q1,Q1,…,Q2 case.
I will edit the initial post to flag the answer as incorrect.

Thanks, at least I learned something :wink:

0 Likes