This variant asks: Given a string A and a regular expression r, what is the string in the language of r that is closest to A?
I just want to verify that I understand the question correctly:
Given regex “abcd” and string “aefbcgd”, the solution should return “bc”.
Is that correct? Thanks!
Hi @lgtout ,
The regular expression we discussed here shall contains “*” or “.” or “?” as those will create a set of strings. Then the problem here would become among those strings produced by the regular expression, which one is the closest one.
Ah, I see. That’s quite different from what I thought. Ok, thanks @tsunghsienlee.
If I apply your explanation, then this variant is a very trivial change to problem 16.2. All that’s necessary is to apply the regex and iterate through the resulting strings, applying the Levenshtein function (from the 16.2 solution) to each and finding the one with the least Levenshtein distance.
Am I missing something?
None of the other variants I’ve seen in the book add so little difficulty to the original problem as this one. Which makes me think I must be missing something in my interpretation of the problem statement.
I am not sure that is the right way as there are almost infinite strings when the regex contains those metacharacters (i.e., * and .). So the difficulty here is how to solve that problem without enumerating all possible combinations of strings.
I’m still having trouble understanding this problem. Could you give me a concrete example? Thanks.
Like A*b and AAb. One is regex and the other is pure string.
Thanks. Are these the inputs to the solution function? And what would be the result in this case?
Yes, and the output is the distance, which shall be an int.
Is the result in this case AAb, since this has a distance of 0?
Can the regex contain more than one of “.”, “*”, “+”?
How about combinations such as “.*”, “.+”?
Are “.*?+” the only metacharacters allowed?
@tsunghsienlee I think this problem, as worded in the book, would benefit from clearer wording in future editions. In particular, I think the phrase “…in the language of the regular expression r …” is at best ambiguous, at worst a misdirection.
There will be a overhaul of variants in the future. Because we think variants is a good way for readers keep practicing what they want. However, we still have some urgent things we would like to focus on now so we decide to keep this what we have now.
Personally, I love having the variants. And I hope they stay in future editions. But perhaps because most people don’t do the variants, issues with them may have a tendency of not being caught as early as issues with the “main” problems.
I love variants in the way it provides a way for readers to practice similar problems with the knowledge they just learn. However, lacking solution due to space is a problem here. Therefore we would like move it to problems with test data by using our newly built testing framework. However, it would take time for us to prepare those test data and rewrite those variants. Furthermore, we might need to consolidate those again.