RabinKarp substring matching solution question


#1

Hi,
I’m trying to solve this problem and realized that the latest solution differs from the old revision of the book. The link is here :https://github.com/adnanaziz/EPIJudge/blob/master/epi_judge_cpp_solutions/substring_match.cc#L39

The latest revision removes the mod operation while the older book ( I have a 2014 print verison) has mod operation and even checks

t_hash -= t[i - size(s)] * power_s;
if (t_hash < 0)
t_hash += kMod;

I think the mod operation will guarantee the hash lying in certain range. But why the latest solution removes the mod? Is it possible to run into overflow situation if I choose a big base?

Thank you!


#2

Yes, we remove the mod operation since it is not necessary. You are right about it might overflow, but the beauty here is that overflow won’t affect the correctness at all. As it won’t affect the correct at all, we decide to remove it for simplicity.


#3

Hi Lee,
Can you elaborate? I did try using a large base (109 ,113, etc) and the test will still pass but I couldn’t figure out why overflow will not impact correctness of the algorithm.

I would think when hash function is rolling, both lines below might overflow but why t_hash would still be a correct one when matching is found?
t_hash -= t[i - size(s) ] * power_s;
t_hash = t_hash * kBase + t[i];

Thank you!


#4

Overflow does not mean loss of data, it is more like representation error because the size of container (i.e., integer) cannot represent it correctly. However, whatever you add into the integer just rolling over from the beginning when overflow happens without loss the magnitude. That is the reason it does not affect the correctness.