11.9 - Find The Missing IP Address - C++

#1

Hi EPI,

I am having a hard time understanding the down-sides of the less optimal solutions explained before the solution.

We could add all the IP addresses in the file to a hash table, and then enumerate IP addresses, starting with 0.0.0.0, until we find one not in the hash table. However, a hash table requires considerable overhead - of the order of 10 bytes for each integer, i.e., of the order of 10 gigabytes.

How did you estimate 10 bytes for each integer and then 10 gigabytes?

I thought we could create the IP address as a 32-bit unsigned number like so: A * 256^3 + B * 256^2 + C * 256 * 1 + D for IP A-B-C-D. And if 32 bits equate to 4 bytes (32 bits / 8 bits per byte) then each converted IP should only use up a few more than 4 bytes. So a hash table would have a billion entries of [key = converted IP, value = true]. And 4 * a billion would be 4Gbs. Please let me know where my reasoning fails.

2 Likes

#2

Hi @kolaworld,

bool is a special type that costs mostly 1 bytes in most programming languages, so an entry in hash table is 4 + 1 bytes. Also, since most hash table implementations will double the size when load factor reaches certain threshold (assuming 50%), it means in the worst case, all those IP addresses use 50% capacity of hash table which results in 5 bytes * 1B entries / 50% = 10GB.

It is a good question that we don’t explain very well in the book, and we will improve this in the future edition, and many thanks for your question.

4 Likes

#3

Can we improve the solution and use either 4 groups/counters[0-255] (if not hash lists) of 8 bits each or 8 groups/counters[0-15] of 4 bits each, to know if an IP address exists or does not exist? This is similar to a FIB table lookup as in mtrie that improves space/time complexity. I had hard time understanding program in the book with many loops ranging 0 to 65535, may be I am missing something.

0 Likes