11.9 - Find The Missing IP Address - C++



I am having a hard time understanding the down-sides of the less optimal solutions explained before the solution.

We could add all the IP addresses in the file to a hash table, and then enumerate IP addresses, starting with, until we find one not in the hash table. However, a hash table requires considerable overhead - of the order of 10 bytes for each integer, i.e., of the order of 10 gigabytes.

How did you estimate 10 bytes for each integer and then 10 gigabytes?

I thought we could create the IP address as a 32-bit unsigned number like so: A * 256^3 + B * 256^2 + C * 256 * 1 + D for IP A-B-C-D. And if 32 bits equate to 4 bytes (32 bits / 8 bits per byte) then each converted IP should only use up a few more than 4 bytes. So a hash table would have a billion entries of [key = converted IP, value = true]. And 4 * a billion would be 4Gbs. Please let me know where my reasoning fails.



Hi @kolaworld,

bool is a special type that costs mostly 1 bytes in most programming languages, so an entry in hash table is 4 + 1 bytes. Also, since most hash table implementations will double the size when load factor reaches certain threshold (assuming 50%), it means in the worst case, all those IP addresses use 50% capacity of hash table which results in 5 bytes * 1B entries / 50% = 10GB.

It is a good question that we don’t explain very well in the book, and we will improve this in the future edition, and many thanks for your question.



Can we improve the solution and use either 4 groups/counters[0-255] (if not hash lists) of 8 bits each or 8 groups/counters[0-15] of 4 bits each, to know if an IP address exists or does not exist? This is similar to a FIB table lookup as in mtrie that improves space/time complexity. I had hard time understanding program in the book with many loops ranging 0 to 65535, may be I am missing something.