Hi EPI,
I am having a hard time understanding the down-sides of the less optimal solutions explained before the solution.
We could add all the IP addresses in the file to a hash table, and then enumerate IP addresses, starting with 0.0.0.0, until we find one not in the hash table. However, a hash table requires considerable overhead - of the order of 10 bytes for each integer, i.e., of the order of 10 gigabytes.
How did you estimate 10 bytes for each integer and then 10 gigabytes?
I thought we could create the IP address as a 32-bit unsigned number like so: A * 256^3 + B * 256^2 + C * 256 * 1 + D for IP A-B-C-D. And if 32 bits equate to 4 bytes (32 bits / 8 bits per byte) then each converted IP should only use up a few more than 4 bytes. So a hash table would have a billion entries of [key = converted IP, value = true]. And 4 * a billion would be 4Gbs. Please let me know where my reasoning fails.