Issues with problem 12.11 in Java book

#1

I found a few issues with problem 12.11 in the Java book.

  1. counter is an ArrayList<Integer>. That means it is storing a boxed integer, which takes up 16 bytes on a 32-bit JVM, or 24 bytes on a 64-bit JVM. It should still fit within the “few megabytes” constraint but it’s not .25 megabyte. I would recommend just using an int array instead, so you don’t have to pay the boxing penalty. It seems like you use List<Integer> a lot in the Java book where an int array would be better.
  2. IP addresses are 32-bit unsigned values. For example, you could have 255.255.255.255 which would be 4294967295. But Java doesn’t support unsigned ints. In the solution you are using scanner.nextInt() which would fail to read 4294967295. The easy fix is to read longs and deal with going from long to int.
  3. InputStream mark / reset don’t work for a lot of types of InputStreams (for example FileInputStream). Instead of taking an InputStream parameter, take a File instead and open a FileInputStream. Then, when you go to scan the file the second time, close the FileInputStream and re-open it.

Cheers,

0 Likes

#2

Thanks for taking the trouble to share these issues with us.

  1. You are correct about the memory overhead. We started with a C++ implementation, where there’s no notion of box-types, that mindset carried over when analyzing the Java program. We do prefer List to in[], it was a conscious decision, makes working with Collections easier (addAll(), retainAll(), some constructor calls, for example) Pedantically, RandomAccess would be the right call, since we implicitly assume O(1) random access times, but we didnt want to muddy things further.) I know there’s a big overhead to using box types (I did a compare recently, it’s about 5-10x slower to use Arrays.sort() on Integer[] compared to int[]) but for teaching and interview purposes, asymptotically they are the same and box-types are easier to work with.

  2. Another good point, I think we will change the problem to say something like “in the abstract you have a sequence of integers and need to find a missing one” Since we use logical right shift, signedness issues are not relevant. If these were infact IP addresses, the long solution would work.

  3. We’ve reworked this problem recently to take an Iterable as argument, instead of an input stream, which gives us the ability to iterate and restart. This reduces the burden of coding up I/O, which is full of gotcha’s (like the one you pointed out). We’ve actually removed streams just about everywhere, replaced with Iterable or Iterator as appropriate.

In summary, we agree with your comments, and want to clarify that many of our choices are based on what works from a teaching perspective (and also an interview setting). We’d expect candidates to know the tradeoffs, but most interviewers would be happy with an asympytotically optimum solution, with the candidate demonstrating understanding of tweaks needed for the real world (like you pointed out, also the use of better memory management, short vs int, that kind of thing)

To prove that no good deed goes unpunished, I was wondering if you could take a look at some new material we wrote up, introducing the data structures via a min working example, some tips, and library review (about 30 pages). Let me know if you are interested, from your feedback on this problem I’m sure your input is very helpful,

bye
Adnan

0 Likes

#3

Thanks! I like this problem.

0 Likes

#4

PS - yes, I’m definitely interested in checking out the new material!

0 Likes

#5

Hey Eric,

Sorry for our late reply, and please contact us (tsung.hsien.lee@gmailcom, adnan.aziz@gmail.com) if you are still interested in reviewing those new materials for us.

0 Likes