Error in 5.12 Sample offline data when k > n / 2?


#1

Hi!

I am wondering if the following claim from the book is correct:

When k is bigger than n/2, we can optimize by computing a subset of n - k elements to remove from the set.

For example, let’s consider the list [a, b, c] from which we want a random subset of size k = 2.

The possible subsets are ab, ba, ac, ca, bc, cb (I’m ignoring formatting).

But by picking n - k = 1 element to remove, we can only get the following final arrays:

  • [a, b, c] (we picked a and swapped it with itself). This give bc
  • [b, a, c] (we picked b and swapped it with a). This gives ac
  • [c, b, a] (we picked c and swapped it with a). This gives ba

Which has 2 drawbacks:

  • Now the “random subset” is at the end of the array instead of at the beginning. This can be fixed by swapping with the last element instead of the first one, but that’s extra code.
  • More importantly, some other subsets are impossible to generate.

Maybe I am missing something, or not doing it correctly?

Thanks for any feedback!