 # Error in 5.12 Sample offline data when k > n / 2?

#1

Hi!

I am wondering if the following claim from the book is correct:

> When k is bigger than n/2, we can optimize by computing a subset of n - k elements to remove from the set.
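For context, the routine under discussion is a partial Fisher-Yates shuffle. Here is a minimal sketch of it as I understand it (my own code, not the book's; the function name is mine):

```python
import random

def random_sampling(k, A):
    """Return a uniformly random k-subset of A, in-place partial shuffle."""
    # Invariant: after i iterations, A[:i] is a uniform random
    # i-subset of the original elements, so we can stop after k swaps.
    for i in range(k):
        r = random.randint(i, len(A) - 1)  # pick from the untouched suffix
        A[i], A[r] = A[r], A[i]
    return A[:k]
```

The optimization in question would run this same loop only n - k times and treat the n - k swapped-out elements as the ones *removed* from the sample.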

For example, let’s consider the list `[a, b, c]` from which we want a random subset of size k = 2.

The possible ordered results are ab, ba, ac, ca, bc, cb (I’m ignoring formatting), i.e. six ordered pairs.

But by picking n - k = 1 element to remove, we can only get the following final arrays:

• `[a, b, c]` (we picked a and swapped it with itself). This gives bc
• `[b, a, c]` (we picked b and swapped it with a). This gives ac
• `[c, b, a]` (we picked c and swapped it with a). This gives ba
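A quick brute-force check of the enumeration above (my own sketch, assuming the single swap is `A[0] ↔ A[r]` for each possible r):

```python
def outcomes():
    """Enumerate every choice of r for the single swap and record the
    resulting array and the implied subset (the elements after index 0)."""
    results = []
    for r in range(3):
        A = ['a', 'b', 'c']
        A[0], A[r] = A[r], A[0]
        results.append((A[:], A[1:]))  # (final array, remaining subset)
    return results
```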

This has two drawbacks:

• Now the “random subset” is at the end of the array instead of at the beginning. This can be fixed by swapping with the last element instead of the first one, but that’s extra code.
• More importantly, some of the six ordered results are impossible to generate.

Maybe I am missing something, or not doing it correctly?

Thanks for any feedback!


#2
1. Instead of putting the sampled elements at the start, you can put them at the back while computing.
2. Although some orderings are missing, the probability of each *subset* is still the same — and that is exactly where the time is saved. For example, consider the subset [b, c]. If we generate all six ordered results, the probability of getting this subset is 1/3, because [b, c] and [c, b] are the same set. When you generate only three results, the probability is still 1/3.
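A small simulation makes this point concrete: treating results as unordered sets, both variants give each 2-subset probability 1/3. This is my own sketch with hypothetical function names:

```python
import random
from collections import Counter

def sample_k_swaps(A, k):
    """Standard version: k swaps, sample is the front of the array."""
    A = list(A)
    for i in range(k):
        r = random.randint(i, len(A) - 1)
        A[i], A[r] = A[r], A[i]
    return frozenset(A[:k])

def sample_remove(A, k):
    """Optimized version: swap out n - k elements; the untouched
    tail of the array is the sample."""
    A = list(A)
    n = len(A)
    for i in range(n - k):
        r = random.randint(i, n - 1)
        A[i], A[r] = A[r], A[i]
    return frozenset(A[n - k:])

# Example: tally the subsets produced by the optimized version.
counts = Counter(sample_remove('abc', 2) for _ in range(100_000))
```

Each of the three possible 2-subsets should show up roughly a third of the time under either sampler.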