Thanks for putting the time into checking out my take on this problem.
TL,DR:
As the first loop processes the input array, the search range where the
first missing positive entry (FMPE) is located might be tightened.
Because of the tightening and rearranging of the array, the first loop
can stop earlier than the end of the initial array.
This is where the slight improvement is coming from (not considering
caching architecture
).
A more detailed explanation:
Initially we have N numbers in the array and the FMPE is a number somewhere between
[1…N+1], bounds included. For simplicity, let’s suppose that initially we expect, that the array will hold
all the numbers [1…N] in some random order. The first loop will bucket sort the values.
and the second loop will check the buckets and report N+1 as the FMPE.
If the bucket sort finds an outlier, an outlier is a value from the array
which is outside of the range [1…N], we have to change our expectation
given this new information. From now on we will think about the array as
at most N-1 different values between [1…N] and at least one outlier. There
is at least one value missing from the range [1…N], whose spot is occupied
by the outlier. This missing value or some smaller one will be our FMPE.
Now we can define a new array, by throwing away the outlier, overwriting
it with the value at the end of the array. The array could be shrunk,
instead of some expensive shrinking, it is enough to store the remaining length
of the array.
Now we transformed the initial problem into an equivalent one, where the input
array is smaller.
The variant of the first loop, guaranteeing it’s termination:
- current value is an outlier, value at the end of the array overwrites current value, array is shrunk by one
- array value at cursor is in the search range, cursor array value is copied to its respective bucket, the value inside the bucket is copied to the cursor location. If these two values match, we don’t have a variant. In this case we have to move the cursor.
- to prove that the variant holds, we also need to prove, that there are no swap loops. This is true because once a value is moved to it’s bucket, it won’t be changed in the actual swap loop again.