Thanks for your reply, but I am not quite sure if I made myself clear.
I do understand that “sorting of an array of length n is different from getting the k smallest (or largest) elements of such array (assuming k << n).” My question is not challenging this fact.
Let’s go back to the original problem in the boot camp. The problem is to find the k largest elements from a stream of n elements (where k might be much smaller than n). The solution is to construct and keep a min-heap of size k. By popping the smallest element in the min-heap each time when reading a new element from the stream until consuming all the n elements in the stream, one can guarantee that the remaining elements in the heap are k largest elements of the original stream.
My question is NOT “why not sorting the n elements from the beginning”. My question is: inside the code, the last line outputs the k elements remaining in the heap (which is of size k) by function heapq.nsmallest. I don’t understand why you will need to use this function for the output, given that you have only k elements (not n) in the heap now.
My understanding is: the k elements in the heap are guaranteed to be the largest k elements of the stream, so why not outputting them directly, if the output doesn’t need to be sorted? Or if it needs to be sorted, you can just sort them, there are only k elements now (not n).
Instead, if you use heapq.nsmallest function in the last step, we can see what will be happening from the inside implementation of this function in Python. The fact is: the heap of k elements will be passed to construct a new list, and that the new list will be heapified, and then the k elements will be sorted. In my view, it is unnecessary for all steps but the last one, because in fact, you know the input to the nsmallest function is already a heap, so you can just sort it (which is the last step of nsmallest function itself) and skip the other steps of a standard nsmallest function.
Not sure if I made myself clear and if I am correct?