Solution 11.1 (Java): unclear remark

#1

Hi guys,

I don’t understand your remark at the end of 11.1 solution:

Alternatively, we could recursively merge the k files, two at a time using the merge step from merge sort. The time complexity is the same, i.e. O(n log k). The space complexity of any reasonable implementation of merge sort is O(n), which is considerably worse than the heap based approach when k << n.

  • if we merge 2 files at a time, then we just need space for 2 records from each file at any given moment. You say about O(n) space of merge sort, but we are not doing sort, we are doing just merge here.
  • the time complexity of repeating 2-way merges will be actually worse than heap-based solution, because we compare the same elements multiple times: during 1st merge we compare elements of A and B, during 2nd merge we compare (A+B) and C, during 3rd merge - (A+B+C)+D. So elements of A will be compared (k-1) times, of B (k-2) time and so on. This leads to O(nk) complexity.
0 Likes

#2

The algo I had in mind was to merge all arrays pair-wise, taking us from k to k/2 arrays. We repeat, to get to k/4, k/8, …, 4, 2, 1 arrays.

The final merge step will need O(n) additional space. The reason is that we cannot keep the last two arrays of length (n/2) each in the array holding the final result - it’s not possible to do an inplace mergesort (in any reasonable way).

The time complexity for the first step (k to k/2 arrays) is O(n), because the time to merge arrays of lengths N1 and N2 is O(N1 + N2). the total of the lengths of the arrays starting out is exactly n. each stage takes the same time and there are O(log k) stages (since we half the number of arrays at each stage).

Looking at your question, I think the confusion comes from my not clarifying exactly what the recursive merge is, it’s not one more array at a time (in which case the issues you raised are valid). I will put a small clarification about this, thanks for taking the trouble to bring it up.

Cheers,
Adnan

0 Likes