Woodstock Blog

a tech blog for general algorithmic interview questions

[CC150v5] 11.4 Sort 20GB File

Question

Imagine you have a 20 GB file with one string per line.

Explain how you would sort the file.

Solution

Use External Sort. For example, for sorting 900 megabytes of data using only 100 megabytes of RAM:

  1. Divide the file into chunks of 100MB each. Sort using quicksort.

  2. Read the first 10 MB of each sorted chunk into input buffers in RAM and allocate the remaining 10 MB for an output buffer. (In practice, it might provide better performance to make the output buffer larger and the input buffers slightly smaller.)

  3. [Important] Perform a 9-way merge and store the result in the output buffer. Whenever the output buffer fills, write it to the final sorted file and empty it. Whenever any of the 9 input buffers empties, fill it with the next 10 MB of its associated 100 MB sorted chunk until no more data from the chunk is available.