What Sort of Sort?
Google CEO Eric Schmidt asked, "How do you determine good ways of sorting 1 million 32-bit integers in two megabytes of RAM?" McCain laughed it off as good-natured fun at his expense. However, he missed out on a great opportunity to show those Googlers that they're not as smart as they'd have us believe.
[Full disclosure: this blog is hosted by Blogger, which is owned by Google. So, if this post never shows up in any Google searches, I think I'll know why.]
Oh, but if I were John McCain. Perhaps his days as a Vietnam POW softened him up to the vagaries of badly worded questions. The first thing I would say is: Eric, I would expect you, as the leader of one of the biggest computing companies in the world, to at least be able to ask an unambiguous question.
Do you mean, how would I find some ways that are good for sorting said integers on said limited-memory platform? Or did you mean, how would I decide on the goodness of various presented algorithms for sorting under these conditions? And whatever do you mean by "good"? Do you mean the fastest? The least disk thrashing? The fewest license fees? Or are you asking for my definition of good? When you say I get 2 megabytes of RAM, I'm guessing you'll also give me some disk space, and perhaps a bit of extra memory to put my hypothetical code into. Or maybe my code will fit in the space differential between two megabytes and two million bytes (almost 95K).
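For what it's worth, that back-of-the-envelope number checks out, since a "binary" megabyte (MiB) is 1,048,576 bytes:

```python
# Difference between 2 binary megabytes (MiB) and 2 million bytes.
mib2 = 2 * 1024 * 1024        # 2,097,152 bytes
slack = mib2 - 2_000_000      # 97,152 bytes
print(slack, slack / 1024)    # 97,152 bytes, about 94.9 KiB -- "almost 95K"
```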
Suppose good old Eric laughed at my friendly jab, and clarified his question. Don't you think that the correct answer to the question would be, well, given that my college textbooks are long gone, along with all the textbook sorting algorithms therein, my first shot at finding a good algorithm would be, what else, Google. As in, type "memory constrained sorting algorithm" into my Google toolbar and see where it takes me. What do you want to bet it would take me straight to Wikipedia, and a catalog of all popular sorting algorithms?
What would I find there? Not sure. But I'm going to guess that my best bet for a fast sort would be to throw the first half of the integers into my limited memory, sort them using whatever fast sort is in fashion these days, and save them to disk; throw the second half into the same memory and sort it similarly; and then merge the two sorted lists.
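That split-sort-merge plan (a two-run external merge sort, essentially) can be sketched in a few lines of Python. This is just an in-memory illustration: in the real 2 MB scenario each sorted half would be written to disk and the merge would stream from the two files.

```python
import heapq

def two_pass_sort(nums):
    """Sort by splitting into two halves, sorting each, then merging.

    In the memory-constrained version, each sorted half would be
    spilled to a file and the final merge would stream from disk;
    here both halves stay in memory to keep the sketch short.
    """
    mid = len(nums) // 2
    first = sorted(nums[:mid])    # sorted "run" 1 (would go to disk)
    second = sorted(nums[mid:])   # sorted "run" 2 (would go to disk)
    # heapq.merge lazily merges already-sorted iterables, which is
    # exactly the streaming behavior a disk-based merge would need.
    return list(heapq.merge(first, second))

print(two_pass_sort([9, 3, 7, 1, 8, 2]))  # [1, 2, 3, 7, 8, 9]
```

Since 500,000 integers at 4 bytes apiece is 2,000,000 bytes, each half just squeaks into the 2 MiB budget.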
Or perhaps I'd try something cute (in a geeky-cute kind of way). I'd sort only on the upper 16 bits of each integer. Then, when I was done, I'd only have to re-sort the small subsets in which the upper 2 bytes were the same. Maybe I'd get a win there, but probably I'd just waste a lot of time trying to figure it out.
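One way to read that scheme (function name hypothetical; Python for brevity): do a stable sort keyed on just the upper 16 bits, then fully re-sort only the runs whose upper halves collide. Whether this ever beats a single full sort is exactly the question I'd be wasting time on.

```python
from itertools import groupby

def sort_by_upper_then_fix(nums):
    """Stable-sort 32-bit ints on their upper 16 bits, then fully sort
    each run of integers that share the same upper half."""
    nums = sorted(nums, key=lambda n: n >> 16)   # pass 1: upper 16 bits only
    out = []
    for _, run in groupby(nums, key=lambda n: n >> 16):
        out.extend(sorted(run))                  # pass 2: resolve ties
    return out
```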
So, how would I determine the goodness of the algorithms I came up with? Well, classically I'd count how many loops and nested loops there were, and use that to figure out how well they scale up. But since I have a very specific problem to solve (one file of one million numbers, two megabytes of memory), I'd really only need one measurement. I'd probably end up creating an Excel spreadsheet, putting in 200 numbers, then 2,000, then 20,000, writing some freaky-looking set of formulas that implement my kooky algorithm, and getting out my stopwatch.
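The stopwatch step doesn't actually need Excel; a few lines with Python's `timeit` module do the same scaling check at those three sizes (the `time_sort` helper is my invention, and I've swapped the kooky algorithm for the built-in sort as a stand-in):

```python
import random
import timeit

def time_sort(n, repeats=5):
    """Time sorting n random 32-bit integers; report the best of
    `repeats` runs to smooth out scheduler noise."""
    nums = [random.getrandbits(32) for _ in range(n)]
    return min(timeit.repeat(lambda: sorted(nums), number=1, repeat=repeats))

for n in (200, 2_000, 20_000):
    print(f"{n:>6} numbers: {time_sort(n):.6f} s")
```

Plot (or just eyeball) how the times grow from 200 to 20,000 and you get an empirical read on the scaling, loop-counting optional.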
But only if I were a geek.
Senator McCain, any thoughts?