I've written a poly phase merge sort using 3 "stacks" in C++. The key part of the logic is to merge from containers A and B into container C. It's expected for A or B to go empty with lots of data remaining on B or A. To deal with this, when A or B go empty, the empty container is swapped with C and the merge continues, until A and B go empty at the same time, in which case the sort is done. Without the swapping, then I'd have to code for all 6 possible permutations of A, B, C. For the C++ program, since in this case the stacks have a maximum size, each stack is simply an array with a stack pointer. Push is *--stack_pointer = value, pop is value = *stack_pointer++. This is quick, when sorting 16 million pseudo random 32 bit integers in 64 bit mode (16 registers) on my system, the poly phase merge sort using 3 "stacks" (a programming challenge) takes 1.6 seconds, versus 1.5 seconds for a 2 way merge sort, 1.4 seconds for a 4 way merge sort, or 1.45 seconds for quick sort. The programming challenge here is to emulate poly phase merge sort on 3 tape drives, where data is written forwards and read backwards, which is emulated using stacks (or anything LIFO). One alternative would be to use a linked list for the stack, which should work in Java, since a dummy / sentinal node's next pointer could be changed to point to any list, providing a means to swap "pointers" to lists. This would be slower, but Java apparently isn't meant for performance.