[NOTE: this page was downloaded from the Wayback Machine in 2012. The original page was written by Michael Lamont & hosted by linux.wku.edu from 2001-2009. JJB]

**Sorting Algorithms**

One of the fundamental problems of computer science is ordering a list of items. There's a plethora of solutions to this problem, known as sorting algorithms. Some sorting algorithms are simple and intuitive, such as the bubble sort. Others, such as the quick sort, are extremely complicated but produce lightning-fast results.

Below are links to algorithms, analysis, and source code for seven of the most common sorting algorithms.

- Bubble sort
- Heap sort
- Insertion sort
- Merge sort
- Quick sort
- Selection sort
- Shell sort

The common sorting algorithms can be divided into two classes by their algorithmic complexity. Algorithmic complexity is a complex subject (imagine that!) that would take too much time to explain here, but suffice it to say that there's a direct correlation between the complexity of an algorithm and its relative efficiency. Algorithmic complexity is generally written in a form known as Big-O notation, where the *O* represents the complexity of the algorithm and a value *n* represents the size of the set the algorithm is run against.

For example, *O*(*n*) means that an algorithm has linear complexity. In other words, it takes ten times longer to operate on a set of 100 items than it does on a set of 10 items - ten times the items, ten times the work. If the complexity were *O*(*n*^{2}) (quadratic complexity), then it would take 100 times longer to operate on a set of 100 items than it does on a set of 10 items, since 10^{2} = 100.
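
The arithmetic above can be checked with a few lines of code. This is a sketch of this page's own making (the function names are illustrative, not from the original), comparing how the predicted work grows when the input grows tenfold:

```python
# Compare predicted growth of O(n) vs O(n^2) work
# when the input grows from 10 items to 100 items.

def linear_work(n):
    return n          # O(n): work proportional to n

def quadratic_work(n):
    return n * n      # O(n^2): work proportional to n squared

# Ten times the items means ten times the linear work...
print(linear_work(100) // linear_work(10))        # 10
# ...but a hundred times the quadratic work.
print(quadratic_work(100) // quadratic_work(10))  # 100
```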

The two classes of sorting algorithms are *O*(*n*^{2}), which includes the bubble, insertion, selection, and shell sorts; and *O*(*n* log *n*), which includes the heap, merge, and quick sorts.
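
To give a taste of the *O*(*n*^{2}) class, here is a minimal bubble sort sketch (the original page hosted its own source code for each sort; this version is just an illustration):

```python
def bubble_sort(items):
    """Repeatedly swap adjacent out-of-order pairs until the list is sorted."""
    data = list(items)  # work on a copy; leave the caller's list alone
    n = len(data)
    for i in range(n - 1):
        swapped = False
        for j in range(n - 1 - i):  # the last i items are already in place
            if data[j] > data[j + 1]:
                data[j], data[j + 1] = data[j + 1], data[j]
                swapped = True
        if not swapped:  # early exit: no swaps means the list is sorted
            break
    return data

print(bubble_sort([5, 1, 4, 2, 8]))  # [1, 2, 4, 5, 8]
```

Two nested loops over the data are exactly where the *n*^{2} comes from: each of the roughly *n* passes scans up to *n* items.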

In addition to algorithmic complexity, the speed of the various sorts can be compared with empirical data. Since the speed of a sort can vary greatly depending on what data set it sorts, accurate empirical results require that several runs of the sort be made and the results averaged together. The empirical data on this site is the average of a hundred runs against random data sets on a single-user 250 MHz UltraSPARC II. The run times on your system will almost certainly vary from these results, but the relative speeds should be the same: the selection sort runs in roughly half the time of the bubble sort on the UltraSPARC II, and it should run in roughly half the time on whatever system you use as well.
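
That methodology - many runs over random data sets, averaged together - can be sketched roughly like this (the function name and run counts here are this sketch's own, not the original benchmark):

```python
import random
import time

def average_sort_time(sort_fn, size, runs=100):
    """Time sort_fn over `runs` random data sets; return the mean seconds."""
    total = 0.0
    for _ in range(runs):
        data = [random.random() for _ in range(size)]  # fresh random set per run
        start = time.perf_counter()
        sort_fn(data)
        total += time.perf_counter() - start
    return total / runs

# Example: time Python's built-in sorted() on lists of 1,000 random floats.
mean = average_sort_time(sorted, 1000, runs=10)
print(mean > 0.0)  # True
```

Averaging over fresh random inputs is what smooths out the best-case/worst-case swings that a single run would show.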

These empirical efficiency graphs are kind of like golf - the lowest line is the "best". Keep in mind that "best" depends on your situation - the quick sort may look like the fastest sort, but using it to sort a list of 20 items is kind of like going after a fly with a sledgehammer.

As the graph pretty plainly shows, the bubble sort is grossly inefficient, and the shell sort blows it out of the water. Notice that the first horizontal line in the plot area is 100 seconds - these aren't sorts that you want to use for huge amounts of data in an interactive application. Even using the shell sort, users are going to be twiddling their thumbs if you try to sort much more than 10,000 data items.

On the bright side, all of these algorithms are incredibly simple (with the possible exception of the shell sort). For quick test programs, rapid prototypes, or internal-use software they're not bad choices unless you really think you need split-second efficiency.

Speaking of split-second efficiency, the *O*(*n* log *n*) sorts are where it's at. Notice that the time on this graph is measured in tenths of seconds, instead of hundreds of seconds like the *O*(*n*^{2}) graph.

But as with everything else in the real world, there are trade-offs. These algorithms are blazingly fast, but that speed comes at the cost of complexity. Recursion, advanced data structures, multiple arrays - these algorithms make extensive use of those nasty things.
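
A minimal merge sort sketch shows the recursion and auxiliary arrays the paragraph is talking about (again, an illustration of the general technique rather than the original source):

```python
def merge_sort(items):
    """Recursively split the list in half, sort each half, then merge."""
    if len(items) <= 1:
        return list(items)  # base case: zero or one item is already sorted
    mid = len(items) // 2
    left = merge_sort(items[:mid])    # recursion...
    right = merge_sort(items[mid:])
    merged = []                       # ...and an extra array for merging
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    merged.extend(left[i:])   # one side is exhausted; append the rest
    merged.extend(right[j:])
    return merged

print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```

The halving gives the log *n* factor, and each level of the recursion does *n* work merging - hence *O*(*n* log *n*). The price is the recursion itself and the extra `merged` array.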

In the end, the important thing is to pick the sorting algorithm that you think is appropriate for the task at hand. You should be able to use the source code on this site as a "black box" if you need to - you can just use it, without understanding how it works. Obviously taking the time to understand how the algorithm you choose works is preferable, but time constraints are a fact of life.
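
Treating a sort as a "black box" can be as simple as calling a library routine. In Python, for instance, the built-in `sorted()` does the job without the caller needing to know which algorithm is underneath (CPython happens to use Timsort; the sample data below is made up for illustration):

```python
scores = [("carol", 72), ("alice", 95), ("bob", 88)]

# Sort by score, highest first - no knowledge of the underlying
# algorithm is needed to use it.
ranked = sorted(scores, key=lambda pair: pair[1], reverse=True)
print(ranked)  # [('alice', 95), ('bob', 88), ('carol', 72)]
```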
