CS 611 Fall 2013
Outline & Supplementary Notes for Wednesday, October 2, 2013
Outline
Variable size decrease algorithms [L 4.5; 5.6 in 2nd ed]
- Idea
- Quickselect
- The Selection Problem
- Given a list and an integer \(k\) with \(0\le k < n\), where \(n\) is the size of the list, return the item that would be at index \(k\) if the list were sorted.
- Obvious algorithm: sort then look-up. \(\Theta(n\log n)\).
- If \(k = 0\), find minimum. If \(k = n-1\), find maximum. Both easily linear-time.
- If \(k = \frac{n}{2}\), find median. Not obviously linear-time. (But linear-time algorithms exist!)
- How Quickselect works.
- Partitioning algorithms: Lomuto, Hoare. We use Lomuto here.
- Write code.
- Analyze. Average case \(\Theta(n)\). Worst case \(\Theta(n^2)\).
- Optimizing Quickselect.
- Nim (not covered in class, but, I think, very interesting)
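The unoptimized Quickselect from the outline might be sketched as follows. This is only an illustration, not the code written in class: it assumes a list of `int` stored in a `std::vector`, takes the first item as the pivot, and uses Lomuto partitioning. The names `lomutoPartition`, `quickselectRec`, and `quickselect` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Lomuto partition of v[first..last) around the pivot v[first].
// Returns the pivot's final index. Items less than the pivot end up
// to its left; all other items end up to its right.
std::size_t lomutoPartition(std::vector<int>& v,
                            std::size_t first, std::size_t last)
{
    const int pivot = v[first];
    std::size_t boundary = first;      // last index of the "< pivot" region
    for (std::size_t i = first + 1; i < last; ++i)
    {
        if (v[i] < pivot)
        {
            ++boundary;
            std::swap(v[boundary], v[i]);
        }
    }
    std::swap(v[first], v[boundary]);  // pivot goes between the two regions
    return boundary;
}

// Returns the item that belongs at (absolute) index k of v, searching
// only v[first..last). Precondition: first <= k < last <= v.size().
int quickselectRec(std::vector<int>& v,
                   std::size_t first, std::size_t last, std::size_t k)
{
    assert(first <= k && k < last && last <= v.size());
    const std::size_t p = lomutoPartition(v, first, last);
    if (k == p)
        return v[p];
    if (k < p)
        return quickselectRec(v, first, p, k);   // answer is left of pivot
    return quickselectRec(v, p + 1, last, k);    // answer is right of pivot
}

// Convenience wrapper. The list is taken by value, so the caller's
// copy is left unchanged.
int quickselect(std::vector<int> v, std::size_t k)
{
    assert(k < v.size());
    return quickselectRec(v, 0, v.size(), k);
}
```

For example, with the list 9, 2, 7, 4, 5, 1, 8, calling `quickselect` with `k` equal to 3 returns 5, the median.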
Supplementary Notes
Optimizing Quickselect
Introduction
Quickselect can perform poorly in some circumstances. Unoptimized, it is very slow for sorted or nearly sorted data (in fact this is its worst case). It also has linear recursion depth—again, when unoptimized—and thus linear additional space usage. In addition, it has quadratic worst-case time efficiency.
However, all of these troubles can be eliminated, using much the same optimizations that are commonly used for Quicksort.
Pivot Choice: Median-of-Three
Unoptimized Quickselect, which takes the first item as its pivot, has its worst case when it is given a sorted list, since the first item is then the worst possible pivot choice.
The standard solution to this is to use the median-of-three method to choose the pivot. We look at the first item in the list, the last item, and an item halfway through. Whichever of these is between the other two is our pivot.
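A minimal sketch of this pivot choice is given below, assuming a `std::vector<int>` and a half-open range `[first, last)`. The function name `medianOfThreeIndex` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Looks at the first item, the last item, and an item halfway through
// v[first..last). Returns the index of whichever of the three lies
// between the other two. Precondition: the range is nonempty.
std::size_t medianOfThreeIndex(const std::vector<int>& v,
                               std::size_t first, std::size_t last)
{
    assert(first < last && last <= v.size());
    const std::size_t lo = first;
    const std::size_t mid = first + (last - first) / 2;
    const std::size_t hi = last - 1;
    const int a = v[lo];
    const int b = v[mid];
    const int c = v[hi];

    if ((a <= b && b <= c) || (c <= b && b <= a))
        return mid;   // middle item is between the other two
    if ((b <= a && a <= c) || (c <= a && a <= b))
        return lo;    // first item is between the other two
    return hi;        // otherwise the last item must be
}
```

A partitioning routine that expects its pivot at the front of the range can swap the chosen item into position `first` before partitioning. On a sorted list this picks the true median of the range, which is the best possible pivot.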
While using median-of-three greatly improves the performance of Quickselect on sorted or nearly sorted data, the worst case is still quadratic. A sequence that, when given as input, results in worst-case behavior is called a killer sequence. Killer sequences for Quickselect with median-of-three pivot choice tend to be complex, but they still exist and produce quadratic behavior.
Killer sequences for algorithms using median-of-three have often been dismissed as “rare” and thus not worth worrying about. But such reasoning has two flaws.
- It does not take into account the importance of consistent performance to any particular application. For example, occasional poor performance is usually acceptable in social media; it may not be in medical and security work.
- It ignores the rise of the malicious user. Historically, software developers could assume that their users and their customers were mostly on the same side. But since the coming of the web, users may be people who are trying to break the customer’s computer system. When a user might deliberately introduce bad data, the supposed rarity of bad data becomes irrelevant.
Tail-Recursion Elimination
Quickselect makes one recursive call; this happens at the end of the algorithm. Thus, Quickselect can easily be converted to an iterative algorithm using tail-recursion elimination.
The effect on Quickselect’s space usage is significant. The worst-case recursion depth of Quickselect is linear, resulting in linear additional space usage for the unoptimized algorithm. But when the recursion is eliminated, the only space required by Quickselect beyond that used to store its input is a few bytes to hold local variables. The result is constant additional space usage; optimized Quickselect is in-place.
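One way the transformation might look is sketched below. Each would-be recursive call is replaced by updating the range bounds and going around the loop again. To keep the sketch short, it assumes a `std::vector<int>`, uses the first item as the pivot, and inlines a Lomuto partition; the name `quickselectIterative` is hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Iterative Quickselect. Rearranges v and returns the item that
// belongs at index k. Uses only a few local variables beyond v.
int quickselectIterative(std::vector<int>& v, std::size_t k)
{
    assert(k < v.size());
    std::size_t first = 0;          // current range is v[first..last)
    std::size_t last = v.size();

    while (true)
    {
        // Lomuto partition of v[first..last) around v[first].
        const int pivot = v[first];
        std::size_t boundary = first;
        for (std::size_t i = first + 1; i < last; ++i)
        {
            if (v[i] < pivot)
            {
                ++boundary;
                std::swap(v[boundary], v[i]);
            }
        }
        std::swap(v[first], v[boundary]);

        if (k == boundary)
            return v[k];
        // Where the recursive version would call itself on one side,
        // we shrink the range to that side and loop.
        if (k < boundary)
            last = boundary;
        else
            first = boundary + 1;
    }
}
```

Note that this version rearranges the caller's list, which is what allows it to run in-place.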
Introselect
In 1997, David Musser introduced an important algorithmic idea. He proposed an optimization for algorithms with good average-case performance but poor worst-case performance. He called his idea introspection. An algorithm monitors its performance; if this looks poor, then a different algorithm can be used, one with a better worst case.
This technique is particularly useful for algorithms like Quicksort and Quickselect, which work well as long as the recursion does not get too deep. Since we want to partition a list into equal-sized parts, we generally look for a recursion depth of \(\log_2 n\) for both of these algorithms. (Here we are counting recursive calls even if tail-recursion elimination has made them not really recursive calls; these still require time.) Musser suggested that, if a recursive call finds that its depth is more than twice what we expect—\(2\log_2 n\)—then the portion of the data being processed by that recursive call should be turned over to a different algorithm.
For Quicksort, the usual alternate algorithm is Heap Sort, which has a \(\Theta(n\log n)\) worst case. The resulting introspective sorting algorithm is called Introsort.
For Quickselect, the usual alternate algorithm is actually a variant on Quickselect itself, with a different pivot-choice method. The algorithm is called Median-of-Medians Selection; it may also be referred to as the Blum-Floyd-Pratt-Rivest-Tarjan Selection algorithm (BFPRT), after the authors of the 1973 paper that introduced it.
Pivot choice in Median-of-Medians Selection works as follows. The list is chopped into pieces of size \(5\); the final piece may need to be smaller. For each piece, we find its median. Then all the medians are moved to the beginning of our list, using swap operations, to make a little list. We then find the actual median of this little list, using a recursive call to our selection algorithm (Median-of-Medians Selection). The item returned by this recursive call is used as the pivot.
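The steps above might be sketched as follows. This is an illustration under several assumptions, not a reference implementation: the list is a `std::vector<int>`, each piece's median is found by sorting the piece, ranges of five or fewer items are handled by sorting directly, and partitioning is Lomuto. The name `momSelect` is hypothetical. The linear-time bound also assumes distinct items; a list with many equal items would need a three-way partition, which is not shown.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Median-of-Medians Selection on v[first..last). Rearranges v and
// returns the item that belongs at (absolute) index k.
// Precondition: first <= k < last <= v.size().
int momSelect(std::vector<int>& v,
              std::size_t first, std::size_t last, std::size_t k)
{
    assert(first <= k && k < last && last <= v.size());
    while (true)
    {
        if (last - first <= 5)      // tiny range: just sort it
        {
            std::sort(v.data() + first, v.data() + last);
            return v[k];
        }

        // Chop into pieces of size 5 (the final piece may be smaller).
        // Find each piece's median and swap it to the front of the
        // range, building the "little list" v[first..first+numMedians).
        std::size_t numMedians = 0;
        for (std::size_t start = first; start < last; start += 5)
        {
            const std::size_t stop = std::min(start + 5, last);
            std::sort(v.data() + start, v.data() + stop);
            std::swap(v[first + numMedians], v[start + (stop - start) / 2]);
            ++numMedians;
        }

        // Recursive call: find the actual median of the little list.
        // Afterward that median sits at index first + numMedians / 2.
        const std::size_t medianIndex = first + numMedians / 2;
        const int pivot = momSelect(v, first, first + numMedians, medianIndex);

        // Lomuto partition of the whole range around that pivot.
        std::swap(v[first], v[medianIndex]);
        std::size_t boundary = first;
        for (std::size_t i = first + 1; i < last; ++i)
        {
            if (v[i] < pivot)
            {
                ++boundary;
                std::swap(v[boundary], v[i]);
            }
        }
        std::swap(v[first], v[boundary]);

        if (k == boundary)
            return v[k];
        if (k < boundary)
            last = boundary;        // continue on the left part
        else
            first = boundary + 1;   // continue on the right part
    }
}
```

Only the pivot-finding call is truly recursive here; the call on the left or right part has been turned into a loop iteration.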
Median-of-Medians Selection has a linear-time worst case. However, it is a rather slow linear time, quite a bit slower than the average case of ordinary Quickselect.
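A rough sketch of why the worst case is linear, ignoring rounding and assuming distinct items: about half of the \(n/5\) piece medians are no larger than the pivot, and each of those has two more items in its piece that are no larger than it. So roughly \(3n/10\) items are no larger than the pivot, and by symmetry roughly \(3n/10\) are no smaller. After partitioning, the part we continue with therefore has at most about \(7n/10\) items. Writing \(T(n)\) for the worst-case time and \(c\) for a constant (a symbol introduced here only for illustration) covering the linear work of finding piece medians and partitioning, we get

\[ T(n) \le T\!\left(\frac{n}{5}\right) + T\!\left(\frac{7n}{10}\right) + cn. \]

Guessing \(T(n) \le 10cn\) and substituting on the right-hand side gives

\[ 10c\cdot\frac{n}{5} + 10c\cdot\frac{7n}{10} + cn = 2cn + 7cn + cn = 10cn, \]

so the guess holds and \(T(n)\) is \(O(n)\). The key point is that \(\frac{1}{5} + \frac{7}{10} < 1\). The large constant also suggests why this linear time is a slow one.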
Quickselect with Median-of-Medians Selection as an alternate algorithm is called Introselect. It has a linear-time worst case and an average-case time that is as good as ordinary Quickselect.
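The overall structure might be sketched as follows, assuming a `std::vector<int>`, median-of-three pivot choice, Lomuto partitioning, and a depth limit of \(2\log_2 n\) rounded down. The names `Fallback`, `sortingFallback`, and `introselect` are hypothetical. To keep the sketch short, the alternate algorithm is passed in as a parameter. The `sortingFallback` shown is only a stand-in that sorts the range, so it is not linear-time; a real Introselect would pass Median-of-Medians Selection instead.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Assumed interface for the alternate algorithm: return the item that
// belongs at absolute index k, looking only at v[first..last).
using Fallback =
    std::function<int(std::vector<int>&, std::size_t, std::size_t, std::size_t)>;

// Stand-in for demonstration ONLY: sorts the range, so it is not
// linear-time. Median-of-Medians Selection would go here.
int sortingFallback(std::vector<int>& v,
                    std::size_t first, std::size_t last, std::size_t k)
{
    std::sort(v.data() + first, v.data() + last);
    return v[k];
}

// Quickselect that monitors its own depth and hands the current range
// to the fallback if the depth exceeds 2 * log2(n).
int introselect(std::vector<int>& v, std::size_t k, const Fallback& fallback)
{
    assert(k < v.size());
    const std::size_t depthLimit = static_cast<std::size_t>(
        2.0 * std::log2(static_cast<double>(v.size())));
    std::size_t first = 0;          // current range is v[first..last)
    std::size_t last = v.size();

    // depth counts would-be recursive calls, even though tail-recursion
    // elimination has turned them into loop iterations.
    for (std::size_t depth = 0; ; ++depth)
    {
        if (depth > depthLimit)
            return fallback(v, first, last, k);

        // Median-of-three: order the first, middle, and last items so
        // the median is in the middle, then move it to the front.
        const std::size_t mid = first + (last - first) / 2;
        const std::size_t hi = last - 1;
        if (v[mid] < v[first]) std::swap(v[mid], v[first]);
        if (v[hi] < v[first])  std::swap(v[hi], v[first]);
        if (v[hi] < v[mid])    std::swap(v[hi], v[mid]);
        std::swap(v[first], v[mid]);

        // Lomuto partition around the pivot now at v[first].
        const int pivot = v[first];
        std::size_t boundary = first;
        for (std::size_t i = first + 1; i < last; ++i)
        {
            if (v[i] < pivot)
            {
                ++boundary;
                std::swap(v[boundary], v[i]);
            }
        }
        std::swap(v[first], v[boundary]);

        if (k == boundary)
            return v[k];
        if (k < boundary)
            last = boundary;
        else
            first = boundary + 1;
    }
}
```

A call such as `introselect(data, k, sortingFallback)` exercises the structure; on typical input the depth limit is never reached and the fallback is never called.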
Summary
The Introselect algorithm does Quickselect, but switches to the linear-time Median-of-Medians Selection algorithm if the recursion depth exceeds \(2\log_2 n\). We also do tail-recursion elimination and a smart pivot choice method like median-of-three. The result is an in-place linear-time selection algorithm that performs reasonably well on nearly sorted data.
Example code: quickselect.cpp. Does median-of-three pivot choice; tail-recursion has been eliminated. The Introselect optimization is not included.