CS 411 Fall 2025
Outline & Supplemental Notes
for September 26, 2025
Outline
Variable-Size-Decrease Algorithms [L 4.5]
- Idea
- Some algorithms rely on solving smaller problems whose size varies, depending on the input. The performance of such algorithms can similarly vary.
- The most notable examples are Quicksort and related algorithms. Most of these use the Divide and Conquer strategy, but at least one—Quickselect—uses Decrease and Conquer.
- Selection
- The Selection Problem
- Given a list and an integer \(k\) with \(0\le k < n\), where \(n\) is the size of the list, return the item that would be at index \(k\) if the list were sorted.
- Obvious algorithm: do a fast sort, then look up by index. With \(n\) being the size of the list and the basic operation being any one- or two-item operation, this is \(\Theta(n\log n)\).
- Note: We use this model in all of what follows.
- If \(k = 0\), then we are finding the minimum. If \(k = n-1\), then we are finding the maximum. Both can easily be done in linear time.
- If \(k = \lfloor n/2\rfloor\), then we are finding a median. It is not obvious that this can be done in linear time.
- But linear-time algorithms do exist!
- Quickselect
- How it works.
- Partitioning algorithms: Lomuto, Hoare. We use Lomuto here.
- Write code.
- Analyze.
- Size of input: \(n={}\)number of items in given list.
- Basic op: comparison.
- Average case \(\Theta(n)\).
- Worst case \(\Theta(n^2)\).
- Optimizing Quickselect. (See Supplemental Notes)
- Nim (not covered in class, but, I think, very interesting)
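The Quickselect items in the outline above (Lomuto partitioning, then writing the code) can be sketched in Python roughly as follows. This is a minimal sketch and not the code written in class. The names `lomuto_partition` and `quickselect` are hypothetical. The sketch assumes the pivot is the first item of the current range, which is the unoptimized choice discussed in the Supplemental Notes, and it rearranges the list in place.

```python
def lomuto_partition(a, lo, hi):
    """Partition a[lo..hi] (inclusive) around the pivot a[lo].

    Afterward, items less than the pivot are to its left and all other
    items are to its right. Returns the pivot's final index.
    """
    pivot = a[lo]
    i = lo  # a[lo+1..i] holds the items seen so far that are < pivot
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:  # the basic operation: one comparison
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]  # move the pivot between the two groups
    return i


def quickselect(a, k, lo=0, hi=None):
    """Return the item that would be at index k if a were sorted.

    Unoptimized: the pivot is the first item of the current range,
    and a is rearranged in place. Requires 0 <= k < len(a).
    """
    if hi is None:
        hi = len(a) - 1
    if lo == hi:  # one-item range: it must be the answer
        return a[lo]
    p = lomuto_partition(a, lo, hi)
    if k == p:
        return a[p]
    if k < p:  # the answer lies left of the pivot
        return quickselect(a, k, lo, p - 1)
    return quickselect(a, k, p + 1, hi)  # the answer lies right of the pivot


data = [31, 4, 15, 9, 26, 5, 35, 8]
print(quickselect(data[:], 3))  # 9
print(sorted(data)[3])          # 9: the "sort, then look up" answer
```

The last line is the obvious sort-then-index algorithm, included only as a check. This version recurses once per partition. As a result, a large sorted input (selecting, say, its maximum) will exceed Python's default recursion limit, which is a concrete symptom of the linear recursion depth discussed in the Supplemental Notes.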
Supplemental Notes
Optimizing Quickselect
Introduction
Quickselect can perform poorly in some circumstances. Unoptimized, it is very slow on sorted or nearly sorted data; in fact, this is when its worst case happens. Its worst-case recursion depth is also linear (again, when unoptimized), and thus so is its additional space usage. In addition, it has quadratic worst-case time efficiency.
However, all of these troubles can be eliminated, using optimizations similar to those that are commonly used for Quicksort.
Pivot Choice: Median-of-Three
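Unoptimized Quickselect has its worst case when it is given a sorted list. The first item, which it uses as the pivot, is then the worst possible pivot choice: it is the smallest item in the range, so a partition can eliminate only the pivot itself.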
The standard solution to this is to use the median-of-three method to choose the pivot. We look at the first item in the list, the last item, and an item halfway through. Whichever of these is between the other two is our pivot.
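To make the sorted-input worst case concrete, here is a hedged Python sketch. It counts the comparisons made during partitioning when selecting the maximum of the sorted list `0, 1, ..., n-1`, once with the first item as pivot and once with median-of-three. All names are hypothetical. The median-of-three helper takes the middle index to be `(lo + hi) // 2` and swaps the chosen pivot to the front, so the same Lomuto partition can be reused. Comparisons made inside `median_of_three` itself are not counted.

```python
def median_of_three(a, lo, hi):
    """Return lo, mid, or hi: whichever index holds the median of the three items."""
    mid = (lo + hi) // 2
    x, y, z = a[lo], a[mid], a[hi]
    if x <= y <= z or z <= y <= x:
        return mid
    if y <= x <= z or z <= x <= y:
        return lo
    return hi


def partition(a, lo, hi, counter):
    """Lomuto partition of a[lo..hi] around a[lo]; adds its comparisons to counter[0]."""
    pivot = a[lo]
    i = lo
    for j in range(lo + 1, hi + 1):
        counter[0] += 1  # one comparison of a[j] with the pivot
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]
    return i


def quickselect(a, k, lo, hi, counter, use_median_of_three):
    if lo == hi:
        return a[lo]
    if use_median_of_three:
        m = median_of_three(a, lo, hi)
        a[lo], a[m] = a[m], a[lo]  # move the chosen pivot to the front
    p = partition(a, lo, hi, counter)
    if k == p:
        return a[p]
    if k < p:
        return quickselect(a, k, lo, p - 1, counter, use_median_of_three)
    return quickselect(a, k, p + 1, hi, counter, use_median_of_three)


def comparisons_on_sorted(n, k, use_median_of_three):
    """Partition comparisons used to select index k from the sorted list 0..n-1."""
    counter = [0]
    result = quickselect(list(range(n)), k, 0, n - 1, counter, use_median_of_three)
    assert result == k  # sanity check: item k of 0..n-1 is k
    return counter[0]


print(comparisons_on_sorted(100, 99, use_median_of_three=False))  # 4950
print(comparisons_on_sorted(100, 99, use_median_of_three=True))   # 190
```

With the first item as pivot, the count is \(99 + 98 + \dots + 1 = 4950 = n(n-1)/2\). With median-of-three on sorted data, each range is about half the size of the previous one (100, 50, 25, 12, 6, 3 items), so the count stays below \(2n\).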
While using median-of-three greatly improves the performance of Quickselect on sorted or nearly sorted data, the worst case is still quadratic. A sequence that, when given as input, results in worst-case behavior is called a killer sequence. Killer sequences for Quickselect with median-of-three pivot choice tend to be complex, but they still exist and produce quadratic behavior.
Killer sequences for algorithms using median-of-three have been dismissed as “rare” and thus not worth worrying about. But such reasoning has two flaws.
- It does not take into account how important worst-case performance might be to any particular application. For example, occasional poor performance is usually acceptable in social media; it may not be in medical and security software.
- It ignores the rise of the malicious user. Historically, software developers could assume that their users and their customers were mostly on the same side. But since the coming of the web, users may be people who are trying to break the customer’s computer system. When a user might deliberately introduce bad data, arguments based on the supposed rarity of bad data no longer apply.
Tail-Recursion Elimination
Quickselect makes at most one recursive call, and that call is the last thing it does. Thus, Quickselect can easily be converted to an iterative algorithm using tail-recursion elimination.
The effect on Quickselect’s space usage is significant. The worst-case recursion depth of Quickselect is linear, resulting in linear additional space usage for the unoptimized algorithm. But when the recursion is eliminated, the only space required by Quickselect beyond that used to store its input is a few bytes to hold local variables. The result is constant additional space usage; that is, with this optimization, Quickselect is in-place. (Note: Our next optimization will greatly decrease the worst-case time usage, but will make the algorithm no longer in-place.)
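Here is a sketch of Quickselect after tail-recursion elimination, in Python. The recursive call becomes an update of the range bounds followed by another trip around a `while` loop. The sketch also uses the median-of-three pivot choice described earlier. The function names are hypothetical, and this is one possible arrangement rather than a definitive implementation.

```python
def median_of_three(a, lo, hi):
    """Return lo, mid, or hi: whichever index holds the median of the three items."""
    mid = (lo + hi) // 2
    x, y, z = a[lo], a[mid], a[hi]
    if x <= y <= z or z <= y <= x:
        return mid
    if y <= x <= z or z <= x <= y:
        return lo
    return hi


def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around a[lo]; returns the pivot's final index."""
    pivot = a[lo]
    i = lo
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]
    return i


def quickselect_iterative(a, k):
    """Quickselect with its tail call replaced by a loop.

    Only a few local variables are used beyond the list itself, so the
    additional space is constant. Requires 0 <= k < len(a).
    """
    lo, hi = 0, len(a) - 1
    while lo < hi:
        m = median_of_three(a, lo, hi)
        a[lo], a[m] = a[m], a[lo]  # move the chosen pivot to the front
        p = partition(a, lo, hi)
        if k == p:
            return a[p]
        # Where the recursive version would call itself on a subrange,
        # we just narrow the range and go around the loop again.
        if k < p:
            hi = p - 1
        else:
            lo = p + 1
    return a[lo]  # one-item range, so lo == k


print(quickselect_iterative(list(range(100_000)), 12_345))  # 12345
```

Nothing accumulates on the call stack, so this version handles the 100,000-item sorted list in the example without running into Python's recursion limit.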
Introselect
In a 1997 paper, David Musser, an algorithms researcher at Rensselaer Polytechnic Institute, introduced an important algorithmic idea. He proposed an optimization for algorithms with good average-case performance but poor worst-case performance. The idea, which he called introspection, is that an algorithm monitors its performance; if this looks poor, then it switches to an alternate algorithm, one with a better worst case. This technique is particularly useful for algorithms like Quicksort and Quickselect.
In order to do introspection, we must answer two questions.
- How do we decide when performance is poor?
- What alternate algorithm do we switch to?
For Quickselect, the first question is tricky. For typical data, we expect the recursion depth of Quickselect to be around \(\log_2 n\). We could monitor the recursion depth (counting eliminated tail calls as recursive calls) and switch to a linear-time selection algorithm if the depth exceeds, say, \(2\log_2 n\). This would result in a \(\Theta(n\log n)\) worst case, because each level before the switch may still cost a linear number of comparisons. That is better than the quadratic worst case of ordinary Quickselect, but worse than the linear-time behavior of our alternate algorithm. A better strategy is to keep track of the sum of the sizes of the ranges handled so far. If this total exceeds \(kn\), for some small positive constant \(k\), then switch to the alternate algorithm. (This \(k\) is unrelated to the index \(k\) in the Selection Problem. Musser never specified a value for it.) This approach gives a \(\Theta(n)\) worst case.
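The two worst-case bounds in the paragraph above come from a simple count. Write \(m_1, m_2, \dots\) for the sizes of the successive ranges that Quickselect partitions (notation introduced here). A Lomuto partition of \(m\) items makes \(m-1\) comparisons. Let \(C_{\text{alt}}(n) = O(n)\) stand for the cost of the linear-time alternate algorithm, and let \(k\) be the budget constant, not the selection index. A rough sketch of the count:

```latex
\begin{align*}
\text{depth limit:} \quad
  & \sum_{i=1}^{2\log_2 n} (m_i - 1) \;\le\; 2\log_2 n \cdot n \;=\; O(n \log n) \\
\text{size budget:} \quad
  & \sum_i (m_i - 1) \;\le\; \sum_i m_i \;\le\; kn + n \;=\; (k+1)\,n \\
  & \text{total} \;\le\; (k+1)\,n + C_{\text{alt}}(n) \;=\; O(n)
\end{align*}
```

With the depth limit, up to about \(2\log_2 n\) partitions can each cost close to \(n\) comparisons before the switch. With the size budget, the running total is at most \(kn\) before the final partition, which adds at most \(n\) more. The \(\Theta\) bounds in the text also rely on matching lower bounds. For the depth limit, these come from inputs on which each partition removes only a few items; for the budget, from the fact that every item must be examined.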
For Quickselect, the usual alternate algorithm (second question above) is actually a variation on Quickselect itself; the only difference is that we use a sophisticated pivot-choice method. This algorithm is called Median-of-Medians Selection; it may also be referred to as Blum-Floyd-Pratt-Rivest-Tarjan Selection, or BFPRT, after the authors of the 1973 paper that introduced it.
Pivot choice in Median-of-Medians Selection works as follows. The list is chopped into pieces of size \(5\); one of the pieces may need to be smaller. For each piece, we find its median (this is easy). Then all these medians are moved to the beginning of our list, using swap operations, to make a little list. We then find the actual median of this little list, using a recursive call to our selection algorithm (Median-of-Medians Selection). The item returned by this recursive call is used as the pivot for the usual Quickselect method.
Median-of-Medians Selection has a linear-time worst case. However, it is a rather slow linear time, quite a bit slower than the average case of ordinary Quickselect. A naive implementation makes two recursive calls: one to find the pivot, and one for the ordinary Quickselect recursion. Since the latter is a tail call, we can eliminate it easily, as with normal Quickselect. The former call is not a tail call. But it operates only on the list of medians, which has about one-fifth as many items. So the resulting algorithm has logarithmic recursion depth and requires logarithmic additional space.
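The pivot choice and recursion structure described in the last two paragraphs can be sketched in Python as follows. This is a hedged sketch, not a tuned implementation. `mom_select` is a hypothetical name, and each group of at most 5 items is sorted with Python's built-in `sorted`, a constant amount of work per group. The pivot-finding call is the only real recursion; the Quickselect tail call is a loop. One caveat: with this two-way (Lomuto) partition, the linear worst-case bound assumes distinct items. Lists with many equal items would need a three-way partition to keep that bound.

```python
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around a[lo]; returns the pivot's final index."""
    pivot = a[lo]
    i = lo
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]
    return i


def mom_select(a, k, lo=0, hi=None):
    """Median-of-Medians (BFPRT) Selection on a[lo..hi], with lo <= k <= hi.

    Returns the item that would be at index k if a were sorted, and leaves
    that item at a[k]. The Quickselect tail call is a loop; the only real
    recursion is the pivot-finding call, on about 1/5 as many items.
    """
    if hi is None:
        hi = len(a) - 1
    while lo < hi:
        # Chop a[lo..hi] into groups of 5 (the last may be smaller), find the
        # median of each, and swap those medians to the front of the range.
        count = 0
        for g in range(lo, hi + 1, 5):
            g_hi = min(g + 4, hi)
            a[g:g_hi + 1] = sorted(a[g:g_hi + 1])  # at most 5 items: O(1)
            med = (g + g_hi) // 2
            a[lo + count], a[med] = a[med], a[lo + count]
            count += 1
        # Recursive call: select the median of the little list of medians.
        mid = lo + (count - 1) // 2
        mom_select(a, mid, lo, lo + count - 1)  # leaves it at a[mid]
        a[lo], a[mid] = a[mid], a[lo]  # use it as the pivot
        p = partition(a, lo, hi)
        if k == p:
            return a[p]
        if k < p:
            hi = p - 1
        else:
            lo = p + 1
    return a[lo]


data = [31, 4, 15, 9, 26, 5, 35, 8]
print(mom_select(data[:], 3))  # 9
```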
Quickselect with Median-of-Medians Selection as an alternate algorithm is called Introselect. It has a linear-time worst case and an average-case time that is as good as that of ordinary Quickselect. Its additional space usage is logarithmic.
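Putting the pieces together, here is a hedged Python sketch of Introselect. It is an iterative Quickselect with median-of-three pivots that tracks the sum of the sizes of the ranges it has partitioned. Once that sum exceeds \(c \cdot n\), it hands the remaining range to Median-of-Medians Selection. The parameter `c` plays the role of the constant \(k\) in the introspection discussion; it is renamed here to avoid a clash with the index `k`. Its default of 4 is an arbitrary illustrative choice. All names are hypothetical, and the helper functions are included so that the block runs on its own.

```python
def partition(a, lo, hi):
    """Lomuto partition of a[lo..hi] around a[lo]; returns the pivot's final index."""
    pivot = a[lo]
    i = lo
    for j in range(lo + 1, hi + 1):
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[lo], a[i] = a[i], a[lo]
    return i


def median_of_three(a, lo, hi):
    """Return lo, mid, or hi: whichever index holds the median of the three items."""
    mid = (lo + hi) // 2
    x, y, z = a[lo], a[mid], a[hi]
    if x <= y <= z or z <= y <= x:
        return mid
    if y <= x <= z or z <= x <= y:
        return lo
    return hi


def mom_select(a, k, lo, hi):
    """Median-of-Medians Selection on a[lo..hi]; leaves the answer at a[k]."""
    while lo < hi:
        count = 0
        for g in range(lo, hi + 1, 5):  # groups of 5; the last may be smaller
            g_hi = min(g + 4, hi)
            a[g:g_hi + 1] = sorted(a[g:g_hi + 1])
            med = (g + g_hi) // 2
            a[lo + count], a[med] = a[med], a[lo + count]  # medians to front
            count += 1
        mid = lo + (count - 1) // 2
        mom_select(a, mid, lo, lo + count - 1)  # median of medians -> a[mid]
        a[lo], a[mid] = a[mid], a[lo]
        p = partition(a, lo, hi)
        if k == p:
            return a[p]
        if k < p:
            hi = p - 1
        else:
            lo = p + 1
    return a[lo]


def introselect(a, k, c=4):
    """Return the item that would be at index k if a were sorted (0 <= k < len(a))."""
    n = len(a)
    lo, hi = 0, n - 1
    handled = 0  # sum of the sizes of the ranges partitioned so far
    while lo < hi:
        if handled > c * n:
            # Introspection: performance looks poor, so finish the job
            # with the linear-time alternate algorithm.
            return mom_select(a, k, lo, hi)
        handled += hi - lo + 1
        m = median_of_three(a, lo, hi)
        a[lo], a[m] = a[m], a[lo]  # move the chosen pivot to the front
        p = partition(a, lo, hi)
        if k == p:
            return a[p]
        if k < p:
            hi = p - 1
        else:
            lo = p + 1
    return a[lo]


data = [31, 4, 15, 9, 26, 5, 35, 8]
print(introselect(data[:], 3))  # 9
```

Calling `introselect(a, k, c=0)` forces the switch after the first partition. This is one way to exercise the fallback path when testing.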
Summary
The table below summarizes the three Quickselect optimizations we have covered.
Before | Optimization | After |
---|---|---|
Worst performance is on (nearly) sorted data. | Pivot selection via median-of-three, or other smart method. | Performance on nearly sorted data same as average case. |
Worst-case recursion depth and additional space usage \(\Theta(n)\). | Tail-recursion elimination. | No recursion; additional space usage \(\Theta(1)\). Note that the Introselect optimization increases this to \(\Theta(\log n)\). |
Worst-case time \(\Theta(n^2)\). | Introspection: track the sum of the sizes of the ranges processed. If the total exceeds \(kn\), for some constant \(k\), then switch to Median-of-Medians Selection. | Introselect. Worst-case time \(\Theta(n)\). Additional space usage \(\Theta(\log n)\). |