CS 411 Fall 2025
Outline & Supplemental Notes
for October 13, 2025

Outline

Balanced Search Trees [L 6.3]

Supplemental Notes

Associative Data & CRUD

The overall problem that is being solved in this section is how to store associative data: data in which look-ups are done by key. Typically, there is a value corresponding to each key, and we store key-value pairs in some data structure. When we use associative data, we are usually not concerned with the organization of the data; we simply want to be able to find and operate on individual data items quickly.

Associative data is very common. For example, your UAID is a key; the corresponding value is everything the U. of Alaska knows about you: classes, grades, addresses, etc.

There are four single-item operations that are performed on an associative dataset.

Create
Insert a new key and associated value into the dataset.
Read
Retrieve an item (key, value) by key.
Update
Change the value associated with a key.
Delete
Remove an item from the dataset, by key.

A commonly used acronym is based on the first letters of these four operations: CRUD.

A stored dataset that supports CRUD operations involving arbitrary keys is called a dictionary, or table. Some datasets limit the keys that can be used; we will look at this idea later.
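As a rough illustration, the four CRUD operations might be sketched on top of std::map as follows. The Table alias, the choice of an int key with a string value, and the four function names are assumptions made for this example; they are not part of any library.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical table type: an integer ID (key) mapped to a name (value).
using Table = std::map<int, std::string>;

// Create: insert a new key and value.
// Returns false, leaving the table unchanged, if the key is already present.
bool createItem(Table & t, int key, const std::string & value)
{
    return t.insert({key, value}).second;
}

// Read: retrieve the value for a key.
// Returns a pointer to the stored value, or nullptr if the key is absent.
const std::string * readItem(const Table & t, int key)
{
    auto it = t.find(key);
    if (it == t.end())
        return nullptr;
    return &it->second;
}

// Update: change the value associated with an existing key.
// Returns false if the key is absent.
bool updateItem(Table & t, int key, const std::string & value)
{
    auto it = t.find(key);
    if (it == t.end())
        return false;
    it->second = value;
    return true;
}

// Delete: remove an item by key.
// Returns true if an item was actually removed.
bool deleteItem(Table & t, int key)
{
    return t.erase(key) == 1;
}
```

Each function takes the key as a parameter, which matches the description above: every CRUD operation is driven by a key, and only Create and Update also need a value.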

When we deal with CRUD operations, our input consists of the existing data structure, as well as the key, and possibly the new value, involved in the operation. The size of the input is the number of items currently in the structure (plus one for the specified key, although this \(+1\) can usually be ignored when doing analysis). The basic operations are any operation on a single key or value, or key-value pair, along with the usual built-in C operators: pointer assignment/dereference, etc.

There are many possible implementations for a dictionary. An ordinary Binary Search Tree has an average-case performance, for each of the CRUD operations, of logarithmic time, but its worst-case performance is linear time. Each of the balanced search trees (2-3 Tree, 2-3-4 Tree, Red-Black Tree, AVL Tree, B-Tree, B+ Tree, B* Tree, Splay Tree, etc.) can do each CRUD operation in logarithmic time, even in the worst case. (For a Splay Tree, the logarithmic bound is amortized; a single operation can take linear time.) Later in the semester we will look at a very different dictionary implementation: a Hash Table.
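The linear worst case of an ordinary Binary Search Tree can be seen by inserting keys in sorted order. The following is a minimal sketch, assuming integer keys and no rebalancing of any kind; Node, bstInsert, height, and heightAfterInserts are names invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Node of an ordinary (unbalanced) Binary Search Tree holding int keys.
struct Node {
    int key;
    std::unique_ptr<Node> left, right;
    explicit Node(int k) : key(k) {}
};

// Insert a key with no rebalancing. Duplicate keys are ignored.
void bstInsert(std::unique_ptr<Node> & root, int key)
{
    std::unique_ptr<Node> * p = &root;
    while (*p) {
        if (key < (*p)->key)
            p = &(*p)->left;
        else if ((*p)->key < key)
            p = &(*p)->right;
        else
            return;  // Key already present
    }
    p->reset(new Node(key));
}

// Height: number of nodes on the longest root-to-leaf path (0 if empty).
std::size_t height(const std::unique_ptr<Node> & root)
{
    if (!root)
        return 0;
    return 1 + std::max(height(root->left), height(root->right));
}

// Build a tree by inserting the keys in the given order; return its height.
std::size_t heightAfterInserts(const std::vector<int> & keys)
{
    std::unique_ptr<Node> root;
    for (int k : keys)
        bstInsert(root, k);
    return height(root);
}
```

Inserting 1, 2, ..., 7 in that order gives a tree of height 7, which is effectively a linked list, so a search may visit every node. Inserting the same keys in the order 4, 2, 6, 1, 3, 5, 7 gives height 3. A balanced search tree avoids the first outcome regardless of insertion order.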

Balanced Search Trees in Practice

The C++ Standard Library containers std::set, std::map, std::multiset, and std::multimap are specified with the idea that they should be implemented using a balanced search tree. In practice, a Red-Black Tree is generally used.

For in-memory tables, balanced search trees have in many cases been supplanted by Hash Tables (which we will look at later). Following the precedent set by the Perl programming language, many new languages have built-in hash-table implementations. The C++11 Standard added the containers std::unordered_set, std::unordered_map, etc., which are implemented using Hash Tables.
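The tree-based and hash-based Standard Library containers share much of their interface, so code can often be written to work with either. Here is a small sketch; the function templates countWords and keysInIterationOrder are hypothetical helpers written for this example.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// Count occurrences of each word. Dict may be std::map<std::string, int>
// or std::unordered_map<std::string, int>; both provide operator[].
template <typename Dict>
Dict countWords(const std::vector<std::string> & words)
{
    Dict counts;
    for (const auto & w : words)
        ++counts[w];  // operator[] inserts a zero count on first use
    return counts;
}

// Return the keys of a dictionary in the order that iteration visits them.
template <typename Dict>
std::vector<std::string> keysInIterationOrder(const Dict & d)
{
    std::vector<std::string> keys;
    for (const auto & kv : d)
        keys.push_back(kv.first);
    return keys;
}
```

With std::map, iteration visits the keys in sorted order, so keysInIterationOrder returns a sorted list. With std::unordered_map, the order is unspecified. Sorted traversal is one practical reason to choose the tree-based container even when average-case speed favors the Hash Table.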

However, balanced search trees still have two important applications. First, balanced search trees should be used when worst-case performance is particularly important. A well-written Hash Table will have excellent average-case/amortized performance, but its worst case is quite poor: linear time for all CRUD operations. In contrast, balanced search trees can do the CRUD operations in logarithmic time, even in the worst case. (So you can use a Hash Table for your app that makes gross sound effects. Do not use one when you program a pacemaker. For everything in between: THINK.)

Second, balanced search trees are commonly used to implement external tables, particularly those that are stored on block-access devices, like most disks. External means not stored in memory. Perhaps it is on a mass-storage device; it might be accessed via a network. The point is that the connection from the processor to the data is relatively slow. In particular, many (most? all?) modern filesystems represent each directory using some B-Tree variant.

Lastly, it is worth noting, particularly in view of the text’s focus on them, that AVL Trees are used relatively rarely. This is not because they are particularly awful data structures. Rather, they are simply not the best choice in most cases. For an in-memory table whose worst-case performance is important, a Red-Black Tree is generally faster if many inserts and deletes will be done. If no inserts or deletes will be done, then an AVL Tree is faster than a Red-Black Tree, but a sorted array + Binary Search is faster yet. For such a table whose worst-case performance is less important, a Hash Table gives faster average-case performance. For a table stored on disk, the various B-Tree variants are faster. AVL Trees and variations on them are the best solutions when inserts and deletes are done, but not nearly as often as find operations.
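For the case in which no inserts or deletes are done, the sorted array + Binary Search option might look something like the following sketch, which uses std::sort and std::lower_bound. The Item alias and the functions makeStaticTable and findInTable are assumptions made for this example, as is the choice of int keys with string values.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical key-value pair: int key, string value.
using Item = std::pair<int, std::string>;

// Build the table once by sorting the items by key.
// Assumes the keys are distinct.
std::vector<Item> makeStaticTable(std::vector<Item> items)
{
    std::sort(items.begin(), items.end(),
              [](const Item & a, const Item & b) { return a.first < b.first; });
    return items;
}

// Look up a key using Binary Search (logarithmic time).
// Returns a pointer to the value, or nullptr if the key is absent.
const std::string * findInTable(const std::vector<Item> & table, int key)
{
    // lower_bound finds the first item whose key is not less than the given key.
    auto it = std::lower_bound(table.begin(), table.end(), key,
                               [](const Item & item, int k) { return item.first < k; });
    if (it == table.end() || it->first != key)
        return nullptr;
    return &it->second;
}
```

The data sit in one contiguous block of memory with no per-node pointers, which is a large part of why this approach tends to beat a tree when the dataset never changes. The cost is that an insert or delete would take linear time, so this is only attractive for a table that is built once and then read.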