
CS 411 Fall 2025
Outline & Supplemental Notes
for October 13, 2025

Outline

Balanced Search Trees [L 6.3]

Associative data & CRUD (see Supplemental Notes)
- Look-up by key.
- CRUD operations: Create, Read, Update, Delete.
Binary Search Trees
- How they work.
- Problem: every CRUD operation is linear-time in the worst case (although logarithmic-time in the average case).
Solutions to the problem of efficiently handling associative datasets: self-balancing trees. These have special insert and delete algorithms that keep the tree balanced, meaning that the heights of the two subtrees of each node do not differ greatly (for some meaning of “greatly”). All such trees have logarithmic-time CRUD operations.
- Instance simplification*—balanced binary search trees with a little extra data in each node: AVL Tree, Red-Black Tree.
  - *Actually, these are almost instance simplification. True, an AVL Tree and a Red-Black Tree are each a special kind of Binary Search Tree, but in order for their insert/delete algorithms to be efficient, each of these kinds of trees needs just a little extra data in each node.
- Representation change—allow a node to contain more than one data item and to have more than two children: 2-3 Tree, 2-3-4 Tree, B-Tree & variants.
AVL Tree
2-3 Tree
Generalizing 2-3 Trees
- First direction: generalize to 2-3-4 Tree. Then represent as Binary Search Tree + 1 bit per node: Red-Black Tree.
- Second direction: allow much larger nodes. B-Tree. Good for block-access data. Many variants (B+ Tree, B* Tree, etc.).
  - The text classifies B-Trees under “Space-time trade-offs”. We will cover them later in the semester.
Balanced Search Trees in practice (see Supplemental Notes)

Supplemental Notes

Associative Data & CRUD

The overall problem that is being solved in this section is how to store associative data: data in which look-ups are done by key. Typically, there is a value corresponding to each key, and we store key-value pairs in some data structure. When we use associative data, we are usually not concerned with the organization of the data; we simply want to be able to find and operate on individual data items quickly.

Associative data is very common. For example, your UAID is a key; the corresponding value is everything the U. of Alaska knows about you: classes, grades, addresses, etc.

There are four single-item operations that are performed on an associative dataset.

Create: Insert a new key and associated value into the dataset.
Read: Retrieve an item (key, value) by key.
Update: Change the value associated with a key.
Delete: Remove an item from the dataset, by key.

A commonly used acronym is based on the first letters of these four operations: CRUD.

A stored dataset that supports CRUD operations involving arbitrary keys is called a dictionary, or table. Some datasets limit the keys that can be used; we will look at this idea later.

When we deal with CRUD operations, our input consists of the existing data structure, as well as the key, and possibly the new value, involved in the operation. The size of the input is the number of items currently in the structure (plus one for the specified key, although this \(+1\) can usually be ignored when doing analysis). The basic operations are any operation on a single key or value, or key-value pair, along with the usual built-in C operators: pointer assignment/dereference, etc.

There are many possible implementations for a dictionary. An ordinary Binary Search Tree has an average-case performance, for each of the CRUD operations, of logarithmic time, but its worst-case performance is linear time. Each of the balanced search trees (2-3 Tree, 2-3-4 Tree, Red-Black Tree, AVL Tree, B-Tree, B+ Tree, B* Tree, Splay Tree, etc.) can do each CRUD operation in logarithmic time; for a Splay Tree, this bound is amortized rather than worst-case. Later in the semester we will look at a very different dictionary implementation: a Hash Table.

Balanced Search Trees in Practice

The C++ Standard Library containers std::set, std::map, std::multiset, and std::multimap are specified with the idea that they should be implemented using a balanced search tree. In practice, a Red-Black Tree is generally used.

For in-memory tables, balanced search trees have in many cases been supplanted by Hash Tables (which we will look at later). Following the precedent set by the Perl programming language, many new languages have built-in hash-table implementations. The C++11 Standard added the containers std::unordered_set, std::unordered_map, etc., which are implemented using Hash Tables.

However, balanced search trees still have two important applications. First, balanced search trees should be used when worst-case performance is particularly important. A well-written Hash Table will have excellent average-case/amortized performance, but its worst case is quite poor: linear time for all CRUD operations. In contrast, balanced search trees can do the CRUD operations in logarithmic time, even in the worst case. (So you can use a Hash Table for your app that makes gross sound effects. Do not use one when you program a pacemaker. For everything in between: THINK.)

Second, balanced search trees are commonly used to implement external tables, particularly those that are stored on block-access devices, like most disks. External means not stored in memory. Perhaps it is on a mass-storage device; it might be accessed via a network. The point is that the connection from the processor to the data is relatively slow. In particular, many (most? all?) modern filesystems represent each directory using some B-Tree variant.

Lastly, it is worth noting, particularly in view of the text’s focus on them, that AVL Trees are used relatively rarely. This is not because they are particularly awful data structures. Rather, they are simply not the best choice in most cases. For an in-memory table whose worst-case performance is important, a Red-Black Tree is generally faster if many inserts and deletes will be done. If no inserts or deletes will be done, then an AVL Tree is faster than a Red-Black Tree, but a sorted array + Binary Search is faster yet. For such a table whose worst-case performance is less important, a Hash Table gives faster average-case performance. For a table stored on disk, the various B-Tree variants are faster. AVL Trees and variations on them are the best solutions when inserts and deletes are done, but not nearly as often as find operations.
