CS 311 Fall 2024
Exam Review Problems, Set F
This is the sixth of seven sets of exam review problems. A complete review for the Final Exam includes all seven problem sets.
Problems
Review problems are given below. Answers are in the Answers section of this document. Do not turn these in.
- A Binary Search Tree is shown below.
Perform each of the following operations and draw the resulting tree. For each part, start with the original tree above, not the tree resulting from earlier parts.
- Insert 25.
- Delete 10.
- Delete 30.
- Delete 20.
- What is a “search tree”? Give 5 examples of kinds of search trees.
- Fill in the blank: In a search tree, the primary factor determining how many steps the insert, delete, and retrieve operations require is the _______ of the tree.
- What are 2-nodes, 3-nodes, etc.?
- How do 3-nodes, etc., help us make things more efficient?
- Explain the difference between “position-oriented” ADTs and “value-oriented” ADTs.
- List two ADTs in each category.
- We did not precisely define the operations of the Table ADT. Why not? How can these operations vary?
- How is the Table ADT related to databases?
- What are Tables for? List three or four big categories of Table applications.
- What is a “Priority Queue”?
- Is a Priority Queue a kind of Queue? Explain.
- How can we simulate a Queue using a Priority Queue? Can we simulate a Stack?
- What data structure do we usually use to implement a Priority Queue? If we use this structure, then what is the order of the three main Priority-Queue operations?
- What is a “Binary Heap”?
- Explain the terms “Maxheap” and “Minheap”.
- How is a Binary Heap usually implemented?
- We generally think of a Binary Heap not so much as a data structure, but as a collection of algorithms. Explain.
- Reallocate-and-copy can slow down a Binary Heap. However, in many Binary Heap applications, reallocate-and-copy does not happen. Why not?
- What are Binary Heaps & Heap algorithms good for?
- Suppose we start with an empty Maxheap, and insert each of the following items, in the given order: 1 2 3 4 5 6 7 8.
- Draw the resulting Binary Heap (as a tree).
- Draw the array representation of the above Binary Heap.
- Now, we perform one Heap delete operation on the Binary Heap from the previous parts. Draw the resulting Binary Heap (as a tree).
- Explain how Heap Sort works.
- What nice property does Heap Sort have that no other sorting algorithm we studied has?
- In what situations is Heap Sort a good choice?
- Explain what a “key” is,
and its role in value-oriented ADTs and their implementations.
- State the order of the three main Table operations (insert, delete, retrieve)
when a Table is implemented as each of the following.
Duplicate keys are allowed.
- A (smart) unsorted array.
- A (smart) sorted array.
- A Binary Search Tree.
- A Red-Black Tree.
- We covered a number of possible ways to get Table-style functionality.
Two of the implementations we discussed are very commonly used.
Another implementation is occasionally used.
Yet another is not really a Table implementation.
List each of these four, and indicate when/why each method is used.
- The Table ADT is used in two conceptually different ways:
one in which the whole data item is the key,
and one in which only part of the data item is the key.
- Explain.
- How are these two handled in the C++ Standard Library?
- What, exactly, is a “2-3 Tree”?
- What is the order of the insert, delete, retrieve, and traverse operations for a 2-3 Tree?
- In what situations is a 2-3 Tree the best data structure to use?
- With the answer to the previous question in mind, why did we study 2-3 Trees?
- Starting with an empty 2-3 Tree, insert 1, 2, 3, 4, 5, in that order. After each insertion, draw the resulting tree.
- Explain what a “Red-Black Tree” is, and (roughly) how it works.
- What is the order of the insert, delete, retrieve, and traverse operations for a Red-Black Tree?
- In order for a data type to be stored in a Red-Black Tree, what properties does it need to have?
- What is an “AVL Tree”?
- What is the primary tree operation involved in insert & delete in an AVL Tree?
- Hashing is an attempt to reach an ideal situation. Describe this situation, and indicate what kind of performance (order of operations) would result if the ideal could be achieved.
- Why can we not achieve this ideal situation, in practice?
- What is a “hash function”?
- What properties must a hash function have?
- What are some properties that a good hash function needs to have?
- What is a “hashcode”?
- What is a “slot”? How is a key’s slot computed?
- In the context of Hash Tables, what is “closed hashing”?
- What is a “probe sequence”?
- Name and describe three kinds of probe sequences.
- Give a disadvantage of closed hashing, as compared with some other way to implement a Hash Table.
- How can we implement a Hash Table in a way that does not use closed hashing?
- When we implement a Hash Table in this way, what form do the array items in the Hash Table usually take?
- What is this method called?
- Do the Hash Tables in the C++ STL use closed hashing or the method discussed above?
- Your instructor says that,
if a Hash Table uses open hashing,
then in its worst-case situation,
it essentially becomes a Linked List.
Explain.
Answers
- 25 is greater than 20, less than 50, and less than 30.
Thus, we put it in a new left child of the 30 node.
- 10 is in a leaf. We simply delete it.
- 30 is in a node with 1 child. We point 30’s parent’s pointer to 30’s child.
Then we delete the 30 node.
- 20 is in a node with 2 children.
We copy (or swap) the value in 20’s inorder successor (30) to the 20 node,
and then follow the 1-child deletion procedure, as in the previous part.
- A search tree is a tree in which inorder traversal,
or a generalization of this, gives the tree’s data items in sorted order.
We looked at seven kinds of search trees:
- Binary Search Tree
- 2-3 Tree
- 2-3-4 Tree
- Red-Black Tree
- AVL Tree
- B-Tree
- B+ Tree
- In a search tree, the primary factor determining how many steps the insert, delete, and retrieve operations require is the height of the tree.
- A \(k\)-node (\(k\) is an integer: 2, 3, 4, …) is a tree node containing \(k-1\) data items, and having \(k\) subtrees. These have an order relationship that generalizes that of a Binary Search Tree (all of whose nodes are 2-nodes): Items in the first subtree are less than or equal to the first data item, which is less than or equal to the items in the second subtree, which are less than or equal to the second data item, etc.
- 3-nodes, etc. help us write more efficient insert and delete algorithms, since they allow a bit of “give” in how many items are present in a portion of the tree. Thus, they let us insert and delete items without rearranging the whole tree.
- In a position-oriented ADT, we look up an item by its position in the data structure used to implement the ADT. The quintessential example is Sequence, in which we look up items by index. In a value-oriented ADT, we look up items by their value. Often a portion of this value, called the key, is used. The quintessential example is a Table, in which we look up items by key.
- Here are four position-oriented ADTs covered this semester:
- Sequence
- Stack
- Queue
- Binary Tree
- And here are four value-oriented ADTs covered this semester:
- SortedSequence
- Binary Search Tree
- Priority Queue
- Table
- We did not precisely define the Table ADT in class, because the operations
can reasonably vary according to application and implementation.
We mentioned four ways in which the operations can vary:
- The key can be the entire data item or only part of it. Of course, the former can be implemented using the latter. However, the interesting special case of set-style data makes the first possibility important.
- Duplicate keys can be allowed or disallowed.
- Traversal can be sorted or not. The former is convenient for search-tree implementations, the latter for Hash Table implementations.
- We can have a separate operation to change an item’s value, or else the retrieve operation can return a modifiable value. Alternatively, in the case of set-style data, we can have unmodifiable data items, in which case neither operation is appropriate.
- The Table ADT is the natural interface to a database. The Table operations insert, delete, retrieve, and (maybe) change a data item, correspond to the SQL data-manipulation commands Insert, Delete, Select, and Update.
- Here are four big categories of Table applications:
- Sparse array-like data. If we have an array “a”, and we want to say “a[0] = 1;” and “a[100000000] = 2;”, then a Table implementation is probably more suitable than an array.
- “Arrays” whose indices are not non-negative integers. When we want to do “age["Fred"] = 20;” and “age["cheese"] = 6;”.
- Data accessed via some key or code number. The University of Alaska has a huge database indexed by student ID number. This is a good application for a Table.
- Set Data. When we only care whether an item is in the structure or not, we are talking about a set, which is usually best implemented as a Table.
- Priority Queue is a value-oriented ADT. Items can be inserted, but only one item can be accessed: the front item. Which item is the front is determined by the items’ keys, which are called their priorities in this context. The front item is the item with the highest priority. It can be read and it can be removed, in which case a new item becomes the front item.
- No. In Queue, items are removed in the order they were inserted. In a Priority Queue, items are removed in order of priority. Thus, a Priority Queue is not a Queue.
- Using a Priority Queue, we can simulate a Queue by ensuring that each inserted item gets a lower priority than the previous one. We can also simulate a Stack, by ensuring that each item gets a higher priority than the previous one.
- We usually implement a Priority Queue using a Binary Heap, which is stored in an array. In this case, enqueue is logarithmic time if we can be sure that no reallocate-and-copy is required; otherwise it is linear time (a fast linear time, though: amortized constant plus logarithmic). Dequeue is logarithmic time, and getFront is constant time.
- A Binary Heap is a complete Binary Tree in which, for each node, the item in the node is greater than or equal to the items in all the node’s descendants. In practice, since we always represent a Binary Heap using the array implementation of a complete Binary Tree, we refer to this array as a Binary Heap, if the Binary Tree it represents is a Binary Heap.
- A Maxheap is precisely what is described above. A Minheap is a variation in which items are less than or equal to the items in descendants. We use “Maxheap” and “Minheap” to avoid misunderstandings about the structure of our data.
- We usually implement a Binary Heap using the array implementation of a complete Binary Tree.
- It is convenient to use Heap algorithms to operate on existing arrays (or other random-access ranges of data). Thus, when we implement a Binary Heap, we often just implement the Heap algorithms, and let them operate on client-provided storage.
- Heap algorithms tend to be run on client-provided storage. Thus, the allocation is already done, and no reallocate-and-copy is necessary, or even possible.
- Binary Heaps are good for:
- Implementing a Priority Queue.
- Doing Heap Sort.
- Doing variations on Heap Sort in which items are inserted in, or deleted from, the Heap in the middle of the sorting operation, or in which the sort is stopped before the entire dataset is sorted.
- Each successive item is placed in a new node;
we then sift-up the item.
The result, after all inserts, is illustrated below.
- The array lists the tree’s items in level order: top to bottom, and left to right within each level.
- The root item is swapped with the item in the last node (5).
The last node is then deleted.
We sift-down the new root item (5), swapping it with its largest child.
Result:
- To do Heap Sort, we begin with an empty Maxheap, insert all items into it, and then remove them all. This gives the items in reverse sorted order. Typically, Heap Sort is done in-place, with the Binary Heap growing from the beginning of the data as items are inserted, and shrinking again as they are removed. When we do it this way, a removed item goes into the spot freed up by the shrinking Binary Heap. Thus, the largest item goes in the last spot, etc., and the dataset ends up sorted.
- Heap Sort is the only \(O(n\log n)\) sort we studied that can be done in-place and using only constant [\(O(1)\)] additional storage.
- Heap Sort is useful:
- As a basis for other algorithms. In particular, it is part of Introsort.
- When generalized to handle data sets that are modified in the middle of sorting, or when sorting is stopped before the dataset is completely sorted.
- In situations in which very little extra memory is available.
- A key is the portion of the value of a data item that is used to look it up. Keys are used to look up items (for retrieval or deletion) in value-oriented ADTs. Efficient implementations of such ADTs must arrange items so that keys can be found quickly. Thus, in a search tree (or other sorted structure), items are stored in order sorted by key, and in a Hash Table, the keys are hashed.
- Smart Array. insert: amortized constant. delete: linear. retrieve: linear. Insert is constant time if we have enough memory pre-allocated.
- Smart Sorted Array. insert: linear. delete: linear. retrieve: logarithmic.
- Binary Search Tree. insert: linear. delete: linear. retrieve: linear. All three are logarithmic if the tree is strongly balanced. Except that it might not be.
- Red-Black Tree. insert: logarithmic. delete: logarithmic. retrieve: logarithmic.
- Here are the four:
- Self-Balancing Search Tree. This offers logarithmic-time insert, delete, and retrieve, and linear-time sorted traverse. We use this when we need a Table implementation with worst-case performance guarantees. For in-memory implementations, we would probably use a Red-Black Tree. For external (on mass storage) implementations, we would probably use a B-Tree or a B+ Tree.
- Hash Table. On average, this is the fastest overall Table implementation. It offers constant-time average performance for insert, delete, and retrieve, although its traverse is unsorted and sometimes slower. We use this when we want a Table implementation with fast typical performance, we can handle occasional poor performance, and we do not need a fast sorted traverse.
- Prefix Tree (a.k.a. “Trie”). This special-case Table implementation is appropriate when the collection of keys is a long list of words, or something similar. It offers insert, delete, and retrieve in time proportional to the word length. We use this when we want a Table implementation and our data is the kind that works well with a Trie. Modern language standard libraries typically do not include Tries; this may also be a factor.
- Array + Sort Algorithm. If our Table usage splits naturally into Table-creation and Table-look-up phases, with little or no deletion done, then we often do not use a Table implementation at all. Simply put all the items into an array, sort the array, and then use Binary Search for look-ups. Each insertion is amortized constant time (constant time if we know the size of our data set in advance; regardless, inserting everything is linear time). The look-ups are all logarithmic time. (Think “phone book”; a paper phone book, that is.)
- When the key is the entire data item, all we can say about an item is whether it lies in the structure or not (or, if duplicate keys are allowed, how many times it is in the structure). This is set-style data. When the key is only part of the data item, then we can look up the rest of the item via the key. Thus, we have key-based look-up: associative arrays, dictionaries, etc.
- The C++ STL has eight Table implementations. In four of them the key is the entire data item (std::set, std::multiset, and the corresponding unordered containers), and in four of them the key is only part of the data item (std::map, std::multimap, and the corresponding unordered containers).
- A 2-3 Tree is a tree for which:
- Every node is either a 2-node or a 3-node, with the associated order properties holding.
- All leaves are at the same level.
- Each non-leaf has its maximum number of children (2 for a 2-node, 3 for a 3-node).
- For a 2-3 Tree, insert, delete, and retrieve are all logarithmic time, and traverse is linear time.
- A 2-3 Tree is never the best data structure. Use a Red-Black Tree instead. (Okay, a 2-3 Tree is simpler to write than a Red-Black Tree, but that excuse belongs to the days before generic programming and convenient distribution of code via the internet.)
- Even though we recommend never using 2-3 Trees, we studied them because 2-3-4 Trees are similar but a bit more complex, and Red-Black Trees, which are based on 2-3-4 Trees, are an excellent, practical data structure.
- Here are the 2-3 Trees:
- A Red-Black Tree is a Binary-Tree representation of a 2-3-4 Tree.
Each node in the Red-Black Tree has a data item and a Boolean value (red or black).
The various nodes in a 2-3-4 Tree are represented in a Red-Black Tree as follows.
2-nodes are left alone, with them and their children being colored black.
3-nodes and 4-nodes are converted as shown
(the 3-node actually has two possible conversions; the other has 4 at the top and is shaped like a mirror image of that below):
The Red-Black Tree insert and delete algorithms are just the translation of the 2-3-4 Tree algorithms to this representation. The retrieve and traverse algorithms are those for Binary Search Trees (use inorder traverse).
- For a Red-Black Tree, insert, delete, and retrieve are all logarithmic time, and traverse is linear time.
- For a data type to be stored in a Red-Black Tree, it must have an ordering defined on it (operator<, usually).
- An AVL Tree (Adelson-Velskii Landis tree) is a strongly balanced Binary Search Tree in which each node tracks the difference between the heights of its two subtrees. (A Binary Tree is strongly balanced if, for each node in the tree, the heights of its two subtrees differ by at most 1). There are efficient insert and delete algorithms that maintain the AVL Tree invariants.
- The operation is rotation.
This is essentially the operation that switches between the two trees shown below, with subtrees of nodes being dragged along as necessary:
- The ideal situation is when data items are stored in an array (or similar structure), and there is a fast function that gives the index at which a data item can be found, if it is in the structure. There are no collisions; every item has a unique spot. Such a function is called a perfect hash function, and the result is a Table implementation in which the insert, delete, and retrieve operations are all constant time.
- In practice we cannot reach this ideal situation, since, if we allow insertions, then we might insert two items that the hash function places at the same index, resulting in a collision.
- In the context of data structures, a hash function is the location look-up function used by a Hash Table.
- A hash function must take a key and return a nonnegative integer. It must do this in a deterministic manner; that is, it must always produce the same output for a given input. And it must give the same output for keys that compare equal.
- A good hash function:
- Can be computed quickly.
- Spreads out its results evenly over the possible output values.
- When given patterned input, does not produce patterned output.
- A key’s hashcode is the integer that the hash function returns when it is given that key.
- A slot is an index in the array that makes up a Hash Table. A key’s slot is computed by taking the hashcode modulo the array size: hashcode % array_size.
- Closed hashing means each location in the Hash Table holds a single data item. It can also be marked as “empty” or “deleted”. If, when we attempt to access an item, the space it ought to be in is occupied by some other item, then we look for the item in a sequence of other locations.
- A probe sequence is the sequence of locations we look in, to find an item in a Hash Table that uses closed hashing.
- Here are three kinds of probe sequences:
- Linear Probing. If the output of the hash function is \(t\), then the probe sequence is \(t\), \(t+1\), \(t+2\), etc.
- Quadratic Probing. If the output of the hash function is \(t\), then the probe sequence is \(t\), \(t+1^2\), \(t+2^2\), etc.
- Double Hashing. This is a general term for probe sequences determined with the aid of a secondary hash function. For example, one might do something like linear probing, but with a step size other than 1. The step size would be the output of the secondary hash function.
- Here are two disadvantages:
- In closed hashing, Hash Table locations must be marked as “deleted” when an item is deleted. Then these locations, while not containing an item, do not cause a search to stop, as an “empty” location does. Thus, if the array fills up with “deleted” locations, then Hash-Table operations become slower.
- Closed hashing can result in widely separated locations being searched when an item is accessed. This can result in poor performance if a Hash Table is stored on an external mass-storage device.
- To form a Hash-Table implementation method that does not use closed hashing, make the Hash Table an array of data structures, each of which can hold zero or more data items. These structures are called buckets.
- Buckets are virtually always implemented as Singly Linked Lists.
- This technique is open hashing.
- The Hash Tables in the C++ STL all use open hashing, not closed hashing. To be more precise, the C++ Standard specifies only interface and performance, not implementation. However, the interface for the unordered_… containers is not one that will work with closed hashing, as it allows for iteration through a single bucket.
- In a Hash Table’s worst case, every item gets the same slot. For a Hash Table using open hashing, this means that every item is stored in the same bucket, which is almost certainly implemented as a Linked List. With every item stored in one Linked List, and the rest of the Hash Table being unused, it is not far from the truth to say that the Hash Table essentially is a Linked List.