CS 311 Fall 2007
Final Exam Review Problems, Part 3
This is the third of four sets of review problems for the Final Exam.
A complete review for the final includes the material of the Midterm Exam
as well, which is covered in three problem sets posted earlier in the semester.
For all seven sets, see the class web page,
or the following links:
Problems
Review problems are given below.
Answers are in the
Answers section of this document.
Do not turn these in.
- What are the orders of insert, delete, and retrieve for a Binary Search Tree
(using the standard Binary Search Tree algorithms)?
- Why is this a problem?
- What can we do about this problem?
- Suppose the following items are inserted, in the order given,
into an empty Binary Search Tree.
Draw the resulting tree.
Items: 3, 2, 6, 7, 1, 4, 5.
- Describe the Treesort algorithm.
- What is wrong with this algorithm?
- This algorithm is closely related to another algorithm we studied. Explain.
- A Binary Search Tree is shown below.

Perform each of the following operations and show the resulting tree.
For each part, start with the original tree above, not the tree resulting from earlier parts.
- insert 25.
- delete 10.
- delete 30.
- delete 20.
- What is a “search tree”? Give 5 examples of kinds of search trees.
- Fill in the blank:
In a search tree, the primary factor determining how many steps the
insert, delete, and retrieve operations require is the
_______ of the tree.
- What are 2-nodes, 3-nodes, etc.?
- How do 3-nodes, etc., help us make things more efficient?
- Explain the difference between “position-oriented” ADTs and
“value-oriented” ADTs.
- List three ADTs in each category.
- In class (as opposed to in the text) we did not precisely
define the operations of the Table ADT.
Why not? How can these operations vary?
- How is the Table ADT related to databases?
- What are Tables for?
List three or four big categories of Table applications.
- What is a “Priority Queue”?
- Is a Priority Queue a kind of Queue? Explain.
- How can we simulate a Queue using a Priority Queue?
Can we simulate a Stack?
- What data structure do we usually use to implement a Priority Queue?
If we use this structure, then what is the order of the three main Priority-Queue operations?
- What is a “Heap”?
- Explain the terms “Maxheap” and “Minheap”.
- How is a Heap usually implemented?
- We generally think of a Heap not so much as a data structure,
but as a collection of algorithms. Explain.
- Reallocate-and-copy can slow down a Heap.
However, in many Heap applications, reallocate-and-copy does not happen.
Why not?
- What are Heaps & Heap algorithms good for?
- Suppose we start with an empty Maxheap, and insert each of the following items, in order: 1 2 3 4 5 6 7.
- Draw the resulting Heap (as a tree).
- Draw the array representation of the above Heap.
- Now, we perform one Heap delete operation on the Heap from the previous parts.
Draw the resulting Heap (as a tree).
- Explain how Heap Sort works.
- What nice property does Heap Sort have that no other sorting algorithm
we studied has?
- In what situations is Heap Sort a good choice?
- Explain what a “key” is,
and its role in value-oriented ADTs and their implementations.
- State the order of the three main Table operations (insert, delete, retrieve)
when a Table is implemented as:
- A (smart) unsorted array.
- A (smart) sorted array.
- A Binary Search Tree.
- A Red-Black Tree.
- We covered a number of possible ways to get Table-style functionality.
Two of the implementations we discussed are very commonly used.
Another implementation is occasionally used.
Yet another is not really a Table implementation.
List each of these four, and indicate when/why each method is used.
- The Table ADT is used in two conceptually different ways:
one in which the whole data item is the key,
and one in which only part of the data item is the key.
- Explain.
- How are these two handled in the C++ Standard Library?
- What, exactly, is a “2-3 Tree”?
- What is the order of the insert, delete, retrieve, and traverse operations for a 2-3 Tree?
- In what situations is a 2-3 Tree the best data structure to use?
- With the answer to the previous question in mind, why did we study 2-3 Trees?
- Starting with an empty 2-3 Tree, insert 1, 2, 3, 4, 5, in that order.
After each insertion, draw the resulting tree.
- Explain what a “Red-Black Tree” is, and (roughly) how it works.
- What is the order of the insert, delete, retrieve, and traverse operations for a Red-Black Tree?
- In order for a data type to be stored in a Red-Black Tree, what properties does it need to have?
- What is an “AVL Tree”?
- What is the primary operation involved in insert & delete in an AVL Tree?
- In practice, we never use an AVL Tree. Why not?
- Hashing is an attempt to reach an ideal situation.
Describe this situation, and indicate what sort of performance (order of operations)
would result if the ideal could be achieved.
- Why can we not achieve this ideal situation, in practice?
- What is a “hash function”?
- What property or properties must a hash function have?
- What are some properties that a good hash function should have?
Answers
- For a Binary Search Tree, the insert, delete, and retrieve operations
are all linear time.
(They are logarithmic time if the tree is balanced; however, it might not be balanced.)
- This is a problem, because we generally want to use a Binary Search Tree
to implement a Table ADT, and linear time is too slow for single-element operations like these.
- Here are two possible solutions:
- Keep the tree balanced during insertion and deletion (for example, by using the AVL Tree algorithms).
- Or, even better, use some other Table implementation, like a Red-Black Tree or a hash table.
- Resulting tree: 3 at the root, with children 2 and 6; 1 is the left child of 2;
4 and 7 are the children of 6; and 5 is the right child of 4.

- The Treesort algorithm is a sorting algorithm
that solves the General Sorting Problem.
It uses a Binary Search Tree:
begin with an empty tree,
insert each item into the tree,
then do an inorder traversal of the tree to recover the items in sorted order.
- First of all, Treesort has the same problem as Quicksort: it is O(n²),
even though it has good performance for typical data.
Second, like unoptimized Quicksort, Treesort has particularly bad performance for already sorted data.
Third, Treesort is not in-place, while Quicksort is.
Because of these problems, in practice Treesort is never used.
- Treesort is essentially Quicksort, disguised and made a little worse.
In both, we pick a single item (the root in Treesort; the pivot in Quicksort),
then we (recursively) sort the remaining items in two bunches:
those less than the chosen item and those greater.
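Quicksort in disguise or not, the structure of Treesort is easy to see in code. Below is a minimal C++ sketch; std::multiset stands in for the Binary Search Tree. (Since std::multiset is typically a Red-Black Tree underneath, this version actually dodges the O(n²) worst case, but the insert-then-inorder-traverse structure is the same.)

```cpp
#include <set>      // std::multiset: stands in for our Binary Search Tree
#include <vector>

// treesort: insert every item into a search tree, then read the items
// back in sorted order. (Iterating over a std::multiset IS an inorder
// traversal of the underlying tree.)
std::vector<int> treesort(const std::vector<int> & data)
{
    std::multiset<int> tree;             // begin with an empty tree
    for (int x : data)
        tree.insert(x);                  // insert each item
    return std::vector<int>(tree.begin(), tree.end());
                                         // inorder traversal -> sorted
}
```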
- 25 is greater than 20, less than 50, and less than 30.
Thus, we make it a new left child of the 30 node.

- 10 is in a leaf. We simply delete it.

- 30 is in a node with 1 child. We point 30’s parent’s pointer to 30’s child.
Then we delete the 30 node.

- 20 is in a node with 2 children. We copy the value of 20’s inorder successor (30) into 20’s node,
and then remove the successor’s node using the 1-child deletion procedure, as in the previous part.

- A search tree is a tree in which inorder traversal,
or a generalization of this, gives the tree’s data items in sorted order.
Examples:
- Binary Search Tree
- 2-3 Tree
- 2-3-4 Tree
- Red-Black Tree
- AVL Tree
- B-Tree
- B+ Tree
- In a search tree, the primary factor determining how many steps the
insert, delete, and retrieve operations require is the
height of the tree.
- A k-node (k is an integer: 2, 3, 4, ...) is a tree node
containing k – 1 data items, and having k subtrees.
These have an order relationship that generalizes that of a Binary Search Tree
(all of whose nodes are 2-nodes):
Items in the first subtree are less than or equal to the first data item,
which is less than or equal to the items in the second subtree,
which are less than or equal to the second data item, etc.
- 3-nodes, etc. help us write more efficient insert and delete algorithms,
since they allow a bit of “give” in how many items are present
in a portion of the tree.
Thus, they let us insert and delete items without rearranging the whole tree.
- In a position-oriented ADT, we look up an item by its position in the data structure
used to implement the ADT.
The quintessential example is Sequence, in which we look up items by index.
In a value-oriented ADT, we look up items by their value.
Often a portion of this value, called the key, is used.
The quintessential example is a Table, in which we look up items by key.
- Here are four position-oriented ADTs covered this semester:
- Sequence
- Stack
- Queue
- Binary Tree
Here are four value-oriented ADTs covered this semester:
- SortedSequence
- Binary Search Tree
- Priority Queue
- Table
- We did not precisely define the Table ADT in class, because the operations
can reasonably vary according to application and implementation.
We mentioned four ways in which the operations can vary:
- The key can be the entire data item or only part of it.
Of course, the former can be implemented using the latter.
However, the interesting special case of set-style
data makes the first possibility important.
- Duplicate keys can be allowed or disallowed.
- Traversal can be sorted or not.
The former is convenient for search-tree implementations,
the latter for hash-table implementations.
- We can have a separate operation to change an item’s
value, or else the retrieve operation can return a modifiable value.
Alternatively, in the case of set-style data, we can have unmodifiable
data items, in which case neither operation is appropriate.
- The Table ADT is the natural interface to a database.
The Table operations (insert, delete, retrieve, and possibly change a data item)
correspond to the SQL data-manipulation commands Insert, Delete, Select, and Update.
- Here are four big categories of Table applications:
- Sparse data.
If we have an array “a”, and we want to say “a[0] = 1;” and “a[100000000] = 2;”,
then a Table is probably more suitable than an array.
- “Arrays” whose indices are not non-negative integers.
For example, when we want to say “age["Fred"] = 20;” or “age["USA"] = 229;”.
- Data accessed via some key or code number. UAF undoubtedly has a huge database indexed by student ID number.
This is a good application for a Table.
- Set-Style Data. When we only care whether an item is in the structure or not, we are talking about a set,
which is usually best implemented as a Table.
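The first two categories can be sketched with std::map, one of the Standard Library's Table implementations. (The function names below are made up for illustration.)

```cpp
#include <map>
#include <string>

// sparseDemo: stores values at indices 0 and 100,000,000 using a Table
// (std::map); only the entries actually set take up space, so no
// 100,000,001-element array is needed.
int sparseDemo()
{
    std::map<long long, int> a;   // sparse "array"
    a[0] = 1;
    a[100000000] = 2;
    return a[100000000];
}

// stringIndexDemo: an "array" whose indices are strings.
int stringIndexDemo()
{
    std::map<std::string, int> age;
    age["Fred"] = 20;
    return age["Fred"];
}
```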
- A Priority Queue is a value-oriented ADT.
Items can be inserted (“enqueue”), but only one item can be accessed: the front item.
Which item is the front is determined by the items’ keys, which are called their priorities
in this context.
The front item is the item with the highest priority.
It can be read (“getFront”) and removed (“dequeue”), in which case a
new item becomes the front.
- No.
In a Queue, items are removed in the order they were inserted.
In a Priority Queue, items are removed in order of priority.
Thus, a Priority Queue is not a Queue.
- Using a Priority Queue, we can simulate a Queue by
ensuring that each inserted item gets a lower priority than the previous one.
We can also simulate a Stack, by ensuring that each item gets a higher priority than the previous one.
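A sketch of the Queue simulation in C++, using std::priority_queue with a decreasing counter as the priority. (The class name is made up for illustration; flipping the counter to increasing gives the Stack simulation.)

```cpp
#include <queue>    // std::priority_queue
#include <utility>  // std::pair

// PQBasedQueue: a Queue simulated by a Priority Queue. A decreasing
// counter gives each new item a lower priority than the one before,
// so items come out in insertion order.
class PQBasedQueue {
public:
    void enqueue(int item)
        { pq_.push({counter_--, item}); }
    int getFront() const
        { return pq_.top().second; }
    void dequeue()
        { pq_.pop(); }
private:
    // pairs compare lexicographically, so the first component
    // (the priority) determines which item is on top
    std::priority_queue<std::pair<long, int>> pq_;
    long counter_ = 0;   // use counter_++ instead to simulate a Stack
};
```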
- We usually implement a Priority Queue using a Heap, which is stored in a smart array.
In this case, enqueue is logarithmic time, provided no reallocate-and-copy is required;
with reallocate-and-copy it is linear time, though a fast linear time
(amortized constant for the array growth, plus logarithmic for the bubble-up).
Dequeue is logarithmic time, and getFront is constant time.
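One way to make sure no reallocate-and-copy happens, sketched with std::priority_queue: hand it a vector whose capacity has been reserved in advance. (The function name and the 1000-item limit are made up for illustration.)

```cpp
#include <functional>  // std::less
#include <queue>       // std::priority_queue
#include <vector>

// makeReservedPQ: construct a priority_queue on a vector whose
// capacity is pre-reserved, so (up to 1000 items) no
// reallocate-and-copy can happen, and enqueue is logarithmic time.
std::priority_queue<int> makeReservedPQ()
{
    std::vector<int> storage;
    storage.reserve(1000);
    return std::priority_queue<int>(std::less<int>(), std::move(storage));
}
```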
- A Heap is a complete Binary Tree in which, for each node,
the item in the node is greater than or equal to the items
in all the node’s descendants.
In practice, since we always represent a Heap using the array implementation
of a complete Binary Tree,
we refer to this array as a Heap, if the Binary Tree it represents is a Heap.
- A Maxheap is precisely what is described above.
A Minheap is a variation in which items are less than or equal to
the items in descendants.
We use “Maxheap” and “Minheap”
to avoid misunderstandings about the structure of our data.
- We usually implement a Heap using the array implementation of a complete Binary Tree.
- It is convenient to use Heap algorithms to operate on existing arrays (or other
random-access ranges of data).
Thus, when we implement a Heap, we often just implement the Heap algorithms,
and let them operate on client-provided storage.
- Heap algorithms tend to be run on client-provided storage.
Thus, the allocation is already done,
and no reallocate-and-copy is necessary, or even possible.
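This is exactly the view the C++ Standard Library takes: its Heap "operations" are free functions that act on a client-provided random-access range. A minimal sketch:

```cpp
#include <algorithm>  // std::make_heap, std::push_heap, std::pop_heap
#include <vector>

// demoHeapAlgorithms: run the standard Heap algorithms on
// client-provided storage (here, a local vector).
std::vector<int> demoHeapAlgorithms()
{
    std::vector<int> v {4, 1, 3};          // client-provided storage
    std::make_heap(v.begin(), v.end());    // arrange v into a Maxheap
    v.push_back(9);
    std::push_heap(v.begin(), v.end());    // bubble the new item up
    std::pop_heap(v.begin(), v.end());     // move the largest to v.back()
    v.pop_back();                          // ... and remove it (the 9)
    return v;                              // still a 3-item Maxheap
}
```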
- Heaps are for:
- Implementing a Priority Queue.
- Doing Heap Sort.
- Doing variations on Heap Sort in which items are inserted in, or deleted from,
the Heap in the middle of the sorting operation.
- Each successive item is placed in a new node, and then “bubbled”
up to its proper place (which, at that time, is always the root, since we are inserting in
ascending order).
The result, after all 7 inserts, is the following Heap: 7 at the root, with children 4 and 6;
1 and 3 are the children of 4; 2 and 5 are the children of 6.
- The items appear in the array in level order: top to bottom and, within each level, left to right.
For the Heap above, the array is: 7 4 6 1 3 2 5.

- The root item is replaced with the item in the last node (5). The last node is then deleted.
We bubble the new root item down, swapping it with its larger child.
Result: 6 at the root, with children 4 and 5; 1 and 3 are the children of 4; 2 is the left child of 5.

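The bubble-up and bubble-down procedures used in these parts can be sketched directly (function names are made up for illustration). Inserting 1 through 7 with heapInsert reproduces the array representation, and one heapDelete then performs the root removal just described.

```cpp
#include <cstddef>  // std::size_t
#include <utility>  // std::swap
#include <vector>

// heapInsert: put the new item in the next free spot, then bubble it
// up while it is bigger than its parent. Parent of index i: (i-1)/2.
void heapInsert(std::vector<int> & h, int item)
{
    h.push_back(item);
    std::size_t i = h.size() - 1;
    while (i > 0 && h[(i-1)/2] < h[i])
    {
        std::swap(h[i], h[(i-1)/2]);
        i = (i-1)/2;
    }
}

// heapDelete: replace the root with the last item, shrink the heap,
// then bubble the new root down, swapping it with its larger child.
// Assumes h is nonempty.
void heapDelete(std::vector<int> & h)
{
    h[0] = h.back();
    h.pop_back();
    std::size_t i = 0;
    while (true)
    {
        std::size_t big = i, l = 2*i+1, r = 2*i+2;
        if (l < h.size() && h[big] < h[l]) big = l;
        if (r < h.size() && h[big] < h[r]) big = r;
        if (big == i) break;    // bigger than both children: done
        std::swap(h[i], h[big]);
        i = big;
    }
}
```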
- To do Heap Sort, we begin with an empty Maxheap,
insert all items into it,
and then remove them.
This gives the items in reverse sorted order.
Typically, Heap Sort is done in-place,
with the Heap growing from the beginning of the data as
items are inserted, and shrinking again as they are removed.
When we do it this way, a removed item goes into the spot freed up by the
shrinking Heap.
Thus, the largest item goes in the last spot, etc., and the data ends up sorted.
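A sketch of in-place Heap Sort using the Standard Library's heap algorithms; std::sort_heap performs the repeated root removal described above, with each removed item going into the spot freed by the shrinking Heap.

```cpp
#include <algorithm>  // std::make_heap, std::sort_heap
#include <vector>

// heapSort: in-place Heap Sort. Build a Maxheap over the whole array,
// then repeatedly move the largest remaining item to the end.
void heapSort(std::vector<int> & v)
{
    std::make_heap(v.begin(), v.end());   // insert-all phase, in-place
    std::sort_heap(v.begin(), v.end());   // remove-all phase: repeated
                                          // pop_heap into the freed spots
}
```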
- Heap Sort is the only O(n log n) sort we studied that
can be done in-place, using only constant [O(1)] additional storage.
- Heap Sort is useful:
- As a basis for other algorithms.
In particular, it is part of Introsort.
- When generalized to handle data sets
that are modified in the middle of sorting.
- In situations in which very little extra memory is available.
- A key is the portion of the value of a data item that is used to look it up.
Keys are used to look up items (for retrieval or deletion) in value-oriented ADTs.
Efficient implementations of such ADTs must arrange items so that keys can be found
quickly.
Thus, in a search tree (or other sorted structure), items are stored in order sorted by key,
and in a hash table, the keys are hashed.
- Smart Array. insert: amortized constant. delete: linear. retrieve: linear.
Note: Insert is constant time if we have enough memory pre-allocated.
- Smart Sorted Array. insert: linear. delete: linear. retrieve: logarithmic.
- Binary Search Tree. insert: linear. delete: linear. retrieve: linear.
Note: All three are logarithmic if the tree is balanced. Except that it might not be.
- Red-Black Tree. insert: logarithmic. delete: logarithmic. retrieve: logarithmic.
- Here are the four:
- Balanced Search Tree.
This offers logarithmic-time
insert, delete, and retrieve, and linear-time sorted traverse.
We use this when we need a Table implementation with worst-case performance guarantees.
For in-memory implementations, we would probably use a Red-Black Tree.
For external (on mass storage) implementations, we would probably use a B-Tree or a B+ Tree.
- Hash Table.
On average, this is the fastest overall Table implementation.
It offers constant-time average performance for insert, delete, and
retrieve, although its traverse is unsorted and sometimes slower.
We use this when we want a Table implementation with fast typical performance,
we can handle occasional poor performance,
and we do not need a fast sorted traverse.
- Trie (a.k.a. “Prefix Tree”).
This special-case Table implementation is appropriate when the collection of keys
is a long list of words, or something similar.
It offers insert, delete, and retrieve in time proportional to the word length.
We use this when we want a Table implementation
and our data is the kind that works well with a Trie.
Modern language standard libraries typically do not include Tries;
this may also be a factor.
- Array + Sort Algorithm.
If our Table usage splits naturally into Table-creation
and Table-look-up phases, with little or no deletion done,
then we often do not use a Table implementation at all.
Simply put all the items into an array,
sort the array, and then use Binary Search for look-ups.
Each insertion is amortized constant time
(constant-time if we know the size of our data set in advance;
regardless, inserting everything is linear-time).
The look-ups are all logarithmic time.
(Think “phone book”; paper phone book, that is.)
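A "phone book" sketch of that last method (the names in it are made up for illustration): fill and sort once, then do every look-up with Binary Search.

```cpp
#include <algorithm>  // std::sort, std::binary_search
#include <string>
#include <vector>

// phoneBookDemo: Table-creation phase (fill an array, sort it once),
// then Table-look-up phase (logarithmic-time Binary Search).
bool phoneBookDemo()
{
    std::vector<std::string> names;          // creation phase
    names.push_back("Smith");
    names.push_back("Jones");
    names.push_back("Brown");
    std::sort(names.begin(), names.end());   // sort once: O(n log n)

    // look-up phase: each look-up is logarithmic time
    return std::binary_search(names.begin(), names.end(),
                              std::string("Jones"));
}
```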
- When the key is the entire data item,
all we can say about an item is whether it lies in the structure or not
(or, if duplicate keys are allowed, how many times it is in the structure).
This is set-style data.
When the key is only part of the data item,
then we can look up the rest of the item via the key.
Thus, we have key-based look-up: associative arrays,
dictionaries, etc.
- The C++ STL has four Table implementations.
In two of them the key is the entire data item
(std::set and std::multiset),
and in two of them the key is only part of the data item
(std::map and std::multimap).
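A small sketch of both styles; the words and ID numbers below are made up for illustration.

```cpp
#include <map>
#include <set>
#include <string>

// Set-style data: the key is the entire data item. All we can ask is
// whether an item is in the structure.
bool inDictionary(const std::string & word)
{
    static const std::set<std::string> dict { "cat", "dog", "emu" };
    return dict.count(word) > 0;
}

// Key-based look-up: the key (the ID) is only part of the
// (ID, name) data item; we look up the rest of the item via the key.
std::string lookUpName(int id)
{
    static const std::map<int, std::string> roster {
        { 30012345, "Fred" }, { 30054321, "Wilma" }
    };
    auto it = roster.find(id);
    return it == roster.end() ? "" : it->second;
}
```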
- A 2-3 Tree is a tree for which:
- All nodes contain either 1 or 2 data items.
If 2 items, then the first is less than the second.
- All leaves are at the same level.
- Each non-leaf is either a 2-node or a 3-node.
Note: As usual, we may loosen these requirements slightly,
to allow for multiple equivalent items.
In practice, when we talk about a 2-3 Tree, we usually associate with it
the efficient insert and delete algorithms that maintain its invariants.
- For a 2-3 Tree, insert, delete, and retrieve are all logarithmic time, and traverse is linear time.
- A 2-3 Tree is never the best data structure.
Use a Red-Black Tree instead.
(Okay, a 2-3 Tree is simpler to write than a Red-Black Tree,
but that excuse belongs to the days before generic programming.)
- Even though we recommend never using 2-3 Trees, we studied them because 2-3-4 Trees are similar but a bit more complex,
and Red-Black Trees, which are based on 2-3-4 Trees, are an excellent, practical data structure.
- Here are the trees:

- A Red-Black Tree is a Binary-Tree representation of a 2-3-4 Tree.
Each node in the Red-Black Tree has a data item and a Boolean value (red or black).
The various nodes in a 2-3-4 Tree are represented in a Red-Black Tree as follows.
2-nodes are left alone, with them and their children being colored black.
3-nodes and 4-nodes are converted as shown
(the 3-node actually has two possible conversions; the other is the mirror image, with 4 at the top):

The Red-Black Tree insert and delete algorithms are just the translation of the 2-3-4 Tree algorithms to this representation.
The retrieve and traverse algorithms are those for Binary Search Trees (use inorder traverse).
- For a Red-Black Tree, insert, delete, and retrieve are all logarithmic time, and traverse is linear time.
- For a data type to be stored in a Red-Black Tree, it must have an ordering defined on it
(operator<, usually).
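A sketch of what this requirement looks like in code, using std::set (typically a Red-Black Tree underneath) and a made-up Student type:

```cpp
#include <set>
#include <string>

// A type to be stored in a Red-Black Tree needs an ordering.
struct Student {
    int id;
    std::string name;
};

// The ordering: compare Students by ID.
bool operator<(const Student & a, const Student & b)
{
    return a.id < b.id;
}

// studentTableDemo: with operator< defined, std::set can store
// Students; sorted traversal visits the smallest ID first.
bool studentTableDemo()
{
    std::set<Student> table;
    table.insert(Student{2, "Wilma"});
    table.insert(Student{1, "Fred"});
    return table.begin()->name == "Fred";   // ID 1 comes first
}
```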
- An AVL Tree (Adelson-Velskii and Landis tree)
is a balanced Binary Search Tree.
(A Binary Tree is balanced if,
for each node in the tree, the heights of its two subtrees differ by at most 1).
In practice, when we talk about an AVL Tree, we usually associate with it
the efficient insert and delete algorithms that maintain its invariants.
- The operation is rotation.
This is essentially the operation that switches between the two trees shown below, with subtrees of nodes being dragged along as necessary:

- We never actually use an AVL Tree, since they are never the best solution.
Other kinds of balanced search trees (Red-Black Trees, in particular)
are more efficient when dealing with repeated insertions and deletions,
and a sorted array is more efficient when there are separate “filling”
and “searching” phases
(for example, as your instructor keeps pointing out: phone books).
- The ideal situation is when data items are stored in an array (or similar structure),
and there is a fast function that gives the index at which a data item can be found,
if it is in the structure.
There are no collisions; every item has a unique spot.
Such a function is called a perfect hash function,
and the result is a Table implementation in which the insert, delete, and retrieve operations
are all constant time.
- In practice, we cannot reach this ideal situation, since, if we allow insertions,
then we might insert two items with the same hashed key, resulting in a collision.
- A hash function is the location look-up function used by a hash table.
- A hash function must take a key and return a nonnegative integer.
It must do this in a deterministic manner;
that is, it must always produce the same output for a given input.
- A good hash function:
- Can be computed quickly.
- Spreads out its results evenly over the possible output values.
- When given patterned input, does not produce patterned output.
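A sketch of one common textbook-style hash function for strings (not the one any particular library uses). It is deterministic and fast, and the repeated multiplication helps break up patterned input.

```cpp
#include <cstddef>  // std::size_t
#include <string>

// stringHash: a simple polynomial hash. Deterministic: the same
// string always produces the same result. Multiplying by 31 at each
// step spreads out results and breaks up patterned input.
std::size_t stringHash(const std::string & key)
{
    std::size_t h = 0;
    for (char c : key)
        h = h * 31 + static_cast<unsigned char>(c);
    return h;   // a hash table would then take this mod the table size
}
```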