CS 311 Fall 2009
Final Exam Review Problems, Part 4

This is the fourth of four sets of review problems for the Final Exam. A complete review for the final includes the material of the Midterm Exam as well, which is covered in three problem sets posted earlier in the semester. For all seven sets, see the class web page.

Problems

Review problems are given below. Answers are in the Answers section of this document. Do not turn these in.

  1. How can we implement a Hash Table in a way that does not use open addressing?
     
    1. What problems occur when a Hash Table gets too full?
    2. What is the “load factor” of a Hash Table?
    3. What should be done when a Hash Table gets too full?
    4. What problems does the procedure from the previous part create?

     
  2. “The collision-resolution method is the primary design decision behind Hash-Table implementation.” Explain.
     
    1. What advantages do Hash Tables have over search trees, as a Table implementation?
    2. What disadvantages do Hash Tables have (list three disadvantages)?
    3. “Use Hash Tables intelligently,” says your instructor. Explain.

     
  3. Consider the following C++ code fragment:
    double arr[20];
    for (int i = 0; i < 20; ++i)
        arr[i] = i + 7.1;
    cout << arr[3] << endl;
    1. Change this code so that instead of an array, it uses one of the STL Table implementations. Make as few changes as possible. You may assume that the required header files have already been included.
    2. Changing the above code in this way is probably a bad idea. Why?

     
  4. The bracket operator of std::map is non-const. So, for example, the following code will not compile:
    const std::map<int, int> m = mymap;  // No problem
    cout << m[0] << endl;                // COMPILER ERROR! m is const
    Why is there no const version of this operator?
     
  5. The C++ STL includes no Hash Tables. However, some versions of the STL have nonstandard extensions, some of which are Hash Tables. For example, the SGI version of the STL includes a container called std::hash_map. This is a Hash-Table version of std::map. It has five template parameters.
    1. Based on the above information, we can conclude that SGI’s std::hash_map does not implement buckets as search trees. How do we know this?
    2. What should the worst-case performance of the insert, delete, and retrieve operations be for this container? (By the way, the three operations are actually called “insert”, “erase”, and “find”.)

     
  6. In a std::vector, if one has an iterator (“iter”) to an item, one can say
    *iter = value;
    to set the item’s value. The same is true for a std::deque and std::list. However, this is not the case for the STL Table implementations (std::set, std::map, etc.). Why not?
     
  7. Draw a Prefix Tree (Trie) holding the following words:

    a, an, and, ankle, any


     
  8. Discuss the efficiency of the Table insert, delete, and retrieve operations for a Table implemented using a Prefix Tree.
     
  9. Suppose we put our data structures on a mass storage device. How does this affect the optimal design of ...
    1. ... a Hash Table?
    2. ... a search tree?

     
    1. What does it mean to keep an “index” to a data structure? (Note: I am speaking of a separate data structure used to index another, NOT the concept of the index of an item in an array.)
    2. Under what circumstances is it a good idea to keep an index to a data structure?

     
    1. What is a “B-Tree”?
    2. What is a “B+ Tree”?
    It is acceptable to answer either of the above questions with something like, “It is just like this other thing, except ...”.
     
  10. Given a graph and a start vertex, write DFS and BFS orderings of the vertices. Finish this problem!!!
     
    1. What does Prim’s Algorithm find?
    2. Outline how Prim’s Algorithm works.

     
    1. What is a “greedy” algorithm?
    2. Give a problem that is correctly solved by a greedy algorithm.
    3. Give a problem that is not correctly solved by a greedy algorithm.

     
  11. We would say that insert-at-beginning is inefficient for a smart array. On the other hand, we would say that one can efficiently iterate through all items in an array. And yet both of these operations are linear time. Why do we say one is “inefficient” and the other is “efficient”, when they have the same order?
     
  12. In each part below, a problem is given. This problem has two solutions that are generally considered to be the best. In each part, name the two solutions, briefly compare them, and indicate under what circumstances some other method might be preferable. And now for the parts:
    1. Sorting a sequence.
    2. Searching for an item in a sequence.
    3. Implementing a Table.

     
  13. Modern processors all use caching, in which, when a memory location is read, nearby memory locations are also read and stored on the processor for faster access. What implications does this have for the relative desirability of smart arrays vs. linked lists?
     
  14. For each of the following time efficiency classes, we have discussed a quintessential operation or algorithm that lies in that class (or, in one case, a class of operations). For each, indicate what this is.
    1. Constant time.
    2. Logarithmic time.
    3. Linear time.
    4. Log-linear time.
    5. Quadratic time.

     

Answers

  1. To form a Hash-Table implementation method that does not use open addressing, make the Hash Table an array of data structures, each of which can hold zero or more data items. These structures are often called buckets. Traditionally, buckets are implemented as linked lists, although other data structures can be used.
    1. When a Hash Table gets too full, performance can degrade.
    2. The load factor of a Hash Table is the number of items divided by the number of locations (buckets) in the table. Equivalently, it is the average number of items per bucket.
    3. When the load factor of a Hash Table gets too high, rehashing is necessary. This corresponds to the reallocate-and-copy operation of a smart array; however, it is more complex and time-consuming (although still linear time). The Hash Table must be reallocated, and every item must be inserted into the new table. Thus, the hash function must be called for every item.
    4. Rehashing means that, for an expanding Hash Table, every now and then a slow operation is required. Of course, just as with smart arrays, we can avoid this problem if we can start with a large enough Hash Table.
  2. Conceptually, a Hash Table is very simple: it is an array whose index is generated by the hash function. All the complexity in the implementation comes from what we do in the case of a collision. Thus, nearly all the implementation details of a Hash Table are determined by the collision-resolution technique.
    1. On the average, a Hash-Table-based Table has performance significantly better than a search-tree-based Table. Hash Tables may also be easier to implement (but now that we have generic programming, who cares?).
    2. Here are four disadvantages of Hash Tables:
      • Poor (linear-time) worst-case performance, which also shows up if a poorly designed hash function is used.
      • Overhead due to rehashing.
      • Requirement that a hash function be specified for user-defined types.
      • Traverse is unsorted, and may be slower.
    3. Hash Tables are often presented as an option that always works well. However, they can have serious disadvantages (see the previous part). If the drawbacks of Hash Tables are acceptable, then by all means use one. But first think about those drawbacks and how they may affect the performance and quality of your code.
    1. This is not set-style data, and it does not have duplicate keys. Thus, the appropriate Table implementation is std::map. Since std::map has a bracket operator, the only thing that needs to be changed is the declaration of the variable arr.
      Old:
      double arr[20];
      New:
      std::map<int, double> arr;
    2. Using a map for array-style data gives certain advantages, such as compact storage of sparse data, and the availability of non-integer key types. However, none of those applies here. On the other hand, the disadvantages of a map, like slower execution, do apply.
  3. The operator[] of std::map returns a reference to an item in the structure. The item must exist; otherwise we cannot return a reference to it. Thus, if no item with the given key exists, one must be inserted. This modifies the structure, and so cannot be a const member function.
    1. A search tree would require an ordering comparison (think “operator<”) to be specified for user-defined types. Now, std::hash_map could just use operator<. However, this is against STL conventions, which require that the client should be able to specify whatever comparison is desired. Since there is no way to specify this, a search tree (or any other sorted container) is not used.
    2. The worst-case performance for insert, delete, and retrieve should be linear time. And, in fact, that is exactly what the SGI documentation says it is.
  4. In an implementation of a value-oriented ADT, the location an item is stored in depends on what its key is. Thus, we cannot change the value of an item in-place; we must delete and re-insert the item. This may be time-consuming, and so the STL Table implementations do not provide a single function that allows this operation.
  5. Here is the Prefix Tree, drawn using the conventions from the lecture slides:
    [Figure: a Prefix Tree holding the required words.]
    The words, again, are the following: a, an, and, ankle, any.
  6. If we implement a Table using a Prefix Tree, then the number of steps required for a Table insert, delete, or retrieve operation is essentially the number of characters in the given key. (Each key is a string, and so we can talk about the number of characters in it.) Thus, if we consider the maximum length of a key to be fixed, then all three operations are constant time.
     
    On the other hand, if we want each item to have a different key, then with larger data sets, we need longer keys. The key length needs to be something like the log of the number of keys. Thus, there is a hidden logarithm (just as there was with Radix Sort); for arbitrarily large data sets, it is reasonable to consider the three operations to be logarithmic time.
    1. When we keep a data structure on external storage, our primary concern is to minimize the number of block accesses. Thus, we would avoid open addressing, since a typical probe sequence involves multiple locations within the table, and thus multiple block reads. We would prefer to use buckets designed so that items in a bucket are stored close together, and in the same block (or small number of blocks) if possible.
    2. Again, we wish to minimize the number of block accesses. Making nodes large (around the size of a block?) can help with this. Thus, we use a B-Tree of high degree, which has large nodes.
    1. An index is a data structure that tells us where to find something in some other data structure. For example, we may store a Table as an array of key-data pairs, then have an index stored in a more sophisticated way (Hash Table or search tree). The index holds keys and, for each key, a pointer to the appropriate location in the other structure.
    2. We typically use an index when the data in a structure is large and accessing it is slow, and the index can fit in memory. For example, we might use an in-memory index with pointers to data in an external (kept on mass storage) structure.
    1. B-Trees are a generalization of 2-3 Trees. Essentially, a B-Tree of degree m (m is odd) is an (m+1)/2-...-m Tree. That is, nodes can be (m+1)/2-nodes, ..., m-1-nodes, or m-nodes. (Recall: A k-node has k-1 data items and, if it is not a leaf, exactly k children). The exception is the root node, which can have 1, ..., m-1 data items; the number of children of the root is still one more than the number of data items in it, unless it is a leaf. The algorithms for a B-Tree generalize those for a 2-3 Tree.
    2. A B+ Tree is a variation on a B-Tree in which each data item that is in a non-leaf node is duplicated in a leaf node. Typically the leaf nodes are also joined in an auxiliary Linked List.
  7. Answer goes here!!!
    1. Prim’s Algorithm finds a minimum spanning tree in a weighted graph.
    2. Prim’s Algorithm begins with one vertex that is declared to be reachable. It then repeatedly adds to the spanning tree the edge with least weight that joins a reachable vertex to a not-reachable vertex, with the not-reachable vertex then becoming reachable.
    1. A greedy algorithm is one that maintains some kind of partial solution to a problem, and, at each step, it improves this partial solution as much as possible.
    2. A greedy algorithm correctly finds a minimum spanning tree in a weighted graph.
    3. Many, many problems are not solved correctly by greedy algorithms. One example mentioned in class is finding the shortest path between two vertices, in a weighted graph.
  8. Insertion is a single-item operation. Such operations are often done in bunches. Thus, we need insertion to be faster than linear time for it to be very useful. Iteration through all items, on the other hand, involves all items in the structure. Thus it cannot be faster than linear time, and so we call a linear-time implementation “efficient”.
      • Introsort, Merge Sort.
      • Both algorithms are O(n log n). Introsort is significantly faster, but requires random-access data and is not stable. Merge Sort is stable and can be written to work on a linked list.
      • We may use Insertion Sort to handle nearly sorted lists or small lists. Other algorithms (e.g., Radix Sort) may be used when sorting special kinds of data. We may use Heap Sort in low-memory situations. We may use variants of Heap Sort to handle more general situations, for example, when a sequence is modified during the sorting process, or when we only want to sort the greatest or least items in a sequence.
      • Binary Search, Sequential Search.
      • Binary Search is O(log n), while Sequential Search is O(n). However, Binary Search requires sorted, random-access data, and Sequential Search works on any sequence.
      • For a simple sequence type (array, Linked List), we would pretty much always use Binary Search or Sequential Search. But in a fancier data structure (various trees, Hash Tables) we would use the search algorithm appropriate to the structure.
      • Balanced Search Tree, Hash Table.
      • Balanced Search Trees have better worst-case performance (O(log n) for Insert, Delete, Retrieve) and sorted traverse. They may require a user-provided ordering. We generally use a Red-Black Tree for in-memory data structures and a B-Tree (or B+ Tree or other B-Tree variation) of high degree for external data structures. Hash Tables have better typical-case performance (O(1) on average for Insert, Delete, Retrieve if Hash Table is not overly full), but worse worst-case performance (typically O(n) Insert, Delete, Retrieve). Their traverse is unsorted. They also may require occasional rehashing and a user-provided hash function.
      • If we need Table-style functionality, but our application requires an extended Insert phase, followed by an extended Retrieve phase, with little or no deletion, then we probably will not use a Table implementation at all. Instead, we simply place all our data in an (unsorted) array, sort it, and use Binary Search for our retrievals.
  9. If the application generally accesses items in a list in such a way that, when an item is accessed, nearby items are likely to be accessed next (locality of reference), then the existence of caching makes arrays more desirable and linked lists less desirable. This is because consecutive items in an array are stored in adjacent memory locations, which will be loaded into the cache. Linked lists, on the other hand, do not have this property.
    1. Constant time. Look-up by index (bracket operator) for an array.
    2. Logarithmic time. Binary Search.
    3. Linear time. Any simple loop that goes through all the items in a list and performs a very simple (constant time) operation on each. For example: copying an array.
    4. Log-linear time. Sorting, using a good algorithm (e.g., Merge Sort, Introsort, Quicksort).
    5. Quadratic time. Bad sorting algorithms (e.g., Bubble Sort).


CS 311 Fall 2009: Final Exam Review Problems, Part 4 / Updated: 15 Dec 2009 / Glenn G. Chappell / ffggc@uaf.edu