CS 311 Fall 2007
Final Exam Review Problems, Part 1
This is the first of four sets of review problems for the Final Exam.
A complete review for the final includes the material of the Midterm Exam
as well, which is covered in three problem sets posted earlier in the semester.
For all seven sets, see the
the class web page,
or the following links:
Final Exam Information
The Final Exam will be given on Monday, December 17, 1–3 p.m.
It will be worth 100 points.
The exam will be comprehensive, emphasizing the material covered after the Midterm Exam.
Referring to books, electronic devices, or other people
during the exam will not be permitted.
Limited use of notes will be permitted;
see the “Cheat Sheet” section, below.
“Cheat Sheet”
Students will be allowed to bring one sheet of handwritten notes
(a “cheat sheet”) to the Final Exam.
Notes brought to the final must conform to the following guidelines,
which will be ruthlessly enforced.
- You may bring no more than one sheet of notes;
both the front and the back may be used.
- Notes must be on paper no larger than standard notebook paper
(U.S. letter [8.5" x 11"], U.S. Legal [8.5" x 14"], or ISO A4 [210 mm x 297 mm])
- Notes must be handwritten.
In particular, the following are prohibited:
photocopies (including photocopies of handwriting),
computer-printed pages, pages from books, typewritten notes, photographs.
Problems
Review problems are given below.
Answers are in the
Answers section of this document.
Do not turn these in.
- Briefly explain how the Introsort algorithm works.
- What is the order of Introsort?
- What is the major advantage of Introsort, over other sorting algorithms?
- List four important disadvantages of Introsort, compared with other sorting algorithms?
- What kinds of data does the Radix Sort algorithm work on?
- Briefly explain how Radix Sort works.
- What is “abstraction”?
- What does “ADT” stand for,
and what does it mean?
- Name two possible data structures that one might use to implement ADT Sequence
(or a similar ADT).
Briefly discuss advantages and disadvantages of each.
- ADTs Sequence and SortedSequence are superficially similar.
However, they have an important fundamental difference. Explain.
(Hint: What kind of ADT is SortedSequence?)
- When we put a list of items in order, we often do not care what the order is.
Explain.
- What kinds of data are SortedSequence and ADTs like it used for?
(Hint: There are two big categories of applications.)
- What does it mean for an interface to be “complete”?
Give an example of an interface that is not complete.
- Give an example of an interface that does not facilitate efficiency.
- Give an example of an interface that is not very generic.
- List the three standard levels of exception safety, and
briefly describe each.
- Which level do we usually prefer? Briefly explain why we prefer it.
- List two circumstances in C++ programming in which it is important to offer the No-Throw Guarantee.
For each, briefly explain why this guarantee should be offered.
- We can write a good copy assignment operator for a class
even if we know next to nothing about the class.
Explain.
- If certain member functions in our class have certain properties,
then our copy assignment operator (written as in the previous part) is exception-safe.
What properties do these functions have to have,
and what level of exception safety does our copy assignment operator have?
- Explain the difference between constant time and amortized constant time.
- What kinds of operations on sequences are typically constant time?
- What kinds of operations on sequences are typically amortized constant time
(without being constant time)?
- What is a “generic container”?
- What language construct do we usually use to implement generic containers in C++?
- How does the use of this construct change our normal
source/header file setup?
- How do generic containers complicate exception safety?
- What is “exception neutrality”?
- Suppose we use an operation on a client-specified type, and this operation may throw.
If we want to be exception-neutral, how should we handle this
(what should the code look like)?
- Give a reason for the rule that every function should have exactly
one purpose (that is, it should do “one thing”).
- What do we mean by a “node-based structure”?
- List some advantages that node-based structures have over
(smart) arrays.
- List some disadvantages.
- List several data structures that are often implemented using the node/pointer idea.
- When we implement a node-based structure using a C++ class,
we usually write several classes.
List and explain these (at least three).
- How are ownership issues typically handled in a node-based structure?
- How does this affect the design of the destructors for
classes related to a node-based structure?
- In the context of Linked Lists, what does it mean to “splice”?
- “Splicing is what makes Linked Lists worthwhile.” Explain (even if you do not agree with the statement :-).
- How does the possibility of splicing affect the efficiency of computing the size of a Linked List?
- What is “locality of reference”?
- How is locality of reference related to memory caching and data-structure design?
- Briefly discuss locality-of-reference and caching issues
with std::vector, std::deque, and std::list.
- List five operations or algorithms for which the orders differ for (smart) array-based sequences and Doubly Linked Lists.
Give the order of each operation.
- List five operations or algorithms for which the orders are the same for (smart) array-based sequences and Doubly Linked Lists.
Give the order of each operation.
- Suppose we are implementing a Singly Linked List.
- What data members do we generally want in the class representing the entire list?
- What data members do we generally want in the node class?
- How would the above change for a doubly linked list?
Answers
- Introsort is essentially Quicksort with median-of-three pivot selection.
In addition, it keeps track of the recursion depth.
If this exceeds some bound (perhaps 2log2n?),
the algorithm switches to Heap Sort for the given recursive call.
Usually, standard optimizations for Quicksort are used:
tail-recursion elimination for the larger recursive call,
and maybe a final pass with Insertion Sort.
- Introsort is log-linear time.
- Introsort is generally considered to be the fastest known sorting algorithm.
- On the downside, Introsort is not stable, requires random-access data,
uses significant extra space (logarithmic), and nearly sorted data does not improve its performance.
- Radix Sort sorts lists of strings
(positive integers may be considered as strings of digits, if padded on the left with zeroes).
- Radix Sort proceeds in a number of passes.
Each pass uses a Pigeonhole Sort to sort the list by one of the characters in each string.
The first pass sorts by the least-significant character.
The second pass sorts by the next-to-least-significant character,
and so on, with the last pass sorting by the most-significant character.
- Abstraction means separating the purpose of a module from its implementation.
That is, we specify interface and effects,
without specifying internal implementation.
- An ADT is an Abstract Data Type,
that is, a collection of data together with some
operations defined on that data.
- One might implement a Sequence using:
- An array.
- Advantages: Using an array allows for faster look-up
and Binary Search (if the items are sorted).
Operations with the same order tend to be somewhat
faster with an array than a Linked List.
Arrays tend to have less memory-management overhead overall
and to use less total memory.
Arrays also store consecutive items in nearby memory locations;
thus, they allow effective use of a memory cache,
when algorithms having good locality of reference are used.
- a (Doubly?) Linked List.
- Advantages: Using a Linked List allows for faster insert and delete (by position)
and splicing.
It also avoids the need for reallocate-and-copy as a container expands,
which helps with efficiency and keeps iterators and references valid.
- Sequence is a position-oriented ADT, in which we deal with items primarily by their position in the container.
SortedSequence is a value-oriented ADT, in which we deal with items primarily by their values.
- When we put items in order, it is often not because we want them in that order, but because we want them
to be easy to find (think “Binary Search”).
Thus, we do not care what the order is, only that it makes searching fast.
- Value-oriented ADTs like SortedSequence are used for
- “Set” data.
Data for which we only care whether an item is in the container, not where it is.
- “Table” data.
We use value-oriented ADTs to handle key-based look-up.
Other names for such structures: dictionaries, maps, associative arrays.
- An interface is complete if all desired operations on the data
can be performed using only the interface.
A simple example of an incomplete interface would be a Stack
with push, but no pop.
- One example is the text’s ADT List, especially if implemented using an array.
Arrays allow constant-time modification of data in-place.
However, using List, to modify an item
we must first delete the old item and then insert a new one.
With an array implementation, these are both linear-time operations.
- One example would be a Sequence-style interface that requires the data to be integers.
There iss no particular reason for this to be the case.
- The three standard levels of exception safety are as follows:
- Basic Guarantee
- Data remain in a usable state, and resources are never leaked, even in the presence of exceptions.
- Strong Guarantee
- If an operation throws an exception, then it makes no changes that are visible to the client.
- No-Throw Guarantee
- The operation never throws an exception.
- We generally prefer the Strong Guarantee.
We do not prefer the Basic Guarantee, since it means that, when an exception is thrown,
objects may end up in unknown (but valid!) states.
With the Strong Guarantee, we always know what state an object will be in.
On the other hand, we do not prefer the No-Throw Guarantee, since
it limits our options for flagging an error.
The Strong Guarantee allows us to use exceptions,
while the No-Throw Guarantee does not.
- It is important to offer the No-Throw Guarantee:
- In destructors.
Destructors can be called between a throw and the associated catch.
In such circumstances, a second throw terminates the program.
- In functions used to “commit” changes to data.
Modifying a copy of data and then committing using a non-throwing operation allows us to
offer the Strong Guarantee.
- If the class has a copy constructor and a swap member function
(declared in the usual way), then we can write a copy assignment operator
that uses them.
The copy assignment operator uses the copy constructor to create a copy of a given object,
and then uses the swap function to swap the value of the copy with its own value.
The code is as follows (“Foo” is the name of the class):
Foo & operator=(const Foo & rhs)
{
if (this != &rhs)
Foo(rhs).swap(*this);
return *this;
}
- The copy constructor needs to offer the Basic Guarantee
(as everything should), and the swap function needs to offer the
No-Throw Guarantee.
If these are both the case,
then our copy assignment operator offers the Strong Guarantee.
- An operation is constant time if its time order is O(1);
that is, if it takes O(1) time to complete a single operation.
An operation is amortized constant time if takes O(k) time to complete k operations.
Thus, for an amortized constant time operation,
the average time for a single operation,
taken over a large number of consecutive operations, is O(1).
- Constant-time operations include look-up by index in an array
and accessing the first item in pretty much any kind of sequence.
- The quintessential amortized-constant-time operation is insert-at-end for a smart array.
This is a linear-time operation, since it may require reallocating and copying the entire array.
However, in a well-written implementation, realocate-and-copy will not happen very often.
- A generic container is a container that can hold items of a client-specified type.
- In C++ we generally implement a generic container using a class template.
- The C++ Standard does not require compilers to be able to do separate compilation
of templates.
Thus, when we write a class template, we generally put all the code in the header file,
with no source file at all.
- Generic containers use value-type operations to deal with the data
they hold.
These operations may throw exceptions that the generic container does not
know about;
but it still must be able to deal with them properly.
- Code is exception-neutral if, when an exception is thrown by
a function the code uses,
the exception is propagated unchanged to the client.
Code for generic containers generally needs to be exception-neutral.
- When we write exception-neutral code, we deal with operations that
may throw in one of two ways:
- We use such operations outside any try block,
thus automatically propagating the exception to the caller.
- We use such operations inside a try block.
In the associated catch block, we catch all exceptions,
do whatever clean-up is necessary,
and then re-throw the same exception.
- Here are two possible answers.
- Having a function perform multiple operations makes error handling difficult.
For example, if a function performs two operations, and the first
is successful, but the second is not,
then we may need to undo the first operation.
Sometimes this is impossible, as when the second operation
is the function’s return-by-value.
- Giving each function a single responsibility
improves the modularity of code.
It is easier to understand,
since the code is split into functions in natural ways.
It is also easier to debug,
since it is clear what each function is supposed to do.
- A node-based structure
is a data structure whose data items are stored in
small memory blocks (often separately allocated),
called nodes.
Nodes typically reference each other via pointers.
- Node-based structures generally offer:
- Faster rearrangement (insert, delete, and more complex operations).
- More variety (tree-shaped structures, etc.).
- Memory management that is more appropriate for some situations.
Memory does not need to be allocated in a large block,
and so the reallocate-and-copy operation that can slow down smart arrays never needs to happen.
- (Smart) arrays generally offer:
- Faster position-based look-ups (e.g., “arr[i]”).
- Less memory-management overhead, overall.
- Simpler implementation.
- All of the following:
- Linked lists (including doubly linked lists).
- Just about any tree structure (except a Heap):
- Binary Trees.
- 2-3 Trees.
- 2-3-4 Trees.
- Red-Black Trees.
- AVL Trees.
- B-Trees.
- Quadtrees.
- Octrees.
- Some Hash Tables (buckets can be node/pointer-based).
- The C++ STL’s std::deque uses pointers to nodes that are small arrays.
- Classes needed:
- We need a class to represent the data structure as a whole.
- We usually have a class representing a single node in the structure.
- Often we have an iterator class.
Or perhaps several iterator classes: iterator, const_iterator,
reverse_iterator, const_reverse_iterator.
- Each node object typically owns the nodes it points to.
The object representing the structure as a whole
typically owns the root node (or something similar).
- The destructor for each object only needs to delete pointers
to nodes it owns.
A typical destructor is thus one or two lines,
with each line being a delete statement.
- Splicing is removing a subsection (consisting of consecutive items)
from one list and inserting it into another list.
- Splicing is a constant-time operation for linked lists.
It is not constant time for any of the other data structures we studied.
Thus, splicing is where linked lists shine.
- If we do not allow splicing, then every linked-list operation changes the size of the list by at most 1
(or sets the size to zero).
Thus, we can easily keep track of the size, and return it in constant time.
However, splicing changes the size of a list by an amount that cannot be efficiently determined.
Thus, if we allow splicing, then to determine the size of a linked list, we must iterate through it,
a linear-time operation.
- Locality of reference
is a property that an algorithm can have.
It means that,
when the algorithm access an item in a data structure,
the following accesses are likely to be to the same item
or nearby items.
- When a memory location is accessed,
a memory cache will generally load nearby memory locations,
thus greatly speeding up access to them.
If an algorithm has locality of reference,
it is thus advantageous to use a data structure in which consecutive
items are stored in nearby memory locations.
- std::vector and std::deque
tend to store consecutive items in nearby memory locations.
Thus, when they are used with algorithms having good locality of reference,
memory caching gives a significant performance boost.
std::list generally does not store consecutive items in
nearby memory locations.
Thus, there is no associated performance boost.
- Here are seven for which the order differs:
- Look-up, given index. O(1) for array; O(n) for doubly linked list.
- Insert, given iterator. O(n) for array; O(1) for doubly linked list.
- Insert at beginning. O(n) for array; O(1) for doubly linked list.
- Insert at end. O(n) [amortized constant time] for array,
unless provisions have been made for avoiding reallocation; O(1) for doubly linked list.
- Remove, given iterator. O(n) for array; O(1) for doubly linked list.
- Remove at beginning. O(n) for array; O(1) for doubly linked list.
- Binary Search. O(log n) for array; O(n) for doubly linked list.
- Here are nine for which the order is the same:
- Traverse (iterate through all items). O(n) for both.
- Copy entire sequence. O(n) for both.
- Swap two sequences. O(1) for both.
- Sort. O(n log n) for both.
- Retrieve first item. O(1) for both.
- Retrieve last item. O(1) for both.
- Create empty data structure. O(1) for both.
- Remove at end. O(1) for both.
- Sequential Search. O(n) for both.
- In a Singly Linked List class, the data members needed are:
- A pointer to the first node in the list (head pointer).
- Maybe the size of the list.
- In a Singly Linked List node class, the data members needed are:
- This node’s data item.
- A pointer to the next node in the list.
- In a Doubly Linked List,
the class needs two pointers (head & tail),
and the node needs two pointers (next & previous).