Course Review and Final Exam Preview

Generally, the things we've covered in this second half of the course fall into two categories:

These all interact in a nearly dense graph. For example, when two threads repeatedly write to the same cache line, there is a large performance cost. You can use bitwise operations to simulate branching in SSE. The performance gain from a GPU resembles what you get from combining threads and SIMD. I've had to use bitwise tricks to improve performance on slow embedded systems.
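To make the "bitwise operations simulate branching" idea concrete, here is a minimal sketch in portable C++ rather than real SSE. The `Lanes` type and the `select_less_than` helper are hypothetical names invented for this illustration; the loop body only mimics, one lane at a time, what a compare followed by and / and-not / or instructions would do across a whole register.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one SSE register: four 32-bit float lanes.
using Lanes = std::array<float, 4>;

// Copy a float's bits into an integer (and back) without undefined behavior.
static std::uint32_t to_bits(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    return u;
}
static float from_bits(std::uint32_t u) {
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Per lane: out = (x < limit) ? then_val : else_val, using a mask instead of an if.
Lanes select_less_than(const Lanes& x, float limit,
                       const Lanes& then_val, const Lanes& else_val) {
    Lanes out{};
    for (int i = 0; i < 4; i++) {
        // All ones if the comparison is true, all zeros otherwise.
        std::uint32_t mask = 0u - static_cast<std::uint32_t>(x[i] < limit);
        // Keep then_val where the mask is set, else_val where it is clear.
        out[i] = from_bits((mask & to_bits(then_val[i])) |
                           (~mask & to_bits(else_val[i])));
    }
    return out;
}
```

Both candidate values are computed for every lane and the mask picks one, which is why this style only pays off when both sides are cheap to evaluate.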

For the exam, you're expected to not only know this information, but know how it interrelates!
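As one illustration of how topics interrelate (threads and caches), the sketch below sets up two counter layouts that can be timed against each other. Everything here is an assumption made for illustration: the 64-byte cache line size is typical on current x86 but not guaranteed, and the names `PackedCounters`, `PaddedCounters`, and `count_in_two_threads` are hypothetical. The code only shows the setup; any speed difference has to be measured on your own machine.

```cpp
#include <cstddef>
#include <thread>

// Assumed cache line size in bytes (typical, not guaranteed).
constexpr std::size_t kCacheLine = 64;

// Two counters that very likely land in the same cache line.
struct PackedCounters {
    volatile long a = 0;
    volatile long b = 0;
};

// Two counters forced onto separate cache lines by alignment.
struct PaddedCounters {
    alignas(kCacheLine) volatile long a = 0;
    alignas(kCacheLine) volatile long b = 0;
};
static_assert(sizeof(PaddedCounters) >= 2 * kCacheLine,
              "each padded counter should occupy its own line");

// Each thread writes only its own counter, so there is no data race;
// volatile keeps the compiler from collapsing the loops into one add.
template <typename Counters>
long count_in_two_threads(Counters& c, long iterations) {
    std::thread t1([&c, iterations] {
        for (long i = 0; i < iterations; i++) c.a = c.a + 1;
    });
    std::thread t2([&c, iterations] {
        for (long i = 0; i < iterations; i++) c.b = c.b + 1;
    });
    t1.join();
    t2.join();
    return c.a + c.b;
}
```

Both layouts produce the same total; the interesting comparison is wall-clock time for a large iteration count, measured with whatever timer you trust.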


CS 301 Practice Final Exam

0.) NAME: ___________________________ (Score: 2 points!)
(From 2011). Closed book; closed notes. Complete sentences NOT needed.

1.) Which C++ data types can benefit from SSE? ___________________________________________

What do SSE's "ps" instructions, like "addps", do? ______________________________________

Why would you choose a “ps” over an “ss” instruction? ____________________________________

(Score: 9 points.)

2.) Given four floats sitting in memory at pointers "a", "b", "c", and "d", please write the x86 assembly code to compute “a*b+c*d”. You can leave this value in SSE register xmm0.

(Score: 10 points. Hint: movss, addss, and mulss.)

3.) What's inside an IEEE 32-bit float? Draw at least three parts, and label each part's name, and size in bits.

(Score: 10 points)

4.) What will this function return? x = ____________________

float funky(void) {
    float x=2;
    for (int i=0;i<10;i++) x=x*x;
    return x;
}

And why? ______________________

(Score: 5 points)

5.) If this code returned integer zero, then how fast is “doStuff”? _____________________________ seconds

int start=time_in_seconds();
doStuff();
return time_in_seconds()-start;

(Score: 5 points)

6.) Critique these false statements about performance analysis.

False Statement | Your Critique / Solution
I'm sure the code is slow: just look at it, it's way too complicated to run fast! | ____________________
Since the CPU sometimes gets randomly delayed, such as processing network traffic, you can't ever say anything about performance. | ____________________
We can't time that! It runs in nanoseconds, and our best timers return microseconds! | ____________________

(Score: 15 points.)


7.) A coworker is amazed to discover that adding threads makes their code slower, not faster!  Why might this be?



(Score: 10 points.)


8.) Which instruction do you use for x86 system calls? _________________

Why might you make a bare system call? __________

(Score: 5 points.)

9.) Here's some over-optimized C++ code. Rewrite it in sane C++ with no bitwise operators.

Hideous Crap

Nice C++


int mask=(x<3) ? (~0) : 0;
// now mask==0xffffffff (x<3) or 0 (else)
int y=(mask&(10))|(~mask&(x+7));




(Score: 10 points.)

10.) For threads, when might you prefer to use std::thread over OpenMP threads?






(Score: 5 points.)

11.) Compared to x86, which of these is the “same” or “different” (and how?) on ARM and GPU?

Concept on x86... | ...on ARM? | ...on a GPU?
Register names, like rax | __________ | __________
Accessing memory via a pointer stored in a register. | __________ | __________
Calling functions, by storing return address on the stack. | __________ | __________
The number of bits used in each instruction, like 8 bits for a “ret”. | __________ | __________
Instructions exist like add, subtract, multiply, and divide. | __________ | __________

(Score: 12 points. If they're different, briefly tell me how they're different.)



CS 301 Lecture Note, 2014, Dr. Orion Lawlor, UAF Computer Science Department.