CS 622 - Performance Evaluation, Fall 2002

TR 2:00 - 3:30 in Chapman 206

Recent Announcements:

11/12 Project - Presentation/Report due Dec 12. Profile and optimize the implementation assigned to you on a RISC architecture and on the Itanium PC. (Verify that the original code produces correct results and that it still does after every optimization you make.) Where are the "hot spots"? Compare and contrast the "hot spots" for RISC vs. Itanium. On each architecture, illustrate the top 3 optimizations and summarize:

how you discovered the "hot spot"
how you improved the execution time
how you verified your optimization was indeed faster
What is your final speedup (unoptimized code vs. final optimized code) for RISC and Itanium? Which were you able to improve the most? Which had the best performance in the end?
The code assignments are coming soon ...

RM - vector normalization
LL - heapsort
BH - gaussian elimination
SK - tri-linear interpolation

11/12 HW#7 due Nov 26 - Repeat HW#6 on one of the Itanium PCs. Also note the main differences (if any) between the machine you used in HW#6 and the Itanium PC.

11/05 HW#6 due Nov 12 - Profile gausstable.c (the two for loops). Report

the percentage of time spent on each statement
the number of times each statement was executed
the average execution time for each statement
Use your experience to estimate how much you could optimize this program in terms of execution time. Explain and justify your answer.
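The three quantities requested above can all be derived from two raw numbers per statement: an execution count and a total elapsed time. A minimal sketch in C, assuming you have already collected those numbers with a profiler or by hand. The struct and function names are hypothetical and are not part of gausstable.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one profiled statement (not from gausstable.c). */
struct stmt_profile {
    unsigned long count;  /* number of times the statement executed */
    double total_sec;     /* total time spent in the statement, in seconds */
};

/* Sum of the time attributed to all n statements. */
double profile_total(const struct stmt_profile *p, size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += p[i].total_sec;
    return sum;
}

/* Percentage of the total profiled time spent in statement i. */
double profile_percent(const struct stmt_profile *p, size_t n, size_t i)
{
    assert(i < n);
    double total = profile_total(p, n);
    return total > 0.0 ? 100.0 * p[i].total_sec / total : 0.0;
}

/* Average time per execution of one statement, in seconds. */
double profile_average(const struct stmt_profile *s)
{
    return s->count > 0 ? s->total_sec / (double)s->count : 0.0;
}
```

The percentages are relative to the profiled statements only, so they add up to 100 even when the program spends time elsewhere.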

10/17 HW#5 due Oct 24 - Problem 5.4. Use an ANOVA test to compare the performances of three different, but roughly comparable, computer systems measured in terms of the execution time of the benchmark program you found and ran for HW #2. The ANOVA test shows only whether there is a statistically significant difference among systems, not how large the difference really is. Use appropriate contrasts to compare the differences between all possible pairs of the systems. Explain and interpret your results.
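As a rough illustration of the first step, the F statistic of a one-way ANOVA can be computed as below. This is a sketch under stated assumptions: every system has the same number of measurements, the data layout and the name anova_f are hypothetical, and the comparison against a critical F value from a table is left out, as are the contrasts.

```c
#include <assert.h>
#include <stddef.h>

/* One-way ANOVA F statistic for k systems with n measurements each.
 * Assumed layout: data[i * n + j] is measurement j of system i. */
double anova_f(const double *data, size_t k, size_t n)
{
    assert(k > 1 && n > 1);

    /* Overall mean of all k * n measurements. */
    double grand = 0.0;
    for (size_t i = 0; i < k * n; i++)
        grand += data[i];
    grand /= (double)(k * n);

    double ssa = 0.0;  /* variation between systems */
    double sse = 0.0;  /* variation within systems (measurement error) */
    for (size_t i = 0; i < k; i++) {
        double mean = 0.0;
        for (size_t j = 0; j < n; j++)
            mean += data[i * n + j];
        mean /= (double)n;

        ssa += (double)n * (mean - grand) * (mean - grand);
        for (size_t j = 0; j < n; j++) {
            double d = data[i * n + j] - mean;
            sse += d * d;
        }
    }

    double msa = ssa / (double)(k - 1);        /* k - 1 degrees of freedom */
    double mse = sse / (double)(k * (n - 1));  /* k(n - 1) degrees of freedom */
    return msa / mse;  /* not finite if the measurements have no spread */
}
```

A large F relative to the tabulated critical value for (k - 1, k(n - 1)) degrees of freedom suggests that the systems differ. It does not say which pairs differ or by how much, which is what the contrasts are for.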

Course Materials:

09/17 - gausstable.c source listing
09/15 - memory.c source listing. Sample output can be found here under Memory Hierarchy Examples.
09/04 - Syllabus

Old Announcements:

10/10 HW#4 due Oct 17 - Re-do HW#1 part 3 by integrating what we have learned about statistics in Chapters 4 & 5. Do it for two different loops - randsqrt.c and arraysqrt.c. Generate 4 total graphs as follows:

for randsqrt, average sqrt time vs precision with 95% confidence bands for using sqrt() and using LUTs.
for arraysqrt, average sqrt time vs precision with 95% confidence bands for using sqrt() and using LUTs.
for randsqrt, plot the mean of the differences between sqrt() and using LUTs for 1 to 5 digits of precision with 95% confidence bands.
for arraysqrt, plot the mean of the differences between sqrt() and using LUTs for 1 to 5 digits of precision with 95% confidence bands.
To get a 95% confidence interval for each data point, run 30 loops of random size between 1,000,000 and 10,000,000. Comment on the results and what conclusions you can draw.

10/10 Here are some comments on HW#1.

10/08 HW#3 due Oct 15 - Write a program to find the clock resolution (minimum time interval you can measure) on 2 Unix systems. You can use the web for ideas, but document how/where you found the information and what you used.
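One common way to estimate the resolution is to read the clock repeatedly until the value changes and to keep the smallest change seen. The sketch below does this with the POSIX gettimeofday() call, which is only one possible clock and is an assumption here. The function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Current time in microseconds since the epoch, via POSIX gettimeofday(). */
long long now_usec(void)
{
    struct timeval tv;
    gettimeofday(&tv, NULL);
    return (long long)tv.tv_sec * 1000000LL + (long long)tv.tv_usec;
}

/* Smallest positive step between successive clock readings, in microseconds,
 * over the given number of trials. */
long long clock_resolution_usec(int trials)
{
    assert(trials > 0);
    long long best = 0;
    for (int i = 0; i < trials; i++) {
        long long t0 = now_usec();
        long long t1;
        do {
            t1 = now_usec();  /* spin until the reported time advances */
        } while (t1 <= t0);
        if (best == 0 || t1 - t0 < best)
            best = t1 - t0;
    }
    return best;
}
```

The result is an upper bound on the resolution: it also includes the cost of the call itself, and gettimeofday() cannot report anything finer than one microsecond.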

09/26 HW#1 part 3 due Oct 3 - For two different systems, when is it better to use a sqrt lookup table instead of computing using sqrt()? The "application" is a for loop that computes sqrts of random numbers in the range [0,1].

09/19 - HW#1 part 2 due Sep 26 - Implement a sqrt lookup table for input values between 0.0 and 1.0. Create one program that generates a table (like gausstable.c) with a "digits of precision" input. Write a driver program that creates random numbers between 0.0 and 1.0 and looks up the sqrt value from your table. Graph the average sqrt lookup time for precision from 1 to 5 digits.
09/19 - HW#2 due Sep 26 - Go to the SPEC web page and give a 1-2 sentence summary of their 12 current benchmarks. Read about some other benchmarks here. Compile and run the C Loops and C Cache benchmarks on a Unix system and interpret the results. Compare their timing method to your method from HW#1.
09/17 - No class on Tuesday Sep 24.
09/12 - HW#1 part 1 due Sep 17 - Find an accurate time measurement on two platforms and test with a loop of sqrt calls. Comment on the quality of your results.

