HW 4: Ballistics Tables
CS 441 Homework, Dr. Lawlor
Sir Charles Babbage's computational engine, and the Navy's first
automatic digital computers, were designed to compute ballistics
tables. Today, ballistics tables are important both for the army
and navy, and in path planning for space exploration.
This program computes the destination location for ore launched by a
lunar mass driver. The ore experiences drag as it encounters
mining dust in flight. The launcher's maximum range is 25 miles,
approximately a one minute flight time.
typedef float real; /* "real" is our numeric type (could also be double, __m128) */
/* Coefficients used in one path computation. */
class path_coeff {
public:
real drag; /* multiply by velocity to get drag-induced acceleration */
real g; /* gravitational acceleration, miles/sec/sec */
real dt; /* timestep, in seconds */
};
/*
Run one ballistic path computation.
Y axis is height. X axis is distance downrange.
Projectile starts at (0,0) miles with initial velocity vx,vy miles/second.
Projectile's terminal (y==0) range in miles is returned in final_x,
and total flight time in seconds is returned in final_t.
*/
void follow_path(const path_coeff &coeff,
real vx,real vy,
real *final_x,real *final_t)
{
real x=(real)0.0, y=(real)0.0, t=(real)0.0;
real dt=coeff.dt; /* timestep */
bool done=false;
while (!done) {
real ax=-coeff.drag*vx, ay=-coeff.drag*vy-coeff.g; /* acceleration */
vx+=dt*ax; vy+=dt*ay; /* velocity */
if (y+dt*vy<0.0) { /* we're about to hit the ground */
*final_x=x;
*final_t=t;
done=true;
}
x+=dt*vx; y+=dt*vy; /* position */
t+=dt;
}
}
/* Return the initial velocity for path i */
void path_initial(int i,real *vx,real *vy) {
*vx=0.73-i*0.005;
*vy=0.01+i*0.0005;
}
/* Stores a set of computed flight paths */
class path_table {
public:
enum {n=100};
real x[n], t[n]; /* range (miles) and time (seconds) for path */
};
/* Build and print a set of computed ballistics paths */
void fill_table(const path_coeff &coeff,path_table &table) {
for (int i=0;i<path_table::n;i++) {
real vx,vy;
path_initial(i,&vx,&vy);
follow_path(coeff,vx,vy,&table.x[i],&table.t[i]);
}
}
void print_table(path_table &table) {
for (int i=0;i<path_table::n;i++) {
printf("Path %3d: x=%.3f miles, t=%.3f seconds\n",
i,table.x[i],table.t[i]);
}
}
path_coeff coeff;
int time_fill(void) {
path_table table;
fill_table(coeff,table);
return 0;
}
int foo(void) {
coeff.drag=0.007; /* drag due to ore collisions with mining dust */
coeff.g=1/6.0*(32/5280.0); /*lunar gravity acceleration, miles/s^2*/
coeff.dt=0.001; /*timestep, in seconds: more accurate at 0.0001 or smaller*/
printf("fill_table takes %.3f ms\n",time_function(time_fill)*1.0e3);
path_table table;
fill_table(coeff,table);
print_table(table);
return 0;
}
(Try this in NetRun now!)
This code is also on the powerwall in "~olawlor/441_demo/hw4/".
"follow_path" is where most of the time is spent. The
"time_function" call above is NetRun-specific; feel free to replace it
with an MPI timer call in the MPI version.
For maximum accuracy, they'd like to run this same program at
dt=0.0001 or smaller, and for bidding purposes, they'd like the results
to be returned as fast as possible. Of course, the results
still have to be printed the same way, and they have to be right!
For the homework, please:
- Determine how this program's answer and run time changes as you
vary the timestep coeff.dt. Note that NetRun will only let the
code run for so long, so smaller dt's may get killed off by
NetRun. Very small dt's may cause rounding error to dominate the
answer.
- Optimize this program using OpenMP. Luckily, there are no global
variables in this program. However, due to the different path lengths,
load balance between threads can be a serious performance problem; I recommend "schedule(dynamic,1)".
Feel free to use either NetRun (quad core; switch to the
"OpenMP" language), the powerwall (dual core; compile with ~olawlor/441_demo/hw4/ makefile or "g++-4.2
-msse3 -fopenmp main.cpp -o main"), or use your own machine for performance
tests.
- Optimize this program using SSE. Note that "follow_path"
doesn't do any 4-input arithmetic, so it'll be tough to optimize
follow_path much without changing the interface. Feel free to
change the call to follow_path! I recommend making a new "__m128"
version of follow_path that traces 4 paths at once--the only tricky
part about doing this is the "hit the ground" check, which can be
transformed into an SSE-style parallel if.
- Optimize this program using MPI. You'll need to do your
performance runs on the powerwall. Feel free to start with your
SSE version, or the original program. You'll need to collect up
the final answers onto one machine to be printed. As with OpenMP,
load balance is a serious problem; one solution is to have process p
compute the iterations where i%size==p (called "round robin"
decomposition). Because they'll be computing tables repeatedly on
the fly, you can ignore the MPI startup and shutdown time--MPI_Wtime()
is the important time.
This homework will be due at midnight on Tuesday, November 24 on Blackboard.
My preferred format is set of source code files, like "openmp.cpp",
"sse.cpp", and "mpi.cpp", together with a short clear README file
explaining your overall parallelization strategy and performance
results from each parallelization method.