Parallelizing Finite Element Simulations
CS 641 Lecture, Dr. Lawlor
Aside: sending pointers between MPI processes isn't likely to
work--because different MPI processes live in different memory spaces,
a valid pointer on one process may point to garbage or invalid memory on
another process. However, on Linux program executable code is
normally loaded to exactly the same memory location every time, so
curiously string constants can anomalously work across processes.
You can even manually allocate memory at identical addresses
across processes (using mmap), if you want to be able to send pointers for some
bizarre reason like thread migration.
Finite Element Simulations
You saw in class and in the homework how easy it is to parallelize most
programs based on a regular 1D or 2D grid--just divide up the grid, and
possibly exchange a few boundary rows. Not all programs use
regular grids, though, and for an unstructured mesh, parallelization
takes a lot more work.
Step one is to divide up the mesh into pieces. At each timestep, you do some local work, then communicate to keep the computation tied together.
For example, a typical finite element program looks like this (more physics details here):
/* Elements apply forces to nodes based on node positions (displacements) */
for (element = ...) {
mat3 strain = element.readpositions(nodes);
mat3 stress = element.material(strain);
element.nodes.force += stress * element.sides;
}
/* Move nodes based on net force */
for (node = ...) {
vec3 acceleration = node.force / node.mass; /* F=mA; A=F/m */
node.velocity += dt*acceleration;
node.position += dt*node.velocity;
}
One way to parallelize this is to assign elements to processors, make
separate copies of the nodes along processor boundaries, and then send
and sum up the forces on those "shared" nodes:
/* (my) Elements apply forces to nodes based on node positions (displacements) */
for (element = ...) {
mat3 strain = element.readpositions(nodes);
mat3 stress = element.material(strain);
element.nodes.force += stress * element.sides;
}
/* Exchange forces with my neighbors, to get forces from neighboring elements */
for (neighbor = ...) {
for (n in neighbor.sharedlist)
neighbor.sendforces[n] = nodes[neighbor.sharedlist[n]].force;
MPI_Isend(neighbor.sendforces, ... , neighbor.rank); /* I send him mine */
}
for (neighbor = ...) {
MPI_Recv(neighbor.recvforces, ... , neighbor.rank); /* he sends me his */
for (n in neighbor.sharedlist) /* we both total ours up */
nodes[neighbor.sharedlist[n]].force += neighbor.recvforces[n];
}
MPI_Waitall(...); /* complete the Isends before reusing sendforces */
/* Move nodes based on net force */
for (node = ...) {
vec3 acceleration = node.force / node.mass; /* F=mA; A=F/m */
node.velocity += dt*acceleration;
node.position += dt*node.velocity;
}
The other way to parallelize this application is to divide up the
nodes, and then replicate the elements along a boundary. These
are often called "ghost" elements.
In principle, this is pretty straightforward, but in practice,
organizing your serial and parallel node and element numbers, and
shared-node lists, is pretty painful.
While I was in Illinois, I worked on the finite element framework ParFUM, a library for parallelizing mesh-based programs. Its key features are an interface for describing arbitrary meshes, partitioning and unpartitioning meshes, and communicating shared nodes or ghosts.