Parallelizing Finite Element Simulations

CS 641 Lecture, Dr. Lawlor

Aside: sending pointers between MPI processes isn't likely to work--because different MPI processes live in different memory spaces, a valid pointer on one process may point to garbage or unmapped memory on another process.  However, on Linux a (non-relocatable) program executable is normally loaded to exactly the same memory location every run, so curiously string constants can anomalously work across processes.  You can even manually allocate memory (using mmap at a fixed address) to be identical across processes, if you want to be able to send pointers for some bizarre reason like thread migration.
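
For instance, here's a minimal sketch (run with at least two ranks) of what actually crosses the wire when you "send a pointer": only the numeric address, which means nothing in the receiver's address space.  The buffer and variable names are made up for this illustration.

/* Sketch: why sending a raw pointer between MPI processes fails.
   Rank 0 sends the *address* of its heap buffer; rank 1 receives only
   a number that points nowhere useful in rank 1's own address space. */
#include <mpi.h>
#include <cstdio>
#include <cstdint>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        int *buf = new int(42);   /* lives only in rank 0's memory */
        unsigned long long addr = (unsigned long long)(std::uintptr_t)buf;
        MPI_Send(&addr, 1, MPI_UNSIGNED_LONG_LONG, 1, 0, MPI_COMM_WORLD);
        delete buf;
    } else if (rank == 1) {
        unsigned long long addr = 0;
        MPI_Recv(&addr, 1, MPI_UNSIGNED_LONG_LONG, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        /* Dereferencing (int*)addr here would be undefined behavior:
           rank 1 never allocated anything at that address. */
        std::printf("Rank 1 received address 0x%llx, not the value 42.\n", addr);
    }
    MPI_Finalize();
    return 0;
}
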

Finite Element Simulations

You saw in class and in the homework how easy it is to parallelize most programs based on a regular 1D or 2D grid--just divide up the grid, and possibly exchange a few boundary rows each timestep.  Not all programs use regular grids, though, and for an unstructured mesh, parallelization takes a lot more work.
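
For comparison, here's a minimal sketch of that regular-grid boundary exchange, assuming a 1D array split into contiguous chunks; the chunk size N and the tag are made up for illustration.  Each rank swaps one boundary cell with each neighbor per timestep.

/* Sketch: boundary ("ghost cell") exchange for a regular 1D grid split across ranks.
   Each rank owns N interior cells plus one ghost cell on each end. */
#include <mpi.h>
#include <vector>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int N = 100;                          /* cells owned by this rank (made up) */
    std::vector<double> u(N + 2, (double)rank); /* u[0] and u[N+1] are ghost cells */

    int left  = (rank > 0)        ? rank - 1 : MPI_PROC_NULL;
    int right = (rank < size - 1) ? rank + 1 : MPI_PROC_NULL;

    /* Send my first owned cell left; receive the right neighbor's first cell into my right ghost... */
    MPI_Sendrecv(&u[1],     1, MPI_DOUBLE, left,  0,
                 &u[N + 1], 1, MPI_DOUBLE, right, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    /* ...and send my last owned cell right; receive the left neighbor's last cell into my left ghost. */
    MPI_Sendrecv(&u[N], 1, MPI_DOUBLE, right, 0,
                 &u[0], 1, MPI_DOUBLE, left,  0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    /* Now u[0..N+1] is consistent, and a stencil update over u[1..N] can proceed. */
    MPI_Finalize();
    return 0;
}
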

Step one is to divide up the mesh into pieces.  At each timestep, you do some local work, then communicate to keep the computation tied together.

For example, a typical finite element program looks like this:
/* Elements apply forces to nodes based on node positions (displacements) */
for (element = ...) {
    mat3 strain = element.readpositions(nodes);
    mat3 stress = element.material(strain);
    element.nodes.force += stress * element.sides;
}

/* Move nodes based on net force */
for (node = ...) {
    vec3 acceleration = node.force / node.mass; /* F=mA; A=F/m */
    node.velocity += dt*acceleration;
    node.position += dt*node.velocity;
}
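
To make that structure concrete, here's a hedged, self-contained 1D analogue of the same timestep: springs stand in for elements, and scalar positions stand in for the vec3/mat3 quantities above.  The struct fields and constants are made up for illustration.

/* Sketch: a 1D spring-chain analogue of the element/node loops above.
   Each "element" is a spring connecting two nodes; it adds equal and
   opposite forces to its endpoints, then every node integrates F=mA. */
#include <vector>
#include <cstdio>

struct Node    { double position, velocity, force, mass; };
struct Element { int n0, n1; double stiffness, restlength; };

int main() {
    const double dt = 1.0e-3;
    std::vector<Node> nodes(4);
    std::vector<Element> elements;
    for (int i = 0; i < (int)nodes.size(); i++)
        nodes[i] = { (double)i, 0.0, 0.0, 1.0 };      /* evenly spaced, at rest */
    for (int i = 0; i + 1 < (int)nodes.size(); i++)
        elements.push_back({ i, i + 1, 100.0, 0.9 }); /* springs start slightly stretched */

    for (int step = 0; step < 1000; step++) {
        for (Node &n : nodes) n.force = 0.0;

        /* Elements apply forces to nodes based on node positions */
        for (const Element &e : elements) {
            double stretch = (nodes[e.n1].position - nodes[e.n0].position) - e.restlength;
            double f = e.stiffness * stretch;         /* Hooke's law */
            nodes[e.n0].force += f;                   /* stretched spring pulls its ends together */
            nodes[e.n1].force -= f;
        }

        /* Move nodes based on net force */
        for (Node &n : nodes) {
            double acceleration = n.force / n.mass;   /* F=mA; A=F/m */
            n.velocity += dt * acceleration;
            n.position += dt * n.velocity;
        }
    }
    std::printf("node 0 ends at x=%.3f\n", nodes[0].position);
    return 0;
}
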
One way to parallelize this is to assign elements to processors, make separate copies of the nodes along processor boundaries, and then send and sum up the forces on those "shared" nodes:
/* (my) Elements apply forces to nodes based on node positions (displacements) */
for (element = ...) {
    mat3 strain = element.readpositions(nodes);
    mat3 stress = element.material(strain);
    element.nodes.force += stress * element.sides;
}

/* Exchange forces with my neighbors, to get forces from neighboring elements */
for (neighbor = ...) {
    for (n in neighbor.sharedlist) /* pack up my copy of our shared nodes' forces */
        neighbor.sendforces[n] = nodes[neighbor.sharedlist[n]].force;

    MPI_Isend(neighbor.sendforces, ... , neighbor.rank, &neighbor.sendrequest); /* I send him mine */
}

for (neighbor = ...) {
    MPI_Recv(neighbor.recvforces, ... , neighbor.rank); /* he sends me his */
    for (n in neighbor.sharedlist) /* we both total ours up */
        nodes[neighbor.sharedlist[n]].force += neighbor.recvforces[n];
}

for (neighbor = ...) /* wait for my nonblocking sends to finish before reusing the buffers */
    MPI_Wait(&neighbor.sendrequest, MPI_STATUS_IGNORE);

/* Move nodes based on net force */
for (node = ...) {
    vec3 acceleration = node.force / node.mass; /* F=mA; A=F/m */
    node.velocity += dt*acceleration;
    node.position += dt*node.velocity;
}
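
Here's a hedged sketch of what that shared-node force exchange might look like as compilable MPI code, using one scalar force per node for brevity; the Neighbor struct and its fields are made up for this illustration.  The important invariant is that both partitions list their shared nodes in the same agreed-upon order.

/* Sketch: the shared-node force exchange above, fleshed out for scalar forces. */
#include <mpi.h>
#include <vector>

struct Neighbor {
    int rank;                        /* MPI rank of the neighboring piece */
    std::vector<int> sharedlist;     /* local node indices shared with that rank */
    std::vector<double> sendforces, recvforces;
    MPI_Request sendrequest;
};

void exchange_shared_forces(std::vector<double> &nodeforce,
                            std::vector<Neighbor> &neighbors) {
    for (Neighbor &nb : neighbors) {            /* pack and post my sends */
        nb.sendforces.resize(nb.sharedlist.size());
        nb.recvforces.resize(nb.sharedlist.size());
        for (size_t n = 0; n < nb.sharedlist.size(); n++)
            nb.sendforces[n] = nodeforce[nb.sharedlist[n]];
        MPI_Isend(nb.sendforces.data(), (int)nb.sendforces.size(), MPI_DOUBLE,
                  nb.rank, 0, MPI_COMM_WORLD, &nb.sendrequest);
    }
    for (Neighbor &nb : neighbors) {            /* receive theirs, and total both up */
        MPI_Recv(nb.recvforces.data(), (int)nb.recvforces.size(), MPI_DOUBLE,
                 nb.rank, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        for (size_t n = 0; n < nb.sharedlist.size(); n++)
            nodeforce[nb.sharedlist[n]] += nb.recvforces[n];
    }
    for (Neighbor &nb : neighbors)              /* let the nonblocking sends finish */
        MPI_Wait(&nb.sendrequest, MPI_STATUS_IGNORE);
}
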
The other way to parallelize this application is to divide up the nodes, and then replicate the elements along each partition boundary.  These replicated copies are often called "ghost" elements.

In principle this is pretty straightforward, but in practice, keeping the serial and parallel node and element numberings, and the shared-node lists, organized and consistent is pretty painful.
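
To give a rough idea of that bookkeeping, here's a hedged sketch of the per-partition data a code like this ends up carrying around; all the names are invented for illustration.

/* Sketch: the bookkeeping one mesh partition carries around. */
#include <vector>

struct MeshPartition {
    /* Local-to-global renumbering: local node i was serial node localToGlobalNode[i]. */
    std::vector<int> localToGlobalNode;
    std::vector<int> localToGlobalElement;

    /* Which local elements are replicated "ghost" copies, and which rank
       owns (gets to update) each local node. */
    std::vector<bool> isGhostElement;
    std::vector<int>  nodeOwnerRank;

    /* For each neighboring partition: its rank and the local indices of the
       nodes we share, listed in an order both sides agree on. */
    struct Neighbor {
        int rank;
        std::vector<int> sharedNodes;
    };
    std::vector<Neighbor> neighbors;
};
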

While I was in Illinois, I worked on the Finite Element Framework ParFUM, a library for parallelizing mesh-based programs.  Its key features are an interface for describing arbitrary meshes, partitioning and unpartitioning those meshes, and communicating shared nodes or ghosts.