Newtonian Particles on the GPU: P and V textures

CS 482 Lecture, Dr. Lawlor

The basic idea for simulating Newtonian particles on the GPU is to keep the position P and velocity V in textures, and compute their changes in pixel shaders.

The big challenge is that in a 3D simulation, P is XYZ and V is XYZ, for a total of 6 floats, but a shader can only output 4 floats per pixel. Possible solutions include:

Use separate textures for P and V. This means writing separate shaders that output P and V separately. Luckily, P+=dt*V and V+=dt*A read and write mostly separate data. In general, P=fnP(oldP,oldV), and V=fnV(oldP,oldV), so this theoretically scales to any number of outputs, although it might not be very efficient if you need to recalculate stuff (e.g., collision detection) in both shaders.
Use the same texture for P and V, but interleave the pixels somehow, for example by making even X coordinates represent P, while odd X coordinates represent V. This means your shader actually runs twice for each particle, once to output the P pixel and a second time to output the V pixel. Because these two cases probably don't share much functionality, the shader probably starts with one big branch, and is effectively two shaders welded together. Because of GPU branch granularity (neighboring pixels execute in lockstep, so a group that diverges pays for both sides of the branch), it's probably best if P and V are separated by at least a few pixels, the more the better (e.g., left half of texture is P, right half is V). This has fewer passes than fully separated P and V, but has little else to recommend it.
Use a single pixel to merge P and V. This is trivial in 2D: vec2 P=tex.xy; vec2 V=tex.zw;. In 3D in general this won't work, but often particles are trapped in a 2D+height surface anyway, so you can recompute Z where you need it. I've seen people do very strange things, like pack P.x and P.y into a single float using scaling and mod commands, but to do this the high coordinate needs to be OK with truncated precision.
There's a special OpenGL state command, glDrawBuffers, that enables a single shader to output pixels for several textures at once. For example, writing to gl_FragData[2] will write to the framebuffer object's GL_COLOR_ATTACHMENT2 texture (assuming that attachment is listed third in the glDrawBuffers call), but you do need to set up quite a bit of OpenGL state beforehand. There is still a hardware limit on how many textures you can write from a single shader, GL_MAX_DRAW_BUFFERS (the related GL_MAX_COLOR_ATTACHMENTS limits how many textures can be attached to the framebuffer object), typically 4 or 8 depending on your OpenGL hardware and drivers. OpenGL ES 2.0 and WebGL 1.0 only support one color attachment unless a draw-buffers extension is available; OpenGL ES 3.0 and WebGL 2.0 support several.
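
As a rough illustration of the last option, here is a minimal fragment shader sketch that updates P and V in one pass using multiple render targets. The names texP, texV, dt, and texcoords are assumptions made for this example, as is the constant-gravity acceleration. It also assumes one pixel per particle, floating-point textures (e.g., GL_RGBA32F), and that the host code has already called glDrawBuffers with GL_COLOR_ATTACHMENT0 and GL_COLOR_ATTACHMENT1.

```glsl
#version 120
// Sketch only: texP, texV, dt, and texcoords are hypothetical names.
uniform sampler2D texP;  // old positions, one particle per pixel (xyz)
uniform sampler2D texV;  // old velocities, same layout (xyz)
uniform float dt;        // timestep, in seconds
varying vec2 texcoords;  // this particle's pixel, passed from the vertex shader

void main(void) {
    vec3 P = texture2D(texP, texcoords).xyz;
    vec3 V = texture2D(texV, texcoords).xyz;

    // Assumed acceleration: constant gravity along -Z.
    vec3 A = vec3(0.0, 0.0, -9.8);

    // Explicit Euler: both updates read only the old values,
    // matching P=fnP(oldP,oldV) and V=fnV(oldP,oldV).
    vec3 newP = P + dt * V;
    vec3 newV = V + dt * A;

    gl_FragData[0] = vec4(newP, 1.0);  // first buffer listed in glDrawBuffers
    gl_FragData[1] = vec4(newV, 0.0);  // second buffer listed in glDrawBuffers
}
```

The textures being read should not be the same ones being written, so a setup like this would normally keep two copies of each texture and swap them every step.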

OK, now we can compute positions and velocities. For Newtonian mechanics, we can either have a separate force texture, or compute the force as a local variable inside the velocity shader. Typically it's more efficient, and less code and hassle, to recompute things rather than store and load them later.
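
For example, a velocity shader for the separate-textures approach might compute the net force as a local variable, roughly like the sketch below. The force model here (gravity plus a spring pulling each particle toward the origin) is only a placeholder, and texP, texV, dt, mass, springK, and texcoords are all hypothetical names chosen for this example.

```glsl
#version 120
// Sketch only: every uniform and varying name here is an assumption.
uniform sampler2D texP;  // old positions, one particle per pixel (xyz)
uniform sampler2D texV;  // old velocities, same layout (xyz)
uniform float dt;        // timestep, in seconds
uniform float mass;      // particle mass (must be nonzero)
uniform float springK;   // spring constant for the placeholder force
varying vec2 texcoords;  // this particle's pixel, passed from the vertex shader

void main(void) {
    vec3 P = texture2D(texP, texcoords).xyz;
    vec3 V = texture2D(texV, texcoords).xyz;

    // Net force lives only in a local variable; no force texture is needed.
    vec3 F = mass * vec3(0.0, 0.0, -9.8)  // gravity along -Z
           - springK * P;                 // spring toward the origin

    vec3 A = F / mass;                    // Newton's second law: A = F/m

    gl_FragColor = vec4(V + dt * A, 0.0); // new velocity for this particle
}
```

The matching position shader would read the same two textures and output P + dt * V.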