# GPU Programming with gVector1D

CS 482 Lecture, Dr. Lawlor

To make general-purpose GPU programming in THREE.js/WebGL easier, I built a library named gpu1D.js.  Unlike most JavaScript, it actually has comments!

This is a JavaScript library, but if it were C++ the declarations would be:
```
/* A gVector1D is a 1D array, implemented internally as a 2D texture.
   You can draw it onscreen like a 2D texture, or pass it to a
   gFunction1D as a 1D array. */
class gVector1D {
public:
	gVector1D(int newSize,float *newValues=null);
	int size; // number of elements in our 1D vector
	THREE.WebGLRenderTarget tex; // the texture that stores our data

	// vec1.swap(vec2) interchanges the guts of the two vectors efficiently.
	//   This is commonly used for ping-pong type operations.
	void swap(gVector1D &other);

	// Show this texture onscreen, for debugging
	void debugTo(THREE.Scene &scene,float size=10.0,
		float xPos=-size,float yPos=+size,float zPos=+size);
};

/* A gFunction1D is some GPU code, which can be executed into a vector.
   You set uniforms to provide the input values and textures. */
class gFunction1D {
public:
	/* Pass in the GLSL code that forms the guts of this function.
	   This needs at least one function named "run", with signature:
	     vec4 run(float index) { ... }
	   "index" is your integer index (from 0 to size-1) in the target vector.
	   The return vec4 will be written into that vector. */
	gFunction1D(const char *fragShaderGLSLcode);

	/* Set a uniform variable to this value.
	   This automatically adds a uniform of the right type to your GLSL code.
	   It currently only works with values of type gVector1D, float, or vec2-3-4.
	   Returns this gFunction1D, so you can string together set calls
	   in a fluent style. */
	template <class T>
	gFunction1D &set(const char *uniformName,T value);

	/* This executes the pixel shader.
	   Compilation is delayed until this stage during the first run,
	   so we can collect info about your uniform variables.
	   targetVector is a gVector1D where the output of your shader will be written.
	   runLength is an integer pixel count giving the minimum number of pixels
	     to render.  It's optional: leave it off to run at the
	     full targetVector.size. */
	gFunction1D &run(gVector1D &targetVector,int runLength=targetVector.size);
};
```
Generally speaking, you create gVector1D objects to store your application data, and gFunction1D objects to manipulate that data.  For example, here's how we might create and initialize an array:
```
var P=new gVector1D(10000); // P has 10 thousand elements
var f=new gFunction1D(" vec4 run(float index) { return vec4(index*scale); } ");
f.set("scale",3.0); // set a uniform float named "scale" (used above)
f.run(P); // run f for every pixel in P, and put the resulting values into the P texture
```
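To see what that shader computes, here's a CPU-side sketch in plain JavaScript (not part of gpu1D.js; `cpuRun` is an illustrative name).  Each element of P ends up holding vec4(index*scale), stored here as four consecutive floats in a flat array:

```javascript
// CPU sketch (not gpu1D.js): what f.run(P) computes, with each vec4
// stored as 4 consecutive floats in a flat Float32Array.
function cpuRun(size, scale) {
  var P = new Float32Array(4 * size);
  for (var index = 0; index < size; index++) {
    var v = index * scale;            // vec4(index*scale) splats one value
    P[4*index+0] = v; P[4*index+1] = v;
    P[4*index+2] = v; P[4*index+3] = v;
  }
  return P;
}

var P = cpuRun(10000, 3.0);
console.log(P[4 * 2]); // x component of element 2: 2*3.0 = 6
```

On the GPU, each "index" runs as an independent pixel in parallel; this loop just shows the per-element arithmetic.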
Note that "set" and "run" return the gFunction1D object back to you, so you can string together calls in the "fluent programming" style:
```
new gFunction1D(" vec4 run(float index) { return vec4(index*scale); } ")
   .set("scale",3.0) // set a uniform named "scale" (used above) to three
   .run(P);
```
This runs the same as above, but it's a single statement.

I'm particularly proud of my scheme for accessing other textures.  Calling `fn.set("name",vec);` binds "name" to a GLSL function that looks up pixels from vec.  For example, you could make a new texture Q with each pixel equal to one plus the corresponding pixel in P like this:

```
var Q=new gVector1D(P.size); // Q is the same size as P
var fq=new gFunction1D(" vec4 run(float index) { return 1.0+readP(index); } ");
fq.set("readP",P); // read from P (as "readP" function)
fq.run(Q); // write to Q
```
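Again as a CPU-side sketch (plain JavaScript, not gpu1D.js; `cpuOnePlus` is an illustrative name), the readP pattern above computes Q where every component of element i is 1.0 plus the matching component of P's element i, just as GLSL's `1.0+readP(index)` adds 1.0 to all four channels of the vec4:

```javascript
// CPU sketch (not gpu1D.js) of the readP example: Q[i] = 1.0 + P[i],
// applied componentwise across the flat array of floats.
function cpuOnePlus(P) {
  var Q = new Float32Array(P.length); // Q is the same size as P
  for (var i = 0; i < P.length; i++)
    Q[i] = 1.0 + P[i];
  return Q;
}

var P2 = Float32Array.from([0, 3, 6, 9]);
var Q2 = cpuOnePlus(P2); // Q2 holds [1, 4, 7, 10]
```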

## Performance

Creating textures on the GPU is expensive (hundreds of microseconds), so typically you do all your allocation in "Setup".  Compiling gFunction1D objects is also expensive (milliseconds), so you should create them once and call them repeatedly during the simulation.

You can read from as many vectors as you like (just call "set" repeatedly), but one big limitation of GLSL is that you can only write to a single vector at a time.  And you can't even control where you're writing in that vector--you always write to your "index", not some place of your choice.  The reason for these limitations is parallelism: reads can happen in any order, get cached, and still be fast and correct; but writes in the wrong order or at the wrong time give you the wrong answer.  Thus, GLSL doesn't give you control over writes.

For this reason, you won't always get the right answer if a shader reads (via set) and writes (via run) the same vector.  A common trick is to have two versions of the vector, "old" and "new", where you read from "old" and write to "new".  You can then swap the two for the next step, a trick often called "ping pong the buffers".
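The old/new ping-pong pattern can be sketched on the CPU like this (plain JavaScript, not gpu1D.js; the 1D blur is just an illustrative stand-in for a simulation kernel).  Each step reads anywhere in "oldV" but writes only at its own index in "newV", the same gather-only discipline GLSL enforces:

```javascript
// CPU sketch (not gpu1D.js) of ping-ponging buffers: read from "old",
// write to "new" at your own index only, then swap for the next step.
function blurStep(oldV, newV) {
  var n = oldV.length;
  for (var i = 0; i < n; i++) {
    var left = oldV[(i - 1 + n) % n], right = oldV[(i + 1) % n];
    newV[i] = 0.5 * oldV[i] + 0.25 * (left + right); // gather-only write
  }
}

var oldV = Float32Array.from([0, 0, 4, 0, 0]);
var newV = new Float32Array(oldV.length);
for (var t = 0; t < 3; t++) {
  blurStep(oldV, newV);
  var tmp = oldV; oldV = newV; newV = tmp; // like vec1.swap(vec2)
}
```

If blurStep instead wrote into the same buffer it was reading, later elements would see already-blurred neighbors and the answer would depend on execution order--exactly the hazard ping-ponging avoids.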

## (my) Related Work

gpu1D is not the first GPU programming interface I've built.

- gpgpu is my C++ OpenGL wrapper, and conceptually quite similar to gpu1D.
- EPGPU is a C++ OpenCL wrapper, and currently my main GPU programming language.
- cudaMPI is a CUDA wrapper for MPI communication between GPUs, on a GPGPU cluster.
- MPIglut distributes a C++ OpenGL/GLUT graphical program across a distributed-memory MPI display cluster.