Recursive Raytracing, on GPU Hardware without Recursion
2010, Dr. Lawlor, CS 
481/681, CS, UAF
It's really surprisingly easy to add true reflections to a
raytracer.  If we start with the global "find_color" function,
originally something like this:
// Look up the color of the world viewed along this ray
vec4 find_color(vec3 rayStart,vec3 rayDir) {
	ray_intersection i = closest_hit(rayStart,rayDir); // geometric search
	return light_intersection(i); // diffuse + specular
}
To support perfect mirror reflection, you just add a recursive call to "find_color", that searches along the reflected ray:
// Recursively look up the color of the world viewed along this ray
vec4 find_color(vec3 rayStart,vec3 rayDir) {
	ray_intersection i = closest_hit(rayStart,rayDir); // geometric search
	vec4 local = light_intersection(i); // diffuse + specular
	vec4 remote = find_color(i.hit_location,reflect(rayDir,i.normal)); // recurse!
	return local*(1.0-i.mirror) + remote*i.mirror;
}
Unfortunately, the current generation of GPU hardware doesn't support
recursive functions.  This is a hardware limitation, due to the
fact that functions are inlined, so there isn't a runtime call stack.
But luckily, we can actually transform recursion into iteration quite
generally.  For example, here's a typical for loop, written with
recursion:
void recursive(int i) {
	if (i<10) {
		cout<<"I'm at recursion "<<i<<"\n";
		recursive(i+1);
	}
}
int foo(void) {
	recursive(0);
	
	// Iterative version
	for (int i=0;i<10;i++) {
		cout<<"I'm at iteration "<<i<<"\n";
	}
	return 0;
}
(Try this in NetRun now!)
It's a little bit more complicated to support recursive functions that
modify their return value, like this example.  The trick is to
accumulate all the "0.5*i" multiplications into a single cumulative
scale factor:
float recursive(int i) {
	if (i<10) {
		cout<<"I'm at recursion "<<i<<"\n";
		return 0.5*i*recursive(i+1);
	}
	else return 1.0; // base case
}
int foo(void) {
	cout<<"Recursive: "<<recursive(1)<<"\n\n";
	
	float scale=1.0;
	for (int i=1;i<10;i++) {
		cout<<"I'm at iteration "<<i<<"\n";
		scale*=0.5*i;
	}
	cout<<"Iterative: "<<scale*1.0<<"\n";
	
	return 0;
}
(Try this in NetRun now!)
We can apply this same trick to transform a recursive raytracer into an iterative loop over ray bounces.
// Recursively look up the color of the world viewed along this ray
vec4 find_color(vec3 rayStart,vec3 rayDir) {
	ray_intersection i = closest_hit(rayStart,rayDir); // geometric search
	vec4 local = light_intersection(i); // diffuse + specular
	vec4 remote = find_color(i.hit_location,reflect(rayDir,i.normal)); // recurse!
	return local*(1.0-i.mirror) + remote*i.mirror;
}
Transformed to iteration, we need a "contribution" variable to account
for all the recursive "i.mirror" factors from previous bounces:
// Iteratively look up the color of the world viewed along this ray
vec4 find_color(vec3 rayStart,vec3 rayDir) {
	vec4 finalColor=vec4(0.0);
	float contribution=1.0; // fraction of my color to add to finalColor
	for (int raybounce=0;raybounce<max_bounces;raybounce++) {
		ray_intersection i = closest_hit(rayStart,rayDir); // geometric search
		vec4 local = light_intersection(i); // diffuse + specular
		finalColor += local*(1.0-i.mirror)*contribution;
		contribution *= i.mirror; // <- scale down all subsequent rays
		rayStart=i.hit_location; // set up for next bounce
		rayDir=reflect(rayDir,i.normal);
	}
	return finalColor;
}
This iterative version works great on the GPU, and the cumulative
"contribution" scale factor even suggests an optimization: stop looping
when the cumulative impact drops below some quality threshold:
// Iteratively look up the color of the world viewed along this ray
vec4 find_color(vec3 rayStart,vec3 rayDir) {
	... as before ...
	for (int raybounce=0;raybounce<max_bounces;raybounce++) {
		... as before ...
		if (contribution<0.01) break; // give up--no impact on final color
	}
	return finalColor;
}
See the example code for a worked-out example.