- Change the "bar" subroutine (which evaluates a quadratic
polynomial on the array "f") so it takes less than 1.0ns per
float. Don't change anything else. Don't worry about
roundoff or when the array isn't a multiple of 4 in length.
You'll probably have to use x86 SSE instructions.
(Executable NetRun Link)
#include <xmmintrin.h>
enum {n=1000};
float f[n+1];
float a=0.2,b=0.3,c=0.4;
int bar(void) { // YOU MAY ONLY CHANGE THE BAR ROUTINE!
    /* FIX: use SSE. Lots of options are possible. */
    __m128 va=_mm_load1_ps(&a);
    __m128 vb=_mm_load1_ps(&b);
    __m128 vc=_mm_load1_ps(&c);
    for (int i=0;i<n;i+=4) {
        __m128 vf=_mm_load_ps(&f[i]);
        __m128 s=_mm_add_ps(_mm_mul_ps(vf,va),vb);
        s=_mm_add_ps(_mm_mul_ps(vf,s),vc);
        _mm_store_ps(&f[i],s);
    }
    return 0;
}
int foo(void) {
    printf("bar: %.2f ns/float\n",time_function(bar)/n*1.0e9);
    farray_fill(f,n,1.0); bar();
    return farray_checksum(f,n,1.0);
}
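The solution above evaluates the quadratic in Horner form, (a*x + b)*x + c, four floats at a time. As a rough sketch of how the same idea extends to array lengths that are not a multiple of 4 (which the problem says you can ignore), here is a standalone version. The names poly_scalar and poly_sse are made up for this illustration and are not part of the assignment or of NetRun; it uses unaligned loads and stores so it accepts any pointer, which may cost some speed compared to the aligned version.

```c
#include <assert.h>
#include <stddef.h>
#include <xmmintrin.h>

/* Scalar reference: a*x*x + b*x + c, written in Horner form. */
float poly_scalar(float x, float a, float b, float c) {
    return (a * x + b) * x + c;
}

/* Hypothetical helper: apply the polynomial in place to count floats.
   Processes 4 floats per iteration with SSE, then finishes the 0-3
   leftover elements with the scalar version. */
void poly_sse(float *p, size_t count, float a, float b, float c) {
    assert(p != NULL);
    __m128 va = _mm_set1_ps(a);   /* broadcast each coefficient to 4 lanes */
    __m128 vb = _mm_set1_ps(b);
    __m128 vc = _mm_set1_ps(c);
    size_t i = 0;
    for (; i + 4 <= count; i += 4) {
        __m128 x = _mm_loadu_ps(&p[i]);                 /* no alignment needed */
        __m128 s = _mm_add_ps(_mm_mul_ps(x, va), vb);   /* a*x + b */
        s = _mm_add_ps(_mm_mul_ps(s, x), vc);           /* (a*x + b)*x + c */
        _mm_storeu_ps(&p[i], s);
    }
    for (; i < count; i++)                              /* scalar tail */
        p[i] = poly_scalar(p[i], a, b, c);
}
```

Comparing poly_sse against poly_scalar on a small array is an easy way to check the vector code before timing it.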
- Your buddy wrote this x86 SSE code, but it's now segfaulting. Change "bar" to do the same thing, but without segfaulting.
(Executable NetRun Link)
#include <xmmintrin.h>
enum {n=1000};
float f[n+1];
int bar(void) { // YOU MAY ONLY CHANGE THE BAR ROUTINE!
    //FIX: __m128 v=_mm_load_ps(&f[22]); // unaligned address!
    __m128 v=_mm_loadu_ps(&f[22]); /* FIX: use load-unaligned */
    /* lots of other fixes OK too--
       load_ss, load_ps1, copy to aligned address, "return f[22]*f[22]" */
    v=_mm_mul_ps(v,v);
    float ret;
    _mm_store_ss(&ret,v);
    return (int)ret;
}
int foo(void) {
    farray_fill(f,n,1.0); bar();
    return bar();
}
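Why is &f[22] unaligned? Assuming f itself starts on a 16-byte boundary, element 22 sits at byte offset 22*4 = 88, and 88 mod 16 = 8, so _mm_load_ps faults there. The sketch below shows one way to test an address for 16-byte alignment, plus the load_ss variant of the fix mentioned in the comment above. The helper names is_aligned16 and square_first are invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Nonzero if p sits on a 16-byte boundary, as _mm_load_ps requires. */
int is_aligned16(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper: square the single float at *p.
   _mm_load_ss reads just one float, so it has no 16-byte
   alignment requirement. */
float square_first(const float *p) {
    assert(p != NULL);
    __m128 v = _mm_load_ss(p);   /* lane 0 = *p, other lanes = 0 */
    v = _mm_mul_ss(v, v);        /* square lane 0 only */
    float ret;
    _mm_store_ss(&ret, v);
    return ret;
}
```

Since bar only ever returns the first lane, loading one float gives the same answer as the 4-wide unaligned load.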
- You're working for the huge game company Ego(tm). Ego, close to
bankruptcy, just released Earth XXVI (the latest version of their most
popular game) on a very tight production schedule. The schedule
was so tight that due to various miscommunications most of the 90GB of
game art is horribly screwed up. Rather than recall and reprint
all 18 game DVDs, Ego(tm) wants to release a small patch to correct the
game art on-the-fly at display time. Your job is to write
ARB_fragment_program code to correct the art. For example,
texture[3] is almost right, except the colors are all backwards--each
color component has values from 1 to 0, instead of 0 to 1. You
can see the texture with this small OpenGL program: (Executable NetRun Link)
TEX out,in,texture[3],2D;
MAD out,out,-1.0,1.0; /* out = out*-1.0 + 1.0, which flips colors around */
Change this fragment program so the art is displayed properly, with
colors from 0 to 1 (black text on a white background)--i.e., where the
texture contains color c, you should display color 1.0-c. The photo of the demon, er, professor should then look like an ordinary photo.
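To see what the MAD line does to one pixel, here is a small CPU-side emulation in C. This is only a sketch: vec4, mad4, and check_invert are names made up for the illustration, and a real fragment program runs this per pixel on the graphics card.

```c
#include <assert.h>

/* One RGBA value, like a fragment program register. */
typedef struct { float r, g, b, a; } vec4;

/* Emulates "MAD dst,src,m,k" with scalar constants m and k broadcast
   to all four components: dst = src*m + k, componentwise. */
vec4 mad4(vec4 src, float m, float k) {
    vec4 dst = { src.r * m + k, src.g * m + k, src.b * m + k, src.a * m + k };
    return dst;
}

/* Quick check: with m = -1 and k = 1, white turns into black. */
void check_invert(void) {
    vec4 white = { 1.0f, 1.0f, 1.0f, 1.0f };
    vec4 out = mad4(white, -1.0f, 1.0f);
    assert(out.r == 0.0f && out.g == 0.0f && out.b == 0.0f);
    assert(out.a == 0.0f);   /* alpha gets flipped too */
}
```

Note that the MAD as written flips the alpha component along with the colors. Whether that matters depends on how the output alpha is used, which the problem doesn't say.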
- texture[4] was defaced by vandals, who broke into Ego's servers
just before the release. Luckily, they only destroyed the red and
blue channels--the green channel is still OK. Copy the texture's
green channel out to all the other channels--the result will be
greyscale, but it won't be as embarrassing as the vandalized version.
TEX out,in,texture[4],2D;
MOV out,out.ggga; /* copy green channel into all output channels */
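The ".ggga" suffix is a swizzle: it picks which source component feeds each output component. A rough C emulation for one pixel follows (vec4, swizzle_ggga, and check_grey are hypothetical names used only here):

```c
#include <assert.h>

/* One RGBA value, like a fragment program register. */
typedef struct { float r, g, b, a; } vec4;

/* Emulates "MOV dst,src.ggga": green is copied into red, green, and
   blue, and alpha passes through unchanged. */
vec4 swizzle_ggga(vec4 src) {
    vec4 dst = { src.g, src.g, src.g, src.a };
    return dst;
}

/* Quick check: whatever red and blue held, the result is grey. */
void check_grey(void) {
    vec4 damaged = { 0.9f, 0.5f, 0.1f, 1.0f };
    vec4 out = swizzle_ggga(damaged);
    assert(out.r == 0.5f && out.g == 0.5f && out.b == 0.5f);
    assert(out.a == 1.0f);
}
```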
- texture[5] is the result of a malfunction in Ego's "rot-20" Content
Protection System, used to prevent crackers, pirates, and terrorists
from stealing Ego's Intellectual Property(tm). The system works
by cyclically rotating each line by a fixed amount. The company
cryptographers say this malfunction can be compensated for by adjusting
the input texture coordinates, changing coordinates (x,y) to
coordinates (x+20.0*y,y)--OpenGL's texture repeat will automatically
take care of the rest. If this works, you should see the actual
end-game image instead of weird noise.
MAD in.x,in.y,20.0,in.x; /* in.x = in.y*20+in.x */
TEX out,in,texture[5],2D;
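Here is the coordinate fix worked through on the CPU, as a sketch. It assumes the texture uses repeat wrapping, which behaves like keeping only the fractional part of the coordinate. The names wrap_repeat and unrotate_x are invented for this example, and the floor-by-cast trick only works for coordinates of modest size.

```c
#include <assert.h>

/* Repeat-style wrap: drop the integer part of the coordinate.
   Assumes s is small enough to fit in a long. */
float wrap_repeat(float s) {
    float fl = (float)(long)s;    /* truncates toward zero */
    if (fl > s) fl -= 1.0f;       /* turn truncation into floor for s < 0 */
    float w = s - fl;
    assert(w >= 0.0f && w <= 1.0f);
    return w;
}

/* Emulates "MAD in.x,in.y,20.0,in.x" followed by the repeat wrap the
   texture unit applies: the x coordinate actually sampled. */
float unrotate_x(float x, float y) {
    return wrap_repeat(y * 20.0f + x);
}
```

For example, at (x,y) = (0.25,0.5) the MAD produces x = 0.5*20 + 0.25 = 10.25, and repeat wrapping samples the texture at x = 0.25.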
For problems 1 and 2, read the As usual, you'll turn these problems in by just naming them HW8_1, HW8_2, HW8_3, etc. in NetRun.
Problems are due Thursday, December 8, at NOON--I'm tired of NetRun crashing when I'm not there to fix it.