CS 301 - Homework 8 KEY

  1. Change the "bar" subroutine (which evaluates a quadratic polynomial on the array "f") so it takes less than 1.0ns per float.  Don't change anything else.  Don't worry about roundoff or when the array isn't a multiple of 4 in length.  You'll probably have to use x86 SSE instructions. (Executable NetRun Link)
    #include <xmmintrin.h>
    
    enum {n=1000};
    float f[n+1];
    float a=0.2,b=0.3,c=0.4;

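    int bar(void) { // YOU MAY ONLY CHANGE THE BAR ROUTINE!

        /* FIX: use SSE. Lots of options are possible. */
        __m128 va=_mm_load1_ps(&a); /* copy a into all 4 lanes */
        __m128 vb=_mm_load1_ps(&b);
        __m128 vc=_mm_load1_ps(&c);
        for (int i=0;i<n;i+=4) { /* 4 floats per iteration; n is a multiple of 4 */
            /* aligned load: assumes f[0] is 16-byte aligned (the x86-64 ABI
               guarantees this for global arrays of at least 16 bytes) */
            __m128 vf=_mm_load_ps(&f[i]);
            /* Horner's rule: (a*f+b)*f+c equals a*f*f+b*f+c */
            __m128 s=_mm_add_ps(_mm_mul_ps(vf,va),vb);
            s=_mm_add_ps(_mm_mul_ps(vf,s),vc);
            _mm_store_ps(&f[i],s);
        }
        return 0;
    }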

    int foo(void) {
        printf("bar: %.2f ns/float\n",time_function(bar)/n*1.0e9);
        farray_fill(f,n,1.0); bar();
        return farray_checksum(f,n,1.0);
    }

  2. Your buddy wrote this x86 SSE code, but it's now segfaulting.  Change "bar" to do the same thing, but without segfaulting. (Executable NetRun Link)
    #include <xmmintrin.h>
    enum {n=1000};
    float f[n+1];

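    int bar(void) { // YOU MAY ONLY CHANGE THE BAR ROUTINE!
        //FIX: __m128 v=_mm_load_ps(&f[22]); // unaligned address!
        /* _mm_load_ps needs a 16-byte-aligned address, but &f[22] is
           22*4 = 88 bytes past the start of f, and 88 is not a multiple of 16. */
        __m128 v=_mm_loadu_ps(&f[22]); /* FIX: use load-unaligned */
        /* lots of other fixes OK too--
        _mm_load_ss, _mm_load_ps1, copy to an aligned address, "return f[22]*f[22]" */
        v=_mm_mul_ps(v,v);
        float ret;
        _mm_store_ss(&ret,v); /* store the lowest lane, f[22]*f[22] */
        return (int)ret;
    }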

    int foo(void) {
        farray_fill(f,n,1.0); bar();
        return bar();
    }
  3. You're working for the huge game company Ego(tm). Ego, close to bankruptcy, just released Earth XXVI (the latest version of their most popular game) on a very tight production schedule.  The schedule was so tight that due to various miscommunications most of the 90GB of game art is horribly screwed up.  Rather than recall and reprint all 18 game DVDs, Ego(tm) wants to release a small patch to correct the game art on-the-fly at display time.  Your job is to write ARB_fragment_program code to correct the art.  For example, texture[3] is almost right, except the colors are all backwards--each color component has values from 1 to 0, instead of 0 to 1.  You can see the texture with this small OpenGL program: (Executable NetRun Link)
    TEX out,in,texture[3],2D;
    MAD out,out,-1.0,1.0; /* out = out*-1.0 + 1.0, which flips colors around */
    Change this fragment program so the art is displayed properly, with colors from 0 to 1 (black text on a white background)--i.e., where the texture contains color c, you should display color 1.0-c.  The photo of the demon, er, professor should then look like an ordinary photo.
  4. texture[4] was defaced by vandals, who broke into Ego's servers just before the release.  Luckily, they only destroyed the red and blue channels--the green channel is still OK.  Copy the texture's green channel out to all the other channels--the result will be greyscale, but it won't be as embarrassing as the vandalized version.
    TEX out,in,texture[4],2D;
    MOV out,out.ggga; /* copy green channel into all output channels */
  5. texture[5] is the result of a malfunction in Ego's "rot-20" Content Protection System, used to prevent crackers, pirates, and terrorists from stealing Ego's Intellectual Property(tm).  The system works by cyclically rotating each line by a fixed amount.  The company cryptographers say this malfunction can be compensated for by adjusting the input texture coordinates, changing coordinates (x,y) to coordinates (x+20.0*y,y)--OpenGL's texture repeat will automatically take care of the rest.  If this works, you should see the actual end-game image instead of weird noise.
    MAD in.x,in.y,20.0,in.x; /* in.x = in.y*20+in.x */
    TEX out,in,texture[5],2D;
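
A CPU sketch of why problem 5's remap works, under an assumed model of the malfunction: the texel stored at (u,v) holds the original image's texel at (u - 20*v, v), and texture repeat keeps only the fractional part of each coordinate. Sampling the stored texture at (x + 20*y, y) should then land back on the original (x, y). The helper names (wrap01, rot20_fix, rot20_stored_source) are hypothetical, and the sketch uses doubles where the GPU would use floats.

```c
#include <assert.h>

/* GL_REPEAT-style wrap: keep only the fractional part, so the result is in [0,1).
   Uses a cast instead of floor() to avoid linking libm; fine for small |s|. */
static double wrap01(double s)
{
    double r = s - (double)(long long)s;   /* cast truncates toward zero, so r is in (-1,1) */
    if (r < 0.0) r += 1.0;                 /* shift negative remainders into [0,1) */
    if (r >= 1.0) r = 0.0;                 /* a tiny negative r can round up to exactly 1.0 */
    assert(r >= 0.0 && r < 1.0);
    return r;
}

/* Problem 5's fix: (x,y) -> (x + 20*y, y), as in MAD in.x,in.y,20.0,in.x,
   followed by the wrap that texture repeat applies when sampling. */
static void rot20_fix(double x, double y, double *u, double *v)
{
    *u = wrap01(x + 20.0 * y);
    *v = y;
}

/* Hypothetical model of the broken texture: the texel stored at (u,v)
   holds the original image's texel at (u - 20*v, v). */
static void rot20_stored_source(double u, double v, double *x, double *y)
{
    *x = wrap01(u - 20.0 * v);
    *y = v;
}
```

For example, with (x,y) = (0.25, 0.5) the fix samples u = 0.25 + 10.0, which wraps to 0.25, and the model maps that stored texel back to x = 0.25.
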
For problems 1 and 2, read the SSE and SIMD lecture notes.  You shouldn't need anything other than loads and stores and basic arithmetic.
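
As a standalone sketch of problems 1 and 2, using only loads, stores, and basic arithmetic: the C code below evaluates a*x*x + b*x + c with Horner's rule, four floats at a time, and finishes any leftover elements with scalar code (the part problem 1 says you may skip). It uses _mm_loadu_ps and _mm_storeu_ps so the array needn't be 16-byte aligned, and the hypothetical helpers is_aligned16 and load_aligned_checked show why &f[22] breaks _mm_load_ps: 22 floats is 88 bytes, and 88 is not a multiple of 16. It assumes an x86 compiler with SSE.

```c
#include <assert.h>
#include <stdint.h>
#include <xmmintrin.h>

/* Nonzero if p sits on a 16-byte boundary, as _mm_load_ps/_mm_store_ps require. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* In place: x[i] = a*x[i]*x[i] + b*x[i] + c, computed as (a*x[i] + b)*x[i] + c. */
static void quad_sse(float *x, int n, float a, float b, float c)
{
    __m128 va = _mm_set1_ps(a);   /* broadcast each coefficient to all 4 lanes */
    __m128 vb = _mm_set1_ps(b);
    __m128 vc = _mm_set1_ps(c);
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(&x[i]);               /* unaligned load: any address works */
        __m128 s = _mm_add_ps(_mm_mul_ps(v, va), vb); /* a*x + b */
        s = _mm_add_ps(_mm_mul_ps(v, s), vc);         /* (a*x + b)*x + c */
        _mm_storeu_ps(&x[i], s);
    }
    for (; i < n; i++)                                /* leftover n % 4 elements */
        x[i] = (a * x[i] + b) * x[i] + c;
}

/* Aligned load that checks its precondition instead of segfaulting. */
static __m128 load_aligned_checked(const float *p)
{
    assert(is_aligned16(p));
    return _mm_load_ps(p);
}
```

_mm_set1_ps broadcasts a value just like the key's _mm_load1_ps, but takes the float by value instead of by pointer. On an array known to be aligned, _mm_load_ps/_mm_store_ps can replace the unaligned versions.
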

For problems 3-5, read the graphics card lecture notes and see the ARB_fragment_program cheat sheet.  You shouldn't need anything other than the TEX call above, ADD, MUL, MOV, swizzling, and writemasks.  You'll know you're done when you have a white background, black text listing the problem number, and a reasonable-looking photo.
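
The per-texel arithmetic in problems 3 and 4 can be checked on the CPU. This is a hedged C sketch, not NetRun or OpenGL code: vec4, mad_scalar, swizzle_ggga, and fix_inverted are hypothetical names that mimic ARB_fragment_program's MAD (a*b + c on each component) and the .ggga swizzle. Note that MAD out,out,-1.0,1.0 has no writemask, so it flips alpha as well as color; a writemask, as in MAD out.rgb,out,-1.0,1.0; should leave alpha unchanged.

```c
#include <assert.h>

typedef struct { float r, g, b, a; } vec4;   /* one RGBA texel */

/* Like MAD with scalar operands: each component becomes v*m + k. */
static vec4 mad_scalar(vec4 v, float m, float k)
{
    vec4 o = { v.r * m + k, v.g * m + k, v.b * m + k, v.a * m + k };
    return o;
}

/* Like MOV out,out.ggga: green goes into r, g, and b; alpha is kept. */
static vec4 swizzle_ggga(vec4 v)
{
    vec4 o = { v.g, v.g, v.g, v.a };
    return o;
}

/* Problem 3: display 1.0-c for each component c (colors assumed in [0,1]). */
static vec4 fix_inverted(vec4 t)
{
    assert(t.r >= 0.0f && t.r <= 1.0f && t.g >= 0.0f && t.g <= 1.0f &&
           t.b >= 0.0f && t.b <= 1.0f);
    return mad_scalar(t, -1.0f, 1.0f);
}
```
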

As usual, you'll turn these problems in by naming them HW8_1, HW8_2, HW8_3, etc. in NetRun.

Problems are due Thursday, December 8, at NOON--I'm tired of NetRun crashing when I'm not there to fix it.


O. Lawlor, ffosl@uaf.edu