- Single, meaning just one.

- Instruction, as in a machine code instruction, executed by hardware.

- Multiple, as in more than one--from 2 to a thousand or so.

- Data, as in floats or ints.

One universal aspect of SIMD programming is that you can't use jump instructions to implement conditional branches. The problem is you're doing multiple things at once, so if some should branch, and others shouldn't branch, you can't just jump to the right branch. So they've developed a whole variety of other ways to implement conditional logic without using jump instructions. The common aspect to all these approaches is that they evaluate *both* sides of the branch, and then selectively combine the results into one answer.

for (int doit=0;doit<=1;doit++) {This outputs:

int blended=4+doit*2;

std::cout<<"doit=="<<doit<<" -> "<<blended<<"\n";

}

doit==0 -> 4More generally, if you want to return A if doit is false (0), and B if doit is true (1), then you can do:

doit==1 -> 6

Program complete. Return 0 (0x0)

for (int doit=0;doit<=1;doit++) {You also see this written as the algebraically equivalent "blended = (1-doit)*A+doit*B;". This has four operations instead of the three above, so you don't see it as much anymore.

int A=4;

int B=6;

int blended=A+doit*(B-A);

std::cout<<"doit=="<<doit<<" -> "<<blended<<"\n";

}

There are some interesting advantages of the multiply approach to conditionals. One is you can do "fuzzy conditionals", where your code blends between A and B if doit is somewhere between 0 and 1:

for (float doit=0.0;doit<=1.0;doit+=0.25) {This outputs:

int A=4;

int B=6;

float blended=A+doit*(B-A);

std::cout<<"doit == "<<doit<<" -> "<<blended<<"\n";

}

doit == 0 -> 4Multiply-based conditionals work fine for a single digit, but they don't scale well to multiple digits. For example, "0x4444 + 0x0100*(0x6666-0x4444)" gives you "0x226644", not "0x4644".

doit == 0.25 -> 4.5

doit == 0.5 -> 5

doit == 0.75 -> 5.5

doit == 1 -> 6

Program complete. Return 0 (0x0)

for (int doit=-1;doit<=0;doit++) {This outputs:

int A=4;

int B=6;

int blended=A+(doit&(B-A));

std::cout<<"doit == "<<doit<<" -> "<<blended<<"\n";

}

doit == -1 -> 6Note that:

doit == 0 -> 4

Program complete. Return 0 (0x0)

- 0 is stored in binary as 000000 ... 0000 (all zeros). Bitwise AND with zero bits: always outputs zero.

- -1 is stored in binary as 111111 ... 1111 (all ones). Bitwise AND with one bits: works like a copy.
- The parenthesis are really important. If you forget them, the compiler treats "A+doit&(B-A)" like "(A+doit)&(B-A)", which will give you a WRONG, confusing answer!

For example:

void blend(int doit)

{

int A=0x444444;

int B=0x666666;

int blended=A+(doit&(B-A));

std::cout<<"doit == "<<std::hex<<doit<<" -> "<<blended<<"\n";

}

int foo(void) {

blend(0x000000);

blend(0xffffff);

blend(0xfff000);

blend(0xff00ff);

blend(0xf0f0f0);

blend(0xff0f00);

return 0;

}

doit == 0 -> 444444The basic idea here is we're just turning the appropriate bits on and off, to get the answer we need. Here I've printed the results in hex (base 16), but you can actually work with any group of bits you like, by chosing your mask correctly.

doit == ffffff -> 666666

doit == fff000 -> 666444

doit == ff00ff -> 664466

doit == f0f0f0 -> 646464

doit == ff0f00 -> 664644

Program complete. Return 0 (0x0)

One caveat: the subtraction based method above can give really weird values if B-A is negative:

int blended=A+(doit&(B-A));This formulation doesn't do any subtraction, so it works for any A and B:

int blended=((~doit)&A)+(doit&B);

- Bitwise XOR can be used to perform per-bit inversion. XOR by 0 does nothing, XOR by 1 flips that bit.
- Ordinary addition and subtraction actually work fine on a per-bit basis, except for the carry (or borrow) bits. You can stop carry by making sure there's at least a single zero bit as a "guard bit", and can stop borrow by making sure there's at least a single one bit as a "guard bit".

If your input values are all good data, and you can't get away with trashing even a few bits to make space for guards, you need to use the "even odd trick", where you make two overlapping passes on the data:

long A=0x4444444488888888; // if doit==0, return thisOf course, this particular value is much easier to compute with "long blended=((~doit)&A)+(doit&B);", but it illustrates how to do a subtraction if you need it. It's a total of fifteen bitwise operations, which sounds like too many, but that's for *sixteen* if-then-else operations, which are way more than one instruction each!

long B=0x6666666666666666; // if doit==1, return this

long pick_4_or_6(long doit)

{

/* first, compute even hex digits,

using the odd digits as guards against borrows */

long guards=0x1010101010101010;

long goodma=0x0f0f0f0f0f0f0f0f;

long gA=A&goodma;

long gB=(B&goodma)+guards;

long blendedE=gA+(doit&(gB-gA));

/* same thing, for odd hex digits */

long guardsO=0x101010101010101;

long goodmaO=0xf0f0f0f0f0f0f0f0;

long gAO=A&goodmaO;

long gBO=(B&goodmaO)+guardsO;

long blendedO=gAO+(doit&(gBO-gAO));

/* recombine even and odd digits to get entire answer */

return (goodma&blendedE)|(goodmaO&blendedO);

}

int foo(void) {

long doit=0xffff000ff0f000ff;

long blended=pick_4_or_6(doit);

std::cout<<"doit=="<<doit<<" -> "<<std::hex<<blended<<"\n";

return 0;

}

There's a beautiful way to do per-bitfield comparisons, where you clear out some zeros ahead of the values you want to compare, add a single one bit as a guard, and subtract the values. If the difference is zero or positive, the zeros stay zeros and can be used as a mask. If the difference is negative, the zeros get borrowed into ones right up to the guard bit, and you can use those borrows as a bitmask.

UAF's own Jim Long wrote a portable version of the Cray Bioinformatics Library using SWAR techniques. He could pack 32 nucleotides (2-bit ACGT DNA "letters") into a 64-bit "long"!