Last class, we looked at binary and hexadecimal numbers.

Here's the "truth table" for the 3 main binary bitwise operations: AND, OR, and XOR. These are accessible directly from C-like languages using '&' (ampersand or just "and"), '|' (pipe or just "or"), and '^' (caret or just "ex-oar").

A |
B |
A&B (AND) |
A|B (OR) |
A^B (XOR) |

0 |
0 |
0 |
0 |
0 |

0 |
1 |
0 |
1 |
1 |

1 |
0 |
0 |
1 |
1 |

1 |
1 |
1 |
1 |
0 |

Interpretations: |
Both-1 | Either-1 | Different | |

Spreads-0 |
Spreads-1 |
Flips |

Say you're Google. For each possible word, you store a big list of documents containing one 32-bit integer per document. Each bit of that integer corresponds to a section of the document where the word might occur (e.g., bit 0 is set if the word occurs in the first 3% of the document, bit 1 represents the next 3%, and so on). If you've got two search terms, for a given document you can look up two integers, A and B, that represent where those terms appear in the document. If both terms appear near the same place in the document, that's good--and we can find a list of all those places (where both bits are set to 1) in just one clock cycle using just "A&B"! The same idea shows up in the "region codes" of Cohen-Sutherland clipping, used in computer graphics.

But what if I want to compare two documents for similarity? Note that C-like languages don't provide a "bitwise-compare" operator, that sets equal bits to 1 and unequal bits to 0 (because == compares whole integers, not individual bits). So how can you compute this?

Well, all the bitwise operations do interesting and useful things when inverted:

A |
B |
~(A&B) (NAND) |
~(A|B) (NOR) |
~(A^B) (XNOR) |

0 |
0 |
1 |
1 |
1 |

0 |
1 |
1 |
0 |
0 |

1 |
0 |
1 |
0 |
0 |

1 |
1 |
0 |
0 |
1 |

Interpretations: |
Either-0 |
Both-0 | Equality |

Note that "~(A^B)" is 1 if both input bits are identical, and 0 if they're different--so it's perfect for computing document similarity!

The final major bitwise operation is bit shifting, represented in C-like languages using ">>" and "<<". Left shift ("<<", pointing left), shifts bits into higher positions, filling the now-empty lower positions with zeros. For unsigned numbers, right shifting (">>", pointing right) shifts bits into lower positions and fills the empties with zeros. If you wants to search for both-term matches in B that are one bit off of those in A, you can use "A&(B<<1)" or "A&(B>>1)", depending on the direction you want to check. To check both directions, use "(A&(B<<1))|(A&(B>>1))".

One warning: bitwise operations have strange precedence. Specifically, "A&B==C" is interpreted by default like "A&(B==C)", not "(A&B)==C"; and "A>>B+C" is interpreted as "A>>(B+C)". This can cause horrible, hard-to-understand errors, so I just stick in parenthesis around everything when I'm using bitwise operators.

The genome uses just 4 digits (ACGT, for the acids Adenine, Cytosine, Guanine, and Thymine), and so can be represented in as few as 2 bits per nucleotide (genetic digit). Former UAF CS undergraduate James Long, now up the hill at ARSC, developed a very fast sofware implementation of the Smith-Waterman string (or gene) matching algorithm called CBL that uses this SIMD-within-a-register approach.

If you've got 2 possible values, you only need one bit. To chop something down to n bits, you can "mask" it by ANDing with a value where the bits less than n are set to 1, and bits n and higher are cleared to 0. For example, 0xFF has bits 8 and higher clear, so you can chop a value down to 8 bits by ANDing by 0xff. Note that this removes all powers of two at 2

Number of Values |
Number of bits |
Mask |

2 |
1 |
0x1 |

4 |
2 |
0x3 |

8 |
3 |
0x7 |

16 |
4 |
0xF |

32 |
5 |
0x1F |

64 |
6 |
0x3F |

128 |
7 |
0x7F |

256 |
8 |
0xFF |

65,536 |
16 |
0xFFFF |

4,294,967,296 |
32 |
0xFFFFFFFF |

2^{n} |
n |
2^{n}-1 |

For example, let's say you're trying to decode data written by a satellite radar sensor, which uses 4 bits each for its "I" and "Q" values (which have 16 possible values each), and stores them together in one integer "A" like this:

Name |
I |
Q |

Size |
4 bits |
4 bits |

Starting bit in A |
Bit 4 |
Bit 0 |

To extract out just the Q field, we've just got to get rid of the bits that store I somehow. The standard way to do this is to use the AND operation to set all the bits of I to zero--zero AND anything is zero, so "Q=A&0x0F". That is, we've used 0xF to mask off the bits above Q.

To extract out the I field, we've got to get rid of Q (as before), but we've also got to shift I down so it starts at bit zero. We can do this using "I=(A&0xF0)>>4". Or, because right-shift throws away bits that fall off the right end, we can also just use "I=A>>4".

To stick A back together from separate I and Q values, we just shift I and Q into the right places, then OR them together, like "A=(I<<4)|Q". Or if you really want to be persnickety, "A=(I<<4)|(Q<<0)", which is redundant because the ">>0" doesn't do anything. If you're really paranoid, you'll mask off any high bits, like "A=(0xF0&(I<<4))|(0x0F&(Q<<0))", but this isn't needed if I and Q don't have any high bits set.

This "keep a bunch of bitfields in one integer" technique is in found all over the place in machine code. In x86 machine code, the "SIB" byte stores a scale, index, and base (that we'll learn the details of later) as:

Name |
Scale |
Index | Base |

Size |
2 bits |
3 bits | 3 bits |

Starting bit |
Bit 6 |
Bit 3 | Bit 0 |

We can reassemble a SIB byte like "SIB=(Scale<<6)|(Index<<3)|Base"; and extract out the fields like "Scale=0x3&(SIB>>6)", "Index=0x7&(SIB>>3)", and "Base=0x7&SIB".

0=A&0 (AND by 0's creates 0's--used for masking)

0=A^A (XOR by yourself creates 0's--used for cryptography)

~0=A|~0 (OR by 1's creates 1's)

A=A&~0 (AND by 1's has no effect)

A=A|0 (OR by 0's has no effect)

A=A&A=A|A (AND or OR by yourself has no effect)