The Wisdom of Bitwise Operators
CS 301 Lecture, Dr. Lawlor
So most of the usual work you do in C/C++/Java/C# manipulates integers
or strings. For example, you'll write a simple line like:
x = y + 4;
which adds 4 to the value of y.
But the "bitwise" operators manipulate the underlying bits
of the integer. It's like the computer counts on its (32 or 64)
fingers, does the operation on those bits, and then converts back to an
integer (Except, of course, deep down the computer only knows about
bits, not integers!). For example, you might write:
x = y & 4;
which masks out all but bit 2 of y. So if y is 6, x will be 4. But if y is 8, x will be zero. Try it!
Bitwise Operations in C
There are actually six bitwise operators in C/C++/Java/C#:
- & AND, bitwise AND. Output bits are 1 only if both corresponding input bits are 1. This is useful to "mask out" bits you don't want, by ANDing them with zero.

      As Ints     As Bits
      3&5 == 1    0011&0101 == 0001
      3&6 == 2    0011&0110 == 0010
      3&4 == 0    0011&0100 == 0000

  - 0 = A&0   (AND by 0's creates 0's--used for masking)
  - A = A&~0  (AND by 1's has no effect)
  - A = A&A   (AND by yourself has no effect)
- | OR, bitwise OR. Output bits are 1 if either input bit is 1. E.g., 3|5 == 7; or 011 | 101 == 111.

      As Ints     As Bits
      3|0 == 3    0011|0000 == 0011
      3|3 == 3    0011|0011 == 0011
      1|4 == 5    0001|0100 == 0101

  - A = A|0    (OR by 0's has no effect)
  - ~0 = A|~0  (OR by 1's creates 1's)
  - A = A|A    (OR by yourself has no effect)
- ^ XOR, bitwise XOR. Output bits are 1 if either input bit is 1, but not both. E.g., 3^5 == 6; or 011 ^ 101 == 110. Note how the low bit is 0, because both input bits are 1.

      As Ints     As Bits
      3^5 == 6    0011^0101 == 0110
      3^6 == 5    0011^0110 == 0101
      3^4 == 7    0011^0100 == 0111

  - A = A^0    (XOR by 0's has no effect)
  - ~A = A^~0  (XOR by 1's inverts all the bits)
  - 0 = A^A    (XOR by yourself creates 0's--used in cryptography)
- ~ NOT, bitwise invert. Output bits are 1 if the corresponding input bit is zero. E.g., ~011 == 111...111100. (The number of leading ones depends on the size of the machine's "int".)

      As Ints     As Bits
      ~0 == -1    ~...0000 == ...1111
- << Left shift. Makes values bigger, by shifting them into higher-valued places. E.g., 3<<1 == 6. You can shift by as many bits as you want; 1<<n pushes a 1 up into bit n. You can't shift by a negative number of bits.

      As Ints      As Bits
      3<<0 == 3    0011<<0 == 0011
      3<<1 == 6    0011<<1 == 0110
      3<<2 == 12   0011<<2 == 1100
- >> Right shift. Makes values smaller, by shifting them into lower-valued places. E.g., 6>>1 == 3. Note the bits in the lowest places just "fall off the end" and vanish.

      As Ints     As Bits
      3>>0 == 3   0011>>0 == 0011
      3>>1 == 1   0011>>1 == 0001
      3>>2 == 0   0011>>2 == 0000
Here's the "truth table" for the 3 main binary bitwise operations: AND,
OR, and XOR. Again, these are accessible directly from all C-like languages
using '&' (ampersand or just "and"), '|' (pipe or just "or"), and
'^' (caret or just "ex-oar").
    A   B   A&B (AND)   A|B (OR)    A^B (XOR)
    0   0   0           0           0
    0   1   0           1           1
    1   0   0           1           1
    1   1   1           1           0

    Interpretations:
            Both-1      Either-1    Different
            Spreads-0   Spreads-1   Flip A if B is set
Say you're Google. For each possible word, you store a big list
of documents containing one 32-bit integer per document. Each bit
of that integer corresponds to a section of the document where the word
might occur (e.g., bit 0 is set if the word occurs in the first 3% of
the document, bit 1 represents the next 3%, and so on). If you've
got two search terms, for a given document you can look up two
integers, A and B, that represent where those terms appear in the
document. If both terms appear near the same place in the
document, that's good--and we can find a list of all those places
(where both bits are set to 1) in just one clock cycle using just
"A&B"! The same idea shows up in the "region codes" of Cohen-Sutherland clipping, used in computer graphics.
But what if I want to compare two documents for similarity? Note
that C-like languages don't provide a "bitwise-compare" operator, that
sets equal bits to 1 and unequal bits to 0 (because == compares whole
integers, not individual bits). So how can you compute this?
Well, all the bitwise operations do interesting and useful things when inverted:
    A   B   ~(A&B) (NAND)   ~(A|B) (NOR)   ~(A^B) (XNOR)
    0   0   1               1              1
    0   1   1               0              0
    1   0   1               0              0
    1   1   0               0              1

    Interpretations:
            Either-0        Both-0         Equality
Note that "~(A^B)" is 1 if both input bits are identical, and 0 if
they're different--so it's perfect for computing document similarity!
The final major bitwise operation is bit shifting, represented in
C-like languages using ">>" and "<<". Left shift
("<<", pointing left) shifts bits into higher positions, filling
the now-empty lower positions with zeros. For unsigned numbers,
right shifting (">>", pointing right) shifts bits into lower
positions and fills the empties with zeros. If you want to
search for both-term matches in B that are one bit off of those in A,
you can use "A&(B<<1)" or "A&(B>>1)", depending on
the direction you want to check. To check both directions, use
"(A&(B<<1))|(A&(B>>1))".
One warning: bitwise operations have strange precedence.
Specifically, "A&B==C" is interpreted by default as
"A&(B==C)", not "(A&B)==C"; and "A>>B+C" is interpreted
as "A>>(B+C)". This can cause horrible, hard-to-understand
errors, so I just put parentheses around everything when I'm using
bitwise operators.
Applications of Bitwise Operations--Thinking in SIMD
The simplest bitwise operation is the NOT operation: NOT 0 is 1;
NOT 1 is 0. Bitwise NOT is written as "~" in
C/C++/Java/C#. The cool thing about the NOT (and other
bitwise) operations is that they operate on *all* the
bits of an integer at *once*. That means a 32-bit integer NOT
does thirty-two separate things at the same time! This
"single-instruction, multiple-data" (SIMD) approach is a really
common way to speed up computations; it's the basis of the MMX, SSE,
and AltiVec instruction set extensions. But you can do lots of
interesting SIMD work without using any special instructions--plain
old C will do. This use of bitwise operations is often called
"SIMD within a register" (SWAR) or "word-SIMD"; see Sean Anderson's
"Bit Twiddling Hacks" for a variety of amazing examples.
The genome uses just 4 digits (ACGT, for the bases Adenine, Cytosine,
Guanine, and Thymine), and so can be represented in as few as 2 bits
per nucleotide (genetic digit). Former UAF CS undergraduate James
Long, now up the hill at ARSC, developed a very fast software
implementation of the Smith-Waterman string (or gene) matching
algorithm called CBL that uses this SIMD-within-a-register approach.