Bitwise Operations

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

The fact is, variables on a computer only have so many bits. If a value gets too big to fit in those bits, it first goes negative (when it reaches the sign bit, described below), and then the extra bits "overflow" off the top. By default the overflowed bits are then ignored completely.

For example:

int value=1; /* value to test, starts at first (lowest) bit */
for (int bit=0;bit<100;bit++) {
	std::cout<<"at bit "<<bit<<" the value is "<<value<<"\n";
	value=value+value; /* moves over by one bit (value=value<<1 would work too) */
	if (value==0) break;
}
return 0;

(Try this in NetRun now!)

Because "int" currently has 32 bits, if you start at one, and add a variable to itself 32 times, the one overflows and is lost completely.

In assembly, there's a handy instruction "jo" (jump if overflow) to check for overflow from the previous instruction. The C++ compiler doesn't bother to use jo, though!

mov edi,1 ; loop variable
mov eax,0 ; counter

start:
	add eax,1 ; increment bit counter

	add edi,edi ; add variable to itself
	jo noes ; check for overflow in the above add

	cmp edi,0
	jne start

ret

noes: ; called for overflow
	mov eax,999
	ret

(Try this in NetRun now!)

Notice the above program returns 999 on overflow, which somebody else will need to check for. (Responding correctly to overflow is actually quite difficult--see, e.g., the Ariane 5 explosion, caused by a detected overflow.)
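
Since the C++ compiler won't check the overflow flag for you, one portable option is to test whether an add would overflow before you do it. Here's a minimal sketch of that idea; the names add_overflows and checked_add are made up for this example, and returning 999 just mirrors the assembly above.

```cpp
#include <climits>

// Hypothetical helper: true if a+b would not fit in a signed int.
// We compare against INT_MAX/INT_MIN *before* adding, because signed
// overflow itself is undefined behavior in C++.
constexpr bool add_overflows(int a, int b) {
	return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

// Hypothetical helper: returns a+b, or 999 on overflow (like "noes" above).
constexpr int checked_add(int a, int b) {
	return add_overflows(a, b) ? 999 : a + b;
}

static_assert(checked_add(3, 4) == 7, "ordinary add");
static_assert(checked_add(INT_MAX, 1) == 999, "overflow detected");
```

GCC and Clang also provide a __builtin_add_overflow(a, b, &result) builtin that reports the same condition. Just like the assembly version, the caller still has to check for the 999.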

C++ Storage Sizes

Eight bits make a "byte" (note: it's pronounced exactly like "bite", but always spelled with a 'y'), although in some rare networking manuals (and in French) the same eight bits would be called an "octet" (hard drive sizes are in "Go", Giga-octets, when sold in French). In DOS and Windows programming, 16 bits is a "WORD", 32 bits is a "DWORD" (double word), and 64 bits is a "QWORD"; but in other contexts "word" means the machine's natural binary processing size, which ranges from 32 to 64 bits nowadays. "word" should now be considered ambiguous. Giving an actual bit count is the best approach ("The file begins with a 32-bit binary integer describing...").

Object	C++ Name	ASM Register	Bits	Bytes (8 bits)	Hex Digits (4 bits)	Unsigned Range	Signed Range
Bit	none!	none!	1	less than 1	less than 1	0..1	-1..0
BYTE, or octet	char	al	8	1	2	0 .. 255	-128 .. +127
Windows WORD	short	ax	16	2	4	0 .. 65535	-32768 .. +32767
Windows DWORD	int	eax	32	4	8	0 .. >4 billion	-2 billion .. +2 billion
Windows QWORD	long (long long on Windows)	rax	64	8	16	0 .. >18 quintillion	-9 quintillion .. +9 quintillion
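
If you'd rather not guess how big "long" is (it's 64 bits on 64-bit Linux, but 32 bits on Windows), the fixed-width types in <cstdint> put the bit count right in the name. This is a small sketch that checks the table above at compile time; bits_in is a made-up helper name.

```cpp
#include <climits>   // CHAR_BIT
#include <cstdint>   // std::int8_t, std::int16_t, ...
#include <limits>    // std::numeric_limits

// Hypothetical helper: number of bits in type T.
template <class T>
constexpr int bits_in() {
	return static_cast<int>(sizeof(T) * CHAR_BIT);
}

// The fixed-width types say their size in the name:
static_assert(bits_in<std::int8_t>() == 8, "byte");
static_assert(bits_in<std::int16_t>() == 16, "WORD");
static_assert(bits_in<std::int32_t>() == 32, "DWORD");
static_assert(bits_in<std::int64_t>() == 64, "QWORD");

// The ranges match the table:
static_assert(std::numeric_limits<std::uint8_t>::max() == 255, "");
static_assert(std::numeric_limits<std::int8_t>::min() == -128, "");
static_assert(std::numeric_limits<std::uint16_t>::max() == 65535, "");
```

If any of these sizes were wrong on your machine, the program would refuse to compile, which beats finding out at runtime.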

Signed versus Unsigned Numbers

If you watch closely right before overflow, you see something funny happen:

signed char value=1; /* value to test, starts at first (lowest) bit */
for (int bit=0;bit<100;bit++) {
	std::cout<<"at bit "<<bit<<" the value is "<<(long)value<<"\n";
	value=value+value; /* moves over by one bit (value=value<<1 would work too) */
	if (value==0) break;
}
return 0;

(Try this in NetRun now!)

This prints out:

at bit 0 the value is 1
at bit 1 the value is 2
at bit 2 the value is 4
at bit 3 the value is 8
at bit 4 the value is 16
at bit 5 the value is 32
at bit 6 the value is 64
at bit 7 the value is -128 
Program complete.  Return 0 (0x0)

Wait, the last bit's value is -128? Yes, it really is!

This negative high bit is called the "sign bit", and it has a negative value in two's complement signed numbers. This means to represent -1, for example, you set not only the high bit, but all the other bits as well: in unsigned, this is the largest possible value. The reason binary 11111111 represents -1 is the same reason you might choose 9999 to represent -1 on a 4-digit odometer: if you add one, you wrap around and hit zero.
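
To make the "negative high bit" idea concrete, here's a sketch that adds up the place values of one byte by hand. The helper name as_signed8 is made up for this example.

```cpp
// Hypothetical helper: interpret the low 8 bits of 'bits' as a
// two's complement signed byte. Bits 0..6 count as +1, +2, ... +64;
// bit 7 (the sign bit) counts as -128.
constexpr int as_signed8(unsigned bits) {
	return static_cast<int>(bits & 0x7F) - ((bits & 0x80) ? 128 : 0);
}

static_assert(as_signed8(0x7F) == 127, "largest positive byte");
static_assert(as_signed8(0x80) == -128, "sign bit alone");
static_assert(as_signed8(0xFF) == -1, "all ones: -128 + 127");
static_assert(as_signed8(0xFF + 1) == 0, "odometer wraps around to zero");
```

The same bits 0xFF read as unsigned are 255, the largest 8-bit value, exactly as described above.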

A very cool thing about two's complement is addition is the same operation whether the numbers are signed or unsigned--we just interpret the result differently. Subtraction is also identical for signed and unsigned. Register names are identical in assembly for signed and unsigned. However, when you change register sizes using an instruction like "movsxd rax,eax", when you check for overflow, when you compare numbers, multiply or divide, or shift bits, you need to know if the number is signed (has a sign bit) or unsigned (no sign bit, no negative numbers).

Signed	Unsigned	Language
int	unsigned int	C++, int is signed by default.
signed char	unsigned char	C++, char may be signed or unsigned.
movsx, movsxd	movzx	Assembly, sign extend or zero extend to change register sizes. (There's no "movzxd": writing a 32-bit register like eax already zeros the top half of rax.)
jo	jc	Assembly, overflow is calculated for signed values, carry for unsigned values.
jg	ja	Assembly, jump greater is signed, jump above is unsigned.
jl	jb	Assembly, jump less signed, jump below unsigned.
imul	mul	Assembly, imul is signed (and more modern), mul is for unsigned (and ancient and horrible!). idiv/div work similarly.
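
In C++ the compiler picks the signed or unsigned instruction for you, based on the variable's type. Here's a sketch showing the same bits giving different answers; all four helper names are made up for this example.

```cpp
// Hypothetical helpers: the same comparison, signed and unsigned.
constexpr bool less_signed(int a, int b) { return a < b; }
constexpr bool less_unsigned(unsigned a, unsigned b) { return a < b; }

// Widening one byte to an int: sign extend versus zero extend.
constexpr int widen_signed(signed char c) { return c; }     // copies the sign bit upward
constexpr int widen_unsigned(unsigned char c) { return c; } // fills with zeros

// -1 converted to unsigned is all ones, the largest unsigned value.
static_assert(less_signed(-1, 1), "signed: -1 is less than 1");
static_assert(!less_unsigned(static_cast<unsigned>(-1), 1u), "unsigned: all ones is bigger than 1");

static_assert(widen_signed(-1) == -1, "sign extension keeps -1");
static_assert(widen_unsigned(0xFF) == 255, "zero extension gives 255");
```

This is why mixing signed and unsigned values in one comparison is a classic source of bugs: the bits are the same, but the answer depends on which interpretation wins.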

Bitwise Operators

There are a whole group of "bitwise" operators that operate on bits.

Name	C++	x86	Useful to...
AND	&	and	mask out bits (set other bits to zero)
OR	|	or	reassemble bit fields
XOR	^	xor	invert selected bits
NOT	~	not	invert all the bits in a number
Left shift	<<	shl	makes numbers bigger by shifting their bits to higher places
Right shift	>>	shr sar	makes numbers smaller by shifting their bits to lower places. sar (arithmetic, signed shift) works for negative numbers.

If you'd like to see the bits inside a number, you can loop over the bits and use AND to extract each bit:

int i=9; // 9 == 8 + 1 == 1001

for (long bit=31;bit>=0;bit--) { // print each bit
	long mask=(1L<<bit); // only this bit is set
	long biti=mask&i; // extract this bit from i
	if (biti!=0) std::cout<<"1";
	else         std::cout<<"0";
	if (bit==0)  std::cout<<" integer\n";
}

(Try this in NetRun now!)

Because binary is almost perfectly unreadable (was that 1000000000000000 or 10000000000000000?), we normally use hexadecimal, base 16.

Decimal	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15	16
Hex	1	2	3	4	5	6	7	8	9	A	B	C	D	E	F	10
Binary	1	10	11	100	101	110	111	1000	1001	1010	1011	1100	1101	1110	1111	10000

Remember that every hex digit represents four bits. So if you shift a hex constant by four bits, it shifts by one entire hex digit:

0xf0d<<4 == 0xf0d0
0xf0d>>4 == 0xf0

If you shift a hex constant by a non-multiple of four bits, you end up interleaving the hex digits of the constant, which is confusing:

0xf0>>2 == 0x3C (?)

Bitwise operators make perfect sense working with hex digits, because they operate on the underlying bits of those digits:
    0xff0 & 0x0ff == 0x0f0
    0xff0 | 0x0ff == 0xfff
    0xff0 ^ 0x0ff == 0xf0f

You can use these bitwise operators to peel off the hex digits of a number, to print out stuff in hex.

int v=1024+15;
for (int digit=7;digit>=0;digit--) {
	const char *digitTable="0123456789abcdef"; /* string literals are const in C++ */
	int d=(v>>(digit*4))&0xF;
	std::cout<<digitTable[d];
}
std::cout<<std::endl;
return v;

(Try this in NetRun now!)

You could also use printf("%X",v);

Bitwise Left Shift: <<

Makes values bigger, by shifting the value's bits into higher places, tacking on zeros in the vacated lower places.

As Ints	As Bits
3<<0 == 3	0011<<0 == 0011
3<<1 == 6	0011<<1 == 0110
3<<2 == 12	0011<<2 == 1100

Interesting facts about left shift:

1<<n pushes a 1 up into bit number n, creating the bit pattern 1 followed by n zeros.
The value of (k<<n) is actually k*2ⁿ. This means bit shifting can be used as a faster multiply by a power of two.
k<<0 == k, for any k.
(k<<n) >= k, for any n and k (unless you have "overflow"!).
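For a 32-bit k, you'd expect (k<<32) == 0, because all the bits of k have overflowed away. But C++ calls a shift by 32 or more "undefined" (the compiler warns about it), and x86 hardware only uses the low 5 bits of the shift count, so a shift by 32 typically leaves k unchanged!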
Left shift always shifts in fresh new zero bits.
You can left shift by any number of bits from 0 up to one less than the width of the value.
You can't left shift by a negative number of bits.
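
Here's a sketch that checks a few of these facts at compile time. The helper shl_safe is a made-up name; it handles the too-big shift count by hand, since C++ itself doesn't define what a 32-bit shift by 32 or more does.

```cpp
#include <cstdint>

// k<<n is k times 2 to the n, as long as no bits fall off the top.
static_assert((3u << 0) == 3u, "");
static_assert((3u << 1) == 6u, "");
static_assert((3u << 2) == 12u, "");
static_assert((1u << 5) == 32u, "a 1 followed by five zeros");

// Hypothetical helper: a left shift that is safe for any count.
// Counts of 32 or more are handled explicitly instead of shifting.
constexpr std::uint32_t shl_safe(std::uint32_t k, unsigned n) {
	return (n >= 32) ? 0u : (k << n);
}

static_assert(shl_safe(0xf0d, 4) == 0xf0d0, "shifts by one hex digit");
static_assert(shl_safe(1, 31) == 0x80000000u, "the top bit");
static_assert(shl_safe(1, 32) == 0, "everything shifted out");
```

The examples use unsigned values on purpose: shifting a 1 into the sign bit of a signed int is its own can of worms.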

In C++, the << operator is also overloaded for iostream output. This was a confusing choice, in particular because "cout<<3<<0;" just prints 3, then 0! To actually print the value of "3<<0", you need parentheses, like this: "cout<<(3<<0);". Operator precedence is screwy for bitwise operators, so you really want to use excess parentheses!

In assembly:

shl is "shift left". Use it like "shl eax,4" (Try this in NetRun now!). Note that the '4' can be a constant, or register cl (low bits of ecx), but not any other register (Try this in NetRun now!).
sal is the same instruction (same machine code).
There's also a "rol" that does a circular left shift: the bits leaving the left side come back in the right side.

Bitwise Right Shift: >>

Makes values smaller, by shifting them into lower-valued places. Note the bits in the lowest places just "fall off the end" and vanish.

As Ints	As Bits
3>>0 == 3	0011>>0 == 0011
3>>1 == 1	0011>>1 == 0001
3>>2 == 0	0011>>2 == 0000
6>>1 == 3	0110>>1 == 0011

Interesting facts about right shift:

The value of (k>>n) is actually k/2ⁿ, rounded down. This can be used as a faster divide.
(k<<n)>>n == k, unless overflow has happened.
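For a 32-bit k, you'd expect (k>>32) == 0, because all the bits of k have fallen off the end. But just like left shift, C++ calls a shift by 32 or more "undefined" (the compiler warns about it), and on x86 it typically leaves k unchanged.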
There are two flavors of right shift: signed, and unsigned. Unsigned shift fills in the new bits with zeros. Signed shift fills the new bits with copies of the sign bit, so negative numbers stay negative after the shift.
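
In C++ the flavor of right shift is picked by the type of the value being shifted. Here's a sketch; shift_right_signed is a made-up name, and it assumes the compiler does an arithmetic shift for negative values (C++20 guarantees this; before that it was implementation-defined, though mainstream compilers do it).

```cpp
// Unsigned right shift fills in with zeros.
static_assert((0xF0u >> 2) == 0x3Cu, "");
static_assert((6u >> 1) == 3u, "");
static_assert((3u >> 2) == 0u, "low bits fall off the end");

// Hypothetical helper: signed right shift, ASSUMED to copy the sign bit.
constexpr int shift_right_signed(int k, int n) { return k >> n; }

static_assert(shift_right_signed(-8, 1) == -4, "negative stays negative");
static_assert(shift_right_signed(-1, 4) == -1, "all ones stays all ones");

// Shifting rounds down, while division rounds toward zero:
static_assert(shift_right_signed(-7, 1) == -4, "");
static_assert(-7 / 2 == -3, "");
```

That last pair is worth remembering: for negative odd numbers, shift and divide don't quite agree.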

If you're dyslexic, like me, the left shift << and right shift >> can be really tricky to tell apart. I always remember it like this:

k<<n pumps up the value of k (the point of the << is injecting bigness into k)
k>>n drains away the value of k (the point of the >> is draining bigness from k)

In assembly:

shr is the unsigned shift.
sar is the signed (or "arithmetic") shift.
Again, there's a circular right shift "ror".
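
C++ has no rotate operator, but you can build one from two shifts and an OR. This is a sketch for 32-bit values only; rotl32 and rotr32 are made-up names.

```cpp
#include <cstdint>

// Hypothetical helpers: 32-bit circular shifts built from <<, >> and |.
// The "& 31" keeps every shift count in 0..31, so n==0 is safe.
constexpr std::uint32_t rotl32(std::uint32_t x, unsigned n) {
	return (x << (n & 31)) | (x >> ((32 - n) & 31));
}
constexpr std::uint32_t rotr32(std::uint32_t x, unsigned n) {
	return (x >> (n & 31)) | (x << ((32 - n) & 31));
}

static_assert(rotl32(0xF0000001u, 4) == 0x0000001Fu, "top hex digit wraps to the bottom");
static_assert(rotr32(0x0000001Fu, 4) == 0xF0000001u, "and back again");
static_assert(rotl32(0x12345678u, 0) == 0x12345678u, "rotate by zero does nothing");
```

C++20 added std::rotl and std::rotr in the <bit> header, which do the same job for any unsigned type.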

Bitwise AND: &

Output bits are 1 only if both corresponding input bits are 1. This is useful to "mask out" bits you don't want, by ANDing them with zero.

As Ints	As Bits
3&5 == 1	0011&0101 == 0001
3&6 == 2	0011&0110 == 0010
3&4 == 0	0011&0100 == 0000

Properties:

0=A&0 (AND by 0's creates 0's--used for masking)
A=A&~0 (AND by 1's has no effect)
A=A&A (AND by yourself has no effect)

Bitwise AND is a really really useful tool for extracting bits from a number--you often create a "mask" value with 1's marking the bits you want, and AND by the mask. For example, this code figures out if bit 2 of an integer is set:
    int mask=(1<<2); // in binary: 100
    int value=...;           // in binary: xyz
    if (0!=(mask&value)) // in binary: x00
       ...

In C/C++, bitwise AND has the wrong precedence--leaving out the parentheses in the comparison above gives the wrong answer! Be sure to use extra parentheses!
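
Here's a sketch of exactly what goes wrong. The names bit2_mask, bit2_set, and bit2_set_wrong are made up; the "wrong" version writes out explicitly how the compiler parses the comparison when you leave the parentheses off.

```cpp
// Hypothetical names for this example only.
constexpr int bit2_mask = (1 << 2); // in binary: 100

// Correct: AND first, then compare.
constexpr bool bit2_set(int value) {
	return 0 != (bit2_mask & value);
}

// What "0 != bit2_mask & value" really means without the parentheses:
// != binds tighter than &, so the comparison happens first.
constexpr bool bit2_set_wrong(int value) {
	return (0 != bit2_mask) & value;
}

static_assert(bit2_set(4), "4 is 100, so bit 2 is set");
static_assert(!bit2_set(3), "3 is 011, so bit 2 is clear");
static_assert(!bit2_set_wrong(4), "wrong answer: (true & 4) is 1 & 4 == 0");
```

The wrong version ends up testing bit 0 of the value instead of bit 2, which is the kind of bug that passes half your test cases.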

In assembly, it's the "and" instruction. Very simple!

Bitwise OR: |

Output bits are 1 if either input bit is 1. E.g., 3|5 == 7; or 011 | 101 == 111.

As Ints	As Bits
3|0 == 3	0011|0000 == 0011
3|3 == 3	0011|0011 == 0011
1|4 == 5	0001|0100 == 0101

A=A|0 (OR by 0's has no effect)
~0=A|~0 (OR by 1's creates 1's)
A=A|A (OR by yourself has no effect)

Bitwise OR is useful for sticking together bit fields you've prepared separately. Overall, you use AND to pick apart an integer's values, XOR and NOT to manipulate them, and finally OR to assemble them back together.
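
As a sketch of that pick-apart-and-reassemble workflow, here's one way to pack three 8-bit color channels into one integer. The 0xRRGGBB layout and all the function names are assumptions made up for this example.

```cpp
#include <cstdint>

// Hypothetical layout: 0xRRGGBB, one byte per color channel.
// Shift each field into place, then OR the fields together.
constexpr std::uint32_t pack_rgb(std::uint32_t r, std::uint32_t g, std::uint32_t b) {
	return ((r & 0xFF) << 16) | ((g & 0xFF) << 8) | (b & 0xFF);
}

// Shift the field down to the bottom, then AND away everything else.
constexpr std::uint32_t red_of(std::uint32_t rgb)   { return (rgb >> 16) & 0xFF; }
constexpr std::uint32_t green_of(std::uint32_t rgb) { return (rgb >> 8) & 0xFF; }
constexpr std::uint32_t blue_of(std::uint32_t rgb)  { return rgb & 0xFF; }

// Replace only the green field: AND to clear it, OR to fill it back in.
constexpr std::uint32_t with_green(std::uint32_t rgb, std::uint32_t g) {
	return (rgb & ~(0xFFu << 8)) | ((g & 0xFF) << 8);
}

static_assert(pack_rgb(0x12, 0x34, 0x56) == 0x123456, "");
static_assert(green_of(0x123456) == 0x34, "");
static_assert(with_green(0x123456, 0xAB) == 0x12AB56, "");
```

Because each field is a whole number of hex digits, you can read the result right off the hex constant.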

Bitwise XOR: ^

Output bits are 1 if either input bit is 1, but not both. E.g., 3^5 == 6; or 011 ^ 101 == 110. Note how the low bit is 0, because both input bits are 1.

As Ints	As Bits
3^5 == 6	0011^0101 == 0110
3^6 == 5	0011^0110 == 0101
3^4 == 7	0011^0100 == 0111

A=A^0 (XOR by zeros has no effect)
~A = A ^ ~0 (XOR by 1's inverts all the bits)
0=A^A (XOR by yourself creates 0's--used in cryptography)

The second property, that XOR by 1 inverts the value, is useful for flipping a set of bits. Generally, XOR is used for equality testing (a^b!=0 means a!=b), controlled bitwise inversion, and crypto.
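
Here's a sketch of controlled inversion, where a mask picks which bits get flipped. The helper name toggle is made up for this example.

```cpp
// Hypothetical helper: flip exactly the bits that are 1 in 'mask'.
constexpr unsigned toggle(unsigned value, unsigned mask) {
	return value ^ mask;
}

static_assert(toggle(0xF0u, 0xFFu) == 0x0Fu, "low 8 bits inverted");
static_assert(toggle(0xF0u, 0x00u) == 0xF0u, "XOR by zeros has no effect");
static_assert(toggle(toggle(0x1234u, 0xBEEFu), 0xBEEFu) == 0x1234u, "XOR twice undoes itself");
static_assert(toggle(0x1234u, 0x1234u) == 0u, "XOR by yourself creates zeros");
```

The third line is the crypto connection: XOR with a key scrambles the value, and XOR with the same key again gets it back.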

Bitwise NOT: ~

Output bits are 1 if the corresponding input bit is zero. E.g., ~011 == 111....111100. (The number of leading ones depends on the size of the machine's "int".)

As Ints	As Bits
~0 == -1 (or the biggest value, if unsigned)	~...0000 == ...1111

I don't use bitwise NOT very often, but it's handy for making an integer whose bits are all 1: ~0 is all-ones.
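
One other place NOT shows up is paired with AND, to clear bits: invert the mask, so the bits you want gone are the zeros. A small sketch, with the made-up helper name clear_bits:

```cpp
// Hypothetical helper: force the bits marked in 'mask' to zero.
constexpr unsigned clear_bits(unsigned value, unsigned mask) {
	return value & ~mask;
}

static_assert(clear_bits(0xFFu, 0x0Fu) == 0xF0u, "low four bits cleared");
static_assert(~0u == 0xFFFFFFFFu, "all ones, assuming a 32-bit unsigned int");
static_assert(~0 == -1, "as a signed int, all ones is -1");
```

Writing ~mask instead of a hand-typed constant means the inverted mask is automatically as wide as the machine's int.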

Non-bitwise Logical Operators

Note that the logical operators &&, ||, and ! work exactly the same as the bitwise operators &, |, and ~, but for exactly one bit. Internally, these operators map multi-bit values to a single bit by treating zero as a zero bit, and nonzero values as a one bit. So
(2&&4) == 1 (because both 2 and 4 are nonzero)
(2&4) == 0 (because 2==0010 and 4 == 0100 don't have any overlapping one bits).
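
Here's the same comparison as a sketch you can compile, covering all three operator pairs. The function names are made up for this example.

```cpp
// Hypothetical helpers: each logical operator next to its bitwise twin.
constexpr int logical_and(int a, int b) { return a && b; }
constexpr int bitwise_and(int a, int b) { return a & b; }
constexpr int logical_or(int a, int b)  { return a || b; }
constexpr int bitwise_or(int a, int b)  { return a | b; }
constexpr int logical_not(int a)        { return !a; }
constexpr int bitwise_not(int a)        { return ~a; }

static_assert(logical_and(2, 4) == 1, "both nonzero, so the answer is true");
static_assert(bitwise_and(2, 4) == 0, "0010 and 0100 share no one bits");
static_assert(logical_or(2, 4) == 1, "logical OR gives exactly one bit");
static_assert(bitwise_or(2, 4) == 6, "bitwise OR keeps every bit: 0110");
static_assert(logical_not(2) == 0, "any nonzero value counts as true");
static_assert(bitwise_not(2) == -3, "every bit flipped");
```

So typing & when you meant && (or the reverse) compiles fine and quietly gives a different answer.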
