Hex, bases, and overflow
CS 301 Lecture, Dr. Lawlor
What's the deal with all this hex?
Humans have used the "place/value" number system for a long timethe Sumerians
used base60 in 4000BC! (Base60 still shows up in our measurement of
time (hours have 60 minutes, which have 60 seconds) and angles (360
degrees)). The Maya used base 20. The world standard, though, is now base 10 using Arabic numerals. For example, I'm 30 = 3 * 10 + 0 years old.
But every computer built today uses binary1's and 0'sto do its
work. The reason is electrical0 is no voltage, 1 is
voltage. Having just two states makes it easy to build reliable
circuits; for example, a transistor will threshold the input value and
either conduct or not conduct. A single zero or one is called a
"bit".
OK, so we got 1's and 0's: how to we build bigger numbers? The modern standard
method is using "binary", which is just the placevalue system using
base 2. In binary, 1 means 1; 10 (base 2) means 2 (base 10); 100
(base 2) means 4 (base 10); 1000 (base 4) means 8 (base 10);
10000 (base 2) means 16 (base 10); and so on. Every machine
produced today supports direct binary arithmetic.
Decimal: Base10

10^{i}

...

10000

1000

100

10

1

Binary: Base2

2^{i} 
...

16 = 2^{4}

8 = 2^{3}

4 = 2^{2}

2

1

Hex: Base16

16^{i} 
...

65536 = 2

4096 = 16^{3}

256 = 16^{2}

16

1

Basen

n^{i} 
...

n^{4}

n^{3} 
n^{2} 
n

1 = n^{0}

Place/bit Number 
i

...

4

3

2

1

0

Sadly, (for a human!) writing or reading binary is really painful and
errorprone for large numbers. For example, one million is
11110100001001000000 (base 2), which is painful to write or read.
So instead, we often use a larger base. Back in the 1970's, it
was pretty common to use octal (base 8), but the modern standard is
hexadecimalbase 16. Base 16's gotta use 16 different digits,
and there are only 10 arabic numerals, so we use normal alphabet
letters for the remaining digits. For example, 15 (base 10) is
just F (base 16); an one million in hex is F4240 (base 16).
You've got to be careful about the base, thoughthe string "11" would
be interpreted as having the value 1*2+1=3 if it was base 2, the usual
1*10+1=11 if it was base 10, or 1*16+1=17 in base 16!
Decimal

Hex 
Binary

0

0 
0

1

1 
1

2

2 
10

3

3 
11

4

4 
100

5

5 
101

6

6 
110

7

7 
111

8

8 
1000

9

9 
1001

10

A 
1010

11

B 
1011

12

C 
1100

13

D 
1101

14

E 
1110

15

F 
1111

16

10

10000

Hexadecimaltodecimal and binary table.
Note that a single digit in base 16 corresponds to exactly 4 bits,
since 16=2*2*2*2. This means it's easy to convert from a binary
number to a hex number: take groups of 4 bits and convert to a hex
digitor back again: take each hex digit and expand out to the 4
bits it represents.
Hex really is the only true universal in assembly and
disassembly. For example, here's some random disassembled code
(produced using "objdump disassemble /bin/ls" on a Linux machine):
80561a6: 83 ec 0c sub $0xc,%esp
80561a9: e8 ea ff ff ff call 80561f8 <__i686.get_pc_thunk.bx>
Note that every single number
is listed in hexthe addresses, on the left; the machine code, in the
middle; and the constants in the assembly, on the right. A binary
file display tool is called a "hex dump". A binary file editor is
called a "hex editor". That's how common hex is, so for the rest
of the class to make sense, you've gotta learn it!
Base Conversion
It's pretty easy to write code to extract the bits of an integer, and print them out.
/* Print the low 32 bits of this number */
void print_binary(int v)
{
for (int bit=31;bit>=0;bit) {
if ((v&(1<<bit))!=0) {
std::cout<<"1";
} else {
std::cout<<"0";
}
}
std::cout<<" binary\n";
}
int foo(void) {
print_binary(6);
return 0;
}
(executable NetRun link)
Storage Sizes
Eight bits make a "byte" (note: it's pronounced exactly like "bite",
but always spelled with a 'y'), although in some rare networking
manuals (and in French) the same eight bits would be called an
"octet" (hard drive sizes are in "Go", Gigaoctets, when sold in French).
There are some rarelyused, quasijoke measurements like a
fourbit "nybble", but these are quite rare, and basically just jokes.
In DOS and Windows programming, 16 bits is a "WORD", 32 bits is
a
"DWORD" (double word), and 64 bits is a "QWORD"; but in other contexts
"word" means the machine's
natural binary processing size, which ranges from 32 to 64 bits
nowadays. "word" should be considered ambiguous. Giving an
actual bit count is the best approach ("The file begins with a 32bit
binary integer describing...").
Object

Bits

Bytes

Hex Digits (4 bits each)

Bit

1

less than 1

less than 1

Byte, char

8

1

2

"short" (or Windows WORD)

16

2

4

"int" (Windows DWORD)

32

4

8

"long" (or sometimes "long long")

64

8

16

Arithmetic In Binary
We can do arithmetic in binary or hex by hand just like we do in
decimal. To add, line up the digits, add corresponding digits, and if
the perdigit sum is greater than the base, carry to the next digit. Easy,
right? To subtract, do the exact same thing, but "borrow" from
the next digit if the perdigit difference is less than 0.
For example, in binary 01+01=10 because the first digit overflows, and
the extra "1" is carried to the next digit. Similarly,
1111+1=10000 because all the digits overflow in sequence. In
general, adding a "1" will cause carries to ripple up through ones,
flipping them to zeros, until it finally reaches a zero, which it will
flip to a one.
Addition in hex is exactly the sameit's a bit tricker to add the
larger digits involved, but carries work exactly the same.
So 0x8+0x8=0x10, and 0xffff+0x1=0x10000.
Subtraction in binary seems really bizarre until you remember than
101=1 in binarythat is, "1" is the onelessthanzero digit in
binary; just like "9" is the corresponding onebelowzero digit in decimal. So when
you have to borrow, you just flip the zeros to ones until you hit a
one.
Subtraction actually shows you how to represent negative numbers in
binary. Consider 1==01: you just keep on flipping zeros to ones
forever, "borrowing" against the future that never comes (just like a credit card!). So 1
is represented (in principle) by an infinite number of ones, just like
+1 is represented (in principle) by an infinite number of zeros
followed by a one. In practice, we only store a finite number
(usually 32 or 64) of bits, so on a 32bit machine we define 32 ones as
1. The highest bit, then, can be thought of as a "sign
bit"negative numbers have that bit set, positive numbers have that
bit clear. Suprisingly, it make sense to think of the sign bit in
digit n as having value 2^{n}, instead of value 2^{n} for unsigned numbers. What's weirder is that addition and subtraction are exactly the same
for signed or unsigned numberstry it! (All other operations are
different, though: comparison, multiplication, division, etc.)
Signed versus unsigned numbers are clearer if you draw them out for a 3bit machine:
Bit Pattern

Unsigned Value

Signed Value

000

0

0

001

1

1

010

2

2

011

3

3

100

4

4

101

5

3

110

6

2

111

7

1

Note that everything but Java supports "signed" as well as "unsigned"
integers explicitly. In C/C++/C#, typecasting to an unsigned
value does nothing to the bits, and is hence a zeroclock
operation. One way to speed up a "zero to n" bounds test on a
signed value is to turn it into an unsigned value and just compare
against nthat is, "if ((x>=0)&&(x<n)) ..." is the same
as "if (((unsigned int)x)<n) ..." do exactly the same thing, because
negative signed x's will turn into really huge unsigned x's!
Arithmetic Version of Bitwise Operations
Think about what "A<<1" does to the bits of an integer: what was
in the 1's place is now in the 2's place, 2's are shifted to 4's place,
4 is shifted to 8's place, etc. That is, all the bits are now in
places worth twice as much as before, so the overall integer is worth
twice as much as before: "(A<<1) == (A*2)". In general,
leftshifting by n bits is a fast way to multiply by 2^{n}.
Correspondingly, "A>>1" pushes everything into lowervalued
places, and drops the lowest digit. This means it's equivalent to
division: "(A>>1) == (A/2)". In general, rightshifting by
n bits is a fast way to divide by 2^{n} (and ignore the remainder).
You might think right shifting a negative number would bring in
zeros,
hence flipping it into a positive number, but in C/C++/C# the
">>" operator actually does the correct thingit brings in zeros
for unsigned values, and brings in the sign bit for signed
values. This means "x>>1" will turn +10 (1010 binary) into
5 (101 binary), but 10 (...1111110110) into 5 (...1111111011), just
like it should. Java doesn't have an "unsigned" data type, but it
does
have an arithmetic rightshift called ">>>" that brings in
zeros; ">>" assumes signed by default.
Finally, "A&0x1" drops all digits higher than the 1's place, so the
result is either 0 or 1 corresponding to whether A was even or odd;
that is, it's the modulo2 operation (remainder after dividing by 2),
so "(A&0x1)==(A%2)". Similarly, "A&0x3" gives a result
from 03, corresponding to the version of A where we've dropped
corresponding the higher binary digits, so
"(A&0x3)==A%4". So masking is a fast way to do modulus
operation.
All three of these can be used to speed up arithmetic: multiplying is
pretty fast on modern machines, but it's not (quite) as fast as bit
shifting. Division and modulus operations are really quite slow
on most machines, so replacing them by bitshifts can often give a 20x
speedup! Keep in mind, though, that the compiler also knows about
these tricksso if you write "A/4" somewhere, the compiler will almost
always use the bit shift instead of the divide. The only place
the compiler won't help you is when it doesn't know what you're
dividing byso if you write "A/B", even if B always turns out to be a
power of two, the compiler will use the slow divide instruction instead
of a fast bit shift. This means you may have to keep track
of the corresponding bit shift (logbase2 of B) or bit mask (for B a
power of two, it's just B1) manually, which is a pain, but sometimes
speed hurts!
Once place where bitstyle arithmetic operations really shine is with
packed fields. Let's say you've got two packed ARGB colors A and
B (i.e., four 8bit fields) that you want to averageblend together at
50% intensity. The usual way to do this would be to extract the
alpha, red, green, and blue fields (using masking and bit shifts),
blend them by adding them together and dividing by two, and reassemble
them using more bit shifts and OR operationsa total of like
4*(2+2+2)=24 operations, including four divides. But it's way
faster to do this:
int blend=((A>>1)&0x7f7f7f7f)+((B>>1)&0x7f7f7f7f);
That is, we divide all
the fields of both inputs by 2 using a bit shift, mask off space for
the (overflow or carry) in the high bit, and add. This gives you
a fourcolor blend operation for the cost of 5 cheap singleclock bit
operations! Michael Herf has a great page on a similar trick for weighted blending.