Number systems, bases, and counting

CS 301 Lecture, Dr. Lawlor, 2005/09/07

First item of business: classroom change to Room 106. 106 has more desks, so hopefully now people won't have to crouch in the hallway to listen to the lecture!

What's the deal with all this hex?

Humans have used the "place/value" number system for a long time--the Sumerians used base-60 in 4000BC! (Base-60 still shows up in our measurement of time (hours have 60 minutes, which have 60 seconds) and angles (360 degrees)). The Maya used base 20. The world standard, though, is now base 10 using Arabic numerals. For example, I'm 28 = 2 * 10 + 8 years old.

Place Number:	i	...	3	2	1	0
Base-10 value	10ⁱ		1000	100	10	1
Base-2 value	2ⁱ		8	4	2	1
Base-16 value	16ⁱ		4096=16³	256=16²	16	1
Base-n	nⁱ		n³	n²	n	1

But every computer built today uses binary--1's and 0's--to do its work. The reason is electrical--0 is no current, 1 is current. Having just two states makes it easy to build reliable circuits; for example, a transistor will threshold the input value and either conduct or not conduct. A single zero or one is called a "bit".

OK, so we got 1's and 0's: how to we build bigger numbers? There are two ways: the older method, called "binary coded decimal" (BCD), was to first build decimal digits using binary, then interpret the decimal numbers in the usual way--the complication being that arithmetic is tricky in decimal, as any grade schooler can attest! Older machines, like the Motorola 68K, had hardware support for BCD computations. But the modern standard method is using "binary", which is just the place-value system using base 2. In binary, 1 means 1; 10 (base 2) means 2 (base 10); 100 (base 2) means 4 (base 10); 1000 (base 4) means 8 (base 10); 10000 (base 2) means 16 (base 10); and so on. Every machine produced today supports direct binary arithmetic.

Eight bits make a "byte" (note: it's pronounced exactly like "bite", but always spelled with a 'y'), although in some rare networking manuals (and in French) the same eight bits would be called an "octet". There are theoretically some other measurements like a four-bit "nybble", but these are quite rare, and basically just jokes. In DOS and Windows programming, 16 bits is a "word", and 32 bits is a "dword" (double word); but in other contexts "word" means the machine's natural binary processing size, which ranges from 8 to 64 bits nowadays.

Sadly, (for a human!) writing or reading binary is really painful and error-prone for large numbers. For example, one million is 11110100001001000000 (base 2), which is painful to write or read. So instead, we often use a larger base. Back in the 1970's, it was pretty common to use octal (base 8), but the modern standard is hexadecimal--base 16. Base 16's gotta use 16 different digits, and there are only 10 arabic numerals, so we use normal alphabet letters for the remaining digits. For example, 15 (base 10) is just F (base 16); an one million in hex is F4240 (base 16). You've got to be careful about the base, though--the string "11" would be interpreted as having the value 1*2+1=3 if it was base 2, the usual 1*10+1=11 if it was base 10, or 1*16+1=17 in base 16!

Note that a single digit in base 16 corresponds to exactly 4 bits, since 16=2*2*2*2. This means it's easy to convert from a binary number to a hex number: take groups of 4 bits and convert to a hex digit--or back again: take each hex digit and expand out to 4 bits.

Hex really is the only true universal in assembly and disassembly. For example, here's some random disassembled code (produced using "objdump --disassemble /bin/ls" on a Linux machine):

 80561a6:       83 ec 0c                sub    $0xc,%esp

 80561a9:       e8 ea ff ff ff          call   80561f8 <__i686.get_pc_thunk.bx>

Note that every single number is listed in hex--the addresses, on the left; the machine code, in the middle; and the constants in the assembly, on the right. A binary file display tool is called a "hex dump". A binary file editor is called a "hex editor". That's how common hex is, so for the rest of the class to make sense, you've gotta learn it!

Next class, we'll talk about all the basic logical operations using bits: bit shifts ('>>' and '<<' in C), the NOT operation ('~' in C), the AND operation ('&' in C), the OR operation ('|' in C), and the XOR operation ('^' in C).