Number systems, bases, and counting
CS 301 Lecture, Dr. Lawlor, 2005/09/07
First item of business: classroom change to Room 106. 106 has
more desks, so hopefully now people won't have to crouch in the hallway
to listen to the lecture!
What's the deal with all this hex?
Humans have used the "place/value" number system for a long time--the Sumerians
used base-60 in 4000BC! (Base-60 still shows up in our measurement of
time (hours have 60 minutes, which have 60 seconds) and angles (360
degrees)). The Maya used base 20. The world standard, though, is now base 10 using Arabic numerals. For example, I'm 28 = 2 * 10 + 8 years old.
Place Number:
|
i
|
...
|
3
|
2
|
1
|
0
|
Base-10 value
|
10i
|
|
1000
|
100
|
10
|
1
|
Base-2 value
|
2i |
|
8
|
4
|
2
|
1
|
Base-16 value
|
16i |
|
4096=163
|
256=162
|
16
|
1
|
Base-n
|
ni |
|
n3 |
n2 |
n
|
1
|
But every computer built today uses binary--1's and 0's--to do its
work. The reason is electrical--0 is no current, 1 is
current. Having just two states makes it easy to build reliable
circuits; for example, a transistor will threshold the input value and
either conduct or not conduct. A single zero or one is called a
"bit".
OK, so we got 1's and 0's: how to we build bigger numbers? There
are two ways: the older method, called "binary coded decimal" (BCD),
was to first build decimal digits using binary, then interpret the
decimal numbers in the usual way--the complication being that
arithmetic is tricky in decimal, as any grade schooler can
attest! Older machines, like the Motorola 68K, had hardware
support for BCD computations. But the modern standard
method is using "binary", which is just the place-value system using
base 2. In binary, 1 means 1; 10 (base 2) means 2 (base 10); 100
(base 2) means 4 (base 10); 1000 (base 4) means 8 (base 10);
10000 (base 2) means 16 (base 10); and so on. Every machine
produced today supports direct binary arithmetic.
Eight bits make a "byte" (note: it's pronounced exactly like "bite",
but always spelled with a 'y'), although in some rare networking
manuals (and in French) the same eight bits would be called an
"octet". There are theoretically some other measurements like a
four-bit "nybble", but these are quite rare, and basically just jokes.
In DOS and Windows programming, 16 bits is a "word", and 32 bits is a
"dword" (double word); but in other contexts "word" means the machine's
natural binary processing size, which ranges from 8 to 64 bits nowadays.
Sadly, (for a human!) writing or reading binary is really painful and
error-prone for large numbers. For example, one million is
11110100001001000000 (base 2), which is painful to write or read.
So instead, we often use a larger base. Back in the 1970's, it
was pretty common to use octal (base 8), but the modern standard is
hexadecimal--base 16. Base 16's gotta use 16 different digits,
and there are only 10 arabic numerals, so we use normal alphabet
letters for the remaining digits. For example, 15 (base 10) is
just F (base 16); an one million in hex is F4240 (base 16).
You've got to be careful about the base, though--the string "11" would
be interpreted as having the value 1*2+1=3 if it was base 2, the usual
1*10+1=11 if it was base 10, or 1*16+1=17 in base 16!
Note that a single digit in base 16 corresponds to exactly 4 bits,
since 16=2*2*2*2. This means it's easy to convert from a binary
number to a hex number: take groups of 4 bits and convert to a hex
digit--or back again: take each hex digit and expand out to 4
bits.
Hex really is the only true universal in assembly and
disassembly. For example, here's some random disassembled code
(produced using "objdump --disassemble /bin/ls" on a Linux machine):
80561a6: 83 ec 0c sub $0xc,%esp
80561a9: e8 ea ff ff ff call 80561f8 <__i686.get_pc_thunk.bx>
Note that every single number
is listed in hex--the addresses, on the left; the machine code, in the
middle; and the constants in the assembly, on the right. A binary
file display tool is called a "hex dump". A binary file editor is
called a "hex editor". That's how common hex is, so for the rest
of the class to make sense, you've gotta learn it!
Next class, we'll talk about all the basic logical operations using
bits: bit shifts ('>>' and '<<' in C), the NOT operation
('~' in C), the AND operation ('&' in C), the OR operation ('|' in
C), and the XOR operation ('^' in C).