# Bits in an Integer, and Hex Conversions

CS 301 Lecture, Dr. Lawlor

## Storage Sizes

Eight bits make a "byte" (note: it's pronounced exactly like "bite", but always spelled with a 'y'), although in some rare networking manuals (and in French) the same eight bits would be called an "octet" (hard drive sizes are in "Go", giga-octets, when sold in French).  There are also some rarely-used, quasi-joke measurements like the four-bit "nybble".  In DOS and Windows programming, 16 bits is a "WORD", 32 bits is a "DWORD" (double word), and 64 bits is a "QWORD"; but in other contexts "word" means the machine's natural binary processing size, which ranges from 32 to 64 bits nowadays, so "word" should now be considered ambiguous.  Giving an actual bit count is the best approach ("The file begins with a 32-bit binary integer describing...").

| Object | Unsigned Range | Signed Range | Bits | Hex Digits (4 bits) | Bytes (8 bits) | Octal Digits (3 bits) |
|---|---|---|---|---|---|---|
| Bit | 0..1 | -1..0 | 1 | less than 1 | less than 1 | less than 1 |
| Byte, "char" | 255 | -128 .. 127 | 8 | 2 | 1 | two and two thirds |
| "short" (or Windows WORD) | 65535 | -32768 .. +32767 | 16 | 4 | 2 | five and one third |
| "int" (Windows DWORD) | >4 billion | -2G .. +2G | 32 | 8 | 4 | ten and two thirds |
| "long" (or "long long" on some machines) | >16 quintillion | -8Q .. +8Q | 64 | 16 | 8 | twenty-one and one third |

You can determine the usable range of a value by experimentally measuring the overflow point, for example with code like this:
```cpp
int value=1; /* value to test, starts at first (lowest) bit */
for (int bit=0;bit<100;bit++) {
	std::cout<<"at bit "<<bit<<" the value is "<<value<<"\n";
	value=value+value; /* moves over by one bit (value=value<<1 would work too) */
}
return 0;
```

(Try this in NetRun now!)

## Base Conversion

It's pretty easy to write code to extract the bits of an integer, and print them out.
```cpp
/* Print the low 32 bits of this number */
void print_binary(int v) {
	for (int bit=31;bit>=0;bit--) {
		if ((v&(1<<bit))!=0) {
			std::cout<<"1";
		} else {
			std::cout<<"0";
		}
	}
	std::cout<<" binary\n";
}
int foo(void) {
	print_binary(6);
	return 0;
}
```

It's also pretty easy to stick together a set of bits (say, in an array) into an integer:
```cpp
const int bits[]={1,1,0};
int nbits=sizeof(bits)/sizeof(bits[0]); /*<- funky trick to find size of static array!*/
int value=0;
for (int bit=0;bit<nbits;bit++) {
	if (bits[nbits-1-bit]) /*<- gotta index the bit array starting at the end */
		value = value | (1<<bit);
}
return value;
```

Or you can do the same binary-to-integer conversion using digits from cin:

```cpp
void print_binary(int v) {
	for (int i=0;i<32;i++) {
		int mask=1<<(32-1-i);
		if ((v&mask)==0) std::cout<<"0";
		else std::cout<<"1";
	}
	std::cout<<std::endl;
}
int read_binary(void) {
	int v=0;
	while (std::cin) {
		char c='?';
		std::cin>>c;
		//std::cout<<"I just read the character '"<<c<<"'."<<std::endl;
		if (c=='0') {
			/* zero digit in that place */
			v=v<<1;
		} else if (c=='1') {
			/* one digit in that place */
			v=(v<<1)+1;
		} else {
			break;
		}
	}
	return v;
}
int foo(void) {
	return read_binary();
}
```

(Try this in NetRun now!)

## What's the deal with all this hex?

Humans have used the "place/value" number system for a long time--the Sumerians used base-60 in 4000 BC!  (Base-60 still shows up in our measurement of time and angles: hours have 60 minutes, which have 60 seconds; degrees also have minutes and seconds; and the circle is divided into six sections of 60 degrees each.)  The Maya used base 20.  The world standard, though, is now base 10 using Arabic numerals.  For example, I'm 34 = 3 * 10 + 4 years old.

But every computer built today uses binary--1's and 0's--to do its work.  The reason is electrical--0 is no voltage, 1 is voltage.  Having just two states makes it easy to build cheap and reliable circuits; for example, a transistor will threshold the input value and either conduct or not conduct.  A single zero or one is called a "bit".

OK, so we got 1's and 0's: how do we build bigger numbers?  The modern standard method is "binary", which is just the place-value system using base 2.  In binary, 1 means 1; 10 (base 2) means 2 (base 10); 100 (base 2) means 4 (base 10); 1000 (base 2) means 8 (base 10); 10000 (base 2) means 16 (base 10); and so on.  Every machine produced today supports direct binary arithmetic.  Sadly, for a human, writing or reading binary is really painful and error-prone for large numbers.  For example, one million is 11110100001001000000 (base 2), which is painful to write or read.  So instead, we often use a larger base.

Back in the 1970's, it was pretty common to use octal (base 8), but the modern standard is hexadecimal--base 16.  Base 16's gotta use 16 different digits, and there are only 10 Arabic numerals, so we use normal alphabet letters for the remaining digits.  For example, 15 (base 10) is just F (base 16); and one million in hex is F4240 (base 16).  You've got to be careful about the base, though--the string "11" would be interpreted as having the value 1*2+1=3 if it was base 2, the usual 1*10+1=11 if it was base 10, or 1*16+1=17 in base 16!

| Place/bit Number | i | ... | 4 | 3 | 2 | 1 | 0 |
|---|---|---|---|---|---|---|---|
| Decimal: Base-10 | 10^i | ... | 10000 | 1000 | 100 | 10 | 1 |
| Binary: Base-2 | 2^i | ... | 16 = 2^4 | 8 = 2^3 | 4 = 2^2 | 2 | 1 |
| Octal: Base-8 | 8^i | ... | 4096 = 8^4 | 512 = 8^3 | 64 = 8^2 | 8 | 1 |
| Hex: Base-16 | 16^i | ... | 65536 = 16^4 | 4096 = 16^3 | 256 = 16^2 | 16 | 1 |
| Base-n | n^i | ... | n^4 | n^3 | n^2 | n | 1 = n^0 |

Number bases used throughout time:
• Base 0: not counting at all.  (only representable number is zero)
• Base 1: counting on your fingers.  (inefficient use of fingers, though...)
• Base 2: binary.  Used by all known computers.
• Base 8: octal.  Popular back in 1970s, 1980s.  In C++, prefix the number with zero to make it octal, like "070" (==7*8 = 56).
• Base 10: decimal.  Matches number of human fingers.  In C++, numbers are decimal by default.
• Base 16: hexadecimal.  Fits nicely with bytes.  In C++, prefix the number with "0x" (zero-x) to make it hex, like "0xF3".
• Base 20: used by the Maya.  Matches number of human fingers and toes.
• Base 60: used by the Sumerians six thousand years ago.
| Decimal | Hex | Binary |
|---|---|---|
| 0 | 0 | 0 |
| 1 | 1 | 1 |
| 2 | 2 | 10 |
| 3 | 3 | 11 |
| 4 | 4 | 100 |
| 5 | 5 | 101 |
| 6 | 6 | 110 |
| 7 | 7 | 111 |
| 8 | 8 | 1000 |
| 9 | 9 | 1001 |
| 10 | A | 1010 |
| 11 | B | 1011 |
| 12 | C | 1100 |
| 13 | D | 1101 |
| 14 | E | 1110 |
| 15 | F | 1111 |
| 16 | 10 | 10000 |

Note that a single digit in base 16 corresponds to exactly 4 bits, since 16=2*2*2*2.  This means it's easy to convert from a binary number to a hex number: take groups of 4 bits and convert each group to a hex digit--or back again: take each hex digit and expand it out to the 4 bits it represents.  For example, 0xF036 is, in binary, 1111000000110110, because you can match up the place-values like this:
| Hex Place-Value | 16^3 | 16^2 | 16 | 1 = 16^0 |
|---|---|---|---|---|
| Hex Digit | F | 0 | 3 | 6 |
| Binary Digits | 1111 | 0000 | 0011 | 0110 |
| Binary Place-Values | 2^15 .. 2^12 | 2^11 .. 2^8 | 2^7 .. 2^4 | 2^3 .. 2^0 |
Converting 0xF036 (hex) to 1111000000110110 (binary)

Hex really is the only true universal in assembly and disassembly.  For example, here's some random disassembled code (produced using "objdump -drC -M intel /bin/ls" on a Linux machine):
```
 804a516:       80 fa 3f                cmp    dl,0x3f
 804a519:       0f 84 7c 01 00 00       je     804a69b <exit@plt+0xc2b>
```
Note that every single number is listed in hex--the addresses, on the left; the machine code, in the middle; and the constants in the assembly, on the right.  A binary file display tool is called a "hex dump".  A binary file editor is called a "hex editor".  That's how common hex is, so for the rest of the class to make sense, you've gotta learn it!

## Hex & Bitwise Operations

Remember that every hex digit represents four bits.  So if you shift a hex constant by four bits, it shifts by one entire hex digit:
```cpp
0xf0d<<4 == 0xf0d0
0xf0d>>4 == 0xf0
```

Bitwise operators make perfect sense working with hex digits, because they operate on the underlying bits of those digits:
```cpp
0xff0 & 0x0ff == 0x0f0
0xff0 | 0x0ff == 0xfff
0xff0 ^ 0x0ff == 0xf0f
```

You can use these bitwise operators to peel off the hex digits of a number, to print out stuff in hex:
```cpp
int v=1024+15;
const char *digitTable="0123456789abcdef";
for (int digit=7;digit>=0;digit--) {
	int d=(v>>(digit*4))&0xF; /* peel off one hex digit */
	std::cout<<digitTable[d];
}
std::cout<<std::endl;
return v;
```

(Try this in NetRun now!)

## Arithmetic In Binary, and Signed Numbers

We can do arithmetic in binary or hex by hand just like we do in decimal.  To add, line up the digits, add corresponding digits, and if the per-digit sum is greater than or equal to the base, carry to the next digit.  Easy, right?  To subtract, do the exact same thing, but "borrow" from the next digit if the per-digit difference is less than 0.

For example, in binary 01+01=10 because the first digit overflows, and the extra "1" is carried to the next digit.  Similarly, 1111+1=10000 because all the digits overflow in sequence.  In general, adding a "1" will cause carries to ripple up through ones, flipping them to zeros, until it finally reaches a zero, which it will flip to a one.

Addition in hex is exactly the same--it's a bit trickier to add the larger digits involved, but carries work exactly the same.  So 0x8+0x8=0x10, and 0xffff+0x1=0x10000.

Subtraction in binary seems really bizarre until you remember that 10-1=1 in binary--that is, "1" is the one-less-than-zero digit in binary, just like "9" is the corresponding one-below-zero digit in decimal.  So when you have to borrow, you just flip the zeros to ones until you hit a one.

Subtraction actually shows you how to represent negative numbers in binary.  Consider -1==0-1: you just keep on flipping zeros to ones forever, "borrowing" against the future that never comes (just like the US banking system!).  So -1 is represented (in principle) by an infinite number of ones, just like +1 is represented (in principle) by an infinite number of zeros followed by a one.  In practice, we only store a finite number (usually 32 or 64) of bits, so on a 32-bit machine we define 32 ones as -1.  The highest bit, then, can be thought of as a "sign bit"--negative numbers have that bit set, positive numbers have that bit clear.  Surprisingly, it makes sense to think of the sign bit in digit n as having value -2^n, instead of the value 2^n it has for unsigned numbers.  What's weirder is that addition and subtraction are exactly the same for signed or unsigned numbers--try it!  (All other operations are different, though: comparison, multiplication, division, etc.)

Signed versus unsigned numbers are clearer if you draw them out for a 3-bit machine:
| Bit Pattern | Unsigned Value | Signed Value |
|---|---|---|
| 000 | 0 | 0 |
| 001 | 1 | 1 |
| 010 | 2 | 2 |
| 011 | 3 | 3 |
| 100 | 4 | -4 |
| 101 | 5 | -3 |
| 110 | 6 | -2 |
| 111 | 7 | -1 |

Note that every major language but Java supports "unsigned" as well as "signed" integers explicitly.  In C/C++/C#, typecasting to an unsigned value does nothing to the bits, and is hence a zero-clock operation.  One way to speed up a "zero to n" bounds test on a signed value is to turn it into an unsigned value and just compare against n: "if ((x>=0)&&(x<n)) ..." and "if (((unsigned int)x)<n) ..." do exactly the same thing, because negative signed x's turn into really huge unsigned x's!

In hex, -1 (in binary, all ones) is written in hex as 0xFFffFFff (on a 32-bit machine).

Another way to think about -1 is that it's like "99999" on a car's odometer--if you drive one more mile, you'll be at "00000", so you're really at mile -1!