Signed vs Unsigned Numbers

CS 301 Lecture, Dr. Lawlor

We can do arithmetic in binary or hex by hand just like we do in decimal. To add, line up the digits, add corresponding digits, and if the per-digit sum is greater than the base, carry to the next digit. Easy, right? To subtract, do the exact same thing, but "borrow" from the next digit if the per-digit difference is less than 0.

For example, in binary 01+01=10 because the first digit overflows, and the extra "1" is carried to the next digit. Similarly, 1111+1=10000 because all the digits overflow in sequence. In general, adding a "1" will cause carries to ripple up through ones, flipping them to zeros, until it finally reaches a zero, which it will flip to a one.

Addition in hex is exactly the same--it's a bit tricker to add the larger digits involved, but carries work exactly the same. So 0x8+0x8=0x10, and 0xffff+0x1=0x10000.

Subtraction in binary seems really bizarre until you remember than 10-1=1 in binary--that is, "1" is the one-less-than-zero digit in binary; just like "9" is the corresponding one-below-zero digit in decimal. So when you have to borrow, you just flip the zeros to ones until you hit a one.

Subtraction actually shows you how to represent negative numbers in binary. Consider -1==0-1: you just keep on flipping zeros to ones forever, "borrowing" against the future that never comes (just like a credit card!). So -1 is represented (in principle) by an infinite number of ones, just like +1 is represented (in principle) by an infinite number of zeros followed by a one. In practice, we only store a finite number (usually 32 or 64) of bits, so on a 32-bit machine we define 32 ones as -1. The highest bit, then, can be thought of as a "sign bit"--negative numbers have that bit set, positive numbers have that bit clear. Suprisingly, it make sense to think of the sign bit in digit n as having value -2ⁿ, instead of value 2ⁿ for unsigned numbers. What's weirder is that addition and subtraction are exactly the same for signed or unsigned numbers--try it! (All other operations are different for signed and unsigned, though: comparison, multiplication, division, etc.)

Signed versus unsigned numbers are clearer if you draw them out for a 3-bit machine:

Bit Pattern	Unsigned Value	Signed Value
000	0	0
001	1	1
010	2	2
011	3	3
100	4	-4
101	5	-3
110	6	-2
111	7	-1

Note that everything but Java supports "signed" as well as "unsigned" integers explicitly. In C/C++/C#, typecasting to an unsigned value does nothing to the bits, and is hence a zero-clock operation.

Wraparound

You can watch the positive-to-negative wraparound happen pretty quickly with 8-bit signed characters:

signed char i=100;
while (i!=-100) {
	cout<<(int)i<<"\n";
	i++;
}
return i;

(Try this in NetRun now!)

This prints out 100, 101, ... 126, 127, -128, -127, ... -100.

Here's the corresponding transition for "signed short", a 16-bit signed value:

signed short i=(1<<15)-10;
while (i!=-((1<<15)-10)) {
	cout<<(int)i<<"\n";
	i++;
}
return i;

(Try this in NetRun now!)

The biggest positive signed short is 32767, adding one gets you to -32768.

And finally for "signed int", which is just plain int:

signed int i=(1<<31)-10;
while (i!=-((1<<31)-10)) {
	cout<<(int)i<<"\n";
	i++;
}
return i;

(Try this in NetRun now!)

Again, the biggest signed int is 2147483647, after which we wrap around to -2147483648.

This has odd consequences. For example, surprisingly this loop is NOT an infinite loop, because we can count way upwards to billions, wrap around to negative, and then stop:

int i,n=10;
for (i=n-1;i>=0;i++) {
	// .. do something ...
}
return i;

(Try this in NetRun now!)

Another example: if we have an "unsigned int n" with the value 3 billion, this loop will run forever:

int i;
unsigned int n=3000000000u;
for (i=0;i<n;i++) {
	// .. do something ...
}
return i;

(Try this in NetRun now!)

The trouble is that we increment all the way up to 1<<31, wrap around to a negative value, increment upward again to zero, and repeat.

This is why the compiler warns you about "comparison between signed and unsigned": for certain values, it can be an infinite loop!

Assembly Signed and Unsigned

Surprisingly, there *is* no signed or unsigned add or subtract--the bits do the same thing in both cases, so they're actually the same exact operation and assembly instruction for both signed and unsigned.

Comparisons come in signed and unsigned flavors, because when you're comparing positive and negative signed numbers, the sign bit counts for a negative value. On x86, you use the same "cmp" instruction, but "jl" and "jg" (jump if Less, or Greater) do a signed comparison, while "ja" and "jb" (jump if Above, or Below) do an unsigned comparison. For example, comparing the bit pattern X=all-zeros to Y=all-ones, as an unsigned comparison, X<Y, because Y is really big; while as a signed comparison X>Y, because Y is negative one.

So "jump if above" (unsigned comparison) does not jump, returning 0xB0B:

mov eax,0x0        ; all-zeros
mov ecx,0xffFFffFF ; all-ones
cmp eax,ecx
ja did_jump
mov eax,0xb0b
ret

did_jump:
mov eax,0xd00d
ret

(Try this in NetRun now!)

While "jump if greater" (signed comparison) actually jumps, returning 0xD00D:

mov eax,0x0        ; all-zeros
mov ecx,0xffFFffFF ; all-ones
cmp eax,ecx
jg did_jump
mov eax,0xb0b
ret

did_jump:
mov eax,0xd00d
ret

(Try this in NetRun now!)

You can actually take advantage of this signed/unsigned comparison difference. One way to speed up a "zero to n" bounds test on a signed value is to turn it into an unsigned value and just compare against n--that is, "if ((x>=0)&&(x<n)) ..." is the same as "if (((unsigned int)x)<n) ...", because negative signed x's will turn into really huge unsigned x's!

One other place where signed and unsigned differ is in bit shifting. Left shifts ("shl" in x86) overwrite the sign bit, which is fine until you overflow. But right shifts actually have to fill in the sign bit--what should they fill in?

For unsigned values, the incoming bits are always set to zero. In x86, the "shr" instruction (SHift Right) does this.
For signed values, the incoming bits are set to copies of the sign bit. This means -2000000000>>2 stays negative even after the shift. In x86, the "sar" (Shift Arithmetic Right) instruction does this.

If you use the appropriate flavor of right-shift, then right-shift acts like divide-with-round-down. This means -1/8 == -1, since that's really what happens when you round down to the nearest integer. I actually prefer this to the C/C++/Java/C# standard arithmetic, where -1/8 == 0, because those languages round towards zero instead of always rounding down!

mov eax,-2000000000
sar eax,1  ; signed right-shift
ret

(Try this in NetRun now!)