Signed vs Unsigned Numbers
CS 301 Lecture, Dr. Lawlor
We can do arithmetic in binary or hex by hand just like we do in
decimal. To add, line up the digits, add corresponding digits, and if
the per-digit sum is greater than the base, carry to the next digit. Easy,
right? To subtract, do the exact same thing, but "borrow" from
the next digit if the per-digit difference is less than 0.
For example, in binary 01+01=10 because the first digit overflows, and
the extra "1" is carried to the next digit. Similarly,
1111+1=10000 because all the digits overflow in sequence. In
general, adding a "1" will cause carries to ripple up through ones,
flipping them to zeros, until it finally reaches a zero, which it will
flip to a one.
Addition in hex is exactly the same--it's a bit tricker to add the
larger digits involved, but carries work exactly the same.
So 0x8+0x8=0x10, and 0xffff+0x1=0x10000.
Subtraction in binary seems really bizarre until you remember than
10-1=1 in binary--that is, "1" is the one-less-than-zero digit in
binary; just like "9" is the corresponding one-below-zero digit in decimal. So when
you have to borrow, you just flip the zeros to ones until you hit a
one.
Subtraction actually shows you how to represent negative numbers in
binary. Consider -1==0-1: you just keep on flipping zeros to ones
forever, "borrowing" against the future that never comes (just like a credit card!). So -1
is represented (in principle) by an infinite number of ones, just like
+1 is represented (in principle) by an infinite number of zeros
followed by a one. In practice, we only store a finite number
(usually 32 or 64) of bits, so on a 32-bit machine we define 32 ones as
-1. The highest bit, then, can be thought of as a "sign
bit"--negative numbers have that bit set, positive numbers have that
bit clear. Suprisingly, it make sense to think of the sign bit in
digit n as having value -2n, instead of value 2n for unsigned numbers. What's weirder is that addition and subtraction are exactly the same
for signed or unsigned numbers--try it! (All other operations are
different for signed and unsigned, though: comparison, multiplication, division, etc.)
Signed versus unsigned numbers are clearer if you draw them out for a 3-bit machine:
Bit Pattern
|
Unsigned Value
|
Signed Value
|
000
|
0
|
0
|
001
|
1
|
1
|
010
|
2
|
2
|
011
|
3
|
3
|
100
|
4
|
-4
|
101
|
5
|
-3
|
110
|
6
|
-2
|
111
|
7
|
-1
|
Note that everything but Java supports "signed" as well as "unsigned"
integers explicitly. In C/C++/C#, typecasting to an unsigned
value does nothing to the bits, and is hence a zero-clock
operation.
Wraparound
You can watch the positive-to-negative wraparound happen pretty quickly with 8-bit signed characters:
signed char i=100;
while (i!=-100) {
cout<<(int)i<<"\n";
i++;
}
return i;
(Try this in NetRun now!)
This prints out 100, 101, ... 126, 127, -128, -127, ... -100.
Here's the corresponding transition for "signed short", a 16-bit signed value:
signed short i=(1<<15)-10;
while (i!=-((1<<15)-10)) {
cout<<(int)i<<"\n";
i++;
}
return i;
(Try this in NetRun now!)
The biggest positive signed short is 32767, adding one gets you to -32768.
And finally for "signed int", which is just plain int:
signed int i=(1<<31)-10;
while (i!=-((1<<31)-10)) {
cout<<(int)i<<"\n";
i++;
}
return i;
(Try this in NetRun now!)
Again, the biggest signed int is 2147483647, after which we wrap around to -2147483648.
This has odd consequences. For example, surprisingly this loop is
NOT an infinite loop, because we can count way upwards to billions,
wrap around to negative, and then stop:
int i,n=10;
for (i=n-1;i>=0;i++) {
// .. do something ...
}
return i;
(Try this in NetRun now!)
Another example: if we have an "unsigned int n" with the value 3 billion, this loop will run forever:
int i;
unsigned int n=3000000000u;
for (i=0;i<n;i++) {
// .. do something ...
}
return i;
(Try this in NetRun now!)
The trouble is that we increment all the way up to 1<<31, wrap
around to a negative value, increment upward again to zero, and repeat.
This is why the compiler warns you about "comparison between signed and
unsigned": for certain values, it can be an infinite loop!
Assembly Signed and Unsigned
Surprisingly, there *is* no signed or unsigned add or subtract--the
bits do the same thing in both cases, so they're actually the same
exact operation and assembly instruction for both signed and unsigned.
Comparisons come in signed and unsigned flavors, because when you're
comparing positive and negative signed numbers, the sign bit counts for
a negative value. On x86, you use the same "cmp" instruction, but
"jl" and "jg" (jump if Less, or Greater) do a signed comparison, while
"ja" and "jb" (jump if Above, or Below) do an unsigned
comparison. For example, comparing the bit pattern X=all-zeros to
Y=all-ones, as an unsigned comparison, X<Y, because Y is really big;
while as a signed comparison X>Y, because Y is negative one.
So "jump if above" (unsigned comparison) does not jump, returning 0xB0B:
mov eax,0x0 ; all-zeros
mov ecx,0xffFFffFF ; all-ones
cmp eax,ecx
ja did_jump
mov eax,0xb0b
ret
did_jump:
mov eax,0xd00d
ret
(Try this in NetRun now!)
While "jump if greater" (signed comparison) actually jumps, returning 0xD00D:
mov eax,0x0 ; all-zeros
mov ecx,0xffFFffFF ; all-ones
cmp eax,ecx
jg did_jump
mov eax,0xb0b
ret
did_jump:
mov eax,0xd00d
ret
(Try this in NetRun now!)
You can actually take advantage of this signed/unsigned comparison
difference. One way to speed up a "zero to n" bounds test on a
signed value is to turn it into an unsigned value and just compare
against n--that is, "if ((x>=0)&&(x<n)) ..." is the same
as "if (((unsigned int)x)<n) ...", because
negative signed x's will turn into really huge unsigned x's!
One other place where signed and unsigned differ is in bit
shifting. Left shifts ("shl" in x86) overwrite the sign bit,
which is fine until you overflow. But right shifts actually have
to fill in the sign bit--what should they fill in?
- For unsigned values, the incoming bits are always set to zero. In x86, the "shr" instruction (SHift Right) does this.
- For signed values, the incoming bits are set to copies of the
sign bit. This means -2000000000>>2 stays negative even
after the shift. In x86, the "sar" (Shift Arithmetic Right)
instruction does this.
If you use the appropriate flavor of right-shift, then right-shift
acts like divide-with-round-down. This means -1/8 == -1, since
that's really what happens when you round down to the nearest
integer. I actually prefer this to the C/C++/Java/C# standard
arithmetic, where -1/8 == 0, because those languages round towards zero
instead of always rounding down!
mov eax,-2000000000
sar eax,1 ; signed right-shift
ret
(Try this in NetRun now!)