Like C++ variables, registers are actually available in several sizes:
Curiously, you can write a 64-bit value into rax, then read off the low 32 bits from eax, or the low 16 bits from ax, or the low 8 bits from al--it's just one register, but they keep on extending it!
For example,
mov rcx,0xf00d00d2beefc03; load a big 64-bit constant
mov eax,ecx; pull out low 32 bits (0x2beefc03)
ret
Here's the full list of x86 registers. The full 64-bit names are in the "64-bit long" column. "Scratch" registers are ones any function is allowed to overwrite, so you can use them for anything you want without asking anybody. "Preserved" registers have to be put back to their original value ("saved" and restored) if you use them.
Name | Notes | Type | 64-bit long | 32-bit int | 16-bit short | 8-bit char |
rax | Values are returned from functions in this register. | scratch | rax | eax | ax | ah and al |
rcx | Typical scratch register. Some instructions also use it as a counter. | scratch | rcx | ecx | cx | ch and cl |
rdx | Scratch register. | scratch | rdx | edx | dx | dh and dl |
rbx | Preserved register: don't use it without saving it! | preserved | rbx | ebx | bx | bh and bl |
rsp | The stack pointer. Points to the top of the stack (details coming soon!) | preserved | rsp | esp | sp | spl |
rbp | Preserved register. Sometimes used to store the old value of the stack pointer, or the "base". | preserved | rbp | ebp | bp | bpl |
rsi | Scratch register. Also used to pass function argument #2 in 64-bit Linux | scratch | rsi | esi | si | sil |
rdi | Scratch register. Function argument #1 in 64-bit Linux | scratch | rdi | edi | di | dil |
r8 | Scratch register. These were added in 64-bit mode, so they have numbers, not names. | scratch | r8 | r8d | r8w | r8b |
r9 | Scratch register. | scratch | r9 | r9d | r9w | r9b |
r10 | Scratch register. | scratch | r10 | r10d | r10w | r10b |
r11 | Scratch register. | scratch | r11 | r11d | r11w | r11b |
r12 | Preserved register. You can use it, but you need to save and restore it. | preserved | r12 | r12d | r12w | r12b |
r13 | Preserved register. | preserved | r13 | r13d | r13w | r13b |
r14 | Preserved register. | preserved | r14 | r14d | r14w | r14b |
r15 | Preserved register. | preserved | r15 | r15d | r15w | r15b |
Destination | Source 64 bit rcx | Source 32 bit ecx | Source 16 bit cx | Source 8 bit cl | Notes |
64 bit rax | mov rax,rcx | movsxd rax,ecx | movsx rax,cx | movsx rax,cl | Writes to whole register |
32 bit eax | mov eax,ecx | mov eax,ecx | movsx eax,cx | movsx eax,cl | Top half of destination gets zeroed |
16 bit ax | mov ax,cx | mov ax,cx | mov ax,cx | movsx ax,cl | Only affects low 16 bits, rest unchanged. |
8 bit al | mov al,cl | mov al,cl | mov al,cl | mov al,cl | Only affects low 8 bits, rest unchanged. |
int big=1024*1024*1024;
return big*4;
On my machine, "int" is 32 bits, which can only hold about +-2 billion, so the 4 billion result doesn't fit and this actually returns 0?!
Program complete. Return 0 (0x0)
You can extract the value of each bit. For example:
int value=1; /* value to test, starts at first (lowest) bit */
for (int bit=0;bit<100;bit++) {
	std::cout<<"at bit "<<bit<<" the value is "<<value<<"\n";
	value=value+value; /* moves over by one bit */
	if (value==0) break;
}
return 0;
Because "int" currently has 32 bits, if you start at one, and add a variable to itself 32 times, the one overflows and is lost completely.
In assembly, there's a handy instruction "jo" (jump if overflow) to check for overflow from the previous instruction. The C++ compiler doesn't bother to use jo, though!
mov edi,1 ; loop variable
mov eax,0 ; counter
start:
	add eax,1 ; increment bit counter
	add edi,edi ; add variable to itself
	jo noes ; check for overflow in the above add
	cmp edi,0
	jne start
ret
noes: ; called for overflow
	mov eax,999
	ret
Notice the above program returns 999 on overflow, which somebody else will need to check for. (Responding correctly to overflow is actually quite difficult--see, e.g., Ariane 5 explosion, caused by poor handling of a detected overflow. Ironically, ignoring the overflow would have caused no problems!)
If you watch closely right before overflow, you see something funny happen:
signed char value=1; /* value to test, starts at first (lowest) bit */
for (int bit=0;bit<100;bit++) {
	std::cout<<"at bit "<<bit<<" the value is "<<(long)value<<"\n";
	value=value+value; /* moves over by one bit (value=value<<1 would work too) */
	if (value==0) break;
}
return 0;
This prints out:
at bit 0 the value is 1
at bit 1 the value is 2
at bit 2 the value is 4
at bit 3 the value is 8
at bit 4 the value is 16
at bit 5 the value is 32
at bit 6 the value is 64
at bit 7 the value is -128
Program complete. Return 0 (0x0)
Wait, the last bit's value is -128? Yes, it really is!
This negative high bit is called the "sign bit", and it has a negative value in two's complement signed numbers. This means to represent -1, for example, you set not only the high bit, but all the other bits as well: in unsigned, this is the largest possible value. The reason binary 11111111 represents -1 is the same reason you might choose 9999 to represent -1 on a 4-digit odometer: if you add one, you wrap around and hit zero.
A very cool thing about two's complement is addition is the same operation whether the numbers are signed or unsigned--we just interpret the result differently. Subtraction is also identical for signed and unsigned. Register names are identical in assembly for signed and unsigned. However, when you change register sizes using an instruction like "movsxd rax,eax", when you check for overflow, when you compare numbers, multiply or divide, or shift bits, you need to know if the number is signed (has a sign bit) or unsigned (no sign bit, no negative numbers).
Signed | Unsigned | Language |
int | unsigned int | C++, int is signed by default. |
signed char | unsigned char | C++, char may be signed or unsigned. |
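movsx, movsxd | movzx | Assembly, sign extend or zero extend to change register sizes. (There is no "movzxd": a plain 32-bit mov already zeroes the top half of the 64-bit register.) |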
jo | jc | Assembly, overflow is calculated for signed values, carry for unsigned values. |
jg | ja | Assembly, jump greater is signed, jump above is unsigned. |
jl | jb | Assembly, jump less signed, jump below unsigned. |
imul | mul | Assembly, imul is signed (and more modern), mul is for unsigned (and ancient and horrible!). idiv/div work similarly. |