Registers in x86 Assembly

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

Like C++ variables, registers are actually available in several sizes:

Curiously, you can write a 64-bit value into rax, then read off the low 32 bits from eax, or the low 16 bitx from ax, or the low 8 bits from al--it's just one register, but they keep on extending it!

rax: 64-bit
eax: 32-bit
ax: 16-bit
ah al

For example,

mov rcx,0xf00d00d2beefc03; load a big 64-bit constant
mov eax,ecx; pull out low 32 bits (0x2beefc03)

(Try this in NetRun now!)

Here's the full list of x86 registers.  The 64 bit registers are shown in red.  "Scratch" registers any function is allowed to overwrite, and use for anything you want without asking anybody.  "Preserved" registers have to be put back ("save" the register) if you use them.

Notes Type
Values are returned from functions in this register. 
rax eax ax ah and al
Typical scratch register.  Some instructions also use it as a counter. scratch
rcx ecx cx ch and cl
Scratch register. scratch
rdx edx dx dh and dl
Preserved register: don't use it without saving it! preserved
rbx ebx bx bh and bl
The stack pointer.  Points to the top of the stack (details coming soon!)
preserved rsp esp sp spl
Preserved register.  Sometimes used to store the old value of the stack pointer, or the "base". preserved rbp ebp bp bpl
Scratch register.  Also used to pass function argument #2 in 64-bit Linux scratch rsi esi si sil
Scratch register.  Function argument #1 in 64-bit Linux scratch rdi edi di dil
Scratch register.  These were added in 64-bit mode, so they have numbers, not names.
scratch r8 r8d r8w r8b
Scratch register. scratch r9 r9d r9w r9b
Scratch register. scratch r10 r10d r10w r10b
Scratch register. scratch r11 r11d r11w r11b
Preserved register.  You can use it, but you need to save and restore it.
preserved r12 r12d r12w r12b
Preserved register. preserved r13 r13d r13w r13b
Preserved register. preserved r14 r14d r14w r14b
Preserved register. preserved r15 r15d r15w r15b

You can convert values between different register sizes using different instructions:

Source Size

64 bit rcx
32 bit ecx
16 bit cx
8 bit cl
64 bit rax
mov rax,rcx
movsxd rax,ecx
movsx rax,cx
movsx rax,cl
Writes to whole register
32 bit eax
mov eax,ecx
mov eax,ecx
movsx eax,cx
movsx eax,cl
Top half of destination gets zeroed
16 bit ax
mov ax,cx
mov ax,cx
mov ax,cx
movsx ax,cl
Only affects low 16 bits, rest unchanged.
8 bit al
mov al,cl
mov al,cl mov al,cl mov al,cl
Only affects low 8 bits, rest unchanged.


The fact is, variables on a computer only have so many bits.  If the value gets bigger than can fit in those bits, the extra bits first go negative and then "overflow".  By default they're then ignored completely.

int big=1024*1024*1024;
return big*4;

(Try this in NetRun now!)

On my machine, "int" is 32 bits, which is +-2 billion in binary, so this actually returns 0?!
	Program complete.  Return 0 (0x0)

You can extract the value of each bit.  For example:

int value=1; /* value to test, starts at first (lowest) bit */
for (int bit=0;bit<100;bit++) {
	std::cout<<"at bit "<<bit<<" the value is "<<value<<"\n";
	value=value+value; /* moves over by one bit */
	if (value==0) break;
return 0;

(Try this in NetRun now!)

Because "int" currently has 32 bits, if you start at one, and add a variable to itself 32 times, the one overflows and is lost completely. 

In assembly, there's a handy instruction "jo" (jump if overflow) to check for overflow from the previous instruction.  The C++ compiler doesn't bother to use jo, though!

mov edi,1 ; loop variable
mov eax,0 ; counter

	add eax,1 ; increment bit counter

	add edi,edi ; add variable to itself
	jo noes ; check for overflow in the above add

	cmp edi,0
	jne start


noes: ; called for overflow
	mov eax,999

(Try this in NetRun now!)

Notice the above program returns 999 on overflow, which somebody else will need to check for.  (Responding correctly to overflow is actually quite difficult--see, e.g., Ariane 5 explosion, caused by poor handling of a detected overflow.  Ironically, ignoring the overflow would have caused no problems!)

Signed versus Unsigned Numbers

If you watch closely right before overflow, you see something funny happen:

signed char value=1; /* value to test, starts at first (lowest) bit */
for (int bit=0;bit<100;bit++) {
	std::cout<<"at bit "<<bit<<" the value is "<<(long)value<<"\n";
	value=value+value; /* moves over by one bit (value=value<<1 would work too) */
	if (value==0) break;
return 0;

(Try this in NetRun now!)

This prints out:

at bit 0 the value is 1
at bit 1 the value is 2
at bit 2 the value is 4
at bit 3 the value is 8
at bit 4 the value is 16
at bit 5 the value is 32
at bit 6 the value is 64
at bit 7 the value is -128 
Program complete.  Return 0 (0x0)

Wait, the last bit's value is -128?  Yes, it really is!

This negative high bit is called the "sign bit", and it has a negative value in two's complement signed numbers.  This means to represent -1, for example, you set not only the high bit, but all the other bits as well: in unsigned, this is the largest possible value.  The reason binary 11111111 represents -1 is the same reason you might choose 9999 to represent -1 on a 4-digit odometer: if you add one, you wrap around and hit zero.

A very cool thing about two's complement is addition is the same operation whether the numbers are signed or unsigned--we just interpret the result differently.  Subtraction is also identical for signed and unsigned.  Register names are identical in assembly for signed and unsigned.  However, when you change register sizes using an instruction like "movsxd rax,eax", when you check for overflow, when you compare numbers, multiply or divide, or shift bits, you need to know if the number is signed (has a sign bit) or unsigned (no sign bit, no negative numbers).

Signed Unsigned Language
int unsigned int C++, int is signed by default.
signed char unsigned char C++, char may be signed or unsigned.
movsxd movzxd Assembly, sign extend or zero extend to change register sizes.
jo jc Assembly, overflow is calculated for signed values, carry for unsigned values.
jg ja Assembly, jump greater is signed, jump above is unsigned.
jl jb Assembly, jump less signed, jump below unsigned.
imul mul Assembly, imul is signed (and more modern), mul is for unsigned (and ancient and horrible!). idiv/div work similarly.