# Machine Code in x86

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

## Basics of Machine Code

The basic idea with machine code is to use binary bytes to represent a computation.  Different machines use different bytes, but Intel x86 machines use "0xc3" to represent the "ret" instruction, and "0xb8" to represent the "load a 32-bit constant into eax" instruction.

```
   0:   b8 05 00 00 00          mov    eax,0x5
   5:   c3                      ret
```

(Try this in NetRun now!)

"mov" is an instruction, encoded with the operation code or "opcode" 0xb8.  Since mov takes an argument, the next 4 bytes are the constant to move into eax, stored least significant byte first (little endian).

The opcode 0xb9 moves a constant into ecx.  0xba moves a constant into edx.
```
   0:   b8 05 00 00 00          mov    eax,0x5
   5:   b9 05 00 00 00          mov    ecx,0x5
   a:   ba 05 00 00 00          mov    edx,0x5
```

x86 register numbering is a bit bizarre:

| Number | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|---|
| Int Register | eax | ecx | edx | ebx | esp | ebp | esi | edi |

(Note that ebx is not in the order you'd expect!)
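The pattern in the listing above (0xb8 for eax, 0xb9 for ecx, 0xba for edx) is 0xb8 plus the register number, followed by the constant in little-endian byte order. Here is a minimal C++ sketch of that encoding. The names `Reg32` and `encode_mov_imm32` are invented here for illustration, and the code only builds the bytes without running them:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Register numbers from the table above (hypothetical enum name).
enum Reg32 : std::uint8_t { EAX = 0, ECX, EDX, EBX, ESP, EBP, ESI, EDI };

// Encode "mov r32, imm32": opcode 0xB8 + register number,
// then the 32-bit constant as 4 little-endian bytes.
std::array<std::uint8_t, 5> encode_mov_imm32(Reg32 reg, std::uint32_t value) {
    assert(reg < 8); // the register number must fit in 3 bits
    return {
        static_cast<std::uint8_t>(0xB8 + reg),
        static_cast<std::uint8_t>(value & 0xFF),
        static_cast<std::uint8_t>((value >> 8) & 0xFF),
        static_cast<std::uint8_t>((value >> 16) & 0xFF),
        static_cast<std::uint8_t>((value >> 24) & 0xFF),
    };
}
```

Calling `encode_mov_imm32(EAX, 5)` should give the same five bytes as the first line of the disassembly, `b8 05 00 00 00`.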

## Function Pointers

In assembly language, there's nothing special about function pointers: a function name is just a label for an address, and it is declared and used like any other pointer:

```
  call lilfunc
  ret

lilfunc:
  xor eax,eax
  ret
```

This means you can just as easily copy the address of a function to a register as any other value:

```
  mov rcx,lilfunc ; rcx = address of lilfunc
  call rcx
  ret
```

(Try this in NetRun now!)

In C++, the same concept exists, but you need the special ugly "function pointer" syntax:

```
int bar(void) {
	return 3;
}

int foo(void) {
	typedef int (*fnptr)(void); // fnptr: returns int, parameters void
	fnptr f=(fnptr)bar; // f points to bar
	return f(); // calls bar
}
```
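Since a function pointer is an ordinary value, it can also be stored in an array and chosen at run time. The sketch below is only an illustration: `zero`, `three`, `table`, and `call_entry` are hypothetical names, and the `using` alias is an alternative spelling of the same type as the typedef above.

```cpp
#include <cassert>
#include <cstddef>

int zero(void) { return 0; }
int three(void) { return 3; }

// Same type as: typedef int (*fnptr)(void);
using fnptr = int (*)(void);

// A table of function pointers; each entry is just an address.
const fnptr table[] = { zero, three };

int call_entry(std::size_t i) {
    assert(i < sizeof(table) / sizeof(table[0])); // stay inside the table
    return table[i](); // indirect call through the stored address
}
```

With this table, `call_entry(1)` calls `three` through the pointer, much like `call rcx` in the assembly version.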

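Because a pointer to a function is just a pointer to bytes, you can manually declare bytes of machine code, and then run them. This only works where the memory holding the bytes is executable; many current systems mark data pages non-executable by default.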

```
const unsigned char bytes[]={
	0x33,0xC0, // xor eax,eax
	0xc3  // ret
};

int foo(void) {
	typedef int (*fnptr)(void); // fnptr: returns int, parameters void
	fnptr f=(fnptr)bytes; // f points to bytes
	return f(); // run the bytes
}
```

(Try this in NetRun now!)

## x86 Register Sizes

x86 specifies register sizes using prefix bytes.  For example, the same "0xb8" instruction that loads a 32-bit constant into eax can be used with a "0x66" prefix to load a 16-bit constant, or a "0x48" REX prefix to load a 64-bit constant.

Here we're loading the same constant 0x12 into all the different sizes of the rax register (note that the 8-bit al version uses a separate opcode, 0xb0, rather than a prefix):

```
   0:   48 b8 12 00 00 00 00 00 00 00   mov    rax,0x12
   a:      b8 12 00 00 00               mov    eax,0x12
   f:   66 b8 12 00                     mov    ax,0x12
  13:      b0 12                        mov    al,0x12
  15:      c3                           ret
```
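The four encodings in that listing differ only in the prefix byte, the opcode, and the number of constant bytes. The following C++ sketch rebuilds them under those assumptions. The function name `encode_mov_a` is hypothetical, and it only handles the rax/eax/ax/al family shown here:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Encode "mov <rax|eax|ax|al>, value" for an operand size given in bits.
std::vector<std::uint8_t> encode_mov_a(int bits, std::uint64_t value) {
    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);
    std::vector<std::uint8_t> out;
    if (bits == 64) out.push_back(0x48);    // REX prefix: 64-bit operand
    if (bits == 16) out.push_back(0x66);    // size prefix: 16-bit operand
    out.push_back(bits == 8 ? 0xB0 : 0xB8); // 8-bit uses its own opcode
    for (int i = 0; i < bits / 8; ++i)      // constant, little endian
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
    return out;
}
```

For example, `encode_mov_a(16, 0x12)` should produce `66 b8 12 00`, matching the `mov ax,0x12` line.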

## x86 Instructions

Disassembling your assembly or compiled code shows you both the instructions and the machine code that implements them.

Not only are there hundreds of different x86 instructions, there can be dozens of different machine code encodings for a given instruction (see opcodes in numerical order).  Here are a few examples:

| asm | machine code | Description |
|---|---|---|
| add | 0x03 ModR/M | Add one 32-bit register to another. |
| mov | 0x8B ModR/M | Move one 32-bit register to another. |
| mov | 0xB8 DWORD | Move a 32-bit constant into register eax. |
| ret | 0xc3 | Returns from current function. |
| xor | 0x33 ModR/M | XOR one 32-bit register with another. |
| xor | 0x34 BYTE | XOR register al with this 8-bit constant. |

## Encoding memory and register operands

The byte that follows the opcode in most x86 instructions is called "ModR/M" in the Intel documentation, and it specifies what the source and destination are.  It's a bit field made of a selector called "mod" (which indicates whether r/m is treated as a plain register or a memory address), one register called "reg/opcode" (which is usually the destination register, and determines the column in the ModR/M table), and a final register called "r/m" (usually the source register, which selects the row of the ModR/M table).  These are stored in a single byte with this format:

| Field | Size | Meaning |
|---|---|---|
| mod | 2 bits | Selects memory or register access mode. 0: memory at register r/m. 1: memory at register r/m + byte offset. 2: memory at register r/m + 32-bit offset. 3: register r/m itself (not memory). |
| reg/opcode | 3 bits | Usually a destination register number. For some instructions, this is actually extra opcode bits. |
| r/m | 3 bits | Usually a source register number. Treated as a pointer for mod!=3, treated as an ordinary register for mod==3. If r/m==4, indicates the real memory source is a SIB byte. |
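
Packing the three fields into one byte takes two shifts and two ORs. The sketch below assumes the usual layout with mod in the top two bits. `make_modrm` and `encode_add_r32_r32` are names made up for this example, and the second function uses the 0x03 "add" opcode to build a register-to-register add:

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Pack the fields as: mod (2 bits) | reg/opcode (3 bits) | r/m (3 bits).
std::uint8_t make_modrm(unsigned mod, unsigned reg, unsigned rm) {
    assert(mod < 4 && reg < 8 && rm < 8);
    return static_cast<std::uint8_t>((mod << 6) | (reg << 3) | rm);
}

// "add dst, src" on 32-bit registers: opcode 0x03, then a ModR/M byte
// with mod=3 (registers only), reg=destination, r/m=source.
std::array<std::uint8_t, 2> encode_add_r32_r32(unsigned dst, unsigned src) {
    return { 0x03, make_modrm(3, dst, src) };
}
```

With eax as register 0 and ecx as register 1, `encode_add_r32_r32(0, 1)` should give `03 c1`, which is `add eax,ecx`.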

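This is the "ModR/M" table for the meaning of ModR/M byte values.  Each column is one r32 (/r) register in the reg field, each row is one effective address (mod and r/m), and each cell is the value of the ModR/M byte in hex:

| Effective address | mod | r/m | EAX (reg=000) | ECX (001) | EDX (010) | EBX (011) | ESP (100) | EBP (101) | ESI (110) | EDI (111) |
|---|---|---|---|---|---|---|---|---|---|---|
| [RAX] | 00 | 000 | 00 | 08 | 10 | 18 | 20 | 28 | 30 | 38 |
| [RCX] | 00 | 001 | 01 | 09 | 11 | 19 | 21 | 29 | 31 | 39 |
| [RDX] | 00 | 010 | 02 | 0A | 12 | 1A | 22 | 2A | 32 | 3A |
| [RBX] | 00 | 011 | 03 | 0B | 13 | 1B | 23 | 2B | 33 | 3B |
| [SIB] | 00 | 100 | 04 | 0C | 14 | 1C | 24 | 2C | 34 | 3C |
| [RIP + DWORD] | 00 | 101 | 05 | 0D | 15 | 1D | 25 | 2D | 35 | 3D |
| [RSI] | 00 | 110 | 06 | 0E | 16 | 1E | 26 | 2E | 36 | 3E |
| [RDI] | 00 | 111 | 07 | 0F | 17 | 1F | 27 | 2F | 37 | 3F |
| [RAX + BYTE] | 01 | 000 | 40 | 48 | 50 | 58 | 60 | 68 | 70 | 78 |
| [RCX + BYTE] | 01 | 001 | 41 | 49 | 51 | 59 | 61 | 69 | 71 | 79 |
| [RDX + BYTE] | 01 | 010 | 42 | 4A | 52 | 5A | 62 | 6A | 72 | 7A |
| [RBX + BYTE] | 01 | 011 | 43 | 4B | 53 | 5B | 63 | 6B | 73 | 7B |
| [SIB + BYTE] | 01 | 100 | 44 | 4C | 54 | 5C | 64 | 6C | 74 | 7C |
| [RBP + BYTE] | 01 | 101 | 45 | 4D | 55 | 5D | 65 | 6D | 75 | 7D |
| [RSI + BYTE] | 01 | 110 | 46 | 4E | 56 | 5E | 66 | 6E | 76 | 7E |
| [RDI + BYTE] | 01 | 111 | 47 | 4F | 57 | 5F | 67 | 6F | 77 | 7F |
| [RAX + DWORD] | 10 | 000 | 80 | 88 | 90 | 98 | A0 | A8 | B0 | B8 |
| [RCX + DWORD] | 10 | 001 | 81 | 89 | 91 | 99 | A1 | A9 | B1 | B9 |
| [RDX + DWORD] | 10 | 010 | 82 | 8A | 92 | 9A | A2 | AA | B2 | BA |
| [RBX + DWORD] | 10 | 011 | 83 | 8B | 93 | 9B | A3 | AB | B3 | BB |
| [SIB + DWORD] | 10 | 100 | 84 | 8C | 94 | 9C | A4 | AC | B4 | BC |
| [RBP + DWORD] | 10 | 101 | 85 | 8D | 95 | 9D | A5 | AD | B5 | BD |
| [RSI + DWORD] | 10 | 110 | 86 | 8E | 96 | 9E | A6 | AE | B6 | BE |
| [RDI + DWORD] | 10 | 111 | 87 | 8F | 97 | 9F | A7 | AF | B7 | BF |
| EAX | 11 | 000 | C0 | C8 | D0 | D8 | E0 | E8 | F0 | F8 |
| ECX | 11 | 001 | C1 | C9 | D1 | D9 | E1 | E9 | F1 | F9 |
| EDX | 11 | 010 | C2 | CA | D2 | DA | E2 | EA | F2 | FA |
| EBX | 11 | 011 | C3 | CB | D3 | DB | E3 | EB | F3 | FB |
| ESP | 11 | 100 | C4 | CC | D4 | DC | E4 | EC | F4 | FC |
| EBP | 11 | 101 | C5 | CD | D5 | DD | E5 | ED | F5 | FD |
| ESI | 11 | 110 | C6 | CE | D6 | DE | E6 | EE | F6 | FE |
| EDI | 11 | 111 | C7 | CF | D7 | DF | E7 | EF | F7 | FF |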

ModR/M is much easier to write in octal, not hex, since the 3-bit fields match exactly with octal digits.  You can write octal in C/C++ with a leading 0, so "0301" is octal for 011 000 001 (binary) or 0xC1 (hex).

For example, a ModR/M byte like:
• ModR/M==0312 (octal again) means mod==3 (so we're only touching registers, not memory), reg/opcode==1 (meaning destination is xmm1 or ecx, register 1), and r/m==2 (meaning the source register is xmm2 or edx, register 2)
• ModR/M==0007 (in octal) is composed of mod==0 (so accessing memory), reg/opcode==0 (meaning the normal-destination register is eax, register 0), and r/m==7 (meaning the source is memory pointed to by register 7: edi, or rdi in 64-bit mode).
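
The two examples above can be checked mechanically by splitting a byte back into its fields. This is a small C++ sketch of that decoding. The `ModRM` struct, `split_modrm`, and `reg32_name` are hypothetical names, and the register names follow the numbering table earlier in these notes:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

struct ModRM { unsigned mod, reg, rm; };

// Split a ModR/M byte into its 2-bit, 3-bit and 3-bit fields.
ModRM split_modrm(std::uint8_t byte) {
    return { (byte >> 6) & 3u, (byte >> 3) & 7u, byte & 7u };
}

// Name of the 32-bit integer register with this number.
std::string reg32_name(unsigned n) {
    static const char* const names[8] = {
        "eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
    };
    assert(n < 8);
    return names[n];
}
```

Writing the input in octal, as in `split_modrm(0312)`, makes the three fields readable directly off the digits: mod 3, reg 1, r/m 2.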