The basic idea with machine code is to use binary bytes to represent a computation. Different machines use different bytes, but Intel x86 machines use "0xc3" to represent the "ret" instruction, and "0xb8" to represent the "load a 32-bit constant into eax" instruction.
    0:  b8 05 00 00 00    mov eax,0x5
    5:  c3                ret

"mov" is an instruction, encoded with the operation code or "opcode" 0xb8. Since mov takes an argument, the next 4 bytes are the constant to move into eax.
(Try this in NetRun now!)
    0:  b8 05 00 00 00    mov eax,0x5
    5:  b9 05 00 00 00    mov ecx,0x5
    a:  ba 05 00 00 00    mov edx,0x5

x86 register numbering is a bit bizarre:
| Number | Integer register |
|---|---|
| 0 | eax |
| 1 | ecx |
| 2 | edx |
| 3 | ebx |
| 4 | esp |
| 5 | ebp |
| 6 | esi |
| 7 | edi |
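The opcodes in the listing above (b8, b9, ba for eax, ecx, edx) suggest the pattern: the opcode byte is 0xb8 plus the register number, followed by the 32-bit constant with its low byte first. Here is a minimal C++ sketch of that rule. The function name `encode_mov_imm32` is made up for illustration and is not part of any library.

```cpp
#include <array>
#include <cstdint>

// Hypothetical helper: encode "mov r32, imm32" as (0xB8 + register number)
// followed by the constant in little-endian byte order.
// reg is the register number from the table: 0=eax, 1=ecx, 2=edx, 3=ebx, ...
std::array<std::uint8_t, 5> encode_mov_imm32(unsigned reg, std::uint32_t value) {
    std::array<std::uint8_t, 5> out{};
    out[0] = static_cast<std::uint8_t>(0xB8 + (reg & 7u)); // opcode + register number
    for (int i = 0; i < 4; ++i) {
        out[1 + i] = static_cast<std::uint8_t>(value >> (8 * i)); // low byte first
    }
    return out;
}
```

Calling `encode_mov_imm32(1, 5)` should give the bytes b9 05 00 00 00, matching the "mov ecx,0x5" line of the listing.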
| asm | machine code | Description |
|---|---|---|
| add | 0x03 ModR/M | Add one 32-bit register to another. |
| mov | 0x8B ModR/M | Move one 32-bit register to another. |
| mov | 0xB8 DWORD | Move a 32-bit constant into register eax. |
| ret | 0xc3 | Returns from current function. |
| xor | 0x33 ModR/M | XOR one 32-bit register with another. |
| xor | 0x34 BYTE | XOR register al with this 8-bit constant. |
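To see how a table like this gets read back into assembly, here is a rough C++ sketch of a tiny disassembler. It is an illustration under stated assumptions, not a real tool: the name `disassemble` is hypothetical, it handles only the opcodes that need no ModR/M byte (0xB8 plus a register number, 0x34, and 0xC3), and it prints constants in decimal.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical mini-disassembler for three kinds of instruction only:
//   0xB8+r imm32 (mov r32, constant), 0x34 imm8 (xor al, constant), 0xC3 (ret).
std::vector<std::string> disassemble(const std::vector<std::uint8_t>& code) {
    static const char* const regs[8] = {"eax", "ecx", "edx", "ebx",
                                        "esp", "ebp", "esi", "edi"};
    std::vector<std::string> out;
    std::size_t i = 0;
    while (i < code.size()) {
        std::uint8_t op = code[i];
        if (op >= 0xB8 && op <= 0xBF && i + 4 < code.size()) {
            std::uint32_t imm = 0; // rebuild the constant, low byte first
            for (int b = 0; b < 4; ++b)
                imm |= static_cast<std::uint32_t>(code[i + 1 + b]) << (8 * b);
            out.push_back(std::string("mov ") + regs[op - 0xB8] + "," +
                          std::to_string(imm));
            i += 5; // opcode + 4 constant bytes
        } else if (op == 0x34 && i + 1 < code.size()) {
            out.push_back("xor al," +
                          std::to_string(static_cast<unsigned>(code[i + 1])));
            i += 2; // opcode + 1 constant byte
        } else if (op == 0xC3) {
            out.push_back("ret");
            i += 1;
        } else {
            out.push_back("(unknown)"); // anything else: give up
            break;
        }
    }
    return out;
}
```

Feeding it the bytes b8 05 00 00 00 c3 should produce the two lines "mov eax,5" and "ret".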
    call lilfunc
    ret

    lilfunc:
        xor eax,eax
        ret
A function's name, like lilfunc, is just a label for the address where its machine code starts. This means you
can just as easily copy the address of a function to a register as
any other value:
    mov rcx,lilfunc ; rcx = address of lilfunc
    call rcx
    ret

In C++, the same concept exists, but you need the special ugly "function pointer" syntax.
    int bar(void) { return 3; }

    int foo(void) {
        typedef int (*fnptr)(void); // fnptr: returns int, parameters void
        fnptr f=(fnptr)bar; // f points to bar
        return f(); // calls bar
    }
Because a
pointer to a function is just a pointer to bytes, you can manually
declare bytes of machine code, and then run them:
    const unsigned char bytes[]={
        0x33,0xC0, // xor eax,eax
        0xc3 // ret
    };

    int foo(void) {
        typedef int (*fnptr)(void); // fnptr: returns int, parameters void
        fnptr f=(fnptr)bytes; // f points to bytes
        return f(); // run the bytes
    }
Many advanced computer security attacks use "shellcode", a little chunk of machine code normally written as a hex escaped C/C++ string (and sent to the target as a string):
    const char *bytes="\x33\xc0\xc3";

    int foo(void) {
        typedef int (*fnptr)(void); // fnptr: returns int, parameters void
        fnptr f=(fnptr)bytes; // f points to bytes
        return f(); // run the bytes
    }
The machine code doesn't need to be stored in an array. Since this long int has the same bytes (you read the bytes from the little end, right to left), you can execute a *long* like a *function*:
    const long bytes=0xc3c033;

    int foo(void) {
        typedef int (*fnptr)(void); // fnptr: returns int, parameters void
        fnptr f=(fnptr)&bytes; // f points to bytes
        return f(); // run the bytes
    }
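To check the "read from the little end" claim without running anything, this small C++ sketch splits a value into bytes, least significant first. It uses shifts rather than peeking at memory, so it gives the same answer on any machine. The name `little_endian_bytes` is hypothetical, and it assumes count is at most 8.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: split a value into `count` bytes (count <= 8),
// least significant byte first. On a little-endian machine such as x86,
// this is the order the bytes occupy in memory.
std::vector<std::uint8_t> little_endian_bytes(std::uint64_t value, int count) {
    std::vector<std::uint8_t> bytes;
    for (int i = 0; i < count && i < 8; ++i) {
        bytes.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
    }
    return bytes;
}
```

For 0xc3c033 the first three bytes come out as 33 c0 c3, which is "xor eax,eax" followed by "ret".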
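One complication: as a security feature, the machine will only execute memory that is marked executable. Read-write memory, like section .data or the stack, is not executable by default, and the machine will crash before running code from there unless you specifically ask the OS to make the memory executable. The examples above rely on read-only (const) data being executable, which depends on your compiler and linker: some newer toolchains keep read-only data in non-executable memory too, in which case these examples will also crash.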
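One way to "ask the OS" is sketched below in C++, assuming a POSIX system (for `mmap` and `mprotect`) on an x86 processor. The helper name `run_machine_code` is made up for illustration. On any other platform the sketch does nothing and reports failure, and an OS with a strict security policy may refuse the request.

```cpp
#include <cstddef>
#include <cstring>

#if (defined(__unix__) || defined(__APPLE__)) && \
    (defined(__x86_64__) || defined(__i386__))
#include <sys/mman.h>
#define CAN_RUN_X86_BYTES 1
#else
#define CAN_RUN_X86_BYTES 0
#endif

// Hypothetical helper: copy machine code into fresh memory, ask the OS to
// make that memory executable, and call it as an int(void) function.
// Returns false if the platform is unsupported or the OS refuses.
bool run_machine_code(const unsigned char* code, std::size_t len, int* result) {
#if CAN_RUN_X86_BYTES
    // Step 1: get writable (not yet executable) memory from the OS.
    void* mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return false;
    std::memcpy(mem, code, len);
    // Step 2: switch the memory from writable to executable.
    if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
        munmap(mem, len);
        return false;
    }
    typedef int (*fnptr)(void); // fnptr: returns int, parameters void
    fnptr f = reinterpret_cast<fnptr>(mem);
    *result = f(); // run the bytes
    munmap(mem, len);
    return true;
#else
    (void)code; (void)len; (void)result;
    return false;
#endif
}
```

Passing the three bytes 0x33, 0xC0, 0xc3 (xor eax,eax; ret) should store 0 in the result wherever the call succeeds. Making the memory writable first and executable second, never both at once, is a design choice here rather than a requirement.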
x86 specifies register sizes using prefix bytes. For
example, the same "0xb8" instruction that loads a 32-bit constant
into eax can be used with a "0x66" prefix to load a 16-bit
constant, or a "0x48" REX
prefix to load a 64-bit constant.
Here we're loading the same constant 0x12 into each size of the same
register (rax, eax, ax, and al). The 8-bit al form uses a separate opcode, 0xb0, rather than a prefix:
     0:  48 b8 12 00 00 00 00 00 00 00    mov rax,0x12
     a:  b8 12 00 00 00                   mov eax,0x12
     f:  66 b8 12 00                      mov ax,0x12
    13:  b0 12                            mov al,0x12
    15:  c3                               ret
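The four encodings in this listing can be generated by one small routine. Here is a C++ sketch of that, with a hypothetical function name `encode_mov_a`. It only covers the rax/eax/ax/al family shown above.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper: encode "mov <rax|eax|ax|al>, constant" for an operand
// size of 64, 32, 16, or 8 bits. Returns an empty vector for any other size.
std::vector<std::uint8_t> encode_mov_a(int bits, std::uint64_t value) {
    std::vector<std::uint8_t> out;
    if (bits != 8 && bits != 16 && bits != 32 && bits != 64) return out;
    if (bits == 64) out.push_back(0x48); // REX prefix: 64-bit operand
    if (bits == 16) out.push_back(0x66); // prefix: 16-bit operand
    // al has its own opcode (0xB0); the other sizes share 0xB8.
    out.push_back(static_cast<std::uint8_t>(bits == 8 ? 0xB0 : 0xB8));
    for (int i = 0; i < bits / 8; ++i) { // the constant, low byte first
        out.push_back(static_cast<std::uint8_t>(value >> (8 * i)));
    }
    return out;
}
```

For example, `encode_mov_a(16, 0x12)` should give 66 b8 12 00, the same bytes as the "mov ax,0x12" line.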
The ModR/M byte that follows many opcodes is split into three bit fields, from the high bits to the low bits:

| Field | Size | Meaning |
|---|---|---|
| mod | 2 bits | Selects memory or register access mode: 0: memory at register r/m; 1: memory at register r/m + byte offset; 2: memory at register r/m + 32-bit offset; 3: register r/m itself (not memory). |
| reg/opcode | 3 bits | Usually a destination register number. For some instructions, this is actually extra opcode bits. |
| r/m | 3 bits | Usually a source register number. Treated as a pointer for mod!=3, treated as an ordinary register for mod==3. If r/m==4 (and mod!=3), indicates the real memory source is a SIB byte. |
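Pulling the three fields out of a ModR/M byte is just shifting and masking. Here is a short C++ sketch. The struct and function names are hypothetical.

```cpp
#include <cstdint>

// Hypothetical holder for the three ModR/M fields.
struct ModRM {
    unsigned mod; // top 2 bits: access mode
    unsigned reg; // middle 3 bits: usually the destination register number
    unsigned rm;  // low 3 bits: usually the source register number
};

// Split one ModR/M byte into its mod, reg/opcode, and r/m fields.
ModRM split_modrm(std::uint8_t byte) {
    ModRM m;
    m.mod = (byte >> 6) & 3u;
    m.reg = (byte >> 3) & 7u;
    m.rm  = byte & 7u;
    return m;
}
```

For the byte 0xC0 from "xor eax,eax" (33 C0), this gives mod=3, reg=0, r/m=0: a register-to-register operation where both registers are number 0, eax.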
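Value of ModR/M byte (hex). Each column is one r32(/r) register, with its reg= bits; each row is one effective address:

| effective address | mod | R/M | EAX (000) | ECX (001) | EDX (010) | EBX (011) | ESP (100) | EBP (101) | ESI (110) | EDI (111) |
|---|---|---|---|---|---|---|---|---|---|---|
| [RAX] | 00 | 000 | 00 | 08 | 10 | 18 | 20 | 28 | 30 | 38 |
| [RCX] | 00 | 001 | 01 | 09 | 11 | 19 | 21 | 29 | 31 | 39 |
| [RDX] | 00 | 010 | 02 | 0A | 12 | 1A | 22 | 2A | 32 | 3A |
| [RBX] | 00 | 011 | 03 | 0B | 13 | 1B | 23 | 2B | 33 | 3B |
| [SIB] | 00 | 100 | 04 | 0C | 14 | 1C | 24 | 2C | 34 | 3C |
| [RIP + DWORD] | 00 | 101 | 05 | 0D | 15 | 1D | 25 | 2D | 35 | 3D |
| [RSI] | 00 | 110 | 06 | 0E | 16 | 1E | 26 | 2E | 36 | 3E |
| [RDI] | 00 | 111 | 07 | 0F | 17 | 1F | 27 | 2F | 37 | 3F |
| [RAX + BYTE] | 01 | 000 | 40 | 48 | 50 | 58 | 60 | 68 | 70 | 78 |
| [RCX + BYTE] | 01 | 001 | 41 | 49 | 51 | 59 | 61 | 69 | 71 | 79 |
| [RDX + BYTE] | 01 | 010 | 42 | 4A | 52 | 5A | 62 | 6A | 72 | 7A |
| [RBX + BYTE] | 01 | 011 | 43 | 4B | 53 | 5B | 63 | 6B | 73 | 7B |
| [SIB + BYTE] | 01 | 100 | 44 | 4C | 54 | 5C | 64 | 6C | 74 | 7C |
| [RBP + BYTE] | 01 | 101 | 45 | 4D | 55 | 5D | 65 | 6D | 75 | 7D |
| [RSI + BYTE] | 01 | 110 | 46 | 4E | 56 | 5E | 66 | 6E | 76 | 7E |
| [RDI + BYTE] | 01 | 111 | 47 | 4F | 57 | 5F | 67 | 6F | 77 | 7F |
| [RAX + DWORD] | 10 | 000 | 80 | 88 | 90 | 98 | A0 | A8 | B0 | B8 |
| [RCX + DWORD] | 10 | 001 | 81 | 89 | 91 | 99 | A1 | A9 | B1 | B9 |
| [RDX + DWORD] | 10 | 010 | 82 | 8A | 92 | 9A | A2 | AA | B2 | BA |
| [RBX + DWORD] | 10 | 011 | 83 | 8B | 93 | 9B | A3 | AB | B3 | BB |
| [SIB + DWORD] | 10 | 100 | 84 | 8C | 94 | 9C | A4 | AC | B4 | BC |
| [RBP + DWORD] | 10 | 101 | 85 | 8D | 95 | 9D | A5 | AD | B5 | BD |
| [RSI + DWORD] | 10 | 110 | 86 | 8E | 96 | 9E | A6 | AE | B6 | BE |
| [RDI + DWORD] | 10 | 111 | 87 | 8F | 97 | 9F | A7 | AF | B7 | BF |
| EAX | 11 | 000 | C0 | C8 | D0 | D8 | E0 | E8 | F0 | F8 |
| ECX | 11 | 001 | C1 | C9 | D1 | D9 | E1 | E9 | F1 | F9 |
| EDX | 11 | 010 | C2 | CA | D2 | DA | E2 | EA | F2 | FA |
| EBX | 11 | 011 | C3 | CB | D3 | DB | E3 | EB | F3 | FB |
| ESP | 11 | 100 | C4 | CC | D4 | DC | E4 | EC | F4 | FC |
| EBP | 11 | 101 | C5 | CD | D5 | DD | E5 | ED | F5 | FD |
| ESI | 11 | 110 | C6 | CE | D6 | DE | E6 | EE | F6 | FE |
| EDI | 11 | 111 | C7 | CF | D7 | DF | E7 | EF | F7 | FF |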
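The mod=11 part of this table does not have to be looked up; it can be computed. The C++ sketch below builds a register-to-register instruction from an opcode (0x03 add, 0x8B mov, or 0x33 xor) and two register numbers, under the assumption stated earlier that reg holds the destination and r/m holds the source. The names `Reg32` and `encode_reg_reg` are hypothetical.

```cpp
#include <array>
#include <cstdint>

// Register numbers, in the x86 numbering order.
enum Reg32 : unsigned { R_EAX, R_ECX, R_EDX, R_EBX, R_ESP, R_EBP, R_ESI, R_EDI };

// Hypothetical helper: encode "op dest, src" for two 32-bit registers.
// opcode is one of 0x03 (add), 0x8B (mov), 0x33 (xor).
std::array<std::uint8_t, 2> encode_reg_reg(std::uint8_t opcode, Reg32 dest, Reg32 src) {
    // mod=3 (register, not memory) in the top 2 bits,
    // dest in the reg field, src in the r/m field.
    std::uint8_t modrm = static_cast<std::uint8_t>((3u << 6) | (dest << 3) | src);
    return std::array<std::uint8_t, 2>{{opcode, modrm}};
}
```

`encode_reg_reg(0x33, R_EAX, R_EAX)` should give 33 C0, the same "xor eax,eax" bytes used in the byte-array examples, and the C0 matches the EAX row and EAX column of the table.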