Dynamic Translation: Building Machine Code at Runtime

CS 301: Assembly Language Programming Lecture, Dr. Lawlor

Function Pointers

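In assembly language, there's nothing special about function pointers: a function's name is just a label for the address of its first instruction, so you can call it directly: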

  call lilfunc
  ret

lilfunc:
  xor eax,eax
  ret

(Try this in NetRun now!)

This means you can just as easily copy the address of a function to a register as any other value:

  mov rcx,lilfunc ; rcx = address of lilfunc
  call rcx
  ret

(Try this in NetRun now!)

In C++, the same concept exists, but you need the special ugly "function pointer" syntax.

int bar(void) {
	return 3;
}

int foo(void) {
	typedef int (*fnptr)(void); // fnptr: returns int, parameters void
	fnptr f=(fnptr)bar; // f points to bar
	return f(); // calls bar
}

(Try this in NetRun now!)

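Because a pointer to a function is just a pointer to bytes, you can manually declare bytes of machine code and then run them (provided the memory holding them is executable; many current systems mark data non-executable, and there the call crashes):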

const unsigned char bytes[]={
	0x33,0xC0, // xor eax,eax
	0xc3  // ret
};

int foo(void) {
	typedef int (*fnptr)(void); // fnptr: returns int, parameters void
	fnptr f=(fnptr)bytes; // f points to bytes
	return f(); // run the bytes
}

(Try this in NetRun now!)
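On systems that refuse to execute data, the usual workaround is to ask the operating system for memory you are allowed to run. Here is a minimal sketch of that idea, assuming an x86 CPU and a POSIX system with mmap and mprotect (Linux, for example). The helper name run_bytes is made up for this example; it is not part of the lecture or of NetRun.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <sys/mman.h>

// Same machine code as above.
const unsigned char bytes[] = {
	0x33, 0xC0, // xor eax,eax
	0xC3        // ret
};

// Hypothetical helper: copy machine code into freshly mapped memory,
// mark that memory executable, and call it as an int(void) function.
// Returns -1 if the memory could not be set up.
// Assumes an x86 CPU and a POSIX system (mmap/mprotect).
int run_bytes(const unsigned char *code, std::size_t len) {
	assert(code != nullptr && len > 0);
	typedef int (*fnptr)(void); // fnptr: returns int, parameters void

	// Step 1: get writable (not yet executable) memory from the OS.
	void *mem = mmap(nullptr, len, PROT_READ | PROT_WRITE,
	                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) return -1;

	// Step 2: copy the machine code in, then switch to read+execute.
	std::memcpy(mem, code, len);
	if (mprotect(mem, len, PROT_READ | PROT_EXEC) != 0) {
		munmap(mem, len);
		return -1;
	}

	// Step 3: run the bytes, then give the memory back.
	fnptr f = reinterpret_cast<fnptr>(mem);
	int result = f();
	munmap(mem, len);
	return result;
}
```

Calling run_bytes(bytes, sizeof(bytes)) should return 0, since xor eax,eax zeroes the return-value register. Writing first and only then making the memory executable is a design choice: some systems refuse memory that is writable and executable at the same time.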

x86 Instructions

Disassembling your assembly or compiled code shows you both the instructions and the machine code that implements them.

Not only are there hundreds of different x86 instructions, there can be dozens of different machine code encodings for a given instruction (see opcodes in numerical order). Here are a few examples:

asm	machine code	Description
add	0x03 ModR/M	Add one 32-bit register to another.
mov	0x8B ModR/M	Move one 32-bit register to another.
mov	0xB8 DWORD	Move a 32-bit constant into register eax.
ret	0xc3	Returns from current function.
xor	0x33 ModR/M	XOR one 32-bit register with another.
xor	0x34 BYTE	XOR register al with this 8-bit constant.
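As a small illustration of reading this table, here is a sketch that builds the machine code for "mov eax,value" followed by "ret", using the 0xB8 and 0xc3 rows. The function name emit_return_constant is made up for this example. The one detail the table doesn't spell out is byte order: x86 is little-endian, so the DWORD is stored least-significant byte first.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper: build the bytes for
//   mov eax,value
//   ret
std::vector<unsigned char> emit_return_constant(std::uint32_t value) {
	std::vector<unsigned char> code;
	code.push_back(0xB8); // mov eax, DWORD
	for (int i = 0; i < 4; i++) {
		// x86 is little-endian: lowest byte of the constant goes first
		code.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xFF));
	}
	code.push_back(0xC3); // ret
	assert(code.size() == 6); // 1 opcode byte + 4 constant bytes + 1 ret byte
	return code;
}
```

For example, emit_return_constant(0x12345678) produces the six bytes B8 78 56 34 12 C3.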

The "ModR/M" byte, as the Intel documentation calls it, follows the opcode in most x86 instructions and specifies what the source and destination are. It's just a bit field with three parts: a selector called "mod" (which indicates whether r/m is treated as a plain register or a memory address), one register called "reg/opcode" (which is usually the destination register, and determines the column in the ModR/M table), and a final register called "r/m" (usually the source register, which selects the row of the ModR/M table). These are stored in a single byte with this format:

mod	reg/opcode	r/m
2 bits, selects memory or register access mode. 0: memory at register r/m. 1: memory at register r/m + byte offset. 2: memory at register r/m + 32-bit offset. 3: register r/m itself (not memory).	3 bits, usually a destination register number. For some instructions, this is actually extra opcode bits.	3 bits, usually a source register number. Treated as a pointer for mod!=3, treated as an ordinary register for mod==3. If mod!=3 and r/m==4, the real memory source is given by a SIB byte. If mod==0 and r/m==5, the memory source is RIP + a 32-bit offset.

This is the "ModR/M" table for the meaning of ModR/M byte values:

effective address	mod	r/m	reg=EAX 000	reg=ECX 001	reg=EDX 010	reg=EBX 011	reg=ESP 100	reg=EBP 101	reg=ESI 110	reg=EDI 111
[RAX]	00	000	00	08	10	18	20	28	30	38
[RCX]	00	001	01	09	11	19	21	29	31	39
[RDX]	00	010	02	0A	12	1A	22	2A	32	3A
[RBX]	00	011	03	0B	13	1B	23	2B	33	3B
[SIB]	00	100	04	0C	14	1C	24	2C	34	3C
[RIP + DWORD]	00	101	05	0D	15	1D	25	2D	35	3D
[RSI]	00	110	06	0E	16	1E	26	2E	36	3E
[RDI]	00	111	07	0F	17	1F	27	2F	37	3F
[RAX + BYTE]	01	000	40	48	50	58	60	68	70	78
[RCX + BYTE]	01	001	41	49	51	59	61	69	71	79
[RDX + BYTE]	01	010	42	4A	52	5A	62	6A	72	7A
[RBX + BYTE]	01	011	43	4B	53	5B	63	6B	73	7B
[SIB + BYTE]	01	100	44	4C	54	5C	64	6C	74	7C
[RBP + BYTE]	01	101	45	4D	55	5D	65	6D	75	7D
[RSI + BYTE]	01	110	46	4E	56	5E	66	6E	76	7E
[RDI + BYTE]	01	111	47	4F	57	5F	67	6F	77	7F
[RAX + DWORD]	10	000	80	88	90	98	A0	A8	B0	B8
[RCX + DWORD]	10	001	81	89	91	99	A1	A9	B1	B9
[RDX + DWORD]	10	010	82	8A	92	9A	A2	AA	B2	BA
[RBX + DWORD]	10	011	83	8B	93	9B	A3	AB	B3	BB
[SIB + DWORD]	10	100	84	8C	94	9C	A4	AC	B4	BC
[RBP + DWORD]	10	101	85	8D	95	9D	A5	AD	B5	BD
[RSI + DWORD]	10	110	86	8E	96	9E	A6	AE	B6	BE
[RDI + DWORD]	10	111	87	8F	97	9F	A7	AF	B7	BF
EAX	11	000	C0	C8	D0	D8	E0	E8	F0	F8
ECX	11	001	C1	C9	D1	D9	E1	E9	F1	F9
EDX	11	010	C2	CA	D2	DA	E2	EA	F2	FA
EBX	11	011	C3	CB	D3	DB	E3	EB	F3	FB
ESP	11	100	C4	CC	D4	DC	E4	EC	F4	FC
EBP	11	101	C5	CD	D5	DD	E5	ED	F5	FD
ESI	11	110	C6	CE	D6	DE	E6	EE	F6	FE
EDI	11	111	C7	CF	D7	DF	E7	EF	F7	FF

ModR/M is much easier to write in octal than in hex, since the two 3-bit fields line up exactly with octal digits. You can write octal in C/C++ with a leading 0, so "0301" is octal for 11 000 001 (binary) or 0xC1 (hex).
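If you'd rather not do the bit packing by hand, a few shifts will do it. This is a minimal sketch; the function name modrm is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: pack the three ModR/M fields into one byte.
//   mod: 2 bits (access mode), reg: 3 bits (reg/opcode), rm: 3 bits (r/m)
std::uint8_t modrm(unsigned mod, unsigned reg, unsigned rm) {
	assert(mod < 4 && reg < 8 && rm < 8); // each field must fit its bit width
	return static_cast<std::uint8_t>((mod << 6) | (reg << 3) | rm);
}
```

For example, modrm(3, 0, 1) gives 0301 in octal, which is 0xC1: the ECX row of the EAX column in the table above.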

x86 register numbering is about as bizarre as you've come to expect:

Number	0	1	2	3	4	5	6	7
Int Register	eax	ecx	edx	ebx	esp	ebp	esi	edi

For example:

ModR/M==0312 (octal again) means mod==3 (so we're only touching registers, not memory), reg/opcode==1 (meaning destination is xmm1 or ecx, register 1), and r/m==2 (meaning the source register is xmm2 or edx, register 2).
ModR/M==0007 (in octal) is composed of mod==0 (so accessing memory), reg/opcode==0 (meaning the normal-destination register is eax, register 0), and r/m==7 (meaning the source is memory pointed to by edi, register 7).
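Going the other direction, pulling a ModR/M byte apart is just shifting and masking. Here is a minimal sketch; the struct ModRM and the functions decode_modrm and reg_name are made-up names for this example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical struct holding the three fields of a ModR/M byte.
struct ModRM {
	unsigned mod; // 2 bits: 3 means r/m is a plain register, else memory
	unsigned reg; // 3 bits: reg/opcode, usually the destination register
	unsigned rm;  // 3 bits: r/m, usually the source register
};

// Split a ModR/M byte into its mod, reg/opcode, and r/m fields.
ModRM decode_modrm(std::uint8_t byte) {
	ModRM f;
	f.mod = (byte >> 6) & 3;
	f.reg = (byte >> 3) & 7;
	f.rm  = byte & 7;
	return f;
}

// 32-bit integer register names, indexed by x86 register number.
const char *reg_name(unsigned number) {
	static const char *names[8] = {
		"eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"
	};
	assert(number < 8);
	return names[number];
}
```

With these, decode_modrm(0312) gives mod 3, reg 1 (ecx), r/m 2 (edx), and decode_modrm(0007) gives mod 0, reg 0 (eax), r/m 7 (edi).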

Interpreters and Dynamic Translation Performance Analysis

People who write assemblers, compilers, and linkers need to know about machine code. But lots of other people do too--people who write fast interpreters.

A typical problem: we want to do some simple operations that depend on the input file, but they're really slow when interpreted normally (read the line of code to interpret, figure out what it's asking, and do it; instead of "just do it"). The solution is to write a version of your interpreter loop that spits out real machine code to solve the problem. If you can make your interpreter use registers (e.g., because you've only got a few variables), you'll get excellent performance--just as good as compiled assembly!
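To make the idea concrete, here is a toy sketch of such a translator. The "language" is invented for this example: a program is just a list of constants to add up. An ordinary interpreter would loop over the list doing the additions itself; this version loops over the list once and emits x86 code that does the additions, keeping the running total in eax. The name translate_sum is hypothetical, and the opcode 0x05 (add a 32-bit constant to eax) is not in the table above.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical translator for a toy language: "add up these constants".
// Returns x86 machine code for an int(void) function that returns the sum.
std::vector<unsigned char> translate_sum(const std::vector<std::int32_t> &program) {
	std::vector<unsigned char> code;

	// Prologue: start the running total at zero.
	code.push_back(0x33); code.push_back(0xC0); // xor eax,eax

	// One machine instruction per line of the toy program.
	for (std::int32_t value : program) {
		code.push_back(0x05); // add eax, DWORD (opcode not in the table above)
		std::uint32_t bits = static_cast<std::uint32_t>(value);
		for (int i = 0; i < 4; i++) {
			// little-endian: lowest byte of the constant goes first
			code.push_back(static_cast<unsigned char>((bits >> (8 * i)) & 0xFF));
		}
	}

	// Epilogue: return the total in eax.
	code.push_back(0xC3); // ret

	assert(code.size() == 3 + 5 * program.size());
	return code;
}
```

For the program {2, 3} this emits 33 C0, 05 02 00 00 00, 05 03 00 00 00, C3. To actually run the result, the bytes still have to be placed in memory the system allows you to execute, then called through a function pointer.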

Lots of people use this technique, often called dynamic binary translation:

Java does it to Java bytecode. They call this "Just-In-Time compilation", or JIT.
Microsoft does it to Common Language Runtime bytecode to make C# run fast.
Google does it in their fast JavaScript "V8" engine.
Haskell does it in the compiled version of that language.
VMware does dynamic translation to convert privileged operating system x86 instructions into unprivileged virtual instructions. This is essentially patching the operating system binary at runtime.
Lots of companies do this to run code for one processor on another processor. Apple's actually done this twice: in 1994 during the transition from 68000 to PowerPC (the "68k emulator"), and again in 2006 during the transition to x86 ("Rosetta").