Memory Access in Machine Code

CS 441 Lecture, Dr. Lawlor

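There are several places you can get data to work on: registers, memory, and constants stored right in the instruction.
You can disassemble some executables to figure out how common these different places are. Here is the percentage of disassembly lines that mention each register or feature, on 32-bit x86: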
Registers used:
30.9% "eax" lines (eax is the return result register, and general scratch)
5.7% "ebx" lines (in position-independent DLL/shared-library code this register is reserved for accessing globals, so it is less available for other uses)
10.3% "ecx" lines
15.5% "edx" lines
11.7% "esp" lines (note that "push" and "pop" implicitly change esp, so this should be about 5% higher)
25.9% "ebp" lines (the bread-and-butter stack access base register)
12.0% "esi" lines
8.6% "edi" lines

Features used:
66.0% "0x" lines (hex constants: immediate values and address offsets)
69.6% "," lines (two-operand instructions)
36.7% "+" lines (address calculated as sum)
1.2% "*" lines (address calculated with scaled index)
48.1% "[" lines (explicit memory accesses)
2.8% "BYTE PTR" lines (char-sized memory access)
0.4% "WORD PTR" lines (short-sized memory access)
40.7% "DWORD PTR" lines (int or float-sized memory)
0.1% "QWORD PTR" lines (double-sized memory)
So the "typical" x86 instruction is an int-sized load or store between a register, often eax, and a memory location, often a stack slot addressed relative to ebp. Roughly half of all instructions (48.1%) explicitly access memory, and most of those (40.7% of all lines) are DWORD-sized!

In some ways, this high proportion of data movement seems quite wasteful. But then, think about all the "wasteful" non-programming activity you do in a typical day (sleeping, eating, moving around, thinking about what to type, etc.)!

x86 has a truly amazing array of different ways to get at data, especially data in RAM.

Immediate data is stored right in the instruction.
   	b8 07 00 00 00       	mov    eax,0x7

Register direct data is accessed from another register:
  	89 c8                	mov    eax,ecx

Absolute addresses have the address hardcoded into the instruction.  These are used for global variables (at a known fixed address).
   	8b 04 25 00 00 00 00 	mov    eax,DWORD PTR [address]

Register indirect addresses read the memory address from another register (here rcx, the 64-bit version of ecx).  This is just a pointer dereference:

   	8b 01                	mov    eax,DWORD PTR [rcx]

Register indirect with offset adds a constant offset to a given base address.  This is handy for accessing a class field via a pointer:

  	8b 41 04             	mov    eax,DWORD PTR [rcx+0x4]

Register indirect with scaled index adds another register times a constant (must be 1, 2, 4, or 8).  This is handy for array indexing:
  	8b 04 91             	mov    eax,DWORD PTR [rcx+rdx*4]

Register indirect with scaled index plus offset combines both forms.  This is useful to access an array inside a class, or a field of a class in an array of class objects.
   	8b 44 91 10          	mov    eax,DWORD PTR [rcx+rdx*4+0x10]

By comparison, MIPS lets you operate on registers directly, but its loads and stores have just one way to form a memory address: a base register plus a signed 16-bit constant offset. It's a much simpler setup, but considering how often programs access memory, x86's baroque complexity actually almost makes sense here.

Adding Memory Access to an Instruction Set

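Here's the little register-to-register 64-bit instruction set we developed in class. Each instruction is 16 hex digits: a 3-digit opcode (OPC), three 3-digit register numbers (D, A, B), a 3-digit constant (C), and a final digit V; if V is above 0xA, the simulator prints registers after that instruction. It has no simulated memory, so no memory access instructions: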
#include <cstdint>
#include <iostream>

long run(void) {
    /* uint64_t, not long: most of these constants don't fit in a signed 64-bit long */
    uint64_t program[]={
    /*OPC D A B C V */
    0xadd0020000007FFF, /* 0: load regs[2] */
    0xadd0010000000a7F, /* 1: load regs[1] */
    0xb51001001000002F, /* 2: left-shift */
    0xa7d001001001F0FF, /* 3: and */
    0xb1e000001002002F, /* 4: branch: if regs[1]<=regs[2] goto line 2 */
    0x1E70010000000000, /* 5: return regs[1] */
    };
    long regs[0xfff+1]={0}; /* all registers start at 0; regs[0] acts as a constant zero */
    int pc=0; /* program[pc] is the current instruction */
    while (true) {
        uint64_t inst=program[pc++]; /* fetch instruction */
        /* "decode" instruction: pull out each group of hex digits */
        int opcode=(inst>>(4*13))&0xfff, C=(inst>>(4*1))&0xfff;
        int D=(inst>>(4*10))&0xfff, A=(inst>>(4*7))&0xfff, B=(inst>>(4*4))&0xfff;
        int V=inst&0xf;
        switch (opcode) {
        case 0xb51: regs[D]=regs[A]<<(regs[B]+C); break; /* shift left */
        case 0xb1e: if (regs[A] <= regs[B]) pc=regs[D]+C; break; /* branch */
        case 0xadd: regs[D]=regs[A] + regs[B] + C; break; /* add */
        case 0x371: regs[D]=regs[A] * regs[B] * C; break; /* multiply */
        case 0xa7d: regs[D]=regs[A] & regs[B] & C; break; /* bitwise and */
        case 0x1E7: return regs[D]; /* return */
        }
        if (V>0xA) { /* print out "all" registers */
            std::cout<<" before pc="<<pc<<" regs "<<regs[1]<<" "<<regs[2]<<"\n";
        }
    }
}

int main() {
    std::cout<<"result: "<<run()<<"\n"; /* result: 2048 */
}

Here's one way to add memory operations: I've made 0x510 a "STOre" instruction.  I need to add simulated memory too!
#include <cstdint>
#include <iostream>

long run(void) {
    /* uint64_t, not long: most of these constants don't fit in a signed 64-bit long */
    uint64_t program[]={
    /*OPC D A B C V */
    0xadd003000000020F, /* 0: load regs[3] (memory pointer) */
    0xadd0020000007FFF, /* 1: load regs[2] (comparison limit) */
    0xadd0010000000a7F, /* 2: load regs[1] (loop value) */
    0xb51001001000002F, /* 3: left-shift */
    0xa7d001001001F0FF, /* 4: and */
    0x510001003000000F, /* 5: store regs[1] into address regs[3] */
    0xadd003003000001F, /* 6: increment regs[3] */
    0xb1e000001002003F, /* 7: branch: if regs[1]<=regs[2] goto line 3 */
    0x1E70010000000000, /* 8: return regs[1] */
    };
    long regs[0xfff+1]={0}; /* simulated registers, all starting at 0; regs[0] acts as a constant zero */
    long mem[0xffff]; /* simulated memory space */
    int pc=0; /* program[pc] is the current instruction */
    while (true) {
        uint64_t inst=program[pc++]; /* fetch instruction */
        /* "decode" instruction: pull out each group of hex digits */
        int opcode=(inst>>(4*13))&0xfff, C=(inst>>(4*1))&0xfff;
        int D=(inst>>(4*10))&0xfff, A=(inst>>(4*7))&0xfff, B=(inst>>(4*4))&0xfff;
        int V=inst&0xf;
        switch (opcode) {
        case 0x510: mem[regs[A]+C*regs[B]]=regs[D]; break; /* STOre regs[D] to memory */
        case 0xb51: regs[D]=regs[A]<<(regs[B]+C); break; /* shift left */
        case 0xb1e: if (regs[A] <= regs[B]) pc=regs[D]+C; break; /* branch */
        case 0xadd: regs[D]=regs[A] + regs[B] + C; break; /* add */
        case 0x371: regs[D]=regs[A] * regs[B] * C; break; /* multiply */
        case 0xa7d: regs[D]=regs[A] & regs[B] & C; break; /* bitwise and */
        case 0x1E7: return regs[D]; /* return */
        }
        if (V>0xA) { /* print out "all" registers */
            std::cout<<" before pc="<<pc<<" regs "<<regs[1]<<" "<<regs[2]<<"\n";
        }
    }
}

int main() {
    std::cout<<"result: "<<run()<<"\n"; /* result: 2048 */
}

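Note above that the memory address I've chosen includes both an unscaled pointer (regs[A]) and a scaled index (C*regs[B]); unlike x86, the scale C can be any 12-bit constant, not just 1, 2, 4, or 8. The only thing I'm missing compared to x86's most general form, [base+index*scale+offset], is the extra constant offset!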