Names: Big and Little Endian Memory Access
CS 301 Lecture, Dr. Lawlor
So finally, after weeks of preparation, here's what you've been waiting for: assembly language!
Machine Code:               Assembly Code:
Address                     Instruction  Operands
   0:  b8 07 00 00 00       mov          eax,0x7
   5:  c3                   ret
Here's the terminology we'll be using for the rest of the semester:
- Address. A byte
count, indicating where you are in a big list of bytes. The first
byte has address zero. An address can be thought of as an array
index into a big array of chars.
- Machine Code.
A set of bytes that the CPU can treat as instructions and execute. For
example, the byte "0xC3" tells the CPU to execute the instruction "ret"
(return from function). Human beings can write machine code (see HW2!), but usually humans write Assembly Code instead.
- Assembly Code. The
human-readable counterpart to machine code. Assembly code is
line-oriented, human-readable text that lots of people write by
hand. You use an "assembler" to turn assembly code into
executable machine code. You can use a "disassembler" to turn
executable code into assembly code (try the NetRun "Disassemble"
checkbox for the code above!).
- Instruction. One command
for the CPU to execute. Can be thought of as: one line of
assembly code, or one group of machine code bytes. Depending on the
context, sometimes "instruction" includes the operands ("the 'push ebp'
instruction"), sometimes it doesn't ("the 'push' instruction, with
operand 'ebp'.").
- Operands. The
parameters taken by various instructions. These parameters can be
constants, like the "0x7" above, addresses in memory (which we'll talk
about in a week or so), or registers.
- Registers. Registers are little storage locations built into the CPU. They're used like variables in assembly language--you
spend most of your time putting values into registers, doing arithmetic
on register values, and moving values around between registers.
For example, "eax" in the listing above is a register, and so are "ebp" and "esp".
Unlike variables, there are a fixed number of registers (built into the
design of the CPU), and the registers have fixed names (built into the
assembler).
In terms of the little tables we built last class, the address is the
index into the table, and the machine code is the stuff in the
table. The stuff in the table can further be broken down into
separate instructions, each with constant operands and registers.
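To make the "address is an array index" idea concrete, here's a minimal sketch that stores the six machine code bytes from the listing above in an array of chars. The names "code" and "describe_byte" are made up for illustration; they aren't part of NetRun or the homework.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// The six machine code bytes from the listing: mov eax,0x7 ; ret
const unsigned char code[] = { 0xb8, 0x07, 0x00, 0x00, 0x00, 0xc3 };

// Hypothetical helper: format the byte stored at one address as "address: byte".
// The address is nothing more than an index into the array.
std::string describe_byte(std::size_t address) {
    char buf[32];
    std::snprintf(buf, sizeof buf, "%zu: %02x", address, (unsigned)code[address]);
    return buf;
}
```

Calling describe_byte(0) gives the first byte of the "mov" instruction, and describe_byte(5) gives the single byte of "ret", matching the addresses in the listing.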
Big and Little Endian Memory Access
Let's say we ask the CPU to treat four bytes as a single integer, using a typecast like so:
const unsigned char table[]={
1,2,3,4
};
int foo(void) {
typedef int *myPtr;
myPtr p=(myPtr)table;
return p[0];
}
(Try this in NetRun now!)
This program returns "0x4030201" on an x86 machine, which is rather the
opposite of what you might expect. The mismatch here is that we write
(Arabic) numerals with the lowest-valued digit on the right, as if reading
right-to-left (just like Arabic), but we write table entries
(and everything else) left-to-right.
So the CPU reads the first, leftmost table entry (1) to get the
lowest-valued byte (0x01), which we write on the right side
(0x...01). Similarly, the last table entry (4) is interpreted as
the highest-valued byte (0x04), which we write on the left side
(0x04...).
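You can spell out what the little-endian CPU is doing by building the integer by hand with shifts. This is only a sketch of the idea, not what the hardware literally executes, and the function name "read_little_endian" is my own invention.

```cpp
#include <cstdint>

// Hypothetical helper: combine four bytes the way a little-endian CPU does.
// bytes[0] (the leftmost table entry) becomes the lowest-valued byte,
// and bytes[3] becomes the highest-valued byte.
std::uint32_t read_little_endian(const unsigned char *bytes) {
    return  (std::uint32_t)bytes[0]
         | ((std::uint32_t)bytes[1] << 8)
         | ((std::uint32_t)bytes[2] << 16)
         | ((std::uint32_t)bytes[3] << 24);
}
```

Passing in the table {1,2,3,4} gives 0x04030201, the same value the typecast version returns on x86. Unlike the typecast, this version gives the same answer on any CPU.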
But this depends on the CPU! All x86 CPUs start with the
lowest-valued byte (the "little end" of the integer comes first, hence
"little endian"),
but many other CPUs, such as the PowerPC, MIPS, and SPARC CPUs, start
with the highest-valued byte (the "big end" of the integer, hence "big
endian"). So this same code above returns 0x01020304 on a
PowerPC--try this!
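If you don't have a PowerPC handy, you can still see both answers. Here's a rough sketch, assuming 8-bit bytes: one function builds the integer big-endian style by hand, and another asks the machine it's running on which end comes first. Both function names are hypothetical.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: combine four bytes the way a big-endian CPU does.
// bytes[0] becomes the highest-valued byte.
std::uint32_t read_big_endian(const unsigned char *bytes) {
    return ((std::uint32_t)bytes[0] << 24)
         | ((std::uint32_t)bytes[1] << 16)
         | ((std::uint32_t)bytes[2] << 8)
         |  (std::uint32_t)bytes[3];
}

// Hypothetical helper: store the integer 1 and look at its first byte in memory.
// On a little-endian machine the lowest-valued byte (0x01) comes first.
bool host_is_little_endian() {
    const std::uint32_t one = 1;
    unsigned char first;
    std::memcpy(&first, &one, 1);
    return first == 1;
}
```

With the table {1,2,3,4}, read_big_endian gives 0x01020304 on every machine, while host_is_little_endian reports true on x86.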
The big and little endian naming confusion exists even in the
non-computer world. Consider that the following are all
little-endian (starting with the least-significant information):
- Fairbanks, Alaska, USA
- John Smith, Carpet Cleaner
- Pittsburgh Technical Institute
Yet the following are all big-endian (starting with the biggest information):
You can see big- and little-endian byte storage going not just from bytes to ints, but also from ints to bytes:
#include <iostream> /* needed for std::cout */

int foo(void) {
int x=0xa0b0c0d0; /* Integer value we'll pick apart into bytes */
typedef unsigned char *myTable; /* We'll make it an array of chars */
myTable table=(myTable)&x; /* point to the bytes of the integer x */
for (int i=0;i<4;i++) /* print each byte of the integer x */
std::cout<<std::hex<<(int)table[i]<<" ";
std::cout<<std::endl;
return 0;
}
(Try this in NetRun now!)
This code prints "d0 c0 b0 a0" on a little-endian machine--the first byte in memory is the lowest-valued byte, "0xd0". On a big-endian machine the same code prints "a0 b0 c0 d0".
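For comparison, here's a sketch that picks the integer apart with shifts instead of a pointer typecast, so it lists the bytes lowest-valued first no matter which CPU runs it. The helper name "bytes_low_to_high" is made up for this example.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Hypothetical helper: list the four bytes of x, lowest-valued byte first,
// as two-digit hex separated by spaces.
std::string bytes_low_to_high(std::uint32_t x) {
    std::string out;
    for (int i = 0; i < 4; i++) {
        char buf[8];
        // Shift byte number i down to the bottom, then mask off everything else.
        std::snprintf(buf, sizeof buf, "%02x", (unsigned)((x >> (8 * i)) & 0xff));
        if (i > 0) out += ' ';
        out += buf;
    }
    return out;
}
```

Calling bytes_low_to_high(0xa0b0c0d0) gives "d0 c0 b0 a0", which is exactly the order a little-endian machine stores those bytes in memory.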