Names: Big and Little Endian Memory Access

CS 301 Lecture, Dr. Lawlor

So finally, after weeks of preparation, here's what you've been waiting for: assembly language!

        Machine Code:           Assembly Code:
Address                    Instruction Operands
   0:	b8 07 00 00 00       	mov    eax,0x7
   5:	c3                   	ret

Here's the terminology we'll be using for the rest of the semester:

Address. A byte count, indicating where you are in a big list of bytes. The first byte has address zero. An address can be thought of as an array index into a big array of chars.
Machine Code. A set of bytes that the CPU can treat as instructions and execute. For example, the byte "0xC3" tells the CPU to execute the instruction "ret" (return from function). Human beings can write machine code (see HW2!), but usually humans write Assembly Code instead.
Assembly Code. The human-readable counterpart to machine code. Assembly code is line-oriented, human-readable text that lots of people write by hand. You use an "assembler" to turn assembly code into executable machine code. You can use a "disassembler" to turn executable code back into assembly code (try the NetRun "Disassemble" checkbox for the code above!).
Instruction. One command for the CPU to execute. Can be thought of as: one line of assembly code, or one group of machine code bytes. Depending on the context, sometimes "instruction" includes the operands ("the 'push ebp' instruction"), sometimes it doesn't ("the 'push' instruction, with operand 'ebp'.").
Operands. The parameters taken by various instructions. These parameters can be constants, like the "0x7" above, addresses in memory (which we'll talk about in a week or so), or registers.
Registers. Registers are little storage locations built into the CPU. They're used like variables in assembly language--you spend most of your time putting values into registers, doing arithmetic on register values, and moving values around between registers. For example, the "eax" in the listing above is a register, and so are "ebp" and "esp". Unlike variables, there are a fixed number of registers (built into the design of the CPU), and the registers have fixed names (built into the assembler).

In terms of the little tables we built last class, the address is the index into the table, and the machine code is the stuff in the table. The stuff in the table can further be broken down into separate instructions, each with constant operands and registers.
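
To make the table picture concrete, here is a minimal sketch that stores the six machine code bytes from the listing above in an array of chars and prints each byte next to its address. The names "code" and "byteAt" are made up for this illustration; "foo" follows the NetRun convention used in the rest of these notes.

```cpp
#include <iostream>

/* The machine code from the listing above: mov eax,0x7 ; ret */
const unsigned char code[]={
	0xb8,0x07,0x00,0x00,0x00, /* address 0: mov eax,0x7 */
	0xc3                      /* address 5: ret */
};

/* Hypothetical helper: an address is just an index into the array. */
int byteAt(int address) {
	return code[address];
}

int foo(void) {
	for (int address=0;address<(int)sizeof(code);address++)
		std::cout<<std::dec<<address<<": "<<std::hex<<byteAt(address)<<std::endl;
	return byteAt(5); /* the "ret" byte, 0xc3 */
}
```

Note that this only looks at the bytes as data; it does not execute them.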

Big and Little Endian Memory Access

Let's say we ask the CPU to treat four bytes as a single integer, using a typecast like so:

const unsigned char table[]={
	1,2,3,4
};

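int foo(void) {
	typedef const int *myPtr; /* pointer to read-only integers */
	myPtr p=(myPtr)table; /* treat the four bytes of table as one int */
	return p[0]; /* read all four bytes at once */
}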

(Try this in NetRun now!)

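This program returns "0x4030201" (that is, 0x04030201 with the leading zero dropped), which is rather the opposite of what you might expect. The mismatch here is that we write (Arabic) numerals with the lowest-valued digit on the right, so the digits grow in value right-to-left (the direction Arabic is written), but we write table entries (and everything else) left-to-right.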

So the CPU reads the first, leftmost table entry (1) to get the lowest-valued byte (0x01), which we write on the right side (0x...01). Similarly, the last table entry (4) is interpreted as the highest-valued byte (0x04), which we write on the left side (0x04...).
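
The same byte-combining can be spelled out by hand with shifts. This is only a sketch of the arithmetic, not what the hardware literally does; the helper name "littleEndianValue" is made up for this example. Because it works on values rather than on memory layout, it gives the same answer on any CPU.

```cpp
#include <iostream>

const unsigned char table[]={
	1,2,3,4
};

/* Hypothetical helper: combine four bytes the way a little-endian CPU does.
   b[0] is the lowest-valued byte; b[3] is the highest-valued byte. */
unsigned int littleEndianValue(const unsigned char *b) {
	return  (unsigned int)b[0]
	     | ((unsigned int)b[1]<<8)
	     | ((unsigned int)b[2]<<16)
	     | ((unsigned int)b[3]<<24);
}

int foo(void) {
	std::cout<<std::hex<<littleEndianValue(table)<<std::endl; /* prints 4030201 */
	return littleEndianValue(table);
}
```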

But this depends on the CPU! All x86 CPUs start with the lowest-valued byte (the "little end" of the integer comes first, hence "little endian"), but many other CPUs, such as the PowerPC, MIPS, and SPARC CPUs as traditionally configured, start with the highest-valued byte (the "big end" of the integer, hence "big endian"). So the same code above returns 0x01020304 on a PowerPC--try this!
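
If you don't have a big-endian machine handy, here is a rough sketch of the big-endian interpretation done by hand, plus a small runtime check of which kind of machine you're actually on. The helper names "bigEndianValue" and "isLittleEndian" are invented for this example.

```cpp
#include <cstring>
#include <iostream>

/* Hypothetical helper: combine four bytes the way a big-endian CPU does,
   with the first byte as the highest-valued byte. */
unsigned int bigEndianValue(const unsigned char *b) {
	return ((unsigned int)b[0]<<24)
	     | ((unsigned int)b[1]<<16)
	     | ((unsigned int)b[2]<<8)
	     |  (unsigned int)b[3];
}

/* Hypothetical helper: true if this machine stores the
   lowest-valued byte of an integer at the lowest address. */
bool isLittleEndian(void) {
	unsigned int x=1;
	unsigned char first;
	std::memcpy(&first,&x,1); /* copy the byte at the lowest address */
	return first==1;
}

int foo(void) {
	const unsigned char table[]={1,2,3,4};
	std::cout<<std::hex<<bigEndianValue(table)<<std::endl; /* prints 1020304 */
	std::cout<<(isLittleEndian()?"little endian":"big endian")<<std::endl;
	return bigEndianValue(table);
}
```

The second line printed depends on the machine; on any x86 it should say "little endian".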

The big and little endian naming confusion exists even in the non-computer world. Consider that the following are all little-endian (starting with the least-significant information):

Fairbanks, Alaska, USA
John Smith, Carpet Cleaner
Pittsburgh Technical Institute

Yet the following are all big-endian (starting with the most-significant information):

University of Alaska Fairbanks
Cleaners, Carpet: Smith, John (like in a phonebook)
907 474-7678
$7.32
École Polytechnique de Montréal

You can see big- and little-endian byte storage going not just from bytes to ints, but also from ints to bytes:

#include <iostream>

int foo(void) {
	int x=0xa0b0c0d0; /* Integer value we'll pick apart into bytes */
	typedef unsigned char *myTable; /* We'll make it an array of chars */
	myTable table=(myTable)&x; /* point to the bytes of the integer x */
	for (int i=0;i<4;i++) /* print each byte of the integer x */
		std::cout<<std::hex<<(int)table[i]<<" ";
	std::cout<<std::endl;
	return 0;
}

(Try this in NetRun now!)

This code prints "d0 c0 b0 a0" on a little-endian machine--the first byte in memory is the lowest-valued byte, "0xd0". On a big-endian machine the same code prints "a0 b0 c0 d0".
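
For comparison, here is a small sketch that picks the integer apart with shifts and masks instead of a pointer typecast. The helper name "byteOf" is made up for this example. Since it never looks at how the integer is laid out in memory, it prints the bytes lowest-valued first on any CPU, which matches the little-endian memory order above.

```cpp
#include <iostream>

/* Hypothetical helper: byte number i of x, counting from the
   lowest-valued byte (i=0) up to the highest-valued byte (i=3). */
int byteOf(unsigned int x,int i) {
	return (x>>(8*i))&0xff;
}

int foo(void) {
	unsigned int x=0xa0b0c0d0; /* Integer value we'll pick apart into bytes */
	for (int i=0;i<4;i++) /* print each byte, lowest-valued first */
		std::cout<<std::hex<<byteOf(x,i)<<" "; /* d0 c0 b0 a0 */
	std::cout<<std::endl;
	return 0;
}
```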