Memory Map Manipulation

CS 301 Lecture, Dr. Lawlor

So your program's memory doesn't actually correspond 1-to-1 with the system's physical RAM; there's one layer of indirection called the "page table" that maps program "virtual addresses" into real "physical addresses".

There's a silly problem: if every byte of memory had its own (say 4-byte) entry in the page table, *most* of your memory would be used just to store the page table!  So all real machines break up virtual memory into fairly large "pages" (around 4KB to 4MB in size), and all the bytes within a page get mapped to adjacent places in physical memory.  This divides the size of the page table by the page size.  For a 4GB virtual address space with 4KB pages, instead of needing an absurd 16GB for 4 billion 32-bit byte pointers, you need just a svelte 4MB for 1 million 32-bit page pointers.  You can cut the space required even further by paging the page table: split up the page table into pieces that are pointed to by an even bigger table.

For example, 32-bit x86 first looks up a page directory (1024 32-bit pagetable pointers), each of which points to a page table (1024 32-bit page pointers), and each of these page table entries gives the physical address for a 4KB block of memory, one page.  64-bit machines are even worse, since their virtual address space is so much bigger; x86-64 uses four layers of tables before you finally reach the page you need!
[Figure: four-level page table for a 64-bit machine]
If the system's designers weren't careful, looking up each memory access via two or four tables would turn every memory access into three to five accesses, making memory 3x to 5x slower!  Luckily, modern machines use a special "pagetable cache" called the "translation lookaside buffer" or TLB.  The TLB just stores the virtual-to-physical mappings for the most recently accessed pages; if most TLB lookups are hits, memory accesses will be fast.  When you access a new page not in the TLB (a TLB miss), the CPU has to walk the page table to fill the TLB before the access can happen (on some chips, such as MIPS and certain PowerPC models, a software interrupt handler does the walk instead).  This can be slow, so sometimes people will choose large page sizes just so the TLB's fixed number of entries covers more memory!  A typical TLB holds from 32 to 128 pagetable entries, which covers only 128KB to a few hundred megs depending on the page size.

So here's what the CPU does for a typical memory access:
  1. Program asks for a byte at virtual address 0xf00dead. 
  2. That's part of the page starting at virtual address 0xf00d000 (4K == 4096 == 2^12, so page addresses always end in 12 zero bits, or 3 hex zeros).
  3. Luckily, the TLB contains the entry for the page at virtual address 0xf00d000.  The physical address for that page is 0xcafe000.
  4. So the physical address we need is 0xcafeead (we've stuck on the low 12 bits from the original request).
  5. We check for this physical address in our cache.  It's there, so we return that byte to the program.
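In step 3, if the TLB didn't contain that entry, the CPU would use the top 10 bits of the address (0x03c) to index the page directory and find the right page table, then use the next 10 bits (0x00d) to index that page table and find the physical page address.  It'd also stick this mapping into the TLB.  If that entry in the page table isn't valid, the CPU raises a page fault, which the OS normally reports to your program as a segfault.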

A page table entry usually contains a bunch of access control bits indicating what operations are allowed by whom on that page.  For example, a page can be marked readonly to a particular process by just flipping a bit in that page's page table entry.

One really interesting idiom is segfault-mmap-continue: you can put recovery code in your SIGSEGV (segfault) signal handler to actually *create* the memory at the faulting address, and then resume the program.  The program doesn't even have to know this "page fault" happened.  The OS kernel uses the same trick to implement "virtual memory" (letting disk act as RAM): when you touch a page that has been moved out to disk, the kernel's page fault handler reads it back in and resumes your program.

Bottom line: the pagetable is the cool CPU hardware support the OS needs in order to do crazy stuff with memory. 

UNIX Mapping

The main UNIX system calls to manipulate the page table are mmap (create a new mapping), munmap (remove a mapping), and mprotect (change the access rights on an existing mapping).
Here's an example of how to call mmap to get 1MB of readable, writable memory.  The first argument is a "suggested address" where you want the memory to go; try putting your own page-aligned address in there and see what happens!
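#include <stdio.h>    /* printf, perror */
#include <stdlib.h>   /* exit */
#include <sys/mman.h> /* mmap, munmap, PROT_* and MAP_* flags */

int foo(void) {
    int len=1024*1024;
    void *addr=mmap((void *)0,len, /* address and data size (bytes) */
        PROT_READ|PROT_WRITE,MAP_ANONYMOUS|MAP_SHARED, /* access flags */
        -1,0); /* <- file descriptor (none needed) and file offset */
    if (addr==MAP_FAILED) {perror("mmap"); exit(1);}

    int *buf=(int *)addr; /* <- make mmap'd region into an int pointer */
    buf[3]=7;
    buf[2]=buf[3];
    printf("mmap returned %p, which seems readable and writable\n",addr);
    munmap(addr,len); /* <- hand the memory back to the OS */

    return 0;
}

int main(void) { return foo(); } /* standalone entry point; omit if your harness calls foo() */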

The six arguments to mmap are:
  1. address, a pointer to the first byte to map.  This should be a multiple of the page size, which is 4096 bytes (0x1000 bytes) on most machines.  You can round down to the nearest page boundary with "ptr&~0xfff" (you'll need to typecast the pointer to an integer type such as uintptr_t first).  Passing a zero pointer lets the OS pick an unused area of memory.  Unless you add the MAP_FIXED flag, a nonzero address is only a hint and the OS may put the mapping somewhere else, so always use the pointer mmap returns.
  2. length, the number of bytes to map.  The OS works in whole pages, so a partial page at the end gets rounded up; it's simplest to use a multiple of the page size.
  3. access requested, some combination of PROT_READ+PROT_WRITE+PROT_EXEC, or PROT_NONE for no access at all.  In theory you can mark memory read-only, write-only, read-and-execute, etc.  The hardware will give you *at least* this access, although real machines can't do every combination exactly.  For example, x86 can't express "write but not read", and for decades x86 merged read and execute rights; they were only split (by the XD/NX bit) during the 64-bit transition.
  4. flags, which are typically MAP_ANONYMOUS+MAP_SHARED.  MAP_ANONYMOUS is just plain memory, with no file attached.  MAP_SHARED makes your writes visible to anybody else that has the same piece of memory mapped; the alternative is MAP_PRIVATE, which gives you a unique scratch copy of the memory.
  5. a file descriptor, a previously opened file to use as the initial contents of the memory.  PROT_WRITE and MAP_SHARED can be used to change the file, by writing data to memory. Not used for an anonymous mapping, so typically left as -1.
  6. a file offset, the location in the file to start the mapping.  This must be a multiple of the page size.  Not used for an anonymous mapping, so typically left as 0.
Mmap gets used for lots of different purposes:

You just want some memory from the OS.
    void *mem=mmap(0,length, PROT_READ+PROT_WRITE, MAP_ANONYMOUS+MAP_SHARED, -1,0);

You want to put some memory at a given location, for example to service a page fault or to work with old code, so you pass in an address (add MAP_FIXED if it must land exactly there; otherwise the address is only a hint).
    mmap((void *)0xabcde000,length, PROT_READ+PROT_WRITE, MAP_ANONYMOUS+MAP_SHARED, -1,0);

You want to mark a given location as inaccessible, for example to cause page faults when someone tries to access it.
    mmap((void *)0xabcde000,length, PROT_NONE, MAP_ANONYMOUS+MAP_SHARED, -1,0);

You want to create some executable memory, for example to write some machine code there.
    void *mem=mmap(0,length, PROT_READ+PROT_WRITE+PROT_EXEC, MAP_ANONYMOUS+MAP_SHARED, -1,0);

You want to bring in the file "fd" for reading (no MAP_ANONYMOUS, since that would ignore the file).
    void *mem=mmap(0,length, PROT_READ, MAP_SHARED, fd,0);

You want to bring in the file "fd" for reading and writing (the file must be opened read/write).
    void *mem=mmap(0,length, PROT_READ+PROT_WRITE, MAP_SHARED, fd,0);

Nearly every combination of protection flags is useful for something:
Flags                           Weirdness  What and why
PROT_NONE                       !          Disable all access to the memory; basically requesting a page fault when accessed.  Used by "electric fence" to find memory access errors.
PROT_READ                       no         Read-only area.  Useful for input files, or big read-only tables.
PROT_WRITE                      !!         Write-only area; can't be read, though.  Secure shared drop box?
PROT_EXEC                       !!         Execute-only area.  Secure code?
PROT_READ+PROT_WRITE            no         Read-write access.  Most ordinary memory from "new" is allocated like this.  You can't execute code here, as a security feature.
PROT_READ+PROT_EXEC             no         Readable (for constants) and executable (for code).  Most programs are mapped this way.
PROT_WRITE+PROT_EXEC            !!!        Write and execute, but not read?  Maybe for a dynamically generated program, plus security?
PROT_READ+PROT_WRITE+PROT_EXEC  !          Allow all access: do what thou wilt.  Once used for everything.  Good for dynamically created code.

Windows Memory Mapping

The Windows calls to manipulate the page table are: