Memory Map Manipulation

CS 301 Lecture, Dr. Lawlor

So your program's memory doesn't actually correspond 1-to-1 with the system's physical RAM; there's one layer of indirection called the "page table" that maps program "virtual addresses" into real "physical addresses".

There's a silly problem: if every byte of memory had its own (say 4-byte) entry in the page table, *most* of your memory would be used just to store the page table! So all real machines break up virtual memory into fairly large "pages" (around 4KB to 4MB in size), and each page gets mapped to one contiguous block of physical memory. This cuts the size of the page table by the page size; so for a 4GB virtual address space with 4KB pages, instead of needing an absurd 16GB for 4 billion 32-bit byte pointers, you need just a svelte 4MB for 1 million 32-bit page pointers. You can cut the space required even further by paging the page table: split up the page table into pieces that are pointed to by an even bigger table.
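The arithmetic above is easy to check in code. This is just a sketch: the function name flat_table_bytes is made up for illustration, and it assumes a flat single-level table with 4-byte entries.

```c
#include <stdio.h>

/* Bytes needed for a flat (single-level) page table:
   one entry for every page in the virtual address space. */
unsigned long long flat_table_bytes(unsigned long long address_space,
                                    unsigned long long page_size,
                                    unsigned long long entry_size) {
	unsigned long long entries = address_space / page_size;
	return entries * entry_size;
}

int foo(void) {
	unsigned long long space = 1ull << 32; /* 4GB virtual address space */
	printf("1-byte pages: %llu MB of table\n", flat_table_bytes(space, 1, 4) >> 20);
	printf("4KB pages:    %llu MB of table\n", flat_table_bytes(space, 4096, 4) >> 20);
	printf("4MB pages:    %llu KB of table\n", flat_table_bytes(space, 4ull << 20, 4) >> 10);
	return 0;
}
```

Bigger pages mean a smaller table, but also more memory wasted whenever a program only needs a few bytes of a page.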

For example, 32-bit x86 first looks up a page directory (1024 32-bit pagetable pointers), each of which points to a page table (1024 32-bit page pointers), and each of these page table entries gives the physical address for a 4KB block of memory, one page. 64-bit machines are even worse, since their virtual address space is so much bigger; x86-64 uses four layers of tables before you finally reach the page you need!
Four level page table for 64-bit machine
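
As a rough sketch of what those four layers look like, here is how a 48-bit x86-64 virtual address splits into four 9-bit table indexes plus a 12-bit page offset, assuming ordinary 4KB pages. The level names (PML4, PDPT, PD, PT) are the usual names from Intel's manuals, not something defined in these notes, and the struct and function names are made up for illustration.

```c
#include <stdint.h>
#include <stdio.h>

typedef struct {
	unsigned pml4, pdpt, pd, pt; /* one 9-bit table index per level, outermost first */
	unsigned offset;             /* 12-bit byte offset inside the 4KB page */
} walk_indices;

/* Split a virtual address the way 4-level x86-64 paging does (4KB pages). */
walk_indices split_x86_64(uint64_t vaddr) {
	walk_indices ix;
	ix.pml4   = (unsigned)((vaddr >> 39) & 0x1ff);
	ix.pdpt   = (unsigned)((vaddr >> 30) & 0x1ff);
	ix.pd     = (unsigned)((vaddr >> 21) & 0x1ff);
	ix.pt     = (unsigned)((vaddr >> 12) & 0x1ff);
	ix.offset = (unsigned)(vaddr & 0xfff);
	return ix;
}

int foo(void) {
	walk_indices ix = split_x86_64(0x7f1234567abcull);
	printf("pml4=%u pdpt=%u pd=%u pt=%u offset=0x%x\n",
		ix.pml4, ix.pdpt, ix.pd, ix.pt, ix.offset);
	return 0;
}
```

Each 9-bit index picks one of 512 8-byte entries, so every table at every level is exactly one 4KB page.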

If the system's designers weren't careful, looking up each memory access via two or four tables would result in memory accesses being 2x to 4x slower! Luckily, modern machines use a special "pagetable cache" called the "translation lookaside buffer" or TLB. The TLB just stores the virtual-to-physical mappings for the most recently accessed pages--if most TLB accesses are cache hits, memory accesses will be fast. When you access a new page not in the TLB (a TLB miss), the CPU (or on PowerPC, a software interrupt) has to walk the page table to fill the TLB before the access can happen. This can be slow, so sometimes people will choose large page sizes just so the TLB's fixed number of entries covers more memory! A typical TLB holds from 32 to 128 pagetable entries, which is only 128KB to a few hundred megs depending on the page size.
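
The amount of memory the TLB can cover at once is just entries times page size. A tiny sketch of that arithmetic (the name tlb_reach is made up here):

```c
#include <stdio.h>

/* How much memory a TLB can map at once: entries times page size. */
unsigned long long tlb_reach(unsigned long long entries, unsigned long long page_size) {
	return entries * page_size;
}

int foo(void) {
	printf("32 entries x 4KB pages:  %llu KB\n", tlb_reach(32, 4096) >> 10);
	printf("128 entries x 4MB pages: %llu MB\n", tlb_reach(128, 4ull << 20) >> 20);
	return 0;
}
```

Any program that hops around in more memory than this will miss in the TLB over and over, no matter how big the regular cache is.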

So here's what the CPU does for a typical memory access:

1. Program asks for a byte at virtual address 0xf00dead.
2. That's part of the page starting at virtual address 0xf00d000 (4K == 4096 == 2¹², so page addresses always end in 12 zero bits, or 3 hex zeros).
3. Luckily, the TLB contains the entry for the page at virtual address 0xf00d000. The physical address for that page is 0xcafe000.
4. So the physical address we need is 0xcafeead (we've stuck on the low 12 bits from the original request).
5. We check for this physical address in our cache. It's there, so we return the program that byte.

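In step 3, if the TLB didn't contain that entry, the CPU would use the top 10 bits of the address to index the page directory, then use the next 10 bits to index the page table it found there, and read the page's physical address out of that entry. It'd also stick this mapping into the TLB. If that entry in the page table isn't valid, the CPU raises a page fault, which by default the OS reports to the program as a segfault.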
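
Here's a toy software model of those steps, for a 32-bit x86-style two-level table. Everything in it is an illustration rather than real hardware: the struct layouts, the 4-entry direct-mapped TLB, and the "low bit means valid" convention are simplifying assumptions, and real page table entries carry more bits than this.

```c
#include <stdint.h>
#include <stddef.h>

#define OFFSET_MASK 0xfffu /* low 12 bits: byte offset within a 4KB page */
#define ENTRY_VALID 1u     /* toy convention: low bit set means "page is mapped" */

typedef struct { uint32_t entry[1024]; } page_table;        /* physical page address + flags */
typedef struct { page_table *table[1024]; } page_directory; /* NULL means "no table here" */

/* Toy TLB: 4 direct-mapped entries (real TLBs are bigger and associative). */
#define TLB_SIZE 4
typedef struct { uint32_t vpage, ppage; int valid; } tlb_entry;
static tlb_entry tlb[TLB_SIZE];
static int tlb_misses = 0;

/* Walk both levels. Returns 1 and sets *ppage on success, 0 on a page fault. */
static int walk(const page_directory *dir, uint32_t vaddr, uint32_t *ppage) {
	uint32_t dir_index = vaddr >> 22;             /* top 10 bits */
	uint32_t table_index = (vaddr >> 12) & 0x3ff; /* next 10 bits */
	const page_table *pt = dir->table[dir_index];
	if (pt == NULL) return 0;
	uint32_t e = pt->entry[table_index];
	if (!(e & ENTRY_VALID)) return 0;
	*ppage = e & ~OFFSET_MASK;
	return 1;
}

/* Steps 2 through 4: consult the TLB, walk the tables on a miss, glue the offset back on. */
int translate(const page_directory *dir, uint32_t vaddr, uint32_t *paddr) {
	uint32_t vpage = vaddr & ~OFFSET_MASK;
	tlb_entry *t = &tlb[(vpage >> 12) % TLB_SIZE];
	if (!(t->valid && t->vpage == vpage)) { /* TLB miss */
		uint32_t ppage;
		tlb_misses++;
		if (!walk(dir, vaddr, &ppage)) return 0; /* page fault */
		t->vpage = vpage; t->ppage = ppage; t->valid = 1;
	}
	*paddr = t->ppage | (vaddr & OFFSET_MASK);
	return 1;
}

/* Map virtual page 0xf00d000 to physical page 0xcafe000, then look up 0xf00dead. */
uint32_t lookup_example(void) {
	static page_directory dir;
	static page_table pt;
	uint32_t paddr = 0;
	pt.entry[(0x0f00deadu >> 12) & 0x3ff] = 0x0cafe000u | ENTRY_VALID;
	dir.table[0x0f00deadu >> 22] = &pt;
	if (!translate(&dir, 0x0f00deadu, &paddr)) return 0;
	return paddr;
}
```

The first lookup of a page misses in the toy TLB and walks the tables; later lookups in the same page skip the walk entirely, which is the whole point of having a TLB.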

A page table entry usually contains a bunch of access control bits indicating what operations are allowed by whom on that page. For example, a page can be marked readonly to a particular process by just flipping a bit in that page's page table entry.

One really interesting idiom is segfault-mmap-continue: you can put recovery code in your segfault (memory access failure) signal handler to actually *create* the memory at the faulting address, and then resume the program. The program doesn't even have to know this "page fault" happened. The OS uses the same trick internally to implement "virtual memory" (letting disk act as RAM).
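
Here's a minimal sketch of segfault-mmap-continue, assuming a Linux-like UNIX. The names on_segfault, lazy_memory_demo, and the lazy_start/lazy_end range are made up for this example. One loud caveat: POSIX does not list mmap among the functions that are guaranteed safe to call from a signal handler, so treat this as a demonstration of the idea rather than a portable recipe.

```c
#define _GNU_SOURCE /* expose MAP_ANONYMOUS and siginfo_t under strict -std= modes */
#include <signal.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static uintptr_t page_size;            /* filled in before any fault can happen */
static uintptr_t lazy_start, lazy_end; /* address range we agree to create on demand */
static volatile sig_atomic_t faults_serviced = 0;

/* SIGSEGV handler: put a fresh zero-filled page under the faulting address. */
static void on_segfault(int sig, siginfo_t *info, void *context) {
	uintptr_t addr = (uintptr_t)info->si_addr;
	(void)sig; (void)context;
	if (addr < lazy_start || addr >= lazy_end) _exit(1); /* a genuine bug: give up */
	void *page = (void *)(addr & ~(page_size - 1));
	if (mmap(page, page_size, PROT_READ | PROT_WRITE,
	         MAP_PRIVATE | MAP_ANONYMOUS | MAP_FIXED, -1, 0) == MAP_FAILED) _exit(1);
	faults_serviced++;
	/* Returning from the handler re-runs the instruction that faulted. */
}

int lazy_memory_demo(void) {
	struct sigaction sa, old;
	page_size = (uintptr_t)sysconf(_SC_PAGESIZE);

	/* Find an unused page-aligned address: map a page, note where it landed, unmap it. */
	void *hole = mmap((void *)0, page_size, PROT_NONE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (hole == MAP_FAILED) { perror("mmap"); exit(1); }
	munmap(hole, page_size);
	lazy_start = (uintptr_t)hole;
	lazy_end = lazy_start + page_size;

	memset(&sa, 0, sizeof sa);
	sa.sa_sigaction = on_segfault;
	sa.sa_flags = SA_SIGINFO;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGSEGV, &sa, &old) != 0) { perror("sigaction"); exit(1); }

	volatile int *buf = (volatile int *)hole;
	buf[3] = 7;         /* faults; the handler creates the page; the store then succeeds */
	int value = buf[3]; /* no fault this time */

	sigaction(SIGSEGV, &old, (struct sigaction *)0);
	munmap(hole, page_size);
	return value;
}
```

The handler refuses to service faults outside the one range we set up, so an ordinary wild pointer still kills the program instead of silently getting memory.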

Bottom line: the pagetable is the cool CPU hardware support the OS needs in order to do crazy stuff with memory.

UNIX Mapping

The UNIX system calls to manipulate the page table are:

mmap puts physical memory at a given location in program virtual memory, optionally copying a file's contents there.
munmap removes physical memory from a given location in virtual memory.
mprotect changes your access rights on a particular piece of memory. For example, you can remove your right to write to a particular chunk of memory.
brk and sbrk are older (pre-mmap) calls that adjust the "heap boundary", adding zero-filled physical memory at the end of the heap virtual address space. They should almost always now be replaced with calls to mmap.
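
As a sketch of mprotect in action, this example makes a page read-only and then lets a forked child try to write to it, so the crash happens in the child and the parent can report it. The function name is made up, and the signal you get back is an assumption about the platform: Linux normally delivers SIGSEGV here, while some other systems deliver SIGBUS.

```c
#define _GNU_SOURCE /* expose MAP_ANONYMOUS under strict -std= modes */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that killed a child who wrote to a read-only page
   (0 if the write was allowed, -1 if the page's contents changed). */
int write_to_readonly_page(void) {
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	char *mem = mmap((void *)0, page, PROT_READ | PROT_WRITE,
	                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) { perror("mmap"); exit(1); }
	mem[0] = 'x'; /* writable so far */

	/* Take away our own write permission. */
	if (mprotect(mem, page, PROT_READ) != 0) { perror("mprotect"); exit(1); }

	pid_t child = fork();
	if (child < 0) { perror("fork"); exit(1); }
	if (child == 0) {
		*(volatile char *)mem = 'y'; /* should fault */
		_exit(0);                    /* only reached if the write was allowed */
	}
	int status = 0;
	waitpid(child, &status, 0);

	char still = mem[0]; /* reading is still fine */
	munmap(mem, page);
	if (still != 'x') return -1;
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling mprotect again with PROT_READ | PROT_WRITE would restore write access; the protection is a property of the page table entry, not of the data.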

Here's an example of how to call mmap, to get 1MB of readable, writeable memory. The first argument is a "suggested address" where you want the memory to go; try putting your own page-aligned address in there and see what happens!

#include <stdio.h>  /* printf, perror */
#include <stdlib.h> /* exit */
#include <sys/mman.h>

int foo(void) {
	int len=1024*1024; 
	void *addr=mmap((void *)0,len, /* address and data size (bytes) */
		PROT_READ+PROT_WRITE,MAP_ANONYMOUS+MAP_SHARED, /* access flags */
		-1,0); /* <- file descriptor (none needed) and file offset */
	if (addr==MAP_FAILED) {perror("mmap"); exit(1);}

	int *buf=(int *)addr; /* <- make mmap'd region into an int pointer */
	buf[3]=7;
	buf[2]=buf[3];
	printf("mmap returned %p, which seems readable and writable\n",addr);
	munmap(addr,len);

	return 0;
}


The six arguments to mmap are:

address, a pointer to the first byte to change. This pointer should be a multiple of the page size, which is 4096 bytes (0x1000 bytes) on typical x86 systems. You can round down to the nearest page boundary with "ptr&~0xfff" (you'll need to typecast the pointer to an integer type like "long" or "uintptr_t" first, since C won't do bitwise operations on pointers). Passing a zero pointer asks the OS to pick the next unused area of memory.
length, the number of bytes to change. This is normally a multiple of 4096 bytes (0x1000 bytes); Linux rounds other lengths up to a whole number of pages.
access requested, some combination of PROT_READ+PROT_WRITE+PROT_EXEC. In theory you can mark memory read-only, write-only, read-and-execute, etc. The hardware will give you *at least* this access; although real machines might not be able to do every combination exactly in hardware. For example, for decades x86 merged read and execute rights; they only split these (as XD/NX) during the 64-bit transition.
flags, which are typically MAP_ANONYMOUS+MAP_SHARED. MAP_ANONYMOUS is just plain memory, with no file attached. MAP_SHARED makes your writes visible to anybody else that has the same piece of memory mapped; the alternative is MAP_PRIVATE, which gives you a unique scratch copy of the memory.
a file descriptor, a previously opened file to use as the initial contents of the memory. PROT_WRITE and MAP_SHARED can be used to change the file, by writing data to memory. Not used for an anonymous mapping, so typically left as -1.
a file offset, the location in the file to start the mapping. Not used for an anonymous mapping, so typically left as 0.
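
The "ptr&~0xfff" rounding trick generalizes to any power-of-two page size. A small sketch, with helper names made up for this example, using uintptr_t as the integer type to hold the pointer:

```c
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

/* Both helpers assume page_size is a power of two, which real page sizes are. */
uintptr_t page_round_down(uintptr_t addr, uintptr_t page_size) {
	return addr & ~(page_size - 1);
}
uintptr_t page_round_up(uintptr_t addr, uintptr_t page_size) {
	return (addr + page_size - 1) & ~(page_size - 1);
}

int foo(void) {
	int local = 0;
	uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE); /* ask the OS, don't hardcode */
	uintptr_t addr = (uintptr_t)&local;
	printf("page size %lu: address %p lives in the page at %p\n",
		(unsigned long)page_size, (void *)addr,
		(void *)page_round_down(addr, page_size));
	return 0;
}
```

Rounding the start address down and the end address up is the usual way to turn an arbitrary byte range into the page range that mprotect or munmap will accept.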

Mmap gets used for lots of different purposes:

Purpose	Code
You just want some memory from the OS.	void *mem=mmap(0,length, PROT_READ+PROT_WRITE, MAP_ANONYMOUS+MAP_SHARED, -1,0);
You want to put some memory at a given location, for example to service a page fault, or operate with old code, so you pass in an address, plus MAP_FIXED to make that address mandatory instead of a suggestion.	mmap((void *)0xabcde000,length, PROT_READ+PROT_WRITE, MAP_ANONYMOUS+MAP_SHARED+MAP_FIXED, -1,0);
You want to mark a given location as unreadable, for example to cause pagefaults when people try to access there.	mmap((void *)0xabcde000,length, PROT_NONE, MAP_ANONYMOUS+MAP_SHARED+MAP_FIXED, -1,0);
You want to create some executable memory, for example to write some machine code there.	void *mem=mmap(0,length, PROT_READ+PROT_WRITE+PROT_EXEC, MAP_ANONYMOUS+MAP_SHARED, -1,0);
You want to bring in the file "fd" for reading. (No MAP_ANONYMOUS here: an anonymous mapping ignores the file.)	void *mem=mmap(0,length, PROT_READ, MAP_SHARED, fd,0);
You want to bring in the file "fd" for reading and writing.	void *mem=mmap(0,length, PROT_READ+PROT_WRITE, MAP_SHARED, fd,0);
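
Here's a sketch of the file-mapping rows of that table: it creates a scratch file, maps it shared and writable, checks that the file's bytes show up in memory, then changes the file by writing to memory. The function name and the /tmp path template are made up for this example, and it assumes /tmp is writable.

```c
#define _GNU_SOURCE /* expose mkstemp and pread under strict -std= modes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Returns 1 if the file showed up in memory AND a memory write reached the file. */
int file_roundtrip(void) {
	char path[] = "/tmp/mmap_demo_XXXXXX";
	const char text[] = "hello, mmap";
	size_t page = (size_t)sysconf(_SC_PAGESIZE);

	int fd = mkstemp(path); /* creates and opens a uniquely named scratch file */
	if (fd < 0) { perror("mkstemp"); exit(1); }
	unlink(path);           /* the file goes away once fd is closed */
	if (write(fd, text, sizeof text) != (ssize_t)sizeof text) { perror("write"); exit(1); }

	/* Map the first page of the file; bytes past the end of the file read as zero. */
	char *mem = mmap((void *)0, page, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (mem == MAP_FAILED) { perror("mmap"); exit(1); }

	int same = (memcmp(mem, text, sizeof text) == 0); /* file contents appear in memory */

	mem[0] = 'J';                /* with MAP_SHARED, this store changes the file */
	msync(mem, page, MS_SYNC);   /* push the change out before we look */
	munmap(mem, page);

	char first = 0;
	if (pread(fd, &first, 1, 0) != 1) { perror("pread"); exit(1); }
	close(fd);
	return same && first == 'J';
}
```

Swapping MAP_SHARED for MAP_PRIVATE would keep the store in a private scratch copy, and the file would still start with 'h'.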

Nearly every combination of protection flags is useful for something:

Flags	What	Why	Weirdness
PROT_NONE	Disable all access to the memory.	Basically requesting a page fault when accessed. Used by "electric fence" to find memory access errors.	!
PROT_READ	Read only area.	Useful for input files, or big read-only tables.	no
PROT_WRITE	Write only area.	Can't be read, though. Secure shared drop box?	!!
PROT_EXEC	Execute only area.	Secure code?	!!
PROT_READ+PROT_WRITE	Read-write access.	Most ordinary memory from "new" is allocated like this. You can't execute code here, as a security feature.	no
PROT_READ+PROT_EXEC	Readable (for constants) and executable (for code).	Most programs are mapped this way.	no
PROT_WRITE+PROT_EXEC	Write and execute, but not read?	Maybe for a dynamically generated program, plus security?	!!!
PROT_READ+PROT_WRITE+PROT_EXEC	Allow all access: do what thou wilt.	Once used for everything. Good for dynamically created code.	!
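
As a sketch of the PROT_NONE row, here's the "electric fence" idea in miniature: park a buffer so it ends right at a PROT_NONE guard page, and an off-by-one write faults immediately instead of silently corrupting a neighbor. This is not the real Electric Fence library, just the underlying trick; the function name is made up, and the overrun runs in a forked child so the parent survives to report the signal (SIGSEGV on Linux, possibly SIGBUS elsewhere).

```c
#define _GNU_SOURCE /* expose MAP_ANONYMOUS under strict -std= modes */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Returns the signal that killed a child who overran a guarded buffer (0 if none). */
int overrun_hits_guard_page(void) {
	size_t page = (size_t)sysconf(_SC_PAGESIZE);
	size_t n = 100; /* buffer size in bytes */

	/* Two pages: the first is usable, the second becomes the guard. */
	char *mem = mmap((void *)0, 2 * page, PROT_READ | PROT_WRITE,
	                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (mem == MAP_FAILED) { perror("mmap"); exit(1); }
	if (mprotect(mem + page, page, PROT_NONE) != 0) { perror("mprotect"); exit(1); }

	char *buf = mem + page - n; /* buf[n-1] is the last byte before the guard page */

	pid_t child = fork();
	if (child < 0) { perror("fork"); exit(1); }
	if (child == 0) {
		volatile char *b = buf;
		for (size_t i = 0; i <= n; i++) b[i] = 1; /* classic off-by-one: touches b[n] */
		_exit(0);
	}
	int status = 0;
	waitpid(child, &status, 0);
	munmap(mem, 2 * page);
	return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

The cost is at least two whole pages per guarded buffer, which is why this is a debugging technique and not how ordinary allocators work.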

Windows Memory Mapping

The Windows calls to manipulate the page table are:

VirtualAlloc puts physical memory at a given virtual address. You have to both MEM_RESERVE and MEM_COMMIT a range of virtual addresses before using it, either in two separate calls or by passing both flags in one call.
VirtualFree removes physical memory from a given virtual address.