Cache Replacement & Memory Map

CS 301 2007 Lecture, Dr. Lawlor

Anytime you have a cache, of any kind, you need to figure out what to do when the cache gets full. Generally, you face this problem when you've got a new element X to load into the cache--which cache slot do you place X into?

The simplest approach is a "direct mapped cache", where element X goes into cache slot X%N (where N is the size of the cache). Direct mapping means elements 1 and 2 will go into different (adjacent slot), but you can support many elements.

For example, the Pentium 4's L1 cache is 64KB in size and direct-mapped. This means address 0x0ABCD and address 0x1ABCD (which are 64KB apart) both get mapped to the same place in the cache. So even though this program is very fast (5.2ns/call):

enum {n=1024*1024};
char arr[n]; 

int foo(void) {
	arr[0]++;
	arr[12345]++;
	return 0;
}

(Try this in NetRun now!)

By contrast this very similar-looking program is very slow (20+ns/call), because both array elements (exactly 64KB apart) map to the same line of the cache, so the CPU keeps overwriting one with the other (called "cache thrashing"), and the cache is totally useless:

enum {n=1024*1024};
char arr[n]; 

int foo(void) {
	arr[0]++;
	arr[65536]++;
	return 0;
}

(Try this in NetRun now!)

In general, power-of-two jumps in memory can be very slow on direct-mapped machines. This is one of the only cases on computers where powers of two are not ideal!

Some machines avoid the thrashing of a direct-mapped cache by allowing a given memory address to sit in one of two or four slots, called two- or four-way "set associative caching" (described here, or in Bryant and O' Hallaron book, Chapter 6).

In general, there are many more complicated cache replacement algorithms, although most of them are too complicated to implement in hardware. The OS treats RAM as a cache of data on disk, and since disks are so slow, it can pay to be smart about choosing which page to write to disk.

321 Preview: Page Tables and Memory Mapping

So your program's memory doesn't actually correspond 1-to-1 with the system's physical RAM; there's one layer of indirection called the "page table" that maps program "virtual addresses" into real "physical addresses".
Bits in x86 page table entry.

A page table entry contains a bunch of access control bits indicating what operations are allowed by whom on that page. For example, a page can be marked readonly to a particular process by just flipping a bit in that page's page table entry. The CPU also updates the "Accessed" and "Dirty" bits in the pagetable for pages that have been recently accessed or written to; this assists the CPU in figuring out which pages to leave in RAM and which to flush out to disk. The CPU also honors the "Write-Through" (send cache writes all the way to RAM; basically don't cache writes) and "Cache Disabled" (don't cache reads or writes) bits in the pagetable; these bits are often set for weird memory-mapped hardware, like the screen. Think about what would happen if your program's writes to screen pixels were cached on the CPU instead of sent out to the graphics card's memory!

See Section 3.7 & 3.8 of the Intel x86 system programming guide for details on the page table. Or check out my 2006 lecture note for pictures of memory mapping actually in progress.

UNIX Mapping

The UNIX system calls to manipulate the page table are:

mmap to put physical memory at a given location in program virtual memory, and optionally copy a file's contents there.
munmap to remove physical memory from a given location in virtual memory.
mprotect changes your access rights on a particular piece of memory. For example, you can remove your right to write to a particular chunk of memory.

Here's an example of how to call mmap, to get 1MB of readable, writeable memory. The first argument is a "suggested address" where you want the memory to go; try putting some values in there and see what happens!

#include <sys/mman.h>

int foo(void) {
	int len=1024*1024; 
	void *addr=mmap((void *)0,len, /* address and data size (bytes) */
		PROT_READ+PROT_WRITE,MAP_ANONYMOUS+MAP_SHARED, /* access flags */
		-1,0); /* <- file descriptor (none needed) and file offset */
	if (addr==MAP_FAILED) {perror("mmap"); exit(1);}

	int *buf=(int *)addr; /* <- make mmap'd region into an int pointer */
	buf[3]=7;
	buf[2]=buf[3];
	printf("mmap returned %p, which seems readable and writable\n",addr);
	munmap(addr,len);

	return 0;
}

(executable NetRun link)

For example, you can map in some memory at the NULL pointer like so:

#include <sys/mman.h>

int foo(void) {
	int len=1024*1024; 
	void *addr=mmap((void *)NULL,len, /* address and data size (bytes) */
		PROT_READ+PROT_WRITE,MAP_FIXED+MAP_ANONYMOUS+MAP_SHARED, 
		-1,0);
	
	int *p=NULL;
	p[0]=10;
	
	printf("mmap returned %p, which seems readable and writable\n",addr);
	munmap(addr,len);

	return 0;
}

(Try this in NetRun now!)

Windows Memory Mapping

The Windows calls to manipulate the page table are:

VirtualAlloc puts physical memory at a given virtual address. You first have to MEM_RESERVE, then MEM_COMMIT a range of virtual addresses.
VirtualFree removes physical memory from a given virtual address.