Cache Replacement & Memory Map
CS 301 2007 Lecture, Dr. Lawlor
Anytime you have a cache, of any kind, you need to figure out what to
do when the cache gets full. Generally, you face this problem
when you've got a new element X to load into the cache--which cache
slot do you place X into?
The simplest approach is a "direct mapped cache",
where element X goes into cache slot X%N (where N is the size of the
cache). Direct mapping means elements 1 and 2 will go into
different (adjacent slot), but you can support many elements.
For example, the Pentium 4's L1 cache is 64KB in size and
direct-mapped. This means address 0x0ABCD and address 0x1ABCD
(which are 64KB apart) both get mapped to the same place in the
cache. So even though this program is very fast (5.2ns/call):
enum {n=1024*1024};
char arr[n];
int foo(void) {
arr[0]++;
arr[12345]++;
return 0;
}
(Try this in NetRun now!)
By contrast this very similar-looking program is very slow
(20+ns/call), because both array elements (exactly 64KB apart) map to
the same line of the cache, so the CPU keeps overwriting one with the
other (called "cache thrashing"), and the cache is totally useless:
enum {n=1024*1024};
char arr[n];
int foo(void) {
arr[0]++;
arr[65536]++;
return 0;
}
(Try this in NetRun now!)
In general, power-of-two jumps in memory can be very slow on
direct-mapped machines. This is one of the only cases on
computers where powers of two are not ideal!
Some machines avoid the thrashing of a direct-mapped cache by allowing
a given memory address to sit in one of two or four slots, called two-
or four-way "set associative caching" (described here, or in Bryant and O' Hallaron book, Chapter 6).
In general, there are many more complicated cache replacement algorithms,
although most of them are too complicated to implement in
hardware. The OS treats RAM as a cache of data on disk, and since
disks are so slow, it can pay to be smart about choosing which page to
write to disk.
321 Preview: Page Tables and Memory Mapping
So your program's memory doesn't actually correspond 1-to-1 with the system's physical
RAM; there's one layer of indirection called the "page table" that maps program
"virtual addresses" into real "physical addresses".

A
page table entry contains a bunch of access control bits
indicating what operations are allowed by whom on that page. For
example, a page can be marked readonly to a particular process by just
flipping a bit in that page's page table entry. The CPU also
updates the "Accessed" and "Dirty" bits in the pagetable for pages that
have been recently accessed or written to; this assists the CPU in
figuring out which pages to leave in RAM and which to flush out to
disk. The CPU also honors the "Write-Through" (send cache writes
all the way to RAM; basically don't cache writes) and "Cache Disabled"
(don't cache reads or writes) bits in the pagetable; these bits are
often set for weird memory-mapped hardware, like the screen.
Think about what would happen if your program's writes to screen pixels
were cached on the CPU instead of sent out to the graphics card's
memory!
See
Section 3.7 & 3.8 of the Intel x86 system programming guide for details on the page table. Or check out my 2006 lecture note for pictures of memory mapping actually in progress.
UNIX Mapping
The UNIX system calls to manipulate the page table are:
- mmap
to put physical memory at a given location in program virtual memory, and optionally copy a file's contents there.
- munmap to remove physical memory from a given location in virtual memory.
- mprotect
changes your access rights on a particular piece of memory. For
example, you can remove your right to write to a particular chunk of
memory.
Here's an example of how to call mmap, to get 1MB of readable,
writeable memory. The first argument is a "suggested address"
where you want the memory to go; try putting some values in there and
see what happens!
#include <sys/mman.h>
int foo(void) {
int len=1024*1024;
void *addr=mmap((void *)0,len, /* address and data size (bytes) */
PROT_READ+PROT_WRITE,MAP_ANONYMOUS+MAP_SHARED, /* access flags */
-1,0); /* <- file descriptor (none needed) and file offset */
if (addr==MAP_FAILED) {perror("mmap"); exit(1);}
int *buf=(int *)addr; /* <- make mmap'd region into an int pointer */
buf[3]=7;
buf[2]=buf[3];
printf("mmap returned %p, which seems readable and writable\n",addr);
munmap(addr,len);
return 0;
}
(executable NetRun link)
For example, you can map in some memory at the NULL pointer like so:
#include <sys/mman.h>
int foo(void) {
int len=1024*1024;
void *addr=mmap((void *)NULL,len, /* address and data size (bytes) */
PROT_READ+PROT_WRITE,MAP_FIXED+MAP_ANONYMOUS+MAP_SHARED,
-1,0);
int *p=NULL;
p[0]=10;
printf("mmap returned %p, which seems readable and writable\n",addr);
munmap(addr,len);
return 0;
}
(Try this in NetRun now!)
Windows Memory Mapping
The Windows calls to manipulate the page table are:
- VirtualAlloc
puts physical memory at a given virtual address. You first have
to MEM_RESERVE, then MEM_COMMIT a range of virtual addresses.
- VirtualFree removes physical memory from a given virtual address.