Distributed Memory with Fork & Mmap

CS 641 Lecture, Dr. Lawlor

Unlike threads, processes don't share memory by default.  This substantially reduces the number of places you have to worry about race conditions and other parallel problems.  However, there are times when you want to share some memory between processes--for example, to collect the pieces of the final result.  You must allocate these shared regions in a special way--on UNIX, you can make a shared memory region quite easily using mmap, or with a bit more code using shmget/shmat (see Beej's shm example).  On Windows, you can create a shared-memory region with CreateFileMapping and then MapViewOfFile (see the CodeProject example).
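
In case you haven't used the Windows calls, here's a rough sketch of that route (the helper name windows_shared_malloc and the region name are my own, just for illustration); a second process attaches to the same region by passing the same name to OpenFileMapping:

#include <windows.h>

// Rough sketch: make a named, pagefile-backed region that other processes can map too.
void *windows_shared_malloc(int len) {
    HANDLE h=CreateFileMappingA(INVALID_HANDLE_VALUE, /* back it with the pagefile */
        0, PAGE_READWRITE, 0, len, "my_shared_region" /* made-up name */);
    if (h==0) return 0;
    return MapViewOfFile(h, FILE_MAP_ALL_ACCESS, 0,0, len);
}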

Here's my UNIX example (works on Linux and Mac OS X, but not Windows):
#include <unistd.h> /* for fork */
#include <sys/mman.h> /* for mmap */
#include <sys/wait.h> /* for waitpid */
#include <stdio.h> /* for printf */
#include <stdlib.h> /* for exit */

// Return len bytes of RAM that will automagically be shared after a fork().
void *shared_malloc(int len) {
    void *p=mmap(0,len, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANON, -1,0);
    if (p==MAP_FAILED) printf("mmap failed!\n");
    return p;
}
void shared_free(void *p,int len) {
    munmap(p,len);
}

...
// ... prepare program to be run in pieces ...
// ... call shared_malloc for memory shared between pieces ...
// Run the pieces as separate processes:
int *pid=new int[n];
for (int i=0;i<n;i++) {
    worker *w=new ... i'th piece of work ...
    pid[i]=fork();
    if (pid[i]==0) { /* I'm the child laborer */
        work(w);
        exit(0);
    }
}
// Wait until all pieces are finished (until children exit)
for (int i=0;i<n;i++) {
    int ret;
    waitpid(pid[i],&ret,0);
}
(Try this in NetRun now!)
Unlike the threaded version above, the separate pieces of this program can't possibly mess up each other's work--in fact, they have no way to communicate except through the shared_malloc regions allocated by the parent!  I claim that for many real programs, this "private by default, shared only when explicitly asked" approach is the correct one.  It means all your old (buggy, global-variable-laden) C++ code still works.  It means you don't hit annoying race conditions nearly as often--only in the explicitly shared regions.
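
To make that concrete, here is a complete standalone sketch along the lines of the skeleton above (the array of per-child result slots and the fake i*i "work" are my own illustration, not part of the lecture code): each child writes its answer into its own slot of a shared_malloc'd array, and the parent sums the slots after the children exit.

#include <stdio.h>    /* for printf */
#include <stdlib.h>   /* for exit */
#include <unistd.h>   /* for fork */
#include <sys/mman.h> /* for mmap */
#include <sys/wait.h> /* for waitpid */

void *shared_malloc(int len) { // same routine as above
    void *p=mmap(0,len, PROT_READ|PROT_WRITE, MAP_SHARED|MAP_ANON, -1,0);
    if (p==MAP_FAILED) printf("mmap failed!\n");
    return p;
}

int main(void) {
    int n=4;
    // One result slot per child, visible to everyone after the forks:
    int *result=(int *)shared_malloc(n*sizeof(int));
    int *pid=new int[n];
    for (int i=0;i<n;i++) {
        pid[i]=fork();
        if (pid[i]==0) { /* child: do the i'th piece of work */
            result[i]=i*i; // stand-in for real work
            exit(0);
        }
    }
    int sum=0;
    for (int i=0;i<n;i++) { // wait for each child, then collect its answer
        int ret;
        waitpid(pid[i],&ret,0);
        sum+=result[i];
    }
    printf("sum of children's results=%d\n",sum);
    return 0;
}

Note that no lock is needed here, because each child writes only its own slot.  The moment two processes need to update the same shared variable, you're back to needing synchronization, which is what the next example addresses.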

However, to correctly access variables in the shared memory region, you need to build your own synchronization data structure, such as a spinlock.  Taking the assembly from the Wikipedia spinlock article and translating it to gcc inline assembly, we get this, which works fine:

#include <iostream> /* for std::cout */
#include <unistd.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>

volatile void *shared_malloc(int size) {
    void *ptr=mmap(0,size,
        PROT_READ|PROT_WRITE,
        MAP_SHARED|MAP_ANONYMOUS,
        -1,0);
    return ptr;
}

class scoped_shared_lock {
    volatile int *lock;
public:
    scoped_shared_lock(volatile int *lock_) {
        lock=lock_;
        /* FAIL: not atomic--two processes can both see 0 and both grab the lock:
        while (*lock!=0) {} // wait for them to finish
        // when zero, *lock=1; // we grab the lock
        */
        // Atomic version, using assembly: atomically swap 1 into *lock,
        // and keep retrying until the old value we swapped out was 0.
        __asm__ __volatile__ (
            "4: movl $1,%%ecx\n"
            "xchg %%ecx,(%[ptr])\n"
            "test %%ecx,%%ecx\n"
            "jnz 4b\n"
            ::[ptr] "r" (lock): "ecx","memory");
    }
    ~scoped_shared_lock() {
        *lock=0; // release the lock
    }
};

int foo(void)
{
    volatile int *lock=(volatile int *)shared_malloc(sizeof(int));
    *lock=0; /* starts unlocked */
    volatile int &local=*(volatile int *)shared_malloc(sizeof(int));
    local=17;

    int pid=fork();
    if (pid!=0) { /* I am the parent */
        for (int i=0;i<100;i++) {
            scoped_shared_lock lk(lock);
            local++;
            std::cout<<"Parent's copy="<<local<<" (at "<<(void *)&local<<")\n";
            std::cout.flush();
        }
        int status=0;
        waitpid(pid,&status,0);
    } else { /* I am the child */
        for (int i=0;i<100;i++) {
            scoped_shared_lock lk(lock);
            local++;
            std::cout<<"Child copy="<<local<<" (at "<<(void *)&local<<")\n";
            std::cout.flush();
        }
    }
    return local;
}

(Try this in NetRun now!)
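
If you'd rather not maintain inline assembly, gcc's __sync builtins provide the same atomic exchange.  Here's a sketch of the lock class rewritten that way (my substitution, not the lecture's code--the class name is made up):

// Same spinlock idea, but using gcc's portable atomic builtins:
class scoped_shared_lock_builtin {
    volatile int *lock;
public:
    scoped_shared_lock_builtin(volatile int *lock_) {
        lock=lock_;
        // Atomically store 1 and get the old value back;
        // keep spinning until the old value was 0, meaning we grabbed the lock.
        while (__sync_lock_test_and_set(lock,1)!=0) { /* spin */ }
    }
    ~scoped_shared_lock_builtin() {
        __sync_lock_release(lock); // atomically store 0 (with a release barrier)
    }
};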