Multithreading Basics: User-Level Threads

The Argument for Threads

So you're running along in one function. You'd like to stop running, pack up, and start running some other function for a while. Easy, right? Just CALL the other function--your return address will get pushed onto the stack, and the other function will run until it's done and RET back to you.

But sometimes, CALL and RET are not good enough. Sometimes, it's nice to be able to suspend function A, run function B for a while, then switch back to A before B is finished, then switch back to where we left off inside B, and so on. In other words, sometimes you want to be able to split up a few functions into several pieces, and switch off between the execution of each of those those pieces.

This is actually a form of parallelism--it lets you get several things done at the same time.

For example, if function A is reading stuff from the network, it'll spend a lot of time just waiting for the next piece of data to arrive. That's when it would make sense to run some other function B. If B is writing stuff to disk, it'll spend time waiting for the disk to finish writing. That's when you want to switch back to A!

Just to be clear, here's what we want to have happen:

Function	A	B
Code	for (int i=0;i<3;i++) { std::cout<<"A:"<<i<<"\n"; swap from A to B }	for (int i=0;i<3;i++) { std::cout<<" B:"<<i<<"\n"; swap from B to A }

So what we want is to be able to run one iteration of A, then run one iteration of B, and so on. In a real program, of course, A and B would be doing something useful instead of just printing stuff, and they'd only switch off when they need to wait for something slow (a disk, the network, the keyboard, etc.) instead of swapping every iteration.

Implementing Threads in User Space

So you can actually write code that switches between functions A and B like above. It's actually pretty easy to do so.

You've just got to save and restore everything that's different in A and B. This includes:

The instruction pointer. This will automatically get saved to the stack if we just make "swap" a normal function.
The saved registers (ebp, ebx, esi, and edi). It's easy enough to save these on the stack in assembly.
The stack pointer. "i" is probably sitting on the stack for both functions.

So the easy part is writing the swap routine:

; Usage: swap32(old struct pointer, new struct pointer)
global swap32
swap32:
mov eax,[esp+4] ; first argument: old thread struct pointer
mov ecx,[esp+8] ; second argument: new thread struct pointer
; Save registers
push ebp
push ebx
push esi
push edi
; Save old stack
mov [eax],esp
; Load new stack
mov esp,[ecx]
; Restore registers
pop edi
pop esi
pop ebx
pop ebp
ret

Now you've just got to call the swap routine to switch from running one function to running another--and then you can switch back!

struct thread {
	void *stack_pointer; /* <- accessed from assembly */
};
/* Save this old thread's stack; and load up from this new thread's stack. */
extern "C" void swap32(thread *old,thread *new_thread);

thread A, B; /* threads to run */
void fnA(void) {
	for (int i=0;i<3;i++) {
	    std::cout<<"A:"<<i<<"\n";   
	    swap32(&A,&B);
	}
}
void fnB(void) {
	for (int i=0;i<3;i++) {
	    std::cout<<"        B:"<<i<<"\n";   
	    swap32(&B,&A);
	}
}

int foo(void) {
	/* Set up a separate stack for B (nasty, but only needed once!) */
	int Blen=8192; /* small 8KB stack */
	char *Bstack=new char[Blen];
	void **Btop=(void **)(&Bstack[Blen]); /* end of B's stack */
	*(--Btop)=(void *)fnB; /* first swap will return to start of fnB */
	for (int crap=0;crap<4;crap++)
		*(--Btop)=0; /* set new ebp, ebx, esi, and edi all to zero */
	B.stack_pointer=(void *)Btop; /* B's stack starts here */
	
	/* Just call A.  It will swap to B, which will swap to A, and so on! */
	std::cout<<"Calling fnA\n";
	fnA();
	std::cout<<"Back to foo\n";
	return 0;
}

(executable NetRun link part 1, NetRun link part 2)

Note the trickiness of setting up B's stack so that A's first call to swap32 will actually fire up function B!

This code produces:

Calling fnA
A:0
        B:0
A:1
        B:1
A:2
        B:2
Back to foo
Program complete.  Return 0 (0x0)

Flavors of Threading

Writing your own assembly code to switch between threads is called "user-level threading", or "coroutines". There are several other ways to implement the same thing, like using the builtin OS routines setjmp/longjmp, makecontext/swapcontext, or Windows Fibers.

You can also get the OS itself to do the swapping whenever it likes; threads created and managed by the OS are called "kernel threads" or just plain "threads". You can create OS threads with pthread_create (on UNIX-like systems) or CreateThread (on Windows). You'll learn more about creating kernel threads in CS 321 next semester.

The big advantage of kernel threads is that the kernel knows when an I/O will block (for example, when you're reading a file), and can thus automatically swap out that thread with another thread. The big advantage of user-level threads is that you know exactly when every thead switch will occur (because you're explicitly calling the "swap" function!), so it's pretty rare for a thread switch to happen at "exactly the wrong instant" and hence screw up your program.

My buddy Gengbin and I wrote a decent paper on the tradeoffs between user-level and kernel threads for parallel programming.