Multicore: Locks, Deadlock, and Overhead

A "mutex" (short for "mutual exclusion") is just a special object with "lock" and "unlock" operations, built so that while you've got the mutex locked, no other thread can lock it--think of the lock on a bathroom stall!

With C++11 threads, this looks like:

#include <mutex>
int count=0;
std::mutex count_lock;

long foo() {
	count_lock.lock();
	int i=count++;
	count_lock.unlock();
	return i;
}

(Try this in NetRun now!)

This supports long stretches of protected code, and the protected code can even call other functions.  But there are a few problems:

First, mutex acquisition is quite slow, nearly 16ns per lock.  This means you want to lock and unlock as rarely as possible.  If you're just modifying an integer, it's over 2x faster to use an atomic variable (which in x86 assembly just adds a "lock" prefix to the instruction), taking just 6ns per operation:

#include <atomic>
std::atomic<int> count;

long foo() {
	int i=count++;
	return i;
}

(Try this in NetRun now!)

Second, nobody else can do the stuff protected by the lock while you've got it--it serializes the machine.  This means you want to leave the lock locked as short a time as possible.
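One way to keep the lock held briefly is to do the expensive part on private data, and only lock for the final shared update.  Here's a minimal sketch of that idea; slow_square, work, and the 4-way split are made-up names and numbers for illustration, not part of any library:

```cpp
#include <cassert>
#include <mutex>
#include <thread>
#include <vector>

long total=0;            // shared result, protected by total_lock
std::mutex total_lock;

// Hypothetical stand-in for expensive work that touches no shared data
long slow_square(long x) { return x*x; }

void work(int start,int end) {
	assert(start<=end);
	long local=0; // private to this thread, so no lock needed
	for (int i=start;i<end;i++) local+=slow_square(i);

	// Lock only for the one shared update (the guard unlocks on return)
	std::lock_guard<std::mutex> guard(total_lock);
	total+=local;
}

long foo() {
	total=0;
	std::vector<std::thread> threads;
	for (int t=0;t<4;t++) threads.emplace_back(work,t*250,(t+1)*250);
	for (std::thread &th:threads) th.join();
	return total; // sum of i*i for i from 0 to 999
}
```

Each thread takes the lock exactly once, instead of once per loop iteration, so the threads spend almost all their time running in parallel.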


Third, it's possible to try to re-lock a lock you've already locked yourself.  This means you're waiting for yourself to unlock it!  A common cause is that the unlock was skipped in some rare code path, such as when an exception happens between the lock and unlock, so the next attempt to lock hangs forever.  The fix is to never manually lock and unlock, but to use an RAII "lock guard", which locks in the constructor, and unlocks in the destructor:

#include <mutex>
int count=0;
std::mutex count_lock;

long foo() {
	std::lock_guard<std::mutex> guard(count_lock); // locks
	return count++;
	// implicit unlock during destructor
}

 

(Try this in NetRun now!)
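To see why the guard helps with exceptions, here's a small sketch where the critical section throws partway through.  The function risky_update and its error message are hypothetical, just for illustration:

```cpp
#include <cassert>
#include <mutex>
#include <stdexcept>

int count=0;
std::mutex count_lock;

// Hypothetical function that fails partway through its critical section
void risky_update() {
	std::lock_guard<std::mutex> guard(count_lock); // locks
	count++;
	throw std::runtime_error("rare failure path");
	// guard's destructor still runs during stack unwinding, unlocking count_lock
}

long foo() {
	count=0;
	try { risky_update(); }
	catch (const std::runtime_error &) { /* recover and carry on */ }

	// The mutex was released during unwinding, so locking again does not hang
	std::lock_guard<std::mutex> guard(count_lock);
	assert(count==1); // the increment happened before the throw
	return count;
}
```

With a manual lock() and unlock() around the throw, the unlock would be skipped, and the second lock in foo would wait forever.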

It's possible to wait on one lock while you hold another lock.  If the thread that holds that lock is waiting for your lock, you're both "deadlocked".  It's called "deadlock" because the threads are waiting for *each other*, like the old rule "When two trains approach each other at a crossing, both shall come to a full stop and neither shall start up again until the other has gone."

Deadlock causes programs to hang, and it can strike whenever different threads acquire locks in different orders.  Possible solutions:
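One common approach (shown here as an illustration, not a complete list) is to make sure every thread takes the locks in a consistent way.  In C++11, std::lock will acquire several mutexes together using a deadlock-avoidance algorithm, and std::adopt_lock lets a lock_guard take over the unlocking.  The account struct and transfer function below are made-up names for this sketch:

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical pair of accounts, each protected by its own mutex
struct account { std::mutex lock; long balance=1000; };
account a, b;

void transfer(account &from,account &to,long amount) {
	assert(&from!=&to); // passing the same mutex twice is not allowed
	// std::lock acquires both mutexes without deadlocking,
	// no matter what order the arguments are listed in
	std::lock(from.lock,to.lock);
	std::lock_guard<std::mutex> g0(from.lock,std::adopt_lock); // unlock on exit
	std::lock_guard<std::mutex> g1(to.lock,std::adopt_lock);
	from.balance-=amount;
	to.balance+=amount;
}

void a_to_b(void) { for (int i=0;i<10000;i++) transfer(a,b,1); }
void b_to_a(void) { for (int i=0;i<10000;i++) transfer(b,a,1); }

long foo() {
	std::thread t0(a_to_b); // wants a's lock and b's lock
	std::thread t1(b_to_a); // wants the same two locks, listed in the other order
	t0.join();
	t1.join();
	return a.balance+b.balance; // money is conserved
}
```

If transfer instead did from.lock.lock() followed by to.lock.lock(), the two threads could each grab one mutex and then wait forever for the other.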

How are locks implemented?

Inside std::mutex, a typical implementation is built on an atomic compare-and-swap or exchange operation.  Grabbing the lock is more complex than it might appear, because with an ordinary read followed by a write, it's possible for multiple threads to see mylock==0 (an unlocked lock), and all think they've grabbed the lock simultaneously.
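Here's a minimal sketch of the compare-and-swap idea using std::atomic.  The names try_grab, racer, and winners are invented for this example, and a real mutex would also need an unlock and a way to wait:

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

std::atomic<int> mylock(0);  // 0=unlocked, 1=locked
std::atomic<int> winners(0); // how many threads got the lock

// Hypothetical helper: one attempt to grab the lock, no waiting
bool try_grab(void) {
	int expected=0; // we only want the lock if it is unlocked
	// Atomically: if mylock==expected, store 1 and return true;
	// otherwise leave mylock alone and return false
	return mylock.compare_exchange_strong(expected,1);
}

void racer(void) {
	if (try_grab()) winners++; // nobody unlocks, so at most one can win
}

long foo() {
	mylock=0; winners=0;
	std::vector<std::thread> threads;
	for (int t=0;t<8;t++) threads.emplace_back(racer);
	for (std::thread &th:threads) th.join();
	assert(mylock==1);
	return winners; // exactly one thread saw 0 and swapped in 1
}
```

Because the check and the write happen as one atomic step, only one of the eight threads can ever see the 0.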

In x86 assembly, we can use a "lock xchg" instruction to atomically swap the lock value in.  If there was a 0 there, we have acquired the lock; if there was a 1, somebody else got in first.  Adding a polite read-only check before the swap reduces memory coherence traffic for highly contended locks.

global mutex_my_lock
mutex_my_lock:
	mov rax,1 ; == locked

	; politely read lock
	retry:
		cmp QWORD[islocked],rax
		je retry
	
	; atomically contend for lock
	lock xchg QWORD[islocked],rax ; lock it
	cmp rax,0
	jne retry
	
	ret

global mutex_my_unlock
mutex_my_unlock:
	mov QWORD[islocked],0
	ret

section .data
islocked:
	dq 0 ; 0=unlocked.  1=locked

(Try this in NetRun now!)

In C++, we can use an atomic exchange operation to swap a one into the lock and check the old value that comes back: if it was zero, the lock was free and now it's ours.

#include <thread>
#include <mutex>
#include <atomic>
volatile int count=0;

volatile std::atomic<int> mylock(0); // 0=unlocked.  1=locked
class my_mutex_guard {
public:
	my_mutex_guard() { 
		while (1) {  // spin!
			// Be polite, by doing read-only waiting
			while (mylock==1) { /* delay! */ } 
			
			// TRY to grab the lock--swap in a 1
			if(0==std::atomic_exchange_explicit(&mylock, 1, std::memory_order_acquire))
			{ // we got the zero back, we have the lock!
				return;
			} 
			// else we didn't get it, keep looping and try again 
		}
	}
	~my_mutex_guard() {
		std::atomic_store_explicit(&mylock, 0, std::memory_order_release); // release lock
	}
};

//std::mutex count_lock;
void work(void) {
	for (int i=0;i<100000;i++) {
		//std::lock_guard<std::mutex> lg(count_lock);
		my_mutex_guard tg;
		count++;
	}
}

long foo() {
	count=0;
	std::thread t0(work);
	std::thread t1(work);
	std::thread t2(work);
	work();

	t0.join(); 
	t1.join();
	t2.join();

	return count;
}

(Try this in NetRun now!)


Computer Architecture Lecture Note, 2014-2020, Dr. Orion Lawlor, UAF Computer Science Department.