Buffer Overflow Part 1: Direct Memory Manipulation

Computer Security Lecture, Dr. Lawlor

Directly modifying a running program's memory is a well-known attack. Programs that are anything less than perfect in their manipulation of memory (buffer overruns, pointer crossing, uninitialized data, use after delete, time-of-check-to-time-of-use bugs, multithreaded race conditions) are vulnerable to these sorts of direct memory manipulation attacks.

An analogy: driving a car by pulling cables & wires under the hood

Successfully taking control of a program via direct memory manipulation is roughly analogous to "driving" a car while perched inside the open hood by directly pulling on the control cables using pliers. (For example, because you're in an action movie trying to stop an enemy's car, or you're debugging a self-driving car, or both at once.)

[Image: Under the hood of a typical car, showing spark plug wires, fuel injectors, etc.]

(1995 Buick Regal engine, from Wikimedia Commons, CC-SA)

For example:

To make the car accelerate, pull the throttle linkage (by the air intake, middle right)
To turn the engine off, pull a few of the spark plug wires (bottom left)
To disable power steering, cut the belt that drives the power steering pump (top left)

As with direct memory manipulation, driving a car this way has serious disadvantages:

This approach requires extremely detailed knowledge of how the machine operates. For example, the gas pedal is in the same spot on basically all cars, but the location and appearance of the throttle linkage vary dramatically between models.
Some things that are easy from the front end, like steering or braking, are much more difficult to accomplish because the relevant control parts are not exposed.
Often it's easy to deactivate functionality permanently, by breaking things, but a lot harder to re-enable it.

Toy Example

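Here's some innocent-looking library code to control a device. It should compile as plain C (C99 or later), C++, or C++11. As in other NetRun examples, foo() is the entry point that NetRun calls; to build it elsewhere, call foo() from your own main().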

The run function pointer is inspired by the Linux kernel's structs of embedded function pointers, such as struct inode_operations.

/**
 MEESIL: Multipurpose Exploration Engine Software Integration Library
 
 An open-source library designed for the remote control of semi-autonomous
 mobile devices, such as remote control toys, smart thermostats, flying bombs, etc.
*/
#include <stdio.h>
#include <stdlib.h>

/** These should be enough for anyone.  
    If you want more, I find you wanting. */
#define MEESIL_MAX_ACTUATORS 4 
#define MEESIL_MAX_SENSORS 4 
#define MEESIL_MAX_OPCODES 8 

/* Forward declare the MEESIL struct */
struct MEESIL;

/** This user-defined function drives the actual machine:
  it writes m->actuators to physical world, 
  and reads world into m->sensors.
*/
typedef void (*MEESIL_RUN_FUNCTION)(struct MEESIL *m);

/** This struct represents the control state of a MEESIL vehicle. */
struct MEESIL {
	/** Control actuator values get loaded here: */
	int actuators[MEESIL_MAX_ACTUATORS];
	
	/** Input sensor values get loaded here: */
	int sensors[MEESIL_MAX_SENSORS];
	
	/** Control firmware gets loaded here: */
	unsigned char opcodes[MEESIL_MAX_OPCODES];
	
	/** This function actually runs the machine. */
	MEESIL_RUN_FUNCTION run;
};

/** Load MEESCRIPT byte opcodes from the input file, 
    until you reach a -1 opcode (or EOF). */
void meesil_load(struct MEESIL *m,FILE *input)
{
	int index=0;
	int opcode=0;
	while (1==fscanf(input,"%x",&opcode))
	{
		if (opcode==-1) return;
		m->opcodes[index++]=opcode;
	}
}

/** Debug function: print what's in the struct */
void meesil_print(struct MEESIL *m,const char *status)
{
	int i;
	printf("MEESIL state: %s\n",status);
	for (i=0;i<MEESIL_MAX_ACTUATORS;i++)
		printf("	actuators[%d]=%d	sensors[%d]=%d\n",
		       i,m->actuators[i],   i,m->sensors[i]);
	printf("	run=%p\n",(void *)m->run);
}

/** Run the meesil control loop forever. */
void meesil_run(struct MEESIL *m)
{
	int i=0;
	unsigned int index=0;
	while (1) {
		/* Read next step from the firmware */
		int opcode=m->opcodes[index++];
		if (opcode==1) for (i=0;i<MEESIL_MAX_ACTUATORS;i++) m->actuators[i]=0; /* STOP! */
		/* meesil_execute_opcode(m,opcode); <-- omitted for now */
		if (index>=MEESIL_MAX_OPCODES) index=0; /* wraparound */
		
		/* Write firmware values to real world */
		m->run(m);
	}
}


/** 
 Gosh, here's a function you hope you never need to call.  
 Really, nobody should ever call this.
 Not even sure why we put it in here.
*/
void meesil_self_destruct(void) {
	puts(" SELF DESTRUCT ACTIVATED.  Evac to MINIMUM SAFE DISTANCE of 150 meters in the next 10 seconds!");
	exit(1313);
}

void meesil_demo_run(struct MEESIL *m)
{
	meesil_print(m,"demo_run");
	
	static long nruns=0;
	nruns++;
	if (nruns>=2) {
		printf("That's all, you get the picture?\n");
		exit(0);
	}
}

void foo(void) {
	printf("MEESIL v0.1 active.\n");
	printf("  Self-destruct routine at %p\n", (void *)meesil_self_destruct);
	
	struct MEESIL *m=(struct MEESIL *)calloc(1,sizeof(struct MEESIL));
	m->run=meesil_demo_run;
	meesil_print(m,"before load");
	meesil_load(m,stdin);
	meesil_print(m,"after load");
	meesil_run(m);
}

(Try this in NetRun now!)

This works fine, as long as we provide at most 8 opcodes (MEESIL_MAX_OPCODES) as input.

But what happens if we provide more than 8 opcodes as input? meesil_load never checks index against MEESIL_MAX_OPCODES, so it keeps storing opcodes past the end of the opcodes array and into the next field, the run function pointer, which it will happily overwrite with any value we choose to provide. Let's provide opcodes that represent the address of the self-destruct function, like this:

1 2 3 4 5 6 7 8
4a 0f 40 00 00 00 00 00 -1

(Try this in NetRun now!)

Reminder from CS 301: two hex digits make one byte, and a pointer on x86-64 is 8 bytes. Bytes get loaded into an int or pointer starting from the little end, because x86 is little-endian. So the lowest byte of 0x400f4a is 0x4a, followed by 0x0f, then 0x40, then 5 zero bytes, which is why we provide them in that order.

In the code, after meesil_load trashes m->run, the load function still finishes normally, and we start running the firmware before we finally call m->run(). Because our overwritten pointer can point literally anywhere, when the program reaches m->run() it could jump into bad memory and crash; jump into the middle of some carefully selected bank account transfer function, just past the point where it checks the account balance and authorization codes; jump into the data we just input and run it as arbitrary code (if that memory is executable); or, as in this case, call a function the program never calls on its own. Buffer overflows can break all the rules because walking off the end of an array, using data after delete, and the other bugs in this class result in "Undefined Behavior": the program might work fine, or it might fling open a portal to a dimension filled with evil robots, or anything in between.

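Note that we don't actually need to write all 8 bytes. The run pointer already holds 0x40101b (the address of meesil_demo_run in this build), which differs from 0x400f4a only in its two lowest bytes, so overwriting just those first 2 bytes is enough. (This sort of tweaking is often important when the input can't contain embedded zero bytes, for example when the overflow happens in a string copy that stops at the first NUL.)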

1 2 3 4 5 6 7 8
4a 0f -1

(Try this in NetRun now!)

In a real program, we usually don't get a nice clear printout of the addresses of useful target functions, but an attacker can still find them with a debugger (if an identical system is available), or by exhaustively trying addresses until something useful happens.

This example has a clear and obvious in-memory relationship between the vulnerable buffer overrun (opcodes) and a function pointer (run). It's often not that simple. For example, if there was a bug allowing access beyond the end of the actuators or sensors array, we'd need to skip over or overwrite the opcodes array on our way to trashing the run function pointer. We might even have to carefully overwrite opcodes with *working* opcodes, so the code keeps running long enough for it to call the run function.

In general, for a memory manipulation attack to be feasible, you just need two things:

A manipulatable location in memory that affects the execution of the program. For example, a function pointer, the vtable pointer in an object of a class with virtual functions, a function's return address on the stack (stack smashing), an authentication code, etc.
Some attacker-controlled way to manipulate that location in memory. This can be simple and direct, as with many buffer overrun bugs; or highly contingent and probabilistic, like use-after-free or uninitialized data bugs.

C++ vtable / vptr Overwrite

Many big programs, like web browsers and word processors, are written in C++. Structs holding raw function pointers are rarer in C++ than in plain C; instead, C++ uses classes with virtual methods. But the same sort of direct memory manipulation attack can target the virtual function table (vtable). Here we're overwriting the vtable itself, which first requires making its normally read-only memory writable; a more common attack targets the vtable pointer (vptr) stored in each object, which lives in ordinary writable memory.

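// NetRun normally supplies dump_hex() and a main() that calls foo();
//   both are defined here so this also builds elsewhere (leave them out in NetRun).
//   Compile without optimization (e.g. g++ -O0): the optimizer may otherwise
//   skip the vtable lookup and call child::print directly.
#include <stdio.h>
#include <stdint.h>
#include <unistd.h>
#include <sys/mman.h>

class parent { public:
	virtual void print(void) { puts("parent print method: Yes, walter."); }
	virtual void exit(void) { puts("parent exit method: I'm done."); }
};

class child : public parent { public:
	virtual void print(void) { puts("child print method: Sup?"); }
	virtual void exit(void) { puts("child exit method: out"); }
};

// There's one vtable (virtual function table)
//   for each class--all parent objects point to the same parent vtable;
//   all child objects point to the same child vtable.
struct vtable {
	void *fun[10];
};
// Any object whose class has virtual methods has a vtable pointer, 
//   often a hidden first field in the object.
struct class_with_vtable {
	vtable *vt;
};

// Print n bytes starting at ptr in hex.
void dump_hex(const void *ptr,size_t n) {
	const unsigned char *bytes=(const unsigned char *)ptr;
	for (size_t i=0;i<n;i++) printf("%02x ",bytes[i]);
	printf("\n");
}

long foo(void)
{
	parent *p=new child;
	
	// Manually extract the child vtable pointer:
	vtable *v=((class_with_vtable *)p)->vt;
	
	printf("child instance contents: ");
	dump_hex(p,sizeof(child));
	printf("vtable contents (print, exit): ");
	dump_hex(v,2*sizeof(void *));
	
	// Vtables live in read-only memory, so writing to one normally segfaults.
	// Make the page(s) holding the first two entries writable
	// (mprotect needs a page-aligned start address):
	long pagesize=sysconf(_SC_PAGESIZE);
	uintptr_t start=(uintptr_t)v & ~(uintptr_t)(pagesize-1);
	uintptr_t end=(uintptr_t)&v->fun[2];
	if (mprotect((void *)start,end-start,PROT_READ|PROT_WRITE|PROT_EXEC)!=0) {
		perror("mprotect");
		return -1;
	}
	printf("Modifying the vtable\n");
	// Modify the child's vtable so that calling print runs exit instead:
	v->fun[0]=v->fun[1]; 
	printf("Modified the vtable, running virtual method:\n");
	
	p->print(); // now runs child::exit
	return sizeof(child);
}

int main(void) { foo(); return 0; }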

(Try this in NetRun now!)

Real-world Examples

libpng 2015 bug with technical description.

Android libstagefright 2015 bug.

Over the years, there have been a variety of security-relevant bugs in libpng and libjpeg.

Because you often receive PNG or JPEG images over the network (for example, in a browser via a banner ad, in an email program via an attachment, or on a WordPress server via a user icon upload), most of these bugs are network-exploitable. (Related: ImageTragick)