Return-Oriented Programming (ROP)

Computer Security Lecture, Dr. Lawlor

Due to W^X (memory pages are either writable or executable, never both), attackers usually cannot inject machine code and run it, which rules out a direct shellcode-style attack.

However, an attacker can still accomplish arbitrary computation by reusing isolated snippets of code, often called "gadgets", found in existing system libraries and program code.  Because return on x86 is implemented as "pop and jump there", an attacker can build up a pile of stuff on the stack, and chain gadgets together using function returns. 

ROP is essentially a complete workaround for W^X, but it is still disrupted by stack canaries (which detect the overwritten return address before the first gadget runs) and by ASLR (which makes the gadget locations hard to guess).

Forging a ROP Chain

The basic idea is to pile a set of operations onto the stack, which incrementally get peeled away at runtime. 

In this perfectly ordinary 32-bit x86 assembly, we're calling puts and then exit in the usual way, using call instructions.
extern puts
extern exit

push stringy
call puts
call exit
pop eax ; not needed, due to exit
ret

stringy:
	db "OK",0

(Try this in NetRun now!)

Notice that since call == push and jump, we can save a few instructions and cycles by arranging for puts to return directly to exit.  This is sometimes done legitimately, as a tail call optimization. 
extern puts
extern exit

push stringy ; argument for puts
push exit ; fake return address for puts
jmp puts

stringy:
	db "OK",0

(Try this in NetRun now!)

Finally, since ret == pop + jump, this means push + ret == jump.  So we can "jmp puts" by loading up the stack with puts, and then return. 
extern puts
extern exit

push stringy ; argument for puts
push exit ; fake return address for puts
push puts ; start at puts
ret ; goes to puts

stringy:
	db "OK",0

(Try this in NetRun now!)

Shifting to C, and structuring the bytes more like an exploit, here we've trashed the stack so foo will return directly to puts, which will see its string argument "wha???" on the stack, then puts will "return" to exit.
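#include <stdio.h>  /* printf, puts */
#include <string.h> /* memcpy */

/* The puts and exit addresses are specific to one 32-bit machine and libc build. */
static const long ropchain[]={
	0, // padding: overwrites x itself
	0x4019BC00, // puts: lands on foo's return address (layout depends on the compiler)
	0x4016E030, // exit: fake return address for puts
	(long)"wha???" // puts first argument
};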

long foo(void)
{
	long x;
	printf("Smashing stack...\n");
	memcpy(&x,ropchain,sizeof(ropchain));
	printf("Departing for madness...\n");
	return (long)puts;
}

(Try this in NetRun now!)

To build a longer rop chain, we would need to peel off the argument to puts, because we can't return to that string address.  To do this, we'd just find a version of ret that removes stuff from the stack, like:

0x00016a6f: pop esi
0x00016a70: pop edi
0x00016a71: pop ebp
0x00016a72: ret

This lets us clean off up to three stack values before moving on to the next gadget: enter at 0x16a6f to pop three values, at 0x16a70 to pop two, or at 0x16a71 to pop one.

Gadget Addresses

Key to making rop work is finding gadget addresses.  For simple examples, you can find those addresses using a debugger.  If you don't have debug access, it can be difficult to find addresses, even without ASLR. 

A complication: at runtime, calls into a dynamic library made via the normal linker first jump to the Procedure Linkage Table (PLT).  Each PLT stub is an indirect jump through that function's entry in the Global Offset Table (GOT), which holds the actual function address once the function has been called for the first time.

On NetRun's 32-bit machine, we can walk from the PLT to the GOT like this:

push ebp
extern puts
extern dump_hex

push stringy
call puts ; Need to call it to load up the GOT
pop ecx

mov ebp,puts ; <-- function points to PLT
	push DWORD 128
	push ebp
	call dump_hex
	pop eax
	pop ecx

mov ebp, DWORD[ebp+2] ; <-- read GOT pointer
	push DWORD 32
	push ebp
	call dump_hex
	pop eax
	pop ecx

mov ebp,DWORD[ebp] ; <-- actual function pointer
	push stringy
	call ebp ; <-- call puts directly, bypassing the PLT
	pop ecx

mov eax,ebp
pop ebp
ret

stringy:
	db "OK",0

(Try this in NetRun now!)

On my 32-bit machine, this prints:

OK
dump_hex   (0x80489fc, 128 bits): 
  ff 25 20 c5 04 08 68 d0  00 00 00 e9 40 fe ff ff  
dump_hex   (0x804c520, 32 bits): 
  00 bc 19 40 
OK
Program complete.  Return 1075428352 (0x4019BC00)
If we examine the corresponding libc file, we find puts at a file offset of 0x5ac00.
$ objdump -TFC libc.so.6 | grep puts
0005ac00 g DF .text 0000018f GLIBC_2.0 _IO_puts
0005ac00 w DF .text 0000018f GLIBC_2.0 puts

So libc's "start address" or "base address" is 0x4019BC00 - 0x0005ac00 = 0x40141000.

(A library base address almost always ends in 3 hex zero digits, indicating it is aligned to 4096 bytes or one full page.)

If we use objdump to look up another symbol, like exit at 0x0002d030, we can add this function file offset to the library base address to find the runtime location of exit at 0x4016E030.  This actually works, as shown above!

If you have metasploit installed, you can dump the file offsets of promising rop gadgets using "msfrop":
  msfrop -v libc.so.6 > rop
(It takes half a minute, give it time.)

I like to also see the strings, symbol addresses, and the disassembly, so I usually start with:
  strings -t x libc.so.6 > str
  objdump -TFC libc.so.6 > sym
  objdump -xdrCF -M intel libc.so.6 > dis
Notice that msfrop finds way more gadgets than objdump | grep ret.  In particular, msfrop will flag *any* use of the byte 0xc3 (ordinary ret) or 0xc2 (retn, returns and also pops a constant number of bytes) that results in a valid stream of instructions.  For example, in the disassembly:
dis:
165ea: e8 a1 fc ff ff call 16290 <_Unwind_Find_FDE@plt+0x78> (File Offset: 0x16290)
165ef: 81 c3 05 ea 11 00 add ebx,0x11ea05
rop:
libc.so.6 gadget:
0x000165eb: mov eax, [81fffffch]
0x000165f0: ret
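Notice msfrop has extracted a gadget from the middle of two existing instructions: it starts decoding one byte into the call, and the 0xc3 it treats as ret is really the second byte of the add.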

msfrop doesn't appear to work on x64 binaries yet, although ROPgadget works well.  (It's built on the capstone disassembler; I had to "sudo apt-get install python-capstone" and then "pip install ropgadget"). 
   ROPgadget --binary libc.so.6 --depth 10 > gadget
This finds way more gadgets, including many useful things like indirect jumps ("jmp eax").

x64 ROP

ROP is most effective on 32-bit machines.  64-bit executables are more difficult to ROP chain because (1) function arguments are passed in registers instead of on the stack, so the chain needs extra gadgets such as "pop rdi; ret" to load them, and (2) most addresses contain leading zeros, which become null bytes that stop a C string copy such as strcpy.

Here's the equivalent GOT / PLT walking code on x64:
push rbp
extern puts
extern dump_hex

mov rdi, stringy
call puts ; Need to call it to load up the GOT

mov rbp,puts ; <-- function points to PLT
	mov rsi, 128
	mov rdi, rbp
	call dump_hex

mov eax, DWORD[rbp+2] ; <-- read 32-bit offset to GOT pointer (skips the 2 opcode bytes)
add rbp, rax ; <-- GOT is RIP-relative
add rbp, 6 ; <-- RIP points past the 6-byte jmp instruction
	mov rsi, 64
	mov rdi, rbp
	call dump_hex

mov rbp, QWORD[rbp] ; <-- load actual function pointer
	mov rdi, stringy
	call rbp ; <- call puts directly

mov rax,rbp
pop rbp
ret

stringy:
	db "OK",0

(Try this in NetRun now!)

See r0pbaby writeup for good details on building x64 rop gadgets. 

See Metasploit RopDB for more tools.

See "one_gadget" for code to find a gadget to exec a shell.