CSC407: Software Architecture • Fall 2006 • Reverse Engineering • Greg Wilson • BA 4234 • gvwilson@cs.utoronto.ca
Reminder • Midterm is in class on Thursday (Nov 9) • 50 minutes • 3 questions • 20 marks • worth 10% • You may bring any printed material you like, but no electronic aids
Reverse Engineering • Reverse engineering is the process of analyzing a subject system in order to: • identify its components and their relationships; and • create representations of the system in other forms or at higher levels of abstraction • Sometimes also called design recovery • Basically, how is this thing put together?
Why Do It? • Usually driven by maintenance needs • Want to know how the system actually works before making any changes • "Official" design may not exist, or may be out of date • Want to see if system conforms to regulations • Want to inspect someone else's implementation • To copy ideas, or look for security holes • Samba, OpenOffice, etc. • Legal issues around this are far from clear…
Pieces of the Puzzle • A running C program contains instructions from: • Static linking: in executable • Dynamic linking: in a library known at compile time, but loaded at runtime • Runtime linking: in a library whose identity isn't known until runtime • Interacts with CPU, memory, file system, network, etc. • Every interaction is part of the implicit specification
Dynamic Analysis • Linux: /proc • Every running process is represented as a directory • Its contents hold lots of useful information:
    $ od -c /proc/708/cmdline
    0000000   /   u   s   r   /   s   b   i   n   /   e   x   i   m   4  \0
    0000020   -   b   d  \0   -   q   3   0   m  \0
    $ ls -l /proc/708/exe
    lrwxrwxrwx 1 root root /proc/708/exe -> /usr/sbin/exim4
• Windows: Sysinternals Process Explorer
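The od output above shows that /proc/<pid>/cmdline stores the arguments separated by NUL bytes. A minimal sketch of reading that file from C follows; the helper names join_cmdline and read_cmdline are made up for illustration, and the reader only works on systems that provide a Linux-style /proc.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: turn the NUL-separated argument list found in
 * /proc/<pid>/cmdline into one space-separated, NUL-terminated string.
 * buf holds len raw bytes; out must have room for at least len + 1 bytes.
 * Returns the length of the resulting string. */
size_t join_cmdline(const char *buf, size_t len, char *out)
{
    size_t n = 0;
    assert(buf != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        if (buf[i] != '\0') {
            out[n++] = buf[i];
        } else if (i + 1 < len) {
            out[n++] = ' ';          /* NUL between two arguments */
        }
        /* a NUL in the very last position is simply dropped */
    }
    out[n] = '\0';
    return n;
}

/* Hypothetical helper: read the command line of process pid into out
 * (capacity cap). Returns 0 on success, -1 if the file cannot be read. */
int read_cmdline(int pid, char *out, size_t cap)
{
    char path[64];
    char raw[4096];

    if (cap == 0) {
        return -1;
    }
    snprintf(path, sizeof path, "/proc/%d/cmdline", pid);
    FILE *fp = fopen(path, "rb");
    if (fp == NULL) {
        return -1;
    }
    size_t len = fread(raw, 1, sizeof raw, fp);
    fclose(fp);
    if (len + 1 > cap) {
        len = cap - 1;               /* truncate to fit the caller's buffer */
    }
    join_cmdline(raw, len, out);
    return 0;
}
```

Fed the 26 bytes shown in the od dump, join_cmdline produces "/usr/sbin/exim4 -bd -q30m".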
File System Activity • Linux's lsof(8) shows all open files from all processes • "File" can be socket, directory, keyboard, screen, … • See also netstat(8) • Windows: Sysinternals' Filemon and Regmon • Windows registry provides centralized hierarchical storage for application values • Much better than Unix's mix of configuration files, environment variables, /etc/rc, and so on
System Calls • Linux: strace(1) shows system calls as a program makes them • Can be told to follow forks • Windows: Rohitab Batra's API Monitor • http://www.rohitab.com/apimonitor/
strace output
    $ strace cat example.txt
    execve("/bin/cat", ["cat", "example.txt"], [/* 16 vars */]) = 0
    uname({sys="Linux", node="pyre", ...}) = 0
    brk(0) = 0x804d000
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
    access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY) = 3
    fstat64(3, {st_mode=S_IFREG|0644, st_size=18896, ...}) = 0
    close(3) = 0
    open("/lib/tls/libc.so.6", O_RDONLY) = 3
    read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260O\1"..., 512) = 512
    fstat64(3, {st_mode=S_IFREG|0755, st_size=1270928, ...}) = 0
    mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40155000
    mprotect(0x4014b000, 20480, PROT_READ) = 0
    munmap(0x40018000, 18896) = 0
    brk(0) = 0x804d000
    fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
    open("example.txt", O_RDONLY|O_LARGEFILE) = 3
    fstat64(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
    read(3, "", 4096) = 0
    close(3) = 0
    close(1) = 0
    exit_group(0) = ?
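A trace like this is easier to analyze once each line is reduced to a call name and a result. The sketch below is one possible way to do that in C; parse_strace_line is a hypothetical helper, not part of strace, and it assumes every line has the simple shape name(args) = result seen in the output above.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: split one line of the form  name(args...) = result
 * into the call name and its numeric result.
 * name must have room for name_cap bytes.
 * Returns 0 on success, or -1 if the line does not have the expected
 * shape or the result is not a number (e.g. "exit_group(0) = ?"). */
int parse_strace_line(const char *line, char *name, size_t name_cap, long *result)
{
    assert(line != NULL && name != NULL && result != NULL);

    /* The call name ends at the first '('. */
    const char *paren = strchr(line, '(');

    /* Arguments may contain '=' (st_mode=...), so look for the LAST
     * occurrence of ") = ", which precedes the result. */
    const char *tail = NULL;
    for (const char *p = strstr(line, ") = "); p != NULL; p = strstr(p + 1, ") = ")) {
        tail = p;
    }
    if (paren == NULL || tail == NULL || paren == line || paren > tail) {
        return -1;
    }

    size_t name_len = (size_t)(paren - line);
    if (name_len >= name_cap) {
        return -1;
    }

    /* Base 0 accepts decimal results (3, -1) and hex addresses (0x804d000). */
    const char *value = tail + 4;
    char *end = NULL;
    long parsed = strtol(value, &end, 0);
    if (end == value) {
        return -1;                   /* no digits, e.g. "?" */
    }

    memcpy(name, line, name_len);
    name[name_len] = '\0';
    *result = parsed;
    return 0;
}
```

Counting how often each name occurs, or listing the calls whose result is -1, then becomes a small loop over the trace.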
Static Analysis • Constants, functions, and classes are all represented in well-defined ways • So are loops and conditionals • Well, unoptimized ones, anyway… • Use a disassembler to extract static information • Usually can't recover variable or method names from production code • So it's rather like marking first-year assignments
Example
    int f0017(int p00, char * p01)
    {
        int v00;
        char * v01;
        v00 = 0;
        v01 = p01;
        while ((v00 < p00) && (*v01 != 0) && (*v01 != 32)) {
            v00 += 1;
            v01 += 1;
        }
        return (v00 < p00) && (*v01 == 32);
    }
Getting Linkage Information • Linux's ldd(1) shows: • What libraries a program is linked against • Whether they are static or dynamic • Libraries' locations in program's address space • See also nm(1) • Windows: depends (Dependency Walker) • Shows which functions from which DLLs… • …recursively • See also dumpbin
Understanding Graphs • These tools help you build a graph of: • Who calls whom • What data is passed where • Problem then becomes one of pattern matching • By machines: look for subgraphs that have certain properties • By human beings: display information in recognizable ways • Filtering, rearrangement, highlighting, etc.
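The "who calls whom" graph can be held in a very small data structure. The sketch below stores it as an adjacency matrix and answers one pattern-matching question, "can function A end up calling function B?", with a depth-first search. The type and function names are invented for illustration, and the fixed limit of 8 functions is an arbitrary assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_FUNCS 8   /* arbitrary limit for this sketch */

/* calls[i][j] is true when function i calls function j directly. */
typedef struct {
    int count;
    bool calls[MAX_FUNCS][MAX_FUNCS];
} CallGraph;

void graph_init(CallGraph *g, int count)
{
    assert(g != NULL && count >= 0 && count <= MAX_FUNCS);
    memset(g, 0, sizeof *g);
    g->count = count;
}

void graph_add_call(CallGraph *g, int caller, int callee)
{
    assert(caller >= 0 && caller < g->count);
    assert(callee >= 0 && callee < g->count);
    g->calls[caller][callee] = true;
}

/* Depth-first search over direct calls; seen[] prevents infinite loops
 * when the graph contains recursion. */
static bool reach(const CallGraph *g, int from, int to, bool *seen)
{
    seen[from] = true;
    for (int next = 0; next < g->count; next++) {
        if (!g->calls[from][next]) {
            continue;
        }
        if (next == to) {
            return true;
        }
        if (!seen[next] && reach(g, next, to, seen)) {
            return true;
        }
    }
    return false;
}

/* True if `from` can reach `to` through one or more calls. */
bool graph_reaches(const CallGraph *g, int from, int to)
{
    bool seen[MAX_FUNCS] = { false };
    assert(from >= 0 && from < g->count && to >= 0 && to < g->count);
    return reach(g, from, to, seen);
}
```

Real call graphs are far larger, so an adjacency list would be the more usual choice; the matrix keeps the sketch short.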
What Do You Want to Know? • Increasing emphasis on taking goals and prior knowledge into account when recovering architecture • Goals: why you want to know something determines what you need to know • Prior knowledge: there are only so many ways to cook a crocodile • Example: trace execution of related use cases to identify "interesting" subset of program
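One way to make the use-case example concrete: record which functions ran during each related use case, then keep the ones that ran in all of them. The sketch below assumes that "interesting" means exactly that intersection, optionally minus whatever also runs when the program is idle. The Trace type and helper names are hypothetical, and the bit set is limited to 32 numbered functions.

```c
#include <assert.h>
#include <stdint.h>

/* A trace is a bit set: bit i is 1 if function number i ran at least
 * once while the use case executed (at most 32 functions here). */
typedef uint32_t Trace;

/* Record that function func_id ran. */
Trace trace_mark(Trace t, int func_id)
{
    assert(func_id >= 0 && func_id < 32);
    return t | (UINT32_C(1) << func_id);
}

/* Functions executed by every one of the n related use cases.
 * Returns the empty set when n is 0. */
Trace trace_common(const Trace *traces, int n)
{
    if (n <= 0) {
        return 0;
    }
    Trace common = UINT32_MAX;
    for (int i = 0; i < n; i++) {
        common &= traces[i];
    }
    return common;
}

/* Functions in `feature` that the `baseline` trace never ran. */
Trace trace_only_in(Trace feature, Trace baseline)
{
    return feature & ~baseline;
}
```

With two use cases that run functions {0, 1, 4} and {0, 1, 5}, the common set is {0, 1}; subtracting an idle baseline of {0} leaves function 1 as the candidate to read first.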
Summing Up • Links to papers have been put on course web site • We'll talk about them in a couple of weeks • Midterm is Thursday • Amit Chandel will lecture next week and the week after on performance modeling • Mmm… math… • I'll be in my office Wednesday at 13:00 to answer questions • I have some resumes to give back