CSC407: Software Architecture Fall 2006 Reverse Engineering

Presentation Transcript


  1. CSC407: Software Architecture, Fall 2006: Reverse Engineering
     Greg Wilson, BA 4234, gvwilson@cs.utoronto.ca

  2. Reminder
     • Midterm is in class on Thursday (Nov 9)
        • 50 minutes
        • 3 questions
        • 20 marks
        • Worth 10%
     • You may bring any printed material you like, but no electronic aids

  3. Reverse Engineering
     • Reverse engineering is the process of analyzing a subject system in order to:
        • identify its components and their relationships; and
        • create representations of the system in other forms or at higher levels of abstraction
     • Sometimes also called design recovery
     • Basically: how is this thing put together?

  4. Why Do It?
     • Usually driven by maintenance needs
        • Want to know how the system actually works before making any changes
        • The "official" design may not exist, or may be out of date
     • Want to see if the system conforms to regulations
     • Want to inspect someone else's implementation
        • To copy ideas, or to look for security holes
        • Samba, OpenOffice, etc. (interoperating with Windows file sharing and Microsoft Office file formats)
     • The legal issues around this are far from clear…

  5. Pieces of the Puzzle
     • A running C program contains instructions from:
        • Static linking: copied into the executable itself
        • Dynamic linking: in a library known at compile time, but loaded at runtime
        • Runtime linking: in a library whose identity isn't known until runtime
     • It also interacts with the CPU, memory, file system, network, etc.
        • Every interaction is part of the implicit specification

  6. Dynamic Analysis
     • Linux: /proc
        • Every running process is represented as a directory
        • Its contents contain lots of useful information
            $ od -c /proc/708/cmdline
            0000000 / u s r / s b i n / e x i m 4 \0
            0000020 - b d \0 - q 3 0 m \0
            $ ls -l /proc/708/exe
            lrwxrwxrwx 1 root root /proc/708/exe -> /usr/sbin/exim4
     • Windows: Sysinternals Process Explorer

  7. File System Activity
     • Linux's lsof(8) shows all open files from all processes
        • A "file" can be a socket, directory, keyboard, screen, …
        • See also netstat(8)
     • Windows: Sysinternals' Filemon and Regmon
        • The Windows registry provides centralized hierarchical storage for application values
        • Much better than Unix's mix of configuration files, environment variables, /etc/rc, and so on

  8. System Calls
     • Linux: strace(1) shows system calls as a program makes them
        • Can be told to follow forks (strace -f)
     • Windows: Rohitab Batra's API Monitor
        • http://www.rohitab.com/apimonitor/

  9. strace output
            $ strace cat example.txt
            execve("/bin/cat", ["cat", "example.txt"], [/* 16 vars */]) = 0
            uname({sys="Linux", node="pyre", ...}) = 0
            brk(0) = 0x804d000
            mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40017000
            access("/etc/ld.so.preload", R_OK) = -1 ENOENT (No such file or directory)
            open("/etc/ld.so.cache", O_RDONLY) = 3
            fstat64(3, {st_mode=S_IFREG|0644, st_size=18896, ...}) = 0
            close(3) = 0
            open("/lib/tls/libc.so.6", O_RDONLY) = 3
            read(3, "\177ELF\1\1\1\0\0\0\0\0\0\0\0\0\3\0\3\0\1\0\0\0\260O\1"..., 512) = 512
            fstat64(3, {st_mode=S_IFREG|0755, st_size=1270928, ...}) = 0
            mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x40155000
            mprotect(0x4014b000, 20480, PROT_READ) = 0
            munmap(0x40018000, 18896) = 0
            brk(0) = 0x804d000
            fstat64(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 0), ...}) = 0
            open("example.txt", O_RDONLY|O_LARGEFILE) = 3
            fstat64(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
            read(3, "", 4096) = 0
            close(3) = 0
            close(1) = 0
            exit_group(0) = ?

  10. API Monitor

  11. Static Analysis
     • Constants, functions, and classes are all represented in well-defined ways in compiled code
        • So are loops and conditionals
        • Well, unoptimized ones, anyway…
     • Use a disassembler to extract static information
        • You usually can't recover variable or method names from production code
        • So it's rather like marking first-year assignments

  12. Example
            int f0017(int p00, char * p01) {
                int v00;
                char * v01;
                v00 = 0;
                v01 = p01;
                while ((v00 < p00) && (*v01 != 0) && (*v01 != 32)) {
                    v00 += 1;
                    v01 += 1;
                }
                return (v00 < p00) && (*v01 == 32);
            }

  13. Getting Linkage Information
     • Linux's ldd(1) shows:
        • What libraries a program is linked against
        • Whether they are static or dynamic
        • The libraries' locations in the program's address space
        • See also nm(1)
     • Windows: depends (Dependency Walker)
        • Shows which functions come from which DLLs…
        • …recursively
        • See also dumpbin

  14. Understanding Graphs
     • These tools help you build a graph of:
        • Who calls whom
        • What data is passed where
     • The problem then becomes one of pattern matching
        • By machines: look for subgraphs that have certain properties
        • By human beings: display information in recognizable ways
           • Filtering, rearrangement, highlighting, etc.

  15. Creole (University of Victoria)

  16. What Do You Want to Know?
     • Increasing emphasis on taking goals and prior knowledge into account when recovering architecture
        • Goals: why you want to know something determines what you need to know
        • Prior knowledge: there are only so many ways to cook a crocodile
     • Example: trace the execution of related use cases to identify the "interesting" subset of a program

  17. Tarantula (Georgia Tech)

  18. Summing Up
     • Links to papers have been put on the course web site
        • We'll talk about them in a couple of weeks
     • Midterm is Thursday
     • Amit Chandel will lecture next week and the week after on performance modeling
        • Mmm… math…
     • I'll be in my office Wednesday at 13:00 to answer questions
     • I have some resumes to give back
