Lecture 16 Software Reverse Engineering
Grading • Algorithm for deciding your final grades: • Final score: 10% class participation + 40% homework + 50% project • Rank the list: around the top 50% receive an A (subject to change); the rest receive a B • Project grading: • Signup (2%) + Proposal (10%) + Mid-point check (8%): 20% • Overall score: 80% • Presentation (10%) • Documents (30%) • Quality of your (part of the) work (40%): your score • One-person group: you can do less work, but the quality should be good
A Roadmap for Today • Software reverse engineering: an introduction • Static approaches: a case study • Dynamic approaches: a case study • Reverse engineering tools
What is Software Reverse Engineering • Determining structure or behavior of software by building static or dynamic models • The process of analyzing a subject system to create representations of the system at a higher level of abstraction [Chikofsky90] • Goals: • Understand malware: security • Understand legacy code: software maintenance • Input: source code or binary code • Output: invariants, architecture, API rules, … • The code is not changed
Why Reverse Engineering? • Software maintenance is "modification of a software to correct faults, to improve performance or to adapt to a changed environment" (ANSI/IEEE Std 729). • Software maintenance accounts for 50%~90% of total costs in the software life-cycle. • Reverse engineering is part of the maintenance process and can facilitate this practice. Through reverse engineering, cost can be reduced and value can be added.
Applications of Reverse Engineering • Program comprehension, visualization • Software reuse • Document • Design discovery • Software verification • Modify software • Change of the environment • Redesign the software
Software Reverse Engineering Overall Approaches • Two general techniques: static and dynamic analysis • Static analysis: search source code • Dynamic analysis: running programs with given input, and collect and analyze runtime information • Two steps: • Collect info • Compilation phases (from source code) • Profilers, logs, debuggers • Abstract info and build models • Mining understandable, high-level models
Static models • Based on code structure, dependency, architecture • Example models: • Class diagrams • Design patterns • Dependency graphs at the levels of components, functions and variables • Contracts • Aspects
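As a rough illustration of a function-level dependency graph, the sketch below parses a small program without executing it and records which top-level functions call which. The sample program and the helper name `call_dependencies` are made up for this example; only direct calls by plain name are detected, so method calls and calls through variables are missed.

```python
import ast
from collections import defaultdict

# Hypothetical program under analysis; it is parsed, never executed.
SOURCE = '''
def read_input(path):
    return open(path).read()

def parse(text):
    return text.split()

def main(path):
    words = parse(read_input(path))
    return len(words)
'''

def call_dependencies(source):
    """Map each top-level function to the top-level functions it calls by name."""
    tree = ast.parse(source)
    defined = {n.name for n in tree.body if isinstance(n, ast.FunctionDef)}
    deps = defaultdict(set)
    for func in tree.body:
        if not isinstance(func, ast.FunctionDef):
            continue
        deps[func.name]  # make sure functions without callees still appear
        for node in ast.walk(func):
            # Only direct calls such as parse(...) are matched here.
            if isinstance(node, ast.Call) and isinstance(node.func, ast.Name):
                if node.func.id in defined:
                    deps[func.name].add(node.func.id)
    return dict(deps)

graph = call_dependencies(SOURCE)
for caller, callees in graph.items():
    print(caller, "->", sorted(callees))
```

The same traversal could be extended to variables or modules to obtain dependency graphs at the other levels mentioned on the slide.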
Static Approaches • Static: processes only the source code; the program is not executed • Advantages: • No executables required • No input needed • Types of static analyses (some of them are done in compilers) • Control and data flow analysis • Type checking: types and a set of operations associated with types • Dependency analysis • Slicing and dicing (different ways to partition the software)
Case study: Static Control-Flow Analysis for Reverse Engineering of UML Sequence Diagrams • Existing work: UML class diagram and UML sequence diagram • Tools: Together ControlCenter by Borland and Eclipse UML by Omondo • UML sequence diagram: • Software understanding • used for testing - interactions among collaborating objects
A Graph Representation for a Program • Control flow graph (CFG): <N, E> • N: a set of statements in a program • E: control transfers between two statements • Example code: bar(); s = (char*)malloc(80); x[10] = '\0'; if (strlen(t) < 8) strcpy(s,t); else strcat(x,t); • CFG nodes: 1: bar() • 2: s = (char*)malloc(80) • 3: x[10] = '\0' • 4: strlen(t) < 8 • 5: strcpy(s,t) (yes branch of 4) • 6: strcat(x,t) (no branch of 4) • CFG edges: 1→2, 2→3, 3→4, 4→5 (yes), 4→6 (no)
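The CFG on this slide can be written down directly as the pair <N, E>. The sketch below stores N as a dictionary of numbered statements and E as an adjacency list, then enumerates the execution paths. The helper name `all_paths` is illustrative, and the enumeration assumes an acyclic graph like this one.

```python
# N: statements, numbered as on the slide.
nodes = {
    1: "bar()",
    2: "s = (char*)malloc(80)",
    3: "x[10] = '\\0'",
    4: "strlen(t) < 8",
    5: "strcpy(s,t)",
    6: "strcat(x,t)",
}

# E: control transfers, stored as successor lists.
edges = {1: [2], 2: [3], 3: [4], 4: [5, 6], 5: [], 6: []}

def all_paths(edges, node, prefix=()):
    """Enumerate paths from node to any exit node (assumes no cycles)."""
    prefix = prefix + (node,)
    if not edges[node]:
        return [prefix]
    result = []
    for succ in edges[node]:
        result.extend(all_paths(edges, succ, prefix))
    return result

for path in all_paths(edges, 1):
    print(" -> ".join(nodes[n] for n in path))
```

Node 4 is the only node with two successors, so the graph has exactly two paths, one per branch of the `if`.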
Case study: Static Control-Flow Analysis for Reverse Engineering of UML Sequence Diagrams • Two challenges: • How should a CFG be mapped to UML? • Is UML 2.0 expressive enough to specify the discovered control flows? • Sequence diagram: objects and message exchange • Four types of control primitives: • Opt • Loop • Alt • Break
Control Flow Analysis • Control flow analysis • Find the branch nodes (alt/opt edges) • Find the merge points • Find the header of each loop • Find all of the loop exit edges • No exceptional flow is considered, as any program point can potentially throw an asynchronous exception in Java
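The four searches listed above can be sketched on a small hand-written CFG. This is a generic textbook-style approach (back edges found by depth-first search, loop bodies as natural loops), not necessarily the algorithm used in the case study; the node names and helper functions are invented for the example, and the loop detection assumes a reducible CFG.

```python
# Hypothetical CFG: a loop whose body contains an if/else.
edges = {
    "entry": ["loop_head"],
    "loop_head": ["cond", "exit"],
    "cond": ["then", "else"],
    "then": ["join"],
    "else": ["join"],
    "join": ["loop_head"],
    "exit": [],
}

def predecessors(edges):
    preds = {n: [] for n in edges}
    for src, succs in edges.items():
        for dst in succs:
            preds[dst].append(src)
    return preds

def back_edges(edges, entry):
    """Edges whose target is still on the DFS stack; the target is a loop header."""
    found, on_stack, visited = [], set(), set()
    def dfs(node):
        visited.add(node)
        on_stack.add(node)
        for succ in edges[node]:
            if succ in on_stack:
                found.append((node, succ))
            elif succ not in visited:
                dfs(succ)
        on_stack.discard(node)
    dfs(entry)
    return found

def natural_loop(edges, tail, header):
    """Nodes that can reach tail without passing through header, plus header."""
    preds = predecessors(edges)
    body, work = {header, tail}, [tail]
    while work:
        node = work.pop()
        if node == header:
            continue
        for p in preds[node]:
            if p not in body:
                body.add(p)
                work.append(p)
    return body

branch_nodes = sorted(n for n, s in edges.items() if len(s) > 1)
merge_points = sorted(n for n, p in predecessors(edges).items() if len(p) > 1)
loops = back_edges(edges, "entry")
tail, header = loops[0]
body = natural_loop(edges, tail, header)
exit_edges = sorted((n, s) for n in body for s in edges[n] if s not in body)

print(branch_nodes, merge_points, header, exit_edges)
```

In this example `cond` branches into two arms that meet again at `join`, which is the shape an alt fragment would cover, while `loop_head` with its single exit edge is the shape a loop fragment would cover.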
Design Decisions for Analysis • Tradeoffs between precision and size of the sequence diagram: • Mapping with replication: fully precise
Design Decisions for Analysis • Not precise, no replication
Applications of Dependency Graphs • Security check • Guidance for refactoring • Regression Testing
Summary for Static Techniques • Advantages and disadvantages of static approaches: • No executable and no input needed • Potentially imprecise: e.g., infeasible paths • References: • Case study: static control-flow analysis for reverse engineering of UML sequence diagrams • Dependency: combining slicing and constraint solving for validation of measurement software
Abstracting the dynamic model • Finding behavior patterns, repeating sequences of events • E.g. socket protocol, secure API sequences • Using static abstractions • E.g. representing interactions between high-level software elements in sequence diagrams • Dynamic information is combined with the high-level static model
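One simple way to look for repeating sequences of events is to count fixed-length windows over a recorded trace. The sketch below does this for a made-up socket trace; the function name `repeated_sequences`, the window length and the threshold are all assumptions for illustration, and windows are allowed to overlap.

```python
from collections import Counter

def repeated_sequences(events, length, min_count=2):
    """Return every window of the given length that occurs at least min_count times."""
    windows = (tuple(events[i:i + length])
               for i in range(len(events) - length + 1))
    counts = Counter(windows)
    return {seq: n for seq, n in counts.items() if n >= min_count}

# Hypothetical event trace collected from three client sessions.
trace = ["socket", "connect", "send", "recv", "close",
         "socket", "connect", "send", "recv", "close",
         "socket", "connect", "close"]

patterns = repeated_sequences(trace, 3)
for seq, n in patterns.items():
    print(n, " ".join(seq))
```

Sequences that recur, such as socket, connect, send, are candidates for a protocol rule, while the one-off socket, connect, close is filtered out by the threshold.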
Dynamic models • Finding out the run-time behaviour of software • debugger, profiler, source code instrumentation • Visualisation: • scenarios (sequence diagrams) • State diagrams • (hierarchical) graphs
Other Information can be Found Using Dynamic Approaches • Object creation and related dependencies • Dynamic binding, polymorphism • Method calls (virtual calls and function pointers) • Looking for dead code/reachability analysis • Memory management • Performance and related problems • Concurrency
Case Study: dynamic analysis to find program invariants • Program invariant: a property that holds at a certain point or points of a program • Dynamic invariant detection: runs a program, observes the values that the program computes, and reports the properties that were true over the observed executions • Types of invariants • Constant • Non-zero • Range: a < x < b • Linear: y = ax+b • Ordering: a < b • ……
Case Study: dynamic analysis to find program invariants • Use of the invariants: • Generate test inputs, predict incompatibilities of component integrations, repairing inconsistent data structures, check correctness • Reference: http://pag.csail.mit.edu/daikon
A stack example • Fields: • Object[] theArray; // Array that contains the stack elements • int topOfStack; // Index of top element; -1 if stack is empty • Methods: • void push(Object x) // Insert x • void pop() // Remove most recently inserted item • Object top() // Return most recently inserted item • Object topAndPop() // Remove and return most recently inserted item • boolean isEmpty() // Return true if empty; else false • boolean isFull() // Return true if full; else false • void makeEmpty() // Remove all items
Steps to Run Daikon to Infer Invariants for Stack • Create a simple test class: StackArTester • Daikon instruments the code and analyzes the resulting execution traces • Outputs procedure pre/post-conditions and also object invariants that hold at every public method entry and exit
Daikon Output for the Stack Example • Object invariants for StackAr: • this.theArray != null • this.theArray.getClass() == java.lang.Object[].class • this.topOfStack >= -1 • this.topOfStack <= this.theArray.length - 1 • this.theArray[0..this.topOfStack] elements != null • this.theArray[this.topOfStack+1..] elements == null • Pre-conditions for the StackAr constructor: • capacity >= 0 • Post-conditions for the StackAr constructor: • orig(capacity) == this.theArray.length • this.topOfStack == -1 • this.theArray[] elements == null • Post-conditions for the isFull method: • this.theArray == orig(this.theArray) • this.theArray[] == orig(this.theArray[]) • this.topOfStack == orig(this.topOfStack) • (return == false) <==> (this.topOfStack < this.theArray.length - 1) • (return == true) <==> (this.topOfStack == this.theArray.length - 1)
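To make the reported object invariants concrete, the sketch below ports a small part of the stack to Python and checks the invariants after each operation. This is a hand-written illustration, not Daikon output or the original Java class: the Python names, the exception types and the helper `object_invariants_hold` are assumptions, and the "elements != null" invariant only holds as long as callers never push `None`.

```python
class StackAr:
    """Minimal array-backed stack, loosely following the fields on the slide."""

    def __init__(self, capacity):
        assert capacity >= 0               # reported constructor pre-condition
        self.the_array = [None] * capacity
        self.top_of_stack = -1

    def is_full(self):
        return self.top_of_stack == len(self.the_array) - 1

    def push(self, x):
        if self.is_full():
            raise OverflowError("stack is full")
        self.top_of_stack += 1
        self.the_array[self.top_of_stack] = x

    def pop(self):
        if self.top_of_stack == -1:
            raise IndexError("stack is empty")
        self.the_array[self.top_of_stack] = None
        self.top_of_stack -= 1

def object_invariants_hold(s):
    """Check the object invariants listed on the slide against one stack state."""
    top = s.top_of_stack
    return (s.the_array is not None
            and -1 <= top <= len(s.the_array) - 1
            and all(e is not None for e in s.the_array[:top + 1])
            and all(e is None for e in s.the_array[top + 1:]))

stack = StackAr(2)
checks = [object_invariants_hold(stack)]
for action in (lambda: stack.push("a"), lambda: stack.push("b"), stack.pop):
    action()
    checks.append(object_invariants_hold(stack))
print(checks)
```

If `pop` forgot to clear the vacated slot, the last check would fail, which is the kind of inconsistency these invariants can expose.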
Daikon Internal design • Grammar of variables: global, input, parameters, return • Grammar of predicates: (75 templates) • conditional predicate • supplied template • Program points: entry and exit
Daikon Internal Structures • Instrumenters (language dependent) • Inference engine (generate-and-check algorithm) • Test a set of parameters against traces • Assume all invariants are possible and then exclude the ones that contradict the observed values • Optimizations • Equal variables • Dynamically constant variables • Suppress weaker invariants • Variable hierarchy
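The generate-and-check idea can be sketched in a few lines: instantiate a handful of templates over the observed variables, then discard every candidate that some sample contradicts. This is a toy illustration with four made-up templates, not Daikon's engine or its template set; the function name `infer_invariants` and the sample values are assumptions.

```python
from itertools import permutations

def infer_invariants(samples):
    """Generate candidate invariants, then drop those falsified by any sample."""
    variables = sorted(samples[0])
    candidates = {}
    # Generate: unary templates for each variable.
    for v in variables:
        candidates[f"{v} != 0"] = lambda s, v=v: s[v] != 0
        candidates[f"{v} >= 0"] = lambda s, v=v: s[v] >= 0
    # Generate: ordering templates for each ordered pair of variables.
    for a, b in permutations(variables, 2):
        candidates[f"{a} < {b}"] = lambda s, a=a, b=b: s[a] < s[b]
        candidates[f"{a} <= {b}"] = lambda s, a=a, b=b: s[a] <= s[b]
    # Check: keep only the candidates that hold on every observed sample.
    for sample in samples:
        candidates = {name: check for name, check in candidates.items()
                      if check(sample)}
    return sorted(candidates)

# Hypothetical values observed at the exit of push() on a stack of capacity 3.
samples = [
    {"top": 0, "length": 3},
    {"top": 1, "length": 3},
    {"top": 2, "length": 3},
]

invariants = infer_invariants(samples)
print(invariants)
```

The survivors include both `top < length` and the weaker `top <= length`, which shows why suppressing weaker invariants is listed as an optimization; they also hold only for these three samples, so a richer test suite could still falsify them.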
Summary for Dynamic Techniques • Need a set of good test cases • Scalability challenges • Precise techniques
Reverse engineering for OO software • Dynamic behavior may be hard to detect from static model (creating and deleting objects, garbage collection, dynamic binding,…)-> this emphasises dynamic modelling • Pure object languages support encapsulation (classes, packages,…)-> helps in static reverse engineering -> increases usability of metrics • OO paradigm supports the use of design patterns-> reusability applications (pattern recognition)
Tools • Rigi (University of Victoria, Canada) • http://www.rigi.csc.uvic.ca/ • a research prototype that represents an open and public domain reverse engineering tool • user programmable • analysis for: C, C++, COBOL, PL/AS, LaTeX • SNIFF+ (TakeFive Software) • a software development environment that also provides reverse engineering capabilities
Tools • McCabe’s Visual Reengineering Toolset and Visual Quality Toolset • various views • software metrics (complexity and structuredness) • shown as specific colors on the views • Logiscope (CS Verilog) • reverse eng, code testing, static and dynamic testing, metrics • analysis for: C, C++, Java, ADA • ESW (Viasoft Inc.) • forward and reverse engineering (maintenance), metrics, testing
Tools • Refine (Reasoning Systems Inc.) • an open and programmable tool that works in the Refinery environment • tools for generating source code parsing and conversion tools • features for analyzing and re-engineering code • analysis for: Ada, C, Cobol • Imagix4D (Imagix Corp.) • http://www.powersoftware.com/english/im/index.html • a closed tool that provides a large set of built-in functionalities • several views (also 3D) • analysis for: C/C++
Tools • CodeCrawler • a reverse engineering tool that combines metrics and graphs to visualize OO systems
Tools for OO languages • Produce a class diagram from code • Rational Rose (Rational Software Corp.) • Paradigm Plus (Computer Associates International) • OEW (Innovative Software GmbH) • Graphical Designer (Advanced Software Technologies Inc.) • Domain Objects (Domain Objects Inc.) • COOL:Jex (Sterling Software Inc.) • Fujaba (Paderborn University) • ...
Wild & Crazy Ideas • How good does software need to be? • As a consumer, you would feel comfortable taking an airplane with a failure rate of: • A: 0 • B: 0.000001 and below • C: 0.001 and below • D: 0.01 and below • What about stock software, mobile phones, …