Analyzing and Transforming Binary Code for Fun & Profit • Ramakrishnan Venkitaraman, Gopal Gupta • The University of Texas at Dallas • 11/15/2004
Software Engineering Crisis • [Diagram labels: Companies, Cost of Project, Software Reuse & System Integration] • But the integrated system does not work
Motivation • Facilitate software reuse in the DSP industry • DSP h/w manufacturers are interested in developing DSP software COTS components so that time to market is small • DSP components generally available only in binary form (no source code) • DSP software uses low-level optimizations for efficiency • Need to ensure that these optimizations do not interfere with reusability
Our Framework • We develop necessary and sufficient conditions that ensure that a software binary is reusable • We relate these conditions to TI’s XDAIS standard • We show how static analysis can be used to check if these conditions hold • We illustrate this through analysis for detecting hard coded pointers
Conditions to ensure reusability • C1: The binary code should not change during execution in a way that makes link-time symbol resolution invalid • C2: The binary code should not be written in a way that requires it to be located at some fixed location in virtual memory
Broadening the Conditions • C1 and C2 are hard to characterize and even harder to detect • So, broaden the conditions C1 and C2 to get conditions C3 and C4
Framework to ensure reusability • C3: The binary code is re-entrant • No self-modifying code • Should not make link-time symbol resolution invalid • C4: The binary code should not contain any hard-wired memory addresses • Binaries should not be assumed to be located at a fixed virtual memory location
TI XDAIS Standard • Contains 35 rules and 15 guidelines • SIX General Programming Rules • No tool currently exists to check for compliance • We want to build a tool to ENFORCE software compliance for these rules
XDAIS – General Programming Rules • All programs should follow the runtime conventions of TI’s C programming language • Programs must be re-entrant • No hard coded data memory locations • No hard coded program memory locations • Algorithms must characterize their ROM-ability • No peripheral device accessed directly
Advantages Of Compliant Code • Allows system integrators to easily migrate between TI DSP chips • Subsystems from multiple software vendors can be integrated into a single system • Programs are framework-agnostic: the same program can be efficiently used in virtually any application
XDAIS vs. Our Framework • Rule 1 is not really a programming rule, since it requires compliance with TI's definition of the C language • Rules 2 through 6 are manifestations of conditions C3 and C4 above • Rules 2 and 5 correspond to condition C3 • Rules 3, 4, and 6 correspond to condition C4
Problem and Solution • Problem: Detection of hard coded addresses in programs without accessing source code. • Solution: “Static Program Analysis of Assembly Code”
Some examples showing hardcoding

Example 1: Directly hardcoded

    void main() {
        int *p = (int *)0x8800;
        // Some code
        *p = …;
    }

Example 2: Indirectly hardcoded

    void main() {
        int *p = (int *)0x80;
        int *q = p;
        // Some code
        *q = …;
    }

Example 3: Conditional hardcoding

    void main() {
        int *p, val;
        p = ….;
        val = …;
        if (val)
            p = (int *)0x900;
        else
            p = malloc(…);
        *p;
    }

NOTE: We don’t care if a pointer is hard coded but never dereferenced.
Static Analysis • Undecidability: it is impossible to build a tool that precisely detects hard coding in every program, so the analysis must approximate • Static analysis: any analysis of a program carried out without completely executing the program
Interest in Static Analysis • “We actually went out and bought for 30 million dollars, a company that was in the business of building static analysis tools and now we want to focus on applying these tools to large-scale software systems” • Remarks by Bill Gates, 17th Annual ACM Conference on Object-Oriented Programming, Systems, Languages, and Applications, November 2002.
Hard Coded Addresses • Bad programming practice • Results in non-relocatable code • Results in non-reusable code
Overview Of Our Approach • Input: Object code of the software • Output: Compliant or Not Compliant status • Activity diagram for our static analyzer: Disassemble Object Code → Split Into Functions → Obtain Basic Blocks → Obtain Flow Graph → Static Analysis → Output the Result
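The "Obtain Basic Blocks" step can be illustrated with the textbook leader rule. The sketch below is not the authors' implementation: the `(address, mnemonic, branch_target)` tuple format and the function name `split_basic_blocks` are assumptions made for illustration, and branch delay slots (which matter on TI C6x hardware) are ignored.

```python
# Hypothetical simplified instruction: (address, mnemonic, branch_target or None).
# Assumption: branches take effect immediately (no delay slots).

def split_basic_blocks(instrs):
    """Split a linear instruction list into basic blocks using the leader rule."""
    if not instrs:
        return []
    leaders = {instrs[0][0]}                    # the first instruction is a leader
    for i, (addr, mnemonic, target) in enumerate(instrs):
        if target is not None:
            leaders.add(target)                 # a branch target starts a block
            if i + 1 < len(instrs):
                leaders.add(instrs[i + 1][0])   # so does the instruction after a branch
    blocks, current = [], []
    for ins in instrs:
        if ins[0] in leaders and current:       # close the running block at each leader
            blocks.append(current)
            current = []
        current.append(ins)
    blocks.append(current)
    return blocks


program = [
    (0, "MVK", None),
    (4, "B", 12),       # branch to address 12
    (8, "ADD", None),
    (12, "STW", None),
    (16, "NOP", None),
]
blocks = split_basic_blocks(program)
block_addresses = [[ins[0] for ins in block] for block in blocks]
```

Edges between the resulting blocks (fall-through and branch edges) would then give the flow graph that the later analysis walks.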
Basic Aim Of Analysis • Find a path to trace pointer origin • Problem: exponential complexity • Static analysis approximation makes it linear
Analyzing Source Code – Easy

    #include <stdio.h>
    void main() {
        int *p, *q;
        // some code
        p = (int *)8000;
        // some code
        q = p;
        // some code
        *q = 5;
    }

Unsafe sets, tracing backward from the dereference *q = 5: { { q } }, then { { p } } • P IS HARD CODED • So, the program is not compliant with the standard
Analyzing Assembly Code is Hard • Problem • No type information is available • Instruction-level pipelining and parallelism • Solution • Backward analysis • Use abstract interpretation
Analyzing Assembly – Hard

    000007A0           main:
    000007A0 07BD09C2  SUB.D2    SP,0x8,SP
    000007A4 020FA02A  MVK.S2    0x1f40,B4
    000007A8 023C22F6  STW.D2T2  B4,*+SP[0x1]
    000007AC 00002000  NOP       2
    000007B0 023C42F6  STW.D2T2  B4,*+SP[0x2]
    000007B4 00002000  NOP       2
    000007B8 0280A042  MVK.D2    5,B5
    000007BC 029002F6  STW.D2T2  B5,*+B4[0x0]
    000007C0 00002000  NOP       2
    000007C4 008C8362  BNOP.S2   B3,4
    000007C8 07BD0942  ADD.D2    SP,0x8,SP
    000007CC 00000000  NOP
    000007D0 00000000  NOP

Unsafe set annotations on the slide: {{ }}, {{ B4 }}, {{ B4 }} • B4 = 0x1f40 • So, B4 is HARD CODED • Code is NOT compliant
Abstract Interpretation Based Analysis • Domains from which variables draw their values are approximated by abstract domains • The original domains are called concrete domains
Lattice Abstraction • Lattice-based abstraction is used to determine whether a pointer is hard coded.
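The slides do not show the lattice itself, so the following is only a guess at its general shape: a three-level chain in which "hard coded" sits above "not hard coded", so that joining two paths keeps the more pessimistic verdict. The names `Hardness` and `join` are hypothetical.

```python
from enum import IntEnum


class Hardness(IntEnum):
    """Assumed abstract domain: a chain BOTTOM <= NOT_HARD_CODED <= HARD_CODED."""
    BOTTOM = 0            # no information yet
    NOT_HARD_CODED = 1    # pointer originates from a relocatable source
    HARD_CODED = 2        # pointer may originate from a literal address


def join(a, b):
    """Least upper bound on a chain is simply the larger element."""
    return max(a, b)


# Conditional hardcoding: one branch assigns a literal address, the other
# assigns the result of malloc. The joined verdict is HARD_CODED.
verdict = join(Hardness.HARD_CODED, Hardness.NOT_HARD_CODED)
```

Under this assumed ordering, a pointer that is hard coded on any one path is reported as hard coded, which matches the conditional hardcoding example being treated as a violation.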
Contexts • Contexts to abstract contexts (the abstraction function α) • Abstract contexts to contexts (the concretization function γ)
Phases In Analysis • Phase 1: Find the set of dereferenced pointers • Phase 2: Check the safety of dereferenced pointers
Building Unsafe Sets (Phase 1) • The first element is added to the unsafe set when a pointer is dereferenced • E.g., if “*Reg” appears in the disassembled code, the unsafe set is initialized to {Reg} • N pointers dereferenced → N unsafe sets • Maintained as SOUS (Set Of Unsafe Sets)
Populating Unsafe Sets (Phase 2) • For example, if Reg = reg1 + reg2, the element “Reg” is deleted from the unsafe set and the elements “reg1” and “reg2” are inserted into it • The contents of the unsafe set then become {reg1, reg2}
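The two phases can be sketched together as a single backward walk over straight-line code. This is a minimal sketch, not the authors' tool: the tuple-based instruction format, the name `find_hard_coded`, and the rule that a dereference is flagged only once every element of its unsafe set has been traced to a constant are all assumptions for illustration.

```python
# Hypothetical instruction forms (straight-line code only):
#   ("const", dst, value)      dst = immediate constant
#   ("move",  dst, src)        dst = src
#   ("add",   dst, src1, src2) dst = src1 + src2
#   ("deref", reg)             memory access through reg, i.e. *reg

def find_hard_coded(instrs):
    """Return indices of dereferences whose address traces back only to constants."""
    sous = []        # set of unsafe sets: [deref_index, registers still to trace]
    hard_coded = []
    for idx in range(len(instrs) - 1, -1, -1):      # walk backward
        op, *args = instrs[idx]
        if op in ("const", "move", "add"):
            # Phase 2: rewrite every unsafe set that mentions the defined register.
            dst, sources = args[0], args[1:]
            for entry in sous:
                regs = entry[1]
                if dst in regs:
                    regs.discard(dst)
                    if op != "const":
                        regs.update(sources)        # trace the operands instead
                    elif not regs:
                        hard_coded.append(entry[0]) # nothing left but constants
        elif op == "deref":
            # Phase 1: each dereference starts a new unsafe set {reg}.
            sous.append([idx, {args[0]}])
    return sorted(hard_coded)


# Simplified model of a listing in which B4 is loaded with 0x1f40 and later
# used as a store address; the SP-relative stores are never traced to a constant.
listing = [
    ("const", "B4", 0x1F40),
    ("deref", "SP"),
    ("deref", "SP"),
    ("const", "B5", 5),
    ("deref", "B4"),
]
flagged = find_hard_coded(listing)

indirect = [("const", "p", 0x80), ("move", "q", "p"), ("deref", "q")]
flagged_indirect = find_hard_coded(indirect)
```

Here `flagged` identifies the store through B4 (index 4) and `flagged_indirect` identifies the dereference of `q` (index 2). Branches, loops, and memory-resident pointers are outside the scope of this sketch.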
Pointer Arithmetic • All pointer operations are abstracted during analysis
Handling Loops • Complex: the number of loop iterations may not be known until run time • Cycle through the loop until the unsafe set reaches a “fixed point” • At the fixed point, successive iterations add no new information to the unsafe set
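A fixed-point iteration of this kind can be sketched as follows. The sketch assumes a loop body made only of register moves written as `(dst, src)` pairs, and it accumulates the union of everything seen so far; the function names are hypothetical and the real analyzer's transfer functions are certainly richer.

```python
def transfer_backward(block, regs):
    """Push a set of tracked registers backward through moves given as (dst, src)."""
    regs = set(regs)
    for dst, src in reversed(block):
        if dst in regs:
            regs.discard(dst)
            regs.add(src)
    return regs


def loop_fixed_point(body, regs_after_loop):
    """Re-apply the loop body until the accumulated unsafe set stops growing."""
    state = frozenset(regs_after_loop)
    while True:
        # Union covers zero, one, two, ... executions of the loop body.
        new_state = state | transfer_backward(body, state)
        if new_state == state:          # no new information: fixed point reached
            return set(state)
        state = new_state


# Loop body: p = q; q = r;  followed after the loop by a dereference of p.
body = [("p", "q"), ("q", "r")]
unsafe_at_loop_entry = loop_fixed_point(body, {"p"})
```

Termination follows because the set only grows and the number of registers is finite. For this body the result is `{"p", "q", "r"}`: depending on how many times the loop runs, the dereferenced value may have come from any of the three.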
Merging Information • Without merging, the analysis has exponential complexity • Merging is mandatory in the presence of loops • Merging causes information loss • Example flow graph: Block A → if (Cond) then Block B else Block C → Block D → Block E
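Merging at a branch can be sketched as a set union of the information coming back from each arm. The block contents, the `#`-prefixed notation for constants, and the function names below are assumptions for illustration only.

```python
def transfer_backward(block, tracked):
    """Push tracked names backward through assignments given as (dst, src)."""
    tracked = set(tracked)
    for dst, src in reversed(block):
        if dst in tracked:
            tracked.discard(dst)
            tracked.add(src)
    return tracked


def analyze_diamond(then_block, else_block, tracked_after):
    """Analyze both arms of an if/else once, then merge into a single set."""
    from_then = transfer_backward(then_block, tracked_after)
    from_else = transfer_backward(else_block, tracked_after)
    return from_then | from_else        # one merged set instead of one per path


# Block B assigns a literal address, Block C assigns an allocator result,
# and Block D dereferences p.
block_b = [("p", "#0x900")]
block_c = [("p", "malloc_result")]
merged = analyze_diamond(block_b, block_c, {"p"})
```

The merged set records that `p` may come from either source but no longer records which path supplied which, which is the information loss mentioned above. In exchange, a chain of k such branches costs 2k block visits rather than 2^k path traversals.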
Extensive Compliance Checking • Handle all cases that occur in programs • Single pointer, double pointer, triple pointer… • Global pointer variables • Static and Dynamic arrays
Extensive Compliance Checking • Loops – all forms (e.g. for, while…) • Function calls • Pipelining and Parallelism • Merging information from multiple paths
Proof – Analysis is Sound • Consistency of the α and γ functions is established by showing the existence of a Galois connection. That is, • x = α(γ(x)) for every abstract value x • y belongs to γ(α(y)) for every concrete value y
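On a finite toy domain the two conditions can be checked exhaustively. The domains below are invented for illustration and are not the ones used in the proof: concrete values are sets of pointer origins drawn from `("const", "dyn")`, abstract values form an assumed three-level chain, and the second condition is read as set inclusion.

```python
from itertools import combinations

UNIVERSE = ("const", "dyn")                       # hypothetical pointer origins
BOTTOM, NOT_HARD_CODED, HARD_CODED = 0, 1, 2      # assumed abstract chain


def alpha(concrete):
    """Abstraction: any constant origin makes the whole set HARD_CODED."""
    if not concrete:
        return BOTTOM
    return HARD_CODED if "const" in concrete else NOT_HARD_CODED


def gamma(abstract):
    """Concretization: the largest set of origins an abstract value can stand for."""
    return {
        BOTTOM: frozenset(),
        NOT_HARD_CODED: frozenset({"dyn"}),
        HARD_CODED: frozenset(UNIVERSE),
    }[abstract]


def powerset(items):
    return [frozenset(c) for r in range(len(items) + 1)
            for c in combinations(items, r)]


# x = alpha(gamma(x)) for every abstract x
insertion_holds = all(alpha(gamma(a)) == a
                      for a in (BOTTOM, NOT_HARD_CODED, HARD_CODED))
# y is contained in gamma(alpha(y)) for every concrete y
extensive_holds = all(s <= gamma(alpha(s)) for s in powerset(UNIVERSE))
```

Both checks evaluate to `True` for this toy domain. For example, the concrete set `{"const"}` abstracts to `HARD_CODED`, whose concretization `{"const", "dyn"}` contains it, so abstraction may lose precision but never drops a possible origin.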
Related Work • UNO Project – Bell Labs • Analyzes at source level • TI XDAIS Standard • Contains 35 rules and 15 guidelines • SIX General Programming Rules • No tool currently exists to check for compliance
Current Status and Future Work • Prototype Implementation done • But, context insensitive, intra-procedural • Extend to context sensitive, inter-procedural. • Extend compliance check for other rules.
So… • Software reuse is an important issue in the industry, particularly the DSP industry • Checking compatibility of code w/ reusability standards at assembly level is possible • A Static Analysis based technique is useful and practical
WOW!!!! It works… • Software Reuse & System Integration • Select ONLY Compliant Software
More Information • R. Venkitaraman and G. Gupta, Static Program Analysis of Embedded Executable Assembly Code. Compilers, Architecture, and Synthesis for Embedded Systems (ACM CASES), September 2004 • R. Venkitaraman and G. Gupta, Framework for Safe Reuse of Software Binaries. ICDCIT, December 2004 • Master's Thesis Report: R. Venkitaraman, Framework for Safe Reuse Of Software Binaries, The University of Texas at Dallas