A Static Program Analyzer to increase software reuse Ramakrishnan Venkitaraman and Gopal Gupta Department of Computer Science
Cost of software always on the rise [Chart not reproduced] Source: Data and Analysis Center for Software
Why do we need a software standard? • Lack of software reuse because of a lack of software standards • Non-availability of a rich set of COTS components • Time to market for new products is measured in years rather than months • Incompatibilities make integration of software from multiple vendors impossible The discussion refers mainly to DSP software, but comparable problems arise in any software development process
TI TMS320 DSP Algorithm Standard • Contains 34 rules and 15 guidelines • Intended to enable a rich COTS marketplace and significantly reduce the time to market for new products • Will allow system integrators to integrate compliant algorithms from multiple vendors into a single system • Reduces time to market, increases software quality and software reuse
General Programming Rules • No tool currently exists to check for compliance • Programs must be relocatable • No hard coded data memory locations • No hard coded program memory locations • Programs must be reusable • Algorithms must be re-entrant
Hard Coded Addresses • Generally a bad programming practice unless you are writing device drivers • Results in non-relocatable code • Results in non-reusable code • A pointer variable is said to be NOT hard coded if any of the following holds: • The address is derived from a call to a memory allocation routine like “malloc” or “calloc” • The address is derived as a function of the “stack pointer” • The address is derived from another pointer that is legitimate.
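The three legitimate pointer sources listed above, and one hard coded source, can be illustrated with a small C sketch. All function names are hypothetical, and the address 0x8000 is an arbitrary placeholder that does not come from any real memory map. The hard coded pointer is only formed, never dereferenced, so the sketch stays safe to run on an ordinary host.

```c
#include <stdint.h>
#include <stdlib.h>

/* Legitimate: the address comes from a memory allocation routine. */
int *make_buffer(size_t n)
{
    return calloc(n, sizeof(int));
}

/* Legitimate: the address is derived from another legitimate pointer. */
int *third_element(int *base)
{
    return base + 2;
}

/* Legitimate: the address of a local is a function of the stack pointer. */
int sum_local(void)
{
    int local[3] = {1, 2, 3};
    int *p = local;
    return p[0] + p[1] + p[2];
}

/* Hard coded: the address is a literal constant (0x8000 is a placeholder).
 * Code that dereferences such a pointer is neither relocatable nor reusable. */
volatile int *hard_coded_location(void)
{
    return (volatile int *)(uintptr_t)0x8000u;
}
```

An analyzer following the rules on this slide would accept dereferences of the first three pointers and reject any dereference of the last one.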
Static Program Analysis • Static program analysis (or static analysis for brevity) is defined as any analysis of a program carried out without completely executing the program • The traditional data-flow analysis found in compiler back-ends is an example of static analysis • Another example of static analysis is abstract interpretation, in which a program's data and operations are approximated and the program abstractly executed
Basic Blocks and Flow Graphs • A “Basic Block” is a sequence of consecutive statements in which flow of control enters at the beginning and leaves at the end without halting or possibility of branching except at the end. • The basic blocks form the nodes in a directed graph called the “Control Flow Graph”. This graph helps us to visualize and arrive at all possible paths through which program control could flow at runtime. All such paths must be analyzed for compliance.
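One common way to split code into basic blocks is to mark "leaders": the first instruction, every branch target, and every instruction that follows a branch. The sketch below assumes a simplified, hypothetical instruction record (`Insn`) that keeps only control-flow facts. It also ignores branch delay slots, which a real DSP pipeline may introduce, so it is an illustration and not necessarily the analyzer's own method.

```c
#include <stddef.h>

/* Hypothetical, simplified instruction: only control-flow facts are kept. */
typedef struct {
    int is_branch;   /* nonzero if this instruction may transfer control */
    int target;      /* index of the branch target, or -1 if none/unknown */
} Insn;

/* Sets leader[i] = 1 where a basic block starts; returns the block count. */
size_t mark_leaders(const Insn *code, size_t n, int *leader)
{
    size_t i, blocks = 0;

    for (i = 0; i < n; i++)
        leader[i] = 0;
    if (n > 0)
        leader[0] = 1;                      /* program entry */

    for (i = 0; i < n; i++) {
        if (!code[i].is_branch)
            continue;
        if (code[i].target >= 0 && (size_t)code[i].target < n)
            leader[code[i].target] = 1;     /* branch target */
        if (i + 1 < n)
            leader[i + 1] = 1;              /* instruction after a branch */
    }

    for (i = 0; i < n; i++)
        blocks += (size_t)leader[i];
    return blocks;
}
```

Each leader starts a node of the flow graph; edges then follow from each block's final branch and from fall-through into the next block.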
Overview of our approach • Input: Object code of the algorithm • Output: Compliant / Not Compliant status [Figure: Activity diagram for our static analyzer]
Our Algorithm for Static Analysis • Get the disassembled code from the input object code • From the disassembled code, get the basic blocks and construct the flow graph • Analyze the flow graph and check for the dereferencing of pointer variables • For each such dereference, scan back and find out where the pointer got its value (this involves the formation of unsafe sets, which are explained later) • If the original source of the pointer is hard coded, declare that the algorithm is not compliant (“unsafe”) • If the original source of the pointer is legitimate, declare that the dereference is safe • The algorithm is declared to be safe if and only if all such pointer dereferences are safe
Phases in Static Analysis of the Flow Graph • Phase 1: The analyzer detects statements in the disassembled code which correspond to the dereferencing of pointer variables by scanning downwards in the flow graph • Phase 2: The analyzer checks whether any dereferencing detected in phase 1 is safe by scanning upwards in the flow graph
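Phase 1 can be pictured as a single forward pass that records every load or store together with the register holding its address. The sketch below is a minimal illustration under assumed names: `OpKind`, `Insn`, and `find_dereferences` are hypothetical and do not correspond to a real disassembler API.

```c
#include <stddef.h>

/* Hypothetical instruction categories; only memory accesses matter here. */
typedef enum { OP_OTHER, OP_LOAD, OP_STORE } OpKind;

typedef struct {
    OpKind kind;
    int addr_reg;    /* register holding the address, or -1 if not a memory op */
} Insn;

/* Scans downwards and stores the index of each dereference in out[].
 * Returns the total number found, even if out[] was too small to hold all. */
size_t find_dereferences(const Insn *code, size_t n,
                         size_t *out, size_t max_out)
{
    size_t i, found = 0;

    for (i = 0; i < n; i++) {
        if (code[i].kind == OP_LOAD || code[i].kind == OP_STORE) {
            if (found < max_out)
                out[found] = i;
            found++;
        }
    }
    return found;
}
```

Each recorded index, with its `addr_reg`, is a starting point for the upward scan of phase 2.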
Building Unsafe Sets • The “Unsafe Set” is the set of registers which may potentially contain hard coded references • The first element is added to the unsafe set when phase 1 detects the dereferencing of a pointer • Example: If we find “*Reg” in the analyzed code, the unsafe set is initialized to {Reg} Note: Most examples in this presentation use the ‘C’ programming language for easy understanding, while the real analysis is done at the assembly language level.
Building unsafe sets (continued) • Phase 2 populates the unsafe set by “scanning backwards” • For example, if we find “Reg = Reg1 + Reg2”, the element “Reg” is deleted from the unsafe set and the elements “Reg1” and “Reg2” are inserted into the unsafe set • The contents of the unsafe set now become {Reg1, Reg2} • We then continue scanning backwards, searching for both “Reg1” and “Reg2”
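The backward scan over straight-line code can be sketched with the unsafe set stored as a bitmask of registers. Everything here is an assumption made for illustration: the four instruction kinds, the `Verdict` values, and the choice to report `DEREF_UNRESOLVED` when registers remain in the set at the start of the code. The sketch also treats any constant that reaches the unsafe set as hard coded, which is stricter than a real analyzer may need to be.

```c
#include <stddef.h>

/* Hypothetical instruction kinds: dst = constant, dst = allocation result,
 * dst = src1, and dst = src1 + src2. */
typedef enum { OP_CONST, OP_ALLOC, OP_MOVE, OP_ADD } OpKind;

typedef struct {
    OpKind kind;
    int dst;          /* destination register, 0..31 */
    int src1, src2;   /* source registers; unused ones are -1 */
} Insn;

typedef enum { DEREF_SAFE, DEREF_UNSAFE, DEREF_UNRESOLVED } Verdict;

/* code[0..n_before-1] are the instructions that precede a dereference
 * of register reg. The scan walks them from last to first. */
Verdict check_deref(const Insn *code, size_t n_before, int reg)
{
    unsigned long unsafe = 1UL << reg;      /* unsafe set = {reg} */
    size_t i = n_before;

    while (i > 0 && unsafe != 0) {
        const Insn *in = &code[--i];
        unsigned long bit = 1UL << in->dst;

        if (!(unsafe & bit))
            continue;                       /* defines no tracked register */
        unsafe &= ~bit;                     /* delete dst from the set */

        switch (in->kind) {
        case OP_CONST:
            return DEREF_UNSAFE;            /* value is a literal constant */
        case OP_ALLOC:
            break;                          /* legitimate source, add nothing */
        case OP_MOVE:
            unsafe |= 1UL << in->src1;
            break;
        case OP_ADD:
            unsafe |= (1UL << in->src1) | (1UL << in->src2);
            break;
        }
    }
    return unsafe == 0 ? DEREF_SAFE : DEREF_UNRESOLVED;
}
```

For the slide's example, a single `Reg = Reg1 + Reg2` before the dereference leaves {Reg1, Reg2} in the set, so this sketch reports the result as unresolved until earlier code explains both registers.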
Analysis Stops when… • All pointer dereferences in the program are declared to be “safe” (not hard coded) OR • At least one of the pointer dereferences in the program is declared to be “unsafe” (hard coded)
Handling Loops • Complex because the number of iterations of the loop may not be known until runtime • We scan and cycle through the loop until the unsafe set reaches a “Fixed Point” • A Fixed Point is reached when • The unsafe set repeats itself at the same point in the loop during successive iterations • No new information is added to the unsafe set during successive iterations
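The fixed-point idea can be sketched as repeated backward passes over a loop body until the unsafe set stops changing. The names `RegSet`, `scan_body_backwards`, and `loop_fixed_point` are hypothetical, and the sketch assumes the loop may run zero or more times, so the set at the loop exit is always kept in the result. Termination follows because the set only grows and holds at most 32 registers.

```c
#include <stddef.h>

typedef struct {
    int dst;          /* register written, 0..31 */
    int src1, src2;   /* registers read; -1 if unused */
} Insn;

typedef unsigned long RegSet;   /* bit r set: register r is in the unsafe set */

/* One backward pass over the loop body, from last instruction to first. */
static RegSet scan_body_backwards(const Insn *body, size_t n, RegSet unsafe)
{
    size_t i = n;

    while (i > 0) {
        const Insn *in = &body[--i];
        RegSet bit = 1UL << in->dst;

        if (!(unsafe & bit))
            continue;
        unsafe &= ~bit;                         /* replace dst by its sources */
        if (in->src1 >= 0) unsafe |= 1UL << in->src1;
        if (in->src2 >= 0) unsafe |= 1UL << in->src2;
    }
    return unsafe;
}

/* Repeats the pass until the set at the loop head no longer changes.
 * *passes receives the number of passes that were needed. */
RegSet loop_fixed_point(const Insn *body, size_t n, RegSet at_exit, int *passes)
{
    RegSet current = at_exit;

    *passes = 0;
    for (;;) {
        RegSet next = at_exit | scan_body_backwards(body, n, current);
        ++*passes;
        if (next == current)
            return current;                     /* fixed point reached */
        current = next;
    }
}
```

For a body that computes `r2 = r4` followed by `r1 = r1 + r2`, starting from {r1} at the loop exit, the set grows to {r1, r4} on the first pass and repeats on the second, at which point the scan can leave the loop.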
Handling Function Calls • Similar to a Branch statement • Marks the beginning and end of basic blocks • Recursive function calls are handled as if they were looping constructs
Handling Parallelism • The || characters signify that an instruction is to execute in parallel with the previous instruction • Example: Instruction A || Instruction B || Instruction C • Here instructions A, B, and C are executed in parallel • During phase 2, parallel instructions are skipped until an instruction in the previous cycle is found
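Skipping back to the previous cycle can be sketched with one flag per instruction that records whether it carries the || prefix. The function names are hypothetical, and the sketch assumes that a value written in one cycle is visible in the next, ignoring any delay slots the real pipeline may have.

```c
#include <stddef.h>

/* parallel[i] != 0 means instruction i carries the || prefix, i.e. it
 * executes in the same cycle as instruction i - 1. */

/* Returns the index of the first instruction in the cycle containing i. */
size_t packet_start(const int *parallel, size_t i)
{
    while (i > 0 && parallel[i])
        i--;
    return i;
}

/* Returns the index of the last instruction of the previous cycle,
 * or -1 if instruction i belongs to the first cycle. */
long previous_cycle_end(const int *parallel, size_t i)
{
    size_t start = packet_start(parallel, i);
    return start == 0 ? -1L : (long)(start - 1);
}
```

A backward scan that starts at a dereference would resume from `previous_cycle_end`, since instructions issued in the same cycle cannot have produced the address being used.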
Current Work • Current work includes fine tuning the handling of loops and extending our system for the remaining rules • The development and testing of the tool is currently in progress • The system is being developed using the ‘C’ programming language
Related Work and Conclusion • Compared to dynamic analysis, static analysis can give correct results for a larger set of cases because it considers all possible paths through the program rather than only the paths exercised by particular test runs • Our work so far can be regarded as an attempt to demonstrate the efficacy of static analysis to perform these checks and aid in software reuse
References • Ramakrishnan Venkitaraman and Gopal Gupta, “Static Program Analysis to Detect Hard Coded Addresses and its Application to TI's DSP Processor”, CS department technical report UTD CS-23-03. For more information, contact ramakrishnan@student.utdallas.edu