Automatic Extraction of Function Bodies from Software Binaries
Gaurav Mittal, David Zaretsky, Gokhan Memik, Prith Banerjee (Northwestern University)
ASP-DAC 2005
Outline
• Authors
• Motivation
• Function Extraction
• Experimental Results
• References
Gaurav Mittal
2009
• Gaurav Mittal, David Zaretsky, Prithviraj Banerjee: Streaming implementation of a sequential decompression algorithm on an FPGA. FPGA 2009: 283
• Lei Gao, David Zaretsky, Gaurav Mittal, Dan Schonfeld, Prith Banerjee: A software pipelining algorithm in high-level synthesis for FPGA architectures. ISQED 2009: 297-302
2007
• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Balanced Scheduling and Operation Chaining in High-Level Synthesis for FPGA Designs. ISQED 2007: 595-601
• Gaurav Mittal, David Zaretsky, Xiaoyong Tang, Prithviraj Banerjee: An Overview of a Compiler for Mapping Software Binaries to Hardware. IEEE Trans. VLSI Syst. 15(11): 1177-1190 (2007)
2006
• Gaurav Mittal, Sushrutha Locharam, Sreela Sasi, Glenn R. Shaffer, Ajith K. Kumar: An Efficient Video Enhancement Method Using LA*B* Analysis. AVSS 2006: 66
• Gaurav Mittal, Sreela Sasi: Robust Preprocessing Algorithm for Face Recognition. CRV 2006: 57
• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Dynamic Template Generation for Resource Sharing in Control and Data Flow Graphs. VLSI Design 2006: 465-468
2005
• Gaurav Mittal, David Zaretsky, Gokhan Memik, Prith Banerjee: Automatic extraction of function bodies from software binaries. ASP-DAC 2005: 928-931
• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Generation of Control and Data Flow Graphs from Scheduled and Pipelined Assembly Code. LCPC 2005: 76-90
David Zaretsky
2009
• Gaurav Mittal, David Zaretsky, Prithviraj Banerjee: Streaming implementation of a sequential decompression algorithm on an FPGA. FPGA 2009: 283
• Lei Gao, David Zaretsky, Gaurav Mittal, Dan Schonfeld, Prith Banerjee: A software pipelining algorithm in high-level synthesis for FPGA architectures. ISQED 2009: 297-302
2007
• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Balanced Scheduling and Operation Chaining in High-Level Synthesis for FPGA Designs. ISQED 2007: 595-601
• Gaurav Mittal, David Zaretsky, Xiaoyong Tang, Prithviraj Banerjee: An Overview of a Compiler for Mapping Software Binaries to Hardware. IEEE Trans. VLSI Syst. 15(11): 1177-1190 (2007)
2006
• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Dynamic Template Generation for Resource Sharing in Control and Data Flow Graphs. VLSI Design 2006: 465-468
2005
• Gaurav Mittal, David Zaretsky, Gokhan Memik, Prith Banerjee: Automatic extraction of function bodies from software binaries. ASP-DAC 2005: 928-931
• David Zaretsky, Gaurav Mittal, Robert P. Dick, Prith Banerjee: Generation of Control and Data Flow Graphs from Scheduled and Pipelined Assembly Code. LCPC 2005: 76-90
Gokhan Memik
2009
• Yan Pan, Joonho Kong, Serkan Ozdemir, Gokhan Memik, Sung Woo Chung: Selective wordline voltage boosting for caches to manage yield under process variations. DAC 2009: 57-62
• Yan Pan, Prabhat Kumar, John Kim, Gokhan Memik, Yu Zhang, Alok N. Choudhary: Firefly: illuminating future network-on-chip with nanophotonics. ISCA 2009: 429-440
• Bin Lin, Arindam Mallik, Peter A. Dinda, Gokhan Memik, Robert P. Dick: User- and process-driven dynamic voltage and frequency scaling. ISPASS 2009: 11-22
• Yu Zhang, Berkin Özisikyilmaz, Gokhan Memik, John Kim, Alok N. Choudhary: Analyzing the impact of on-chip network traffic on program phases for CMPs. ISPASS 2009: 218-226
• Alex Shye, Benjamin Scholbrock, Gokhan Memik: Into the wild: studying real user activity patterns to guide power optimizations for mobile architectures. MICRO 2009: 168-178
2008
• Arindam Mallik, Jack Cosgrove, Robert P. Dick, Gokhan Memik, Peter A. Dinda: PICSEL: measuring user-perceived performance to control dynamic frequency scaling. ASPLOS 2008: 70-79
• Alex Shye, Yan Pan, Benjamin Scholbrock, J. Scott Miller, Gokhan Memik, Peter A. Dinda, Robert P. Dick: Power to the people: Leveraging human physiological traits to control microprocessor frequency. MICRO 2008: 188-199
• Abhishek Das, Berkin Özisikyilmaz, Serkan Ozdemir, Gokhan Memik, Joseph Zambreno, Alok N. Choudhary: Evaluating the effects of cache redundancy on profit. MICRO 2008: 388-398
Prith Banerjee
2010
• Prith Banerjee: An Intelligent IT Infrastructure for the Future. ICDCN 2010: 1
2009
• Nikolaos D. Liveris, Hai Zhou, Prithviraj Banerjee: Complete-k-distinguishability for retiming and resynthesis equivalence checking without restricting synthesis. ASP-DAC 2009: 636-641
• Prith Banerjee, Chandrakant D. Patel, Cullen Bash, Parthasarathy Ranganathan: Sustainable data centers: enabled by supply and demand side management. DAC 2009: 884-887
• Gaurav Mittal, David Zaretsky, Prithviraj Banerjee: Streaming implementation of a sequential decompression algorithm on an FPGA. FPGA 2009: 283
• Prith Banerjee: An intelligent IT infrastructure for the future. HPCA 2009: 3-4
• Lei Gao, David Zaretsky, Gaurav Mittal, Dan Schonfeld, Prith Banerjee: A software pipelining algorithm in high-level synthesis for FPGA architectures. ISQED 2009: 297-302
2008
• Nikolaos D. Liveris, Hai Zhou, Prithviraj Banerjee: A dynamic-programming algorithm for reducing the energy consumption of pipelined System-Level streaming applications. ASP-DAC 2008: 42-48
• Nikolaos D. Liveris, Hai Zhou, Robert P. Dick, Prithviraj Banerjee: State space abstraction for parameterized self-stabilizing embedded systems. EMSOFT 2008: 11-20
Asia and South Pacific Design Automation Conference 2010
• Deadline for Paper Submission: 5 PM JST (GMT+9) July 19 (Mon), 2010
• Deadline for University LSI Design Contest: 5 PM JST (GMT+9) July 19 (Mon), 2010
• Notification of acceptance: September 24 (Fri), 2010
• Deadline for Final Version: 5 PM JST (GMT+9) November 15 (Mon), 2010
Motivation
Figure 1: example code
Address  Op       Operands
0x05A0   CallAddEx:
0x05A0   STW      B3,*SP--[0x4]
0x05A4   NOP      2
0x05A8   STW      B4,*+SP[0x2]
0x05AC   STW      A4,*+SP[0x1]
0x05B0   NOP      2
0x05B4   MV       A4,B4
0x05B8   B        B4
0x05BC   LDW      *+SP[0x2],A4
0x05C0   MVK      0x05cc,B3
0x05C4   MVKH     0x0000,B3
0x05C8   NOP      2
0x05CC   LDW      *++SP[0x4],B3
0x05D0   NOP      4
0x05D4   B        B3
0x05D8   NOP      5
0x05DC   add_ex:
0x05DC   SUB      SP,0x8,SP
0x05E0   STW      A4,*+SP[0x1]
0x05E4   NOP      2
0x05E8   B        B3
0x05EC   ADD      SP,0x8,SP
0x05F0   NOP      4
0x05F4   main:
0x05F4   STW      B3,*SP--[0x2]
0x05F8   NOP      2
0x05FC   ZERO     B4
0x0600   CMPGT    10,B4,B0
0x0604   [!B0]B   L2
0x0608   NOP      4
0x060C   STW      B4,*+SP[0x1]
0x0610   L1:
0x0610   B        CallAddEx
0x0614   MVK      0x05dc,A4
0x0618   MVK      0x0628,B3
0x061C   MVKH     0x0000,A4
0x0620   MVKH     0x0000,B3
0x0624   NOP
0x0628   RL1:
0x0628   LDW      *+SP[0x1],B4
0x062C   NOP      4
0x0630   ADD      B4,0x1,B4
0x0634   CMPGT    10,B4,B0
0x0638   [B0] B   L1
0x063C   NOP      4
0x0640   STW      B4,*+SP[0x1]
0x0644   L2:
0x0644   ZERO     A4
0x0648   LDW      *++SP[0x2],B3
0x064C   NOP      4
0x0650   B        B3
For example, for the TI chip series the caller prologue needs the return address to be moved to register ‘B3’ before the branch is executed. On the other hand, the callee epilogue consists of a jump to register ‘B3’. However, it might not be possible to determine these destinations in all cases. For example, it is not clear by simple inspection if the branch at instruction 0x05B8 is to function ‘add_ex’. This would require knowledge of the input parameter at compile time and may not be available for complicated real world applications. Thus, if the destination of the call to ‘add_ex’ is not recognized, ‘add_ex’ will not be recognized as a function using caller-prologues alone.
Function Extraction
• Their main contribution in this paper is an algorithm to extract function bodies from binaries in which the function boundaries are not clear.
• They use the procedure calling convention to recognize caller prologues and callee epilogues. Initially, these prologues and epilogues are assumed to determine the function bodies. The following steps refine this initial function list to extract the final list of function bodies. During this process, the heuristic needs to maintain information on the identified functions. It also needs one mapping from instruction addresses to labels and one from labels to instruction pointers. This information is maintained as three separate hash structures to reduce processing time. Finally, a function call graph is generated, which is used to identify procedures that can be moved to hardware. Ongoing work on hardware/software partitioning will try to automate the selection process; such techniques are outside the scope of this paper. They measure the success of their algorithm by the fraction of functions discovered.
• First Pass: Caller Prologues and Callee Epilogues.
• Second Pass: Merging Function Bodies.
• Third Pass: Disjoint Set Formation.
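The bookkeeping described above (three hash structures plus a call graph) might look roughly like the following sketch. The dictionary names, the use of addresses as "instruction pointers", the function address ranges, and the helper functions are all assumptions made for illustration; only the labels and addresses come from the Figure 1 listing.

```python
# Hash 1: instruction address -> label (labels taken from the Figure 1 listing).
addr_to_label = {
    0x05A0: "CallAddEx", 0x05DC: "add_ex", 0x05F4: "main",
    0x0610: "L1", 0x0628: "RL1", 0x0644: "L2",
}
# Hash 2: label -> instruction pointer (assumption: the address stands in for it).
label_to_addr = {label: addr for addr, label in addr_to_label.items()}
# Hash 3: identified function -> (first address, last address) of its body.
functions = {
    "CallAddEx": (0x05A0, 0x05D8),
    "add_ex": (0x05DC, 0x05F0),
    "main": (0x05F4, 0x0650),
}


def enclosing_function(addr):
    """Return the name of the function whose body contains addr, if any."""
    for name, (first, last) in functions.items():
        if first <= addr <= last:
            return name
    return None


def build_call_graph(calls):
    """Build caller -> set of callees from (call site, destination) pairs."""
    graph = {name: set() for name in functions}
    for site, destination in calls:
        caller = enclosing_function(site)
        # Register destinations such as "B4" are unresolved and add no edge.
        if caller is not None and destination in functions:
            graph[caller].add(destination)
    return graph


# The two calls visible in the listing: a direct one and one through register B4.
calls = [(0x0610, "CallAddEx"), (0x05B8, "B4")]
call_graph = build_call_graph(calls)
print(call_graph)
```

In this sketch the indirect call at 0x05B8 contributes no edge, which mirrors the difficulty the slides describe: without resolving ‘B4’, the edge from ‘CallAddEx’ to ‘add_ex’ stays invisible.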
First Pass: Caller Prologues and Callee Epilogues
• This pass traverses the instruction list from top to bottom while searching for caller prologues.
• The purpose of this first pass is to simplify the identification process in the subsequent passes.
In the example code from Figure 1, the call to ‘CallAddEx’ at 0x0610 and its return address are easily recognized, as are the call at 0x05B8 and its return address 0x05CC. In the latter, however, the destination in register ‘B4’ is not clear. While identifying prologues, the last calculated value of B3 is used if none is found in the current block. In pipelined code, other branches may break up the caller prologue so that the call is not made in some circumstances [15][17]. For the TI code, the return address in B3 is compared to the projected return address; the branch is identified as a prologue only on a match.
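The B3 comparison described above can be illustrated with a small sketch. It is not the authors' implementation. It assumes that a branch takes effect five cycles after issue (which is consistent with the listing, where each call is followed by five cycles of instructions before its return label), that `NOP n` occupies n cycles, and that only `MVK`/`MVKH` produce a known B3 value (sign extension is ignored). The names `parse`, `update_b3`, and `find_calls` are hypothetical.

```python
import re

# Figure 1 listing with labels omitted; instructions separated by ';' or newlines.
LISTING = """
0x05A0 STW B3,*SP--[0x4]; 0x05A4 NOP 2; 0x05A8 STW B4,*+SP[0x2]
0x05AC STW A4,*+SP[0x1]; 0x05B0 NOP 2; 0x05B4 MV A4,B4
0x05B8 B B4; 0x05BC LDW *+SP[0x2],A4; 0x05C0 MVK 0x05cc,B3
0x05C4 MVKH 0x0000,B3; 0x05C8 NOP 2; 0x05CC LDW *++SP[0x4],B3
0x05D0 NOP 4; 0x05D4 B B3; 0x05D8 NOP 5
0x05DC SUB SP,0x8,SP; 0x05E0 STW A4,*+SP[0x1]; 0x05E4 NOP 2
0x05E8 B B3; 0x05EC ADD SP,0x8,SP; 0x05F0 NOP 4
0x05F4 STW B3,*SP--[0x2]; 0x05F8 NOP 2; 0x05FC ZERO B4
0x0600 CMPGT 10,B4,B0; 0x0604 [!B0]B L2; 0x0608 NOP 4
0x060C STW B4,*+SP[0x1]; 0x0610 B CallAddEx; 0x0614 MVK 0x05dc,A4
0x0618 MVK 0x0628,B3; 0x061C MVKH 0x0000,A4; 0x0620 MVKH 0x0000,B3
0x0624 NOP; 0x0628 LDW *+SP[0x1],B4; 0x062C NOP 4
0x0630 ADD B4,0x1,B4; 0x0634 CMPGT 10,B4,B0; 0x0638 [B0] B L1
0x063C NOP 4; 0x0640 STW B4,*+SP[0x1]; 0x0644 ZERO A4
0x0648 LDW *++SP[0x2],B3; 0x064C NOP 4; 0x0650 B B3
"""

DELAY_CYCLES = 5  # assumption: a branch takes effect five cycles after issue
# address, optional predicate such as [B0] or [!B0], opcode, operands
LINE = re.compile(r"(0x[0-9A-Fa-f]+)\s+(?:\[!?\w+\]\s*)?(\w+)\s*(.*)")


def parse(listing):
    """Turn the listing into (address, opcode, operands) tuples."""
    insns = []
    for text in listing.replace(";", "\n").split("\n"):
        m = LINE.match(text.strip())
        if m:
            addr, op, operands = m.groups()
            insns.append((int(addr, 16), op, operands.strip()))
    return insns


def update_b3(b3, op, operands):
    """Track the constant held in B3; None means the value is unknown."""
    dest = operands.split(",")[-1] if operands else ""
    if dest != "B3" or op in ("STW", "B", "NOP"):
        return b3  # this instruction does not write B3
    value = operands.split(",")[0]
    if op == "MVK":
        return int(value, 16)
    if op == "MVKH" and b3 is not None:
        return (int(value, 16) << 16) | (b3 & 0xFFFF)
    return None  # any other write (for example LDW) makes B3 unknown


def find_calls(insns):
    """Return (branch address, destination, return address) for each call."""
    calls, b3 = [], None
    for i, (addr, op, operands) in enumerate(insns):
        b3 = update_b3(b3, op, operands)
        if op != "B":
            continue
        # Walk the delay slots: they may set B3, and the first instruction
        # after them is the projected return address.
        slot_b3, used, j = b3, 0, i + 1
        while j < len(insns) and used < DELAY_CYCLES:
            _, s_op, s_operands = insns[j]
            slot_b3 = update_b3(slot_b3, s_op, s_operands)
            used += int(s_operands or 1) if s_op == "NOP" else 1
            j += 1
        projected = insns[j][0] if j < len(insns) else None
        if projected is not None and slot_b3 == projected:
            calls.append((addr, operands, projected))
    return calls


for site, destination, ret in find_calls(parse(LISTING)):
    print(f"call at {site:#06x} to {destination}, returns to {ret:#06x}")
```

Under these assumptions the sketch flags only the branches at 0x05B8 and 0x0610 as calls; every `B B3` and every loop branch fails the comparison because B3 is either unknown or does not equal the projected return address.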
Second Pass: Merging Function Bodies
• The second pass looks for returns within function bodies. When none are found, it merges a copy of the adjacent function’s body with the function being processed.
• This assumes that the function bodies are not scattered as fragments within the binary. Callee epilogues are used to recognize function returns. The function returns are changed to artificial jumps to a new label, a control sink, attached to the end of the instruction list. This aids the interval analysis in the third pass. The generated instruction lists are used by the next pass, which also separates function bodies that were merged erroneously.
In the example code from Figure 1, there is no call to ‘main’, and ‘add_ex’ is not recognized as a call destination. Thus the second pass merges ‘add_ex’ and ‘main’ with ‘CallAddEx’. Hence, even though callee epilogues are recognized in pass two, there is no need to split the instruction list.
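The return rewriting used by the second pass can be sketched as follows. This covers only that one step, not the merging itself. The tuple layout, the label name `SINK`, and the function `redirect_returns` are hypothetical; the sketch also assumes that a return is an unpredicated `B B3` and that instructions are four bytes apart, as in the listing.

```python
SINK_LABEL = "SINK"  # hypothetical name for the control-sink label


def redirect_returns(insns, return_reg="B3"):
    """Copy the list, turning every `B <return_reg>` into a jump to the sink.

    Each instruction is (address, label or None, opcode, operands).
    """
    rewritten = []
    for addr, label, op, operands in insns:
        if op == "B" and operands == return_reg:
            operands = SINK_LABEL
        rewritten.append((addr, label, op, operands))
    # Attach the sink as a labelled NOP at the end of the instruction list
    # (assumes 4-byte instructions, as in the listing).
    sink_addr = insns[-1][0] + 4
    rewritten.append((sink_addr, SINK_LABEL, "NOP", ""))
    return rewritten


# Body of 'add_ex' from the Figure 1 listing.
add_ex = [
    (0x05DC, "add_ex", "SUB", "SP,0x8,SP"),
    (0x05E0, None, "STW", "A4,*+SP[0x1]"),
    (0x05E4, None, "NOP", "2"),
    (0x05E8, None, "B", "B3"),
    (0x05EC, None, "ADD", "SP,0x8,SP"),
    (0x05F0, None, "NOP", "4"),
]
result = redirect_returns(add_ex)
for addr, label, op, operands in result:
    print(f"{addr:#06x} {label or '':<8} {op} {operands}")
```

Giving every return a common successor means each function body drains into the same node, which is what lets a later graph analysis treat the sink as a single leaf.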
Third Pass: Disjoint Set Formation
• This part of the heuristic works on the individual function bodies one by one and tries to weed out function bodies not recognized in pass one. In particular, the third pass traverses the list of function bodies generated by pass two and analyzes the basic blocks in each function body to recognize any possible errors from the first pass.
• To perform this task, a control and data flow graph is first generated from the instruction list. Information from the first pass is used to generate some of the missing edges. The call instructions are connected to the destinations of their corresponding returns.
• After this step, induction and interval analysis are performed. Induction analysis attempts to identify the values contained in destination registers.
Third Pass: Disjoint Set Formation
Figure 2.1 shows the CDFG for the code from Figure 1 and its interval graph. This graph is generated at the third stage. Block BBLK_8 contains the control sink. The basic block 1 (BBLK_1) and basic block 2 (BBLK_2) are disjoint because of the call in line 0x05D4 of Figure 1. After interval analysis, the third pass forms the final interval graph depicted in Figure 2.2. Block BBLK_3 in Figure 2.2 contains the instruction (NOP) bearing the control sink. It denotes the leaf node. If it is removed, the other three blocks, viz. BBLK_0, BBLK_1, BBLK_2, form disjoint sets representing the functions ‘CallAddEx’, ‘add_ex’, and ‘main’, respectively.
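The final step, removing the control sink and reading off the disjoint sets, can be sketched with a small union-find. The edge list below is an assumption reconstructed from the description of Figure 2.2 (each of the three blocks leading to the sink block); the function name `disjoint_sets` is hypothetical and the slides do not say which data structure the authors used.

```python
def disjoint_sets(blocks, edges, sink):
    """Group blocks into connected sets after dropping the sink block."""
    parent = {b: b for b in blocks if b != sink}

    def find(x):
        # Follow parent links to the set representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in edges:
        if sink in (a, b):
            continue  # edges through the control sink are ignored
        parent[find(a)] = find(b)

    groups = {}
    for b in parent:
        groups.setdefault(find(b), []).append(b)
    return sorted(sorted(group) for group in groups.values())


# Shape assumed from the description of Figure 2.2: three blocks, one sink.
blocks = ["BBLK_0", "BBLK_1", "BBLK_2", "BBLK_3"]
edges = [("BBLK_0", "BBLK_3"), ("BBLK_1", "BBLK_3"), ("BBLK_2", "BBLK_3")]
function_sets = disjoint_sets(blocks, edges, sink="BBLK_3")
print(function_sets)
```

With the sink removed, no edges remain between BBLK_0, BBLK_1, and BBLK_2, so each ends up in its own set, matching the three functions named on the slide.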
Experimental Results
Table 1 shows the extraction results. Columns 2 and 3 show the time (in seconds) taken for call graph generation and function extraction, respectively. Columns 4-6 and 9 show the number of functions recognized by each stage of their algorithm and those recognized by calling conventions alone [9] (pass 1 – pass 2). Function calls identified by pass 1 that could not be assigned instruction bodies were deleted (“Null FNs”). This was due to the incomplete nature of the selected code fragments. The total number of functions found (total FNs) is ‘Pass 3 + Null FNs’. The rightmost column of Table 1 presents a comparison of their algorithm to function extraction using only procedure calling conventions.
Related Work
• There has been some related work in the field of binary translation. CodeMorphing on the Crusoe processor [12], the Dynamo system [13], and BOA [14] are good examples. Cifuentes et al. [3][4] have presented a detailed analysis of different strategies. Their approach is unique in its choice of a reconfigurable target platform, which introduces flexibility and several new research questions.
• Cifuentes et al. [11] have reported algorithms for identifying function calls from assembly programs using predefined procedure call interfaces, an approach that is not applicable to all DSP binaries. They also discuss the use of use-def analysis for identifying function arguments. Bailey and Davidson [9] introduced a formal model to specify procedure-calling conventions. Mike Van Emmerik used idioms to identify library functions [10]. The calling conventions and idioms help in identifying caller/callee prologues and epilogues. In hand-written and/or optimized assembly, it is possible that the code comprising these conventions has been moved. If function pointers are passed as arguments to other functions, the called functions are very likely to be missed completely. Calling conventions and idioms also do not prove sufficient to identify complete function bodies.
References
• V. Bala et al., “Dynamo: A Transparent Dynamic Optimization System,” Proc. ACM SIGPLAN Conf. on Programming Language Design and Implementation (PLDI), June 2000.
• M. Gschwind et al., “Dynamic and Transparent Binary Translation,” IEEE Computer Magazine, Vol. 33, No. 3, pp. 54-59, March 2000.
• G. Mittal et al., “Automatic Translation of Software Binaries onto FPGAs,” Proc. Design Automation Conference, San Diego, June 2004.
• D. C. Zaretsky et al., “Evaluation of Scheduling and Allocation Algorithms While Mapping Software Assembly onto FPGAs,” Proc. Great Lakes Symp. on VLSI, Boston, MA, USA, April 2004.
• D. C. Zaretsky et al., “Overview of the FREEDOM Compiler for Mapping DSP Software to FPGAs,” IEEE Symposium on Field-Programmable Custom Computing Machines, April 21, 2004.