PMaCinst: A Binary Instrumentation Library for PowerPC/AIX
Mustafa M. Tikir, Michael Laurenzano, Laura Carrington and Allan Snavely
Common Uses for Program Instrumentation
• Code Profiles
  • Basic block/Instruction count
  • Operation results
• Microarchitectural study
  • Branch outcomes
  • Memory addresses
• Bug checking
  • Memory leaks
  • Uninitialized data
PMaCinst Overview
• Basic steps for instrumentation
  • Parse executable
  • Identify XCOFF and program objects
  • Modify program objects
  • Translate modifications to XCOFF objects
  • Dump XCOFF objects to an executable file
• Allows tool writers to
  • Insert assembly code
  • Insert calls to shared library functions
  • Inject user-defined data
Executable Parsing – XCOFF File Structure
[Figure: layout of an XCOFF file, distinguishing mandatory structures from optional sections. Labels in the diagram: File Header, Auxiliary Header, Section Headers, Comment Section, Exception Section, TEXT Section, Line Info Table, TypChk Section, Relocation Table, DATA Section, TOC, Debug Section, BSS Section, Overflow Section, Loader Section, Pad Section, String Table, Symbol Table.]
Identifying Program Structure – Function Size
• Function size is not always in symbol table
• Must use traceback table marker
Code for a trivial main function, int main(){}, compiled with gcc:
    (gdb) disass main
    Dump of assembler code for function main:
    0x10000358 <main+0>:  st r31,-4(r1)
    0x1000035c <main+4>:  stu r1,-40(r1)
    0x10000360 <main+8>:  mr r31,r1
    0x10000364 <main+12>: mr r3,r0
    0x10000368 <main+16>: l r1,0(r1)
    0x1000036c <main+20>: l r31,-4(r1)
    0x10000370 <main+24>: br
    0x10000374 <main+28>: .long 0x0
    0x10000378 <main+32>: .long 0x2060
    0x1000037c <main+36>: l r0,1(r1)
    0x10000380 <main+40>: .long 0x1c
    0x10000384 <main+44>: .long 0x46d61
    0x10000388 <main+48>: xoril r14,r11,7936
    End of assembler dump.
Traceback table begins with a null word
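The null-word marker suggests a simple scan for recovering a function's size when the symbol table does not provide it. The following is a minimal sketch of that idea, not PMaCinst's actual code: `findFunctionSize` and `kNotFound` are hypothetical names, and the sketch assumes the caller already has the function's 32-bit words in host byte order, starting at the entry point.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sentinel returned when no traceback marker is present.
static const std::size_t kNotFound = static_cast<std::size_t>(-1);

// Hypothetical helper (not part of PMaCinst): walk the words of a function
// from its entry point and return the code size in bytes, i.e. the offset
// of the null word that opens the traceback table.
std::size_t findFunctionSize(const std::vector<std::uint32_t>& words) {
    for (std::size_t i = 0; i < words.size(); ++i) {
        if (words[i] == 0) {
            // An all-zero word is not a valid PowerPC instruction, so it
            // can serve as the end-of-code marker.
            return i * sizeof(std::uint32_t);
        }
    }
    return kNotFound;  // no marker in the scanned range
}
```

Applied to the dump above, the scan would stop at main+28, giving a code size of 28 bytes for main.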
Identifying Program Structure – Indirect Jumps
• Generating CFG when indirect jumps are present
• Use compiler-specific code patterns
    0x10000cf0 <dum+56>:  cmpli 7,r0,11
    0x10000cf4 <dum+60>:  bgt 7,0x10000dec <dum+308>
    0x10000cf8 <dum+64>:  l r0,104(r31)
    0x10000cfc <dum+68>:  rlinm r9,r0,2,0,29
    0x10000d00 <dum+72>:  l r0,124(r2)
    0x10000d04 <dum+76>:  cax r9,r9,r0
    0x10000d08 <dum+80>:  l r9,0(r9)
    0x10000d0c <dum+84>:  l r0,124(r2)
    0x10000d10 <dum+88>:  cax r9,r9,r0
    0x10000d14 <dum+92>:  mtctr r9
    0x10000d18 <dum+96>:  bctr
    0x10000d1c <dum+100>: .long 0x30
    0x10000d20 <dum+104>: .long 0x48
    0x10000d24 <dum+108>: .long 0xd0
    0x10000d28 <dum+112>: .long 0x60
    0x10000d2c <dum+116>: .long 0xd0
    0x10000d30 <dum+120>: .long 0x78
    0x10000d34 <dum+124>: .long 0xd0
    0x10000d38 <dum+128>: .long 0x90
    0x10000d3c <dum+132>: .long 0xa0
    0x10000d40 <dum+136>: .long 0xd0
    0x10000d44 <dum+140>: .long 0xb0
    0x10000d48 <dum+144>: .long 0xc0
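Once such a pattern is recognized, the CFG edges can be recovered by turning each table entry into a branch target. The sketch below illustrates this under one stated assumption: the table base loaded from 124(r2) is the address of the first entry (0x10000d1c in this dump). The dump does not show that value, but the assumption is consistent with it, since the 0xd0 entries then map to 0x10000dec, the same default target used by the bgt. `resolveJumpTargets` and `uniqueTargets` are hypothetical names, not PMaCinst functions.

```cpp
#include <cstddef>
#include <cstdint>
#include <set>
#include <vector>

// Hypothetical helper: convert jump-table entries into absolute targets,
// assuming every entry is an offset relative to the table base.
std::vector<std::uint64_t> resolveJumpTargets(
        std::uint64_t tableBase, const std::vector<std::uint32_t>& entries) {
    std::vector<std::uint64_t> targets;
    targets.reserve(entries.size());
    for (std::size_t i = 0; i < entries.size(); ++i) {
        targets.push_back(tableBase + entries[i]);
    }
    return targets;
}

// Distinct targets are what the CFG needs: one outgoing edge per target.
std::set<std::uint64_t> uniqueTargets(const std::vector<std::uint64_t>& targets) {
    return std::set<std::uint64_t>(targets.begin(), targets.end());
}
```

With the twelve entries above, the first entry would resolve to 0x10000d4c, the first address past the table, and the block ending in bctr would get nine distinct successors.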
PMaCinst API
// Step I: Executable parsing
XCoffFile file(programName);
file.parse();
// Step II: Code/data injection
XCoffFileGen fileGen(file);
fileGen.instrument();
// Step III: New executable
fileGen.dump();

class XCoffFileGen {
    virtual void selectInstrumentationPoints(char* blockFile);
    virtual void printInstrumentationPoints();
    virtual void reserveDataForInstrumentation();
    virtual void initializeReservedData(DataSection* dataSect, BaseGen* gen);
    virtual uint32_t generateSharedLibFuncWrapper(uint32_t libFuncIdx, uint64_t funcCallAddr,
                                                  uint32_t genBufferOffset, BaseGen* gen);
    virtual uint32_t generateCodeForInst(uint32_t instPointIdx, uint64_t instStubAddress,
                                         TextSection* textSect, BaseGen* gen,
                                         uint32_t genBufferOffset);
};
Code Insertion – Assembly Sequence
[Figure: application code without and with instrumentation. Without instrumentation: Original Code, Relocatable Instruction, Original Code. With instrumentation: the relocatable instruction in the application code is replaced by a Jump to Instrumentation; the added code holds the Relocated Instruction, the Inserted Code, and a Jump to Original Code.]
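The slides do not say how the jump to the added code is encoded. One plausible choice, shown here only as an assumption, is the PowerPC unconditional relative branch (`b`, primary opcode 18), whose signed displacement must be word-aligned and fit in 26 bits. `encodeRelativeBranch` is a hypothetical name.

```cpp
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: encode a PowerPC "b target" instruction (I-form,
// primary opcode 18, AA = 0, LK = 0) placed at address `from`.
std::uint32_t encodeRelativeBranch(std::uint64_t from, std::uint64_t to) {
    const std::int64_t disp =
        static_cast<std::int64_t>(to) - static_cast<std::int64_t>(from);
    if (disp % 4 != 0) {
        throw std::invalid_argument("branch displacement must be word-aligned");
    }
    // The LI field is 24 bits and is shifted left by 2, so the reach is
    // -2^25 .. 2^25 - 4 bytes around the branch.
    if (disp < -(1LL << 25) || disp >= (1LL << 25)) {
        throw std::out_of_range("target not reachable with a relative branch");
    }
    return 0x48000000u | (static_cast<std::uint32_t>(disp) & 0x03FFFFFCu);
}
```

An instrumentation tool using this scheme would overwrite the relocatable instruction with the encoded branch, and end the added code with a second branch back to the instruction that follows it.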
Code Insertion – Shared Library Function
[Figure: application code without and with instrumentation. Without instrumentation: Original Code, Relocatable Instruction, Original Code. With instrumentation: the relocatable instruction is replaced by a Jump to Instrumentation. The added code holds the Relocated Instruction, a Pre-Call Wrapper, a Jump to Function, a Post-Call Wrapper, and a Jump to Original Code. The shared library holds the Shared Lib Function followed by a Jump to Post-Call Stub.]
Data Injection
[Figure: DATA+BSS sections without and with instrumentation. Without instrumentation: Data, Table of Contents, Uninitialized Data (BSS). With instrumentation: Data, Table of Contents, User-Specified Data, Uninitialized Data (BSS).]
Example 1: BasicBlockTracer
• Print information about blocks as they execute.
[Figure: Application code: Basic Block n, Jump to Instrumentation, Basic Block n (cont'd). Added code: Relocated Instruction, Save Regs, Put &bb[n] into arg reg, Jump to Function, Restore Regs, Jump to Original Code. Shared library: Print data found in bb[n], Jump to Post-Call Stub. DATA section: Data, Table of Contents, Register Storage, User-Specified Data (bb[0], bb[1], ..., bb[n-1], bb[n], bb[n+1], bb[n+2]), Uninitialized Data (BSS).]
Example 2: BasicBlockCounter
• Count execution frequencies for all blocks
[Figure: Application code: Basic Block n, Jump to Instrumentation, Basic Block n (cont'd). Added code: Relocated Instruction, Save $ra, Load ctr[n] to $ra, $ra = $ra + 1, Store $ra to ctr[n], Restore $ra, Jump to Original Code. DATA section: Data, Table of Contents, Register Storage, User-Specified Data (ctr[0], ctr[1], ctr[2], ctr[3], ..., ctr[n-1], ctr[n], ctr[n+1], ctr[n+2], ...), Uninitialized Data (BSS).]
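In C++ terms, the inserted assembly sequence for block n behaves like a single increment of ctr[n]. The sketch below models that behavior at source level only; `BlockCounters` and `onBlockEntry` are hypothetical names, and the real tool emits inline assembly rather than a function call.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of the injected counter array: one slot per basic
// block, all starting at zero.
struct BlockCounters {
    std::vector<std::uint64_t> ctr;

    explicit BlockCounters(std::size_t numBlocks) : ctr(numBlocks, 0) {}

    // Equivalent of the added code for block n:
    // load ctr[n], add 1, store ctr[n].
    void onBlockEntry(std::size_t n) { ++ctr[n]; }
};
```

Because the update is a load, an add and a store, it needs only one scratch register and no call into a shared library, which is why the figure saves and restores a single register.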
Example 3: CacheSimulator
• Use addresses from program in multiple cache simulations.
[Figure: Application code: Basic Block n, Jump to Instrumentation, Basic Block n (cont'd). Added code: Relocated Instruction, Save Regs, Store address to addr[buffsize], Increment buffsize, Buffsize > K? (Branch), Restore Regs, Jump to Original Code; on the full-buffer path: Save Regs, Put &addr into arg reg, Put K into arg reg, Jump to Function, Buffsize = 0, Restore Regs, Jump to Block n. Shared library: Process K addresses found in addr[], Jump to Post-Call Stub. DATA section: Data, Table of Contents, Register Storage, User-Specified Data (addr[0], addr[1], ..., addr[K-3], addr[K-2], addr[K-1], addr[K], buffsize), Uninitialized Data (BSS). The buffsize field is shown cycling through 0, K-1, K, K+1 and back to 0.]
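The buffering scheme in this example can be modeled as a fixed-size array that is handed to the simulator only when it fills up. The sketch below is an illustration under assumptions, not the tool's code: `AddressBuffer`, `record`, `flush` and `pending` are hypothetical names, the consumer callback stands in for the shared library function, and the flush is triggered when exactly `capacity` addresses are stored (the figure's exact threshold around K is not recoverable from the transcript).

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of the addr[] buffer and its buffsize counter.
class AddressBuffer {
public:
    // Stand-in for the shared library entry point: receives the buffer
    // and the number of valid addresses in it.
    using Consumer = std::function<void(const std::uint64_t*, std::size_t)>;

    AddressBuffer(std::size_t capacity, Consumer consumer)
        : addr_(capacity == 0 ? 1 : capacity),  // keep at least one slot
          buffsize_(0),
          consumer_(std::move(consumer)) {}

    // Inline path: store the address, bump buffsize, and call out only
    // when the buffer is full.
    void record(std::uint64_t address) {
        addr_[buffsize_++] = address;
        if (buffsize_ == addr_.size()) {
            flush();
        }
    }

    // Slow path: hand the buffered addresses to the consumer and reset.
    void flush() {
        if (buffsize_ > 0) {
            consumer_(addr_.data(), buffsize_);
            buffsize_ = 0;
        }
    }

    std::size_t pending() const { return buffsize_; }

private:
    std::vector<std::uint64_t> addr_;
    std::size_t buffsize_;
    Consumer consumer_;
};
```

The point of the design is that the expensive register save, call and restore happen once per full buffer instead of once per memory access; a final `flush()` at program exit would deliver any addresses still pending.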
Results
• Time spent instrumenting is proportional to the number of basic blocks and the complexity of the CFGs
Table 1. Time spent doing instrumentation for BasicBlockCounter
Results (continued) Table 2. Slowdown ratios due to instrumentation for BasicBlockCounter and CacheSimulator
Future Work
• Higher-level API
  • Automatically generated assembly code
  • Automatically generated call stubs: register saving and argument passing
• Tool-maintained data buffers
  • Greatly reduce function call overhead
  • Only works for asynchronous purposes
• Data flow analysis framework
  • Live Register Analysis
• Multithreading support