Pin: Building Customized Program Analysis Tools with Dynamic Instrumentation
C.K. Luk, R. Cohn, R. Muth, H. Patil, A. Klauser, G. Lowney, S. Wallace, V.J. Reddi, K. Hazelwood
Presented by: Michael Laurenzano

What is Program Instrumentation?
• Inserting extra code into an application to observe its behavior
• Example: cache simulation

    for (int i = 0; i < LENGTH; i++) {
        CacheSim(&A[i]); A[i] = (double)i;
        CacheSim(&B[i]); B[i] = (double)i;
        CacheSim(&C[i]); C[i] = (double)i;
    }

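The slide does not show what CacheSim does. As a minimal sketch, assuming a direct-mapped cache with 64-byte lines and 256 lines (both sizes, and the names DirectMappedCache and instrumented_fill, are illustrative and not from the slides), the inserted call could simply classify each address as a hit or a miss:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical direct-mapped cache model; the sizes are assumptions.
struct DirectMappedCache {
    static constexpr std::size_t kLineBytes = 64;
    static constexpr std::size_t kNumLines = 256;

    std::vector<std::uintptr_t> tags = std::vector<std::uintptr_t>(kNumLines, 0);
    std::vector<bool> valid = std::vector<bool>(kNumLines, false);
    std::size_t hits = 0;
    std::size_t misses = 0;

    // Analysis routine: record whether the access to addr hits or misses.
    void access(const void* addr) {
        assert(addr != nullptr);
        const std::uintptr_t line =
            reinterpret_cast<std::uintptr_t>(addr) / kLineBytes;
        const std::size_t index = line % kNumLines;
        if (valid[index] && tags[index] == line) {
            ++hits;
        } else {
            ++misses;
            valid[index] = true;
            tags[index] = line;
        }
    }
};

// The slide's loop for a single array: each store is preceded by a call
// into the simulator. Returns the number of simulated accesses.
std::size_t instrumented_fill(DirectMappedCache& sim, std::vector<double>& a) {
    for (std::size_t i = 0; i < a.size(); ++i) {
        sim.access(&a[i]);              // inserted instrumentation
        a[i] = static_cast<double>(i);  // original application code
    }
    return sim.hits + sim.misses;
}
```

With a sequential fill, eight consecutive doubles share one 64-byte line, so most accesses are hits and roughly one in eight is a miss.
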
Uses of Program Instrumentation
• Code profiles
    • Basic block/instruction count
    • Operation results
• Microarchitectural study
    • Branch outcomes
    • Memory addresses
• Bug checking
    • Memory leaks
    • Uninitialized data

Pin System Layout
[Figure: Pin system diagram, annotated with the following component captions]
• The code being analyzed
• Tells us where and how to perform analysis
• Combines application and pintool code to create instrumented code
• Stores the instrumented code created by the JIT
• Controls execution, maintains data structures, tracks program state

Simplified Instrumentation
• Transfer control to the VM at an application control transfer
• Look for an instrumented version of the branch target in the code cache
• If found: execute the instrumented code
• If not: compile the code, insert it into the code cache, execute the new code
• Repeat

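The loop above can be sketched as a toy dispatcher. This is an illustration under stated assumptions, not Pin's implementation: ToyVm, Trace, kExit, and the compile callback are hypothetical names, and a "trace" is modeled as a callable that returns the next branch target.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <unordered_map>

using Addr = std::uint64_t;
constexpr Addr kExit = 0;  // assumed sentinel meaning "program finished"

// A compiled trace: running it returns the address of the next branch target.
using Trace = std::function<Addr()>;

struct ToyVm {
    std::unordered_map<Addr, Trace> code_cache;
    std::function<Trace(Addr)> compile;  // stands in for the JIT
    int compiles = 0;
    int dispatches = 0;

    void run(Addr entry) {
        assert(compile && "a compile callback must be installed");
        Addr pc = entry;
        while (pc != kExit) {
            ++dispatches;                   // control is back in the VM
            auto it = code_cache.find(pc);  // look up the branch target
            if (it == code_cache.end()) {   // not found: compile and insert
                ++compiles;
                it = code_cache.emplace(pc, compile(pc)).first;
            }
            pc = it->second();              // execute the instrumented code
        }
    }
};
```

In this model every control transfer returns to the VM, so a loop body that runs five times costs five dispatches but only one compile.
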
Trace Linking
• Transfer control directly between traces
• Branch target must be known statically
• Target trace must be present in the code cache
[Figure: three panels. Regular Execution: Sequence 1, Sequence 2. Pin w/o Trace Linking: Trace 1, Virtual Machine, Trace 2. Pin w/ Trace Linking: Trace 1, Trace 2.]

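A rough way to see the benefit of linking is to count VM entries. The sketch below is hypothetical (LinkedTrace and LinkingVm are made-up names, and traces are assumed to be compiled ahead of time): the first time a trace exits to a statically known target, the VM patches a direct link, and later executions skip the VM.

```cpp
#include <cassert>
#include <cstdint>
#include <map>

using Addr = std::uint64_t;

struct LinkedTrace {
    Addr target = 0;              // statically known branch target (0 = exit)
    LinkedTrace* link = nullptr;  // patched to the target trace once linked
    int executions = 0;
};

struct LinkingVm {
    std::map<Addr, LinkedTrace> code_cache;  // std::map keeps addresses stable
    int vm_entries = 0;

    // Runs from entry for at most max_traces trace executions.
    void run(Addr entry, int max_traces) {
        LinkedTrace* current = nullptr;
        Addr pc = entry;
        for (int done = 0; done < max_traces && pc != 0; ++done) {
            if (current != nullptr && current->link != nullptr) {
                current = current->link;  // linked: trace to trace, no VM
            } else {
                ++vm_entries;             // unlinked: go through the VM
                auto it = code_cache.find(pc);
                assert(it != code_cache.end() && "toy model: precompiled traces");
                if (current != nullptr) current->link = &it->second;  // patch exit
                current = &it->second;
            }
            ++current->executions;
            pc = current->target;
        }
    }
};
```

For two traces that branch to each other, ten trace executions need only three VM entries: the initial entry plus one per link that has to be patched.
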
Trace Linking (Indirect)
• "Unknown" targets are usually somewhat predictable
    • A function typically returns to only a few locations (it has few call sites)
    • An indirect jump usually goes to only a few locations
• Try several predicted targets to see whether VM intervention can be avoided
• A short target list is maintained for each indirect branch
• If the list is exhausted without a match, fall back to the VM

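The per-branch target list can be sketched as a short chain of comparisons. The struct name IndirectBranch and the list capacity of 4 are assumptions for illustration; the slides only say the lists are short.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

using Addr = std::uint64_t;

// Hypothetical per-branch prediction list; the capacity is an assumption.
struct IndirectBranch {
    static constexpr std::size_t kMaxPredictions = 4;
    std::array<Addr, kMaxPredictions> predicted{};
    std::size_t count = 0;
    int chain_hits = 0;
    int vm_fallbacks = 0;

    // Returns true if the target was resolved by the list without the VM.
    bool resolve(Addr actual_target) {
        assert(count <= kMaxPredictions);
        for (std::size_t i = 0; i < count; ++i) {
            if (predicted[i] == actual_target) {
                ++chain_hits;
                return true;
            }
        }
        ++vm_fallbacks;  // list exhausted: the VM resolves the target
        if (count < kMaxPredictions) predicted[count++] = actual_target;
        return false;
    }
};
```

Each new target costs one VM fallback; repeated targets are then caught by the list.
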
Function Cloning
• The most common indirect control transfer is a function return
• Create a function instance for each call site
• The return address is then unique and known for each function instance
• This turns the indirect control transfer into a direct control transfer
• Code bloat!
• Implemented by keeping a call stack for each instrumented instruction sequence
    • Keep only the last 4 entries of the call stack
    • The call stack is represented as a 64-bit integer

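One way to fit the last 4 call-stack entries into a 64-bit integer is to give each entry 16 bits. The slides do not say how the bits are divided, so the 16-bit call-site IDs and the function names below are assumptions made only for this sketch.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: each call site is reduced to a 16-bit ID so four fit in 64 bits.
using CallStackKey = std::uint64_t;

// Push a call site; the oldest of the four IDs is shifted out of the top.
inline CallStackKey push_call_site(CallStackKey key, std::uint16_t site_id) {
    return (key << 16) | site_id;
}

// Read the ID at a given depth; depth 0 is the most recent call site.
inline std::uint16_t call_site_at(CallStackKey key, unsigned depth) {
    assert(depth < 4);
    return static_cast<std::uint16_t>(key >> (16 * depth));
}
```

Two invocations of the same function reached through different recent call sites get different keys, so each key can identify its own clone.
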
Register Bindings
• Pin re-allocates registers so that it can use some of them itself
• The register bindings can differ from one trace to the next
• When compiling a trace, keep the register bindings of the previous trace if possible
• When linking traces, adjust the register bindings before entering the next trace
• In practice, usually only a few registers are mismatched

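The adjustment at a link can be pictured as a diff between two bindings. In this sketch a binding maps each virtual register to the physical register holding it; the names Binding and reconcile and the size of 8 registers are hypothetical. The function only lists the mismatches and does not order the moves, which real compensation code would have to do to handle swaps.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

constexpr std::size_t kNumVirtualRegs = 8;  // assumed size for illustration

// binding[v] = physical register currently holding virtual register v
using Binding = std::array<int, kNumVirtualRegs>;

// Returns (virtual register, destination physical register) for each mismatch.
std::vector<std::pair<std::size_t, int>> reconcile(const Binding& from,
                                                   const Binding& to) {
    std::vector<std::pair<std::size_t, int>> fixes;
    for (std::size_t v = 0; v < kNumVirtualRegs; ++v) {
        assert(from[v] >= 0 && to[v] >= 0);
        if (from[v] != to[v]) fixes.emplace_back(v, to[v]);
    }
    return fixes;
}
```

If the target trace was compiled with the source trace's bindings, the list is empty and the link needs no extra code.
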
Optimization: Inlined Analysis Routines
[Figure: two panels. Without Inlining: Application, Bridge Routine, Analysis Routine, Bridge Routine, Application. With Inlining: Application, Bridge Code, Analysis Code, Bridge Code, Application.]
• Inlining gives 2 fewer calls and 2 fewer returns
• Other optimizations: constant folding, code relocation

Optimization: eflags Register Liveness
• The x86 eflags register is treated as a bit vector containing state information
• This register can be modified as a side effect of some instructions
• eflags may not be live at the point where the analysis routine is called
• If it is not live, it does not need to be saved and restored

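The liveness check can be sketched as a forward scan from the insertion point. This is a simplified model, not Pin's analysis: Insn and flags_live_at are hypothetical names, a write is assumed to overwrite every flag, and reaching the end of the trace is treated conservatively as live.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Insn {
    bool reads_flags;
    bool writes_flags;  // simplification: a write overwrites every flag
};

// True if eflags must be saved around an analysis call inserted before
// trace[point].
bool flags_live_at(const std::vector<Insn>& trace, std::size_t point) {
    assert(point <= trace.size());
    for (std::size_t i = point; i < trace.size(); ++i) {
        if (trace[i].reads_flags) return true;    // value is used: live
        if (trace[i].writes_flags) return false;  // overwritten first: dead
    }
    return true;  // end of trace reached: be conservative
}
```

For a mov, cmp, conditional-jump sequence, a call inserted before the mov can skip the save because cmp overwrites the flags before anything reads them; a call inserted right before the jump cannot.
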
Optimization: Call Scheduling
• The user can specify that an analysis routine may be placed anywhere within a particular scope
    • Anywhere in an instruction, basic block, function, program, etc.
• Pin can then schedule the call where it performs best
    • Perhaps at a point where few registers need to be saved
• How well will this actually work?
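As a sketch of the "few registers need to be saved" heuristic, suppose the number of live registers is already known at each candidate point in the scope. The function name cheapest_call_site and the input format are assumptions for illustration; the slides do not describe how Pin actually picks the point.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <vector>

// live_regs[i] = number of registers that would need saving for a call
// placed before instruction i of the scope.
// Returns the first index with the fewest registers to save.
std::size_t cheapest_call_site(const std::vector<int>& live_regs) {
    assert(!live_regs.empty());
    const auto it = std::min_element(live_regs.begin(), live_regs.end());
    return static_cast<std::size_t>(std::distance(live_regs.begin(), it));
}
```

This only makes sense for analysis routines whose result does not depend on where in the scope they run, such as a per-basic-block counter.
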