520 likes | 615 Views
Dynamic Compilation and Modification. CS 671 April 15, 2008. Compiler. High-Level Programming Languages. Machine Code. Error Messages. High-Level Programming Languages. Front End. Back End. Machine Code. So Far… Static Compilation. Digging Deeper…. Compiler. Error Messages.
E N D
Dynamic Compilation and Modification CS 671 April 15, 2008
Compiler High-Level Programming Languages Machine Code Error Messages High-Level Programming Languages Front End Back End Machine Code So Far… Static Compilation Digging Deeper… Compiler Error Messages
Alternatives to the Traditional Model • Static Compilation • All work is done “ahead-of-time” • Just-in-Time Compilation • Postpone some compilation tasks • Multiversioning and Dynamic Feedback • Include multiple options in binary • Dynamic Binary Optimization • Traditional compilation model • Executables can adapt
Move More of Compilation to Run Time • Execution environment may be quite different from the assumptions made at compile time • Dynamically loaded libraries • User inputs • Hardware configurations • Dependence on software vendors • Apps on tap • Incorporate profiling
Just-in-Time Compilation • Ship bytecodes (think IR) rather than binaries • Binaries execute on machines • Bytecodes execute on virtual machines Compiler High-Level Programming Languages Machine Code Front End Back End Error Messages
javac source bytecode java bytecode execute Just-in-Time Compilation • javacthe Java bytecode compiler • javathe Java virtual machine • Bytecode: machine independent, portable • Step One: “Compile” Circle.java • % javac Circle.java -> Circle.class • Step Two: “Execute” • % java Circle.class
Bytecodes • Each frame contains local variables and an operand stack • Instruction set • Load/store between locals and operand stack • Arithmetic on operand stack • Object creation and method invocation • Array/field accesses • Control transfers and exceptions • The type of the operand stack at each program point is known at compile time
Bytecodes (cont.) • Example: • iconst 2 • iload a • iload b • iadd • imul • istore c • Computes: c := 2 * (a + b)
Bytecodes (cont.) • Example: • iconst 2 • iload a • iload b • iadd • imul • istore c • Computes: c := 2 * (a + b) 42 a 7 b c 0
Bytecodes (cont.) • Example: • iconst 2 • iload a • iload b • iadd • imul • istore c • Computes: c := 2 * (a + b) 42 a 7 b c 0 2
Bytecodes (cont.) • Example: • iconst 2 • iload a • iload b • iadd • imul • istore c • Computes: c := 2 * (a + b) 42 a 7 b c 0 42 2
Bytecodes (cont.) • Example: • iconst 2 • iload a • iload b • iadd • imul • istore c • Computes: c := 2 * (a + b) 42 a 7 b 7 c 0 42 2
Bytecodes (cont.) • Example: • iconst 2 • iload a • iload b • iadd • imul • istore c • Computes: c := 2 * (a + b) 42 a 7 b c 0 49 2
Bytecodes (cont.) • Example: • iconst 2 • iload a • iload b • iadd • imul • istore c • Computes: c := 2 * (a + b) 42 a 7 b c 0 98
Bytecodes (cont.) • Example: • iconst 2 • iload a • iload b • iadd • imul • istore c • Computes: c := 2 * (a + b) 42 a 7 b c 98
Executing Bytecode • java Circle.class - What happens? • Interpreting • map each bytecode to a machine code sequence, • for each bytecode, execute the sequence • Translation to machine code • map all the bytecodes to machine code (or a higher level intermediate representation) • massage them (e.g., remove redundancies) • execute the machine code
Hotspot Compilation • A hybrid approach • Initially interpret • Find the “hot” (frequently executed) methods • Translate only hot methods to machine code
MyApp JVM VM VM VM VM VM VM P III P IV 21164 21264 PA-8000 PA-7000 The Virtual Machine • An extreme version of an old idea • Previously: • Now: MyApp MyApp MyApp x86 alpha pa-risc P III P IV 21164 21264 PA-8000 PA-7000
Compile-Time Multiversioning • Multiple versions of code sections are generated at compile-time • Most appropriate variant is selected at runtime based upon characteristics of the input data and/or machine environment • Multiple variants can cause code explosion • Thus typically only a few versions are created
modified binary ???? binary Another Alternative • Modify a traditional application as it executes • Why? • Don’t have source code!
A Dynamic Optimization System? • Transforms* an application at run time • * {translate, optimize, extend} Application Transform Profile Code Cache Execute
Classification • Dynamic binary optimizers(x86 x86opt) • Complement the static compiler • User inputs, phases, DLLs, hardware features • Examples: DynamoRIO, Mojo, Strata • Dynamic translators(x86 PPC) • Convert applications to run on a new architecture • Examples: Rosetta, Transmeta CMS, DAISY • Binary instrumentation(x86 x86instr) • Inspect and/or add features to existing applications • Examples: Pin, Valgrind • JITs + adaptive systems(Java bytecode x86)
Dynamic Instrumentation Demo • Pin • Four architectures – IA32, EM64T, IPF, XScale • Four OSes – Linux, FreeBSD, MacOS, Windows
What is Instrumentation? • A technique that inserts extra code into a program to collect runtime information • Instrumentation approaches: • Source instrumentation: • Instrument source programs • Binary instrumentation: • Instrument executables directly
Why use Dynamic Instrumentation? • No need to recompile or relink • Discover code at runtime • Handle dynamically-generated code • Attach to running processes
How is Instrumentation used in PL/Compiler Research? Program analysis • Code coverage • Call-graph generation • Memory-leak detection • Instruction profiling Thread analysis • Thread profiling • Race detection
How is Instrumentation used in Computer Architecture Research? • Trace Generation • Branch Predictor and Cache Modeling • Fault Tolerance Studies • Emulating Speculation • Emulating New Instructions
Pin Features • DynamicInstrumentation: • Do not need source code, recompilation, post-linking • Programmable Instrumentation: • Provides rich APIs to write in C/C++ your own instrumentation tools (called Pintools) • Multiplatform: • Supports x86, x86-64, Itanium, Xscale • Supports Linux, Windows, MacOS • Robust: • Instruments real-life applications: Database, web browsers, … • Instruments multithreaded applications • Supports signals • Efficient: • Applies compiler optimizations on instrumentation code
Using Pin • Launch and instrument an application $ pin –t pintool –- application Instrumentation engine (provided in the kit) Instrumentation tool (write your own, or use one provided in the kit) • Attach to and instrument an application $ pin –t pintool –pid 1234
Pin Instrumentation APIs • Basic APIs are architecture independent: • Provide common functionalities like determining: • Control-flow changes • Memory accesses • Architecture-specific APIs • e.g., Info about segmentation registers on IA32 • Call-based APIs: • Instrumentation routines • Analysis routines
Instrumentation vs. Analysis • Instrumentation routinesdefine where instrumentation is inserted • e.g., before instruction C Occurs first time an instruction is executed • Analysis routinesdefine what to do when instrumentation is activated • e.g., increment counter C Occurs every time an instruction is executed
counter++; counter++; counter++; counter++; counter++; Pintool 1: Instruction Count • sub $0xff, %edx • cmp %esi, %edx • jle <L1> • mov $0x1, %edi • add $0x10, %eax
Pintool 1: Instruction Count Output $ /bin/lsMakefile imageload.out itrace proccount imageload inscount0 atrace itrace.out $ pin -t inscount0.so -- /bin/ls Makefile imageload.out itrace proccount imageload inscount0 atrace itrace.out • Count 422838
ManualExamples/inscount0.cpp #include <iostream> #include "pin.h" UINT64 icount = 0; void docount() { icount++; } void Instruction(INS ins, void *v) { INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)docount, IARG_END); } void Fini(INT32 code, void *v) { std::cerr << "Count " << icount << endl; } int main(int argc, char * argv[]) { PIN_Init(argc, argv); INS_AddInstrumentFunction(Instruction, 0); PIN_AddFiniFunction(Fini, 0); PIN_StartProgram(); return 0; } analysis routine instrumentation routine
Print(ip); Print(ip); Print(ip); Print(ip); Print(ip); Pintool 2: Instruction Trace • sub $0xff, %edx • cmp %esi, %edx • jle <L1> • mov $0x1, %edi • add $0x10, %eax Need to pass ip argument to the analysis routine (printip())
Pintool 2: Instruction Trace Output $ pin -t itrace.so -- /bin/lsMakefile imageload.out itrace proccount imageload inscount0 atrace itrace.out • $ head -4 itrace.out 0x40001e90 0x40001e91 0x40001ee4 0x40001ee5
ManualExamples/itrace.cpp argument to analysis routine • #include <stdio.h> • #include "pin.H" • FILE * trace; • void printip(void *ip) { fprintf(trace, "%p\n", ip); } • void Instruction(INS ins, void *v) { • INS_InsertCall(ins, IPOINT_BEFORE, (AFUNPTR)printip, IARG_INST_PTR, IARG_END); • } • void Fini(INT32 code, void *v) { fclose(trace); } • int main(int argc, char * argv[]) { • trace = fopen("itrace.out", "w"); • PIN_Init(argc, argv); • INS_AddInstrumentFunction(Instruction, 0); • PIN_AddFiniFunction(Fini, 0); • PIN_StartProgram(); • return 0; • } analysis routine instrumentation routine
Examples of Arguments to Analysis Routine • IARG_INST_PTR • Instruction pointer (program counter) value • IARG_UINT32 <value> • An integer value • IARG_REG_VALUE <register name> • Value of the register specified • IARG_BRANCH_TARGET_ADDR • Target address of the branch instrumented • IARG_MEMORY_READ_EA • Effective address of a memory read And many more … (refer to the Pin manual for details)
cmp %esi, %edx jle <L1> mov $0x1, %edi count() count() count() <L1>: mov $0x8,%edi Instrumentation Points • Instrument points relative to an instruction: • Before (IPOINT_BEFORE) • After: • Fall-through edge (IPOINT_AFTER) • Taken edge (IPOINT_TAKEN_BRANCH)
Instrumentation Granularity • Instruction • Basic block • A sequence of instructions terminated at a control-flow changing instruction • Single entry, single exit • Trace • A sequence of basic blocks terminated at an unconditional control-flow changing instruction • Single entry, multiple exits Instrumentation can be done at three different granularities: sub $0xff, %edx cmp %esi, %edx jle <L1> mov $0x1, %edi add $0x10, %eax jmp <L2> 1 Trace, 2 BBs, 6 insts
Pintool 3: Faster Instruction Count counter += 3 sub $0xff, %edx cmp %esi, %edx jle <L1> mov $0x1, %edi add $0x10, %eax basic blocks (bbl) counter += 2
Modifying Program Behavior • Pin allows you not only to observe but also change program behavior • Ways to change program behavior: • Add/delete instructions • Change register values • Change memory values • Change control flow
Application Operating System Hardware Pin’s Software Architecture Address space Pintool Pin Instrumentation APIs Virtual Machine (VM) Code Cache JIT Compiler Emulation Unit
1’ 1 3 2 2’ 4 5 7’ 6 7 Dynamic Instrumentation Original code Code cache Exits point back to Pin Pin Pin fetches trace starting block 1 and start instrumentation
1’ 2’ 7’ Dynamic Instrumentation Original code Code cache 1 3 2 4 5 6 7 Pin Pin transfers control into code cache (block 1)
1’ 3’ 1 3 2 5’ 2’ 4 5 6’ 7’ 6 7 Dynamic Instrumentation Original code Code cache trace linking Pin Pin fetches and instrument a new trace
Implementation Challenges • Linking • Straightforward for direct branches • Tricky for indirects, invalidations • Re-allocating registers • Maintaining transparency • Self-modifying code • Supporting MT applications…
Redirect all other pthreads function calls to application’s libpthread set up signal handlers System’s libpthread signal handler Pintool Pin’s mini-libpthread signal handler Pin’s Multithreading Support • Thread-safe accesses Pin, Pintool, and App • Pin: One thread in the VM at a time • Pintool: Locks, ThreadID, event notification • App: Thread-local spill area • Providing pthreadsfunctions to instrumentation tools Application
Pin Overhead • SPEC Integer 2006