Profiling, Instrumentation, and Profile Based Optimization Robert Cohn Robert.Cohn@compaq.com Mark T. Vandevoorde
Introduction
Understanding the dynamic interaction between programs and processors
• What do programs do?
• How do processors perform?
• How can we make it faster?
Profiling Tutorial
What to do?
Build tools!
• Profiling
• Instrumentation
• Profile based optimization
The Big Picture
[Diagram labels: Sampling, Instrumentation, Profiling, Profile Based Optimization, Analysis, Modeling]
Instrumentation
• User level view
• Executable editing
Code Instrumentation
[Figure labels: TOOL, Trojan Horse]
• Application appears unchanged
• Data collected as a side effect of execution
Instrumentation Example
Original code:
    if (b > c)
        t = 1;
    else
        b = 3;
• Add extra code
Instrumented code:
    if (b > c) {
        bb[0]++;
        t = 1;
    } else {
        bb[1]++;
        b = 3;
    }
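A minimal sketch of the example above, wrapped in a function so the counters can be observed. The names fragment and print_counts are hypothetical and exist only for illustration; bb is the counter array from the slide.

```c
#include <stdio.h>

/* One counter per basic block, as in the slide. */
long bb[2];

/* Hypothetical wrapper around the instrumented fragment.
 * Returns t so the original behavior stays observable. */
int fragment(int b, int c)
{
    int t = 0;
    if (b > c) {
        bb[0]++;        /* instrumentation: "then" block executed */
        t = 1;
    } else {
        bb[1]++;        /* instrumentation: "else" block executed */
        b = 3;
    }
    (void)b;            /* b is not used further in this sketch */
    return t;
}

/* Report the profile that was collected as a side effect. */
void print_counts(void)
{
    printf("then: %ld  else: %ld\n", bb[0], bb[1]);
}
```

The application result (t) is the same as without the counters; only the bb array records which path ran.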
Instrumentation Uses
• Profiles
• Model new hardware
  • What will this new branch predictor do?
  • What is the miss rate of this new cache?
• Optimization opportunities
  • find unnecessary loads and stores
  • find divides by 1
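One way to approach "what will this new branch predictor do?" is a routine that is called at every conditional branch and models the predictor in software. The sketch below assumes a table of 2-bit saturating counters indexed by the branch PC. The routine name Branch, the table size, and the argument list are assumptions made for illustration and do not come from the tutorial.

```c
#include <stdint.h>

#define TABLE_BITS 10
#define TABLE_SIZE (1u << TABLE_BITS)

/* 2-bit saturating counters: 0,1 predict not taken; 2,3 predict taken.
 * All counters start at 0 (strongly not taken). */
static uint8_t counters[TABLE_SIZE];

long branches, mispredicts;

/* Hypothetical analysis routine: call once per executed conditional
 * branch with its PC and whether it was taken. */
void Branch(uint64_t pc, int taken)
{
    /* Alpha instructions are 4 bytes, so the low two PC bits carry no information. */
    uint32_t index = (uint32_t)(pc >> 2) & (TABLE_SIZE - 1);
    int predicted_taken = counters[index] >= 2;

    if (predicted_taken != (taken != 0))
        mispredicts++;

    /* Train the counter toward the actual outcome, saturating at 0 and 3. */
    if (taken) {
        if (counters[index] < 3) counters[index]++;
    } else {
        if (counters[index] > 0) counters[index]--;
    }
    branches++;
}
```

The misprediction rate is then mispredicts / branches, in the same way the cache sample later in the tutorial reports misses / refs.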
What Tool Does Instrumentation?
• Compiler
  • Compiler inserts extra operations
  • Requires recompile, access to source code
• Executable editor
  • Post-link tool inserts instrumentation code
  • No rebuild, source code not required
  • More difficult to relate back to source
Instrumentation Tools for Alpha
• All executable based
• General instrumentation:
  • Atom on Digital Unix
    • Distributed with Digital Unix
  • Ntatom on Windows NT
    • New! Download from web
• Specialized tools based on above
  • hiprof, pixie, 3rd, ...
ATOM
• Tool for customized instrumentation
• User writes program that describes how to instrument application
• Instrumentation program applied to application, generates instrumented application
• Instrumented application is run
• Data is collected
User Supplies
• Instrumentation routines: user written program that inserts instrumentation
  • calls to analysis routines
• Analysis routines: do the instrumentation work at runtime (e.g. count a basic block)
Atom Programming Model
[Figure: "Iterate" arrows step through each level of the application]
Objects: spice, libc.so, libm.so
Procedures: main(), Compute(), _exit()
Basic blocks: block1, block2, block3, block4, block5
Instructions:
    ldq r1, 8(sp)
    addq r1, 0x1, r2
    stq r2, 8(sp)
    bne r1, 0x1ffc40
ATOM Instrumentation API: Navigation
• Objects (binary, shared library)
  • GetFirstObj, GetNextObj
• Procedures
  • GetFirstProc, GetNextProc
• Basic blocks
  • GetFirstBlock, GetNextBlock
• Instructions
  • GetFirstInst, GetNextInst
ATOM Instrumentation API: Interrogation
• GetObjInfo, GetProcInfo, GetBlockInfo, GetInstInfo
• IsBranchTarget
• GetInstRegUsage
• InstPC
• InstLineNo
• ...
ATOM Instrumentation API: Definition
• AddCallProto
  • tells atom the types of the arguments for calls to analysis routines
ATOM Instrumentation API: Instrumentation
• AddCallProgram, AddCallObj, AddCallProc, AddCallBlock, AddCallInst, ReplaceProcedure
• Insert before or after
Arguments to analysis routines
• Constants
  • variables in instrumentation program, but constant at instrumentation point
  • e.g. uninstrumented PC, function name
• VALUE computed at runtime
  • effective address, branch taken predicate
• Register
  • r3, arguments, return value
Sample #1: Cache Simulator
Write a tool that computes the miss rate of the application running in a 64KB, direct-mapped data cache with 32-byte lines.
    > atom spice cache.inst.o cache.anal.o -o spice.cache
    > spice.cache < ref.in > ref.out
    > more cache.out
    5,387,822,402 620,855,884 11.523%
Cache Tool Implementation
Application                        Instrumentation
    main:  clr  t0
    loop:  ldl  t2,0(a0)           <- Reference(0(a0));
           addl t0,4,t0
           addl t2,0x10,t2
           stl  t2,0(a0)           <- Reference(0(a0));
           bne  t3,loop
           ret                     <- PrintResults();
The effective address 0(a0) is passed as a VALUE argument.
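Cache Analysis File
    #include <stdio.h>

    #define CACHE_SIZE  65536    /* 64KB cache */
    #define BLOCK_SHIFT 5        /* 32-byte lines */

    long cache[CACHE_SIZE >> BLOCK_SHIFT], refs, misses;

    /* Called before every load and store with its effective address. */
    void Reference(long address)
    {
        /* The mask must be parenthesized: >> binds more tightly than & in C. */
        int index = (address & (CACHE_SIZE-1)) >> BLOCK_SHIFT;
        long tag = address >> BLOCK_SHIFT;
        if (cache[index] != tag) {
            misses++;
            cache[index] = tag;
        }
        refs++;
    }

    /* Called at program exit: write references, misses, and miss rate (%). */
    void Print()
    {
        FILE *file = fopen("cache.out","w");
        fprintf(file,"%ld %ld %.2f\n",refs, misses, 100.0 * misses / refs);
        fclose(file);
    }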
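The index arithmetic can be checked without ATOM. A 64KB cache with 32-byte lines has 65536 / 32 = 2048 lines, so the index is address bits 5 through 15. The sketch below is a standalone model of the same direct-mapped cache. The function names are hypothetical, and index_without_parens spells out what the index expression computes when the parentheses around the mask are left off, since >> binds more tightly than & in C.

```c
#define CACHE_SIZE  65536                        /* 64KB */
#define BLOCK_SHIFT 5                            /* 32-byte lines */
#define NUM_LINES   (CACHE_SIZE >> BLOCK_SHIFT)  /* 2048 lines */

long lines[NUM_LINES];

/* Intended index: mask to the cache size first, then drop the line offset. */
long cache_index(long address)
{
    return (address & (CACHE_SIZE - 1)) >> BLOCK_SHIFT;
}

/* What "address & (CACHE_SIZE-1) >> BLOCK_SHIFT" actually parses as:
 * the mask is shifted first, so this keeps the low 11 address bits. */
long index_without_parens(long address)
{
    return address & ((CACHE_SIZE - 1) >> BLOCK_SHIFT);
}

/* Same tag as the analysis file: the address with the line offset removed. */
long cache_tag(long address)
{
    return address >> BLOCK_SHIFT;
}

/* Returns 1 on a miss (and fills the line), 0 on a hit. */
int reference(long address)
{
    long index = cache_index(address);
    long tag = cache_tag(address);
    if (lines[index] != tag) {
        lines[index] = tag;
        return 1;
    }
    return 0;
}
```

Two addresses that differ by a multiple of 64KB map to the same line and evict each other, which is the conflict behavior expected of a direct-mapped cache.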
Cache Instrumentation File
    #include <stdio.h>
    #include <cmplrs/atom.inst.h>

    unsigned Instrument(int argc, char **argv, Obj *o)
    {
        Inst *i; Block *b; Proc *p;

        /* Declare the analysis routines and their argument types. */
        AddCallProto("Reference(VALUE)");
        AddCallProto("Print()");

        /* Write the results when the program exits. */
        AddCallProgram(ProgramAfter,"Print");

        /* Call Reference with the effective address before every load and store. */
        for (p = GetFirstProc(); p != NULL; p = GetNextProc(p))
            for (b = GetFirstBlock(p); b != NULL; b = GetNextBlock(b))
                for (i = GetFirstInst(b); i != NULL; i = GetNextInst(i))
                    if (IsInstType(i, InstTypeLoad) || IsInstType(i,InstTypeStore))
                        AddCallInst(i, InstBefore, "Reference", EffAddrValue);

        return 0;   /* the function is declared unsigned, so return a value */
    }
Sample #2: Profiler
Write a tool that outputs the address of each basic block and the number of times it is executed.
    vssad-27> atom a.out prof.inst.c prof.anal.c
    vssad-28> a.out.atom
    Hello world
    vssad-29> head prof.out
    120001030 1
    120001038 1
    12000103c 1
    120001058 33
    120001064 1
Profiler Tool Implementation
Application                        Instrumentation
    main:  clr  t0                 <- Init(3); Count(0);
    loop:  ldl  t2,0(a0)           <- Count(1);
           addl t0,4,t0
           addl t2,0x10,t2
           stl  t2,0(a0)
           bne  t3,loop
           ret                     <- Count(2);
                                   <- PrintResults(addresses,3);
The arguments here are Constant arguments.
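Profiler: prof.anal.c
    #include <stdio.h>
    #include <stdlib.h>   /* malloc */
    #include <string.h>   /* memset */

    long *counts;

    /* Called once at program start: allocate one zeroed counter per block. */
    void Init(int nblocks)
    {
        counts = (long *)malloc(nblocks * sizeof(long));
        memset(counts, 0, nblocks * sizeof(long));
    }

    /* Called at the start of each basic block. */
    void Count(int index)
    {
        counts[index]++;
    }

    /* Called at program exit: write "address count" for every block. */
    void Print(long *blocks, int nblocks)
    {
        int i;
        FILE *file = fopen("prof.out","w");
        for (i = 0; i < nblocks; i++)
            fprintf(file,"%lx %ld\n",blocks[i],counts[i]);
        fclose(file);
    }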
Profiler: prof.inst.c
    #include <stdio.h>
    #include <stdlib.h>   /* malloc */
    #include <cmplrs/atom.inst.h>

    void CallInitPrint();

    void Instrument(int argc, char **argv, Obj *o)
    {
        Block *b; Proc *p; int index = 0;
        int nblocks = GetObjInfo(o,ObjNumberBlocks);
        long *addresses = (long *)malloc(nblocks * sizeof(long));

        CallInitPrint(addresses,nblocks);

        /* Record each block's address and count it with a constant index. */
        for (p = GetFirstProc(); p != NULL; p = GetNextProc(p))
            for (b = GetFirstBlock(p); b != NULL; b = GetNextBlock(b)) {
                addresses[index] = InstPC(GetFirstInst(b));
                AddCallInst(GetFirstInst(b), InstBefore, "Count", index++);
            }
    }
Profiler: prof.inst.c (continued)
    void CallInitPrint(long *addresses, int nblocks)
    {
        char buffer[100];

        AddCallProto("Count(int)");
        AddCallProto("Init(int)");
        AddCallProgram(ProgramBefore,"Init",nblocks);

        /* The array length in the prototype comes from nblocks;
           the format string needs it as an argument. */
        sprintf(buffer,"Print(const stable int[%d],int)",nblocks);
        AddCallProto(buffer);
        AddCallProgram(ProgramAfter,"Print",addresses,nblocks);
    }
Executable editors
• Input: executable, output: executable
• Instrument, optimize, translate
• Executable = image = binary = shared library = shared object = dynamically linked library (DLL)
• Executable editor, executable optimizer, binary rewriter, binary translator, post link optimizer
Executable Editing
• Insert/delete/reorder instructions and data
• Obstacle to modification
  • Addresses are bound
  • Registers are bound
Obstacles
Source:
    if (a) a = b;
Code, with the instrumentation to be inserted before the load:
    beq r1,+2
        lda a0,0x1000      (inserted)
        bsr Reference      (inserted)
    ldl r1,0x1000
• Is a0 free?
• Adjust branch offsets
• Adjust literal addresses
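The "adjust branch offsets" step can be sketched as follows. This is an illustration under stated assumptions, not the tool's actual algorithm: instructions are numbered by position, a branch displacement counts instructions relative to the instruction after the branch (as Alpha branch displacements do), and a branch to an instrumented instruction should land on the inserted code so the instrumentation runs. The names layout and adjust_branch are hypothetical.

```c
#include <stddef.h>

/* inserted_before[i] = number of instructions inserted before original
 * instruction i. On return:
 *   new_start[i] = position of the first inserted instruction for i
 *                  (or of i itself if nothing was inserted),
 *   new_pos[i]   = position of original instruction i. */
void layout(const int *inserted_before, size_t n,
            long *new_start, long *new_pos)
{
    long next = 0;
    for (size_t i = 0; i < n; i++) {
        new_start[i] = next;
        next += inserted_before[i];
        new_pos[i] = next;
        next += 1;
    }
}

/* Recompute the displacement of the branch at original index branch_index.
 * Assumes the old target lies inside the same instruction list. */
long adjust_branch(long branch_index, long old_disp,
                   const long *new_start, const long *new_pos)
{
    long old_target = branch_index + 1 + old_disp;
    return new_start[old_target] - (new_pos[branch_index] + 1);
}
```

For a four-instruction sequence with two instructions inserted before the second one, a forward branch of +2 at position 0 becomes +4, and a backward branch of -3 at position 3 becomes -5.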
Phases
1. Decompose
2. Build IR
3. Insert instrumentation
4. Convert IR to executable
1. Decompose Executable
[Figure: layout of the executable]
Executable:
    Header
    Program code & data: Text (code), Data, Rdata
    Meta data: Exception Info, Relocations, Debug
Decompose
• Break executable into units
• unit: minimum data that must be kept together
  • code: unit is instruction
  • data: unit is data section
    • alternative: unit is data item
2. Build Internal Representation
[Figure: internal representation]
Instruction list: add, load, beq
Data sections: Data, Sdata
MetaData: Exception Info, Relocations
Intermediate Representation
• Similar to compiler
  • except unstructured, untyped data
• 1 to 1 mapping for IR and machine instructions
• Base representation should be compact
  • fit in physical memory
  • initial/final phases do multiple passes
• Representations built/thrown away for procedures
Bound addresses
Data:
    1
    2
    0x12345678
    3
Code:
    br +4
    ldah r0,0x1234
    lda r0,0x5678(r0)
Metadata:
    Begin: 0x12345678
    End: 0x12345680
Adjusting addresses
• No translation
• Dynamic translation
• Static translation
No translation
• Leave code and data at same address
Original:
    beq r1,L2
    ldl r1,0x1234
L2:
Instrumented:
    beq r1,L2
    br  L1
L2: ...
    ...
L1: lda a0,0x1234
    bsr Reference
    ldl r1,0x1234
    br  L2
Dynamic translation
• Address computation is unchanged
• Image has map of old->new address
• Code inserted to map old->new address at runtime for load/store/branch
• Better:
  • Do PC relative branches statically
  • Keep data section at original address
• Still: indirect calls and jumps (not returns)
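The old->new map consulted at runtime could look like the sketch below. The table layout (an array of address pairs sorted by old address) and the names AddrMap and translate are assumptions for illustration; the tutorial does not say how the map is stored.

```c
#include <stddef.h>
#include <stdint.h>

/* One entry per relocated address; the table is sorted by old_addr. */
typedef struct {
    uint64_t old_addr;
    uint64_t new_addr;
} AddrMap;

/* Binary search for old_addr. Returns the new address, or 0 if the
 * address has no entry (0 is used as "not found" in this sketch). */
uint64_t translate(const AddrMap *map, size_t n, uint64_t old_addr)
{
    size_t lo = 0, hi = n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (map[mid].old_addr == old_addr)
            return map[mid].new_addr;
        if (map[mid].old_addr < old_addr)
            lo = mid + 1;
        else
            hi = mid;
    }
    return 0;
}
```

Code inserted before an indirect call or jump would pass the computed target through such a lookup and transfer control to the result. Every lookup costs time at runtime, which is why the slide prefers to resolve PC relative branches statically.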
Static translation
• Address computation is altered for new layout
• Find addresses
• Determine what they point to:
  • unit, offset
• Insert instrumentation
• Adjust literals or offsets to compute new address of unit
Other tools that change addresses
• Linker
  • combine separately compiled objects
  • adjust addresses based on assigned load address
  • unit is section of object (data, text)
• Loader
  • Load address != link address for DLL
  • unit is entire image
• Use relocations
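The loader case is the simplest use of relocations: the whole image moves as one unit, so every recorded absolute address shifts by the same amount. The sketch below assumes a hypothetical relocation format (a list of byte offsets at which 64-bit absolute addresses are stored, in host byte order). Real object formats record more detail per relocation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Adjust every relocated 64-bit address in the image by the difference
 * between the load address and the link address. */
void rebase(unsigned char *image,
            const size_t *reloc_offsets, size_t nrelocs,
            uint64_t link_base, uint64_t load_base)
{
    uint64_t delta = load_base - link_base;   /* wraps correctly if negative */
    for (size_t i = 0; i < nrelocs; i++) {
        uint64_t value;
        /* memcpy avoids alignment assumptions about the image buffer. */
        memcpy(&value, image + reloc_offsets[i], sizeof value);
        value += delta;
        memcpy(image + reloc_offsets[i], &value, sizeof value);
    }
}
```

An executable editor has a harder problem than this because its units are individual instructions and data sections, so different addresses move by different amounts.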
Relocations
Data:
    1
    2
    0x12345678
    3
Code:
    br +4
    ldl r1,10(gp)
    ldah r0,0x1234
    lda r0,0x5678(r0)
[Figure annotations on the items above: No relocation required; May require relocation; Requires relocation]
Relocation example:
    address: 0x200
    type: ldah literal
    object: 0x12345670
    external:
How to recognize addresses?
• Metadata
  • example: procedure begin, procedure end
  • implicit in structure of data
• Absolute addresses
  • example: literal address in data section
  • use relocations
• Relative addresses: address offset
  • example: pc relative branch, offset for base pointer
  • may not need adjustment, usually no relocation
Relative Addresses
• Address computed as offset of another address
• Address and Address + Offset point to same unit: ok, unit moved as a unit
• Example:
    a->field1    ldl r0,field1(a)
    ar[4]        ldl r0,16(ar)
Relative Addresses
• Offset spans multiple units
• example:
  Jump table:
          ad = base + i
          jmp ad
    base: br l1
          br l2
          br l3
          br l4
  PC relative branch:
          br +4
Must be 1 unit
Map address to unit and offset
Reference -> address
• in code: interpret instructions
    br +4
    ldah r0,0x1234
    lda r0,0x5678(r0)
• in data: data is address
    .data 0x12345678
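"Interpret instructions" for the ldah/lda pair means reproducing the arithmetic the processor performs: ldah adds its sign-extended 16-bit displacement times 65536, and lda adds its sign-extended 16-bit displacement. The sketch below does this and also shows the reverse split needed when a literal address is rewritten. The function names are hypothetical, and the sketch assumes the pair starts from a base of zero, as in the slide's example.

```c
#include <stdint.h>

/* Sign-extend a 16-bit displacement field. */
static int64_t sext16(uint16_t v)
{
    return (v & 0x8000) ? (int64_t)v - 0x10000 : (int64_t)v;
}

/* Address formed by "ldah r0,hi ; lda r0,lo(r0)" from a zero base. */
int64_t literal_address(uint16_t ldah_disp, uint16_t lda_disp)
{
    return sext16(ldah_disp) * 65536 + sext16(lda_disp);
}

/* Split an address back into the two displacement fields. Because lda
 * sign-extends, the high part is bumped when the low half is >= 0x8000. */
void split_address(int64_t address, uint16_t *ldah_disp, uint16_t *lda_disp)
{
    uint16_t low_bits = (uint16_t)(address & 0xFFFF);
    int64_t low = sext16(low_bits);
    *lda_disp = low_bits;
    *ldah_disp = (uint16_t)((address - low) / 65536);
}
```

With the slide's values, displacements 0x1234 and 0x5678 give the address 0x12345678. An address such as 0x12348000 splits into 0x1235 and 0x8000, because the low half is treated as negative.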
Map address to unit and offset
(relocation,address) -> (unit,offset)
• to code: pointer to instruction
• to data: data section and offset
  • alternative: data item and offset
• offset = address - unit address
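The mapping above, followed by the recomputation of the address once the unit has moved, can be sketched as below. The Unit structure, the linear search, and the function names are assumptions for illustration only.

```c
#include <stddef.h>
#include <stdint.h>

/* A unit (an instruction or a data section) before and after editing. */
typedef struct {
    uint64_t old_addr;   /* address in the original executable */
    uint64_t size;       /* size in bytes */
    uint64_t new_addr;   /* address in the edited executable */
} Unit;

/* Index of the unit that contains address, or -1 if none does. */
long find_unit(const Unit *units, size_t n, uint64_t address)
{
    for (size_t i = 0; i < n; i++)
        if (address >= units[i].old_addr &&
            address - units[i].old_addr < units[i].size)
            return (long)i;
    return -1;
}

/* Translate an old address: offset = address - unit address, then
 * new address = new unit address + offset. Returns 1 on success,
 * 0 if the address is not inside any known unit. */
int relocate(const Unit *units, size_t n, uint64_t address, uint64_t *out)
{
    long u = find_unit(units, n, address);
    if (u < 0)
        return 0;
    *out = units[u].new_addr + (address - units[u].old_addr);
    return 1;
}
```

Keeping the offset unchanged is only valid because a unit is, by definition, data that is moved together.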
3. Insert Instrumentation
[Figure: internal representation after instrumentation]
Instruction list: add, load, load, beq
Data sections: Data, Sdata, Ndata
MetaData: Exception Info, Relocations
Adding instrumentation code
• Instrumentation requires free registers
• wrapper routine saves and restores registers
Instrumented code:
    beq r1,+2
    save registers
    lda a0,0x1000
    bsr ra,wrapper
    restore registers
    ldl r1,0(r2)
wrapper:
    Save registers on stack
    bsr ra,Reference
    Restore registers
    return
Reference: the analysis routine
• Local/global/interprocedural analysis finds free registers
4. Convert IR to Executable
[Figure: layout of the new executable]
Executable:
    Header
    Program code & data: Text, Data, Rdata, Ndata
    Meta data: Exception Info, Relocations, Debug