
Profiling, Instrumentation, and Profile Based Optimization



  1. Profiling, Instrumentation, and Profile Based Optimization
     Robert Cohn, Robert.Cohn@compaq.com
     Mark T. Vandevoorde

  2. Introduction
     Understanding the dynamic interaction between programs and processors
     • What do programs do?
     • How do processors perform?
     • How can we make it faster?
     Profiling Tutorial

  3. What to do?
     Build tools!
     • Profiling
     • Instrumentation
     • Profile based optimization

  4. The Big Picture
     [Diagram: Sampling, Instrumentation, Profiling, Profile Based Optimization, Analysis, Modeling]

  5. Instrumentation
     • User level view
     • Executable editing

  6. Code Instrumentation
     [Diagram: TOOL]
     Trojan Horse
     • Application appears unchanged
     • Data collected as a side effect of execution

  7. Instrumentation Example
     Original code:
         if (b > c) t = 1; else b = 3;
     • Add extra code (instrumentation):
         if (b > c) { bb[0]++; t = 1; } else { bb[1]++; b = 3; }
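A minimal C sketch of the same idea, assuming the `bb[]` counter array from the slide and a hypothetical driver `run_and_report` (not part of the original deck) that runs the instrumented fragment over some inputs and prints the per-block counts:

```c
#include <assert.h>
#include <stdio.h>

/* One counter per basic block, as in the slide's bb[] array. */
static long bb[2];

/* Instrumented version of: if (b > c) t = 1; else b = 3; */
static void instrumented(int *b, int c, int *t)
{
    if (*b > c) {
        bb[0]++;        /* count the "then" block */
        *t = 1;
    } else {
        bb[1]++;        /* count the "else" block */
        *b = 3;
    }
}

/* Hypothetical driver: run the fragment over n inputs, then report. */
static void run_and_report(const int *inputs, int n, int c)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++) {
        int b = inputs[i], t = 0;
        instrumented(&b, c, &t);
    }
    printf("then: %ld, else: %ld\n", bb[0], bb[1]);
}
```

The application's results are unchanged; the counts are collected purely as a side effect of execution.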

  8. Instrumentation Uses
     • Profiles
     • Model new hardware
        • What will this new branch predictor do?
        • What is the miss rate of this new cache?
     • Optimization opportunities
        • find unnecessary loads and stores
        • find divides by 1
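As an illustration of modeling new hardware, here is a sketch of an analysis routine that simulates a table of 2-bit saturating counters, one common branch predictor design. The routine name `Branch`, the table size, and the indexing scheme are assumptions made for this example, not something the deck specifies:

```c
#include <assert.h>

#define TABLE_BITS 10                      /* assumed: 1024-entry table */
#define TABLE_SIZE (1 << TABLE_BITS)

/* Counter values 0,1 predict not taken; 2,3 predict taken. */
static unsigned char counters[TABLE_SIZE];
static long branches, mispredicts;

/* Hypothetical analysis routine, called once per executed conditional branch. */
static void Branch(unsigned long pc, int taken)
{
    assert(taken == 0 || taken == 1);
    /* Index by instruction number; Alpha instructions are 4 bytes. */
    unsigned index = (unsigned)(pc >> 2) & (TABLE_SIZE - 1);
    unsigned char *ctr = &counters[index];
    int predicted = (*ctr >= 2);

    branches++;
    if (predicted != taken)
        mispredicts++;

    /* Saturating update toward the actual outcome. */
    if (taken && *ctr < 3)
        (*ctr)++;
    else if (!taken && *ctr > 0)
        (*ctr)--;
}
```

An instrumentation tool would call such a routine before each conditional branch, passing the branch PC and the branch-taken predicate, and report `mispredicts / branches` at program exit.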

  9. What Tool Does Instrumentation?
     • Compiler
        • Compiler inserts extra operations
        • Requires recompile, access to source code
     • Executable editor
        • Post-link tool inserts instrumentation code
        • No rebuild, source code not required
        • More difficult to relate back to source

  10. Instrumentation Tools for Alpha
     • All executable based
     • General instrumentation:
        • Atom on Digital Unix
           • Distributed with Digital Unix
        • Ntatom on Windows NT
           • New! Download from web
     • Specialized tools based on above
        • hiprof, pixie, 3rd, ...

  11. ATOM
     • Tool for customized instrumentation
     • User writes program that describes how to instrument application
     • Instrumentation program applied to application, generates instrumented application
     • Instrumented application is run
     • Data is collected

  12. User Supplies
     • Instrumentation routines: user-written program that inserts instrumentation
        • the instrumentation consists of calls to analysis routines
     • Analysis routines: do the instrumentation work at runtime (e.g. count a basic block)

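  13. Atom Programming Model
     [Diagram: iterate over the objects (spice, libc.so, libm.so), then over the procedures of an object (main(), Compute(), _exit()), then over the basic blocks of a procedure (block1 to block5), then over the instructions of a block:]
         ldq  r1, 8(sp)
         addq r1, 0x1, r2
         stq  r2, 8(sp)
         bne  r1, 0x1ffc40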

  14. ATOM Instrumentation API: Navigation
     • Objects (binary, shared library)
        • GetFirstObj, GetNextObj
     • Procedures
        • GetFirstProc, GetNextProc
     • Basic blocks
        • GetFirstBlock, GetNextBlock
     • Instructions
        • GetFirstInst, GetNextInst

  15. ATOM Instrumentation API: Interrogation
     • GetObjInfo, GetProcInfo, GetBlockInfo, GetInstInfo
     • IsBranchTarget
     • GetInstRegUsage
     • InstPC
     • InstLineNo
     • ...

  16. ATOM Instrumentation API: Definition
     • AddCallProto
        • tells atom the types of the arguments for calls to analysis routines

  17. ATOM Instrumentation API: Instrumentation
     • AddCallProgram, AddCallObj, AddCallProc, AddCallBlock, AddCallInst, ReplaceProcedure
     • Insert before or after

  18. Arguments to analysis routines
     • Constants
        • variables in instrumentation program, but constant at instrumentation point
        • e.g. uninstrumented PC, function name
     • VALUE: computed at runtime
        • effective address, branch taken predicate
     • Register
        • r3, arguments, return value

  19. Sample #1: Cache Simulator
     Write a tool that computes the miss rate of the application running in a 64KB, direct mapped data cache with 32 byte lines.
         > atom spice cache.inst.o cache.anal.o -o spice.cache
         > spice.cache < ref.in > ref.out
         > more cache.out
         5,387,822,402 620,855,884 11.523%

  20. Cache Tool Implementation
     [Diagram: application code on the left, inserted instrumentation calls on the right; the effective address is passed as a VALUE argument]
         main:  clr  t0
         loop:  ldl  t2,0(a0)        <- Reference(0(a0))
                addl t0,4,t0
                addl t2,0x10,t2
                stl  t2,0(a0)        <- Reference(0(a0))
                bne  t3,loop
                ret
         at program exit             <- PrintResults()

  21. Cache Analysis File
         #include <stdio.h>
         #define CACHE_SIZE  65536
         #define BLOCK_SHIFT 5
         long cache[CACHE_SIZE >> BLOCK_SHIFT], refs, misses;
         /* Called before every load and store with its effective address. */
         void Reference(long address)
         {
             /* The mask must be parenthesized: >> binds tighter than & in C. */
             int index = (address & (CACHE_SIZE - 1)) >> BLOCK_SHIFT;
             long tag = address >> BLOCK_SHIFT;
             if (cache[index] != tag) {
                 misses++;
                 cache[index] = tag;
             }
             refs++;
         }
         /* Called once when the program exits. */
         void Print()
         {
             FILE *file = fopen("cache.out", "w");
             fprintf(file, "%ld %ld %.2f\n", refs, misses, 100.0 * misses / refs);
             fclose(file);
         }
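The index and tag arithmetic can be tried outside ATOM. The sketch below is a standalone variant of the analysis routine; the helper `line_index`, the `tags` array name, and the unsigned address type are choices made for this example. Note that in C `>>` binds tighter than `&`, so the mask has to be parenthesized before shifting.

```c
#include <assert.h>

#define CACHE_SIZE  65536   /* 64KB, as in the slide */
#define BLOCK_SHIFT 5       /* 32-byte lines */
#define NUM_LINES   (CACHE_SIZE >> BLOCK_SHIFT)

/* Tags start at 0, so (as in the slide's version) a first reference to
   addresses 0..31 would be counted as a hit; a valid bit would avoid that. */
static long tags[NUM_LINES];
static long refs, misses;

/* Line index: mask the address to the cache size, then drop the byte offset. */
static unsigned line_index(unsigned long address)
{
    return (unsigned)((address & (CACHE_SIZE - 1)) >> BLOCK_SHIFT);
}

/* Simulate one data reference to a direct-mapped cache. */
static void Reference(unsigned long address)
{
    unsigned index = line_index(address);
    long tag = (long)(address >> BLOCK_SHIFT);
    assert(index < NUM_LINES);
    if (tags[index] != tag) {
        misses++;
        tags[index] = tag;
    }
    refs++;
}
```

For example, addresses 0x10000 and 0x20000 both map to line 0 but carry different tags, so alternating between them misses every time, which is the conflict behavior expected of a direct-mapped cache.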

  22. Cache Instrumentation File
         #include <stdio.h>
         #include <cmplrs/atom.inst.h>
         unsigned Instrument(int argc, char **argv, Obj *o)
         {
             Inst *i; Block *b; Proc *p;
             /* Declare the analysis routines and their argument types. */
             AddCallProto("Reference(VALUE)");
             AddCallProto("Print()");
             /* Write the results when the program exits. */
             AddCallProgram(ProgramAfter, "Print");
             /* Call Reference with the effective address before every load and store. */
             for (p = GetFirstProc(); p != NULL; p = GetNextProc(p))
                 for (b = GetFirstBlock(p); b != NULL; b = GetNextBlock(b))
                     for (i = GetFirstInst(b); i != NULL; i = GetNextInst(i))
                         if (IsInstType(i, InstTypeLoad) || IsInstType(i, InstTypeStore))
                             AddCallInst(i, InstBefore, "Reference", EffAddrValue);
             return 0;
         }

  23. Sample #2: Profiler
     Write a tool that outputs the address of each basic block and the number of times it is executed.
         vssad-27> atom a.out prof.inst.c prof.anal.c
         vssad-28> a.out.atom
         Hello world
         vssad-29> head prof.out
         120001030 1
         120001038 1
         12000103c 1
         120001058 33
         120001064 1

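  24. Profiler Tool Implementation
     [Diagram: application code on the left, inserted instrumentation calls on the right; the block index is passed as a Constant argument]
         at program start            <- Init(3)
         main:  clr  t0              <- Count(0)
         loop:  ldl  t2,0(a0)        <- Count(1)
                addl t0,4,t0
                addl t2,0x10,t2
                stl  t2,0(a0)
                bne  t3,loop
                ret                  <- Count(2)
         at program exit             <- PrintResults(addresses,3)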

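  25. Profiler: prof.anal.c
         #include <stdio.h>
         #include <stdlib.h>   /* malloc */
         #include <string.h>   /* memset */
         long *counts;
         /* Called once at program start: allocate one zeroed counter per basic block. */
         void Init(int nblocks)
         {
             counts = (long *)malloc(nblocks * sizeof(long));
             memset(counts, 0, nblocks * sizeof(long));
         }
         /* Called at the start of each basic block. */
         void Count(int index)
         {
             counts[index]++;
         }
         /* Called once at program exit: write one "address count" line per block. */
         void Print(long *blocks, int nblocks)
         {
             int i;
             FILE *file = fopen("prof.out", "w");
             for (i = 0; i < nblocks; i++)
                 fprintf(file, "%lx %ld\n", blocks[i], counts[i]);
             fclose(file);
         }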

  26. Profiler: prof.inst.c
         #include <stdio.h>
         #include <stdlib.h>   /* malloc */
         #include <cmplrs/atom.inst.h>
         void CallInitPrint(long *addresses, int nblocks);
         void Instrument(int argc, char **argv, Obj *o)
         {
             Block *b; Proc *p; int index = 0;
             int nblocks = GetObjInfo(o, ObjNumberBlocks);
             long *addresses = (long *)malloc(nblocks * sizeof(long));
             CallInitPrint(addresses, nblocks);
             /* Record each block's address and count its executions. */
             for (p = GetFirstProc(); p != NULL; p = GetNextProc(p))
                 for (b = GetFirstBlock(p); b != NULL; b = GetNextBlock(b)) {
                     addresses[index] = InstPC(GetFirstInst(b));
                     AddCallInst(GetFirstInst(b), InstBefore, "Count", index++);
                 }
         }

  27. Profiler: prof.inst.c
         void CallInitPrint(long *addresses, int nblocks)
         {
             char buffer[100];
             AddCallProto("Count(int)");
             AddCallProto("Init(int)");
             AddCallProgram(ProgramBefore, "Init", nblocks);
             /* The array length in the prototype is filled in from nblocks. */
             sprintf(buffer, "Print(const stable int[%d],int)", nblocks);
             AddCallProto(buffer);
             AddCallProgram(ProgramAfter, "Print", addresses, nblocks);
         }

  28. Executable editors
     • Input: executable, output: executable
     • Instrument, optimize, translate
     • Executable = image = binary = shared library = shared object = dynamically linked library (DLL)
     • Executable editor, executable optimizer, binary rewriter, binary translator, post link optimizer

  29. Executable Editing
     • Insert/delete/reorder instructions and data
     • Obstacle to modification
        • Addresses are bound
        • Registers are bound

  30. Obstacles
     Source: if (a) a = b;
     Original code:
         beq r1,+2
         ldl r1,0x1000
     Instrumentation to insert:
         lda a0,0x1000
         bsr Reference
     • Is a0 free?
     • Adjust branch offsets
     • Adjust literal addresses

  31. Phases
     1. Decompose
     2. Build IR
     3. Insert instrumentation
     4. Convert IR to executable

  32. 1. Decompose Executable
     [Diagram: layout of the executable]
     • Header
     • Program code & data: Text (code), Data, Rdata
     • Meta data: Exception Info, Relocations, Debug

  33. Decompose
     • Break executable into units
     • unit: minimum data that must be kept together
        • code: unit is instruction
        • data: unit is data section
           • alternative: unit is data item

  34. 2. Build Internal Representation
     [Diagram: internal representation]
     • Instruction list: add, load, beq
     • Data sections: Data, Sdata
     • MetaData: Exception Info, Relocations

  35. Intermediate Representation
     • Similar to compiler
        • except unstructured, untyped data
        • 1 to 1 mapping for IR and machine instructions
     • Base representation should be compact
        • fit in physical memory
        • initial/final phases do multiple passes
     • Representations built/thrown away for procedures

  36. Bound addresses
     Data:
         1
         2
         0x12345678
         3
     Code:
         br +4
         ldah r0,0x1234
         lda r0,0x5678(r0)
     Metadata:
         Begin: 0x12345678
         End: 0x12345680

  37. Adjusting addresses
     • No translation
     • Dynamic translation
     • Static translation

  38. No translation
     • Leave code and data at same address
     Original:
         beq r1,L2
         ldl r1,0x1234
     L2:
     Instrumented (the load is replaced by a branch to out-of-line code):
         beq r1,L2
         br  L1
     L2: ...
         ...
     L1: lda a0,0x1234
         bsr Reference
         ldl r1,0x1234
         br  L2

  39. Dynamic translation
     • Address computation is unchanged
     • Image has map of old->new address
     • Code inserted to map old->new address at runtime for load/store/branch
     • Better:
        • Do PC relative branches statically
        • Keep data section at original address
     • Still: indirect calls and jumps (not returns)
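A sketch of the runtime old->new lookup, assuming the editor emits a table with one entry per original instruction (possible because Alpha instructions are a fixed 4 bytes). The `AddrMap` structure and `translate` function are hypothetical names for this example and are not taken from any particular tool:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical map built by the editor: one entry per original instruction. */
typedef struct {
    unsigned long old_text_base;   /* address of the first original instruction */
    size_t count;                  /* number of original instructions */
    const unsigned long *new_pc;   /* new_pc[i] = new address of instruction i */
} AddrMap;

/* Runtime lookup inserted before an indirect jump: old target -> new target.
   Addresses outside the original text are returned unchanged. */
static unsigned long translate(const AddrMap *m, unsigned long old_pc)
{
    assert(m != NULL);
    if (old_pc < m->old_text_base)
        return old_pc;
    unsigned long i = (old_pc - m->old_text_base) / 4;   /* 4-byte instructions */
    if (i >= m->count)
        return old_pc;
    return m->new_pc[i];
}
```

A dense table trades memory for a constant-time lookup; a sparse map with a search would be the alternative when only a few addresses can be indirect targets.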

  40. Static translation
     • Address computation is altered for new layout
     • Find addresses
     • Determine what they point to:
        • unit, offset
     • Insert instrumentation
     • Adjust literals or offsets to compute new address of unit
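For PC-relative branches, the adjustment can be sketched as follows. On Alpha the branch displacement counts instructions relative to the instruction after the branch, so in instruction indices the target is `branch + 1 + disp`. The two tables `pos` and `entry` and the function name are assumptions of this example: `pos[i]` is where original instruction `i` ends up, and `entry[i]` is where a branch to `i` should land (the start of any instrumentation inserted before it).

```c
#include <assert.h>

/* Recompute a branch displacement after instrumentation has been inserted.
   pos[i]   = new position of original instruction i
   entry[i] = new position a branch to i must target (start of code inserted
              before i, or pos[i] if nothing was inserted)
   All positions and displacements are counted in instructions. */
static long adjust_branch(const long *pos, const long *entry, long n,
                          long branch, long old_disp)
{
    long target = branch + 1 + old_disp;   /* original target instruction */
    assert(branch >= 0 && branch < n);
    assert(target >= 0 && target < n);
    return entry[target] - (pos[branch] + 1);
}
```

With two instructions inserted before original instruction 1 of a four-instruction sequence, a forward branch at instruction 0 with displacement +2 grows to +4, while a backward branch to instruction 1 must land on the inserted code rather than on the original instruction.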

  41. Other tools that change addresses
     • Linker
        • combine separately compiled objects
        • adjust addresses based on assigned load address
        • unit is section of object (data, text)
     • Loader
        • Load address != link address for DLL
        • unit is entire image
     • Use relocations

  42. Relocations
     Data:
         1
         2
         0x12345678
         3
     Code:
         br +4
         ldl r1,10(gp)
         ldah r0,0x1234
         lda r0,0x5678(r0)
     [Diagram labels attached to the items above: "No relocation required", "May require relocation", "Requires relocation"]
     Relocation example:
         address: 0x200
         type: ldah literal
         object: 0x12345670
         external:

  43. How to recognize addresses?
     • Metadata
        • example: procedure begin, procedure end
        • implicit in structure of data
     • Absolute addresses
        • example: literal address in data section
        • use relocations
     • Relative addresses: address offset
        • example: pc relative branch, offset for base pointer
        • may not need adjustment, usually no relocation

  44. Relative Addresses
     • Address computed as offset of another address
     • Address and Address + Offset point to same unit: ok, unit moved as a unit
     • Example:
         a->field1        ldl r0,field1(a)
         ar[4]            ldl r0,16(ar)

  45. Relative Addresses
     • Offset spans multiple units
        • example: jump table
              ad = base + i
              jmp ad
              base: br l1
                    br l2
                    br l3
                    br l4
        • example: PC relative branch
              br +4
     • Must be 1 unit

  46. Map address to unit and offset
     Reference -> address
     • in code: interpret instructions
         br +4
         ldah r0,0x1234
         lda r0,0x5678(r0)
     • in data: data is address
         .data 0x12345678
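Interpreting an `ldah`/`lda` pair to recover the literal address, and splitting a new address back into the pair, can be sketched as below. Both instructions sign-extend their 16-bit field and `ldah` shifts its field left by 16, so the high half must absorb a carry whenever bit 15 of the address is set. The function names are chosen for this example, and the sketch assumes the `ldah` base register is zero.

```c
#include <assert.h>
#include <stdint.h>

/* Address produced by: ldah r,hi(zero) ; lda r,lo(r).
   Both 16-bit fields are sign-extended by the hardware. */
static int64_t combine(int16_t hi, int16_t lo)
{
    return (int64_t)hi * 65536 + lo;
}

/* Inverse: choose hi/lo so that combine(hi, lo) == addr.
   Because lo is signed, hi is rounded up when bit 15 of addr is set. */
static void split(int64_t addr, int16_t *hi, int16_t *lo)
{
    int64_t l = addr & 0xffff;
    if (l >= 0x8000)
        l -= 0x10000;
    int64_t h = (addr - l) / 65536;
    assert(h >= INT16_MIN && h <= INT16_MAX);   /* must fit in the pair */
    *lo = (int16_t)l;
    *hi = (int16_t)h;
}
```

The slide's pair `ldah r0,0x1234` / `lda r0,0x5678(r0)` combines to 0x12345678, whereas an address such as 0x12348000 has to be encoded with a high half of 0x1235 and a low half of -0x8000.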

  47. Map address to unit and offset
     (relocation,address) -> (unit,offset)
     • to code: pointer to instruction
     • to data: data section and offset
        • alternative: data item and offset
     • offset = address - unit address
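The mapping from an address to (unit, offset), followed by recomputing the address once units have moved, could look like the sketch below. The `Unit` structure, the sorted-table representation, and the function names are assumptions of this example:

```c
#include <assert.h>

/* Hypothetical unit record: an instruction or a data section. */
typedef struct {
    unsigned long old_base;   /* original start address */
    unsigned long size;       /* size in bytes */
    unsigned long new_base;   /* start address in the edited image */
} Unit;

/* Binary search for the unit containing addr. units[] must be sorted by
   old_base and non-overlapping. Returns the unit index, or -1 if none. */
static int find_unit(const Unit *units, int n, unsigned long addr)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (addr < units[mid].old_base)
            hi = mid - 1;
        else if (addr - units[mid].old_base >= units[mid].size)
            lo = mid + 1;
        else
            return mid;
    }
    return -1;
}

/* New address = new unit address + (address - old unit address). */
static unsigned long relocate(const Unit *units, int n, unsigned long addr)
{
    int u = find_unit(units, n, addr);
    assert(u >= 0);
    unsigned long offset = addr - units[u].old_base;
    return units[u].new_base + offset;
}
```

Because the offset is preserved, any address pointing into a unit stays correct as long as the unit is moved as a whole.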

  48. 3. Insert Instrumentation
     [Diagram: internal representation after instrumentation]
     • Instruction list: add, load, load, beq
     • Data sections: Data, Sdata, Ndata
     • MetaData: Exception Info, Relocations

  49. Adding instrumentation code
     • Instrumentation requires free registers
        • wrapper routine saves and restores registers
     Instrumented code:
         beq r1,+2
         save registers
         lda a0,0x1000
         bsr ra,wrapper
         restore registers
         ldl r1,0(r2)
     wrapper:
         Save registers on stack
         bsr ra,Reference
         Restore registers
         return
     Reference: the analysis routine
     • Local/global/interprocedural analysis finds free registers

  50. 4. Convert IR to Executable
     [Diagram: layout of the new executable]
     • Header
     • Program code & data: Text, Data, Rdata, Ndata
     • Meta data: Exception Info, Relocations, Debug
