Update on HP Caliper, the Performance Tool for Itanium® HP-UX and Linux Systems

September 2006. Speaker: Stephen Williams, Caliper Development Team, Hewlett-Packard.

Presentation Transcript


  1. Update on HP Caliper, the Performance Tool for Itanium® HP-UX and Linux Systems. September 2006. Speaker: Stephen Williams, Caliper Development Team, Hewlett-Packard

  2. Previous webcasts • An introduction to HP Caliper, what it is, and how to use it. Webcast: September 9, 2003. Slides: http://h21007.www2.hp.com/dspp/files/unprotected/caliper/HPCaliper090903_ppt.ppt • An update on HP Caliper for HP-UX and Linux Itanium. Webcast: September 21, 2004. Slides: http://h21007.www2.hp.com/dspp/files/unprotected/caliper/Caliper36_092104.ppt • Yet more HP Caliper: an update on the Itanium HP-UX and Linux Performance Tool. Webcast: September 20, 2005. Slides: http://h21007.www2.hp.com/dspp/files/unprotected/caliper/Caliper050920.ppt

  3. Agenda • Quick overview of HP Caliper • New features in HP Caliper 3.9, 4.0, and 4.1 • Future directions • Hints and tips • Summary • DSPP information • Q & A

  4. What is HP Caliper? • Per-process or system-wide performance measurement tool, for any Itanium®/Itanium®2 native applications • For both HP-UX and Linux Integrity servers • “Swiss army knife” • Many different measurements • Common user interface and options • Multiple report formats – text, CSV, HTML • Graphical user interface (new at 4.0) • Uses Performance Monitor Unit (PMU) hardware and dynamic instrumentation as needed

  5. Example command lines
     caliper [measurement] [options] application [app-opts]
     caliper [measurement] [options] PID1 [PID2 …]
     caliper [measurement] [options] -w
     Examples:
     $ caliper fprof --html dir_name sweep3d
     $ caliper dcache -t -p all cc himom.c
     $ caliper cpu -w -o out.txt --dur 10
     $ caliper scgprof -p myproc myscript.sh
     $ caliper icache -o out.txt 8451 8452 8453
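The three invocation forms differ only in what follows the options: an application and its arguments, one or more PIDs, or `-w` for a system-wide run. The sketch below assembles an argument vector for each form so it can be handed to Python's `subprocess.run`. The helper name `caliper_argv` is hypothetical, and it checks only the shape of the command line, not whether a given measurement accepts a given option.

```python
from typing import List, Optional, Sequence


def caliper_argv(measurement: Optional[str] = None,
                 options: Sequence[str] = (),
                 application: Optional[Sequence[str]] = None,
                 pids: Sequence[int] = (),
                 system_wide: bool = False) -> List[str]:
    """Build argv for one of the three documented forms (hypothetical helper):

        caliper [measurement] [options] application [app-opts]
        caliper [measurement] [options] PID1 [PID2 ...]
        caliper [measurement] [options] -w
    """
    # Exactly one kind of target must be given.
    chosen = [application is not None, len(pids) > 0, system_wide]
    if sum(chosen) != 1:
        raise ValueError("give exactly one of: application, pids, system_wide")

    argv = ["caliper"]
    if measurement:               # omitted: Caliper falls back to its default measurement
        argv.append(measurement)
    argv.extend(options)          # Caliper's own options precede the target
    if application is not None:
        argv.extend(application)  # program name, then the program's own arguments
    elif pids:
        argv.extend(str(pid) for pid in pids)
    else:
        argv.append("-w")         # system-wide collection
    return argv


# The attach form from the slide, ready for subprocess.run(argv, check=True).
print(caliper_argv("icache", ["-o", "out.txt"], pids=[8451, 8452, 8453]))
```

Returning a list rather than a single string lets `subprocess.run` pass application arguments through without shell quoting.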

  6. Measurements Used for: What? Where? Details? (instrumented) • Overview: cpu, ecount • Profiles: alat, branch, dcache, dtlb, fprof, icache, itlb, cycles • Traces: pmu_trace • Call graph: scgprof, cgprof* • Coverage: fcover* • Counts: acount*, fcount* (* not in Linux version)

  7. New features since HP Caliper 3.9 • Improved command line usability • Quick Start reference card • Improved reports for multi-process applications • New ‘cycles’ measurement (dual-core Itanium 2 only) • Richer sets of PMU events (dual-core Itanium 2 only) • System-wide measurements • Graphical user interface

  8. Improved command line usability • scgprof is now the default measurement: $ caliper myprog (collects scgprof data on myprog) • -a is no longer required for attaching to processes: $ caliper 1234 (collects scgprof data on process 1234) • Re-reporting the last recorded data is simple: $ caliper report [options] • Reporting from an HP Caliper database is simplified: $ caliper mydb.db • New default: reports go down to source level, but not to instruction level (use -r all to get disassembly) • New default: --process all (-p all)

  9. Improved command line usability (short options) More short options added. Here is the complete list:
     Short form                    Long form
     -d                            --database
     -e (for elapsed time)         --duration
     -f                            --options-file
     -H (long form help)           --help
     -m                            --metrics
     -o                            --output-file
     -p                            --process
     -r                            --report-details
     -s                            --sampling-spec
     -t                            --threads all
     -v                            --version
     -w                            --scope system,attr_mod
     -h or -? (short form help)    no equivalent
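Because each short option in the table has a fixed long equivalent, expanding a command line to long form is a table lookup, which can make saved scripts easier to read. A minimal sketch follows; `expand_short_options` is a hypothetical helper, and it should be given only Caliper's own options, not the application's arguments, since an application may reuse letters such as `-o`. Slide 19 spells the `-w` expansion `attr-mod` while this table spells it `attr_mod`; the sketch copies the table.

```python
from typing import Dict, List, Sequence

# Short-to-long option table from slide 9. "-h"/"-?" have no long form and are
# left untouched. Some short options expand to an option plus a value.
SHORT_TO_LONG: Dict[str, List[str]] = {
    "-d": ["--database"],
    "-e": ["--duration"],
    "-f": ["--options-file"],
    "-H": ["--help"],
    "-m": ["--metrics"],
    "-o": ["--output-file"],
    "-p": ["--process"],
    "-r": ["--report-details"],
    "-s": ["--sampling-spec"],
    "-t": ["--threads", "all"],
    "-v": ["--version"],
    "-w": ["--scope", "system,attr_mod"],  # slide 19 writes attr-mod
}


def expand_short_options(options: Sequence[str]) -> List[str]:
    """Replace documented short options with their long forms; keep other tokens."""
    expanded: List[str] = []
    for token in options:
        expanded.extend(SHORT_TO_LONG.get(token, [token]))
    return expanded


print(expand_short_options(["-t", "-p", "all", "-o", "out.txt"]))
```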

  10. Improved command line usability (short measurement names) Measurement names have been shortened:
     New name    Old name
     alat        alat_miss
     acount      arc_count
     branch      branch_prediction
     cpu         cpu_metrics
     dcache      dcache_miss
     dtlb        dtlb_miss
     fcount      func_count
     fcover      func_cover
     icache      icache_miss
     itlb        itlb_miss
     ecount      total_cpu
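Scripts written before the rename use the long measurement names. Below is a sketch of a one-time migration helper that rewrites the measurement name in an argument list using the mapping from the table. The helper `shorten_measurement` is hypothetical, and the slide does not say whether the old names are still accepted.

```python
from typing import Dict, List, Sequence

# Old measurement names mapped to the new short names, from slide 10.
OLD_TO_NEW: Dict[str, str] = {
    "alat_miss": "alat",
    "arc_count": "acount",
    "branch_prediction": "branch",
    "cpu_metrics": "cpu",
    "dcache_miss": "dcache",
    "dtlb_miss": "dtlb",
    "func_count": "fcount",
    "func_cover": "fcover",
    "icache_miss": "icache",
    "itlb_miss": "itlb",
    "total_cpu": "ecount",
}


def shorten_measurement(argv: Sequence[str]) -> List[str]:
    """Rewrite the measurement (the token right after 'caliper') to its short name."""
    result = list(argv)
    if len(result) >= 2 and result[0] == "caliper" and result[1] in OLD_TO_NEW:
        result[1] = OLD_TO_NEW[result[1]]
    return result


print(shorten_measurement(["caliper", "dcache_miss", "-o", "out.txt", "myprog"]))
```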

  11. Improved command line usability (simplified merge and diff syntax) • --join is deprecated. Instead, use: $ caliper merge -o out.txt db1 [db2 …] $ caliper diff -o out.txt db1 db2 • Note that you can merge per-process data in a single database: $ caliper merge -o out.txt mydb

  12. Quick Start reference card http://h21007.www2.hp.com/dspp/files/unprotected/caliper/caliper-quick-start.pdf

  13. Quick Start reference card (back side)

  14. Improved reports for multi-process applications • Caliper can now report: • Across-process CPU events • Histograms of processes and associated metrics: $ caliper report -o out.txt mydb • Histograms of executables and associated metrics: $ caliper merge -o out.txt mydb • Use --process-cutoff to change the number of processes or executables reported in the process or executable histogram.

  15. Improved reports for multi-process applications (cont.) Example of a merged process (executable) summary:
     Process Summary
     -------------------------------------------
     % Total  Cumulat
          IP     % of       IP
     Samples    Total  Samples  Process
     -------------------------------------------
       67.86    67.86     1797  be (1 instances)
       20.17    88.03      534  ecom (1 instances)
        5.25    93.28      139  u2comp (1 instances)
        4.83    98.11      128  ld (1 instances)
        0.72    98.83       19  sh (4 instances)
     -------------------------------------------
     [Minimum process entries: 5, percent cutoff: 2.00, cumulative percent cutoff: 100.00]
     -------------------------------------------
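Each row's cumulative column is the running sum of the % Total column, and the footer's cutoffs decide how many rows are printed. The sketch below recomputes the summary from per-process sample counts. Two parts are assumptions. First, the total of 2648 IP samples is inferred: all five percentages are consistent with it, and the listed processes account for 2617 samples, so the other 31 presumably belong to processes that were cut off. Second, the cutoff rule is one plausible reading of the footer, not Caliper's documented algorithm. The function name `process_summary` is hypothetical.

```python
from typing import Dict, List, Tuple

Row = Tuple[float, float, int, str]  # (% total, cumulative %, IP samples, process)


def process_summary(samples: Dict[str, int], total: int,
                    min_entries: int = 5, pct_cutoff: float = 2.0,
                    cum_cutoff: float = 100.0) -> List[Row]:
    """Rank processes by IP samples and apply the report-footer cutoffs.

    Assumed rule: the first `min_entries` rows are always shown; after that,
    listing stops at the first row under `pct_cutoff` percent or when the
    cumulative percentage would pass `cum_cutoff`.
    """
    rows: List[Row] = []
    cumulative = 0.0
    ranked = sorted(samples.items(), key=lambda kv: kv[1], reverse=True)
    for rank, (name, count) in enumerate(ranked):
        pct = 100.0 * count / total
        if rank >= min_entries and (pct < pct_cutoff or cumulative + pct > cum_cutoff):
            break
        cumulative += pct
        rows.append((round(pct, 2), round(cumulative, 2), count, name))
    return rows


merged = {"be": 1797, "ecom": 534, "u2comp": 139, "ld": 128, "sh": 19}
for pct, cum, count, name in process_summary(merged, total=2648):
    print(f"{pct:7.2f} {cum:8.2f} {count:8d}  {name}")
```

With these inputs the printed percentages match the slide's rows. Note that `sh` (0.72%) is listed despite the 2.00% cutoff, which is what a minimum-entries rule of 5 would produce.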

  16. New measurement: cycles • On dual-core Itanium 2 systems, HP Caliper can now report average cycles per bundle: $ caliper cycles -o out.txt -r all myprog • The resulting report resembles an fprof report (showing IP sample hits), but provides the following additional information at the disassembly level: • Average cycles used to retire bundles. (With no stalls, a bundle should retire in one cycle.) • Instructions that were split issued (that is, instructions not issued at the same time as the instruction that precedes them).
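The one-cycle-per-bundle baseline gives a simple way to read the cycles report: any bundle whose average exceeds one cycle is absorbing stall time. The sketch below shows that arithmetic. It assumes per-bundle totals of observed cycles and retirements are available; the input tuple layout, the sample data, and the function name `stalled_bundles` are all hypothetical and do not describe Caliper's report format.

```python
from typing import Iterable, List, Tuple


def stalled_bundles(bundles: Iterable[Tuple[int, int, int]],
                    ideal: float = 1.0) -> List[Tuple[int, float]]:
    """Return (address, average cycles) for bundles slower than `ideal`.

    Each input tuple is (bundle address, cycles observed, times retired).
    """
    slow = []
    for address, cycles, retired in bundles:
        if retired == 0:
            continue  # no retirements observed; the average is undefined
        average = cycles / retired
        if average > ideal:
            slow.append((address, average))
    # Worst offenders first.
    return sorted(slow, key=lambda item: item[1], reverse=True)


# Hypothetical data: one ideal bundle and two that stall.
sample = [(0x4000, 100, 100), (0x4010, 350, 100), (0x4020, 120, 100)]
for address, average in stalled_bundles(sample):
    print(f"{address:#x}: {average:.2f} cycles/bundle")
```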

  17. Richer PMU event sets On dual-core Itanium 2 systems, HP Caliper now reports many more PMU events (and derivations) in one run. An example from an IP Sample (fprof) report:
     Metrics Summed for Entire Run
     --------------------------------------------------------
                             PLM
     Event Name              U..K  TH  AC  AT        Count
     --------------------------------------------------------
     BE_L1D_FPU_BUBBLE.ALL   x___   0   T   F       175989
     BE_RSE_BUBBLE.ALL       x___   0   T   F         3250
     BE_FLUSH_BUBBLE.ALL     x___   0   T   F        33615
     BACK_END_BUBBLE.FE      x___   0   F   F      1208011
     CPU_OP_CYCLES.ALL       x___   0   T   F    752736219
     BE_EXE_BUBBLE.ALL       x___   0   F   F       209463
     BE_L1D_FPU_BUBBLE.L1D   x___   0   T   F       175989
     BE_EXE_BUBBLE.GRALL     x___   0   F   F       199727
     BE_EXE_BUBBLE.FRALL     x___   0   F   F         8014
     BE_EXE_BUBBLE.GRGR      x___   0   F   F           67
     CPU_CPL_CHANGES.ALL     x___   0   F   F         1731
     --------------------------------------------------------

  18. Richer PMU event sets (cont.)
     % Unstalled execution (higher is better): 47.44 = % Unstalled execution
     % of Cycles lost due to Front end stalls (lower is better): 6.43 = % stalls due to ICACHE, ITLB and branch execution
     % of Cycles lost due to Pipeline flush stalls (lower is better): 9.23 = % stalls due to branch misprediction or interruption flush
     % of Cycles lost due to data access stalls (lower is better): 33.23 = % stalls due to DCACHE and DTLB (includes FR/FR stalls)
     % of Cycles lost due to RSE stalls (lower is better): 1.45 = % stalls due to RSE spilling/filling registers to/from memory
     % of Cycles lost due to Scoreboard stalls (lower is better): 2.22 = % stalls due to FPU and register dependency (excludes FR/FR stalls)
     Number of privilege level changes to/from all privileges: 73385 = CPU_CPL_CHANGES.ALL
     % of Cycles lost due to Front end stalls: 6.43 = 100 * (BACK_END_BUBBLE.FE / CPU_OP_CYCLES.ALL)
     % of Cycles lost due to Pipeline flush stalls: 9.23 = 100 * (BE_FLUSH_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
     % of Cycles lost due to data access stalls (includes FR/FR stalls): 33.23 = % register load stalls (includes FR/FR) + % stalls due to L1D
     % of Cycles lost due to RSE stalls: 1.45 = 100 * (BE_RSE_BUBBLE.ALL / CPU_OP_CYCLES.ALL)
     % of Cycles lost due to Scoreboard stalls (excludes FR/FR stalls): 2.22 = % stalls due to FPU + % register dependency stalls
     % of Cycles lost due to register load stalls (includes FR/FR stalls): 26.81 = % GR/load dependency stalls + % FR/load or FR/FR dependency stalls
     % of Cycles lost due to FR/load or FR/FR dependency stalls: 0.20 = 100 * BE_EXE_BUBBLE.FRALL / CPU_OP_CYCLES.ALL
     % of Cycles lost due to GR/load dependency stalls: 26.61 = 100 * (BE_EXE_BUBBLE.GRALL - BE_EXE_BUBBLE.GRGR) / CPU_OP_CYCLES.ALL
     % of Cycles lost due to stalls in L1D cache and L1/L2 DTLB: 6.42 = 100 * (BE_L1D_FPU_BUBBLE.L1D / CPU_OP_CYCLES.ALL)
     % of Cycles lost due to register dependency stalls (excludes FR/FR stalls): 2.22 = (100 * BE_EXE_BUBBLE.ALL / CPU_OP_CYCLES.ALL) - % register load stalls
     % of Cycles lost due to GR/GR dependency stalls: 2.14 = 100 * BE_EXE_BUBBLE.GRGR / CPU_OP_CYCLES.ALL
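Every derived percentage in the cycle-accounting breakdown is a ratio of event counts to CPU_OP_CYCLES.ALL, and the six top-level categories sum to 100 (47.44 + 6.43 + 9.23 + 33.23 + 1.45 + 2.22 = 100.00). That suggests unstalled execution is whatever the five stall categories leave over. The sketch below encodes those formulas. Several parts are assumptions. The FPU term of the scoreboard formula is not given on the slide, so it is passed in as a percentage that defaults to zero. The example counts come from slide 17, which appears to be a different run from slide 18 (CPU_CPL_CHANGES.ALL is 1731 there versus 73385 here), so the results will not reproduce slide 18's values. The function and key names are hypothetical.

```python
from typing import Dict


def derived_metrics(c: Dict[str, int], fpu_pct: float = 0.0) -> Dict[str, float]:
    """Slide 18's stall breakdown, as percentages of CPU_OP_CYCLES.ALL."""
    cycles = c["CPU_OP_CYCLES.ALL"]

    def pct(count: float) -> float:
        return 100.0 * count / cycles

    m: Dict[str, float] = {}
    m["front_end"] = pct(c["BACK_END_BUBBLE.FE"])
    m["flush"] = pct(c["BE_FLUSH_BUBBLE.ALL"])
    m["rse"] = pct(c["BE_RSE_BUBBLE.ALL"])
    m["fr_load"] = pct(c["BE_EXE_BUBBLE.FRALL"])
    m["gr_load"] = pct(c["BE_EXE_BUBBLE.GRALL"] - c["BE_EXE_BUBBLE.GRGR"])
    m["gr_gr"] = pct(c["BE_EXE_BUBBLE.GRGR"])
    m["register_load"] = m["gr_load"] + m["fr_load"]
    m["l1d"] = pct(c["BE_L1D_FPU_BUBBLE.L1D"])
    m["data_access"] = m["register_load"] + m["l1d"]
    m["register_dependency"] = pct(c["BE_EXE_BUBBLE.ALL"]) - m["register_load"]
    m["scoreboard"] = fpu_pct + m["register_dependency"]  # FPU term: assumed input
    stalls = m["front_end"] + m["flush"] + m["data_access"] + m["rse"] + m["scoreboard"]
    m["unstalled"] = 100.0 - stalls  # inferred: categories sum to 100 on slide 18
    return m


slide17 = {
    "BE_L1D_FPU_BUBBLE.ALL": 175989, "BE_RSE_BUBBLE.ALL": 3250,
    "BE_FLUSH_BUBBLE.ALL": 33615, "BACK_END_BUBBLE.FE": 1208011,
    "CPU_OP_CYCLES.ALL": 752736219, "BE_EXE_BUBBLE.ALL": 209463,
    "BE_L1D_FPU_BUBBLE.L1D": 175989, "BE_EXE_BUBBLE.GRALL": 199727,
    "BE_EXE_BUBBLE.FRALL": 8014, "BE_EXE_BUBBLE.GRGR": 67,
}
for name, value in derived_metrics(slide17).items():
    print(f"{name:>20}: {value:6.2f}")
```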

  19. System-wide measurements • Most measurements can now be made system-wide, across all processes and CPUs in both user and kernel space. • Three levels of sample attribution: --scope system[,attr-mod|attr-proc|attr-none] • -w is equivalent to: --scope system,attr-mod • Privilege level mask (PLM): --event-defaults user|kernel|all • Sample command (collect IP samples in both kernel and user space for 20 seconds): $ caliper fprof -o o.txt --ev all -w -e 20
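The attribution level and the privilege level (PLM) are each chosen from a small closed set, so a wrapper can validate them before starting a long collection. The sketch below uses only the long options named on this slide and in the short-option table. `system_wide_argv` is a hypothetical helper, the accepted values are copied from the slide rather than from Caliper's own documentation, and treating `--ev all` as shorthand for `--event-defaults all` is an assumption.

```python
from typing import List, Optional

ATTRIBUTION_LEVELS = {"attr-mod", "attr-proc", "attr-none"}  # from the --scope syntax
PRIVILEGE_LEVELS = {"user", "kernel", "all"}                  # from --event-defaults


def system_wide_argv(measurement: str, seconds: int, output: str,
                     attribution: Optional[str] = "attr-mod",
                     plm: str = "all") -> List[str]:
    """Build a system-wide collection command; attribution=None gives plain 'system'."""
    if attribution is not None and attribution not in ATTRIBUTION_LEVELS:
        raise ValueError(f"unknown attribution level: {attribution}")
    if plm not in PRIVILEGE_LEVELS:
        raise ValueError(f"unknown privilege level: {plm}")
    if seconds <= 0:
        raise ValueError("duration must be positive")
    scope = "system" if attribution is None else f"system,{attribution}"
    return ["caliper", measurement,
            "--output-file", output,
            "--event-defaults", plm,
            "--scope", scope,
            "--duration", str(seconds)]


# Assumed long-form spelling of: caliper fprof -o o.txt --ev all -w -e 20
print(system_wide_argv("fprof", 20, "o.txt"))
```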

  20. System-wide measurements (cont.) • Limitations on HP-UX: • You must be logged in as the root user. • Caliper may not be able to locate some executables and shared libraries, resulting in many “unattributed” samples. Workaround: use --module-search-path • Limitations on Linux: • You cannot exclude idle time or the caliper process itself (though we hope to provide this feature in the future). • Limitations on both HP-UX and Linux: • While caliper runs in system-wide mode, no other caliper process can run on the same system.

  21. New graphical user interface • An Eclipse RCP application • Makes it easy to: • Perform measurement collections • Browse Caliper databases • See measurement data, with easy drill-down • Can be run on a remote Integrity server, with the display shown on your desktop X server (not recommended over a wide-area network), via: $ caliper -g • Can be run locally on a Windows or Linux x86-based system (the local GUI client communicates with the Caliper server via ssh or rexec)

  22. New graphical user interface (Projects view and Collect view) Saved collection setup Start process System wide Attach process Previously collected data Start data collection Required fields and tabs in red Only applicable collection tabs enabled

  23. New graphical user interface (Measurement tab of Collect view) Data cache misses selected Stop data collection Collection in progress

  24. New graphical user interface (viewing data) Analyze view Saved collection specification Process tree tab opened Available data sets Application output

  25. New graphical user interface (CPU event counts) Show data for entire application Show CPU events tab

  26. New graphical user interface (metrics derived from CPU events) CPU events tab scrolled to show derived metrics

  27. New graphical user interface (histogram viewer) Maximize or minimize by double-clicking Analyze view tab Hottest process (double-click to drill down) Overview of entire histogram Percent of application’s total misses in process be

  28. New graphical user interface (drill down to functions) Use stacking bars Popups for long function names Show ‘local’ percents (percent of total for be) DagNode::dagConstMarkPredArc(DagNode *, DagNode *, Dag*) Area viewed in table highlighted in Overview Previous levels visited

  29. New graphical user interface (drill down to disassembly) Show: Source Source/disasm Sorted by address Disassembly Click to show hotspots in table

  30. New graphical user interface (sorting) Sort bundles by misses

  31. New graphical user interface (call graph viewer) Multiple Analyze views allowed Callees visited Current function Callers Callees

  32. Future directions • Expected new features in HP Caliper 4.2 (January 2007): • Load-module-centric reports (e.g., an across-process profile of libc.so) • Call stack profiling (with wall-clock sampling) • Bucketing of data cache miss latencies (to help ascertain which cache levels were accessed) • Trap profiling • Merge/diff capability in the graphical user interface • Caliper Advisor integrated with the graphical user interface • Features beyond HP Caliper 4.2: • Caliper Advisor cheatsheets in the graphical user interface • Data-centric cache miss reports • Integration with Ktrace/Kprofile • More data visualization aids in the graphical user interface • Per-CPU/per-thread CPU metrics

  33. Load modules as top level (v4.2) View load modules as top level

  34. Call-stack profile (v4.2) Graph hot call paths by running time, blocked time, or both

  35. CPU metrics overview (v4.2) Overview of metrics collected by cpu measurement (default metrics)

  36. Call-stack samples display (potential future display) Overview of running and stopped threads Sample cursor (drag to any point) Call stacks at sample 754 “Playback” controls

  37. Data-centric cache miss profile display (potential future display) Double-click row to see function’s disassembly Double-click row (below) to view instruction addresses (above) Double-click row (below) to view data addresses (above)

  38. 3D histograms (potential future display) Figure from CxPerf User’s Guide

  39. Hints and tips: caliper command • Getting CPU event names from caliper: • Dump all event names and descriptions: $ caliper info all • List all event names (no other fields): $ caliper info all -d name • List names of all events containing the string “L3”: $ caliper info L3 -d name • Or, use an ambiguous event name: $ caliper ecount --metric L3_READ myprog HP Caliper: usage error: Ambiguous event name ("L3_READ") specified for "--metrics". Matches L3_READS.ALL.ALL, L3_READS.ALL.HIT, L3_READS.ALL.MISS, L3_READS.DATA_READ.ALL, L3_READS.DATA_READ.HIT, L3_READS.DATA_READ.MISS, L3_READS.DINST_FETCH.ALL, L3_READS.DINST_FETCH.HIT, L3_READS.DINST_FETCH.MISS, L3_READS.INST_FETCH.ALL, L3_READS.INST_FETCH.HIT, L3_READS.INST_FETCH.MISS.
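The error message suggests that Caliper compares an event name against the beginning of every known name and rejects the request when more than one event matches. The sketch below reproduces that behavior, assuming simple prefix matching; Caliper's actual matching rule is not described on the slide. `resolve_event` is hypothetical, and the event list is the one quoted in the error message.

```python
from typing import Iterable


def resolve_event(name: str, known: Iterable[str]) -> str:
    """Return the unique event whose name equals or starts with `name` (assumed rule)."""
    events = list(known)
    if name in events:
        return name  # an exact name is never ambiguous
    matches = sorted(event for event in events if event.startswith(name))
    if not matches:
        raise ValueError(f'Unknown event name ("{name}")')
    if len(matches) > 1:
        raise ValueError(f'Ambiguous event name ("{name}"). Matches {", ".join(matches)}.')
    return matches[0]


# The twelve L3_READS events listed in the error message above.
L3_READS = [f"L3_READS.{kind}.{result}"
            for kind in ("ALL", "DATA_READ", "DINST_FETCH", "INST_FETCH")
            for result in ("ALL", "HIT", "MISS")]

print(resolve_event("L3_READS.DATA_READ.M", L3_READS))  # a unique prefix
```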

  40. Hints and tips: caliper command (cont.) • Getting report help: • Dump the help file for the cycles measurement: $ caliper info -r cycles • Append help to a report: $ caliper cycles --info -o out.txt myprog • Providing command options using a file: $ caliper fprof -f myOptionsFile • Helping Caliper find: • Source code: --source-path-map dir|map[:dir|map:…]* • Symbols and disassembly: --module-search-path dir[:dir:…] (* where map == old_path,new_path)
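The `--source-path-map` argument packs two kinds of entries into one colon-separated list: plain directories and `old_path,new_path` pairs. The sketch below shows how such a string could be parsed and applied. It assumes that a pair rewrites a leading `old_path` in the source path recorded at build time and that a plain directory is searched for the file's base name. Both lookup rules and both helper names are hypothetical, not Caliper's documented behavior.

```python
import os
from typing import List, Tuple


def parse_source_path_map(spec: str) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Split 'dir|map[:dir|map:...]' into plain directories and (old, new) maps."""
    dirs: List[str] = []
    maps: List[Tuple[str, str]] = []
    for entry in spec.split(":"):
        if not entry:
            continue  # tolerate empty entries such as a trailing ':'
        if "," in entry:
            old, new = entry.split(",", 1)
            maps.append((old, new))
        else:
            dirs.append(entry)
    return dirs, maps


def candidate_paths(recorded: str, spec: str) -> List[str]:
    """Places to look for a source file whose build-time path was `recorded`."""
    dirs, maps = parse_source_path_map(spec)
    candidates = [recorded]
    for old, new in maps:
        old = old.rstrip("/")
        if recorded.startswith(old + "/"):  # match whole path components only
            candidates.append(new.rstrip("/") + recorded[len(old):])
    base = os.path.basename(recorded)
    candidates.extend(os.path.join(d, base) for d in dirs)
    return candidates


print(candidate_paths("/build/src/app/main.c", "/build/src,/home/me/src:/opt/extra"))
```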

  41. Hints and tips: using views Close Restore views Minimize Restore default locations Maximize Local view menu Common view menu (right-click on tab) Detached view (not supported by Motif)

  42. Summary • Itanium execution performance tool • Measures production applications • Measures entire system • Wide range of performance metrics available • Explore performance data using textual or graphical reports • Help available from caliper-help@cup.hp.com • Available on HP-UX and Linux http://www.hp.com/go/caliper

  43. DSPP Tools & Resources for Itanium®2 Architecture Set You Up for Success Community • Itanium® architecture forums, source code repository, document sharing and mailing lists Training and Education • online and classroom training News & Events Software • development environments, compilers, operating systems, installation/configuration tools, performance tools and more Technical documentation • white papers, tutorials, reference documents and manuals, FAQs, known problems, sample code, etc. Partner Resources • webconferencing services • podcast production services • trade show discounts Equipment • rentals and purchase discounts

  44. Where to go … Software Developer Resource Kit for the Intel® Itanium®2 microarchitecture: www.hp.com/go/hpitaniumdvd Development and Business Resources from HP & Intel for HP Integrity-based solutions: www.hp.com/go/dspp-eap Contact points for additional information: Americas email: dspp.dev@hp.com telephone 1.800.249.3294 Europe email: dspp.emea@hp.com telephone 800.100.929.70 Asia-Pac email: hpdev.support@hp.com or go to www.hp.com/go/dspp for local country phone numbers

  45. Complete Survey to Win HP & Intel are giving away an HP laptop to one (1) lucky winner!! • Promotion period ends November 19, 2006 • Attend a webcast AND complete the post-event survey. • Full promotion details can be found on DSPP at: http://h21007.www2.hp.com/dspp/bus/bus_BusDetailPage_IDX/1,1252,9284,00.html

  46. More Events • Tuesday, October 24 – New Dual-Core Processor and Server Hardware • Tuesday, November 28 – Open MP • Tuesday, December 19 – HP-MPI • Sign up for the DSPP newsletter to get the latest webcast information sent to you directly. • Webcast replays may also be found at: www.hp.com/go/itaniumwebcasts • Did you know...that your company can use this same webconferencing tool – at a discounted price - to promote your HP Integrity solutions to your staff and customers? For members only... • http://h21007.www2.hp.com/dspp/bus/bus_BusDetailPage_IDX/1,,9173!0!,00.html

  47. Intel® Early Access Program - Technology The Early Access Program (EAP) gives you access to Intel® technology to support your current development cycle as well as early access to tools and information on new technologies. Your membership includes: • Early access to pre-release software development platforms • Access to Intel and 3rd party software and testing tools • Training through Intel® Software College and Web events • Technical content and how-to articles • Protected remote access to easily evaluate and develop software safely and securely on platforms over the Internet

  48. Intel® Early Access Program - Marketing Opportunities and Support • Extensive marketing and business development opportunities: • Inclusion in online and print versions of the Intel® Developer Solutions Catalog • Intel quotes to support your PR • Case studies • Access to Intel’s event marketing asset kit • Participation in selected industry events and trade shows • Support in your development efforts provided through: • Access to an Intel Account Representative who will act as your primary contact • Intel® Premier Support for confidential technical support • 24/7 online support via www.intel.com/software/support

  49. Related Intel® Resources • Intel® Early Access Program • http://www.intel.com/software/EAP • Intel® Software Network • http://www.intel.com/software • Intel® Software College • http://www.intel.com/software/college • Intel® Software Development Tools • http://www.intel.com/software/products • Experience Intel® Itanium® 2 Architecture • http://www.intel.com/cd/ids/developer/asmo-na/eng/66176.htm

  50. Q&A Session: To ask a question over the phone, press *1 on your touch-tone telephone.