
Performance Modeling


Presentation Transcript


  1. Performance Modeling: Cross-Architecture Performance Predictions for Scientific Applications Using Parameterized Models (Marin and Mellor-Crummey)

  2. The Gist • Performance modeling is important (predicting hypothetical problem sizes on hypothetical machines) • Build a model that will: • Create an execution graph • Model edge frequencies • Model node weights [Figure: example execution graph annotated with edge frequencies and node weights] • Use the model to predict performance by summing frequency * weight over the graph, e.g. 2*5 + 2*1 + 5*4 + 5*1 + 2*7 = 51 (see the sketch below)
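  Below is a minimal Python sketch of the prediction step illustrated on this slide. The graph representation (a flat list of frequency/weight pairs) is an assumption made purely for illustration; the actual model attaches frequencies to edges and weights to basic blocks of a real execution graph.

    # Minimal sketch (hypothetical graph representation): predicted cost is
    # the sum over the graph of execution frequency times node weight.
    def predict_runtime(graph):
        """graph: iterable of (execution_frequency, node_weight) pairs."""
        return sum(freq * weight for freq, weight in graph)

    # Reproduces the slide's example: 2*5 + 2*1 + 5*4 + 5*1 + 2*7 = 51
    print(predict_runtime([(2, 5), (2, 1), (5, 4), (5, 1), (2, 7)]))  # 51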

  3. Why Model Performance? • Procurement: Which new machine should I buy? • Machine design: Which machine should I produce? • Application tuning: Why does my application scale the way it does on this machine, and how can I improve it?

  4. Static Analysis • Create a Control Flow Graph (CFG) • Identify the instruction mix • Identify dependencies • Some are easy, e.g. the register dependence between add R1 R2 R3 and add R4 R1 R2 (the second instruction reads R1, which the first defines) • Some are harder, e.g. ld R1 1000(R2) followed by sw R3 1000(R2): do these touch the same address, i.e. is there a memory dependence? The proposed solution is symbolic “access formulas” for memory references; is that a good idea? (a toy sketch of the easy case follows below)
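  A toy Python sketch of the easy case follows. The three-field instruction encoding is invented for illustration, and the sketch only catches register read-after-write dependencies; memory dependencies such as the ld/sw pair above are exactly what the access formulas are meant to address.

    # Toy sketch: find register read-after-write dependencies in a
    # hypothetical (opcode, destination, sources) instruction encoding.
    def register_dependences(instrs):
        deps = []
        for i, (_, dest_i, _) in enumerate(instrs):
            for j in range(i + 1, len(instrs)):
                _, _, srcs_j = instrs[j]
                if dest_i in srcs_j:
                    deps.append((i, j))   # instruction j reads what i wrote
        return deps

    # The slide's example: "add R1 R2 R3" followed by "add R4 R1 R2"
    print(register_dependences([("add", "R1", ("R2", "R3")),
                                ("add", "R4", ("R1", "R2"))]))  # [(0, 1)]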

  5. Dynamic Analysis: Execution Frequencies • Get edge counts with minimum overhead • Place one counter on each loop • Construct a spanning tree of the CFG that includes the uninstrumentable edges • Place counters only on the remaining (non-tree) edges; counts for the tree edges can be recovered afterwards by flow conservation (see the sketch below) [Figure: example CFG with counters on non-tree edges and derived counts on tree edges]
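  The Python sketch below illustrates why counters on the non-tree edges are enough. The CFG encoding as (source, destination) pairs and the virtual exit-to-entry edge are assumptions made for illustration; this shows the idea behind the instrumentation scheme, not the tool itself.

    # Counts measured on instrumented ("chord") edges propagate to the
    # uninstrumented spanning-tree edges via flow conservation at each node:
    # sum of incoming counts == sum of outgoing counts.
    def recover_edge_counts(edges, measured):
        counts = dict(measured)
        nodes = {n for e in edges for n in e}
        changed = True
        while changed and len(counts) < len(edges):
            changed = False
            for n in nodes:
                ins = [e for e in edges if e[1] == n]
                outs = [e for e in edges if e[0] == n]
                for group, other in ((ins, outs), (outs, ins)):
                    unknown = [e for e in group if e not in counts]
                    if len(unknown) == 1 and all(e in counts for e in other):
                        known = sum(counts[e] for e in group if e in counts)
                        counts[unknown[0]] = sum(counts[e] for e in other) - known
                        changed = True
        return counts

    # Diamond-shaped CFG A->{B,C}->D with a virtual D->A edge; only the two
    # chord edges are instrumented, the other three counts are derived.
    edges = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D"), ("D", "A")]
    measured = {("A", "C"): 2, ("D", "A"): 5}
    print(recover_edge_counts(edges, measured))
    # derives A->B = 3, B->D = 3, C->D = 2 in addition to the measured counts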

  6. Modeling Execution Frequencies • How do the edge frequencies change with larger input parameters? • Make multiple training runs, varying only one parameter at a time • Fit a function for each edge and each parameter, then combine the per-parameter functions (e.g. as a linear combination) into a full scaling model (see the sketch below) [Figure: edge weight as a function of N; edge weight as a function of Z]
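  As a sketch of the per-edge fitting step, the snippet below uses a plain polynomial least-squares fit as a stand-in; the training data and the fixed quadratic degree are assumptions, since in practice the form of the fitted function is chosen per edge.

    import numpy as np

    # Hypothetical training data: one edge's observed frequency in four runs
    # where only the problem-size parameter N was varied.
    N = np.array([50, 100, 200, 400])
    edge_counts = np.array([2600, 10200, 40400, 160800])

    # Fit count(N) with a quadratic least-squares model.
    coeffs = np.polyfit(N, edge_counts, 2)

    # Extrapolate the edge frequency to a larger, unmeasured problem size.
    predicted = np.polyval(coeffs, 800)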

  7. Dynamic Analysis: Memory Access • Calculate reuse distance histograms (a minimal sketch follows this slide) • Reuse distance: how many distinct memory addresses have been accessed since this address was last accessed? • Reuse distance distribution, e.g.: 25% of memory references have reuse distance 2, 12% have reuse distance 3, etc. • A tree structure holds each memory reference, keyed by the time step of its last access • Slowdown? No sampling is used • Total time is O(M log N), where M = number of memory references and N = number of distinct addresses • Memory requirement?
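  A minimal Python sketch of reuse-distance measurement follows. It uses a naive LRU-stack simulation, which costs O(M*N); the tree keyed by time of last access mentioned on the slide is what brings this down to O(M log N).

    from collections import Counter

    # Naive reuse-distance histogram: for each access, the distance is the
    # number of distinct addresses touched since the previous access to the
    # same address ("inf" for a first, cold access).
    def reuse_distance_histogram(trace):
        stack = []            # addresses ordered most-recently-used first
        hist = Counter()
        for addr in trace:
            if addr in stack:
                hist[stack.index(addr)] += 1
                stack.remove(addr)
            else:
                hist["inf"] += 1
            stack.insert(0, addr)
        return hist

    # Example: 3 cold accesses, then 3 accesses with reuse distance 2.
    print(reuse_distance_histogram([0, 1, 2, 0, 1, 2]))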

  8. Modeling Memory Access • Need to model the full histograms, not just the average reuse distance • Use the same one-parameter-at-a-time fitting approach as before

  9. Modeling Memory Access (cont.) • How many bins to use? • Different problem sizes produce different histogram bins, so they must be normalized to a common set • Accuracy and complexity both increase with the bin count • For each bin, model: • number of accesses vs. problem size • average reuse distance vs. problem size (see the sketch below)
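  One possible normalization is sketched below: collapse each run's histogram into B groups of entries and keep, per group, the total access count and the mean reuse distance, which are the two per-bin quantities to model against problem size. The equal-size grouping is an assumption for illustration, not necessarily the normalization actually used.

    import numpy as np

    def histogram_to_bins(distances, counts, B):
        """distances, counts: parallel arrays describing one run's histogram."""
        order = np.argsort(distances)
        d = np.asarray(distances, dtype=float)[order]
        c = np.asarray(counts, dtype=float)[order]
        bins = []
        for idx in np.array_split(np.arange(len(d)), B):
            total = c[idx].sum()
            mean = (d[idx] * c[idx]).sum() / total if total else 0.0
            bins.append((total, mean))  # (number of accesses, avg. reuse distance)
        return bins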

  10. Map Model to Target Architecture • Combine the CFGs and path-frequency information gathered so far • Translate the code to generic RISC instructions • A generic scheduler, initialized with a machine description, predicts the runtime of this generic RISC code (the difficult part; a simplified sketch follows below) • At this stage, assume all memory references hit in cache
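  As a much-simplified sketch of what “initialized with a machine description” means, the latency table and instruction classes below are invented placeholders; the real scheduler also models issue width, dependences, and pipeline overlap rather than just summing latencies.

    # Hypothetical machine description: latency (cycles) per generic
    # instruction class on the target architecture.
    MACHINE_DESC = {
        "int_alu": 1,
        "fp_add": 3,
        "fp_mul": 4,
        "load": 2,      # every reference assumed to hit in cache at this stage
        "store": 1,
        "branch": 1,
    }

    def block_cycles(instr_mix, machine=MACHINE_DESC):
        """instr_mix: {instruction_class: count} for one basic block."""
        return sum(machine[cls] * n for cls, n in instr_mix.items())

    def predict_cycles(blocks):
        """blocks: list of (execution_frequency, instr_mix) pairs."""
        return sum(freq * block_cycles(mix) for freq, mix in blocks)

    # Example: one block executed 1000 times with 4 integer ops, 2 loads, 1 branch.
    print(predict_cycles([(1000, {"int_alu": 4, "load": 2, "branch": 1})]))  # 9000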

  11. Adding the Memory Model • Use reuse distances to estimate cache misses (references whose reuse distance is too large for the cache) and add the corresponding miss penalties (see the sketch below) • The scheme assumes a fully associative cache, so it does not account for conflict misses
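  A hedged sketch of the miss estimate: with a fully associative LRU cache of a given number of blocks, any reference whose reuse distance reaches the cache size (plus every cold reference) counts as a miss. The miss-penalty value below is a placeholder assumption.

    # Estimate misses and added cycles from a reuse-distance histogram for a
    # fully associative LRU cache; conflict misses are not modeled.
    def estimate_miss_cycles(hist, cache_blocks, miss_penalty=100):
        """hist: {reuse_distance or "inf": access_count}."""
        misses = sum(count for dist, count in hist.items()
                     if dist == "inf" or dist >= cache_blocks)
        return misses, misses * miss_penalty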

  12. Predicting Performance

  13. The Gist – again… • Performance modeling is important (predicting hypothetical problem sizes on hypothetical machines) • Build a model that will: • Create an execution graph • Model edge frequencies • Model node weights • Remaining difficulties: • How to understand the effect of each input parameter • Dependencies and instruction scheduling are difficult to predict • Different compilers can cause performance variations • The cost of gathering reuse distances is very high; is it feasible for big applications? • Conflict misses are not modeled • The penalty of a cache miss is not clear
