
Analytical Evaluation of Shared-Memory Systems with Commercial Workloads


Presentation Transcript


  1. Analytical Evaluation of Shared-Memory Systems with Commercial Workloads Jichuan Chang <chang@cs.wisc.edu> CS747

  2. Outline • A Case for Analytical Models • Existing Models and Their Limitations • What Kind of Tools do We Need

  3. Background • Shared-memory Multiprocessor Servers • Important: the computing infrastructure of our society • Complex systems (ILP processors + caches + interconnect) • Commercial workloads • Important: 80% of the server market, supporting our daily business • Behave differently from scientific workloads • Large code size and data set, different cache behavior • Lots of OS interaction (context switches), higher I/O rate • Hard to study (complex, hard to set up, no source code, moving target)

  4. A Motivating Example • Bob is designing a next-generation multiprocessor server for commercial workloads. Assume that the largest benchmark he can set up now is a 10G database. • How can Bob predict the performance (IPC or tpm) of running a TPC-D benchmark with a 100G database on the future machine? • What is the ideal cache hierarchy design for this workload, given his predictions of future technology constants? • We need tools to characterize the workloads! • We need tools to prune the vast design space!

  5. Performance Evaluation Tools • Hardware Monitors, Binary Instrumentation Tools + Realistic, dynamic information - Only work for existing systems, aggregated info • Program Analysis Tools (e.g., compilers) + Can do global analysis, work well for arrays/loops - Little dynamic info, not good for irregular (pointer-based) programs, need source code • (Full-System) Architecture Simulators + Detailed simulation, realistic results, can simulate future HW - Slow (can’t extrapolate), complex, can’t simulate future SW • Analytical Models + Fast, give insight, can predict for future SW/HW combinations + Need to create models of multiprocessors with new workloads

  6. Sorin et al. MVA for ILP Multiprocessors • Application input parameters • τ, CV, fM, fsync-write, Pread, Pwrite, … • Iterate between 2 submodels • SB (fraction of time the CPU stalls due to synchronization operations) • MB (fraction of time the CPU stalls due to limited MSHR size) • Surrogate service time inflation • [Figure: ILP processor with L1$, L2$, and MSHR, issuing requests (when the MSHR is not full) to the rest of the system: bus, NI, switches, DRAM, directories]
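For readers unfamiliar with mean value analysis, the following is a minimal sketch of the textbook exact MVA recursion for a closed, single-class queueing network. It is not the customized model of Sorin et al., which is approximate and adds the SB and MB submodels listed on the slide. The function name `exact_mva` and its parameters are hypothetical, chosen only for illustration: processors play the role of customers, the think time stands for the mean time a processor computes between memory requests, and each demand is the mean service time at one shared resource (bus, memory, directory).

```python
def exact_mva(num_customers, think_time, demands):
    """Textbook exact MVA for a closed, single-class queueing network.

    num_customers: number of circulating customers (here: processors)
    think_time:    mean time a customer spends outside all queues
    demands:       mean service demand at each queueing center
    Returns (throughput, total residence time, mean queue lengths).
    """
    queue = [0.0] * len(demands)          # mean queue lengths with 0 customers
    throughput, response = 0.0, 0.0
    for n in range(1, num_customers + 1):
        # Arrival theorem: an arriving customer sees the queue lengths
        # of the same network with one customer fewer.
        residence = [d * (1.0 + q) for d, q in zip(demands, queue)]
        response = sum(residence)
        # Little's law applied to the whole network.
        throughput = n / (think_time + response)
        # Little's law applied to each center.
        queue = [throughput * r for r in residence]
    return throughput, response, queue


if __name__ == "__main__":
    # Illustrative numbers only: 4 processors, 20 time units between
    # requests, one bus (demand 2) and one memory module (demand 5).
    x, r, q = exact_mva(4, 20.0, [2.0, 5.0])
    print(f"throughput={x:.4f} residence={r:.3f} queues={q}")
```

The recursion costs O(N·K) for N customers and K centers, which is why such models can sweep a large design space far faster than simulation.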

  7. Sorin et al. MVA Model + Targets system design, answering questions about + MSHR size, directory organization, NI latencies, etc. + Insight into application behavior + Miss rate (τ), burstiness (CV), degree of parallelism (fM) - Some application parameters (τ, fM, fsync-write) depend on architectural parameters • Most parameters are insensitive to changes outside the CPU/cache • Need input parameters for each CPU/cache configuration • Caches also interact with the system design (e.g., update protocols) - Fixed problem size; does not characterize the workload • Can we break the processor/cache black box into two submodels, one for the processor and one for the cache? • What would the application input parameters be?

  8. Cache Models (1) • Stack distance model • Estimates capacity misses from a single access trace • Works for inclusive, fully associative caches • Has extensions for direct-mapped and set-associative caches • Example: a typical access trace, A B B A C A
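The stack distance idea can be sketched in a few lines, using the slide's trace A B B A C A. This is a minimal illustration assuming LRU replacement and a fully associative cache; the function names `stack_distances` and `misses` are hypothetical. A reference hits in a cache of C blocks exactly when its stack distance is at most C, so one pass over the trace yields miss counts for every cache size.

```python
def stack_distances(trace):
    """Return the LRU stack distance of each access (None on first touch)."""
    stack = []        # least recently used first, most recently used last
    distances = []
    for block in trace:
        if block in stack:
            pos = stack.index(block)
            # Depth of the block counted from the top of the LRU stack:
            # distinct blocks touched since its last access, plus one.
            distances.append(len(stack) - pos)
            stack.pop(pos)
        else:
            distances.append(None)   # cold (compulsory) miss
        stack.append(block)          # block becomes most recently used
    return distances


def misses(trace, capacity):
    """Misses in a fully associative LRU cache holding `capacity` blocks."""
    return sum(1 for d in stack_distances(trace)
               if d is None or d > capacity)


if __name__ == "__main__":
    trace = "ABBACA"
    print(stack_distances(trace))    # [None, None, 1, 2, None, 2]
    for size in (1, 2, 3):
        print(size, misses(trace, size))
```

For this trace a one-block cache misses 5 times and a two-block cache 3 times; the 3 remaining misses are first touches that no capacity can remove.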

  9. Cache Models (2) • Agarwal et al. 1989 • Models cache block size, working-set transitions, conflict misses, and multiprogramming interference • Data Reference Model (Tsai/Agarwal 1993) • Configuration-independent model for multiprocessors • Problem size, number of processors, and block size as parameters • Models the sharing pattern of each shared block • Assumes a certain data distribution for data-dependent applications (e.g., parallel quicksort) • Limitations: simple, iterative programs; well-known algorithms; no significant synchronization

  10. Cache Models (3) • Mathematical Cache Miss Equations • Compiler-generated equations for loop-based array accesses • Model reuse along array dimensions with a “reuse vector” • Extended to model pointer data structures • Singly linked lists and binary trees on a uniprocessor • Must understand the malloc() implementation • Ultimate aim is to model B-trees for databases

  11. Architects’ Workload Characterization • Observe for different configurations • Busy/stall time breakdown • Kernel/user time breakdown • Miss breakdown (4C) • Last-touch prediction • Observe for different problem sizes • Working set and working-set transitions • Sharing degree (producer-consumer, migratory)

  12. What Tools do We Need • Application models for commercial workloads • What to model? (working set, sharing, communication, etc.) • Include problem size as an input parameter • Configuration independent (or less dependent) • Algorithm-based (needs source code) • Or observation-based (from simulations) • Architectural Models • Separate processor core and caches • Separate CPU and the rest of the system [Sorin et al.] • Model vs. Simulation • Analytical models to simplify simulator design [CAECW 01] • Simulators to ease the acquisition of model parameters

  13. Configuration Independent Analysis • What to characterize? [Abandah/Davidson] • General characteristics • Working set (access age, footprint) • Concurrency (serial / imbalance / contention / busy) • Communication pattern (sharing degree / invalidation degree) • Communication phases and locality, sharing behavior • Possible parameters for workload characterization • An example: working-set sizes in DSS systems • Application parameters (for each node i in the query plan) • Ni = number of tuples in a scan; Hi = probability that a tuple matches • QD = depth of the query tree • DB_REi = fraction of a relation accessed • Model the reuse after working-set transitions (instructions, private, meta-data, index, tuple-locks, tuples)

  14. A (simplistic?) Model for TPC-C • Uses the stack distance curve to derive miss rates • L1 cache accesses assumed to overlap completely with execution • M/G/1 queue models bus/memory contention • Things not modeled • Query algorithms • Communication misses • Overlap between computation and memory access • The paper reports <10% error [Zhang et al. 99]
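The M/G/1 step can be illustrated with the standard Pollaczek-Khinchine mean-value formula. This is a generic sketch of that formula, not the specific model in Zhang et al.; the function name `mg1_response_time` and the example numbers are hypothetical. With arrival rate λ, mean service time S, service-time coefficient of variation CV, and utilization ρ = λS, the mean queueing delay is ρ·S·(1 + CV²) / (2(1 − ρ)).

```python
def mg1_response_time(arrival_rate, mean_service, cv_service):
    """Mean response time (wait + service) of an M/G/1 queue.

    arrival_rate: Poisson arrival rate of requests (e.g., misses per cycle)
    mean_service: mean service time of the bus/memory (e.g., cycles)
    cv_service:   coefficient of variation of the service time
    """
    rho = arrival_rate * mean_service            # utilization
    if not 0 <= rho < 1:
        raise ValueError("unstable queue: utilization must be below 1")
    # Pollaczek-Khinchine formula for the mean waiting time in queue.
    wait = rho * mean_service * (1 + cv_service ** 2) / (2 * (1 - rho))
    return wait + mean_service


if __name__ == "__main__":
    # Illustrative numbers only: 0.01 misses per cycle reaching a bus
    # with a deterministic 40-cycle service time (CV = 0).
    print(mg1_response_time(0.01, 40.0, 0.0))
```

In a model of this shape, the arrival rate would come from the miss rates read off the stack distance curve, and the resulting response time would serve as the effective miss penalty.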

  15. Conclusion • Analytical models are needed to • Characterize commercial workloads • Predict their performance on multiprocessors • We need models that • Perform configuration independent analysis • Can use the output from workload models

  16. Thank You! Questions?

  17. Backup Slides • References • Acknowledgement

  18. References • Cache Models • An Analytical Cache Model, Agarwal et al., ACM Transactions on Computer Systems, 1989 • Analyzing Multiprocessor Cache Behavior Through Data Reference Modeling, Tsai and Agarwal, SIGMETRICS 1993 • An Analytical Model for Designing Memory Hierarchies, Jacob et al., IEEE Transactions on Computers, 1996 • Cache Miss Equations: A Compiler Framework for Analyzing and Tuning Memory Behavior, Ghosh et al., ACM Transactions on Programming Languages and Systems, 1999 • A Mathematical Cache Miss Analysis for Pointer Data Structures, Zhang and Martonosi, SIAM • Commercial Workloads Overview • Trends in Shared Memory Multiprocessing, Stenstrom et al., IEEE Computer, 1997 • Memory System Characterization of Commercial Workloads, Barroso et al., ISCA 1998

  19. References (cont.) • Configuration Independent Analysis • Configuration Independent Analysis for Characterizing Shared-memory Applications, Abandah and Davidson, UMich TR, 1997 • Shared Memory Multiprocessor Models • Analytical Evaluation of Shared-memory Systems with ILP Processors, Sorin et al., ISCA 1998 • A Customized MVA Model for Shared-memory Systems with Heterogeneous Applications, Sorin et al., UWisc TR, 2000 • Commercial Workload Specific Models • An Analytical Model of the Working-set Sizes in Decision-Support Systems, Karlsson et al., SIGMETRICS 2000 • Analysis of Commercial Workload on SMP Multiprocessors, Zhang et al., Proceedings of Performance 1999 • Evaluation of Commercial Workloads • A Processor Queueing Simulation Model for Multiprocessor System Performance Analysis, Tsuei and Yamamoto, CAECW 2001 • Evaluating the Non-determinism in Commercial Workloads, Multifacet group, CAECW 2001
