So you think that you want to model something? Blaine Gaither, blaine@acm.org
Outline Goal: • Understand the trade-offs involved in benchmarking, simulation, and analytical modeling Outline • Problem definition • It takes two models to tango • Model level of detail • Workload characterization • Benchmarking • Model validation • Queueing-based (analytic) models • Simulation models
Problem definition • What are the questions that you really want answered? • Refine • What specific information is needed to influence the decision? • What level of confidence is needed to influence others to adopt your recommendations? • When must the decision be made? • What are the time and money constraints for this study?
Problem Definition Performance Evaluation is a Decision-Making Process • Recognition of need • Problem formulation/identification • Model building • Data collection, model parameterization • Model solution • Benchmarking • Analytic • Simulation • Model validation and analysis • Results interpretation • Iteration • Decision making
It takes two models to tango The system model: • The most accurate model of a system is the system itself • Working with the real system is often impractical. Reasons why? • Models abstract the real system: analytic, simulation, hybrid The workload model: • A benchmark is a model of a workload • The most accurate model of the workload is the workload itself • Working with the actual workload is often impractical. Reasons why? • Workload characterization helps abstract the important workload characteristics • Benchmarks are sometimes used to model the workload
Level of Detail Risk of “going Rainman” (Jay Veazey) • Do you really need to model every detail? • Avoid model parameters that cannot be accurately measured • We need to find the right level of abstraction: identify the key characteristics of the: • Workload — for OLTP it might include IO rate, instructions executed per transaction, and lock contention rate, … • System — for OLTP it might be service rates, latencies, and the ability to handle lock contention • The rest is just a distraction
Benchmarking types and pitfalls • Real application? • Includes all I/O? • Real inputs? • Repeatability? • Can you scale the inputs? • Real hardware? • Kernel program? • May exclude I/O, loops, subroutines, … • E.g., SPEC CPU • Benchmark program? • Scaled-down application? • Does it still exercise scaling bottlenecks? • Synthetic behavior? • E.g., TPC-C uses: • Real database code (Oracle, SQL) • Synthetic schema and data (models a hypothetical warehouse) • Synthetic workload (models users)
Workload characterization Measure real systems to collect: • Workload parameters for your model • Critical aspects of the workload for making the decision • Examples: • Transaction types and rates • Number of users • IO rate • IO block size • Instructions executed per transaction, … • Remember we may need to scale the workload up or down for specific model scenarios • Measurement of operational variables is preferred • Variables that can be established beyond doubt by measurement are called operationally testable • GIGO: garbage in, garbage out • Data to help validate your model • Throughput • Response times • Utilizations • Queue lengths, …
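The reduction from raw measurements to per-transaction model parameters can be sketched as follows; the struct fields, function names, and all numbers are illustrative assumptions, not output from any particular measurement tool:

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical raw counters collected over one measurement window.
struct Measurement {
    double interval_s;    // length of the window (seconds)
    double transactions;  // transactions completed in the window
    double instructions;  // instructions retired in the window
    double io_ops;        // I/O operations issued in the window
};

// The per-transaction intensities a model actually consumes.
struct WorkloadParams {
    double txn_rate;      // transactions per second
    double instr_per_txn; // path length: instructions per transaction
    double io_per_txn;    // I/O operations per transaction
};

// Reduce raw counters to per-transaction parameters.
WorkloadParams characterize(const Measurement& m) {
    return { m.transactions / m.interval_s,
             m.instructions / m.transactions,
             m.io_ops / m.transactions };
}

// Scale the arrival intensity for a what-if scenario (e.g., 2x users)
// while holding the per-transaction intensities fixed.
WorkloadParams scale(WorkloadParams p, double factor) {
    p.txn_rate *= factor;
    return p;
}
```

For example, 12,000 transactions, 6×10⁹ instructions, and 48,000 I/Os in a 60-second window reduce to 200 txn/s, 500,000 instructions per transaction, and 4 I/Os per transaction.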
Validation • Don’t just look at the predicted performance metric • Compare known (validation) cases for: • proper queue lengths, • number of visits, and • utilizations on as many components as you can • Understand deviations
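One way to cross-check a validation case is against the operational laws (Utilization Law U = X·S, Little’s Law N = X·R), which hold without distributional assumptions. A minimal sketch, with hypothetical tolerances and values:

```cpp
#include <cassert>
#include <cmath>

// Utilization Law: U = X * S (throughput times mean service time).
double utilization_law(double throughput, double mean_service_time) {
    return throughput * mean_service_time;
}

// Little's Law: N = X * R (throughput times mean response time).
double littles_law(double throughput, double mean_response_time) {
    return throughput * mean_response_time;
}

// Flag a measured value that deviates from the law-predicted value by
// more than a chosen relative tolerance (5% here, an arbitrary choice).
bool consistent(double measured, double predicted, double tol = 0.05) {
    return std::fabs(measured - predicted) <= tol * predicted;
}
```

For instance, a device serving 50 requests/s at 15 ms mean service time should show U ≈ 0.75; a measured utilization of 0.90 against that prediction signals a modeling or measurement problem worth understanding before trusting the model.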
Validation • Never trust model results until validated • Are the results reasonable? • Sources of error • Wrong workload • Poor workload characterization • Missed a key aspect of the workload • Measurement error • Improperly scaled the workload for the new situation • Benchmarking • Instrumentation can perturb the system (measurement artifact) • Not really the system we want to measure! • Analytical model • (Symptom) Improper queue lengths on validation cases • Not enough detail, or there are software blockages • Simulation • Programming errors? • Too much detail • A detailed model requires more workload assumptions, which are subject to error • Are the random numbers really random? • Untested corner cases? • High-value decisions may merit cross-checking between more than one approach
Queueing-based models • Open queueing networks • Acyclic network of queues • Uses Markovian models: M/M/n, M/G/1, … • Closed queueing networks • Mean Value Analysis* *P. J. Denning and J. P. Buzen, Computing Surveys, Vol 10, No. 3, September 1978 http://cs.gmu.edu/cne/pjd/PUBS/csurv-opanal.pdf
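For reference, the standard M/M/1 closed-form results used throughout such models can be written down directly (these are textbook formulas; the struct layout is just one convenient packaging):

```cpp
#include <cassert>
#include <cmath>

// Steady-state M/M/1 results, valid for lambda < mu.
struct MM1 {
    double lambda; // mean arrival rate (jobs/s)
    double mu;     // mean service rate (jobs/s)

    double rho() const { return lambda / mu; }              // utilization
    double mean_jobs() const {                              // N = rho/(1-rho)
        return rho() / (1.0 - rho());
    }
    double mean_response() const {                          // R = 1/(mu-lambda)
        return 1.0 / (mu - lambda);
    }
    double mean_wait() const {                              // Wq = rho/(mu-lambda)
        return rho() / (mu - lambda);
    }
};
```

For example, arrivals at 8/s served at 10/s give ρ = 0.8, a mean population of 4 jobs, a mean response time of 0.5 s, and a mean queueing delay of 0.4 s — note how sharply delay grows as ρ approaches 1.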
Example of an Open Queueing Network Approach Environment: • Limited resources and short time-frames • External chip-sets and CPUs • Never know enough detail, soon enough • No time to make decisions based upon detailed simulation • Concentrate on an accurate understanding of workloads: TPC-C, TPC-E, TPC-H, … Three components: • Characterize memory/interconnect workloads and path length • Parameters include: memory size, cache sizes, coherency/cache/TLB behavior, instructions per transaction, memory accesses per transaction, … • Use CPI models to model CPU core throughput • Modeled at the Xbar interface • Parameters include: CPU GHz, database size, path lengths, kernel vs. user, … • Use queueing models to model the northbridge and chip-sets • Parameters include: • Memory organization and speed • Link speed and configuration, … • Coherency protocol design
CPU Core Throughput = F(memory latency, cache size, memory size, path length, …) • Calculate cycles per instruction (CPI) at the given memory latency • Determine throughput as a function of CPI, clock frequency, and instruction path length per transaction
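The two steps above can be sketched as a small helper. The additive CPI model (base CPI plus memory stall cycles per instruction) and every parameter value below are illustrative assumptions, not a specific processor's behavior:

```cpp
#include <cassert>
#include <cmath>

// Simple additive CPI model: core CPI plus memory stall cycles.
double cpi(double base_cpi, double misses_per_instr,
           double mem_latency_cycles) {
    return base_cpi + misses_per_instr * mem_latency_cycles;
}

// Core throughput in transactions/s from clock rate, path length, and CPI.
double throughput_tps(double clock_hz, double instr_per_txn,
                      double base_cpi, double misses_per_instr,
                      double mem_latency_cycles) {
    double cycles_per_txn =
        instr_per_txn * cpi(base_cpi, misses_per_instr, mem_latency_cycles);
    return clock_hz / cycles_per_txn;
}
```

For example, a 2 GHz core with base CPI 1.0, 0.01 memory accesses per instruction at 200 cycles latency, and 10⁶ instructions per transaction gives CPI = 3.0 and about 667 transactions/s per core.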
Chip-set Latency = F(demand, …) • Determine the number of visits to each resource, and resource utilizations, for the load • Then sum the service times and queueing delays that contribute to memory latency
Solve Balance Equations • The solution point is where the CPU throughput and chip-set latency curves balance • Typical accuracy: ±2-3%
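Solving the balance equations can be sketched as a fixed-point iteration: core throughput sets memory demand, memory queueing sets latency, and latency feeds back into CPI until the two curves agree. Every constant below is hypothetical, and the M/M/1 latency model is one simple choice:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Iterate throughput <-> memory latency to their balance point.
double solve_throughput() {
    const double clock_hz         = 2.0e9;  // 2 GHz core (assumed)
    const double instr_per_txn    = 1.0e6;  // path length (assumed)
    const double base_cpi         = 1.0;    // CPI with zero-latency memory
    const double misses_per_instr = 0.01;   // memory accesses per instruction
    const double mem_service_s    = 60e-9;  // 60 ns unloaded memory service
    const double accesses_per_txn = misses_per_instr * instr_per_txn;

    double latency_s = mem_service_s;       // start with an unloaded memory
    double x = 0.0;                         // throughput estimate (txn/s)
    for (int i = 0; i < 200; ++i) {
        // CPI model: base CPI plus memory stall cycles per instruction.
        double c = base_cpi + misses_per_instr * latency_s * clock_hz;
        double x_new = clock_hz / (instr_per_txn * c);
        // M/M/1 memory latency at the utilization this throughput implies.
        double u = std::min(x_new * accesses_per_txn * mem_service_s, 0.99);
        latency_s = mem_service_s / (1.0 - u);
        if (std::fabs(x_new - x) < 1e-9) return x_new; // converged
        x = x_new;
    }
    return x;
}
```

With these numbers the iteration settles at about 667 transactions/s, with the memory at 40% utilization and a loaded latency of 100 ns: the graphical “solution point” found numerically.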
Simulation models • Types of simulations for use in capacity planning • Transaction-oriented • Modeled from the point of view of the transaction visiting servers • Process-oriented • Modeled from the point of view of either transactions or servers, or both • Workload source • Trace-driven — perhaps traces of real system activity • Stochastic — uses random number generators • Statistical tools can be used to: • Reduce the simulation time • Compute confidence intervals • Determine whether a change made to a system has a statistically significant impact on performance
CSIM example: M/M/1 queue

```cpp
void sim()                        // simulation driver process
{
    create("sim");
    fp = fopen("mm1.out", "w");
    set_output_file(fp);
    q = new facility("q");               // construct the service facility
    resp_time = new box("resp time");    // response-time statistics collector
    while (clock < simTime) {
        customer();                      // spawn a customer process
        hold(exponential(interArrival)); // wait for the next arrival
    }
    report();                            // print the CSIM report
}

void customer()                   // customer process
{
    TIME t1;
    create("customer");                  // make this a CSIM process
    t1 = resp_time->enter();             // start response-time measurement
    q->use(exponential(serviceTime));    // queue for and use the facility
    resp_time->exit(t1);                 // stop response-time measurement
}
```

Analytic check: the mean time spent waiting in the queue is s·r/(1 − r), where r is the utilization and s the mean service time (the mean response time, including service, is s/(1 − r)).
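Since the CSIM library is not assumed available here, the same M/M/1 waiting-time result can be checked with a standalone simulation using the Lindley recursion W(n+1) = max(0, W(n) + S(n) − A(n+1)); all parameter values are illustrative:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <random>

// Estimate the mean queueing (waiting) time of an M/M/1 queue by
// simulating n successive customers with the Lindley recursion.
double simulate_mean_wait(double arrival_rate, double mean_service_time,
                          long n, unsigned seed) {
    std::mt19937_64 rng(seed);
    std::exponential_distribution<double> interarrival(arrival_rate);
    std::exponential_distribution<double> service(1.0 / mean_service_time);
    double w = 0.0, total = 0.0;
    for (long i = 0; i < n; ++i) {
        total += w;
        // Next customer waits for whatever work is left when it arrives.
        w = std::max(0.0, w + service(rng) - interarrival(rng));
    }
    return total / n;
}
```

At arrival rate 0.5/s and mean service time 1 s (so r = 0.5), the analytic waiting time is s·r/(1 − r) = 1.0 s, and a long simulation run should land close to that value.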
Texts by these authors are great • The Art of Computer Systems Performance Analysis: Techniques for Experimental Design, Measurement, Simulation, and Modeling, by R. K. Jain • The Practical Performance Analyst, by Neil Gunther • Performance by Design: Computer Capacity Planning by Example, by Daniel A. Menasce, Lawrence W. Dowdy, and Virgilio A. F. Almeida • Fundamentals of Performance Modeling, by Michael K. Molloy • Getting Started: CSIM Simulation Engine (C++ Version) • Herb Schwetman, “CSIM19: A Powerful Tool for Building System Models,” Proceedings of the 2001 Winter Simulation Conference, B. A. Peters, J. S. Smith, D. J. Medeiros, and M. W. Rohrer, eds.