Simulating a $2M Commercial Server on a $2K PC
Alaa R. Alameldeen, Milo M.K. Martin, Carl J. Mauer, Kevin E. Moore, Min Xu, Daniel J. Sorin, Mark D. Hill and David A. Wood
IEEE Computer – November 22, 2002

Commercial Workloads
• Business and Communication Infrastructure
  • DBMS
  • Web Servers
• Designed to run on High End Servers
  • TPC-C leader
  • 128 Processors, 256 GB RAM, 29 TB Disk, $13M
  • 100M Transactions in 25 min warm-up + 2 h
• Simulation on Standard PC
  • 1-2 Processors, 1 GB RAM, 120 GB Disk, $2K

Simulation of Commercial Workloads
• Challenges
  • Size of Workload
  • Running Time
  • Requires Full System Simulation
  • Highly dependent on OS, I/O
• Goals
  • Representative Approximation
  • Tractable Simulation Times
  • Sufficient Level of Detail

Wisconsin Commercial Workload Suite
• Online Transaction Processing (OLTP)
  • DB2 with TPC-C-like workload
• SPECjbb
  • 3-tier Java-based Middleware
• Static Web Content: Apache
  • SURGE-generated requests
• Dynamic Web Content Serving: Slashcode
  • Open-source dynamic web message posting
  • Perl, Apache, MySQL

Workload Scaling and Tuning
• Tuning of all workloads on a real MP server
  • TPC-C on Sun E5000
  • 12 CPUs (167 MHz), 2 GB RAM
• Disk images of the real system used for simulation
  • Allows Validation of Results
  • Faster Benchmark Setup
• Initial Setup: 10 Warehouses, 100 MB each
  • Lower Throughput than expected

TPC-C Tuning
• Kernel and Database Configuration
  • Kernel limits on number of threads, semaphores, etc.
  • DB on raw disk
• Multiple Disks
  • DB spread over 5 disks
• Table Contention Reduction
  • More and smaller warehouses
  • Same total DB size
  • Against TPC-C rule
  • Size per warehouse is fixed

TPC-C Tuning
• Additional Concurrency
  • Number of simulated clients increased from 24 to 96
  • No think or keying times
• Overall
  • Throughput increased by a factor of 12
  • Close to published results
  • More representative of real OLTP workload

Workload Runtime
• Simulation slowdown of around a factor of 24,000
• Full TPC-C run (2 h real time) infeasible
• Long warm-up periods
• Short simulations introduce high variability

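
The infeasibility claim can be checked by multiplying out the figures quoted on the slides. This is a back-of-the-envelope sketch, assuming the slowdown of 24,000 applies uniformly to the 25 min warm-up plus the 2 h measurement interval; the function name is illustrative only.

```python
def simulated_wall_clock_hours(real_minutes: float, slowdown: float) -> float:
    """Host hours needed to simulate `real_minutes` of target-machine time."""
    return real_minutes * slowdown / 60.0


SLOWDOWN = 24_000        # slowdown factor quoted on the slide
RUN_MINUTES = 25 + 120   # 25 min warm-up + 2 h measurement interval

hours = simulated_wall_clock_hours(RUN_MINUTES, SLOWDOWN)
years = hours / (24 * 365)
print(f"{hours:,.0f} host hours, about {years:.1f} years")
# prints "58,000 host hours, about 6.6 years"
```

Under these assumptions a single full-length run would occupy the host PC for several years, which is why the deck turns to warm snapshots and short runs.
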
Simulation Improvements
• Starting with Warm Workloads
  • Start from snapshot of warmed-up system
• Fixed Transaction Count
  • Simulate fixed number of transactions
  • Applications must notify the simulator when transactions complete

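
The fixed-transaction-count idea can be illustrated with a toy loop. This is a minimal sketch, not the authors' mechanism: `FixedTransactionRun`, `notify_transaction_complete` and the per-transaction cycle costs are all hypothetical stand-ins for a simulator that resumes from a warm snapshot and stops once the workload has reported a fixed number of completed transactions.

```python
from dataclasses import dataclass


@dataclass
class FixedTransactionRun:
    """Toy stand-in for a simulation that ends after N notified transactions."""
    target_transactions: int
    completed: int = 0
    cycles: int = 0

    def advance(self, cycles: int) -> None:
        # Account for simulated time spent executing the workload.
        self.cycles += cycles

    def notify_transaction_complete(self) -> None:
        # Hypothetical hook: the application calls this when a transaction completes.
        self.completed += 1

    @property
    def done(self) -> bool:
        return self.completed >= self.target_transactions

    def cycles_per_transaction(self) -> float:
        return self.cycles / self.completed


def run_from_snapshot(transaction_costs, target):
    """Replay made-up per-transaction cycle costs until `target` transactions finish."""
    run = FixedTransactionRun(target_transactions=target)
    for cost in transaction_costs:
        if run.done:
            break
        run.advance(cost)
        run.notify_transaction_complete()
    return run


run = run_from_snapshot([1200, 900, 1500, 1100, 1300], target=4)
print(run.completed, run.cycles, run.cycles_per_transaction())
# prints "4 4700 1175.0"
```

Measuring cycles per transaction over a fixed amount of work, rather than work done in a fixed amount of time, keeps runs on different configurations comparable.
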
Variability
• Simulation executes 1 deterministic path
  • Path could favor certain configurations
• Average over multiple short simulation runs
  • Introduce artificial variability in memory access times
  • Can run multiple short simulations in parallel
  • Preferable to one long simulation run

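
Averaging over perturbed runs might look like the sketch below. It is a toy model under stated assumptions: `simulate_once` is a hypothetical placeholder for one short simulation, and the base latency, jitter range and miss count are invented numbers, not values from the paper. Each seed gives a different small perturbation to every memory access, and the runs are then summarized by mean and standard deviation.

```python
import random
import statistics


def simulate_once(seed: int, base_latency: int = 100,
                  max_jitter: int = 2, misses: int = 10_000) -> int:
    """Toy run: total memory stall cycles, with a small random delay added per miss."""
    rng = random.Random(seed)  # a separate generator per run keeps each run reproducible
    return sum(base_latency + rng.randint(0, max_jitter) for _ in range(misses))


def summarize(samples):
    """Return (mean, sample standard deviation) over a set of runs."""
    return statistics.mean(samples), statistics.stdev(samples)


# Ten independent short runs; in practice these could execute in parallel.
runs = [simulate_once(seed) for seed in range(10)]
mean, stdev = summarize(runs)
print(f"mean={mean:.0f} cycles, stdev={stdev:.0f} cycles over {len(runs)} runs")
```

Because each run depends only on its seed, the runs are independent and can be farmed out to several inexpensive PCs at once.
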
Timing Simulation
• Complex for full system simulation
• Functional Simulation with Simics
• Timing Simulation with 2 additional Sims
  • CPU Timing
  • Memory Timing
• Timing-First Simulation
  • Timing Simulator
  • Controls when functional simulator can advance
  • Solves races
  • Validates functional simulator
  • Average Timing Error < 0.001%

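
The control flow of timing-first simulation can be sketched with a toy register machine. This is an illustration under assumptions, not the Simics interface or the authors' code: the two-opcode instruction set, the cycle costs, and the rule that the timing model reloads state from the functional model on a mismatch are all chosen here for demonstration. The timing model executes each instruction first, then lets the functional model advance by one instruction and compares the results.

```python
def functional_execute(regs, instr):
    """Complete, trusted execution of one instruction (the functional simulator's role)."""
    op, dst, a, b = instr
    if op == "add":
        regs[dst] = regs[a] + regs[b]
    elif op == "mul":
        regs[dst] = regs[a] * regs[b]
    else:
        raise ValueError(f"unknown opcode {op}")


def timing_execute(regs, instr):
    """Timing model's own execution; it deliberately omits 'mul'. Returns cycles charged."""
    op, dst, a, b = instr
    if op == "add":
        regs[dst] = regs[a] + regs[b]
        return 1
    return 4  # unimplemented opcode: charge a latency but leave the state stale


def run_timing_first(program, initial_regs):
    timing_regs = dict(initial_regs)
    functional_regs = dict(initial_regs)
    cycles = deviations = 0
    for instr in program:
        cycles += timing_execute(timing_regs, instr)   # timing model goes first
        functional_execute(functional_regs, instr)     # functional model then advances one step
        if timing_regs != functional_regs:             # compare architected state
            deviations += 1
            timing_regs = dict(functional_regs)        # recover from the functional model
    return timing_regs, cycles, deviations


program = [("add", "r2", "r0", "r1"), ("mul", "r3", "r2", "r2"), ("add", "r4", "r3", "r0")]
regs, cycles, deviations = run_timing_first(
    program, {"r0": 1, "r1": 2, "r2": 0, "r3": 0, "r4": 0})
print(regs, cycles, deviations)
# prints "{'r0': 1, 'r1': 2, 'r2': 3, 'r3': 9, 'r4': 10} 6 1"
```

In this toy the timing model never has to be functionally complete: an instruction it mishandles costs one recorded deviation, after which execution continues from correct state.
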
Conclusion
• Commercial Workloads are essential for MP design
  • Biggest Market for MP systems
• Simulation on low-cost PC is hard
• Wisconsin Commercial Workload Suite approximates behaviour

Questions
• If TPC-C has to run 2 h for official results, how reliable is an average over a couple of seconds?
• Should disk timing be simulated?