Memory System Characterization of Commercial Workloads
Authors:
• Luiz André Barroso (Google, DEC; worked on Piranha)
• Kourosh Gharachorloo (Compaq, DEC; worked on Dash and Flash)
• Edouard Bugnion (one of the original founders of VMware; also worked on SimOS)
Presented by: David Eitel, March 31, 2010
Types of Commercial Applications
• Online Transaction Processing (OLTP)
• Decision Support Systems (DSS)
• Web Index Search (WIS)
Source: S. Brin and L. Page. “The Anatomy of a Large-Scale Hypertextual Web Search Engine.”
Benchmarks
• Oracle Database Engine
• TPC-B Banking Benchmark for OLTP
• TPC-D Benchmark for DSS (read-only queries)
• AltaVista
Sources: http://georgiaconsortium.org/images/Banking-Coins.jpg, http://greencanada.files.wordpress.com/2009/04/databases.jpg, http://sixrevisions.com/web_design/popular-search-engines-in-the-90s-then-and-now/
Monitoring Results
[Figure: breakdown of the execution time. Source: Fig. 4]
Legend: Icache = instruction cache; Dcache = data cache; Scache = secondary cache; Bcache = board-level cache
Figure annotations: sum of single- and dual-issue cycles; pipeline and address translation related stalls; lots of Bcache misses; >75% mem stalls; big CPI!
• OLTP stresses the memory system far more than DSS or AltaVista (AV), as its much larger CPI shows.
• Low-latency access to the non-primary caches is important because the OLTP working set is very large.
• DSS has few cache misses; those that occur are misses on the large database tables.
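A breakdown like the one in Fig. 4 can be turned into a CPI and a memory-stall fraction with simple arithmetic. The sketch below is only an illustration: the component names and every cycle count are hypothetical, chosen to show the calculation, and are not the measurements from the paper.

```python
def cpi_breakdown(cycles_by_component, instructions, memory_components):
    """Return (total CPI, fraction of all cycles spent in memory stalls)."""
    total_cycles = sum(cycles_by_component.values())
    memory_cycles = sum(cycles_by_component[name] for name in memory_components)
    return total_cycles / instructions, memory_cycles / total_cycles


# Hypothetical cycle counts for 1000 instructions (NOT data from the paper).
hypothetical_cycles = {
    "issue": 600,         # single- and dual-issue cycles
    "pipeline_tlb": 400,  # pipeline and address translation stalls
    "scache_hit": 1000,   # primary-cache misses that hit in the Scache
    "bcache_hit": 1000,   # Scache misses that hit in the Bcache
    "bcache_miss": 2000,  # Bcache misses that go to memory
}
memory_parts = ("scache_hit", "bcache_hit", "bcache_miss")

total_cpi, memory_fraction = cpi_breakdown(hypothetical_cycles, 1000, memory_parts)
print(f"CPI = {total_cpi:.2f}, memory stall fraction = {memory_fraction:.0%}")
```

With these made-up inputs the memory components account for 4000 of 5000 cycles, which is the kind of ratio the ">75% mem stalls" annotation refers to.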
Simulation Results for OLTP
[Figure: simulated execution time as cache size and associativity vary, with data capacity/conflict misses. Source: Fig. 5]
Legend: INST = instruction execution; CACHE = stalls within cache hierarchy; MEM = memory system stalls
• Idle time increases with bigger caches.
• The I/O latency cannot be hidden with faster processing rates.
• Faster processing rates with a more efficient memory system = more commits ready for the log writer (I/O).
• OLTP benefits from larger Bcaches.
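The effect of cache size and associativity on capacity and conflict misses can be explored with a very small trace-driven model. This is a minimal sketch assuming LRU replacement and a synthetic address trace; it is not the simulator used in the paper, and the function name and all parameters are hypothetical.

```python
from collections import OrderedDict


def miss_rate(addresses, cache_bytes, line_bytes, ways):
    """Miss rate of a set-associative cache with LRU replacement."""
    num_sets = cache_bytes // (line_bytes * ways)
    sets = [OrderedDict() for _ in range(num_sets)]
    misses = 0
    for addr in addresses:
        line = addr // line_bytes
        lru = sets[line % num_sets]
        if line in lru:
            lru.move_to_end(line)        # hit: mark as most recently used
        else:
            misses += 1
            if len(lru) >= ways:
                lru.popitem(last=False)  # evict the least recently used line
            lru[line] = True
    return misses / len(addresses)


# Synthetic trace (an assumption): two addresses that collide in a
# 1 KB direct-mapped cache with 64-byte lines, accessed alternately.
trace = [0, 1024] * 50

for cache_bytes, ways in [(1024, 1), (1024, 2), (2048, 1)]:
    rate = miss_rate(trace, cache_bytes, 64, ways)
    print(f"{cache_bytes} B, {ways}-way: miss rate {rate:.2f}")
```

In this toy trace every access misses in the 1 KB direct-mapped configuration, while either doubling the associativity or doubling the size leaves only the two cold misses. Real OLTP traces are far larger, so the point here is the mechanism, not the magnitudes.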
More Simulation Results (OLTP and DSS)
• DSS works well with current-sized caches because its working sets are small (few misses in on-chip caches).
• Replacement and instruction miss rates are not affected by line size, which is good for larger cache sizes.
• False sharing increases with cache line size.
• What would be different if increased latency and bandwidth were accounted for when line size increases?
• Are the results NOT valid because size(database) = size(main memory)?
Sources: Fig. 7 and Fig. 8
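Why false sharing grows with line size can be seen by counting how many processor-private variables end up in the same cache line. The sketch below assumes a hypothetical layout in which each of 8 processors updates its own counter, with the counters placed 32 bytes apart; the layout and the function name are illustrative only and do not come from the paper.

```python
from collections import Counter


def falsely_shared_pairs(private_addresses, line_bytes):
    """Count pairs of per-processor variables that fall in the same cache line."""
    per_line = Counter(addr // line_bytes for addr in private_addresses)
    return sum(n * (n - 1) // 2 for n in per_line.values())


# Hypothetical layout: processor i writes only to the counter at address i * 32.
counter_addresses = [cpu * 32 for cpu in range(8)]

for line_bytes in (32, 64, 128, 256):
    pairs = falsely_shared_pairs(counter_addresses, line_bytes)
    print(f"{line_bytes:3d}-byte lines: {pairs} processor pairs share a line")
```

No data is truly shared in this layout, yet every pair counted above would still exchange the line through the coherence protocol on writes, and the count rises as the line gets longer.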
Important Things to Remember
• As the number of processors increases, communication stalls increase (see Fig. 6).
• O/S activity and I/O latencies do not greatly affect the behavior of database engines.
• OLTP has instruction and data locality that is helped by off-chip caches.
• DSS and WIS have working sets that fit in memory, so they are sensitive to on-chip caches.
Source: http://www.stress-treatment-21.com/wp-content/uploads/2009/05/thinking-monkey.bmp
Discussion Questions
• What are some new commercial applications that have developed since this paper was written?
• How much have the issues in this paper been addressed in recent architecture designs?
• What should we focus on in the “parallel” future to increase performance for commercial applications?
• Could we change commercial workloads to function more like scientific workloads to obtain performance gains?
Source: http://www.vosibilities.com/wp-content/uploads/2009/05/bpm-questions-you-should-ask-your-bpms-vendor1.jpg