To Share or Not to Share?
Ryan Johnson
Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki, Babak Falsafi
Parallel Data Laboratory, Carnegie Mellon University
**HP Labs
Motivation For Work Sharing
[Figure: two query plans, each scanning the Student table; one joins with Dept and aggregates to answer "What is the average GPA in the ECE dept.?", the other aggregates to answer "What is the highest undergraduate GPA?"]
• Many queries in system
  • Similar requests
  • Redundant work
• Work Sharing
  • Detect redundant work
  • Compute results once and share
• Big win for I/O, uniprocessors
  • 2x speedup for TPC-H queries [hariz05]
http://www.pdl.cmu.edu/
Work Sharing on Modern Hardware
[Figure: an 8-core CPU with per-core L1 caches, shared L2 cache, and memory; plot of speedup due to WS (0.0 to 2.0) vs. shared queries (0 to 45) for the 1 CPU case]
• Work sharing can hurt performance!
Contributions
• Observation
  • Work sharing can hurt performance on parallel hardware
• Analysis
  • Develop intuitive analytical model of work sharing
  • Identify trade-off between total work and critical path
• Application
  • Model-based policy outperforms static ones by up to 6x
Outline
• Introduction
• Part I: Intuition and Model
• Part II: Analysis and Experiments
Challenges of Exploiting Work Sharing
• Independent execution only?
  • Load reduction from work sharing can be useful
• Work sharing only?
  • Indiscriminate application can hurt performance
• To share or not to share?
  • System and workload dependent
  • Adapt decisions at runtime
• Must understand work sharing to exploit it fully
Work Sharing vs. Parallelism
[Figure: scan, join, and aggregate operators of Query 1 and Query 2 scheduled side by side, with each query's response time marked]
• Independent execution: average parallelism P = 4.33
• Shared execution: P = 2.75; the critical path is now longer (penalty)
• Total work and critical path both important
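The P values above are average parallelism, total work divided by critical path length. A minimal sketch, with operator costs chosen purely for illustration (the slides do not give the underlying numbers):

```python
def avg_parallelism(total_work, critical_path):
    """Average parallelism P = total work / critical path length."""
    return total_work / critical_path

# Illustrative costs (not from the slides): the independent plans do 26
# units of work over a 6-unit critical path; sharing removes 4 units of
# redundant work but serializes both queries behind one 8-unit path.
print(avg_parallelism(26, 6))  # independent execution: P ~ 4.33
print(avg_parallelism(22, 8))  # shared execution: P = 2.75
```

The point the numbers make: sharing shrank total work (26 to 22) yet lowered P, because the denominator (critical path) grew faster.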
Understanding Work Sharing
• Performance depends on two factors: total work and critical path length
• Work sharing presents a trade-off
  • Reduces total work
  • Potentially lengthens critical path
• Balance both factors or performance suffers
Basis for a Model
• "Closed" system
  • Consistent high load, throughput computing
  • Assumed in most benchmarks
  • Fixed number of clients
• Little's Law governs throughput
  • Higher response time = lower throughput
• Total work not a direct factor!
  • Load reduction secondary to response time
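Little's Law ties the quantities on this slide together: N = X * R, so with a fixed client count N, throughput is X = N / R. A small illustration:

```python
def throughput(clients, response_time):
    """Little's Law for a closed system: N = X * R, so X = N / R."""
    return clients / response_time

# With 20 clients, doubling response time halves throughput.
# Total work never appears directly, only its effect on R.
print(throughput(20, 2.0))
print(throughput(20, 4.0))
```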
Predicting Response Time
• Case 1: Compute-bound
• Case 2: Critical path-bound
• Larger bottleneck determines response time
• Model provides U and Pmax
An Analytical Model of Work Sharing
[Equation: throughput X for m queries on n processors; the compute term is improved by work sharing, while the critical path term is potentially worsened by it]
• U = requested utilization
• Pmax = longest pipe stage
• Sharing helpful when Xshared > Xalone
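The equation itself was an image and does not survive in the text, but the surrounding labels pin down its shape: a compute-bound term that sharing improves, and a critical-path term Pmax that sharing can worsen, with the larger bottleneck setting response time. A hedged reconstruction of that shape (a sketch, not the paper's exact formula), combined with Little's Law:

```python
def model_throughput(m, n, U, Pmax):
    """Plausible form of the slide's model (an assumption, not the exact
    equation): m queries of requested utilization U on n processors are
    either compute-bound (m*U/n) or bound by the longest pipe stage Pmax.
    The larger bottleneck sets response time R, and X = m / R."""
    R = max(m * U / n, Pmax)
    return m / R

# Toy comparison for 8 queries on 4 CPUs: sharing trims per-query work
# (U drops from 0.5 to 0.3) but stretches the longest pipe stage.
X_alone = model_throughput(8, 4, U=0.5, Pmax=0.1)    # compute-bound
X_shared = model_throughput(8, 4, U=0.3, Pmax=0.9)   # critical-path-bound
print(X_alone, X_shared, X_shared > X_alone)  # share only if Xshared > Xalone
```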
Outline
• Introduction
• Part I: Intuition and Model
• Part II: Analysis and Experiments
Experimental Setup
• Hardware
  • Sun T2000 "Niagara" with 16 GB RAM
  • 8 cores (32 hardware threads)
  • Solaris processor sets vary effective CPU count
• Cordoba
  • Staged DBMS, naturally exposes work sharing
  • Flexible work sharing policies
• 1 GB TPC-H dataset
• Fixed client and CPU counts per run
Model Validation: TPC-H Q1
[Figure: predicted vs. measured performance; speedup due to WS (0 to 1.4) vs. shared queries (0 to 45), with measured and model curves for 1, 2, 8, and 32 CPUs]
• Avg/max error: 5.7% / 22%
Model Validation: TPC-H Q4
• Behavior varies with both system and workload
Exploring WS vs. Parallelism
• Work sharing splits query into three parts
  • Independent work: per-query, parallel (total work)
  • Serial work: per-query, serial (critical path)
  • Shared work: computed once, "free" after first query
• Example: Independent 37%, Serial 4%, Shared 59%
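Given this split, the load reduction from sharing m queries is easy to bound. A sketch using the example fractions above; it captures only the total-work side, not how the serial parts stack on the critical path:

```python
def load_reduction(m, ind, ser, shr):
    """Total work for m queries drops from m*(ind+ser+shr) to
    m*(ind+ser) + shr once the shared part is computed only once."""
    alone = m * (ind + ser + shr)
    shared = m * (ind + ser) + shr
    return alone / shared

# Example split from the slide: 37% independent, 4% serial, 59% shared.
for m in (1, 2, 8, 32):
    print(m, round(load_reduction(m, 0.37, 0.04, 0.59), 2))
# The benefit climbs toward 1/(ind+ser) = 1/0.41, roughly 2.44x, as m
# grows, consistent with the potential-speedup ceiling in the plots.
```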
Exploring WS vs. Parallelism
[Figure: benefit from work sharing (0 to 2.5) vs. shared queries (0 to 32) for 4, 8, 16, and 32 CPUs; each curve climbs toward the potential speedup, then saturates]
• Behavior matches previously published results
• More processors shift bottleneck to critical path
Impact of Serial Work (32 CPUs)
[Figure: benefit from work sharing (0 to 2.5) vs. shared queries (0 to 40) for serial fractions of 0%, 1%, 2%, and 7%]
• Critical path quickly becomes major bottleneck
Model-guided Work Sharing
• Integrate predictive model into Cordoba
  • Predict benefit of work sharing for each new query
  • Consider multiple groups of queries at once
  • Shorter critical path, increased parallelism
• Experimental setup
  • Profile run with 2 clients, 2 CPUs
  • Extract model parameters with profiling tools
  • 20 clients submit mix of TPC-H Q1 and Q4
  • Compare against always-share and never-share policies
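The decision loop of such a policy can be sketched as follows. The function name, the group representation, and the `predict_throughput` estimator are all hypothetical stand-ins, not Cordoba's actual API:

```python
def choose_plan(new_query, groups, predict_throughput):
    """Admit the new query into whichever sharing group (or none) the
    model predicts will yield the highest throughput. All names here are
    hypothetical stand-ins for Cordoba's internals."""
    best, best_x = None, float("-inf")
    for g in list(groups) + [None]:  # None means: run the query alone
        x = predict_throughput(new_query, g)
        if x > best_x:
            best, best_x = g, x
    return best

# Toy estimator: sharing helps larger groups until a serial stage saturates.
est = lambda q, g: 5.0 if g is None else min(len(g) + 5, 9)
print(choose_plan("Q4", [["Q1"], ["Q4"] * 5], est))
```

With this toy estimator the policy joins the five-query group, since its predicted throughput (capped at 9) beats both the one-query group and running alone.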
Comparison of Work Sharing Strategies
[Figure: throughput (queries/min) under always-share, model-guided, and never-share policies at 2 CPUs (0 to 200) and 32 CPUs (0 to 250), for query ratios All Q1, 50/50, and All Q4]
• Model-based policy balances critical path and load
Related Work
• Many existing work sharing schemes
  • Identification occurs at different stages in the query's lifetime
  • All allow pipelined query execution
• From early to late identification: Materialized Views at schema design [rouss82], Multiple Query Optimization at query compilation [roy00], Staged DBMS at query execution [hariz05], Synchronized Scanning at buffer pool access [lang07]
• Model describes all types of work sharing
Conclusions
• Work sharing can hurt performance
  • Highly parallel, memory-resident machines
• Intuitive analytical model captures behavior
  • Trade-off between load reduction and critical path
• Model-guided work sharing highly effective
  • Outperforms static policies by up to 6x
http://www.cs.cmu.edu/~StagedDB/
References
• [hariz05] S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. "QPipe: A Simultaneously Pipelined Relational Query Engine." In Proc. SIGMOD, 2005.
• [lang07] C. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Wong. "Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling." In Proc. ICDE, 2007.
• [rouss82] N. Roussopoulos. "View Indexing in Relational Databases." ACM TODS, 7(2):258-290, 1982.
• [roy00] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. "Efficient and Extensible Algorithms for Multi Query Optimization." In Proc. SIGMOD, 2000.