To Share or Not to Share?
Ryan Johnson
Nikos Hardavellas, Ippokratis Pandis, Naju Mancheril, Stavros Harizopoulos**, Kivanc Sabirli, Anastasia Ailamaki, Babak Falsafi
Parallel Data Laboratory, Carnegie Mellon University
**HP Labs
Motivation For Work Sharing
[Figure: two query plans, each scanning the Student table; one joins with Dept and aggregates to answer "What is the average GPA in the ECE dept.?", the other aggregates to answer "What is the highest undergraduate GPA?"]
• Many queries in system
  • Similar requests
  • Redundant work
• Work Sharing
  • Detect redundant work
  • Compute results once and share
• Big win for I/O, uniprocessors
  • 2x speedup for TPC-H queries [hariz05]
http://www.pdl.cmu.edu/
Work Sharing on Modern Hardware
[Figure: an 8-core CPU with per-core L1 caches, shared L2 cache, and memory; plot of speedup due to WS (0.0 to 2.0) vs. shared queries (0 to 45) for the 1 CPU case]
• Work sharing can hurt performance!
Contributions
• Observation
  • Work sharing can hurt performance on parallel hardware
• Analysis
  • Develop intuitive analytical model of work sharing
  • Identify trade-off between total work and critical path
• Application
  • Model-based policy outperforms static ones by up to 6x
Outline
• Introduction
• Part I: Intuition and Model
• Part II: Analysis and Experiments
Challenges of Exploiting Work Sharing
• Independent execution only?
  • Load reduction from work sharing can be useful
• Work sharing only?
  • Indiscriminate application can hurt performance
• To share or not to share?
  • System and workload dependent
  • Adapt decisions at runtime
• Must understand work sharing to exploit it fully
Work Sharing vs. Parallelism
[Figure: scan, join, and aggregate operators of Query 1 and Query 2 scheduled side by side, with each query's response time marked]
• Independent execution: average parallelism P = 4.33
• Shared execution: P = 2.75; the critical path is now longer (penalty)
• Total work and critical path both important
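The P values above are average parallelism, total work divided by critical path length. A minimal sketch, with operator costs chosen purely for illustration (the slides do not give the underlying numbers):

```python
def avg_parallelism(total_work, critical_path):
    """Average parallelism P = total work / critical path length."""
    return total_work / critical_path

# Illustrative costs (not from the slides): the independent plans do 26
# units of work over a 6-unit critical path; sharing removes 4 units of
# redundant work but serializes both queries behind one 8-unit path.
print(avg_parallelism(26, 6))  # independent execution: P ~ 4.33
print(avg_parallelism(22, 8))  # shared execution: P = 2.75
```

The point the numbers make: sharing shrank total work (26 to 22) yet lowered P, because the denominator (critical path) grew faster.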
Understanding Work Sharing
• Performance depends on two factors: total work and critical path length
• Work sharing presents a trade-off
  • Reduces total work
  • Potentially lengthens critical path
• Balance both factors or performance suffers
Basis for a Model
• "Closed" system
  • Consistent high load, throughput computing
  • Assumed in most benchmarks
  • Fixed number of clients
• Little's Law governs throughput
  • Higher response time = lower throughput
• Total work not a direct factor!
  • Load reduction secondary to response time
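Little's Law ties the quantities on this slide together: N = X * R, so with a fixed client count N, throughput is X = N / R. A small illustration:

```python
def throughput(clients, response_time):
    """Little's Law for a closed system: N = X * R, so X = N / R."""
    return clients / response_time

# With 20 clients, doubling response time halves throughput.
# Total work never appears directly, only its effect on R.
print(throughput(20, 2.0))
print(throughput(20, 4.0))
```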
Predicting Response Time
• Case 1: Compute-bound
• Case 2: Critical path-bound
• Larger bottleneck determines response time
• Model provides U and Pmax
An Analytical Model of Work Sharing
[Equation: throughput X for m queries on n processors; the compute term is improved by work sharing, while the critical path term is potentially worsened by it]
• U = requested utilization
• Pmax = longest pipe stage
• Sharing helpful when Xshared > Xalone
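The equation itself was an image and does not survive in the text, but the surrounding labels pin down its shape: a compute-bound term that sharing improves, and a critical-path term Pmax that sharing can worsen, with the larger bottleneck setting response time. A hedged reconstruction of that shape (a sketch, not the paper's exact formula), combined with Little's Law:

```python
def model_throughput(m, n, U, Pmax):
    """Plausible form of the slide's model (an assumption, not the exact
    equation): m queries of requested utilization U on n processors are
    either compute-bound (m*U/n) or bound by the longest pipe stage Pmax.
    The larger bottleneck sets response time R, and X = m / R."""
    R = max(m * U / n, Pmax)
    return m / R

# Toy comparison for 8 queries on 4 CPUs: sharing trims per-query work
# (U drops from 0.5 to 0.3) but stretches the longest pipe stage.
X_alone = model_throughput(8, 4, U=0.5, Pmax=0.1)    # compute-bound
X_shared = model_throughput(8, 4, U=0.3, Pmax=0.9)   # critical-path-bound
print(X_alone, X_shared, X_shared > X_alone)  # share only if Xshared > Xalone
```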
Outline
• Introduction
• Part I: Intuition and Model
• Part II: Analysis and Experiments
Experimental Setup
• Hardware
  • Sun T2000 "Niagara" with 16 GB RAM
  • 8 cores (32 hardware threads)
  • Solaris processor sets vary effective CPU count
• Cordoba
  • Staged DBMS, naturally exposes work sharing
  • Flexible work sharing policies
• 1 GB TPC-H dataset
• Fixed client and CPU counts per run
Model Validation: TPC-H Q1
[Figure: predicted vs. measured performance; speedup due to WS (0 to 1.4) vs. shared queries (0 to 45), with measured and model curves for 1, 2, 8, and 32 CPUs]
• Avg/max error: 5.7% / 22%
Model Validation: TPC-H Q4
• Behavior varies with both system and workload
Exploring WS vs. Parallelism
• Work sharing splits query into three parts
  • Independent work: per-query, parallel (total work)
  • Serial work: per-query, serial (critical path)
  • Shared work: computed once, "free" after first query
• Example: Independent 37%, Serial 4%, Shared 59%
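Given this split, the load reduction from sharing m queries is easy to bound. A sketch using the example fractions above; it captures only the total-work side, not how the serial parts stack on the critical path:

```python
def load_reduction(m, ind, ser, shr):
    """Total work for m queries drops from m*(ind+ser+shr) to
    m*(ind+ser) + shr once the shared part is computed only once."""
    alone = m * (ind + ser + shr)
    shared = m * (ind + ser) + shr
    return alone / shared

# Example split from the slide: 37% independent, 4% serial, 59% shared.
for m in (1, 2, 8, 32):
    print(m, round(load_reduction(m, 0.37, 0.04, 0.59), 2))
# The benefit climbs toward 1/(ind+ser) = 1/0.41, roughly 2.44x, as m
# grows, consistent with the potential-speedup ceiling in the plots.
```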
Exploring WS vs. Parallelism
[Figure: benefit from work sharing (0 to 2.5) vs. shared queries (0 to 32) for 4, 8, 16, and 32 CPUs; each curve climbs toward the potential speedup, then saturates]
• Behavior matches previously published results
• More processors shift bottleneck to critical path
Impact of Serial Work (32 CPUs)
[Figure: benefit from work sharing (0 to 2.5) vs. shared queries (0 to 40) for serial fractions of 0%, 1%, 2%, and 7%]
• Critical path quickly becomes major bottleneck
Model-guided Work Sharing
• Integrate predictive model into Cordoba
  • Predict benefit of work sharing for each new query
  • Consider multiple groups of queries at once
  • Shorter critical path, increased parallelism
• Experimental setup
  • Profile run with 2 clients, 2 CPUs
  • Extract model parameters with profiling tools
  • 20 clients submit mix of TPC-H Q1 and Q4
  • Compare against always-share and never-share policies
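The decision loop of such a policy can be sketched as follows. The function name, the group representation, and the `predict_throughput` estimator are all hypothetical stand-ins, not Cordoba's actual API:

```python
def choose_plan(new_query, groups, predict_throughput):
    """Admit the new query into whichever sharing group (or none) the
    model predicts will yield the highest throughput. All names here are
    hypothetical stand-ins for Cordoba's internals."""
    best, best_x = None, float("-inf")
    for g in list(groups) + [None]:  # None means: run the query alone
        x = predict_throughput(new_query, g)
        if x > best_x:
            best, best_x = g, x
    return best

# Toy estimator: sharing helps larger groups until a serial stage saturates.
est = lambda q, g: 5.0 if g is None else min(len(g) + 5, 9)
print(choose_plan("Q4", [["Q1"], ["Q4"] * 5], est))
```

With this toy estimator the policy joins the five-query group, since its predicted throughput (capped at 9) beats both the one-query group and running alone.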
Comparison of Work Sharing Strategies
[Figure: throughput (queries/min) under always-share, model-guided, and never-share policies at 2 CPUs (0 to 200) and 32 CPUs (0 to 250), for query ratios All Q1, 50/50, and All Q4]
• Model-based policy balances critical path and load
Related Work
• Many existing work sharing schemes
  • Identification occurs at different stages in the query's lifetime
  • All allow pipelined query execution
• From early to late identification: Materialized Views at schema design [rouss82], Multiple Query Optimization at query compilation [roy00], Staged DBMS at query execution [hariz05], Synchronized Scanning at buffer pool access [lang07]
• Model describes all types of work sharing
Conclusions
• Work sharing can hurt performance
  • Highly parallel, memory-resident machines
• Intuitive analytical model captures behavior
  • Trade-off between load reduction and critical path
• Model-guided work sharing highly effective
  • Outperforms static policies by up to 6x
http://www.cs.cmu.edu/~StagedDB/
References
• [hariz05] S. Harizopoulos, V. Shkapenyuk, and A. Ailamaki. "QPipe: A Simultaneously Pipelined Relational Query Engine." In Proc. SIGMOD, 2005.
• [lang07] C. Lang, B. Bhattacharjee, T. Malkemus, S. Padmanabhan, and K. Wong. "Increasing Buffer-Locality for Multiple Relational Table Scans through Grouping and Throttling." In Proc. ICDE, 2007.
• [rouss82] N. Roussopoulos. "View Indexing in Relational Databases." ACM TODS, 7(2):258-290, 1982.
• [roy00] P. Roy, S. Seshadri, S. Sudarshan, and S. Bhobe. "Efficient and Extensible Algorithms for Multi Query Optimization." In Proc. SIGMOD, 2000.