A Scalable, Predictable Join Operator for Highly Concurrent Data Warehouses
George Candea (EPFL & Aster Data), Neoklis Polyzotis (UC Santa Cruz), Radek Vingralek (Aster Data)
Highly Concurrent Data Warehouses • Data analytics is a core service of any DW. • High query concurrency is becoming important. • At the same time, customers need predictability. • Requirement from an actual customer: increasing concurrency from one query to 40 should not increase latency by more than 6x.
Shortcoming of Existing Systems • DWs employ the query-at-a-time model. • Each query executes as a separate physical plan. • Result: Concurrent plans contend for resources. • This creates a situation of “workload fear”.
Our Contribution: CJOIN • A novel physical operator for star queries. • Star queries arise frequently in ad-hoc analytics. • Main ideas: • A single physical plan for all concurrent queries. • The plan is always "on". • Deep work sharing: I/O, join processing, storage.
Outline • Preliminaries • The CJOIN operator • Experimental study • Conclusions
Setting • We assume a star-schema DW. • We target the class of star queries. • Goal: Execute concurrent star queries efficiently. • Low latency. • Graceful scale-up.
Further Assumptions • Fact table is too large to fit in main memory. • Dimension tables are “small”. • Example from TPC-DS: 2.5GB of dimension data for 1TB warehouse. • Indices and materialized views may exist. • Workload is volatile.
Outline • Preliminaries • The CJOIN operator • Experimental study • Conclusions
Design Overview • [Architecture diagram] The incoming query stream passes through the optimizer: star queries are registered with the always-on CJOIN pipeline (Preprocessor → Filter → Filter → Distributor), while all other queries are handled by the conventional query processor.
Running Example • Schema: fact table F (with measure m) referencing dimension tables X and Y. • Queries: Q1: select COUNT(*) from F join X join Y where φ1(X) and ψ1(Y); Q2: select SUM(F.m) from F join X join Y where TRUE(X) and ψ2(Y).
The CJOIN Operator • [Pipeline diagram] A single continuous scan over fact table F drives the pipeline Preprocessor → Filter → Filter → Distributor; the Distributor feeds surviving tuples to each query's aggregation operator (COUNT for Q1, SUM for Q2).
The CJOIN Operator (cont.) • [Diagram: dimension hash tables tagged with query bitmaps] Each Filter maintains a hash table over one dimension. Every stored dimension tuple carries a bitmap with one bit per registered query; bit i is set iff the tuple satisfies query i's predicate on that dimension (a query that places no restriction on a dimension, e.g. TRUE(X) in Q2, has its bit set for all of that dimension's tuples). The bitmaps are populated by evaluating each query's predicates once, at query start.
Processing Fact Tuples • [Diagram: tuple flow] The Preprocessor tags each fact tuple arriving from the continuous scan with a bitmap in which every active query's bit is set. Each Filter probes its dimension hash table with the tuple's foreign key and ANDs the matching dimension bitmap into the tuple's bitmap; the tuple is discarded as soon as its bitmap becomes all zeros. The Distributor routes each surviving tuple to the aggregation operator of every query whose bit is still set (a minimal code sketch follows).
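The following is a minimal C++ sketch of this per-tuple logic, assuming two dimensions and at most 64 concurrent queries. All names here (Bitmap, DimTuple, filterStage, processFactTuple) are illustrative inventions, not the paper's implementation, which is built inside Postgres.

    #include <cstdint>
    #include <unordered_map>

    using Bitmap = std::uint64_t;              // one bit per registered query

    struct DimTuple {
        Bitmap satisfies = 0;                  // bit i set => passes query i's predicate
        // ... dimension attributes elided ...
    };

    struct FactTuple {
        std::int64_t fkX = 0, fkY = 0;         // foreign keys into dimensions X and Y
        double       m = 0;                    // measure
        Bitmap       alive = 0;                // queries that still accept this tuple
    };

    using DimTable = std::unordered_map<std::int64_t, DimTuple>;

    // One Filter stage: probe the dimension hash table and intersect bitmaps.
    // A foreign key absent from the table means no active query selected that
    // dimension tuple, so the fact tuple dies for all queries.
    bool filterStage(const DimTable& table, std::int64_t fk, FactTuple& t) {
        auto it = table.find(fk);
        t.alive &= (it == table.end()) ? Bitmap{0} : it->second.satisfies;
        return t.alive != 0;                   // discard once no bit survives
    }

    // Work done per fact tuple delivered by the continuous scan.
    void processFactTuple(FactTuple t, Bitmap activeQueries,
                          const DimTable& tableX, const DimTable& tableY) {
        t.alive = activeQueries;                    // Preprocessor tags the tuple
        if (!filterStage(tableX, t.fkX, t)) return; // Filter on dimension X
        if (!filterStage(tableY, t.fkY, t)) return; // Filter on dimension Y
        for (int q = 0; q < 64; ++q) {              // Distributor routes survivors
            if (t.alive & (Bitmap{1} << q)) {
                // feed t to query q's aggregation operator (COUNT, SUM(t.m), ...)
            }
        }
    }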
Registering New Queries • [Diagram: adding query Q3] A new query Q3 (select AVG(F.m) from F join X join Y where φ3(X) and TRUE(Y)) is registered without stopping the pipeline: φ3 is evaluated once over dimension X (conceptually, select * from X where φ3(X)), and each hash-table bitmap gains a bit for Q3, set for the X-tuples that satisfy φ3 and for all Y-tuples, since Q3 places no restriction on Y. The Preprocessor then injects a "Begin Q3" marker into the tuple stream at the current scan position; Q3's result is complete when the continuous scan wraps around to that marker, i.e., after exactly one full cycle (a sketch of registration follows).
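A hedged sketch of the registration step, reusing the simplified types from the sketch above (redeclared so this fragment stands alone); Predicate and the marker handling are assumptions about how the Preprocessor coordinates with the scan.

    #include <cstdint>
    #include <functional>
    #include <unordered_map>

    using Bitmap    = std::uint64_t;
    struct DimTuple { Bitmap satisfies = 0; /* attributes elided */ };
    using DimTable  = std::unordered_map<std::int64_t, DimTuple>;
    using Predicate = std::function<bool(const DimTuple&)>;

    // Register query k against one dimension: evaluate its predicate once per
    // stored dimension tuple (conceptually "select * from X where phi_k(X)")
    // and set bit k in every matching bitmap. An unrestricted dimension
    // (predicate always true) simply gets bit k set everywhere.
    void registerOnDimension(int k, DimTable& table, const Predicate& pred) {
        for (auto& [key, dim] : table) {
            (void)key;                             // key not needed here
            if (pred(dim))
                dim.satisfies |= (Bitmap{1} << k);
        }
        // The Preprocessor then injects a "Begin Qk" marker into the fact
        // stream; query k's aggregate is final when the continuous scan
        // wraps around to that marker, i.e. after one full cycle.
    }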
Properties of CJOIN Processing • CJOIN enables a deep form of work sharing: • Join computation. • Tuple storage. • I/O. • Computational cost per tuple is low. • Hence, CJOIN can sustain a high I/O throughput. • Predictable query latency. • The continuous scan can provide a progress indicator (see the sketch below).
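Since each query attaches to the circular scan at a known position, a progress indicator is essentially free. A hypothetical helper, assuming scan positions are measured in tuples:

    #include <cstdint>

    // Fraction of the scan cycle completed since the query's begin-marker
    // was injected at startPos. Returns a value in [0, 1); completion is
    // signaled when the scan reaches the marker again.
    double queryProgress(std::uint64_t startPos, std::uint64_t curPos,
                         std::uint64_t factTableSize) {
        return double((curPos + factTableSize - startPos) % factTableSize)
               / double(factTableSize);
    }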
Other Details (in the paper) • Run-time optimization of Filter ordering. • Updates. • Implementation on multi-core systems. • Extensions: • Column stores. • Fact table partitioning. • Galaxy schemata.
Outline • Preliminaries • The CJOIN operator • Experimental study • Conclusions
Experimental Methodology • Systems: • CJOIN Prototype on top of Postgres. • Postgres with shared scans enabled. • Commercial system X. • We use the Star Schema Benchmark (SSB). • Scale factor = 100 (100GB of data). • Workload comprises parameterized SSB queries. • Hardware: • Quad-core Intel Xeon. • 8GB of shared RAM. • RAID-5 array of four 15K RPM SAS disks.
Effect of Concurrency • [Chart] Throughput increases with more concurrent queries.
Response Time Predictability • [Chart] Query latency is predictable; no more workload fear.
Influence of Data Scale • [Chart; concurrency level: 128] CJOIN is effective even for small data sets.
Related Work • Materialized views [R+95, HRU96]. • Multiple-query optimization [T88]. • Work sharing: • Staged DBs [HSA05]. • Scan sharing [F94, Z+07, Q+08]. • Aggregation [CR07]. • BLINK [R+08]. • Streaming database systems [M+02, B+04].
Conclusions • High query concurrency is crucial for DWs. • Query-at-a-time execution leads to poor performance. • Our solution: CJOIN. • Target: the class of star queries. • Deep work sharing: I/O, join processing, tuple storage. • Efficient realization on multi-core architectures. • Experiments show an order-of-magnitude improvement over a commercial system.
http://people.epfl.ch/george.candea http://www.cs.ucsc.edu/~alkis http://www.asterdata.com THANK YOU!