Query Task Model (QTM): Modeling Query Execution with Tasks • Steffen Zeuch and Johann-Christoph Freytag
Motivation • Different DBMS execute the same QEP using different schedules • Run-time execution, not query optimization • No uniform scheduling format • Query execution in different DBMS is not comparable • Major differences between DBMS: • Chunk Size: Size of an operator's input • Scheduling Strategy: Execution model vs. run-time scheduler • Question: How can different schedules be made comparable, to explain why one schedule performs better than another?
Outline • Parallel Query Execution • QTM: Query Task Model • Evaluation • Outlook
Chunk Size • Tuple-at-a-time • Buffer-at-a-time • Column-at-a-time • [Figure: a Selection operator consuming tuples t1 to t6 one tuple at a time (t1), one buffer at a time (t1, t2, t3 then t4, t5, t6), or as a whole column (t1 to t6)]
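The three chunk sizes differ only in how many tuples an operator receives per call. A minimal sketch, assuming a column is a plain Python list; the helper name `chunks` and the buffer size of 3 are illustrative choices, not taken from the slides:

```python
from typing import Iterator, List, Sequence


def chunks(column: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most chunk_size tuples."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(column), chunk_size):
        yield list(column[start:start + chunk_size])


column = ["t1", "t2", "t3", "t4", "t5", "t6"]

# Hypothetical settings: 1 tuple, a buffer of 3 tuples, the whole column.
tuple_at_a_time = list(chunks(column, 1))
buffer_at_a_time = list(chunks(column, 3))
column_at_a_time = list(chunks(column, len(column)))
```

With these settings the operator is called six times, twice, and once respectively.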
Scheduling Strategy • [Figure: QEP with Hash Probe (R), Hash Probe (S), Selection, and two Hash Build operators over the tables R, T, and S]
Volcano Execution Model (Open-Next-Close Iterator) • [Figure: the same QEP over R, T, and S; each operator calls Next on its child operator and receives a Tuple in return]
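The Open-Next-Close protocol can be sketched with two operators. This is a simplified illustration, not the authors' implementation: the class names `Scan` and `Selection`, the integer rows, and the use of `None` as the end-of-stream marker are all assumptions.

```python
from typing import Callable, Iterable, List, Optional


class Scan:
    """Leaf operator: hands out one stored row per next() call."""

    def __init__(self, rows: Iterable[int]) -> None:
        self._rows = list(rows)
        self._pos = 0

    def open(self) -> None:
        self._pos = 0

    def next(self) -> Optional[int]:
        if self._pos >= len(self._rows):
            return None  # end of stream
        row = self._rows[self._pos]
        self._pos += 1
        return row

    def close(self) -> None:
        pass


class Selection:
    """Pulls rows from its child until one satisfies the predicate."""

    def __init__(self, child: Scan, predicate: Callable[[int], bool]) -> None:
        self._child = child
        self._predicate = predicate

    def open(self) -> None:
        self._child.open()

    def next(self) -> Optional[int]:
        while True:
            row = self._child.next()
            if row is None or self._predicate(row):
                return row

    def close(self) -> None:
        self._child.close()


def run(root: Selection) -> List[int]:
    """Drive the plan from the root: open, pull until exhausted, close."""
    root.open()
    out = []
    while (row := root.next()) is not None:
        out.append(row)
    root.close()
    return out


result = run(Selection(Scan([1, 2, 3, 4, 5, 6]), lambda r: r % 2 == 0))
```

The schedule here is implicit in the call chain: the root's Next call decides when each child does work, one tuple at a time.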
(Run-time) Scheduler • Spatial Locality • Temporal Locality • Further Optimization Criteria: I/O, NUMA or Memory Usage • [Figure: the tasks Sel, Prob_S, and Prob_R for tuples t1 and t2 of table T, arranged along a time axis in two different orders, one per locality type]
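A run-time scheduler is free to order the same six tasks in different ways. The sketch below only enumerates two such orders for the pipeline Sel, Prob_S, Prob_R; which order corresponds to which locality type is left to the original figure, and the variable names are illustrative.

```python
operators = ["Sel", "Prob_S", "Prob_R"]  # pipeline order, bottom to top
tuples = ["t1", "t2"]

# Order A: finish one operator for all tuples before starting the next.
operator_major = [f"{op}({t})" for op in operators for t in tuples]

# Order B: push one tuple through the whole pipeline before the next tuple.
tuple_major = [f"{op}({t})" for t in tuples for op in operators]
```

Both orders respect the per-tuple dependency Sel before Prob_S before Prob_R; they differ only in what is reused between consecutive tasks (the operator or the tuple).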
Dynamic Load Balancing • [Figure: a QEP with two joins (⋈) and two selections (σ) over the tables R, S, and T, split into the tasks T1 to T5, which are distributed over CPU1 and CPU2]
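One common way to balance load dynamically is to let workers pull tasks from a shared queue whenever they become idle. The following is a sketch under that assumption; the slide does not state the mechanism, and `run_balanced` and the worker names are hypothetical.

```python
import queue
import threading
from typing import Dict, List


def run_balanced(tasks: List[str], num_workers: int = 2) -> Dict[str, List[str]]:
    """Let each worker pull the next pending task whenever it is idle."""
    pending: "queue.Queue[str]" = queue.Queue()
    for task in tasks:
        pending.put(task)

    # Each worker appends only to its own list, so no extra locking is needed.
    done: Dict[str, List[str]] = {f"CPU{i + 1}": [] for i in range(num_workers)}

    def worker(name: str) -> None:
        while True:
            try:
                task = pending.get_nowait()
            except queue.Empty:
                return  # no work left
            done[name].append(task)  # stand-in for executing the task

    threads = [threading.Thread(target=worker, args=(name,)) for name in done]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return done


assignment = run_balanced(["T1", "T2", "T3", "T4", "T5"])
```

Which CPU ends up with which task depends on thread timing, so the assignment can differ between runs; every task is still executed exactly once.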
DBMS Landscape • [Figure: systems placed by Chunk Size (Tuple-at-a-time, Buffer-at-a-time, Column-at-a-time) against Scheduling Strategy (Volcano Execution Model, (Run-time) Scheduler, Dynamic Load Balancing). Systems shown: MonetDB MIL, MonetDB X100, DB2 BLU, StagedDB, Hyper, DB2, PostgreSQL, SAP HANA, System R, MySQL]
Outline • Parallel Query Execution • QTM: Query Task Model • Evaluation • Outlook
QTM: Query Task Model • Idea: A model that describes parallel query execution with tasks • QEP: Queue of tasks • Task: Encapsulates a piece of work on some data • Goals: • Open a design space for DBMS schedules • Make the main aspects of query scheduling comparable: execution order, degree of parallelism and thread coordination, and partitioning
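The two definitions above (a task pairs work with data, a QEP is a queue of tasks) can be sketched as follows. The `Task` class, the `select_even` operator, and the single-threaded drain loop are illustrative assumptions, not the authors' implementation.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List


@dataclass
class Task:
    """A piece of work bound to the data it operates on."""
    work: Callable[[List[int]], List[int]]
    data: List[int]

    def run(self) -> List[int]:
        return self.work(self.data)


def select_even(rows: List[int]) -> List[int]:
    """Hypothetical selection operator used as the task's work."""
    return [row for row in rows if row % 2 == 0]


# The QEP as a queue of tasks: one selection task per buffer of the table.
task_queue: Deque[Task] = deque(
    Task(select_even, buffer) for buffer in ([1, 2, 3], [4, 5, 6])
)

results: List[int] = []
while task_queue:
    results.extend(task_queue.popleft().run())
```

In this picture, execution order is the order in which tasks leave the queue, and partitioning is the choice of how the table is split into per-task buffers.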
Query Task Model • [Figure: a Task combines Work and Data; the Task Queue holds the tasks T1, T2, T3, ordered by Processing Strategies; the Data Queue holds the tuples t1, t2, t3 of a Table]
QTM Transformation: Input • QEP • Table Format • Hardware Architecture
QTM Transformation • [Figure: QEP, Choosing Hash Join, Max. Pipelines + Dependency Graph]
QTM: Task Configuration • [Figure: Max. Pipelines + Dependency Graph, Task Configurations (Task Blueprints)]
QTM: Task Instantiation • [Figure: Task Configuration (Task Blueprints), Set of Tasks (TC Instantiation)]
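The step from a task blueprint to a set of tasks can be pictured as follows. This is a guess at the shape of the step, not the authors' design: the fields of `TaskConfiguration`, the per-buffer instantiation rule, and the buffer size of 2 are all assumptions.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class TaskConfiguration:
    """Hypothetical blueprint: which operators to run, on how many tuples."""
    pipeline: Tuple[str, ...]
    buffer_size: int


@dataclass(frozen=True)
class TaskInstance:
    """A blueprint bound to one concrete buffer of input tuples."""
    pipeline: Tuple[str, ...]
    rows: Tuple[str, ...]


def instantiate(config: TaskConfiguration, table: Sequence[str]) -> List[TaskInstance]:
    """Create one task per buffer of the input table."""
    return [
        TaskInstance(config.pipeline, tuple(table[i:i + config.buffer_size]))
        for i in range(0, len(table), config.buffer_size)
    ]


blueprint = TaskConfiguration(
    pipeline=("Selection", "Hash Probe (S)", "Hash Probe (R)"),
    buffer_size=2,
)
tasks = instantiate(blueprint, ["t1", "t2", "t3"])
```

Under these assumptions the blueprint is fixed once, while the number of tasks follows from the table size and the buffer size.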
QTM: Implementation • [Figure: Compile-time, Run-time]
Outline • Parallel Query Execution • QTM: Query Task Model • Evaluation • Outlook
Evaluation: Sampling • [Figure: Data-related Misses, Instruction-related Misses]
Evaluation: Insights • Tradeoff between data and instruction cache performance • Sweet spot: Largest private cache size vs. slightly larger buffer • Medium-sized tasks are data-efficient: • Pros: Buffer fits entirely into cache, high data locality • Cons: High number of tasks and instructions • Large tasks are instruction-efficient: • Pros: Fewer instructions and tasks, high instruction locality • Cons: More data cache misses if the cache size is exceeded • QTM: Cache performance can be adjusted via the buffer size
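The tradeoff can be made concrete with simple arithmetic: a larger buffer means fewer tasks, but the buffer may no longer fit into the private cache. The numbers below (1,000,000 tuples of 8 bytes, a 256 KiB private cache) are hypothetical and not measurements from the evaluation.

```python
import math


def task_count(num_tuples: int, tuples_per_buffer: int) -> int:
    """Number of tasks when every task processes one buffer."""
    return math.ceil(num_tuples / tuples_per_buffer)


def fits_in_cache(tuples_per_buffer: int, tuple_bytes: int, cache_bytes: int) -> bool:
    """True if one buffer fits entirely into the given cache."""
    return tuples_per_buffer * tuple_bytes <= cache_bytes


# Hypothetical workload and hardware parameters.
NUM_TUPLES = 1_000_000
TUPLE_BYTES = 8
CACHE_BYTES = 256 * 1024

medium = 32_768   # 256 KiB per buffer: exactly the cache size
large = 131_072   # 1 MiB per buffer: four times the cache size

medium_tasks = task_count(NUM_TUPLES, medium)
large_tasks = task_count(NUM_TUPLES, large)
medium_fits = fits_in_cache(medium, TUPLE_BYTES, CACHE_BYTES)
large_fits = fits_in_cache(large, TUPLE_BYTES, CACHE_BYTES)
```

With these numbers the medium buffer fits into the cache but yields 31 tasks, while the large buffer needs only 8 tasks but exceeds the cache.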
Outline • Parallel Query Execution • QTM: Query Task Model • Evaluation • Outlook
Outlook • Contributions: • QTM: A model for parallel query execution using tasks • Opens a design space for DBMS schedules • Makes different schedules present in different DBMS comparable • Future Work: • Cost Model • Transformation process for an arbitrary QEP • Thanks!