XJoin: Faster Query Results Over Slow And Bursty Networks

XJoin: Faster Query Results Over Slow And Bursty Networks IEEE Bulletin, 2000 by T. Urhan and M Franklin Based on a talk prepared by Asima Silva & Leena Razzaq

Motivation • Data delivery issues in terms of: • unpredictable delay from some remote data sources • wide-area network with possibly communication links, congestion, failures, and overload • Goal: • Not just overall query processing time matters • Also when initial data is delivered • Overall throughput and rate throughout query process

Overview • Hash Join History • 3 Classes of Delays • Motivation of XJoin • Challenges of Developing XJoin • Three Stages of XJoin • Handling Duplicates • Experimental Results

S tuple 1 key2 key1 key3 key4 Key5… R tuples… R tuples R tuples R tuples R tuples S tuple 2 S tuple 3 S tuple 4 S tuple 5…. Hash Join 2. Probe • Only one table is hashed 1. BUILD

Memory Disk Bucket i R tuples S tuple 1 Bucket n R tuples Bucket i+1 R tuples S tuple 1 Bucket n+1 R tuples Bucket i+2 R tuples S tuple 1 Bucket n+2 R tuples Bucket … R tuples S tuple 1 Bucket … R tuples Bucket j-1 R tuples S tuple … Bucket m-1 R tuples Bucket j R tuples Bucket m R tuples Hybrid Hash Join • One table is hashed both to disk and memory (partitions) • G. Graefe. “Query Evaluation Techniques for Large Databases”, ACM1993.

R tuple S tuple R tuple S tuple Key i Key n S tuples R tuples Key n+1 Key i+1 S tuples R tuples PROBE Key n+2 Key i+2 S tuples R tuples Key … Key … R tuples S tuples PROBE Key m-1 Key j-1 S tuples R tuples Key m Key j S tuples R tuples BUILD BUILD Symmetric Hash Join (Pipeline) • Both tables are hashed (both kept in main memory only) • Z. Ives, A. Levy, “An Adaptive Query Execution”, VLDB 99 OUTPUT Source S Source R

Problem of SHJ: • Memory intensive : • Won’t work for large input streams • Wont’ allow for many joins to be processed in a pipeline (or even in parallel)

New Problems: Three Delays • Initial Delay • First tuple arrives from remote source more slowly than usual (still want initial answer out quickly) • Slow Delivery • Data arrives at a constant, but slower than expected rate (at the end, still overall good throughput behavior) • Bursty Arrival • Data arrives in a fluctuating manner (how to avoid sitting idle in periods of low input stream rates)

Question: • Why are delays undesirable? • Prolongs the time for first output • Slows the processing if wait for data to first be there before acting • If too fast, you want to avoid loosing any data • Waste of time if you sit idle while no data is incoming • Unpredictable, one single strategy won’t work

Challenges for XJoin • Manage flow of tuples between memory and secondary storage (when and how to do it) • Control background processing when inputs are delayed (reactive scheduling idea) • Ensure the full answer is produced • Ensure duplicate tuples are not produced • Both quick initial output as well as good overall throughput

Motivation of XJoin • Produces results incrementally when available • Tuples returned as soon as produced • Good for online processing • Allows progress to be made when one or more sources experience delays by: • Background processing performed on previously received tuples so results are produced even when both inputs are stalled

Stages (in different threads) M :M M :D D:D

Memory-resident partitions of source A Memory-resident partitions of source B 1 1 n n 1 k n . . . . . . . . . . . . . . . . . . M E M O R Y flush D I S K Tuple B Tuple A hash(Tuple B) = n hash(Tuple A) = 1 1 . . . n 1 k n . . . . . . SOURCE-A Disk-residentpartitions of source A Disk-residentpartitions of source B SOURCE-B XJoin

1st Stage of XJoin • Memory - to - Memory Join • Tuples are stored in partitions: • A memory-resident (m-r) portion • A disk-resident (d-r) portion • Join processing continues as usual: • If space permits, M to M • If memory full, then pick one partition as victim, flush to disk and append to end of disk partition • 1st Stage runs as long as one of the inputs is producing tuples • If no new input, then block stage1 and start stage 2

Output Partitions of source A Partitions of source B j i j i . . . . . . . . . . . . . . . . . . Insert Insert Probe Probe Tuple A Tuple B hash(record A) = i hash(record B) = j 1st Stage Memory-to-Memory Join M E M O R Y SOURCE-A SOURCE-B

Why Stage 1? • Use Memory as it is the fastest whenever possible • Use any new coming data as it’s already in memory • Don’t stop to go and grab stuff out of disk for new data joins

Question: • What does Second Stage do? • When does the Second Stage start? • Hints: • Xjoin proposes a memory management technique • What occurs when data input (tuples) are too large for memory? • Answer: • Second Stage joins Mem-to-Disk • Occurs when both the inputs are blocking

2nd Stage of XJoin • Activated when 1st Stage is blocked • Performs 3 steps: • Chooses the partition according to throughput and size of partition from one source • Uses tuples from d-r portion to probe m-r portion of other source and outputs matches, till d-r completely processed • Checks if either input resumed producing tuples. If yes, resume 1st Stage. If no, choose another d-r portion and continue 2nd Stage.

Output i Partitions of source A Partitions of source B . . . . . . . . . . . . . . i . . . . . . . . . . . . . . M E M O R Y DPiA MPiB i D I S K i . . . . . . . . . . . . . . . . . . . . Partitions of source A Partitions of source B Stage 2: Disk-to-memory Joins

Controlling 2nd Stage • Cost of 2nd Stage is hidden when both inputs experience delays • Tradeoff ? • What are the benefits of using the second stage? • Produce results when input sources are stalled • Allows variable input rates • What is the disadvantage? • The second stage must complete a d-r portion before checking for new input (overhead) • To address the tradeoff, use an activation threshold: • Pick a partition likely to produce many tuples right now

3rd Stage of XJoin • Disk-to-Disk Join • Clean-up stage • Assumes that all data for both inputs has arrived • Assumes that first and second stage completed • Makes sure that all tuples belonging in the result are being produced. • Why is this step necessary? • Completeness of answer

Handling Duplicates • When could duplicates be produced? • Duplicates could be produced in all 3 stages as multiple stages may perform overlapping work. • How address it: • XJoin prevents duplicates with timestamps. • When address this: • During processing as continuous output

Time Stamping : part 1 • 2 fields are added to each tuple: • Arrival TimeStamp (ATS) • indicates when the tuple arrived first in memory • Departure TimeStamp (DTS) • used to indicated time the tuple was flushed to disk • [ATS, DTS] indicates when tuple was in memory • When did two tuples get joined? • If Tuple A’s DTS is within Tuple B’s [ATS, DTS] • Tuples that meet this overlap condition are not considered for joining by the 2nd or 3rd stages

ATS ATS DTS DTS Tuple A 102 234 Tuple A 102 234 Non-Overlapping Overlapping Tuple B1 Tuple B2 178 348 601 198 Detecting tuples joined in 1st stage • Tuples joined in first stage • B1 arrived after A, and before A was flushed to disk • Tuples not joined in first stage • B2 arrived after A, and after A was flushed to disk

Time Stamping : part 2 • For each partition, keep track off: • ProbeTS: time when a 2nd stage probe was done • DTSlast: the latest DTS time of all the tuples that were available on disk at that time • Several such probes may occur: • Thus keep an ordered history of such probe descriptors • Usage: • All tuples before and including at time DTSlast were joined in stage 2 with all tuples in main memory (ATS,DTS) at time ProbeTS

DTSlast ATS DTS ProbeTS Tuple A 100 200 20 300 100 800 250 700 900 340 550 300 Partition 1 Partition 3 Partition 2 Overlap Partition 1 Partition 2 Tuple B 500 600 ATS DTS History list for the corresponding partitions Detecting tuples joined in 2nd stage All tuples before and including DTSlast were joined in Stage 2 At time ProbeTS All A tuples in Partition 2 up to DTSlast 250, Were joined with m-r tuples that arrived before Partition 2’s ProbeTS.

Experiments • HHJ (Hybrid Hash Join) • Xjoin (with 2nd stage and with caching) • Xjoin (without 2nd stage) • Xjoin (with aggressive usage of 2nd stage)

Case 1: Slow NetworkBoth sources are slow (bursty) • XJoin improves delivery time of initial answers -> interactive performance • The reactive background processing is an effective solution to exploit intermittant delays to keep continued output rates. • Shows that 2nd stage is very useful if there is time for it

Slow Network: both resources are slow

Case 2: Fast NetworkBoth sources are fast • All XJoin variants deliver initial results earlier. • XJoin also can deliver the overall result in equal time to HHJ • HHJ delivers the 2nd half of the result faster than XJoin. • 2nd stage cannot be used too aggressively if new data is coming in continuously

Case 2: Fast NetworkBoth sources are fast

Conclusion • Can be conservative on space (small footprint) • Can be used in conjunction with online query processing to manage the streams • Resuming Stage 1 as soon as data arrives • Dynamically choosing techniques for producing results

References • Urhan, Tolga and Franklin, Michael J. “XJoin: Getting Fast Answers From Slow and Bursty Networks.” • Urhan, Tolga, Franklin, Michael J. “XJoin: A Reactively-Scheduled Pipelined Join Operator.” • Hellerstein, Franklin, Chandrasekaran, Deshpande, Hildrum, Madden, Raman, and Shah. “Adaptive Query Processing: Technology in Evolution”. IEEE Data Engineering Bulletin, 2000. • Hellerstein and Avnur, Ron. “Eddies: Continuously Adaptive Query Processing.” • Babu and Wisdom, Jennefer. “Continuous Queries Over Data Streams”.

XJoin: Faster Query Results Over Slow And Bursty Networks

XJoin: Faster Query Results Over Slow And Bursty Networks

Presentation Transcript

Wavelets and Ranking of database query results

Query Optimization over Web Services

IP over networks

Control over Networks

Women’s Over 30 SLOW PITCH Softball

XJoin : Getting Fast Answers From Slow and Bursty Networks

Ship efficiency over time and slow steaming

Performance Estimation of Bursty Wavelength Division Multiplexing Networks

Intuitive Database Query System, Zooming Query Results Previews

XJoin: Faster Query Results Over Slow And Bursty Networks

Text Pre-processing and Faster Query Processing

Adaptive Frequency Counting over Bursty Data Streams

Query Incentive Networks

Reliable Bursty Convergecast in Wireless Sensor Networks

Gene Sequence and Query Results Database

Text Pre-processing and Faster Query Processing

Query Incentive Networks

Query Optimization over Web Services

Fix Sage 50 Slow Over Network

Identifying SQL Query Slow Performance: MySQL and PostgreSQL