330 likes | 430 Views
XJoin: Faster Query Results Over Slow And Bursty Networks. IEEE Bulletin, 2000 by T. Urhan and M Franklin. Based on a talk prepared by Asima Silva & Leena Razzaq. Motivation. Data delivery issues in terms of: unpredictable delay from some remote data sources
E N D
XJoin: Faster Query Results Over Slow And Bursty Networks IEEE Bulletin, 2000 by T. Urhan and M Franklin Based on a talk prepared by Asima Silva & Leena Razzaq
Motivation • Data delivery issues in terms of: • unpredictable delay from some remote data sources • wide-area network with possibly communication links, congestion, failures, and overload • Goal: • Not just overall query processing time matters • Also when initial data is delivered • Overall throughput and rate throughout query process
Overview • Hash Join History • 3 Classes of Delays • Motivation of XJoin • Challenges of Developing XJoin • Three Stages of XJoin • Handling Duplicates • Experimental Results
S tuple 1 key2 key1 key3 key4 Key5… R tuples… R tuples R tuples R tuples R tuples S tuple 2 S tuple 3 S tuple 4 S tuple 5…. Hash Join 2. Probe • Only one table is hashed 1. BUILD
Memory Disk Bucket i R tuples S tuple 1 Bucket n R tuples Bucket i+1 R tuples S tuple 1 Bucket n+1 R tuples Bucket i+2 R tuples S tuple 1 Bucket n+2 R tuples Bucket … R tuples S tuple 1 Bucket … R tuples Bucket j-1 R tuples S tuple … Bucket m-1 R tuples Bucket j R tuples Bucket m R tuples Hybrid Hash Join • One table is hashed both to disk and memory (partitions) • G. Graefe. “Query Evaluation Techniques for Large Databases”, ACM1993.
R tuple S tuple R tuple S tuple Key i Key n S tuples R tuples Key n+1 Key i+1 S tuples R tuples PROBE Key n+2 Key i+2 S tuples R tuples Key … Key … R tuples S tuples PROBE Key m-1 Key j-1 S tuples R tuples Key m Key j S tuples R tuples BUILD BUILD Symmetric Hash Join (Pipeline) • Both tables are hashed (both kept in main memory only) • Z. Ives, A. Levy, “An Adaptive Query Execution”, VLDB 99 OUTPUT Source S Source R
Problem of SHJ: • Memory intensive : • Won’t work for large input streams • Wont’ allow for many joins to be processed in a pipeline (or even in parallel)
New Problems: Three Delays • Initial Delay • First tuple arrives from remote source more slowly than usual (still want initial answer out quickly) • Slow Delivery • Data arrives at a constant, but slower than expected rate (at the end, still overall good throughput behavior) • Bursty Arrival • Data arrives in a fluctuating manner (how to avoid sitting idle in periods of low input stream rates)
Question: • Why are delays undesirable? • Prolongs the time for first output • Slows the processing if wait for data to first be there before acting • If too fast, you want to avoid loosing any data • Waste of time if you sit idle while no data is incoming • Unpredictable, one single strategy won’t work
Challenges for XJoin • Manage flow of tuples between memory and secondary storage (when and how to do it) • Control background processing when inputs are delayed (reactive scheduling idea) • Ensure the full answer is produced • Ensure duplicate tuples are not produced • Both quick initial output as well as good overall throughput
Motivation of XJoin • Produces results incrementally when available • Tuples returned as soon as produced • Good for online processing • Allows progress to be made when one or more sources experience delays by: • Background processing performed on previously received tuples so results are produced even when both inputs are stalled
Stages (in different threads) M :M M :D D:D
Memory-resident partitions of source A Memory-resident partitions of source B 1 1 n n 1 k n . . . . . . . . . . . . . . . . . . M E M O R Y flush D I S K Tuple B Tuple A hash(Tuple B) = n hash(Tuple A) = 1 1 . . . n 1 k n . . . . . . SOURCE-A Disk-residentpartitions of source A Disk-residentpartitions of source B SOURCE-B XJoin
1st Stage of XJoin • Memory - to - Memory Join • Tuples are stored in partitions: • A memory-resident (m-r) portion • A disk-resident (d-r) portion • Join processing continues as usual: • If space permits, M to M • If memory full, then pick one partition as victim, flush to disk and append to end of disk partition • 1st Stage runs as long as one of the inputs is producing tuples • If no new input, then block stage1 and start stage 2
Output Partitions of source A Partitions of source B j i j i . . . . . . . . . . . . . . . . . . Insert Insert Probe Probe Tuple A Tuple B hash(record A) = i hash(record B) = j 1st Stage Memory-to-Memory Join M E M O R Y SOURCE-A SOURCE-B
Why Stage 1? • Use Memory as it is the fastest whenever possible • Use any new coming data as it’s already in memory • Don’t stop to go and grab stuff out of disk for new data joins
Question: • What does Second Stage do? • When does the Second Stage start? • Hints: • Xjoin proposes a memory management technique • What occurs when data input (tuples) are too large for memory? • Answer: • Second Stage joins Mem-to-Disk • Occurs when both the inputs are blocking
2nd Stage of XJoin • Activated when 1st Stage is blocked • Performs 3 steps: • Chooses the partition according to throughput and size of partition from one source • Uses tuples from d-r portion to probe m-r portion of other source and outputs matches, till d-r completely processed • Checks if either input resumed producing tuples. If yes, resume 1st Stage. If no, choose another d-r portion and continue 2nd Stage.
Output i Partitions of source A Partitions of source B . . . . . . . . . . . . . . i . . . . . . . . . . . . . . M E M O R Y DPiA MPiB i D I S K i . . . . . . . . . . . . . . . . . . . . Partitions of source A Partitions of source B Stage 2: Disk-to-memory Joins
Controlling 2nd Stage • Cost of 2nd Stage is hidden when both inputs experience delays • Tradeoff ? • What are the benefits of using the second stage? • Produce results when input sources are stalled • Allows variable input rates • What is the disadvantage? • The second stage must complete a d-r portion before checking for new input (overhead) • To address the tradeoff, use an activation threshold: • Pick a partition likely to produce many tuples right now
3rd Stage of XJoin • Disk-to-Disk Join • Clean-up stage • Assumes that all data for both inputs has arrived • Assumes that first and second stage completed • Makes sure that all tuples belonging in the result are being produced. • Why is this step necessary? • Completeness of answer
Handling Duplicates • When could duplicates be produced? • Duplicates could be produced in all 3 stages as multiple stages may perform overlapping work. • How address it: • XJoin prevents duplicates with timestamps. • When address this: • During processing as continuous output
Time Stamping : part 1 • 2 fields are added to each tuple: • Arrival TimeStamp (ATS) • indicates when the tuple arrived first in memory • Departure TimeStamp (DTS) • used to indicated time the tuple was flushed to disk • [ATS, DTS] indicates when tuple was in memory • When did two tuples get joined? • If Tuple A’s DTS is within Tuple B’s [ATS, DTS] • Tuples that meet this overlap condition are not considered for joining by the 2nd or 3rd stages
ATS ATS DTS DTS Tuple A 102 234 Tuple A 102 234 Non-Overlapping Overlapping Tuple B1 Tuple B2 178 348 601 198 Detecting tuples joined in 1st stage • Tuples joined in first stage • B1 arrived after A, and before A was flushed to disk • Tuples not joined in first stage • B2 arrived after A, and after A was flushed to disk
Time Stamping : part 2 • For each partition, keep track off: • ProbeTS: time when a 2nd stage probe was done • DTSlast: the latest DTS time of all the tuples that were available on disk at that time • Several such probes may occur: • Thus keep an ordered history of such probe descriptors • Usage: • All tuples before and including at time DTSlast were joined in stage 2 with all tuples in main memory (ATS,DTS) at time ProbeTS
DTSlast ATS DTS ProbeTS Tuple A 100 200 20 300 100 800 250 700 900 340 550 300 Partition 1 Partition 3 Partition 2 Overlap Partition 1 Partition 2 Tuple B 500 600 ATS DTS History list for the corresponding partitions Detecting tuples joined in 2nd stage All tuples before and including DTSlast were joined in Stage 2 At time ProbeTS All A tuples in Partition 2 up to DTSlast 250, Were joined with m-r tuples that arrived before Partition 2’s ProbeTS.
Experiments • HHJ (Hybrid Hash Join) • Xjoin (with 2nd stage and with caching) • Xjoin (without 2nd stage) • Xjoin (with aggressive usage of 2nd stage)
Case 1: Slow NetworkBoth sources are slow (bursty) • XJoin improves delivery time of initial answers -> interactive performance • The reactive background processing is an effective solution to exploit intermittant delays to keep continued output rates. • Shows that 2nd stage is very useful if there is time for it
Case 2: Fast NetworkBoth sources are fast • All XJoin variants deliver initial results earlier. • XJoin also can deliver the overall result in equal time to HHJ • HHJ delivers the 2nd half of the result faster than XJoin. • 2nd stage cannot be used too aggressively if new data is coming in continuously
Conclusion • Can be conservative on space (small footprint) • Can be used in conjunction with online query processing to manage the streams • Resuming Stage 1 as soon as data arrives • Dynamically choosing techniques for producing results
References • Urhan, Tolga and Franklin, Michael J. “XJoin: Getting Fast Answers From Slow and Bursty Networks.” • Urhan, Tolga, Franklin, Michael J. “XJoin: A Reactively-Scheduled Pipelined Join Operator.” • Hellerstein, Franklin, Chandrasekaran, Deshpande, Hildrum, Madden, Raman, and Shah. “Adaptive Query Processing: Technology in Evolution”. IEEE Data Engineering Bulletin, 2000. • Hellerstein and Avnur, Ron. “Eddies: Continuously Adaptive Query Processing.” • Babu and Wisdom, Jennefer. “Continuous Queries Over Data Streams”.