
Dynamic Plan Migration for Continuous Query over Data Streams

Yali Zhu, Elke Rundensteiner and George Heineman, Database System Research Group, WPI, Massachusetts, USA. SIGMOD’2004.

Presentation Transcript


  1. Dynamic Plan Migration for Continuous Query over Data Streams. Yali Zhu, Elke Rundensteiner and George Heineman, Database System Research Group, WPI, Massachusetts, USA. SIGMOD’2004. *Research partly supported by the RDC grant 2003-04 on “On-line Stream Monitoring Systems: Untethered Healthcare, Intrusion Detection, and Beyond.”

  2. Stream Query Optimization • How does it differ from traditional query optimization?

  3. Stream Query Optimization • New classes of operators (windows) may require new rewrites • New execution modes (continuous/pipelined) • More dynamic fluctuations in statistics → compile-time optimization alone is not possible • Global optimization is not practical for huge query networks → adaptive optimization • Other cost models that take memory into account • Query optimization and load shedding

  4. Motivation for ‘Query Migration’ • Continuous queries over streams • Statistics unknown before the start • Statistics change during execution (stream rates, arrival patterns, data distributions, etc.) • Need for dynamic adaptation • Plan re-optimization: change the shape of the query plan tree

  5. Run-time Plan Re-Optimization • Step 1: decide when to optimize (statistics monitoring) • Step 2: generate a new query plan (query optimization) • Step 3: replace the current plan with the new plan (plan migration)

  6. Naïve Plan Migration Strategy [Figure: old and new plans over streams A, B and C, with the AB and BC joins in opposite order] • Migration steps: (1) Pause execution of the old plan (2) Drain out all tuples inside the old plan (3) Replace the old plan with the new plan (4) Resume execution of the new plan • Problem: works for stateless operators only
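
The four naïve steps can be sketched for the stateless case, assuming a plan is a linear chain of stateless operators with a FIFO queue in front of each one. The `Pipeline` class, the two operators and the inputs are hypothetical illustrations, not CAPE code. Because no stateless operator waits for future tuples, step (2) always terminates here.

```python
from collections import deque


class Pipeline:
    """Linear chain of stateless operators; an operator returns None to drop a tuple."""

    def __init__(self, ops):
        self.ops = ops
        self.queues = [deque() for _ in ops]  # input queue in front of each operator

    def step(self, out):
        """Process one tuple from the deepest non-empty queue; return False if idle."""
        for i in reversed(range(len(self.ops))):
            if self.queues[i]:
                result = self.ops[i](self.queues[i].popleft())
                if result is not None:
                    if i == len(self.ops) - 1:
                        out.append(result)
                    else:
                        self.queues[i + 1].append(result)
                return True
        return False

    def drain(self, out):
        while self.step(out):
            pass


def keep_even(x):
    return x if x % 2 == 0 else None


def triple(x):
    return x * 3


def naive_migration(inputs, switch_at, partial_steps=3):
    out = []
    old = Pipeline([keep_even, triple])
    old.queues[0].extend(inputs[:switch_at])
    for _ in range(partial_steps):  # some tuples are still inside the old plan
        old.step(out)
    # (1) pause: from here on no input is fed to the old plan
    old.drain(out)  # (2) drain all tuples inside the old plan
    new = Pipeline([triple, keep_even])  # (3) replace with a reordered, equivalent plan
    new.queues[0].extend(inputs[switch_at:])  # (4) resume with the new plan
    new.drain(out)
    return out
```

For example, `naive_migration(list(range(10)), 5)` returns `[0, 6, 12, 18, 24]`, the same output as running either plan alone.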

  7. Stateful Operators in CQ • Example: symmetric nested-loop (NL) join with window constraints [Figure: join AB with State A holding ax and State B holding b1 to b5; arriving tuples produce results such as (ax, b2) and (ax, b3)] • Why stateful? CQ needs non-blocking operators, and operators need to output partial results • A state data structure keeps the received tuples • Key observation: the purging of tuples from states relies on the processing of new tuples.
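
A minimal sketch of such a symmetric NL window join, assuming time-based windows, an equality predicate on a `key` field, and purging triggered by each arriving tuple (the key observation above). `Tup`, `SymmetricNLJoin` and the field names are hypothetical.

```python
from collections import namedtuple

Tup = namedtuple("Tup", "stream ts key")


class SymmetricNLJoin:
    """Joins streams A and B; each side keeps a state of recently received tuples."""

    def __init__(self, window):
        self.window = window
        self.state = {"A": [], "B": []}

    def purge(self, now):
        # Expired tuples leave the states only here, i.e. when a new tuple arrives.
        for side in self.state:
            self.state[side] = [u for u in self.state[side] if now - u.ts < self.window]

    def process(self, t):
        """Purge, probe the opposite state with a nested loop, then insert t."""
        self.purge(t.ts)
        other = "B" if t.stream == "A" else "A"
        results = [(t, u) if t.stream == "A" else (u, t)
                   for u in self.state[other] if u.key == t.key]
        self.state[t.stream].append(t)
        return results  # partial results are emitted immediately (non-blocking)
```

Note that `ax` stays in State A however much wall-clock time passes; only the next arriving tuple can purge it.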

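  8. Naïve Migration Strategy Revisited • Steps: (1) Pause execution of the old plan (2) Drain out all tuples inside the old plan (3) Replace the old plan with the new plan (4) Resume execution of the new plan • Problem: deadlock waiting. Step (2) never completes for stateful operators: tuples leave their states only when new tuples are processed, but new tuples are held back until step (4). [Figure: plan over streams A, B and C with panels "(2) All tuples drained", "(3) Old replaced by new", "(4) Processing resumed"]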
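
The deadlock can be illustrated with a minimal sketch, assuming a window state that purges only when a new tuple arrives. `WindowState` and `naive_drain` are hypothetical names, and a finite loop over clock values stands in for what would otherwise be an unbounded wait.

```python
class WindowState:
    """Timestamps held by one operator state; purging is driven by arrivals only."""

    def __init__(self, window, timestamps=()):
        self.window = window
        self.timestamps = list(timestamps)

    def on_arrival(self, ts):
        self.timestamps = [t for t in self.timestamps if ts - t < self.window]
        self.timestamps.append(ts)


def naive_drain(state, wall_clock):
    """Step (2) of the naive strategy: wait until the paused plan holds no tuples.

    `wall_clock` is a finite stand-in for time passing while input is paused.
    Returns True if the state became empty, False if we gave up waiting.
    """
    for _now in wall_clock:
        if not state.timestamps:
            return True
        # Input is paused until step (4), so no arrival can trigger a purge.
    return False
```

With `WindowState(window=5, timestamps=[0, 1, 2])`, `naive_drain` over `range(100, 200)` returns `False` and the state is unchanged, while a single `on_arrival(100)` would purge it at once.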

  9. Problem Definition • Dynamic plan migration • Input: two migration boxes, one containing the old plan and one containing the new plan, with the same input and output queues • Result: the old box is replaced by the new box • Valid migration: no missing tuples and no duplicates • Key points: the plans involved contain stateful operators, so we need to migrate while retaining useful states and discarding useless ones.

  10. State of the Art • “Efficient mid-query re-optimization of sub-optimal query execution plans” [Kabra, DeWitt 1998]: only migrates the unprocessed portion • Competing query plan model [Ioannidis, Ng, et al. 1992] [Graefe, Cole 1994]: generate several candidate query plans before the start, execute all of them, and choose one after a while

  11. Outline • Problem Motivation and Definition • Dynamic Migration Strategies • Moving State Strategy • Parallel Track Strategy • Experimental Results

  12. Moving State Strategy • Basic idea: share common states between the two migration boxes • Key steps: (1) State matching: match states based on their IDs (2) State moving: create new pointers in the new box for the matched states • What is left? The unmatched states in the new box [Figure: the old box joins A and B first, then C, then D (states SA, SB, SAB, SC, SABC, SD); the new box joins B and C first, then D, then A (states SB, SC, SBC, SD, SBCD, SA); both boxes read input queues QA, QB, QC, QD and write output queue QABCD]
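
State matching and moving can be sketched as follows, assuming (hypothetically) that a state's ID is the set of input streams its tuples cover, so SA in the old box and SA in the new box get the same ID, and that a state is a Python list that can be shared by reference. `state_id` and `move_states` are illustrative names.

```python
def state_id(streams):
    """Hypothetical state ID: the set of streams covered, e.g. state_id("BC")."""
    return frozenset(streams)


def move_states(old_states, new_state_ids):
    """Return (states of the new box, IDs of the unmatched states).

    Matched states are moved by sharing the same list object (a new pointer,
    no copying); unmatched states start empty and must be recomputed.
    """
    new_states, unmatched = {}, []
    for sid in new_state_ids:
        if sid in old_states:  # state matching by ID
            new_states[sid] = old_states[sid]  # state moving: share the pointer
        else:
            new_states[sid] = []
            unmatched.append(sid)
    return new_states, unmatched
```

With old IDs A, B, C, D, AB, ABC and new IDs A, B, C, D, BC, BCD, the states for BC and BCD come back unmatched, which are exactly the states recomputed on the next slide.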

  13. Unmatched States • State recomputing: recursively recompute the unmatched states SBC and SBCD from the bottom up • Why is this always possible? The old and new boxes have the same input queues, so the states associated with the input queues (SA, SB, SC, SD) always match • Why is it necessary? (see slide 15) [Figure: new box with states SA, SB, SC, SBC, SD, SBCD over input queues QA, QB, QC, QD]
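
Bottom-up recomputation can be sketched under simplifying assumptions: composite tuples are Python tuples of base tuples `(stream, ts, key)`, and a hypothetical join predicate requires all base tuples to share one key and to span less than the window in time. `joinable`, `recompute` and `children` are illustrative names; the real operators' predicates and window semantics may differ.

```python
def joinable(left, right, window):
    """Hypothetical predicate: equal keys and a timestamp span below the window."""
    parts = left + right
    times = [ts for _, ts, _ in parts]
    return len({key for _, _, key in parts}) == 1 and max(times) - min(times) < window


def recompute(sid, states, children, unmatched, window):
    """Fill states[sid] bottom-up and return it.

    children[sid] = (left_id, right_id) describes the new plan. States for the
    input queues always match, so the recursion stops there.
    """
    if sid not in unmatched:
        return states[sid]
    left_id, right_id = children[sid]
    left = recompute(left_id, states, children, unmatched, window)
    right = recompute(right_id, states, children, unmatched, window)
    states[sid] = [l + r for l in left for r in right if joinable(l, r, window)]
    unmatched.discard(sid)
    return states[sid]
```

Calling `recompute("BCD", ...)` first rebuilds SBC from SB and SC and then SBCD from SBC and SD, mirroring the bottom-up order on the slide.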

  14. Terms on Tuples • New/old tuples: old tuples are those already in the old box when migration starts; new tuples are those not yet in the old box when migration starts • Sub-tuples: tuple ABCD is the join result of tuples A, B, C and D, which are the sub-tuples of ABCD • Since each sub-tuple is either old or new, tuple ABCD has 2^4 = 16 possible combinations of old/new sub-tuples [Figure: old box with states SA, SB, SAB, SC, SABC, SD over input queues QA, QB, QC, QD]

  15. Why Recompute Unmatched States • To get the complete results of ABCD, we need all 16 old/new combinations • If SBC is not recomputed, results with both B and C old will be missed [Figure: new box with states SA, SB, SC, SBC, SD, SBCD over input queues QA, QB, QC, QD; example ABCD tuples marked with old and new sub-tuples]
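
The counting on slides 14 and 15 can be checked with a short enumeration (an illustrative sketch; the names are hypothetical). It lists the 2^4 = 16 old/new combinations of sub-tuples A, B, C and D and selects those with both B and C old. In the new box every ABCD result is built from an SBC entry, and an entry joining an old B with an old C can only come from recomputing SBC, because no new arrival ever triggers that particular join.

```python
from itertools import product

SUBTUPLES = ("A", "B", "C", "D")


def combinations():
    """All old/new assignments to the sub-tuples of ABCD, as dicts."""
    return [dict(zip(SUBTUPLES, flags)) for flags in product(("old", "new"), repeat=4)]


def needs_old_bc(combo):
    """True if the result needs an SBC entry joining an old B with an old C."""
    return combo["B"] == "old" and combo["C"] == "old"


all_combos = combinations()
missed_without_recompute = [c for c in all_combos if needs_old_bc(c)]
```

This yields 16 combinations in total, 4 of which depend on a recomputed SBC.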

  16. Cost Estimation of MS Migration • The cost of MS consists of: • State matching: ID comparisons (negligible) • State moving: creating pointers (negligible) • State recomputing: the majority of the cost • Parameters affecting it: operator selectivities and the number of tuples in states, estimated as input rate × window size • See the paper for the detailed cost models • One conclusion of the cost model: the cost of MS grows polynomially with the window size
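
A back-of-the-envelope sketch of why recomputation dominates and grows polynomially with the window. It rests on our own simplifying assumptions, not the paper's detailed model: nested-loop joins whose cost is the number of tuple pairs compared, the same input rate λ on every stream, leaf states of λ·W tuples, and a single selectivity σ per join. Recomputing SBC then costs about (λW)^2 comparisons and yields about σ(λW)^2 tuples; recomputing SBCD compares those with the λW tuples of SD, for a total of (λW)^2 + σ(λW)^3.

```python
def leaf_state_size(rate, window):
    """Slide estimate: tuples in a state ≈ input rate × window size."""
    return rate * window


def ms_recompute_cost(rate, window, selectivity):
    """Simplified cost (tuple comparisons) of recomputing SBC, then SBCD."""
    n = leaf_state_size(rate, window)  # |SB| = |SC| = |SD|
    cost_bc = n * n  # nested loop over SB × SC
    size_bc = selectivity * n * n  # expected number of tuples in SBC
    cost_bcd = size_bc * n  # nested loop over SBC × SD
    return cost_bc + cost_bcd
```

With rate 1 and selectivity 0.1, doubling W from 10 to 20 raises the estimate from 200 to 1200 comparisons, a factor of 6 that sits between quadratic (4) and cubic (8) scaling.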

  17. MS Migration Pros and Cons • Pros: fast when the number of tuples in states is small (low input rates, low selectivity or small windows) • Cons: no output during the entire migration stage • Can the query produce output even during migration? This motivates the Parallel Track strategy

  18. Parallel Track Strategy • Basic idea: execute both plans in parallel and gradually “push” old tuples out of the old box by purging • Key steps: (1) Connect the boxes (2) Execute them in parallel until the old box has “expired” (it holds no old tuple or sub-tuple) (3) Disconnect the old box (4) From then on, execute only the new box [Figure: old box (states SA, SB, SAB, SC, SABC, SD) and new box (states SA, SB, SC, SBC, SD, SBCD) connected to the same input queues QA, QB, QC, QD and output queue QABCD]
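
A minimal, runnable sketch of the parallel track idea for a box holding a single symmetric window join. It assumes time-based windows, an equality predicate on `key`, and arrival-driven purging as on slide 7; `WindowJoin`, `parallel_track` and `reference_join` are hypothetical names. Both boxes receive every new tuple, the old box skips joins whose inputs are both new (the rule on slide 19), and the old box is disconnected once its states hold no old tuple. The merged output is compared against a window join over the whole stream.

```python
from collections import Counter, namedtuple

Tup = namedtuple("Tup", "ts stream key")


class WindowJoin:
    """Symmetric nested-loop join of streams A and B under a time window."""

    def __init__(self, window):
        self.window = window
        self.state = {"A": [], "B": []}

    def process(self, t, skip=None):
        # Purging is driven by the timestamp of the arriving tuple.
        for side in self.state:
            self.state[side] = [u for u in self.state[side] if t.ts - u.ts < self.window]
        other = "B" if t.stream == "A" else "A"
        out = [(t, u) if t.stream == "A" else (u, t)
               for u in self.state[other]
               if u.key == t.key and (skip is None or not skip(t, u))]
        self.state[t.stream].append(t)
        return out


def parallel_track(events, window, t_start):
    """Return (multiset of results, time the old box was disconnected).

    `events` must be sorted by timestamp; migration starts at t_start.
    """
    def is_new(x):
        return x.ts >= t_start

    def both_new(t, u):  # duplicate prevention at the old box's root operator
        return is_new(t) and is_new(u)

    old_box, results = WindowJoin(window), []
    for t in (e for e in events if not is_new(e)):  # before migration
        results += old_box.process(t)

    new_box = WindowJoin(window)  # all states start empty
    connected, disconnected_at = True, None
    for t in (e for e in events if is_new(e)):  # boxes run in parallel
        results += new_box.process(t)
        if connected:
            results += old_box.process(t, skip=both_new)
            # Expired: no old tuple is left in any state of the old box.
            if all(is_new(u) for s in old_box.state.values() for u in s):
                connected, disconnected_at = False, t.ts
    return Counter(results), disconnected_at


def reference_join(events, window):
    """All A-B pairs with equal keys whose timestamps differ by less than the window."""
    a = [e for e in events if e.stream == "A"]
    b = [e for e in events if e.stream == "B"]
    return Counter((x, y) for x in a for y in b
                   if x.key == y.key and abs(x.ts - y.ts) < window)
```

In this one-operator sketch the old box is disconnected at the first arrival that is at least one window W later than its last old tuple. The output keeps flowing throughout, with no result missed or duplicated.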

  19. Potential Duplicates and Duplicate Prevention • Tuple ABCD has 2^4 = 16 possible old/new sub-tuple combinations • The same combination must not be generated by both boxes; otherwise we may have duplicates • In the new box, all states start empty, so it only generates ABCD as (new, new, new, new) • The old box may generate all 16 cases, duplicating the (new, new, new, new) case • Prevention: at the root operator of the old box, if both to-be-joined tuples have all-new sub-tuples, don't join them; all other operators in the old box proceed as normal [Figure: old box with states SA, SB, SAB, SC, SABC, SD over input queues QA, QB, QC, QD]
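
A set-level sketch of the duplicate-prevention rule, assuming the old box joins A and B first, then C, then D. It ignores timing and only tracks which old/new combinations each box can emit; the helper names are hypothetical. With the rule applied only at the root, the two boxes together cover all 16 combinations exactly once. Applying it at a lower operator as well would drop combinations such as (new, new, new, old) that only the old box can produce, which is why the other operators proceed as normal.

```python
from itertools import product

OLD, NEW = "old", "new"
ALL_NEW = (NEW,) * 4


def all_new(sub):
    return all(flag == NEW for flag in sub)


def join(left, right, apply_rule):
    """Combine sub-tuple flag tuples; optionally skip all-new × all-new pairs."""
    return {l + r for l in left for r in right
            if not (apply_rule and all_new(l) and all_new(r))}


def old_box_outputs(rule_at_root=True, rule_at_lower=False):
    leaves = {(OLD,), (NEW,)}  # old box states hold both old and new tuples
    ab = join(leaves, leaves, rule_at_lower)
    abc = join(ab, leaves, rule_at_lower)
    return join(abc, leaves, rule_at_root)


NEW_BOX_OUTPUTS = {ALL_NEW}  # new box starts empty: only (new, new, new, new)
```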

  20. Estimation of PT Migration [Figure: old box (states SA, SB, SAB, SC, SABC, SD) over time T, from TM-start through a 1st and a 2nd window W to TM-end, as old tuples are replaced by new ones] • Estimation formula: T_PT ≈ 2W

  21. PT Migration Duration • Given enough computing resources, new tuples are processed right away, so the PT migration duration ≈ 2W • Without enough resources, new tuples accumulate in the queues, so the PT migration duration > 2W

  22. Cost Estimation of PT Migration • Cost of PT = cost of processing the tuples arriving during 2W in the old box + cost of processing the tuples arriving during 2W in the new box • Parameters: input rates, window size and selectivities, similar to the MS strategy

  23. PT Migration Pros and Cons • Pros: keeps producing results even during migration (MS produces no results during migration) • Cons: the migration duration is at least 2W; MS may be faster, depending on the number of tuples in states

  24. Outline • Problem Definition and Motivation • Dynamic Migration Strategies • Moving State Strategy • Parallel Track Strategy • Experimental Results

  25. Experimental Setup • Both strategies are embedded in the CAPE system • CAPE (Continuous Adaptive Processing Engine) is a streaming query engine developed at DSRG, WPI (VLDB’04 demo) • Layers of adaptation: punctuation exploitation, adaptive scheduling, query migration, dynamic distribution • Input streams: produced by CAPE's stream generator with a Poisson arrival pattern • Experiments on migration duration, varying the window size

  26. Migration Duration vs. Window Size

  27. Conclusions • Identified the problem of migration for stateful operators • First solutions for continuous query migration: the moving state strategy and the parallel track strategy • Embedded both strategies into a stream system • Cost model and experimental evaluation; the cost model is confirmed by the experiments • Identified the performance trade-off between the two strategies

  28. Thank You • For more information, check the CAPE website: http://davis.wpi.edu/~dsrg/CAPE/
