R-SOX presents runtime schema optimization techniques for efficient processing of XML streams with dynamic schema changes, using semantic query optimization to minimize memory usage and enhance responsiveness. The system architecture involves annotating streams, refining query plans, and propagating schemas downstream. The tool supports various optimization techniques like tree minimization and recursion optimization for improved query performance.
R-SOX: Runtime Semantic Query Optimization over XML Streams
Song Wang, Hong Su, Ming Li, Mingzhu Wei, Shoushen Yang, Drew Ditto, Elke A. Rundensteiner and Murali Mani
Database Systems Research Group
Department of Computer Science
Worcester Polytechnic Institute
Worcester, Massachusetts, USA
VLDB 2006, Seoul, Korea
Background: XML Stream Applications
• Wide-ranging and growing applications
  - Examples: news publishing and on-line auction systems
• Characteristics
  - Real-time processing: short response time
  - Limited resources: minimize memory
[Figure: News Publishing; On-line Auction]
Background: Optimization Using Constraints
• Constraint Properties
  - Document Type Definition (DTD) or XML Schema
  - Constraints are statically available beforehand
• General XML Semantic Query Optimization (SQO)
  - Tree minimization
  - Recursion optimization
• Stream-specific XML SQO
  - Context-aware shortcutting
  - Token-granularity data output
R-SOX: Motivation and Goal
• Motivation
  - Scenarios where a static schema cannot be applied
  - Challenges when the schema arrives dynamically:
     - how to represent and manage the runtime schema
     - how to exploit the dynamic schema for runtime optimization
     - how to propagate the runtime schema downstream
• Goals
  - Runtime schema encoding and synchronization
  - Semantic query optimization techniques
  - Runtime schema propagation
R-SOX: Architecture and Workflow
[Figure: R-SOX system architecture]
• Components: Stream Annotator, Extended Raindrop XQuery Engine, Query Plan Generator, Schema Inf. Manager, Query Plan Adaptor
• Data flows: Input Stream, Annotated Output Stream, Result Stream, XQuery, Query Plan, Plan Refinement, RSI, Result Schema, Schema knowledge
• Workflow stages: Basic XQuery Evaluation, Runtime Schema Refinement, Runtime Semantic Query Optimization, Downstream Schema Propagation
• Figure labels: Raindrop Engine, Demo Focus, R-SOX Contributions, Future Work
Basic XQuery Evaluation
XQuery Q1-1:
  FOR $o in document("news.xml")/stream/news
  RETURN <result> $o/source, $o/comments </result>
• Raindrop XQuery Engine
  - Construction of Raindrop plan
  - Automaton-based query evaluation
Raindrop XQuery Plan (top to bottom):
  SJoin on $x
  ExtractNest $b | ExtractNest $c
  Nav $x//source -> $b | Nav $x//comments -> $c
  Nav stream//news -> $x
  Stream Data
Input Token Stream:
  <stream> <news> <source> <content>CNN…</content> <rank>…</rank>… </source> <comments> <content>President…</content>… </comments> …… </news> ……
Query Automata:
  s0 -stream-> s1 -news-> s2
  s2 -source-> s3 -content-> s4
  s2 -comments-> s5 -content-> s6
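The automaton-based evaluation on this slide can be illustrated with a minimal sketch. It is not the Raindrop implementation: the token format, the `DEAD` marker, the `run` function and the mapping from states to variables are assumptions made for illustration; only the element names and the rough state numbering follow the slide.

```python
from typing import Dict, Iterable, List, Tuple

Token = Tuple[str, str]          # ("start" | "end", element name); hypothetical format
DEAD = -1                        # state used when no transition applies

# Transitions for /stream/news/source and /stream/news/comments.
# State numbers loosely follow the slide's automaton; they are illustrative.
TRANSITIONS: Dict[Tuple[int, str], int] = {
    (0, "stream"): 1,
    (1, "news"): 2,
    (2, "source"): 3,
    (2, "comments"): 5,
}
# States whose entry triggers a binding (assumed mapping to plan variables).
ACCEPTING = {2: "$x", 3: "$b", 5: "$c"}


def run(tokens: Iterable[Token]) -> List[str]:
    """Return the variable bindings triggered, in document order."""
    stack: List[int] = [0]       # one automaton state per open element
    fired: List[str] = []
    for kind, name in tokens:
        if kind == "start":
            nxt = TRANSITIONS.get((stack[-1], name), DEAD)
            stack.append(nxt)
            if nxt in ACCEPTING:
                fired.append(ACCEPTING[nxt])
        elif kind == "end":
            stack.pop()          # return to the parent element's state
    return fired
```

Keeping a stack of states means an end tag simply restores the parent's state, so the automaton never has to re-read earlier tokens.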
Runtime Schema Refinement
• Runtime Schema Information (RSI)
  - Representing RSI: RSI Grammar
  - Encoding RSI:
     - embedded into the input XML token stream
     - extracted using a DFA stream loader
• Managing Schema Information
  - Schema Graph: directed ordered graph
  - Schema graph synchronization with the newly received RSIs
  - History-aware RSI rollback
Example of RSI:
  News ((source | comment)+, date+)
  RSI 1: ((news,inf,TIME), (/news/comment, , ),-)
  News (source+, date+)
  RSI 2: ((/news,200,COUNT), (/news/comment, /news/source, *), +)
  News (source*, comment+, date+)
Runtime SQO: Overview
The following SQO techniques are supported:
  (1) Tree Minimization
  (2) Recursion Optimization
  (3) Fast Data Output
  (4) Navigation Shortcutting
• Runtime Plan Adaptor
  - Incremental plan migration
  - Rule library
  - Rule applier
• Query Execution
  - Modifying automata computations
  - Switching execution modes
  - Performing event-condition actions
Runtime SQO: Tree Minimization
XQuery Q1:
  FOR $o in document("news.xml")/stream/news
  RETURN <result> $o/source, $o/comments </result>
• Benefits
  - Expedite document traversal on pattern retrieval by avoiding unnecessary navigation
  - Change the query plan at run-time by adjusting automata
• Query Execution
  - Temporarily removing and adding automaton states
RSIs:
  P1: ((stream,inf,Count), (/news, source , ), -)
  P2: ((stream,inf,Count), (/news, comments ,), -)
[Figure: Schema Graph Refinement. stream has child news (1,∞); news has children source, comments (1,∞) and date (1,∞); the edge to source is cut by P1 and the edge to comments is cut by P2.]
[Figure: Query Automata Refinement. In the automaton s0 -stream-> s1 -news-> s2, s2 -source-> s3 -content-> s4, s2 -comments-> s5 -content-> s6, P1 disables the source transition and P2 disables the comments transition.]
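One way to picture "temporarily removing and adding automaton states" is a transition table with a disabled set. The sketch below is an assumption-laden illustration, not R-SOX code: the `QueryAutomaton` class and its `disable`/`enable`/`step` methods are hypothetical names, and how an RSI such as P1 is decoded into a `disable` call is left out.

```python
from typing import Dict, Optional, Set, Tuple

Transition = Tuple[int, str]     # (source state, element name)


class QueryAutomaton:
    """Transition table whose edges can be switched off and on at runtime."""

    def __init__(self, transitions: Dict[Transition, int]) -> None:
        self.transitions = dict(transitions)
        self.disabled: Set[Transition] = set()

    def disable(self, state: int, name: str) -> None:
        # E.g. after an RSI states that `name` no longer occurs under this state.
        self.disabled.add((state, name))

    def enable(self, state: int, name: str) -> None:
        # E.g. after a later RSI restores the element.
        self.disabled.discard((state, name))

    def step(self, state: int, name: str) -> Optional[int]:
        """Return the next state, or None if the edge is missing or disabled."""
        key = (state, name)
        if key in self.disabled:
            return None
        return self.transitions.get(key)
```

Because a disabled edge stays in the table, re-enabling it needs no plan rebuild, which matches the slide's emphasis on temporary changes.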
Runtime SQO: Recursion Optimization
Recursion-aware operators are switched to non-recursive operators when the input XML data is not recursive.
XQuery Q2 (slightly different from Q1):
  FOR $o in document("news.xml")/stream//news
  RETURN <result> $o/source, $o/comments </result>
RSIs:
  P1: ((news,inf,Count), (/news, news, ), - )
  P2: ((news,inf,Count), (/news, news, ), +)
• Benefits
  - Improve performance by avoiding unnecessary overhead on recursion handling
• Optimization Processing
  - Detect recursion by analyzing the runtime schema knowledge
  - Switch between recursion-aware and non-recursive operators
  - Characterize safe moments for runtime migration
[Figure: Operator Switching in the Query Plan. Recursive plan: RecurSJoin on $x; RecurExtractNest $b, RecurExtractNest $c; RecurNav $x//source -> $b, RecurNav $x//comments -> $c; RecurNav stream//news -> $x; Stream Data. P1 and P2 trigger switching between the Recursive Operator and the Non-recursive Operator.]
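Detecting recursion from schema knowledge can be sketched as a reachability check on the schema graph. The sketch assumes the graph is a plain mapping from an element type to its child types; the function names `is_recursive` and `choose_operator` are hypothetical, while the operator names `RecurNav` and `Nav` come from the slides.

```python
from typing import Dict, List, Set


def is_recursive(schema: Dict[str, List[str]], element: str) -> bool:
    """True if `element` can occur as one of its own descendants."""
    seen: Set[str] = set()
    stack = list(schema.get(element, []))
    while stack:
        node = stack.pop()
        if node == element:
            return True          # found a cycle back to the element
        if node in seen:
            continue
        seen.add(node)
        stack.extend(schema.get(node, []))
    return False


def choose_operator(schema: Dict[str, List[str]], element: str) -> str:
    """Pick the recursion-aware operator only when the schema requires it."""
    return "RecurNav" if is_recursive(schema, element) else "Nav"
```

For example, a schema in which news may contain news selects `RecurNav`; once that edge is removed from the graph, the same call selects `Nav`. Deciding when it is safe to migrate a running plan is a separate problem that this sketch does not address.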
Runtime SQO: Fast Data Output
XQuery Q1:
  FOR $o in document("news.xml")/stream/news
  RETURN <result> $o/source, $o/comments </result>
• Benefits
  - Minimize memory consumption by avoiding unnecessary data storage and releasing buffered data at the earliest moment
• Optimization Processing
  - Augment query automata with Glushkov automata
  - Encode event-condition actions
• Case 1: Overall schema knowledge is news((source | comments | date)+)
  - No order constraints can be used, so comments/content must be stored
• Case 2: Overall schema knowledge is News(source+,comments+,date+)
  - Global order constraint: Order( source, comments )
  - No storage is needed
• Case 3: Overall schema knowledge is News( (source | comment)+, date+, comment+ )
  - Local order constraint: LocalOrder( source, comments )
  - Same as Case 1 at the beginning. The Glushkov automaton for the type "news" indicates when the source elements are complete; after that, comments/content no longer needs to be stored
[Figure: Glushkov Automata for Type "News". States start, S1, S2, S3, S4 with transitions labeled source, comments and date.]
[Figure: Actions Encoded into the Automata. Query automaton with states s1 to s7 over stream, news, source, content and comments.]
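The difference between Case 1 and Case 2 can be shown with a minimal sketch for a single news element. It assumes the children arrive as (name, text) pairs and that the query must output all source data before any comments data; the function `process_news` and its flag `source_before_comments` are hypothetical and stand in for the Order( source, comments ) constraint.

```python
from typing import Iterable, List, Tuple


def process_news(
    children: Iterable[Tuple[str, str]],
    source_before_comments: bool,
) -> Tuple[List[str], int]:
    """Return (output, peak buffer size) for one news element.

    Output order follows the query: all source data, then all comments data.
    """
    output: List[str] = []
    buffer: List[str] = []
    peak = 0
    for name, text in children:
        if name == "source":
            output.append(text)              # source is first in the result
        elif name == "comments":
            if source_before_comments:
                output.append(text)          # no later source can arrive
            else:
                buffer.append(text)          # a source may still follow
                peak = max(peak, len(buffer))
        # other children (e.g. date) are not part of the result
    output.extend(buffer)                    # flush at the end of the element
    return output, peak
```

With the order constraint the buffer is never used; without it, every comments item waits until the news element ends. Case 3 would sit in between, releasing the buffer as soon as the source elements are known to be complete.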
Runtime SQO: Navigation Shortcut (I)
• Benefit
  - Expedite document-order traversal on pattern retrieval by filtering out failed patterns early
• Optimization Rules
  - Order, occurrence and exclusive rules
  - Completeness and minimal-cost optimization are guaranteed
• Query Execution
  - Introduce new pattern look-ups into query automata
  - Encode event-condition actions
Runtime SQO: Navigation Shortcut (II)
XQuery Q3:
  FOR $a in stream(bids)/auction,
      $b in $a/seller[homepage],
      $c in $a/bidder[sameAddr]
  WHERE $b/*/phone = "508"
  RETURN <auction> $b, $c </auction>
Actions Encoded into the Automata
• Utilizing Occurrence Constraints
  - Overall schema knowledge: Occurrence( phone, 2 )
  - When </phone> is encountered twice, check /*/phone: if the predicate fails, suspend states s2 and s3
• Utilizing Order Constraints
  - Overall schema knowledge: Order( primary, homepage )
  - When <primary> is encountered once, check /homepage: if it is not present, suspend states s10, s3 and s2
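The occurrence-based shortcut can be sketched as an early exit from predicate evaluation. This is an illustration under assumptions, not the R-SOX rule applier: events are simplified to (name, text) pairs, and `evaluate_predicate` with its `max_phones` parameter is a hypothetical stand-in for Occurrence( phone, 2 ).

```python
from typing import Iterable, Optional, Tuple


def evaluate_predicate(
    events: Iterable[Tuple[str, str]],
    wanted: str,
    max_phones: Optional[int] = None,
) -> Tuple[bool, int]:
    """Return (predicate result, number of events consumed).

    If `max_phones` is given, evaluation stops as soon as that many phone
    elements have been seen without a match, since no further one can occur.
    """
    consumed = 0
    phones_seen = 0
    for name, text in events:
        consumed += 1
        if name != "phone":
            continue
        if text == wanted:
            return True, consumed
        phones_seen += 1
        if max_phones is not None and phones_seen >= max_phones:
            return False, consumed       # shortcut: the pattern has failed
    return False, consumed
```

Without the constraint the evaluator has to read to the end of the element before it can report failure; with it, the failing pattern is dropped right after the second phone.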
R-SOX System Demonstration
• Application Scenarios:
  - On-line auction data
  - News publishing data
[Figure: demonstration screenshots. Algebraic Query Plan Generation; Runtime Schema Refinement; Runtime SQO]
Raindrop Project
http://davis.wpi.edu/dsrg/raindrop
Recent Publications
• S. Wang et al. R-SOX: Runtime Semantic Query Optimization over XML Streams. VLDB 2006.
• H. Su et al. Automata Meets Algebra. DKE Journal 2006.
• M. Wei et al. Processing Recursive XQuery over XML Streams: the Raindrop Approach. XSDM 2006.
• H. Su et al. Semantic Query Optimization in an Automata-Algebra Combined XQuery Engine. VLDB 2004.
• H. Su et al. Semantic Query Optimization for XQuery over XML Streams. VLDB 2005.
Source Code Release
• Raindrop 1.0 is released: http://davis.wpi.edu/dsrg/raindrop/release
Acknowledgement
• NSF, for support under grants IIS 0414567 and CNS 0551584