
A Transducer-Based XML Query Processor




  1. A Transducer-Based XML Query Processor. Bertram Ludäscher (SDSC/CSE UCSD), Pratik Mukhopadhyay (CSE UCSD), Yannis Papakonstantinou (CSE UCSD)

  2. Overview • Motivation • Architecture • Framework: Streams + XQuery • XSM (XML Stream Machine) • XSM Networks • Network Composition • Conclusions

  3. Efficient Processing of Sequentially Accessed XML Data: Web Service Implementations & RMI. Diagram labels: Web Service, XML message, XML Message Transformer, Transformed XML message

  4. Efficient Processing of Sequentially Accessed XML Data: Web Development. Diagram labels: Web Front-End, XHTML page, XML file, XML-to-XHTML Transformer

  5. Efficient Processing of Sequentially Accessed XML Data: Archive Transformation & ETL (Extraction, Transformation & Loading) Applications. Diagram labels: XML target file, XML archive file, XML Processor

  6. Efficient Processing of Sequentially Accessed XML Data: Sensor Data Analysis. Diagram labels: Stream Sensor Data Processor, Acting/Mining Software, XML Stream

  7. Bandwidth & Connectivity will Increase the Amount of Data … Diagram labels: XML and XML stream (each repeated), Sensor Data Processor

  8. … Hardware Advances do not Favor Conventional Architectures. Chart: Magnitude vs. Year, with curves for CPU Speed, Bandwidth, and CPU2Memory Speed

  9. Overview • Motivation • Architecture • Framework: Streams + XQuery • XSM (XML Stream Machine) • XSM Networks • Network Composition • Conclusions

  10. Transducer-Based Processing: On-the-Fly & Minimal Memory. Diagram labels: XML Stream Machine with rules of the form Condition | Action; Input buffer; Output buffer; Buffers

  11. XML Stream Machine (XSM): High-Level Architecture. XQuery (plus Optional Input DTD) → XQuery Compiler → XSM → XSM-to-C Compiler → C program

  12. Components of the XQuery Compiler. XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM. Also shown: Optional Input DTD, Schema Optimization

  13. Overview • Motivation • Architecture • Framework: Streams + XQuery • XSM (XML Stream Machine) • XSM Networks • Network Composition • Conclusions

  14. XQuery Subset • Path Expressions • Element Construction • for-where-return Expressions • Concatenation. Example: for $X in $R/a return for $Y in $X/b return <res> $Y, $X </res>
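
The example query on this slide can be read as two nested loops. The following is a minimal Python sketch of its meaning on an in-memory tree, assuming the standard-library `xml.etree.ElementTree` module; the function name `run_query` and the sample document are illustrative assumptions, and the sketch shows the query semantics only, not the streaming XSM evaluation the talk describes.

```python
import copy
import xml.etree.ElementTree as ET

def run_query(r):
    """Evaluate: for $X in $R/a return for $Y in $X/b return <res> $Y, $X </res>.

    `r` plays the role of the binding of $R (assumed here to be the <r> element).
    """
    results = []
    for x in r.findall("a"):              # $X ranges over the a-children of $R
        for y in x.findall("b"):          # $Y ranges over the b-children of $X
            res = ET.Element("res")       # element construction
            res.append(copy.deepcopy(y))  # concatenation: $Y first ...
            res.append(copy.deepcopy(x))  # ... then $X
            results.append(ET.tostring(res, encoding="unicode"))
    return results

doc = ET.fromstring("<r><a><b>5</b><b>1</b></a></r>")
for line in run_query(doc):
    print(line)
```

Each result repeats the whole `<a>` element once per `<b>` child, which is why the later slides need an operator that replicates bindings of `$X`.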

  15. XML Stream: Tags, Data & Control Tokens. An XML Stream is a Sequence of • Open Tag & Close Tag Tokens • Data • Control Tokens (S$R, E$R). Example: … <r> <a> <b> 5 </b> <b> 1 </b> </a>
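
As a rough illustration of this token view, the sketch below splits XML text into open-tag, close-tag, and data tokens and wraps them in start/end control tokens. The names `to_stream` and `kind`, the regular expression, and the choice to wrap the whole input in `S$R` ... `E$R` are assumptions made for the example; attributes, comments, and entities are not handled.

```python
import re

# A tag, or a run of character data that starts with a non-space character.
TOKEN_RE = re.compile(r"<[^>]+>|[^<\s][^<]*")

def to_stream(xml_text, var="$R"):
    """Split XML text into tag and data tokens, wrapped in S/E control tokens."""
    tokens = [t.strip() for t in TOKEN_RE.findall(xml_text)]
    return ["S" + var] + tokens + ["E" + var]

def kind(token):
    """Classify a token as 'control', 'close', 'open' or 'data'."""
    if re.fullmatch(r"[SE]\$\w+", token):
        return "control"   # assumption: control tokens are spelled S$var / E$var
    if token.startswith("</"):
        return "close"
    if token.startswith("<"):
        return "open"
    return "data"

stream = to_stream("<r> <a> <b> 5 </b> <b> 1 </b> </a> </r>")
print([(t, kind(t)) for t in stream])
```

A real implementation would more likely represent control tokens as a separate type rather than as strings, so that data can never be mistaken for them.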

  16. Overview • Motivation • Architecture • Framework: Streams + XQuery • XSM (XML Stream Machine) • XSM Networks • Network Composition • Conclusions

  17. XML Stream Machine (XSM): Concatenation of bindings of Y, X into bindings of Z. Pointers x and y read Input Buffers X and Y; pointer z writes Output Buffer Z. Input Buffer X: Sx <a> <b> 5 </b> <b> 1 </b> </a> Ex Sx … Input Buffer Y: Sy <b> 5 </b> Ey Sy <b> 1 </b> Ey Sy … States: 0, 1, 2, 3. Transitions (condition | action): • *x=Sx | w(z,Sz), x++ • *y=Sy | y++ • *y!=Ey | w(z,*y), y++ • *y=Ey | y++ • *x!=Ex | w(z,*x), x++ • *x=Ex | w(z,Ez), x++. Output Buffer Z at the end of the run: Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a> Ez

  18. XML Stream Machine (XSM), run step by step (same machine and input buffers as slide 17). Output Buffer Z: empty

  19. XML Stream Machine (XSM). Output Buffer Z: empty

  20. XML Stream Machine (XSM). Output Buffer Z: Sz

  21. XML Stream Machine (XSM). Output Buffer Z: Sz <b>

  22. XML Stream Machine (XSM). Output Buffer Z: Sz <b> 5 </b>

  23. XML Stream Machine (XSM). Output Buffer Z: Sz <b> 5 </b>

  24. XML Stream Machine (XSM). Output Buffer Z: Sz <b> 5 </b> <a>

  25. XML Stream Machine (XSM). Output Buffer Z: Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a>

  26. XML Stream Machine (XSM). Output Buffer Z: Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a> Ez
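
The concatenation machine on these slides can be simulated in a few lines. The sketch below is one reading of the diagram, not the authors' code: the assignment of the six "condition | action" rules to source and target states (0 → 1 → 2 → 3 → 0) is an assumption reconstructed from the rule order, buffers are modelled as Python lists, and `concat_xsm` is a hypothetical name.

```python
def concat_xsm(x_buf, y_buf):
    """Simulate the 4-state XSM that concatenates bindings of Y, X into Z."""
    x = y = 0        # read pointers into input buffers X and Y
    z = []           # output buffer Z; w(z, t) is modelled as z.append(t)
    state = 0

    def at(buf, i):
        return buf[i] if i < len(buf) else None   # None: nothing to read yet

    while True:
        tx, ty = at(x_buf, x), at(y_buf, y)
        if state == 0 and tx == "Sx":             # *x=Sx | w(z,Sz), x++
            z.append("Sz"); x += 1; state = 1
        elif state == 1 and ty == "Sy":           # *y=Sy | y++
            y += 1; state = 2
        elif state == 2 and ty is not None:
            if ty != "Ey":                        # *y!=Ey | w(z,*y), y++
                z.append(ty)
            else:                                 # *y=Ey | y++
                state = 3
            y += 1
        elif state == 3 and tx is not None:
            if tx != "Ex":                        # *x!=Ex | w(z,*x), x++
                z.append(tx)
            else:                                 # *x=Ex | w(z,Ez), x++
                z.append("Ez"); state = 0
            x += 1
        else:
            break                                 # no rule enabled: stop
    return z

X = ["Sx", "<a>", "<b>", "5", "</b>", "<b>", "1", "</b>", "</a>", "Ex"]
Y = ["Sy", "<b>", "5", "</b>", "Ey"]
print(" ".join(concat_xsm(X, Y)))
```

Note that the machine keeps only a state number and two read positions; the bindings themselves are copied token by token and are never assembled in memory.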

  27. Comparison of XSM against State Automata & Transducers. State Automata: • Do not construct • Do not store intermediate results • Sufficient for XPath only. Transducers: • Finite alphabets • State is the only memory • No reset of input pointers. XSM: • Unbounded alphabet • Buffers • Pointer reset

  28. Overview • Motivation • Architecture • Framework: Streams + XQuery • XSM (XML Stream Machine) • XSM Networks • Network Composition • Conclusions

  29. XSM Networks: Intermediate Step in Translating Queries to XSMs. XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM

  30. XSM Network for the query: for $X in $R/a return for $Y in $X/b return <res> $Y, $X </res>. Operators and streams: $R/a (input $R, output $X); $X/b (input $X, output $Y); For $Y: [$Y,$X] → [$Y’,$X’]; $Y’,$X’ (output $Z); <res> $Z </res> (output $O)
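
To make the operator decomposition concrete, here is a small Python sketch that mirrors the network as ordinary functions over lists of bindings. It is an illustration under stated assumptions: real XSM networks exchange token streams through buffers, whereas this sketch passes whole `xml.etree.ElementTree` elements; the function names are hypothetical; and the `$X/b` step is folded into `for_operator` so that each `$Y` stays aligned with its `$X`.

```python
import xml.etree.ElementTree as ET

def path_step(bindings, tag):
    """$V/tag: for each binding of $V, emit its child elements named `tag`."""
    return [child for b in bindings for child in b.findall(tag)]

def for_operator(xs, tag):
    """For $Y in $X/tag: emit aligned streams [$Y', $X'], repeating $X per $Y."""
    ys_out, xs_out = [], []
    for x in xs:
        for y in path_step([x], tag):
            ys_out.append(y)
            xs_out.append(x)      # $X is replicated once per binding of $Y
    return ys_out, xs_out

def concat(ys, xs):
    """$Y', $X': pairwise concatenation into bindings of $Z."""
    return [[y, x] for y, x in zip(ys, xs)]

def construct(zs, tag):
    """<tag> $Z </tag>: wrap each binding of $Z in a new element."""
    out = []
    for z in zs:
        el = ET.Element(tag)
        el.extend(z)
        out.append(el)
    return out

def run_network(r):
    xs = path_step([r], "a")                 # $R/a  -> $X
    ys2, xs2 = for_operator(xs, "b")         # For $Y -> [$Y', $X']
    return construct(concat(ys2, xs2), "res")   # -> $Z -> $O

root = ET.fromstring("<r><a><b>5</b><b>1</b></a></r>")
for el in run_network(root):
    print(ET.tostring(el, encoding="unicode"))
```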

  31. From XQueries to XSM Networks: Non-FLWR Expressions. <res> $Y, $X </res> becomes two operators: the concatenation $Y,$X (inputs $Y and $X, output $Z) and the constructor <res> $Z </res> (input $Z, output $O)

  32. From XQueries to XSM Networks: FLWRs without Free Variables. for $X in G return expr($X) becomes: $R → G → $X → expr($X) → $O

  33. From XQueries to XSM Networks: FLWRs with Free Variables. In for $Y in $X/b return <res> $Y, $X </res>, $X is a free variable. Operators and streams: $X/b (input $X, output $Y); For $Y: [$Y,$X] → [$Y’,$X’]; <res> $Y’, $X’ </res> (output $O)

  34. Overview • Motivation • Architecture • Framework: Streams + XQuery • XSM (XML Stream Machine) • XSM Networks • Network Composition • Conclusions

  35. Composition Merges Two XSMs Into One. Network before composition: $R/a (input $R, output $X); $X/b (input $X, output $Y); For $Y: [$Y,$X] → [$Y’,$X’]; $Y’,$X’ (output $Z); <res> $Z </res> (output $O)

  36. Composition Merges Two XSMs into One. Network after composition: $R/a (input $R, output $X); $X/b (input $X, output $Y); For $Y: [$Y,$X] → [$Y’,$X’]; <res> $Y’, $X’ </res> (output $O)

  37. XSM Composition: “State Product” Emulates Producer-Consumer. Producer M1 (state q1), Consumer M2 (state q2). The “State Product” M3 = (M2 o M1) has states (q1, q2)

  38. Naive Composition. M1 has transitions q1 → q1’ labelled c1|A1 and M2 has transitions q2 → q2’ labelled c2|A2 (condition | action). Let r1 ... rn be the shared read-pointers of q2 and let p(q2) = ¬AE(r1) ∧ ... ∧ ¬AE(rn) = “no shared read-pointer ri of q2 is At End”. In M3 = (M2 o M1): • M2 step if p(q2): (q1, q2) → (q1, q2’) with c2|A2 • M1 step if ¬p(q2): (q1, q2) → (q1’, q2) with c1|A1
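
The scheduling rule of naive composition (consumer steps while its shared read pointer is not at end, otherwise the producer steps) can be sketched as follows. This is a simplified illustration, not the construction from the talk: it interprets the product machine at run time instead of building its transition table, it assumes a single shared read pointer, and the two example machines and all names (`compose_naive`, `m1_step`, `m2_step`) are invented for the example.

```python
def m1_step(q1, tok):
    """Producer M1 (hypothetical): forwards only tokens of <b> elements.
    State 1 means "inside <b>", state 0 means "outside"."""
    if tok == "<b>":
        return 1, [tok]
    if tok == "</b>":
        return 0, [tok]
    return q1, ([tok] if q1 == 1 else [])

def m2_step(q2, tok):
    """Consumer M2 (hypothetical): counts tokens read (its state) and
    forwards data tokens only."""
    return q2 + 1, ([] if tok.startswith("<") else [tok])

def compose_naive(source, q1=0, q2=0):
    """Run M3 = (M2 o M1) over `source`, returning output, final state, trace."""
    shared, out, trace = [], [], []
    r = 0                       # M2's read pointer into the shared buffer
    src = iter(source)
    while True:
        at_end = r >= len(shared)           # the runtime "At-End" test AE(r)
        if not at_end:                      # M2 step: (q1, q2) -> (q1, q2')
            q2, written = m2_step(q2, shared[r])
            out.extend(written)
            r += 1
            trace.append("M2")
        else:                               # M1 step: (q1, q2) -> (q1', q2)
            tok = next(src, None)
            if tok is None:
                break                       # input exhausted
            q1, written = m1_step(q1, tok)
            shared.extend(written)
            trace.append("M1")
    return out, (q1, q2), trace

tokens = ["<a>", "<b>", "5", "</b>", "<b>", "1", "</b>", "</a>"]
print(compose_naive(tokens))
```

Every loop iteration performs the At-End test at run time; avoiding that test where its outcome is already known is the motivation for the smart composition on the following slides.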

  39. Smart Composition • Normalization Assumptions: #( read-pointers-into-shared-buffer(q2) ) ≤ 1; atomic actions only • Basic idea: avoid runtime tests (“At-End”) whenever the outcome can be determined at compile time • Different “modes”: go: consumer M2 proceeds (full buffer); no: producer M1 proceeds (empty buffer), maybe consumer can follow immediately; ae: do runtime check AE

  40. Smart Composition: no Case (shared buffer is empty). M1 has transitions q1 → q1’ labelled c1|A1 and M2 has transitions q2 → q2’ labelled c2|A2 (condition | action). Case and transition inserted: • A1 does not write to the shared buffer: (q1, q2, no) → (q’1, q2, no) with c1|A1 • M2 does not wait on shared buffer: (q1, q2, no) → (q1, q’2, no) with c2|A2

  41. Smart Composition: Producer fills buffer. Transitions inserted: (q1, q2, no) → (q’1, q’2, no) with c12|A12, and (q1, q2, no) → (q’1, q’2, go) with c12|A12, where c12 is the combination of c1 with c2 and A12 is the combination of A1 with A2

  42. Smart Composition: go - ae - no • in go mode: (q1, q2, go) → (q1, q’2, no) with c2|A2 if A2 advances the read pointer into shared buffer; (q1, q2, go) → (q1, q’2, go) with c2|A2 if A2 does not advance read pointer into shared buffer

  43. Performance Datapoint (Transformation Query on DBLP)

  44. Conclusions & Future Work • Novel query processor model • Success in filtering & transformation • To be extended for joins & aggregations • Memory footprint questions • Facilitated by model’s simplicity

  45. Related Work • Relational Data Streams & Sequence Data Models • Pipelined Join Operators • Aggregates & Approximations • Fast XPath on streams • Memory requirements of validating XML

  46. Smart Composition: go - ae - no • in no mode: execute M1 step: (q1, q2, no) → (q’1, q2, no) with c1|A1 if A1 does not advance shared write pointer; (q1, q2, no) → (q’1, q2, ae) with c1|A1 if A1 does advance shared write pointer • ... AND possibly M2 step, using the simplified composed c12 and (A1;A2): (q1, q2, no) → (q’1, q’2, no) with c12|A12 if A2 advances shared read pointer; (q1, q2, no) → (q’1, q’2, go) with c12|A12 if A2 does not advance shared read pointer
