A Transducer-Based XML Query Processor Bertram Ludäscher, SDSC/CSE UCSD Pratik Mukhopadhyay, CSE UCSD Yannis Papakonstantinou, CSE UCSD
Overview • Motivation • Architecture • Framework: Streams + XQuery • XSM (XML Stream Machine) • XSM Networks • Network Composition • Conclusions
Efficient Processing of Sequentially Accessed XML Data: Web Service Implementations & RMI
A web service sends an XML message to an XML message transformer, which produces the transformed XML message.
Efficient Processing of Sequentially Accessed XML Data: Web Development
An XML-to-XHTML transformer turns an XML file into the XHTML page delivered by the web front-end.
Efficient Processing of Sequentially Accessed XML Data: Archive Transformation & ETL (Extraction, Transformation & Loading) Applications
An XML processor reads an XML archive file and writes an XML target file.
Efficient Processing of Sequentially Accessed XML Data: Sensor Data Analysis
A sensor data processor consumes an XML stream of sensor data and feeds the acting/mining software.
Bandwidth & Connectivity Will Increase the Amount of Data …
[Figure: many XML sources and XML streams converging on an XML sensor data processor]
…Hardware Advances Do Not Favor Conventional Architectures
[Figure: magnitude vs. year for CPU speed, bandwidth, and CPU-to-memory speed]
Transducer-Based Processing: On-the-Fly & Minimal Memory
[Figure: an XML Stream Machine defined by a list of "Condition | Action" rules, reading from an input buffer and writing to an output buffer, with intermediate buffers in between]
XML Stream Machine (XSM): High-Level Architecture
XQuery (+ optional input DTD) → XQuery Compiler → XSM → XSM-to-C Compiler → C program
Components of the XQuery Compiler
XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM
Optional input DTD → Schema Optimization
XQuery Subset
• Path expressions
• Element construction
• for-where-return expressions
• Concatenation
Example: for $X in $R/a return for $Y in $X/b return <res> $Y, $X </res>
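As a point of reference, here is a conventional in-memory evaluation of the example query using Python's standard `xml.etree.ElementTree`. It is not the streaming XSM approach; it only shows which results the query should produce. The sample document and the function name `run_example_query` are illustrative, and $R is assumed to be bound to the root element `<r>`.

```python
import copy
import xml.etree.ElementTree as ET


def run_example_query(root: ET.Element) -> list[ET.Element]:
    """Tree-based evaluation of
    for $X in $R/a return for $Y in $X/b return <res> $Y, $X </res>
    with $R assumed bound to `root`."""
    results = []
    for x in root.findall("a"):            # $X in $R/a
        for y in x.findall("b"):           # $Y in $X/b
            res = ET.Element("res")        # <res> ... </res>
            res.append(copy.deepcopy(y))   # $Y
            res.append(copy.deepcopy(x))   # $X
            results.append(res)
    return results


doc = ET.fromstring("<r><a><b>5</b><b>1</b></a></r>")
for r in run_example_query(doc):
    print(ET.tostring(r, encoding="unicode"))
```

Note that the whole document is held in memory here, which is exactly what the transducer-based design tries to avoid.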
XML Stream: Tags, Data & Control Tokens
An XML stream is a sequence of
• open-tag & close-tag tokens
• data
• control tokens (e.g. S$R and E$R, marking the start and end of a binding of $R)
Example: S$R <r> <a> <b> 5 </b> <b> 1 </b> </a> … E$R
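A minimal sketch of how a parsed document could be flattened into such a token stream, assuming one string per token and control tokens spelled as on the slide (`S$R`, `E$R`). The helpers `to_stream` and `bind` are hypothetical; attributes are ignored and data is trimmed of surrounding whitespace for brevity.

```python
import xml.etree.ElementTree as ET


def to_stream(elem: ET.Element) -> list[str]:
    """Flatten an element into open-tag, data and close-tag tokens.
    Simplifications: attributes are dropped, whitespace-only text is
    skipped, and data is stripped of surrounding whitespace."""
    tokens = [f"<{elem.tag}>"]
    if elem.text and elem.text.strip():
        tokens.append(elem.text.strip())
    for child in elem:
        tokens.extend(to_stream(child))
        if child.tail and child.tail.strip():   # text after the child's close tag
            tokens.append(child.tail.strip())
    tokens.append(f"</{elem.tag}>")
    return tokens


def bind(var: str, elem: ET.Element) -> list[str]:
    """Wrap one binding of `var` in start/end control tokens (S$R ... E$R)."""
    return [f"S{var}", *to_stream(elem), f"E{var}"]


doc = ET.fromstring("<r><a><b>5</b><b>1</b></a></r>")
print(" ".join(bind("$R", doc)))
```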
XML Stream Machine (XSM)
Example: concatenation of bindings of $Y, $X into bindings of $Z. Each transition is written condition | action, where *x is the token under read pointer x, w(z, t) writes token t to output buffer Z, and x++ advances x.
• 0 → 1: *x=Sx | w(z,Sz), x++
• 1 → 2: *y=Sy | y++
• 2 → 2: *y!=Ey | w(z,*y), y++
• 2 → 3: *y=Ey | y++
• 3 → 3: *x!=Ex | w(z,*x), x++
• 3 → 0: *x=Ex | w(z,Ez), x++
Input buffer X: Sx <a> <b> 5 </b> <b> 1 </b> </a> Ex Sx …
Input buffer Y: Sy <b> 5 </b> Ey Sy <b> 1 </b> Ey Sy …
Output buffer Z after one round: Sz <b> 5 </b> <a> <b> 5 </b> <b> 1 </b> </a> Ez
Step by step, the machine writes Sz, skips Sy, copies <b> 5 </b> from Y, skips Ey, copies <a> <b> 5 </b> <b> 1 </b> </a> from X, and closes the binding with Ez.
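The transition list can be replayed offline in a few lines of Python. This is a sketch, not the XSM-to-C output: buffers are complete Python lists, tokens are plain strings (so a data token spelled `Sx` would be mistaken for a control token), and a malformed buffer raises an error where a streaming machine would wait for more input. The function name `run_concat_xsm` is hypothetical.

```python
def run_concat_xsm(X: list[str], Y: list[str]) -> list[str]:
    """Replay the concatenation XSM (states 0 to 3) over complete buffers.

    Assumes well-formed input with one $Y binding per $X binding."""
    Z: list[str] = []   # output buffer Z
    x = y = 0           # read pointers into input buffers X and Y
    state = 0
    while not (state == 0 and x == len(X)):   # stop once X is used up
        if state == 0 and X[x] == "Sx":       # *x=Sx  | w(z,Sz), x++
            Z.append("Sz")
            x += 1
            state = 1
        elif state == 1 and Y[y] == "Sy":     # *y=Sy  | y++
            y += 1
            state = 2
        elif state == 2 and Y[y] != "Ey":     # *y!=Ey | w(z,*y), y++
            Z.append(Y[y])
            y += 1
        elif state == 2:                      # *y=Ey  | y++
            y += 1
            state = 3
        elif state == 3 and X[x] != "Ex":     # *x!=Ex | w(z,*x), x++
            Z.append(X[x])
            x += 1
        elif state == 3:                      # *x=Ex  | w(z,Ez), x++
            Z.append("Ez")
            x += 1
            state = 0
        else:
            raise ValueError(f"no transition applies in state {state}")
    return Z


X = "Sx <a> <b> 5 </b> <b> 1 </b> </a> Ex Sx <a> <b> 5 </b> <b> 1 </b> </a> Ex".split()
Y = "Sy <b> 5 </b> Ey Sy <b> 1 </b> Ey".split()
print(" ".join(run_concat_xsm(X, Y)))
```

With the $X binding repeated once per $Y binding, as in the buffers above, the machine emits one Sz … Ez binding of $Z for each pair.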
Comparison of XSM against State Automata & Transducers
State automata:
• do not construct output
• do not store intermediate results
• sufficient for XPath only
Transducers:
• finite alphabets
• state is the only memory
• no reset of input pointers
XSM:
• unbounded alphabet
• buffers
• pointer reset
XSM Networks: Intermediate Step in Translating Queries to XSMs
XQuery → XQuery-to-Network Translation → XSM Network → XSM Composition → Single XSM
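XSM Network for: for $X in $R/a return for $Y in $X/b return <res> $Y, $X </res>
• $R/a: reads $R, produces $X
• $X/b: reads $X, produces $Y
• For $Y: reads [$Y, $X], produces [$Y', $X']
• $Y', $X' (concatenation): reads $Y' and $X', produces $Z
• <res> $Z </res> (element construction): reads $Z, produces $O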
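From XQueries to XSM Networks: Non-FLWR Expressions
The expression <res> $Y, $X </res> (inputs $Y and $X, output $O) becomes two machines:
• $Y, $X (concatenation): reads $Y and $X, produces $Z
• <res> $Z </res> (element construction): reads $Z, produces $O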
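From XQueries to XSM Networks: FLWRs without Free Variables
for $X in G return expr($X) becomes a network in which the machine for G reads $R and produces the bindings of $X, and the machine for expr($X) reads $X and produces $O.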
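From XQueries to XSM Networks: FLWRs with Free Variables
for $Y in $X/b return <res> $Y, $X </res>   ($X is a free variable)
• $X/b: reads $X, produces $Y
• For $Y: reads [$Y, $X], produces [$Y', $X'], pairing each binding of $Y with the binding of $X it was computed from
• <res> $Y', $X' </res>: reads $Y' and $X', produces $O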
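The slides do not show how the For $Y machine works internally. The sketch below only illustrates its input/output behavior at the level of whole bindings, under the assumption (consistent with the input buffers of the XSM concatenation example) that $X' repeats each binding of the free variable $X once per binding of $Y. `for_machine` and its argument layout are hypothetical; the output uses the Sy/Ey and Sx/Ex control tokens of that example.

```python
def for_machine(x_bindings: list[list[str]],
                y_groups: list[list[list[str]]]) -> tuple[list[str], list[str]]:
    """Input/output behavior of the For $Y machine (hypothetical sketch).

    x_bindings[i] holds the tokens of the i-th binding of the free variable $X;
    y_groups[i] holds the $Y bindings computed from it by $X/b.
    Returns the aligned streams $Y' and $X'."""
    if len(x_bindings) != len(y_groups):
        raise ValueError("expected one group of $Y bindings per $X binding")
    y_stream: list[str] = []
    x_stream: list[str] = []
    for x_bind, ys in zip(x_bindings, y_groups):
        for y_bind in ys:
            y_stream += ["Sy", *y_bind, "Ey"]
            x_stream += ["Sx", *x_bind, "Ex"]   # re-read the same $X binding
    return y_stream, x_stream


a = "<a> <b> 5 </b> <b> 1 </b> </a>".split()
y_prime, x_prime = for_machine([a], [[["<b>", "5", "</b>"], ["<b>", "1", "</b>"]]])
print(" ".join(y_prime))
print(" ".join(x_prime))
```

Re-reading the same $X binding for every $Y binding is where an XSM needs the pointer reset that plain transducers lack.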
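Composition Merges Two XSMs into One
Before composition:
• $R/a: reads $R, produces $X
• $X/b: reads $X, produces $Y
• For $Y: reads [$Y, $X], produces [$Y', $X']
• $Y', $X' (concatenation): reads $Y' and $X', produces $Z
• <res> $Z </res> (element construction): reads $Z, produces $O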
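Composition Merges Two XSMs into One
After composition, the concatenation and element-construction machines are replaced by a single machine <res> $Y', $X' </res> that reads $Y' and $X' and produces $O; the machines $R/a, $X/b and For $Y are unchanged.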
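The effect of this composition step can be checked at the level of whole bindings: running the concatenation machine and then the element-construction machine gives the same output as the single merged machine, which needs no intermediate $Z bindings. This sketch compares results only; it does not build the state product, and the three function names are hypothetical.

```python
Tokens = list[str]


def concat(ys: list[Tokens], xs: list[Tokens]) -> list[Tokens]:
    """Concatenation machine: one $Z binding ($Y', $X') per aligned pair."""
    return [y + x for y, x in zip(ys, xs)]


def construct_res(zs: list[Tokens]) -> list[Tokens]:
    """Element-construction machine: <res> $Z </res> for every $Z binding."""
    return [["<res>", *z, "</res>"] for z in zs]


def merged(ys: list[Tokens], xs: list[Tokens]) -> list[Tokens]:
    """Single machine for <res> $Y', $X' </res>: no intermediate $Z bindings."""
    return [["<res>", *y, *x, "</res>"] for y, x in zip(ys, xs)]


a = "<a> <b> 5 </b> <b> 1 </b> </a>".split()
ys = [["<b>", "5", "</b>"], ["<b>", "1", "</b>"]]
xs = [a, a]   # $X' repeats the $X binding once per $Y binding
assert construct_res(concat(ys, xs)) == merged(ys, xs)
print(" ".join(merged(ys, xs)[0]))
```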
XSM Composition: “State Product” Emulates Producer-Consumer
Producer M1 (in state q1) writes a buffer that consumer M2 (in state q2) reads. The “state product” M3 = M2 ∘ M1 has the combined states (q1, q2).
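Naive Composition
Let \(r_1, \dots, r_n\) be the read pointers of state \(q_2\) into the shared buffer, and define
\[ \varphi(q_2) = \neg AE(r_1) \wedge \dots \wedge \neg AE(r_n) \]
i.e. “no shared read pointer \(r_i\) of \(q_2\) is At End”. Given a transition \(q_1 \xrightarrow{\theta_1 \mid A_1} q_1'\) of M1 and a transition \(q_2 \xrightarrow{\theta_2 \mid A_2} q_2'\) of M2, the composition \(M_3 = M_2 \circ M_1\) contains
• an M2 step if \(\varphi(q_2)\): \((q_1, q_2) \xrightarrow{\varphi(q_2) \wedge \theta_2 \mid A_2} (q_1, q_2')\)
• an M1 step if \(\neg\varphi(q_2)\): \((q_1, q_2) \xrightarrow{\neg\varphi(q_2) \wedge \theta_1 \mid A_1} (q_1', q_2)\)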
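A small simulation of the naive scheduling rule, assuming a consumer with a single read pointer into the shared buffer, so the “no shared read pointer is At End” test reduces to comparing that pointer with the buffer length. The producer `SelectB` and consumer `RenameBToC` are hypothetical toy machines chosen only to exercise the rule: M2 steps whenever it has unread input, and M1 steps only when M2 is blocked.

```python
class SelectB:
    """Toy producer M1 (hypothetical): writes every <b> ... </b> element of
    its input (assumed non-nested) to the shared buffer, one token per step."""

    def __init__(self, tokens: list[str]) -> None:
        self.tokens = tokens
        self.pos = 0          # read pointer into M1's own input
        self.inside = False   # currently inside a <b> element?

    def done(self) -> bool:
        return self.pos == len(self.tokens)

    def step(self, shared: list[str]) -> None:
        tok = self.tokens[self.pos]
        self.pos += 1
        if tok == "<b>":
            self.inside = True
        if self.inside:
            shared.append(tok)   # write into the shared buffer
        if tok == "</b>":
            self.inside = False


class RenameBToC:
    """Toy consumer M2 (hypothetical): copies the shared buffer to its
    output, renaming b to c, one token per step."""

    def __init__(self) -> None:
        self.read = 0             # the single read pointer into the shared buffer
        self.out: list[str] = []

    def step(self, shared: list[str]) -> None:
        tok = shared[self.read]
        self.read += 1
        self.out.append({"<b>": "<c>", "</b>": "</c>"}.get(tok, tok))


def naive_compose(m1: SelectB, m2: RenameBToC) -> tuple[list[str], int]:
    """Run M3 = M2 o M1: M2 steps while its read pointer is not At End,
    otherwise M1 steps. Returns M2's output and the peak number of unread tokens."""
    shared: list[str] = []
    peak = 0
    while True:
        if m2.read < len(shared):     # read pointer not At End: M2 step
            m2.step(shared)
        elif not m1.done():           # consumer blocked: M1 step
            m1.step(shared)
        else:                         # producer finished and buffer drained
            break
        peak = max(peak, len(shared) - m2.read)
    return m2.out, peak


tokens = "<r> <a> <b> 5 </b> <b> 1 </b> </a> </r>".split()
out, peak = naive_compose(SelectB(tokens), RenameBToC())
print(" ".join(out), peak)
```

In this toy run the shared buffer never holds more than one unread token, because the producer only runs once the consumer has caught up and each producer step writes at most one token. The At-End test, however, is still evaluated at run time on every step.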
Smart Composition
• Normalization assumptions:
  • #(read-pointers-into-shared-buffer(q2)) ≤ 1
  • atomic actions only
• Basic idea: avoid runtime tests (“At-End”) whenever the outcome can be determined at compile time
• Different “modes”:
  • go: consumer M2 proceeds (full buffer)
  • no: producer M1 proceeds (empty buffer); the consumer may be able to follow immediately
  • ae: do the runtime check AE
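Smart Composition: no Case (shared buffer is empty)
For transitions \(q_1 \xrightarrow{\theta_1 \mid A_1} q_1'\) of M1 and \(q_2 \xrightarrow{\theta_2 \mid A_2} q_2'\) of M2, the following transitions are inserted in no mode:
• \((q_1, q_2, \mathit{no}) \xrightarrow{\theta_1 \mid A_1} (q_1', q_2, \mathit{no})\) if A1 does not write to the shared buffer
• \((q_1, q_2, \mathit{no}) \xrightarrow{\theta_2 \mid A_2} (q_1, q_2', \mathit{no})\) if M2 does not wait on the shared buffer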
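Smart Composition: Producer Fills Buffer
In no mode, a producer step \(\theta_1 \mid A_1\) that writes to the shared buffer can be combined with a consumer step \(\theta_2 \mid A_2\) into \(\theta_{12} \mid A_{12}\), where \(\theta_{12}\) combines \(\theta_1\) with \(\theta_2\) and \(A_{12}\) combines \(A_1\) with \(A_2\):
• \((q_1, q_2, \mathit{no}) \xrightarrow{\theta_{12} \mid A_{12}} (q_1', q_2', \mathit{no})\), or
• \((q_1, q_2, \mathit{no}) \xrightarrow{\theta_{12} \mid A_{12}} (q_1', q_2', \mathit{go})\)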
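Smart Composition: go - ae - no
In go mode, an M2 step \(\theta_2 \mid A_2\) becomes
• \((q_1, q_2, \mathit{go}) \xrightarrow{\theta_2 \mid A_2} (q_1, q_2', \mathit{no})\) if A2 advances the read pointer into the shared buffer
• \((q_1, q_2, \mathit{go}) \xrightarrow{\theta_2 \mid A_2} (q_1, q_2', \mathit{go})\) if A2 does not advance the read pointer into the shared buffer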
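The go-mode rule is a pure compile-time decision, which the following sketch encodes. `Trans`, its `moves_shared` flag, and the tuple layout of composed transitions (source state, label, target state) are hypothetical representations, not the paper's data structures.

```python
from typing import NamedTuple


class Trans(NamedTuple):
    """Hypothetical encoding of one XSM transition src --cond|action--> dst."""
    src: str
    cond: str
    action: str
    dst: str
    moves_shared: bool  # does the action advance the pointer into the shared buffer?


def compose_go(q1: str, t2: Trans) -> tuple:
    """Composed transition for an M2 step taken from (q1, t2.src) in go mode.

    The target mode is fixed at compile time, with no At-End test:
    'no' if A2 advances the shared read pointer, otherwise 'go'."""
    target_mode = "no" if t2.moves_shared else "go"
    return ((q1, t2.src, "go"), (t2.cond, t2.action), (q1, t2.dst, target_mode))


reads = Trans("q2", "theta2", "A2", "q2'", moves_shared=True)
print(compose_go("q1", reads))
```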
Conclusions & Future Work • Novel query processor model • Success in filtering & transformation • To be extended for joins & aggregations • Memory footprint questions • Facilitated by model’s simplicity
Related Work • Relational Data Streams & Sequence Data Models • Pipelined Join Operators • Aggregates & Approximations • Fast XPath on streams • Memory requirements of validating XML
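Smart Composition: go - ae - no
In no mode, execute an M1 step \(\theta_1 \mid A_1\):
• if A1 does not advance the shared write pointer: \((q_1, q_2, \mathit{no}) \xrightarrow{\theta_1 \mid A_1} (q_1', q_2, \mathit{no})\)
• if A1 does advance the shared write pointer (simplified): \((q_1, q_2, \mathit{no}) \xrightarrow{\theta_1 \mid A_1} (q_1', q_2, \mathit{ae})\)
• … AND possibly an M2 step, composed into \(\theta_{12} \mid A_{12}\), where \(\theta_{12}\) combines \(\theta_1\) with \(\theta_2\) and \(A_{12} = (A_1; A_2)\):
  • \((q_1, q_2, \mathit{no}) \xrightarrow{\theta_{12} \mid A_{12}} (q_1', q_2', \mathit{no})\) if A2 advances the shared read pointer
  • \((q_1, q_2, \mathit{no}) \xrightarrow{\theta_{12} \mid A_{12}} (q_1', q_2', \mathit{go})\) if A2 does not advance the shared read pointer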
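The no-mode rules can likewise be written as a compile-time function. `Trans`, `moves_shared`, and the label layout are hypothetical; the combined condition θ12 is kept abstract as the pair (θ1, θ2), since the slides only say that it combines the two conditions, while A12 is the sequence (A1, A2).

```python
from typing import NamedTuple, Optional


class Trans(NamedTuple):
    """Hypothetical encoding of one XSM transition src --cond|action--> dst."""
    src: str
    cond: str
    action: str
    dst: str
    moves_shared: bool  # M1: advances shared write pointer; M2: advances shared read pointer


def compose_no(q2: str, t1: Trans, t2: Optional[Trans] = None) -> tuple:
    """Composed transition for an M1 step taken from (t1.src, q2) in no mode.

    t2, if given, is an M2 transition leaving q2; it is folded in only
    when A1 writes to the shared buffer."""
    src = (t1.src, q2, "no")
    if not t1.moves_shared:          # A1 leaves the shared buffer alone
        return (src, (t1.cond, t1.action), (t1.dst, q2, "no"))
    if t2 is None:                   # simplified rule: defer to a runtime At-End check
        return (src, (t1.cond, t1.action), (t1.dst, q2, "ae"))
    if t2.src != q2:
        raise ValueError("t2 must leave the consumer state q2")
    # Producer step followed by consumer step: label (theta12, A12);
    # the target mode depends on whether A2 advances the shared read pointer.
    mode = "no" if t2.moves_shared else "go"
    return (src, ((t1.cond, t2.cond), (t1.action, t2.action)), (t1.dst, t2.dst, mode))


produce = Trans("q1", "theta1", "A1", "q1'", moves_shared=True)
consume = Trans("q2", "theta2", "A2", "q2'", moves_shared=True)
print(compose_no("q2", produce, consume))
```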