Streaming Processing of Large XML Data
Jana Dvořáková, Filip Zavoral
• processing of large XML data using XSLT with optimal memory complexity
• formal model / implementation framework
• analyzer, SSXT / BUXT transformer
SSXT - streaming transducer
• Simple Streaming XML Transducer
  • no backward axis, no predicates, no variables
  • order-preserving
  • branch-disjoint
  • memory: a stack bounded by the document depth
• BUXT - Buffering Transducer
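The syntactic restrictions listed above (no backward axis, no predicates, no variables) can be illustrated with a small check on a single select expression. This is only a sketch under stated assumptions: the function name `ssxt_violations` is hypothetical, the test is a naive string scan rather than a real XPath parser, and it does not cover the order-preserving or branch-disjoint conditions, which depend on the whole stylesheet and the schema.

```python
import re

# XPath axes that look backward in document order.
BACKWARD_AXES = (
    "parent::",
    "ancestor::",
    "ancestor-or-self::",
    "preceding::",
    "preceding-sibling::",
)


def ssxt_violations(xpath):
    """Return the syntactic restrictions that an XPath expression violates.

    Hypothetical helper: a plain string scan, not a full XPath parser.
    """
    problems = []
    # ".." is the abbreviated form of the parent axis.
    if any(axis in xpath for axis in BACKWARD_AXES) or ".." in xpath:
        problems.append("backward axis")
    # Predicates are written in square brackets.
    if "[" in xpath:
        problems.append("predicate")
    # Variable references start with a dollar sign.
    if re.search(r"\$[A-Za-z_]", xpath):
        problems.append("variable")
    return problems
```

For example, `ssxt_violations("article/title")` returns an empty list, while `ssxt_violations("../author")` reports a backward axis.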
Xord framework - Analyzer
• Analyzer
  • XSLT & XSD: virtually applies templates to schema
  • all possible node sequences are processed
• regexp
  • all possible node sequences selected by XPath expressions
  • possible reading orders of the elements
• names
  • sequence of element names in the order they are called
  • represents the processing order of the elements
SSXT Transformer
• Polymorphic stack
  • two types of transformation states - DFA & CC
  • related to current document level
• sequence of deterministic finite automata (DFA) states
  • concurrent evaluation of XPath expressions
  • single DFA for each expression
  • start-tag → DFA transition
  • final state → template call
• cycle configuration (CC)
  • template and template call being processed
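The DFA part of the stack described above can be sketched as follows. This is a minimal illustration, not the SSXT implementation: it assumes child-axis-only paths such as `dblp/article/title`, keeps one tuple of DFA states per open element, and leaves out the cycle configurations entirely. The names `PathDFA`, `StreamingMatcher` and `match_paths` are hypothetical, and the "template call" is reduced to a callback.

```python
import xml.sax

DEAD = -1  # sink state: this branch can no longer match the expression


class PathDFA:
    """DFA for a child-axis-only path such as 'dblp/article/title'."""

    def __init__(self, path):
        self.path = path
        self.steps = path.split("/")

    def step(self, state, name):
        # State i means "the first i steps of the path have matched".
        if state != DEAD and state < len(self.steps) and self.steps[state] == name:
            return state + 1
        return DEAD

    def is_final(self, state):
        return state == len(self.steps)


class StreamingMatcher(xml.sax.ContentHandler):
    """Runs all DFAs concurrently over a SAX event stream."""

    def __init__(self, paths, on_match):
        super().__init__()
        self.dfas = [PathDFA(p) for p in paths]
        self.on_match = on_match
        # One entry per document level; each entry holds one state per DFA.
        self.stack = [tuple(0 for _ in self.dfas)]
        self.max_depth = 0

    def startElement(self, name, attrs):
        # start-tag -> one transition in every DFA, pushed as a new level
        top = self.stack[-1]
        new = tuple(d.step(s, name) for d, s in zip(self.dfas, top))
        self.stack.append(new)
        self.max_depth = max(self.max_depth, len(self.stack) - 1)
        # final state -> "template call" (here: a plain callback)
        for d, s in zip(self.dfas, new):
            if d.is_final(s):
                self.on_match(d.path, name)

    def endElement(self, name):
        # end-tag -> return to the states of the parent level
        self.stack.pop()


def match_paths(xml_bytes, paths):
    """Return the matched paths in document order and the maximum depth seen."""
    hits = []
    handler = StreamingMatcher(paths, lambda path, name: hits.append(path))
    xml.sax.parseString(xml_bytes, handler)
    return hits, handler.max_depth
```

The stack never holds more entries than there are currently open elements, so in this sketch memory grows with the document depth rather than with the document size.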
Evaluation & Comparison
Figure: memory consumption (MB) of the SSXT algorithm and tree-based XSLT processors for input XML data of different sizes
DBLP.xml ≈ 700 MB
Future work
• buffering transformer optimizations and evaluation
• multipass streaming algorithms
• overcoming some restrictions to XSLT constructs