Stream Processing of XPath Queries with Predicates. Ashish Kumar Gupta, Dan Suciu. University of Washington. SIGMOD 2003. Presenter: 蔡明瑾.
Introduction • XML messages: used to exchange information • XML stream processing problem • Processing XPath queries (filters) on an incoming stream of XML packets • Workload is very high • XPath queries may contain multiple predicates
Definition – XPath fragment • E is an atomic predicate
Definition – XML and SAX Parsers • startDocument() • startElement(a) • text(s) • endElement(a) • endDocument() • a: element or attribute label • s: data value
<a c="3"> <b>4</b></a> • startDocument() • startElement(a) • startElement(@c) • text("3") • endElement(@c) • startElement(b) • text("4") • endElement(b) • endElement(a) • endDocument()
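The event sequence above can be reproduced with a small sketch built on Python's standard xml.sax module. The class name EventLogger and the helper sax_events are hypothetical, and replaying each attribute as a nested @-labeled element is an assumption made here to mirror the slide's event model; Python's SAX API itself passes attributes as an argument of startElement.

```python
import xml.sax


class EventLogger(xml.sax.ContentHandler):
    """Records SAX callbacks using the notation of the slide (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.events = []

    def startDocument(self):
        self.events.append("startDocument()")

    def endDocument(self):
        self.events.append("endDocument()")

    def startElement(self, name, attrs):
        self.events.append(f"startElement({name})")
        # Assumption: replay every attribute as a nested @-labeled element.
        for attr in attrs.getNames():
            self.events.append(f"startElement(@{attr})")
            self.events.append(f'text("{attrs.getValue(attr)}")')
            self.events.append(f"endElement(@{attr})")

    def endElement(self, name):
        self.events.append(f"endElement({name})")

    def characters(self, content):
        # Skip whitespace-only text between elements.
        if content.strip():
            self.events.append(f'text("{content.strip()}")')


def sax_events(xml_text):
    """Parse xml_text and return the list of recorded events."""
    handler = EventLogger()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.events


if __name__ == "__main__":
    for event in sax_events('<a c="3"> <b>4</b></a>'):
        print(event)
```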
XML stream processing problem • An XPath expression P is used as a boolean filter • An XML document matches P if and only if P selects at least one node when evaluated on the document's root • Set of filters P = {P1, …, Pn} • Set I = {o1, …, on} (the oids of the filters)
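As a baseline for the problem statement, the following sketch tests every filter separately against one document and returns the oids of the filters that select at least one node. It is not the XPush machine, only a naive reference point. The function name matching_oids, the oids o1 to o3, and the wrapper element are assumptions. ElementTree supports only a limited XPath subset (equality predicates, no comparisons such as @c>2), so the filters here are deliberately simple.

```python
import xml.etree.ElementTree as ET


def matching_oids(xml_text, filters):
    """Return the oids of all filters that select at least one node.

    filters maps an oid to a path in ElementTree's XPath subset.
    """
    root = ET.fromstring(xml_text)
    # Assumption: wrap the top element so that ".//a" can also select it,
    # which imitates evaluating the filter on the document's root.
    wrapper = ET.Element("wrapper")
    wrapper.append(root)
    return {oid for oid, path in filters.items() if wrapper.findall(path)}


if __name__ == "__main__":
    filters = {
        "o1": ".//a[@c='3']",
        "o2": ".//a[b='5']",
        "o3": ".//b",
    }
    print(sorted(matching_oids('<a c="3"><b>4</b></a>', filters)))
```

The cost of this baseline grows with the number of filters, which is the workload problem the slides set out to address.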
XPush Machine • A modified deterministic pushdown automaton • Simulates the execution of the XPath filters • Input: stream of XML documents • Output: oids • Changes: • Each state has a top-down part and a bottom-up part • Accepts SAX events as input
SAX call-back functions • Current state: (qt, qb)
P1 = //a[b/text()=1 and .//a[@c>2]] • P2 = //a[@c>2 and b/text()=1] • <a> <b> 1 </b> <a c="3"> <b> 1 </b> </a> </a> • [Figure: trace of the XPush machine states (q0 … q15) over the SAX events of this document]
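To check by hand what the two filters should return on this document, the sketch below evaluates P1 and P2 directly on the parsed tree. It is a plain recursive check, not the XPush machine, and all function names are hypothetical. Both filters match: the outer a satisfies P1 and the inner a satisfies P2.

```python
import xml.etree.ElementTree as ET

DOC = '<a><b>1</b><a c="3"><b>1</b></a></a>'


def b_text_is_1(a):
    """b/text()=1: some child b of a has the text 1."""
    return any((b.text or "").strip() == "1" for b in a.findall("b"))


def c_greater_than_2(a):
    """@c>2: a has an attribute c with a numeric value above 2."""
    c = a.get("c")
    return c is not None and float(c) > 2


def matches_p1(root):
    """P1 = //a[b/text()=1 and .//a[@c>2]]"""
    return any(
        b_text_is_1(a)
        and any(c_greater_than_2(d) for d in a.iter("a") if d is not a)
        for a in root.iter("a")
    )


def matches_p2(root):
    """P2 = //a[@c>2 and b/text()=1]"""
    return any(c_greater_than_2(a) and b_text_is_1(a) for a in root.iter("a"))


if __name__ == "__main__":
    root = ET.fromstring(DOC)
    print(matches_p1(root), matches_p2(root))
```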
Compiling a set of XPath filters to an XPush Machine • Convert each XPath filter P1, …, Pn into an Alternating Finite Automaton (AFA) A1, …, An • Translate all AFAs into a single XPush machine
Step 1: Construct the AFA • Alternating finite automata A1, …, An • S: union of all states in A1, …, An • One initial state per automaton: s1, …, sn • Terminal states are OR states labeled with an atomic predicate on data values • πs(v): true if the predicate of terminal state s holds on the value v ∈ V, else false
Step 1: Construct the AFA (cont.) • State labels: AND, OR, or NOT • ε transitions • δ: S × (Σ ∪ {ε}) → P(S) • AND and OR states: ε transitions • NOT states: one outgoing transition
Step 1: Construct the AFA (cont.) • Given an XML document tree, the AFA accepts the document if its initial state matches the root node • An OR state s matches node x if either: • x is a data value node with value v and πs(v) = true, or • some s′ ∈ δ(s, a) matches a child y of x labeled a • An AND state s matches node x if every s′ ∈ δ(s, ε) matches x • A NOT state s with δ(s, ε) = {s′} matches node x if s′ does not match x
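The matching rules above can be sketched as a direct recursion over the document tree. This is a minimal illustration under stated assumptions, not the authors' implementation. The classes Node and AFA, the "*" wildcard key, the "#doc" root label and the state numbers 0 to 5 are all hypothetical. Attributes and text-only elements are modeled as labeled nodes that carry a data value. The example automaton is a hand-built AFA for a filter of the form //a[@c>2 and b/text()=1].

```python
from dataclasses import dataclass, field
from typing import Optional

EPS = ""  # stands for the ε label


@dataclass
class Node:
    label: str
    value: Optional[str] = None          # set only for data value nodes
    children: list = field(default_factory=list)


@dataclass
class AFA:
    initial: int
    kind: dict                           # state -> "AND" | "OR" | "NOT"
    delta: dict                          # (state, label) -> set of states
    pred: dict = field(default_factory=dict)  # terminal state -> predicate

    def step(self, s, label):
        targets = set(self.delta.get((s, label), ()))
        # Assumption: the "*" wildcard matches element labels, not attributes.
        if not label.startswith("@"):
            targets |= set(self.delta.get((s, "*"), ()))
        return targets

    def matches(self, s, x):
        kind = self.kind[s]
        if kind == "AND":   # every ε-successor must match x
            return all(self.matches(t, x) for t in self.delta[(s, EPS)])
        if kind == "NOT":   # the single ε-successor must not match x
            (t,) = self.delta[(s, EPS)]
            return not self.matches(t, x)
        if s in self.pred:  # terminal OR state: test the data value
            return x.value is not None and self.pred[s](x.value)
        # non-terminal OR state: some successor matches some child
        return any(self.matches(t, y)
                   for y in x.children for t in self.step(s, y.label))

    def accepts(self, root):
        return self.matches(self.initial, root)


FILTER = AFA(
    initial=0,
    kind={0: "OR", 1: "AND", 2: "OR", 3: "OR", 4: "OR", 5: "OR"},
    delta={(0, "*"): {0}, (0, "a"): {1}, (1, EPS): {2, 4},
           (2, "@c"): {3}, (4, "b"): {5}},
    pred={3: lambda v: float(v) > 2, 5: lambda v: float(v) == 1},
)

DOC = Node("#doc", children=[
    Node("a", children=[
        Node("b", "1"),
        Node("a", children=[Node("@c", "3"), Node("b", "1")]),
    ]),
])

if __name__ == "__main__":
    print(FILTER.accepts(DOC))
```

The recursion re-evaluates subtrees for every state, which is acceptable for a sketch but is the kind of repeated work a single deterministic machine is meant to avoid.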
Example 1 • S = {1, …, 13} • s1 = 1, s2 = 8 • Wildcard: δ(5, @c) = ∅, δ(5, b) = 5, δ(5, a) = 6 • AND states: states 2 and 9 • π7(55) = true, π2(v) = false • Each state corresponds to a subquery of an XPath filter, e.g. state 2 corresponds to [b/text()=1 and .//a[@c>2]]
Step 2: Construct the XPush Machine • (Qt, Qb, q0t, q0b, tpush, tvalue, tpop, …)
tpop(qb, a) = δ⁻¹(q, a) • δ⁻¹(q, a) = {s′ | δ(s′, a) ∩ q ≠ ∅} • eval(q): takes a set of states q and adds to it all states that are implied by states already in q • Handles AND states, OR states, and NOT states
Example 2 • tvalue(q0t, 1) = {4, 13} = q1 • tvalue(q0t, x) = {7, 11} = q2, for x > 2 • tvalue(q0t, x) = ∅ = q0, for all other values of x • tpop(q8, a) = {1, 5} = q14 • tadd(q3, q6) = {3, 12} ∪ {5} = q8 • Leaf states cannot be combined with any other states: no mixed content • <a>1<b>2</b></a> is not allowed
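The bottom-up operations can be sketched in a few lines that reproduce the tvalue and tpop results listed in Example 2. The slides do not give the transition table in full, so the table below is a reconstruction and should be read as an assumption. In particular it assumes that δ(5, a) also contains state 5 (the .// self-loop), that states 1 and 8 carry a // self-loop, and that eval is applied before δ⁻¹ inside tpop. The top-down state is ignored here. The name eval_states replaces eval to avoid shadowing the Python builtin.

```python
# Assumed AFA for P1 (states 1-7) and P2 (states 8-13); AND states are 2 and 9.
KIND = {s: "OR" for s in range(1, 14)}
KIND[2] = KIND[9] = "AND"

EPS = {2: {3, 5}, 9: {10, 12}}          # ε transitions of the AND states

DELTA = {                                # labeled transitions (reconstructed)
    (1, "a"): {1, 2}, (1, "b"): {1},
    (3, "b"): {4},
    (5, "a"): {5, 6}, (5, "b"): {5},
    (6, "@c"): {7},
    (8, "a"): {8, 9}, (8, "b"): {8},
    (10, "@c"): {11},
    (12, "b"): {13},
}

PRED = {                                 # atomic predicates of terminal states
    4: lambda v: v == 1, 13: lambda v: v == 1,
    7: lambda v: v > 2, 11: lambda v: v > 2,
}


def tvalue(v):
    """Terminal states whose predicate holds on the data value v."""
    return {s for s, p in PRED.items() if p(v)}


def eval_states(q):
    """Add every state implied by q through ε transitions (ε-graph assumed acyclic)."""
    def holds(s):
        if s in q:
            return True
        succ = EPS.get(s)
        if not succ:
            return False
        if KIND[s] == "AND":
            return all(holds(t) for t in succ)
        if KIND[s] == "NOT":
            return not any(holds(t) for t in succ)
        return any(holds(t) for t in succ)
    return {s for s in KIND if holds(s)}


def delta_inv(q, a):
    """δ⁻¹(q, a) = {s' | δ(s', a) ∩ q ≠ ∅}."""
    return {s for (s, label), targets in DELTA.items()
            if label == a and targets & q}


def tpop(qb, a):
    return delta_inv(eval_states(qb), a)


def tadd(q1, q2):
    """Merge the bottom-up states of sibling subtrees (plain union here)."""
    return q1 | q2


if __name__ == "__main__":
    print(tvalue(1), tvalue(3), tvalue(2))
    print(tpop(tadd({3, 12}, {5}), "a"))
```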
Lazy XPush Machine • Do not construct states that are inconsistent with the DTD • Lazy evaluation also exploits regularities in the data that are not captured by the DTD • Avoid constructing states that do not occur in a given data set
Top-down Pruning • <e1>…<c>ci1</c>…<c>cij</c>…</e1> • Keep track of the enabled branches in the top-down state • Perform bottom-up computations only at the enabled branches
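One common way to realize lazy construction is to compute a transition only the first time it is needed and cache the result, so that states never reached by the data are never built. The sketch below shows that idea in isolation. The class LazyTable and the toy transition function are hypothetical and are not taken from the slides.

```python
class LazyTable:
    """Transition table whose entries are computed on first use and then cached."""

    def __init__(self, compute):
        self._compute = compute   # function (state, symbol) -> next state
        self._cache = {}

    def lookup(self, state, symbol):
        key = (state, symbol)
        if key not in self._cache:
            # The entry is constructed only when the data actually requires it.
            self._cache[key] = self._compute(state, symbol)
        return self._cache[key]

    def size(self):
        """Number of entries constructed so far."""
        return len(self._cache)


if __name__ == "__main__":
    calls = []

    def toy_transition(state, symbol):
        # Hypothetical transition: record the call and extend the state set.
        calls.append((state, symbol))
        return state | frozenset([symbol])

    table = LazyTable(toy_transition)
    start = frozenset()
    table.lookup(start, "a")
    table.lookup(start, "a")      # served from the cache
    table.lookup(start, "b")
    print(table.size(), len(calls))
```

States are represented as frozensets so that they can be used as dictionary keys.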
Order Optimization • /person[name/text()="smith" and age/text()="33" and phone/text()="5551234"] • prec(s) = {s′ | s′ ≺ s} • tadd(qsb, qb) = qsb ∪ {s | s ∈ qb, prec(s) ⊆ qsb}
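The ordered tadd can be sketched as follows, assuming that prec(s) is the set of states preceding s in the chosen branch order and that tadd admits a state from qb only when all of prec(s) is already in the accumulated state qsb. Using the branch names name, age and phone as state identifiers is a simplification for illustration.

```python
# Assumed branch order for the three predicates of the /person filter.
ORDER = ["name", "age", "phone"]


def prec(s):
    """States that precede s in the order."""
    return set(ORDER[:ORDER.index(s)])


def tadd(qsb, qb):
    """Add the states of qb whose predecessors are all already in qsb."""
    return qsb | {s for s in qb if prec(s) <= qsb}


if __name__ == "__main__":
    q = set()
    q = tadd(q, {"age"})     # ignored: name has not matched yet
    q = tadd(q, {"name"})
    q = tadd(q, {"age"})     # accepted now that name is present
    print(sorted(q))
```

With this rule a branch that matches before its predecessors is dropped, so fewer distinct combinations of matched branches arise as states.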
Training the XPush Machine • Generate one XML document tree for every XPath query
Experiment • Real data sets: Protein • 9.12MB XML fragment • A non-recursive DTD • Max depth of document is 7