This lecture covers path expressions and regular path expressions for managing XML and semistructured data. Evaluation techniques and examples are provided. Resources from Data on the Web book section 4.1 are referenced.
Managing XML and Semistructured Data Lecture 4: Path Expressions Prof. Dan Suciu Spring 2001
In this lecture • Path expressions • Regular path expressions • Evaluation techniques Resources: Data on the Web (Abiteboul, Buneman, Suciu), section 4.1
Path Expressions Examples: • Bib.paper • Bib.book.publisher • Bib.paper.author.lastname Given an OEM instance, the answer of a path expression p is a set of objects
Path Expressions Examples: DB = [Figure: OEM graph with Bib at &o1 and paper, paper, book children &o12, &o24, &o29; edge labels include references, author, page, year, title, http, publisher, firstname, lastname, first, last; leaf values include “Serge”, “Abiteboul”, “Victor”, “Vianu”, 1997, 122, 133] • Bib.paper = {&o12, &o29} • Bib.book.publisher = {&o51} • Bib.paper.author.lastname = {&o71, &206}
Answer of a Path Expression • Simple evaluation algorithm for Answer(P, DB): Answer(P, DB) = f(P, root(DB)), where: f(ε, x) = {x} f(L.P, x) = ∪ {f(P, y) | (x, L, y) ∈ edges(DB)} • Runs in PTIME in size(P), size(DB)
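The definition of f above can be sketched in Python as follows. This is a minimal sketch, not the lecture's own code: the adjacency-list encoding, the `answer` function name, the artificial `root` node, and the toy edges (chosen only to agree with the answers quoted on the earlier slide) are all assumptions. It processes the path one label at a time over a set of nodes, which is an iterative reading of f that avoids re-exploring shared subgraphs.

```python
from typing import Dict, List, Set, Tuple

# Assumed encoding of an OEM graph: node -> list of (label, child) edges.
Edges = Dict[str, List[Tuple[str, str]]]

def answer(path: List[str], edges: Edges, root: str) -> Set[str]:
    """Evaluate a simple path expression, given as a list of labels."""
    current = {root}  # f(epsilon, x) = {x}
    for label in path:
        # One step of f(L.P, x): follow every L-labeled edge out of the
        # current set of nodes and take the union of the targets.
        current = {
            child
            for node in current
            for (edge_label, child) in edges.get(node, [])
            if edge_label == label
        }
    return current

# Toy fragment; the edges are illustrative assumptions, not the slide's figure.
DB: Edges = {
    "root": [("Bib", "&o1")],
    "&o1": [("paper", "&o12"), ("book", "&o24"), ("paper", "&o29")],
    "&o24": [("publisher", "&o51")],
}

print(answer(["Bib", "paper"], DB, "root"))
print(answer(["Bib", "book", "publisher"], DB, "root"))
```

Each label costs at most one pass over the edges, so the running time is bounded by size(P) × size(DB), in line with the PTIME claim.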
Regular Path Expressions R ::= label | _ | R.R | (R|R) | R* | R+ | R? Examples: • Bib.(paper|book).author • Bib.book.author.lastname? • Bib.book.(references)*.author • Bib.(_)*.zip
Applications of Regular Path Expressions • Navigating uncertain structure: • Bib.book.author.lastname? • Syntactic substitute for inheritance: • Bib.(paper|book).author • Better: Bib.publication.author, but we don’t have inheritance
Applications of Regular Path Expressions • Computing transitive closure: • Bib.(_)*.zip = everything accessible • Bib.book.(references)*.author = everything accessible via references • Some regular expressions of doubtful practical use: • (references.references)* = a path with an even number of references • (_._)* = paths of even length • (_._._.(_)?)* = paths of length (3m + 4n) for some m,n • But make great examples for illustration
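The 3m + 4n claim can be checked quickly with Python's `re` module. This is only an analogy under stated assumptions: each edge label is written as a single character, so the wildcard `_` becomes the regex `.`, and the helper name `matching_lengths` is made up for this sketch.

```python
import re

# (_._._.(_)?)* with every edge label written as one character.
PATTERN = re.compile(r"(?:...(?:.)?)*")

def matching_lengths(limit: int) -> list:
    """Return the path lengths 0..limit accepted by the pattern."""
    return [n for n in range(limit + 1) if PATTERN.fullmatch("x" * n)]

print(matching_lengths(12))  # [0, 3, 4, 6, 7, 8, 9, 10, 11, 12]
```

Lengths 1, 2 and 5 are missing because they cannot be written as 3m + 4n with m, n ≥ 0; every length from 6 upward can.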
Answer of a Regular Path Expression Recall: • Lang(R) = the set of words P generated by R Answer of regular path expressions: • Answer(R,DB) = ∪ {Answer(P,DB) | P ∈ Lang(R)} Need an evaluation algorithm that copes with cycles
Regular Path Expressions Recall: each regular expression can be translated into an NDFA Example: R = (a.a)*.a.b A = [Figure: automaton over states s1, s2, s3, s4 with edges labeled a and b] states(A) = {s1,s2,s3,s4} initial(A) = s1 terminal(A) = {s4}
Regular Path Expressions Canonical Evaluation Algorithm • Answer(R,DB): • construct A from R • construct product automaton G = A x DB: • nodes(G) = states(A) x nodes(DB) • edges(G) = {((s,x),L,(s’,x’)) | (s,L,s’) ∈ edges(A), (x,L,x’) ∈ edges(DB)} • root(G) = (initial(A), root(DB)) • compute Gacc = set of nodes accessible from root(G) • return {x | ∃ s ∈ terminal(A) s.t. (s,x) ∈ Gacc}
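The steps above can be sketched in Python as follows. This is a sketch under assumptions, not the lecture's implementation: the automaton is supplied by hand as a list of (state, label, state) triples rather than constructed from R, the wildcard `_` is treated as matching any label, and the product graph is explored on the fly by breadth-first search instead of being built in full. The function name, the node names n1..n4 and the toy database are all hypothetical.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

NfaEdge = Tuple[str, str, str]             # (state, label, next state)
Edges = Dict[str, List[Tuple[str, str]]]   # node -> [(label, child), ...]

def answer_regular(nfa: List[NfaEdge], initial: str, terminal: Set[str],
                   db: Edges, db_root: str) -> Set[str]:
    """Return the DB nodes x such that (s, x) is accessible for a terminal s."""
    start = (initial, db_root)             # root(G)
    accessible = {start}                   # Gacc, grown incrementally
    queue = deque([start])
    while queue:
        state, node = queue.popleft()
        for (src, label, dst) in nfa:
            if src != state:
                continue
            for (db_label, child) in db.get(node, []):
                # Product edge: both components move on the same label;
                # "_" in the automaton matches every label.
                if label == "_" or label == db_label:
                    pair = (dst, child)
                    if pair not in accessible:   # visited check handles cycles
                        accessible.add(pair)
                        queue.append(pair)
    return {node for (state, node) in accessible if state in terminal}

# One possible automaton for R = _.(_._)*.a (an assumption, written by hand).
A = [("s1", "_", "s2"), ("s2", "_", "s1"), ("s2", "a", "s3")]

# Hypothetical cyclic database: n1 -a-> n2 -a-> n3 -a-> n1, and n3 -b-> n4.
DB = {"n1": [("a", "n2")], "n2": [("a", "n3")], "n3": [("a", "n1"), ("b", "n4")]}

print(answer_regular(A, "s1", {"s3"}, DB, "n1"))
```

Because each (state, node) pair enters the queue at most once, the search terminates even though the toy database contains a cycle.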
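Regular Path Expressions Example: R = _.(_._)*.a A = [Figure: automaton over states s1, s2, s3 with edges labeled _ and a] DB = [Figure: graph over nodes &o1, &o2, &o3, &o4 with edges labeled a and b] Answer of R on DB = {&o2, &o3}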
Compute Product Automaton G [Figure: product graph over the pairs (s, x) for s in {s1, s2, s3} and x in {&o1, &o2, &o3, &o4}, with edges labeled _, a and b]
Compute Accessible Part Gacc [Figure: the product graph G over the pairs (s, x) for s in {s1, s2, s3} and x in {&o1, &o2, &o3, &o4}, showing the part accessible from root(G)] Answer(R,DB) = {&o2, &o3}
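Complexity of Regular Path Expressions • The evaluation algorithm runs in PTIME in size(R), size(DB): the product automaton G has at most |states(A)| × |nodes(DB)| nodes, and Gacc is computed by a single graph traversal of G • Even when there are cycles in DB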