
Symposium on Database Provenance University of Edinburgh May 21, 2008

Annotated XML: Queries and Provenance. Nate Foster, TJ Green, Val Tannen, University of Pennsylvania.


Presentation Transcript


  1. Annotated XML: Queries and Provenance • Nate Foster, TJ Green, Val Tannen • University of Pennsylvania • Symposium on Database Provenance, University of Edinburgh, May 21, 2008

  2. Need to Track XML Provenance • For scientific data processing [Buneman+ 01] • Tree-structured data, heterogeneous sources • XML is the natural data model • Data annotated with source info; annotations need to be propagated during query processing • For incomplete/probabilistic data [Sen.&Abit. 06] • Query output annotated with Boolean formulas • Annotations indicate correlations between source data and output data • For data warehousing [Cui+ 00] • Even when data is relational, often have XML views

  3. Provenance for Relational Algebra Views • [Figure: source relation R and view V, with the provenance of the view tuples marked "? ? ?"] • V := π_AB((π_AC(R) ⋈ π_C(R)) ∪ (π_AB(R) ⋈ π_BC(R)))

  4. Semiring-Annotated Relations [PODS07] • Associate each tuple in database with an annotation from a commutative semiring (K, +, ·, 0, 1) • Combine and propagate annotations during (positive) relational query processing • ⋈, ×, ∩ combine annotations using · • π, ∪ combine annotations using + • σ multiplies annotations by 0 or 1
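The propagation rules on this slide can be sketched in a few lines of Python. This is only an illustration under assumptions that are not in the talk: a relation is modeled as a dict from tuples to annotations, a tuple is a sorted sequence of (attribute, value) pairs, and the names `row`, `project` and `join` are hypothetical helpers. The example uses K = N (bag semantics).

```python
from operator import add, mul

def row(**fields):
    """A tuple of the relation, as a hashable, sorted sequence of (attribute, value) pairs."""
    return tuple(sorted(fields.items()))

def project(rel, attrs, plus, zero):
    """Projection: tuples that collapse to the same image have their annotations summed with +."""
    out = {}
    for tup, k in rel.items():
        key = tuple((a, v) for a, v in tup if a in attrs)
        out[key] = plus(out.get(key, zero), k)
    return out

def join(r1, r2, times):
    """Natural join: the annotation of a joined tuple is the product of its two sources."""
    out = {}
    for t1, k1 in r1.items():
        for t2, k2 in r2.items():
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out[tuple(sorted({**d1, **d2}.items()))] = times(k1, k2)
    return out

# K = N: annotations are multiplicities.
R = {row(A="a", B="b"): 2, row(A="a", B="c"): 3}
R_A = project(R, {"A"}, add, 0)   # the two tuples collapse: 2 + 3
joined = join(R_A, R, mul)        # each result multiplies the annotations of its inputs
print(R_A)
print(joined)
```

Swapping `add`, `mul` and `0` for the operations of another commutative semiring changes what the annotations mean without touching the query code.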

  5. Annotated Relations Example • [Figure: annotated source relation R and the resulting annotated view V] • V := π_AB((π_AC(R) ⋈ π_C(R)) ∪ (π_AB(R) ⋈ π_BC(R)))

  6. Semiring Bestiary • (𝔹, ∨, ∧, ⊥, ⊤) Set semantics • (ℕ, +, ·, 0, 1) Bag semantics • (PosBool(B), ∨, ∧, ⊥, ⊤) Incomplete dbs • (P(Ω), ∪, ∩, ∅, Ω) Probabilistic dbs • (P(P(X)), ∪, ⋓, ∅, {∅}) Why-provenance, where A ⋓ B := {a ∪ b : a ∈ A, b ∈ B} • (C, min, max, absent, public) Security clearances • (ℕ[X], +, ·, 0, 1) Prov. polynomials
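To make the bestiary concrete, here is a minimal sketch that evaluates one annotation expression, x1·y3 + y1·y2 (the one that appears in the XPath example later in the talk), in three of the semirings listed above. The `Semiring` record and the helper names `annotation` and `token` are assumptions made for this illustration, not notation from the talk.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", ["plus", "times", "zero", "one"])

BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
# Why-provenance: sets of sets of source tokens; product is pairwise union.
WHY = Semiring(
    lambda A, B: A | B,
    lambda A, B: frozenset(a | b for a in A for b in B),
    frozenset(),
    frozenset({frozenset()}),
)

def annotation(K, x1, y1, y2, y3):
    """Evaluate x1*y3 + y1*y2 using the operations of semiring K."""
    return K.plus(K.times(x1, y3), K.times(y1, y2))

def token(name):
    """A single source token as a why-provenance value."""
    return frozenset({frozenset({name})})

as_set = annotation(BOOL, True, False, True, True)
as_bag = annotation(NAT, 2, 1, 4, 3)
as_why = annotation(WHY, token("x1"), token("y1"), token("y2"), token("y3"))
print(as_set, as_bag, as_why)
```

The same expression yields a truth value, a multiplicity, or a set of witness sets depending only on which semiring is plugged in.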

  7. Our Contribution: Annotated XML • We show how to decorate unordered XML data with semiring annotations: K-UXML • We propagate the annotations for K-UXQuery (based on a large fragment of positive XQuery) • We do this by generalizing the semantics of Nested Relational Calculus (NRC) to handle annotated values and to incorporate a recursive tree type and structural recursion on trees • We prove a commutation-with-homomorphisms theorem, and show that it enables applications in security and incomplete databases

  8. K-UXML • No attributes, no text values, no repeated children (inessential); no order (essential!) • Each node decorated with a value k from semiring K (1 “neutral,” 0 “not present”) • K-collection: a finite set of elements annotated with values from K • Formally, the children of a node form a K-collection of subtrees (to annotate root, also have a top-level K-collection)

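  9. Example: XPath on K-UXML • Query: element r { $T//c } • [Figure: source tree $T and answer tree with root r, flattened in extraction; nodes are written label^annotation, and the labels that survive are a, d, b^x1, b^x2, c^y1, c^y2, c^y3, plus c^(x1·y3 + y1·y2)] • Omitted annotations are 1 (and omitted subtrees have annotation 0)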

  10. Example: For-Loops in K-UXQuery • Source, $S (nodes written label^annotation): a^z( b^x1( d^y1 ), c^x2( d^y2, e^y3 ) ) • Query: element p { for $t in $S return for $x in ($t)/* return ($x)/* } (i.e., element p { $S/*/* }) • Answer: p( d^(z·x1·y1 + z·x2·y2), e^(z·x2·y3) )

  11. Outline of Technical Approach • Extend NRC with a recursive tree type • satisfies: tree = label × {tree} and an operation for structural recursion on trees (srt) [Robertson+ 07] • apply to each child subtree, collect results using NRC big union • Generalize NRC + srt to handle semiring-annotated complex values ⇒ NRC_K + srt • Define semantics of K-UXQuery by translation to NRC_K + srt

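  12. Semantics of Small Union • Sums annotations: ⟦e1 ∪ e2⟧_K (x) := ⟦e1⟧_K (x) + ⟦e2⟧_K (x) • Query: return ($S, $T) (in NRC: $S ∪ $T) • Example [figure flattened in extraction]: the tree a^x( b^y ), which occurs in both source forests, appears in the answer as a^2x( b^y ), while a^x( b^z ) keeps annotation x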
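The small-union rule is a pointwise sum, which a short sketch can show. It assumes a K-collection is modeled as a dict from elements to annotations, with missing keys standing for annotation 0; `small_union` is a hypothetical name, and K = N is used for the example.

```python
from operator import add

def small_union(c1, c2, plus, zero):
    """Pointwise sum of two K-collections; elements annotated with zero are dropped."""
    out = dict(c1)
    for item, k in c2.items():
        out[item] = plus(out.get(item, zero), k)
    return {item: k for item, k in out.items() if k != zero}

# K = N: an element present in both inputs gets the sum of its annotations.
S = {"u": 1, "v": 2}
T = {"u": 3}
result = small_union(S, T, add, 0)
print(result)
```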

  13. Semantics of Big Union • Sums and multiplies annotations: ⟦∪(x ∈ e1) e2⟧_K (y) := Σ_{i=1..n} ⟦e1⟧_K (a_i) · ⟦e2⟧_K[x := a_i] (y), where the support (the set of elements with non-zero annotations) of ⟦e1⟧_K is {a_1, ..., a_n}

  14. Big Union Example With K = ℕ • Source, $T: b( c ), b^2( c^3 ) ≡ b( c ), b( c, c, c ), b( c, c, c ) • Query: return $T/*/* (in NRC: ∪(x ∈ $T) ∪(y ∈ x) { y }) • Answer: c^7 ≡ c, c, c, c, c, c, c
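The arithmetic behind c^7 can be checked with a small sketch of big union over K = N. It follows the NRC reading of the query, so the source is modeled as a collection of collections rather than as labeled trees; `big_union` and `coll` are hypothetical helper names, and the dict-based encoding is an assumption made for illustration.

```python
from operator import add, mul

def big_union(source, body, plus, times, zero):
    """For each element a of source (annotation k), add k * body(a)(y) to the result for y."""
    out = {}
    for a, k in source.items():
        for y, ky in body(a).items():
            out[y] = plus(out.get(y, zero), times(k, ky))
    return out

def coll(d):
    """Freeze a K-collection so it can itself be an element of another K-collection."""
    return frozenset(d.items())

# One inner collection {c: 1} with annotation 1, one inner collection {c: 3} with annotation 2.
T = {coll({"c": 1}): 1, coll({"c": 3}): 2}

# Nested big union: for x in T, for y in x, return the singleton {y} (annotation 1).
answer = big_union(
    T,
    lambda x: big_union(dict(x), lambda y: {y: 1}, add, mul, 0),
    add, mul, 0,
)
print(answer)  # c is annotated with 1*1 + 2*3
```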

  15. XPath Descendant Operator Uses srt • //* applied to forest $T translates to ∪(x ∈ $T) π1((srt(b, s). f) x) where f := let self = Tree(b, ∪(x ∈ s) {π2(x)}) in let matches = ∪(x ∈ s) {π1(x)} in (matches ∪ {self}, self) • //a, similar to above

  16. Application: Security Clearances • Data annotated with clearance levels from total order C : P < C < S < T < 0 • Joint use of data (·) requires access to both (max of clearances); alternative use of data (+) requires access to either (min of clearances) • (C, min, max, 0, P) is a commutative semiring • Source, $S (nodes written label^clearance): a^P( b^C( d^C ), c^C( d^S, e^T ) ) • Query: element p { $S/*/* } • Answer: p( d^min(max(P,C,C), max(P,C,S)), e^max(P,C,T) ) = p( d^C, e^T )
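The clearance computation on this slide can be replayed with a minimal sketch. It assumes trees are encoded as (label, frozen children) pairs and a forest as a dict from trees to annotations; `node`, `child_step`, `plus` and `times` are hypothetical names, and the level "0" stands for the semiring zero from the slide.

```python
# Clearance levels in increasing order; "0" is the semiring zero, "P" the semiring one.
RANK = {level: i for i, level in enumerate(["P", "C", "S", "T", "0"])}
ZERO, ONE = "0", "P"

def plus(a, b):
    """Alternative use: access to either suffices, so take the min clearance."""
    return a if RANK[a] <= RANK[b] else b

def times(a, b):
    """Joint use: access to both is required, so take the max clearance."""
    return a if RANK[a] >= RANK[b] else b

def node(label, kids=None):
    """A tree: a label plus a K-collection of child subtrees, frozen to be hashable."""
    return (label, frozenset((kids or {}).items()))

def child_step(forest):
    """The XPath step /*: a big union over the children of every tree in the forest."""
    out = {}
    for (_, kids), k in forest.items():
        for child, kc in kids:
            out[child] = plus(out.get(child, ZERO), times(k, kc))
    return {t: k for t, k in out.items() if k != ZERO}

d, e = node("d"), node("e")
S = {node("a", {node("b", {d: "C"}): "C", node("c", {d: "S", e: "T"}): "C"}): "P"}

answer = child_step(child_step(S))   # children of p in: element p { $S/*/* }
print(answer)
```

The d subtree is reachable along two paths, so its clearance is the min of the two path maxima, while e has a single path and keeps the max along it.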

  17. Security Condition: Non-Interference • For any given clearance level (e.g., C), want the following diagram to commute: • Top row (query): a^P( b^C( d^C ), c^C( d^S, e^T ) ) → p^P( d^C, e^T ) • Both columns: erase > C • Bottom row (query): a^P( b^C( d^C ), c^C ) → p^P( d^C )

  18. Application: Incomplete XML • Data annotated with Boolean expressions; tree T represents set of possible worlds Mod(T) • [Figure: an incomplete tree T over labels a, b, c, d whose c nodes carry the Boolean variables y1, y2, y3, together with Mod(T), its set of 7 possible worlds]

  19. Correctness: Possible Worlds • For every incomplete tree T, and every UXQuery query q, want this diagram to commute: T → Mod(T) (via Mod), T → q(T) (via q), Mod(T) → q(Mod(T)) (via q), q(T) → Mod(q(T)) (via Mod); that is, q(Mod(T)) = Mod(q(T))

  20. Commutation with Homomorphisms • Theorem: Let h : K1 → K2 be a semiring homomorphism. Then for any UXQuery query q, and for any K1-UXML document D, we have h(q(D)) = q(h(D)). • Ex: security clearances h_c : C → C, h_c(k) := if k ≤ c then k else 0 • Ex: incomplete dbs ν : B → 𝔹, Eval_ν : PosBool(B) → 𝔹 • Ex: duplicate elimination δ : ℕ → 𝔹, δ(k) := if k = 0 then ⊥ else ⊤
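The theorem can be spot-checked on one small document with the duplicate-elimination homomorphism δ : N → B. This is a sketch on a single hand-made example, not a proof; the tree encoding and the names `node`, `child_step`, `apply_hom` and `delta` are assumptions introduced for illustration, and the query q is fixed to the two-step path /*/*.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", ["plus", "times", "zero", "one"])
NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
BOOL = Semiring(lambda a, b: a or b, lambda a, b: a and b, False, True)

def node(label, kids=None):
    """A tree: a label plus a K-collection of child subtrees, frozen to be hashable."""
    return (label, frozenset((kids or {}).items()))

def child_step(K, forest):
    """The XPath step /* over semiring K: sum over parents of parent * child annotation."""
    out = {}
    for (_, kids), k in forest.items():
        for child, kc in kids:
            out[child] = K.plus(out.get(child, K.zero), K.times(k, kc))
    return {t: k for t, k in out.items() if k != K.zero}

def apply_hom(h, K2, forest):
    """Apply h to every annotation at every level; trees that become equal are summed in K2."""
    out = {}
    for (label, kids), k in forest.items():
        image = node(label, apply_hom(h, K2, dict(kids)))
        out[image] = K2.plus(out.get(image, K2.zero), h(k))
    return {t: k for t, k in out.items() if k != K2.zero}

def delta(k):
    """Duplicate elimination: N -> B."""
    return k != 0

def q(K, forest):
    return child_step(K, child_step(K, forest))

c, d = node("c"), node("d")
D = {node("r", {node("b", {c: 3}): 2, node("b", {c: 1}): 4, node("b", {d: 2}): 1}): 1}

lhs = apply_hom(delta, BOOL, q(NAT, D))   # h(q(D))
rhs = q(BOOL, apply_hom(delta, BOOL, D))  # q(h(D))
print(lhs == rhs)
```

Note that the two b subtrees containing c become the same tree once multiplicities are forgotten, which is why `apply_hom` sums annotations of coinciding images instead of overwriting them.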

  21. Related Work • Bag semantics for NRC [Libkin&Wong 97] • Incomplete XML [Kanza+ 99, Abiteboul+ 06] • Probabilistic XML [Nierman&Jagadish 02, van Keulen+ 05, Abit.&Senellart 06, Sen.&Abit. 07, Hung+ 07] • XML provenance [Buneman+ 01] • NRC provenance [Hidders+ 07] • Semiring-annotated XPath [Grahne+ 07] • Negation, expressiveness of RA_K [Geerts&Poggi 08]

  22. Conclusion • We showed how to annotate unordered XML trees (complex values) with values from a commutative semiring K, and propagate those annotations in queries for a large, positive fragment of XQuery (NRC + srt) • We saw novel applications in security and incomplete dbs, made possible by a fundamental property of our framework, commutation with homomorphisms

  23. Future Work • Practical applications based on framework • Security clearances • Jointly recording provenance, security, multiplicities, uncertainty, etc. (product of semirings is also a semiring!) • Query optimization: containment/equivalence wrt annotated semantics depends on K • In paper, we show K-equivalence for UXQuery is the same as B-equivalence when K is a distributive lattice
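The remark that a product of semirings is again a semiring suggests a simple way to record several kinds of annotation at once. The sketch below pairs multiplicities with clearance levels componentwise; the `Semiring` record, the `product` helper and the integer encoding of clearances (P=0, C=1, S=2, T=3, and 4 for the zero element) are assumptions made for this illustration.

```python
from collections import namedtuple

Semiring = namedtuple("Semiring", ["plus", "times", "zero", "one"])

NAT = Semiring(lambda a, b: a + b, lambda a, b: a * b, 0, 1)
# Clearances encoded as integers: P=0, C=1, S=2, T=3, and 4 for the zero element.
CLEARANCE = Semiring(min, max, 4, 0)

def product(K1, K2):
    """Componentwise product of two semirings; annotations are pairs."""
    return Semiring(
        lambda a, b: (K1.plus(a[0], b[0]), K2.plus(a[1], b[1])),
        lambda a, b: (K1.times(a[0], b[0]), K2.times(a[1], b[1])),
        (K1.zero, K2.zero),
        (K1.one, K2.one),
    )

K = product(NAT, CLEARANCE)

# An output item derived along two alternative paths, each a joint use of two inputs.
path1 = K.times((2, 1), (3, 0))   # multiplicity 2*3, clearance max(C, P)
path2 = K.times((1, 1), (1, 2))   # multiplicity 1*1, clearance max(C, S)
combined = K.plus(path1, path2)
print(combined)
```

A single query evaluation over K then tracks both how many times an item is derived and the lowest clearance that suffices to see it.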

  24. K-UXQuery Syntax
