
On Propagation of Deletions and Annotations through Views


Presentation Transcript


  1. On Propagation of Deletions and Annotations through Views. Wang-Chiew Tan, University of Pennsylvania Database Group. Joint work with Peter Buneman and Sanjeev Khanna.

  2. Data Annotations (share annotations)
     • Knowledge sharing through “annotations”
       • Annotations on data at various levels of granularity, annotations on annotations
     • Improve accuracy of data
       • Data and annotations can be reviewed by independent parties
     • Annotations: loosely structured
     • Source data: proprietary, fixed schema
     • A system that overlays annotations on existing data
       • “Big business” in scientific databases
     Wang-Chiew Tan, Penn Database Group

  3. Data Annotations (share annotations)
     NYRestaurants (Source Table)
       Restaurant           Cost   Type       Zip
       Peacock Alley        $$$    French     10022
       Bull & Bear          $$$    Seafood    10022
       Pacifica             $      Chinese    10013
       Soho Kitchen & Bar   $      American   10022
     All Restaurants (View 1)
       Restaurant           Cost   Type
       Peacock Alley        $$$    French
       Bull & Bear          $$$    Seafood
       Pacifica             $      Chinese
       Soho Kitchen & Bar   $      American
     Cheap Restaurants (View 2)
       Restaurant           Cost   Type
       Pacifica             $      Chinese
       Soho Kitchen & Bar   $      American
     Annotations shown on the slide: “Serves fine French Cuisine in elegant setting. Jackets required.”; “Extensive wine list!”; “Yummy chicken curry!!”

  4. Data Annotations
     • Communicate “meta data” through annotations
     • “Bounce” or “spread” annotations around by piggybacking annotations on data items in the source-query-view model
     • An annotation is placed in the view
       • Where do we place the annotation on the source?
       • Not an easy problem!
     • Annotation placement problem presented in relational setting
       • Results carry over to fragments of XML (hierarchical model)
     • Model:
       • Source: relational database
       • View: result of query applied on source
     [Diagram: Source, Query, View]

  5. Location and Propagation Rules
     • A location is a triple (R, t, A): R is a relation name, t is a tuple in R, and A is an attribute in the schema of R
     • Propagation rules:
       • Select
       • Project
       • Join
       • Union
     [Diagrams: one example per rule, using relations R, R1 and R2 over attributes A1, A2, A3]
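The rule diagrams on this slide are not reproduced in the transcript. As a minimal sketch of one natural reading of the four rules, the Python below tags every source location (R, t, A) with itself and lets the tags travel with the attribute values through select, project, natural join and union. The helper names (`source`, `select`, `project`, `join`, `union`) and the choice to pool tags when several source tuples produce the same output tuple are assumptions made for illustration, not definitions taken from the talk.

```python
from itertools import product

def source(name, rows):
    """Build a relation whose every location (name, t, A) is tagged with itself."""
    rel = {}
    for r in rows:
        t = tuple(sorted(r.items()))          # hashable tuple of (attribute, value)
        rel[t] = {a: {(name, t, a)} for a in r}
    return rel

def select(rel, pred):
    # Select: a surviving tuple keeps the tags on all of its attributes.
    return {t: tags for t, tags in rel.items() if pred(dict(t))}

def project(rel, attrs):
    # Project: tags on kept attributes follow the value; source tuples that
    # collapse into one output tuple pool their tags (assumption).
    out = {}
    for t, tags in rel.items():
        key = tuple((a, v) for a, v in t if a in attrs)
        slot = out.setdefault(key, {a: set() for a, _ in key})
        for a in slot:
            slot[a] |= tags[a]
    return out

def join(r1, r2):
    # Natural join: each side contributes the tags of its own attributes;
    # a shared attribute receives tags from both sides (assumption).
    out = {}
    for (t1, g1), (t2, g2) in product(r1.items(), r2.items()):
        d1, d2 = dict(t1), dict(t2)
        if any(d1[a] != d2[a] for a in d1.keys() & d2.keys()):
            continue
        key = tuple(sorted({**d1, **d2}.items()))
        slot = out.setdefault(key, {a: set() for a, _ in key})
        for a in d1:
            slot[a] |= g1[a]
        for a in d2:
            slot[a] |= g2[a]
    return out

def union(r1, r2):
    # Union: a tuple present on both sides pools the tags from both.
    out = {}
    for rel in (r1, r2):
        for t, tags in rel.items():
            slot = out.setdefault(t, {a: set() for a, _ in t})
            for a in slot:
                slot[a] |= tags[a]
    return out

# The "Cheap Restaurants" view from slide 3, on a two-row excerpt of the source.
ny = source("NYRestaurants", [
    {"Restaurant": "Peacock Alley", "Cost": "$$$", "Type": "French", "Zip": "10022"},
    {"Restaurant": "Pacifica", "Cost": "$", "Type": "Chinese", "Zip": "10013"},
])
cheap = project(select(ny, lambda r: r["Cost"] == "$"), {"Restaurant", "Cost", "Type"})
```

Each attribute of a tuple in `cheap` then carries the set of source locations whose annotations would reach it, which is the information the annotation placement problem on the next slide asks about.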

  6. Annotation Placement Problem
     • Annotation Placement Problem:
       • Given a view V = Q(S) and an annotation A placed in the view V, decide if there is an annotation in the source that, when propagated to the view, produces no other annotation except A.
       • Q = query
       • S = data source
     • “Side-effect-free annotation”: an annotation on the source that produces no other annotation except A in the view
     [Diagram: source S, query Q, view V = Q(S)]
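To make the definition concrete, here is a small Python sketch that checks it directly for one fixed query, PROJ A,C (R1 JOIN R2) with R1(A, B) and R2(B, C). It records, for every source location, the set of view locations it reaches, and reports the source locations that reach exactly the requested view location. The function names, the index-based source locations and the sample data are all hypothetical, and the propagation behaviour assumed is the copy-along-with-the-value reading of the rules on the previous slide.

```python
def pj_propagation(r1, r2, out_attrs):
    """Map each source location to the view locations it propagates to."""
    reach = {}
    for i, t1 in enumerate(r1):
        for j, t2 in enumerate(r2):
            if any(t1[a] != t2[a] for a in t1.keys() & t2.keys()):
                continue                      # tuples do not join
            merged = {**t1, **t2}
            view_tuple = tuple(merged[a] for a in out_attrs)
            for a in out_attrs:
                if a in t1:
                    reach.setdefault(("R1", i, a), set()).add((view_tuple, a))
                if a in t2:
                    reach.setdefault(("R2", j, a), set()).add((view_tuple, a))
    return reach

def side_effect_free_placements(reach, target):
    """Source locations whose annotation appears at `target` and nowhere else."""
    return sorted(src for src, hits in reach.items() if hits == {target})

# Hypothetical sample data.
R1 = [{"A": "a1", "B": "b1"}, {"A": "a1", "B": "b2"}]
R2 = [{"B": "b1", "C": "c1"}, {"B": "b2", "C": "c1"}, {"B": "b2", "C": "c2"}]
reach = pj_propagation(R1, R2, ("A", "C"))

ok = side_effect_free_placements(reach, (("a1", "c1"), "A"))
none = side_effect_free_placements(reach, (("a1", "c2"), "A"))
```

In this data, annotating attribute A of the first R1 tuple reaches only the view tuple (a1, c1), so `ok` is non-empty. The only source location that reaches attribute A of (a1, c2) also reaches (a1, c1), so `none` is empty: no side-effect-free annotation exists for that view location.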

  7. A Dichotomy Theorem
     Theorem:
     (a) It is NP-hard to decide if there is a side-effect-free annotation for a PJ query.
     (b) There is a polynomial time algorithm for queries which do not simultaneously contain a Project and a Join operation.
     [Diagram: source S, query Q, view V = Q(S)]

  8. Project and Join Query
     • Intuition: PJ can encode 3SAT: (x1 + x2 + x3) . . . (x3 + x5 + x2)
     • One relation per clause C1 … Cm (T = true, F = false)
       • Assignment tuples: all possible satisfying assignments for the clause
       • Dummy tuple: d d d
     Relation for C1, columns (x1, x2, x3, C1): F F F, F T F, F F T, F T T, T F F, T F T, T T T and the dummy tuple d d d, each with C1 in the last column
     Relation for Cm, columns (x3, x5, x2, Cm): F T F, F F T, F T T, T F F, T T F, T F T, T T T and the dummy tuple d d d, each with Cm in the last column
     • Query: Join, then Project on C1 … Cm
     • Query output: (C1, ..., Cm)

  9. Project and Join Query
     • Intuition: PJ can encode 3SAT: (x1 + x2 + x3) … (x3 + x5 + x2)
     • One relation per clause C1 … Cm (T = true, F = false)
       • Assignment tuples: all possible satisfying assignments for the clause
       • Dummy tuples: d d d
     Relation for C1, columns (x1, x2, x3, C1): F F F, F T F, F F T, F T T, T F F, T F T, T T T and the dummy tuple d d d, each with C1 in the last column
     Relation for Cm, columns (x3, x5, x2, Cm): F T F, F F T, F T T, T F F, T T F, T F T, T T T and the dummy tuple d d d, each with Cm in the last column, plus a second dummy tuple d d d with C’m in the last column
     • Query: Join, then Project on C1 … Cm
     • Output: (C1, ..., Cm) and (C1, ..., C’m)

  10. Related Work on Annotations
     • Superimposed Information (D. Maier, L. Delcambre [WebDB’99])
       • Data “placed over” existing information, e.g. bookmark files, schema of a database
     • Annotation systems
       • Annotea (W3C)
         • Annotates web pages
         • Location is defined with XPointer
       • Multivalent Browser (R. Wilensky, T. A. Phelps, UC Berkeley DL Project)
         • Annotates PDF files, HTML, etc.
         • Robust locations
       • BioDAS (Distributed Annotation Server) (L. Stein et al.)
         • Annotates genome sequences
         • Notion of location is genome specific
     • No one has formally studied the annotation placement problem

  11. The classical view deletion problem
     • A view tuple is to be deleted
       • What changes should be made to the source?
     • Many kinds of view-to-source deletion translations
       • e.g. deletion-to-insertion, deletion-to-modification, etc.
       • Update Semantics of Relational Views (F. Bancilhon, N. Spyratos, [TODS’81])
       • On the Correct Translation of Update Operations on Relational Views (U. Dayal, P. Bernstein, [TODS’82])
       • Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections and Joins (A. M. Keller, [PODS’85])
     • Deletion-to-deletion
       • Run-Time Translations of View Tuple Deletions Using Data Lineage (Y. Cui, J. Widom, [2001])
         • Exploits lineage information to find “side-effect free” deletions whenever possible

  12. View Deletion Problem (Deletion-to-deletion translation)
     • View Deletion Problem (minimize view side-effect):
       • Given a view V = Q(S) and a tuple t in V, decide if there is a side-effect free deletion for t
     • “Side-effect-free deletion”: a set of source tuples whose removal from the database will only remove t from the view
     • Source: relational database
     • View: result of query applied on source
     [Diagram: Source, Query, View]
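The definition can be checked by exhaustive search on small inputs. The Python sketch below does so for the query PROJ A,C (R1 JOIN R2): it tries every non-empty set of source tuples, smallest first, and returns the first one whose removal leaves exactly V minus {t}. This is a brute-force illustration whose running time is exponential in the number of source tuples, not an algorithm from the talk; the function names and sample relations are hypothetical.

```python
from itertools import combinations

def pj_view(r1, r2):
    """PROJ A,C (R1 JOIN R2) for R1 a set of (A, B) pairs and R2 a set of (B, C) pairs."""
    return {(a, c) for (a, b1) in r1 for (b2, c) in r2 if b1 == b2}

def side_effect_free_deletion(r1, r2, target):
    """Return a set of tagged source tuples whose removal deletes only `target`, or None."""
    view = pj_view(r1, r2)
    if target not in view:
        raise ValueError("target must be a tuple of the view")
    wanted = view - {target}
    tagged = [("R1", t) for t in sorted(r1)] + [("R2", t) for t in sorted(r2)]
    for size in range(1, len(tagged) + 1):
        for drop in combinations(tagged, size):
            d1 = {t for name, t in drop if name == "R1"}
            d2 = {t for name, t in drop if name == "R2"}
            if pj_view(r1 - d1, r2 - d2) == wanted:
                return set(drop)
    return None

# Hypothetical data where a side-effect-free deletion exists.
R1 = {("a1", "b1"), ("a1", "b2")}
R2 = {("b1", "c1"), ("b2", "c1"), ("b2", "c2")}
found = side_effect_free_deletion(R1, R2, ("a1", "c2"))

# Hypothetical data where none exists: every source tuple supports two view tuples.
S1 = {("a1", "b"), ("a2", "b")}
S2 = {("b", "c1"), ("b", "c2")}
missing = side_effect_free_deletion(S1, S2, ("a1", "c1"))
```

In the first data set, removing (a1, b2) from R1 deletes (a1, c2) while (a1, c1) survives through b1. In the second, deleting (a1, c1) requires removing (a1, b) or (b, c1), and either choice also removes another view tuple, so the search returns `None`.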

  13. A Dichotomy Theorem
     Theorem:
     (a) It is NP-hard to decide if there is a side-effect free deletion for a PJ or JU query in normal form.
     (b) There is a polynomial time algorithm to find the set of source deletions with minimum side-effects for all other queries, i.e., queries that involve only S, P, U operators or only S, J operators.
     • Theorem (a) is true even for a constant size PJ query involving only two relations: PROJ A,C(R1 JOIN R2)
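The slide does not spell out the polynomial time algorithm. For the join-only case, one plausible sketch (an assumption, not the authors' stated method) relies on the fact that without projection a view tuple determines exactly one tuple in each joined relation. Deleting any one of those component tuples removes the view tuple, and deleting more can only lose more, so it suffices to try each component and keep the one that removes the fewest other view tuples. The Python below does this for R1(A, B) JOIN R2(B, C); all names and data are hypothetical.

```python
def join_view(r1, r2):
    """R1 JOIN R2 for R1 a set of (A, B) pairs and R2 a set of (B, C) pairs."""
    return {(a, b, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def min_side_effect_deletion(r1, r2, target):
    """Return (relation name, source tuple, other view tuples lost) with fewest losses."""
    view = join_view(r1, r2)
    if target not in view:
        raise ValueError("target must be a tuple of the view")
    a, b, c = target
    best = None
    # The only source tuples that can produce `target` are (a, b) and (b, c).
    for name, tup in (("R1", (a, b)), ("R2", (b, c))):
        if name == "R1":
            remaining = join_view(r1 - {tup}, r2)
        else:
            remaining = join_view(r1, r2 - {tup})
        lost = view - remaining - {target}    # side effects of this choice
        if best is None or len(lost) < len(best[2]):
            best = (name, tup, lost)
    return best

# Hypothetical data: two employees-like tuples share the same join partner.
R1 = {("a1", "b1"), ("a2", "b1")}
R2 = {("b1", "c1")}
choice = min_side_effect_deletion(R1, R2, ("a1", "b1", "c1"))
```

Here deleting (a1, b1) from R1 has no side effect, while deleting (b1, c1) from R2 would also remove (a2, b1, c1), so the sketch picks the R1 tuple. Each candidate costs one re-evaluation of the join, which keeps the whole procedure polynomial in the size of the source.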

  14. View Deletion: PJ Query
     Theorem: It is NP-hard to decide if there is a side-effect free deletion for a PJ query in normal form.
     • Query: PROJ A,C(R1 JOIN R2), with R1(A, B) and R2(B, C)
     • Example formula: (x1+x2+x3)(x2+x4+x5)(x4+x1+x3)
     • For each xi, decide whether to delete (a, xi) or (xi, c).
     [Figure: tables R1(A, B) and R2(B, C) and the view over (A, C), built from the variables x1 … x5, the constants a and c, and the clause constants c1, c2, c3]

  15. Ongoing and Future Work
     • Implementation of annotation system
       • On RDBMS
         • Special cases of PJ queries with polynomial time algorithm
         • PJ queries that do not project out key information
       • On XML
     • Effects on query languages?

  16. Do we need an “annotation-conscious” QL?
     • Equivalent queries in the same language, but different annotation behavior
         Q1 = SELECT e.Name, e.Sal FROM Emp e WHERE e.Sal = "50K"
         Q2 = SELECT e.Name, "50K" AS Sal FROM Emp e WHERE e.Sal = "50K"
         Output of Q1 and of Q2: [Name:"Joe", Sal:50K]
     • The same query in different languages, but different annotation behavior
         Emp(Name, Sal, Dept): [Name:"Joe", Sal:50K, Dept:"Marketing"]
         Department(Dept, Manager): [Dept:"Marketing", Manager:"Jane"]
         Relational Algebra: Emp JOIN Department
         SQL: SELECT e.Name, e.Sal, e.Dept, d.Manager FROM Emp e, Department d WHERE e.Dept = d.Dept
         Output of both: [Name:"Joe", Sal:50K, Dept:"Marketing", Manager:"Jane"]
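The Q1 versus Q2 contrast can be reproduced in a few lines of Python. The sketch assumes a copy-based propagation rule: an annotation follows a value only when the query copies that value from a source field, so a constant written in the query text carries no annotation. The field representation as (value, annotations) pairs and the annotation labels are made up for illustration.

```python
# Each field is a (value, annotations) pair; the annotation labels are hypothetical.
emp = [{
    "Name": ("Joe", {"note-on-name"}),
    "Sal": ("50K", {"note-on-salary"}),
    "Dept": ("Marketing", set()),
}]

def q1(rows):
    # SELECT e.Name, e.Sal FROM Emp e WHERE e.Sal = "50K"
    # Sal is copied from the source field, so its annotations come along.
    return [{"Name": r["Name"], "Sal": r["Sal"]} for r in rows if r["Sal"][0] == "50K"]

def q2(rows):
    # SELECT e.Name, "50K" AS Sal FROM Emp e WHERE e.Sal = "50K"
    # Sal is a constant from the query text, so it starts with no annotations.
    return [{"Name": r["Name"], "Sal": ("50K", set())} for r in rows if r["Sal"][0] == "50K"]

def values(view):
    """Strip annotations, leaving plain attribute values."""
    return [{attr: value for attr, (value, _) in row.items()} for row in view]

def annotations(view):
    """Strip values, leaving the annotation set on each attribute."""
    return [{attr: notes for attr, (_, notes) in row.items()} for row in view]

same_data = values(q1(emp)) == values(q2(emp))
same_notes = annotations(q1(emp)) == annotations(q2(emp))
```

Under this assumption the two queries return identical data (`same_data` is true) but different annotations (`same_notes` is false), which is the behaviour the slide uses to motivate an annotation-conscious query language.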

  17. Do we need an “annotation-conscious” QL?
     • Relational algebra seems to suggest a natural set of propagation rules
     • SQL seems to suggest another natural propagation rule
       • One that is based on variable bindings
     • Not clear how we extend the semantics of query languages so that annotation propagation is “well-behaved”
     • Should a query language be “annotation-conscious”?
       OR
     • Should the user be allowed to control which annotation gets propagated to where?

  18. End of Talk
