On Propagation of Deletions and Annotations through Views Wang-Chiew Tan University of Pennsylvania Database Group Joint work with Peter Buneman and Sanjeev Khanna

Data Annotations (share annotations)
• Knowledge sharing through “annotations”
  • Annotations on data at various levels of granularity, annotations on annotations
• Improve accuracy of data
  • data and annotations can be reviewed by independent parties
• Annotations:
  • loosely structured
• Source Data:
  • proprietary
  • fixed schema
• A system that overlays annotations on existing data
  • “big business” in scientific databases
Wang-Chiew Tan, Penn Database Group

Data Annotations (share annotations)

NYRestaurants (Source Table)
Restaurant            Cost   Type       Zip
Peacock Alley         $$$    French     10022
Bull & Bear           $$$    Seafood    10022
Pacifica              $      Chinese    10013
Soho Kitchen & Bar    $      American   10022

All Restaurants (View 1)
Restaurant            Cost   Type
Peacock Alley         $$$    French
Bull & Bear           $$$    Seafood
Pacifica              $      Chinese
Soho Kitchen & Bar    $      American

Cheap Restaurants (View 2)
Restaurant            Cost   Type
Pacifica              $      Chinese
Soho Kitchen & Bar    $      American

Annotations overlaid on the tables:
• “Serves fine French Cuisine in elegant setting. Jackets required.”
• “Extensive wine list!”
• “Yummy chicken curry!!”

Data Annotations
• Communicate “meta data” through annotations
• “Bounce” or “spread” annotations around by piggybacking annotations on data items in the source-query-view model
• An annotation is placed in the view
  • where do we place the annotation on the source?
  • Not an easy problem!
• Annotation placement problem presented in relational setting
  • results carry over to fragments of XML (hierarchical model)
Model:
• Source: relational database
• Query
• View: result of the query applied on the source

Location and Propagation Rules
• A location is a triple (R, t, A)
  • R is a relation name
  • t is a tuple in R
  • A is an attribute in the schema of R
• Propagation Rules:
  • Select
  • Project
  • Join
  • Union
[Figure: each rule is illustrated on relations R, R1, R2 with attributes A1, A2, A3.]

Annotation Placement Problem
• Annotation Placement Problem:
  • Given a view V = Q(S) and an annotation A placed in the view V, decide if there is an annotation in the source that, when propagated to the view, produces no other annotation except A.
  • Q = query
  • S = data source
• “side-effect-free annotation”: an annotation on the source that produces no other annotation except A in the view
[Figure: the source S is mapped by the query Q to the view V = Q(S).]
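As an illustration of the definition, here is a brute-force sketch for one small, fixed query, PROJ A,C(R1 JOIN R2) with R1(A, B) and R2(B, C). It enumerates every source location, computes the view locations it reaches, and keeps those that reach exactly the target. The function names and the sample data are hypothetical, and the propagation behavior is the assumed one (B is projected out, so annotations on B reach nothing).

```python
def view_locations(r1, r2, src):
    """View locations reached by one source location under PROJ A,C(R1 JOIN R2).

    r1 is a set of (a, b) pairs, r2 a set of (b, c) pairs.
    A view location is ((a, c), attribute).
    """
    rel, t, attr = src
    if rel == "R1" and attr == "A":
        a, b = t
        return {((a, c), "A") for (b2, c) in r2 if b2 == b}
    if rel == "R2" and attr == "C":
        b, c = t
        return {((a, c), "C") for (a, b1) in r1 if b1 == b}
    return set()  # attribute B is projected out

def side_effect_free_annotations(r1, r2, target):
    """All source locations whose propagation yields exactly the target."""
    candidates = [("R1", t, "A") for t in r1] + [("R2", t, "C") for t in r2]
    return [s for s in candidates if view_locations(r1, r2, s) == {target}]

# Illustrative data: the view is {(a1, c1), (a1, c2)}.
r1 = {("a1", "b1"), ("a1", "b2")}
r2 = {("b1", "c1"), ("b1", "c2"), ("b2", "c1")}

# Annotating A of (a1, c1): (a1, b2) works, (a1, b1) would also mark (a1, c2).
print(side_effect_free_annotations(r1, r2, (("a1", "c1"), "A")))
# Annotating A of (a1, c2): the only source reaching it also marks (a1, c1).
print(side_effect_free_annotations(r1, r2, (("a1", "c2"), "A")))
```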

A Dichotomy Theorem
Theorem:
(a) It is NP-hard to decide if there is a side-effect-free annotation for a PJ query.
(b) There is a polynomial time algorithm for queries which do not simultaneously contain a Project and a Join operation.

Project and Join Query
• Intuition: PJ can encode 3SAT
  • Formula: (x1 + x2 + x3) … (x3 + x5 + x2), with clauses C1 … Cm
  • T = true, F = false
• One relation per clause, with a column for each variable of the clause and a clause column
  • C1 has columns x1, x2, x3; Cm has columns x3, x5, x2
  • Assignment tuples: all possible satisfying assignments for the clause
    • C1: FFF, FTF, FFT, FTT, TFF, TFT, TTT
    • Cm: FTF, FFT, FTT, TFF, TTF, TFT, TTT
  • Dummy tuples: (d, d, d, C1), …, (d, d, d, Cm), and (d, d, d, C’m)
• Query: Join, then Project on C1 … Cm
• Output: (C1, …, Cm) and (C1, …, C’m)
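The core of the intuition can be sketched in a few lines: if each clause is stored as the relation of its satisfying assignments, the natural join of all clause relations is non-empty exactly when the formula is satisfiable. The sketch below covers only that part and leaves out the dummy tuples and the annotation itself. The clause encoding (a list of `(variable, is_positive)` pairs with distinct variables per clause) and the function names are assumptions made for illustration.

```python
from itertools import product

def clause_relation(clause):
    """Rows (as dicts) for every assignment of the clause's variables that satisfies it.

    clause: list of (variable, is_positive) pairs over distinct variables.
    """
    variables = [v for v, _ in clause]
    rows = []
    for values in product([False, True], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if any(assignment[v] == positive for v, positive in clause):
            rows.append(assignment)
    return rows

def natural_join(left, right):
    """Natural join of two lists of dict rows on their shared keys."""
    joined = []
    for l in left:
        for r in right:
            if all(l[k] == r[k] for k in l.keys() & r.keys()):
                joined.append({**l, **r})
    return joined

def satisfiable_by_join(clauses):
    """True if the join of all clause relations is non-empty."""
    result = [{}]  # a single empty row is the identity for natural join
    for clause in clauses:
        result = natural_join(result, clause_relation(clause))
    return len(result) > 0

# Two clauses over the variables used on the slide, with all literals positive.
formula = [[("x1", True), ("x2", True), ("x3", True)],
           [("x3", True), ("x5", True), ("x2", True)]]
print(satisfiable_by_join(formula))
```

Each clause relation has seven rows, matching the seven assignment tuples per clause listed on the slide. The join can grow exponentially with the number of clauses, so this is a demonstration of the encoding, not a practical solver.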

Related Work on Annotations
• Superimposed Information (D. Maier, L. Delcambre [WebDB’99])
  • data “placed over” existing information, e.g. bookmark files, schema of a database
• Annotation Systems
  • Annotea (W3C)
    • annotate web pages
    • location is defined with XPointer
  • Multivalent Browser (R. Wilensky, T. A. Phelps, UC Berkeley DL Project)
    • annotate PDF files, HTML, etc.
    • robust locations
  • BioDAS (Distributed Annotation Server) (L. Stein et al.)
    • annotate genome sequences
    • notion of location is genome specific
• No one has formally studied the annotation placement problem

The classical view deletion problem
• A view tuple is to be deleted
  • What changes should be made to the source?
• Many kinds of view-to-source deletion translations
  • e.g. deletion-to-insertion, deletion-to-modification, etc.
    • Update Semantics of Relational Views (F. Bancilhon, N. Spyratos, [TODS’81])
    • On the Correct Translation of Update Operations on Relational Views (U. Dayal, P. Bernstein, [TODS’82])
    • Algorithms for Translating View Updates to Database Updates for Views Involving Selections, Projections and Joins (A. M. Keller, [PODS’85])
  • deletion-to-deletion
    • Run-Time Translation of View Tuple Deletions Using Data Lineage (Y. Cui, J. Widom, [2001])
      • exploits lineage information to find “side-effect-free” deletions whenever possible

View Deletion Problem (Deletion-to-deletion translation)
• View Deletion Problem (minimize view side-effect):
  • Given a view V = Q(S) and a tuple t in V, decide if there is a side-effect-free deletion for t
• “side-effect-free deletion”: a set of source tuples whose removal from the database will only remove t from the view
Model:
• Source: relational database
• Query
• View: result of the query applied on the source
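To make the definition concrete, here is a brute-force sketch for the view PROJ A,C(R1 JOIN R2) with R1(A, B) and R2(B, C). It tries every non-empty set of source tuples, smallest first, and returns the first one whose removal deletes exactly t from the view. The function names and data are hypothetical, and the search is exponential in the number of source tuples, so it is only meant for tiny instances.

```python
from itertools import combinations

def pj_view(r1, r2):
    """PROJ A,C(R1 JOIN R2) for r1 a set of (a, b) pairs and r2 a set of (b, c) pairs."""
    return {(a, c) for (a, b) in r1 for (b2, c) in r2 if b == b2}

def side_effect_free_deletion(r1, r2, t):
    """Smallest set of source tuples whose removal deletes only t, or None."""
    view = pj_view(r1, r2)
    source = [("R1", x) for x in r1] + [("R2", x) for x in r2]
    for k in range(1, len(source) + 1):
        for removed in combinations(source, k):
            new_r1 = r1 - {x for rel, x in removed if rel == "R1"}
            new_r2 = r2 - {x for rel, x in removed if rel == "R2"}
            if pj_view(new_r1, new_r2) == view - {t}:
                return set(removed)
    return None

# Deleting (a, c) is possible by removing (x1, c) from R2.
print(side_effect_free_deletion({("a", "x1")}, {("x1", "c"), ("x1", "c2")}, ("a", "c")))

# Here every way of deleting (a, c) also deletes another view tuple.
print(side_effect_free_deletion({("a", "x"), ("a2", "x")},
                                {("x", "c"), ("x", "c2")}, ("a", "c")))
```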

A Dichotomy Theorem
Theorem:
(a) It is NP-hard to decide if there is a side-effect-free deletion for a PJ or JU query in normal form.
(b) There is a polynomial time algorithm to find the set of source deletions with minimum side-effects for all other queries (i.e., queries that involve only S, P, U or S, J operators).
• Part (a) of the theorem is true even for a constant size PJ query involving only two relations: PROJ A,C(R1 JOIN R2)

View Deletion: PJ Query
Theorem: It is NP-hard to decide if there is a side-effect-free deletion for a PJ query in normal form.
• Query: PROJ A,C(R1 JOIN R2), with R1(A, B) and R2(B, C)
• Reduction from the formula (x1+x2+x3)(x2+x4+x5)(x4+x1+x3), with clauses c1, c2, c3
• R1 contains (a, xi) and R2 contains (xi, c) for each variable xi
• For each xi, decide whether to delete (a, xi) or (xi, c).
[Figure: full contents of R1, R2 and the view over A, C, including the tuples for clauses c1, c2, c3.]

Ongoing and Future Work
• Implementation of annotation system
  • on RDBMS
    • special cases of PJ queries with polynomial time algorithm
    • PJ queries that do not project out key information
  • on XML
• effects on query languages?

Do we need an “annotation-conscious” QL?
• Equivalent queries in the same language, but different annotation behavior
  Q1 = SELECT e.Name, e.Sal FROM Emp e WHERE e.Sal = "50K"
  Q2 = SELECT e.Name, "50K" AS Sal FROM Emp e WHERE e.Sal = "50K"
  Result of Q1: [Name: "Joe", Sal: 50K]
  Result of Q2: [Name: "Joe", Sal: 50K]
• The same query in different languages, but different annotation behavior
  Emp(Name, Sal, Dept): [Name: "Joe", Sal: 50K, Dept: "Marketing"]
  Department(Dept, Manager): [Dept: "Marketing", Manager: "Jane"]
  Relational Algebra: Emp JOIN Department
  SQL: SELECT e.Name, e.Sal, e.Dept, d.Manager FROM Emp e, Department d WHERE e.Dept = d.Dept
  Result of each: [Name: "Joe", Sal: 50K, Dept: "Marketing", Manager: "Jane"]

Do we need an “annotation-conscious” QL?
• Relational algebra seems to suggest a natural set of propagation rules
• SQL seems to suggest another natural propagation rule
  • one that is based on variable bindings
• It is not clear how to extend the semantics of query languages so that annotation propagation is “well-behaved”.
• Should a query language be “annotation-conscious”?
  OR
• Should the user be allowed to control which annotation gets propagated where?

End of Talk