
Computing Full Disjunctions

Yaron Kanza and Yehoshua Sagiv, The Selim and Rachel Benin School of Engineering and Computer Science, The Hebrew University of Jerusalem. Overview of the talk: OR-semantics and weak semantics for querying incomplete data; complexity of query evaluation.


Presentation Transcript


  1. Computing Full Disjunctions. Yaron Kanza and Yehoshua Sagiv, The Selim and Rachel Benin School of Engineering and Computer Science, The Hebrew University of Jerusalem

  2. Overview of the Talk • OR-semantics and weak semantics for querying incomplete data • Complexity of query evaluation • Full disjunctions as a special case of weak semantics • Generalizing full disjunctions – the join constraints are not restricted to be equality constraints • Lower bounds for some related problems

  3. Querying Incomplete Data Requires a Special Semantics • Usually, answers to a query are complete assignments of database objects (or values) to the query variables • Consequently, partial information is lost • For example, dangling tuples are lost when joining several relations • The purpose of outerjoins and full disjunctions is to solve this problem, i.e., answers can be partial assignments (to some of the variables)

  4. Querying Incomplete Semistructured Data • In semistructured data, incompleteness of data is prevalent • OR-semantics and weak semantics were introduced so that queries over semistructured data would return maximal answers rather than complete answers [Kanza, Nutt & Sagiv 1999]

  5. In the Semistructured Data Model • Both data and queries are labeled rooted directed graphs • Query nodes are variables • Database nodes are objects • Matchings are assignments of database objects to query variables, such that • The database root is assigned to the query root, and • Labels are preserved

  6. A Semistructured Database About Movies [Figure: a rooted labeled graph with objects 1 to 11, edge labels movie, actor, title, name, year, language, date of birth, director, and acted in, and atomic values Zelig, Antz, Woody Allen, 1/12/1935, 1998, 1983, and English.]

  7. A Query [Figure: a query graph with root v1, variables v2, v3, and w1 to w4, and edge labels actor, movie, name, title, director, language, date of birth, and acted in.] Under complete semantics, the query returns actor-movie pairs such that the actor played in the movie and was also the director of the movie

  8. A complete matching of the query variables to database objects [Figure: the movie database next to the query; the matching shown is the tuple (1, 2, 4, 5, 10, 11, 6).]

  9. Constraints on Complete Matchings • The root constraint is satisfied if the query root is mapped to the database root • A query edge is an edge constraint: a query edge with a label l is satisfied if it is mapped to a database edge with the same label l [Figure: the query root r is mapped to the database root 1; a query edge labeled l between x and y is mapped to a database edge labeled l between objects 9 and 11.]

  10. Suppose that Node 6 is missing [Figure: the movie database with node 6 (English) and its language edge removed.]

  11. An incomplete matching [Figure: the movie database without node 6 next to the query; the matching shown is the tuple (1, 2, 4, 5, 10, 11, null), where w2 is mapped to null.] This matching is maximal

  12. The Reachability Constraint on Partial Matchings • A query node v that is mapped to a database object o satisfies the reachability constraint if there is a path from the query root to v, such that all edge constraints along this path are satisfied [Figure: a query with nodes r, w, x, y, z, and v and edge labels l1 to l6, shown next to a database with objects 1, 5, 7, 8, 9, and 55.]

  13. Weak Satisfaction of Edge Constraints • An edge constraint is weakly satisfied if it is either • Satisfied (as defined earlier), or • One (or more) of its nodes is mapped to a null value [Figure: several mappings of a query edge between x and y to the database objects 9 and 11 and to null values, with edge labels l and m.]

  14. Weak Matchings • A partial matching is a weak matching if • The root constraint is satisfied • The reachability constraint is satisfied by every query node that is mapped to a database node • Every edge constraint is weakly satisfied

  15. A weak matching [Figure: the movie database without node 6 next to the query; the matching shown is the tuple (1, 2, 4, 5, 10, 11, null), where w2 is mapped to null.]

  16. A Movie Database • Consider the case where the director edge is missing [Figure: the movie database with the director edge removed.]

  17. An incomplete matching that is not a weak matching • There is an edge that is not weakly satisfied [Figure: the movie database without the director edge next to the query; the matching shown is the tuple (1, 2, 4, 5, 10, 11, null), where w2 is mapped to null.]

  18. OR Matchings • A partial matching is an OR matching if • The root constraint is satisfied • The reachability constraint is satisfied by every query node that is mapped to a database node • Unlike in a weak matching, an edge constraint in an OR matching does not have to be weakly satisfied

  19. Maximal Matchings • Matchings can be represented as tuples (where numbers are object ids) • A matching t1 subsumes a matching t2 if t1 can be obtained from t2 by replacing some nulls in t2 with non-null values • A matching is maximal if no other matching subsumes it • A query result consists only of maximal matchings • Example: t1 = (1, 5, 2, null) subsumes t2 = (1, null, 2, null)
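Subsumption and the removal of subsumed matchings can be sketched as follows. Matchings are plain Python tuples with None standing for null; the function names are hypothetical, and the quadratic filter is a naive illustration rather than the algorithm analyzed in the talk.

```python
def subsumes(t1, t2):
    """t1 subsumes t2 if t1 is obtained from t2 by replacing
    one or more nulls (None) with non-null values."""
    return (t1 != t2 and len(t1) == len(t2)
            and all(b is None or a == b for a, b in zip(t1, t2)))

def maximal(matchings):
    """Keep only the matchings that no other matching subsumes."""
    matchings = set(matchings)
    return {t for t in matchings
            if not any(subsumes(other, t) for other in matchings)}

# The two tuples from the slide.
t1 = (1, 5, 2, None)
t2 = (1, None, 2, None)
result = maximal([t1, t2])
```

Since the tuples differ and agree wherever `t2` is non-null, the position where they differ must hold a null in `t2`, which is exactly the slide's definition.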

  20. More Examples

  21. The Movie Database Before the Removals [Figure: the original movie database, including node 6 (English) and the director edge.]

  22. A complete matching • It is also a maximal weak matching • It is also a maximal OR-matching • In the result, the actor must be both an actor in the movie and the director of the movie [Figure: the movie database next to the query; the matching shown is the tuple (1, 2, 4, 5, 10, 11, 6).]

  23. A second maximal weak matching • In the result, if the actor and the movie are assigned non-null values, then the actor must be both an actor in the movie and the director of the movie [Figure: the movie database next to the query; the matching shown is the tuple (1, 3, 8, null, null, null, null).]

  24. A maximal OR-matching • It is not a weak matching • In the result, the actor either played in the movie, directed the movie, or is not related at all to the movie [Figure: the movie database next to the query; the matching shown is the tuple (1, 3, 4, 8, 10, 11, null).]

  25. Complexity of Evaluating Maximal Weak Matchings and Maximal OR Matchings

  26. Data Complexity • Under data complexity, the time complexity is a function of • the size of the database

  27. Two Alternatives for Query Evaluation • A naïve algorithm computes all matchings and then removes subsumed matchings • A better algorithm avoids computing all matchings – ideally it only computes maximal matchings • Under data complexity, both algorithms are polynomial time

  28. Input-Output Complexity • Under input-output complexity, the time complexity is a function of • the size of the query, • the size of the database, and • the size of the result

  29. A Naïve Algorithm vs. A Better Algorithm • Under I-O complexity, a naïve algorithm is exponential • Is there a better algorithm with a polynomial time I-O complexity? • The answer is positive for DAG queries [Kanza, Nutt & Sagiv 1999]

  30. Cyclic Queries • Theorem: For a query Q and a database D, the set of all maximal weak matchings can be computed in O(q^3 d m^2) time, where q is the size of the query, d is the size of the database, and m is the size of the result (computing all maximal OR matchings has the same complexity)

  31. Full Disjunctions • What is the full disjunction of a set of relations? • How are full disjunctions related to queries with incomplete answers?

  32. The Full Disjunction of the Given Relations [Figure: tables for the relations Movies, Actors, Acted-in, and Actors-that-Directed, together with their full disjunction.]

  33. The Full Disjunction of the Given Relations • The full disjunction does not include subsumed tuples [Figure: a tuple of Movies is highlighted with the note "This tuple will not be in the full disjunction".]

  34. The Full Disjunction of the Given Relations • The full disjunction does not include tuples that are based on Cartesian product rather than join [Figure: tables for the relations Movies, Actors, Acted-in, and Actors-that-Directed.]

  35. In the Full Disjunction of a Given Set of Relations: • Every tuple of the input is a part of at least one tuple of the output • Tuples are joined as in a natural join, padded with null values • The result includes only “maximal connected portions”
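The three properties above can be illustrated with a deliberately naive computation: enumerate every set of tuples taken from distinct relations, keep the sets that are join-consistent and connected through shared attributes, pad the joined tuples with nulls, and drop subsumed results. This sketch is exponential and is not the polynomial algorithm of the talk; the row encoding (dicts), the function names, and the toy relations are assumptions, and input tuples are assumed to contain no nulls.

```python
from itertools import combinations, product

def consistent(rows):
    """Every pair of rows agrees on the attributes they share."""
    return all(r1[a] == r2[a]
               for r1, r2 in combinations(rows, 2)
               for a in set(r1) & set(r2))

def connected(rows):
    """The rows form one connected component via shared attributes."""
    seen, stack = {0}, [0]
    while stack:
        i = stack.pop()
        for j in range(len(rows)):
            if j not in seen and set(rows[i]) & set(rows[j]):
                seen.add(j)
                stack.append(j)
    return len(seen) == len(rows)

def subsumes(t1, t2):
    """t1 fills in one or more nulls of t2 and agrees with it elsewhere."""
    return t1 != t2 and all(b is None or a == b for a, b in zip(t1, t2))

def full_disjunction(relations):
    """Naive, exponential sketch. relations: dict name -> list of dict rows."""
    attrs = sorted({a for rows in relations.values() for r in rows for a in r})
    names = list(relations)
    candidates = set()
    for k in range(1, len(names) + 1):
        for subset in combinations(names, k):
            # One row from each chosen relation.
            for rows in product(*(relations[n] for n in subset)):
                if consistent(rows) and connected(rows):
                    merged = {}
                    for r in rows:
                        merged.update(r)
                    # Pad missing attributes with None (null).
                    candidates.add(tuple(merged.get(a) for a in attrs))
    return attrs, {t for t in candidates
                   if not any(subsumes(o, t) for o in candidates)}

# Toy relations (assumed for illustration).
relations = {
    "Movies": [{"m_id": "m1", "title": "Zelig"}, {"m_id": "m2", "title": "Antz"}],
    "Actors": [{"a_id": "a1", "name": "Woody Allen"}],
    "Acted_in": [{"a_id": "a1", "m_id": "m1"}],
}
attrs, result = full_disjunction(relations)
```

In this toy input the movie m2 has no matching Acted_in row, so it survives as a dangling tuple padded with nulls, while a Movies row is never combined with the Actors row alone because the two share no attribute.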

  36. Motivation for Full Disjunctions • Full disjunctions were proposed by Galindo-Legaria as an alternative to outerjoins [SIGMOD’94] • Rajaraman and Ullman suggested using full disjunctions for information integration [PODS’96]

  37. Computing Full Disjunctions for γ-acyclic Relation Schemas • Rajaraman and Ullman have shown how to evaluate the full disjunction by a sequence of natural outerjoins when the relation schemas are γ-acyclic • Hence, the full disjunction can be computed in polynomial time, under input-output complexity, when the relation schemas are γ-acyclic

  38. Weak Semantics Generalizes Full Disjunctions • Relations can be converted into a semistructured database • The full disjunction can be expressed as the union of several queries that are evaluated under weak semantics

  39. Example: Creating the Database • A node is created for each tuple • Edges are added between connected tuples, in both directions • A root is added, and edges are added from the root to every node • We use colors instead of labels [Figure: a graph with root r whose nodes are the tuples of Movies, Actors, and Acted-in.]
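The database construction can be sketched as below. Two assumptions are made that the transcript does not spell out: "connected tuples" is read as tuples of different relations that share one or more attributes and agree on all of them, and each edge is labeled with the relation name of its target, standing in for the colors used on the slide. The function name and toy relations are hypothetical.

```python
from itertools import permutations

def build_database(relations):
    """relations: dict name -> list of dict rows.
    Returns (nodes, edges); edges are (source, label, target) triples."""
    # One node per tuple, identified by (relation name, row index).
    nodes = [(name, i) for name, rows in relations.items()
             for i in range(len(rows))]
    # A root "r" with an edge to every tuple node.
    edges = {("r", node[0], node) for node in nodes}
    # Edges between connected tuples, in both directions
    # (permutations yields each ordered pair).
    for (n1, i), (n2, j) in permutations(nodes, 2):
        if n1 == n2:
            continue
        r1, r2 = relations[n1][i], relations[n2][j]
        shared = set(r1) & set(r2)
        if shared and all(r1[a] == r2[a] for a in shared):
            # Assumed labeling: the relation name of the target node.
            edges.add(((n1, i), n2, (n2, j)))
    return nodes, edges

# Toy relations (assumed for illustration).
relations = {
    "Movies": [{"m_id": "m1", "title": "Zelig"}, {"m_id": "m2", "title": "Antz"}],
    "Actors": [{"a_id": "a1", "name": "Woody Allen"}],
    "Acted_in": [{"a_id": "a1", "m_id": "m1"}],
}
nodes, edges = build_database(relations)
```

With this input the graph has four root edges plus two bidirectional pairs (Movies row 0 with the Acted_in row, and the Actors row with the Acted_in row).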

  40. Example: Creating the Queries • A node is created for each relation schema • Edges are added between connected schemas, in both directions • The number of queries is equal to the number of schemas • In each query, the root is connected to a different schema [Figure: query graphs with root r over the schemas Movies, Actors, and Acted-in.]
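The query construction can be sketched in the same style. The assumptions here are that two schemas count as connected when they share an attribute, and that an edge is labeled with the name of its target schema in place of a color; the function name and schemas are hypothetical.

```python
from itertools import permutations

def build_queries(schemas):
    """schemas: dict name -> set of attribute names.
    Returns one query per schema, each a set of (source, label, target) edges."""
    # Edges between connected schemas, in both directions.
    schema_edges = {(a, b, b) for a, b in permutations(schemas, 2)
                    if schemas[a] & schemas[b]}
    # In each query the root "r" is connected to a different schema.
    return {name: {("r", name, name)} | schema_edges for name in schemas}

# Toy schemas (assumed for illustration).
schemas = {
    "Movies": {"m_id", "title"},
    "Actors": {"a_id", "name"},
    "Acted_in": {"a_id", "m_id"},
}
queries = build_queries(schemas)
```

Every query shares the same schema-to-schema edges and differs only in which schema the root points to, which is why the number of queries equals the number of schemas.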

  41. Example: Queries are Evaluated under Weak Semantics [Figure: the database graph with root r over the tuples of Movies, Actors, and Acted-in, next to a query with root r over the schemas Acted-in, Actors, and Movies.]

  42. Example: Queries are Evaluated under Weak Semantics [Figure: next step of the same evaluation.]

  43. Example: Queries are Evaluated under Weak Semantics [Figure: next step of the same evaluation.]

  44. Example: Queries are Evaluated under Weak Semantics [Figure: next step of the same evaluation; two query nodes are mapped to null.]

  45. Example: Queries are Evaluated under Weak Semantics [Figure: next step of the same evaluation.]

  46. Example [Figure: final step of the same evaluation; two query nodes are mapped to null.]

  47. The Algorithm Computes Full Disjunctions in Polynomial Time Under Input-Output Complexity • Theorem: The full disjunction of relations r1, …, rn can be computed in O(n^5 s^2 f^2) time, where n is the number of relations, s is the total size of all the relations, and f is the size of the result

  48. Generalizing Full Disjunctions • In a full disjunction, tuples are joined according to equality constraints as in a natural join (or equi-join) • We can generalize full disjunctions to support constraints that are not merely equality among attributes

  49. Example • Movies (m-id, title, year, language, location) • Actors (a-id, name, date-of-birth) • Acted-in (a-id, m-id, role) • Actors-that-Directed (a-id, m-id) • Historical-Events (name, date, description) • Historical-Sites (Country, State, City, Site) • Constraint: the date of the historical event is a date in the year when the movie was released • Constraint: the filming location is near the historical site

  50. The General Idea • A set of constraints specifies how tuples should be joined • The queries and the database are constructed according to the given constraints • A pair of nodes is connected by an edge when it satisfies the corresponding constraint • Queries are evaluated w.r.t. the database under weak semantics
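One way to picture the general idea is to attach an arbitrary predicate to each pair of relations and to connect two tuple nodes whenever the predicate holds. The sketch below uses the "same year" constraint between movies and historical events; the function name, the row encoding, and the placeholder rows are assumptions made for illustration, not data from the talk.

```python
from datetime import date

def constraint_edges(relations, constraints):
    """relations: dict name -> list of dict rows.
    constraints: dict (name1, name2) -> predicate over a row of each relation.
    Returns edges between tuple nodes, in both directions."""
    edges = set()
    for (n1, n2), holds in constraints.items():
        for i, t1 in enumerate(relations[n1]):
            for j, t2 in enumerate(relations[n2]):
                # Connect the pair only when it satisfies the constraint.
                if holds(t1, t2):
                    edges.add(((n1, i), (n2, j)))
                    edges.add(((n2, j), (n1, i)))
    return edges

# Placeholder rows (hypothetical).
relations = {
    "Movies": [{"title": "Movie A", "year": 1998},
               {"title": "Movie B", "year": 1983}],
    "Historical-Events": [{"name": "Event X", "date": date(1998, 5, 1)}],
}
constraints = {
    # The event's date falls in the year when the movie was released.
    ("Movies", "Historical-Events"):
        lambda movie, event: event["date"].year == movie["year"],
}
edges = constraint_edges(relations, constraints)
```

An equality constraint is just the special case where the predicate compares shared attributes, so the same construction covers ordinary full disjunctions as well.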
