460 likes | 595 Views
An Abstract Framework for Generating Maximal Answers to Queries. Sara Cohen, Yehoshua Sagiv. Motivation. Queries and Databases. Answers and Semantics. Graph Properties. The Problem. In many different domains, we are given the option to query some source of information
E N D
An Abstract Framework for Generating Maximal Answers to Queries Sara Cohen, Yehoshua Sagiv ICDT 2005
Motivation Queries and Databases Answers and Semantics Graph Properties ICDT 2005
The Problem • In many different domains, we are given the option to query some source of information • Usually, the user only gets results if the query can be completely answered (satisfied) • In many domains, this is not appropriate, e.g., • The user is not familiar with the database • The database does not contain complete information • There is a mismatch between the ontology of the user and that of the database • The query is a “search” that is not expected to be correct ICDT 2005
Search for papers by “Smith” that appeared in ICDT 2004 ICDT 2005
Sorry, no matching record found ICDT 2005
Search for buses from “Haifa-Technion” to “Ben Gurion Airport” ICDT 2005
There is no direct bus line between the required destinations ICDT 2005
Search for buses to “Ben Gurion Airport” ICDT 2005
Must choose From and To ICDT 2005
What Do Users Need? • Users need a way to get interesting partial answers to their queries, especially if a complete answer does not exist • These partial answers should contain maximal information • Main Problems: • What should be the semantics of partial answers? • How can all partial answers be efficiently computed? ICDT 2005
Previous Work • Many solutions have been given for the main problems • solutions differ, according to the problem domain • Examples: • Full disjunctions: Galindo-Legaria (94), Rajaraman, Ullman (96), Kanza, Sagiv (03) • Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (99) • FleXPath: Amer-Yahia, Lakshmanan, Pandit (04) • Interconnections: Cohen, Kanza, Sagiv (03) ICDT 2005
Our Contribution • In the past, for each semantics considered, the query evaluation problem had to be studied anew. In this paper, we: • Present a general framework for defining semantics for partial answers • Framework is general enough to cover most previously studied semantics • Query evaluation problem can be solved once within this framework – and reused for new semantics • Results improve upon previous evaluation algorithms • Presents relationship between this problem and that of the maximal P-subgraph problem ICDT 2005
Motivation Queries and Databases Answers and Semantics Graph Properties ICDT 2005
Databases • Databases are modeled as data graphs: (V, E, r, lV, lE) • r: Can have a designated root • lV: Labels on the vertices • lE: Labels on the edges • Note: • Nodes correspond to data items • Even databases that do not have an inherent graph structure can be modeled as graphs, e.g., relational databases ICDT 2005
XML as a Data Graph University Name Dept Dept Technion Name Faculty Name Faculty Computer Science Biology Professor Lecturer Teaches Teaches Teaches Name Name Avi Levy Bioinformatics Chana Israeli Databases Molecular Biology ICDT 2005
Relational Database as a Data Graph Sites Climates Accommodations ICDT 2005
(C, (Canada, diverse)) (C, (UK, temporate)) (C, (USA, temporate)) Relational Database as a Data Graph Sites Climates Accommodations ICDT 2005
(A, (UK, London, Plaza)) (C, (Canada, diverse)) (C, (UK, temporate)) (A, (Canda, Montreal, Hilton)) (C, (USA, temporate)) (A, (Canda, Toronto, Ramada)) Relational Database as a Data Graph Sites Accommodations ICDT 2005
(S, (UK, London, Buckingham)) (A, (UK, London, Plaza)) (C, (Canada, diverse)) (S, (USA, NY, Metropolitan)) (C, (UK, temporate)) (A, (Canda, Montreal, Hilton)) (C, (USA, temporate)) (A, (Canda, Toronto, Ramada)) Relational Database as a Data Graph Sites ICDT 2005
(S, (UK, London, Buckingham)) (A, (UK, London, Plaza)) (C, (Canada, diverse)) (S, (USA, NY, Metropolitan)) (C, (UK, temporate)) (A, (Canda, Montreal, Hilton)) (C, (USA, temporate)) (A, (Canda, Toronto, Ramada)) Relational Database as a Data Graph ICDT 2005
Queries • Queries are modeled as query graphs: (V, E, r, CV, CE, s) • r:Can have a designated root • CV : Vertex constraints on the vertices (basically, a boolean function on vertices) • CE : Edge constraints on the edges (basically, a boolean function on pairs of vertices) • s:A structural constraint, one of the letters C, R, N(defines the required structure of answers, i.e., connected,rooted or none) • Note: Nodes correspond to query variables ICDT 2005
XML Query as a Graph • Returns faculty members from the Biology Department = University Is Descendent = Dept and ContainsText(Biology) Is Child Structural Constraint: Rooted = Faculty Is GrandChild = Name ICDT 2005
Join Query as a Graph • C A S Structural Constraint: Connected Belongs to: C q1 C.Country = A.Country C.Country = S.Country q2 q3 Belongs to: A Belongs to: S A.Country = S.Company and A.City = S.City ICDT 2005
Motivation Queries and Databases Answers and Semantics Graph Properties ICDT 2005
Assignment Graphs • Assignment graphs are used to compactly represent assignments of query nodes to database nodes • Basically, assignment graph for Q and D, written QD has: • Node (q,d) for each pair q Q and d D such that d satisfies the constraint on q • Edge ((q,d), (q’,d’)) if there is an edge (q,q’) in Q and (d,d’) satisfies the constraint on (q,q’) • May also have a root (details omitted) ICDT 2005
(S, (UK, London, Buckingham)) s1 (C, (UK, temporate)) (A, (UK, London, Plaza)) c1 a1 (A, (Canda, Toronto, Ramada)) (C, (Canada, diverse)) a2 c2 (C, (USA, temporate)) (A, (Canda, Montreal, Hilton)) a3 c3 s2 (S, (USA, NY, Metropolitan)) Belongs to: C (q1, c1) q1 C.Country = A.Country C.Country = S.Country (q1, c2) (q1, c3) Belongs to: S Belongs to: A q2 q3 A.Country = S.Company and A.City = S.City ICDT 2005
(S, (UK, London, Buckingham)) s1 (C, (UK, temporate)) (A, (UK, London, Plaza)) c1 a1 (A, (Canda, Toronto, Ramada)) (C, (Canada, diverse)) a2 c2 (C, (USA, temporate)) (A, (Canda, Montreal, Hilton)) a3 c3 s2 (S, (USA, NY, Metropolitan)) Belongs to: C (q2, a1) (q1, c1) q1 C.Country = A.Country C.Country = S.Country (q2, a2) (q1, c2) (q1, c3) (q2, a3) Belongs to: S Belongs to: A q2 q3 A.Country = S.Company and A.City = S.City ICDT 2005
(S, (UK, London, Buckingham)) s1 (C, (UK, temporate)) (A, (UK, London, Plaza)) c1 a1 (A, (Canda, Toronto, Ramada)) (C, (Canada, diverse)) a2 c2 (C, (USA, temporate)) (A, (Canda, Montreal, Hilton)) a3 c3 s2 (S, (USA, NY, Metropolitan)) (q3, s1) Belongs to: C (q2, a1) (q1, c1) q1 C.Country = A.Country C.Country = S.Country (q2, a2) (q1, c2) (q1, c3) (q2, a3) Belongs to: S Belongs to: A q2 q3 A.Country = S.Company and A.City = S.City (q3, s2) ICDT 2005
(S, (UK, London, Buckingham)) s1 (C, (UK, temporate)) (A, (UK, London, Plaza)) c1 a1 (A, (Canda, Toronto, Ramada)) (C, (Canada, diverse)) a2 c2 (C, (USA, temporate)) (A, (Canda, Montreal, Hilton)) a3 c3 s2 (S, (USA, NY, Metropolitan)) (q3, s1) Belongs to: C (q2, a1) (q1, c1) q1 C.Country = A.Country C.Country = S.Country (q2, a2) (q1, c2) (q1, c3) (q2, a3) Belongs to: S Belongs to: A q2 q3 A.Country = S.Company and A.City = S.City (q3, s2) ICDT 2005
(S, (UK, London, Buckingham)) s1 (C, (UK, temporate)) (A, (UK, London, Plaza)) c1 a1 (A, (Canda, Toronto, Ramada)) (C, (Canada, diverse)) a2 c2 (C, (USA, temporate)) (A, (Canda, Montreal, Hilton)) a3 c3 s2 (S, (USA, NY, Metropolitan)) (q3, s1) Belongs to: C (q2, a1) (q1, c1) q1 C.Country = A.Country C.Country = S.Country (q2, a2) (q1, c2) (q1, c3) (q2, a3) Belongs to: S Belongs to: A q2 q3 A.Country = S.Company and A.City = S.City (q3, s2) ICDT 2005
(S, (UK, London, Buckingham)) s1 (C, (UK, temporate)) (A, (UK, London, Plaza)) c1 a1 (A, (Canda, Toronto, Ramada)) (C, (Canada, diverse)) a2 c2 (C, (USA, temporate)) (A, (Canda, Montreal, Hilton)) a3 c3 s2 (S, (USA, NY, Metropolitan)) Belongs to: C q1 C.Country = A.Country C.Country = S.Country Belongs to: S Belongs to: A q2 q3 A.Country = S.Company and A.City = S.City (q3, s1) (q2, a1) (q1, c1) (q2, a2) (q1, c2) (q1, c3) (q2, a3) ICDT 2005 (q3, s2)
Partial Assignment • A partial assignment is any subgraph of QD that does not contain two different nodes (q,d) and (q,d’) • otherwise, would map the node q to two different database nodes • Can distinguish special types of partial assignments: • vertex complete • edge complete • structurally consistent Every query node must appear in the partial assignment The partial assignment satisfies the query’s structural constraint Every edge constraint between query variables in the partial assignment holds ICDT 2005
Vertex Complete, Edge Complete, Structurally Consistent Vertex Complete, Edge Complete, Structurally Consistent Vertex Complete, Edge Complete, Structurally Consistent Example (q3, s1) (q2, a1) (q1, c1) (q2, a2) (q1, c2) (q1, c3) (q2, a3) (q3, s2) ICDT 2005
Semantics • All partial assignments for Q over D that satisfy the vertex and edge constraints are encoded in QD • A semantics defines which subgraphs of the answer graph (i.e., which partial assignments) are in fact answers, e.g., • Sves allows all partial assignments that are vertex complete, edge complete and structurally consistent • Ses allows all partial assignments that are edge complete and structurally consistent • Ss allows all partial assignments that are structurally consistent • Usually, we are only interested in maximal partial assignemnts ICDT 2005
Example: Join (q3, s1) Using semantics Sves we get the natural join (q2, a1) (q1, c1) (q2, a2) (q1, c2) (q1, c3) (q2, a3) (q3, s2) ICDT 2005
Example: Join “becomes” a Full Disjunction (q3, s1) Using semantics Ses we get the full disjunction (q2, a1) (q1, c1) (q2, a2) (q1, c2) (q1, c3) (q2, a3) (q3, s2) ICDT 2005
Other Examples • Queries with incomplete answers over semistructured data: Kanza, Nutt, Sagiv (PODS 99) • Weak semantics modeled by Ses;Or-semantics modeled by Ss • FleXPath: Amer-Yahia, Lakshmanan, Pandit (Sigmond 04) • Modeled by Ses • Interconnections: Cohen, Kanza, Sagiv (03) • Complete interconnection can be modeled by Ses; Reachable interconnection can be modeled by Ss ICDT 2005
Motivation Queries and Databases Answers and Semantics Graph Properties ICDT 2005
Semantics are a type of Graph Property • A graph property Pis a set of graphs, e.g., • is a clique • is a bipartite graph • A semantics defines a set of graphs, for every Q, D (these graphs are subgraphs of QD) • Therefore, semantics are a type of graph property ICDT 2005
Hereditary Graph Properties and their Variants • There are several interesting types of graph properties that have been studied in graph theory • A graph property P is hereditary if every induced subgraph of a graph in P, is also in P (e.g., clique, is a forest) • A graph property P is connected-hereditary if every connected induced subgraph of a graph in P, is also in P (e.g., is a tree) • Can define rooted-hereditarysimilarly ICDT 2005
Semantics are usually Hereditary • Most semantics for partial answers considered in the past are hereditary (in some sense), i.e., subgraphs of a partial answer are also partial answers • Many semantics require connectivity of results (e.g., full disjunctions) • Some require answers to be rooted (e.g., FlexPath) ICDT 2005
Maximal P-Subgraph Problem • Given a graph property P, and a graph G The maximal P-subgraph problem is: Find all maximal induced subgraphs of G that have property P • Therefore, the problem of finding all maximal answers for a query over a database, under a given semantics, is a special case of the maximal P-subgraph problem ICDT 2005
Efficient Query Evaluation • There are efficient algorithms that find all maximal P-subgraphs for hereditary, connected hereditary and rooted hereditary properties • Efficient in terms of the input and the output (i.e., incremental polynomial time) • Use these algorithms to find maximal query answers, e.g., to find full disjunctions, weak answers, or-answers, etc. • Improves upon previous results ICDT 2005
Conclusion • Presented abstract framework • Can model many different types of queries, databases and semantics in the framework • Semantics in the framework are graph properties • Solve the maximal P-subgraph problem once and reuse it to find maximal query answers ICDT 2005
Future Work • It is convenient to define ranking functions and return answers in ranking order • How/when can this be done in our framework? • Note: From the modeling it is immediately apparent that ranking cannot always be performed efficiently • The problem of finding a maximal P-subgraph of size k is NP complete for hereditary and connected-hereditary graph properties (Yannakakis, STOC 78) ICDT 2005
Thank you!Questions? ICDT 2005