Local as View: Some refinements
• IM: Filtering irrelevant sources
• Views with restricted access patterns
• A summary of IM
lav-ii

IM: Filtering irrelevant sources
• When there are many sources, it is important to weed out those that are irrelevant to a query
• Comparison constraints can help (e.g., qu >= w98)
• What more can be done? The IM system suggests introducing classes, organized in a class hierarchy, into source descriptions

Example:
[Figure: class hierarchy rooted at car, containing the classes JapaneseCar, AmericanCar, EuropeanCar, usedCar, newCar, carForSale, GermanCar, ItalianCar, FrenchCar; a legend marks disjoint classes]
Additionally, the global schema contains a relation
details(car, year, mileage, price, sellerContact) [ c, y, mi, p, s ]
(we will also abbreviate class names)

The views:
v1(c, y, mi, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y >= 1990
v2(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), EurCar(c)
v3(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), uCar(c), p >= $25000 // luxury cars
v4(c, y, p) :- details(c, y, mi, p, s), cFSale(c), uCar(c), y <= 1980 // vintage cars
v5(c, y, p, s) :- details(c, y, mi, p, s), cFSale(c), nCar(c), c = Toyota
Assume a query:
Q: q(c, y, mi, p, s) :- details(c, y, mi, p, s), cFSale(c), JCar(c), y >= 1992, p <= $12000
Some candidate rewritings will be rejected, since they are inconsistent with Q

When a view is considered for consistency with Q,
• v4 will be discarded: y <= 1980, y >= 1992 is inconsistent
• v3 will be discarded: p >= $25000, p <= $12000 is inconsistent
• v2 will be discarded: EurCar(c), JCar(c) is inconsistent
• v5: depends on what is known about the relationship between Toyota and the various car classes
Reasoning about disjointness of classes (given a hierarchy as above) is easy and efficient
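
The pruning step above can be sketched in a few lines of Python. This is only an illustration, not IM's actual procedure: the dictionary encoding of views, the names `DISJOINT`, `VIEWS`, `QUERY` and `consistent` are hypothetical, only the EurCar/JCar disjointness stated on the slide is assumed, and the constraint c = Toyota on v5 is ignored because nothing is known about how Toyota relates to the car classes.

```python
import math

# Assumed disjointness axioms; only the pair used on the slide is listed.
DISJOINT = {frozenset({"JCar", "EurCar"})}

INF = math.inf

# Each description: the classes asserted about c, and closed intervals on variables.
VIEWS = {
    "v1": {"classes": {"cFSale", "uCar"}, "bounds": {"y": (1990, INF)}},
    "v2": {"classes": {"cFSale", "EurCar"}, "bounds": {}},
    "v3": {"classes": {"cFSale", "uCar"}, "bounds": {"p": (25000, INF)}},
    "v4": {"classes": {"cFSale", "uCar"}, "bounds": {"y": (-INF, 1980)}},
    "v5": {"classes": {"cFSale", "nCar"}, "bounds": {}},  # c = Toyota not modeled
}
QUERY = {"classes": {"cFSale", "JCar"},
         "bounds": {"y": (1992, INF), "p": (-INF, 12000)}}


def consistent(view, query):
    """Return False if the view provably cannot contribute answers to the query."""
    # A pair of disjoint classes on the same car makes the conjunction unsatisfiable.
    for a in view["classes"]:
        for b in query["classes"]:
            if frozenset({a, b}) in DISJOINT:
                return False
    # Two closed intervals on the same variable must overlap.
    for var, (q_lo, q_hi) in query["bounds"].items():
        v_lo, v_hi = view["bounds"].get(var, (-INF, INF))
        if max(q_lo, v_lo) > min(q_hi, v_hi):
            return False
    return True


relevant = [name for name, view in VIEWS.items() if consistent(view, QUERY)]
print(relevant)  # ['v1', 'v5']
```

Here v5 survives only because the sketch has no axiom relating Toyota to JCar or nCar; with such knowledge it might be discarded as well.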

The true story (a side trip):
• IM uses a (PTIME) Description Logic (DL) for source descriptions
• A DL is a formalism that describes classes and binary relationships intensionally. For example, a class can be given by a name (e.g., JCar) or by an expression that describes its properties:
cheapJCar :- uCar and JCar and price < $9000
• A DL also contains containment and disjointness axioms for class expressions (containment is called subsumption in DL jargon)
• To be useful, a DL needs to support containment and disjointness queries on classes and membership queries on individuals; this is an inference problem
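
To make the two kinds of class queries concrete, here is a minimal sketch for the simplest possible case: named classes in a single-parent hierarchy with explicit disjointness axioms. It is far weaker than a real DL (no class expressions, no individuals), and the `PARENT` links are an assumption about the hierarchy figure, in particular that GermanCar sits under EurCar.

```python
# Assumed fragment of the class hierarchy (child -> parent).
PARENT = {"GermanCar": "EurCar", "EurCar": "car", "JCar": "car"}
# Assumed disjointness axiom, as used in the filtering example.
DISJOINT_AXIOMS = [("EurCar", "JCar")]


def ancestors(cls):
    """The class itself followed by all of its superclasses."""
    chain = [cls]
    while cls in PARENT:
        cls = PARENT[cls]
        chain.append(cls)
    return chain


def subsumed_by(sub, sup):
    """Containment (subsumption): every member of sub is a member of sup."""
    return sup in ancestors(sub)


def disjoint(c, d):
    """Disjointness is inherited: subclasses of disjoint classes are disjoint."""
    up_c, up_d = ancestors(c), ancestors(d)
    return any((a in up_c and b in up_d) or (b in up_c and a in up_d)
               for a, b in DISJOINT_AXIOMS)


print(subsumed_by("GermanCar", "car"))
print(disjoint("GermanCar", "JCar"))
```

Both checks only walk parent chains, which is one way to see why reasoning over a plain hierarchy is cheap.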

• Many DLs are known
• Complexity (for subsumption) ranges from polynomial (rare), to NP-complete, to EXPTIME-complete, to undecidable
• Recent interest focuses on using DLs for the Semantic Web
• The W3C OWL standard is essentially a DL (this use is essentially the same as in IM)
That is it on DLs

Views with restricted access patterns
Many sources do not support full SQL:
• They are legacy systems, e.g.,
  • finger on UNIX accepts email, returns other attributes
  • A bibliography source requires an author or a title, but does not accept a year as input
• They do not want to disclose all their data, e.g.,
  • a carSale source will not present all the cars it has for sale
  • An airline requires origin and destination as input for flight info
The questions:
• How do we describe such sources?
• What are good rewritings, and how do we find them?

Restricted sources can be described by binding patterns
Two equivalent styles (there are more sophisticated schemes):
Example: assume global relations
email(F, L, E), office(F, L, O), phone(O, P)
(F: first, L: last, E: email, O: office, P: phone)
The views are finger and userId, described as follows:
• Adding $ to attributes that must be given as input
  finger(F, L, $E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
  userId($O, E) :- office(F, L, O), email(F, L, E)
• Using strings of b and f as superscripts on predicates, where b means bound (i.e., input) and f means free
  finger^ffbff(F, L, E, O, P) :- email(F, L, E), office(F, L, O), phone(O, P)
  userId^bf(O, E) :- office(F, L, O), email(F, L, E)

Example, cont'd:
Q: q^bf(O, F) :- office(F, L, O) (or q($O, F) :- office(F, L, O))
• Cannot be answered by using finger alone: it requires E as input
• Cannot be answered by using userId alone: it does not return F
The following is a good rewriting:
q'(O, F) :- userId(O, E), finger(F, L, E, O, P)
For two reasons:
• It is executable with respect to the sources: executing the body left-to-right respects the access restrictions (O for userId comes from the query, E for finger comes from userId)
• Its expansion is contained in the query (check!)
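
The "(check!)" can be carried out mechanically. The sketch below expands q' by replacing each view atom with its definition (renaming the view's existential variables) and then searches for a containment mapping from the query into the expansion. It assumes atoms contain only variables, no constants or comparisons, and that the generated names such as F_1 do not clash with variables of the rewriting; the function names `expand` and `contained_in` are illustrative.

```python
from itertools import count

# View definitions: head variables and body atoms, as (predicate, arguments) pairs.
VIEWS = {
    "finger": (("F", "L", "E", "O", "P"),
               [("email", ("F", "L", "E")), ("office", ("F", "L", "O")),
                ("phone", ("O", "P"))]),
    "userId": (("O", "E"),
               [("office", ("F", "L", "O")), ("email", ("F", "L", "E"))]),
}


def expand(body):
    """Replace every view atom by its definition, renaming existential variables."""
    fresh = count(1)
    atoms = []
    for name, args in body:
        head, definition = VIEWS[name]
        sub = dict(zip(head, args))
        n = next(fresh)  # one suffix per view atom keeps its own variables linked
        for pred, vs in definition:
            atoms.append((pred, tuple(sub.get(v, f"{v}_{n}") for v in vs)))
    return atoms


def contained_in(exp_head, exp_body, q_head, q_body):
    """True if a containment mapping from the query into the expansion exists."""
    def extend(mapping, src, dst):
        mapping = dict(mapping)
        for s, d in zip(src, dst):
            if mapping.setdefault(s, d) != d:
                return None  # s is already mapped to a different term
        return mapping

    def search(mapping, atoms):
        if not atoms:
            return True
        (pred, args), rest = atoms[0], atoms[1:]
        for p2, a2 in exp_body:
            if p2 == pred and len(a2) == len(args):
                m2 = extend(mapping, args, a2)
                if m2 is not None and search(m2, rest):
                    return True
        return False

    start = extend({}, q_head, exp_head)  # head variables must map to the head
    return start is not None and search(start, q_body)


expansion = expand([("userId", ("O", "E")), ("finger", ("F", "L", "E", "O", "P"))])
print(expansion)
print(contained_in(("O", "F"), expansion, ("O", "F"), [("office", ("F", "L", "O"))]))
```

The mapping succeeds because the expansion of finger contributes office(F, L, O) itself; the office atom coming from userId uses renamed variables and cannot serve as the image.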

These two reasons are a characterization of a good rewriting:
• It is executable with respect to the sources: executing the body left-to-right respects the access restrictions
• Its expansion is contained in the query (check!)
Indeed:
• If it is not a contained rewriting, then being executable is no good
• Being contained but not executable is also no good

The IM approach:
After a rewriting is found to be consistent and contained, it is checked for being executable: can the sub-goals in the body be ordered so that the input required for each is supplied by the query or by the sub-goals to its left?
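
One simple way to perform this check is a greedy ordering: repeatedly pick any sub-goal whose bound positions are already covered, and add all of its variables to the covered set. The sketch below assumes sub-goals are given as (name, arguments, adornment) triples in the b/f style; it is an illustration, not necessarily the procedure IM implements, and `executable_order` is a hypothetical name.

```python
def executable_order(subgoals, query_bound):
    """Return an executable ordering of the sub-goals, or None if there is none.

    subgoals: list of (name, args, adornment), adornment being a string of
              'b' (input required) and 'f' (free), one letter per argument.
    query_bound: variables whose values are supplied by the query.
    """
    bound = set(query_bound)
    remaining = list(subgoals)
    order = []
    while remaining:
        for goal in remaining:
            _, args, adornment = goal
            needed = {a for a, mode in zip(args, adornment) if mode == "b"}
            if needed <= bound:
                order.append(goal)
                bound |= set(args)  # calling the source binds all its arguments
                remaining.remove(goal)
                break
        else:
            return None  # no remaining sub-goal can be called
    return order


body = [("finger", ("F", "L", "E", "O", "P"), "ffbff"),
        ("userId", ("O", "E"), "bf")]
plan = executable_order(body, {"O"})
print([name for name, _, _ in plan])  # ['userId', 'finger']
```

Greedy choice is safe here because the set of bound variables only grows: a sub-goal that is callable now remains callable later, so picking it early never blocks another ordering.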

A summary of IM
• Introduced (together with other concurrent systems) the notion of LAV and query rewriting using views
• Also introduced detailed source descriptions using DLs
• Provided an efficient algorithm for finding contained and executable rewritings
• Worked well for about 100 sources

But:
• The fact that a contained rewriting needs a number of views at most equal to the number of atoms in the query has been proved only for CQs, without
  • comparisons
  • access restrictions
  • constraints on the global db
  Does it hold for these cases? (see example in p. 10)
• For access-restricted sources, it has been proved that equivalent rewritings need at most n+m views, where n is the number of atoms in the query and m is the number of different variables in it
• The proof does not hold for contained rewritings

Even for "pure" CQs, is the bucket algorithm guaranteed to find all rewritings?
The answers to all these questions are negative!
• The bucket algorithm does not find all rewritings
• For the more general cases, longer rewritings are needed; actually, there may be an infinite number of them, with no bound on length
There is a need for another approach