This article explores the problem of queries that may not be fully covered by a data integration system and presents a solution that provides contextualized extensional and intensional responses. It discusses the use of the LAV architecture and the L and Q query languages to derive query plans and logical descriptions. Examples from the art museum domain are given, along with the translation to natural language.
Data Integration under the Schema Tuple Query Assumption
Michael Minock (mjm@cs.umu.se)
The University of Umeå, Sweden
Introduction
• Problem:
  • Queries may be over information that is not (yet) covered by the data integration system
  • "List museums in Vienna or Bratislava holding paintings by Klimt or Picasso."
  • A purely extensional response misleads
• Solution:
  • Give the available extension, but contextualize it with intensional descriptions of coverage
  • Certain: "The following are all the museums in Vienna that hold paintings of Picasso: …"
  • Possible: "The following museums in Vienna do not provide inventory records, so they may have paintings by Klimt: …"
  • Incomplete: "There is no information for museums in Bratislava."
Approach
• LAV (Local as View) architecture
  • user queries and data source descriptions restricted to schema tuple queries in L (or Q)
  • currently sources must contain complete and correct views
  • broker mediates user query over sources and supplies a mixed extensional/intensional response
• Use 'algebraic' properties of L (or Q) to derive:
  • query plan (using cache)
  • logical descriptions of certain, uncertain and incomplete sets
• Exploit subsumption properties for:
  • query simplification
  • natural language generation
The Schema Tuple Query Languages L (and Q)
• Assumptions:
  • L: Tuple relational queries of the form:
  • Q:
• Properties:
  • L and Q are decidable for satisfiability
  • Unlike L, Q is closed over negation
  • May calculate difference and intersection, and decide containment, equivalence and disjointness, for queries built using L and Q
Example: Art museum domain
• QUERY: "List museums in Vienna or Bratislava holding paintings by Klimt or Picasso."
• Schema:
  Artist (id, name, country, DOB, DOD)
  Museum (id, name, address, city, country)
  Painting (id, title, year, artistId)
  HasPainting (museumId, paintingId)
• Data sources:
  • Central European Museums
  • MAK Inventory
  • Picasso Locator
  • Albertina Inventory
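To make the schema and query concrete, here is a minimal sketch that loads the four relations into an in-memory SQLite database and runs one plausible SQL reading of the example query. The column types, the museum names "Museum A" to "Museum D", the painting titles and every row are hypothetical placeholders, not data from the talk. The query returns whole Museum tuples, in keeping with the "no projection" restriction noted later in the slides.

```python
import sqlite3

# Schema from the slide; column types and ALL rows below are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist      (id INTEGER PRIMARY KEY, name TEXT, country TEXT, DOB TEXT, DOD TEXT);
CREATE TABLE Museum      (id INTEGER PRIMARY KEY, name TEXT, address TEXT, city TEXT, country TEXT);
CREATE TABLE Painting    (id INTEGER PRIMARY KEY, title TEXT, year INTEGER, artistId INTEGER);
CREATE TABLE HasPainting (museumId INTEGER, paintingId INTEGER);
""")
conn.executemany("INSERT INTO Artist VALUES (?,?,?,?,?)", [
    (1, "Klimt", "Austria", None, None),
    (2, "Picasso", "Spain", None, None),
    (3, "Other Artist", None, None, None),
])
conn.executemany("INSERT INTO Museum VALUES (?,?,?,?,?)", [
    (1, "Museum A", None, "Vienna", "Austria"),
    (2, "Museum B", None, "Bratislava", "Slovakia"),
    (3, "Museum C", None, "Prague", "Czech Republic"),
    (4, "Museum D", None, "Vienna", "Austria"),
])
conn.executemany("INSERT INTO Painting VALUES (?,?,?,?)", [
    (1, "Painting 1", None, 1),  # by Klimt
    (2, "Painting 2", None, 2),  # by Picasso
    (3, "Painting 3", None, 3),  # by Other Artist
    (4, "Painting 4", None, 1),  # by Klimt
])
conn.executemany("INSERT INTO HasPainting VALUES (?,?)", [
    (1, 1),  # Museum A (Vienna) holds a Klimt
    (2, 2),  # Museum B (Bratislava) holds a Picasso
    (4, 3),  # Museum D (Vienna) holds neither artist
    (3, 4),  # Museum C (Prague) holds a Klimt, but is outside the queried cities
])

# "List museums in Vienna or Bratislava holding paintings by Klimt or Picasso."
QUERY = """
SELECT m.*
FROM Museum m
WHERE m.city IN ('Vienna', 'Bratislava')
  AND EXISTS (
    SELECT 1
    FROM HasPainting y1, Painting y2, Artist y3
    WHERE m.id = y1.museumId
      AND y1.paintingId = y2.id
      AND y2.artistId = y3.id
      AND y3.name IN ('Klimt', 'Picasso'))
ORDER BY m.name
"""
museum_names = [row[1] for row in conn.execute(QUERY)]
print(museum_names)
```

This answers the query only over whatever rows happen to be loaded, which is exactly the purely extensional response the talk warns about: nothing in the result says whether the loaded rows cover Bratislava at all.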
Example: Input Expressions …

User query:
  (m Museum
    (IN m city ("Vienna" "Bratislava"))
    (+ (y1 y2 y3) (HasPainting y1) (Painting y2) (Artist y3)
       (= m id y1 museumId) (= y1 paintingId y2 id) (= y2 artistId y3 id)
       (IN y3 name ("Klimt" "Picasso"))))

Picasso Locator:
  (h HasPainting
    (+ (y1 y2) (Painting y1) (Artist y2)
       (= h paintingId y1 id) (= y1 artistId y2 id)
       (= y2 name "Picasso")))

Central European Museums:
  (m Museum
    (IN m city ("Vienna" "Prague" "Berlin" …)))

MAK Inventory:
  (h HasPainting
    (+ (y1) (Museum y1)
       (= h museumId y1 id) (= y1 name "MAK") (= y1 city "Vienna")))

Albertina Inventory:
  (h HasPainting
    (+ (y1) (Museum y1)
       (= h museumId y1 id) (= y1 name "Albertina") (= y1 city "Vienna")))
Example: Output Expressions …

Certain:
  (m Museum
    (= m city "Vienna")
    (+ (y1 y2 y3) (HasPainting y1) (Painting y2) (Artist y3)
       (= m id y1 museumId) (= y1 paintingId y2 id) (= y2 artistId y3 id)
       (= y3 name "Picasso")))

  (m Museum
    (= m city "Vienna")
    (IN m name ("Albertina" "MAK"))
    (+ (y1 y2 y3) (HasPainting y1) (Painting y2) (Artist y3)
       (= m id y1 museumId) (= y1 paintingId y2 id) (= y2 artistId y3 id)
       (= y3 name "Klimt")))

Uncertain:
  (m Museum
    (= m city "Vienna")
    (NOT_IN m name ("Albertina" "MAK"))
    (+ (y1 y2 y3) (HasPainting y1) (Painting y2) (Artist y3)
       (= m id y1 museumId) (= y1 paintingId y2 id) (= y2 artistId y3 id)
       (= y3 name "Klimt")))

Incomplete:
  (m Museum
    (= m city "Bratislava")
    (+ (y1 y2 y3) (HasPainting y1) (Painting y2) (Artist y3)
       (= m id y1 museumId) (= y1 paintingId y2 id) (= y2 artistId y3 id)
       (IN y3 name ("Klimt" "Picasso"))))
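The three-way split above can be illustrated with a toy enumeration. This is not the algebraic derivation the talk describes (difference, intersection and containment over L and Q); it is a hand-coded stand-in that classifies each (city, artist, inventoried?) cell of the example query, assuming coverage facts read off the source descriptions. The names `COVERED_CITIES`, `LOCATED_ARTISTS`, `classify` and `partition` are hypothetical.

```python
from itertools import product

# Hypothetical, hand-coded stand-ins for the source descriptions in the slides.
# Central European Museums: complete Museum view for these cities
# (the slide's city list continues with an elided "…").
COVERED_CITIES = {"Vienna", "Prague", "Berlin"}
# Picasso Locator: complete HasPainting view for these artists.
LOCATED_ARTISTS = {"Picasso"}


def classify(city, artist, inventoried):
    """Label one cell of the query space.

    `inventoried` is True for museums with a complete inventory source
    (Albertina, MAK in the example) and False for all other museums.
    """
    if city not in COVERED_CITIES:
        return "incomplete"   # no source even lists the museums of this city
    if artist in LOCATED_ARTISTS or inventoried:
        return "certain"      # holdings fully known for this cell
    return "uncertain"        # museum is known, but its holdings are not


def partition(cities, artists):
    """Split the query's cells into certain / uncertain / incomplete."""
    result = {"certain": [], "uncertain": [], "incomplete": []}
    for city, artist, inventoried in product(cities, artists, (True, False)):
        result[classify(city, artist, inventoried)].append((city, artist, inventoried))
    return result


parts = partition(["Vienna", "Bratislava"], ["Klimt", "Picasso"])
for label, cells in parts.items():
    print(label, cells)
```

Under these assumptions the only uncertain cell is Klimt paintings in Vienna museums without an inventory source, and every Bratislava cell is incomplete, which matches the slide's output expressions.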
Example: To Natural Language
• QUERY: "List museums in Vienna or Bratislava holding paintings by Klimt or Picasso."
• Certain:
  • "Museums in Vienna that have paintings by Picasso."
  • "Museums in Vienna named 'Albertina' or 'MAK' that have paintings by Klimt."
• Uncertain:
  • "Museums in Vienna not named 'Albertina' or 'MAK' that have paintings by Klimt."
• Incomplete:
  • "Museums in Bratislava that have paintings by Picasso or Klimt."
Pros and cons of L and Q
• Pros
  • May represent n-ary relations
  • Direct translation to SQL!
  • Some negation
  • General cyclic queries: "The artists without paintings in a museum in the country of their origin."
• Cons
  • No projection!
  • Certain quantifier prefixes prohibited: "The artists with paintings in all of the museums in the country of their origin."
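As a rough illustration of the "direct translation to SQL" and "general cyclic queries" points, the sketch below writes the negated example query by hand as a `NOT EXISTS` subquery and runs it in SQLite. The SQL is one plausible rendering, not output of the author's translator, and the trimmed-down schema and all rows are hypothetical. The join is cyclic because the inner Museum tuple is tied back to the outer Artist tuple through `country`.

```python
import sqlite3

# Reduced version of the art museum schema; all rows are illustrative assumptions.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Artist      (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
CREATE TABLE Museum      (id INTEGER PRIMARY KEY, name TEXT, city TEXT, country TEXT);
CREATE TABLE Painting    (id INTEGER PRIMARY KEY, title TEXT, artistId INTEGER);
CREATE TABLE HasPainting (museumId INTEGER, paintingId INTEGER);

INSERT INTO Artist VALUES (1, 'Klimt', 'Austria'), (2, 'Picasso', 'Spain');
INSERT INTO Museum VALUES (1, 'Museum A', 'Vienna', 'Austria'),
                          (2, 'Museum B', 'Bratislava', 'Slovakia');
INSERT INTO Painting VALUES (1, 'Painting 1', 1), (2, 'Painting 2', 2);
INSERT INTO HasPainting VALUES (1, 1), (2, 2);
""")

# "The artists without paintings in a museum in the country of their origin."
# SELECT a.* returns whole Artist tuples, since L and Q have no projection.
QUERY = """
SELECT a.*
FROM Artist a
WHERE NOT EXISTS (
    SELECT 1
    FROM Painting p, HasPainting h, Museum m
    WHERE p.artistId = a.id
      AND h.paintingId = p.id
      AND h.museumId = m.id
      AND m.country = a.country)   -- closes the cycle back to the artist
ORDER BY a.name
"""
artist_names = [row[1] for row in conn.execute(QUERY)]
print(artist_names)
```

The prohibited counterpart ("paintings in all of the museums in the country of their origin") would need a universal quantifier over museums nested around an existential over paintings, which SQL can express with doubly nested `NOT EXISTS` but which, per the slide, falls outside L and Q.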
Next 'STEP'…
• STEP 1.0 (Schema Tuple Expression Processor)
• Incomplete and/or incorrect source views
• Real applications
[Architecture diagram; components: Datasource Descriptions, Phrasal Lexicon, Cache DB, Broker, NLG, Differencing Engine/Simplifier, L2DomainCalculus, SPASS theorem prover]