Global-as-View and Local-as-View for Information Integration
CS652 Spring 2004
Presenter: Yihong Ding
Common Integration Architecture
• Information Integration Systems
• Global-as-view (GAV) vs. Local-as-view (LAV)
• Query Reformulation
• Specification of Source Description
• Adding new sources
Query Reformulation
• Problem: rewrite a user query expressed in the mediated schema into a query expressed in the source schemas
• Given a query Q in terms of the mediator schema relations, and descriptions of the information sources,
• find a query Q' that uses only the source relations, such that
  • $Q' \subseteq Q$ (Q' is contained in Q), and
  • Q' provides all possible answers to Q given the sources
Solving Queries by Views
[Figure: mediator relations and source relations]
Query Rewriting Using Views
• Query containment: $q' \subseteq q \iff \forall D:\ q'(D) \subseteq q(D)$
• Query equivalence: $q' = q \iff q' \subseteq q \wedge q \subseteq q'$
Given a query q and view definitions $V = \{v_1, \ldots, v_n\}$:
• q' is an equivalent rewriting of q using V if
  • q' refers only to views in V, and
  • $q' = q$
• q' is a maximally-contained rewriting of q using V if
  • q' refers only to views in V,
  • $q' \subseteq q$, and
  • there is no rewriting $q_1$ of q using V such that $q' \subseteq q_1 \subseteq q$ and $q_1 \not\subseteq q'$
Complexity of Query Containment
• Conjunctive queries (CQs): NP-complete
  • Q1: p(X,Z) :- a(X,Y) & a(Y,Z)
  • Q2: p(X,Z) :- a(X,Y) & a(V,Z)
• CQs with negation: $\Pi_2^p$-complete
  • Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & NOT a(X,Z)
• CQs with arithmetic comparisons: $\Pi_2^p$-complete
  • Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & X<Y
• Datalog programs
  • p(A,C) :- a(A,B) & b(B,C)
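The NP-complete case can be made concrete with the classical containment-mapping test (Chandra and Merlin): Q1 is contained in Q2 exactly when some mapping of Q2's variables sends Q2's head onto Q1's head and every subgoal of Q2 onto a subgoal of Q1. The sketch below is a brute-force Python version for plain conjunctive queries only (no negation or comparisons); the tuple encoding and the rule that uppercase terms are variables are assumptions of this sketch. Its exhaustive search over subgoal choices is exponential, in line with the NP-completeness result.

```python
from itertools import product
from typing import Dict, List, Tuple

# Assumed encoding: an atom is (predicate, args); a CQ is (head_atom, body_atoms).
# Terms starting with an uppercase letter are variables; all others are constants.
Atom = Tuple[str, Tuple[str, ...]]
CQ = Tuple[Atom, List[Atom]]


def is_var(term: str) -> bool:
    return term[:1].isupper()


def extend(mapping: Dict[str, str], src: Atom, dst: Atom) -> bool:
    """Extend `mapping` so that atom `src` maps onto atom `dst`; False on conflict."""
    if src[0] != dst[0] or len(src[1]) != len(dst[1]):
        return False
    for s, d in zip(src[1], dst[1]):
        if is_var(s):
            if mapping.setdefault(s, d) != d:
                return False
        elif s != d:  # constants map only to themselves
            return False
    return True


def contained_in(q1: CQ, q2: CQ) -> bool:
    """True iff q1 ⊆ q2, i.e. a containment mapping from q2 into q1 exists."""
    # Candidate targets in q1 for each subgoal of q2 (same predicate and arity).
    candidates = [[a for a in q1[1] if a[0] == b[0] and len(a[1]) == len(b[1])]
                  for b in q2[1]]
    for choice in product(*candidates):
        mapping: Dict[str, str] = {}
        pairs = [(q2[0], q1[0])] + list(zip(q2[1], choice))
        if all(extend(mapping, src, dst) for src, dst in pairs):
            return True
    return False


# The two conjunctive queries from the slide.
Q1 = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
Q2 = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("V", "Z"))])
print(contained_in(Q1, Q2))  # True
print(contained_in(Q2, Q1))  # False
```

On the slide's queries, Q1 ⊆ Q2 holds (map V to Y), but Q2 ⊆ Q1 does not, because no mapping of Q1 can force the join on Y that Q2 lacks.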
Specification of Source Description
• Views: resources used by the integrator to help answer queries
• GAV: each mediator relation is defined as a view over the source relations
• LAV: each source relation is defined as a view over the mediator relations
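Under GAV, answering a mediator-level query amounts to rule unfolding: each mediator subgoal is replaced by the body of a view that defines it, and a relation defined by several views yields a union of conjunctive queries. A minimal Python sketch, assuming hypothetical relations Book (mediator), s1_book and s2_catalog (sources), view heads made of distinct variables, and the convention that uppercase terms are variables:

```python
from itertools import count, product
from typing import Dict, List, Tuple

# Assumed encoding: an atom is (predicate, args); a rule is (head, body).
# Uppercase terms are variables; quoted strings such as "'Aho'" are constants.
Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[Atom, List[Atom]]
_fresh = count()


def is_var(term: str) -> bool:
    return term[:1].isupper()


def unfold_atom(goal: Atom, view: Rule) -> List[Atom]:
    """Replace a mediator subgoal by the body of one GAV view defining it."""
    (_, head_args), body = view
    sub: Dict[str, str] = dict(zip(head_args, goal[1]))  # head variable -> query term
    out = []
    for pred, args in body:
        new_args = []
        for t in args:
            if is_var(t) and t not in sub:
                sub[t] = f"Fresh{next(_fresh)}"  # existential view variable
            new_args.append(sub.get(t, t))
        out.append((pred, tuple(new_args)))
    return out


def unfold(query: Rule, views: Dict[str, List[Rule]]) -> List[Rule]:
    """Unfold every mediator subgoal; the result is a union of source-level rules."""
    head, body = query
    options = []
    for goal in body:
        defs = views.get(goal[0])
        # Subgoals that are not mediator relations are kept unchanged.
        options.append([unfold_atom(goal, d) for d in defs] if defs else [[goal]])
    return [(head, [a for part in combo for a in part]) for combo in product(*options)]


# Hypothetical GAV definitions: mediator relation Book is the union of two sources.
views = {"Book": [(("Book", ("T", "A")), [("s1_book", ("T", "A"))]),
                  (("Book", ("T", "A")), [("s2_catalog", ("T", "A", "Y"))])]}
query = (("q", ("T",)), [("Book", ("T", "'Aho'"))])
for rule in unfold(query, views):
    print(rule)
```

Unfolding is purely syntactic here, which is why GAV reformulation is cheap; the cost shows up instead when a source changes and every view mentioning it must be rewritten.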
Information Integration Systems
• TSIMMIS
  • Stanford and IBM
  • Global-as-View (GAV)
  • Mediator relations defined as views of source relations
• Information Manifold (IM)
  • AT&T
  • Local-as-View (LAV)
  • Description logic
  • Source relations defined as views of mediator relations (a collection of global predicates)
TSIMMIS – GAV Solution
• The Stanford-IBM Manager of Multiple Information Sources (TSIMMIS)
• Offers:
  • A flexible data model
  • A common query language
  • Other supporting tools
TSIMMIS – Components
• OEM (Object Exchange Model)
• LOREL (Lightweight Object REpository Language)
• MSL (Mediator Specification Language)
• Wrappers
TSIMMIS – OEM
• Object Exchange Model
• The data model for TSIMMIS
• "Self-describing": labels carry all the information there is about an object
• Flexible
• First order logic
TSIMMIS – OEM
• Each object has four parts: OID, label, type, value
  • OID: object identifier
  • label: human-understandable
  • type: "set" or "string"
  • value: a set or a string
TSIMMIS – OEM (example)
library: set
  book: set
    author: string "Aho"
    title: string "Compilers…"
TSIMMIS – OEM
• First order predicate logic
• Object: OID 123, label author, type string, value "Aho"
• Query: author(T, "Aho")
• This returns the object IDs of all objects with label "author" and value "Aho".
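The OEM structure and the label/value predicate can be modeled in a few lines. This Python sketch is only an illustration: the class and function names, the OIDs other than 123, and storing a set value as a list of member OIDs are assumptions, not TSIMMIS code.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class OEMObject:
    oid: int
    label: str                      # human-understandable label
    type: str                       # "set" or "string"
    value: Union[str, List[int]]    # a string, or the OIDs of the set's members


def label_query(db: Dict[int, OEMObject], label: str, value: str) -> List[int]:
    """Evaluate label(T, value): OIDs T of string objects with this label and value."""
    return [o.oid for o in db.values()
            if o.label == label and o.type == "string" and o.value == value]


# The library example; OIDs other than 123 are made up for this sketch.
db = {
    1: OEMObject(1, "library", "set", [2]),
    2: OEMObject(2, "book", "set", [123, 4]),
    123: OEMObject(123, "author", "string", "Aho"),
    4: OEMObject(4, "title", "string", "Compilers…"),
}
print(label_query(db, "author", "Aho"))  # [123]
```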
TSIMMIS – LOREL
• Lightweight Object REpository Language
• An OQL for OEM
• The end-user language for TSIMMIS
TSIMMIS – LOREL
• Example:
  select library.book.title
  from library
  where library.book.author = "Aho"
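One way to read the LOREL example is as a walk down the labeled edges library → book → author/title. The sketch below is a hypothetical Python evaluator for this single query shape, not a LOREL implementation. It assumes the shared prefix library.book binds the same book object in the select and where clauses, and it treats the comparison existentially, since an OEM book may have several author subobjects. The second book in the sample data is invented.

```python
from typing import List, Tuple, Union

# Assumed encoding of OEM objects without OIDs: (label, value), where value is a
# string (atomic object) or a list of subobjects (set object).
Node = Tuple[str, Union[str, list]]


def sub(node: Node, label: str) -> List[Node]:
    """Subobjects of `node` reached by one edge with the given label."""
    value = node[1]
    return [c for c in value if c[0] == label] if isinstance(value, list) else []


def select_titles(library: Node, author: str) -> List[str]:
    """select library.book.title from library where library.book.author = author"""
    result = []
    for book in sub(library, "book"):
        # Existential comparison: some author subobject of this book matches.
        if any(a[1] == author for a in sub(book, "author")):
            result.extend(t[1] for t in sub(book, "title"))
    return result


library: Node = ("library", [
    ("book", [("author", "Aho"), ("title", "Compilers…")]),
    ("book", [("author", "Someone Else"), ("title", "Another Book")]),  # invented
])
print(select_titles(library, "Aho"))
```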
TSIMMIS – LOREL
• Partial match semantics
  select R.A
  from R, S, T
  where R.A = S.A or R.A = T.A
• In SQL this query returns nothing if either S or T is empty.
• Because of partial match semantics, it does not fail in LOREL.
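The SQL half of the claim follows from evaluating the where clause over the cross product R × S × T, which is empty as soon as one relation is empty. The sketch contrasts this with a simplified model of partial match semantics in which each disjunct only needs bindings for the relation it mentions; that model is an assumption made for illustration, not the formal LOREL definition. Relations are reduced to lists of their A values.

```python
from itertools import product
from typing import List, Set


def sql_semantics(R: List[int], S: List[int], T: List[int]) -> Set[int]:
    """SQL: filter the full cross product R x S x T."""
    return {r for r, s, t in product(R, S, T) if r == s or r == t}


def partial_match(R: List[int], S: List[int], T: List[int]) -> Set[int]:
    """Simplified model: each disjunct ranges only over the relation it mentions."""
    return {r for r in R if r in S or r in T}


R, S, T = [1, 2, 3], [1], []          # T is empty
print(sql_semantics(R, S, T))         # set()
print(partial_match(R, S, T))         # {1}
```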
TSIMMIS – MSL
• Mediator Specification Language
• Allows declarative specification of mediators
• Object-oriented, logical query language
• Targeted to OEM
TSIMMIS – MSL
[Figure: a query passes through mediators and wrappers down to the sources; the OEM example (library set, book set, author string "Aho", title string "Compilers…") is shown beside it]
• Example mediator rule:
  <booktitle X> :- <library {<book {<title X> <author "Aho">}>}>@s1
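The MSL rule can be read as a pattern match over source s1 followed by construction of a new object from the head. A hand-written Python sketch of what a mediator would compute for this one rule, assuming OEM objects encoded as (label, value) pairs and an invented second book:

```python
from typing import List, Tuple, Union

# Assumed encoding of OEM objects without OIDs: (label, value), value is a string
# or a list of subobjects.
Node = Tuple[str, Union[str, list]]


def booktitle_rule(s1: List[Node]) -> List[Node]:
    """Hand-coded evaluation of
    <booktitle X> :- <library {<book {<title X> <author "Aho">}>}>@s1
    """
    results: List[Node] = []
    for lib_label, books in s1:
        if lib_label != "library" or not isinstance(books, list):
            continue
        for book_label, fields in books:
            if book_label != "book" or not isinstance(fields, list):
                continue
            if ("author", "Aho") not in fields:    # pattern <author "Aho">
                continue
            for label, value in fields:
                if label == "title":                # pattern <title X> binds X
                    results.append(("booktitle", value))  # head <booktitle X>
    return results


s1 = [("library", [
    ("book", [("title", "Compilers…"), ("author", "Aho")]),
    ("book", [("title", "Another Book"), ("author", "Someone Else")]),  # invented
])]
print(booktitle_rule(s1))
```

Unlike the LOREL query, the rule's output is a set of newly constructed objects labeled booktitle, which is how a mediator exports its own view of the sources.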
TSIMMIS – Wrappers
• Wrappers are similar to database drivers
• Wrappers are written with MSL
[Figure: query → mediators → wrappers → sources]
TSIMMIS – Wrappers
• Wrappers have the form: MSL template // action //
• Example:
  <books X> :- <library {X:<book {<title X> <author $AU>}>}>@s1
  // sprintf(lookup-query, "find author %s", $AU) //
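The template/action split can be illustrated with a toy wrapper: if an incoming query matches the template, $AU is bound and the action fills in the native lookup-query string. The Python sketch below is hypothetical. It matches the template with a regular expression on the query text, whereas a real wrapper would match against the parsed MSL query.

```python
import re
from typing import Optional

# Hypothetical wrapper for the template
#   <books X> :- <library {X:<book {<title X> <author $AU>}>}>@s1
# with the action  sprintf(lookup-query, "find author %s", $AU).
# A regular expression stands in for real MSL template matching.
AUTHOR_CONDITION = re.compile(r'<author\s+"([^"]*)">')


def wrap(msl_query: str) -> Optional[str]:
    """Return the native lookup query, or None if the template does not match."""
    match = AUTHOR_CONDITION.search(msl_query)
    if match is None:
        return None                          # wrapper cannot answer this query
    au = match.group(1)                      # binds $AU
    lookup_query = "find author %s" % au     # the sprintf action
    return lookup_query


print(wrap('<books X> :- <library {X:<book {<title X> <author "Aho">}>}>@s1'))
```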
TSIMMIS – Summary
• End users need to specify their sources w.r.t. a mediator model (OEM in TSIMMIS)
• Query specification is standard (LOREL)
• Query rewriting is straightforward (MSL and wrappers)
• Adding a new source is not easy: it must be specified in the mediator model
Information Manifold
• Challenges for information integration:
  • Interrelated data spread over multiple information sources
  • A large number of sources
  • Only a limited amount of data in many of the sources
  • Widely varying details of interacting with each source
IM Architecture
[Figure: Information Manifold architecture, with steps 1 to 3 of the bucket algorithm marked]
World View
• Classes (hierarchy figure): Product, Automobile, Car, Motorcycle, NewCar, UsedCar, CarForSale
• Virtual relations:
  • Product(Model)
  • Automobile(Model, Year, Category)
  • Motorcycle(Model, Year)
  • Car(Model, Year, Category)
  • NewCar(Model, Year, Category)
  • UsedCar(Model, Year, Category)
  • CarForSale(Model, Year, Category, Price, SellerContact)
Source Descriptions
• For each source:
  • Content record
  • Capability record
[Table: Web sources for the automobile application]
Capability Records of Auto Sources
[Table columns: desired input set, possible output set, capable selection set]
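A capability record can be seen as three attribute sets plus a check of whether a source can serve a given request. In this hypothetical Python sketch, the sets for V1 and V5 are read off the result query on the containment-checking slide, assuming its three braces list input, output, and selection attributes in that order; the attribute name Review for V5's output r is also an assumption.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class CapabilityRecord:
    source: str
    desired_input: FrozenSet[str]    # attributes that must be bound to call the source
    possible_output: FrozenSet[str]  # attributes the source can return
    selection: FrozenSet[str]        # attributes the source can filter on itself

    def can_serve(self, bound: Set[str], wanted: Set[str], filters: Set[str]) -> bool:
        """Can this source answer a request with these bindings, outputs and filters?"""
        return (self.desired_input <= bound
                and wanted <= self.possible_output
                and filters <= self.selection)


V1 = CapabilityRecord("V1", frozenset({"Category"}),
                      frozenset({"Price", "Model", "Year"}),
                      frozenset({"Year", "Category"}))
V5 = CapabilityRecord("V5", frozenset({"Model", "Year"}),
                      frozenset({"Review"}), frozenset())

print(V1.can_serve({"Category"}, {"Model", "Price"}, {"Year"}))  # True
print(V5.can_serve({"Model"}, {"Review"}, set()))                # False: Year unbound
```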
Query Reformulation
• Contained rather than equivalent rewritings
  • Sources are incomplete
  • A useful subset of the answers is acceptable
• Uses a plan generator to:
  • Prune irrelevant sources
  • Split the query into subgoals
  • Generate conjunctive query plans
  • Find an executable ordering of the subgoals
The Bucket Algorithm
Given: user query q, source descriptions {Vi}
1. Find relevant sources (fill the buckets): for each relation g in query q
  • find each Vj that contains relation g
  • check that the constraints in Vj are compatible with q
2. Combine source relations {Vj}, one from each bucket, into a conjunctive query q' and check for containment ($q' \subseteq q$)
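The two steps can be sketched in Python for plain conjunctive queries. Each bucket collects view atoms that can cover one query subgoal (a distinguished query variable must be exported by the view head), and each combination of one entry per bucket is kept only if its expansion is contained in the query. This is a simplified sketch under stated assumptions: comparison constraints such as y ≥ 1992 are not handled, uppercase terms are variables, and the views V1, V2, V5 are hypothetical reductions of the automobile example.

```python
from itertools import count, product
from typing import Dict, List, Optional, Set, Tuple

# Assumed encoding: atom = (predicate, args), rule = (head, body);
# uppercase terms are variables. Comparison constraints are not modeled.
Atom = Tuple[str, Tuple[str, ...]]
Rule = Tuple[Atom, List[Atom]]
_fresh = count()

def is_var(t: str) -> bool:
    return t[:1].isupper()

def fresh() -> str:
    return f"Fresh{next(_fresh)}"

def bucket_entry(goal: Atom, view: Rule, vatom: Atom, dist: Set[str]) -> Optional[Atom]:
    """View atom covering `goal` via the view subgoal `vatom`, or None if incompatible."""
    vhead = view[0][1]
    phi: Dict[str, str] = {}
    for tv, tq in zip(vatom[1], goal[1]):
        if is_var(tv):
            if phi.setdefault(tv, tq) != tq:
                return None                  # inconsistent variable mapping
            if tq in dist and tv not in vhead:
                return None                  # distinguished variable not exported
        elif not is_var(tq) and tv != tq:
            return None                      # clashing constants
    return (view[0][0], tuple(phi[h] if h in phi else fresh() for h in vhead))

def fill_buckets(q: Rule, views: List[Rule]) -> List[List[Atom]]:
    dist = {t for t in q[0][1] if is_var(t)}
    buckets = []
    for goal in q[1]:
        bucket = []
        for view in views:
            for vatom in view[1]:
                if vatom[0] == goal[0] and len(vatom[1]) == len(goal[1]):
                    entry = bucket_entry(goal, view, vatom, dist)
                    if entry is not None:
                        bucket.append(entry)
        buckets.append(bucket)
    return buckets

def expand(rewriting: Rule, views: Dict[str, Rule]) -> Rule:
    """Replace each view atom by the view body (fresh names for existentials)."""
    body: List[Atom] = []
    for name, args in rewriting[1]:
        (_, vhead), vbody = views[name]
        sub = dict(zip(vhead, args))
        for pred, vargs in vbody:
            for t in vargs:
                if is_var(t) and t not in sub:
                    sub[t] = fresh()
            body.append((pred, tuple(sub.get(t, t) for t in vargs)))
    return rewriting[0], body

def contained_in(q1: Rule, q2: Rule) -> bool:
    """q1 ⊆ q2 iff a containment mapping sends q2's head and body into q1."""
    cands = [[a for a in q1[1] if a[0] == b[0] and len(a[1]) == len(b[1])] for b in q2[1]]
    for choice in product(*cands):
        m: Dict[str, str] = {}
        ok = True
        for src, dst in [(q2[0], q1[0])] + list(zip(q2[1], choice)):
            for s, d in zip(src[1], dst[1]):
                if (is_var(s) and m.setdefault(s, d) != d) or (not is_var(s) and s != d):
                    ok = False
        if ok:
            return True
    return False

def bucket_algorithm(q: Rule, views: List[Rule]) -> List[Rule]:
    by_name = {v[0][0]: v for v in views}
    return [(q[0], list(combo)) for combo in product(*fill_buckets(q, views))
            if contained_in(expand((q[0], list(combo)), by_name), q)]

# Hypothetical, simplified version of the automobile example (no y >= 1992).
q = (("q", ("M", "R")), [("CarForSale", ("C",)), ("Model", ("C", "M")),
                         ("ProductReview", ("M", "R"))])
views = [(("V1", ("C", "M")), [("CarForSale", ("C",)), ("Model", ("C", "M"))]),
         (("V2", ("M",)), [("Model", ("C", "M"))]),
         (("V5", ("M", "R")), [("ProductReview", ("M", "R"))])]
for head, body in bucket_algorithm(q, views):
    print(head, ":-", body)
```

Here V2 lands in the Model bucket, but the combination using it is rejected because its expansion loses the join between CarForSale(C) and Model(C, M); only the plan built from V1 and V5 survives. The surviving plan still repeats V1, which illustrates that the bucket step can produce redundant subgoals.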
The Bucket Algorithm: Example
q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
1. Filling the Buckets
q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Subgoals (with constraints y ≥ 1992 and t = sportscar) and their buckets:
• CarForSale(c): V1(c1), V2(c2), V3(c3)
• Category(c,t): V1(c1,t1), V2(c2,t2), V3(c3,t3)
• Year(c,y): V1(c1,y1), V2(c2,y2), V3(c3,y3)
• Model(c,m): V1(c1,m1), V2(c2,m2), V3(c3,m3)
• Price(c,p): V1(c1,p1), V2(c2,p2), V3(c3,p3)
• ProductReview(m,y,r): V5(m5,y5,r5)
2. Checking Containment
User query:
q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)
Result query:
q'(m,p,r) :- V1(c)({Category(c):sportscar}, {Price(c), Model(c), Year(c)}, {Year(c) ≥ 1992, Category(c) = sportscar}), V5(m,y,r)({m:Model(c), y:Year(c)}, {r}, {})
Expanded query:
q'(m,p,r) :- CarForSale(c), UsedCar(c), Category(c,t), t = sportscar, Model(c,m), Year(c,y), Price(c,p), ProductReview(m,y,r), y ≥ 1992
Check: is the expanded q' contained in q ($q' \subseteq q$)?
Finding an Executable Ordering
Subgoals: CarForSale(c), Category(c,t), Year(c,y), Model(c,m), Price(c,p), ProductReview(m,y,r), y ≥ 1992, t = sportscar
Source calls: V1(c), V1(c,t), V1(c,y), V1(c,m), V1(c,p), V5(m,y,r)
BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s)}
BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r)}
BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r), y ≥ 1992}
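The ordering step can be sketched as a greedy loop: repeatedly pick a call whose required variables are already bound, then add the variables it produces. Because executing a call only ever adds bindings, the greedy loop finds an executable ordering whenever one exists. In this Python sketch, modeling V1 as requiring t (bound to the constant sportscar) and producing c, m, y, p is an assumption based on its capability record.

```python
from typing import List, Optional, Set, Tuple

# A call is (name, variables that must already be bound, variables it binds).
Call = Tuple[str, Set[str], Set[str]]


def executable_order(calls: List[Call], bound: Set[str]) -> Optional[List[str]]:
    """Greedy ordering: repeatedly run any call whose inputs are all bound."""
    bound = set(bound)
    remaining = list(calls)
    order: List[str] = []
    while remaining:
        ready = next((c for c in remaining if c[1] <= bound), None)
        if ready is None:
            return None                     # every remaining call lacks an input
        order.append(ready[0])
        bound |= ready[2]
        remaining.remove(ready)
    return order


calls = [
    ("V5(m,y,r)", {"m", "y"}, {"r"}),
    ("y >= 1992", {"y"}, set()),          # a filter needs y bound but binds nothing
    ("V1(c)", {"t"}, {"c", "m", "y", "p"}),
]
print(executable_order(calls, {"t"}))     # t is bound to the constant sportscar
```

With the calls listed in a non-executable order, the loop still recovers the order on the slide: V1 first, then V5, then the y ≥ 1992 filter.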
Advantages and Disadvantages
• GAV: TSIMMIS
  • Advantage
    • Query reformulation: rule unfolding
  • Disadvantage
    • Mediation description
    • Adding, removing, and modifying source descriptions
  • Better for static, centralized systems
• LAV: Information Manifold
  • Advantage: adding new sources
    • Mediator (global predicates, source descriptions)
    • Query processing
  • Disadvantages
    • Query reformulation (bucket algorithm)
  • Better for dynamic, distributed systems