Global-as-View and Local-as-View for Information Integration

Global-as-View and Local-as-View for Information Integration. CS652 Spring 2004 Presenter: Yihong Ding. Common Integration Architecture. Information Integration Systems Global-as-view (Gav.) vs. Local-as-view (Lav.) Query Reformulation Specification of Source Description


Presentation Transcript


  1. Global-as-View and Local-as-View for Information Integration CS652 Spring 2004 Presenter: Yihong Ding

  2. Common Integration Architecture • Information Integration Systems • Global-as-view (Gav.) vs. Local-as-view (Lav.) • Query Reformulation • Specification of Source Description • Adding new sources

  3. Query Reformulation • Problem: rewrite a user query expressed in the mediated schema into a query expressed in the source schemas. Given a query Q in terms of the mediator schema relations, and descriptions of the information sources, find a query Q’ that uses only the source relations, such that • Q’ ⊆ Q, and • Q’ provides all possible answers to Q given the sources

  4. Solving Queries by Views Mediator Relations Source Relations

  5. Query Rewriting Using Views • Query Containment: q’ ⊆ q ⇔ ∀D: q’(D) ⊆ q(D) • Query Equivalence: q’ = q ⇔ q’ ⊆ q ∧ q ⊆ q’ • Given a query q and view definitions V = {v1, …, vn}: • q’ is an Equivalent Rewriting of q using V if • q’ refers only to views in V, and • q’ = q • q’ is a Maximally-Contained Rewriting of q using V if • q’ refers only to views in V, and • q’ ⊆ q, and • There is no rewriting q1 such that q’ ⊆ q1 ⊆ q and q1 ≠ q’

  6. Computation Complexity

  7. Complexity of Query Containment • Conjunctive Queries (CQ) (NP-Complete) • Q1: p(X,Z) :- a(X,Y) & a(Y,Z) • Q2: p(X,Z) :- a(X,Y) & a(V,Z) • CQ’s With Negation (Π₂ᵖ-Complete) • Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & NOT a(X,Z) • CQ’s With Arithmetic Comparison (Π₂ᵖ-Complete) • Q1: p(X,Z) :- a(X,Y) & a(Y,Z) & X<Y • Datalog Programs • p(A,C) :- a(A,B) & b(B,C)
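The plain conjunctive-query case on this slide can be checked by brute force. The sketch below is an illustration only, not code from the presentation: it tests Q1 ⊆ Q2 by searching for a containment mapping from Q2 to Q1. It assumes every argument is a variable, and it does not handle negation, comparisons, or Datalog recursion. The tuple encoding of queries and the name `contained_in` are choices made for this example.

```python
from itertools import product

# A conjunctive query is (head, body); an atom is (predicate, argument tuple).
# Assumption: every argument is a variable (no constants, negation, or comparisons).
Q1 = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("Y", "Z"))])
Q2 = (("p", ("X", "Z")), [("a", ("X", "Y")), ("a", ("V", "Z"))])


def contained_in(q1, q2):
    """Return True if q1 ⊆ q2, by searching for a containment mapping from q2 to q1."""
    head1, body1 = q1
    head2, body2 = q2
    vars2 = sorted({v for _, args in [head2] + body2 for v in args})
    terms1 = sorted({v for _, args in [head1] + body1 for v in args})
    atoms1 = set(body1)
    # Try every assignment of q2's variables to q1's variables (exponential search).
    for image in product(terms1, repeat=len(vars2)):
        h = dict(zip(vars2, image))

        def apply(atom):
            pred, args = atom
            return (pred, tuple(h[v] for v in args))

        # The mapping must send head to head and every body atom of q2 into q1's body.
        if apply(head2) == head1 and all(apply(a) in atoms1 for a in body2):
            return True
    return False


print(contained_in(Q1, Q2))  # True: map V to Y, keep X, Y, Z
print(contained_in(Q2, Q1))  # False: a(Y,Z) has no image in Q2's body
```

The search tries every assignment of variables, which is consistent with the NP-completeness noted on the slide: a mapping is easy to verify but there are exponentially many candidates.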

  8. Specification of Source Description • Views: resources used by the integrator to help answer queries • Gav: each mediator relation is defined as a view over source relations • Lav: each source relation is defined as a view over mediator relations

  9. Information Integration Systems • Tsimmis • Stanford and IBM • Global-as-View (Gav) • Mediator relations defined as views of source relations • Information Manifold (IM) • AT&T • Local-as-View (Lav) • Description logic • Source relations defined as views of mediator relations (a collection of global predicates)

  10. TSIMMIS – Gav Solution • The Stanford-IBM Manager of Multiple Information Sources (TSIMMIS) • Offers: • A flexible data model • A common query language • Other supporting tools

  11. TSIMMIS – Components • OEM (Object-Exchange Model) • LOREL (Lightweight Object REpository Language) • MSL (Mediator Specification Language) • Wrappers

  12. TSIMMIS – OEM • Object Exchange Model • The data model for TSIMMIS • “self-describing” (labels carry all of the information that there is about an object) • Flexible • First order logic

  13. TSIMMIS – OEM • Each OEM object has four fields: • OID: object identifier • label: human understandable • type: “set” or “string” • value: a set or a string

  14. TSIMMIS – OEM • Example: library (set) contains book (set), which contains author (string, “Aho”) and title (string, “Compilers…”)

  15. TSIMMIS – OEM • First order predicate logic • Example object: OID 123, label author, type string, value “Aho” • author( T, “Aho” ) would return the object IDs of all objects with the label “author” and the value “Aho”.
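The OEM object structure and the author( T, “Aho” ) lookup can be mimicked in a few lines. This is a minimal sketch, not the TSIMMIS implementation. The class name `OEMObject`, the helpers `walk` and `match`, and every OID other than 123 are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class OEMObject:
    oid: int                                # object identifier
    label: str                              # human-understandable label
    type: str                               # "set" or "string"
    value: Union[str, List["OEMObject"]]    # a string, or a list of child objects


def walk(obj):
    """Yield obj and every object nested inside it."""
    yield obj
    if obj.type == "set":
        for child in obj.value:
            yield from walk(child)


def match(root, label, value):
    """Return the OIDs of all string objects under root with this label and value."""
    return [o.oid for o in walk(root)
            if o.type == "string" and o.label == label and o.value == value]


# The library example from the slides; OIDs 121, 122, 124 are hypothetical.
author = OEMObject(123, "author", "string", "Aho")
title = OEMObject(124, "title", "string", "Compilers…")
book = OEMObject(122, "book", "set", [author, title])
library = OEMObject(121, "library", "set", [book])

print(match(library, "author", "Aho"))  # [123]
```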

  16. TSIMMIS – LOREL • Lightweight Object REpository Language • An OQL for OEM • The end-user language for TSIMMIS

  17. TSIMMIS – LOREL • Example select library.book.title from library where library.book.author = “Aho”

  18. TSIMMIS – LOREL • Partial Match Semantics select R.A from R, S, T where R.A = S.A or R.A = T.A • This would fail to return anything in SQL if either S or T were empty. • Because of partial match semantics this does not fail in LOREL
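The difference can be simulated with ordinary Python lists. This sketch shows one reading of the slide, not LOREL's actual evaluator: the SQL-style version filters the cross product of R, S, and T, while the partial-match version tests each disjunct against only the collection it mentions. The data and function names are made up for the example.

```python
from itertools import product

# Hypothetical data: T is empty.
R = [{"A": 1}, {"A": 2}]
S = [{"A": 1}]
T = []


def sql_style(R, S, T):
    """Filter the cross product R x S x T, as SQL's FROM clause does."""
    return [r["A"] for r, s, t in product(R, S, T)
            if r["A"] == s["A"] or r["A"] == t["A"]]


def partial_match(R, S, T):
    """Keep r if either disjunct can be satisfied on its own."""
    return [r["A"] for r in R
            if any(r["A"] == s["A"] for s in S)
            or any(r["A"] == t["A"] for t in T)]


print(sql_style(R, S, T))      # []  (the cross product with an empty T is empty)
print(partial_match(R, S, T))  # [1]
```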

  19. TSIMMIS – MSL • Mediator Specification Language • Allows declarative specification of mediators • Object oriented, logical query language • Targeted to OEM

  20. TSIMMIS – MSL • Architecture (figure): Query, Mediators, Wrappers, Sources • Source object: library (set) contains book (set), which contains author (string, “Aho”) and title (string, “Compilers…”) • Example rule: <booktitle X> :- <library { <book { <title X> <author “Aho”> } > } > @s1

  21. TSIMMIS – Wrappers • Wrappers are similar to database drivers • Wrappers are written with MSL • Architecture (figure): Query, Mediators, Wrappers, Sources

  22. TSIMMIS – Wrappers • Wrappers have the form: MSL template // action // • Example: <books X> :- <library { X:<book {<title X> <author $AU>}> }>@s1 // sprintf(lookup-query, “find author %s”, $AU) //

  23. TSIMMIS – Summary • End users need to specify their sources w.r.t. a mediator model – OEM in TSIMMIS • Query specification is standard – LOREL • Query rewriting is straightforward – MSL and wrappers • To add a new source is not easy – need to specify it in the mediator model

  24. Information Manifold • Challenges for Information Integration • Interrelated data spread over multiple information sources • Large number of sources • Limited amount of data in many of the sources • Widely varying details of interacting with each source

  25. IM Architecture (figure; labels: Bucket algorithm, 1, 2, 3)

  26. World View • Classes (hierarchy figure): Product, Automobile, Car, Motorcycle, NewCar, UsedCar, CarForSale • Virtual Relations: • Product(Model) • Automobile(Model, Year, Category) • Motorcycle(Model, Year) • Car(Model, Year, Category) • NewCar(Model, Year, Category) • UsedCar(Model, Year, Category) • CarForSale(Model, Year, Category, Price, SellerContact)

  27. Source Descriptions • For each source: • Content Record • Capability Record • (Table: Web Sources for Automobile Application)

  28. Content Records of Auto Sources

  29. Capability Records of Auto Sources • Each record has three parts: desired input set, possible output set, capable selection set

  30. Query Reformulation • Containment instead of equivalence • Incomplete sources • Useful subset • Uses the Plan Generator to: • Prune irrelevant sources • Split the query into subgoals • Generate conjunctive query plans • Find an executable ordering of subgoals

  31. The Bucket Algorithm • Given: user query q, source descriptions {Vi} • Find relevant sources (fill buckets): for each relation g in query q • Find Vj that contains relation g • Check that the constraints in Vj are compatible with q • Combine source relations {Vj}, one from each bucket, into a conjunctive query q’ and check for containment (q’ ⊆ q)
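The bucket-filling and combination steps can be sketched as follows. This is a simplified illustration: each source is described only by the set of mediator relations it covers, and both the constraint-compatibility check and the final containment check are left out. The assignment of relations to V1, V2, V3, and V5 follows the bucket figure in the example slides. The function names are assumptions.

```python
from itertools import product

# Relations (subgoals) mentioned by the user query.
query_subgoals = ["CarForSale", "Category", "Year", "Model", "Price", "ProductReview"]

# Simplified source descriptions: source name -> mediator relations it covers.
car_relations = {"CarForSale", "Category", "Year", "Model", "Price"}
sources = {
    "V1": set(car_relations),
    "V2": set(car_relations),
    "V3": set(car_relations),
    "V5": {"ProductReview"},
}


def fill_buckets(subgoals, sources):
    """For each subgoal g, collect the sources whose description contains g."""
    return {g: [v for v, rels in sorted(sources.items()) if g in rels]
            for g in subgoals}


def candidate_plans(buckets):
    """Pick one source from each bucket; each combination is a candidate plan."""
    goals = list(buckets)
    for combo in product(*(buckets[g] for g in goals)):
        yield dict(zip(goals, combo))


buckets = fill_buckets(query_subgoals, sources)
plans = list(candidate_plans(buckets))
print(buckets["CarForSale"])     # ['V1', 'V2', 'V3']
print(buckets["ProductReview"])  # ['V5']
print(len(plans))                # 243 candidates (3**5 * 1) before any containment check
```

In the full algorithm, each candidate would still be tested for containment in q, and only the candidates that pass are kept.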

  32. The Bucket Algorithm: Example • q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r)

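  33. 1. Filling the Buckets • Query: q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r) • Subgoals: CarForSale(c), Category(c,t), Year(c,y), Model(c,m), Price(c,p), ProductReview(m,y,r), with y ≥ 1992 and t = sportscar • Bucket for CarForSale(c): V1(c1), V2(c2), V3(c3) • Bucket for Category(c,t): V1(c1,t1), V2(c2,t2), V3(c3,t3) • Bucket for Year(c,y): V1(c1,y1), V2(c2,y2), V3(c3,y3) • Bucket for Model(c,m): V1(c1,m1), V2(c2,m2), V3(c3,m3) • Bucket for Price(c,p): V1(c1,p1), V2(c2,p2), V3(c3,p3) • Bucket for ProductReview(m,y,r): V5(m5,y5,r5)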

  34. 2. Checking Containment • User Query: q(m,p,r) :- CarForSale(c), Category(c,sportscar), Year(c,y), y ≥ 1992, Model(c,m), Price(c,p), ProductReview(m,y,r) • Result Query: q’(m,p,r) :- V1(c)({Category(c):sportscar}, {Price(c), Model(c), Year(c)}, {Year(c) ≥ 1992, Category(c)=sportscar}), V5(m,y,r)({m:Model(c), y:Year(c)}, {r}, {}) • Expanded Query: q’(m,p,r) :- CarForSale(c), UsedCar(c), Category(c,t), t=sportscar, Model(c,m), Year(c,y), Price(c,p), ProductReview(m,y,r), y ≥ 1992 • Check: is the expanded query contained in the user query (q’ ⊆ q)?

  35. Finding an Executable Ordering • Subgoals: CarForSale(c), Category(c,t), Year(c,y), Model(c,m), Price(c,p), ProductReview(m,y,r), with y ≥ 1992 and t = sportscar • Plan: V1(c), V1(c,t), V1(c,y), V1(c,m), V1(c,p), V5(m,y,r) • Available bindings, growing as the plan proceeds: • BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s)} • BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r)} • BindAvail1 = {CarForSale(c,sportscar), Model(c,m), Year(c,y), Price(c,p), SellerContact(c,s), ProductReview(m,y,r), y ≥ 1992}
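A greedy way to find such an ordering is to call, at each step, any source whose required inputs are already bound. The sketch below is an assumption-laden illustration, not the Information Manifold planner. It keeps only the input and output sets of each capability record, taking V1 to need Category and V5 to need Model and Year as in the containment slide, and uses plain attribute names as bindings. The output name "Review" and the function name are invented for the example.

```python
def executable_order(sources, initially_bound):
    """Greedily order sources so each one's inputs are bound before it is called.

    sources maps a source name to (required_inputs, produced_outputs).
    Returns a list of source names, or None if no executable ordering exists.
    """
    bound = set(initially_bound)
    remaining = dict(sources)
    order = []
    while remaining:
        ready = [name for name, (inputs, _) in remaining.items() if inputs <= bound]
        if not ready:
            return None  # some source can never get its inputs
        name = ready[0]
        _, outputs = remaining.pop(name)
        bound |= outputs  # outputs become available to later sources
        order.append(name)
    return order


# Simplified capability records (inputs, outputs); attribute names are illustrative.
sources = {
    "V5": ({"Model", "Year"}, {"Review"}),
    "V1": ({"Category"}, {"Price", "Model", "Year"}),
}

# The query supplies Category = sportscar, so Category is bound at the start.
print(executable_order(sources, {"Category"}))  # ['V1', 'V5']
print(executable_order(sources, set()))         # None
```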

  36. Advantages and Disadvantages • Gav: Tsimmis • Advantage • Query reformulation: rule unfolding • Disadvantage • Mediation description • Adding, removing, and modifying source descriptions • Better for static, centralized systems • Lav: Information Manifold • Advantage: adding new sources • Mediator (global predicates, source descriptions) • Query processing • Disadvantage • Query reformulation (Bucket algorithm) • Better for dynamic, distributed systems
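Rule unfolding, the Gav advantage named above, amounts to replacing each mediator atom in a query with the body of its view definition. The sketch below is a minimal illustration under stated assumptions: the mediator relation `Book`, the source relations `s1_books`, `s2_titles`, and `s2_authors`, and the helper names are all hypothetical, and constants are treated like any other argument.

```python
from itertools import count, product

# Gav definitions: mediator predicate -> list of (head variables, body atoms).
# All relation names here are hypothetical.
gav_rules = {
    "Book": [
        (("T", "A"), [("s1_books", ("T", "A"))]),
        (("T", "A"), [("s2_titles", ("I", "T")), ("s2_authors", ("I", "A"))]),
    ],
}

_fresh = count()  # used to rename body-only variables apart


def unfold_atom(atom):
    """Return the alternative source-level bodies that can replace one atom."""
    pred, args = atom
    if pred not in gav_rules:
        return [[atom]]  # already a source relation
    expansions = []
    for head_vars, body in gav_rules[pred]:
        n = next(_fresh)
        subst = dict(zip(head_vars, args))
        # Head variables take the query's arguments; other variables get fresh names.
        expansions.append(
            [(p, tuple(subst.get(v, f"{v}_{n}") for v in vs)) for p, vs in body]
        )
    return expansions


def unfold(query_body):
    """Unfold every atom; the result is a union of conjunctive source queries."""
    return [[atom for part in choice for atom in part]
            for choice in product(*(unfold_atom(a) for a in query_body))]


rewritings = unfold([("Book", ("X", "Aho"))])
for body in rewritings:
    print(body)
```

Adding a source under Gav means editing `gav_rules` for every mediator relation the source contributes to. This is the maintenance cost the slide lists as the disadvantage. Under Lav, only the new source's own description is added, but reformulation needs an algorithm such as the bucket algorithm.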
