
Scaling Heterogeneous Databases and Design of DISCO

Anthony Tomasic, Louiqa Raschid, Patrick Valduriez



  1. Scaling Heterogeneous Databases and Design of DISCO
     Anthony Tomasic, Louiqa Raschid, Patrick Valduriez

  2. DISCO Architecture
     A: Application
     M: Mediator
     C: Catalog
     W: Wrapper
     D: Data Source

  3. Problems with the Architecture
     • Fragile mediator problem - The mediator schema may have to be changed when a new source is added.
     • Source capability problem - Different wrappers may have different functionality.
     • Graceless failure - The query cannot be processed in the presence of unavailable data sources.

  4. Overview
     • Mediator Query Processing
     • Describing Source Capabilities
     • Mediator Cost Model
     • Partial Evaluation of Queries

  5. Mediator Query Processing

  6. Incorporating Source Capabilities
     • Describing the operators: A wrapper exports information about which operators it can execute and on which collections.
         select [publications 1 { bind Author (=) bind KeywordTitle (=) }]
         project [publications 2 { bind combine Author () bind combine Title () }]
         scan [ ALL ]
     • Mediators can also accept a context-free grammar that describes the functionality of the wrapper.
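As a rough illustration of how a mediator might use such a description, the sketch below checks which operators of a plan a wrapper declares it can run and leaves the rest to the mediator. The table layout, the function names, and the collection name `publications` are assumptions made for this example; they are not DISCO's actual capability syntax or API.

```python
# Hypothetical capability table for one wrapper: operator -> collections it accepts.
# "ALL" mirrors the `scan [ ALL ]` entry on the slide; the rest is illustrative.
CAPABILITIES = {
    "select": {"publications"},
    "project": {"publications"},
    "scan": {"ALL"},
}


def wrapper_can_execute(capabilities, operator, collection):
    """Return True if the wrapper declares support for `operator` on `collection`."""
    allowed = capabilities.get(operator, set())
    return "ALL" in allowed or collection in allowed


def split_plan(capabilities, operators, collection):
    """Split a bottom-up operator list into (pushed to wrapper, kept at mediator).

    Operators are pushed down until the first one the wrapper does not support;
    that operator and everything above it stay at the mediator.
    """
    pushed = []
    for i, op in enumerate(operators):
        if not wrapper_can_execute(capabilities, op, collection):
            return pushed, operators[i:]
        pushed.append(op)
    return pushed, []
```

For example, `split_plan(CAPABILITIES, ["scan", "select", "join"], "publications")` pushes `scan` and `select` to the wrapper and keeps `join` at the mediator.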

  7. Mediator Cost Model
     • The mediator has a generic cost model:
         • Unary operators:
             • sequential scan and index scan
             • cost formulae derived using a calibration approach
         • Binary operators:
             • index join, nested loops, and sort-merge join
             • if an index is available, the index join is chosen; otherwise the cheaper of the other two
     • A wrapper can override the mediator model by exporting statistics and/or cost formulae.
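The join-selection rule for binary operators can be sketched as follows. The two cost functions are textbook placeholders chosen for this example (row-count products and n log n sorts); they are assumptions, not the calibrated formulae DISCO derives.

```python
import math


def nested_loops_cost(outer_rows, inner_rows):
    """Placeholder cost: every outer row is compared with every inner row."""
    return outer_rows * inner_rows


def sort_merge_cost(outer_rows, inner_rows):
    """Placeholder cost: sort both inputs, then merge them in one pass."""
    def sort_cost(n):
        return n * math.log2(n) if n > 1 else 0.0

    return sort_cost(outer_rows) + sort_cost(inner_rows) + outer_rows + inner_rows


def choose_join(outer_rows, inner_rows, index_available):
    """Pick the index join when an index exists, else the cheaper alternative."""
    if index_available:
        return "index join"
    costs = {
        "nested loops": nested_loops_cost(outer_rows, inner_rows),
        "sort-merge": sort_merge_cost(outer_rows, inner_rows),
    }
    return min(costs, key=costs.get)
```

Under these placeholder formulae, two 1000-row inputs without an index favour the sort-merge join, while tiny inputs favour nested loops.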

  8. Cost Communication
     • Exporting Statistics - A wrapper can export statistics through two special methods, attribute and extent, attached to each interface description.
     • Exporting Formulae - Wrapper-specific cost formulae can be described using rules. For example,
         select(C, A = V) <==
           CountObject = C.CountObject * selectivity(A, V)
           TotalSize = CountObject * C.ObjectSize
           TotalTime = C.TotalTime + C.TotalSize * 25
     • The mediator selects the most specific rule.
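To make the example rule concrete, here is a minimal sketch that evaluates it for given input statistics. The `Stats` class and `select_cost` function are hypothetical names introduced for this example, the selectivity is passed in as a plain number rather than computed from `A` and `V`, and the unit of the constant 25 is not stated on the slide.

```python
from dataclasses import dataclass


@dataclass
class Stats:
    """Statistics for a collection, named after the slide's rule variables."""
    count_object: float  # CountObject
    object_size: float   # ObjectSize
    total_time: float    # TotalTime

    @property
    def total_size(self):
        # TotalSize = CountObject * ObjectSize
        return self.count_object * self.object_size


def select_cost(c, selectivity):
    """Apply the select(C, A = V) rule from the slide to input statistics `c`."""
    count = c.count_object * selectivity
    return Stats(
        count_object=count,
        object_size=c.object_size,  # result TotalSize = CountObject * C.ObjectSize
        total_time=c.total_time + c.total_size * 25,
    )
```

With `c = Stats(1000, 10, 5)` and a selectivity of 0.5, the rule yields 500 objects, a total size of 5000, and a total time of 5 + 10000 * 25.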

  9. Partial Evaluation of Queries
     • If a data source is unavailable, DISCO evaluates as much of the query as possible and returns another query.
     • Example: Consider the following query, run when person2 is unavailable:
         select x.name
         from x in person0, y in person1, z in person2
         where x.name = y.name and y.name = z.name
       It returns the following result (where t0 is person0 join person1):
         select w.name
         from w in t0, z in person2
         where w.name = z.name
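The idea can be sketched for the special case in the example, where every extent is joined on `name` and only `name` is selected. The sketch below is a simplification under that assumption: it treats each extent as a set of names (so duplicates are ignored), and both function names are hypothetical rather than part of DISCO.

```python
def partial_evaluate(sources):
    """Join all available extents on name and report what is still missing.

    `sources` maps an extent name to its list of names, or to None when the
    data source is unavailable. Returns (t0, pending): t0 is the materialized
    join of the available extents, pending lists the unavailable ones.
    """
    available = [set(rows) for rows in sources.values() if rows is not None]
    pending = [name for name, rows in sources.items() if rows is None]
    t0 = set.intersection(*available) if available else set()
    return sorted(t0), pending


def residual_query(pending):
    """Build the query that remains to be run over t0 and the pending extents."""
    froms = ["w in t0"] + [f"v{i} in {name}" for i, name in enumerate(pending)]
    conds = [f"w.name = v{i}.name" for i in range(len(pending))]
    query = "select w.name from " + ", ".join(froms)
    if conds:
        query += " where " + " and ".join(conds)
    return query
```

When `person2` is unavailable, `partial_evaluate` materializes the join of `person0` and `person1`, and `residual_query(["person2"])` produces a query of the same shape as the one on the slide, with `v0` in place of `z`.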

  10. Extracting Information
     • Opaque Partial Answers: No extraction possible.
     • Transparent Partial Answers: The user can ask a “parachute” query, which is related to the original query. For example, a parachute query for the earlier example can be:
         select x.name
         from x in person0, y in person1
         where x.name = y.name
     • The parachute query is evaluated by rewriting it over the materialized relations.
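One way to picture the rewriting step is a lookup over the relations that were materialized before the failure. The sketch below assumes, as in the running example, that all extents are joined on `name` and that each materialized relation is stored as a set of names keyed by the extents it covers. The function name and the storage layout are assumptions for illustration only.

```python
def answer_parachute(materialized, needed_extents):
    """Answer a name-equijoin parachute query from materialized relations.

    `materialized` maps a frozenset of extent names to the set of names in
    their join. Returns the joined names, or None if the materialized
    relations do not cover exactly the extents the parachute query needs.
    """
    needed = frozenset(needed_extents)
    # Only relations built entirely from needed extents can be reused.
    usable = [key for key in materialized if key <= needed]
    covered = frozenset().union(*usable) if usable else frozenset()
    if covered != needed:
        return None
    result = None
    for key in usable:
        rows = set(materialized[key])
        result = rows if result is None else result & rows
    return result
```

With `t0` stored under the key `{person0, person1}`, the parachute query above is answered directly from `t0`, whereas a parachute query that mentions `person2` cannot be answered.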

  11. Constrained Evaluation of Queries
     • The optimizer tries to ensure that the parachute queries can always be evaluated (if possible at all) in case of failures. For example, if the parachute query is (A join C), then it will not be possible to evaluate it if B fails.

  12. Partial Evaluation of Queries
     • Open Issues:
         • Semantics with updates to data sources
         • Tradeoffs between materializing partial answers and resubmitting the original queries
         • Aggregate queries?
         • APPROXIMATE?

  13. The Good
     • It can handle wrappers with different capabilities.
     • The mediator uses a generic cost model, which can be overridden by the wrapper.
     • Partial evaluation of queries and extraction of information from partial answers.

  14. The Bad
     • Queries involving different wrappers have to be done at the mediator.
     • Only a relational subset of the model is implemented.
     • Data replication is not addressed.

  15. The Ugly
     • Arbitrary source capabilities cannot be easily handled.
     • Proliferation of wrapper-specific cost rules can make query optimization very expensive.
     • Centralized query optimization - wrappers don’t have much control over it.
     • Autonomous data sources?

  16. Mediator Query Processing
     • Reformulate the query into local schemas.
     • Transform the query into logical operator trees.
     • Decompose each query into wrapper sub-queries and a composition query.
     • Modify the wrapper sub-queries and the composition query to reflect the capabilities of the wrappers.
     • Generate distributed execution plans.
     • Select the minimum cost plan.
     • Send the wrapper sub-trees to the wrappers and execute the composition query on the results.

  17. Mediator Data Model
     • Extensions to ODMG 2.0
     • Multiple extents per interface using MetaExtents
         interface MetaExtent {
           attribute String name;
           attribute Extent e;
           attribute Type interface;
           attribute Wrapper wrapper;
           attribute Map map;
         }
     • Type mapping

  18. Accessing Data Sources
     • Define a wrapper object.
         wrapper w0 rmi://rodin.inria.fr/PersonWrapper
     • Define a wrapper schema.
         extent p0 of Person;
         interface Person {
           attribute String name;
           attribute Short salary;
         }
       This is exported to the mediator.
     • Define the mediator schema.

  19. Accessing Data Sources
     • Define the mediator extents.
         extent person0 of Person wrapper w0 extent p0;
         extent person1 of Person wrapper w1 extent s1 map (name = sname);
     • Subtyping and views can be used to define more complex transformations on the data sources.
         define double as
         select struct (name: x.name, salary: x.salary+y.salary)
         from x in person0 and y in person1
         where x.name = y.name
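The extent definitions and the `double` view can be mimicked in a few lines of Python. This is only a sketch: the `MetaExtent` dataclass loosely mirrors the interface from the data model slide (with `map` renamed to `attr_map` to avoid shadowing the Python builtin), while `to_mediator` and the row format (plain dictionaries) are assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class MetaExtent:
    name: str       # mediator extent name, e.g. "person1"
    extent: str     # extent name at the data source, e.g. "s1"
    interface: str  # mediator interface, e.g. "Person"
    wrapper: str    # wrapper identifier, e.g. "w1"
    attr_map: dict = field(default_factory=dict)  # mediator attr -> source attr

    def to_mediator(self, source_row):
        """Rename source attributes to their mediator names; others pass through."""
        reverse = {src: med for med, src in self.attr_map.items()}
        return {reverse.get(key, key): value for key, value in source_row.items()}


def double(person0, person1):
    """Python analogue of the `double` view: join on name, sum the salaries."""
    return [
        {"name": x["name"], "salary": x["salary"] + y["salary"]}
        for x in person0
        for y in person1
        if x["name"] == y["name"]
    ]
```

Here `MetaExtent("person1", "s1", "Person", "w1", {"name": "sname"})` corresponds to the second extent definition, and its `to_mediator` method turns a source row with an `sname` field into a row with a `name` field before the view is applied.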

  20. Mediator Query Processing
