Scaling Heterogeneous Information Access for Wide area Environments

Presentation Transcript


  1. Scaling Heterogeneous Information Access for Wide area Environments. Michael Franklin and Louiqa Raschid, University of Maryland

  2. Wide-Area Data Access
     Problems
     • Scalability of Wrapper-Mediator Systems
     • Publishing and Discovery of Sources
     • Dissemination of Relevant Information
     Relevant Technologies
     • Flexible Architectures
     • Adaptive Systems
     • Metadata Management

  3. The Big Picture

  4. The Little Picture (architecture diagram; labeled components: Web sources, Predator O-R DBMS, Planner, MDT, Scrambler, Remote wrapper interface, Wrapper interface)

  5. Querying Web Sources
     • Generating wrappers for Web-accessible sources to provide an API for queries and structured answers.
     • Obtaining and representing source capability and content descriptions to use in query planning.
     • Estimating the response time for cost-based optimization.

  6. Web application wrapper toolkit
     • Define the capabilities of Web sources
     • A wrapper interface to publish source capability
     • A wrapper toolkit
     • Translation from query + bindings → URL
     • Declarative language to specify extractors
       Simple extractors: HTML or XMLData → structured object
       Complex extractors: customizable crawler utility for extraction of meta-information
     • Generator for JDBC-compliant wrappers
     • Metadata and query and answer interface
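The two wrapper steps named on this slide, translating a query plus bindings into a URL and extracting structured objects from the returned page, can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration: the `WEATHER_SOURCE` description, the placeholder URL, the attribute names, and the regular-expression "extractor" are hypothetical and are not the toolkit's actual declarative language.

```python
import re
from urllib.parse import urlencode

# Hypothetical source description (assumption): a base URL plus the
# attributes that must be bound before the source can be called.
WEATHER_SOURCE = {
    "base_url": "http://weather.example.org/forecast",  # placeholder URL
    "required": ("city", "date"),
}


def build_url(source, bindings):
    """Translate a query's attribute bindings into a request URL."""
    missing = [a for a in source["required"] if a not in bindings]
    if missing:
        raise ValueError(f"unbound attributes: {missing}")
    # Sort the parameters so the generated URL is deterministic.
    return source["base_url"] + "?" + urlencode(sorted(bindings.items()))


# A stand-in for a "simple extractor": one pattern that maps each HTML
# table row to a structured record. The row layout is assumed.
ROW_PATTERN = re.compile(
    r"<tr><td>(?P<time>[^<]+)</td><td>(?P<temp>-?\d+)</td></tr>"
)


def extract(html):
    """Return one dict per matching row of the page."""
    return [
        {"time": m["time"], "temp": int(m["temp"])}
        for m in ROW_PATTERN.finditer(html)
    ]
```

A caller would first check that the required attributes are bound, fetch the generated URL, and pass the page text to `extract`; the fetch is omitted so the sketch stays self-contained.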

  7. Weather source

  8. Results from the Weather source

  10. Query Planning for Web sources
     Objective: Generate safe optimal plans with possibly replicated sources
     • Multiple heterogeneous sources
     • Limited capability (bindings)
     • Possible replication of contents
     • Complete / Incomplete sources
     • Use meta-information to construct lattices
     • Generate safe plans with alternatives
     • Mediator algebra and rules for optimization
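To make "safe plans" under limited binding capability concrete, here is a minimal sketch of one common reading: a plan is safe if every source is called only after all of its required input attributes are bound, either by the query or by the output of an earlier source. This is a generic greedy ordering, not the lattice-based planner the slides refer to, and the source names and attributes are hypothetical.

```python
def safe_order(sources, bound):
    """Order source calls so each source's inputs are bound before it runs.

    sources: dict mapping a source name to (inputs, outputs), each a set
             of attribute names (hypothetical descriptions).
    bound:   attributes bound by the query itself.
    Returns a list of source names, or None if no safe order exists.
    """
    bound = set(bound)
    remaining = dict(sources)
    order = []
    while remaining:
        # Sources whose required inputs are all bound can be called now.
        ready = sorted(n for n, (ins, _) in remaining.items() if ins <= bound)
        if not ready:
            return None  # some source can never have its inputs bound
        name = ready[0]
        order.append(name)
        # The outputs of the chosen source become bound for later sources.
        bound |= remaining.pop(name)[1]
    return order
```

Because the set of bound attributes only grows, the greedy choice never rules out a safe order that would otherwise exist. Choosing among several safe orders, or among replicated sources, is where a cost model would come in.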

  12. Content and Capability Descriptions
     • Domain information
     • Capability descriptions:
     • I/O relationships: Time, Date → Channel, Title, Category
     • Content: Date: CurrentYear; Time: {0, …, 23}; Channel: CNW
     • Completeness information: Complete. Source S3 provides a complete answer when Time and Date are bound and Channel = ppv and Category = Movies.
     • Explicitly provided by the source DBA.
     • Augmented by inference.
     • Augmented by learning based on query feedback
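A capability and completeness description like the one on this slide could be represented as plain data and checked against a query's bindings. The sketch below is an assumption about one possible encoding, not the system's actual format; the class name, its fields, and the pairing of the I/O relationship with source S3 are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDescription:
    name: str
    inputs: frozenset    # attributes that must be bound to call the source
    outputs: frozenset   # attributes the source returns
    # Attribute -> value conditions under which answers are complete.
    complete_when: dict = field(default_factory=dict)

    def can_answer(self, bindings):
        """True if every required input attribute is bound."""
        return self.inputs <= set(bindings)

    def is_complete_for(self, bindings):
        """True if the source is callable and its completeness conditions hold."""
        return self.can_answer(bindings) and all(
            bindings.get(attr) == value
            for attr, value in self.complete_when.items()
        )


# Encoding of the slide's completeness statement for S3 (assumed to share
# the I/O relationship listed on the slide).
S3 = SourceDescription(
    name="S3",
    inputs=frozenset({"Time", "Date"}),
    outputs=frozenset({"Channel", "Title", "Category"}),
    complete_when={"Channel": "ppv", "Category": "Movies"},
)
```

A planner could use `can_answer` to filter out sources that cannot be called and `is_complete_for` to decide whether one source suffices or alternatives must also be consulted.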

  13. Sources in Lattices

  14. Display pay-per-view movies shown on August 14th, 1998 at 9:30am. Using Buckets (S1|S3) in AlternatePartition and (S5 S1) and (S5 S3) in SimilarPartition

  15. Web Source Response Time Estimation Tool - MDT
     Problem: Difficulty in determining evaluation costs
     • Physical implementation details unknown
     • Load on network and source unknown
     Objective: Tool to estimate response time based on query feedback and estimate confidence. To be used in a combined cost-model and to choose between alternate sources.
     • MDT is a tool that estimates response time based on Day, Time, Quantity, etc.

  16. Configuring and learning in the MDT
     • MDT is configured for some hierarchy of dimensions
     • Calibration of each dimension
     • min / max / scale
     • Allowed deviation
     • Confidence window
     • Learning algorithm
     • Cell splitting algorithm
     • Value correction algorithm
     • Estimate response time and confidence
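The idea of a dimension-calibrated table that learns response times from query feedback can be sketched in a few lines. This is a toy stand-in under stated assumptions, not the MDT's algorithm: it uses a fixed grid instead of cell splitting, a plain mean over recent observations instead of the value correction algorithm, and a hypothetical confidence measure (the fraction of the confidence window filled by observations within the allowed deviation of the mean). The class name, parameters, and dimension names are all invented for illustration.

```python
class ResponseTimeTable:
    """Toy multi-dimensional table of response-time estimates."""

    def __init__(self, dims, allowed_deviation=0.25, window=5):
        # dims: name -> (min, max, scale); scale is the width of one cell.
        self.dims = dims
        self.allowed_deviation = allowed_deviation  # relative to the mean
        self.window = window                        # observations kept per cell
        self.cells = {}                             # cell key -> recent times

    def _key(self, point):
        """Map a point to its grid cell, clamping values to [min, max]."""
        key = []
        for name, (lo, hi, scale) in self.dims.items():
            value = min(max(point[name], lo), hi)
            key.append(int((value - lo) // scale))
        return tuple(key)

    def learn(self, point, seconds):
        """Record an observed response time as query feedback."""
        obs = self.cells.setdefault(self._key(point), [])
        obs.append(seconds)
        del obs[:-self.window]  # keep only the most recent observations

    def estimate(self, point):
        """Return (estimated seconds, confidence in [0, 1])."""
        obs = self.cells.get(self._key(point))
        if not obs:
            return None, 0.0
        mean = sum(obs) / len(obs)
        limit = self.allowed_deviation * mean
        within = sum(abs(o - mean) <= limit for o in obs)
        return mean, within / self.window
```

For example, with dimensions `{"hour": (0, 23, 6), "quantity": (0, 1000, 100)}`, two observations recorded for morning queries returning few rows would yield an estimate for any other query falling in the same cell, with low confidence until the window fills.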

  17. Correcting the confidence of estimated value

  23. Conclusions
     • Extend the Predator O-R DBMS with scalable mediator functionality
     • Current implementation status
     • Scrambling-enabled optimizer
     • Mediator algebra and logical optimizer
     • Cost-based optimizer based on MDT estimation
     • Toolkit for generating wrappers for Web sources

  24. Still to come …
     • Publishing source metadata
     • Discovering sources
     • Source selection using metadata
     • User profiles
     • Dissemination of relevant data
