Scaling Heterogeneous Information Access for Wide area Environments
Michael Franklin and Louiqa Raschid
University of Maryland
Wide-Area Data Access
Problems
• Scalability of Wrapper-Mediator Systems
• Publishing and Discovery of Sources
• Dissemination of Relevant Information
Relevant Technologies
• Flexible Architectures
• Adaptive Systems
• Metadata Management
The Big Picture
the little picture
[Architecture diagram. Component labels: Web sources, Predator O-R DBMS, Planner, MDT, Scrambler, Remote wrapper interface, Wrapper interface]
Querying Web Sources
• Generating wrappers for Web-accessible sources to provide an API for queries and structured answers.
• Obtaining and representing source capability and content descriptions to use in query planning.
• Estimating the response time for cost-based optimization.
Web Application Wrapper Toolkit
• Define the capabilities of Web sources
• A wrapper interface to publish source capability
• A wrapper toolkit
  • Translation from query + bindings → URL
  • Declarative language to specify extractors
    • Simple extractors: HTML or XMLData → structured object
    • Complex extractors: customizable crawler utility for extraction of meta-information
  • Generator for JDBC-compliant wrappers
  • Metadata and query and answer interface
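The two translation steps on this slide (query + bindings → URL, then page → structured object) can be illustrated with a minimal Python sketch. It is not the toolkit's declarative language: the source description, the URL, the attribute names (`city`, `date`, `day`, `high`, `low`), and the HTML layout are all hypothetical and chosen only for illustration.

```python
import re
from urllib.parse import urlencode

# Hypothetical source description: a base URL plus the attributes that
# must be bound before the source can be called. Not a real weather site.
WEATHER_SOURCE = {
    "base_url": "http://weather.example.org/forecast",
    "required_bindings": ("city", "date"),
}


def query_to_url(source, bindings):
    """Translate a query's bindings into a request URL for the source."""
    missing = [a for a in source["required_bindings"] if a not in bindings]
    if missing:
        raise ValueError(f"unbound attributes: {missing}")
    params = {a: bindings[a] for a in source["required_bindings"]}
    return source["base_url"] + "?" + urlencode(params)


# A "simple extractor": one regular expression per table row of the
# (assumed) answer page, yielding one structured object per match.
ROW_PATTERN = re.compile(
    r"<tr><td>(?P<day>[^<]+)</td><td>(?P<high>-?\d+)</td><td>(?P<low>-?\d+)</td></tr>"
)


def extract_rows(html):
    """Turn the rows of an HTML answer page into a list of dicts."""
    return [
        {"day": m.group("day"), "high": int(m.group("high")), "low": int(m.group("low"))}
        for m in ROW_PATTERN.finditer(html)
    ]
```

A generated wrapper would call `query_to_url`, fetch the page, and pass the response body to `extract_rows`; the fetch is left out so the sketch stays self-contained.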
Weather source
Results from the Weather source
Query Planning for Web Sources
Objective: Generate safe, optimal plans with possibly replicated sources
• Multiple heterogeneous sources
• Limited capability (bindings)
• Possible replication of contents
• Complete / incomplete sources
• Use meta-information to construct lattices
• Generate safe plans with alternatives
• Mediator algebra and rules for optimization
Content and Capability Descriptions
• Domain information
• Capability descriptions:
  • I/O relationships: Time, Date → Channel, Title, Category
  • Content: Date: CurrentYear; Time: {0, …, 23}; Channel: CNW
  • Completeness information: Complete. Source S3 provides a complete answer when Time and Date are bound and Channel=ppv and Category=Movies.
• Explicitly provided by the source DBA.
• Augmented by inference.
• Augmented by learning based on query feedback.
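A capability description of this kind can be checked mechanically during planning. The sketch below is a hedged illustration, not the mediator's actual representation: it assumes the I/O relationship reads as "Time and Date are inputs; Channel, Title, and Category are outputs", and the class and function names (`SourceDescription`, `can_answer`, `is_complete`) are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class SourceDescription:
    """Hypothetical record of one source's capability and completeness."""
    name: str
    inputs: frozenset    # attributes that must be bound to call the source
    outputs: frozenset   # attributes the source returns
    complete_when: dict = field(default_factory=dict)  # attribute -> required value


def can_answer(src, bound, wanted):
    """Safe to call: all inputs are bound and every wanted attribute is available."""
    return src.inputs <= set(bound) and set(wanted) <= (src.inputs | src.outputs)


def is_complete(src, bound):
    """The source promises a complete answer under these bindings."""
    if not src.inputs <= set(bound):
        return False
    return all(bound.get(attr) == value for attr, value in src.complete_when.items())


# Source S3 from the slide, under the assumed reading of its I/O relationship.
S3 = SourceDescription(
    name="S3",
    inputs=frozenset({"Time", "Date"}),
    outputs=frozenset({"Channel", "Title", "Category"}),
    complete_when={"Channel": "ppv", "Category": "Movies"},
)
```

For a query that binds Time, Date, Channel=ppv, and Category=Movies and asks for Title, both checks succeed; dropping the Time binding makes the call unsafe, which is the case a planner has to rule out.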
Sources in Lattices
Display pay-per-view movies shown on August 14th, 1998 at 9:30 am.
Using Buckets (S1|S3) in AlternatePartition and (S5 S1) and (S5 S3) in SimilarPartition
Web Source Response Time Estimation Tool - MDT
Problem: Difficulty in determining evaluation costs
• Physical implementation details unknown
• Load on network and source unknown
Objective: A tool that uses query feedback to estimate response time and the confidence of each estimate. The estimates feed a combined cost model and are used to choose between alternate sources.
• MDT estimates response time based on Day, Time, Quantity, etc.
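How an estimate and its confidence might be used to choose between alternate sources can be sketched as follows. The policy here is an assumption made for illustration, not the combined cost model from the talk: a source whose estimate is missing or below `min_confidence` is charged a pessimistic `default_time`, and the cheapest source wins. Both parameter names and their default values are hypothetical.

```python
def choose_source(estimates, min_confidence=0.5, default_time=10.0):
    """Pick the alternate source with the lowest usable response-time estimate.

    `estimates` maps a source name to (estimated seconds or None, confidence in [0, 1]).
    """
    def cost(item):
        _name, (seconds, confidence) = item
        # Distrust missing or low-confidence estimates: charge a pessimistic default.
        if seconds is None or confidence < min_confidence:
            return default_time
        return seconds

    return min(estimates.items(), key=cost)[0]
```

With alternates S1 and S3, a fast but low-confidence estimate for S3 loses to a slower, well-supported estimate for S1; once S3's confidence rises above the threshold, S3 is chosen.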
Configuring and Learning in the MDT
• MDT is configured for some hierarchy of dimensions
• Calibration of each dimension
  • min / max / scale
  • Allowed deviation
  • Confidence window
• Learning algorithm
  • Cell splitting algorithm
  • Value correction algorithm
• Estimate response time and confidence
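To make the configuration terms concrete, here is a minimal Python sketch of a multidimensional table that learns from query feedback. It is an assumption-laden illustration, not the MDT's algorithms: each dimension is cut into fixed-width cells from its min, max, and scale; a cell's value is corrected with a running mean; and confidence is taken to be the fraction of the last `window` observations that fell within `allowed_deviation` of the estimate at the time. Cell splitting is omitted, and all class and parameter names are hypothetical.

```python
import math
from collections import deque
from dataclasses import dataclass


@dataclass
class Dimension:
    name: str
    lo: float     # min
    hi: float     # max
    scale: float  # cell width along this dimension

    def index(self, value):
        """Map a value to its cell index, clamping to [lo, hi]."""
        n_cells = math.ceil((self.hi - self.lo) / self.scale)
        clamped = min(max(value, self.lo), self.hi)
        return min(int((clamped - self.lo) // self.scale), n_cells - 1)


class Cell:
    def __init__(self, window):
        self.estimate = 0.0
        self.count = 0
        self.hits = deque(maxlen=window)  # True where an observation matched the estimate


class ResponseTimeTable:
    def __init__(self, dimensions, allowed_deviation=0.25, window=10):
        self.dimensions = dimensions
        self.allowed_deviation = allowed_deviation  # relative error counted as a hit
        self.window = window                        # observations kept for confidence
        self.cells = {}

    def _key(self, point):
        return tuple(d.index(point[d.name]) for d in self.dimensions)

    def observe(self, point, response_time):
        """Feed back one measured response time (value correction by running mean)."""
        cell = self.cells.setdefault(self._key(point), Cell(self.window))
        if cell.count > 0:
            error = abs(response_time - cell.estimate)
            cell.hits.append(error <= self.allowed_deviation * cell.estimate)
        cell.count += 1
        cell.estimate += (response_time - cell.estimate) / cell.count

    def estimate(self, point):
        """Return (estimated response time, confidence), or (None, 0.0) if unseen."""
        cell = self.cells.get(self._key(point))
        if cell is None:
            return None, 0.0
        confidence = sum(cell.hits) / len(cell.hits) if cell.hits else 0.0
        return cell.estimate, confidence
```

A table over Day, Time, and Quantity could be set up as `ResponseTimeTable([Dimension("day", 0, 7, 1), Dimension("hour", 0, 24, 6), Dimension("quantity", 0, 1000, 100)])`. A real cell-splitting step would refine a cell whose observations keep falling outside the allowed deviation; that refinement is what this sketch leaves out.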
Correcting the confidence of estimated value
Conclusions
• Extend the Predator O-R DBMS with scalable mediator functionality
• Current implementation status
  • Scrambling-enabled optimizer
  • Mediator algebra and logical optimizer
  • Cost-based optimizer based on MDT estimation
  • Toolkit for generating wrappers for Web sources
Still to come …
• Publishing source metadata
• Discovering sources
• Source selection using metadata
• User profiles
• Dissemination of relevant data