1 / 25

Efficient Evaluation of Queries in a Mediator for WebSources

Efficient Evaluation of Queries in a Mediator for WebSources. Louiqa Raschid University of Maryland Joint work with Zadorozhny, Vidal, Urhan, Bright. Wide-Area Applications (WAA ). Problem: Scalability of Wrapper/Mediator architectures to WebSources that are accessible via WANs

cara
Download Presentation

Efficient Evaluation of Queries in a Mediator for WebSources

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Efficient Evaluation of Queries in a Mediator for WebSources Louiqa Raschid University of Maryland Joint work with Zadorozhny, Vidal, Urhan, Bright

  2. Wide-Area Applications (WAA) Problem: Scalability of Wrapper/Mediator architectures to WebSources that are accessible via WANs • Multiple sources and complex computational capabilities (WSI) • Complex queries on multiple sources, e.g., drug lead discovery using biomolecular sources • Dissimilar access costs (end-to-end latencies) and other metrics for each WSI • Dynamic Wide area environment is noisy (introduces unpredictable delays) L. Raschid — University of Maryland

  3. Mediator Web Query Optimizer Catalog Capability-Based Pre-optimizer Extended Randomized Optimizer Web Query Broker ExecutionEngine Wrapper Wrapper WAN Web Source Web Source Detailed Architecture Query WCM WebPT WCM WebPT

  4. Relevant Technologies • Wrapper generation Toolkit: CoopIS 1999, VLDB Journal special issue 2000 • Wrapper Mediator Architecture (Predator ORDBMS platform) ICDE 2000 demo • CBR Tool, Wrapper Cost model (WCM) CoopIS 1999 • Web prediction tool (WebPT) VLDB Journal 2000 special issue, CoopIS 2001 • Web Query Optimizer (WQO) Sigmod 2002, ICDCS 2002 L. Raschid — University of Maryland

  5. Outline of the talk • Motivation and architecture • Example • Two Phase Optimizer (WQO) and Pre-plans • Heuristics used by WQO to choose Pre-plans • Evaluation • Related Work • Future Work L. Raschid — University of Maryland

  6. ACM DL Web Source Schema Paper(FirstAuthor, Title, PaperID, Keywords, PaperPDF) Coauthor(PaperId, CName) Editor(PaperId, EName) Reviewer(PaperId, RName) Limited capabilities - WSIs ior1 Paper:FirstAuthor —› Title, PaperID, Keywords, PaperPDF ior3 Paper: FirstAuthor —› Title, PaperID, Keywords ior4 Paper:PaperID —› PaperPDF ior2 CoAuthor: PaperID —› CName ior5 Editor: PaperID —› EName ior6 Reviewer: {} —› PaperId, RName Dependencies: (ior1—›ior2), (ior3—›ior4),(ior3&ior4—›ior2) ... Composed WSI ior3 & ior4 Atomic WSI ior1 L. Raschid — University of Maryland

  7. 2 Phase Optimization for WebSources Objective: Generate safe good plans in a large search space (due to multiple alternate WSIs) • Pre-plan • 2 Phase optimization • Cost based heuristics • Cost based optimization (randomized optimizer) L. Raschid — University of Maryland

  8. CBR Tool and Pre-plan • Pre-plan – partition subgoals on mediator relations • (partial) Ordering between subgoals (WSI dependencies) • Select WSI for each subgoal SELECTP.Title,P.PaperPDF,CoAuthor.CName FROMPaper P, CoAuthor, Editor, Reviewer WHEREP.1stAuthor=“Franklin”&P.PaperId=CoAuthor.PaperId& P.PaperId=Reviewer.PaperId&P.PaperId=Editor.PaperId {{Paper (ior1), Reviewer(ior6)} {CoAuthor(ior2), Editor(ior5)} } Paper(ior1)—› CoAuthor(ior2), Paper(ior1) —› Editor(ior5) {{Paper (ior3;ior4), Reviewer(ior6)} {CoAuthor(ior2), Editor(ior5)} } Paper(ior3;ior4)—› CoAuthor(ior2), Paper(ior3;ior4) —› Editor(ior5) L. Raschid — University of Maryland

  9. 2 Phase Optimization for WebSources • WQO uses CBR tool to select WSIs and generate pre-plans • WQO uses cost-based heuristics to select one or more good pre-plans • WQO uses randomized relational optimizer and cost model and chooses safe good plans Objective: Choose a good pre-plan that will lead to a good plan L. Raschid — University of Maryland

  10. Traditional optimizer behavior Query with 5-way join, 3 WSIs per relationDetails of relative costs of WSIs in paper L. Raschid — University of Maryland

  11. WQO behavior using heuristics to explore a few a good pre-plan(s) L. Raschid — University of Maryland

  12. WQO functionality • Metrics of Wrapper Cost Model (WebPT) • Ignore local processing costs (assume optimizer will choose best plan possible for pre-plan) • Choose WSIs and pre-plans to minimize remote costs • Why use heuristics to choose pre-plan? • Impact of cost / delay on heuristics versus impact on a cost model • Impact of noise on heuristics versus cost model • Limitations of heuristics L. Raschid — University of Maryland

  13. Metrics in Wrapper Cost Model • WebSource and Network Costs • Remote Cost at Web Source - TTF • Downloading data from Web Source (extraction cost) • Total cost - TTL • Wrapper Statistics • Number of Pages Accessed • Cardinality of Result • Statistics dependent on value of query binding • WebPT - a tool for learning using query feedback and predicting access cost based on parameters such as Day, Time, Qty, Cardinality, etc. L. Raschid — University of Maryland

  14. WQO Heuristics to reduce remote costs • Reduce number of WSI calls • Favors atomic WSIs • … but composed WSIs may reduce result cardinality hence reduce overall number of calls • Choose WSI with lowest access cost • Favors lower cost WSIs • … but more expensive WSIs may provide more filtering of results and may reduce result cardinality • Reduce result cardinality • More selective WSIs • WSIs with more input bindings L. Raschid — University of Maryland

  15. WQO Heuristics to choose good pre-plans • Top-down versus bottom-up evaluation • Requires knowledge of dependencies • Details in paper {R1 R3 R4 R5} {R2(S21)} UR3 -> R2 {R1 R2(S22) R3 R4 R5} Uø • Choice of atomic versus composed WSIs ior1 ior3 + ior4 • Cost&selectivity measure • Favor both low access costs and low cardinality • E.g., if access costs of 2 WSIs are similar choose WSI with greater selectivity L. Raschid — University of Maryland

  16. Sample Queries for ACM DL Web Source Query Q3: and keyword = “broadcast disks” Query Q1: SELECT Title, PaperPDF,CoAuthorName FROM Paper, Coauthor WHERE AuthorName = “Michael Franklin” and Coauthor.PaperID = Paper.PaperID Query Q2: SELECT Title, PaperPDF,CoAuthorName FROM Paper, Coauthor WHERE AuthorName = “Michael Franklin” and Coauthor.PaperID = Paper.PaperID and Coauthor.CoAuthorName = ”Stan Zdonik”

  17. Plan 3: Q1 bad Q2 good Plan 1: Plan 2: Coauthor Paper Coauthor Paper ior4 costly ior1 ior2 ior2 Paper Paper Paper Coauthor ior3 ior2 ior3 ior4 costly -- DependentJoin Execution Plans for Query Q1 and Q2 Use atomic ior1 Use composed (ior3,ior4) Use composed (ior3,ior4) Q3 good costly L. Raschid — University of Maryland

  18. Sample Queries for ACM DL Web Source Query Q3: and keyword = “broadcast disks” Query Q1: SELECT Title, PaperPDF,CoAuthorName FROM Paper, Coauthor WHERE AuthorName = “Michael Franklin” and Coauthor.PaperID = Paper.PaperID Query Q2: SELECT Title, PaperPDF,CoAuthorName FROM Paper, Coauthor WHERE AuthorName = “Michael Franklin” and Coauthor.PaperID = Paper.PaperID and Coauthor.CoAuthorName = ”Stan Zdonik”

  19. Plan1 Plan1 Plan2 Plan2 Plan3 Plan3 Response Time for Queries Q1 and Q2 Query # L. Raschid — University of Maryland

  20. Plan1 Plan1 Plan2 Plan2 Plan3 Plan3 Quantile plots for Queries Q1 and Q2 L. Raschid — University of Maryland

  21. Summary of WQO Heuristics Objective: reduce the amount of data delivered to the mediator and minimize remote access cost Factors: Cardinality / Selectivity of dependent join (binding attributes) of WSIs Selectivity / cardinality of the remote relations Costly WSIs Good choice of WSIs and pre-plans relies on ability to construct a realistic cost model

  22. Choice of WSIs for Complex Queries L. Raschid — University of Maryland

  23. WQO Limitations: Choice of (A)tomic and (C)omposed capability 4way and 5way join queries 2 remote relations -(A)tomic /(C)omposed AA AC CA CC Qcard100-AAbest plan (ms) worst plan (ms) 2545 200667AA poor heuristic 639214748 CC Qcard1000-AA good heuristic 20547 200847 AA 24229 2147480 CC Qcard1000-CC good heuristic 20265 200534 AA 2399 466704 CC L. Raschid — University of Maryland

  24. Related Work Capability based rewriting Wrapper cost models: Garlic(IBM), DISCO(INRIA), HP Mediator Optimizers: Garlic(IBM), WSQ/DSQ(Stanford), IRO-DB(Versailles) Adaptive operators: Telegraph, Tukwila, XJoin Reactive optimizers: Query scrambling (Maryland), LEC optimizer (Cornell), EC+D Optimizer (Maryland) Cost/Quality trade-off: Nie+Rao (ASU), Naumann+Freitag (Germany) …..

  25. Current implementation status of WQO • Extensions to randomized relational optimizer. • WebPT tool to predict response time from WebSources. • WebWrapper cost model for WebSources. • Cost based heuristics to choose pre-plans. • Integration into a scrambling enabled optimizer. • Study of pre-plan choice (explored search space) on choice of good plan. L. Raschid — University of Maryland

More Related