1 / 26

Welcome to the Workshop on Query Optimization

Welcome to the Workshop on Query Optimization. I. I. T. Bombay Jointly organized with and sponsored by: PSPL Ltd, Pune October 1 and 2, 1999. Workshop Overview. Goal: tutorial on research in query optimization

terry
Download Presentation

Welcome to the Workshop on Query Optimization

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Welcome to the Workshop on Query Optimization I. I. T. Bombay Jointly organized with and sponsored by: PSPL Ltd, Pune October 1 and 2, 1999

  2. Workshop Overview • Goal: • tutorial on research in query optimization • foster R&D in query optimization in India • Programme: • 5 research sessions • 2 discussion sessions • free time after dinner on Friday for discussions • lunch and dinner • sponsored by PSPL

  3. Workshop Highlights • Extensions of query optimization • parametric query optimization, dynamic reordering of query plans, .. • Multi-query optimization • Caching • Semi-structured data and information retrieval • including XML • Directories and databases • Discussion sessions: VLDB99 update, and Future Directions

  4. Introductions…. Please introduce yourself (in less than 10 seconds!)

  5. A Brief History of Query Optimization S. Sudarshan CSE Dept, IIT Bombay

  6. The Dark Ages ... • Till late 70s: • Before relational databases • Dark ages, all optimization by hand • Needed sorcerers and wizards (the human kind, not the Microsoft kind!)

  7. Let there be light ... • Late 70’s to mid 80s • SQL: declarative syntax, optimization is job of system • System R • Seminal work on join order optimization, set stage for all later query optimization • Distributed database query optimization (System R*)

  8. Why Query Optimization? • SQL is declarative • Up to system to decide how to execute query • Disk access was and is slow compared to memory access • Join order etc. can have major impact on speed • e.g. query ran for days on MS SQL-Server 6.5 due to bad optimization decision, half an hour with good decision • Different from optimization in other domains • e.g. linear programming, compiler optimizations

  9. System R • Join order selection • A1 A2 A3 .. An • Left deep join trees • Dynamic programming • Best plan computed for each subset of relations • Best plan (A1, .., An) = min cost plan of( Best plan(A2, .., An) A1 Best plan(A1, A3, .., An) A2 …. Best plan(A1, .., An-1)) An An Ai

  10. System R (cont) • Selects and projects pushed down to lowest possible place • Sort order • join may be cheaper if inputs are sorted on join attributes • => Best plan(set-of-relations, sort-order) • Heuristic handling of other SQL features • views, nested queries, aggregates, selects, unions, ... • Many other databases used heuristic optimizers

  11. Brief History (Contd) • Mid 80’s to early 90s • Extensible query optimization • Exodus, Volcano • New data models • Nested relational model • Object oriented data model • Foreign functions • Recursive queries / deductive databases

  12. Volcano Extensible Query Optimizer Generator • General purpose cost based query optimizer, based on equivalence rules on algebras • e.g. equivalences: join associativity, select push down, aggregate push down, etc • extensible: new operations and equivalences can be easily added • notion of physical properties generalizes “interesting sort order” idea of System R • Developed by Graefe and McKenna 1993

  13. ABC AB AC BC A B C Key Ideas in Volcano • DAG representation of query • Equivalence node and operation nodes • Compactly represents set of all evaluation plans • choose one child of each equivalence node, and all children of operation nodes

  14. Main Benefits of Volcano • Extensible • can handle arbitrary algebraic expressions • new operators and equivalence rules easy to add • must be careful of search space though • Yet (reasonably) efficient • generalizes the dynamic programming idea of System-R optimizer • Optimizations of Pellenkroft et al. [VLDB 97] eliminate redundant derivations for joins • Used in MS SQL Server and Tandem

  15. Optimization in OODB/ORDBs • Major issues • Path expressions: • e.g. forall ( p in person) print (p->spouse->name) • can convert pointer dereferences to joins • can “assemble objects” in a clever sequence to minimize I/O (Graefe 93, Blakeley et al, Open OODB optimizer 95) • Path indices • e.g. forall (p in person suchthat p->spouse->name = “Rabri”) …

  16. Optimization in ORDBs • Expensive predicates/functions in selects/projects • e.g. selects based on image manipulation • usual heuristic of “push select predicates to lowest possible level’’ does not work • Extended ADTs with methods • optimizer order of applying methods, and ordering of method evaluation and joins

  17. Brief History (Contd.) • Mid 90’s to late 90s • Materialized views, view and index selection • Web, data warehouses, virtual databases • OLAP, data mining, … • Approximate query answering • Improvements on earlier techniques • handling outerjoins, aggregates • PLUS: all the topics covered in this workshop

  18. Materialized Views • Can materialize (precompute and store) views to speed up queries • Incremental maintenance • when database is updated, propagate updates to materialized view without complete recomputation • Deciding when to use materialized views • even if query does not refer to materialized view, optimizer can figure out it can be used

  19. Deciding What to Materialize • maintenance cost and query cost • workload: • queries and update transactions • weights for each component of workload • workload cost depends on what is materialized • Goal: find set of views that gives minimum cost if materialized, subject to space constraints • Note: materializing views can reduce even update costs • indices, and SQL assertions

  20. Data Warehouses • Characteristics: • Very large • typical schema: very large fact table, small dimension tables • typical query: aggregate on join of fact table and dimension tables • Can exploit above characteristics for optimizing queries • e.g., join dimension tables (even if cross product), build in memory index, scan fact table, probe index. Summarize if required and output

  21. Data Warehouses (Cont) • Synchronized scans • multiple queries can share a scan of fact table • slow some queries down so others catch up • Bit map indices • for selections on low cardinality attributes • e.g.: M 10011100011001 F 01100011100110 • idea:and-ing of bit maps is very efficient, use on bitmaps to filter to relevant tuples, retrieve them • Quass and O’Neill [Sigmod 1997]

  22. Virtual Warehouses and Databases • Data sources are numerous and distributed • may be accessible only via HTML / XML • => wrappers needed • may support only limited number of access types through forms interfaces • site descriptions: describe what data is contained at a site Levy et al [1995]. • Query sent only to relevant sites. • Stanford TSIMMIS project, Junglee, and others

  23. And on to the workshop ...

  24. Decorrelation • Idea: convert nested subqueries to joins • Consider select * from emp E where E.numchildren <> (select count(*) from person where person.parent = E.name • Can’t always express using basic rel. algebra • Long history: • special cases: Kim 88, Dayal 88, Muralikrishna 93 • general case: P. Seshadri et al 95: use outerjoin

  25. Incremental View Maintenance • E.g. R S (R U ir) S = R S U ir S (R - dr) S = R S - dr S • similar techniques for selection, projection (must maintain multiplicity counters though) and aggregation • Blakeley et al. [SIGMOD 87], Gupta and Mumick survey [DE Bulletin 95].

  26. Deciding What to Materialize • History • Roussopolous [1982]: exhaustive A* algorithm • Ross, Srivastava and Sudarshan [SIGMOD 96] suggest materializing views can reduce update costs, give heuristics • Labio et al. [1997], Gupta [1997], Sellis et al [1997], Yang, Karlapalem and Li [1997] give various exhaustive/heuristic/greedy algorithms • Chaudhuri and Narsayya [1998] considers only indices, being introduced in SQL server • Exhaustive algos are all doubly exponential!

More Related