
Query Analysis and Rewriting: Schema-based Optimization

This lecture covers query rewriting based on schema information, including the optimization of regular path expressions using graph schemas. It also introduces query containment and minimization as parts of query analysis.



Presentation Transcript


  1. Managing XML and Semistructured Data. Lecture 15: Query Analysis. Prof. Dan Suciu, Spring 2001

  2. In this lecture • Query rewriting • examples • Query rewriting with schema. Resources: • Optimizing Regular Path Expressions Using Graph Schemas, M. Fernandez and D. Suciu, Data Engineering, 98 • Query Optimization for Structured Documents Based on Knowledge on the Document Type Definition, K. Bohm, K. Gayer, K. Aberer, T. Özsu

  3. Query Analysis Generic term to describe: • Query rewriting based on schema information • Query containment and minimization

  4. Query Rewriting Problem: • Given a query Q • Regular path expression • Or more complex XQuery expression • Given a schema S • graph schema • DTD • XML-Schema • Rewrite Q to some QS such that • Q is equivalent to QS over databases conforming to S • QS is more efficient than Q

  5. Query Rewriting Optimizing Regular Path Expressions Using Graph Schemas, M.Fernandez and D.Suciu, Data Engineering, 98 Simplest setting: • Regular path expression • Graph schemas

  6. Example of Query Rewriting • Naive evaluation: need to traverse entire graph (or tree) Q = //Department//Project

  7. Example of Query Rewriting Graph Schema S: [Figure: graph schema S with states s1, s2, s3, s4 and edges labeled Org, “Project”, “Member”, and other] where Org = “Department” ∨ “College” ∨ “School” and other = ¬(Org ∨ “Project” ∨ “Member”)

  8. Example of Query Rewriting • Schema says: “there can be at most one Department edge; below, there can be at most one Project edge” • QS can be evaluated more efficiently than Q • Why? Q = //Department//Project QS = (other)*/Department/(other)*/Project, where other = ¬(“Department” ∨ “College” ∨ “School” ∨ “Project” ∨ “Member”)
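One way to see why QS can be cheaper is to count the edges each query has to follow. The sketch below is only an illustration: the tree encoding, the helper `node`, and the sample data are hypothetical and not taken from the lecture. `eval_q` evaluates //Department//Project by following every edge, while `eval_qs` follows only edges that QS allows and prunes everything else.

```python
NAMED = {"Department", "College", "School", "Project", "Member"}

def node(ident, *edges):
    """Hypothetical tree node: an id plus a list of (label, child) edges."""
    return {"id": ident, "edges": list(edges)}

def eval_q(root):
    """Q = //Department//Project: every edge has to be followed."""
    results, followed = [], 0
    stack = [(root, False)]            # (node, Department seen on the path?)
    while stack:
        n, seen = stack.pop()
        for label, child in n["edges"]:
            followed += 1
            if seen and label == "Project":
                results.append(child["id"])
            stack.append((child, seen or label == "Department"))
    return sorted(results), followed

def eval_qs(root):
    """QS = (other)*/Department/(other)*/Project: follow only useful edges."""
    results, followed = [], 0
    stack = [(root, False)]
    while stack:
        n, seen = stack.pop()
        for label, child in n["edges"]:
            if label not in NAMED:                      # an 'other' edge
                followed += 1
                stack.append((child, seen))
            elif not seen and label == "Department":
                followed += 1
                stack.append((child, True))
            elif seen and label == "Project":
                followed += 1
                results.append(child["id"])             # accept, do not descend
            # any other named label cannot lead to a match: prune it
    return sorted(results), followed

# Sample data, assumed to conform to the schema of the example.
tree = node(0,
    ("Department", node(1,
        ("Project", node(4, ("Member", node(6)), ("Member", node(7)))),
        ("head", node(5)))),
    ("College", node(2, ("Project", node(8, ("Member", node(9)))))),
    ("info", node(3)))

print(eval_q(tree))    # ([4], 9)
print(eval_qs(tree))   # ([4], 4)
```

Both functions return the same answer on this tree, but `eval_qs` never enters the College subtree or the subtree below the matched Project.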

  9. Example of Query Rewriting • How to construct QS systematically from Q and S? • Step 1: build the automaton A for Q • Step 2: build the product automaton S x A • Step 3: QS = expression of S x A
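The three steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the construction from the paper: edge predicates are modeled as sets of labels, with the made-up token `OTHER` standing for every label the schema does not name; the edges of `S` are a guess at the schema of the running example (s1 to s2 on Org, s2 to s3 on Project, s3 to s4 on Member, with other-loops on every state); and Step 3 is reduced to printing the trimmed product automaton instead of converting it to a regular expression.

```python
from collections import deque

OTHER = "<other>"                       # any label not named in the schema
ORG = frozenset({"Department", "College", "School"})
ALL = ORG | {"Project", "Member", OTHER}        # the predicate 'true'
other = frozenset({OTHER})

# Assumed graph schema S: state -> list of (label set, target state).
S = {
    "s1": [(other, "s1"), (ORG, "s2")],
    "s2": [(other, "s2"), (frozenset({"Project"}), "s3")],
    "s3": [(other, "s3"), (frozenset({"Member"}), "s4")],
    "s4": [(other, "s4")],
}

# Step 1: automaton A for Q = //Department//Project (a3 is accepting).
A = {
    "a1": [(ALL, "a1"), (frozenset({"Department"}), "a2")],
    "a2": [(ALL, "a2"), (frozenset({"Project"}), "a3")],
    "a3": [],
}
ACCEPT = {"a3"}

def product(S, A, s0, a0):
    """Step 2: reachable part of S x A; an edge survives only if the
    intersection of the two label sets is non-empty (satisfiable)."""
    edges, seen, queue = set(), {(s0, a0)}, deque([(s0, a0)])
    while queue:
        s, a = queue.popleft()
        for ps, s2 in S[s]:
            for pa, a2 in A[a]:
                labels = ps & pa
                if labels:
                    edges.add(((s, a), labels, (s2, a2)))
                    if (s2, a2) not in seen:
                        seen.add((s2, a2))
                        queue.append((s2, a2))
    return edges

def trim(edges, accept):
    """Drop edges that can no longer lead to an accepting state."""
    useful = {dst for _, _, dst in edges if dst[1] in accept}
    changed = True
    while changed:
        changed = False
        for src, _, dst in edges:
            if dst in useful and src not in useful:
                useful.add(src)
                changed = True
    return {e for e in edges if e[2] in useful}

# Step 3 (simplified): the trimmed automaton that QS describes.
qs = trim(product(S, A, "s1", "a1"), ACCEPT)
for src, labels, dst in sorted(qs, key=lambda e: (e[0], e[2])):
    print(src, sorted(labels), "->", dst)
```

Under these assumptions the trimmed automaton has an other-loop on (s1, a1), a Department edge to (s2, a2), an other-loop there, and a Project edge to (s3, a3), which reads off as (other)*/Department/(other)*/Project.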

  10. Example of Query Rewriting [Figure: the automaton A for Q (states a1, a2, a3; edges labeled true, Dept, Project), the graph schema S (states s1, s2, s3, s4; edges labeled Org, Project, Member, other), and the product automaton S x A, in which some edges are labeled false] QS = (other)*/Department/(other)*/Project

  11. Query Rewriting Correctness. Proposition: If the instance I conforms to S, then Q(I) = QS(I). That is, Q and QS are equivalent over databases conforming to S

  12. Query Rewriting Efficiency • Given query Q, instance I, define: cost(Q,I) = | {w(I) | w ∈ prefix(Lang(Q))} | Proposition: If Q and Q’ are equivalent over all databases conforming to S, and if I conforms to S, then cost(QS,I) ≤ cost(Q’,I). Hence, QS is optimal (in a certain sense)

  13. Query Rewriting Query Optimization for Structured Documents Based on Knowledge on the Document Type Definition, K. Bohm, K. Gayer, K. Aberer, T. Özsu More complex setting: • Schema = DTD • Query = region algebra (think: XPath) The problem is more complex; this work proposes some solutions

  14. Query Rewriting Idea: analyze the DTD and extract 3 relations: Exclusivity. Element E1 is exclusively contained in E2 if every path from the root to E1 goes through E2. XPath simplification: E1[ancestor-or-self::E2] ⇒ E1
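Exclusivity can be checked with a simple reachability test: E1 is exclusively contained in E2 exactly when E1 can no longer be reached from the root once E2 is removed. The sketch below assumes the DTD has been abstracted into a hypothetical dictionary mapping each element to the set of elements that may appear as its children; the sample `DTD` and the function names are illustrative only.

```python
# Hypothetical DTD abstraction: element -> elements allowed as children.
DTD = {
    "book": {"title", "chapter"},
    "chapter": {"title", "section"},
    "section": {"para", "figure"},
    "figure": {"caption"},
}

def reachable(dtd, start, blocked=None):
    """Elements reachable from start without passing through blocked."""
    if start == blocked:
        return set()
    seen, stack = {start}, [start]
    while stack:
        element = stack.pop()
        for child in dtd.get(element, ()):
            if child != blocked and child not in seen:
                seen.add(child)
                stack.append(child)
    return seen

def exclusively_contained(dtd, root, e1, e2):
    """True if every path from root to e1 goes through e2."""
    return e1 == e2 or e1 not in reachable(dtd, root, blocked=e2)

print(exclusively_contained(DTD, "book", "caption", "figure"))  # True
print(exclusively_contained(DTD, "book", "title", "chapter"))   # False
```

In this sample, caption[ancestor-or-self::figure] could be simplified to caption, whereas title[ancestor-or-self::chapter] could not, because a title may also sit directly under book.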

  15. Query Rewriting Obligation. E1 obligatorily contains E2 if it has a child of type E2. E1[E2] ⇒ E1
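A minimal sketch of the obligation rule, assuming a simplified content model in which each element lists its children with a DTD occurrence indicator ("1", "?", "*", "+") and choice groups are ignored. The `CONTENT` table and both function names are hypothetical.

```python
# Hypothetical content models: element -> list of (child, occurrence).
CONTENT = {
    "chapter": [("title", "1"), ("section", "+"), ("appendix", "?")],
    "section": [("para", "*")],
}

def obligatorily_contains(content, e1, e2):
    """True if every e1 element must have at least one e2 child."""
    return any(child == e2 and occ in ("1", "+")
               for child, occ in content.get(e1, []))

def simplify_child_predicate(content, e1, e2):
    """Rewrite E1[E2] to E1 when the predicate is implied by the DTD."""
    if obligatorily_contains(content, e1, e2):
        return e1
    return f"{e1}[{e2}]"

print(simplify_child_predicate(CONTENT, "chapter", "title"))     # chapter
print(simplify_child_predicate(CONTENT, "chapter", "appendix"))  # chapter[appendix]
```

The predicate is dropped only when the child is required ("1" or "+"); optional or repeatable-from-zero children keep the predicate.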

  16. Query Rewriting Entrance Location. E is an entrance location for E1, E2 if every path from E1 to E2 goes through some E. E1[ancestor-or-self::E2] ⇒ E1[ancestor-or-self::E[ancestor-or-self::E2]]

  17. Query Rewriting Add these rules, plus variations, to a rule-based optimizer • HyperStorM – a Structured Document Database • On top of VODAK – an object-oriented database system. Open question: does this approach exploit all the information in a DTD/XML-Schema? How can we exploit what is not used?
