1 / 37

Rewriting Procedures for Batched Bindings

Rewriting Procedures for Batched Bindings. Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay. Appeared in VLDB 2008. Motivation. Queries/updates are often invoked repeatedly. E.g.:. Nested subqueries (with correlated evaluation)

leona
Download Presentation

Rewriting Procedures for Batched Bindings

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Rewriting Procedures for Batched Bindings Ravindra Guravannavar and S. Sudarshan Indian Institute of Technology, Bombay Appeared in VLDB 2008

  2. Motivation Queries/updates are often invoked repeatedly. E.g.: • Nested subqueries (with correlated evaluation) • Queries embedded in user-defined functions (UDFs) that are invoked from other queries • Queries/updates in stored procedures that are invoked repeatedly (e.g. batch jobs) • Queries/updates invoked inside loops in application programs

  3. Example: A Nested Query List the orders of high worth (> 10K) SELECT O.orderid, O.custid FROM orders O WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid); IterativeExecution: For each record in the result of the outer query block - bind the parameters for the nested sub-query - evaluate the nested sub-query - process the results (check predicate, output)

  4. Optimizing Repeated Invocations • Iterative execution of queries often performs poorly • Redundant I/O • Random I/O and poor buffer effects • Network round-trip delays • Decorrelation is widely used for optimizing nested queries

  5. Query Decorrelation Original Query SELECT O.orderid, O.custid FROM orders O WHERE 10000 < (SELECT sum(lineprice) FROM lineitem L WHERE L.orderid=O.orderid); After Decorrelation(Most systems do this automatically) SELECT O.orderid, O.custid FROM orders O, lineitem L WHERE O.orderid=L.orderid GROUPBY O.orderid, O.custid HAVING sum(L.lineprice) > 10000;

  6. Limitations of Decorrelation Tricky at times… • COUNT aggregate • Non-equality correlation predicates The solutions may not produce the best plan Decorrelation techniques are not applicable for: • Nested invocation of procedures with complex logic and embedded query invocations • Iterative invocation of queries/updates from application code

  7. Example: UDF Invoked from a Query SELECT * FROM category WHEREcount_items(category-id) > 50; // Count the items in a given category and its sub-categories int count_items(int categoryId) { … while(…) { … … SELECT count(item-id) INTO icount FROM item WHERE category-id = :curcat; … } } Procedural logic with embedded queries

  8. Key Idea: Parameter Batching • Repeated invocation of an operation is replaced by a single invocation of its batched form • Batched form: Works on a set of parameters • Benefits • Choice of efficient set-oriented plans • Repeated selection → Join • Efficient integrity checks • Efficient disk access (sort RIDs before fetch) • Reduced network round-trip delay

  9. Batched Forms of Basic Operations • Insert • insert into <table1> select … from <table2> … • Bulk load (SQLServer: bcp, Oracle: sqlldr, DB2: load) • Update • update <table1> from <table2> where … (equivalent to SQL:2003 merge statement) • Queries • Make use of join or outer-join (seen in decorrelation)

  10. SQL Merge GRANTMASTER GRANTLOAD merge into GRANTMASTER GM using GRANTLOAD GL on GM.empid=GL.empidwhen matched then update set GM.grants=GL.grants; Notation:Mc1=c1’,c2=c2’,… cn=cn’(r, s)

  11. Effect of Batch Size on Inserts Bulk Load: 1.3 min

  12. Iterative and Set-Oriented Updates on the Server Side • TPC-H PARTSUPP (800,000 records), Clustering index on (partkey, suppkey) • Iterative update of all the records using T-SQL script (each update has an index lookup plan) • Single commit at the end of all updates Takes 1 minute Same update processed as a merge (update … from …) Takes 15 seconds

  13. The Challenge Given a procedure, how to obtain its batched form? Possible to manually rewrite, but time consuming and error-prone.

  14. Our Work • Automatic generation of batched forms of UDFs/stored procedures using rewrite rules • Automatic rewrite of programs to replace looping with batched invocation

  15. Batched Forms • Batched form qb of a pure function q • Returns results for a set of parameters • Result in the form {(parameter, result)} • For queries: standard techniques for creating batched forms (from work on decorrelation) Example: Original query: SELECT item-id FROM item WHERE category-id=? Batched form: SELECT pb.category-id, item-id FROM param-batch pb LEFT OUTER JOIN item ON pb.category-id = item.category-id;

  16. Batch Safe Operations • Batched forms – no guaranteed order of parameter processing • Can be a problem for operations having side-effects Batch-Safe operations • All operations without side effects • Also a few operations with side effects • E.g.: INSERT on a table with no constraints • Operations inside unordered loops (e.g., cursor loops with no order-by)

  17. Generating Batched Forms of Procedures Step 1: Create trivial batched form. Transform: procedure p(r) { body of p} To procedure p_batched(pb) { for each record r in pb { < body of p> < collect the return value > } return the collected results paired with corrsp. params; } Step 2: Optimize query invocations in the trivial batched form

  18. Rule 1A: Rewriting a Simple Set Iteration Loop where q is any batch-safe operation with qb as its batched form Rule 1B Handles return values

  19. Condition for Invocation Return values Operation to Batch // Batched invocation Let s = // Merge the results s where Rule 1C: Batching Conditional Statements

  20. Table(T) t; while(p) { ss1 modified to save local variables as a tuple in t } Collect the parameters for each r in t { sq modified to use attributes of r; } Can apply Rule 1A-1C and batch. for each r in t { ss2 modified to use attributes of r; } Process the results Rule 2: Splitting a Loop while (p) { ss1; sq; ss2; } * Conditions Apply

  21. Rule 2: Pre-conditions • The conditions make use of the data dependence graph • Data Dependence Graph • Nodes: program statements • Edges: dependencies between statements that read/write same location • Types of Dependencies • Flow (Write  Read), Anti (Read  Write) and Output (Write  Write) • Loop-carried flow/anti/output • Dependencies across iterations • Pre-conditions for Rule-2 • No loop-carried flow/output dependencies cross the points at which the loop is split • No loop-carried dependencies through external data (e.g., DB)

  22. Data Dependencies Flow Dependence Anti Dependence Output Dependence Control Dependence Loop-Carried Need for Reordering Statements (s1) while (category != null) { (s2) item-count = q1(category); (s3) sum = sum + item-count; (s4) category = getParent(category); }

  23. Reordering Statements to Enable Rule 2 while (category != null) { int item-count = q1(category); // Query to batch sum = sum + item-count; category = getParent(category); } Splitting made possible after reordering while (category != null) { int temp = category; category = getParent(category); int item-count = q1(temp); sum = sum + item-count; }

  24. Cycles of Flow Dependencies

  25. Rule 4: Control Dependencies while (…) { item = …; qty = …; brcode = …; if (brcode == 58) { brcode = 1; q(item, qty, brcode); } } Remember the branching decision in a boolean variable while (…) { item = …; qty = …; brcode = …; boolean cv = (brcode == 58); cv? brcode = 1; cv? q(item, qty, brcode); }

  26. Cascading of Rules After applying Rule 2 Table(…) t; while (…) { r.item = …; r.qty = …; r.brcode = …; r.cv = (r.brcode == 58); r.cv? r.brcode = 1; t.addRecord(r); } for each r in t { r.cv? q(r.item, r.qty, r.brcode); } Rule 1C qb(Pitem,qty,brcode(scv=true(t))

  27. Batching Across Multiple Levels while(…) { …. while(…) { ... q(v1, v2, … vn); … } … } while(…) { …. Table t (…); while(…) { ... } qb(t); … } Batch q w.r.t inner loop

  28. Parameter Batches as Nested Tables

  29. Nest and Unnest Operations • μc(T) : Unnest T w.r.t. table-valued column c • vS s(T) : Group T on columns other than S and nest the columns in S under the name s

  30. Rule 6: Unnesting Nested Batches

  31. Implementation and Evaluation • Conceptually the techniques can be used with any language (PL/SQL, Java, C#-LINQ) • We implemented for Java using the SOOT framework for program analysis Evaluation • No benchmarks for procedural SQL • Scenarios from three real-world applications, which faced performance problems • Data Sets: TPC-H and synthetic

  32. Application 1: ESOP Management App Process records from a file in custom format. Repeatedly called a stored procedure with mix of queries, updates and business logic. - Validate inputs - Lookup existing record - Update or Insert Rewritten program used outer-join and merge.

  33. Application 2: Category Traversal Find the maximum size of any part in a given category and its sub-categories. Clustered Index CATEGORY (category-id) Secondary Index PART (category-id) Original Program Repeatedly executed a query that performed selection followed by grouping. Rewritten Program Group-By followed by Join

  34. Application 3: Value Range Expansion Expand records of the form: (start-num, end-num, issued-to, …) Performed repeated inserts. Rewritten program Pulled the insert stmt out of the loop and replaced it with batched insert. ~75% improvement ~10% overhead Log scale

  35. Related Work • Query unnesting • E.g. Kim [TODS82], Dayal [VLDB87], Seshadri et al. [ICDE96], Galindo Legaria et al. [SIGMOD01] • We extend the benefits of unnesting to procedural nested blocks • Graefe [BTW03] highlights the importance of batching in nested iteration plans (a motivation for our work) • Optimizing set iteration loops in database programming languages - Lieuwen and DeWitt [SIGMOD 92] • Also perform program rewriting, but • Do not address batching of queries/procedure calls within the loop • Limited language constructs - No WHILE loops, IF-THEN-ELSE • Parallelizing compilers Kennedy[90], Padua[95] • We borrow and extend the techniques

  36. Conclusion • Automatic rewrite of programs for set-orientation is possible • Combining query rewrite with program analysis is the key • Our experiments on real-world scenarios show significant benefits due to batching Future Work • Cost-based selection of operations to batch • Handling exceptions • Automatically deciding whether an operation is batch-safe • Implementing rewriting for PL/SQL

  37. Questions?

More Related