Adaptive Execution of Variable-Accuracy Functions - UC Berkeley/Fred Alger, Inc.

Explore VAOs (Variable Accuracy Operators) for cost-effective execution of functions with trade-offs between accuracy and efficiency. Learn about iterative strategies and a new UDF API for dynamic query optimization.

Presentation Transcript


  1. Adaptive Execution of Variable-Accuracy Functions • Matt Denny - UC Berkeley/Fred Alger, Inc. • Michael Franklin - UC Berkeley • VLDB Conference, Seoul, September 2006

  2. Introduction • Many applications apply expensive functions to streams of data • Finance: real-time market monitoring with securities models • Power Management: overload prediction using current weather conditions • Supply Chain Management: inventory models using RFID data to find shortages in real-time Matt Denny, Mike Franklin UC Berkeley EECS

  3. Example: Bond Pricing (Continuous Queries with UDFs) • BondData: table of bond data (maturity, coupon, etc.) • IntRate: stream of interest rate data • model(): C++/Java routine that takes bond data and an interest rate and returns a price • Filtering: SELECT BD.BondID FROM BondData BD, IntRate IR [Rows 1] WHERE BD.numHeld > 0 AND model(BD,IR.rate) > $100 • Aggregation: SELECT MAX(model(BD,IR.rate)) FROM BondData BD, IntRate IR [Rows 1] WHERE BD.numHeld > 0

  4. The Problem • Analytical functions can be expensive! • Minutes or hours per data point. • Query processor has no control over execution of individual function calls. • UDF API is a black box • Earlier work aims to avoid UDF calls: • predicate reordering ([HS93][KMPS94][CS96]) • memoization and caching ([HN96], [DF05]) • Remaining calls can still be a showstopper.

  5. The Intuition • Many functions have accuracy/cost tradeoffs, e.g., iterative solvers. • UDFs often appear in predicates and aggregates where exact answers are not required. • Example: SELECT BD.* FROM BondData BD, IntRate IR [Rows 1] WHERE BD.numHeld > 0 AND model(BD,IR.rate) > $100

  6. Our Solution: VAOs (Variable Accuracy Operators) • New query operators that: • Expose function cost/accuracy tradeoffs using a new UDF API. • Exploit this tradeoff to avoid excess work while correctly answering the query.

  7. VAOs - Basic Idea • Initially run the function to obtain a coarse answer. • This needs to be cheaper than running to a more accurate answer. • If more accuracy is needed, iterate!

  8. Traditional Execution - Select • Query: SELECT BD.bondID FROM BondData BD, IntRate IR [Rows 1] WHERE model(BD,IR.rate) > $100; • [Figure: the Select operator executes model(IR.Rate, BD) on Bond Data and the Interest Rate stream (10.1%), obtains $105.01 for BD 1, tests it against > 100, and emits BD 1 as the result.]

  9. VAO Execution: Select • Query: SELECT BD.bondID FROM BondData BD, IntRate IR [Rows 1] WHERE model(BD,IR.rate) > $100; • [Figure: the Select VAO executes model(IR.Rate, BD) on Bond Data and the Interest Rate stream (10.1%) and receives a result object for BD 1 with bounds L = $98 and H = $110, which straddle the > 100 test.]

  10. VAO Execution: Select (after Iterate()) • Query: SELECT BD.bondID FROM BondData BD, IntRate IR [Rows 1] WHERE model(BD,IR.rate) > $100; • [Figure: after Iterate(), the result object for BD 1 has bounds L = $101 and H = $108, both above 100, and the Select VAO emits BD 1.]

  11. VAO API • Use an iterative interface • Traditional: <number> = f(<args>) • VAO: <result object> = f(<args>) • fields for (conservative) error bounds • iterate() method: refines bounds with more work • for some VAOs: also need estimates for CPU cost and error reduction of the next iteration • Useful for: • Any sort of iterative function (e.g. root finders, numerical integration) • Any technique with iterative step refinement (e.g. PDEs)
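To make the interface on the VAO API slide concrete, here is a minimal Python sketch. It is not the authors' code. It assumes a hypothetical `SqrtResult` object whose underlying function is a bisection root finder for sqrt(a). The names `low`, `high`, `iterate()`, `estimated_cost()` and `estimated_error_reduction()` are illustrative choices, since the slides do not give signatures.

```python
import math


class SqrtResult:
    """Hypothetical VAO-style result object: conservative bounds on sqrt(a)."""

    def __init__(self, a):
        if a < 0:
            raise ValueError("a must be non-negative")
        self.a = a
        # Conservative initial bounds: 0 <= sqrt(a) <= max(1, a).
        self.low = 0.0
        self.high = max(1.0, a)

    def iterate(self):
        """Do one more unit of work (one bisection step) to tighten the bounds."""
        mid = (self.low + self.high) / 2.0
        if mid * mid <= self.a:
            self.low = mid
        else:
            self.high = mid

    def estimated_cost(self):
        """Assumed cost model: every bisection step costs one unit."""
        return 1.0

    def estimated_error_reduction(self):
        """Bisection halves the interval, so the next step removes half the width."""
        return (self.high - self.low) / 2.0
```

A caller would construct `SqrtResult(2.0)` and call `iterate()` only as many times as its query needs. The bounds always bracket `math.sqrt(2.0)` and shrink by half per call.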

  12. Iteration Strategy • Selection iterates over an object until the predicate value is known. • Aggregate operators are more difficult • Answer depends on sets of result objects • Need to decide how to iterate over multiple result objects
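The selection rule above can be sketched as a short loop. This is an illustration under stated assumptions, not the paper's operator. `ScriptedResult` is a hypothetical stand-in that replays a fixed sequence of bounds, here the $98/$110 and $101/$108 values from the select example slides. `select_greater` and its `max_iters` guard are names invented for this sketch.

```python
class ScriptedResult:
    """Hypothetical result object that replays a fixed sequence of (low, high) bounds."""

    def __init__(self, bounds):
        self._bounds = list(bounds)
        self._pos = 0
        self.low, self.high = self._bounds[0]
        self.iterations = 0

    def iterate(self):
        """Advance to the next, tighter pair of bounds if one is available."""
        if self._pos + 1 < len(self._bounds):
            self._pos += 1
            self.low, self.high = self._bounds[self._pos]
        self.iterations += 1


def select_greater(result, threshold, max_iters=32):
    """Return True/False once bounds decide `f > threshold`, or None if undecided."""
    iters = 0
    while True:
        if result.low > threshold:
            return True           # whole interval lies above the threshold
        if result.high <= threshold:
            return False          # whole interval lies at or below the threshold
        if iters == max_iters:
            return None           # guard: value may sit exactly on the threshold
        result.iterate()
        iters += 1
```

With `ScriptedResult([(98, 110), (101, 108)])` and a threshold of 100, the loop calls `iterate()` once and then accepts the bond, without ever computing an exact price.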

  13. Example: MAX(f(x1), f(x2)) • [Figure: four panels plotting f(x) bounds at x1 and x2: initial bounds, Iterate Over f(x1), Iterate Over f(x2), Iterate Over both.] • Need an iteration strategy that attempts to minimize cost

  14. Solution: Greedy Strategy • Iterate over the object that has the best ratio of benefit to CPU cost among the current choices. • Good strategy if functions converge • Later iterations likely to have less benefit/unit cost • Operator-dependent

  15. Example Revisited: MAX(f(x1), f(x2)) • Goal State: no overlap between f(x1) and f(x2) • Greedy Strategy: • choose best overlap reduction per CPU cost • Use error reduction estimates to estimate overlap reduction. • Cost estimation depends on function.
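The greedy strategy for MAX can be sketched as follows. This is a simplified illustration, not the algorithm from the paper. `ToyResult` is a hypothetical object whose interval halves and whose cost doubles on each iteration. The benefit proxy used here, estimated error reduction divided by next-iteration cost, is an assumption standing in for "overlap reduction per CPU cost".

```python
class ToyResult:
    """Hypothetical result object: interval around a hidden value, halved per iteration."""

    def __init__(self, name, value, width, cost=1.0):
        self.name = name
        self._value = value
        self._width = width
        self.next_cost = cost     # assumed CPU cost of the next iterate() call
        self.spent = 0.0          # total cost paid so far
        self._update()

    def _update(self):
        self.low = self._value - self._width / 2.0
        self.high = self._value + self._width / 2.0

    def est_error_reduction(self):
        return self._width / 2.0  # the next iteration halves the width

    def iterate(self):
        self.spent += self.next_cost
        self._width /= 2.0
        self.next_cost *= 2.0     # assumption: each refinement costs twice as much
        self._update()


def greedy_max(results, max_steps=100):
    """Iterate greedily until one object's lower bound clears every other upper bound."""
    for _ in range(max_steps):
        leader = max(results, key=lambda r: r.low)
        rivals = [r for r in results if r is not leader and r.high > leader.low]
        if not rivals:
            return leader         # goal state: no overlap with the leader
        candidates = rivals + [leader]
        # Greedy choice: best estimated error reduction per unit of CPU cost.
        best = max(candidates, key=lambda r: r.est_error_reduction() / r.next_cost)
        best.iterate()
    return None                   # undecided within the step budget
```

Calling `greedy_max` on two overlapping `ToyResult` objects stops as soon as the intervals separate, so neither object is refined further than the comparison requires.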

  16. Example Revisited • Determine if f(x1) > f(x2) • [Figure: f(x) bounds at x1 and x2; annotations as extracted: $.04, 4 sec.; $.04, 4 sec.]

  17. Example Revisited • Determine if f(x1) > f(x2) • [Figure: f(x) bounds at x1 and x2; annotations as extracted: $.04, $.01, 8 sec., 4 sec.; $.04, $.02, 4 sec., 4 sec.]

  18. Example Revisited • Determine if f(x1) > f(x2) • [Figure: f(x) bounds at x1 and x2; annotations as extracted: $.01, $0, 8 sec., 8 sec.; $0, $.02, 4 sec., 8 sec.]

  19. Aggregates

  20. Performance Setup • Standalone implementation of VAO framework in C++ • Used numeric bond model and bond data from [DF05] • Real Bond Data - 500 Mortgage-backed Securities. • Synthetic Bond Data - to stress test VAOs • Single Interest Rate.

  21. VAO Implementation • Numeric bond model [S95] implemented with both traditional and VAO interfaces • Based on PDE solver • VAO iterate(): double size of PDE grid • Bounds and error reduction estimates derived by using current and previous iteration results and Richardson’s Extrapolation [BF01]
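The idea of deriving bounds from the current and previous iteration can be illustrated with a much simpler numerical method than the bond PDE solver. The sketch below substitutes the composite trapezoid rule, whose error shrinks by roughly 4x when the grid is doubled, so Richardson extrapolation estimates the error of the finer result as the difference between the two results divided by 3. The class name `IntegralResult` and the `safety` factor are assumptions of this sketch. The estimate is asymptotic, so the resulting bounds are a heuristic, not a guarantee.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule for the integral of f over [a, b] with n intervals."""
    h = (b - a) / n
    interior = sum(f(a + i * h) for i in range(1, n))
    return h * (0.5 * f(a) + interior + 0.5 * f(b))


class IntegralResult:
    """Hypothetical result object: bounds from two grids plus a Richardson estimate."""

    def __init__(self, f, a, b, n=2, safety=2.0):
        self.f, self.a, self.b = f, a, b
        self.n = n
        self.safety = safety      # assumed widening factor for the error estimate
        self.estimate = trapezoid(f, a, b, n)
        self.low, self.high = -math.inf, math.inf
        self.iterate()            # two grids are needed before bounds exist

    def iterate(self):
        """Double the grid and rebuild bounds from the coarse and fine results."""
        coarse = self.estimate
        self.n *= 2
        fine = trapezoid(self.f, self.a, self.b, self.n)
        # Richardson estimate for a second-order method: error(fine) ~ (fine - coarse) / 3.
        err = abs(fine - coarse) / 3.0
        self.estimate = fine
        self.low = fine - self.safety * err
        self.high = fine + self.safety * err
```

For example, `IntegralResult(lambda x: x * x, 0.0, 1.0)` starts with bounds that bracket 1/3, and each `iterate()` call narrows them at the price of twice as many function evaluations.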

  22. Selection Performance • 500 bonds, 1 interest rate • Runtime depends on the number of bonds close to the predicate.

  23. Stress Test • Generate bonds with accurate values near the predicate: Gaussian, mean = predicate value, vary std. dev. • Std. dev. of real bonds: $7.78

  24. In the Paper • Other results: • Max: real bonds, 111 sec. vs. 6953 sec.; synthetic bonds, VAOs better than traditional above $.05 std. dev. • Average: up to 5x improvement if a small number of bonds are weighted heavily in the average. • Details on error and cost estimates for PDE-based bond model. • Other types of models covered in Matt’s thesis.

  25. Conclusion • Many emerging CQ applications require the repeated execution of expensive functions. • VAOs are new operators that change how these functions execute • Use new iterative API that exposes work-accuracy tradeoff in functions • Do only enough work to answer the query using greedy strategy to choose iterations • With real bond data and models, VAOs show 1-2 orders of magnitude improvement. • For more detailed information: mdenny@cs.berkeley.edu

  26. The Advisor’s Dodge • [Figure: Relative Contribution to Research; Percent Contribution (0 to 100) for Student and Advisor versus Time in Program (0 to 5 years), with "This Work" marked. Courtesy of Jennifer Widom.]

  27. Bibliography • [HS93] J. M. Hellerstein and M. Stonebraker, “Predicate Migration: Optimizing Queries with Expensive Predicates”, SIGMOD 1993. • [HN96] J. M. Hellerstein and J. Naughton, “Query Execution Techniques for Caching Expensive Predicates”, SIGMOD 1996. • [DF05] M. Denny and M. J. Franklin, “Predicate Result Range Caching for Continuous Queries”, SIGMOD 2005.

  28. Bibliography • [S95] R. Stanton, “Rational Prepayment and the Valuation of Mortgage-Backed Securities”, The Review of Financial Studies, Vol. 8, No. 3, 677-708. • [BF01] R. L. Burden and J. D. Faires, Numerical Analysis. Brooks/Cole, 2001.
