Explore VAOs (Variable Accuracy Operators) for cost-effective execution of functions with trade-offs between accuracy and efficiency. Learn about iterative strategies and a new UDF API for dynamic query optimization.
Adaptive Execution of Variable-Accuracy Functions • Matt Denny, UC Berkeley / Fred Alger, Inc. • Michael Franklin, UC Berkeley • VLDB Conference, Seoul, September 2006
Introduction • Many applications apply expensive functions to streams of data • Finance: real-time market monitoring with securities models • Power Management: overload prediction using current weather conditions • Supply Chain Management: inventory models using RFID data to find shortages in real-time Matt Denny, Mike Franklin UC Berkeley EECS
Example: Bond Pricing • BondData: table of bond data (maturity, coupon, etc.) • IntRate: stream of interest rate data • model(): C++/Java routine that takes bond data and an interest rate and returns a price • Continuous queries with UDFs:
Filtering: SELECT BD.BondID FROM BondData BD, IntRate IR [Rows 1] WHERE BD.numHeld > 0 AND model(BD,IR.rate) > $100
Aggregation: SELECT MAX(model(BD,IR.rate)) FROM BondData BD, IntRate IR [Rows 1] WHERE BD.numHeld > 0
The Problem • Analytical functions can be expensive! • Minutes or hours per data point. • Query processor has no control over the execution of individual function calls. • UDF API is a black box. • Earlier work aims to avoid UDF calls: • predicate reordering ([HS93][KMPS94][CS96]) • memoization and caching ([HN96], [DF05]) • Remaining calls can still be a showstopper.
The Intuition • Many functions have accuracy/cost tradeoffs, e.g., iterative solvers. • UDFs often appear in predicates and aggregates where exact answers are not required. • Example (the predicate only needs to know on which side of $100 the price falls):
SELECT BD.* FROM BondData BD, IntRate IR [Rows 1] WHERE BD.numHeld > 0 AND model(BD,IR.rate) > $100
Our Solution: VAOs (Variable Accuracy Operators) • New query operators that: • Expose function cost/accuracy tradeoffs using a new UDF API. • Exploit this tradeoff to avoid excess work while correctly answering the query.
VAOs - Basic Idea • Initially run the function to obtain a coarse answer. • This needs to be cheaper than running to a more accurate answer. • If more accuracy is needed, iterate!
Traditional Execution - Select • Query: SELECT BD.bondID FROM BondData BD, IntRate IR [Rows 1] WHERE model(BD,IR.rate) > $100; • [Figure: the Select operator executes model(IR.Rate,BD) for bond BD 1 at interest rate 10.1%, receives the result $105.01, and tests it against > 100.]
VAO Execution: Select • Query: SELECT BD.bondID FROM BondData BD, IntRate IR [Rows 1] WHERE model(BD,IR.rate) > $100; • [Figure: the VAO Select operator executes model(IR.Rate,BD) for bond BD 1 at interest rate 10.1% and receives a result object with bounds L = $98 and H = $110, which still straddle the > 100 test.]
VAO Execution: Select • Query: SELECT BD.bondID FROM BondData BD, IntRate IR [Rows 1] WHERE model(BD,IR.rate) > $100; • [Figure: the VAO Select operator calls Iterate() on the result object for bond BD 1; the bounds tighten to L = $101 and H = $108, so the > 100 test passes and BD 1 is output.]
VAO API • Use an iterative interface • Traditional: <number> = f(<args>) • VAO: <result object> = f(<args>) • Result object has fields for (conservative) error bounds • iterate() method: refines the bounds with more work • For some VAOs: also need estimates of the CPU cost and error reduction of the next iteration • Useful for: • Any sort of iterative function (e.g., root finders, numerical integration) • Any technique with iterative step refinement (e.g., PDEs)
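A minimal sketch of what such a result object could look like, written in Python for brevity rather than the C++/Java the slides mention. The class name `BisectionResult` and the methods `est_error_reduction()` and `est_cost()` are hypothetical names chosen for illustration; only the bounds fields and `iterate()` come from the slide. The example is a bisection root finder, where the bracket around the root serves as the conservative bounds.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class BisectionResult:
    """Hypothetical VAO result object: [low, high] always brackets a root of f."""
    f: Callable[[float], float]
    low: float
    high: float

    def iterate(self) -> None:
        """Do one more unit of work: halve the bracket that contains the root."""
        mid = (self.low + self.high) / 2.0
        if self.f(self.low) * self.f(mid) <= 0:
            self.high = mid  # sign change (or exact root) lies in [low, mid]
        else:
            self.low = mid   # otherwise it lies in [mid, high]

    def est_error_reduction(self) -> float:
        """Estimated shrinkage of the bound width from the next iterate() call."""
        return (self.high - self.low) / 2.0

    def est_cost(self) -> float:
        """Estimated cost of the next iterate() call, in abstract units."""
        return 1.0


# Usage: bounds on sqrt(2) as the root of x^2 - 2 in [1, 2].
root = BisectionResult(lambda x: x * x - 2.0, 1.0, 2.0)
for _ in range(10):
    root.iterate()
print(root.low, root.high)
```

A caller never sees a single number here, only bounds that it can choose to tighten, which is the property the operators rely on.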
Iteration Strategy • Selection iterates over an object until the predicate value is known. • Aggregate operators are more difficult: • Answer depends on sets of result objects • Need to decide how to iterate over multiple result objects
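The selection case can be sketched as a loop that stops as soon as the bounds fall entirely on one side of the threshold. This is an illustrative sketch, not the authors' implementation: `ReplayResult` and `vao_select_greater` are hypothetical names, and the toy result object simply replays a fixed list of bounds (here the $98/$110 and $101/$108 values from the selection example slides).

```python
class ReplayResult:
    """Toy result object that replays a fixed sequence of (low, high) bounds."""

    def __init__(self, bounds):
        self._bounds = list(bounds)
        self._step = 0
        self.iterations = 0

    @property
    def low(self):
        return self._bounds[self._step][0]

    @property
    def high(self):
        return self._bounds[self._step][1]

    def iterate(self):
        """Move to the next, tighter pair of bounds."""
        if self._step + 1 >= len(self._bounds):
            raise RuntimeError("no further refinement available")
        self._step += 1
        self.iterations += 1


def vao_select_greater(result, threshold):
    """Return True if the value is known to exceed threshold, else False.

    Iterates only while the bounds still leave the predicate undecided.
    """
    while result.low <= threshold < result.high:
        result.iterate()
    return result.low > threshold


bond = ReplayResult([(98.0, 110.0), (101.0, 108.0)])
print(vao_select_greater(bond, 100.0), bond.iterations)
```

The loop exits either because the lower bound has passed the threshold (predicate true) or because the upper bound no longer exceeds it (predicate false).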
Example: MAX(f(x1), f(x2)) • [Figure: four plots of the f(x) bounds for x1 and x2: initial bounds, iterate over f(x1), iterate over f(x2), iterate over both.] • Need an iteration strategy that attempts to minimize cost
Solution: Greedy Strategy • Iterate over the object that has the best ratio of benefit to CPU cost among the current choices. • Good strategy if functions converge • Later iterations likely to have less benefit/unit cost • Operator-dependent
Example Revisited MAX(f(x1),f(x2)) • Goal State: no overlap between f(x1) and f(x2) • Greedy Strategy: • choose best overlap reduction per CPU cost • Use error reduction estimates to estimate overlap reduction. • Cost estimation depends on function.
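A rough sketch of the greedy loop for MAX, under simplifying assumptions that are not from the slides: the benefit of an iteration is approximated directly by the object's estimated error reduction, each iteration halves the bound width and doubles the cost, and a `min_width` guard ends the loop on ties. The names `HalvingResult` and `vao_max` are hypothetical.

```python
class HalvingResult:
    """Toy result object: symmetric bounds that halve on each iterate()."""

    def __init__(self, name, value, width, cost=1.0):
        self.name = name
        self._value = value   # hidden accurate value, used only to build bounds
        self.width = width
        self.cost = cost
        self.spent = 0.0      # total cost spent on this object so far

    @property
    def low(self):
        return self._value - self.width / 2.0

    @property
    def high(self):
        return self._value + self.width / 2.0

    def est_error_reduction(self):
        return self.width / 2.0

    def est_cost(self):
        return self.cost

    def iterate(self):
        self.spent += self.cost
        self.width /= 2.0
        self.cost *= 2.0      # assumption: each refinement costs twice as much


def vao_max(results, min_width=1e-9):
    """Return the object holding the maximum, refining greedily as needed."""
    while True:
        leader = max(results, key=lambda r: r.low)
        # Rivals are objects whose bounds still overlap the leader's bounds.
        rivals = [r for r in results if r is not leader and r.high > leader.low]
        if not rivals:
            return leader     # goal state: no overlap with the leader
        candidates = [r for r in rivals + [leader] if r.width > min_width]
        if not candidates:
            return leader     # bounds cannot be separated further; treat as a tie
        best = max(candidates, key=lambda r: r.est_error_reduction() / r.est_cost())
        best.iterate()


objs = [HalvingResult("x1", 105.0, 8.0),
        HalvingResult("x2", 100.0, 8.0),
        HalvingResult("x3", 50.0, 8.0)]
winner = vao_max(objs)
print(winner.name, [o.spent for o in objs])
```

Objects whose bounds never overlap the leader, such as `x3` here, receive no further work at all.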
Example Revisited • Determine if f(x1) > f(x2) • [Figure: f(x) bounds for x1 and x2, annotated with: $.04, 4 sec.; $.04, 4 sec.]
Example Revisited • Determine if f(x1) > f(x2) • [Figure: f(x) bounds for x1 and x2, with annotations as extracted: $.04 $.01 8 sec. 4 sec. $.04 $.02 4 sec. 4 sec.]
Example Revisited • Determine if f(x1) > f(x2) • [Figure: f(x) bounds for x1 and x2, with annotations as extracted: $.01 $0 8 sec. 8 sec. $0 $.02 4 sec. 8 sec.]
Aggregates
Performance Setup • Standalone implementation of the VAO framework in C++ • Used numeric bond model and bond data from [DF05] • Real Bond Data - 500 Mortgage-backed Securities. • Synthetic Bond Data - to stress test VAOs • Single Interest Rate.
VAO Implementation • Numeric bond model [S95] implemented with both the traditional and the VAO interface • Based on a PDE solver • VAO iterate(): double the size of the PDE grid • Bounds and error reduction estimates derived from the current and previous iteration results using Richardson's Extrapolation [BF01]
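The slides do not give the formulas, so the following is only a sketch of the textbook Richardson estimate, shown on a trapezoid-rule integral rather than on the PDE bond model. For a method whose error shrinks like h^p, the error of the finer of two results (step h/2 versus h) is roughly (fine - coarse) / (2^p - 1). The function names and the safety factor used to widen the bounds are assumptions made for illustration.

```python
import math


def trapezoid(f, a, b, n):
    """Composite trapezoid rule with n equal panels (error order p = 2)."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h


def richardson_error(coarse, fine, order=2):
    """Estimate (true value - fine) from results at step h and step h/2."""
    return (fine - coarse) / (2 ** order - 1)


def bounds(coarse, fine, order=2, safety=2.0):
    """Bounds around the fine result; 'safety' is a hypothetical widening factor."""
    err = abs(richardson_error(coarse, fine, order))
    return fine - safety * err, fine + safety * err


# Integral of sin over [0, pi] is exactly 2; doubling n plays the role of iterate().
coarse = trapezoid(math.sin, 0.0, math.pi, 8)
fine = trapezoid(math.sin, 0.0, math.pi, 16)
print(richardson_error(coarse, fine), 2.0 - fine)
print(bounds(coarse, fine))
```

The same pair of results also suggests how much the next doubling might help, since under the h^p assumption the error should drop by about a factor of 2^p again.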
Selection Performance • 500 bonds, 1 interest rate • Runtime depends on the number of bonds whose value is close to the predicate threshold.
Stress Test • Generate bonds with accurate values near the predicate • Gaussian, mean = predicate value, vary std. dev. • Std. dev. of real bonds: $7.78
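A small sketch of how such synthetic values might be drawn, assuming a predicate value of $100 and a band of $1 around it (both chosen here for illustration). The count of 500 and the $7.78 standard deviation are taken from the real-bond figures in the slides; the function names are hypothetical.

```python
import random


def synthetic_values(n, predicate_value, std_dev, seed=0):
    """Draw n accurate bond values from a Gaussian centered on the predicate value."""
    rng = random.Random(seed)
    return [rng.gauss(predicate_value, std_dev) for _ in range(n)]


def count_near(values, predicate_value, band):
    """Count values within +/- band of the predicate value."""
    return sum(1 for v in values if abs(v - predicate_value) <= band)


tight = synthetic_values(500, 100.0, 0.05)  # almost every bond is hard to decide
wide = synthetic_values(500, 100.0, 7.78)   # spread similar to the real bonds
print(count_near(tight, 100.0, 1.0), count_near(wide, 100.0, 1.0))
```

Shrinking the standard deviation packs more values next to the threshold, which is the regime where a selection has to iterate the most.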
In the Paper • Other Results • Max • Real bonds: 111 sec. vs. 6953 sec. • Synthetic bonds: VAOs better than traditional above $.05 std. dev. • Average • Up to 5x improvement if a small number of bonds are weighted heavily in average. • Details on Error and Cost estimates for PDE-based bond model. • Other types of models covered in Matt’s thesis.
Conclusion • Many emerging CQ applications require the repeated execution of expensive functions. • VAOs are new operators that change how these functions execute • Use new iterative API that exposes work-accuracy tradeoff in functions • Do only enough work to answer the query using greedy strategy to choose iterations • With real bond data and models, VAOs show 1-2 orders of magnitude improvement. • For more detailed information: mdenny@cs.berkeley.edu
The Advisor’s Dodge • [Figure: relative contribution to research (percent contribution, 0 to 100) versus time in program (years, 0 to 5), with curves for Student and Advisor and a marker for "This Work".] • Courtesy of Jennifer Widom
Bibliography • [HS93] J. M. Hellerstein and M. Stonebraker, “Predicate Migration: Optimizing Queries with Expensive Predicates”, SIGMOD 1993. • [HN96] J. M. Hellerstein and J. Naughton, “Query Execution Techniques for Caching Expensive Predicates”, SIGMOD 1996. • [DF05] M. Denny and M. J. Franklin, “Predicate Result Range Caching for Continuous Queries”, SIGMOD 2005.
Bibliography • [S95] R. Stanton, “Rational Prepayment and the Valuation of Mortgage-Backed Securities”, The Review of Financial Studies, Vol. 8, No. 3, 677-708. • [BF01] R. L. Burden and J. D. Faires, Numerical Analysis. Brooks/Cole, 2001.