Indexing For Function Approximation. Biswanath Panda, Mirek Riedewald, Stephen B. Pope, Johannes Gehrke, L. Paul Chew. Cornell University. VLDB 2006, Seoul
Motivation • Simulations are important in science • Large simulations computationally infeasible • Driven by complex mathematical models • Require solution to complex differential equations • Approximation techniques speed up simulations • Bounded error in the simulation • Approximate simulation steps using information from previous steps
Outline • Example scientific application • Combustion simulation • Function approximation problem • Formulation • Hardness • Algorithm • Indexing problem
Combustion Simulation [Figure labels: Inflow (Air, Methane); Mixing & Reaction (Air + Methane); Outflow; High Dimensional Composition Vector]
Properties Of Simulation • Composition dimensionality • 9 for simple hydrogen simulations • >50 for complex methane simulations • Cost of reaction function evaluation: 30 ms • Number of function evaluations: 10^8 to 10^10 • Total simulation time • 10^8 function evaluations ≈ 35 days
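The 35-day figure follows from the per-evaluation cost alone. A quick check, as a sketch (the function and constant names are illustrative, not from the slides):

```python
# Back-of-the-envelope check of the total simulation time quoted on the slide.
EVAL_COST_SECONDS = 0.030          # 30 ms per reaction function evaluation
SECONDS_PER_DAY = 24 * 60 * 60


def simulation_days(num_evaluations: float) -> float:
    """Days spent if every step pays the full function evaluation cost."""
    return num_evaluations * EVAL_COST_SECONDS / SECONDS_PER_DAY


for n in (1e8, 1e10):
    print(f"{n:.0e} evaluations: about {simulation_days(n):.0f} days")
```

At the upper end of the range (1e10 evaluations) the same arithmetic gives roughly a hundred times longer, which is what makes direct evaluation infeasible.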
Function Approximation • Approximate the reaction function • Approach • Use previous function evaluations to approximate future function evaluations • ISAT (In Situ Adaptive Tabulation) [Pope '97] • Definition: ε-approximation of f(x) • Let f: R^m → R^n be a function, let x ∈ R^m and ε ∈ R. f*(x) is an ε-approximation of f(x) if ||f*(x) - f(x)|| < ε
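The definition can be read as a simple predicate. A minimal sketch, assuming the Euclidean norm (the slide does not say which norm is meant) and a hypothetical function name:

```python
import math


def is_eps_approximation(f_star_x, f_x, eps):
    """True if ||f*(x) - f(x)|| < eps.

    f_star_x and f_x are the approximate and exact outputs in R^n,
    given as equal-length sequences. Euclidean norm is an assumption.
    """
    return math.dist(f_star_x, f_x) < eps


# Example: the approximation is off by 0.05 in one output component.
print(is_eps_approximation([1.0, 2.0], [1.0, 2.05], eps=0.1))
```

Note the strict inequality: an error exactly equal to ε does not qualify.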
Example [Figure: the function f, with a Cost axis]
Example • Approximation built from an evaluated point (x, f(x)): f*(x2) = f(x) + s * (x2 - x) • An ε-Local Region: R_{f,f*}(x, ε) ⊆ R^m [Figure: f with tolerance ε and query points x1, x2; Original Cost vs. Cost]
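The approximation formula on this slide can be tried on a toy function. A sketch in one dimension, assuming f(x) = x^2 as a stand-in for the reaction function and taking s to be the derivative at x (both are illustrative choices, not from the slides):

```python
def linear_approx(x, f_x, s, x2):
    """f*(x2) = f(x) + s * (x2 - x), using a stored point (x, f(x)) and slope s."""
    return f_x + s * (x2 - x)


def f(x):
    # Toy stand-in for the expensive function.
    return x * x


x = 1.0
s = 2.0 * x                      # derivative of x^2 at x, assumed known
for x2 in (1.1, 1.5):
    error = abs(linear_approx(x, f(x), s, x2) - f(x2))
    print(f"x2={x2}: error={error:.4f}")
```

The error grows with the distance from x, which is why the approximation is only trusted inside a bounded region around the stored point.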
Example [Figure: f with approximations f1*, f2*, f3* over query points x1 to x6; Original Cost vs. Cost]
Example • When should a local region be added? [Figure: f with approximations f1*, f2*, f3* over query points x1 to x6]
Example • Each query point can be covered by several Local Regions [Figure: f with approximations f1*, f2*, f3*, f4* over query points x1 to x8]
Challenges • Finding good f*s and corresponding Local Regions • Computing a set of Local Regions • Data management: storing Local Regions for future use • Problem: Minimize total simulation time by computing and storing a set of Local Regions
Finding The Optimal Set Of Local Regions • Simplified cost model • Both the function value and Local Region at a point can be obtained at some constant cost equal across all regions • Approximations have zero cost • Offline Problem • Given a set X = {x1, x2, …, xn} of query points, find the smallest set L = {l1, l2, …, lk} of Local Regions, such that for each xi ∈ X there is an lj ∈ L which contains xi • NP-Complete: Reduction from Geometric Covering By Discs • Online Problem • No online algorithm is competitive
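Because the offline problem is a covering problem, the standard greedy set-cover heuristic gives a feel for it. This is a generic heuristic, not the method from the slides, and it does not guarantee the smallest set. The sketch assumes candidate regions are given up front as sets of the query points they contain (a hypothetical input format):

```python
def greedy_cover(points, candidate_regions):
    """Pick regions greedily until every query point is covered.

    points: iterable of hashable query points.
    candidate_regions: dict mapping a region name to the set of points it contains.
    Returns the chosen region names in the order they were picked.
    """
    uncovered = set(points)
    chosen = []
    while uncovered:
        # Take the region that covers the most still-uncovered points.
        best = max(candidate_regions,
                   key=lambda name: len(candidate_regions[name] & uncovered))
        gained = candidate_regions[best] & uncovered
        if not gained:
            raise ValueError("some query points are not in any candidate region")
        chosen.append(best)
        uncovered -= gained
    return chosen


regions = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5, 6}, "D": {1, 6}}
print(greedy_cover([1, 2, 3, 4, 5, 6], regions))
```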
Algorithm Illustration [Figure: f with approximations f1*, f2*, f3*, f4* over query points x1 to x8]
Algorithm • Initialize S • For each query point x from the simulation • Retrieve: look up x in S • Local Region found? • Y: return approximation • N: evaluate function at x • Add: add new region containing x to S
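The retrieve / evaluate / add loop can be sketched in a few lines. This is a simplified one-dimensional illustration, not the authors' implementation: the `Region` class, the fixed `radius`, the constant approximation inside a region, and the linear scan over S are all assumptions made to keep the example short.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical 1-D local region: an interval with a constant approximation."""
    center: float
    value: float      # f(center), reused as the approximation inside the region
    radius: float

    def contains(self, x: float) -> bool:
        return abs(x - self.center) <= self.radius


def query(S: list, f, x: float, radius: float):
    """Return (value, was_hit) for query point x, updating S on a miss."""
    # Retrieve: look up x in S.
    for region in S:
        if region.contains(x):
            return region.value, True        # local region found: return approximation
    # Miss: evaluate the expensive function, then Add a new region containing x.
    fx = f(x)
    S.append(Region(center=x, value=fx, radius=radius))
    return fx, False


S = []                                        # Initialize S
for x in (1.0, 1.05, 2.0):
    print(x, query(S, lambda v: v * v, x, radius=0.1))
```

The second query is answered from the region stored by the first, so only two of the three queries pay for a function evaluation.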
Possible Instantiation Of Local Regions • Local Regions can be approximated using high dimensional ellipsoids [Pope '97] • Based on Taylor Expansion of function • Two step approach • Initial conservative approximation • Grow [Figure labels: x, x1]
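Testing whether a query point lies in an ellipsoidal region reduces to evaluating a quadratic form. A minimal sketch, assuming the ellipsoid is stored as a center c and a symmetric positive definite matrix A with membership defined by (x - c)^T A (x - c) <= 1; the slides do not state the representation, so this is only one common choice:

```python
def in_ellipsoid(x, center, A):
    """True if (x - center)^T A (x - center) <= 1.

    x and center are equal-length sequences; A is a square matrix given
    as a list of rows. The representation is an assumption.
    """
    d = [xi - ci for xi, ci in zip(x, center)]
    n = len(d)
    q = sum(d[i] * A[i][j] * d[j] for i in range(n) for j in range(n))
    return q <= 1.0


# Axis-aligned ellipse centered at the origin with semi-axes 2 and 1.
A = [[0.25, 0.0],
     [0.0, 1.0]]
print(in_ellipsoid([1.5, 0.5], [0.0, 0.0], A))
print(in_ellipsoid([1.5, 0.8], [0.0, 0.0], A))
```

The test costs O(m^2) per ellipsoid in m dimensions, which is one reason the number of ellipsoids examined per query matters.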
Example [Figure: region around x with ε' < ε; points x1, x2]
Example [Figure: region around x with ε' < ε; points x'1, x'2]
Example [Figure: region around x with ε' < ε and ε; points x'1, x'2]
Updating Existing Regions • On a miss (N): evaluate function at x • Can existing region contain x? • Y: Grow: update existing regions to contain x • N: Add: add new region containing x to S
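The grow-or-add decision on a miss can be sketched as follows. This is a one-dimensional illustration under stated assumptions: regions are intervals carrying a linear approximation, "can contain x" is read as "the region's approximation at x is within ε of the freshly evaluated f(x)", and only the first such region is grown. The names `Region`, `handle_miss`, `df`, and `init_radius` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """Hypothetical 1-D region [lo, hi] with a linear approximation around x0."""
    x0: float
    fx0: float
    slope: float
    lo: float
    hi: float

    def approx(self, x: float) -> float:
        return self.fx0 + self.slope * (x - self.x0)


def handle_miss(S, f, df, x, eps, init_radius):
    """Called after retrieve failed for x. Returns (f(x), "grow" or "add")."""
    fx = f(x)                                   # evaluate function at x
    for r in S:
        # Can an existing region contain x? Check its approximation error at x.
        if abs(r.approx(x) - fx) < eps:
            r.lo, r.hi = min(r.lo, x), max(r.hi, x)   # Grow the region to reach x
            return fx, "grow"
    # No region can be grown: Add a new, conservatively sized region around x.
    S.append(Region(x, fx, df(x), x - init_radius, x + init_radius))
    return fx, "add"
```

With f(x) = x^2, ε = 0.05 and an initial radius of 0.05, a miss at 1.0 adds a region, a later miss at 1.2 grows it (the approximation error there is about 0.04), and a miss at 2.0 adds a second region.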
Outline • Example scientific application • Combustion Simulation • Function Approximation Problem • Formulation • Hardness • Algorithm • Indexing problem
Indexing Problem • Workload • Retrieve: Find ellipsoid containing query point • Grow • Find ellipsoids to be grown • Update grown ellipsoids • Add: Insert a new ellipsoid
New Indexing Problem • Shape of regions • Updates and queries interleaved • Additional costs: ellipsoid maintenance costs • Overall aim: Reduce total simulation time • Retrieve/grow/add are all optional • Tuning parameters at each step
Outline • Example scientific application • Combustion simulation • Function approximation problem • Formulation • Hardness • Algorithm • Indexing problem • Cost structure, tuning parameters and effects • Index structures and experiments
Grow Effects • C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add • Tuning Parameter: Ellg • Limit on number of ellipsoids examined for growing • No pruning criteria • Affects • t_growsearch • Chance of finding a growable ellipsoid • Tuning Parameter: Ngrown • Number of ellipsoids grown per step • Affects • C_grow • Structure of the index (overlapping ellipsoids)
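The miss-cost formula is easy to evaluate for concrete numbers. A sketch, reading I_grow as a 0/1 indicator of whether a growable ellipsoid was found (the slide does not define it, so this reading is an assumption). Only the 30 ms function evaluation cost comes from the slides; the other values are made up for illustration.

```python
def miss_cost(t_f, t_growsearch, i_grow, c_grow, c_add):
    """C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add."""
    return t_f + t_growsearch + i_grow * c_grow + (1 - i_grow) * c_add


# Costs in milliseconds; everything except t_f = 30 is illustrative.
print(miss_cost(t_f=30, t_growsearch=2, i_grow=1, c_grow=5, c_add=20))   # grow path
print(miss_cost(t_f=30, t_growsearch=2, i_grow=0, c_grow=5, c_add=20))   # add path
```

Raising the limit on ellipsoids examined increases t_growsearch on every miss, but makes the cheaper grow path more likely.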
Retrieve Effects • C_tot = t_search + I_ret * t_la + (1 - I_ret) * C_miss • Tuning Parameter: Ellr • Limit on number of ellipsoids examined during retrieve • Limits how much of the index is searched • Affects • t_search • Chance that the current retrieve, and future retrieves, succeed
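The per-query cost formula can be evaluated the same way. A sketch, reading I_ret as a 0/1 indicator of a successful retrieve and t_la as the cost of computing the approximation on a hit (the slide defines neither, so both readings are assumptions). Passing a hit rate between 0 and 1 instead of an indicator gives an expected cost, which is an extension beyond what the slide states. All numbers are illustrative.

```python
def total_cost(t_search, i_ret, t_la, c_miss):
    """C_tot = t_search + I_ret * t_la + (1 - I_ret) * C_miss."""
    return t_search + i_ret * t_la + (1 - i_ret) * c_miss


# Costs in milliseconds (illustrative values).
print(total_cost(t_search=1.0, i_ret=1, t_la=0.5, c_miss=37.0))    # hit
print(total_cost(t_search=1.0, i_ret=0, t_la=0.5, c_miss=37.0))    # miss
print(total_cost(t_search=1.0, i_ret=0.9, t_la=0.5, c_miss=37.0))  # expected, 90% hits
```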
Add Effects • C_miss = t_f + t_growsearch + I_grow * C_grow + (1 - I_grow) * C_add • Tuning parameter: Indirectly controlled by retrieves and grows • Affects • Should query point be covered by an add or grow? • (-) Computing new ellipsoids is expensive • (-) New ellipsoids cover smaller part of the domain • (+) May lead to better ellipsoid distribution
Candidate Index Structures • Bounding Box Rtree • Point Rtree • Ellipsoid Rtree • Random Projection Rtree • Binary Tree • MRU List + Rtree
Binary Tree [Figure: tree and region diagram with labels 1, 2, A, B, C and query point q; Primary Retrieve]
Binary Tree [Figure: tree and region diagram with labels 1, 2, A, B, C and query point q; Secondary Retrieve]
Binary Tree [Figure: tree and region diagram with labels 1, 2, A, B, C]
Binary Tree [Figure: tree and region diagram with labels 1, 2, 3, A, B, C, D; Secondary Retrieve now Primary Retrieve]
Effects In Action: Binary Tree • 32-dimensional methane simulation • 6 x 10^6 queries • Windows XP machine (2.4 GHz, 2 GB)
MRU List + Rtree • MRU List for retrieving • High locality • Rtree for searching growable ellipsoids [Figure labels: Rtree, MRU List]
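The retrieve side of this design can be sketched as a most-recently-used list: scan from the front, and move a hit to the front so that nearby follow-up queries find it quickly. This is an illustrative sketch only. The function name, the `contains` callback, and the `max_examined` cap (playing the role of a limit on ellipsoids examined during retrieve) are assumptions, and the Rtree used for grow searches is not shown.

```python
def mru_retrieve(mru, x, contains, max_examined=None):
    """Return the first region in mru containing x, or None.

    mru: list of regions, most recently used first (reordered on a hit).
    contains: callable (region, x) -> bool.
    max_examined: optional cap on how many regions are tested.
    """
    for i, region in enumerate(mru):
        if max_examined is not None and i >= max_examined:
            break
        if contains(region, x):
            mru.insert(0, mru.pop(i))     # move the hit to the front
            return region
    return None


# 1-D intervals stand in for ellipsoids.
regions = [(0, 1), (2, 3), (4, 5)]
inside = lambda r, x: r[0] <= x <= r[1]
print(mru_retrieve(regions, 4.5, inside), regions)
```

With high query locality, most hits land on the first few entries, so the linear scan stays short in practice.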
Effects In Action: MRU List + Rtree • Effects very different from Binary Tree
Total Simulation Times
Conclusion & Future Work • Formulated the function approximation problem • New class of applications for high dimensional indexing • Understand index selection for function approximation • Future work • Dynamic parameter settings • New benchmark for index structures • Evaluation of other index structures • Comparison with other function approximation techniques
Questions?