Advanced Database Aggregation Query Processing Donghui Zhang Computer Science Department University of California, Riverside EDBT Ph.D. Workshop 2002
Aggregation Problem • Maintain a set of objects, each having a value. Given a condition that holds for a subset of the objects, compute the total value of the objects in this subset. • E.g. “find the total salary of employees who joined the company less than a year ago”.
Aggregation over Objects with Extent • Objects with extent, as opposed to point objects. • Real-life applications: temporal, spatial, etc. • An employee works for the company during a certain period of time; “find the total salary of employees who worked for the company during 1999”. • A rainfall record covers a spatial region; “find the total volume of rainfall in Los Angeles”.
Functional Box-Sum • Maintain a set of objects, each having a box and a value function; • given a query box q, compute the total value of objects intersecting q, where • the contribution of an object is the integral of its value function over its intersection with q.
Functional Box-Sum • functional box-sum: 4*50+3*12 = 236.
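The arithmetic on this slide can be reproduced with a small brute-force sketch. It assumes constant value functions, so an object's contribution is its value times the area of its overlap with the query box. The slide's figure is not available, so the coordinates below are hypothetical, chosen only so that the overlaps have areas 50 and 12; the names `Box`, `overlap_area` and `functional_box_sum_const` are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x1: float
    y1: float
    x2: float
    y2: float


def overlap_area(a: Box, b: Box) -> float:
    """Area of the intersection of two axis-parallel boxes (0 if disjoint)."""
    w = min(a.x2, b.x2) - max(a.x1, b.x1)
    h = min(a.y2, b.y2) - max(a.y1, b.y1)
    return w * h if w > 0 and h > 0 else 0.0


def functional_box_sum_const(objects, q: Box) -> float:
    """Scan every (box, value) pair; each contributes value * overlap area."""
    return sum(value * overlap_area(box, q) for box, value in objects)


# Hypothetical data: overlaps with q have areas 50 and 12.
objects = [(Box(0, 0, 10, 10), 4), (Box(17, 6, 25, 12), 3)]
q = Box(5, 0, 20, 10)
total = functional_box_sum_const(objects, q)  # 4*50 + 3*12
```

This scan touches every object for every query, which is the cost that an index is meant to avoid.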
Functional Box-Sum • Moreover, object value can be a function; • FBS = $\int_{15}^{20} (11-7)(x-2)\,dx = 310$.
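The value 310 can be checked in closed form. The sketch below reads the slide's integrand as a value function f(x, y) = x - 2 integrated over an intersection that spans x from 15 to 20 and y from 7 to 11; that reading of the (lost) figure is an assumption, and `integrate_linear_x` is a hypothetical helper that handles any value function of the form a*x + b.

```python
def integrate_linear_x(a, b, x1, x2, y1, y2):
    """Integral of f(x, y) = a*x + b over the rectangle [x1, x2] x [y1, y2]."""

    def antiderivative(x):
        # Antiderivative of a*x + b with respect to x.
        return a * x * x / 2 + b * x

    # f does not depend on y, so the y-integral is just the height (y2 - y1).
    return (y2 - y1) * (antiderivative(x2) - antiderivative(x1))


# (11 - 7) * integral of (x - 2) dx for x in [15, 20]
fbs = integrate_linear_x(a=1, b=-2, x1=15, x2=20, y1=7, y2=11)
```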
Straightforward Approaches • No index: for each query, scan through all records. Not efficient. • Maintain the objects in an R-tree (which speeds up the selection query). To compute an aggregate, select the objects and aggregate their values on the fly. Query time: O(n).
Our Solution • We reduce the functional box-sum problem to a simpler problem (the dominance-sum problem) and build an index specialized for computing dominance-sums. • Instead of storing the original data, the specialized index stores specially aggregated information, which leads to O(log2n) query time.
Functional Box-Sum → OIFBS • A special case of functional box-sum is OIFBS (Origin-Involved Functional Box-Sum), where the query box contains the origin of space. • A functional box-sum query can be reduced to OIFBS queries: we compute the OIFBS from the origin to the upper-right corner of the query box, then subtract the parts to the left of and below the query box (which are also OIFBS queries), adding back the lower-left part that was subtracted twice.
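In two dimensions this reduction is an inclusion-exclusion over four OIFBS queries: the region to the lower left of the query box is removed twice by the two subtractions, so it is added back once. The sketch below illustrates this with a brute-force `oifbs` standing in for the indexed version. It assumes constant value functions and non-negative coordinates; all names and data are hypothetical.

```python
def overlap(lo1, hi1, lo2, hi2):
    """Length of the intersection of intervals [lo1, hi1] and [lo2, hi2]."""
    return max(0.0, min(hi1, hi2) - max(lo1, lo2))


def oifbs(objects, px, py):
    """Brute-force OIFBS for the query box [origin, (px, py)].

    objects: iterable of (x1, y1, x2, y2, value), constant value per object.
    """
    return sum(v * overlap(x1, x2, 0, px) * overlap(y1, y2, 0, py)
               for (x1, y1, x2, y2, v) in objects)


def box_sum_via_oifbs(objects, qx1, qy1, qx2, qy2):
    """General box query answered with four origin-involved queries."""
    return (oifbs(objects, qx2, qy2)     # origin to upper-right corner
            - oifbs(objects, qx1, qy2)   # part to the left of the query box
            - oifbs(objects, qx2, qy1)   # part below the query box
            + oifbs(objects, qx1, qy1))  # lower-left part, subtracted twice


objects = [(0, 0, 10, 10, 4), (17, 6, 25, 12, 3)]
result = box_sum_via_oifbs(objects, 5, 0, 20, 10)
```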
Dominance-Sum • Maintain a set of weighted points; • given a query point p, compute the total weight of the points dominated by p (i.e. to the lower left of p). • Example: dominance-sum = 18.
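As a point of reference, a dominance-sum can be computed by a linear scan, as in the minimal sketch below. The points and weights are hypothetical (they are not the ones from the slide's figure), and treating points on the boundary as dominated is an assumption.

```python
def dominance_sum(points, px, py):
    """Total weight of points (x, y, w) with x <= px and y <= py."""
    return sum(w for x, y, w in points if x <= px and y <= py)


# Hypothetical weighted points: (x, y, weight).
points = [(1, 1, 3), (2, 5, 4), (6, 2, 5), (7, 7, 6)]
total = dominance_sum(points, 6.5, 6)  # dominates the first three points
```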
OIFBS → Dominance-Sum • Idea: to insert an object (with a rectangular region), insert its corner points, associating a function with each corner. • To compute an OIFBS for the box [origin, p], compute the dominance-sum for p, i.e. the sum of the functions associated with the points dominated by p.
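One way to realize this idea for constant value functions is sketched below; it is an illustration, not necessarily the exact scheme used by the authors. Each corner (cx, cy) of a box with value v carries the function ±v·(x − cx)·(y − cy), stored as the coefficients (a, b, c, d) of a·x·y + b·x + c·y + d. Because functions are added coefficient by coefficient, the dominance-sum is a sum of 4-tuples that is evaluated once at p. The linear scan stands in for an index, and all names are hypothetical.

```python
def corner_records(x1, y1, x2, y2, v):
    """Four corner points, each weighted by polynomial coefficients.

    The corner (cx, cy) carries sign * v * (x - cx) * (y - cy), expanded
    into (a, b, c, d) for a*x*y + b*x + c*y + d.
    """
    records = []
    for cx, cy, sign in ((x1, y1, 1), (x2, y1, -1), (x1, y2, -1), (x2, y2, 1)):
        s = sign * v
        records.append((cx, cy, (s, -s * cy, -s * cx, s * cx * cy)))
    return records


def oifbs_via_dominance(records, px, py):
    """Sum the coefficients of all corners dominated by p, then evaluate at p."""
    a = b = c = d = 0.0
    for cx, cy, (ra, rb, rc, rd) in records:
        if cx <= px and cy <= py:
            a, b, c, d = a + ra, b + rb, c + rc, d + rd
    return a * px * py + b * px + c * py + d


records = (corner_records(0, 0, 10, 10, 4)
           + corner_records(17, 6, 25, 12, 3))
value = oifbs_via_dominance(records, 20, 10)  # 4*100 + 3*(3*4)
```

A corner lying exactly on the boundary of the query contributes zero at p, so the choice between strict and non-strict dominance does not change the result here.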
New Dominance-Sum Index • For the dominance-sum problem, we propose the BA-tree: • a k-d-B-tree augmented with additional information at index records. • O(log2n) query time, when balanced.
Performance • [Figure: functional box-sum query cost]
Summary of Our Aggregation Work • The functional box-sum solution described here is to appear in [PODS’02]. • Also in [PODS’02], we solved a variation: the simple box-sum aggregation problem, which is to find the total value of objects intersecting the query rectangle. • We have also solved some other aggregation problems...
Range-Temporal Aggregation • Maintain a set of temporal records, each having a key, a value and a time interval. Given a key range r and time interval i, compute the total value of records whose keys are in r and whose intervals intersect i. • Appeared in [PODS’01].
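The query semantics can be pinned down with a brute-force sketch. It assumes closed key ranges and closed time intervals, so two intervals intersect when each starts no later than the other ends. The record layout and data are hypothetical, and the linear scan stands in for the index from the paper.

```python
def range_temporal_sum(records, key_lo, key_hi, t_start, t_end):
    """Total value of records with key in [key_lo, key_hi] whose
    interval [start, end] intersects [t_start, t_end]."""
    return sum(value
               for key, value, start, end in records
               if key_lo <= key <= key_hi
               and start <= t_end and t_start <= end)


# Hypothetical records: (key, value, start, end).
records = [
    (10, 100, 1, 5),
    (20, 200, 4, 9),
    (30, 300, 6, 8),
    (40, 400, 1, 3),
]
total = range_temporal_sum(records, 15, 35, 5, 6)  # keys 20 and 30 qualify
```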
Temporal Aggregation over Data Streams • Temporal aggregation when records accumulate in a streaming manner. Storage is limited, but we want to answer aggregation queries both for recent data and for older data. • To appear in [EDBT’02].
Box-Max Aggregation • Maintain a set of spatial objects, each having a spatial region and a value. Given a query region r, find the Min/Max value over all objects intersecting r. • Appeared in [GIS’01].
Conclusions • We have proposed specialized index structures for various complex aggregation problems. • In all cases, our proposed methods have much better query performance than the existing approaches, sometimes over 100 times faster. • We recommend implementing these indices in commercial DBMSs for cases where aggregates must be computed very fast.