Efficient Computation of Temporal Aggregates with Range Predicates

D. Zhang*, A. Markowetz**, V. J. Tsotras*, D. Gunopulos* and B. Seeger**
* University of California, Riverside
** Philipps Universität Marburg, Germany

Outline
• Introduction & Motivation
• Problem Decomposition
• The MVSB-tree
• Performance Results
• Conclusions

Introduction & Motivation
• Consider a collection of temporal records.
• Each record: key k, value v, time interval [t1, t2].
• E.g.: employees and their salaries over time.
• Temporal aggregation: aggregate values over time.
• Focus on SUM/COUNT/AVG.

Previous Work
• ‘Given time t, aggregate over all records that contain t’ [Tum92, KS95, YK97, GHR+, MLI00]. E.g., the sum at t2 is 13.
• ‘Given interval [t1, t2], aggregate over all records that intersect [t1, t2]’ (SB-tree [YW01]). E.g., the sum over [t1, t2] is 28.
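
The two earlier query types can be pinned down with a brute-force scan. This is an illustrative sketch only: the `Record` type, the sample data and the function names are hypothetical, and it assumes integer times and closed intervals [start, end]. COUNT follows by summing 1 instead of `value`, and AVG is SUM / COUNT.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    key: int
    start: int  # the interval is closed: [start, end]
    end: int
    value: int


def sum_at(records: List[Record], t: int) -> int:
    """Instantaneous SUM: records whose interval contains time t."""
    return sum(r.value for r in records if r.start <= t <= r.end)


def sum_over(records: List[Record], t1: int, t2: int) -> int:
    """Cumulative SUM: records whose interval intersects [t1, t2]."""
    return sum(r.value for r in records if r.start <= t2 and r.end >= t1)


records = [
    Record(key=10, start=1, end=5, value=3),
    Record(key=20, start=4, end=9, value=7),
    Record(key=30, start=6, end=8, value=2),
]
print(sum_at(records, 5))       # 10: the first two records contain t = 5
print(sum_over(records, 6, 9))  # 9: the last two records intersect [6, 9]
```

Both functions scan every record; the cited index structures exist to avoid that linear scan.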

Range-Temporal Aggregation (RTA)
• ‘Aggregate over all records intersecting interval [t1, t2] with keys in range [k1, k2]’. E.g., the RTA-sum over [k1, k2] x [t1, t2] is 19.
• Example: find the AVG salary over the past ten years of all employees whose last names start with ‘B’.

• Previous approaches would need a separate index for each possible key range, which is inefficient.
• Alternative:
  • index the records;
  • run a selection query, ‘find all records intersecting [k1, k2] x [t1, t2]’, and aggregate the result;
  • query time is O(n).
• Our solution: O(log_b n).
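
The O(n) alternative amounts to selecting the records inside the key-time rectangle and aggregating them on the fly. A minimal sketch, assuming integer keys and times, closed time intervals and (our choice, not the slides') a half-open key range [k1, k2), which is what differencing "keys less than k" queries yields in the decomposition; all names are hypothetical.

```python
from typing import List, NamedTuple, Optional, Tuple


class Record(NamedTuple):
    key: int
    start: int  # closed time interval [start, end]
    end: int
    value: int


def rta_scan(records: List[Record], k1: int, k2: int,
             t1: int, t2: int) -> Tuple[int, int]:
    """Baseline RTA: one pass over all n records, so O(n) per query.

    Returns (SUM, COUNT) over records with k1 <= key < k2 whose
    interval intersects [t1, t2].
    """
    total, count = 0, 0
    for r in records:
        if k1 <= r.key < k2 and r.start <= t2 and r.end >= t1:
            total += r.value
            count += 1
    return total, count


def rta_avg(records: List[Record], k1: int, k2: int,
            t1: int, t2: int) -> Optional[float]:
    """AVG = SUM / COUNT; None when no record qualifies."""
    total, count = rta_scan(records, k1, k2, t1, t2)
    return total / count if count else None


records = [
    Record(key=10, start=1, end=5, value=3),
    Record(key=20, start=4, end=9, value=7),
    Record(key=30, start=6, end=8, value=2),
]
print(rta_scan(records, 15, 35, 6, 9))  # (9, 2)
print(rta_avg(records, 15, 35, 6, 9))   # 4.5
```

Every query pays for a full pass, however small the rectangle is.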

Problem Decomposition
• Decompose RTA into LKST and LKLT queries.
• LKST query: given k, t, aggregate over all records with keys less than k and intervals containing t. E.g., LKST(k2, t2) = 11.

• LKLT query: given k, t, aggregate over all records with keys less than k and intervals ending before t. E.g., LKLT(k2, t2) = 20.
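
Both query types can be written down as brute-force definitions, which is handy as a reference when testing an index. A sketch under our own assumptions: integer keys and times, closed intervals, and "ending before t" read strictly as end < t (so a record ending exactly at t still contains t for LKST). Names and data are hypothetical.

```python
from typing import List, NamedTuple


class Record(NamedTuple):
    key: int
    start: int  # closed time interval [start, end]
    end: int
    value: int


def lkst(records: List[Record], k: int, t: int) -> int:
    """LKST(k, t): SUM over records with key < k whose interval contains t."""
    return sum(r.value for r in records if r.key < k and r.start <= t <= r.end)


def lklt(records: List[Record], k: int, t: int) -> int:
    """LKLT(k, t): SUM over records with key < k whose interval ends before t."""
    return sum(r.value for r in records if r.key < k and r.end < t)


records = [
    Record(key=10, start=1, end=5, value=3),
    Record(key=20, start=4, end=9, value=7),
    Record(key=30, start=6, end=8, value=2),
]
print(lkst(records, 25, 5))   # 10: keys 10 and 20 are < 25 and both contain t = 5
print(lklt(records, 35, 10))  # 12: all three records end before t = 10
```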

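RTA([k1, k2] x [t1, t2]) = LKST(k2, t2) - LKST(k1, t2) + LKLT(k2, t2) - LKLT(k1, t2) - LKLT(k2, t1) + LKLT(k1, t1)
• The RTA query is thus decomposed into LKST and LKLT point queries.
• Why it works: LKST(k, t2) + LKLT(k, t2) covers the records with keys less than k that started no later than t2; subtracting LKLT(k, t1) removes those that ended before t1, leaving exactly the records that intersect [t1, t2]; the difference between the k2 and k1 terms then keeps only keys from k1 up to (but not including) k2.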
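
The identity can be checked mechanically against a brute-force scan on random data. This is only a sanity check, not the MVSB-tree: it assumes integer keys and times, closed intervals, "ending before t" meaning end < t, and t1 <= t2. Because LKST and LKLT use "keys less than k", the key differences select k1 <= key < k2, so the brute-force side uses that half-open range. All names are hypothetical.

```python
import random
from typing import List, NamedTuple


class Record(NamedTuple):
    key: int
    start: int  # closed time interval [start, end]
    end: int
    value: int


def lkst(rs: List[Record], k: int, t: int) -> int:
    return sum(r.value for r in rs if r.key < k and r.start <= t <= r.end)


def lklt(rs: List[Record], k: int, t: int) -> int:
    return sum(r.value for r in rs if r.key < k and r.end < t)


def rta_scan(rs: List[Record], k1: int, k2: int, t1: int, t2: int) -> int:
    """Brute force: k1 <= key < k2 and interval intersects [t1, t2]."""
    return sum(r.value for r in rs
               if k1 <= r.key < k2 and r.start <= t2 and r.end >= t1)


def rta_decomposed(rs: List[Record], k1: int, k2: int, t1: int, t2: int) -> int:
    """The six-term decomposition into LKST and LKLT point queries."""
    return (lkst(rs, k2, t2) - lkst(rs, k1, t2)
            + lklt(rs, k2, t2) - lklt(rs, k1, t2)
            - lklt(rs, k2, t1) + lklt(rs, k1, t1))


rng = random.Random(0)
records = []
for _ in range(200):
    start = rng.randint(0, 50)
    records.append(Record(key=rng.randint(0, 99), start=start,
                          end=start + rng.randint(0, 20),
                          value=rng.randint(1, 10)))

for _ in range(1000):
    k1, k2 = sorted(rng.sample(range(101), 2))
    t1, t2 = sorted(rng.sample(range(72), 2))
    assert rta_scan(records, k1, k2, t1, t2) == rta_decomposed(records, k1, k2, t1, t2)
print("decomposition agrees with brute force")
```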

Index Design
• Both LKST and LKLT are point queries: ‘given k, t, return a value’.
• An index for LKST and LKLT should:
  • store points in key-time space;
  • maintain a value for each point;
  • support point queries.

Model
• Assume updates arrive in increasing time order (transaction-time model).
(Figure omitted: a record, how it is inserted at t1, and how it is updated at t2.)

The LKST index
• The effect of inserting record (k, [t1, t2], v): updates at t1 and at t2 (figure omitted).

The LKLT index
• The effect of inserting record (k, [t1, t2], v): no update at t1; an update at t2 (figure omitted).
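
Expressed as point updates of the form (k, t, v), meaning "add v to every query point (k', t') with k' >= k and t' >= t", each record becomes two LKST updates and one LKLT update. The sketch below is one way to derive them, assuming integer keys and times and closed intervals; the "+1" shifts (key + 1, end + 1) are our own device for turning the strict "key < k" and "contains t" / "ends before t" conditions into "corner <= query point" tests, so the LKST update "at t2" lands at t2 + 1 here. Function names are hypothetical, and `point_query` is a brute-force stand-in for a real index.

```python
from typing import List, NamedTuple, Tuple


class Record(NamedTuple):
    key: int
    start: int  # closed time interval [start, end]
    end: int
    value: int


# (k, t, v): add v to every query point (k', t') with k' >= k and t' >= t.
Update = Tuple[int, int, int]


def lkst_updates(r: Record) -> List[Update]:
    """+v when the record starts, -v right after it ends."""
    return [(r.key + 1, r.start, r.value), (r.key + 1, r.end + 1, -r.value)]


def lklt_updates(r: Record) -> List[Update]:
    """A single +v right after the record ends; no update at its start."""
    return [(r.key + 1, r.end + 1, r.value)]


def point_query(updates: List[Update], k: int, t: int) -> int:
    """Value at (k, t): sum of all updates whose corner is dominated by (k, t)."""
    return sum(v for uk, ut, v in updates if uk <= k and ut <= t)


r = Record(key=20, start=4, end=9, value=7)
print(point_query(lkst_updates(r), 21, 4))   # 7: key 20 < 21 and t = 4 lies in [4, 9]
print(point_query(lkst_updates(r), 21, 10))  # 0: the interval is over at t = 10
print(point_query(lklt_updates(r), 21, 10))  # 7: the record ended before t = 10
print(point_query(lklt_updates(r), 20, 10))  # 0: key 20 is not < 20
```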

Update Operation
• Common update operation for both indices: insert (k, t):v.
• That is: add v to all points in [k, kmax] x [t, tmax], i.e., the key-time rectangle with corners (k, t) and (kmax, tmax).
• Conclusion: an index that supports point queries and the above update can be used for both LKLT and LKST.
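
Since each update covers the rectangle from (k, t) up to (kmax, tmax), the value at a point (k, t) is simply the sum of all updates whose corner lies at or below (k, t) in both dimensions (a dominance sum). As an in-memory illustration only, the sketch below swaps in a two-dimensional Fenwick (binary indexed) tree over a fixed grid of keys 1..max_key and times 1..max_time; this is not the MVSB-tree, which is disk-based, partially persistent and does not need the time domain bounded in advance. Class and method names are hypothetical.

```python
from typing import List


class DominanceSum2D:
    """2D Fenwick tree: insert(k, t, v) adds v to every point in
    [k, max_key] x [t, max_time]; point_query(k, t) returns a point's value."""

    def __init__(self, max_key: int, max_time: int) -> None:
        self.nk, self.nt = max_key, max_time
        self.tree: List[List[int]] = [[0] * (max_time + 1) for _ in range(max_key + 1)]

    def insert(self, k: int, t: int, v: int) -> None:
        if k < 1 or t < 1:
            raise ValueError("keys and times start at 1 in this sketch")
        i = k
        while i <= self.nk:
            j = t
            while j <= self.nt:
                self.tree[i][j] += v
                j += j & -j
            i += i & -i

    def point_query(self, k: int, t: int) -> int:
        # Points beyond the grid see every update, so clamp to the grid edge.
        total, i = 0, min(k, self.nk)
        while i > 0:
            j = min(t, self.nt)
            while j > 0:
                total += self.tree[i][j]
                j -= j & -j
            i -= i & -i
        return total


idx = DominanceSum2D(max_key=100, max_time=100)
# E.g. +7 from time 4 and -7 from time 10, both for keys >= 21.
idx.insert(21, 4, 7)
idx.insert(21, 10, -7)
print(idx.point_query(50, 6))   # 7
print(idx.point_query(21, 10))  # 0
print(idx.point_query(20, 6))   # 0
```

Both operations touch O(log(max_key) · log(max_time)) cells, which shows why "point query plus rectangle update" is a natural interface even though the MVSB-tree achieves it differently.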

The MVSB-tree
• A partially persistent SB-tree; it inherits features from both the SB-tree [YW01] and the MVBT [BGO+96].

Insertion (figure omitted)

Insertion (cont.)
• To handle overflow, copy the records with end = tmax (i.e., those still alive) to a new page.
• Strong overflow: limit the number of records in a new page.
(Figure omitted: after the copy, root1 covers time [1, 4) and root2 covers [4, tmax).)
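
A much-simplified sketch of this overflow handling, in the spirit of the MVBT's version split: when a page overflows at time `now`, its lifespan is closed at `now` and only the alive records (end = tmax) are copied into a new page starting at `now`. The strong-overflow rule used here (split the copies by key when they exceed a fixed fraction of the page capacity) and the `strong_fraction` parameter are hypothetical stand-ins; the actual MVSB-tree thresholds and its handling of aggregate values are not reproduced.

```python
from dataclasses import dataclass, field
from typing import List

TMAX = float("inf")  # stands for tmax: "still alive"


@dataclass
class Entry:
    key: int
    start: int
    end: float  # TMAX while the entry is alive
    value: int


@dataclass
class Page:
    start: int
    end: float = TMAX  # page lifespan is [start, end)
    entries: List[Entry] = field(default_factory=list)


def time_split(page: Page, now: int, capacity: int,
               strong_fraction: float = 0.8) -> List[Page]:
    """Close `page` at `now` and copy its alive entries into new page(s)."""
    page.end = now  # the old page stays readable for times before `now`
    alive = sorted((e for e in page.entries if e.end == TMAX), key=lambda e: e.key)
    copies = [Entry(e.key, now, TMAX, e.value) for e in alive]
    if len(copies) > strong_fraction * capacity:
        # Strong overflow (hypothetical rule): also split the copies by key.
        mid = len(copies) // 2
        return [Page(now, entries=copies[:mid]), Page(now, entries=copies[mid:])]
    return [Page(now, entries=copies)]


old = Page(start=1, entries=[
    Entry(10, 1, 3, 5),      # already ended at time 3
    Entry(15, 2, 3, 1),      # already ended at time 3
    Entry(20, 1, TMAX, 2),
    Entry(30, 2, TMAX, 1),
    Entry(40, 3, TMAX, 4),
])
new_pages = time_split(old, now=4, capacity=4)
print(old.start, old.end)                     # 1 4
print([e.key for e in new_pages[0].entries])  # [20, 30, 40]
```

The example mirrors the lifespans in the slide's figure: the old page covers [1, 4) and the new one [4, tmax).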

Point Query (k, t)
• Follows a single path: the nodes containing (k, t).
• Aggregates the values found on this path.
• E.g.: PointQuery(23, 7) = 5 + 2 = 7.
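
The single-path query can be sketched as a loop that, at each node, picks the one entry whose key-time region contains (k, t), adds its value and descends into that entry's child. The node layout (entries carrying a key range, a time range, a value and an optional child) and the example tree are hypothetical; only the arithmetic 5 + 2 = 7 mirrors the slide's example.

```python
from dataclasses import dataclass, field
from typing import List, Optional

INF = float("inf")


@dataclass
class Entry:
    klo: int    # key range [klo, khi)
    khi: int
    tlo: int    # time range [tlo, thi)
    thi: float
    value: int
    child: Optional["Node"] = None


@dataclass
class Node:
    entries: List[Entry] = field(default_factory=list)


def point_query(root: Node, k: int, t: int) -> int:
    """Follow the single root-to-leaf path containing (k, t), summing values."""
    total = 0
    node: Optional[Node] = root
    while node is not None:
        # Assumes the entries of a node partition its key-time region,
        # so at most one entry contains (k, t).
        entry = next((e for e in node.entries
                      if e.klo <= k < e.khi and e.tlo <= t < e.thi), None)
        if entry is None:
            break
        total += entry.value
        node = entry.child
    return total


leaf = Node([Entry(0, 20, 0, INF, 1),
             Entry(20, 50, 0, 5, 3),
             Entry(20, 50, 5, INF, 2)])
root = Node([Entry(0, 50, 0, INF, 5, child=leaf),
             Entry(50, 100, 0, INF, 0)])
print(point_query(root, 23, 7))  # 7 = 5 (root entry) + 2 (leaf entry)
print(point_query(root, 23, 3))  # 8 = 5 + 3
```

Because only one entry per level is visited, the cost is proportional to the tree height.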

Efficiency
• Theorem: with 2 MVSB-tree indices, we achieve:
  • RTA query: O(log_b n);
  • Update: O(log_b K);
  • Space: O( * log_b K);
• where n = number of updates, K = number of different keys, b = page capacity (in records).

Performance Results
• Sun Enterprise 250 server; two 300 MHz UltraSPARC-II processors; Solaris 2.8; GNU C++.
• Datasets: created using the TimeIT [KS98] software and transformed to add record keys.
• Each dataset has one million records (10k unique keys; on average 100 intervals per key).
• We compare against the straightforward approach that uses the MVBT [BGO+96] as the temporal index.

Index Sizes (figure omitted)

Query Speedup (figure omitted)
• Query time is averaged over 100 queries with the same query rectangle size.

Conclusions
• We addressed the range-temporal aggregation (RTA) problem;
• New index structure (MVSB-tree) for incrementally maintaining and efficiently computing RTAs;
• Query time reduced from O(n) to O(log_b n) with small space overhead;
• Open problems:
  • MIN/MAX range-temporal aggregation;
  • valid-time environment;
  • multi-dimensional aggregation over objects with extents.
