Online Interval Skyline Queries on Time Series
Bin Jiang, Jian Pei
Outline
• Problem Definition
• An On-the-fly Method
  • Interval Skyline Query Answering Algorithm
  • Online Interval Skyline Query Algorithm
    • Radix Priority Search Tree
• A View-Materialization Method
  • Non-redundant skyline time series: NRSky[i:j]
• Experiments
Problem Definition
• Notions
  • Time Series: A time series s consists of a set of (value, timestamp) pairs. We denote the value of s at timestamp i by s[i], and write s as a sequence of values s[1], s[2], …
  • Time Interval: a range in time, denoted as [i:j]. We write […] if […]; […] if […].
Problem Definition
• Interval Skyline
  • Given a set S of time series and an interval [i:j], the interval skyline is the set of time series that are not dominated by any other time series in [i:j], denoted by Sky[i:j].
  • Example: suppose S = {S1, S2, S3}. S1 and S2 are in Sky[16:22], while S3 is dominated by S2.
  [Figure: time series S1, S2, S3]
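The definition above can be checked by brute force. The sketch below is only an illustration: the slides do not spell out dominance, so it assumes the usual skyline convention that larger values are better (s dominates q in [i:j] if s[k] ≥ q[k] at every timestamp k in [i:j] and s[k] > q[k] at some k). The function names and the data are hypothetical, chosen so that S3 is dominated by S2 as in the example.

```python
def dominates(s, q, i, j):
    """True if s dominates q on timestamps i..j (1-based, inclusive).

    Assumed convention: larger is better.
    """
    pairs = list(zip(s[i - 1:j], q[i - 1:j]))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


def interval_skyline(series, i, j):
    """Names of the time series not dominated by any other series in [i:j]."""
    return {
        name
        for name, s in series.items()
        if not any(dominates(q, s, i, j)
                   for other, q in series.items() if other != name)
    }


# Hypothetical data: list index 0 holds the value at timestamp 1.
S = {"s1": [1, 5, 2], "s2": [3, 4, 4], "s3": [2, 3, 3]}

print(sorted(interval_skyline(S, 1, 3)))  # ['s1', 's2']
```

This naive check compares every pair of time series at every timestamp of the interval, which is the cost the methods in the following slides try to avoid.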
Problem Definition
• Interval Skyline
  • Property 1: If there exist timestamps k1, …, kl (i ≤ k1 < … < kl ≤ j) such that […] and s is the only such time series, then time series s is in Sky[i:j].
Problem Definition
• Problem Definition
  • Given a set of time series S such that each time series is in the base interval W, we want to maintain a data structure D such that any interval skyline query in an interval [i:j] ⊆ W can be answered efficiently using D.
• Methods
  • An On-the-fly Method
    • Original Interval Skyline Query Algorithm
    • Online Interval Skyline Query Algorithm
  • A View-Materialization Method
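An Interval Skyline Query Algorithm
• Idea
  • Using the maximum and minimum values of the time series, we can decide some dominance relations without comparing the series timestamp by timestamp. For example, if the minimum value of s in [i:j] is greater than the maximum value of q, then s dominates q in [i:j].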
An Interval Skyline Query Algorithm
• Algorithm
  • Initialize the current skyline set Sky to empty;
  • Sort the time series into a list L in descending order of their maximum values;
  • Set maxmin to the maximum of the minimum values of the time series in Sky;
  • For each time series s in L that satisfies […], determine whether it dominates or is dominated by the time series in Sky. If it is not dominated:
    • add it to Sky;
    • delete the time series it dominates from Sky;
    • update maxmin;
  • Return Sky.
An Interval Skyline Query Algorithm
• Example
  Goal: compute the skyline in interval [2:3]
  Steps:
  1. s2 -> Sky, maxmin = 1
  2. s3 -> Sky, maxmin = 2
  3. s5 -> Sky, maxmin = 4
  4. s5 dominates s1, s1 is discarded, maxmin = 4
  5. s4.min = 3 < 4 = maxmin, s4 is discarded.
  Return Sky = {s2, s3, s5}
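The pruning idea can be sketched in a few lines. This is a minimal illustration, not the authors' implementation. It assumes larger values are better, and it assumes the stopping test is "maximum value of s is below maxmin", which is sufficient for dominance; the exact condition did not survive in the slide text. The data values are hypothetical, chosen so that the trace visits s2, s3, s5, s1, s4 in the same order as the example.

```python
def dominates(seg_s, seg_q):
    """True if seg_s dominates seg_q (same interval, larger is better)."""
    pairs = list(zip(seg_s, seg_q))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


def pruned_interval_skyline(series, i, j):
    """Interval skyline in [i:j] (1-based, inclusive) with max/min pruning."""
    seg = {name: s[i - 1:j] for name, s in series.items()}
    # Descending order of each series' maximum over the whole base interval.
    order = sorted(series, key=lambda n: max(series[n]), reverse=True)
    sky = []
    maxmin = float("-inf")  # largest interval minimum among skyline members
    for name in order:
        if max(series[name]) < maxmin:
            # Some skyline member is above this series at every timestamp,
            # and every later series in the list has an even smaller maximum.
            break
        if any(dominates(seg[o], seg[name]) for o in sky):
            continue
        sky = [o for o in sky if not dominates(seg[name], seg[o])]
        sky.append(name)
        maxmin = max(maxmin, min(seg[name]))
    return set(sky)


# Hypothetical data: list index 0 holds the value at timestamp 1.
S = {
    "s1": [1, 4, 4],
    "s2": [3, 7, 1],
    "s3": [5, 2, 6],
    "s4": [2, 3, 3],
    "s5": [4, 5, 4],
}

print(sorted(pruned_interval_skyline(S, 2, 3)))  # ['s2', 's3', 's5']
```

With this data, maxmin takes the values 1, 2, 4 as s2, s3, s5 enter Sky, s1 is dropped after an explicit dominance check against s5, and the loop stops at s4 without comparing it to anything.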
An Interval Skyline Query Algorithm
• Disadvantage
  • Checking the maximum value of each time series and min[i:j] for the query interval [i:j] is costly.
• Improvement Idea
  • Use a radix priority search tree to maintain min[i:j];
  • Use a sketch to keep the maximum value of each time series.
Online Interval Skyline Query Algorithm
• Radix Priority Search Tree
  • A radix priority search tree is a two-dimensional data structure: a hybrid of a heap on one dimension and a binary search tree on the other dimension.
• Advantages (h: the height of the tree)
  • Insertion in O(h)
  • Deletion in O(h)
  • Query in O(h)
Online Interval Skyline Query Algorithm
• Radix Priority Search Tree
  • Build
    • Use the timestamps as the binary search tree dimension X and the data values as the heap dimension Y;
    • Map W into a fixed domain of X, {0, 1, ..., w-1};
    • The height of the tree is O(log w).
  • Update
    • Each update performs one insertion and one deletion as the most recent timestamp advances.
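A compact sketch of such a tree for one time series follows. It is an illustration under stated assumptions, not the structure from the paper: the class and method names are hypothetical, the heap is assumed to be a min-heap on the value (so that a range-minimum query is cheap), x values are assumed distinct, and the caller is assumed to have already mapped timestamps into {0, ..., w-1}.

```python
class _Node:
    __slots__ = ("x", "y", "left", "right")

    def __init__(self, x, y):
        self.x, self.y = x, y
        self.left = self.right = None


class RadixPST:
    """Min-heap on y; x-domain {0..w-1} split at fixed midpoints (radix)."""

    def __init__(self, w):
        self.w = w
        self.root = None

    def insert(self, x, y):
        self.root = self._insert(self.root, x, y, 0, self.w)

    def _insert(self, node, x, y, lo, hi):
        if node is None:
            return _Node(x, y)
        if y < node.y:  # the smaller y stays here, the other point sinks
            node.x, x = x, node.x
            node.y, y = y, node.y
        mid = (lo + hi) // 2
        if x < mid:
            node.left = self._insert(node.left, x, y, lo, mid)
        else:
            node.right = self._insert(node.right, x, y, mid, hi)
        return node

    def delete(self, x):
        self.root = self._delete(self.root, x, 0, self.w)

    def _delete(self, node, x, lo, hi):
        if node is None:
            return None
        if node.x == x:
            return self._pull_up(node)
        mid = (lo + hi) // 2
        if x < mid:
            node.left = self._delete(node.left, x, lo, mid)
        else:
            node.right = self._delete(node.right, x, mid, hi)
        return node

    def _pull_up(self, node):
        """Fill the hole at node with the smaller-y child, recursively."""
        l, r = node.left, node.right
        if l is None and r is None:
            return None
        if r is None or (l is not None and l.y <= r.y):
            node.x, node.y = l.x, l.y
            node.left = self._pull_up(l)
        else:
            node.x, node.y = r.x, r.y
            node.right = self._pull_up(r)
        return node

    def range_min(self, a, b):
        """Smallest y among points with a <= x <= b, or None if there are none."""
        return self._range_min(self.root, a, b, 0, self.w)

    def _range_min(self, node, a, b, lo, hi):
        if node is None or b < lo or a >= hi:
            return None
        if a <= node.x <= b:
            return node.y  # heap order: nothing below has a smaller y
        mid = (lo + hi) // 2
        found = [v for v in (self._range_min(node.left, a, b, lo, mid),
                             self._range_min(node.right, a, b, mid, hi))
                 if v is not None]
        return min(found) if found else None


values = [5, 3, 8, 1, 9, 2, 7, 4]  # hypothetical values at x = 0..7
tree = RadixPST(8)
for x, y in enumerate(values):
    tree.insert(x, y)
print(tree.range_min(4, 6))  # 2
```

Because the split points depend only on the fixed domain and not on the stored data, the tree never needs rebalancing and its height stays O(log w), which is what makes each insertion, deletion, and query O(h).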
Maintain Max Values Using Sketches
• Sketches
  • A pair (v, t) is maintained only if there is no other pair (v1, t1) such that v1 > v and t1 > t;
  • These pairs form the skyline of points in the interval;
  • The expected number of points in the skyline is O(log w);
  • With the sketches, finding the maximum value in W costs O(1) time.
• Example
  • W = [1:3], sketches: (4,1), (3,2), (2,3)
  • W = [1:4], sketches: (5,4)
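The sketch rule above behaves like a monotone queue. The code below is one way to realize it, under the assumption that values arrive one timestamp at a time and that the window keeps the most recent w timestamps; the class name `MaxSketch` and its methods are hypothetical.

```python
from collections import deque


class MaxSketch:
    """Keeps only pairs (v, t) with no later pair holding a larger value."""

    def __init__(self, w):
        self.w = w
        self.pairs = deque()  # values are non-increasing from left to right

    def push(self, v, t):
        # The new pair is later than every stored pair, so any stored pair
        # with a smaller value can never be the window maximum again.
        while self.pairs and self.pairs[-1][0] < v:
            self.pairs.pop()
        self.pairs.append((v, t))
        # Drop pairs whose timestamp has left the window (t - w, t].
        while self.pairs[0][1] <= t - self.w:
            self.pairs.popleft()

    def max(self):
        return self.pairs[0][0]  # O(1): the leftmost pair holds the maximum


sketch = MaxSketch(4)
for t, v in enumerate([4, 3, 2], start=1):
    sketch.push(v, t)
print(list(sketch.pairs))  # [(4, 1), (3, 2), (2, 3)]
sketch.push(5, 4)
print(list(sketch.pairs))  # [(5, 4)]
```

The two printed states match the slide's example for W = [1:3] and W = [1:4].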
Online Interval Skyline Query Algorithm
• Complexity
  • Space
    • Radix priority search tree: O(w)
    • Sketch of the max values: O(log w)
    • Total: O(nw)
  • Time
    • Radix priority search tree: O(log w)
    • Sketch of the max values: O(log w)
    • Total: O(n log w)
A View-Materialization Method
• Non-redundant interval skylines
  • A time series s is called a non-redundant skyline time series in interval [i:j] if
    • s is in the skyline in interval [i:j];
    • s is not in the skyline in any subinterval [i':j'] ⊂ [i:j].
  • A time series has at most w such intervals. This can be proved by the pigeonhole principle: if there were more than w of them, at least two would share the same starting timestamp, so one would contain the other and could not be a minimum skyline interval.
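The definition can be made concrete with a brute-force computation of NRSky[i:j] for every interval in a small base interval [1:w]. This is only a sketch for checking the definition on toy data, not the maintenance procedure from the talk; the function names and the data are hypothetical, and dominance again assumes larger values are better.

```python
def dominates(s, q, i, j):
    """True if s dominates q on timestamps i..j (1-based, larger is better)."""
    pairs = list(zip(s[i - 1:j], q[i - 1:j]))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


def skyline(series, i, j):
    return {
        name
        for name, s in series.items()
        if not any(dominates(q, s, i, j)
                   for other, q in series.items() if other != name)
    }


def non_redundant_skylines(series, w):
    """Map each interval (i, j) within [1:w] to its set NRSky[i:j]."""
    sky = {(i, j): skyline(series, i, j)
           for i in range(1, w + 1) for j in range(i, w + 1)}
    nrsky = {}
    for (i, j), members in sky.items():
        # Everything that is already a skyline member in a proper subinterval.
        seen_in_sub = set()
        for (a, b), sub in sky.items():
            if i <= a and b <= j and (a, b) != (i, j):
                seen_in_sub |= sub
        nrsky[(i, j)] = members - seen_in_sub
    return nrsky


# Hypothetical data: list index 0 holds the value at timestamp 1.
S = {"a": [3, 1, 1], "b": [3, 2, 1], "c": [1, 1, 2]}
NR = non_redundant_skylines(S, 3)
print({k: sorted(v) for k, v in NR.items() if v})
# {(1, 1): ['a', 'b'], (2, 2): ['b'], (3, 3): ['c']}
```

In this toy data set every non-redundant entry sits on a single-timestamp interval, and no time series appears in more than w = 3 intervals, in line with the bound above.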
A View-Materialization Method
• Idea
  • If all non-redundant interval skylines are materialized, a query on [i:j] can be answered by taking the union of these skylines over all intervals in [i:j] and removing the time series that fail Lemma 2.
• Algorithm
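A rough sketch of the union-then-filter idea is given below. The slide text does not include Lemma 2 or the algorithm itself, so the filter here is a plain dominance check among the candidates, used as a stand-in; it is an assumption, not the authors' test. The materialized table `NRSKY` and the data are hypothetical (the table is what a brute-force computation gives for this toy data), and larger values are assumed to be better.

```python
def dominates(s, q, i, j):
    """True if s dominates q on timestamps i..j (1-based, larger is better)."""
    pairs = list(zip(s[i - 1:j], q[i - 1:j]))
    return all(a >= b for a, b in pairs) and any(a > b for a, b in pairs)


def query_from_views(series, nrsky, i, j):
    """Answer Sky[i:j] from materialized non-redundant skylines."""
    # Step 1: union the non-redundant skylines of all intervals inside [i:j].
    candidates = set()
    for (a, b), members in nrsky.items():
        if i <= a and b <= j:
            candidates |= members
    # Step 2: drop candidates dominated in [i:j] by another candidate.
    # (Stand-in for the Lemma 2 test, which is not given in the slides.)
    return {
        s for s in candidates
        if not any(dominates(series[q], series[s], i, j)
                   for q in candidates if q != s)
    }


# Hypothetical data and its materialized non-redundant skylines.
S = {"a": [3, 1, 1], "b": [3, 2, 1], "c": [1, 1, 2]}
NRSKY = {(1, 1): {"a", "b"}, (2, 2): {"b"}, (3, 3): {"c"}}

print(sorted(query_from_views(S, NRSKY, 1, 2)))  # ['b']
print(sorted(query_from_views(S, NRSKY, 1, 3)))  # ['b', 'c']
```

The first query shows why a filter is needed at all: "a" ties with "b" at timestamp 1 and so is a skyline member of [1:1], but it is dominated by "b" once timestamp 2 is included.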
A View-Materialization Method
• Example
  W = [2:4]
  Goal: compute the interval skyline in [3:4]
  Steps:
  1. s3 -> Sky
  2. s4 -> Sky
  3. s1 -> Sky (s2 is dominated by s1)
  Return Sky = {s1, s3, s4}
• How to maintain the non-redundant skylines?
Maintain Non-Redundant Interval Skylines
• Step 1
  • Use the on-the-fly algorithm to obtain the interval skyline in the new interval W';
  • Find possible false negatives.
Maintain Non-Redundant Interval Skylines
• Step 2: Shared Divide-and-Conquer Algorithm (SDC)
  • This algorithm is an extension of the divide-and-conquer algorithm (DC).
  • In SDC, a space is defined as a time interval. Each timestamp represents a dimension.
  • The related spaces (intervals) are organized as a path, e.g., [j:j], [j-1:j], ..., [i:j] (i < j).
Divide-and-Conquer Algorithm
[Figure: divide step and merge step]
SDC Algorithm • Comparisons • Results
Maintain Non-Redundant Interval Skylines
• Step 3: Remove “redundant time series”
Experiments • Parameters
Experiments • Synthetic Data Sets • Data Sets Properties • Query Efficiency
Experiments • Synthetic Data Sets • Update Efficiency • Space Cost
Experiments • Stock Data Sets • Query Time