Temporal DB by Zbigniew W. Ras
Temporal DB
1. Valid time – the time during which the information is true.
2. Transaction time – the time associated with the transaction that inserted the record.
- [ts, te] is associated with each record in the DB: [start time, end time] – the time range.
- Continuous view of temporal data (e.g., printouts of heartbeats).
- Timestamps at equal intervals (time series data).
Types of Queries
Vd = [tsd, ted] – valid time for tuple d; Vq = [tsq, teq] – time range of query q.
Intersection query q: tuple d is retrieved if Vd ∩ Vq ≠ ∅.
Inclusion query q: tuple d is retrieved if tsq ≤ tsd ≤ ted ≤ teq.
Containment query q: tuple d is retrieved if tsd ≤ tsq ≤ teq ≤ ted.
Point query q: tuple d is retrieved if tsd = tsq = teq = ted (the tuple has to be valid at a particular point in time).
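The four query types can be written as simple predicates over a pair of intervals. The following is a minimal sketch, assuming closed intervals with integer timestamps; the `Interval` type and the function names are hypothetical and chosen only for illustration. The point query follows the condition exactly as stated on the slide.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """Closed time range [ts, te] (assumed representation)."""
    ts: int  # start time
    te: int  # end time


def intersection_query(d: Interval, q: Interval) -> bool:
    # Vd and Vq share at least one point in time.
    return d.ts <= q.te and q.ts <= d.te


def inclusion_query(d: Interval, q: Interval) -> bool:
    # The tuple's valid time lies entirely inside the query range.
    return q.ts <= d.ts <= d.te <= q.te


def containment_query(d: Interval, q: Interval) -> bool:
    # The tuple's valid time covers the entire query range.
    return d.ts <= q.ts <= q.te <= d.te


def point_query(d: Interval, q: Interval) -> bool:
    # Condition as given on the slide: all four endpoints coincide.
    return d.ts == q.ts == q.te == d.te


if __name__ == "__main__":
    d = Interval(3, 7)
    print(intersection_query(d, Interval(5, 10)))
    print(inclusion_query(d, Interval(1, 10)))
    print(containment_query(d, Interval(4, 5)))
```

Each predicate takes the tuple's valid time first and the query range second, mirroring the order of the inequalities above.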
Temporal DB
• Four types of DB:
• Snapshot – no support for temporal attributes
• Transaction time – only transaction time is given
• Valid time – only valid time is given
• Bitemporal – both transaction and valid time are given
• now – current time value.
• Modeling Temporal Events:
• Markov Model (MM) – directed graph [V, A], where V = {v1, v2, …, vn} is the set of states and A = {<i, j> : vi, vj ∈ V} is the set of arcs (transitions between states).
• Each arc <i, j> is labeled with the probability pij of transitioning from vi to vj.
• At time t, one state is designated as the current state vt, and the probability of any future transition depends only on vt.
• Transition probabilities are learned in the training phase.
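The slides do not say how the training phase estimates pij. One common choice, assumed here, is to count observed transitions in training sequences and normalize per state. The function name `learn_transitions` and the training data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_transitions(sequences):
    """Estimate p_ij as (# transitions vi -> vj) / (# transitions leaving vi).

    Assumption: relative-frequency estimation; the slides do not specify
    the training method.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        # Each adjacent pair (current, next) is one observed transition.
        for cur, nxt in zip(seq, seq[1:]):
            counts[cur][nxt] += 1

    probs = {}
    for state, next_counts in counts.items():
        total = sum(next_counts.values())
        probs[state] = {nxt: c / total for nxt, c in next_counts.items()}
    return probs


if __name__ == "__main__":
    # Hypothetical training data: two observed state sequences.
    training = [["v1", "v2", "v1", "v2", "v3"], ["v1", "v1", "v2"]]
    for state, row in learn_transitions(training).items():
        print(state, row)
```

Because the next state depends only on the current one, a table of per-state transition probabilities is all the model needs to store.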
Temporal DB
Time series for attribute A: {<t1, a1>, <t2, a2>, …, <tn, an>}.
If the points in time are well defined, we take the vector <a1, a2, …, an>.
If <y1, y2, …, yn> is a time series, then any subsequence of it is called a time subseries.
Problems:
- Measuring similarity between different time series.
- Predicting future values of an attribute.
Temporal DB
Trend Analysis.
Smoothing – finding moving averages of attribute values (a local average is computed in a window).
Correlation between two attributes with time series X = <x1, …, xn>, Y = <y1, …, yn> and means X̄, Ȳ (Pearson's coefficient):
$$R = \frac{\sum_{i}(x_i - \bar{X})(y_i - \bar{Y})}{\sqrt{\sum_{i}(x_i - \bar{X})^2 \, \sum_{i}(y_i - \bar{Y})^2}}$$
Value of R close to 1 – attributes strongly correlated (close to −1 – strongly inversely correlated).
Value of R close to 0 – attributes not linearly correlated.
Pattern Detection (time series): string-matching algorithms such as KMP (Knuth-Morris-Pratt) and Boyer-Moore.
Sequence – an ordered list of itemsets S = {s1, s2, …, sn}, where each si ⊆ I (I is the set of items).
T = {t1, …, tm} is a subsequence of S if there exist indices i1 < i2 < … < im such that tj ⊆ s_ij for every j = 1, …, m.
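Smoothing and Pearson's coefficient are both short computations. The sketch below assumes a simple (unweighted) moving average over a sliding window and two series of equal length; the function names are hypothetical.

```python
import math


def moving_average(values, window):
    """Unweighted moving average: one local mean per full window."""
    if window < 1 or window > len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [
        sum(values[i:i + window]) / window
        for i in range(len(values) - window + 1)
    ]


def pearson(xs, ys):
    """Pearson's correlation coefficient R for two equal-length series."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("series must be non-empty and of equal length")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    denominator = math.sqrt(
        sum((x - mean_x) ** 2 for x in xs) * sum((y - mean_y) ** 2 for y in ys)
    )
    if denominator == 0:
        raise ValueError("R is undefined when a series is constant")
    return numerator / denominator


if __name__ == "__main__":
    print(moving_average([1, 2, 3, 4, 5], window=3))
    print(pearson([1, 2, 3], [2, 4, 6]))
```

A larger window smooths more aggressively but yields fewer output points, since only full windows are averaged here.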
Temporal DB
Customer-sequence: the sequence of itemsets purchased by a customer.
Example: S = {{A}, {C}} – a sequence.
Support of sequence S – the fraction of all customers whose customer-sequence contains S. In the example, s(S) = 1/3.
Confidence of S ⇒ T: the ratio of the number of customer-sequences that contain both S and T to the number that contain S.
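Support and confidence both reduce to counting customer-sequences that contain a given sequence. The sketch below represents a sequence as a list of Python sets and uses a hypothetical three-customer database; it is not the table from the original slides, which is not reproduced in this transcript. The data are chosen so that the support of ({A}, {C}) is 1/3.

```python
def contains(customer_seq, pattern):
    """True if `pattern` is a subsequence of `customer_seq`.

    Each itemset of the pattern must be a subset of a distinct itemset of
    the customer-sequence, in the same order (greedy left-to-right match).
    """
    pos = 0
    for itemset in pattern:
        while pos < len(customer_seq) and not itemset <= customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1  # the next pattern itemset must match a later position
    return True


def support(db, s):
    """Fraction of customer-sequences in `db` that contain `s`."""
    return sum(contains(c, s) for c in db) / len(db)


def confidence(db, s, t):
    """(# sequences containing both s and t) / (# sequences containing s)."""
    with_s = [c for c in db if contains(c, s)]
    if not with_s:
        raise ValueError("confidence is undefined when no sequence contains s")
    return sum(contains(c, t) for c in with_s) / len(with_s)


if __name__ == "__main__":
    # Hypothetical customer-sequences (not taken from the slides).
    db = [
        [{"A"}, {"B"}, {"C"}],  # contains ({A}, {C})
        [{"C"}, {"A"}],         # A occurs after C, so no match
        [{"B"}, {"D"}],
    ]
    print(support(db, [{"A"}, {"C"}]))
    print(confidence(db, [{"A"}], [{"B"}]))
```

The greedy match is sufficient because taking the earliest possible position for each itemset never rules out a later match.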
Temporal DB
SPADE Algorithm. A, B, C, D – attributes.
Itemsets shown on the slides: {A}, {B}, {C}, {D}
Temporal DB
Candidates (2-sequences): {AB}, {AD}, ({A}, {B}), ({A}, {D})
Temporal DB
{AC}, ({B}, {C}), ({A}, {C}), {BC}
Temporal DB
• Example explaining the next step:
• ({B}, {BC}, {DE}) and ({AB}, {BC}, {D}) have the overlap ({B}, {BC}, {D}), and because of this two candidate sequences are generated: ({AB}, {BC}, {DE}) and ({AB}, {BC}, {D}, {E}).
• In our example, from {AB} and ({B}, {C}) we generate ({AB}, {C}).
• From ({A}, {B}) and {BC} we generate ({A}, {BC}).
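The slides do not spell out the joining rule behind these examples. One rule that reproduces them is a GSP-style join, assumed here rather than taken from the slides: two sequences join when dropping the first item of one gives the same sequence as dropping the last item of the other. Sequences are represented as tuples of sorted item tuples, and all function names are hypothetical. For the three-itemset pair, this rule yields only ({AB}, {BC}, {DE}); it does not derive the second candidate ({AB}, {BC}, {D}, {E}) listed above.

```python
def drop_first_item(seq):
    """Remove the first item of the first itemset (drop the itemset if empty)."""
    head = seq[0][1:]
    return ((head,) if head else ()) + seq[1:]


def drop_last_item(seq):
    """Remove the last item of the last itemset (drop the itemset if empty)."""
    tail = seq[-1][:-1]
    return seq[:-1] + ((tail,) if tail else ())


def join(s1, s2):
    """GSP-style join (an assumed rule, not confirmed by the slides).

    Returns the candidate sequence, or None if s1 and s2 do not overlap.
    """
    if drop_first_item(s1) != drop_last_item(s2):
        return None
    item = s2[-1][-1]
    if len(s2[-1]) == 1:
        # The item was its own itemset in s2: append it as a new itemset.
        return s1 + ((item,),)
    # Otherwise the item joins the last itemset of s1.
    return s1[:-1] + (s1[-1] + (item,),)


if __name__ == "__main__":
    AB = (("A", "B"),)
    B_then_C = (("B",), ("C",))
    A_then_B = (("A",), ("B",))
    BC = (("B", "C"),)
    print(join(AB, B_then_C))   # candidate built from {AB} and ({B}, {C})
    print(join(A_then_B, BC))   # candidate built from ({A}, {B}) and {BC}
```

Keeping items sorted inside each tuple makes the overlap test a plain equality check on tuples.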