670 likes | 1.04k Views
AutoPlait: Automatic Mining of Co-evolving Time Sequences. Yasuko Matsubara (Kumamoto University) Yasushi Sakurai (Kumamoto University) Christos Faloutsos (CMU). Motivation. Given : co-evolving time-series e .g., MoCap (leg/arm sensors) “Chicken dance”. Time . Motivation.
E N D
AutoPlait: Automatic Mining of Co-evolving Time Sequences Yasuko Matsubara (Kumamoto University) Yasushi Sakurai (Kumamoto University) Christos Faloutsos (CMU) Y. Matsubara et al.
Motivation Given: co-evolving time-series • e.g., MoCap (leg/arm sensors) “Chicken dance” Time Y. Matsubara et al.
Motivation Given: co-evolving time-series • e.g., MoCap (leg/arm sensors) “Chicken dance” Q. Any distinct patterns? Time Q. If yes, how many? Q. What kind? Y. Matsubara et al.
Motivation Challenges: co-evolving sequences • Unknown # of patterns (e.g., beaks) • Different durations … Time Y. Matsubara et al.
Motivation Challenges: co-evolving sequences • Unknown # of patterns (e.g., beaks) • Different durations • Q. Can we summarize it automatically ? … Time Y. Matsubara et al.
Motivation Goal:find patterns that agree with human intuition Time Y. Matsubara et al.
Motivation Goal:find patterns that agree with human intuition AutoPlait: “fully-automatic” mining algorithm Y. Matsubara et al.
Importance of “fully-automatic” No magic numbers! ... because, Big data mining: -> we cannot afford human intervention!! Manual sensitive to the parameter tuning long tuning steps (hours, days, …) Automatic (no magic numbers) - no expert tuning required Y. Matsubara et al.
Outline • Motivation • Problem definition • Compression & summarization • Algorithms • Experiments • AutoPlait at work • Conclusions Y. Matsubara et al.
Problem definition Key concepts • Bundle: • Segment: • Regime: • Segment-membership: given hidden hidden hidden Y. Matsubara et al.
Problem definition • Bundle : set of d co-evolving sequences given Bundle X (d=4) Y. Matsubara et al.
Problem definition • Segment: convert X -> m segments, S hidden s1 s2 s8 s3 s4 s5 s6 s7 Segment (m=8) Y. Matsubara et al.
Problem definition • Regime: segment groups: : model of regime r hidden Regimes (r=4) beaks wings Y. Matsubara et al.
Problem definition • Segment-membership: assignment hidden F = { 2, 4, 1, 3, 2, 4, 1, 3 } Segment-membership (m=8) Y. Matsubara et al.
Problem definition • Given: bundle X Y. Matsubara et al.
Problem definition • Given: bundle X • Find: compact description C of X Y. Matsubara et al.
Problem definition • Given: bundle X • Find: compact description C of X m segments r regimes Segment-membership Y. Matsubara et al.
Outline • Motivation • Problem definition • Compression & summarization • Algorithms • Experiments • AutoPlait at work • Conclusions Y. Matsubara et al.
Main ideas Goal: compact description of X without user intervention!! Challenges: Q1. How to generate ‘informative’ regimes ? Q2. How to decide # of regimes/segments ? Y. Matsubara et al.
Main ideas Goal: compact description of X without user intervention!! Challenges: Q1. How to generate ‘informative’ regimes ? Idea (1): Multi-level chain model Q2. How to decide # of regimes/segments ? Idea (2): Model description cost Y. Matsubara et al.
Idea (1): MLCM: multi-level chain model Q1. How to generate ‘informative’ regimes ? beaks claps Model wings Sequences Regimes Y. Matsubara et al.
Idea (1): MLCM: multi-level chain model Q1. How to generate ‘informative’ regimes ? Idea (1): Multi-level chain model • HMM-based probabilistic model • with “across-regime” transitions beaks claps Model wings Sequences Regimes Y. Matsubara et al.
Idea (1): MLCM: multi-level chain model Details Single HMM parameters across-regime transition prob. r regimes (HMMs) Y. Matsubara et al.
Idea (1): MLCM: multi-level chain model Details state 1 Single HMM parameters across-regime transition prob. r regimes (HMMs) state 2 state 2 state 1 Regime switch Regimes r=2 Regime 1 (k=3) Regime 2 (k=2) state 3 Regime2 “wings” Regime1 “beaks” Y. Matsubara et al.
Idea (2): model description cost Q2. How to decide # of regimes/segments ? Idea (2): Model description cost • Minimize coding cost • find “optimal” # of segments/regimes Y. Matsubara et al.
Idea (2): model description cost Idea: Minimize encoding cost! CostM(M) + Costc(X|M) min ( ) Coding cost Model cost (# of r, m) Good compression Good description Y. Matsubara et al.
Idea (2): model description cost Details Total cost of bundle X, given C Y. Matsubara et al.
Idea (2): model description cost Details Total cost of bundle X, given C # of segments/regimes duration/dimensions segment- membership F Coding cost of X given Θ Model description cost of Θ segment lengths Y. Matsubara et al.
Outline • Motivation • Problem definition • Compression & summarization • Algorithms • Experiments • AutoPlait at work • Conclusions Y. Matsubara et al.
AutoPlait Overview Iteration 0 r=1, m=1 Start! r=1 1 2 3 4 5 6 7 Iteration 0 1 2 3 4 5 6 7 Iteration Y. Matsubara et al.
AutoPlait Overview Iteration 1 r=2, m=4 r=1 Split r=2 0 1 2 3 4 5 6 7 Iteration Y. Matsubara et al.
AutoPlait Overview Iteration 1 r=2, m=4 Iteration 2 r=3, m=6 Split r=1 r=2 r=3 0 1 2 3 4 5 6 7 Iteration Y. Matsubara et al.
AutoPlait Overview Iteration 1 r=2, m=4 Iteration 2 r=3, m=6 r=1 Split r=2 r=4 Iteration 4 r=4, m=8 1 2 3 4 5 6 7 Iteration r=3 0 1 2 3 4 5 6 7 Iteration Y. Matsubara et al.
AutoPlait Algorithms • CutPointSearch Find good cut-points/segments • RegimeSplit Estimate good regime parameters Θ • AutoPlait Search for the best number of regimes (r=2,3,4…) Inner-most loop Inner loop Outer loop Y. Matsubara et al.
1.CutPointSearch Inner-most loop Given: - bundle - parameters of two regimes Find: cut-points of segment sets S1, S2, Y. Matsubara et al.
1.CutPointSearch Inner-most loop Details state 1 state 2 DP algorithm to compute likelihood: state 3 switch?? switch?? state 1 state 2 Y. Matsubara et al.
1.CutPointSearch Inner-most loop Details Theoretical analysis Scalability - It takes time (only single scan) - n: length of X - d: dimension of X - k: # of hidden states in regime Accuracy It guarantees the optimal cut points - (Details in paper) Y. Matsubara et al.
2.RegimeSplit Innerloop Given: - bundle Find: two regimes 1. find cut-points of segment sets: S1, S2 2. estimate parameters of two regimes: Y. Matsubara et al.
2.RegimeSplit Innerloop Details Two-phase iterative approach • Phase 1: (CutPointSearch) • Split segments into two groups : • Phase 2: (BaumWelch) • Update model parameters: Phase 1 Phase 2 Y. Matsubara et al.
3.AutoPlait Outer loop Given: - bundle Find: r regimes (r=2, 3, 4, … ) - i.e., find full parameter set Y. Matsubara et al.
3.AutoPlait Outer loop Split regimes r=2,3,…, as long as cost keeps decreasing - Find appropriate # of regimes Y. Matsubara et al.
3.AutoPlait Outer loop Split regimes r=2,3,…, as long as cost keeps decreasing - Find appropriate # of regimes r=2, m=4 r=4, m=8 r=3, m=6 Y. Matsubara et al.
Outline • Motivation • Problem definition • Compression & summarization • Algorithms • Experiments • AutoPlait at work • Conclusions Y. Matsubara et al.
Experiments We answer the following questions… Q1. Sense-making Can it help us understand the given bundles? Q2. Accuracy How well does it find cut-points and regimes? Q3. Scalability How does it scale in terms of computational time? Y. Matsubara et al.
Q1. Sense-making MoCap data AutoPlait (NO magic numbers) • DynaMMo (Li et al., KDD’09) pHMM (Wang et al., SIGMOD’11) Y. Matsubara et al.
Q1. Sense-making MoCap data AutoPlait (NO magic numbers) Y. Matsubara et al.
Q2. Accuracy (a) Segmentation (b) Clustering Target AutoPlait needs “no magic numbers” Y. Matsubara et al.
Q3. Scalability Wall clock time vs. data size (length) : n AutoPlait scales linearly, i.e., O(n) pHMM DynaMMo AutoPlait Y. Matsubara et al.
Outline • Motivation • Problem definition • Compression & summarization • Algorithms • Experiments • AutoPlait at work • Conclusions Y. Matsubara et al.
AutoPlait at work AutoPlait is capable of various applications, e.g., App1. Model analysis - Web-click sequences App2. Event discovery - Google Trend data Y. Matsubara et al.