Mining Event Periodicity from Incomplete Observations. Zhenhui (Jessie) Li*, Jingjing Wang, Jiawei Han University of Illinois at Urbana-Champaign *Now at Penn State University. KDD 2012 Beijing, China. Prologue: Detect Periodicity in Movements [Li et al., KDD’10].
Mining Event Periodicity from Incomplete Observations. Zhenhui (Jessie) Li*, Jingjing Wang, Jiawei Han. University of Illinois at Urbana-Champaign (*now at Penn State University). KDD 2012, Beijing, China.
Prologue: Detect Periodicity in Movements [Li et al., KDD’10]. Problem: what is the periodicity of the movement? Bee example: 8 hours in the hive, 16 hours flying nearby.
Prologue: Detect Periodicity in Movements [Li et al., KDD’10]. Observe the in-and-out movements from the reference spot (i.e., the hive): the two-dimensional movement becomes a one-dimensional binary sequence (in hive / outside hive over time), in which the periodicity is easy to see.
Challenge: Periodicity Detection for Incomplete Observations
• Two factors result in incomplete observations: inconsistent + low sampling rate
• Movement data collection in real scenarios:
  • Human movement data collected from cellphones: locations are reported only when making calls
  • Animal movement data: 2~3 locations in 3~5 days
Example of incomplete observations (in hive / outside hive), in contrast to complete observations:
2009-05-02 01:03 in
2009-05-03 11:30 out
2009-05-05 03:12 in
2009-05-09 12:03 in
2009-05-10 11:14 out
2009-05-11 02:15 in
…
A Challenging Case of Detecting Periodicity for Incomplete Observations
Sparse raw data:
2009-05-02 01:03 in
2009-05-03 11:30 out
2009-05-05 03:12 in
2009-05-09 12:03 in
2009-05-10 11:14 out
2009-05-11 02:15 in
…
Is there any periodicity in the above sequence?
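The sparse records above can be turned into timestamped binary observations before any period is tested. The following is a minimal sketch, assuming an hourly resolution and an origin at midnight of the first day; the function name `parse_observations` and the dictionary layout are illustrative choices, not part of the talk.

```python
from datetime import datetime

# The six records shown on the slide; the trailing "…" is left unfilled.
RAW = """\
2009-05-02 01:03 in
2009-05-03 11:30 out
2009-05-05 03:12 in
2009-05-09 12:03 in
2009-05-10 11:14 out
2009-05-11 02:15 in
"""

def parse_observations(text):
    """Map hour index (hours since midnight of the first day) to 1 ("in") or 0 ("out").

    Hourly resolution is an assumption made for this sketch.
    Hours with no record are simply absent, i.e. unobserved.
    """
    rows = []
    for line in text.strip().splitlines():
        date, clock, state = line.split()
        stamp = datetime.strptime(f"{date} {clock}", "%Y-%m-%d %H:%M")
        rows.append((stamp, 1 if state == "in" else 0))
    origin = min(stamp for stamp, _ in rows).replace(hour=0, minute=0)
    return {int((stamp - origin).total_seconds() // 3600): x for stamp, x in rows}

observations = parse_observations(RAW)
```

Only six of the roughly 220 hourly slots carry a value here, which is why the sequence is hard to read for periodicity by eye.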
Mining Periodicity in Incomplete Data • The event has a period of 20 • Occurrences of the event fall between timestamps 20k+5 and 20k+10
A Probabilistic Model for Periodic Event • Example: • Human daily periodicity in visiting the office • Period of 24 (hours) • Visits the office at 10:00-11:00 and 14:00-16:00
A Probabilistic Model for Periodic Event with Random Observation. The model generates observations at random timestamps, e.g., x(62)=0, x(5)=1.
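The generative idea on the two slides above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: each timestamp is observed independently with probability `sampling_rate`, and an observed timestamp t yields x(t)=1 with probability `p[t mod T]`. The probabilities 0.9 and 0.05 and the name `simulate_observations` are made up for the example.

```python
import random

def simulate_observations(p, n, sampling_rate, seed=0):
    """Draw incomplete observations from a periodic distribution vector p.

    p[i] is the assumed probability that the event occurs at a timestamp t
    with t % len(p) == i. Unobserved timestamps are left out of the result.
    """
    rng = random.Random(seed)
    period = len(p)
    observed = {}
    for t in range(n):
        if rng.random() < sampling_rate:           # is timestamp t observed at all?
            observed[t] = 1 if rng.random() < p[t % period] else 0
    return observed

# Office example: period 24, visits likely at 10:00-11:00 and 14:00-16:00.
p_office = [0.05] * 24            # hypothetical background probability
for hour in (10, 14, 15):
    p_office[hour] = 0.9          # hypothetical in-office probability

sample = simulate_observations(p_office, n=24 * 60, sampling_rate=0.1)
```

Lowering `sampling_rate` makes the sequence sparser without changing the underlying period, which is the setting the talk targets.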
Periodicity Detection by Overlaying Observations. Overlaying with the true period gives a skewed distribution; overlaying with a wrong period gives an even distribution.
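Overlaying amounts to folding every observed timestamp onto t mod T and counting positive and negative observations per slot. A small sketch, using the earlier toy event (period 20, occurrences at 20k+5 to 20k+10) observed only at every 7th timestamp; the sampling pattern and the names `overlay` and `toy` are assumptions for illustration.

```python
def overlay(observations, period):
    """Fold observations onto [0, period) and count positives/negatives per slot."""
    positives = [0] * period
    negatives = [0] * period
    for t, x in observations.items():
        if x == 1:
            positives[t % period] += 1
        else:
            negatives[t % period] += 1
    return positives, negatives

# Toy data: the event occurs iff 5 <= t mod 20 <= 10; only every 7th timestamp is observed.
toy = {t: 1 if 5 <= t % 20 <= 10 else 0 for t in range(0, 400, 7)}

pos_true, neg_true = overlay(toy, 20)    # true period
pos_wrong, neg_wrong = overlay(toy, 19)  # wrong period
```

With the true period, positives land only in slots 5 to 10 and negatives only elsewhere. With period 19, slot 7 already receives both a positive (t=7) and a negative (t=140), so the two kinds of observation start to mix.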
Relationship between Observation Ratio and Probabilistic Model. Pos/Neg Ratio; Periodic Distribution Vector.
Discrepancy Score to Measure Periodicity. If T (=24) is the correct period, the discrepancy score should be large for a certain set of timestamps. If T (=23) is the wrong period, the discrepancy scores are likely to be zero for any set of timestamps.
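One plausible reading of this score, offered as a sketch rather than the authors' exact definition: for a set I of slots in [0, T), take the fraction of positive observations that fold into I minus the fraction of negative observations that fold into I. The best set I then simply collects every slot where that difference is positive. The function name `discrepancy` and the candidate range 2 to 30 are assumptions.

```python
def discrepancy(observations, period):
    """Largest (positive fraction - negative fraction) over all slot sets.

    Slot i contributes pos[i]/n_pos - neg[i]/n_neg; the maximizing set keeps
    exactly the slots where this is positive. Integer arithmetic is used for
    the numerator so the result is exact up to one final division.
    """
    positives = [0] * period
    negatives = [0] * period
    for t, x in observations.items():
        (positives if x == 1 else negatives)[t % period] += 1
    n_pos, n_neg = sum(positives), sum(negatives)
    if n_pos == 0 or n_neg == 0:
        return 0.0  # nothing to contrast
    excess = sum(max(positives[i] * n_neg - negatives[i] * n_pos, 0)
                 for i in range(period))
    return excess / (n_pos * n_neg)

# Same toy event as before: period 20, occurrences at 20k+5 .. 20k+10, every 7th timestamp observed.
toy = {t: 1 if 5 <= t % 20 <= 10 else 0 for t in range(0, 400, 7)}

scores = {T: discrepancy(toy, T) for T in range(2, 31)}
best_score = max(scores.values())
```

In this noiseless toy the true period 20 reaches the maximum possible score of 1, while a degenerate period such as T=1 scores 0 because every observation folds into the same slot. The candidate range is capped on purpose: in this sketch, a period larger than the whole observation span puts each observation in its own slot and scores 1 trivially.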
Periodicity Measure
Performance Comparisons. Sampling rate: ratio of observed points in the complete sequence.
Experiment on Real Human Data. One person’s visits to a specific location. Sampling rate: 20 min. Sampling rate: 1 hour.
Problems with Using Fourier Transform to Detect Periodicity. Examples: T=4, T=16.
Summary: Mining Event Periodicity from Incomplete Observations
• Motivation
  • Challenge of the real data: incomplete observations (inconsistent + low sampling rate)
• Method
  • Overlay the segments and measure the “skewness” of the distribution
  • Theoretically prove the correctness of the method
• Application
  • Location prediction
  • 2nd place in Nokia Mobile Data Challenge 2012
  • Periodicity-based feature + SVM
Thanks! Questions?