180 likes | 273 Views
Visually Mining and Monitoring Massive Time Series. Lin, J, Keogh, E., Lonardi, S., Lankford, J.P. and Nystrom, D.M. In Proceedings of the 10 th ACM SIGKDD International Converence on Knowledge Discovery and Data Mining, 2004. Amy Karlson V. Shiv Naga Prasad 15 February 2004 CMSC 838S.
E N D
Visually Mining and Monitoring Massive Time Series Lin, J, Keogh, E., Lonardi, S., Lankford, J.P. and Nystrom, D.M.In Proceedings of the 10th ACM SIGKDD International Converence on Knowledge Discovery and Data Mining, 2004. Amy Karlson V. Shiv Naga Prasad 15 February 2004 CMSC 838S Images courtesy of Jessica Lin and Eamonn Keogh
What are Time Series? • Simply: • Observations of a variable made over time • Typical across a wide variety of domains • Medicine • Physiology • Finance • Microbiology • Meteorology • Surveillance
Motivation: Critical Decision Making • Domains • Spacecraft Launch • Medicine • Research Directions • Mining Archives • Extract rules, patterns, regularities • Visualizing Streams • Novel visualization and interaction for: • Query by content • Motif discovery • Anomaly detection
Some Visual Time Series Systems • Time Searcher • Direct Manipulation Pattern Query • Theme Rivers • Theme strengthover time • Spirals • Periodic Data withknown period Hochheiser and Shniederman dot.com stocks 1999-2002 Havre, Hetzler, Whitney & Nowell InfoVis 2000 Weber et. al
VizTree • Construct a subsequence tree to span the space of subsequences of a given time series. • Use this to collect statistics about the series. • Size of the structure is independent of the length of the series.
VizTree Approach - Overview • Place windows along the time series to obtain subsequences. • Quantize along time and value dimension to obtain sequences of discrete symbols. • Construct a subsequence tree to represent all possible such sequences. • Collect frequencies of traversal of the branches of the subsequence tree. • Use these for motif and anomaly detection, and for comparing time series.
Subsequences Place windows along the time series to obtain subsequences.
Discretization • Subsequences are patterns. • Take windows along time series – length of window ~ length of subsequence. • Discretize the range of data - one symbol for each quantum. • Divide window into segments ~ represent one segment with one symbol.
Symbolic Aggregate approXimation(SAX) Representative symbols Quantization levels Segments One subsequence Discrete version = acdcbdba
Subsequence Tree - example a b • symbols={a,b,c} • #segments per window=2 • Tree spans the space of subsequences. • #Branch factor ~ # symbols (size of alphabet) • Depth ~ # segments per window • Branch thickness ~ freq. of occurrence of subsequence. a c a b b c a c b c
VisTree Tool Demo
Query by Content: Subsequence Matching • Finding known patterns • Chunking • Breaking a time series into individual series • Methods • Time (e.g. power usage) • Shape (e.g. heart beats) • Search Approaches • Exact - Slow • Approximate - Fast • Exploration • Hypothesis Testing --------- VizTree --------------------- VizTree
Motif Discovery • Finding unknown patterns • Not exact matches • VisTree allows exploration at varying levels of precision • E.g., cc** vs. ccac
Anomaly Detection • Finding abnormal patterns. • Use data already seen to identify anomalies • Identified by thin branches
Comparing Series: Diff Tree • Same parameters same tree structure • Compare the test branch frequencies with respect to reference branch frequencies • Blue = underrepresented • Green = overrepresented • Red = equivalent • Thickness = magnitude
Thoughts on VizTree (Vis.) • Most of “discovery” is implicit • Manual search • Parameter setting might be an issue • Automation might help • Tree Visualization • Use of real estate? • Effective? • Intuitive? • Alternatives?
Thoughts on VizTree (HCI) • Primarily a tool to for researchers now • (Also, we might have an outdated version) • Even so, some HCI suggestions: • Indication of how tree detail relates to tree overview • Zoom into a specific area of the time series (rather than zoom+scroll) • Selection in subsequence detail relates to subsequence overview • Unfortunately, least interesting patterns are most easily accessed (branches at root) • “snap to branch” or “snap to intersection” ? • Ability to turn off highlighting (undo)
Summary: Unique Contributions • Fundamental support for aperiodic series • Scalable • Resource requirements do not grow linearly with length series • Rich visual feature set • Global summaries • Diff-trees between multiple series • Local patterns and anomalies