VizTree

VizTree Huyen Dao and Chris Ackermann

Introductory example: Which is which?
These are two random bit sequences. One was generated by a computer and the other by humans.
01011001011110011010010000100010100110110101110000101010111011111000110110110111111010011001001000110100011110011011010001011110001011010011011001101000000100110001001110000011101001100101100001010010
10001000101001000101010100001010100010101110111101011010010111010010101001110101010100101001010101110101010010101010110101010010110010111011110100011100001010000100111010100011100001010101100101110101

Introductory example
• The sequence labelled HUMAN is not really random!
• The subjects tried to create randomness by alternating between 0 and 1.
[Figure: the two bit sequences of the introductory example, one of them labelled HUMAN]
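
The "alternating" observation can be made concrete by measuring how often adjacent bits differ. The sketch below is only an illustration and not part of VizTree; `alternation_rate` is a hypothetical helper name. For independent fair coin flips, adjacent bits differ about half of the time, so a rate well above 0.5 hints at deliberate alternation.

```python
def alternation_rate(bits):
    """Fraction of adjacent pairs whose two symbols differ."""
    pairs = list(zip(bits, bits[1:]))
    if not pairs:
        raise ValueError("need at least two bits")
    return sum(a != b for a, b in pairs) / len(pairs)


# Made-up toy inputs; the sequences from the slide can be passed as strings too.
print(alternation_rate("01010101"))  # every adjacent pair differs
print(alternation_rate("00001111"))  # only one of the seven pairs differs
```

Applying the function to each of the two sequences from the slide gives one number per sequence that can be compared against 0.5.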

What does VizTree do?
• Analyzes time series data.
• Illustrates motifs and anomalies with ‘subsequence trees’.
[Figure: binary subsequence tree, length of subsequence = 3]

Creating a Subsequence Tree
Sequence: 0 1 0 1 1 0 0 1 0 1 1 1 1 0 0 1 1 0 1 …
Current subsequence: 0 1 0
[Figure: binary subsequence tree]

Creating a Subsequence Tree 2
Sequence: 0 1 0 1 1 0 0 1 0 1 1 1 1 0 0 1 1 0 1 …
Current subsequence: 1 0 1
[Figure: binary subsequence tree]
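
The two slides above slide a window of length 3 over the bit sequence, one position at a time. A minimal sketch of that process follows. The nested-dict layout and the names `subsequence_counts` and `build_tree` are assumptions made for illustration, not VizTree's internal data structure. Only the visible part of the slide's sequence is used, since the rest is elided there.

```python
from collections import Counter


def subsequence_counts(bits, length=3):
    """Count every window of `length` consecutive symbols, sliding by one."""
    return Counter(
        tuple(bits[i:i + length]) for i in range(len(bits) - length + 1)
    )


def build_tree(counts):
    """Nest the counted subsequences into a tree with one level per symbol.

    Each node stores how many windows pass through it; a viewer could
    draw that count as branch thickness.
    """
    root = {"count": 0, "children": {}}
    for subseq, n in counts.items():
        root["count"] += n
        node = root
        for symbol in subseq:
            node = node["children"].setdefault(
                symbol, {"count": 0, "children": {}}
            )
            node["count"] += n
    return root


# Visible part of the sequence shown on the slide.
bits = [0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1]
counts = subsequence_counts(bits, length=3)
tree = build_tree(counts)

print(counts[(0, 1, 0)])      # 2 windows read 0 1 0
print(counts[(1, 0, 1)])      # 3 windows read 1 0 1
print(tree["count"])          # 17 windows in total
```

The first two windows are 0 1 0 and 1 0 1, matching the two slides.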

Discretizing
• Only discrete data can be visualized.
• Most data is continuous and needs to be converted.
• Several steps convert continuous data into the tree structure:
  • PAA
  • SAX

PAA
Piecewise aggregate approximation (PAA) of a time series:
• Divide the time series into n segments of equal length.
• Assign each segment a coefficient = the average of the values in that segment.
[Figure: time series on an axis from 0 to 24, divided into 5 segments of width 4.8]
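
The two steps above can be sketched in a few lines. This is a minimal illustration that assumes the series length is a multiple of the number of segments; the function name `paa` and the sample values are made up.

```python
def paa(series, n_segments):
    """Piecewise aggregate approximation: the mean of each equal-length segment.

    Assumes len(series) is divisible by n_segments to keep the sketch short.
    """
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("series length must be a positive multiple of n_segments")
    size = len(series) // n_segments
    return [
        sum(series[i:i + size]) / size
        for i in range(0, len(series), size)
    ]


values = [1.0, 3.0, 2.0, 4.0, 10.0, 12.0]  # made-up data
print(paa(values, 3))  # [2.0, 3.0, 11.0]
```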

SAX
• Create an alphabet on the distribution space of the time series:
  • Divide the range into x regions such that a segment has equal probability of falling into any one of them.
  • Assign symbols to the regions from top to bottom.
• Assign each segment of the PAA a symbol based on the region in which it resides.
• The time series becomes a string: ‘b c b a b’
[Figure: PAA segments on an axis from 0 to 24, with regions labelled a, b, c]
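
A sketch of the symbol assignment follows. The slide does not say how the equiprobable regions are chosen, so the code assumes the PAA coefficients come from a z-normalized series and cuts the regions from a standard normal distribution. The names `breakpoints` and `sax_word` are hypothetical, and the coefficients are made-up values picked to reproduce the slide's string. Symbols are assigned from top to bottom as on the slide; other descriptions of SAX order them from bottom to top.

```python
from bisect import bisect_right
from statistics import NormalDist
from string import ascii_lowercase


def breakpoints(alphabet_size):
    """Cut points that split a standard normal into equiprobable regions."""
    return [
        NormalDist().inv_cdf(k / alphabet_size)
        for k in range(1, alphabet_size)
    ]


def sax_word(coeffs, alphabet_size):
    """Map each PAA coefficient to a letter; the top region gets 'a'."""
    cuts = breakpoints(alphabet_size)
    letters = ascii_lowercase[:alphabet_size]
    word = []
    for c in coeffs:
        region_from_bottom = bisect_right(cuts, c)  # 0 = lowest region
        word.append(letters[alphabet_size - 1 - region_from_bottom])
    return "".join(word)


# Assumed, already z-normalized PAA coefficients (illustrative only).
coeffs = [0.0, -1.0, 0.1, 1.2, -0.2]
print(sax_word(coeffs, 3))  # bcbab
```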

Tree of continuous data
• Instead of Boolean values, the branches represent the symbols:
  • the top branch represents a
  • the bottom branch represents the last letter
• A larger alphabet means more branches.
[Figure: subsequence tree with branches labelled a and b; window size = 3, # of symbols = 3, alphabet size = 2]
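
How quickly the tree grows can be worked out directly. The sketch assumes a full tree in which every possible word has its own branch; `branch_count` and `leaf_count` are hypothetical helper names.

```python
def branch_count(alphabet_size, symbols_per_window):
    """Total branches in a full tree: a + a**2 + ... + a**s."""
    return sum(
        alphabet_size ** depth
        for depth in range(1, symbols_per_window + 1)
    )


def leaf_count(alphabet_size, symbols_per_window):
    """Number of distinct words, one leaf per word."""
    return alphabet_size ** symbols_per_window


print(branch_count(2, 3))  # 14
print(branch_count(3, 3))  # 39
print(leaf_count(4, 4))    # 256
```

With an alphabet of size 2 and 3 symbols per window, the full tree has 2 + 4 + 8 = 14 branches; moving to 3 letters already gives 39.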

Sliding window length
• Specifies the time frame of the pattern that is being matched.
• An appropriate length can be determined by using the ruler.
[Figure: time series on an axis from 0 to 24, with windows of length = 12 and length = 24]

# of symbols per window
• Specifies how many discrete segments (symbols) fit into the given sliding window.
• Depends on the sliding window size and the frequency of value changes.
[Figure: window of length = 24 encoded as ‘b c b a b’ and as ‘c a’]

Alphabet size
• Larger alphabet:
  • Discrete representation is more fine-grained.
  • Tree is more difficult to read.
[Figure: series on an axis from 0 to 24 encoded as ‘b c b a b’ with regions a, b, c and as ‘b b a a a’ with regions a, b]

Parameters
• Length of the sliding window
  • For focusing on certain intervals
• # of symbols per window
  • The size of the pattern being analyzed
• Alphabet size
  • The number of discrete values

Time Series Data Mining Tasks
• Subsequence matching
• Time series motif discovery
• Anomaly detection

Advanced settings
• Cull trivial matches:
  • Consecutive strings that are the same: ‘dcb’, ‘dcb’
  • Consecutive strings where no pair of symbols is more than one symbol apart: ‘dcb’, ‘cba’
• Chunking instead of actually sliding the window
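
Both settings can be sketched briefly. The code assumes that each word is compared with the last word that was kept (the slide only says "consecutive") and that chunking means stepping by a full window length instead of by one point. The names `is_trivial`, `cull_trivial`, and `window_starts` are hypothetical.

```python
def is_trivial(prev, cur):
    """True if every aligned pair of symbols is at most one letter apart.

    Identical words satisfy this as well, so one test covers both rules.
    """
    return all(abs(ord(p) - ord(c)) <= 1 for p, c in zip(prev, cur))


def cull_trivial(words):
    """Drop words that trivially match the last kept word (an assumption)."""
    kept = []
    for word in words:
        if kept and is_trivial(kept[-1], word):
            continue
        kept.append(word)
    return kept


def window_starts(n_points, window, chunking=False):
    """Start indices of the windows: step 1 when sliding, `window` when chunking."""
    step = window if chunking else 1
    return list(range(0, n_points - window + 1, step))


print(cull_trivial(["dcb", "dcb", "cba", "aad"]))  # ['dcb', 'aad']
print(window_starts(10, 4))                        # [0, 1, 2, 3, 4, 5, 6]
print(window_starts(10, 4, chunking=True))         # [0, 4]
```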

VizTree and Data Mining Tasks: Subsequence Matching
• The exact pattern does not have to be known for a query: give a concise description of the pattern.
• Selecting a branch shows all subsequence matches and highlights their occurrences in the time series.
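
Selecting a branch amounts to fixing the first few symbols of a word. A minimal sketch, assuming one word per window position; `match_branch` and the word list are made up for illustration.

```python
def match_branch(words, prefix):
    """Return the window positions whose word starts with `prefix`."""
    return [pos for pos, word in enumerate(words) if word.startswith(prefix)]


# Made-up words, one per window position.
words = ["abc", "abb", "cab", "abc", "bca"]
print(match_branch(words, "ab"))   # [0, 1, 3]
print(match_branch(words, "abc"))  # [0, 3]
```

The returned positions are what a viewer would highlight in the time series.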

VizTree and Data Mining Tasks: Time Series Motif Discovery
• Motif – “previously unknown, frequently occurring patterns”
• Discovery is simple: frequently occurring patterns => thick branches
• Traditional motif discovery algorithms are slow.
• VizTree builds frequency into the visualization, so motifs can be found quickly.
• Highlights where motifs occur
Lin et al. 2005
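
Because frequency is the only ingredient, the thickest branches can be read off a plain count of the words. The sketch below is an illustration rather than VizTree's implementation; `top_motifs` and the word list are hypothetical.

```python
from collections import Counter, defaultdict


def top_motifs(words, k=2):
    """Return the k most frequent words with their counts and positions."""
    positions = defaultdict(list)
    for pos, word in enumerate(words):
        positions[word].append(pos)
    counts = Counter({word: len(p) for word, p in positions.items()})
    return [(word, n, positions[word]) for word, n in counts.most_common(k)]


# Made-up words, one per window position.
words = ["abc", "bca", "abc", "cab", "abc", "bca"]
print(top_motifs(words))
# [('abc', 3, [0, 2, 4]), ('bca', 2, [1, 5])]
```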

VizTree and Data Mining Tasks: Anomaly Discovery
• Simple cases: observing very thin branches in subsequence trees.
• More complex cases: Diff Trees.
• Thick branches of vivid green or blue indicate anomalies in the second time series.
Lin et al. 2005

Diff Tree
• Contains an analysis of two time series, A and B
• Shows the frequency of patterns in B in relation to their frequency in A
• Two values are used in its creation:
  • Support: is a pattern overrepresented (more frequently occurring) or underrepresented (less frequently occurring) in B?
  • Confidence: how prevalent is the pattern in A?
• Support => thickness of branches
• Confidence => color intensity of branches
• Also: Surprisingness, which ranks the most anomalous patterns
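
The slide gives no formulas for support, confidence, or surprisingness, so the sketch below does not attempt to reproduce them. It only illustrates the underlying comparison: how much more or less often a pattern occurs in B than in A. The signed difference of relative frequencies used here is an assumed stand-in, and `relative_frequencies` and `diff_patterns` are hypothetical names.

```python
from collections import Counter


def relative_frequencies(words):
    """Share of all windows taken up by each word."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def diff_patterns(words_a, words_b):
    """Rank words by how strongly their relative frequency differs in B vs A.

    Positive values: overrepresented in B. Negative: underrepresented in B.
    This is an assumed measure, not VizTree's support or confidence.
    """
    freq_a = relative_frequencies(words_a)
    freq_b = relative_frequencies(words_b)
    diff = {
        word: freq_b.get(word, 0.0) - freq_a.get(word, 0.0)
        for word in set(freq_a) | set(freq_b)
    }
    return sorted(diff.items(), key=lambda item: abs(item[1]), reverse=True)


# Made-up words for two series.
words_a = ["ab", "ab", "ba", "ab"]
words_b = ["ab", "bb", "bb", "bb"]
for word, delta in diff_patterns(words_a, words_b):
    print(word, round(delta, 2))
```

In this toy input, ‘bb’ never occurs in A but dominates B, so it ranks first.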

What is great about VizTree?
• Simple graphical representation:
  • Straightforward
  • Powerful: can show lots of different subsequences in a simple tree structure
  • Simple and easy-to-understand description of subsequences through strings
• Quick analysis:
  • The subsequence trees and diff trees render quickly.
  • Since the relevant information is encoded in the tree, motifs and anomalies can be spotted quickly.

Weaknesses
• It is difficult to find the right combination of parameters.
  • An idea would be to superimpose the effect of the parameters on the original graph (discrete values, sliding window length, etc.).
• Zooming is rather inconvenient.
  • This could be solved by using another zooming technique, such as fish-eye.
• Usability could be improved:
  • It would be informative to see how the alphabet is defined over the dataset.
  • The subtree view does not indicate where in the main tree it is, so the user can lose track.
  • The time series scales are not adjustable, so it can be hard to place where subsequences are in terms of time.
  • Nodes are hard to select.
