VizTree

Huyen Dao and Chris Ackermann

Presentation Transcript


  1. VizTree
     Huyen Dao and Chris Ackermann

  2. Which is which?
     Introducing example: these are two random bit sequences. One was generated by a computer and the other by humans.
     01011001011110011010010000100010100110110101110000101010111011111000110110110111111010011001001000110100011110011011010001011110001011010011011001101000000100110001001110000011101001100101100001010010
     10001000101001000101010100001010100010101110111101011010010111010010101001110101010100101001010101110101010010101010110101010010110010111011110100011100001010000100111010100011100001010101100101110101

  3. Introducing example
     HUMAN
     01011001011110011010010000100010100110110101110000101010111011111000110110110111111010011001001000110100011110011011010001011110001011010011011001101000000100110001001110000011101001100101100001010010
     10001000101001000101010100001010100010101110111101011010010111010010101001110101010100101001010101110101010010101010110101010010110010111011110100011100001010000100111010100011100001010101100101110101
     Not really random! Subjects tried to create randomness by alternating.

  4. What does VizTree do?
     • Analysis of time series data.
     • Illustrates motifs and anomalies with ‘subsequence trees’.
     [Figure: subsequence tree; length of subsequence = 3]

  5. Creating a Subsequence Tree
     Sequence: 0 1 0 1 1 0 0 1 0 1 1 1 1 0 0 1 1 0 1 …
     Current window: 0 1 0
     [Figure: subsequence tree]

  6. Creating a Subsequence Tree (2)
     Sequence: 0 1 0 1 1 0 0 1 0 1 1 1 1 0 0 1 1 0 1 …
     Current window: 1 0 1
     [Figure: subsequence tree]
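
Slides 5 and 6 slide a window of length 3 along the bit string one position at a time and add each window to the tree. Below is a minimal Python sketch of that counting step. It assumes overlapping windows that step by one. The function names and the nested-dict tree layout are hypothetical choices, not VizTree's internals. The input is only the first 19 bits visible on the slide.

```python
from collections import Counter


def subsequence_counts(bits, length=3):
    """Count every overlapping window of `length` symbols, sliding one step at a time."""
    return Counter(bits[i:i + length] for i in range(len(bits) - length + 1))


def build_tree(counts):
    """Nest the counted strings into a tree; each node records how many windows pass through it."""
    root = {"count": sum(counts.values()), "children": {}}
    for word, n in counts.items():
        node = root
        for symbol in word:
            node = node["children"].setdefault(symbol, {"count": 0, "children": {}})
            node["count"] += n
    return root


# The first 19 bits shown on the slide (the rest is elided there).
bits = "0101100101111001101"
counts = subsequence_counts(bits)
tree = build_tree(counts)
print(counts["010"], counts["101"])  # 2 3
print(tree["children"]["0"]["count"], tree["children"]["1"]["count"])  # 7 10
```

The count stored on each node is what the visualization turns into branch thickness.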

  7. Discretizing
     • Only discrete data can be visualized.
     • Most data is continuous and needs to be converted.
     • Converting continuous data into the tree structure takes several steps:
       • PAA
       • SAX

  8. PAA
     A. Piecewise aggregate approximation (PAA) of a time series:
     • Divide the time series into n segments of equal length.
     • Assign each segment a coefficient equal to the average of the values in that segment.
     [Figure: time series and its PAA; time axis 0 to 24]
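
A minimal sketch of the PAA step. For simplicity it assumes the series length is an exact multiple of n, because the slide does not say how leftover points are handled. `paa` is a hypothetical helper name.

```python
def paa(series, n_segments):
    """Piecewise Aggregate Approximation: the mean of each of n equal-length segments."""
    if n_segments <= 0 or len(series) % n_segments != 0:
        raise ValueError("this sketch assumes len(series) is a multiple of n_segments")
    size = len(series) // n_segments
    return [sum(series[i:i + size]) / size for i in range(0, len(series), size)]


series = [1, 3, 2, 4, 6, 8]
print(paa(series, 3))  # [2.0, 3.0, 7.0]
```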

  9. SAX
     • Create an alphabet over the distribution of the time series values:
       • Divide the value range into x regions such that a segment is equally likely to fall into any one of them.
       • Assign symbols to the regions from top to bottom.
     • Assign each PAA segment the symbol of the region its coefficient falls into.
     • The time series becomes a string: ‘b c b a b’
     [Figure: PAA segments mapped to regions a, b, c; time axis 0 to 24]
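
A sketch of the SAX symbol assignment. It follows the usual SAX formulation and assumes two things: the series is z-normalized first, and the equiprobable regions come from a standard normal distribution. The region boundaries are computed here with Python's `statistics.NormalDist.inv_cdf`. The helper names are hypothetical. This version gives ‘a’ to the lowest region, whereas the slide lists the regions top to bottom, so the letter order is only a labeling convention.

```python
from statistics import NormalDist, mean, pstdev
from string import ascii_lowercase


def znormalize(series):
    """Shift to mean 0 and scale to standard deviation 1 (a flat series maps to all zeros)."""
    mu, sigma = mean(series), pstdev(series)
    return [(x - mu) / sigma if sigma > 0 else 0.0 for x in series]


def breakpoints(alphabet_size):
    """Cut points splitting a standard normal distribution into equiprobable regions."""
    return [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]


def sax(paa_coefficients, alphabet_size):
    """Map each PAA coefficient to the letter of the region it falls into ('a' = lowest)."""
    cuts = breakpoints(alphabet_size)
    return "".join(ascii_lowercase[sum(c > b for b in cuts)] for c in paa_coefficients)


z = znormalize([2, 4, 4, 4, 5, 5, 7, 9])               # [-1.5, -0.5, -0.5, -0.5, 0, 0, 1, 2]
coeffs = [sum(z[i:i + 2]) / 2 for i in range(0, 8, 2)]  # PAA with 4 segments
print([round(b, 2) for b in breakpoints(3)])            # [-0.43, 0.43]
print(sax(coeffs, 3))                                   # aabc
```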

  10. Tree of continuous data
      • Instead of Boolean values, the branches represent the symbols:
        • the top branch represents a,
        • the bottom branch represents the last letter.
      • A larger alphabet means more branches.
      [Figure: subsequence tree with branches labeled a and b; window size = 3, # of symbols = 3, alphabet size = 2]
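
This slide's parameters determine the shape of the tree. With alphabet size 2 and 3 symbols per window, there are 2 + 4 + 8 = 14 branches. The sketch below lists them depth-first, along with the number of windows that pass through each one. The words are made up for illustration, and `tree_lines` is a hypothetical helper. Zero-count branches are listed only to show the full shape.

```python
from collections import Counter


def tree_lines(counts, alphabet, depth, prefix=""):
    """Depth-first listing of the full tree: one line per branch, with its window count."""
    if len(prefix) == depth:
        return []
    lines = []
    for symbol in alphabet:  # the first letter is drawn as the top branch
        branch = prefix + symbol
        n = sum(c for word, c in counts.items() if word.startswith(branch))
        lines.append("  " * len(prefix) + f"{symbol} ({n})")
        lines.extend(tree_lines(counts, alphabet, depth, branch))
    return lines


# Hypothetical SAX words: 3 symbols per window, alphabet size 2.
counts = Counter(["aab", "aab", "abb", "bba"])
lines = tree_lines(counts, "ab", 3)
print(len(lines))  # 14
print("\n".join(lines))
```

A larger alphabet multiplies the branch count at every level. For example, alphabet size 3 with 3 symbols gives 3 + 9 + 27 = 39 branches, which is why the tree becomes harder to read.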

  11. Sliding window length
      • Specifies the time frame of the pattern that is being matched.
      • An appropriate length can be determined using the ruler.
      [Figure: time series with sliding window length = 12 and length = 24; time axis 0 to 24]

  12. # of symbols per window
      • Specifies how many segments (one symbol each) the given time window is divided into.
      • Depends on the sliding window length and on how frequently the values change.
      [Figure: window of length = 24 discretized as ‘b c b a b’ (5 symbols) and as ‘c a’ (2 symbols)]

  13. Alphabet size
      • Larger alphabet:
        • The discrete representation is more fine-grained.
        • The tree is harder to read.
      [Figure: the same series as ‘b c b a b’ with alphabet a, b, c and as ‘b b a a a’ with alphabet a, b]

  14. Parameters
      • Length of the sliding window: for focusing on certain intervals
      • # of symbols per window: the size of the pattern being analyzed
      • Alphabet size: the number of discrete values
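
The three parameters combine into one pipeline. The sketch below makes two assumptions: each window is z-normalized on its own before PAA and SAX, and the window length is a multiple of the number of symbols. The first is a common choice for subsequence discretization, but the slides do not state it. The function names are hypothetical.

```python
from statistics import NormalDist, mean, pstdev
from string import ascii_lowercase


def sax_word(window, n_symbols, alphabet_size):
    """Discretize one window: z-normalize, PAA into n_symbols segments, then map to letters."""
    mu, sigma = mean(window), pstdev(window)
    z = [(x - mu) / sigma if sigma > 0 else 0.0 for x in window]
    size = len(z) // n_symbols
    coeffs = [mean(z[i * size:(i + 1) * size]) for i in range(n_symbols)]
    cuts = [NormalDist().inv_cdf(k / alphabet_size) for k in range(1, alphabet_size)]
    return "".join(ascii_lowercase[sum(c > b for b in cuts)] for c in coeffs)


def sliding_words(series, window_length, n_symbols, alphabet_size):
    """One SAX word per window position, sliding one point at a time."""
    if window_length % n_symbols != 0:
        raise ValueError("this sketch assumes window_length is a multiple of n_symbols")
    return [sax_word(series[i:i + window_length], n_symbols, alphabet_size)
            for i in range(len(series) - window_length + 1)]


print(sliding_words([0, 1, 2, 3, 4, 5, 6, 7], 4, 2, 2))  # ['ab', 'ab', 'ab', 'ab', 'ab']
print(sliding_words([0, 1, 2, 3], 4, 2, 3))              # ['ac']
```

Raising the alphabet size from 2 to 3 turns the same rising window from ‘ab’ into ‘ac’, which shows the finer-grained representation described on slide 13.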

  15. Time Series Data Mining Tasks
      • Subsequence matching
      • Time series motif discovery
      • Anomaly detection

  16. Advanced settings
      • Cull trivial matches:
        • consecutive strings that are identical: ‘dcb’, ‘dcb’
        • consecutive strings in which no pair of corresponding symbols is more than one symbol apart: ‘dcb’, ‘cba’
      • Chunking: stepping the window in non-overlapping chunks instead of actually sliding it
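
A sketch of the two culling rules and of chunking. It works on a list of SAX words in window order. The function names are hypothetical. The sketch also assumes that each word is compared with the word of the immediately preceding window and that all words have the same length.

```python
def close(u, v):
    """True if no pair of corresponding symbols is more than one letter apart."""
    return all(abs(ord(x) - ord(y)) <= 1 for x, y in zip(u, v))


def cull_trivial(words, loose=False):
    """Keep (position, word) pairs, skipping words that trivially match the previous window's word."""
    kept = []
    for i, word in enumerate(words):
        if i > 0 and (word == words[i - 1] or (loose and close(word, words[i - 1]))):
            continue
        kept.append((i, word))
    return kept


def window_starts(n_points, window_length, chunking=False):
    """Start index of each window: step 1 when sliding, step window_length when chunking."""
    step = window_length if chunking else 1
    return list(range(0, n_points - window_length + 1, step))


print(cull_trivial(["dcb", "dcb", "cba", "abc"]))              # [(0, 'dcb'), (2, 'cba'), (3, 'abc')]
print(cull_trivial(["dcb", "dcb", "cba", "abc"], loose=True))  # [(0, 'dcb'), (3, 'abc')]
print(window_starts(10, 3, chunking=True))                     # [0, 3, 6]
```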

  17. VizTree and Data Mining Tasks: Subsequence Matching
      • No need to know the exact pattern for a query: a concise description of the pattern is enough.
      • Selecting a branch shows all subsequence matches and highlights their occurrences in the time series.
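
Selecting a branch amounts to selecting every word that starts with that branch's prefix. The sketch below shows the lookup and turns the matches into regions to highlight. It assumes one word per window, windows that slide one point at a time, and half-open [start, end) spans. The names and the example words are hypothetical.

```python
def match_branch(words, branch, window_length):
    """Half-open [start, end) spans of every window whose word lies under the selected branch."""
    return [(i, i + window_length) for i, word in enumerate(words) if word.startswith(branch)]


def merge_spans(spans):
    """Merge overlapping spans into the regions to highlight in the time series."""
    merged = []
    for start, end in sorted(spans):
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


words = ["abb", "abb", "bba", "bab", "bbb", "aba"]  # hypothetical: one word per 3-point window
spans = match_branch(words, "ab", 3)
print(spans)               # [(0, 3), (1, 4), (5, 8)]
print(merge_spans(spans))  # [(0, 4), (5, 8)]
```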

  18. VizTree and Data Mining Tasks: Time Series Motif Discovery
      • Motif: “previously unknown, frequently occurring patterns”
      • Discovery is simple: frequently occurring patterns appear as thick branches.
      • Traditional motif discovery algorithms are slow.
      • VizTree builds frequency into the visualization, so motifs can be found quickly.
      • Highlights where motifs occur.
      Lin et al. 2005
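
Branch thickness reflects frequency, so the most frequent words mark candidate motifs. Below is a minimal sketch using `collections.Counter`. It assumes trivial matches have already been culled, since otherwise a slowly varying stretch of the series inflates its own count. The words and the `motif_locations` name are hypothetical.

```python
from collections import Counter


def motif_locations(words, k=1):
    """The k most frequent words (the thickest branches) and the window positions of each."""
    counts = Counter(words)
    return {word: [i for i, w in enumerate(words) if w == word]
            for word, _ in counts.most_common(k)}


words = ["abc", "bca", "abc", "cab", "abc", "bca"]  # hypothetical SAX words, one per window
print(motif_locations(words, k=2))  # {'abc': [0, 2, 4], 'bca': [1, 5]}
```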

  19. VizTree and Data Mining Tasks: Anomaly Discovery
      • Simple cases: look for very thin branches in the subsequence tree.
      • More complex cases: diff trees.
        • Thick branches of vivid green or blue indicate anomalies in the second time series.
      Lin et al. 2005
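
The simple case is the mirror image of motif discovery, because the thinnest branches are the rarest words. Below is a sketch with a hypothetical `max_count` threshold and made-up words.

```python
from collections import Counter


def rare_patterns(words, max_count=1):
    """Words seen at most max_count times (the thinnest branches), with their window positions."""
    counts = Counter(words)
    return {word: [i for i, w in enumerate(words) if w == word]
            for word, n in counts.items() if n <= max_count}


words = ["aab"] * 5 + ["abb"] * 4 + ["ccc"]  # hypothetical SAX words
print(rare_patterns(words))  # {'ccc': [9]}
```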

  20. Diff Tree
      • Contains the analysis of two time series, A and B.
      • Shows the frequency of patterns in B relative to their frequency in A.
      • Two values are used in its creation:
        • Support: whether a pattern is overrepresented (occurs more frequently) or underrepresented (occurs less frequently) in B
        • Confidence: how prevalent the pattern is in A
      • Support => thickness of branches
      • Confidence => color intensity of branches
      • Also: surprisingness, which ranks the most anomalous patterns
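
The slide does not give formulas for support and confidence, so the sketch below uses hypothetical definitions that follow its wording. Support is a pattern's share of windows in B minus its share in A, so a positive value means the pattern is overrepresented in B. Confidence is the pattern's share in A. Absolute support stands in for surprisingness. These are illustrative assumptions, not VizTree's actual formulas.

```python
from collections import Counter


def diff_scores(words_a, words_b):
    """Per-pattern (support, confidence); hypothetical: support = share_B - share_A, confidence = share_A."""
    freq_a, freq_b = Counter(words_a), Counter(words_b)
    scores = {}
    for word in set(freq_a) | set(freq_b):
        share_a = freq_a[word] / len(words_a)
        share_b = freq_b[word] / len(words_b)
        scores[word] = (share_b - share_a, share_a)
    return scores


def most_surprising(scores, k=3):
    """Rank patterns by absolute support (a hypothetical stand-in for surprisingness)."""
    return sorted(scores, key=lambda w: abs(scores[w][0]), reverse=True)[:k]


scores = diff_scores(["aa", "aa", "aa", "ab"], ["aa", "ab", "ab", "bb"])
print(scores["aa"], scores["bb"])    # (-0.5, 0.75) (0.25, 0.0)
print(most_surprising(scores, k=1))  # ['aa']
```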

  21. What is great about VizTree?
      • Simple graphical representation:
        • Straightforward
        • Powerful: can show many different subsequences in a simple tree structure
        • Simple, easy-to-understand description of subsequences as strings
      • Quick analysis:
        • Subsequence trees and diff trees render quickly.
        • Since the relevant information is encoded in the tree, motifs and anomalies can be spotted quickly.

  22. Weaknesses
      • It is difficult to find the right combination of parameters.
        • One idea would be to superimpose the effect of the parameters (discrete values, sliding window length, etc.) on the original graph.
      • Zooming is rather inconvenient.
        • This could be solved by using another zooming technique, such as fish-eye.
      • Usability could be improved:
        • It would be informative to see how the alphabet is defined over the dataset.
        • The subtree view does not indicate where it sits in the main tree, so users can lose track.
        • The time series scales are not adjustable, so it can be hard to place subsequences in time.
        • Nodes are hard to select.
