Explore techniques such as pattern detection, pattern-based similarity search, and dynamic time warping in temporal data analysis, with a focus on telecom alarm databases and handling "big data" challenges.
CSE 8392 SPRING 1999
DATA MINING: ADVANCED TOPICS
Temporal Data
Professor Margaret H. Dunham
Department of Computer Science and Engineering
Southern Methodist University
Dallas, Texas 75275
(214) 768-3087, fax: (214) 768-3085
email: mhd@seas.smu.edu
www: http://www.seas.smu.edu/~mhd
April 1999
TEMPORAL DATA OVERVIEW (Fayyad, Ch9)
• Databases historically have contained non-temporal data
  • Records represent attributes at a single point in time (snapshot)
• Analysis of temporal (time-varying) records presents a unique set of challenges and possibilities
  • Transaction Time
  • Valid Time
  • Other time interpretations?
• Examples
  • NASA satellites: 1 TB of data per day
  • Patient monitoring
  • Financial market monitoring
Pattern Detection in Temporal Data
• Detection of patterns is fuzzy
  • No exact match
  • Approximation required
• Humans are good at detecting such patterns, but machines are not
• Related fields of research offer helpful techniques
  • Spelling correctors
  • Statistics
  • Signal processing
  • Genetic algorithms
  • Speech recognition
Pattern-Based Similarity Search (R[2])
• Identifying companies with similar growth patterns
• Finding similar weather patterns
• Sequence matching for temporal databases
  • Whole Matching - target and sequence have the same length
  • Subsequence Matching - target may be shorter than sequences in database. Must match starting point.
• Similar to pattern matching in texts
• Approach differences:
  • Technique used
  • Similarity measure
  • Use of scaling or translation
  • Optimization (reduce search space or number of comparisons)
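The "use of scaling or translation" item can be illustrated with a small sketch. One common way to make matching insensitive to offset and amplitude is to z-normalize each series before comparing; this is an assumption chosen for illustration, not necessarily the transformation used in R[2]. The function name `z_normalize` is hypothetical.

```python
import math

def z_normalize(series):
    """Shift a series to zero mean and rescale it to unit (population) std.

    Illustrative assumption: this removes translation (offset) and
    scaling (amplitude) so that only the shape of the series remains.
    """
    n = len(series)
    if n == 0:
        return []
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    if std == 0:
        # A constant series has no shape to compare; map it to zeros.
        return [0.0] * n
    return [(v - mean) / std for v in series]

# Two growth curves with the same shape but different scales.
small = z_normalize([1, 2, 3])
large = z_normalize([10, 20, 30])
```

After normalization, `small` and `large` agree up to floating-point rounding, so any distance measure applied afterwards treats the two series as the same pattern.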
Pattern Matching Similarity Measures (R[2])
• Problem: Given Target X = <x1, x2, …, xn> and Sequence Y = <y1, y2, …, yN>, find D(X,Y).
• May assume n = N, or n < N and look at all subsequences of Y of length n.
• Euclidean Distance - Form. 7.1, p. 878
• Linear Correlation - Form. 7.2, p. 878
• Discrete Fourier Transform - Form. 7.3, p. 878
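The first two measures and the "all subsequences of length n" search can be sketched as follows. The code uses the textbook definitions of Euclidean distance and the Pearson correlation coefficient, which may differ in detail from Formulas 7.1 and 7.2 cited above; the function names are hypothetical, and the brute-force scan does none of the optimizations the slides mention.

```python
import math

def euclidean(x, y):
    """Euclidean distance between two equal-length sequences."""
    if len(x) != len(y):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def linear_correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or not x:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / (sx * sy)

def best_subsequence(target, sequence, dist=euclidean):
    """Slide the target (length n) over the sequence (length N >= n).

    Returns (start_index, distance) of the closest window, or None
    when the sequence is shorter than the target.
    """
    n = len(target)
    best = None
    for start in range(len(sequence) - n + 1):
        d = dist(target, sequence[start:start + n])
        if best is None or d < best[1]:
            best = (start, d)
    return best

match = best_subsequence([1, 2, 3], [5, 1, 2, 3, 9])
```

Here `match` is `(1, 0.0)`: the target occurs exactly at offset 1. The scan costs O(n·N) distance work per stored sequence, which is why the optimization step (reducing the search space or the number of comparisons) matters for large databases.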
Dynamic Time Warping (DTW) (Fayyad, Ch9)
• Uses dynamic time warping to investigate time series data
• Involves matrix calculations
• Requires distance measurements: |x - y| or (x - y)^2
• Warping path is determined from the minimum cumulative distances found
• Restrictions imposed by DTW (p. 234): Monotonic; Continuous; Windowed; Slope; Boundary
• Example - p. 235
• Normalization
  • Convert raw scores (distances) to relative scores
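A minimal sketch of the matrix calculation, assuming the standard dynamic-programming formulation of DTW with |x - y| as the local distance. It enforces the boundary, monotonicity, and continuity restrictions plus an optional window; the slope restriction is not implemented, and the function name `dtw` and its parameters are hypothetical rather than taken from the cited chapter.

```python
def dtw(x, y, window=None, dist=lambda a, b: abs(a - b)):
    """Return (cumulative_distance, warping_path) for sequences x and y.

    The path is a list of (i, j) index pairs from (0, 0) to
    (len(x) - 1, len(y) - 1). `window` limits |i - j| if given.
    """
    n, m = len(x), len(y)
    if n == 0 or m == 0:
        raise ValueError("sequences must be non-empty")
    # The window must be at least |n - m| or the end cell is unreachable.
    w = max(n, m) if window is None else max(window, abs(n - m))
    inf = float("inf")
    # D[i][j] = minimum cumulative distance aligning x[:i] with y[:j].
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            cost = dist(x[i - 1], y[j - 1])
            D[i][j] = cost + min(D[i - 1][j],      # advance in x only
                                 D[i][j - 1],      # advance in y only
                                 D[i - 1][j - 1])  # advance in both
    # Backtrack from the end cell along the cheapest predecessors.
    path = []
    i, j = n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        _, i, j = min((D[i - 1][j - 1], i - 1, j - 1),
                      (D[i - 1][j], i - 1, j),
                      (D[i][j - 1], i, j - 1))
    path.reverse()
    return D[n][m], path

cost, path = dtw([1, 2, 3], [1, 2, 2, 3])
```

In this example the repeated 2 in the second series is absorbed by the warping, so `cost` is 0 even though the series differ in length. One common way to turn the raw score into a relative one is to divide it by `len(path)`; that choice is an assumption here, not necessarily the normalization the chapter describes.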
Telecom Alarm Databases
• Dissertation by Mika Klemettinen, University of Helsinki, January 22, 1999
• Alarm - message generated by a telecom network entity describing a problem.
  • Uses management network
• Correlation - combining information from multiple alarms so that they can be interpreted together. (p. 16, 17)
• Telecommunications Alarm Sequence Analyzer (TASA) - recognizes a pattern defined by a sequence of alarm messages. An action is taken based on the pattern. A window is associated with each pattern. (p. 20)
• Episode Rule (p. 29) - generalization of Association Rule
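The idea of a pattern with an associated window can be sketched as a search for an ordered sequence of alarm types that completes within a fixed time span. This is only an illustration of the concept, not TASA's algorithm: the function name, the `(timestamp, alarm_type)` representation, and the alarm type labels are all hypothetical.

```python
def serial_episode_starts(alarms, episode, window):
    """Find start times at which `episode` occurs within `window`.

    alarms  : list of (timestamp, alarm_type) pairs
    episode : ordered list of alarm types that must appear in sequence
    window  : maximum allowed time between first and last matched alarm
    """
    if not episode:
        return []
    alarms = sorted(alarms)  # process alarms in time order
    starts = []
    for i, (t0, kind) in enumerate(alarms):
        if kind != episode[0]:
            continue
        matched = 1  # episode[0] is matched by the alarm at t0
        for t, later_kind in alarms[i + 1:]:
            if t - t0 > window:
                break  # remaining alarms fall outside the window
            if matched < len(episode) and later_kind == episode[matched]:
                matched += 1
        if matched == len(episode):
            starts.append(t0)
    return starts

# Hypothetical alarm stream: (seconds, alarm type)
stream = [(0, "A"), (2, "B"), (3, "C"), (10, "A"), (20, "B")]
hits = serial_episode_starts(stream, ["A", "B"], window=5)
```

With a window of 5 only the occurrence starting at time 0 is reported; widening the window to 10 also reports the one starting at time 10. An episode rule would then relate how often one such episode occurs to how often a larger episode containing it occurs, in the same way an association rule relates itemsets.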
Summary
• Pattern prediction with temporal data is challenging
  • Generalization of non-temporal DM: Classification, Prediction, Association Rules
  • Complicated by temporal relationships
• "Big data" poses significant challenges
  • Must sift data, then detect meaningful patterns
• Issues
  • Testing and validation
  • Prior trends may not be indicative of future patterns