Spatiotemporal Stream Mining Applied to Seismic+ Data
Margaret H. Dunham
CSE Department, Southern Methodist University, Dallas, Texas 75275 USA
mhd@engr.smu.edu
CTBTO Data Mining/Data Fusion Workshop
Outline
Work in Progress! Input/Feedback Needed!
• CTBTO Data
• CTBTO Modeling Requirements
• EMM
CTBTO Data
• Diverse: seismic, hydroacoustic, infrasound, radionuclide
• Spatial (source and sensor)
• Temporal
• Stream data
As a data miner, I must first understand your DATA.
From Sensors to Streams
Stream data: data captured and sent by a set of sensors.
• A real-time sequence of encoded signals that contain the desired information.
• A continuous, ordered sequence of items, ordered implicitly by arrival time or explicitly by timestamp or geographic coordinates.
• Stream data is infinite: the data keeps coming.
CTBTO & Data Mining
• Data mining techniques must be defined based on your data and applications.
• We can't simply use predefined, fixed models and prediction/classification techniques.
• We must not redo the large body of algorithms already developed.
CTBTO + DM Requirements
• Model:
  • Handle different data types (seismic, hydroacoustic, etc.)
  • Spatial + temporal (spatiotemporal)
  • Hierarchical
  • Scalable
  • Online
  • Dynamic
• Anomaly detection:
  • Not just specific wave types or data values
  • Relationships between arrivals of waves/data
  • Combined values of data from all sensors
EMM (Extensible Markov Model)
• Time-varying, discrete, first-order Markov model
• Nodes are clusters of real-world states
• Learning and validation phases overlap
• Learning:
  • Transition probabilities between nodes
  • Node labels (centroid or medoid of the cluster)
  • Nodes are added and removed as data arrives
• Applications: prediction, anomaly detection
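The transition-probability bookkeeping listed under Learning can be sketched in Python as follows. This is a minimal illustration. It assumes each probability is estimated as the fraction of a node's outgoing transitions that go to a given destination. The class and method names (`TransitionCounts`, `observe`, `probability`, `remove_node`) are hypothetical, and the sketch leaves out how input vectors are clustered into nodes.

```python
from collections import defaultdict


class TransitionCounts:
    """Hypothetical helper: first-order transition counts between EMM nodes.

    P(dst | src) is estimated as count(src -> dst) / count(src -> any),
    an assumed way of keeping the probabilities current as data arrives.
    """

    def __init__(self):
        # counts[src][dst] = number of observed transitions from src to dst
        self.counts = defaultdict(lambda: defaultdict(int))
        self.current = None  # node the model is currently in

    def observe(self, node):
        """Move to `node`, counting the transition from the current node."""
        if self.current is not None:
            self.counts[self.current][node] += 1
        self.current = node

    def probability(self, src, dst):
        """Estimated P(dst | src); 0.0 if src has no outgoing transitions."""
        row = self.counts.get(src, {})  # .get avoids creating empty rows
        total = sum(row.values())
        return row.get(dst, 0) / total if total else 0.0

    def remove_node(self, node):
        """Drop a node together with every transition into or out of it."""
        self.counts.pop(node, None)
        for row in self.counts.values():
            row.pop(node, None)
        if self.current == node:
            self.current = None
```

For example, after the node sequence N1, N2, N1, N3, N1, N2, two of the three transitions leaving N1 go to N2, so the estimate of P(N2 | N1) is 2/3. Removing N3 renormalizes N1's remaining transitions automatically, because probabilities are computed from the counts on demand.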
Research Objectives
• Apply a proven spatiotemporal modeling technique to seismic data
• Construct EMMs to model sensor data:
  • Local EMM at a location or area
  • Hierarchical EMM to summarize lower-level models
• Represent all data in one vector of values
• Let the EMM learn normal behavior
• Develop new similarity metrics that include all sensor data types (fusion)
• Apply anomaly detection algorithms
EMM Creation/Learning
[Figure: an EMM with nodes N1, N2, and N3 built step by step from the input vectors <18,10,3,3,1,0,0>, <17,10,2,3,1,0,0>, <16,9,2,3,1,0,0>, <14,8,2,3,1,0,0>, <14,8,2,3,0,0,0>, and <18,10,3,3,1,1,0>; edges are labeled with transition fractions such as 1/2, 1/3, and 2/3.]
Input Data Representation
• A vector of numeric sensor values, taken at precise time points or aggregated over time intervals.
• The values need not come from the same sensor types.
• The similarity/distance between vectors determines when a new node is created in the EMM.
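The rule that vector distance decides whether a new node is created can be sketched as threshold-based clustering. Several choices here are assumptions the slides do not specify: Euclidean distance, a threshold of 2.0, and a centroid label updated as a running mean. The names `NodeClusterer` and `assign` are hypothetical. The input is the example vectors from the EMM Creation/Learning slide.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class NodeClusterer:
    """Hypothetical sketch: map each input vector to an EMM node.

    A vector joins the nearest node if its distance to that node's
    centroid is within `threshold`; otherwise it starts a new node.
    """

    def __init__(self, threshold):
        self.threshold = threshold
        self.centroids = []  # node labels (running-mean centroids)
        self.sizes = []      # number of vectors absorbed by each node

    def assign(self, vec):
        """Return the index of the node that `vec` is assigned to."""
        if self.centroids:
            dists = [euclidean(vec, c) for c in self.centroids]
            best = min(range(len(dists)), key=dists.__getitem__)
            if dists[best] <= self.threshold:
                # Update the running mean so the centroid tracks its members.
                n = self.sizes[best] + 1
                self.centroids[best] = [
                    c + (x - c) / n for c, x in zip(self.centroids[best], vec)
                ]
                self.sizes[best] = n
                return best
        # No existing node is close enough: create one labeled by this vector.
        self.centroids.append(list(vec))
        self.sizes.append(1)
        return len(self.centroids) - 1


vectors = [
    [18, 10, 3, 3, 1, 0, 0],
    [17, 10, 2, 3, 1, 0, 0],
    [16, 9, 2, 3, 1, 0, 0],
    [14, 8, 2, 3, 1, 0, 0],
    [14, 8, 2, 3, 0, 0, 0],
    [18, 10, 3, 3, 1, 1, 0],
]
```

With a threshold of 2.0, and the vectors taken in the listed order, the first three vectors and the last one share one node. The two vectors beginning <14,8,...> form a second node. A smaller threshold produces more, tighter nodes, so the threshold controls how finely the EMM partitions the state space.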
Anomaly Detection with EMM
• Objective: detect rare (unusual, surprising) events
• Advantages:
  • Dynamically learns what is normal
  • Based on this learning, can predict what is not normal
  • Normal behavior does not have to be specified a priori
• Applications:
  • Network intrusion
  • Data: IP traffic data, automobile traffic data
  • Seismic: unusual seismic events; automatically filter out normal events
[Figure: Minnesota DOT traffic data, weekdays vs. weekend; the EMM detected an unusual weekend traffic pattern.]
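The idea of flagging what the model has not learned as normal can be illustrated with a small sketch. It does not reproduce the referenced EMM anomaly algorithms. It assumes a hypothetical rule: an arrival is rare if the probability of the transition into it is below `min_prob`, with that probability measured before the arrival is learned. A `warmup` period suppresses flags while the model is still empty. The class name and both parameters are hypothetical.

```python
from collections import defaultdict


class RareTransitionDetector:
    """Hypothetical sketch: flag low-probability transitions between nodes.

    Normal behavior is learned online as transition counts. Each arrival is
    scored against the counts seen so far, then learned, so detection and
    learning overlap as in EMM.
    """

    def __init__(self, min_prob=0.1, warmup=10):
        self.min_prob = min_prob  # assumed rarity threshold
        self.warmup = warmup      # observations to learn before flagging
        self.counts = defaultdict(lambda: defaultdict(int))
        self.current = None
        self.n_observed = 0

    def observe(self, node):
        """Score the arrival in `node`, then learn it. True means anomalous."""
        anomalous = False
        if self.n_observed >= self.warmup and self.current is not None:
            row = self.counts.get(self.current, {})
            total = sum(row.values())
            prob = row.get(node, 0) / total if total else 0.0
            # Unseen nodes and unseen transitions both score probability 0.
            anomalous = prob < self.min_prob
        if self.current is not None:
            self.counts[self.current][node] += 1
        self.current = node
        self.n_observed += 1
        return anomalous
```

Because scoring happens before learning, a transition that keeps recurring is flagged at first and then becomes "normal" as its count grows. This matches the slide's point that normal behavior does not have to be specified a priori.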
EMM with Seismic Data
• Input: wave arrivals (all, or one per sensor)
• Identify states and changes of state in seismic data
• The waveform would first have to be converted into a series of vectors representing the activity at various points in time.
• Initial testing with RDG data, using amplitude, period, and wave type
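One possible way to convert a waveform into a series of vectors is to summarize fixed-length windows. The sketch below assumes non-overlapping windows and takes the peak absolute amplitude of each window. It estimates the period from zero crossings: a sinusoid crosses zero twice per cycle, so period ≈ 2 × window duration / crossings. The function name and parameters are hypothetical. Wave type is not derived here and would have to come from a separate phase-identification step.

```python
import math


def waveform_to_vectors(samples, sample_rate, window):
    """Hypothetical sketch: one (amplitude, period) vector per window.

    `samples` is a list of floats, `sample_rate` is in samples per second,
    and `window` is the window length in samples. A trailing partial
    window is ignored. Period is in seconds (inf if no zero crossings).
    """
    vectors = []
    for start in range(0, len(samples) - window + 1, window):
        chunk = samples[start:start + window]
        amplitude = max(abs(x) for x in chunk)
        # Count sign changes between consecutive samples in this window.
        crossings = sum(1 for a, b in zip(chunk, chunk[1:]) if (a < 0) != (b < 0))
        duration = window / sample_rate
        period = 2 * duration / crossings if crossings else math.inf
        vectors.append((amplitude, period))
    return vectors
```

As a check, a 1 Hz sine sampled at 100 Hz and cut into 2-second windows yields vectors with amplitude close to 1 and a period of 1.0 s. Real seismic records would need filtering and a more robust period estimator; this sketch only shows the shape of the conversion.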
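New Distance Measure
• Data = <amplitude, period, wave type>
• Different wave type = 100% difference
• For events of the same wave type:
  • 50% weight is given to the difference in amplitude
  • 50% weight is given to the difference in period
• If the distance is greater than the threshold, a state change is required.
• The amplitude and period differences are relative to the average:
$$\Delta_{\text{amplitude}} = \frac{|\text{amplitude}_{\text{new}} - \text{amplitude}_{\text{average}}|}{\text{amplitude}_{\text{average}}}, \qquad \Delta_{\text{period}} = \frac{|\text{period}_{\text{new}} - \text{period}_{\text{average}}|}{\text{period}_{\text{average}}}$$
• So, for events of the same wave type, the distance is
$$d = 0.5\,\Delta_{\text{amplitude}} + 0.5\,\Delta_{\text{period}}$$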
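The distance measure on this slide translates almost directly into code. The sketch assumes the "average" is stored as a record of the same shape as an arrival, with nonzero average amplitude and period. It also treats "100% difference" as a distance of 1.0. Note that relative differences of the same wave type can still exceed 1.0. The names `Arrival`, `seismic_distance`, and `needs_state_change` are hypothetical, and the wave-type labels "A" and "B" simply echo the Wave A / Wave B states used elsewhere in the slides.

```python
from dataclasses import dataclass


@dataclass
class Arrival:
    amplitude: float
    period: float
    wave_type: str


def seismic_distance(new, average):
    """1.0 if wave types differ; otherwise 0.5 * relative amplitude
    difference + 0.5 * relative period difference (relative to average)."""
    if new.wave_type != average.wave_type:
        return 1.0  # different wave type = 100% difference
    d_amp = abs(new.amplitude - average.amplitude) / average.amplitude
    d_per = abs(new.period - average.period) / average.period
    return 0.5 * d_amp + 0.5 * d_per


def needs_state_change(new, average, threshold):
    """A state change is required when the distance exceeds the threshold."""
    return seismic_distance(new, average) > threshold
```

For instance, against an average of amplitude 2.0 and period 1.0, an arrival of the same type with amplitude 3.0 and period 1.5 differs by 50% in each field. Its distance is therefore 0.5, which triggers a state change at a threshold of 0.3 but not at 0.6.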
EMM with Seismic Data
States 1, 2, and 3 correspond to Noise, Wave A, and Wave B, respectively.
Preliminary Testing
• RDG data, February 1, 1981: 6 earthquakes
• Find transition times close to the known earthquakes
• 9 total nodes
• 652 total transitions
• Found all quakes
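The evaluation step "find transition times close to known earthquakes" can be sketched as nearest-neighbor matching with a tolerance window. The sketch assumes transition and earthquake times are plain numbers in the same unit, for example seconds from the start of the record. The function name `match_quakes` and the `tolerance` parameter are hypothetical, and the times in the example are illustrative, not RDG data.

```python
import bisect


def match_quakes(transition_times, quake_times, tolerance):
    """Hypothetical sketch: map each known quake time to the nearest EMM
    transition time, or to None if no transition lies within `tolerance`."""
    times = sorted(transition_times)
    matches = {}
    for q in quake_times:
        i = bisect.bisect_left(times, q)
        # The nearest transition is either just before or just after q.
        candidates = times[max(i - 1, 0):i + 1]
        best = min(candidates, key=lambda t: abs(t - q), default=None)
        close = best is not None and abs(best - q) <= tolerance
        matches[q] = best if close else None
    return matches
```

For example, with transition times 10, 55, 120, and 300 and a tolerance of 5, a quake at 12 matches the transition at 10, one at 118 matches 120, and one at 200 has no match. "Found all quakes" then corresponds to every value in the result being non-None.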
EMM Nodes
Hierarchical EMM
Now What?
• Data needed
• Interest the DM community
• Noise may not be bad
• KDD Cup
References
• Zhigang Li and Margaret H. Dunham, "STIFF: A Forecasting Framework for Spatio-Temporal Data," Proceedings of the First International Workshop on Knowledge Discovery in Multimedia and Complex Data, May 2002, pp. 1-9.
• Zhigang Li, Liangang Liu, and Margaret H. Dunham, "Considering Correlation Between Variables to Improve Spatiotemporal Forecasting," Proceedings of the PAKDD Conference, May 2003, pp. 519-531.
• Jie Huang, Yu Meng, and Margaret H. Dunham, "Extensible Markov Model," Proceedings of the IEEE ICDM Conference, November 2004, pp. 371-374.
• Yu Meng and Margaret H. Dunham, "Efficient Mining of Emerging Events in a Dynamic Spatiotemporal," Proceedings of the IEEE PAKDD Conference, April 2006, Singapore. (Also in Lecture Notes in Computer Science, Vol. 3918, 2006, Springer Berlin/Heidelberg, pp. 750-754.)
• Yu Meng and Margaret H. Dunham, "Mining Developing Trends of Dynamic Spatiotemporal Data Streams," Journal of Computers, Vol. 1, No. 3, June 2006, pp. 43-50.
• Charlie Isaksson, Yu Meng, and Margaret H. Dunham, "Risk Leveling of Network Traffic Anomalies," International Journal of Computer Science and Network Security, Vol. 6, No. 6, June 2006, pp. 258-265.
• Margaret H. Dunham and Vijay Kumar, "Stream Hierarchy Data Mining for Sensor Data," Innovations and Real-Time Applications of Distributed Sensor Networks (DSN) Symposium, November 26, 2007, Shreveport, Louisiana.