This project focuses on developing techniques to detect abnormal event sequences using clustering, local outlier factor, and probabilistic finite state automata. The goal is to handle large datasets from Cisco and enhance the anomaly detection tools for real-time on-the-fly detection of non-crashing faults.
Self-Detection of Abnormal Event Sequences
Project Lead: Farokh Bastani, I-Ling Yen, Latifur Khan
Date: April 7, 2011
2010/Current Project Overview
Self-Detection of Abnormal Event Sequences

Tasks:
• Task 1: Prepare Cisco event sequence data for analysis tools.
• Task 2: Develop clustering, local outlier factor (LOF), and probabilistic finite state automata (PFSA) based techniques for anomaly detection.
• Task 3: Apply the techniques to Cisco datasets; analyze and validate the results.
• Task 4: Use streaming techniques, parallelization, and the prefix tree method to handle large datasets from Cisco.
• Task 5: Enhance the anomaly detection tools for on-the-fly anomaly detection.

Project Schedule (timeline axis: A M J J A S O N D J F M A, 2010 to 2011):
• Task 1: preprocessor
• Task 2/3/4: various anomaly detection techniques and applying them
• Task 5: on-the-fly detection
• Task 1/2/3/4/5: fine tuning

Research Goals:
• Develop a diverse set of anomaly detection techniques for handling datasets with different characteristics.
• Handling large datasets is still a major issue in current data mining research, and it is especially an issue for attributed event sequences.
• Develop run-time anomaly detection techniques to detect non-crashing faults in deployed systems, to mitigate critical failures and ensure software reliability.

Benefits to Industry Partners:
• A comprehensive set of techniques and tools to allow the best analysis of different datasets.
• Real-time on-the-fly anomaly detection capability.
• Rapid adaptation of the tools to handle other application-specific datasets.
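The prefix tree method named in the tasks can be illustrated with a minimal sketch. It assumes each event sequence is a list of event labels and that identical sequences only need to be stored once, with a count. The names `PrefixTreeNode`, `build_prefix_tree`, and `unique_traces` are hypothetical and are not taken from the project's tools.

```python
class PrefixTreeNode:
    """One node per distinct prefix of the observed event sequences."""

    def __init__(self):
        self.children = {}   # event label -> child node
        self.end_count = 0   # number of sequences that end exactly at this node


def build_prefix_tree(sequences):
    """Insert every sequence; shared prefixes reuse the same nodes."""
    root = PrefixTreeNode()
    for seq in sequences:
        node = root
        for event in seq:
            node = node.children.setdefault(event, PrefixTreeNode())
        node.end_count += 1
    return root


def unique_traces(root):
    """Return sorted (trace, count) pairs, one per distinct sequence."""
    traces = []
    stack = [(root, ())]
    while stack:
        node, prefix = stack.pop()
        if node.end_count:
            traces.append((prefix, node.end_count))
        for event, child in node.children.items():
            stack.append((child, prefix + (event,)))
    return sorted(traces)


if __name__ == "__main__":
    flows = [["a", "b", "c"], ["a", "b", "c"], ["a", "b"], ["a", "d"]]
    for trace, count in unique_traces(build_prefix_tree(flows)):
        print(trace, count)
```

Under this assumption, a detector such as K-medoid clustering or LOF could run on the distinct traces and their counts rather than on every raw flow, which is one plausible way a prefix tree reduces the cost of handling large datasets.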
Project Results to Date
Status categories: Significant Finding/Accomplishment | Task Complete | Task Partially Complete | Task Not Started
Major Accomplishments, Discoveries, and Surprises

Anomaly Detection for Event Sequences: Various Methods for Comparison & Integration
• Clustering: prefix-tree based K-Medoid
• Automata: MDI (optimized, with added anomaly detection capability)
• Density: prefix-tree based LOF
[Figure labels: Prefix Tree Based Methods; Use prefix tree traces as input; Developed Tool; 2nd closest neighbor]

Real Time Processing Method (timeline t, t+T, t+2T, t+3T):
• t to t+T: Collect D_t, Build A_{t-T}, Apply A_{t-2T}
• t+T to t+2T: Collect D_{t+T}, Build A_t, Apply A_{t-T}
• t+2T to t+3T: Collect D_{t+2T}, Build A_{t+T}, Apply A_t

Experimental Results:
• Data set: 2 GB Cisco SDL trace logs (197,628 signal flows with 18 manually injected anomalies).
• Conducted on a PC with an Intel Core i5 Duo 2.67 GHz CPU and 8 GB RAM.
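The Collect/Build/Apply timeline of the real-time processing method can be read as a three-stage pipeline: in each period of length T, new data is collected, a model is built from the previous period's data, and the model finished one period earlier is applied. The sketch below is one possible reading, not the project's implementation. It assumes the applied model scores the sequences arriving in the current period, and `run_pipeline`, `build_model`, and `is_anomalous` are hypothetical names; the toy model simply flags sequences it has never seen.

```python
def run_pipeline(windows, build_model, is_anomalous):
    """Simulate the staggered schedule over a list of per-period data windows.

    In period i the model being applied was built during period i-1
    from the data collected in period i-2 (A_{t-2T} in the slide's notation).
    """
    applying = None        # model currently applied to incoming data
    building_from = None   # data the model under construction is built from
    flagged = []
    for current in windows:
        # Apply stage: score the current period's sequences, if a model exists yet.
        if applying is not None:
            flagged.append([s for s in current if is_anomalous(applying, s)])
        else:
            flagged.append([])
        # Build stage: the model built in this period becomes usable next period.
        next_model = build_model(building_from) if building_from is not None else None
        applying = next_model
        # Collect stage: this period's data feeds the next period's build.
        building_from = current
    return flagged


if __name__ == "__main__":
    windows = [
        [("a", "b")],
        [("a", "b"), ("a", "c")],
        [("a", "b"), ("x", "y")],
        [("a", "c"), ("a", "b"), ("z",)],
    ]
    # Toy model (an assumption): the set of sequences seen in the training window.
    result = run_pipeline(windows, frozenset, lambda model, s: s not in model)
    print(result)
```

The first two periods flag nothing because no model has finished building yet. That warm-up delay of two periods is what the staggered schedule implies under this reading.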