Net-Centric Software & Systems Consortium Kick-off Meeting, February 26-27, 2009
Self-Detection of Abnormal Event Sequences
Farokh B. Bastani, UT-Dallas, bastani@utdallas.edu
I-Ling Yen, UT-Dallas, ilyen@utdallas.edu
Latifur Khan, UT-Dallas, lkhan@utdallas.edu
Problem Description
• There are numerous types of event-based workflows in net-centric systems
  • E.g., call control signal processing, network accesses, access to resources, access to data, etc.
• Need for abnormal behavior detection
  • Event-based workflows may suffer software and system faults, operational errors, attacks, fraud, and illegitimate manipulations, resulting in abnormal behaviors
  • If the abnormal behavior can be detected, proactive techniques can be used to mitigate the problem
Existing Solutions
• Many data mining and machine learning algorithms can be used to classify normal and abnormal events
  • Bayesian networks, neural networks, decision trees, k-means, support vector machines (SVM), hidden Markov models, etc.
• Problem: Which method to use?
  • Data set dependent: must explore the best approach for each dataset
  • Feature extraction from raw data can have a significant impact on the prediction quality: must explore various feature extraction models
• Problem: How to mine event sequences?
  • Automata-based approach: with known event sequences, cluster them and determine the abnormal ones (no well-established clustering techniques)
  • Episode-based approaches: need to mine the event sequences first, then cluster them and determine the abnormal ones (episode mining techniques are well established, but there is little research on clustering)
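As a rough illustration of what episode mining involves, the sketch below counts ordered event pairs that recur within a sliding window over an event log. It is a deliberately simplified, window-based variant limited to two-event serial episodes, not the technique proposed in this project. The function name, the parameters `width` and `min_freq`, and the call-control event labels in the example are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def frequent_serial_pairs(events, width, min_freq):
    """Return {(a, b): frequency} for ordered pairs (a before b) that occur
    in at least a fraction `min_freq` of the sliding windows of `width` events.

    Simplified sketch: only two-event serial episodes are considered.
    """
    windows = [events[i:i + width] for i in range(len(events) - width + 1)]
    counts = Counter()
    for window in windows:
        # combinations() preserves input order, so each pair is (earlier, later).
        # set() ensures a pair is counted at most once per window.
        counts.update(set(combinations(window, 2)))
    total = len(windows)
    return {pair: n / total for pair, n in counts.items() if n / total >= min_freq}


if __name__ == "__main__":
    # Hypothetical call-control event log.
    log = ["setup", "ring", "answer", "setup", "ring", "answer",
           "setup", "ring", "hangup"]
    for pair, freq in sorted(frequent_serial_pairs(log, 3, 0.5).items()):
        print(pair, round(freq, 2))
```

Pairs that clear the frequency threshold can serve as candidate "normal" episodes; sequences that rarely contain them are then candidates for closer inspection.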
Our Solution
• Multivariate automata and episode mining
  • Unknown event sequences: use episode mining
  • Automata merging for known or mined event sequences
    • Multiple variables result in a huge state space
    • Use dominance parameters and weights to merge states
    • Develop techniques to merge automata efficiently (hashing, clustering)
• Identify abnormal event sequences
  • Use clustering techniques to identify outliers
    • Need effective clustering techniques
    • Need to handle event sequences with different lengths
    • Need to integrate inter-event parameters in the clustering process
  • Manual help to identify actual faulty event sequences offline
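One way to compare event sequences of different lengths is an edit distance normalized by the longer sequence. The sketch below uses it to flag sequences that are far, on average, from all the others. This is a simple distance-based stand-in for the clustering step, not the project's method: the choice of Levenshtein distance, the `threshold` parameter, and the example event labels are assumptions, and inter-event parameters such as timing are ignored.

```python
def edit_distance(a, b):
    """Levenshtein distance between two event sequences (lists of labels)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # delete x
                            curr[j - 1] + 1,           # insert y
                            prev[j - 1] + (x != y)))   # substitute x by y
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Edit distance scaled to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return edit_distance(a, b) / longest if longest else 0.0


def flag_outliers(sequences, threshold):
    """Return indices of sequences whose mean normalized distance to all
    other sequences exceeds `threshold` (an assumed tuning parameter)."""
    flagged = []
    for i, s in enumerate(sequences):
        others = [normalized_distance(s, t)
                  for j, t in enumerate(sequences) if j != i]
        if others and sum(others) / len(others) > threshold:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Hypothetical call-control sequences; the last one is scrambled.
    seqs = [
        ["setup", "ring", "answer", "hangup"],
        ["setup", "ring", "answer", "hangup"],
        ["setup", "ring", "hangup"],
        ["setup", "ring", "answer", "talk", "hangup"],
        ["hangup", "hangup", "setup", "setup", "answer", "ring", "ring"],
    ]
    print(flag_outliers(seqs, threshold=0.5))
```

Flagged sequences would still need the offline manual review mentioned above to confirm that they are actually faulty.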
Our Solution (Cont.)
[Figure: diagram with the labels faulty data injector, feedback, current data sets, prediction, classifier, all data sets, classifier analysis]
• Develop a feedback-based self-improving mechanism
  • When the prediction error exceeds a threshold, adjust the algorithm
  • Use multiple algorithms to provide fine tuning
    • E.g., use a weighted decision from multiple algorithms
  • Fine-tune feature set extraction and use dimension reduction mechanisms to obtain faster and better results
  • Off-line analysis to achieve improvements and feed the improvements to the online model
    • Adjusted algorithm, revision of features, addition of inter-event-sequence (inter-ES) features
• Develop fault-injection techniques to induce self-learning
  • Establish the faulty pattern library from data that have been learned
  • Inject faulty patterns to train the mining process and to measure the effects (use the faulty pattern library and develop fault generation algorithms)
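The feedback idea can be sketched as a weighted vote over several detectors whose weights are adjusted only when the recent error rate exceeds a threshold. The sketch below is one possible reading, loosely modeled on the weighted majority scheme. The class name, the `penalty` factor, the window size, and the toy detectors are assumptions for illustration, not part of the proposal.

```python
from collections import deque


class FeedbackEnsemble:
    """Weighted vote over detectors; each detector maps features to
    True (abnormal) or False (normal)."""

    def __init__(self, classifiers, error_threshold=0.2, window=50, penalty=0.5):
        self.classifiers = list(classifiers)
        self.weights = [1.0] * len(self.classifiers)
        self.error_threshold = error_threshold   # assumed tuning parameter
        self.penalty = penalty                   # weight multiplier for wrong votes
        self.recent_errors = deque(maxlen=window)

    def predict(self, features):
        """Return (ensemble decision, individual votes); ties count as normal."""
        votes = [clf(features) for clf in self.classifiers]
        abnormal = sum(w for w, v in zip(self.weights, votes) if v)
        normal = sum(w for w, v in zip(self.weights, votes) if not v)
        return abnormal > normal, votes

    def feedback(self, features, truth):
        """Record the true label; if the recent error rate is above the
        threshold, down-weight every detector that voted wrongly."""
        prediction, votes = self.predict(features)
        self.recent_errors.append(prediction != truth)
        error_rate = sum(self.recent_errors) / len(self.recent_errors)
        if error_rate > self.error_threshold:
            self.weights = [w * self.penalty if v != truth else w
                            for w, v in zip(self.weights, votes)]
        return error_rate


if __name__ == "__main__":
    # Toy detectors: one uses the feature, two always raise an alarm.
    ensemble = FeedbackEnsemble([lambda x: x > 5, lambda x: True, lambda x: True])
    print(ensemble.predict(1)[0])     # decision before any feedback
    ensemble.feedback(1, truth=False)
    print(ensemble.weights, ensemble.predict(1)[0])
```

Labels for the feedback step could come from manual review or from injected faulty patterns, whose ground truth is known by construction.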
Experimental Plan
• Develop techniques for abnormal event sequence detection
  • Develop automata generation and merging techniques
  • Study the effects of various clustering algorithms on various event sequence datasets
    • Consider signal flow data from Cisco
    • Consider network-based intrusion detection datasets
    • Consider human interoperations (if possible)
• Develop the models and methods for dynamic adaptation
  • Algorithmic adaptation and feature set extraction adaptation
Industry Member Benefits
• The abnormal behavior prediction approach can be applied to many net-centric applications that are event-based and workflow-oriented
  • Call control signal processing
  • Resource and database access control
  • System health monitoring for real-time embedded systems, including avionic systems, space-based systems, etc.
  • Application-dependent workflows, e.g., monitoring the behavior of drivers on roads
• Need real data and related knowledge from industry for analysis, model construction, and effectiveness analysis
Deliverables and Budget
• First year, $30K: Develop the basic multivariate automata mining and abnormal sequence detection techniques
  • First quarter: Work with the industrial partner to understand the data and develop a pre-processor to extract event patterns
  • Second and third quarters: Develop the automata merging and automata clustering techniques
  • Fourth quarter: Apply the techniques to the dataset and validate the approaches
• Second year, $30K: Develop dynamic learning techniques
  • Develop the feedback learning approach
  • Develop tools to efficiently achieve self-learning
Topic/project/effort description

STATUS QUO
• Many data clustering algorithms can be used for abnormal event detection, but they do not self-adapt, and data features have to be identified in advance

NEW INSIGHTS
• With dynamic learning: more accurate abnormal event prediction results and fewer false alarms
• Automated feature extraction: obtain features that can optimize the prediction effectiveness

MAIN ACHIEVEMENT
• Applied data clustering algorithms to various data sets to study their effectiveness. The experiments show that Support Vector Machine yields the best results for 90% of the data sets.
• Developed an improved SVM algorithm to further improve data clustering outcomes.
• Developed methods for clustering sparse data sets.

HOW IT WORKS
• Dynamic learning: Develop a feedback-based self-improving mechanism that improves the clustering algorithm on the fly based on a small set of data, and verify the improvement off-line on a large volume of historical data
• Automated feature extraction: Build a workflow and event model to allow automatic extraction of data features, including event characteristics, inter-event effects, etc. Improve the precision of abnormality prediction by improving the extracted features.

ASSUMPTIONS AND LIMITATIONS
• Availability of data

QUANTITATIVE IMPACT
• [Figure: Comparison of Prediction Accuracy]
• Key objectives: dynamic learning and adaptive feature extraction

END-OF-PHASE GOAL
• Develop an abnormal event detection algorithm that can dynamically adapt through learning and can automatically extract the best features for optimal prediction
• Apply the technique to Cisco signal flow data and to network intrusion detection
• Can be used for abnormal event sequence detection in many event-based applications