Discovering Outlier Filtering Rules from Unlabeled Data • Authors: Kenji Yamanishi & Jun-ichi Takeuchi • Advisor: Dr. Hsu • Graduate: Chia-Hsien Wu
Outline • Motivation • Objective • Introduction • Main Framework • Outlier Detector – SmartSifter • Rule Generator – DL-ESC/DL-SC • Experimentation – Network intrusion detection • Experimental Results • Conclusion • Opinion
Motivation • SmartSifter's detection accuracy needs improvement • SmartSifter cannot find the general pattern shared by the outliers it identifies
Objective • Improving the accuracy of SmartSifter • Discovering new patterns that outliers in a specific group may have in common
Introduction • The authors developed SmartSifter, an on-line outlier detection algorithm • This work improves the power of SmartSifter by combining it with a supervised learning method
Main Framework • (Figure: A new rule classifier)
Outlier Detector – SmartSifter (SS) • Uses a probabilistic model P(x, y) = p(x) p(y | x), where p(x) is a histogram over the categorical variables x and p(y | x) is a Gaussian mixture over the continuous variables y • Employs on-line discounting learning algorithms (SDLE, SDEM) to update the model • Gives a score to each datum
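A toy model can illustrate the factorization P(x, y) = p(x) p(y | x). This is a minimal sketch, not SmartSifter itself. It assumes a one-dimensional continuous part y, a fixed histogram p(x) over a single categorical attribute, and the logarithmic loss -ln P(x, y) as the score. The function names and all numbers are hypothetical.

```python
import math

def gaussian_pdf(y, mean, var):
    """Density of a 1-D normal distribution N(mean, var) at y."""
    return math.exp(-(y - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

def mixture_pdf(y, components):
    """p(y | x) as a Gaussian mixture; components = [(weight, mean, var), ...]."""
    return sum(w * gaussian_pdf(y, m, v) for w, m, v in components)

def joint_density(x, y, p_x, mixtures):
    """P(x, y) = p(x) * p(y | x): histogram value of the category x
    times the Gaussian mixture attached to that category."""
    return p_x[x] * mixture_pdf(y, mixtures[x])

def log_loss_score(x, y, p_x, mixtures):
    """Assumed score: -ln P(x, y). Low-density data get high scores."""
    return -math.log(joint_density(x, y, p_x, mixtures))

# Hypothetical model: two services, each with its own mixture over y.
p_x = {"http": 0.8, "ftp": 0.2}
mixtures = {
    "http": [(0.7, 5.0, 1.0), (0.3, 8.0, 0.5)],
    "ftp": [(1.0, 3.0, 2.0)],
}

typical = log_loss_score("http", 5.0, p_x, mixtures)
unusual = log_loss_score("ftp", 12.0, p_x, mixtures)
print(typical < unusual)  # True
```

A datum can score high for two reasons under this factorization. It may fall in a rare category, or it may lie far from every mixture component of its category.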
Outlier Detector – SmartSifter (SS) (cont.) • SDLE algorithm: an on-line discounting variant of the Laplace-law-based estimation algorithm • SDEM algorithm: an on-line discounting variant of the incremental EM (Expectation Maximization) algorithm
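Both algorithms share one idea: at every step, the statistics gathered from older data are multiplied by (1 - r), so the model can follow a changing data source. The sketch below illustrates that idea under stated assumptions and does not reproduce SDLE/SDEM exactly. The class names, the discounting rate r, the smoothing constant beta, the variance floor min_var, and the one-dimensional y are all hypothetical. The original SDEM may also apply smoothing to the responsibilities, which this sketch omits.

```python
import math

class DiscountedLaplace:
    """SDLE-style sketch: discounted Laplace estimate of p(x) over k categorical cells."""

    def __init__(self, cells, r=0.01, beta=0.5):
        self.cells = list(cells)
        self.r = r          # discounting rate (hypothetical default)
        self.beta = beta    # Laplace smoothing constant (hypothetical default)
        self.counts = {c: 0.0 for c in self.cells}

    def update(self, x):
        for c in self.cells:
            self.counts[c] *= 1.0 - self.r  # shrink the weight of older data
        self.counts[x] += 1.0               # count the newest datum in full

    def prob(self, x):
        total = sum(self.counts.values())
        return (self.counts[x] + self.beta) / (total + len(self.cells) * self.beta)


class DiscountedEM1D:
    """SDEM-style sketch: on-line discounting EM for a 1-D Gaussian mixture."""

    def __init__(self, means, var=1.0, r=0.01, min_var=1e-3):
        k = len(means)
        self.r = r
        self.min_var = min_var                 # floor that keeps variances positive
        self.w = [1.0 / k] * k                 # mixture weights
        self.mu = [float(m) for m in means]
        self.var = [float(var)] * k
        # Discounted sufficient statistics: weighted sums of y and y^2.
        self.s1 = [w * m for w, m in zip(self.w, self.mu)]
        self.s2 = [w * (v + m * m) for w, m, v in zip(self.w, self.mu, self.var)]

    def _pdf(self, y, i):
        v = self.var[i]
        return math.exp(-(y - self.mu[i]) ** 2 / (2.0 * v)) / math.sqrt(2.0 * math.pi * v)

    def update(self, y):
        dens = [w * self._pdf(y, i) for i, w in enumerate(self.w)]
        z = sum(dens)
        k = len(dens)
        # E-step: responsibilities (uniform if every density underflows to 0).
        gamma = [d / z for d in dens] if z > 0 else [1.0 / k] * k
        r = self.r
        for i, g in enumerate(gamma):
            # Discounted M-step: old statistics decay by (1 - r); the new datum enters with weight r.
            self.w[i] = (1 - r) * self.w[i] + r * g
            self.s1[i] = (1 - r) * self.s1[i] + r * g * y
            self.s2[i] = (1 - r) * self.s2[i] + r * g * y * y
            self.mu[i] = self.s1[i] / self.w[i]
            self.var[i] = max(self.s2[i] / self.w[i] - self.mu[i] ** 2, self.min_var)


est = DiscountedLaplace(["http", "ftp", "smtp"], r=0.1)
for _ in range(50):
    est.update("http")
for _ in range(50):
    est.update("ftp")
print(est.prob("ftp") > est.prob("http"))  # True: recent data dominate

em = DiscountedEM1D(means=[1.0, 9.0], r=0.01)
for _ in range(500):
    for y in (-1.0, 1.0, 9.0, 11.0):
        em.update(y)
print([round(m, 1) for m in em.mu])
```

A larger r forgets the past faster. That makes the model adapt more quickly to change, at the cost of noisier estimates.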
Outlier Detector – SmartSifter (SS) (cont.) • Outputs the dataset sorted by score • A high score indicates a high likelihood of being an outlier
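The sorted output is the natural bridge to supervised learning: the highest-scored data can be treated as outlier examples and the rest as normal ones. This is a minimal sketch. It assumes that a fixed fraction of top-scored records is labeled positive; the function name and the top_fraction threshold are hypothetical.

```python
def rank_and_label(records, scores, top_fraction=0.05):
    """Sort records by descending score and label the top fraction as outliers.

    Returns a list of (record, score, label) with label 1 for the
    highest-scored records and 0 for the rest.
    """
    if len(records) != len(scores):
        raise ValueError("records and scores must have the same length")
    order = sorted(range(len(records)), key=lambda i: scores[i], reverse=True)
    n_top = max(1, round(top_fraction * len(records))) if records else 0
    return [(records[i], scores[i], 1 if rank < n_top else 0)
            for rank, i in enumerate(order)]

data = ["a", "b", "c", "d"]
scores = [0.3, 4.2, 1.1, 0.2]
for rec, s, lab in rank_and_label(data, scores, top_fraction=0.25):
    print(rec, s, lab)
# b 4.2 1
# c 1.1 0
# a 0.3 0
# d 0.2 0
```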
Rule Generator – DL-ESC/DL-SC • Uses a stochastic decision list • Selects the rule by minimizing extended stochastic complexity (DL-ESC) or stochastic complexity (DL-SC)
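The minimum-stochastic-complexity principle can be made concrete by comparing two candidate ways of splitting labeled data. The criterion prefers the split with the shorter total code length. This is not DL-SC itself, and it rests on several assumptions:

- Labels are binary.
- The code length of each group's labels is the Bayes-mixture code length under a uniform prior, log2((n + 1) C(n, m)).
- Each rule costs a hypothetical fixed number of bits.

The real rule-description cost, and the extended variant used by DL-ESC, are not reproduced here.

```python
import math

def binary_sc_bits(n, m):
    """Code length (bits) of a binary label sequence with m positives out of n
    under the Bayes mixture with a uniform prior: log2((n + 1) * C(n, m))."""
    return math.log2((n + 1) * math.comb(n, m))

def total_code_length(groups, rule_cost_bits):
    """Hypothetical MDL score for a partition of the data by rule conditions.

    groups: list of (n, m) pairs, one per rule (including the default rule).
    rule_cost_bits: assumed cost of describing each rule.
    """
    data_bits = sum(binary_sc_bits(n, m) for n, m in groups)
    return data_bits + rule_cost_bits * len(groups)

# 100 examples, 10 of them labeled as outliers.
no_split = total_code_length([(100, 10)], rule_cost_bits=8.0)
# A candidate condition that captures 9 of the 10 outliers in a group of 12.
split = total_code_length([(12, 9), (88, 1)], rule_cost_bits=8.0)
print(split < no_split)  # True: the split describes the labels more compactly
```

The extra rule is accepted only because it saves more bits on the labels than it costs to describe. This trade-off is what guards such criteria against overly long rule lists.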
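Rule Generator – DL-ESC/DL-SC (cont.) • Form of a stochastic decision list (ξ: an input datum; t1, t2, ...: conditions tested on ξ; μ: the predicted value):
If ξ makes t1 true, then μ = v1 with probability p1
else if ξ makes t2 true, then μ = v2 with probability p2
...
else μ = vs with probability ps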
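Evaluating a stochastic decision list means walking the rules in order and stopping at the first condition the datum satisfies. This is a minimal sketch; the Rule class, the classify function, and the example rules and probabilities are all hypothetical. It does not show rules actually learned in the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Record = Dict[str, object]

@dataclass
class Rule:
    test: Callable[[Record], bool]  # t_i: condition on the datum xi
    label: str                      # v_i: value assigned to mu
    prob: float                     # p_i: probability attached to the rule

def classify(rules: List[Rule], default: Tuple[str, float], xi: Record) -> Tuple[str, float]:
    """Return (label, probability) from the first rule whose test xi satisfies;
    otherwise fall back to the final 'else' clause."""
    for rule in rules:
        if rule.test(xi):
            return rule.label, rule.prob
    return default

# Hypothetical rules over KDD Cup 1999-style attributes.
rules = [
    Rule(lambda rec: rec["service"] == "ftp" and rec["src_bytes"] > 1000, "outlier", 0.9),
    Rule(lambda rec: rec["service"] == "http", "normal", 0.95),
]
default = ("normal", 0.7)

print(classify(rules, default, {"service": "ftp", "src_bytes": 5000}))  # ('outlier', 0.9)
print(classify(rules, default, {"service": "ftp", "src_bytes": 10}))    # ('normal', 0.7)
```

Because the first match wins, rule order matters. A rule that sits earlier in the list shadows any later rule that covers the same data.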
Experimentation – Network intrusion detection • The purpose of the experiment is to detect intrusions without using the intrusion labels
Experimentation – Dataset (cont.) • Using the KDD Cup 1999 dataset, prepared for network intrusion detection • Using 13 attributes for DL-ESC • Using four attributes for SmartSifter (service, duration, src_bytes, dst_bytes) • Only “service” is categorical • Each continuous attribute x is transformed as y = ln(x + 0.1) (natural logarithm) • Generating five datasets S0, S1, S2, S3, S4
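Selecting SmartSifter's attributes and applying the logarithmic transform amounts to a small preprocessing step. This sketch assumes that each record is a dict keyed by the KDD Cup 1999 feature names. The helper name and the sample record values are hypothetical.

```python
import math

CONTINUOUS = ("duration", "src_bytes", "dst_bytes")

def smartsifter_features(record):
    """Map a KDD Cup 1999 connection record to SmartSifter's inputs:
    x = service (categorical), y = ln(v + 0.1) for each continuous attribute."""
    x = record["service"]
    y = [math.log(float(record[name]) + 0.1) for name in CONTINUOUS]
    return x, y

rec = {"service": "http", "duration": 0, "src_bytes": 215, "dst_bytes": 45076}
x, y = smartsifter_features(rec)
print(x, [round(v, 3) for v in y])
```

The 0.1 offset keeps the logarithm finite for zero-valued attributes such as a duration of 0. The logarithm itself compresses the very wide range of byte counts.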
Experimentation – Illustration by an Example (cont.) • (Figures: First rule – S1; Updated rule – S1; Updated rule – S2)
Experimental Results • SS: SmartSifter • R&S: Rule and SmartSifter (this framework) • S0 is used as the training set to construct a filtering rule, and each of S1, S2, S3, and S4 is used as a test set
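Training never uses the intrusion labels, so those labels remain available for scoring SS and R&S on each of S1 to S4. The slide does not state the evaluation measure. This sketch therefore assumes one common choice: the share of all intrusions found among the top-scored fraction of the data. The function name and the sample numbers are hypothetical.

```python
def coverage_at(scores, is_intrusion, fraction):
    """Fraction of all intrusions found among the top `fraction` of data by score."""
    n = len(scores)
    k = int(round(fraction * n))
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    found = sum(1 for i in order[:k] if is_intrusion[i])
    total = sum(1 for flag in is_intrusion if flag)
    return found / total if total else 0.0

scores = [5.0, 4.0, 3.0, 2.0, 1.0]
labels = [True, False, True, False, False]
print(coverage_at(scores, labels, 0.4), coverage_at(scores, labels, 0.6))
```

Computing this curve for both methods on the same test set gives a like-for-like comparison: for any fixed amount of data a user is willing to inspect, it shows how many intrusions each method surfaces.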
Conclusion • The new framework has two features • It improves the detection power of SmartSifter • It helps the user discover general patterns among outliers
Opinion • The framework makes the detection process more effective and more understandable • The framework could be applied to other fields