Casting out Demons: Sanitizing Training Data for Anomaly Sensors
in IEEE Symp. on S&P 2008
G. F. Cretu, A. Stavrou, M. E. Locasto, S. J. Stolfo and A. D. Keromytis
Po-Ching Lin, Dept. of CSIE, National Chung Cheng University
Problem definition
• Two main approaches to detecting malicious inputs, behavior, network traffic, etc.
  • Signature matching
  • Anomaly detection
• Challenges of effective anomaly detection of malicious traffic
  • It requires a highly accurate model of normal traffic
  • Real network traffic is usually polluted or unclean
  • Using it as the training data can be a problem
• How can we sanitize training data for AD sensors?
Solution outline
• Assumption:
  • An attack or abnormality appears only in small subsets of a large training set
• The solution:
  • Test each packet with the micro-models using a voting scheme and build a "normal" model
  • Data deemed abnormal is used for building an abnormal model
  • The abnormal model can be shared between sites
  • A shadow sensor architecture handles false positives
[Figure: the training set is partitioned into micro-models MM1–MM5; noise (an attack or non-regularity) appears in only a few of them. Mi: micro-model i]
Assumption & micro models
• Observation
  • Over a long period, attacks and abnormalities are a minority class of the data
• Deriving the micro-models
  • Partition the training data into T = {md1, md2, ..., mdN}, where mdi is the micro-dataset covering the time slice starting at (i − 1) · g, with granularity g from 3 to 5 hours
  • Mi = AD(mdi): the micro-model trained on mdi
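A minimal sketch of the micro-model step in Python. The AnomalyDetector class here is a hypothetical stand-in for a content sensor such as Anagram (a simple 5-gram set model); the packet format (dicts with "ts" and "payload" keys) is also an assumption, not part of the paper.

```python
from dataclasses import dataclass, field

@dataclass
class AnomalyDetector:
    """Hypothetical Anagram-like sensor: stores the 5-grams seen in training."""
    ngrams: set = field(default_factory=set)

    def train(self, packets):
        for p in packets:
            payload = p["payload"]
            self.ngrams.update(payload[i:i + 5] for i in range(len(payload) - 4))

    def test(self, packet):
        """Return 1 (abnormal) if most of the packet's 5-grams are unseen."""
        payload = packet["payload"]
        grams = [payload[i:i + 5] for i in range(len(payload) - 4)]
        if not grams:
            return 0
        unseen = sum(1 for g in grams if g not in self.ngrams)
        return 1 if unseen / len(grams) > 0.5 else 0

def build_micro_models(packets, g_seconds=4 * 3600):
    """Split chronologically ordered traffic into slices of length g
    and train one micro-model Mi = AD(mdi) per slice."""
    start = packets[0]["ts"]
    slices = {}
    for p in packets:
        idx = int((p["ts"] - start) // g_seconds)
        slices.setdefault(idx, []).append(p)
    models = []
    for idx in sorted(slices):
        m = AnomalyDetector()
        m.train(slices[idx])
        models.append(m)
    return models
```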
Deriving the sanitized training set
• Test the training dataset against the micro-models
  • Test each packet Pj with every micro-model Mi
  • Lj,i = TEST(Pj, Mi): Lj,i = 1 if Mi deems Pj abnormal; otherwise 0
• Combine the outputs of the models
  • SCORE(Pj) = (1/W) Σi wi · Lj,i, where W = Σi wi
• Split the training dataset by a voting threshold V
  • Tsan = {Pj | SCORE(Pj) ≤ V}, Msan = AD(Tsan)
  • Tabn = {Pj | SCORE(Pj) > V}, Mabn = AD(Tabn)
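A sketch of the voting-based split, reusing the micro-models and packet format assumed in the previous sketch. The weights wi and threshold V are free parameters; with all wi equal this reduces to simple (unweighted) voting.

```python
def sanitize(packets, micro_models, weights=None, V=0.3):
    """Score every packet against all micro-models and split the
    training set into Tsan (score <= V) and Tabn (score > V)."""
    if weights is None:
        weights = [1.0] * len(micro_models)  # simple voting
    W = sum(weights)
    t_san, t_abn = [], []
    for p in packets:
        # SCORE(Pj) = (1/W) * sum_i wi * L_{j,i}
        score = sum(w * m.test(p) for w, m in zip(weights, micro_models)) / W
        (t_san if score <= V else t_abn).append(p)
    return t_san, t_abn

# Msan and Mabn are then just models trained on the two partitions:
#   m_san = AnomalyDetector(); m_san.train(t_san)
#   m_abn = AnomalyDetector(); m_abn.train(t_abn)
```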
Evaluation of sanitization
• Use two anomaly sensors for evaluation
  • Anagram and PAYL
• Experimental corpus
  • 500 hours of real network traffic from three different hosts: www, www1, and lists
  • 300 hours of traffic to build the micro-models
  • the next 100 hours to generate the sanitized model
  • the remaining 100 hours for testing
  • with cross-validation
Without sanitizing the training data
[Figure: detection performance with and without sanitization, for voting threshold V ∈ [0.15, 0.45]. Legend: A = Anagram; A-S = Anagram + Snort; A-SAN = Anagram + sanitization; P = PAYL; P-SAN = PAYL + sanitization]
Analysis of sanitization parameters
• Three parameters for fine-tuning
  • The granularity g of the micro-models
  • The voting algorithm (simple voting vs. weighted voting)
  • The voting threshold V
Collaborative Sanitization
• Compare the local model of abnormality with those generated by other sites
• Direct model differencing
  • Mcross = Msan − {Mabni ∩ Msan}: remove from the local sanitized model whatever a remote abnormal model Mabni also contains
• Indirect model differencing
  • Difference the sets of packets used to compute the models
  • If a packet Pj is considered abnormal by at least one remote Mabni, its features are extracted for computing a new local abnormal model
  • That model is then used for computing the cross-sanitized model
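A sketch of direct model differencing, under the simplifying assumption (carried over from the earlier sketches) that a model is just its set of n-grams, as in Anagram-style sensors; a frequency-based model like PAYL's would need a different difference operator.

```python
def cross_sanitize(m_san, remote_abn_models):
    """Mcross = Msan - {Mabni ∩ Msan}: drop from the local sanitized
    model every n-gram that any remote abnormal model also contains."""
    abnormal = set()
    for m_abn in remote_abn_models:
        abnormal |= m_abn.ngrams & m_san.ngrams  # Mabni ∩ Msan
    return AnomalyDetector(ngrams=m_san.ngrams - abnormal)
```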
Conclusion & limitations in this paper
• Sanitization depends on the anomaly-detection capability of the micro-models
  • How effective are PAYL and Anagram themselves?
• The traffic used in the evaluation
  • "normality" is far more diverse in a real environment
• Deriving packets to form the training set is stateless
  • Attacks can span multiple packets or even connections