Masquerader detection using SVM with String Kernel
Statistical Learning Theory, Special Issue in AI, 2003 fall
December 4, 2003
Jeongseok Seo
Contents
• Introduction
• Related works
• SVM and String kernel
• Experiment and Results
• Future works
• References
Network Security Group, Computer Science at KAIST
Introduction (1/2)
Masquerader
• Spoofing: impersonating other users, e.g. by forging the originating e-mail address or by gaining password access [Denning 1997]
• In computer intrusion detection, masqueraders are people who use somebody else’s computer account [Schonlau et al. 2001]
Five phases of general hacking [Counter Hack, Prentice Hall, 2001]
1. Scanning for system vulnerabilities
2. Gaining access using application or OS vulnerabilities
3. Gaining system administrator privilege
4. Preventing detection
5. Maintaining access using trojans, backdoors or rootkits
Introduction (2/2)
[Figure: Types of Attack or Misuse Detected (by percent). Source: Computer Security Issues and Trends, 2003 CSI/FBI Computer Crime and Security Survey]
Related Works
IDES: Intrusion-detection expert system (SRI International)
• First research on masquerader detection, starting in 1983
• Rule-based expert system trained to detect known malicious activity
• User profiles built from the system log and accounting log
  • Time of login, login location, session duration, CPU time, executed programs, accessed files, etc.
• Hit rate too low and false alarm rate too high
Related Works
Schonlau’s research (Statistical Science, 2001)
• Schonlau et al. conducted experiments with various methods
  • Hybrid Multi-Step Markov
  • Bayes 1-Step Markov
  • Uniqueness
  • Compression
  • IPAM (Incremental Probabilistic Action Model)
  • Sequence match
• Experiments used user commands recorded by the UNIX acct auditing mechanism
  • Commands of 50 users, randomly selected from about 70 users, were recorded
  • Over a time period of several months
Related Works
Schonlau’s experiment data in detail
• 852 kinds of commands
• Training: 5000 commands (50 blocks) per user
• Testing: 10000 commands (100 blocks) per user
• 231 masquerader blocks in the test data
Related Works
Maxion’s research (2002 ICDSN)
• Maxion et al. used the Naïve Bayes classification algorithm
  • Robustness to noise
  • Fast learning time, O(n)
  • Successful use in text classification [McCallum et al. 1998]
• Experiment results
[Table: experiment results]
Related Works
Kim’s research (IEICE submitted)
• Kim et al. use an SVM based on Common Command Frequency
• Kernel and parameters are selected using cross-validation
  • Poly(d), RBF(r), windowsize, slidingsize, cost-factor(j)
• Common commands are selected like stopwords in text
• Cost = α(Misses) + β(False Alarm)
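The cost on this slide can be sketched in a few lines. This is a minimal illustration, not Kim et al.'s evaluation code: the function names, the use of rates rather than raw counts, and the label convention (+1 = legitimate user, -1 = masquerader) are assumptions made for the example.

```python
def miss_and_false_alarm_rates(y_true, y_pred):
    """Return (miss rate, false alarm rate).

    Assumed label convention: +1 = legitimate user, -1 = masquerader.
    A miss is a masquerader block classified as legitimate; a false alarm
    is a legitimate block classified as masquerader.
    """
    masq = [p for t, p in zip(y_true, y_pred) if t == -1]
    legit = [p for t, p in zip(y_true, y_pred) if t == +1]
    miss_rate = sum(p == +1 for p in masq) / len(masq) if masq else 0.0
    fa_rate = sum(p == -1 for p in legit) / len(legit) if legit else 0.0
    return miss_rate, fa_rate


def detection_cost(miss_rate, fa_rate, alpha=1.0, beta=1.0):
    """Cost = alpha * Misses + beta * False Alarm (weights are illustrative)."""
    return alpha * miss_rate + beta * fa_rate


# Hypothetical labels for six blocks: four legitimate, two masquerader.
y_true = [1, 1, 1, 1, -1, -1]
y_pred = [1, 1, 1, -1, -1, 1]
miss, fa = miss_and_false_alarm_rates(y_true, y_pred)
print(miss, fa, detection_cost(miss, fa, alpha=1.0, beta=6.0))
```

Raising β relative to α penalises false alarms more heavily, which is how such a cost can steer the choice of kernel parameters during cross-validation.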
Related Works
Kim’s feature extraction
[Figure: command frequency and “common command”]
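A command-frequency feature vector can be sketched as follows. The slide does not preserve how Kim et al. pick the common commands, so the selection rule here (the `top_k` most frequent commands in the training data) and the names `common_commands` and `frequency_vector` are assumptions for illustration only.

```python
from collections import Counter


def common_commands(training_commands, top_k):
    """Hypothetical selection rule: the top_k most frequent commands.

    Ties are broken alphabetically so the result is deterministic.
    """
    counts = Counter(training_commands)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [cmd for cmd, _ in ranked[:top_k]]


def frequency_vector(window, vocabulary):
    """Count how often each vocabulary command occurs in one window."""
    counts = Counter(window)
    return [counts[cmd] for cmd in vocabulary]


train = ["ls", "cd", "ls", "gcc", "ls", "cd", "tar"]
vocab = common_commands(train, top_k=2)
print(vocab)
print(frequency_vector(["cd", "ls", "ls", "vi"], vocab))
```

Each window of commands becomes one fixed-length vector, which is the form a standard polynomial or RBF kernel expects.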
SVM and String Kernel
Properties of SVM
• High-dimensional input vector space
  • 852 features / 15000 commands and 50 users
• Sparse input vectors
  • More than 800 of the 852 components of an input vector are 0
• An SVM should therefore work well for masquerader detection
(Kim, “Efficient Masquerade Detection using SVM based on Common Command Frequency In Sliding Windows”)
(Joachims 2001, “A Statistical Learning Model of Text Classification for SVM”)
SVM and String Kernel
Features of Masquerader
• User commands ≈ sequences of program execution [Winskel 1993]
• Ex) user 1: cd; ls; tar; gcc
      user 2: ls; gcc; cd; tar
• How to apply sequence information to SVM?
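The two example users show why order matters: a frequency representation cannot tell them apart, while ordered subsequences can. The sketch below uses the slide's two command lists; the helper names are made up for the illustration.

```python
from collections import Counter

user1 = ["cd", "ls", "tar", "gcc"]
user2 = ["ls", "gcc", "cd", "tar"]


def bag_of_commands(seq):
    """Order-free representation: command -> count."""
    return Counter(seq)


def ordered_pairs(seq):
    """All in-order (not necessarily adjacent) pairs of commands."""
    return Counter(
        (seq[i], seq[j])
        for i in range(len(seq))
        for j in range(i + 1, len(seq))
    )


# The frequency view is identical for both users ...
print(bag_of_commands(user1) == bag_of_commands(user2))
# ... but only some ordered pairs are shared.
shared = set(ordered_pairs(user1)) & set(ordered_pairs(user2))
print(sorted(shared))
```

Each user produces six ordered pairs and only three are common to both, so a feature space built on subsequences separates the two sessions.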
SVM and String Kernel
Kernelization
• Extending the hypothesis space
[Figure from Joachims’ machine learning course material]
SVM and String Kernel
String Kernel (Lodhi et al. 2002)
• Direct computation: O(|Σ|ⁿ)
• High-dimensional feature space with the tractability of command sequencing information
• Denotation
  • Kernel: average range of common subsequences of length n
[Figure: Kn(s,t), matching positions i1 … in in s against positions j1 … jn in t]
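In Lodhi et al.'s definition, every length-n subsequence u contributes λ raised to the span it covers (last index minus first index plus one), and Kn(s,t) is the inner product of these weighted counts for s and t. A direct, exponential-time sketch of that definition is below. It enumerates only the subsequences that actually occur rather than all |Σ|ⁿ, so it is usable only for short inputs; the function names are assumptions for the example.

```python
from collections import defaultdict
from itertools import combinations


def subsequence_features(seq, n, lam):
    """Map each length-n subsequence u of seq to the sum of lam**span
    over all index tuples that spell u (span = last - first + 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    feats = defaultdict(float)
    for idx in combinations(range(len(seq)), n):
        u = tuple(seq[i] for i in idx)
        feats[u] += lam ** (idx[-1] - idx[0] + 1)
    return feats


def string_kernel_naive(s, t, n, lam):
    """Inner product of the two explicit feature maps."""
    fs = subsequence_features(s, n, lam)
    ft = subsequence_features(t, n, lam)
    return sum(v * ft[u] for u, v in fs.items() if u in ft)


# Works on characters ...
print(string_kernel_naive("cat", "car", n=2, lam=0.5))
# ... and on command sequences, where each command is one symbol.
user1 = ["cd", "ls", "tar", "gcc"]
user2 = ["ls", "gcc", "cd", "tar"]
print(string_kernel_naive(user1, user2, n=2, lam=0.5))
```

For "cat" and "car" with n = 2 the only shared subsequence is "ca", giving λ⁴, which matches the worked example in Lodhi et al.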
SVM and String Kernel
Recursive computation of string kernel: O(n|s||t|²)
[Figure: recursion for K’n(sx,t) and Kn(sx,t) in terms of K’n(s,t), summing over positions k with tk = x and prefixes t[1:k-1]]
SVM and String Kernel
Efficient computation of string kernel: O(n|s||t|)
[Figure: K’’n(s,t), with t extended by u, positions k with tk = x and prefixes t[1:k-1]]
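The O(n|s||t|) computation can be sketched as a dynamic program. The version below follows the recursion published by Lodhi et al. (2002), with an auxiliary K’’ accumulated left to right over t; it is an illustration under that assumption, not the modified SVM-light code used in the experiments, and the function names are made up for the example.

```python
import math


def string_kernel_dp(s, t, n, lam):
    """Subsequence kernel K_n(s, t) in O(n * |s| * |t|) time."""
    if n < 1:
        raise ValueError("n must be at least 1")
    ls, lt = len(s), len(t)
    # kp_prev[a][b] holds K'_{i-1}(s[:a], t[:b]); K'_0 is identically 1.
    kp_prev = [[1.0] * (lt + 1) for _ in range(ls + 1)]
    for _ in range(1, n):
        kp = [[0.0] * (lt + 1) for _ in range(ls + 1)]
        for a in range(1, ls + 1):
            kpp = 0.0  # K''_i(s[:a], t[:b]), extended one symbol of t at a time
            for b in range(1, lt + 1):
                kpp *= lam
                if s[a - 1] == t[b - 1]:
                    kpp += lam * lam * kp_prev[a - 1][b - 1]
                kp[a][b] = lam * kp[a - 1][b] + kpp
        kp_prev = kp
    # K_n sums lam^2 * K'_{n-1} over every matching pair of end positions.
    total = 0.0
    for a in range(1, ls + 1):
        for b in range(1, lt + 1):
            if s[a - 1] == t[b - 1]:
                total += lam * lam * kp_prev[a - 1][b - 1]
    return total


def normalized_string_kernel(s, t, n, lam):
    """K(s,t) / sqrt(K(s,s) * K(t,t)); 0.0 if either self-kernel is 0."""
    denom = string_kernel_dp(s, s, n, lam) * string_kernel_dp(t, t, n, lam)
    if denom == 0.0:
        return 0.0
    return string_kernel_dp(s, t, n, lam) / math.sqrt(denom)


print(string_kernel_dp("cat", "car", n=2, lam=0.5))
print(normalized_string_kernel("cat", "car", n=2, lam=0.5))
```

Only two layers of the K’ table are kept at a time, so memory is O(|s||t|) regardless of n. The normalised variant removes the dependence on sequence length.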
SVM and String Kernel
Comparison of computation with RBF: O(|s|)

CFLOAT kernel(KERNEL_PARM *kernel_parm, DOC *a, DOC *b)
{
  … …
  case 2: /* radial basis function */
    /* |a-b|^2 is expanded as |a|^2 - 2*a.b + |b|^2 */
    return((CFLOAT)exp(-kernel_parm->rbf_gamma*(a->twonorm_sq-2*sprod_ss(a->words,b->words)+b->twonorm_sq)));
}

/* sparse dot product: both vectors are sorted by wnum and end with wnum == 0 */
double sprod_ss(WORD *a, WORD *b)
{
  … …
  while (ai->wnum && bj->wnum) {
    if(ai->wnum > bj->wnum) {
      bj++;
    }
    else if (ai->wnum < bj->wnum) {
      ai++;
    }
    else { /* ai->wnum == bj->wnum */
      sum+=ai->weight * bj->weight;
      ai++;
      bj++;
    }
  }
  return((double)sum);
}
Experiment and Result
Implementation
• Modification of SVM-light (V. 5.0, Mar. 7, 2002)
  • http://www.cs.cornell.edu/People/tj/
• Fast optimization
  • Decomposition training methods (working set)
  • Shrinking heuristic
  • Caching of kernel evaluations
  • Use of folding in the linear case
• Simple and easy to understand (about 6000 lines)
• Sun Enterprise 3500: 8 SparcIII CPUs, 4 GB memory
Experiment and Result
Experiment 1
• Training data
  • 50 users, 50 blocks each (5,000 commands each)
  • 100 commands (1 block) per input string
  • Parameters: substring length |u|, lambda λ
  • |u| ≥ 3: 3.7 GB of real memory for caching of kernel evaluations
  • Current experiment: |u| = 2
• Testing data
  • 50 users, 100 blocks each (10,000 commands each)
  • 231 masquerader blocks in total
Experiment and Result
Experiment 2
• Training data
  • 50 users, 50 blocks each (5,000 commands each)
  • 6 strings per block (window size 50, sliding 10)
  • Parameters: substring length |u|, lambda λ
  • |u| ≥ 3: 3.7 GB of real memory for caching of kernel evaluations
  • Current experiment: |u| = 2
• Testing data
  • 50 users, 100 blocks each (10,000 commands each)
  • 231 masquerader blocks in total
  • 6 strings per block
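The figure of 6 strings per block follows from the window arithmetic: with 100 commands per block, window size 50 and sliding step 10, the window starts are 0, 10, 20, 30, 40 and 50, i.e. (100 − 50) / 10 + 1 = 6. A minimal sketch (the function name is an assumption for the example):

```python
def sliding_windows(block, window_size=50, step=10):
    """Split one block of commands into overlapping windows.

    Only full windows are produced; a block shorter than window_size
    yields an empty list.
    """
    last_start = len(block) - window_size
    return [block[i:i + window_size] for i in range(0, last_start + 1, step)]


block = [f"cmd{i}" for i in range(100)]  # placeholder commands
windows = sliding_windows(block)
print(len(windows), [w[0] for w in windows])
```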
Experiment and Result
Results (experiments still in progress)
• 1st experiment (SVM with string kernel, |u| = 2, according to λ)
[Table: results of the 1st experiment by λ]
• 2nd experiment (SVM with string kernel, |u| = 2, according to λ; 6 strings per block, majority voting)
[Table: results of the 2nd experiment by λ]
• cf. Bayes 1-Step Markov (model accuracy 96.58%)
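Majority voting over the 6 strings of a block can be sketched as below. With an even number of votes a 3-3 tie is possible, and the slide does not say how ties were resolved, so the `tie_label` parameter (defaulting to flagging the block as a masquerader) is an assumption, as is the label convention +1 = legitimate user, -1 = masquerader.

```python
def majority_vote(window_labels, tie_label=-1):
    """Combine per-window predictions (+1 or -1) into one block label.

    tie_label is a hypothetical tie-breaking choice, not taken from the slides.
    """
    score = sum(window_labels)
    if score > 0:
        return 1
    if score < 0:
        return -1
    return tie_label


print(majority_vote([1, 1, 1, 1, -1, -1]))
print(majority_vote([1, 1, 1, -1, -1, -1]))
```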
Extensible works
Possible Features for Good Performance (1/2)
• Combining S.K. with other kernels
  • Polynomial kernel of degree n
    • K(Φ(s),Φ(t)) = ((Φ(s)•Φ(t)) + 1)ⁿ
  • Radial Basis Function kernel of gamma g
    • K(Φ(s),Φ(t)) = exp(-g||Φ(s) - Φ(t)||²)
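Both combinations can be computed from string-kernel values alone, without building Φ explicitly, because Φ(s)•Φ(t) is the string kernel value itself and ||Φ(s) − Φ(t)||² = K(s,s) − 2K(s,t) + K(t,t). The sketch below assumes the squared distance, as in the SVM-light RBF excerpt earlier in the deck; the function names and the numeric inputs are made up for the example.

```python
import math


def poly_over_kernel(k_st, degree):
    """Polynomial kernel on top of a base kernel value k_st = K(s, t)."""
    return (k_st + 1.0) ** degree


def rbf_over_kernel(k_ss, k_tt, k_st, gamma):
    """RBF kernel in the feature space of a base kernel.

    The squared feature-space distance is K(s,s) - 2*K(s,t) + K(t,t);
    it is clamped at 0 to guard against rounding error.
    """
    sq_dist = k_ss - 2.0 * k_st + k_tt
    return math.exp(-gamma * max(sq_dist, 0.0))


# Hypothetical base-kernel values for two sequences s and t.
k_ss, k_tt, k_st = 1.0, 1.0, 0.5
print(poly_over_kernel(k_st, degree=2))
print(rbf_over_kernel(k_ss, k_tt, k_st, gamma=0.5))
```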
Extensible works
Possible Features for Good Performance (2/2)
• Security expert’s heuristics
  • They use special commands or sequences of commands
  • Intruders use normal sequences of commands
• Non-self data (-1 labels) without self data
E.g. commands of an attacker (Oct. 2003)
  w
  socklist
  killall -9 smbd
  w
  socklist
  w
  wget www.geocities.com/yabada21/psybnc.tgz
  tar xzvf psybnc.tgz
  cd psybnc
  pico psybnc.conf
  w
  cat psybnc.pid
Other string related kernels
More work on string related kernels
• Mismatch String Kernel [Leslie 02]
  • Biologically motivated
  • Information about family members’ structure and function
  • Would like to extend family information to new proteins
  • Focus on remote homology detection
  • Can be detected by sequence similarity
  • Construct kernel matrix using (k,m)-mismatch tree for SVM classification
  • Hard to apply to general strings (many symbols in the alphabet Σ), but can be applied with some modification
String related kernels
Mismatch string kernel
Feature map for SVM based on spectrum of a sequence
• The k-spectrum of a sequence is the set of all k-length contiguous subsequences that it contains
• Feature map is indexed by all possible k-length subsequences (“k-mers”) from the alphabet of amino acids
• Dimension of feature space = 20ᵏ (|alphabet of amino acids| = 20)
• General method for any sequence data
Example (k = 3): AKQDYYYYEI → AKQ, KQD, QDY, DYY, YYY, YYY, YYE, YEI
String related kernels
Mismatch string kernel
• (k,0)-Spectrum Feature Map
  • Feature map for k-spectrum with no mismatches
  • For sequence X, F(k,0)(X) = (FT(X)), indexed by all k-mers T
  • where FT(X) = number of occurrences of T in X
• For example, the (3,0)-spectrum feature map:
  AKQDYYYYEI → ( 0 , 0 , … , 1 , … , 1 , … , 2 )
  indexed by AAA, AAD, …, AKQ, …, DYY, …, YYY
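The (k,0)-spectrum feature map is a count of contiguous k-mers, and the kernel is the dot product of two such count vectors. The sketch below stores only the non-zero entries; k-mers are kept as tuples so the same code also works when each symbol is a whole UNIX command. Function names are assumptions for the example.

```python
from collections import Counter


def spectrum_features(seq, k):
    """Sparse (k,0)-spectrum feature map: k-mer (as a tuple) -> count."""
    return Counter(tuple(seq[i:i + k]) for i in range(len(seq) - k + 1))


def spectrum_kernel(x, y, k):
    """Inner product of the two sparse feature maps."""
    fx, fy = spectrum_features(x, k), spectrum_features(y, k)
    return sum(count * fy[kmer] for kmer, count in fx.items())


feats = spectrum_features("AKQDYYYYEI", 3)
# Entries named on the slide: AAA, AKQ, DYY, YYY.
print([feats[tuple(kmer)] for kmer in ("AAA", "AKQ", "DYY", "YYY")])
print(spectrum_kernel("AKQDYYYYEI", "AKQDYYYYEI", 3))
```

The printed counts reproduce the slide's example vector entries 0, 1, 1 and 2.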
String related kernels
Mismatch string kernel
• (k,m)-Spectrum Feature Map
  • Feature map for k-spectrum, allowing m mismatches
  • If S is a k-mer, F(k,m)(S) = (FT(S)), indexed by all k-mers T
  • where FT(S) = P(S|T) if S is within m mismatches from T, otherwise 0
  • Extend additively to longer sequences X by summing over all k-mers S in X
• To train an SVM, can use the kernel rather than the explicit feature map
  • For sequences X, Y and feature map F, the kernel value is the inner product in feature space
  • K(X,Y) = <F(X), F(Y)>
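A brute-force version of the (k,m)-mismatch kernel can be sketched as follows. It makes one simplifying assumption: FT(S) is taken to be 1 when S is within m mismatches of T (a plain indicator) instead of the probability P(S|T) on the slide. It also enumerates mismatch neighbourhoods explicitly rather than using the (k,m)-mismatch tree, so it is practical only for small k, m and alphabets. Function names are made up for the example.

```python
from collections import Counter
from itertools import combinations, product


def mismatch_neighborhood(kmer, m, alphabet):
    """All k-mers over `alphabet` within Hamming distance m of `kmer`."""
    kmer = tuple(kmer)
    neighbors = {kmer}
    for d in range(1, m + 1):
        for positions in combinations(range(len(kmer)), d):
            for letters in product(alphabet, repeat=d):
                candidate = list(kmer)
                for pos, letter in zip(positions, letters):
                    candidate[pos] = letter
                neighbors.add(tuple(candidate))
    return neighbors


def mismatch_features(seq, k, m, alphabet):
    """Sum the indicator feature map over every k-mer S in seq."""
    feats = Counter()
    for i in range(len(seq) - k + 1):
        for t in mismatch_neighborhood(seq[i:i + k], m, alphabet):
            feats[t] += 1
    return feats


def mismatch_kernel(x, y, k, m, alphabet):
    """K(X, Y) = <F(X), F(Y)> with the indicator feature map."""
    fx = mismatch_features(x, k, m, alphabet)
    fy = mismatch_features(y, k, m, alphabet)
    return sum(count * fy[t] for t, count in fx.items())


# Toy two-letter alphabet: no exact 2-mer is shared, but mismatches overlap.
print(mismatch_kernel("AAB", "BBA", k=2, m=0, alphabet="AB"))
print(mismatch_kernel("AAB", "BBA", k=2, m=1, alphabet="AB"))
```

With m = 0 this reduces to the spectrum kernel. For UNIX commands the alphabet would be the set of distinct commands (852 in the data described earlier), which is why the slides note that the method needs modification for general strings.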
References
• N. Cristianini and J. Shawe-Taylor, “An Introduction to Support Vector Machines,” Cambridge Univ. Press, 2000.
• H. Lodhi et al., “Text Classification using String Kernels,” Journal of Machine Learning Research, Feb. 2002.
• T. Joachims, “A Statistical Learning Model of Text Classification for Support Vector Machines,” Proceedings of SIGIR, 2001.
• M. Schonlau et al., “Computer Intrusion: Detecting Masquerades,” Statistical Science, 16(1), 2001.
• R.A. Maxion et al., “Masquerade Detection Using Truncated Command Lines,” Proceedings of ICDSN, June 2002.
• G. Winskel, “The Formal Semantics of Programming Languages,” The MIT Press, 1993.
• C. Leslie et al., “Mismatch String Kernels for SVM Protein Classification,” NIPS 2002.
References
• H.S. Kim et al., “Efficient Masquerade Detection Based on SVM,” Journal of KIISC, Oct. 2003.
• C. Saunders et al., “String Kernels, Fisher Kernels and Finite State Automata,” Neural Information Processing Systems, 2002.
• S.V.N. Vishwanathan et al., “Fast Kernels for String and Tree Matching,” Neural Information Processing Systems, 2002.
• C. Leslie et al., “The Spectrum Kernel: A String Kernel for SVM Protein Classification,” Pacific Symposium on Biocomputing, 2002.
• T. Friess et al., “The Kernel-Adatron Algorithm: a Fast and Simple Learning Procedure for Support Vector Machines,” 15th Intl. Conf. Machine Learning, 1998.
• S. Mukkamala et al., “Feature Ranking and Selection for Intrusion Detection Systems Using Support Vector Machines,” IEEE Intl. Joint Conf. on Neural Networks, 2002.