IEEE WCCI IJCNN 2002 World Congress on Computational Intelligence, International Joint Conference on Neural Networks. Intrusion Detection Using Neural Networks and Support Vector Machine. Srinivas Mukkamala, Guadalupe Janoski, Andrew Sung. Dept. of CS, New Mexico Institute of Mining and Technology
Outline • Approaches to intrusion detection using neural networks and support vector machines • DARPA dataset • Neural Networks • Support Vector Machines • Experiments • Conclusion and Comments
Approaches • Key ideas • Discover useful patterns or features that describe user behavior on a system • Use the set of relevant features to build classifiers that can recognize anomalies and known intrusions • Neural networks and support vector machines are trained with normal user activity and attack patterns • Significant deviations from normal behavior are flagged as attacks
DARPA Data for Intrusion Detection • DARPA (Defense Advanced Research Projects Agency) • An agency of the US Department of Defense responsible for developing new technology for the military • The benchmark comes from a KDD (Knowledge Discovery and Data Mining) competition built on DARPA intrusion detection evaluation data • Attacks fall into four main categories • DOS: denial of service • R2L: unauthorized access from a remote machine • U2R: unauthorized access to local super user (root) privileges • Probing: surveillance and other probing
Features http://kdd.ics.uci.edu/databases/kddcup99/task.html
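The page above lists the per-connection features used as classifier inputs. A minimal sketch (not from the paper) of grouping raw KDD Cup 99 labels into the four attack categories; the abbreviated label list and the categorize helper are illustrative assumptions:

```python
# Hypothetical helper: group raw KDD Cup 99 labels into the four attack
# categories (the full label list is longer; this subset is illustrative).
ATTACK_CATEGORIES = {
    "smurf": "DOS", "neptune": "DOS", "back": "DOS",
    "guess_passwd": "R2L", "ftp_write": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R",
    "portsweep": "Probing", "nmap": "Probing",
    "normal": "Normal",
}

def categorize(label: str) -> str:
    """Map a raw label such as 'smurf.' to DOS / R2L / U2R / Probing / Normal."""
    return ATTACK_CATEGORIES.get(label.rstrip("."), "Unknown")
```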
Neural Networks • A biological neuron (figure): dendrites gather incoming signals, the soma combines the signals and decides whether to trigger, and the axon carries the output signal onward
Perceptron (figure) • A single neuron weights its inputs (w1, w2), sums them (Σ), and applies an activation; the weights define a line in the plane, w1·X1 + w2·X2 − θ = 0, that separates points A, D from B, C • Divide and conquer (figure): a region that no single line can separate is handled by combining neurons, e.g. N1 and N2 each draw a line and N3 combines out1 and out2 into out3 (see the sketch below)
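To make the divide-and-conquer idea concrete, here is a minimal sketch with weights chosen for illustration (not taken from the slide): two threshold neurons each draw a line, and a third combines them to separate a region no single line can.

```python
def step(z: float) -> int:
    """Threshold activation: fire (1) when the weighted sum exceeds 0."""
    return 1 if z > 0 else 0

def neuron(x1: float, x2: float, w1: float, w2: float, theta: float) -> int:
    """Single neuron: output depends on which side of w1*x1 + w2*x2 - theta = 0 we are."""
    return step(w1 * x1 + w2 * x2 - theta)

def xor_like(x1: int, x2: int) -> int:
    """Divide and conquer: N1 and N2 each draw a line, N3 combines their outputs (XOR)."""
    out1 = neuron(x1, x2, 1, 1, 0.5)       # N1: fires when x1 OR x2
    out2 = neuron(x1, x2, 1, 1, 1.5)       # N2: fires when x1 AND x2
    return neuron(out1, out2, 1, -1, 0.5)  # N3: inside N1's region but outside N2's
```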
Feed Forward Neural Network (FFNN) • Layers 1, 2, …, L: the architecture is chosen by the designer; the weights are determined automatically • Each neuron Nj in layer l computes a cumulated signal S_j^(l) = Σ_i w_ij^(l) · x_i^(l-1) and an activated output x_j^(l) = tanh(S_j^(l)) • Hyperbolic tangent: tanh(S) = (e^S − e^(−S)) / (e^S + e^(−S))
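A minimal forward-pass sketch of such a network; the bias convention x_0 = 1 and the weight-matrix layout are assumptions for illustration:

```python
import numpy as np

def forward(x, weights):
    """Forward pass with tanh activations.
    weights: one (1 + d_in, d_out) matrix per layer; row 0 carries the bias
    so that S_j^(l) = sum_i w_ij^(l) * x_i^(l-1) with x_0^(l-1) = 1."""
    for W in weights:
        s = np.concatenate(([1.0], x)) @ W   # cumulated signal S^(l)
        x = np.tanh(s)                       # activated output x^(l)
    return x
```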
Training the network • The weight vector w defines the classifier g(x) • Given training data, define an error function E(w); training means minimizing E(w) • Stochastic Gradient Descent (SGD): w starts as small random values; for T iterations, w_new ← w_old − η·∇_w(E_n), where η is the learning rate and E_n is the error on one example
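A compact sketch of the SGD loop described above; the function and argument names are illustrative, not the authors' code:

```python
import numpy as np

def sgd(grad_error, X, y, dim, eta=0.1, T=10000, seed=0):
    """Stochastic gradient descent: w starts as small random values; for T
    iterations pick one training example n and apply w_new = w_old - eta * grad E_n."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.01, size=dim)        # small random initial weights
    for _ in range(T):
        n = rng.integers(len(X))                # random example for this step
        w = w - eta * grad_error(w, X[n], y[n])
    return w
```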
Back Propagation Algorithm • Forward: for l = 1, 2, …, L compute S_j^(l) and x_j^(l) • Backward: for l = L, L−1, …, 1 compute δ_i^(l)
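A sketch of one forward/backward pass for the tanh network above, computing the δ terms and the gradient of a squared error; it reuses the bias convention from the earlier forward-pass sketch and is an assumed implementation, not the authors' code:

```python
import numpy as np

def backprop_grads(x, y, weights):
    """Forward: store every x^(l). Backward: propagate delta^(l) from layer L
    down to layer 1 and form dE/dW^(l) = outer([1, x^(l-1)], delta^(l))."""
    acts = [np.asarray(x, dtype=float)]
    for W in weights:                                        # forward pass
        acts.append(np.tanh(np.concatenate(([1.0], acts[-1])) @ W))
    grads = [None] * len(weights)
    delta = 2.0 * (acts[-1] - y) * (1.0 - acts[-1] ** 2)     # delta^(L) for squared error
    for l in range(len(weights) - 1, -1, -1):                # backward pass
        grads[l] = np.outer(np.concatenate(([1.0], acts[l])), delta)
        if l > 0:                                            # delta for the previous layer
            delta = (weights[l][1:, :] @ delta) * (1.0 - acts[l] ** 2)
    return grads
```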
Feed Forward NNet (summary) • Consists of layers 1, 2, …, L; w_ij^(l) connects neuron i in layer (l−1) to neuron j in layer l • Cumulated signal S_j^(l), activated output x_j^(l); the activation is often tanh • Minimize E(w) to determine the weights automatically using SGD: w starts as small random values; for T iterations w_new ← w_old − η·∇_w(E_n) • Each iteration runs a forward pass (compute S_j^(l) and x_j^(l)) and a backward pass (compute δ_i^(l)) • Stop when the desired error rate is met
Support Vector Machine • A supervised learning method • Is known as the maximum margin classifier • Find the max-margin separating hyperplane
SVM – hard margin • Separating hyperplane <w, x> − θ = 0, with margin boundaries <w, x> − θ = +1 and <w, x> − θ = −1 • Maximize the margin: max over w, θ of 2/‖w‖ subject to y_n(<w, x_n> − θ) ≥ 1 • Equivalently: argmin over w, θ of (1/2)<w, w> subject to y_n(<w, x_n> − θ) ≥ 1
Quadratic programming • Adapt the problem to a standard quadratic program: find A, b, R, q and hand them to the solver, V* = quadprog(A, b, R, q) • Standard form: argmin over v of (1/2) Σ_i Σ_j a_ij v_i v_j + Σ_i b_i v_i subject to Σ_i r_ki v_i ≥ q_k • Let V = [θ, w1, w2, …, wD]; the SVM objective becomes (1/2) Σ_{d=1..D} w_d^2 and each constraint becomes (−y_n)·θ + Σ_{d=1..D} y_n (x_n)_d · w_d ≥ 1
Adaptation (hard margin) • v = [v_0, v_1, …, v_D] = [θ, w_1, …, w_D] • A is (1+D)×(1+D): a_00 = 0, a_0j = a_i0 = 0, and for i, j ≠ 0, a_ij = 1 if i = j and 0 otherwise, so the objective is (1/2) Σ_{d=1..D} w_d^2 • b is (1+D)×1 with b_i = 0 • R is N×(1+D): r_n0 = −y_n and r_nd = y_n (x_n)_d for d > 0, encoding (−y_n)·θ + Σ_{d=1..D} y_n (x_n)_d · w_d ≥ 1 • q is N×1 with q_n = 1
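A minimal sketch of this adaptation in code, using the cvxopt QP solver in place of the generic quadprog call named on the slide; cvxopt minimizes (1/2)v'Pv + p'v subject to Gv ≤ h, so the ≥ constraints are negated. This is an assumed illustration, not the paper's implementation:

```python
import numpy as np
from cvxopt import matrix, solvers   # assumes the cvxopt package is available

def hard_margin_svm(X, y):
    """v = [theta, w_1..w_D]; A has 1s on the diagonal for the w block only,
    b = 0, and row n of R is [-y_n, y_n * x_n] with q_n = 1."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, D = X.shape
    A = np.eye(1 + D)
    A[0, 0] = 0.0                                   # theta has no quadratic term
    b = np.zeros(1 + D)
    R = np.hstack([-y[:, None], y[:, None] * X])    # y_n(<w, x_n> - theta) >= 1
    q = np.ones(N)
    sol = solvers.qp(matrix(A), matrix(b), matrix(-R), matrix(-q))
    v = np.array(sol["x"]).ravel()
    return v[1:], v[0]                              # w*, theta*
```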
SVM – soft margin • Allow some training errors ξ_n, with a tradeoff parameter c • Large c: thinner margin, errors penalized heavily • Small c: wider margin, errors tolerated • argmin over w, θ, ξ of (1/2)<w, w> + c·Σ_n ξ_n subject to y_n(<w, x_n> − θ) ≥ 1 − ξ_n and ξ_n ≥ 0
Adaptation (soft margin) • Same standard form: argmin over v of (1/2) Σ_i Σ_j a_ij v_i v_j + Σ_i b_i v_i subject to Σ_i r_ki v_i ≥ q_k • v = [θ, w_1, …, w_D, ξ_1, …, ξ_N] • A is (1+D+N)×(1+D+N), b is (1+D+N)×1, R is (2N)×(1+D+N), q is (2N)×1: N rows for the margin constraints plus N rows for ξ_n ≥ 0
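Extending the hard-margin sketch above, a minimal assumed construction of the soft-margin QP matrices (names and layout are illustrative):

```python
import numpy as np

def soft_margin_qp_matrices(X, y, c):
    """v = [theta, w_1..w_D, xi_1..xi_N]: quadratic term only on the w block,
    linear term c on each xi, and 2N constraint rows (N margin rows, N rows xi_n >= 0)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    N, D = X.shape
    A = np.zeros((1 + D + N, 1 + D + N))
    A[1:1 + D, 1:1 + D] = np.eye(D)                               # (1/2) sum_d w_d^2
    b = np.concatenate([np.zeros(1 + D), c * np.ones(N)])         # c * sum_n xi_n
    R_margin = np.hstack([-y[:, None], y[:, None] * X, np.eye(N)])  # y_n(<w,x_n>-theta) + xi_n >= 1
    R_slack = np.hstack([np.zeros((N, 1 + D)), np.eye(N)])          # xi_n >= 0
    R = np.vstack([R_margin, R_slack])
    q = np.concatenate([np.ones(N), np.zeros(N)])
    return A, b, R, q
```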
Primal form and Dual form • Primal: argmin over w, θ, ξ of (1/2)<w, w> + c·Σ_n ξ_n subject to y_n(<w, x_n> − θ) ≥ 1 − ξ_n and ξ_n ≥ 0 (variables: 1+D+N, constraints: 2N) • Dual: argmin over α of (1/2) Σ_n Σ_m α_n y_n α_m y_m <x_n, x_m> − Σ_n α_n subject to 0 ≤ α_n ≤ C and Σ_n y_n α_n = 0 (variables: N, constraints: 2N+1)
Dual form SVM • Find the optimal α*, then use α* to solve for w* and θ • α_n = 0: the example is correctly classified or sits on the margin (not a support vector) • 0 < α_n < C: the example sits exactly on the margin (free support vector) • α_n = C: the example is on the margin or violates it (bounded support vector)
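A small sketch (assumed, not from the slides) of reading off these cases from the optimal α and recovering w* and θ* from a free support vector:

```python
import numpy as np

def recover_primal(alpha, X, y, C, tol=1e-6):
    """w* = sum_n alpha_n y_n x_n; theta* from any free SV (0 < alpha_s < C),
    for which y_s(<w, x_s> - theta) = 1 holds exactly."""
    w = (alpha * y) @ X
    free = np.where((alpha > tol) & (alpha < C - tol))[0]   # free support vectors
    bounded = np.where(alpha >= C - tol)[0]                 # alpha_n = C
    s = free[0]
    theta = X[s] @ w - y[s]
    return w, theta, free, bounded
```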
Nonlinear SVM • Nonlinear mapping X → Φ(X), e.g. {(x)_1, (x)_2} ∈ R^2 → {1, (x)_1, (x)_2, (x)_1^2, (x)_2^2, (x)_1(x)_2} ∈ R^6 • The dual only needs inner products: argmin over α of (1/2) Σ_n Σ_m α_n y_n α_m y_m <Φ(x_n), Φ(x_m)> − Σ_n α_n subject to 0 ≤ α_n ≤ C and Σ_n y_n α_n = 0 • Kernel trick: compute <Φ(x_n), Φ(x_m)> directly from x_n and x_m, e.g. the polynomial kernel (1 + <x_n, x_m>)^2
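For example, the degree-2 polynomial kernel can replace the explicit R^6 mapping; a minimal sketch of computing the kernel and the quadratic term of the dual (function names are illustrative):

```python
import numpy as np

def poly2_kernel(X1, X2):
    """K(x, x') = (1 + <x, x'>)^2: an inner product in the expanded feature
    space {1, x1, x2, x1^2, x2^2, x1*x2} (up to scaling) without computing Phi."""
    return (1.0 + X1 @ X2.T) ** 2

def dual_quadratic_term(X, y, kernel=poly2_kernel):
    """Q_{nm} = y_n y_m K(x_n, x_m), the matrix in the dual objective above."""
    return np.outer(y, y) * kernel(X, X)
```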
Experiments • Automated parsers process the raw TCP/IP dump data into machine-readable form • 7312 training records (different types of attacks and normal data), each described by 41 features • 6980 testing records are used to evaluate the classifiers
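A minimal sketch (assumed, not the authors' code) of such an experiment using off-the-shelf scikit-learn classifiers; the preprocessing and hyperparameters are illustrative choices:

```python
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def run_experiment(X_train, y_train, X_test, y_test):
    """Train an SVM and a feed-forward network on the 41-feature training set
    (7312 records) and report accuracy on the 6980-record test set."""
    scaler = StandardScaler().fit(X_train)
    X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)
    for name, clf in [("SVM", SVC(kernel="rbf", C=1.0)),
                      ("FFNN", MLPClassifier(hidden_layer_sizes=(40, 40), max_iter=500))]:
        clf.fit(X_train, y_train)
        print(name, "test accuracy:", clf.score(X_test, y_test))
```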
Conclusion and Comments • Speed • SVM training time is significantly shorter • SVMs avoid the "curse of dimensionality" through the max-margin formulation • Accuracy • Both approaches achieve high accuracy • SVMs only make binary classifications, whereas IDS requires multiple-class identification • Open question: how to determine the features?