
Intrusion Detection Using Neural Networks and Support Vector Machine


Presentation Transcript


  1. IEEE WCCI IJCNN 2002, World Congress on Computational Intelligence, International Joint Conference on Neural Networks. Intrusion Detection Using Neural Networks and Support Vector Machine. Srinivas Mukkamala, Guadalupe Janoski, Andrew Sung, Dept. of CS, New Mexico Institute of Mining and Technology

  2. Outline • Approaches to intrusion detection using neural networks and support vector machines • DARPA dataset • Neural Networks • Support Vector Machines • Experiments • Conclusion and Comments

  3. Approaches • Key ideas are to • discover useful patterns or features that describe user behavior on a system, and • use the set of relevant features to build classifiers that can recognize anomalies and known intrusions • Neural networks and support vector machines are trained with normal user activity and attack patterns • Significant deviations from normal behavior are flagged as attacks

  4. DARPA Data for Intrusion Detection • DARPA (Defense Advanced Research Projects Agency) • An agency of the US Department of Defense responsible for the development of new technology for use by the military • Benchmark from a KDD (Knowledge Discovery and Data Mining) competition designed by DARPA • Attacks fall into four main categories • DOS: denial of service • R2L: unauthorized access from a remote machine • U2R: unauthorized access to local super user (root) privileges • Probing: surveillance and other probing

  5. Features http://kdd.ics.uci.edu/databases/kddcup99/task.html

  6. Neural Networks [Figure: a biological neuron. Dendrites gather incoming signals; the soma combines the signals and decides whether to trigger; the axon carries the output signal onward.]
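
To make the biological picture concrete, here is a minimal Python sketch of the artificial analogue (the function name and threshold convention are mine, not from the slides): the weighted sum plays the role of the soma combining dendrite signals, and the threshold decides whether the unit fires.

```python
import numpy as np

def neuron(x, w, theta):
    """Artificial analogue of the biological neuron: the weighted sum
    combines incoming (dendrite) signals; the threshold decides firing."""
    s = np.dot(w, x) - theta    # combine incoming signals
    return 1 if s >= 0 else -1  # fire (+1) or stay silent (-1)

print(neuron(np.array([0.5, 0.8]), np.array([1.0, 1.0]), 1.0))  # -> 1
```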

  7. [Figure: a single perceptron. Inputs x1, x2 are multiplied by weights w1, w2, summed (Σ), and passed through an activation; the unit implements the line in the plane w1x1 + w2x2 − θ = 0, separating regions A, B, C, D. A second figure shows divide and conquer: two perceptrons N1 and N2 each draw a line, and a third perceptron N3 combines their outputs out1 and out2 into out3, carving out a region no single line can separate.]
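
A small illustration of the divide-and-conquer idea, with weights I chose for the classic XOR region (the slide's exact weights are not recoverable from the figure): two layer-1 perceptrons each draw a line, and a layer-2 perceptron combines them.

```python
import numpy as np

def perceptron(x, w, theta):
    return 1 if np.dot(w, x) - theta >= 0 else -1

def xor_net(x1, x2):
    # Layer 1: two lines splitting the plane (divide)
    out1 = perceptron(np.array([x1, x2]), np.array([1.0, -1.0]), 0.5)  # x1 - x2 >= 0.5
    out2 = perceptron(np.array([x1, x2]), np.array([-1.0, 1.0]), 0.5)  # x2 - x1 >= 0.5
    # Layer 2: OR of the two half-planes (conquer)
    return perceptron(np.array([out1, out2]), np.array([1.0, 1.0]), -0.5)

for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(a, b, xor_net(a, b))  # fires only when exactly one input is 1
```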

  8. Feed Forward Neural Network (FFNN) [Figure: layers 1 through 4.] The architecture (layers and connections) is decided by the designer; the weights are determined automatically. In general, neuron Nj in layer l computes the cumulated signal Sj(l) = Σi wij(l) xi(l−1) and the activated output xj(l) = tanh(Sj(l)), where tanh(S) = (e^S − e^−S) / (e^S + e^−S) is the hyperbolic tangent.
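
A sketch of this forward pass, assuming (as is conventional) a constant bias input x0 = 1 at every layer; the layer sizes here are illustrative, not from the slides.

```python
import numpy as np

def forward(x, weights):
    """One forward pass through a fully connected tanh network.
    weights[l][i][j] corresponds to w_ij(l) on the slide; a constant
    bias input x_0 = 1 is prepended at every layer."""
    for W in weights:
        x = np.concatenate(([1.0], x))  # x_0(l-1) = 1 (bias)
        s = W.T @ x                     # S_j(l) = sum_i w_ij(l) x_i(l-1)
        x = np.tanh(s)                  # x_j(l) = tanh(S_j(l))
    return x

# Tiny example: 2 inputs -> 3 hidden -> 1 output, small random weights
rng = np.random.default_rng(0)
weights = [rng.normal(0, 0.1, (3, 3)), rng.normal(0, 0.1, (4, 1))]
print(forward(np.array([0.5, -0.2]), weights))
```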

  9. Training [Figure: a network of Σ units whose weights w make up the classifier g(x).] Given training data, an error function E(w) measures how badly the classifier composed of the weights w fits the data. How do we minimize E(w)? Stochastic Gradient Descent (SGD): w starts as small random values; for T iterations, w_new ← w_old − η·∇w(En), where η is the learning rate and En is the error on one training example.
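
A minimal sketch of this SGD update rule on a toy one-dimensional error function; grad_En stands for the per-example gradient, which the next slide computes via back propagation.

```python
import numpy as np

def sgd(grad_En, w0, eta=0.1, T=1000):
    """SGD as on the slide: start from small random w, then repeat
    w_new <- w_old - eta * (gradient of the error on one example n)."""
    w = w0.copy()
    for _ in range(T):
        w -= eta * grad_En(w)
    return w

# Toy use: minimize E(w) = (w - 3)^2, treated as a single-example "dataset"
w_star = sgd(lambda w: 2 * (w - 3), w0=np.array([0.01]))
print(w_star)  # converges near 3
```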

  10. Back Propagation Algorithm [Figure: layers 1 through L.] Forward: for l = 1, 2, …, L, compute the signals Sj(l) and outputs xj(l). Backward: for l = L, L−1, …, 1, compute the deltas δj(l).
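
A compact sketch of one forward/backward pass for a one-hidden-layer tanh network with squared error (the loss and layer sizes are my choices; the slides do not fix them).

```python
import numpy as np

def backprop_step(x, y, W1, W2, eta=0.1):
    """One forward/backward pass, following the slide: forward computes
    S and x layer by layer, backward propagates the deltas back down."""
    # forward: l = 1, 2
    s1 = W1 @ x;  h = np.tanh(s1)
    s2 = W2 @ h;  out = np.tanh(s2)
    # backward: l = 2, 1  (delta = dE/dS, using tanh'(S) = 1 - tanh(S)^2)
    d2 = (out - y) * (1 - out**2)
    d1 = (W2.T @ d2) * (1 - h**2)
    # gradient step on every weight w_ij(l)
    W2 -= eta * np.outer(d2, h)
    W1 -= eta * np.outer(d1, x)
    return 0.5 * np.sum((out - y)**2)

rng = np.random.default_rng(0)
W1, W2 = rng.normal(0, 0.1, (3, 2)), rng.normal(0, 0.1, (1, 3))
for _ in range(200):
    err = backprop_step(np.array([1.0, -1.0]), np.array([1.0]), W1, W2)
print(err)  # error shrinks toward 0
```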

  11. Feed Forward NNet (summary) • Consists of layers 1, 2, …, L; weight wij(l) connects neuron i in layer l−1 to neuron j in layer l • Each neuron Nj computes the cumulated signal Sj(l) = Σi wij(l) xi(l−1) and the activated output xj(l), often tanh(Sj(l)) • Minimize E(w) to determine the weights automatically, using SGD: w starts as small random values; for T iterations, w_new ← w_old − η·∇w(En) • Each iteration runs forward (compute Sj(l) and xj(l)) then backward (compute δj(l)) • Stop when the desired error rate is met
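
Putting the recipe together, a self-contained toy run on XOR, assuming squared error and a single hidden layer (my choices); it stops once all training points are classified correctly, i.e., the desired error rate is met.

```python
import numpy as np

# Random small weights, forward + backward per example, stop on zero errors.
rng = np.random.default_rng(1)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([-1., 1., 1., -1.])                 # XOR with +/-1 labels
W1, W2 = rng.normal(0, 0.5, (4, 3)), rng.normal(0, 0.5, (1, 5))

def forward(x, W1, W2):
    h = np.tanh(W1 @ np.concatenate(([1.], x)))  # hidden layer (with bias)
    return np.tanh(W2 @ np.concatenate(([1.], h))), h

for t in range(20000):
    n = rng.integers(4)                          # pick one sample (SGD)
    xb = np.concatenate(([1.], X[n]))
    out, h = forward(X[n], W1, W2)
    hb = np.concatenate(([1.], h))
    d2 = (out - Y[n]) * (1 - out**2)
    d1 = (W2[:, 1:].T @ d2) * (1 - h**2)         # skip the bias weight going back
    W2 -= 0.1 * np.outer(d2, hb)
    W1 -= 0.1 * np.outer(d1, xb)
    preds = [np.sign(forward(x, W1, W2)[0][0]) for x in X]
    if np.all(preds == np.sign(Y)):              # desired error rate met
        break
print(t, preds)
```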

  12. Support Vector Machine • A supervised learning method • Known as the maximum-margin classifier • Finds the max-margin separating hyperplane

  13. SVM – hard margin [Figure: two classes separated by the hyperplane ⟨w, x⟩ − θ = 0, with margin boundaries ⟨w, x⟩ − θ = +1 and ⟨w, x⟩ − θ = −1.] Maximize the margin width 2/∥w∥ over w, θ subject to yn(⟨w, xn⟩ − θ) ≥ 1 for all n; equivalently, argmin over w, θ of (1/2)⟨w, w⟩ subject to yn(⟨w, xn⟩ − θ) ≥ 1.

  14. Quadratic programming • Adapt the problem for quadratic programming: find A, b, R, q and put them into the quadratic solver • Generic QP form: argmin over v of (1/2) Σi Σj aij vi vj + Σi bi vi subject to Σi rki vi ≥ qk for each k; then v* ← quadprog(A, b, R, q) • For the hard-margin SVM, let v = [θ, w1, w2, …, wD]: the objective becomes (1/2) Σd wd² and each constraint becomes (−yn)θ + Σd yn (xn)d wd ≥ 1.

  15. Adaptation With v = [θ, w1, w2, …, wD] = [v0, v1, …, vD]: • A is (1+D)×(1+D) with a00 = 0, a0j = ai0 = 0, and, for i, j ≠ 0, aij = 1 if i = j and 0 otherwise, so that (1/2) Σi Σj aij vi vj = (1/2) Σd wd² • b is (1+D)×1 with all bi = 0 • R is N×(1+D) with rn0 = −yn and rnd = yn (xn)d for d > 0 • q is N×1 with qn = 1, encoding (−yn)θ + Σd yn (xn)d wd ≥ 1 for each training example
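
A sketch of the whole adaptation on toy separable data. The slides name a generic quadprog(A, b, R, q); here cvxopt stands in for it. cvxopt minimizes (1/2)vᵀPv + pᵀv subject to Gv ≤ h, so the ≥ constraints are negated, and a tiny ridge (my addition) is put on A's zero diagonal entry so the solver accepts the singular quadratic term.

```python
import numpy as np
from cvxopt import matrix, solvers  # generic QP solver standing in for quadprog

# Hard-margin SVM as a QP, with v = [theta, w_1, ..., w_D] as on the slide.
X = np.array([[2., 2.], [1.5, 2.5], [0., 0.], [0.5, -0.5]])  # toy separable data
y = np.array([1., 1., -1., -1.])
N, D = X.shape

A = np.zeros((1 + D, 1 + D))
A[1:, 1:] = np.eye(D)              # a_ij = 1 iff i = j > 0 (only w is penalized)
A[0, 0] = 1e-8                     # tiny ridge so the solver accepts singular A
b = np.zeros(1 + D)                # b_i = 0
R = np.hstack([-y[:, None], y[:, None] * X])  # r_n0 = -y_n, r_nd = y_n (x_n)_d
q = np.ones(N)                     # q_n = 1

# Flip R v >= q into -R v <= -q for cvxopt's convention
sol = solvers.qp(matrix(A), matrix(b), matrix(-R), matrix(-q))
v = np.array(sol['x']).ravel()
theta, w = v[0], v[1:]
print(theta, w)                    # recovered hyperplane <w, x> - theta = 0
```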

  16. SVM – soft margin • Allow some training errors ξn, traded off against the margin by a parameter C • Large C: thinner margin, errors penalized heavily • Small C: thicker margin, errors tolerated • argmin over w, θ, ξ of (1/2)⟨w, w⟩ + C Σn ξn subject to yn(⟨w, xn⟩ − θ) ≥ 1 − ξn and ξn ≥ 0
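
To see the tradeoff in action, a short experiment with an off-the-shelf soft-margin SVM (scikit-learn's SVC here is an illustration, not the authors' implementation), comparing a small and a large C on overlapping classes.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2.5, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)  # overlapping classes: some errors unavoidable

for C in (0.01, 100.0):
    clf = SVC(kernel='linear', C=C).fit(X, y)
    margin = 2 / np.linalg.norm(clf.coef_)  # margin width 2 / ||w||
    print(f"C={C}: margin={margin:.2f}, support vectors={len(clf.support_)}")
# Small C -> wider margin, tolerates errors; large C -> narrower margin.
```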

  17. Adaptation Same generic QP form, argmin over v of (1/2) Σi Σj aij vi vj + Σi bi vi subject to Σi rki vi ≥ qk, now with v = [θ, w1, w2, …, wD, ξ1, ξ2, …, ξN]: A is (1+D+N)×(1+D+N), b is (1+D+N)×1, R is (2N)×(1+D+N), and q is (2N)×1 (N margin constraints plus N constraints ξn ≥ 0).
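
A sketch of the block construction with the dimensions listed above, on toy sizes D = 2, N = 4; the resulting A, b, R, q could be fed to the same QP solver as in the hard-margin example.

```python
import numpy as np

# Soft-margin QP pieces for v = [theta, w, xi], matching the slide's dimensions.
D, N, C = 2, 4, 1.0
X = np.array([[2., 2.], [1.5, 2.5], [0., 0.], [0.5, -0.5]])
y = np.array([1., 1., -1., -1.])

A = np.zeros((1 + D + N, 1 + D + N))
A[1:1 + D, 1:1 + D] = np.eye(D)                          # only w is quadratic
b = np.concatenate([np.zeros(1 + D), C * np.ones(N)])    # linear term: C * sum xi
R = np.vstack([
    np.hstack([-y[:, None], y[:, None] * X, np.eye(N)]), # y_n(<w,x_n> - theta) + xi_n >= 1
    np.hstack([np.zeros((N, 1 + D)), np.eye(N)]),        # xi_n >= 0
])
q = np.concatenate([np.ones(N), np.zeros(N)])
print(A.shape, b.shape, R.shape, q.shape)  # (7,7) (7,) (8,7) (8,) as on the slide
```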

  18. Primal form and Dual form • Primal form: argmin over w, θ, ξ of (1/2)⟨w, w⟩ + C Σn ξn subject to yn(⟨w, xn⟩ − θ) ≥ 1 − ξn and ξn ≥ 0. Variables: 1+D+N; constraints: 2N • Dual form: argmin over α of (1/2) Σn Σm αn yn αm ym ⟨xn, xm⟩ − Σn αn subject to 0 ≤ αn ≤ C and Σn yn αn = 0. Variables: N; constraints: 2N+1

  19. Dual form SVM • Find the optimal α* • Use α* to solve for w* and θ* • αn = 0: the point is correctly classified or on the margin • 0 < αn < C: the point lies on the margin (a free support vector) • αn = C: the point is wrongly classified or on/inside the margin [Figure: margin boundaries with αn = 0 points outside, free SVs on the margin, and αn = C points violating it.]
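
A brief illustration of reading the dual solution off a fitted linear SVM (scikit-learn used for convenience; its dual_coef_ stores yn·αn for the support vectors, and its intercept has the opposite sign of θ in this slide's convention).

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[2., 2.], [1.5, 2.5], [0., 0.], [0.5, -0.5]])
y = np.array([1, 1, -1, -1])
clf = SVC(kernel='linear', C=10.0).fit(X, y)

# dual_coef_ holds y_n * alpha_n for support vectors only (alpha_n > 0)
w = clf.dual_coef_ @ clf.support_vectors_  # w* = sum_n alpha_n y_n x_n
theta = -clf.intercept_                    # sklearn uses <w,x> + b; slide uses <w,x> - theta
print("support vectors:", clf.support_)   # the alpha_n > 0 points define the plane
print(w, theta)
```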

  20. Nonlinear SVM • Nonlinear mapping X → Φ(X), e.g. {(x)1, (x)2} ∈ R² → {1, (x)1, (x)2, (x)1², (x)2², (x)1(x)2} ∈ R⁶ • Computing ⟨Φ(xn), Φ(xm)⟩ directly is expensive, so we need the kernel trick: in the dual, argmin over α of (1/2) Σn Σm αn yn αm ym ⟨Φ(xn), Φ(xm)⟩ − Σn αn subject to 0 ≤ αn ≤ C and Σn yn αn = 0, replace ⟨Φ(xn), Φ(xm)⟩ with the kernel (1 + ⟨xn, xm⟩)².
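
A quick numerical check of the kernel trick for this polynomial kernel. Note that (1 + ⟨x, z⟩)² matches the slide's 6-dimensional feature space only up to fixed √2 scalings of the cross terms, which are folded into the mapping below.

```python
import numpy as np

# Verify that the kernel equals an explicit 6-D inner product.
def phi(x):
    return np.array([1, np.sqrt(2) * x[0], np.sqrt(2) * x[1],
                     x[0]**2, x[1]**2, np.sqrt(2) * x[0] * x[1]])

x, z = np.array([1.0, 2.0]), np.array([0.5, -1.0])
print(np.dot(phi(x), phi(z)))   # explicit 6-D inner product: 0.25
print((1 + np.dot(x, z))**2)    # kernel: same value, no mapping needed
```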

  21. Experiments • Automated parsers process the raw TCP/IP dump data into machine-readable form • 7,312 training records (different types of attacks plus normal data), each with 41 features • 6,980 testing records to evaluate the classifier
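
The slides do not give the authors' exact preprocessing or parameters, so the following only sketches the shape of such an experiment; the random stand-in data (with the reported record counts), label encoding, and kernel choice are placeholders.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data with the slide's shapes: 7312 train / 6980 test, 41 features.
rng = np.random.default_rng(0)
Xtr, ytr = rng.normal(size=(7312, 41)), rng.integers(0, 2, 7312)
Xte, yte = rng.normal(size=(6980, 41)), rng.integers(0, 2, 6980)

scaler = StandardScaler().fit(Xtr)  # scale features before SVM training
clf = SVC(kernel='rbf').fit(scaler.transform(Xtr), ytr)
print("accuracy:", clf.score(scaler.transform(Xte), yte))
```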

  22. Conclusion and Comments • Speed: SVM training time is significantly shorter • The max-margin approach helps avoid the "curse of dimensionality" • Accuracy: both achieve high accuracy • SVMs make only binary classifications, while IDS requires multiple-class identification • Open question: how to determine the features?
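
As a pointer on the multiple-class issue raised above, one standard workaround is to train one binary SVM per class (one-vs-rest); the sketch below uses scikit-learn and random stand-in data purely to show the mechanics.

```python
import numpy as np
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Random data only illustrates the mechanics; real inputs would be the
# 41-feature connection records labeled with the five classes.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 41))
y = rng.choice(["normal", "DOS", "R2L", "U2R", "probing"], size=100)

clf = OneVsRestClassifier(LinearSVC()).fit(X, y)  # 5 binary SVMs under the hood
print(clf.predict(X[:3]))
```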
