Self-Learning Anti-Virus Scanner

  1. Self-Learning Anti-Virus Scanner • Arun Lakhotia, Professor • Andrew Walenstein, Assistant Professor • University of Louisiana at Lafayette • www.cacs.louisiana.edu/labs/SRL • 2008 AVAR (New Delhi)

  2. Introduction • Alumni in AV Industry • Prabhat Singh • Nitin Jyoti • Aditya Kapoor • Rachit Kumar McAfee AVERT • Erik Uday Kumar, Authentium • Moinuddin Mohammed, Microsoft • Prashant Pathak, Ex-Symantec • Funded by: Louisiana Governor’s IT Initiative • Director, Software Research Lab • Lab’s focus: Malware Analysis • Graduate-level course on Malware Analysis • Six years of AV-related research • Issues investigated: • Metamorphism • Obfuscation

  3. Outline • Attack of Variants • AV vulnerability: Exact match • Information Retrieval Techniques • Inexact match • Adapting IR to AV • Account for code permutation • Vilo: System using IR for AV • Integrating Vilo into AV Infrastructure • Self-Learning AV using Vilo

  4. ATTACK OF VARIANTS

  5. Variants vs Family • Source: Symantec Internet Threat Report, XI

  6. Analysis of attacker strategy • Purpose of attack of variants • Denial of service on AV infrastructure • Increase odds of passing through • Weakness exploited • AV systems use exact match over extract • Attack strategy • Generate just enough variation to beat exact match • Attacker cost • Cost of generating and distributing variants

  7. Analyzing attacker cost • Payload creation is expensive • Must reuse payload • Need thousands of variants • Must be automated • “General” transformers are expensive • Specialized, limited transformers • Hence packers/unpackers

  8. Attacker vulnerability • Automated transformers • Limited capability • Machine generated, must have regular pattern • Exploiting attacker vulnerability • Detect patterns of similarities • Approach • Information Retrieval (this presentation) • Markov Analysis (other work)

  9. Information Retrieval

  10. IR Basics • Basis of Google, Bioinformatics • Organizing a very large corpus of data • Key idea • Inexact match over whole • Contrast with AV • Exact match over extract

  11. IR Problem • Input: document collection • Query: keywords or document • Output of IR: related documents

  12. IR Steps • Step 1: Convert documents to vectors • 1a. Define a method to identify “features” • Example: k consecutive words • 1b. Extract all features from all documents • Example document: “Have you wondered, when is a rose a rose?” • 1c. Count features, make feature vector
      Feature               Count
      Have you wondered     1
      You wondered when     1
      Wondered when rose    1
      When rose rose        1
      How about onions      0
      Onion smell stinks    0
      Feature vector: [1, 1, 1, 1, 0, 0]
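Step 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide's features skip the words "is" and "a", so the sketch drops them through a hypothetical `STOP_WORDS` set, and the function names `word_kgrams` and `feature_vector` are invented here, not taken from the presentation.

```python
from collections import Counter

# Assumption: the slide's example features omit the filler words "is" and "a".
STOP_WORDS = {"is", "a"}


def word_kgrams(text, k=3):
    """Return every run of k consecutive (non-stop) words in text."""
    words = [w.strip("?,.!").lower() for w in text.split()]
    words = [w for w in words if w and w not in STOP_WORDS]
    return [" ".join(words[i:i + k]) for i in range(len(words) - k + 1)]


def feature_vector(text, vocabulary, k=3):
    """Count how often each vocabulary feature occurs in text."""
    counts = Counter(word_kgrams(text, k))
    return [counts[feature] for feature in vocabulary]


# Features from the whole corpus, in the order shown on the slide.
vocabulary = [
    "have you wondered",
    "you wondered when",
    "wondered when rose",
    "when rose rose",
    "how about onions",
    "onion smell stinks",
]

vector = feature_vector("Have you wondered When is a rose a rose?", vocabulary)
print(vector)
```

The last two entries stay at zero because those features occur only in other documents of the corpus.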

  13. IR Steps • Step 2: Compute feature vectors • Take into account features in entire corpus • Classical method • W = TF x IDF • TF = Term Frequency • DF = # documents containing the feature • IDF = Inverse of DF
      Feature               TF(v1)   DF   IDF   w1 = TF x IDF(v1)
      You wondered when     1        5    1/5   1/5
      Wondered when rose    2        7    1/7   2/7
      When rose rose        5        8    1/8   5/8
      How about onions      3        6    1/6   3/6
      Onion smell stinks    0        3    1/3   0/3
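A minimal sketch of the weighting in Step 2, assuming IDF is taken literally as 1/DF, as the slide defines it (practical IR systems often use a logarithmic IDF instead). The TF and DF values are read from the slide's table, and the assignment of numbers to columns is itself an assumption; `tf_idf_weights` is a name made up for this example.

```python
from fractions import Fraction


def tf_idf_weights(tf, df):
    """Return W = TF x IDF per feature, with IDF = 1/DF as on the slide."""
    if len(tf) != len(df):
        raise ValueError("tf and df must have the same length")
    return [Fraction(t, d) for t, d in zip(tf, df)]


features = [
    "You wondered when",
    "Wondered when rose",
    "When rose rose",
    "How about onions",
    "Onion smell stinks",
]
tf_v1 = [1, 2, 5, 3, 0]  # term frequency of each feature in document v1
df = [5, 7, 8, 6, 3]     # number of documents containing each feature

weights = tf_idf_weights(tf_v1, df)
for name, weight in zip(features, weights):
    print(f"{name:20s} {weight}")
```

`Fraction` keeps the weights exact and reduces them, so 3/6 is shown as 1/2 and 0/3 as 0.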

  14. IR Steps • Step 3: Compare vectors • Cosine similarity • Example vector: w1 = [0.33, 0.25, 0.66, 0.50]
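Cosine similarity is the dot product of two vectors divided by the product of their lengths. The sketch below shows the standard computation; the vectors are illustrative values only, and `w2` in particular is hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector shares no features with anything
    return dot / (norm_u * norm_v)


w1 = [0.33, 0.25, 0.66, 0.50]  # illustrative weights
w2 = [0.10, 0.40, 0.20, 0.60]  # hypothetical second document

print(round(cosine_similarity(w1, w2), 3))
```

With non-negative weights the score lies between 0 (no shared features) and 1 (same direction), which is what makes it usable as an inexact match score.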

  15. IR Steps • Step 4: Document ranking • Using similarity measure • Diagram: a new document is submitted to IR, which returns matching documents from the document collection with similarity scores (0.30, 0.82, 0.90, 0.76)
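Ranking then reduces to sorting the collection by similarity score. A small sketch using the four scores from the slide; the document names and the optional `threshold` cut-off are assumptions added for illustration.

```python
def rank_matches(scores, threshold=0.0):
    """Sort (document, score) pairs best-first and drop scores below threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [(name, score) for name, score in ranked if score >= threshold]


# Scores from the slide; the document names are hypothetical.
scores = {"doc_a": 0.30, "doc_b": 0.82, "doc_c": 0.90, "doc_d": 0.76}

for name, score in rank_matches(scores, threshold=0.5):
    print(name, score)
```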

  16. Adapting IR for AV

  17. Adapting IR for AV • Step 0: Mapping program to document • Extract sequence of operations
      Variant 1:
        l2D2: push ecx
              push 4
              pop ecx
              push ecx
        l2D7: rol edx, 8
              mov dl, al
              and dl, 3Fh
              shr eax, 6
              loop l2D7
              pop ecx
              call s319
              xchg eax, edx
              stosd
              xchg eax, edx
              inc [ebp+v4]
              cmp [ebp+v4], 12h
              jnz short l305
      Operations: push push pop push rol mov and shr loop pop call xchg stosd xchg inc cmp jnz
      Variant 2:
        l144: push ecx
              push 4
              pop ecx
              push ecx
        l149: mov dl, al
              and dl, 3Fh
              rol edx, 8
              shr ebx, 6
              loop l149
              pop ecx
              call s52F
              xchg ebx, edx
              stosd
              xchg ebx, edx
              inc [ebp+v4]
              cmp [ebp+v4], 12h
              jnz short l18
      Operations: push push pop push mov and rol shr loop pop call xchg stosd xchg inc cmp jnz
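Step 0 keeps only the operation (mnemonic) of each instruction and discards labels and operands. A minimal sketch of that extraction, assuming a disassembly listing with one instruction per line and optional `label:` prefixes; `extract_operations` is a hypothetical helper, not part of Vilo.

```python
def extract_operations(listing):
    """Return the sequence of mnemonics in a disassembly listing."""
    operations = []
    for line in listing.splitlines():
        line = line.strip()
        if not line:
            continue
        # Drop a leading label such as "l2D2:".
        head, sep, rest = line.partition(":")
        if sep and head and " " not in head:
            line = rest.strip()
        if line:
            operations.append(line.split()[0])
    return operations


listing = """
l2D2: push ecx
      push 4
      pop ecx
      push ecx
l2D7: rol edx, 8
      mov dl, al
      and dl, 3Fh
      shr eax, 6
      loop l2D7
      pop ecx
      call s319
      xchg eax, edx
      stosd
      xchg eax, edx
      inc [ebp+v4]
      cmp [ebp+v4], 12h
      jnz short l305
"""

print(" ".join(extract_operations(listing)))
```

Dropping operands makes register renaming (eax versus ebx in the second variant) invisible to the later matching steps.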

  18. Adapting IR for AV • Step 1a: Defining features: k-perm • Feature = permutation of k operations • Virus 1: P P O P R M A S L O C X S X I C J • Virus 2: P P O P M A R S L O C X S X I C J • The two sequences differ only in the order of the operations R M A (Virus 1) vs M A R (Virus 2)
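One reading of a k-perm is a window of k consecutive operations in which order is ignored, so that reordered instructions still yield the same feature. The sketch below follows that reading by sorting each window into a canonical form; this interpretation and the name `k_perms` are assumptions, not the authors' implementation.

```python
from collections import Counter


def k_perms(operations, k):
    """Count order-insensitive windows of k consecutive operations."""
    windows = (operations[i:i + k] for i in range(len(operations) - k + 1))
    return Counter(tuple(sorted(window)) for window in windows)


virus1 = list("PPOPRMASLOCXSXICJ")
virus2 = list("PPOPMARSLOCXSXICJ")

perms1 = k_perms(virus1, 3)
perms2 = k_perms(virus2, 3)

# "R M A" in Virus 1 and "M A R" in Virus 2 collapse to the same 3-perm.
print(("A", "M", "R") in perms1, ("A", "M", "R") in perms2)
```

A plain 3-gram would treat R M A and M A R as different features; the sorted window is what lets the permuted block match.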

  19. Adapting IR for AV • Step 1: Example of 3-perm • Samples shown: Virus 1, Virus 2, Virus 3 • Sequences shown: P P O P R M A S L O C X S X I C J and P P O P M A R S L O C X S X I C J • Highlighted 3-perm: P O P

  20. Adapting IR for AV • Step 2: Construct feature vectors (4-perms) • Sample 1: P O P R M A S L • Sample 2: P O P M A R S L • Sample 3: M A R S L P O P • [Table: 0/1 feature vectors for samples 1 to 3 over 4-perm features such as MARS and PMAR]
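Step 2 can be sketched by collecting the 4-perms of all three samples into one vocabulary and marking which features each sample contains. The sketch assumes presence/absence (0/1) entries and writes each 4-perm as its letters in sorted order, so the slide's MARS appears as "AMRS" and PMAR as "AMPR"; the helper name `k_perm_set` is hypothetical.

```python
def k_perm_set(operations, k):
    """Set of order-insensitive k-windows, each written as sorted letters."""
    return {
        "".join(sorted(operations[i:i + k]))
        for i in range(len(operations) - k + 1)
    }


samples = {1: "POPRMASL", 2: "POPMARSL", 3: "MARSLPOP"}

perms = {name: k_perm_set(seq, 4) for name, seq in samples.items()}
vocabulary = sorted(set().union(*perms.values()))
vectors = {
    name: [1 if feature in found else 0 for feature in vocabulary]
    for name, found in perms.items()
}

print(vocabulary)
for name, vector in vectors.items():
    print(name, vector)
```

The feature "AMRS" is present in all three samples even though sample 1 contains R M A S rather than M A R S, and sample 3 has the block moved to the front.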

  21. Adapting IR for AV • Step 3: Compare vectors • Cosine similarity (as before) • Step 4: Match new sample

  22. Vilo: System using IR for AV

  23. Vilo Functional View • Input: new sample, malware collection • Output: malware matches ranked by similarity (0.90, 0.82, 0.76, 0.30)

  24. Vilo in Action: Query Match

  25. Vilo: Performance • Response time vs database size • Search on generic desktop: in seconds • Contrast with • Behavior match: in minutes • Graph match: in minutes

  26. Vilo Match Accuracy • ROC curve: true positive vs false positive

  27. Vilo in AV Product

  28. Vilo in AV Product • AV systems: composed of classifiers • Introduce Vilo as a classifier in the AV scanner

  29. Self-Learning AV Product • How to get malware collection? • Solution 1: Collect malware detected by the product • Diagram: Vilo alongside the product's other classifiers
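The feedback idea in Solution 1 can be sketched as a scanner whose similarity classifier learns from whatever any classifier detects. Everything here is a hypothetical illustration, not Vilo's design: the class `SelfLearningScanner`, the "any classifier fires" decision rule, the 0.6 threshold, the raw 3-perm counts as weights, and the exact-match stand-in classifier are all assumptions.

```python
import math
from collections import Counter


def perm_features(ops, k=3):
    """Order-insensitive k-window counts of an operation sequence."""
    return Counter("".join(sorted(ops[i:i + k])) for i in range(len(ops) - k + 1))


def cosine(a, b):
    """Cosine similarity between two sparse feature-count vectors."""
    dot = sum(a[f] * b[f] for f in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SelfLearningScanner:
    def __init__(self, other_classifiers, threshold=0.6):
        self.other_classifiers = other_classifiers  # e.g. exact-match signatures
        self.collection = []                        # learned malware feature vectors
        self.threshold = threshold                  # assumed similarity cut-off

    def scan(self, ops):
        features = perm_features(ops)
        similar = any(cosine(features, known) >= self.threshold
                      for known in self.collection)
        detected = similar or any(clf(ops) for clf in self.other_classifiers)
        if detected:
            self.collection.append(features)  # feedback: learn from detections
        return detected


# Stand-in for an existing exact-match classifier that knows one variant.
known_signatures = {"PPOPRMASLOCXSXICJ"}
scanner = SelfLearningScanner([lambda ops: ops in known_signatures])

results = [
    scanner.scan("PPOPRMASLOCXSXICJ"),  # caught by exact match, then learned
    scanner.scan("PPOPMARSLOCXSXICJ"),  # permuted variant, caught by similarity
    scanner.scan("MMMMMM"),             # unrelated sequence
]
print(results, len(scanner.collection))
```

The second variant defeats the exact-match classifier but is still flagged, because the first detection seeded the collection that the similarity classifier searches.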

  30. Self-Learning AV Product • How to get malware collection? • Solution 2: Collect and learn in the cloud • Diagram: Vilo in the Internet cloud; Vilo alongside the product's other classifiers

  31. Learning in the Cloud • How to get malware collection? • Solution 2: Collect and learn in the cloud • Diagram: Vilo Learner in the Internet cloud; Vilo alongside the product's other classifiers

  32. Experience with Vilo-Learning • Vilo-in-the-cloud holds promise • Can utilize cluster of workstations • Like Google • Take advantage of increasing bandwidth and compute power • Engineering issues to address • Control growth of database • Forget samples • Use “signature” feature vector(s) for family • Be “selective” about features to use

  33. Summary • Weakness of current AV system • Exact match over extract • Exploited by creating large number of variants • Information Retrieval research strengths • Inexact match over whole • VILO demonstrates IR techniques have promise • Architecture of Self-Learning AV System • Integrate VILO into existing AV systems • Create feedback mechanism to drive learning