SBMDS: an interpretable string based malware detection system using SVM ensemble with bagging
Yanfang Ye, Lifei Chen, Dingding Wang, Tao Li, Qingshan Jiang, Min Zhao
Presented by: Akanksha Kaul, Dept of Computer & Information Sciences, University of Delaware
MOTIVATION
• Urgent need to detect malicious executables
• Major threats:
  • Metamorphic executables: reprogram themselves; capable of infecting two operating systems
  • Polymorphic executables: pose as non-malicious code
  • Unseen executables
Need of the Hour
• SBMDS: String Based Malware Detection System
• What exactly is this system about?
• It performs interpretable string analysis.
An interpretable string is a line of code in a program that contains both API execution calls and important semantic strings representing the intent and goal of the program writer.
Interpretable String?
• Example: the worm “Nimda” contains “html script language = ‘javascript’ window.open(‘readme.eml’)”
• Another example: “&gameid= %s&pass=%s; myparentthreadid=%d; myguid=%s”
• But not all strings are interpretable, e.g. “!0&0h0m0o0t0y0” and “*3d%3dtgyhjij”
Major Steps
• Construct the interpretable strings by developing a feature parser.
• Perform feature selection to select informative strings.
• Use an SVM ensemble with bagging to construct the classifier.
• Run the malware detector, which also predicts the exact type of the malware.
Step 1: Develop the Feature Parser
• 39,838 executables were collected from the Kingsoft Anti-virus lab.
• All executables are PE files.
• Static features are extracted: API calls from the import table, and strings carrying semantic interpretation.
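The string-extraction half of Step 1 can be sketched as follows. This is a minimal illustration, not the authors' feature parser: it only pulls runs of printable ASCII out of raw bytes (similar in spirit to the Unix `strings` tool) and does not read the PE import table. The function name `extract_strings`, the minimum length of 6, and the sample bytes are all assumptions made for the example.

```python
import re
from typing import List

# Hypothetical threshold: runs shorter than this are treated as noise.
MIN_LEN = 6

def extract_strings(data: bytes, min_len: int = MIN_LEN) -> List[str]:
    """Return every run of printable ASCII characters of at least min_len bytes."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.group().decode("ascii") for match in pattern.finditer(data)]

# Made-up byte blob standing in for the contents of an executable.
sample = b"\x00\x01window.open('readme.eml')\x00\xff\x10ab\x00&gameid=%s&pass=%s\x00"
strings = extract_strings(sample)
print(strings)
```

A real parser would read the file with `open(path, "rb")` and would also need a PE-aware library to list imported API calls; that part is left out here.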
SAMPLE (Backdoor-Redgirl.exe)
• A string such as “‘%s’ goto delete” always implies that the malware may generate a “.bat” file to delete itself.
Step 2: Feature Selection
• Select only the interpretable strings from the huge set of strings obtained in the previous step.
• Assign these strings as the signatures of the PE files.
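The slides do not say which criterion is used to rank strings, so the sketch below swaps in information gain, a common choice for binary features. It is an assumption for illustration only, not the authors' method. Each file is modeled as a set of strings, labels are 1 for malware and 0 for benign, and all names and toy data are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(samples, labels, feature):
    """Reduction in label entropy from knowing whether a file contains the feature."""
    with_f = [y for s, y in zip(samples, labels) if feature in s]
    without_f = [y for s, y in zip(samples, labels) if feature not in s]
    n = len(labels)
    conditional = sum(len(part) / n * entropy(part) for part in (with_f, without_f) if part)
    return entropy(labels) - conditional

def select_top_k(samples, labels, k):
    """Return the k strings with the highest information gain."""
    vocabulary = sorted(set().union(*samples))
    ranked = sorted(vocabulary, key=lambda f: information_gain(samples, labels, f), reverse=True)
    return ranked[:k]

# Toy data: two malware files (label 1) and two benign files (label 0).
samples = [
    {"goto delete", "CreateFileA"},
    {"goto delete", "window.open"},
    {"CreateFileA"},
    {"CreateFileA", "MessageBoxA"},
]
labels = [1, 1, 0, 0]
signature = select_top_k(samples, labels, k=1)
print(signature)
```

In this toy set "goto delete" appears in exactly the malware files, so it separates the classes perfectly and ranks first.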
Step 3: Using SVM to Classify
• Why SVM? SVMs have shown state-of-the-art results in classification problems.
• Problem: the training complexity of an SVM depends on the size of the data set.
Problem
• Training accuracy levels off once the size of the data set reaches 3000.
Curse of Dimensionality?
• The problem caused by the exponential increase in volume that comes with adding extra dimensions to the feature space.
• How does SVM deal with the “curse of dimensionality”?
• Solution: use an SVM ensemble with bagging.
• What are an SVM ensemble and bagging?
3.1 SVM Ensemble with Bagging
• An ensemble is a set of classifiers whose individual decisions are combined in some way to classify new samples.
• The bagging technique (Bootstrap AGGregating) is applied to the training set.
• Each training subset is drawn by uniform sampling (with replacement) of the training data set.
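The bagging idea can be sketched in a few lines. This is a minimal illustration under stated assumptions: the base learner below is a trivial stand-in that predicts the most common label of its bootstrap sample, not an SVM, and the function names, ensemble size, and toy data are hypothetical. In practice the `train` argument would be replaced by a routine that fits an SVM on each bootstrap replicate.

```python
import random
from collections import Counter

def bootstrap_sample(data, rng):
    """Draw len(data) items uniformly at random, with replacement."""
    return [rng.choice(data) for _ in range(len(data))]

def train_majority_label(sample):
    """Stand-in base learner (NOT an SVM): always predicts the sample's most common label."""
    top_label = Counter(label for _, label in sample).most_common(1)[0][0]
    return lambda features: top_label

def build_bagged_ensemble(data, train, n_members, seed=0):
    """Train one base classifier per bootstrap replicate of the training set."""
    rng = random.Random(seed)
    return [train(bootstrap_sample(data, rng)) for _ in range(n_members)]

# Toy training set of (feature set, class label) pairs; the values are made up.
training_set = [
    ({"goto delete"}, 1),
    ({"CreateFileA"}, 0),
    ({"window.open"}, 4),
    ({"MessageBoxA"}, 0),
]
ensemble = build_bagged_ensemble(training_set, train_majority_label, n_members=5)
predictions = [member({"goto delete"}) for member in ensemble]
print(predictions)
```

Because each member sees a different resample, the members can disagree; their individual decisions still have to be combined into one answer.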
3.2 Multi-Classification
• There are various classes of malware.
• The individual decisions are combined by “MAJORITY VOTING”.
• When two different classes receive identical vote counts, the class with the smallest index is chosen.
• Class indices: 0 = Benign files, 1 = Backdoors, 2 = Spyware, 3 = Trojans, 4 = Worms
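A minimal sketch of the voting step, assuming that the smallest-index rule is how ties between equally voted classes are broken. The function name `majority_vote` and the example vote lists are hypothetical; the class indices are the ones listed on the slide.

```python
from collections import Counter

# Class indices as given on the slide.
CLASS_NAMES = {0: "Benign files", 1: "Backdoors", 2: "Spywares", 3: "Trojans", 4: "Worms"}

def majority_vote(votes):
    """Return the most voted class index; ties go to the smallest index."""
    counts = Counter(votes)
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)

# Classes 1 and 3 both get two votes, so the smaller index (1) is chosen.
tied = majority_vote([3, 1, 3, 4, 1])
# Class 4 has a clear majority.
clear = majority_vote([4, 4, 2])
print(CLASS_NAMES[tied], CLASS_NAMES[clear])
```

A result of 0 means the file is treated as benign; any other index names the predicted malware class.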
Step 4: Malware Detection
• Unknown variants of malware are used.
• Is the file malicious or not?
• To which class does the malware belong?
System Architecture
1. Feature Parser
2. Feature Selection
3. SVM Ensemble Classifier
4. Malware Detector
Reason Why I Chose This Paper
• Comparisons with popular anti-virus software.
Points of comparison:
• Detecting known variants of malware.
• Detecting unknown variants.
• Efficiency (detection time).
• Number of false positive detections.
Conclusion
• This system has already been incorporated into the scanning tool of a commercial anti-virus software product.
• The name of the anti-virus product is not disclosed.
All's Well That Ends Well
THANK YOU