IMDS: Intelligent Malware Detection System
Yanfang Ye, Dingding Wang, Tao Li, Dongyi Ye
Motivation
• Viruses and other malware threaten the security of computer systems
• Signature-based anti-virus systems fail to detect polymorphic or new malware
• Some data mining techniques have shown promising results on small collections of malicious executables
[Figure: "Watch Out! Virus!": polymorphic or new malware defeating signature-based detection]
• Our goal: develop more effective and efficient data mining solutions for large collections of malicious executables, using OOA-mining-based classification
Data Collection and Preprocessing
• PE (Portable Executable) viruses make up the majority of the viruses that have emerged in recent years
• 17,366 malicious executables provided by the Anti-virus Laboratory of KingSoft Corporation
• 12,214 benign executables gathered from Windows system files
• A PE parser was developed to construct the API execution sequences of the executables
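The preprocessing step can be pictured as turning each parsed executable into a labeled transaction whose items are the API calls it makes. The sketch below is a minimal illustration, not the IMDS parser: it assumes the PE parser has already produced, for each file, a list of (DLL, API) pairs, and the record layout, the "dll,api" item naming, and all function names are hypothetical. Calls are stored as a set (order and repeats dropped), on the assumption that call order is not yet used (taking sequence into account is listed as future work). Items are then numbered by descending support count, the encoding that FP-Growth-style miners store in tree nodes.

```python
from collections import Counter
from typing import Iterable

# Hypothetical parser output: (class label, list of (dll, api) pairs in call order).
ParsedFile = tuple[str, list[tuple[str, str]]]


def to_transaction(calls: Iterable[tuple[str, str]]) -> frozenset[str]:
    """Turn one file's API calls into a set of 'dll,api' items.

    DLL names are case-insensitive on Windows, so they are lowercased;
    call order and repeated calls are deliberately discarded.
    """
    return frozenset(f"{dll.lower()},{api}" for dll, api in calls)


def build_transactions(files: list[ParsedFile]) -> list[tuple[frozenset[str], str]]:
    """Pair each file's item set with its class label ('malicious' or 'benign')."""
    return [(to_transaction(calls), label) for label, calls in files]


def number_items(transactions: list[tuple[frozenset[str], str]]) -> dict[str, int]:
    """Assign sequence numbers 1, 2, ... to items in order of descending support count.

    Ties are broken alphabetically so the numbering is deterministic.
    """
    counts = Counter(item for items, _ in transactions for item in items)
    ranked = sorted(counts, key=lambda item: (-counts[item], item))
    return {item: rank for rank, item in enumerate(ranked, start=1)}


if __name__ == "__main__":
    # Toy, made-up input standing in for real parser output.
    files: list[ParsedFile] = [
        ("malicious", [("KERNEL32.dll", "OpenProcess"), ("KERNEL32.dll", "CopyFileA"),
                       ("KERNEL32.dll", "OpenProcess")]),
        ("benign", [("KERNEL32.dll", "CloseHandle"), ("USER32.dll", "MessageBoxA")]),
        ("malicious", [("KERNEL32.dll", "CopyFileA"), ("KERNEL32.dll", "WriteFile")]),
    ]
    transactions = build_transactions(files)
    numbering = number_items(transactions)
    for items, label in transactions:
        # Each transaction as a sorted list of item numbers, the form a tree node path would use.
        print(label, sorted(numbering[item] for item in items))
```

Numbering items by support means the most frequent API calls get the smallest numbers, so sorted transactions share long common prefixes, which is what keeps a prefix tree compact.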
System Architecture
[Figure: IMDS system architecture, showing the OOA_Fast_FP-Growth algorithm and association rule based classification]
Objective-Oriented Association Mining
• OOA mining: models association patterns relating to a user's objective, e.g., Obj1 = (Group = Malicious)
  Algorithms: OOA_Apriori, OOA_FP-Growth [1]
• OOA_Fast_FP-Growth algorithm [4]: a modification of OOA_FP-Growth [2,3]
  • Paths are directed; thus fewer pointers are needed and less memory is required
  • Each node holds the sequence number of an item, which is determined by the item's support count
  • Example rule: (Kernel32.dll, OpenProcess; CopyFileA; CloseHandle; GetVersionExA; GetModuleFileNameA; WriteFile) → Obj = (Group = Malicious), with objective support os = 0.29 and objective confidence oc = 0.99
• Associative classification: CBA [5] builds the classifier from rules with high support and confidence
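To make os and oc concrete, the sketch below computes both measures for a rule X → (Group = label) and then classifies a file in the style of CBA. It is a minimal illustration under stated assumptions, not the IMDS implementation: it assumes os is the fraction of all files that contain X and satisfy the objective, and oc is the fraction of files containing X that satisfy it; it swaps OOA_Fast_FP-Growth for plain brute-force enumeration of small itemsets; and it simplifies CBA [5] by omitting its database-coverage pruning and taking the default class as a parameter. All class and function names are hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations

Transaction = tuple[frozenset[str], str]  # (API-call items, class label)


def objective_measures(itemset: frozenset[str], data: list[Transaction],
                       target: str) -> tuple[float, float]:
    """Return (os, oc) for the rule itemset -> (Group = target).

    os: fraction of all transactions that contain the itemset and satisfy the objective.
    oc: fraction of the transactions containing the itemset that satisfy the objective.
    """
    covering = [label for items, label in data if itemset <= items]
    hits = sum(1 for label in covering if label == target)
    os_ = hits / len(data) if data else 0.0
    oc = hits / len(covering) if covering else 0.0
    return os_, oc


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset[str]
    label: str
    os: float
    oc: float


def mine_rules(data: list[Transaction], min_os: float, min_oc: float,
               max_len: int = 2) -> list[Rule]:
    """Brute-force rule generation over itemsets of up to max_len items.

    A stand-in for OOA_Fast_FP-Growth: fine for toy data, far too slow at scale.
    """
    items = sorted(set().union(*(t for t, _ in data)))
    labels = sorted({label for _, label in data})
    rules = []
    for size in range(1, max_len + 1):
        for combo in combinations(items, size):
            antecedent = frozenset(combo)
            for label in labels:
                os_, oc = objective_measures(antecedent, data, label)
                if os_ >= min_os and oc >= min_oc:
                    rules.append(Rule(antecedent, label, os_, oc))
    # CBA-style precedence: higher confidence first, then higher support.
    # Python's sort is stable, so remaining ties keep generation order, as in CBA.
    rules.sort(key=lambda r: (-r.oc, -r.os))
    return rules


def classify(items: frozenset[str], rules: list[Rule], default: str) -> str:
    """Label a file with the first (highest-precedence) rule whose antecedent it contains."""
    for rule in rules:
        if rule.antecedent <= items:
            return rule.label
    return default


if __name__ == "__main__":
    # Toy, made-up transactions; not data from the slides.
    a, b, c = "kernel32.dll,CopyFileA", "kernel32.dll,OpenProcess", "user32.dll,MessageBoxA"
    data = [
        (frozenset({a, b}), "malicious"),
        (frozenset({a, c}), "malicious"),
        (frozenset({c}), "benign"),
        (frozenset({b, c}), "benign"),
    ]
    rules = mine_rules(data, min_os=0.25, min_oc=0.9)
    for rule in rules:
        print(sorted(rule.antecedent), "->", rule.label, rule.os, rule.oc)
    print(classify(frozenset({b, c}), rules, default="benign"))
```

A file that matches no rule falls through to the default class; real CBA picks that default from the training data left uncovered after pruning, which this sketch does not attempt.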
Experimental results (1)
• Efficiency
[Figure: False positives of different scanners (1,000 benign files)]
[Figure: Running time of different OOA mining algorithms (sample: 3,393 malicious / 2,217 benign files)]
[Figure: Efficiency of different scanners (sample: 500 malicious / 1,500 benign files)]
N: Norton AntiVirus; M: McAfee; D: Dr.Web; K: Kaspersky; SAVE [6]: Static Analyzer of Vicious Executables
Experimental results (2)
• Detection ability
[Figure: Polymorphic malware detection]
[Figure: Unknown malware detection]
Experimental results (3)
• Detection accuracy with different data mining solutions
[Table: Results using different classifiers. TP, TN, FP, FN, DR, and ACY refer to true positives, true negatives, false positives, false negatives, detection rate, and accuracy, respectively.]
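The two summary columns, DR and ACY, follow from the four counts. A minimal sketch, assuming the standard definitions with malicious files as the positive class: DR = TP / (TP + FN) and ACY = (TP + TN) / (TP + TN + FP + FN). The `Confusion` class is hypothetical, and the example counts are invented for illustration; they are not results from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Counts from one classifier run, with 'malicious' as the positive class."""
    tp: int  # malicious files flagged as malicious
    tn: int  # benign files passed as benign
    fp: int  # benign files wrongly flagged as malicious
    fn: int  # malicious files missed

    def detection_rate(self) -> float:
        """DR = TP / (TP + FN): share of malicious files that are detected."""
        positives = self.tp + self.fn
        return self.tp / positives if positives else 0.0

    def accuracy(self) -> float:
        """ACY = (TP + TN) / (TP + TN + FP + FN): share of all files labeled correctly."""
        total = self.tp + self.tn + self.fp + self.fn
        return (self.tp + self.tn) / total if total else 0.0


if __name__ == "__main__":
    run = Confusion(tp=90, tn=95, fp=5, fn=10)  # hypothetical counts
    print(run.detection_rate(), run.accuracy())
```

With these hypothetical counts, DR = 90 / 100 = 0.9 and ACY = 185 / 200 = 0.925. DR ignores benign files entirely, which is why false positives (as in the scanner comparison on 1,000 benign files) need to be reported separately.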
Conclusion
• Summary
  • IMDS is an integrated malware detection system consisting of a PE parser, an OOA rule generator, and a rule-based classifier
  • It is the first attempt to apply association mining to detect malicious code in a large collection of executables
  • In our experiments, IMDS outperforms many widely used anti-virus products and other data-mining-based malware detection methods in both effectiveness and efficiency
• Future Work
  • Conduct further study that takes the sequence (order) of API calls into consideration
Selected References
• [1] Y. Shen, Q. Yang, and Z. Zhang. Objective-oriented utility-based association mining. In Proceedings of ICDM'02.
• [2] J. Han and M. Kamber. Data mining: Concepts and techniques, 2nd edition. Morgan Kaufmann, 2006.
• [3] J. Han, J. Pei, and Y. Yin. Mining frequent patterns without candidate generation. In Proceedings of SIGMOD, pages 1-12, May 2000.
• [4] M. Fan and C. Li. Mining frequent patterns in an FP-tree without conditional FP-tree generation. Journal of Computer Research and Development, 40:1216-1222, 2003.
• [5] B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. In Proceedings of KDD'98.
• [6] A. Sung, J. Xu, P. Chavez, and S. Mukkamala. Static analyzer of vicious executables (SAVE). In Proceedings of the 20th Annual Computer Security Applications Conference, 2004.
• [7] J. Xu, A. Sung, P. Chavez, and S. Mukkamala. Polymorphic malicious executable scanner by API sequence analysis. In Proceedings of the International Conference on Hybrid Intelligent Systems, 2004.
• [8] J. Wang, P. Deng, Y. Fan, L. Jaw, and Y. Liu. Virus detection using data mining techniques. In Proceedings of the IEEE International Conference on Data Mining, 2003.