Prophiler: A fast filter for the large-scale detection of malicious web pages
Reporter: 鄭志欣
Advisor: Hsing-Kuo Pao
Date: 2011/03/31
Conference
• Davide Canali, Marco Cova, Giovanni Vigna, and Christopher Kruegel, "Prophiler: A Fast Filter for the Large-Scale Detection of Malicious Web Pages", 20th International World Wide Web Conference (WWW 2011)
Outline
• Introduction
• Approach
• Implementation and Setup
• Evaluation
• Conclusion
Introduction
• Malicious web pages
• Drive-by download: JavaScript
• Compromising hosts
• Large-scale botnets
• Static analysis vs. dynamic analysis
• Dynamic analysis takes a lot of time.
• Static analysis reduces the resources required for performing large-scale analysis.
• URL blacklists (Google Safe Browsing)
• Honeyclients: Wepawet, PhoneyC, JSUnpack
• Combined?
• Quickly discard benign pages and forward only the remaining pages to the costly analysis tools (Wepawet).
Prophiler
• Prophiler uses static analysis techniques to quickly examine a web page for malicious content.
• HTML, JavaScript, and URL information
• Model: built using machine-learning techniques
Approach
• Features
• Neko HTML Parser
• HTML, JavaScript, and URL information
• Total features: 77
• New features: 17
• Models
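The three feature sources listed above (HTML, JavaScript, URL) can be illustrated with a minimal sketch. It uses Python's standard `html.parser` rather than the Neko HTML Parser named on the slide, and the five features, the `TagCounter` class, and the `extract_features` function are hypothetical illustrations, not the 77 features of the paper.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse
import re


class TagCounter(HTMLParser):
    """Counts iframe/script tags and the amount of inline script text."""

    def __init__(self):
        super().__init__()
        self.counts = {"iframe": 0, "script": 0}
        self.script_chars = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        if tag in self.counts:
            self.counts[tag] += 1
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text inside <script>...</script> is counted here.
        if self._in_script:
            self.script_chars += len(data)


def extract_features(url, html):
    """Return a small, illustrative feature dictionary for one page."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    host = urlparse(url).hostname or ""
    return {
        "num_iframes": parser.counts["iframe"],        # HTML feature
        "num_scripts": parser.counts["script"],        # HTML feature
        "script_chars": parser.script_chars,           # JavaScript feature
        "url_length": len(url),                        # URL feature
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
    }


page = '<html><body><iframe src="x"></iframe><script>eval("a")</script></body></html>'
features = extract_features("http://192.168.0.1/index.html", page)
print(features)
```

A feature dictionary like this would then be turned into a numeric vector and passed to a trained model; the choice of model is left open here.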
Reference Paper
• [26] C. Seifert, I. Welch, and P. Komisarczuk. Identification of Malicious Web Pages with Static Heuristics. In Proceedings of the Australasian Telecommunication Networks and Applications Conference (ATNAC), 2008.
• [16] P. Likarish, E. Jung, and I. Jo. Obfuscated Malicious Javascript Detection using Classification Techniques. In Proceedings of the Conference on Malicious and Unwanted Software (Malware), 2009.
• [6] B. Feinstein and D. Peck. Caffeine Monkey: Automated Collection, Detection and Analysis of Malicious JavaScript. In Proceedings of the Black Hat Security Conference, 2007.
• [17] J. Ma, L. Saul, S. Savage, and G. Voelker. Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2009.
• [25] C. Seifert, I. Welch, and P. Komisarczuk. Identification of Malicious Web Pages Through Analysis of Underlying DNS and Web Server Relationships. In Proceedings of the LCN Workshop on Network Security (WNS), 2008.
Discussion
• Assumptions
• First, the distribution of feature values for malicious examples differs from that of benign examples.
• Second, the datasets used for model training share the same feature distribution as the real-world data that is evaluated using the models.
• Trade-offs
• False negatives vs. false positives
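The false negative vs. false positive trade-off can be made concrete with a small sketch. The scores, labels, and the `error_rates` helper below are made-up illustrations and not data from the paper; the sketch assumes a classifier that outputs a maliciousness score and flags a page when the score reaches a threshold.

```python
def error_rates(scores, labels, threshold):
    """Return (false positive rate, false negative rate).

    labels: True means malicious. A page is flagged when score >= threshold.
    Assumes both classes are present in `labels`.
    """
    pairs = list(zip(scores, labels))
    fp = sum(1 for s, y in pairs if s >= threshold and not y)
    fn = sum(1 for s, y in pairs if s < threshold and y)
    benign = sum(1 for y in labels if not y)
    malicious = sum(1 for y in labels if y)
    return fp / benign, fn / malicious


# Hypothetical classifier scores for four benign and four malicious pages.
scores = [0.05, 0.20, 0.40, 0.60, 0.30, 0.70, 0.90, 0.95]
labels = [False, False, False, False, True, True, True, True]

for threshold in (0.25, 0.50, 0.75):
    fpr, fnr = error_rates(scores, labels, threshold)
    print(f"threshold={threshold:.2f}  FP rate={fpr:.2f}  FN rate={fnr:.2f}")
```

Lowering the threshold forwards more benign pages to the back-end (more false positives) but misses fewer malicious ones. For a front-end filter, a missed malicious page is never re-examined, which is why a low false negative rate matters more than a low false positive rate.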
Implementation and Setup (cont.)
• Prophiler serves as a filter for our existing dynamic analysis tool, Wepawet.
• Collecting URLs: Heritrix (crawling tool), spam emails
• Terms from Twitter, Google, and Wikipedia trends
• Collecting URLs: 2,000 URLs/day
Implementation and Setup
• The crawler fetches pages and submits them as input to Prophiler.
• Server:
• Ubuntu Linux x64 v9.10
• 8-core Intel Xeon processor and 8 GB of RAM
• In this configuration, the system can analyze 320,000 pages/day on average.
• Because a page can reference additional resources such as scripts and frames, the analysis must examine around 2 million URLs each day.
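The two throughput figures above can be put in perspective with a short calculation. The per-second rate and the URLs-per-page ratio are derived here from the slide's numbers and are not stated in the source.

```python
pages_per_day = 320_000       # average pages analyzed per day (from the slide)
urls_per_day = 2_000_000      # approximate URLs examined per day (from the slide)
seconds_per_day = 24 * 60 * 60

# Derived quantities (not stated on the slide).
pages_per_second = pages_per_day / seconds_per_day
urls_per_page = urls_per_day / pages_per_day

print(f"pages analyzed per second: {pages_per_second:.2f}")
print(f"URLs examined per page:    {urls_per_page:.2f}")
```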
Evaluation
• Total web pages: 20 million
Evaluation (cont.)
• Training set:
• 787 malicious pages from Wepawet's database
• 51,171 benign pages from the Alexa top 100 websites
• Labels checked with the Google Safe Browsing API, anti-virus tools, and experts
• 10-fold cross-validation
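The 10-fold procedure can be sketched as follows: the training set is split into ten folds, and each fold serves once as the test set while the other nine are used for training. The `k_fold_indices` helper is a hypothetical illustration using plain shuffling; the slides do not say whether the folds were stratified by class.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


n_samples = 787 + 51_171  # malicious + benign pages in the training set
splits = list(k_fold_indices(n_samples, k=10))
for fold_number, (train, test) in enumerate(splits, start=1):
    print(f"fold {fold_number}: train={len(train)} test={len(test)}")
```

With 787 malicious pages against 51,171 benign ones, the classes are heavily imbalanced, so a stratified split may be preferable in practice.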
Evaluation (cont.)
• Validation
• 153,115 pages
• Submitting them to Wepawet took 15 days
• Benign: 139,321 pages
• Malicious: 13,794 pages
• False positive rate: 10.4%
• False negative rate: 0.54%
• Saves valuable resources
Evaluation (cont.)
• Large-scale evaluation
• 18,939,908 pages over a 60-day run
• 14.3% flagged as malicious
• 85.7% reduction of load on the back-end analyzer
• 1,968 malicious pages/day (by Wepawet)
• False positive rate: 13.7%
• False negative rate: 1%
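The load-reduction figure follows directly from the flagged fraction: every page that is not flagged is never sent to the back-end analyzer. The sketch below reproduces that arithmetic from the slide's numbers; the per-day values are derived here and are not stated in the source.

```python
total_pages = 18_939_908    # pages analyzed during the run (from the slide)
days = 60                   # length of the run (from the slide)
flagged_fraction = 0.143    # fraction flagged as malicious (from the slide)

forwarded = total_pages * flagged_fraction   # pages sent on to the back-end
discarded = total_pages - forwarded          # pages filtered out

print(f"reduction of load:       {discarded / total_pages:.1%}")
print(f"pages analyzed per day:  {total_pages / days:,.0f}")
print(f"pages forwarded per day: {forwarded / days:,.0f}")
```

The derived daily volume is close to the 320,000 pages/day average quoted for the deployment.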
Evaluation (cont.)
• Comparison
• 15,000 web pages
• Malicious: 5,861 pages
• Benign: 9,139 pages
Conclusion
• We developed Prophiler, a filter that reduces the number of web pages that must be analyzed dynamically to identify malicious web pages.
• We deployed our system as a front-end for Wepawet, with a very small false negative rate.