
Prophiler: A fast filter for the large-scale detection of malicious web pages

Prophiler: A fast filter for the large-scale detection of malicious web pages. Reporter : 鄭志欣 Advisor: Hsing-Kuo Pao Date : 2011/03/31. Conference.


Presentation Transcript


  1. Prophiler: A fast filter for the large-scale detection of malicious web pages • Reporter: 鄭志欣 • Advisor: Hsing-Kuo Pao • Date: 2011/03/31

  2. Conference • Davide Canali, Marco Cova, Giovanni Vigna, and Christopher Kruegel, "Prophiler: A Fast Filter for the Large-Scale Detection of Malicious Web Pages", 20th International World Wide Web Conference (WWW 2011)

  3. Outline • Introduction • Approach • Implementation and Setup • Evaluation • Conclusion

  4. Introduction • Malicious web pages • Drive-by download: JavaScript • Compromising hosts • Large-scale botnets • Static analysis vs. dynamic analysis • Dynamic analysis takes a lot of time. • Static analysis reduces the resources required for large-scale analysis. • URL blacklists (Google Safe Browsing) • Honeyclients: Wepawet, PhoneyC, JSUnpack • Combined? • Quickly discard benign pages and forward only the remaining pages to the costly analysis tools (Wepawet).

  5. Prophiler • Prophiler uses static analysis techniques to quickly examine a web page for malicious content. • HTML, JavaScript, and URL information • Model: built using machine-learning techniques

  6. Approach • Features • Neko HTML Parser • HTML, JavaScript, and URL information • Total features: 77 • New features: 17 • Models
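As a rough illustration of what static feature extraction from HTML and URL information can look like, here is a minimal sketch. It is not Prophiler's implementation: the slides name the Neko HTML Parser (a Java library), while this sketch substitutes Python's standard `html.parser`, and the five features and the names `TagCounter` and `extract_features` are hypothetical stand-ins, not the paper's 77 features.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse


class TagCounter(HTMLParser):
    """Counts a few elements and measures inline script size (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.counts = {"iframe": 0, "script": 0}
        self.script_chars = 0
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names before calling this hook.
        if tag in self.counts:
            self.counts[tag] += 1
        if tag == "script":
            self._in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Only text found inside <script>...</script> is counted.
        if self._in_script:
            self.script_chars += len(data)


def extract_features(url, html):
    """Return a small, hypothetical feature dictionary for one page."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    host = urlparse(url).hostname or ""
    return {
        "num_iframes": parser.counts["iframe"],
        "num_scripts": parser.counts["script"],
        "script_chars": parser.script_chars,
        "url_length": len(url),
        # Simple dotted-quad check; assumes IPv4 literals only.
        "host_is_ip": bool(re.fullmatch(r"\d{1,3}(\.\d{1,3}){3}", host)),
    }
```

A feature dictionary like this would be computed per page without executing any of its JavaScript, and then handed to a trained classifier.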

  7. Features

  8. Reference Paper • [26] C. Seifert, I. Welch, and P. Komisarczuk. Identification of Malicious Web Pages with Static Heuristics. In Proceedings of the Australasian Telecommunication Networks and Applications Conference (ATNAC), 2008. • [16] P. Likarish, E. Jung, and I. Jo. Obfuscated Malicious Javascript Detection using Classification Techniques. In Proceedings of the Conference on Malicious and Unwanted Software (Malware), 2009. • [6] B. Feinstein and D. Peck. Caffeine Monkey: Automated Collection, Detection and Analysis of Malicious JavaScript. In Proceedings of the Black Hat Security Conference, 2007. • [17] J. Ma, L. Saul, S. Savage, and G. Voelker. Beyond Blacklists: Learning to Detect Malicious Web Sites from Suspicious URLs. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, 2009. • [25] C. Seifert, I. Welch, and P. Komisarczuk. Identification of Malicious Web Pages Through Analysis of Underlying DNS and Web Server Relationships. In Proceedings of the LCN Workshop on Network Security (WNS), 2008.

  9. Effectiveness of new features

  10. Discussion • Assumptions • First, the distribution of feature values for malicious examples differs from that of benign examples. • Second, the datasets used for model training share the same feature distribution as the real-world data evaluated with the models. • Trade-offs • False negatives vs. false positives

  11. Implementation and Setup (cont.) • Prophiler serves as a filter for our existing dynamic analysis tool, Wepawet. • URL collection: Heritrix (crawler), spam email • Terms from Twitter, Google, and Wikipedia trends • Collecting URLs: 2,000 URLs/day

  12. Implementation and Setup • The crawler fetches pages and submits them as input to Prophiler. • Server: • Ubuntu Linux x64 v9.10 • 8-core Intel Xeon processor and 8 GB of RAM • In this configuration, the system can analyze 320,000 pages/day on average. • The analysis must examine around 2 million URLs each day.

  13. Evaluation • Total web pages: 20 million

  14. Evaluation (cont.) • Training set: • 787 malicious pages from Wepawet's database • 51,171 benign pages from the Alexa top 100 websites • Google Safe Browsing API, anti-virus, experts • 10-fold cross-validation
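For readers unfamiliar with the "10-fold" bullet, the following is a minimal sketch of how k-fold cross-validation partitions a dataset, using only the standard library. The helper name `k_fold_indices` is hypothetical, and the split is a plain random one. The slides do not say whether the authors stratified the folds by class, which may matter given the 787 vs. 51,171 imbalance.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal samples round-robin so fold sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


# Training-set size taken from the slide: 787 malicious + 51,171 benign pages.
n_pages = 787 + 51_171
for fold, (train_idx, test_idx) in enumerate(k_fold_indices(n_pages, k=10)):
    # A model would be trained on train_idx and scored on test_idx here.
    print(fold, len(train_idx), len(test_idx))
```

Each page lands in the test set exactly once, so the averaged score reflects performance on data the model did not see during training.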

  15. Evaluation (cont.) • Validation • 153,115 pages • Submitting them to Wepawet took 15 days • Benign: 139,321 pages • Malicious: 13,794 pages • False positive rate: 10.4% • False negative rate: 0.54% • Saves valuable resources
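The validation figures can be turned into approximate page counts. The sketch below assumes the false positive rate is measured over benign pages and the false negative rate over malicious pages. The slide does not state the denominators, so treat the results as an estimate rather than the paper's reported numbers. The function name `filter_load` is hypothetical.

```python
def filter_load(benign, malicious, fp_rate, fn_rate):
    """Estimate how many pages a filter forwards to the back-end analyzer."""
    false_pos = benign * fp_rate       # benign pages wrongly flagged
    false_neg = malicious * fn_rate    # malicious pages wrongly discarded
    true_pos = malicious - false_neg   # malicious pages correctly flagged
    forwarded = true_pos + false_pos   # everything flagged goes to the analyzer
    total = benign + malicious
    return {
        "false_positives": round(false_pos),
        "false_negatives": round(false_neg),
        "forwarded": round(forwarded),
        "load_reduction": 1 - forwarded / total,
    }


# Figures from the validation slide.
result = filter_load(benign=139_321, malicious=13_794, fp_rate=0.104, fn_rate=0.0054)
print(result)
```

Under these assumptions, roughly 28,200 of the 153,115 pages would be forwarded, a load reduction of about 82%, while about 74 malicious pages would be missed.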

  16. Evaluation (cont.) • Large-scale evaluation • 18,939,908 pages over a 60-day run • 14.3% flagged as malicious • 85.7% reduction of load on the back-end analyzer • 1,968 malicious pages/day (confirmed by Wepawet) • False positive rate: 13.7% • False negative rate: 1%
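A quick arithmetic check shows how the large-scale figures relate to one another. This is a sketch built only from the numbers on the slide. The function name `daily_load` is hypothetical, and the reading of the percentages as fractions of all crawled pages is an assumption.

```python
def daily_load(total_pages, days, flagged_fraction, confirmed_per_day):
    """Derive per-day volumes from the large-scale evaluation totals."""
    pages_per_day = total_pages / days
    flagged_per_day = pages_per_day * flagged_fraction
    return {
        "pages_per_day": pages_per_day,
        "flagged_per_day": flagged_per_day,
        # Share of all crawled pages that Wepawet confirms as malicious.
        "confirmed_share_of_all": confirmed_per_day / pages_per_day,
        # Share of flagged pages that Wepawet confirms as malicious.
        "confirmed_share_of_flagged": confirmed_per_day / flagged_per_day,
    }


stats = daily_load(
    total_pages=18_939_908,
    days=60,
    flagged_fraction=0.143,
    confirmed_per_day=1_968,
)
print(stats)
```

The result is about 315,700 pages per day, in line with the 320,000 pages/day throughput quoted for the server, of which about 45,100 are flagged. The 1,968 confirmed pages are about 0.6% of daily traffic, which matches the gap between the 14.3% flagged and the 13.7% false positive rate.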

  17. 1,968 pages per day confirmed as malicious by Wepawet

  18. Evaluation (cont.) • Comparison • 15,000 web pages • Malicious: 5,861 pages • Benign: 9,139 pages

  19. Conclusion • We developed Prophiler, a filter that reduces the number of web pages that must be analyzed dynamically to identify malicious web pages. • We deployed our system as a front-end for Wepawet, with a very small false negative rate.
