Searching the Searchers with SearchAudit

Searching the Searchers with SearchAudit. John P. John, Fang Yu, Yinglian Xie, Martin Abadi, Arvind Krishnamurthy. University of California, Santa Cruz. USENIX Security Symposium, August 2010. A Presentation at Advanced Defense Lab. Outline: Introduction, Related Work, Architecture


Presentation Transcript


  1. Searching the Searchers with SearchAudit • John P. John, Fang Yu, Yinglian Xie, Martin Abadi, Arvind Krishnamurthy • University of California, Santa Cruz • USENIX Security Symposium, August 2010 • A Presentation at Advanced Defense Lab

  2. Outline • Introduction • Related Work • Architecture • Implementation – Stage 1 • Implementation – Stage 2 • Attack 1: Identifying Vulnerable Web Sites • Attack 2: Forum Spamming • Attack 3: Windows Live Messenger Phishing • Conclusion

  3. Introduction • SearchAudit is a framework that identifies malicious queries in massive search engine logs to uncover their relationship with potential attacks. • It uses a small set of malicious queries as a seed and generates regular expressions for detecting new malicious queries.

  4. Introduction • Two stages: • Identification: SearchAudit identifies malicious queries. • Investigation: analyze those queries and the attacks of which they are part.

  5. Introduction • Enhanced detection capability • 400 becomes 4 million. • Low false-positive rate • 2% • Ability to detect new attacks • Forum spamming • Facilitation of attack analysis • Analyze a series of phishing attacks that lasted for more than one year.

  6. Outline • Introduction • Related Work • Architecture • Implementation – Stage 1 • Implementation – Stage 2 • Attack 1: Identifying Vulnerable Web Sites • Attack 2: Forum Spamming • Attack 3: Windows Live Messenger Phishing • Conclusion

  7. Related Work • There is a significant amount of automated Web traffic on the Internet. • Other research showed that more than 3% of all search traffic may be generated by stealthy search bots. • What motivates these search bots? • Search engine competitors • Studying search quality • Click fraud for monetary gain • Spreading infection (MyDoom, Santy) • Identifying victims

  8. Related Work • Systems that use regular expression patterns • Honeycomb • Polygraph • Hamsa • AutoRE (a regular expression generator from earlier research)

  9. Outline • Introduction • Related Work • Architecture • Implementation – Stage 1 • Implementation – Stage 2 • Attack 1: Identifying Vulnerable Web Sites • Attack 2: Forum Spamming • Attack 3: Windows Live Messenger Phishing • Conclusion

  10. Architecture • Let attackers be our guides • Follow their activities and predict their future attacks.

  11. Architecture • Platform • Dryad/DryadLINQ • Query Expansion • Take a small set of seed queries and expand them • Extract the IPs that issued them and search the logs again • Regular Expression Generation • Signature Generation (AutoRE) • Eliminating Redundancies • Eliminating Proxies
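
The query expansion step on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the DryadLINQ implementation the slides refer to: the `(ip, query)` record shape, the function name `expand_queries`, and the sample log are all assumptions made for the example.

```python
from collections import defaultdict


def expand_queries(log, seed_queries):
    """Run one round of query expansion over a search log.

    `log` is an iterable of (ip, query) pairs (a hypothetical record shape).
    Returns the IPs that issued at least one seed query, together with
    every query those IPs issued.
    """
    seeds = set(seed_queries)

    # Group all queries by the IP that issued them.
    queries_by_ip = defaultdict(set)
    for ip, query in log:
        queries_by_ip[ip].add(query)

    # An IP is of interest if it issued at least one seed query.
    suspect_ips = {ip for ip, qs in queries_by_ip.items() if qs & seeds}

    # The expanded set is everything those IPs searched for.
    expanded = set()
    for ip in suspect_ips:
        expanded |= queries_by_ip[ip]
    return suspect_ips, expanded


# Hypothetical sample log, for illustration only.
sample_log = [
    ("1.1.1.1", "inurl:index.php?id="),
    ("1.1.1.1", "powered by exampleboard"),
    ("2.2.2.2", "weather"),
]
suspect_ips, expanded = expand_queries(sample_log, ["inurl:index.php?id="])
```

The expanded set is what would then be handed to the regular expression generation step; proxies still have to be filtered out first, because a shared proxy IP mixes benign and malicious queries.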

  12. Architecture – Eliminating Redundancies • Algorithm: REGEX_CONSOLIDATE

  13. Architecture – Eliminating Proxies • Most users in a geographical region have similar query patterns. • Queries from mostly legitimate users will overlap heavily with the popular queries from the same /16 IP prefix. • We label an IP as a proxy if the K most popular queries from that IP and the K most popular queries from its prefix overlap in m queries. • K = 100, m = 5
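
The proxy test above can be sketched as follows. The slide only says the two top-K lists "overlap in m queries"; reading that as "at least m" is an assumption, as are the function names and the sample data.

```python
from collections import Counter


def prefix16(ip):
    """Return the /16 prefix of a dotted IPv4 address, e.g. '10.1' for '10.1.2.3'."""
    return ".".join(ip.split(".")[:2])


def top_k(queries, k):
    """Return the set of the k most frequent queries in a list."""
    return {query for query, _ in Counter(queries).most_common(k)}


def looks_like_proxy(ip_queries, prefix_queries, k=100, m=5):
    """Label an IP as a proxy when its top-k queries share at least m
    entries with the top-k queries of its /16 prefix (assumed reading)."""
    overlap = top_k(ip_queries, k) & top_k(prefix_queries, k)
    return len(overlap) >= m


# Hypothetical data: popular queries in the prefix, one proxy-like IP, one bot-like IP.
prefix_queries = ["weather", "news", "maps", "mail", "sports", "movies"]
proxy_queries = ["weather", "news", "maps", "mail", "sports", "inurl:index.php?id="]
bot_queries = ["inurl:index.php?id=", "powered by exampleboard"]
```

With the defaults K = 100 and m = 5, `looks_like_proxy(proxy_queries, prefix_queries)` is true because five popular queries are shared, while the bot-like IP shares none and is kept for further analysis.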

  14. Outline • Introduction • Related Work • Architecture • Implementation – Stage 1 • Implementation – Stage 2 • Attack 1: Identifying Vulnerable Web Sites • Attack 2: Forum Spamming • Attack 3: Windows Live Messenger Phishing • Conclusion

  15. Data Description and System Setup • Use 3 months of search logs from the Bing search engine. • February 2009 (when it was known as Live Search) • December 2009 • January 2010 • Each month of sampled data contains around 2 billion pageviews. • The 500 seed malicious queries are obtained from the hacker Web site milw0rm.com • Processing the 1.2 TB of sampled data takes about 7 hours.

  16. Selection of Regular Expressions • Use cookies to identify the malicious queries. • Benign proxies are eliminated. • Use a threshold to pick regular expressions based on their scores.

  17. Detection Results: Effect of Query Expansion and Regular Expression Matching • Feeding the 500 malicious queries into SearchAudit, we find that 122 of them appear in the dataset. • February 2009 dataset • 174 IPs issued these queries • Feed the results back into the system • 800 unique queries from 264 IPs

  18. Detection Results

  19. Effect of Incomplete Seeds • Split the 122 seed queries into two sets • 100 queries that were first posted on milw0rm.com before 2009 • 22 queries that were first posted in 2009

  20. Looping Back Seed Queries • Use the derived regular expressions as new seeds and feed them back as input to SearchAudit.

  21. Overall Matching Statistics

  22. Verification of Malicious Queries • We lack ground-truth information about whether a query is malicious, so we verify indirectly. • Check whether the query is reported on any hacker Web site • Check whether the query's behavior matches individual-bot or botnet features • For each query q returned by SearchAudit • Issue the query “q AND (dork OR vulnerability)” to the search engine and save the results.

  23. Verification of Queries Generated by Individual Bots • Two features help us distinguish bot queries from human queries • Cookie • Most bots do not enable cookies, resulting in an empty cookie field. • For normal users who do not clear their cookies, all queries carry the old cookies. • Link clicked • Many bots do not click any link on the result page. Instead, they scrape the results off the page.
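
A minimal sketch of how the two features above could be measured for one IP. The record layout (a dict with `cookie` and `clicks` keys) and the function name are assumptions for illustration; the slides do not describe the log schema.

```python
def bot_feature_rates(records):
    """Return (fraction of queries with an empty cookie,
    fraction of queries with no clicked link) for one IP.

    Each record is a dict with a 'cookie' string and a 'clicks' count
    (a hypothetical layout, not the actual log format).
    """
    total = len(records)
    if total == 0:
        return 0.0, 0.0
    no_cookie = sum(1 for r in records if not r["cookie"])
    no_click = sum(1 for r in records if r["clicks"] == 0)
    return no_cookie / total, no_click / total


# Hypothetical records for a single IP.
records = [
    {"cookie": "", "clicks": 0},
    {"cookie": "", "clicks": 0},
    {"cookie": "", "clicks": 2},
    {"cookie": "abc123", "clicks": 1},
]
cookie_rate, click_rate = bot_feature_rates(records)
```

An IP whose queries almost all lack cookies and clicks behaves like a scraper; the cut-off for "almost all" is left open here because the slides do not state one.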

  24. Verification of Queries Generated by Individual Bots

  25. Verification of Queries Generated by Botnets • If most of the IPs that issued malicious queries exhibit similar behavior, then it is likely that all these IPs were running the same script. • User agent • Contains information about the browser and the version used • Metadata • Records certain metadata that comes with the request • Pages per query • Records the number of search result pages retrieved per query • Inter-query interval • Denotes the time between queries issued by the same IP

  26. Verification of Queries Generated by Botnets

  27. Verification of Queries Generated by Botnets

  28. Outline • Introduction • Related Work • Architecture • Implementation – Stage 1 • Implementation – Stage 2 • Attack 1: Identifying Vulnerable Web Sites • Attack 2: Forum Spamming • Attack 3: Windows Live Messenger Phishing • Conclusion

  29. Analysis of Detection Results • Large countries such as the USA, Russia, and China are responsible for almost half the IPs issuing malicious queries. • Vulnerable Web Sites • Try to exploit these Web sites by SQL injection • index.php?content=[^?=#+;&:]{1,10} • Try to find particular software with known vulnerabilities • “Powered by” • Forum spamming • “/includes/joomla.php” site:.[a-zA-Z]{2,3} • Windows Live Messenger phishing
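
The two patterns quoted on this slide are written in the slide's own shorthand, where `.` and `?` stand for literal characters. A hedged sketch of how they might be expressed with Python's `re` module, with those characters escaped and plain double quotes in place of the slide's curly quotes (the helper name `matches_any` and the sample queries are assumptions):

```python
import re

# Slide pattern: index.php?content=[^?=#+;&:]{1,10}
content_re = re.compile(r"index\.php\?content=[^?=#+;&:]{1,10}")

# Slide pattern: "/includes/joomla.php" site:.[a-zA-Z]{2,3}
joomla_re = re.compile(r'"/includes/joomla\.php" site:\.[a-zA-Z]{2,3}')


def matches_any(query):
    """Return True if the query matches either pattern from the slide."""
    return any(p.search(query) for p in (content_re, joomla_re))


# Hypothetical queries, for illustration only.
hits = [
    matches_any("inurl:index.php?content=news"),
    matches_any('"/includes/joomla.php" site:.com'),
    matches_any("cheap flights to paris"),
]
```

The first two sample queries match and the third does not, which is the kind of separation the generated expressions are meant to provide over a full log.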

  30. Analysis of Detection Results

  31. Outline • Introduction • Related Work • Architecture • Implementation – Stage 1 • Implementation – Stage 2 • Attack 1: Identifying Vulnerable Web Sites • Attack 2: Forum Spamming • Attack 3: Windows Live Messenger Phishing • Conclusion

  32. Identifying Vulnerable Web Sites • Applications of Vulnerability Searches • Sample 5,000 queries returned by SearchAudit. • For every query q we issue the query “q –dork –vulnerability”. • Obtain 80,490 URLs from 39,475 unique Web sites. • Compare this list of random Web sites against lists of known phishing or malware sites. • PhishTank • Microsoft • Test and show that many of these sites do have SQL injection vulnerabilities.

  33. Identifying Vulnerable Web Sites

  34. SQL Injection Vulnerabilities • For the malicious queries, we look at the search results and crawl each link twice. • The first time, we crawl the link as is • The second time, we append a single quote (') • If the two pages are identical, there is no obvious SQL injection vulnerability • If the second page shows any kind of SQL error, there might be an SQL injection vulnerability • Of 14,500 URLs, 1,760 (12%) may have an SQL injection vulnerability.
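
The comparison step of this two-crawl check can be sketched without any network access by working on the two page bodies directly. The error markers below are an assumed, illustrative list, and the function names and the "inconclusive" outcome are not from the slides.

```python
# Assumed substrings that suggest a database error page (illustrative only).
SQL_ERROR_MARKERS = (
    "sql syntax",
    "mysql_fetch",
    "unclosed quotation mark",
    "odbc",
)


def add_quote(url):
    """Build the second URL to crawl: the original with a single quote appended."""
    return url + "'"


def classify(original_body, quoted_body):
    """Compare the page fetched as is with the page fetched with a trailing quote."""
    if original_body == quoted_body:
        return "no obvious vulnerability"
    lowered = quoted_body.lower()
    if any(marker in lowered for marker in SQL_ERROR_MARKERS):
        return "possible SQL injection"
    # Pages differ but show no recognizable SQL error (case not covered by the slide).
    return "inconclusive"


# Hypothetical page bodies, for illustration only.
verdict = classify(
    "<html>Article 1</html>",
    "You have an error in your SQL syntax near ''' at line 1",
)
```

Treating "pages differ, no SQL error" as a separate outcome is a design choice for this sketch; the slide only distinguishes identical pages from pages that show an SQL error.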

  35. Outline • Introduction • Related Work • Architecture • Implementation – Stage 1 • Implementation – Stage 2 • Attack 1: Identifying Vulnerable Web Sites • Attack 2: Forum Spamming • Attack 3: Windows Live Messenger Phishing • Conclusion

  36. Forum-Spamming Attacks • We manually identified 46 regular expressions that are associated with forum spamming.

  38. Forum-Spamming Attacks

  39. Applications of Forum-Searching Queries • Using Project Honey Pot to identify Web spamming

  40. Outline • Introduction • Related Work • Architecture • Implementation – Stage 1 • Implementation – Stage 2 • Attack 1: Identifying Vulnerable Web Sites • Attack 2: Forum Spamming • Attack 3: Windows Live Messenger Phishing • Conclusion

  41. Windows Live MSN Phishing • What is MSN phishing? • http://[a-zA-Z0-9._]*.<domain-name>/ • http://<domain-name>?user=[a-zA-Z0-9._]*
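
The two URL shapes on this slide can be turned into concrete regular expressions once a domain is substituted for `<domain-name>`. The sketch below assumes that the bracketed character class carries a user name (suggested by the `user=` parameter in the second shape, but not stated on the slide); `example.com` and the function name are placeholders.

```python
import re


def username_patterns(domain):
    """Build the two URL patterns from the slide for one domain.

    The capture group is assumed to hold a user name; the slide does not say so.
    """
    d = re.escape(domain)
    subdomain = re.compile(r"http://([a-zA-Z0-9._]*)\." + d + r"/")
    parameter = re.compile(r"http://" + d + r"\?user=([a-zA-Z0-9._]*)")
    return subdomain, parameter


# Placeholder domain and URLs, for illustration only.
subdomain_re, parameter_re = username_patterns("example.com")
first = subdomain_re.match("http://alice.example.com/")
second = parameter_re.match("http://example.com?user=bob_99")
```

Here `first.group(1)` yields `alice` and `second.group(1)` yields `bob_99`, so both URL shapes expose the same kind of identifier.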

  42. Windows Live MSN Phishing

  43. Characteristics of Compromised Accounts

  44. Outline • Introduction • Related Work • Architecture • Implementation – Stage 1 • Implementation – Stage 2 • Attack 1: Identifying Vulnerable Web Sites • Attack 2: Forum Spamming • Attack 3: Windows Live Messenger Phishing • Conclusion

  45. Conclusion
