
Knowing Your Enemy: Understanding and Detecting Malicious Web Advertising

Presenter: Zhou Li (Indiana University), with Kehuan Zhang (Indiana University), Yinglian Xie (MSR), Fang Yu (MSR), XiaoFeng Wang (Indiana University). 10/18/2012.


Presentation Transcript


  1. Knowing Your Enemy: Understanding and Detecting Malicious Web Advertising • Presenter: Zhou Li (Indiana University), with Kehuan Zhang (Indiana University), Yinglian Xie (MSR), Fang Yu (MSR), XiaoFeng Wang (Indiana University) • 10/18/2012

  2. [Figure: slide showing five items labeled "Ad"]

  3. Landscape • Online advertising, a billion-dollar business • [Figure labels: Ad Exchange, Ad Network; source: http://www.adexchanger.com/pdf/Display-Advertising-Technology-Landscape-2010-05-03.pdf]

  4. Malvertising: the Nytimes.com Attack • Flow: visit publisher -> view ad from ad network -> redirect to scam web site • 1% of the publishers are involved in malvertising.

  5. Challenges • Code obfuscation and ad syndication • Code analysis [Cova’10] can be evaded. • Attacks are diverse • Drive-by-download, click-fraud and scam. • Sandboxing [AdSafe, Finifter’10, Louw’10] is not enough. • URLs can be manipulated • URL patterns [Zhang’11, John’11] can be evaded. • New legitimate and malicious entities appear every day.

  6. Contributions – Explore Ad Topology • Malvertising infrastructure measurement • Malvertising scale • Evasion strategies • Properties of malicious parties and relationships • Topology-based detection framework • 15x more detected domain-paths than Google Safebrowsing and Forefront combined

  7. Data Collection • From June 21st to September 30th • 12 virtual machines with an instrumented browser • Alexa top 90,000 web sites visited regularly • Extract ad redirection paths • Nodes: freeonlinegames.com, doubleclick.net/abc, adsloader.com/abc • Path: freeonlinegames.com -> doubleclick.net/abc -> adsloader.com/abc • Domain-path: freeonlinegames.com -> doubleclick.net -> adsloader.com • [Figure labels: script, referrer, Easylist]
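The step from a path to a domain-path on slide 7 can be sketched as follows. This is only an illustration, not the authors' tooling: the helper names `node_host` and `to_domain_path` are hypothetical, and the sketch assumes that the "domain" of a node is simply its host part (a real pipeline might instead reduce each host to its registrable domain).

```python
from typing import List
from urllib.parse import urlsplit


def node_host(node: str) -> str:
    """Return the host of a node such as 'doubleclick.net/abc'."""
    # The slide writes nodes without a scheme; prefixing '//' lets urlsplit
    # treat the leading part as the network location.
    if "//" not in node:
        node = "//" + node
    return urlsplit(node).hostname or ""


def to_domain_path(path: List[str]) -> List[str]:
    """Reduce every node of an ad redirection path to its host (assumption)."""
    return [node_host(node) for node in path]


# The example path from the slide.
path = ["freeonlinegames.com", "doubleclick.net/abc", "adsloader.com/abc"]
domain_path = to_domain_path(path)
print(" -> ".join(domain_path))
# prints: freeonlinegames.com -> doubleclick.net -> adsloader.com
```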

  8. Data Statistics • 24 million ad paths • 22 million nodes, >90% ad nodes • Scanned with Forefront and Google Safebrowsing • 543 malicious nodes, 263 domains • 938 malicious domain-paths • 286 infected publishers (ranked from 314 to 89,184) • Long-lived campaigns (2 months), short-lived domains (3 days)

  9. Campaign: Fake AV • Attack strategy: • Set up malicious ad network • Penetrate big ad network • Multi-layers • Rotation • Cloaking • [Figure: 65 infected publishers (highest ranked 400) -> 24 malicious ad networks (Adsloader.com) -> cloaking -> 16 redirectors (enginedelivery.com) -> 84 scam sites (eafive.com)]

  10. Properties of Malicious Pairs • Frequency: # of publishers associated with a pair • Malicious node pairs appear less frequently • Insight: • The relationships with other entities are not stable
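The frequency measure on slide 10 can be illustrated with a small sketch. It assumes that a "pair" is two adjacent nodes on a domain-path and that the publisher is the first node of the path. Neither detail is spelled out on the slide, and the function name and the `.example` domains below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Pair = Tuple[str, str]


def pair_frequency(domain_paths: List[List[str]]) -> Dict[Pair, int]:
    """Count, for each adjacent node pair, the distinct publishers it appears with."""
    publishers: Dict[Pair, Set[str]] = defaultdict(set)
    for dp in domain_paths:
        if len(dp) < 2:
            continue  # a single node forms no pair
        publisher = dp[0]  # assumption: the path starts at the publisher
        for pair in zip(dp, dp[1:]):
            publishers[pair].add(publisher)
    return {pair: len(pubs) for pair, pubs in publishers.items()}


# Hypothetical domain-paths, not data from the study.
paths = [
    ["pub-a.example", "bignet.example", "cdn.example"],
    ["pub-b.example", "bignet.example", "cdn.example"],
    ["pub-c.example", "bignet.example", "rare.example"],
]
freq = pair_frequency(paths)
print(freq[("bignet.example", "cdn.example")])   # prints: 2
print(freq[("bignet.example", "rare.example")])  # prints: 1
```

Under the slide's observation, a pair seen with few publishers (like the second one here) would be the less stable, more suspicious relationship.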

  11. Properties of Malicious Paths • Longer path length (8.11 vs. 3.59 for normal paths) • Ad syndication is the major problem (>60% of domain-paths) • The closer to bad nodes, the more suspicious • Insight: • Exploring sequences is promising • Short subsequences are usually good enough

  12. Our Ideas for Detection • Analyze ad-delivery topology • Combine node features with ad-path • Focus on short subsequences • Use statistical learning to generate detection rules • Adapt to new, ever-changing attacker strategies

  13. Detection Framework • Pipeline: input -> node annotation -> subsequence extraction (3 nodes) -> training data labeling (likely good, known bad, unknown) -> rule learning on training data (statistical learning) -> detection on testing data -> malicious node identification -> output • Node annotation features: • Frequency (High, Low) • Role (Publisher, Ad, Unknown) • Domain Registration (Short, Long) • URL (Malicious, Normal)
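The first two stages of the framework on slide 13, node annotation and 3-node subsequence extraction, might look like the sketch below. The attribute values come from the slide; the data layout, the function names, and the sample annotations are assumptions made for illustration, and the rule-learning stage is not shown.

```python
from typing import Dict, List, Sequence, Tuple

# (frequency, role, domain registration, URL) as listed on the slide.
Annotation = Tuple[str, str, str, str]


def annotate(path: Sequence[str], table: Dict[str, Annotation]) -> List[Annotation]:
    """Replace each node on a path with its annotation (lookup table is assumed)."""
    return [table[node] for node in path]


def subsequences(items: Sequence, n: int = 3) -> List[tuple]:
    """Return every run of n consecutive items, in path order."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Hypothetical annotations, not data from the study.
table = {
    "pub.example": ("High", "Publisher", "Long", "Normal"),
    "adnet.example": ("High", "Ad", "Long", "Normal"),
    "hop.example": ("Low", "Unknown", "Short", "Normal"),
    "land.example": ("Low", "Unknown", "Short", "Malicious"),
}
path = ["pub.example", "adnet.example", "hop.example", "land.example"]

annotated = annotate(path, table)
triples = subsequences(annotated)
print(len(triples))  # prints: 2
```

Each triple of annotations would then be one candidate feature for rule learning; a path shorter than three nodes yields no subsequence in this sketch.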

  14. Results • FPR (false-positive rate) = N_FP / (N_FP + N_TN) • FDR (false-detection rate) = N_FP / (N_FP + N_TP) • 15x new findings • Detection 10.5 days earlier than Safebrowsing • [Figure: results on Likely-good-Testing (June - September), Unknown-Testing (June - September), and Unknown-Testing (October), compared with Safebrowsing & Forefront]
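The two rates defined on slide 14 translate directly into code. The sketch below only restates the formulas; the counts in the usage lines are invented to show the arithmetic and are not the study's numbers, and returning 0.0 for an empty denominator is a convention chosen here.

```python
def false_positive_rate(n_fp: int, n_tn: int) -> float:
    """FPR = N_FP / (N_FP + N_TN): share of benign items that were flagged."""
    total = n_fp + n_tn
    return n_fp / total if total else 0.0


def false_detection_rate(n_fp: int, n_tp: int) -> float:
    """FDR = N_FP / (N_FP + N_TP): share of flagged items that were benign."""
    total = n_fp + n_tp
    return n_fp / total if total else 0.0


# Made-up counts: 2 false positives, 998 true negatives, 18 true positives.
fpr = false_positive_rate(2, 998)   # 2 / 1000
fdr = false_detection_rate(2, 18)   # 2 / 20
print(f"FPR = {fpr:.3%}, FDR = {fdr:.1%}")
# prints: FPR = 0.200%, FDR = 10.0%
```

The two rates share a numerator but not a denominator, which is why a detector can have a very small FPR and a noticeably larger FDR at the same time.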

  15. New Click-Fraud: Hijack User Traffic • [Figure labels: Malware, android-hk.com, counter-wordpress.com, PPC Ad Network, miva.com, getnewsearcher.com, 67.201.62.48, break.com] • Findings: • Do not require botnets • Use of dodgy search engines • Target 2nd-tier PPC ad networks • High success rate (72.5%)

  16. Conclusion • Malvertising is a big problem • 1% of top publishers are infected • Top ad networks are infiltrated (e.g., Doubleclick) • Topology is a new direction for detection • Short subsequences with node roles and features • 15x more coverage, 0.075% FPR, 5% FDR • Discover new attacks • Usage and deployment • Ad exchange service: capture malicious and fraudulent ad entities • Anti-virus: provide new malware signatures • End users: detect and stop ongoing malvertising attacks

  17. Classification and Validation • Classification • Likely scam: pops up a phishing window • Likely click-fraud: malicious subsequence before ad nodes • Likely drive-by-download: malicious subsequence after ad nodes • Validation • Scam: manually • Click-fraud: reach landing page? fail? • Drive-by-download: Safebrowsing, Forefront, Microsoft Anti-malware team
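The classification rules on slide 17 can be read as a position check of the malicious subsequence relative to the ad nodes on a path. The sketch below is one possible reading, not the authors' implementation: it assumes each node carries a role label as on slide 13, that the malicious subsequence is given as an inclusive index span, and that the pop-up check takes priority. The function name and the "unclassified" fallback are hypothetical.

```python
from typing import List, Tuple


def classify(roles: List[str], malicious_span: Tuple[int, int],
             popup_phishing: bool = False) -> str:
    """Label a detected path by where its malicious subsequence sits."""
    if popup_phishing:
        return "likely scam"
    ad_positions = [i for i, role in enumerate(roles) if role == "Ad"]
    if not ad_positions:
        return "unclassified"  # assumption: no ad node, no positional rule
    start, end = malicious_span  # inclusive indices into the path
    if end < ad_positions[0]:
        return "likely click-fraud"        # malicious nodes come before the ads
    if start > ad_positions[-1]:
        return "likely drive-by-download"  # malicious nodes come after the ads
    return "unclassified"


before = classify(["Publisher", "Unknown", "Unknown", "Ad", "Ad"], (1, 2))
after = classify(["Publisher", "Ad", "Ad", "Unknown", "Unknown"], (3, 4))
print(before)  # prints: likely click-fraud
print(after)   # prints: likely drive-by-download
```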

  18. Robustness • Evasion strategies • Modify URL patterns • Compromise old domains • It is difficult for attackers to change multiple parties simultaneously. • Faking ad-specific features is not easy and could cause discrepancies.

  19. Properties of Individual Malicious Nodes • Most malicious nodes have unknown roles • >90% of malicious nodes vs. <8% of legitimate nodes • Registered within a year, expire in a year • >70% of malicious domains vs. <20% of legitimate domains • Free domain providers like .co.cc are widely used • Follow URL patterns • /showthread.php\?t=\d{8} matches 34 domains
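The URL pattern quoted on slide 19 can be applied as an ordinary regular expression. The sketch below uses the pattern exactly as written on the slide (its unescaped dot therefore matches any character); the helper name and the `.example` URLs are made up for illustration and are not from the study's data.

```python
import re
from typing import Iterable, List
from urllib.parse import urlsplit

# Pattern copied verbatim from the slide.
THREAD_PATTERN = re.compile(r"/showthread.php\?t=\d{8}")


def matching_domains(urls: Iterable[str]) -> List[str]:
    """Return the sorted set of hosts with at least one URL matching the pattern."""
    hosts = set()
    for url in urls:
        if THREAD_PATTERN.search(url):
            host = urlsplit(url).hostname
            if host:
                hosts.add(host)
    return sorted(hosts)


# Hypothetical URLs.
urls = [
    "http://a.example/showthread.php?t=12345678",
    "http://b.example/forum/showthread.php?t=87654321",
    "http://c.example/showthread.php?t=123",  # too few digits, no match
    "http://d.example/index.html",
]
matched = matching_domains(urls)
print(matched)  # prints: ['a.example', 'b.example']
```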
