Knowing Your Enemy: Understanding and Detecting Malicious Web Advertising

Presenter: Zhou Li (Indiana University)
Kehuan Zhang (Indiana University), Yinglian Xie (MSR), Fang Yu (MSR), XiaoFeng Wang (Indiana University)
10/18/2012


Landscape
• Online advertising, a billion-dollar business
[Figure: display advertising landscape, including ad exchanges and ad networks. Source: http://www.adexchanger.com/pdf/Display-Advertising-Technology-Landscape-2010-05-03.pdf]

Malvertising - Nytimes.com Attack
[Figure: visit -> Publisher -> view ad -> Ad Network -> redirect -> Scam web site]
• 1% of the publishers are involved in malvertising.

Challenges
• Code obfuscation and ad syndication
  • Code analysis [Cova’10] can be evaded.
• Attacks are diverse
  • Drive-by-download, click-fraud and scam.
  • Sandboxing [AdSafe, Finifter’10, Louw’10] is not enough.
• Manipulated URLs
  • URL patterns [Zhang’11, John’11] can be evaded.
• New legitimate and malicious entities appear every day.

Contributions: Explore Ad Topology
• Malvertising infrastructure measurement
  • Malvertising scale
  • Evading strategies
  • Properties of malicious parties and relationships
• Topology-based detection framework
  • 15x more detected domain-paths than Google Safebrowsing and Forefront combined

Data Collection
• From June 21st to September 30th
• 12 virtual machines with an instrumented browser
• Alexa top 90,000 web sites visited regularly
• Extract ad redirection paths
Example:
  Nodes: freeonlinegames.com, doubleclick.net/abc, adsloader.com/abc
  Path: freeonlinegames.com -> doubleclick.net/abc -> adsloader.com/abc
  Domain-path: freeonlinegames.com -> doubleclick.net -> adsloader.com
[Figure labels: script, referrer, Easylist]
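The path-to-domain-path step in the example above can be sketched in a few lines. This is only an illustration, not the authors' tooling: the helper names `node_domain` and `to_domain_path` are hypothetical, the host name is used as the "domain" without any registered-domain normalization, and merging consecutive nodes on the same host is an assumption.

```python
from typing import List
from urllib.parse import urlsplit


def node_domain(node: str) -> str:
    """Return the host part of a node such as 'doubleclick.net/abc'."""
    # Nodes on the slide carry no scheme, so prefix '//' to make
    # urlsplit treat the leading part as the network location.
    if "//" not in node:
        node = "//" + node
    return urlsplit(node).hostname or ""


def to_domain_path(path: List[str]) -> List[str]:
    """Collapse a URL-level ad path into a domain-path.

    Assumption: consecutive nodes on the same host are merged into one entry.
    """
    domains: List[str] = []
    for node in path:
        domain = node_domain(node)
        if domain and (not domains or domains[-1] != domain):
            domains.append(domain)
    return domains


path = ["freeonlinegames.com", "doubleclick.net/abc", "adsloader.com/abc"]
print(" -> ".join(to_domain_path(path)))
```

A production version would likely reduce hosts to their registered domain (for example via the Public Suffix List), which this sketch does not attempt.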

Data Statistics
• 24 million ad paths
• 22 million nodes, >90% ad nodes
• Scanned with Forefront and Google Safebrowsing
  • 543 malicious nodes, 263 domains
  • 938 malicious domain-paths
  • 286 infected publishers (ranked from 314 to 89184)
• Long-lived campaigns (2 months), short-lived domains (3 days)

Campaign: Fake AV
• Attack strategy:
  • Set up malicious ad network
  • Penetrate big ad network
  • Multi-layers
  • Rotation
  • Cloaking
[Figure: 65 infected publishers (highest ranked 400) -> 24 malicious ad networks (adsloader.com) -> 16 redirectors (enginedelivery.com) -> 84 scam sites (eafive.com); cloaking is applied along the chain]

Properties of Malicious Pairs
• Frequency: number of publishers associated with a pair
• Malicious node pairs appear less frequently
• Insight:
  • The relationships with other entities are not stable
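The pair-frequency measure on this slide could be computed roughly as follows. This is a sketch under stated assumptions: the first node of each domain-path is taken to be the publisher, a "pair" is taken to be two adjacent nodes after the publisher, and the function name and sample domains are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def pair_frequency(domain_paths: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """Count, for each adjacent node pair, how many distinct publishers reach it.

    Assumptions: path[0] is the publisher, and pairs are adjacent nodes
    in the remainder of the path.
    """
    publishers_by_pair: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for path in domain_paths:
        if len(path) < 3:
            continue  # need a publisher plus at least one pair
        publisher, rest = path[0], path[1:]
        for pair in zip(rest, rest[1:]):
            publishers_by_pair[pair].add(publisher)
    return {pair: len(pubs) for pair, pubs in publishers_by_pair.items()}


# Made-up domain-paths, for illustration only.
paths = [
    ["pub1.example", "net-a.example", "net-b.example"],
    ["pub2.example", "net-a.example", "net-b.example"],
    ["pub1.example", "net-a.example", "net-c.example"],
]
for pair, count in sorted(pair_frequency(paths).items()):
    print(pair, count)
```

Counting distinct publishers rather than raw occurrences keeps a single heavily crawled site from inflating a pair's frequency.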

Properties of Malicious Paths
• Longer paths (average length 8.11 vs. 3.59 for normal paths)
• Ad syndication is the major problem (>60% of domain-paths)
• The closer a node is to bad nodes, the more suspicious it is
• Insight:
  • Exploring sequences is promising
  • Short subsequences are usually good enough

Our Ideas for Detection
• Analyze ad-delivery topology
  • Combine node features with ad-path
  • Focus on short subsequences
• Use statistical learning to generate detection rules
  • Adapt to new, ever-changing attacker strategies

Detection Framework
[Figure: Input -> Node annotation -> Subsequence extraction (3 nodes) -> Training data labeling -> Statistical learning (rule learning on training data, detection on testing data) -> Malicious node identification -> Output]
• Node annotation features:
  • Frequency (High, Low)
  • Role (Publisher, Ad, Unknown)
  • Domain Registration (Short, Long)
  • URL (Malicious, Normal)
• Training data labels: Likely good, Known bad, Unknown
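The node annotation and 3-node subsequence extraction stages might look like the sketch below. It assumes that subsequences are contiguous windows over the path, which the slide does not state explicitly; the `annotate` and `subsequences` helpers, the feature table, and the example domains are all hypothetical.

```python
from typing import Dict, List, Tuple

# (frequency, role, registration, url), using the value sets named on the slide.
Annotation = Tuple[str, str, str, str]


def annotate(path: List[str], features: Dict[str, Annotation]) -> List[Annotation]:
    """Replace each node in the path with its feature annotation."""
    return [features[node] for node in path]


def subsequences(items: List[Annotation], n: int = 3) -> List[Tuple[Annotation, ...]]:
    """Return every contiguous window of n annotated nodes (assumed contiguous)."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Made-up annotations, for illustration only.
features: Dict[str, Annotation] = {
    "pub.example": ("High", "Publisher", "Long", "Normal"),
    "adnet.example": ("High", "Ad", "Long", "Normal"),
    "redir.example": ("Low", "Unknown", "Short", "Normal"),
    "scam.example": ("Low", "Unknown", "Short", "Malicious"),
}
path = ["pub.example", "adnet.example", "redir.example", "scam.example"]

for window in subsequences(annotate(path, features)):
    print(window)
```

Each window would then become one training or testing instance for the rule learner; paths shorter than three nodes simply yield no windows.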

Results
• FPR (False-positive Rate) = N_FP / (N_FP + N_TN)
• FDR (False-detection Rate) = N_FP / (N_FP + N_TP)
• 15x new findings
• Detection 10.5 days earlier than Safebrowsing
[Figure panels: [Likely-good-Testing] June - September; [Unknown-Testing] June - September, compared with Safebrowsing & Forefront; [Unknown-Testing] October]
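The two rates defined on this slide translate directly into code. The counts used below are made up purely to exercise the formulas; they are not results from the talk.

```python
def false_positive_rate(n_fp: int, n_tn: int) -> float:
    """FPR = N_FP / (N_FP + N_TN): share of truly benign items that were flagged."""
    total = n_fp + n_tn
    if total == 0:
        raise ValueError("FPR is undefined when there are no negatives")
    return n_fp / total


def false_detection_rate(n_fp: int, n_tp: int) -> float:
    """FDR = N_FP / (N_FP + N_TP): share of flagged items that were actually benign."""
    total = n_fp + n_tp
    if total == 0:
        raise ValueError("FDR is undefined when nothing was flagged")
    return n_fp / total


# Hypothetical counts, for illustration only.
print(false_positive_rate(n_fp=2, n_tn=998))
print(false_detection_rate(n_fp=2, n_tp=18))
```

The two metrics share a numerator but answer different questions: FPR is normalized by the benign population, FDR by the set of detections, which is why a detector can have a very small FPR and a noticeably larger FDR.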

New Click-Fraud: Hijack User Traffic
[Figure labels: Malware, android-hk.com, counter-wordpress.com, PPC Ad Network, miva.com, getnewsearcher.com, 67.201.62.48, break.com]
• Findings:
  • Does not require botnets
  • Uses dodgy search engines
  • Targets 2nd-tier PPC ad networks
  • High success rate (72.5%)

Conclusion
• Malvertising is a big problem
  • 1% of top publishers are infected
  • Top ad networks are infiltrated (e.g., Doubleclick)
• Topology is a new direction for detection
  • Short subsequences with node roles and features
  • 15x more coverage, 0.075% FPR, 5% FDR
  • Discovers new attacks
• Usage and deployment
  • Ad exchange service: capture malicious and fraudulent ad entities
  • Anti-virus: provide new malware signatures
  • End users: detect and stop ongoing malvertising attacks

Classification and Validation
• Classification
  • Likely scam: pops up a phishing window
  • Likely click-fraud: malicious subsequence before the ad nodes
  • Likely drive-by-download: malicious subsequence after the ad nodes
• Validation
  • Scam: manually
  • Click-fraud: does the path reach the landing page, or fail?
  • Drive-by-download: Safebrowsing, Forefront, Microsoft Anti-malware team
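The position-based part of the classification (click-fraud versus drive-by-download) can be sketched as below. This reading is an assumption: "prior to" and "post to" are interpreted as the malicious subsequence lying entirely before the first, or entirely after the last, node with role "Ad". Scam detection relies on observing a pop-up phishing window and is not modelled. The function name and its return labels are hypothetical.

```python
from typing import List, Tuple


def classify_by_position(roles: List[str], malicious_span: Tuple[int, int]) -> str:
    """Classify a path from where its malicious subsequence sits relative to ad nodes.

    roles: role of each node in path order ("Publisher", "Ad", "Unknown").
    malicious_span: inclusive (start, end) indices of the malicious subsequence.
    """
    ad_positions = [i for i, role in enumerate(roles) if role == "Ad"]
    if not ad_positions:
        return "unclassified"
    start, end = malicious_span
    if end < ad_positions[0]:
        # Malicious nodes come first, then traffic is fed into ad nodes.
        return "likely click-fraud"
    if start > ad_positions[-1]:
        # Ad nodes come first, then the user is handed to malicious nodes.
        return "likely drive-by-download"
    return "unclassified"


print(classify_by_position(["Publisher", "Unknown", "Unknown", "Ad", "Ad"], (1, 2)))
print(classify_by_position(["Publisher", "Ad", "Ad", "Unknown", "Unknown"], (3, 4)))
```

Paths where malicious and ad nodes interleave are left unclassified here rather than guessed at.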

Robustness
• Evasion strategies
  • Modify URL patterns
  • Compromise old domains
• It is difficult for attackers to make changes across multiple parties simultaneously.
• Faking ad-specific features is not easy and could cause discrepancies.

Properties of Individual Malicious Nodes
• Most malicious nodes have unknown roles
  • >90% of malicious nodes, <8% of legitimate nodes
• Registered within a year, expire in a year
  • >70% of malicious domains, <20% of legitimate domains
  • Free domain providers like .co.cc are widely used
• Follow URL patterns
  • /showthread.php\?t=\d{8} matches 34 domains
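The URL pattern quoted on this slide can be applied with a standard regular-expression search. The pattern is copied as written on the slide (note that its `.` is unescaped, so it matches any character there); the helper name and the sample URLs are hypothetical.

```python
import re

# Pattern exactly as it appears on the slide.
SHOWTHREAD_PATTERN = re.compile(r"/showthread.php\?t=\d{8}")


def matches_showthread(url: str) -> bool:
    """Return True if the URL contains the showthread pattern anywhere."""
    return SHOWTHREAD_PATTERN.search(url) is not None


# Made-up URLs, for illustration only.
for url in (
    "forum1.example.co.cc/showthread.php?t=12345678",
    "forum2.example.com/showthread.php?t=123",
    "forum3.example.com/index.php?t=12345678",
):
    print(url, matches_showthread(url))
```

Because `search` is unanchored, a URL with more than eight digits after `t=` also matches; anchoring or adding a boundary would be needed for an exact eight-digit requirement.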
