Knowing Your Enemy: Understanding and Detecting Malicious Web Advertising
Presenter: Zhou Li (Indiana University)
Kehuan Zhang (Indiana University), Yinglian Xie (MSR), Fang Yu (MSR), XiaoFeng Wang (Indiana University)
10/18/2012
[Figure: a web page containing five ad slots, each labeled "Ad"]
Landscape
• Online advertising, a billion-dollar business
[Figure: display advertising technology landscape, showing ad exchanges and ad networks; source: http://www.adexchanger.com/pdf/Display-Advertising-Technology-Landscape-2010-05-03.pdf]
Malvertising: The Nytimes.com Attack
[Figure: the user visits a publisher, views an ad served by an ad network, and is redirected to a scam web site]
• 1% of the publishers are involved in malvertising.
Challenges
• Code obfuscation and ad syndication
  • Code analysis [Cova’10] can be evaded.
• Attacks are diverse: drive-by download, click fraud, and scams
  • Sandboxing [AdSafe, Finifter’10, Louw’10] is not enough.
• URL manipulation
  • URL patterns [Zhang’11, John’11] can be evaded.
• New legitimate and malicious entities appear every day.
Contributions: Exploring the Ad Topology
• Measurement of the malvertising infrastructure
  • Scale of malvertising
  • Evasion strategies
  • Properties of malicious parties and their relationships
• Topology-based detection framework
  • 15x more malicious domain-paths detected than Google Safe Browsing and Forefront combined
Data Collection
• From June 21st to September 30th
• 12 virtual machines with an instrumented browser
• Alexa top 90,000 web sites visited regularly
• Extract ad redirection paths
  [Figure: nodes freeonlinegames.com, doubleclick.net/abc, and adsloader.com/abc linked by script inclusion and referrer relationships; ad nodes matched against EasyList]
  • Node: freeonlinegames.com, doubleclick.net/abc, adsloader.com/abc
  • Path: freeonlinegames.com -> doubleclick.net/abc -> adsloader.com/abc
  • Domain-path: freeonlinegames.com -> doubleclick.net -> adsloader.com
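The difference between a path and a domain-path can be sketched in a few lines of Python. This is a minimal illustration, not the authors' crawler: it assumes each path is an ordered list of full URLs with the publisher first, and, as a simplifying assumption, reduces each hostname to its last two labels. That shortcut is wrong for multi-label suffixes such as .co.cc (it would merge every .co.cc site into one "domain"); a real implementation would consult the Public Suffix List.

```python
from typing import List
from urllib.parse import urlsplit


def to_domain(url: str) -> str:
    """Approximate the registered domain of a URL.

    Assumption: keep only the last two hostname labels. This mishandles
    multi-label suffixes such as .co.cc or .co.uk.
    """
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def domain_path(path: List[str]) -> List[str]:
    """Map an ad path (ordered list of URLs) to its domain-path."""
    return [to_domain(url) for url in path]


if __name__ == "__main__":
    path = [
        "http://freeonlinegames.com/",
        "http://doubleclick.net/abc",
        "http://adsloader.com/abc",
    ]
    print(" -> ".join(domain_path(path)))
```

Running it prints `freeonlinegames.com -> doubleclick.net -> adsloader.com`, the domain-path from the slide.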
Data Statistics
• 24 million ad paths
• 22 million nodes, >90% of them ad nodes
• Scanned with Forefront and Google Safe Browsing
  • 543 malicious nodes (263 domains)
  • 938 malicious domain-paths
  • 286 infected publishers (ranked from 314 to 89,184)
• Long-lived campaigns (2 months), short-lived domains (3 days)
Campaign: Fake AV
• Attack strategy:
  • Set up a malicious ad network
  • Penetrate a big ad network
  • Multiple layers
  • Rotation
  • Cloaking
[Figure: 65 infected publishers (highest ranked 400) -> 24 malicious ad networks (e.g., adsloader.com, with cloaking) -> 16 redirectors (e.g., enginedelivery.com) -> 84 scam sites (e.g., eafive.com)]
Properties of Malicious Pairs
• Frequency: the number of publishers associated with a node pair
• Malicious node pairs appear less frequently
• Insight:
  • Malicious nodes' relationships with other entities are not stable
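The frequency feature can be sketched as follows. The counting rule (distinct publishers per pair) follows the slide's definition; the rest is an assumption for illustration: a node pair is taken to mean two adjacent nodes on a path, and the first node of each path is treated as its publisher. The domain names in the example are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


def pair_frequency(paths: List[List[str]]) -> Dict[Tuple[str, str], int]:
    """For each adjacent node pair, count the distinct publishers it appears under.

    Assumptions: path[0] is the publisher; pairs are consecutive nodes.
    """
    publishers: Dict[Tuple[str, str], Set[str]] = defaultdict(set)
    for path in paths:
        publisher = path[0]
        for pair in zip(path, path[1:]):
            publishers[pair].add(publisher)
    return {pair: len(pubs) for pair, pubs in publishers.items()}


if __name__ == "__main__":
    # Hypothetical paths for illustration only.
    paths = [
        ["pub-a.com", "doubleclick.net", "adnet.com"],
        ["pub-b.com", "doubleclick.net", "adnet.com"],
        ["pub-c.com", "doubleclick.net", "adsloader.com"],
    ]
    for pair, freq in sorted(pair_frequency(paths).items()):
        print(pair, freq)
```

Here the pair (doubleclick.net, adnet.com) has frequency 2, while (doubleclick.net, adsloader.com) has frequency 1; under the slide's observation, the low-frequency pair is the more suspicious one.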
Properties of Malicious Paths
• Longer paths (length 8.11 vs. 3.59 for normal paths)
• Ad syndication is the major problem (>60% of domain-paths)
• The closer a node is to bad nodes, the more suspicious it is
• Insight:
  • Exploring node sequences is promising
  • Short subsequences are usually good enough
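One hypothetical way to turn "the closer to bad nodes, the more suspicious" into a feature is to measure, for every node on a path, the hop distance to the nearest node already known to be bad. The helper below is an illustrative sketch under that assumption, not the scoring used in the talk; the node names are made up.

```python
from typing import List, Optional, Set


def distance_to_bad(path: List[str], known_bad: Set[str]) -> List[Optional[int]]:
    """Hops from each node to the nearest known-bad node on the same path.

    Returns None for every node when the path contains no known-bad node.
    """
    bad_positions = [i for i, node in enumerate(path) if node in known_bad]
    if not bad_positions:
        return [None] * len(path)
    return [min(abs(i - j) for j in bad_positions) for i in range(len(path))]


if __name__ == "__main__":
    # Hypothetical node names, for illustration only.
    path = ["publisher.com", "adnet.com", "redirector.com", "scam.com"]
    print(distance_to_bad(path, {"redirector.com"}))  # [2, 1, 0, 1]
```

Distances like these only make sense within a single path, which is one reason short subsequences around a bad node carry most of the signal.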
Our Ideas for Detection
• Analyze the ad-delivery topology
• Combine node features with ad paths
• Focus on short subsequences
• Use statistical learning to generate detection rules
• Adapt to new, ever-changing attacker strategies
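"Short subsequences" can be read as contiguous windows of k nodes along an ad path. The generator below is a minimal sketch of that reading, with k = 3 to match the 3-node subsequences in the detection framework; returning a path shorter than k as a single subsequence is an assumption, not something the slides specify.

```python
from typing import Iterator, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subsequences(path: Sequence[T], k: int = 3) -> Iterator[Tuple[T, ...]]:
    """Yield every contiguous k-node window of a path.

    Assumption: a non-empty path shorter than k is yielded whole.
    """
    if k < 1:
        raise ValueError("k must be positive")
    if len(path) < k:
        if path:
            yield tuple(path)
        return
    for i in range(len(path) - k + 1):
        yield tuple(path[i:i + k])


if __name__ == "__main__":
    print(list(subsequences(["pub", "ad1", "ad2", "redirect", "scam"])))
```

A path of n nodes yields n - 2 windows of three nodes, so even the longer malicious paths (8.11 nodes on the previous slide) stay cheap to analyze.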
Detection Framework
[Figure: detection pipeline from input to output]
• Input, then node annotation with four features:
  • Frequency (High, Low)
  • Role (Publisher, Ad, Unknown)
  • Domain registration (Short, Long)
  • URL (Malicious, Normal)
• Subsequence extraction (3 nodes)
• Training data labeling: Likely good, Known bad, Unknown; split into training data and testing data
• Rule learning on the training data (statistical learning)
• Detection on the testing data
• Output: malicious node identification
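The pipeline can be approximated in Python to make the data flow concrete. This is a deliberately simple stand-in, not the statistical learner from the talk: each node is reduced to its four annotated features (the `Node` class and its string values are hypothetical encodings), 3-node windows of annotations act as patterns, and a pattern becomes a rule when it appears at least `min_support` times in known-bad paths with a precision of at least `min_precision` against likely-good paths. Both thresholds are arbitrary assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import List, Sequence, Set, Tuple


@dataclass(frozen=True)
class Node:
    """Hypothetical annotation of one node with the four features from the slide."""

    frequency: str     # "High" or "Low"
    role: str          # "Publisher", "Ad" or "Unknown"
    registration: str  # "Short" or "Long"
    url: str           # "Malicious" or "Normal"


Pattern = Tuple[Node, ...]


def windows(path: Sequence[Node], k: int = 3) -> List[Pattern]:
    """All contiguous k-node subsequences; paths shorter than k yield none."""
    return [tuple(path[i:i + k]) for i in range(len(path) - k + 1)]


def learn_rules(
    bad_paths: Sequence[Sequence[Node]],
    good_paths: Sequence[Sequence[Node]],
    k: int = 3,
    min_support: int = 2,
    min_precision: float = 0.9,
) -> Set[Pattern]:
    """Keep patterns seen often in known-bad paths and rarely in likely-good ones."""
    bad = Counter(w for p in bad_paths for w in windows(p, k))
    good = Counter(w for p in good_paths for w in windows(p, k))
    rules: Set[Pattern] = set()
    for pattern, n_bad in bad.items():
        precision = n_bad / (n_bad + good[pattern])  # Counter returns 0 for unseen patterns
        if n_bad >= min_support and precision >= min_precision:
            rules.add(pattern)
    return rules


def is_suspicious(path: Sequence[Node], rules: Set[Pattern], k: int = 3) -> bool:
    """Flag a path if any of its k-node subsequences matches a learned rule."""
    return any(w in rules for w in windows(path, k))


if __name__ == "__main__":
    P = Node("High", "Publisher", "Long", "Normal")
    A = Node("High", "Ad", "Long", "Normal")
    U = Node("Low", "Unknown", "Short", "Normal")
    M = Node("Low", "Unknown", "Short", "Malicious")
    rules = learn_rules(bad_paths=[[P, A, U, M], [P, A, U, M]], good_paths=[[P, A, A], [P, A, U]])
    print(rules == {(A, U, M)})
    print(is_suspicious([P, A, U, M], rules), is_suspicious([P, A, U], rules))
```

In the toy run, (P, A, U) is rejected because it also occurs in a likely-good path, while (A, U, M) becomes a rule. Because rules are expressed over annotations rather than concrete domains, a rule can in principle fire on a path whose domains were never seen in training.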
Results
• False-positive rate: $\mathrm{FPR} = N_{FP} / (N_{FP} + N_{TN})$
• False-detection rate: $\mathrm{FDR} = N_{FP} / (N_{FP} + N_{TP})$
• 15x new findings
• Detection 10.5 days earlier than Safe Browsing
[Figure: results on the Likely-good-Testing set (June - September), the Unknown-Testing set (June - September, compared with Safe Browsing & Forefront), and the Unknown-Testing set (October)]
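The two rates share a numerator but use different denominators: FPR normalizes false positives by all truly benign items, so it needs ground-truth negatives, while FDR normalizes by everything the detector flagged. A minimal sketch of both formulas follows; the counts in the example are hypothetical and are not figures from the study.

```python
def false_positive_rate(n_fp: int, n_tn: int) -> float:
    """FPR = N_FP / (N_FP + N_TN): share of truly benign items that were flagged."""
    total = n_fp + n_tn
    if total == 0:
        raise ValueError("no negatives to evaluate")
    return n_fp / total


def false_detection_rate(n_fp: int, n_tp: int) -> float:
    """FDR = N_FP / (N_FP + N_TP): share of detections that are wrong."""
    total = n_fp + n_tp
    if total == 0:
        raise ValueError("no detections to evaluate")
    return n_fp / total


if __name__ == "__main__":
    # Hypothetical counts, not figures from the study.
    print(false_positive_rate(n_fp=2, n_tn=998))   # 0.002
    print(false_detection_rate(n_fp=5, n_tp=95))   # 0.05
```

A detector can have a very low FPR yet a noticeably higher FDR when benign items vastly outnumber malicious ones, which is why both rates are reported.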
New Click-Fraud: Hijacking User Traffic
[Figure: traffic-hijacking chain involving malware, android-hk.com, counter-wordpress.com, getnewsearcher.com, 67.201.62.48, the PPC ad network miva.com, and break.com]
• Findings:
  • No botnet required
  • Use of a dodgy search engine
  • Targets 2nd-tier PPC ad networks
  • High success rate (72.5%)
Conclusion
• Malvertising is a big problem
  • 1% of top publishers are infected
  • Top ad networks are infiltrated (e.g., DoubleClick)
• Topology is a new direction for detection
  • Short subsequences with node roles and features
  • 15x more coverage, 0.075% FPR, 5% FDR
  • Discovers new attacks
• Usage and deployment
  • Ad exchange services: capture malicious and fraudulent ad entities
  • Anti-virus: provide new malware signatures
  • End users: detect and stop ongoing malvertising attacks
Classification and Validation
• Classification
  • Likely scam: pops up a phishing window
  • Likely click fraud: malicious subsequence before the ad nodes
  • Likely drive-by download: malicious subsequence after the ad nodes
• Validation
  • Scam: manual inspection
  • Click fraud: check whether the ad reaches its landing page or fails
  • Drive-by download: Safe Browsing, Forefront, Microsoft anti-malware team
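The classification heuristics can be sketched as a small function. The encoding is an assumption made for illustration: roles are given per node as "Publisher", "Ad" or "Unknown", the detected malicious subsequence is the half-open index range [start, end), and "before/after the ad nodes" is interpreted as lying entirely before the first ad node or entirely after the last one. Paths that fit none of these cases are left unclassified.

```python
from typing import Optional, Sequence


def classify(
    roles: Sequence[str], start: int, end: int, pops_up_window: bool = False
) -> Optional[str]:
    """Label a path by where its malicious subsequence [start, end) sits.

    roles[i] is the (hypothetical) role label of node i: "Publisher", "Ad" or "Unknown".
    """
    if pops_up_window:
        return "likely scam"
    ad_positions = [i for i, role in enumerate(roles) if role == "Ad"]
    if not ad_positions:
        return None
    if end <= ad_positions[0]:
        return "likely click-fraud"       # malicious nodes precede all ad nodes
    if start > ad_positions[-1]:
        return "likely drive-by-download"  # malicious nodes follow all ad nodes
    return None


if __name__ == "__main__":
    print(classify(["Publisher", "Unknown", "Unknown", "Ad", "Ad"], start=1, end=3))
    print(classify(["Publisher", "Ad", "Ad", "Unknown", "Unknown"], start=3, end=5))
```

The ordering matches the intuition from the click-fraud slide: hijacked traffic passes through malicious nodes before reaching a PPC ad network, whereas a drive-by payload is delivered after the ad has been served.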
Robustness
• Evasion strategies
  • Modify URL patterns
  • Compromise old domains
• It is difficult for attackers to make changes to multiple parties simultaneously.
• Faking ad-specific features is not easy and could cause discrepancies.
Properties of Individual Malicious Nodes
• Most malicious nodes have unknown roles
  • >90% of malicious nodes vs. <8% of legitimate nodes
• Registered within a year, expire within a year
  • >70% of malicious domains vs. <20% of legitimate domains
  • Free domain providers such as .co.cc are widely used
• Follow URL patterns
  • /showthread.php\?t=\d{8} matches 34 domains
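Checking how many domains share a URL pattern is a short exercise with Python's `re` module. The sketch below compiles the pattern from the slide (with the dot before "php" escaped, a small change from the slide's form) and collects the distinct hostnames whose URLs match. The example URLs are hypothetical.

```python
import re
from typing import Iterable, Set
from urllib.parse import urlsplit

# URL pattern from the slide; the "." before "php" is escaped so it only matches a literal dot.
SHOWTHREAD = re.compile(r"/showthread\.php\?t=\d{8}")


def matching_domains(urls: Iterable[str]) -> Set[str]:
    """Distinct hostnames whose URLs contain the pattern.

    re.search matches anywhere in the URL, so a t= value with more than
    eight digits also matches.
    """
    hosts = set()
    for url in urls:
        if SHOWTHREAD.search(url):
            host = urlsplit(url).hostname
            if host:
                hosts.add(host)
    return hosts


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    urls = [
        "http://site1.co.cc/showthread.php?t=12345678",
        "http://site2.co.cc/showthread.php?t=87654321",
        "http://site1.co.cc/showthread.php?t=11112222",
        "http://forum.example.com/showthread.php?t=123",
    ]
    print(sorted(matching_domains(urls)))
```

Counting hostnames rather than URLs mirrors the slide's "matches 34 domains": a single pattern reused across many short-lived domains is itself a useful signal.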