Detecting Fraudulent Clicks From BotNets 2.0 Adam Barth Joint work with Dan Boneh, Andrew Bortz, Collin Jackson, John Mitchell, Weidong Shao, and Elizabeth Stinson
Browser Security Model • Same-origin policy for network access • Origin is scheme://host:port • Write HTTP anywhere on the network • Easy using HTML forms • Except restricted ports, like 25 (SMTP) • Read from origin only • Can read some “library” formats from anywhere • JavaScript, CSS, Images, Applets, etc
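The read and write rules on this slide can be sketched in a few lines. This is an illustrative model only, not browser code: the helper names `origin`, `can_read`, and `can_write` are hypothetical, the restricted-port set contains only the one port the slide names, and the exception for "library" formats is not modeled.

```python
from urllib.parse import urlsplit

# Default ports for the two schemes handled in this sketch.
DEFAULT_PORTS = {"http": 80, "https": 443}

# Illustrative subset: the slide names only port 25 (SMTP).
RESTRICTED_PORTS = {25}


def origin(url):
    """Return the (scheme, host, port) triple that identifies a URL's origin."""
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    port = parts.port if parts.port is not None else DEFAULT_PORTS.get(scheme)
    return (scheme, parts.hostname, port)


def can_read(page_url, target_url):
    """Reads are allowed only when both URLs share the same origin."""
    return origin(page_url) == origin(target_url)


def can_write(target_url):
    """Writes may go anywhere except to a restricted port."""
    return origin(target_url)[2] not in RESTRICTED_PORTS


print(can_read("http://a.example/page", "http://a.example:80/data"))
print(can_read("http://a.example/page", "https://a.example/data"))
print(can_write("http://mail.example:25/"))
```

The asymmetry is the point: a page can write almost anywhere but read only from its own origin, which is why a click that requires reading a token is out of reach for a cross-origin script.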
Desired Properties of Policy • Can’t send spam • Writes to port 25 blocked • Can’t click advertisements • Need to READ a token to make a click count • Unfortunately…
DNS Rebinding Attacks • Circumvent browser network access policy • attacker.com points to attacker and target • Can read and write sockets to anywhere [Diagram labels: attacker’s server, target server, <policy-file-request/>, <allow-access-from domain="*" to-ports="*" />, rebind DNS]
An Experiment • We ran a Flash ad (gains socket access) • Paid $30 • 50,951 impressions from 44,924 unique IP addresses • 90.6% of browsers vulnerable • More if we include other rebinding attacks • $100 to hijack 100,000 IP addresses • No click required • Impressions are cheap
A Long Tail • Some impressions last for days
Using Rebinding for Click-Fraud • Enroll as a publisher with ad network A • Publish pay-per-click ads on your site • Enroll as an advertiser with ad network B • Buy pay-per-impression Flash ads • Buy bots for $0.001 each • Use 99% just to generate impressions on your site • Use 1% to generate ad clicks on $0.50-per-click ads • Multiply your money by 5, repeat
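The "multiply your money by 5" figure follows from the numbers on the slide. A minimal sketch of the arithmetic, assuming the publisher receives the full $0.50 for each click (any revenue share kept by the ad network is ignored) and that the function name and parameters are hypothetical:

```python
def click_fraud_return(bots, cost_per_bot=0.001, click_share=0.01,
                       payout_per_click=0.50):
    """Return (cost, revenue, multiplier) for one round of the scheme."""
    # Every bot is bought through a pay-per-impression Flash ad.
    cost = bots * cost_per_bot
    # Only the clicking share of bots earns a pay-per-click payout;
    # the rest just generate impressions.
    revenue = bots * click_share * payout_per_click
    return cost, revenue, revenue / cost


cost, revenue, multiplier = click_fraud_return(100_000)
print(f"cost ${cost:.2f}, revenue ${revenue:.2f}, multiplier {multiplier:.1f}x")
```

The multiplier does not depend on the number of bots: it is `click_share * payout_per_click / cost_per_bot`, so the scheme scales linearly with whatever budget is reinvested.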
Implications for Click-Fraud Defense • Simulates IP distribution exactly • Each bot an independent sample from web visitors • Black-listing IPs as bot infested meaningless • Traffic time-appropriate for IP • Human at that IP actually surfing the web right now • HTTP headers appropriate for IP • Grab real headers from request for Flash ad • Can’t get cookies, but many networks don’t use them
Distinguish Bots from Humans • Bots cannot simulate human cognition • Can’t use traditional CAPTCHAs • Too disruptive to the user experience • User has no interest in proving their humanity • Click-fraud detection a different problem • CAPTCHAs determine if this client is a human • We just need to estimate the proportion of humans
A Straw-Man Design • Humans click “Yes!” • Bots click at random • Ad network stats: • 3487 Yes clicks • 1271 No clicks • How many bots? • Expectation: 2542 • High-probability bound is an exercise for the reader
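The expectation on this slide can be reproduced directly. The sketch below assumes, as the slide does, that humans always click "Yes!" and that "at random" means a bot picks each option with probability 1/2; the function name is hypothetical.

```python
def estimate_bots(yes_clicks, no_clicks):
    """Estimate (bots, humans, bot_fraction) from the Yes/No click counts."""
    # Humans never click "No", so every "No" click comes from a bot.
    # A bot clicks "No" half the time, so expected No clicks = bots / 2.
    bots = 2 * no_clicks
    total = yes_clicks + no_clicks
    humans = total - bots
    return bots, humans, bots / total


bots, humans, fraction = estimate_bots(3487, 1271)
print(bots, humans, round(fraction, 3))
```

With the slide's counts this gives 2542 bots out of 4758 clicks, leaving 2216 clicks attributed to humans. It is a point estimate only; the high-probability bound is not computed here.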
A Real Advertisement • Where will humans click? • Bots cannot simulate • Can’t trick humans into clicking • Actually need to process ad
Image Recognition Doesn’t Help • Suppose the bot can identify the hot spots • Say by segmenting the image using vision techniques • In what ratio should the bot click? • Depends on the relative appeal of the hot spots • Requires human-level AI to get right • Any error a signal of bot proportion
Ad Network can Measure Humans • At first, run ads on trusted partners • Record distribution of human click location • Easy to record (x, y) coordinates of click on web • Cheap for ad network • Was going to run ad anyway • Expensive for attacker to influence • Must use valuable bot clicks without payout • Must be clicking everywhere all the time
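One way an ad network might compare observed click locations against the distribution recorded on trusted partners is a chi-square goodness-of-fit test. This is an assumption for illustration, not a method stated in the talk: the region names, the baseline shares, and the observed counts below are all made up, and clicks are assumed to have already been binned from (x, y) coordinates into regions of the ad.

```python
def chi_square_statistic(baseline_share, observed_counts):
    """Chi-square statistic of observed region counts against a baseline."""
    total = sum(observed_counts.values())
    stat = 0.0
    for region, share in baseline_share.items():
        expected = share * total
        observed = observed_counts.get(region, 0)
        stat += (observed - expected) ** 2 / expected
    return stat


# Hypothetical human click shares measured on trusted partners.
baseline = {"logo": 0.10, "headline": 0.50, "button": 0.30, "background": 0.10}

# Hypothetical clicks from an untrusted publisher, spread almost evenly,
# as they would be if most clicks came from bots guessing at random.
observed = {"logo": 240, "headline": 260, "button": 250, "background": 250}

# 95th percentile of the chi-square distribution with 3 degrees of freedom
# (four regions minus one).
CRITICAL_95_DF3 = 7.815

stat = chi_square_statistic(baseline, observed)
suspicious = stat > CRITICAL_95_DF3
print(f"chi-square = {stat:.1f}, suspicious = {suspicious}")
```

A large statistic says only that the traffic does not look like the trusted baseline. Turning it into an estimate of the bot proportion would need a model of how bots click, which the talk leaves open.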
A Work in Progress • Need to validate diversity in distribution • Will run real ads and measure click location • How does distribution vary by screen location of ad? • Experiment with ad design • Objective: human click location hard for bot to predict • Text ads? • Less area to click and less enticing visuals • There still might be a valuable signal in click location
Conclusions • BotNets 2.0 are coming • Cheap, large-scale, ephemeral bots in the browser • Don’t require full-machine compromise • Heuristic click-fraud detection’s days are numbered • Click location can divide humans from bots • Accurate simulation requires human cognition • Easy for ad networks to deploy • More science needed to determine effectiveness