
Fighting Fire With Fire : Crowdsourcing Security Threats and Solutions on the Social Web

This research paper discusses the growing problem of security threats on the social web, particularly social spam and Sybil attacks. The paper explores the ineffectiveness of existing countermeasures and proposes a crowdsourced Sybil detection system as a potential solution. The paper also includes a user study and analysis of the accuracy and cost-effectiveness of crowdsourced detection.


Presentation Transcript


  1. Fighting Fire With Fire: Crowdsourcing Security Threats and Solutions on the Social Web • Gang Wang (gangw@cs.ucsb.edu), Christo Wilson, Manish Mohanlal, Ben Y. Zhao • Computer Science Department, UC Santa Barbara

  2. A Little Bit About Me • 3rd Year PhD @ UCSB • Intern at MSR Redmond, 2011 • Intern at LinkedIn (Security Team), 2012 • Research Interests: • Security and Privacy • Online Social Networks • Crowdsourcing • Data-Driven Analysis and Modeling

  3. Recap: Threats on the Social Web • Social spam is a serious problem • 10% of wall posts with URLs on Facebook are spam • 70% of these are phishing • Sybils underlie many attacks on online social networks • Spam, spear phishing, malware distribution • Sybils blend completely into the social graph • Existing countermeasures are ineffective • Blacklists only catch 28% of spam • Sybil detectors from the literature do not work

  4. Sybil Accounts on Facebook • In-house estimates • Early 2012: 54 million • August 2012: 83 million • 8.7% of the user base • Fake likes • VirtualBagel: useless site, 3,000 likes in 1 week • 75% from Cairo, age 13-17 • Sybil attacks at large scale • Advertisers are fleeing Facebook

  5. Sybil Accounts on Twitter • 92% of Newt Gingrich’s followers are Sybils • Russian political protests on Twitter • 25,000 Sybils sent 440,000 tweets • 1 million Sybils controlled overall • Twitter is vital infrastructure • Sybils usurping Twitter for political ends [Figure: follower growth — 4,000 new followers/day; 100,000 new followers in 1 day]

  6. Talk Outline • Malicious crowdsourcing sites – crowdturfing [WWW’12] • Spam and Sybils generated by real people • Huge threat in China • Growing threat in the US • Crowdsourced Sybil detection [NDSS’13] • If attackers can do it, why not defenders? • User study: Can humans detect Sybils? Is this cost effective? • Design a crowdsourced Sybil detection system

  7. Outline • Intro • Crowdturfing • Crowdsourcing Overview • What is Crowdturfing • How bad is it? • Crowdturfing in the US • Crowdsourced Sybil Detection • Conclusion

  8. High-Quality Sybils and Spam • We tend to think of spam as “low quality” • What about high-quality spam and Sybils? • Open questions • What is the scope of this problem? • Generated manually or mechanically? • What are the economics? [Figure: a fake “Gang Wang” profile built from stock photographs, posting spam: “MaxGentleman is the bestest male enhancement system avalable. http://cid-ce6ec5.space.live.com/”]

  9. Black Market Crowdsourcing • Amazon’s Mechanical Turk • Admins remove spammy jobs • Black market crowdsourcing websites • Spam and fake accounts, generated by real people • Major force in China, expanding in the US and India • Crowdturfing = Crowdsourcing + Astroturfing

  10. Crowdturfing Workflow • Customers • Initiate campaigns • May be legitimate businesses • Agents • Manage campaigns and workers • Verify completed tasks • Workers • Complete tasks for money • Control Sybils on other websites [Figure: campaigns flow from customers to agents, agents assign tasks to workers, and workers send back reports]

  11. Crowdturfing in China [Figure: campaigns per month and revenue ($) on Zhubajie and Sandaha, Jan. 2008 – Jan. 2011]

  12. Spreading Spam on Weibo • Campaigns reach huge audiences • 50% of campaigns reach >100,000 users • 8% reach >1 million users • How effective are these campaigns?

  13. How Effective is Crowdturfing? • Initiate our own campaigns as a customer • 4 benign ad campaigns promoting real e-commerce sites • All clicks route through our measurement server [Chart: web display ads, CPC = $0.01] • Travel agency reported sales statistics • 2 sales/month before our campaign • 11 sales within 24 hours after our campaign • Each trip sells for $1,500!
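One way to picture the measurement setup ("all clicks route through our measurement server") is a tiny redirector that logs each click and then forwards the visitor to the promoted site. This is only a sketch, not the authors' actual infrastructure; the port, log format, campaign IDs, and URLs below are made up.

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical campaign landing pages keyed by campaign id.
TARGETS = {"travel": "http://travel-agency.example.com"}

class ClickLogger(BaseHTTPRequestHandler):
    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        cid = query.get("cid", ["unknown"])[0]
        # Append one line per click: timestamp, campaign id, client IP, user agent.
        with open("clicks.log", "a") as log:
            log.write(f"{time.time()}\t{cid}\t{self.client_address[0]}\t"
                      f"{self.headers.get('User-Agent', '')}\n")
        # Send the visitor on to the real e-commerce site.
        self.send_response(302)
        self.send_header("Location", TARGETS.get(cid, "http://example.com"))
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 8080), ClickLogger).serve_forever()
```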

  14. Crowdturfing in America • Other studies support these findings • Freelancer • 28% spam jobs • Bulk OSN accounts, likes, spam • Connections to botnet operators • Poultry Markets • $20 for 1000 followers • Ponzi scheme

  15. Takeaways • Identified a new threat: Crowdturfing • Growing exponentially in size and revenue in China • $1 million per month on just one site • Cost effective: $0.21 per click • Starting to grow in US and other countries • Mechanical Turk, Freelancer • Twitter Follower Markets • Huge problem for existing security systems • Little to no automation to detect • Turing tests fail

  16. Outline • Intro • Crowdturfing • Crowdsourced Sybil Detection • Open Questions • User Study • Accuracy Analysis • System Design • Conclusion

  17. Crowdsourcing Sybil Defense • Defenders are losing the battle against OSN Sybils • Idea: build a crowdsourced Sybil detector • Leverage human intelligence • Scalable • Open Questions • How accurate are users? • What factors affect detection accuracy? • Is crowdsourced Sybil detection cost effective?

  18. User Study • Two groups of users • Experts – CS professors, masters, and PhD students • Turkers – crowdworkers from Mechanical Turk and Zhubajie (a crowdturfing site) • Three ground-truth datasets of full user profiles • Renren – given to us by Renren Inc. • Facebook US and India – crawled • Legitimate profiles – 2 hops from our own profiles • Suspicious profiles – stock profile images • Banned suspicious profiles = Sybils

  19. Study Interface [Screenshot: testers see a screenshot of each profile (links cannot be clicked), browse it, classify it as real or fake, and explain why; navigation buttons let testers skip around and revisit profiles, and a progress bar tracks classified profiles]

  20. Experiment Overview [Table: test groups and datasets — the Facebook US and India data was crawled, the Renren data came from Renren Inc.; there are fewer experts than turkers, so each expert classifies more profiles]

  21. Individual Tester Accuracy • Experts prove that humans can be accurate – 80% of experts have >90% accuracy • Turker accuracy is not so good – turkers need extra help…

  22. Accuracy of the Crowd • Treat each classification by each tester as a vote; the majority makes the final decision • False positive rates are excellent – almost zero false positives • Experts perform okay, but turkers miss lots of Sybils • Turkers need extra help against false negatives • What can be done to improve accuracy?
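A minimal sketch of the voting rule described on this slide: each tester's classification counts as one vote and the simple majority decides. Breaking ties toward “real” is my assumption, not something stated on the slide.

```python
from collections import Counter

def majority_decision(votes):
    """votes: list of 'sybil' / 'real' labels from individual testers."""
    counts = Counter(votes)
    # Ties default to 'real' here -- an assumption, chosen to favor fewer false positives.
    return "sybil" if counts["sybil"] > counts["real"] else "real"

# Example: three of five testers flag the profile, so the crowd calls it a Sybil.
print(majority_decision(["sybil", "real", "sybil", "sybil", "real"]))  # -> sybil
```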

  23. Eliminating Inaccurate Turkers • Most workers are >40% accurate, and only a subset of workers (<50%) are removed • Dramatic improvement: false negatives drop from 60% to 10% • Getting rid of inaccurate turkers is a no-brainer
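A sketch of this filtering step, assuming each worker's accuracy is measured against ground-truth profiles and workers below a cutoff are dropped before re-running the vote. The 0.4 cutoff only echoes the slide's “most workers are >40% accurate” note and is illustrative.

```python
def accurate_workers(worker_accuracy, threshold=0.4):
    """worker_accuracy: {worker_id: fraction correct on ground-truth profiles}.
    The 0.4 cutoff mirrors the slide's note that most workers are >40% accurate."""
    return {w for w, acc in worker_accuracy.items() if acc >= threshold}

def filtered_majority(votes, trusted):
    """votes: list of (worker_id, label); trusted: set of worker ids kept above."""
    kept = [label for w, label in votes if w in trusted]
    if not kept:
        return None  # no trusted votes left for this profile
    return "sybil" if kept.count("sybil") > len(kept) / 2 else "real"

# Example: w3 falls below the threshold, so its vote is ignored.
trusted = accurate_workers({"w1": 0.92, "w2": 0.75, "w3": 0.25})
print(filtered_majority([("w1", "sybil"), ("w2", "sybil"), ("w3", "real")], trusted))  # -> sybil
```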

  24. How Many Classifications Do You Need? • Only need 4-5 classifications per profile to converge • Fewer classifications = less cost [Figure: false positive and false negative rates vs. number of classifications for the China, India, and US datasets]

  25. How to turn our results into a system? • Scalability • OSNs with millions of users • Performance • Improve turker accuracy • Reduce costs • Preserve user privacy when giving data to turkers

  26. System Architecture • Filtering layer • The social network feeds in suspicious profiles flagged by existing techniques – heuristics and user reports • Leveraging existing techniques helps the system scale • Crowdsourcing layer • Turker selection filters out inaccurate turkers from the pool of all turkers – continuous quality control to locate malicious workers • Accurate turkers classify the suspicious profiles; controversial cases are escalated to very accurate turkers, maximizing the usefulness of high-accuracy turkers • Confirmed Sybils are rejected (or handed to an OSN employee for final action)
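A rough sketch of this two-layer design, under stated assumptions: the function names are invented, the escalation rule (accurate turkers vote first, controversial profiles go to the most accurate turkers) is my reading of the slide, and the 5 and 2 vote counts and 20-50% controversial range echo the simulation slide that follows.

```python
import random

def filtering_layer(all_profiles, heuristics, user_reports):
    """Filtering layer: existing techniques (cheap heuristics, user reports)
    pick out the suspicious profiles worth sending to the crowd."""
    return [p for p in all_profiles
            if p in user_reports or any(h(p) for h in heuristics)]

def crowdsourcing_layer(profile, ask_accurate, ask_very_accurate,
                        votes=5, extra_votes=2, controversial=(0.2, 0.5)):
    """Crowdsourcing layer: accurate turkers vote first; if the fraction of
    'sybil' votes lands in the controversial range, the profile is escalated
    to the most accurate turkers before the final decision."""
    labels = [ask_accurate(profile) for _ in range(votes)]
    frac = labels.count("sybil") / len(labels)
    if controversial[0] <= frac <= controversial[1]:
        labels += [ask_very_accurate(profile) for _ in range(extra_votes)]
        frac = labels.count("sybil") / len(labels)
    return "sybil" if frac > 0.5 else "real"

# Toy usage with stand-in heuristics and turkers.
suspicious = filtering_layer(["alice", "bot1", "bot2"],
                             heuristics=[lambda p: p.startswith("bot")],
                             user_reports={"bot2"})
for p in suspicious:
    verdict = crowdsourcing_layer(p,
                                  ask_accurate=lambda _: random.choice(["sybil", "real"]),
                                  ask_very_accurate=lambda _: "sybil")
    print(p, verdict)
```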

  27. Trace-Driven Simulations • Simulate 2,000 profiles • Error rates drawn from survey data • Vary 4 parameters: classifications by accurate turkers (5), classifications by very accurate turkers (2), controversial range (20-50%), accuracy threshold (90%) • Results: average 6 classifications per profile, <1% false positives, <1% false negatives • Results++: average 8 classifications per profile, <0.1% false positives, <0.1% false negatives
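A toy Monte Carlo version of the trace-driven simulation, for intuition only: it uses a simple majority instead of the full architecture, and the per-vote error rates are placeholders rather than values drawn from the survey data as in the paper.

```python
import random

def simulate(num_profiles=2000, votes_per_profile=5,
             turker_fn_rate=0.3, turker_fp_rate=0.01, seed=0):
    """Returns (false-positive rate, false-negative rate) for a simple
    majority vote. The per-vote error rates here are placeholders; the paper
    draws each simulated turker's error rates from the user-study data."""
    rng = random.Random(seed)
    legit = sybils = fp = fn = 0
    for _ in range(num_profiles):
        is_sybil = rng.random() < 0.5
        flags = 0
        for _ in range(votes_per_profile):
            if is_sybil:
                flags += rng.random() > turker_fn_rate   # Sybil correctly flagged
            else:
                flags += rng.random() < turker_fp_rate   # legit profile wrongly flagged
        flagged = flags > votes_per_profile / 2           # simple majority
        if is_sybil:
            sybils += 1
            fn += not flagged
        else:
            legit += 1
            fp += flagged
    return fp / legit, fn / sybils

print(simulate())  # exact numbers depend on the placeholder error rates
```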

  28. Estimating Cost • Estimated cost for a real-world social network: Tuenti • 12,000 profiles to verify daily • 14 full-time employees • Annual salary 30,000 EUR (~$20 per hour) → $2,240 per day • Crowdsourced Sybil detection • 20 sec/profile, 8-hour day → 50 turkers • Facebook wage ($1 per hour) → $400 per day • Cost with malicious turkers • Estimate that 25% of turkers are malicious → 63 turkers • $1 per hour → $504 per day
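The slide's crowdsourcing figures can be reproduced with back-of-the-envelope arithmetic, assuming the ~6 classifications per profile reported on the simulation slide; the decomposition below is mine, not the authors'.

```python
# Inputs taken from this slide and the simulation slide (average of 6
# classifications per profile); the breakdown into these steps is illustrative.
PROFILES_PER_DAY     = 12_000
VOTES_PER_PROFILE    = 6        # average from the trace-driven simulation
SECONDS_PER_VOTE     = 20
HOURS_PER_TURKER_DAY = 8
WAGE_PER_HOUR        = 1.0      # USD, the quoted Facebook wage

work_hours = PROFILES_PER_DAY * VOTES_PER_PROFILE * SECONDS_PER_VOTE / 3600
turkers    = work_hours / HOURS_PER_TURKER_DAY
daily_cost = work_hours * WAGE_PER_HOUR
print(f"{work_hours:.0f} work hours -> {turkers:.0f} turkers, ${daily_cost:.0f}/day")
# 400 work hours -> 50 turkers, $400/day; padding the workforce by 25% for
# malicious turkers gives roughly 63 turkers and ~$504/day, as on the slide.
```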

  29. Takeaways • Humans can differentiate between real and fake profiles • Crowdsourced Sybil detection is feasible • Designed a crowdsourced Sybil detection system • False positives and negatives <1% • Resistant to infiltration by malicious workers • Sensitive to user privacy • Low cost • Augments existing security systems

  30. Outline • Intro • Crowdturfing • Crowdsourced Sybil Detection • Conclusion • Summary of My Work • Future Work

  31. Key Contributions • Identified novel threat: crowdturfing • End-to-end spam measurements from customers to the web • Insider knowledge of social spam • Novel defense: crowdsourced Sybil detection • User study proves feasibility of this approach • Built an accurate, scalable system • Possible deployment in real OSNs – LinkedIn and RenRen

  32. Ongoing Work • Twitter follower markets • Locate customers who purchase Twitter followers in bulk • Study the un-follow dynamics of customers • Develop systems to detect customers in the wild • Sybil detection using server-side click streams • Build click models based on clickstream logs • Extract click patterns of Sybil and normal users • Develop systems to detect Sybils

  33. Questions? Thank you!

  34. Potential Project Ideas • Malware distribution in cellular networks • Identify malware-related cellular network traffic • Coordinated malware distribution campaigns • Feature-based detection • Advertising traffic analysis on mobile apps • Characterize ad traffic • How effective are app-displayed ads at getting click-throughs? • Is malware delivered through ads?

  35. Preserving User Privacy • Showing profiles to crowdworkers raises privacy issues • Solution: reveal profile information in context [Figure: public profile information goes to general crowdsourced evaluation, while friend-only profile information is evaluated only by the profile’s friends]
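A minimal sketch of the “reveal information in context” idea, assuming a simple public vs. friend-only split of profile fields; the field names and visibility model are illustrative, not from the paper.

```python
PUBLIC_FIELDS      = {"name", "profile_photo", "public_posts"}
FRIEND_ONLY_FIELDS = {"friend_list", "wall_posts", "photos"}

def view_for_evaluator(profile, evaluator_is_friend):
    """Return only the profile fields this evaluator is allowed to see."""
    visible = PUBLIC_FIELDS | (FRIEND_ONLY_FIELDS if evaluator_is_friend else set())
    return {k: v for k, v in profile.items() if k in visible}

profile = {"name": "A. User", "profile_photo": "a.jpg",
           "public_posts": ["hello"], "wall_posts": ["private note"],
           "friend_list": ["B. Friend"]}
# A generic crowdworker sees only the public fields...
print(sorted(view_for_evaluator(profile, evaluator_is_friend=False)))
# ...while an evaluator who is already a friend also sees friend-only fields.
print(sorted(view_for_evaluator(profile, evaluator_is_friend=True)))
```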

  36. Clickstream Sybil Detection • Clickstream detection of Sybils • Absolute number of clicks • Time between clicks • Page traversal order • Challenges • Real-time • Massive scalability • Low overhead [Figure: click-transition graphs for Sybil vs. normal users over states such as Initial, Friend Invite, Message, Share, Photo, Browse Profiles, and Final, with per-transition probabilities]
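One plausible way to realize the click models mentioned here is a first-order Markov model over page categories, trained separately on Sybil and normal clickstreams and used to score new sessions by likelihood. The category names come from the figure; the scoring rule and training procedure are assumptions, not the paper's method.

```python
from collections import defaultdict
import math

def transition_probs(sessions):
    """sessions: list of click sequences, e.g. [['Initial', 'Friend Invite', 'Share'], ...]"""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: n / sum(nbrs.values()) for b, n in nbrs.items()}
            for a, nbrs in counts.items()}

def log_likelihood(seq, model, floor=1e-6):
    """Score a session under a transition model; unseen transitions get a small floor."""
    return sum(math.log(model.get(a, {}).get(b, floor)) for a, b in zip(seq, seq[1:]))

def classify(seq, sybil_model, normal_model):
    return "sybil" if log_likelihood(seq, sybil_model) > log_likelihood(seq, normal_model) else "normal"

# Toy usage: Sybil sessions hammer Friend Invite, normal sessions mostly browse.
sybil_model  = transition_probs([["Initial", "Friend Invite", "Friend Invite", "Share"]])
normal_model = transition_probs([["Initial", "Browse Profiles", "Photo", "Message"]])
print(classify(["Initial", "Friend Invite", "Share"], sybil_model, normal_model))  # -> sybil
```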

  37. Are Workers Real People? [Figure: worker activity by time of day on Zhubajie (ZBJ) and Sandaha (SDH) — activity follows human daily rhythms, with work day/evening peaks and dips around lunch, dinner, and late night/early morning]

  38. Crowdsourced Sybil Detection • How to detect crowdturfed Sybils? • They blur the line between real and fake • Difficult to detect algorithmically • Anecdotal evidence that people can spot Sybils • 75% of friend requests from Sybils are rejected • Can people distinguish real from fake profiles in general? • User studies: experts, turkers, undergrads • What features give Sybils away? • Are certain Sybils tougher than others? • Integration of human and machine intelligence

  39. Survey Fatigue • All testers speed up over time [Figure: per-profile behavior over the course of the survey for US experts and US turkers — fatigue matters for some groups but not for others]

  40. Sybil Profile Difficulty • Some Sybils are more stealthy than others • Experts catch more tough Sybils than turkers, performing well even on the really difficult profiles
