1 / 36

Crowdsourcing: Opportunity or New Threat?

Crowdsourcing: Opportunity or New Threat?. Major Area Exam: June 12 th Gang Wang Committee: Prof. Ben Y. Zhao ( Co-chair ) Prof. Heather Zheng (Co-chair ) Prof. Christopher Kruegel. Why Crowdsourcing. Software automation replaces the role of human in many areas

isabeller
Download Presentation

Crowdsourcing: Opportunity or New Threat?

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Crowdsourcing: Opportunity or New Threat? Major Area Exam: June 12th Gang Wang Committee: Prof. Ben Y. Zhao (Co-chair) Prof. Heather Zheng (Co-chair) Prof. Christopher Kruegel

  2. Why Crowdsourcing • Software automation replaces the role of human in many areas • Store and retrieve large volumes of information • Perform calculation • Human still outperform computer in many ways

  3. Searching for Jim Gray (2007) • Jim Gray, Turing Award winner • Missing with hissailboat outside San Francisco Bay, Jan 2007 • No result from searches of coastguard and private planes • Use satellite image to search for Jim Gray’s sailboat • Problem: the search cannot be automated by computer • Solution • Split the satellite image into many small images • Volunteers look for his boat in each image 100,000 tasks completed in 2 days

  4. Traffic Monitoring (2012) • Google Map traffic monitoring • Large area • Real-time update • Previous approach • Deploy sensors on the road • Expensiveequipment No Traffic Here! Traffic Jam!

  5. Newsflash: Apple also builds crowdsourced map system in iOS6 this fall User-driven traffic monitoring • 200 million Google Map users (mobile) 1 • Report location while driving • Integrate the traffic map in real-time 1. http://techcrunch.com/2011/05/25/google-maps-for-mobile-stats

  6. Crowdsourcing: a process that enlists a crowd to do micro-work to solve problems that software cannot do Solution Tasks/Problems Effective when a bigproblem can be decomposed into small tasks that are easy for individuals to solve

  7. Crowdsourcing Workflow • Requester • Submit tasks, integrate results • Platform • Manage tasks and workers • E.g. Amazon Mechanical Turk • Workers • Work on tasks, return results • Large number of human users Requester Collect Results Submit Tasks Platform Distribute Tasks Return Results Workers

  8. State of the Art • Popular crowdsourcing services • Amazon Mechanical Turk, FreeLancer • Tasks: translation, transcription, product survey, etc. • Other success stories • Wikipedia • Protein folding Folded Unfolded • However, there are problems

  9. Misuse of Crowdsourcing • Difficult to detect • High quality spam • Deceptive product reviews • Realistic fake accounts • Emerging threat to online communities “Over 40% of New Mechanical Turk Jobs Involve Spam” “In a Race to Out-Rave, 5-Star Web Reviews Go for $5” “Dairy Giant Mengniu in Smear Scandal” “Hacked Emails Reveal Russian Astroturfing Program”

  10. Crowdsourcing Research • Economics • Social Science • System/Applications • Computer Science • Business models • Market survey • Labor economics • Social, cultural, and ethical issues Fake Review Sybils HCI NLP • Security Social Spam SEO IR DB

  11. Outline • Introduction • Overview of Crowdsourcing Applications • Research Challenges • Security and Crowdsourcing

  12. Crowdsourcing Applications Categorize applications based on human intelligence • Natural language processing (NLP) • Data labeling [Snow2008] [Callison-Burch2009] • Searching results validation [Alonso2008] • Database query: CrowdDB [Franklin2011], Qurk [Marcus2011] • Image processing • Image annotation [Ahn2004] [Chen2009] • Image search [Yan2010] • Content generation/knowledge sharing • Wikipedia, Quora, Yahoo! Answers, StackOverflow • Real-time Q&A: Vizwiz [Bigham2010], Mimir [Hsieh2009] • Human sensor • Google Map traffic monitoring • Twitter earthquake report [Sakaki2010]

  13. Crowdsourcing Applications Categorize applications based on human intelligence • Natural language processing (NLP) • Data labeling[Snow2008] [Callison-Burch2009] • Searching results validation [Alonso2008] • Database query: CrowdDB [Franklin2011], Qurk [Marcus2011] • Image processing • Image annotation [Ahn2004] [Chen2009] • Image search [Yan2010] • Content generation/knowledge sharing • Wikipedia, Quora, Yahoo! Answers, StackOverflow • Real-time Q&A: Vizwiz [Bigham2010], Mimir [Hsieh2009] • Human sensor • Google Map traffic monitoring • Twitter earthquake report [Sakaki2010]

  14. Data Annotation • Natural Language Processing (NLP) problems • Evaluating machine translation quality [Callison-Burch2009] • Labeling text content (e.g. emotions) [Snow2008] • Challenges • Difficult for software automation • Experts are expensive and slow • Benefits of using crowdsourcing • Non-experts are cheap and fast • Non-expert results (processed) are as good as experts

  15. Use human intelligence to improve automation Image Search CrowdSearch [Yan2010] • Accurate image searching for mobile devices by combining • Automated image searching • Human validation of searching results via crowdsourcing Automated Image Search Crowd Validation Result Candidate Images Query Image Only 25% accuracy Accuracy > 95%

  16. Question and Answer VizWiz [Bigham2010] • Help blind people • Answer questions • Near real-time • Example • Shoppingscenario “Take a photo” “Record the question” “Right side” • Use the crowd to help people in need • Replace an “expensive” personal assistant “Which item is corn?” Server

  17. Outline • Introduction • Overview of Crowdsourcing Applications • Research Challenges • Security and Crowdsourcing

  18. Challenges in Crowdsourcing • Quality control • High diversity in worker background and expertise • Incentives • Encourage participation • Improve work quality • Task management • Perform complex/real-time tasks • Coordinate workers and requesters • Security • Spammy/cheating workers, fraud requesters • Using crowdsourcing systems for malicious attacks

  19. Quality Control • Fundamental problem: the crowd is not reliable • [Oleson2011] [Snow2008] [Callison-Burch2009] [Yan2010] [Franklin2011] • Workers make mistakes • Workers spam the system • Existing strategies • Majority voting • Pre-screening to test workers • Statistic models to clear data bias Yes Ground Truth Yes Yes Screening Test No

  20. Incentives • Basic questions: how to set the right price of the tasks? • Can you improve work quality by raising payment? • Can you attract more workers by raising payment? • Empirical study on worker incentives [Mason2010] [Hsieh2010] • High payment helps to recruit workers faster and increase participation • Money does not improve quality • Punishment/bonus based quality control • Pay the minimum $0.01 for all workers and $0.01 for bonus • Common problem for all applications

  21. Task Management Example: use bubble sort algorithm to sort pictures Example: Writing a travel book for New York City Task2 Task1 … • Crowdsource complex tasks [Kittur2011] • Partition the complex tasks • Parallel execute each work flow • Integrate results • Implement algorithms on the crowd [Little2010] • Regard the crowd as computation unit • Design/organize the tasks in a way to run algorithms • Open problems • Real-time crowdsourcing, parallel tasks execution, synchronization Task2 Task1 Brief History Attractions … … … Partition (outline) Paragraph Task3 Map (gather facts) Which one is better? Reduce (collect text)

  22. Security Challenges • Attacks inside crowdsourcing systems • Spammy workers give random/bad answers • Dishonest requesters • Using crowdsourcing system to carry out malicious campaigns • Real-user can perform all kinds of malicious tasks • Crowdsourcing makes it possible to scale • Write fake reviews • Create fake accounts (Sybils) • Generate social network spam • Solve CAPTCHA • Give biased voting • Build back links (SEO)

  23. Outline • Introduction • Overview of Crowdsourcing Applications • Research Challenges • Security and Crowdsourcing • Malicious Crowdsourcing Systems • Fake Reviews Generation by the Crowd • Detecting Sybils in Online Social Networks

  24. Threat Map Spam Fake Reviews Greyhat SEO Sybils OSNs Online Review Search Engine

  25. Dark Side of Crowdsourcing • Crowdsourcing • Large number of workers • Easy, cheap, fast • Real users can do bad jobs 5 star rating and positive review Use different IP addresses Bypass existing spam filter

  26. Measuring Malicious Crowdsourcing • Crowdsourcing malicious tasks • Generate Spam, solve CAPTCHA, create fake accounts, Greyhat SEO • Scale and economics • Zhubajie (China): malicious jobs 10K/month, with $1M/month [Wang2012b] • FreeLancer (US): malicious jobs 140K/7 years [Motoyama2011] • Emerging threat • International work force • Growing exponentially Buyers Workers Figure from [Motoyama2011]

  27. Outline • Introduction • Overview of Crowdsourcing Applications • Research Challenges • Security and Crowdsourcing • Malicious Crowdsourcing Systems • Fake Reviews Generation by the Crowd • Detecting Sybils in Online Social Networks

  28. Online Reviews: Why Important? • 80%of people will check online reviews before purchasing products/travel online.1 • Independent restaurants: a one-star increase in Yelp rating leads to a 9% increase in revenue.2 1http://www.coneinc.com/negative-reviews-online-reverse-purchase-decisions 2 Michael Luca. Reviews, reputation, and revenue: The Case of Yelp.com. Harvard Business School Working Paper, 2011

  29. Detecting Fake Reviews • Detecting review spam [Jindal 2008] • Duplicated/Near-duplicated reviews • Detecting review spammers • Classify rating/review behaviors [Lim 2010] • Detect synchronized reviews in groups [Mukherjee2012] • Deception models [Ott2011] • Content classification using trained data • Psycholinguistic deception detection Human accuracy 60% Classifier accuracy 90% Lower bound of spam reviews

  30. Challenges to Detect Fake Reviews • Current review spam detection solution is limited • Assume a few attackers control many accounts • Crowdsourcing can break these assumptions • Deception model (NLP approach) has limitations • Domain specific • Content analysis still has high false positive (10%) • Detecting fake reviews is an open problem • Low false positive • Real-time

  31. Outline • Introduction • Overview of Crowdsourcing Applications • Research Challenges • Security and Crowdsourcing • Malicious Crowdsourcing Systems • Fake Reviews Generation by the Crowd • Detecting Sybils in Online Social Networks

  32. Social Network Sybils • Sybils in Online Social Networks (OSNs) • Cheating in social games • Spreading spam/malware • [Thomas2011], [Gao2010], [Nazir2010] • Challenges to detect Sybils in the wild • Various/adaptive Sybil behavior patterns/attack strategies • Increasingly sophisticated/realistic Sybil account profiles • Automated mechanisms losing effectiveness • Use crowdsourcing for Sybil detection

  33. Crowdsourced Sybil Detection • Basic idea: build a crowdsourced Sybil detector • Resilient to changing attacker strategies • Question: can human identify Sybil profiles?(answer: user study) • Ground truth datasets of full user profiles • 200 real + 180 fake accounts (Renren, Facebook, Facebook-India) • Segmented user groups • Renren users (Chinese), Facebook (US), Facebook (Indian) • Experts (conscientious, motivated), Turkers (paid per profile, $-driven) • High level results • Experts are accurate; both experts and turkers have near-zero false positives • Quality control can improve turker accuracy ~ experts • Accurate, scalable, cost-effective

  34. Conclusion • An alternative solution to various problems • Difficult to be automated by software • Can be decomposed into small tasks • Many challenges in crowdsourcing system • Quality control against unreliable workers • Task management for complex tasks • Incentive models to reduce cost and optimize performance • Malicious crowdsourcing and related attacks • Serious threat to existing security mechanisms • Measurement study to understand the problem • Defense is still an open problem

  35. Possible Research Areas • Defend against malicious crowdsourcing systems • Attacking malicious crowdsourcing systems • Detecting crowdsourcing campaigns in real-time • Spot fake reviews/reviewers • Resilient to changing behaviors • Real-time • Scalability • Using crowdsourcing to solve security problems • Crowdsourcing to detect social Sybils

  36. Thank you! Questions?

More Related