Crowdsourcing and its applications on Scientific Research

Crowdsourcing = Crowd + Outsourcing • “soliciting solutions via open calls to large-scale communities”

Some Examples • Calls for professional help • Awards of 50,000 to 1,000,000 per task • Office work platform • Microtask platform • Over 30,000 tasks available at the same time

What Tasks are crowdsourceable?

Software Development • Reward: 25,000 USD

Data Entry • Reward: 4.4 USD/hour

Image Tagging • Reward: 0.04 USD

Trip Advice • Reward: points on Yahoo! Answers

The impact of crowdsourcing on scientific research?

Amazon Mechanical Turk • A micro-task marketplace • Task prices are usually between 0.01 and 1 USD • Easy-to-use interface

Amazon Mechanical Turk • Human Intelligence Task (HIT) • Tasks hard for computers • Developer • Prepay the money • Publish HITs • Get results • Worker • Complete the HITs • Get paid

Who are the workers?

A Survey of Mechanical Turk • Survey on 1000 Turkers (Turk workers) • Two identical surveys (Oct. 2008 and Dec. 2008) • Consistent results • Blog post: • A Computer Scientist in a Business School

Age Gender Education Annual Income

Comparison with Internet Demographics • Uses data from ComScore • In summary, Turkers are • younger • Share of 21-35 year olds: 51% vs. 22% of internet users • mainly female • 70% female vs. 50% female • lower income • 65% of Turkers earn < 60k/year vs. 45% of internet users • smaller families • 55% of Turkers have no children vs. 40% of internet users

How Much Do Turkers Earn?

Why Do Turkers Turk?

Research Applications

Dataset Collection • Dataset is important in computer science! • In multimedia analysis • Is there X in the image • Where is Y in the image • In natural language processing • What is the emotion of this sentence • And in lots of other applications

Dataset Collection • Utility Annotation • By Sorokin and Forsyth at UIUC • Image analysis • Type keyword • Select examples • Click on landmarks • Outline figures

0.01 USD/task

0.02 USD/task

0.01 USD/task

Dataset Collection • Linguistic annotations (Snow et al. 2008) • Word similarity USD 0.2 to label 30 word pairs

Dataset Collection • Linguistic annotations (Snow et al. 2008) • Affect recognition USD 0.4 to label 20 headlines (140 labels)
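The prices on the two slides above can be turned into a per-label cost. The following is a minimal sketch using only the figures quoted on the slides; treating each word pair as one label, and the helper names `cost_per_label` and `labels_per_dollar`, are assumptions made for illustration.

```python
# Rewards and label counts as quoted on the slides (Snow et al. 2008).
# Assumption: one word pair counts as one label.
tasks = {
    "word similarity": {"reward_usd": 0.2, "labels": 30},
    "affect recognition": {"reward_usd": 0.4, "labels": 140},
}


def cost_per_label(reward_usd, labels):
    """Reward paid divided by the number of labels obtained."""
    return reward_usd / labels


def labels_per_dollar(reward_usd, labels):
    """Number of labels obtained for one USD."""
    return labels / reward_usd


for name, t in tasks.items():
    print(f"{name}: {cost_per_label(**t):.4f} USD per label, "
          f"about {labels_per_dollar(**t):.0f} labels per USD")
```

Under these assumptions a single label costs well under one cent in both tasks.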

Dataset Collection • Linguistic annotations (Snow et al. 2008) • Textual entailment • If “Microsoft was established in Italy in 1985”, then “Microsoft was established in 1985”? • Word sense disambiguation • “a bass on the line” vs. “a funky bass line” • Temporal annotation • Does "ran" happen before "fell"? • “The horse ran past the barn fell”

Dataset Collection • Document relevance evaluation • Alonso et al. (2008) • User rating collection • Kittur et al. (2008) • Noun compound paraphrasing • Nakov (2008) • Name resolution • Su et al. (2007)

Data Characteristic Cost? Efficiency? Quality?

Cost and Efficiency • In image annotation • Sorokin and Forsyth, 2008

Cost and Efficiency • In linguistic annotation • Snow et al., 2008

Cheap and fast! Is it good?

Quality • Multiple non-experts can beat experts • "Three cobblers with their wits combined beat one Zhuge Liang" (Chinese proverb) • Black line: agreement among Turkers • Green line: single expert • Gold-standard result: agreement among multiple experts
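One simple way to combine several non-expert answers is a majority vote per item. The sketch below is illustrative only: the labels are made-up toy data, deliberately constructed so that the workers err on different items, and it is not the aggregation procedure used in the cited studies.

```python
from collections import Counter


def majority_vote(labels):
    """Return the most common label; ties go to the label seen first."""
    return Counter(labels).most_common(1)[0][0]


def accuracy(predicted, gold):
    """Fraction of items where the prediction matches the gold label."""
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Hypothetical toy data: 8 binary items, 3 workers, 2 errors per worker.
gold = [1, 0, 1, 1, 0, 1, 0, 0]
workers = [
    [1, 0, 1, 0, 0, 1, 0, 1],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 1, 1, 0],
]

# zip(*workers) groups the three answers given for each item.
combined = [majority_vote(answers) for answers in zip(*workers)]

for i, w in enumerate(workers, start=1):
    print(f"worker {i} accuracy: {accuracy(w, gold):.2f}")
print(f"majority-vote accuracy: {accuracy(combined, gold):.2f}")
```

The vote only helps when workers make different mistakes; if they all err on the same items, aggregation cannot recover the correct label.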

In addition to Dataset Collection

QoE Measurement • QoE (Quality of Experience) • Subjective measure of user perception • Traditional approach • User studies by MOS ratings (Bad -> Excellent) • Crowdsourcing with paired comparison • Diverse user input • Easy to understand • Interval scale scores can be calculated
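One common way to derive interval-scale scores from paired comparisons is the Bradley-Terry model, in which item i is preferred over item j with probability p_i / (p_i + p_j). The sketch below fits it with a simple iterative update; the slides do not say which model was used, so the choice of Bradley-Terry, the function name `bradley_terry`, and the win counts are all assumptions for illustration.

```python
import math


def bradley_terry(wins, iters=200):
    """Fit Bradley-Terry strengths from a win matrix.

    wins[i][j] is how often item i was preferred over item j
    (diagonal entries are 0). Assumes every item wins at least once.
    Returns log-strengths shifted to have mean zero.
    """
    n = len(wins)
    p = [1.0] * n
    for _ in range(iters):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i])
            denom = sum(
                (wins[i][j] + wins[j][i]) / (p[i] + p[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        scale = n / sum(updated)  # keep the strengths on a fixed scale
        p = [x * scale for x in updated]
    logs = [math.log(x) for x in p]
    mean = sum(logs) / n
    return [x - mean for x in logs]


# Hypothetical votes for three settings A, B, C; each pair compared 10 times.
wins = [
    [0, 7, 9],  # A preferred over B 7 times, over C 9 times
    [3, 0, 6],
    [1, 4, 0],
]
scores = bradley_terry(wins)
for name, s in zip("ABC", scores):
    print(f"{name}: {s:+.2f}")
```

Differences between the returned scores are meaningful (a larger gap means a stronger preference), which is the property an interval scale requires.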

Acoustic QoE Evaluation

Acoustic QoE Evaluation • Which one is better? • Simple pair comparison

Optical QoE Evaluation

Interactive QoE Evaluation

Acoustic QoE • VoIP Loss Rate • MP3 Compression Rate

Optical QoE • Packet loss rate • Video Codec

Iterative Task

Iterative Tasks • TurKit: tools for iterative tasks on MTurk • Imperative programming paradigm • Basic elements • Variable (a = b) • Control (if-else statement) • Loop (for, while statements) • Turns MTurk into a programming platform that integrates human brainpower

Iterative Text Improvement • A Wikipedia-like scenario • One Turker improves the text • Other Turkers vote on whether the improvement is valid

Iterative Text Improvement • Image description • Instructions for the improve-HIT • Please improve the description for this image • People will vote whether to approve your changes • Use no more than 500 characters • Instructions for the vote-HIT • Please select the better description for this image • Your vote must agree with the majority to be approved
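The improve-then-vote cycle can be sketched as an ordinary loop. This is not TurKit code: `iterative_improve`, `fake_improve`, and `fake_vote` are hypothetical stand-ins written for illustration, and the two fake functions simulate workers locally instead of posting real HITs.

```python
def iterative_improve(text, improve, vote, rounds=3, voters=3):
    """Run improve-then-vote rounds.

    improve(text) stands in for one improve-HIT and returns a candidate.
    vote(old, new) stands in for one vote-HIT and returns True if the
    voter prefers the new text. A candidate replaces the current text
    only when a strict majority of voters prefers it.
    """
    for _ in range(rounds):
        candidate = improve(text)
        approvals = sum(vote(text, candidate) for _ in range(voters))
        if approvals * 2 > voters:
            text = candidate
    return text


# Simulated workers (assumptions, not real Turker behaviour).
additions = iter([" with some coins", " and a pen", " " + "x" * 600])


def fake_improve(text):
    """Pretend a worker appends one more detail to the description."""
    return text + next(additions)


def fake_vote(old, new):
    """Pretend voters prefer a longer text that respects the 500-character limit."""
    return len(old) < len(new) <= 500


result = iterative_improve("A calculator", fake_improve, fake_vote)
print(result)
```

In this run the third candidate exceeds 500 characters, so the voters reject it and the text from the second round is kept.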

Iterative Text Improvement • Image description • A partial view of a pocket calculator together with some coins and a pen. • A view of personal items a calculator, and some gold and copper coins, and a round tip pen, these are all pocket and wallet sized item used for business, writing, calculating prices or solving math problems and purchasing items. • A close-up photograph of the following items: * A CASIO multi-function calculator * A ball point pen, uncapped * Various coins, apparently European, both copper and gold • …Various British coins; two of £1 value, three of 20p value and one of 1p value. …

Iterative Text Improvement • Image description A close-up photograph of the following items: A CASIO multi-function, solar powered scientific calculator. A blue ball point pen with a blue rubber grip and the tip extended. Six British coins; two of £1 value, three of 20p value and one of 1p value. Seems to be a theme illustration for a brochure or document cover treating finance - probably personal finance.

Iterative Text Improvement • Handwriting Recognition • Version 1 • You (?) (?) (?) (work). (?) (?) (?) work (not) (time). I (?) (?) a few grammatical mistakes. Overall your writing style is a bit too (phoney). You do (?) have good (points), but they got lost amidst the (writing). (signature)
