Human Computation Yu-Song Syu 10/11/2010
Human Computation • Human Computation – a new paradigm of applications • 'Outsources' computational processes to humans • Uses "human cycles" to solve problems that are easy for humans but difficult for computer programs • e.g., image annotation • Games With A Purpose (GWAP) • Pioneered by Dr. Luis von Ahn, CMU • Take advantage of people's desire to be entertained • Motivate people to play voluntarily • Produce useful data as a by-product
ESP – First Game With A Purpose • Player 1 and Player 2 guess labels for the same image (CAR, HAT, KID, BOY, CAR, …) • Agreement Reached: CAR • Purpose: image labeling
Tag a Tune • Helps tag a song / piece of music
Other GWAP applications http://gwap.com
Other HCOMP applications – doesn't have to be a game • Geotagging: collect geographic information • Tagging • Face recognition • CAPTCHA • OCR • Green scores • Vehicle routing
Analysis of Human Computation Systems • How do we measure performance? • How do we assign tasks/questions? • How would players behave if the situation changes?
Next… • Introduce two analytical works (on Internet GWAPs) • "Purposes": geo-tagging and image annotation • Propose a model to analyze user behaviors • Introduce a novel approach to improve system performance • Define metrics to evaluate the proposed methods under different circumstances • With simulation and real data traces
Analysis of GWAP-based Geospatial Tagging Systems IEEE CollaborateCom 2009, Washington, D.C. Ling-Jyh Chen, Yu-Song Syu, Bo-Chun Wang Academia Sinica, Taiwan Wang-Chien Lee The Pennsylvania State University
Geospatial Tagging Systems (GeoTagging) • An emerging location-based application • Helps users find various location-specific information (with tagged pictures) • e.g., "Find a good restaurant nearby" (POI searching in Garmin) • Conventional GeoTagging services have 3 major drawbacks • Two-phase operation model: take photos, go back home, then upload • Clustering at hot spots: tendency toward popular places • Lack of specialized tasks: e.g., restaurants allowing pets
GWAP-based GeoTagging services (Games With A Purpose) • Collect information through games • Asker: "Where is the Capital Hall?" • Solver: "Take a picture of the White House" • Map legend: pending unsolved tasks; Locations of Interest (LOI) • Avoid the 3 major drawbacks • Tasks are uploaded right after taking photos • Tasks are assigned by the system • Tasks can be specialized
Problems • Which task to assign? • Will the solver accept the assigned task? • How to measure the system performance?
Acceptance rate of a solver • When a solver u appears, the system decides to assign the task at LOI v • u is more likely to accept the task when… • Population(v) increases • Distance(u, v) decreases • Acceptance probability is modeled with a sigmoid function (threshold τ) • Pv[k]: probability that k users appear at v
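A minimal sketch of how such an acceptance-rate model could be computed, assuming a logistic (sigmoid) form whose score grows with Population(v), shrinks with Distance(u, v), and is shifted by a threshold τ. The function name and the parameters a, b, and tau are illustrative assumptions, not the paper's actual model.

```python
import math

def acceptance_probability(population_v, distance_uv, a=1.0, b=1.0, tau=0.0):
    """Sketch of a sigmoid acceptance model: the probability that solver u
    accepts a task at LOI v rises with Population(v) and falls with
    Distance(u, v).  Parameters a, b, tau are assumptions for illustration."""
    score = a * population_v - b * distance_uv - tau
    return 1.0 / (1.0 + math.exp(-score))

# A nearby, well-populated LOI is accepted with high probability,
# a far-away, sparsely populated one with low probability.
print(acceptance_probability(population_v=8, distance_uv=2))   # ~0.998
print(acceptance_probability(population_v=1, distance_uv=10))  # ~0.0001
```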
Evaluation Metrics (1/3) • Throughput Utility (U_throughput): to solve as many tasks as possible • Counts all solved tasks from the beginning at all locations (system throughput) • Increasing #tags by assigning easily accepted tasks makes results cluster at hot spots • Trade-off: pushing throughput hurts fairness → starvation problem
Evaluation Metrics (2/3) • Fairness Utility (U_fairness): to balance the number of solved tasks across LOIs • Based on the coefficient of variation (c.v.) of the normalized #solved tasks at all locations • Balancing means assigning tasks at unproductive LOIs, where tasks are more easily rejected • Trade-off: pushing fairness (equality of outcome) hurts throughput
Evaluation Metrics (3/3) • System Utility (U_system): to accommodate both U_throughput and U_fairness
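A rough sketch of the three utilities, assuming U_throughput is the total number of solved tasks, U_fairness is derived from the coefficient of variation (lower c.v. means fairer), and U_system combines the two. The 1/(1 + c.v.) mapping and the product used for U_system are placeholders, not the paper's definitions.

```python
import statistics

def throughput_utility(solved_per_loi):
    """U_throughput: total number of solved tasks over all LOIs."""
    return sum(solved_per_loi)

def fairness_utility(solved_per_loi):
    """U_fairness: based on the coefficient of variation (c.v.) of the
    per-LOI solved-task counts; mapping c.v. -> 1 / (1 + c.v.) so that a
    perfectly balanced system scores 1.0 (this mapping is an assumption)."""
    mean = statistics.mean(solved_per_loi)
    if mean == 0:
        return 0.0
    cv = statistics.pstdev(solved_per_loi) / mean
    return 1.0 / (1.0 + cv)

def system_utility(solved_per_loi):
    """U_system: accommodates both utilities; combining them by a simple
    product is a placeholder for the paper's actual formulation."""
    return throughput_utility(solved_per_loi) * fairness_utility(solved_per_loi)

# Example: clustered results give high throughput but poor fairness.
clustered = [40, 1, 1, 1]
balanced = [10, 11, 12, 10]
print(system_utility(clustered), system_utility(balanced))
```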
Task Assignment Strategies • Simple Assignment (SA): only assign a task at the same LOI as the solver (local task) • Random Assignment (RA): provides a baseline of system performance • Least Throughput First Assignment (LTFA): prefer the task from the LOI with the least throughput, to maximize U_fairness • Acceptance Rate First Assignment (ARFA): prefer the task with the highest acceptance rate, to maximize U_throughput • Hybrid Assignment (HA): assign the task contributing the highest System Utility (U_system)
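A sketch of how the five strategies could choose among pending tasks, assuming each candidate task is a small dict carrying its LOI, its LOI's throughput so far, and an estimated acceptance rate; these field names and the HA scoring callback are assumptions of this sketch, not the paper's implementation.

```python
import random

def simple_assignment(solver_loi, tasks):
    """SA: only consider tasks at the same LOI as the solver (local tasks)."""
    local = [t for t in tasks if t["loi"] == solver_loi]
    return random.choice(local) if local else None

def random_assignment(tasks):
    """RA: baseline -- pick any pending task uniformly at random."""
    return random.choice(tasks) if tasks else None

def least_throughput_first(tasks):
    """LTFA: prefer the task at the LOI with the least throughput so far
    (pushes U_fairness)."""
    return min(tasks, key=lambda t: t["loi_throughput"], default=None)

def acceptance_rate_first(tasks):
    """ARFA: prefer the task with the highest estimated acceptance rate
    (pushes U_throughput)."""
    return max(tasks, key=lambda t: t["acceptance_rate"], default=None)

def hybrid_assignment(tasks, utility_if_assigned):
    """HA: pick the task whose assignment yields the highest system utility;
    `utility_if_assigned` is a caller-supplied scoring function."""
    return max(tasks, key=utility_if_assigned, default=None)

# Example pending-task pool (field values are made up for illustration).
tasks = [
    {"loi": 3, "loi_throughput": 12, "acceptance_rate": 0.8},
    {"loi": 7, "loi_throughput": 1,  "acceptance_rate": 0.2},
]
print(least_throughput_first(tasks), acceptance_rate_first(tasks))
```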
Simulation – Configurations • An equal-sized grid map, size 20 × 20 • #askers : #solvers = 2 : 1 • Each run is repeated 100 times and the average performance is reported
Simulation – Assumptions • Players arrive at LOI_i at a Poisson rate λ_i • λ is unknown in real systems, so it is approximated from the current and past population at LOI_i • EMA (exponential moving average): λ̂_i(t) = α · λ̂_i(t−1) + (1 − α) · N_i(t) • α: smoothing factor (here α = 0.95) • N_i(t): current population at LOI_i at time t
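A small sketch of the EMA estimate of λ_i, assuming the common form in which the smoothing factor α weights the previous estimate and (1 − α) weights the new observation N_i(t); the paper's exact weighting may differ.

```python
def update_arrival_estimate(prev_estimate, current_population, alpha=0.95):
    """EMA estimate of the Poisson arrival rate at an LOI: alpha weights the
    previous estimate, (1 - alpha) weights the current population sample."""
    return alpha * prev_estimate + (1 - alpha) * current_population

# With alpha = 0.95 the estimate reacts slowly to a one-off burst of arrivals.
estimate = 5.0
for population in [5, 5, 20, 5, 5]:
    estimate = update_arrival_estimate(estimate, population)
print(round(estimate, 2))  # stays close to 5 despite the spike
```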
Network Scenarios • EXP: λ_i (i = 1…N) follows an exponential distribution with parameter 0.2, so E(λ) = 5 • SLAW (Self-similar Least Action Walk, INFOCOM '09): SLAW waypoint generator, used in simulations of human mobility; generates fractional Brownian motion waypoints; here it determines the population of LOIs • TPE: a real map of Taipei City; λ_i is determined by the number of bus stops at LOI_i
Throughput Performance (U_throughput) – plots for the EXP, SLAW, and TPE scenarios (annotation: equality of outcome)
Fairness Performance (U_fairness) – plots for the EXP, SLAW, and TPE scenarios (annotation: starvation problem)
Overall Performance (U_system) – plots for the EXP, SLAW, and TPE scenarios (annotation: average spent time)
Assigning multiple tasks • When a solver appears, the system assigns more than one task to the solver • The solver can choose one of them, or none • K: number of tasks that the system assigns to the solver in a round • U_system(100) plots for the EXP, SLAW, and TPE scenarios
Work in progress • Include “time” and “quality” factors in our model • Different values of “#askers/#solvers” • Consider more complex tasks • E.g., what is the fastest way to get to the airport from downtown in rush hour?
Conclusion • Studied GWAP-based GeoTagging games analytically • Proposed 3 metrics to evaluate system performance • Proposed 5 task assignment strategies • HA achieves the best system performance, but it is computation-hungry • LTFA is the most suitable in practice: comparable performance to the HA scheme with acceptable computational complexity • When assigning multiple tasks, system performance increases as K increases, but players may tire of too many tasks assigned in a round • It is better to assign multiple tasks one by one, rather than all at once, for higher system utility
Exploiting Puzzle Diversity in Puzzle Selection for ESP‐like GWAP Systems IEEE/WIC/ACM WI-IAT 2010, Toronto Yu‐Song Syu, Hsiao‐Hsuan Yu, and Ling‐Jyh Chen Institute of Information Science, Academia Sinica, Taiwan
Reminder: The ESP Game • Player 1 and Player 2 guess labels for the same image (CAR, HAT, KID, BOY, CAR, …) • Agreement Reached: CAR
Why is it important? • Some statistics (July 2008) • 200,000+ players have contributed 50+ million labels • On average, each player plays for a total of 91 minutes • The throughput is about 233 labels/player/hour (i.e., one label every ~15 seconds) • Google bought a license to create its own version of the game in 2006
To evaluate the performance of ESP-like games • To collect as many labels per puzzle as possible • i.e., quality • To solve as many puzzles as possible • i.e., throughput • Both factors are critical to the performance of the ESP game, but unfortunately they do not complement each other.
State of the Art • Chen et al. proposed the Optimal Puzzle Selection Algorithm (OPSA) to solve this scheduling problem • It determines the optimal "number of assignments per puzzle" based on an analytical model, i.e., how many times a picture should be assigned • An ESP-like game (ESP Lite) was designed to verify this approach
Problem… • OPSA neglects puzzle diversity (some puzzles are more productive, and some are hard to solve), which may result in the equality-of-outcome problem • Example: of images A and B, which can be tagged more?
Contributions (from ESP Lite) • Using realistic game traces, we identify the puzzle diversity issue in ESP-like GWAP systems • We propose the Adaptive Puzzle Selection Algorithm (APSA) to cope with puzzle diversity by promoting equality of opportunity • We propose the Weight Sum Tree (WST) to reduce the computational complexity and facilitate the implementation of APSA in real-world systems • We show that APSA is more effective than OPSA in terms of the number of agreements reached and the system gain
Adaptive Puzzle Selection Algorithm • APSA is inspired by the Additive Increase Multiplicative Decrease (AIMD) model of the Transmission Control Protocol (TCP) • APSA selects a puzzle to play based on a weight value w_k; the probability that the k-th puzzle is selected is p_k = w_k / Σ_j w_j • More productive puzzles can be selected more easily later → equality of opportunity
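A sketch of APSA's two ingredients under stated assumptions: puzzles are drawn with probability proportional to their weights (p_k = w_k / Σ_j w_j), and weights are maintained AIMD-style. The increase and decrease constants, and the exact condition that triggers each, are illustrative assumptions.

```python
import random

def select_puzzle(weights):
    """Pick puzzle k with probability w_k / sum(w).  A linear O(K) scan here;
    the Weight Sum Tree reduces this to O(log K)."""
    r = random.uniform(0.0, sum(weights))
    acc = 0.0
    for k, w in enumerate(weights):
        acc += w
        if r <= acc:
            return k
    return len(weights) - 1  # guard against floating-point round-off

def update_weight(w, agreement_reached, increase=1.0, decrease=0.5):
    """AIMD-style update inspired by TCP: additively increase the weight of a
    productive puzzle (an agreement was reached this round), multiplicatively
    decrease an unproductive one.  The constants are assumptions."""
    return w + increase if agreement_reached else w * decrease
```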
Implementation Method (1/3) • The scalability issue: the computational complexity of puzzle selection increases linearly with the number of puzzles played, i.e., O(K) • Our solution: a new data structure, the Weight Sum Tree (WST), a complete binary tree of partially weighted sums with K nodes in total • (Figure: example tree with K = 8; s_i: the i-th node in the tree; h: the height of the tree)
Implementation Method (2/3) • Three cases to maintain the WST • After the k-th puzzle is played in a game round: update w_k and its ancestors – O(log K) • After a puzzle (say, the k-th) has been removed: set w_k to 0 so it becomes a virtual puzzle – O(log K) • After adding a new puzzle (say, the k-th): set w_k to 1 and replace the first (leftmost) virtual puzzle (O(log K)) or rebuild the WST (O(K))
Implementation Method (3/3) • Determine a random number r (0 ≤ r ≤ 1) and call the function Puzzle_Selection(0, r)
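A self-contained sketch of a Weight Sum Tree supporting O(log K) updates and O(log K) weighted selection in the spirit of Puzzle_Selection(0, r). The slide's WST uses K nodes in total; this sketch uses a leaf-based segment-tree layout (about 2K array slots) with the same asymptotic bounds, so the layout, class name, and method names are implementation assumptions.

```python
import random

class WeightSumTree:
    """Sketch of a Weight Sum Tree: leaves hold per-puzzle weights, internal
    nodes hold partial sums, so updates and weighted selection are O(log K)."""

    def __init__(self, weights):
        self.k = len(weights)
        self.tree = [0.0] * (2 * self.k)
        self.tree[self.k:] = [float(w) for w in weights]   # leaves
        for i in range(self.k - 1, 0, -1):                 # internal partial sums
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]

    def update(self, puzzle, weight):
        """Set one puzzle's weight and refresh its ancestors: O(log K).
        Removing a puzzle is update(puzzle, 0.0) -- it becomes 'virtual'."""
        i = puzzle + self.k
        self.tree[i] = float(weight)
        i //= 2
        while i >= 1:
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
            i //= 2

    def select(self, r):
        """Given r in [0, 1), descend from the root: go left if the target
        mass falls inside the left subtree, otherwise subtract it and go
        right.  Returns the chosen puzzle index in O(log K)."""
        target = r * self.tree[1]
        i = 1
        while i < self.k:                  # stop once a leaf is reached
            left = 2 * i
            if target <= self.tree[left]:
                i = left
            else:
                target -= self.tree[left]
                i = left + 1
        return i - self.k

# Example with K = 8 puzzles of equal initial weight.
wst = WeightSumTree([1.0] * 8)
wst.update(3, 2.5)                  # puzzle 3 became more productive
print(wst.select(random.random()))  # index of the selected puzzle
```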
Evaluation • Trace-based simulations • Game trace collected by the ESP Lite system • One month long (from 2009/3/9 to 2009/4/9) • The OPSA scheme was used in 1,444 games comprising 6,326 game rounds; in total, 575 distinct puzzles were played and 3,418 agreements were reached • Dataset available at: http://hcomp.iis.sinica.edu.tw/dataset/
Evaluation – Puzzle Diversity • Differences exist among the puzzles, so it is important to consider puzzle diversity • It is more difficult to reach the (i+1)-th agreement than the i-th agreement • (Figure annotation: the 5th-agreement curve is flat)
System Gain Evaluation • APSA always achieves a better system gain than the OPSA scheme • The system gain could be improved further by modifying the second part of the metric (e.g., by introducing competition into the system [17]).
Summary • We identify the puzzle diversity issue in ESP-like GWAP systems • We propose the Adaptive Puzzle Selection Algorithm (APSA) to account for individual differences by promoting equality of opportunity • We design a data structure, the Weight Sum Tree (WST), to reduce the computational complexity of APSA • We evaluate the APSA scheme and show that it is more effective than OPSA in terms of the number of agreements reached and the system gain