Matchin: Eliciting User Preferences with an Online Game Severin Hacker, and Luis von Ahn Carnegie Mellon University SIGCHI 2009
Matchin • A game that asks two randomly chosen partners "which of these two images do you think your partner prefers?"
Some Findings • It is possible to extract a global "beauty" ranking over a large collection of images. • It is possible to extract each individual player's general image preferences. • Their model can determine a player's gender with high probability.
A Taxonomy of Methods • Absolute Versus Relative Judgments • Total Versus Partial Judgments • Random Access Versus Predefined Access • "I Like" Versus "Others Like" • Direct Versus Indirect
Existing Methods • Flickr Interestingness • Voting • Hot or Not
The Mechanism • Matchin is a two-player game that is played over the Internet. • Every game lasts two minutes. • One pair of images usually takes between two and five seconds. • Matchin uses a collection of 80,000 images from Flickr that were gathered in October 2007.
The Scoring Function • Matchin uses a sigmoid function for scoring games. • A constant scoring function was rejected because players could earn many points by quickly picking images at random. • An exponential scoring function was rejected because the rewards sometimes became too high.
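The slide names the sigmoid but not its constants, so the following is only an illustrative sketch: points decay smoothly with the time taken to agree, rewarding fast matches without the runaway payouts of an exponential scheme. The `max_points`, `midpoint`, and `steepness` values are assumptions, not the ones used in Matchin.

```python
import math

def match_score(seconds_taken, max_points=100.0, midpoint=3.5, steepness=1.5):
    """Sigmoid scoring sketch: fast agreements earn close to max_points,
    slow ones decay smoothly toward zero.  All constants are illustrative."""
    return max_points / (1.0 + math.exp(steepness * (seconds_taken - midpoint)))
```

Unlike a constant score, random fast clicking gains little once partners stop agreeing; unlike an exponential score, the payout is bounded by `max_points`.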
The Data • The game was launched on May 15, 2008. • Within only four months, 86,686 games had been played by 14,993 players. • There have been 3,562,856 individual decisions (clicks) on images. • An individual decision/record is stored in the form: • <id, game_id, player, better, worse, time, waiting_time>
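The record format above maps directly onto a small data structure. This is a sketch mirroring the slide's tuple; the field types are assumptions, since the paper does not specify them.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Decision:
    """One click in a Matchin game, mirroring the slide's record tuple.
    Field types are illustrative assumptions."""
    id: int
    game_id: int
    player: str
    better: int          # id of the image the player clicked
    worse: int           # id of the image shown alongside it
    time: float          # seconds taken to decide
    waiting_time: float  # seconds spent waiting for the partner
```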
Ranking Functions • Empirical Winning Rate (EWR) • ELO Rating • TrueSkill Rating
Empirical Winning Rate (EWR) • Function: EWR(i) = wins(i) / (wins(i) + losses(i)) • Two problems: • For images with few recorded comparisons (low degree), the empirical winning rate may be artificially high or low. • It does not take the quality of the competing image into account.
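Computed over the stored (better, worse) pairs, EWR is a simple ratio. A minimal sketch:

```python
from collections import defaultdict

def empirical_winning_rate(decisions):
    """EWR(i) = wins(i) / (wins(i) + losses(i)), computed from
    (better, worse) image-id pairs.  As the slide notes, images with few
    comparisons get unreliable rates, and opponent quality is ignored."""
    wins = defaultdict(int)
    total = defaultdict(int)
    for better, worse in decisions:
        wins[better] += 1
        total[better] += 1
        total[worse] += 1
    return {img: wins[img] / total[img] for img in total}
```

An image that beat a string of weak opponents gets the same rate as one that beat strong ones, which is exactly the second problem the slide points out.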
ELO Rating (1/2) • The ELO rating system was introduced for rating chess players. • Each chess player’s performance in a game is modeled as a normally distributed random variable. • The mean of that random variable should reflect the player’s true skill and is called the player’s ELO rating.
ELO Rating (2/2) • Expected score: E_A = 1 / (1 + 10^((R_B − R_A) / 400)) • ELO rating update: R′_A = R_A + K · (S_A − E_A)
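These are the standard Elo formulas; in Matchin they are applied to images rather than chess players. A sketch of one decisive game (the K-factor is an assumption):

```python
def elo_update(r_winner, r_loser, k=32.0):
    """Standard Elo update for a decisive game.  The winner's expected
    score E is computed first; both ratings then move by k * (S - E)."""
    expected_win = 1.0 / (1.0 + 10.0 ** ((r_loser - r_winner) / 400.0))
    delta = k * (1.0 - expected_win)
    return r_winner + delta, r_loser - delta
```

Note that beating an already higher-rated opponent earns more points than beating a lower-rated one, which fixes EWR's blindness to opponent quality.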
TrueSkill Rating (1/2) • Every player’s skill s is modeled as a normally distributed random variable centered around a mean μ with per-player variance σ². • A player’s particular performance in a game is then drawn from a normal distribution with mean s and a per-game variance β².
TrueSkill Rating (2/2) • Update: after each game, the winner’s and loser’s (μ, σ) are updated by Bayesian inference — means shift toward the observed outcome and variances shrink. • Conservative skill estimate: μ − 3σ
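The slide's equations were images, so the following is a sketch of the standard two-player, no-draw TrueSkill update (with a default-style β = 25/6), not necessarily the exact variant the paper uses:

```python
import math

def _pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def _cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def trueskill_update(winner, loser, beta=25.0 / 6.0):
    """Two-player TrueSkill update without draws.  `winner` and `loser`
    are (mu, sigma) pairs; both means move and both variances shrink."""
    mu_w, sig_w = winner
    mu_l, sig_l = loser
    c = math.sqrt(2.0 * beta ** 2 + sig_w ** 2 + sig_l ** 2)
    t = (mu_w - mu_l) / c
    v = _pdf(t) / _cdf(t)   # mean-shift factor
    w = v * (v + t)         # variance-shrink factor
    new_w = (mu_w + (sig_w ** 2 / c) * v,
             sig_w * math.sqrt(max(1.0 - (sig_w ** 2 / c ** 2) * w, 1e-12)))
    new_l = (mu_l - (sig_l ** 2 / c) * v,
             sig_l * math.sqrt(max(1.0 - (sig_l ** 2 / c ** 2) * w, 1e-12)))
    return new_w, new_l

def conservative_estimate(mu, sigma, k=3.0):
    """TrueSkill's conservative skill estimate mu - k*sigma (k = 3)."""
    return mu - k * sigma
```

The conservative estimate μ − 3σ ranks an image highly only once the system is confident about it, which addresses EWR's low-degree problem.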
Collaborative Filtering (1/2) • In the collaborative filtering setting, they want to learn each individual’s preferences in order to: • recommend images to each user based on his/her preferences • compare users and images with each other • They developed a new collaborative filtering algorithm they call “Relative SVD”.
Collaborative Filtering (2/2) • The user feature vectors: u_i • The image feature vectors: v_j • The amount by which user i likes image j: the dot product u_i · v_j • Data: a set D of triplets (i, j, k), each recording that user i preferred image j over image k • The error for a particular decision: • The total sum of squared errors (SSE) over all decisions in D
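The slide's loss formulas were images, so the sketch below makes an assumption: for each triplet (i, j, k), the predicted margin u_i · (v_j − v_k) is driven toward 1, and the squared shortfall is minimized by stochastic gradient descent with L2 regularization. The paper's exact loss and hyperparameters may differ.

```python
import random

def relative_svd(decisions, n_users, n_images, f=8, lr=0.05, reg=0.01, epochs=50):
    """SGD sketch of a Relative-SVD-style model over preference triplets
    (i, j, k) meaning user i preferred image j over image k."""
    rng = random.Random(0)
    u = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_users)]
    v = [[rng.gauss(0, 0.1) for _ in range(f)] for _ in range(n_images)]
    for _ in range(epochs):
        for i, j, k in decisions:
            margin = sum(u[i][d] * (v[j][d] - v[k][d]) for d in range(f))
            err = 1.0 - margin  # shortfall from the target margin of 1
            for d in range(f):
                # gradient steps (regularized) for user and both images
                du = err * (v[j][d] - v[k][d]) - reg * u[i][d]
                dvj = err * u[i][d] - reg * v[j][d]
                dvk = -err * u[i][d] - reg * v[k][d]
                u[i][d] += lr * du
                v[j][d] += lr * dvj
                v[k][d] += lr * dvk
    return u, v
```

After training, u_i · v_j scores how much user i likes image j, so recommendations and user/image comparisons fall out of the learned vectors.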
Do Humans Learn? • They compared the agreement rate of first-time players with that of other players: • first-time players: 69.0% • more experienced players: 71.8% • They also measured whether people learn within a game: • first half of the game: 67% • second half of the game: 64%
Gender Prediction • The conditional entropy: H(G | X = x) = −Σ_g Pr(G = g | X = x) log Pr(G = g | X = x) • The necessary conditional probabilities Pr(G = g | X = x) can be computed with Bayes’ rule given the class conditionals Pr(X = x | G = g). • The naïve Bayes classifier maximizes the likelihood of the data. • The total accuracy is 78.3%.
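A naïve Bayes classifier over a user's binary pair-choices can be sketched as follows. The feature encoding (choices[q] ∈ {0, 1} for each question q) and the Laplace smoothing are assumptions, not the paper's exact setup.

```python
import math
from collections import defaultdict

def train_naive_bayes(samples):
    """Naive Bayes sketch for gender prediction from binary pair-choices.
    `samples` is a list of (gender, choices) where choices[q] in {0, 1}
    records which image the user picked on question q."""
    prior = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))  # (g, q) -> choice counts
    for g, choices in samples:
        prior[g] += 1
        for q, x in enumerate(choices):
            counts[(g, q)][x] += 1
    n = len(samples)

    def predict(choices):
        best_g, best_lp = None, -math.inf
        for g in prior:
            lp = math.log(prior[g] / n)
            for q, x in enumerate(choices):
                c = counts[(g, q)]
                lp += math.log((c[x] + 1) / (c[0] + c[1] + 2))  # Laplace smoothing
            if lp > best_lp:
                best_g, best_lp = g, lp
        return best_g

    return predict
```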
Discussion (1/2) • The highest-ranked pictures • show sunsets, animals, flowers, churches, bridges, and famous tourist attractions • are neither provocative nor offensive • The worst pictures • are taken indoors and include a person • are blurry or too dark • are screenshots or pictures of documents or text
Discussion (2/2) • There are substantial differences among players in judging images, and taking those differences into account can greatly help in predicting the users’ behavior on new images. • More experienced players had about the same error rate as new players.
Conclusion • The main contribution of this paper is a new method for eliciting user preferences. • They compared several algorithms for combining these relative judgments into a total ordering and found that they can correctly predict a user’s behavior in 70% of the cases. • They describe a new algorithm, Relative SVD, for performing collaborative filtering on pairwise relative judgments. • They present a gender test that asks users to make a few relative judgments and can predict a random user’s gender in roughly 4 out of 5 cases.