Rethinking the ESP Game Stephen Robertson, Milan Vojnovic, Ingmar Weber* Microsoft Research & Yahoo! Research *This work was done while I was a visiting researcher at MSRC.
The ESP Game – Live Demo Show it live. (2min) Alternative version.
The ESP Game - Summary • Two players try to agree on a label to be added to an image • No way to communicate • Entered labels only revealed at end • Known labels are “off-limits” • ESP refers to “Extrasensory perception” • Read the other person’s mind
The ESP Game - History • Developed by Luis von Ahn and Laura Dabbish at CMU in 2004 • Goal: Improve image search • Licensed by Google in 2006 • A prime example of harvesting human intelligence for difficult tasks • Many variants (music, shapes, …)
The ESP Game – Strengths and Weaknesses • Strengths • Creative approach to a hard problem • Fun to play • Vast majority of labels are appropriate • Difficult to spam • Powerful idea: Reaching consensus with little or no communication
The ESP Game – Strengths and Weaknesses • Weaknesses • The ultimate objective is ill-defined • Finds mostly general labels • There are already millions of images for these • “Lowest common denominator” problem • Human time is used sub-optimally
A “Robot” Playing the ESP Game Video of recorded play.
The ESP Game – Labels are Predictable • Synonyms are redundant • “guy” => “man” for 81% of images • Co-occurrence reduces “new” information • “clouds” => “sky” for 68% of images • Colors are easy to agree on • “black” is 3.3% of all occurrences
How to Predict the Next Label T = {“beach”, “water”}, next label t = ??
How to Predict the Next Label Want to know: P(“blue” next label | {“beach”, “water”}) P(“car” next label | {“beach”, “water”}) P(“sky” next label | {“beach”, “water”}) P(“bcn” next label | {“beach”, “water”}) Problem of data sparsity!
How to Predict the Next Label Want to know: P(“t” next label | T) = P(T | “t” next label) · P(“t”) / P(T), by Bayes’ Theorem: P(A,B) = P(A|B) · P(B) = P(B|A) · P(A). Use conditional independence … Give a random topic to two people. Ask them to each think of 3 related terms.
Conditional Independence Give two people the topic “Spain”: p1 thinks of Madrid, sun, paella; p2 thinks of beach, soccer, flamenco. Give them “blue”: p1 thinks of sky, water, eyes; p2 thinks of azul, blau, bleu. P(“p1: sky”, “p2: azul” | “blue”) = P(“p1: sky” | “blue”) · P(“p2: azul” | “blue”). In general: P(A,B|C) = P(A|C) · P(B|C)
How to Predict the Next Label The C.I. assumption is violated in practice, but “close enough”: P({s1, s2} | “t”) · P(“t”) / P(T) = P(s1 | “t”) · P(s2 | “t”) · P(“t”) / P(T). P(s | “t”) will still be zero very often → smoothing: P_smooth(s | “t”) = (1-λ) P_ML(s | “t”) + λ P(s), where P(s) is a non-zero background probability.
How to Predict the Next Label P(“t” next label | T already present) = ∏_{s ∈ T} P(s | “t”) · P(“t”) / C, where C is a normalizing constant. λ was chosen using a “validation set”; λ = 0.85 in the experiments. The model was trained on ~13,000 tag sets. Also see: the naïve Bayes classifier (same cond. indep. assumption + Bayes’ Theorem).
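To make the model concrete, here is a minimal Python sketch of the smoothed naive-Bayes predictor described on the last few slides. The toy tag sets, variable names, and candidate ranking are illustrative assumptions; only the formulas and λ = 0.85 come from the slides.

```python
from collections import Counter, defaultdict

LAMBDA = 0.85  # smoothing weight, chosen on a validation set (slide value)

# Toy stand-in for the ~13,000 training tag sets used in the experiments.
tag_sets = [
    {"beach", "water", "sky", "blue"},
    {"beach", "water", "sand"},
    {"sky", "blue", "clouds"},
    {"car", "street"},
]

# Counts needed for the background P(t) and the conditional P(s | t).
label_count = Counter()            # n(t)
pair_count = defaultdict(Counter)  # n(s, t): s and t tag the same image
for tags in tag_sets:
    for t in tags:
        label_count[t] += 1
        for s in tags:
            if s != t:
                pair_count[t][s] += 1

total = sum(label_count.values())

def p_label(t):
    """Background probability P(t)."""
    return label_count[t] / total

def p_cond(s, t):
    """Smoothed P(s | t) = (1 - lambda) * P_ML(s | t) + lambda * P(s)."""
    n_t = sum(pair_count[t].values())
    ml = pair_count[t][s] / n_t if n_t else 0.0
    return (1 - LAMBDA) * ml + LAMBDA * p_label(s)

def p_next(t, present):
    """Unnormalized P(t next | T) = prod_{s in T} P(s | t) * P(t)."""
    score = p_label(t)
    for s in present:
        score *= p_cond(s, t)
    return score

# Rank candidate next labels for T = {"beach", "water"}.
T = {"beach", "water"}
scores = {t: p_next(t, T) for t in label_count if t not in T}
C = sum(scores.values())  # normalizing constant from the slide
for t, sc in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"P({t!r} next | T) = {sc / C:.3f}")
```

In practice the counts would come from logged games rather than a hard-coded list; the smoothing term is what keeps unseen (label, tag-set) pairs from zeroing out the whole product.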
Experimental Results: Part 1 • Games played: 205 • Images encountered: 1,335 • Images with off-limits terms (OLTs): 1,105 • Percentage with a match: all images 69%, only images with OLTs 81%, all entered tags 17% • Average number of labels entered: 4.1 per image, 26.7 per game • Agreement index: mean 2.6, median 2.0. The “robot” plays reasonably well, and it plays human-like.
Quantifying “Predictability” and “Information” So, labels are fairly predictable. But how can we quantify “predictability”?
Quantifying “Predictability” and “Information” • “sunny” vs. “cloudy” tomorrow in BCN • The roll of a cubic die • The next single letter in “barcelo*” • The next single letter in “re*” • The clicked search result for “yahoo research”
Entropy and Information • An event occurring with probability p carries -log2(p) bits of information … the number of bits required to encode it in an optimally compressed encoding • Example: compressed weather forecast: P(“sunny”) = 0.5 → 0 (1 bit), P(“cloudy”) = 0.25 → 10 (2 bits), P(“rain”) = 0.125 → 110 (3 bits), P(“thunderstorm”) = 0.125 → 111 (3 bits)
Entropy and Information • p = 1 → 0 bits of information • e.g. the cubic die showed a number in [1,6] • p ≈ 0 → many, many bits of information • e.g. the winning lottery numbers • “information” = “amount of surprise”
Entropy and Information • Expected information for p1, p2, …, pn: Σi -pi · log2(pi) = the (Shannon) entropy • You might not know the true p1, p2, …, pn, but believe they are q1, q2, …, qn. Then, w.r.t. q, you observe Σi -pi · log2(qi), which is minimized for q = p • Here q is given by the earlier model; p is then observed in the games.
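A small Python sketch of the two quantities above, using the forecast distribution from the earlier slide; the mistaken belief q is made up for illustration.

```python
import math

def entropy(p):
    """Shannon entropy: sum_i -p_i * log2(p_i), in bits."""
    return sum(-pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """Expected bits observed when events follow p but you believe q."""
    return sum(-pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.25, 0.125, 0.125]  # sunny, cloudy, rain, thunderstorm
q = [0.25, 0.25, 0.25, 0.25]   # a mistaken uniform belief

print(entropy(p))           # 1.75 bits -- matches the 1/2/3/3-bit code
print(cross_entropy(p, q))  # 2.0 bits  -- >= H(p); equal only when q = p
```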
Experimental Results: Part 2 Later labels are more predictable. For comparison: the equidistribution gives 12.3 bits, the “static” distribution 9.3 bits. Humans think harder and harder.
Improving the ESP Game • Score points according to -log2(p): the number of bits of information added to the system (see the sketch below) • Have an activation time limit for “obvious” labels: removes the immediate satisfaction of simple matches • Hide the off-limits terms: players have to be more careful to avoid “obvious” labels • Try to match “experts”: use previous tags or meta information • Educate players: use previously labeled images to unlearn behavior • Automatically expand the off-limits list: easy, but 10+ terms are not practical
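As a sketch of the first proposal only: scoring a match by its surprisal under the predictor. The probability p is assumed to come from a model like the earlier sketch; the cap is an illustrative guard, not something from the slides.

```python
import math

def information_score(p, cap=20.0):
    """Points for a matched label with model probability p: -log2(p) bits."""
    if p <= 0:
        return cap  # a label the model has never seen: maximum surprise
    return min(-math.log2(p), cap)

print(information_score(0.25))   # common label, e.g. "sky": 2.0 points
print(information_score(0.001))  # rare, specific label: ~10 points
```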
Questions Thank you! ingmar@yahoo-inc.com