Wisdom of Crowds and Rank Aggregation. Mark Steyvers, Department of Cognitive Sciences, University of California, Irvine. Joint work with: Brent Miller, Pernille Hemmer, Mike Yi, Michael Lee.
Wisdom of crowds phenomenon • Aggregating over individuals in a group often leads to an estimate that is better than any of the individual estimates
Examples of wisdom of crowds phenomenon • Galton’s Ox (1907): the median of individual weight estimates came close to the true answer • Prediction markets
Our research: ranking problems. What is the correct chronological order? [Figure: Abraham Lincoln, Ulysses S. Grant, Rutherford B. Hayes, James Garfield, and Andrew Johnson, to be placed along a time axis]
Aggregating ranking data [Figure: individual orderings (A D B C; D A B C; B A D C; A C B D; A B D C) are passed to an aggregation algorithm; the resulting group answer (A B C D) is compared with the ground truth (A B C D)]
Task constraints • No communication between individuals • There is always a true answer (ground truth) • Unsupervised algorithms • no feedback is available • ground truth is used only for evaluation
Unsupervised models for ranking data • Classic models: • Thurstone (1927) • Mallows (1957); Fligner and Verducci (1986) • Diaconis (1989) • Voting methods: e.g. Borda count (1770) • Machine learning applications • Information retrieval and meta-search • e.g. Klementiev, Roth et al. (2008; 2009); Lebanon & Mao (2008); Dwork et al. (2001) • Multi-object tracking • e.g. Huang, Guestrin, Guibas (2009); Kondor, Howard, Jebara (2007) • Many models were developed for preference rankings and voting situations, where there is no known ground truth
Unsupervised Approach [Figure: a generative model links a latent ground truth (? ? ? ?) to the observed individual orderings (A D B C; D A B C; B A D C; A C B D; A B D C)] Incorporate individual differences
Overview of talk • Reconstruct the order of US presidents • Effect of group size and expertise • Reconstruct the order of events • Traveling Salesman Problem
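Measuring performance Kendall’s Tau: the number of adjacent pairwise swaps needed to turn an individual's ordering into the true order. Example: ordering by individual A B E C D, true order A B C D E. Swapping E and C gives A B C E D (1 swap); swapping E and D then gives A B C D E (1+1 = 2 swaps), so τ = 2.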
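The distance on this slide can be computed without actually performing swaps: the minimum number of adjacent pairwise swaps equals the number of item pairs that appear in opposite order in the two rankings. A minimal Python sketch follows; the function name `kendall_tau_distance` is an illustrative choice, not something taken from the talk.

```python
from itertools import combinations


def kendall_tau_distance(ordering, truth):
    """Count item pairs that appear in opposite order in the two rankings.

    This count equals the minimum number of adjacent pairwise swaps
    needed to turn `ordering` into `truth`.
    """
    # Position of each item in the true order.
    position = {item: i for i, item in enumerate(truth)}
    # Re-express the individual's ordering as true-order positions.
    ranks = [position[item] for item in ordering]
    # Every pair that is out of order costs exactly one adjacent swap.
    return sum(1 for a, b in combinations(ranks, 2) if a > b)


# The slide's example: individual A B E C D against true order A B C D E.
print(kendall_tau_distance("ABECD", "ABCDE"))  # 2
```

The pairwise loop is quadratic in the number of items, which is fine for the 10 to 44 items used in these tasks.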
Empirical Results [Figure: τ of individuals, with random guessing as a reference]
Thurstonian Model A. George Washington B. James Madison C. Andrew Jackson Each item has a true coordinate on some dimension
Thurstonian Model A. George Washington B. James Madison C. Andrew Jackson … but there is noise because of encoding errors
Thurstonian Model A. George Washington B. James Madison C. Andrew Jackson Each person’s mental encoding is based on a single sample from each distribution
Thurstonian Model A. George Washington B. James Madison C. Andrew Jackson The observed ordering is based on the ordering of the samples, e.g. A < C < B
Thurstonian Model A. George Washington B. James Madison C. Andrew Jackson The observed ordering is based on the ordering of the samples, e.g. A < B < C
Thurstonian Model A. George Washington B. James Madison C. Andrew Jackson Important assumption: across individuals, standard deviation can vary but not the means
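The generative story in the Thurstonian slides can be simulated in a few lines. The sketch below assumes Gaussian noise and uses made-up item coordinates and noise levels; it only illustrates the forward (sampling) direction, not the inference used in the talk. The means are shared across people and only the standard deviation differs, as the slide's assumption states.

```python
import random


def sample_ordering(means, sigma, rng):
    """Draw one noisy sample per item and return the items sorted by sample."""
    samples = {item: rng.gauss(mu, sigma) for item, mu in means.items()}
    return sorted(samples, key=samples.get)


rng = random.Random(0)

# Hypothetical latent coordinates shared by everyone (A < B < C).
means = {"A": 1.0, "B": 2.0, "C": 3.0}

# Hypothetical per-person noise levels: small sigma = expert-like.
for sigma in [0.1, 0.5, 2.0]:
    print(sigma, sample_ordering(means, sigma, rng))
```

With a small sigma the sampled ordering almost always matches the latent order; with a large sigma, orderings such as A < C < B become common.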
Graphical Model of Extended Thurstonian Model [Figure: graphical model with nodes for latent group means, individual noise level, mental representation, and observed ordering; plate over j individuals]
Inferred Distributions for 44 US Presidents: George Washington (1), John Adams (2), Thomas Jefferson (3), James Madison (4), James Monroe (6), John Quincy Adams (5), Andrew Jackson (7), Martin Van Buren (8), William Henry Harrison (21), John Tyler (10), James Knox Polk (18), Zachary Taylor (16), Millard Fillmore (11), Franklin Pierce (19), James Buchanan (13), Abraham Lincoln (9), Andrew Johnson (12), Ulysses S. Grant (17), Rutherford B. Hayes (20), James Garfield (22), Chester Arthur (15), Grover Cleveland 1 (23), Benjamin Harrison (14), Grover Cleveland 2 (25), William McKinley (24), Theodore Roosevelt (29), William Howard Taft (27), Woodrow Wilson (30), Warren Harding (26), Calvin Coolidge (28), Herbert Hoover (31), Franklin D. Roosevelt (32), Harry S. Truman (33), Dwight Eisenhower (34), John F. Kennedy (37), Lyndon B. Johnson (36), Richard Nixon (39), Gerald Ford (35), James Carter (38), Ronald Reagan (40), George H.W. Bush (41), William Clinton (42), George W. Bush (43), Barack Obama (44). Error bars = median and minimum sigma.
Calibration of individuals [Figure: one point per individual; τ = distance to ground truth; σ = inferred noise level for each individual]
Alternative Heuristic Models • Many heuristic methods from voting theory • E.g., Borda count method • Suppose we have 10 items • assign a count of 10 to first item, 9 for second item, etc • add counts over individuals • order items by the Borda count • i.e., rank by average rank across people
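The Borda steps listed above can be sketched as follows. The alphabetical tie-break is an assumption added for determinism; the slide does not say how ties are handled. The five input orderings are the ones shown on the earlier aggregation slide.

```python
from collections import defaultdict


def borda_aggregate(orderings):
    """Rank items by total Borda count, highest count first."""
    counts = defaultdict(int)
    for ordering in orderings:
        n = len(ordering)
        for position, item in enumerate(ordering):
            # n points for the first item, n - 1 for the second, and so on.
            counts[item] += n - position
    # Ties are broken alphabetically (an assumption, not from the slide).
    return sorted(counts, key=lambda item: (-counts[item], item))


orderings = ["ADBC", "DABC", "BADC", "ACBD", "ABDC"]
print(borda_aggregate(orderings))  # ['A', 'B', 'D', 'C']
```

Because every individual ranks the same set of items, sorting by total count is equivalent to sorting by average rank, as the last bullet notes.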
Model Comparison [Figure: τ by aggregation method, including Borda]
Overview of talk • Reconstruct the order of US presidents • Effect of group size and expertise • Reconstruct the order of events • Traveling Salesman Problem
Experiment • 78 participants • 17 ordering problems, each with 10 items • Chronological events • Physical measures • Purely ordinal problems, e.g. • Ten Amendments • Ten Commandments
Ordering states west-east Oregon (1) Utah (2) Nebraska (3) Iowa (4) Alabama (6) Ohio (5) Virginia (7) Delaware (8) Connecticut (9) Maine (10)
Ordering Ten Amendments Freedom of speech & religion (1) Right to bear arms (2) No quartering of soldiers (4) No unreasonable searches (3) Due process (5) Trial by Jury (6) Civil Trial by Jury (7) No cruel punishment (8) Right to non-specified rights (10) Power for the States & People (9)
How effective are small groups of experts? • Want to find experts endogenously, without feedback • Approach: select individuals with the smallest estimated noise levels based on previous tasks • We are identifying general expertise (“Spearman’s g”)
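The selection step described above can be sketched as below. It assumes that a noise estimate per person and per previous task is already available (for example from a fitted Thurstonian model) and that estimates are combined by a simple mean; both the data and the averaging rule are illustrative assumptions, not details given in the talk.

```python
def select_experts(noise_by_person, group_size):
    """Return the ids of the people with the lowest mean estimated noise.

    `noise_by_person` maps a person id to the list of noise levels (sigma)
    estimated for that person on previous tasks. No ground truth is used.
    """
    mean_noise = {
        person: sum(values) / len(values)
        for person, values in noise_by_person.items()
        if values  # skip people with no previous tasks
    }
    return sorted(mean_noise, key=mean_noise.get)[:group_size]


# Hypothetical estimates from two previous tasks.
noise = {"p1": [0.4, 0.6], "p2": [1.5, 1.1], "p3": [0.2, 0.3]}
print(select_experts(noise, 2))  # ['p3', 'p1']
```

The selected group's orderings would then be aggregated on a new task, which is what makes the procedure endogenous.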
Group Composition based on prior performance [Figure: τ as a function of group size (best individuals first), shown for # previous tasks T = 0, T = 2, T = 8]
Endogenous: no feedback required. Exogenous: selecting people based on actual performance. [Figure: τ under each selection method]
Overview of talk • Reconstruct the order of US presidents • Effect of group size and expertise • Reconstruct the order of events • Traveling Salesman Problem
Recollecting Order from Episodic Memory Study this sequence of images
Place the images in correct sequence (serial recall) A B C D E F G H I J
Calibration of individuals [Figure: one point per individual; τ = distance to ground truth; σ = inferred noise level (pizza sequence; perturbation model)]
Overview of talk • Reconstruct the order of US presidents • Effect of group size and expertise • Reconstruct the order of events • Traveling Salesman Problem
Find the shortest route between cities [Figure: problem B30-21; panels: Optimal, Individual 5, Individual 60, Individual 83]
Dataset Vickers, Bovet, Lee, & Hughes (2003) • 83 participants • 7 problems of 30 cities
TSP Aggregation Problem • Data consists of city order only • No access to city locations
Heuristic Approach • Idea: find tours with edges on which many individuals agree • Calculate agreement matrix A • A = n × n matrix, where n is the number of cities • a_ij indicates the number of participants that connect cities i and j • Find the tour that maximizes the agreement along its edges (this is itself a non-Euclidean TSP)
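A small sketch of the agreement-matrix idea follows. It assumes the objective is the plain sum of a_ij over the edges of a tour, which is one natural reading of the slide but is not confirmed by it, and it uses exhaustive search, which is only feasible for a handful of cities (the 30-city problems in the talk would need a proper TSP heuristic). The three example tours are made up.

```python
from itertools import permutations


def agreement_matrix(tours, n):
    """a[i][j] = number of participants whose tour connects cities i and j."""
    a = [[0] * n for _ in range(n)]
    for tour in tours:
        for k in range(n):
            i, j = tour[k], tour[(k + 1) % n]  # tours are closed loops
            a[i][j] += 1
            a[j][i] += 1
    return a


def best_agreement_tour(a):
    """Exhaustively search tours that start at city 0 (small n only)."""
    n = len(a)

    def score(tour):
        # Assumed objective: total agreement over the tour's edges.
        return sum(a[tour[k]][tour[(k + 1) % n]] for k in range(n))

    candidates = ((0,) + rest for rest in permutations(range(1, n)))
    return max(candidates, key=score)


# Hypothetical tours from three participants over five cities.
tours = [[0, 1, 2, 3, 4], [0, 1, 2, 4, 3], [0, 2, 1, 3, 4]]
a = agreement_matrix(tours, 5)
print(best_agreement_tour(a))  # (0, 1, 2, 3, 4)
```

Only the city order of each participant is used, matching the constraint that the aggregation has no access to city locations.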
Results averaged across 7 problems [Figure: performance of the aggregate solution]
Summary • Combine ordering / ranking data • going beyond numerical estimates or multiple choice questions • Incorporate individual differences • assume some individuals might be “experts” • going beyond models that treat every vote equally • Applications • combine multiple eyewitness accounts • combine solutions in complex problem-solving situations • fantasy football
That’s all Do the experiments yourself: http://psiexp.ss.uci.edu/
Predictive Rankings: fantasy football Australian Football League (29 people rank 16 teams) South Australian Football League (32 people rank 9 teams)
Predicting problem difficulty [Figure: τ = distance of group answer to ground truth; std(σ) = dispersion of noise levels across individuals; labeled problems include city size rankings and ordering states geographically]