Wisdom of Crowds in Human Memory: Reconstructing Events by Aggregating Memories across Individuals. Mark Steyvers, Department of Cognitive Sciences, University of California, Irvine. Joint work with: Brent Miller, Pernille Hemmer, Mike Yi, Michael Lee, Bill Batchelder, Paolo Napoletano.
Wisdom of crowds phenomenon • Group estimate often performs as well as or better than best individual in the group
Examples of wisdom of crowds phenomenon • Galton’s Ox (1907): median of individual estimates comes close to true answer • Who Wants to Be a Millionaire?
Tasks studied in our research (all are problems involving permutations) • Ordering/ranking problems • declarative memory: order of US presidents, ranking cities by size • episodic memory: order of events (i.e., serial recall) • predictive rankings: fantasy football • Matching problems • assign N items to N responses • e.g., match paintings to artists, or flags to countries • Traveling Salesman problems • find shortest route between cities
Recollecting order from Declarative Memory • Place these presidents in the correct order (along a time axis): Abraham Lincoln, Ulysses S. Grant, Rutherford B. Hayes, James Garfield, Andrew Johnson
Recollecting order from episodic memory http://www.youtube.com/watch?v=a6tSyDHXViM&feature=related
Place scenes in correct order (serial recall) [Figure: scenes A, B, C, D to be placed along a time axis]
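Goal: aggregating responses [Figure: individual orderings (A D B C; D A B C; B A D C; A C B D; A B D C) → Aggregation Algorithm → group answer (A B C D). Does the group answer equal the ground truth (A B C D)?]
Bayesian Approach • group answer = latent random variable [Figure: group answer (A B C D) → Generative Model → individual orderings (A D B C; D A B C; B A D C; A C B D; A B D C)]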
Task constraints • No communication between individuals • There is always a true answer (ground truth) • Aggregation algorithm never has access to ground truth • unsupervised methods • ground truth only used for evaluation
Research Goals • Aggregation of permutation data • going beyond numerical estimates or multiple choice questions • combinatorially complex • Incorporate individual differences • going beyond models that treat every vote equally • assume some individuals might be “experts” • Take cognitive processes into account • going beyond mere statistical aggregation → Hierarchical Bayesian models
Experiment 1 • Task: order all 44 US presidents • Methods • 26 participants (college undergraduates) • Names of presidents written on cards • Cards could be shuffled on large table
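Measuring performance • Kendall’s Tau (τ): the number of adjacent pairwise swaps needed to turn an individual’s ordering into the true order • Example: ordering by individual A B E C D, true order A B C D E • swapping E past C (1 swap) and then past D (1+1 swaps) gives τ = 2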
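The distance on this slide can be computed by counting inverted pairs, which equals the number of adjacent swaps required. The following is a minimal sketch; the function name `kendall_tau_distance` is illustrative and not taken from the talk.

```python
def kendall_tau_distance(ordering, true_order):
    """Count the adjacent pairwise swaps needed to turn `ordering` into `true_order`."""
    # Map each item to its position in the true order.
    position = {item: i for i, item in enumerate(true_order)}
    ranks = [position[item] for item in ordering]
    # The minimum number of adjacent swaps equals the number of inverted pairs.
    swaps = 0
    for i in range(len(ranks)):
        for j in range(i + 1, len(ranks)):
            if ranks[i] > ranks[j]:
                swaps += 1
    return swaps


# The slide's example: E has to move past C and then past D.
print(kendall_tau_distance(list("ABECD"), list("ABCDE")))  # prints 2
```

A perfect ordering scores 0, and a fully reversed ordering of n items scores n(n-1)/2.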
Empirical Results [Figure: individuals’ τ, with the random-guessing level marked]
Many methods for analyzing rank data… • Probabilistic models • Thurstone (1927), Mallows (1957), Plackett-Luce (1975) • Lebanon-Mao (2008) • Spectral methods • Diaconis (1989) • Heuristic methods from voting theory • Borda count … however, many of these approaches were developed for preference rankings
Bayesian models constrained by human cognition • Extension of Thurstone’s (1927) model • Extension of Estes (1972) perturbation model
Bayesian Thurstonian Approach • Each item (A, B, C) has a true coordinate on some dimension
Bayesian Thurstonian Approach • … but there is noise because of encoding and/or retrieval error
Bayesian Thurstonian Approach • Each person’s mental representation is based on (latent) samples of these distributions
Bayesian Thurstonian Approach • The observed ordering is based on the ordering of the samples • Person 1, observed ordering: A < B < C
Bayesian Thurstonian Approach • People draw from distributions with common means but different variances • Person 1, observed ordering: A < B < C • Person 2, observed ordering: A < C < B
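The generative story on these slides can be simulated in a few lines. This is only a forward-sampling sketch under assumed values: the item coordinates, the noise levels, and the name `thurstonian_ordering` are hypothetical, Gaussian noise is an assumption, and the inference step (recovering the latent coordinates from observed orderings) is not shown.

```python
import random


def thurstonian_ordering(true_means, sigma, rng):
    """Sample one person's ordering: one noisy draw per item, then sort items by draw."""
    samples = {item: rng.gauss(mu, sigma) for item, mu in true_means.items()}
    return sorted(samples, key=samples.get)


rng = random.Random(0)
true_means = {"A": 1.0, "B": 2.0, "C": 3.0}  # hypothetical true coordinates
person_sigmas = [0.1, 1.0, 3.0]              # hypothetical noise level per person

# Common means, different variances: low-sigma people tend to recover A < B < C.
for sigma in person_sigmas:
    print(sigma, thurstonian_ordering(true_means, sigma, rng))
```

With sigma set to 0 the sampled ordering always matches the true order, which is the sense in which a small inferred sigma marks an "expert".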
Graphical Model Notation • j = 1..3 • shaded = observed • not shaded = latent
Graphical Model of Bayesian Thurstonian Model • Latent ground truth • Individual noise level • Mental representation • Observed ordering • j individuals
(Weak) wisdom of crowds effect • model’s ordering is as good as best individual (but not better) [Figure: τ for the model and for individuals]
Inferred Distributions for 44 US Presidents [Figure labels: median and minimum sigma] • George Washington (1), John Adams (2), Thomas Jefferson (3), James Madison (4), James Monroe (6), John Quincy Adams (5), Andrew Jackson (7), Martin Van Buren (8), William Henry Harrison (21), John Tyler (10), James Knox Polk (18), Zachary Taylor (16), Millard Fillmore (11), Franklin Pierce (19), James Buchanan (13), Abraham Lincoln (9), Andrew Johnson (12), Ulysses S. Grant (17), Rutherford B. Hayes (20), James Garfield (22), Chester Arthur (15), Grover Cleveland 1 (23), Benjamin Harrison (14), Grover Cleveland 2 (25), William McKinley (24), Theodore Roosevelt (29), William Howard Taft (27), Woodrow Wilson (30), Warren Harding (26), Calvin Coolidge (28), Herbert Hoover (31), Franklin D. Roosevelt (32), Harry S. Truman (33), Dwight Eisenhower (34), John F. Kennedy (37), Lyndon B. Johnson (36), Richard Nixon (39), Gerald Ford (35), James Carter (38), Ronald Reagan (40), George H.W. Bush (41), William Clinton (42), George W. Bush (43), Barack Obama (44)
Model can predict individual performance [Scatter plot: τ = individual’s distance to ground truth; σ = inferred noise level for each individual]
Extension of Estes (1972) Perturbation Model • Main idea: • item order is perturbed locally • Our extension: • perturbation noise varies between individuals and items • Example: true order A B C D E, recalled order A C B D E
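Local perturbation of item order can be illustrated with a toy simulation. This is a deliberately simplified sketch, not the model from the talk: it assumes a single perturbation probability `theta` per person (the extension above also lets noise vary by item), a fixed number of passes `n_steps`, and swaps restricted to neighbouring positions. All names and values are hypothetical.

```python
import random


def perturb_order(true_order, theta, n_steps, rng):
    """Return a recalled order produced by repeated local (adjacent) perturbations."""
    order = list(true_order)
    for _ in range(n_steps):
        for i in range(len(order)):
            # With probability theta, the item at position i trades places with a neighbour.
            if rng.random() < theta:
                j = i + rng.choice([-1, 1])
                if 0 <= j < len(order):
                    order[i], order[j] = order[j], order[i]
    return order


rng = random.Random(0)
print(perturb_order(list("ABCDE"), theta=0.1, n_steps=3, rng=rng))
```

Because every move is a swap of neighbours, small values of `theta` tend to yield recalled orders that differ from the true order by a few local transpositions, as in the A C B D E example.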
Strong wisdom of crowds effect • Perturbation model’s ordering is better than best individual [Figure: τ for the perturbation model and for individuals]
Inferred Perturbation Matrix and Item Accuracy [Figure; labeled items: Abraham Lincoln, Richard Nixon, James Carter]
Alternative Heuristic Models • Many heuristic methods from voting theory • E.g., Borda count method • Suppose we have 10 items • assign a count of 10 to first item, 9 for second item, etc • add counts over individuals • order items by the Borda count • i.e., rank by average rank across people
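The Borda steps listed above can be sketched as follows. The three example orderings are hypothetical, and the tie-breaking behaviour (ties keep first-seen order) is a choice made for this sketch rather than something the slide specifies.

```python
def borda_aggregate(orderings):
    """Aggregate orderings by Borda count: n points for first place, n-1 for second, etc."""
    n_items = len(orderings[0])
    counts = {}
    for ordering in orderings:
        for position, item in enumerate(ordering):
            counts[item] = counts.get(item, 0) + (n_items - position)
    # Highest total count first; sorted() is stable, so ties keep first-seen order.
    return sorted(counts, key=lambda item: -counts[item])


# Hypothetical responses from three people over four items.
responses = [list("ABCD"), list("ACBD"), list("BACD")]
print(borda_aggregate(responses))  # prints ['A', 'B', 'C', 'D']
```

Here the totals are A = 11, B = 9, C = 7, D = 3. Every person's vote carries equal weight, which is the property the hierarchical models above relax.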
Model Comparison [Figure: τ for each model, including Borda]
Experiment 2 • 78 participants • 17 problems, each with 10 items • Chronological Events • Physical Measures • Purely ordinal problems, e.g. • Ten Amendments • Ten Commandments
Example results [Panel labels: Perturbation Model, Thurstonian Model]
1. Oregon (1) 2. Utah (2) 3. Nebraska (3) 4. Iowa (4) 5. Alabama (6) 6. Ohio (5) 7. Virginia (7) 8. Delaware (8) 9. Connecticut (9) 10. Maine (10)
1. Freedom of speech & relig... (1) 2. Right to bear arms (2) 3. No quartering of soldiers... (3) 4. No unreasonable searches (4) 5. Due process (5) 6. Trial by Jury (6) 7. Civil Trial by Jury (7) 8. No cruel punishment (8) 9. Right to non-specified ri... (10) 10. Power for the States & Pe... (9)
Average results over 17 Problems • Strong wisdom of crowds effect across problems [Figure: τ for the models and for the mean of individuals]
Predicting problem difficulty [Scatter plot: τ = distance of group answer to ground truth; std(σ) = dispersion of noise levels across individuals; labeled problems: city size rankings, ordering states geographically]
Effect of Group Composition • How many individuals do we need to average over?
Experts vs. Crowds • Can we find experts in the crowd? Can we form small groups of experts? • Approach • Form a group for some particular task • Select individuals with the smallest sigma (“experts”) based on previous tasks • Vary the number of previous tasks
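The selection step in this approach can be sketched as below. It assumes each person already has an inferred sigma for each previous task and that these are combined by a simple mean; the averaging rule, the function name `select_experts`, and the example values are assumptions for illustration only.

```python
def select_experts(sigma_history, group_size):
    """Pick the `group_size` people with the smallest mean sigma over previous tasks."""
    mean_sigma = {person: sum(s) / len(s) for person, s in sigma_history.items()}
    return sorted(mean_sigma, key=mean_sigma.get)[:group_size]


# Hypothetical inferred noise levels from two previous tasks.
sigma_history = {
    "p1": [0.4, 0.5],
    "p2": [1.2, 0.9],
    "p3": [0.2, 0.3],
    "p4": [2.0, 1.5],
}
print(select_experts(sigma_history, group_size=2))  # prints ['p3', 'p1']
```

No ground truth is consulted at any point, so the selection stays unsupervised; only the number of previous tasks and the group size vary.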
Group Composition based on prior performance [Figure: τ as a function of group size (best individuals first), for # previous tasks T = 0, T = 2, T = 8]
Methods for Selecting Experts • Endogenous: no feedback required • Exogenous: selecting people based on actual performance [Figures: τ under each method]
Aggregating Episodic Memories Study this sequence of images
Place the images in correct sequence (serial recall) [Figure: images A through J]
Example calibration result for individuals (pizza sequence; perturbation model) [Scatter plot: τ = individual’s distance to ground truth; σ = inferred noise level]
Predictive Rankings: fantasy football • Australian Football League (29 people rank 16 teams) • South Australian Football League (32 people rank 9 teams)
Find all matching pairs [Figure: items A, B, C, D, E to be matched with responses 1, 2, 3, 4, 5]
Experiment • 15 subjects • 8 problems • 4 problems with 5 items • 4 problems with 10 items