Hierarchical Bayesian Models for Aggregating Retrieved Memories across Individuals. Mark Steyvers, Department of Cognitive Sciences, University of California, Irvine. Joint work with: Michael Lee, Brent Miller, Pernille Hemmer, Bill Batchelder, Paolo Napoletano.
Ordering problem: what is the correct order (in time) of these Presidents? George Washington, Thomas Jefferson, James Monroe, Andrew Jackson, John Adams
Goal: aggregating responses. Individual orderings (A D B C; D A B C; B A D C; A C B D; A B D C) → Aggregation Algorithm → group answer (A B C D) =? ground truth (A B C D)
Bayesian Approach: ground truth = latent common cause. Ground truth (A B C D) → Generative Model → individual orderings (A D B C; D A B C; B A D C; A C B D; A B D C)
Important notes: • No communication between individuals • There is always a true answer (ground truth) • Aggregation algorithm never has access to ground truth • ground truth only used for evaluation
Matching problem: Van Gogh Rembrandt Monet Renoir C A D B
Wisdom of crowds phenomenon • Crowd estimate is often better than any individual in the crowd (Think of independent noise influencing each individual)
Examples of wisdom of crowds phenomenon Galton’s Ox (1907): Median of individual estimates comes close to true answer Who wants to be a millionaire?
Limitations of Current “Wisdom of Crowds” Research • Studies restricted to numeric or categorical judgments • simple averaging schemes: • Mode • Median • Mean • No treatment of individual differences • every “vote” is treated equally • downplayed role of expertise
Cultural Consensus Theory (CCT), e.g., Romney, Batchelder, and Weller (1987) • Finds the “answer key” to multiple choice questions when ground truth is lost • Takes person and item differences into account • Informal version of CCT also developed for ranking data
Research Goals • Generalize “wisdom of crowds” effect to more complex data • Aggregation of permutations • Ranking data • Matching (assignment) data
Hierarchical Bayesian Models • Probability distributions over all permutations of items • with N items, there are N! permutations • e.g., when N=44, we have 44! > 10^53 permutations • Approximate inference methods: MCMC • Cognitively plausible generative processes • Treatment of individual differences
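The size of the permutation space quoted above can be checked directly. The snippet below is a small sketch; the helper name `num_orderings` is made up for illustration and is not from the slides.

```python
import math

def num_orderings(n_items: int) -> int:
    """Number of distinct orderings (permutations) of n_items items."""
    return math.factorial(n_items)

n_presidents = 44  # number of items in Experiment 1
print(num_orderings(n_presidents) > 10**53)  # True
```

Enumerating a space of this size is infeasible, which is why the slide turns to approximate inference.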
Experiment 1 • Task: order all 44 US presidents • Methods • 26 participants (college undergraduates) • Names of presidents written on cards • Cards could be shuffled on large table
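Measuring performance. Kendall’s Tau: the number of adjacent pair-wise swaps needed to turn the participant ordering into the ground truth. Example: participant ordering 1 2 5 3 4, ground truth 1 2 3 4 5. Swapping 5 and 3 gives 1 2 3 5 4 (= 1); swapping 5 and 4 gives 1 2 3 4 5 (= 1+1 = 2).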
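As a rough sketch of this measure, the number of adjacent pairwise swaps equals the number of item pairs that appear in the wrong relative order, so it can be computed by counting such pairs. The function name `kendall_tau_distance` is an illustrative choice, not something named in the slides.

```python
def kendall_tau_distance(ordering, truth):
    """Count item pairs whose relative order differs from the ground truth.

    This equals the minimum number of adjacent pairwise swaps needed to
    turn `ordering` into `truth`.
    """
    position = {item: i for i, item in enumerate(truth)}
    ranks = [position[item] for item in ordering]
    swaps = 0
    for i in range(len(ranks)):
        for j in range(i + 1, len(ranks)):
            if ranks[i] > ranks[j]:  # this pair is out of order
                swaps += 1
    return swaps

# The example from the slide: two swaps are needed.
print(kendall_tau_distance([1, 2, 5, 3, 4], [1, 2, 3, 4, 5]))  # 2
```

The double loop is quadratic in the number of items, which is fine for 10 or 44 items.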
Empirical Results. Figure: τ for participants, with random guessing marked for reference.
Many approaches for analyzing rank data… • Probabilistic models • Thurstone (1927) • Mallows (1957) • Plackett-Luce (1975) • Lebanon-Mao (2008) • Spectral methods • Diaconis (1989) • Heuristic methods from voting theory • Borda count … however, many of these approaches were developed for preference rankings
Bayesian Thurstonian Approach B C A Each item has a true coordinate on some dimension
Bayesian Thurstonian Approach Person 1 B A C … but there is noise because of encoding and/or retrieval error
Bayesian Thurstonian Approach Person 1 B A C B C A Each person’s mental representation is based on (latent) samples of these distributions
Bayesian Thurstonian Approach Person 1 B A C Observed Ordering: A < B < C B C A The observed ordering is based on the ordering of the samples
Bayesian Thurstonian Approach Person 1 B A C Observed Ordering: A < B < C B C A Person 2 B C A Observed Ordering: A < C < B C B A People draw from distributions with common mean but different variances
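The generative story on these slides can be sketched as a small simulation: each person draws one noisy sample per item around a shared true coordinate and reports the items sorted by those samples. The item coordinates, sigma values, and the function name `simulate_ordering` below are assumptions for illustration, and Gaussian noise is assumed as in a standard Thurstonian model.

```python
import random

def simulate_ordering(true_means, sigma, rng):
    """Draw one latent sample per item and return the items sorted by sample.

    true_means: dict mapping item -> true coordinate (shared by everyone)
    sigma: this person's noise level (smaller means more accurate)
    rng: a random.Random instance, so runs can be reproduced
    """
    samples = {item: rng.gauss(mu, sigma) for item, mu in true_means.items()}
    return sorted(samples, key=samples.get)

# Hypothetical coordinates for three items with true order A < B < C.
true_means = {"A": 1.0, "B": 2.0, "C": 3.0}
rng = random.Random(0)

accurate_person = simulate_ordering(true_means, 0.1, rng)  # small variance
noisy_person = simulate_ordering(true_means, 2.0, rng)     # large variance
print(accurate_person, noisy_person)
```

With a small sigma the reported ordering almost always matches the true order; with a large sigma, swaps such as A < C < B become common.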
Graphical Model Notation j=1..3 shaded = observed not shaded = latent
Graphical Model of Bayesian Thurstonian Model Latent ground truth Individual ability Mental representation Observed ordering j individuals
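Inference • Need the posterior distribution over the latent variables given the observed orderings • Markov chain Monte Carlo • Gibbs sampling on the mental representations • Metropolis-Hastings on the latent ground truth and the individual abilities • Draw 400 samples • Group ordering based on the average of the latent ground truth across samples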
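The last step, turning posterior samples into one group ordering, can be sketched as follows. This assumes the averaged quantity is the latent item location (the ground-truth coordinate) and that the samples have already been produced by some MCMC routine. The sampler itself is not shown, and the function name `group_ordering` and the toy sample values are made up for illustration.

```python
def group_ordering(location_samples, items):
    """Order items by their posterior-mean latent location.

    location_samples: list of samples, each a list of locations aligned
                      with `items` (assumed to come from an MCMC run)
    items: list of item labels
    """
    n_samples = len(location_samples)
    mean_location = [
        sum(sample[k] for sample in location_samples) / n_samples
        for k in range(len(items))
    ]
    # Sort item labels by their averaged location.
    return [item for _, item in sorted(zip(mean_location, items))]

# Two toy samples for three items (hypothetical values).
samples = [[0.1, 0.9, 0.5], [0.2, 0.7, 0.6]]
print(group_ordering(samples, ["A", "B", "C"]))  # ['A', 'C', 'B']
```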
Wisdom of Crowds effect (figure: τ): the model’s ordering is as good as the best individual
Inferred Distributions for 44 US Presidents (median and minimum sigma): George Washington (1) John Adams (2) Thomas Jefferson (3) James Madison (4) James Monroe (6) John Quincy Adams (5) Andrew Jackson (7) Martin Van Buren (8) William Henry Harrison (21) John Tyler (10) James Knox Polk (18) Zachary Taylor (16) Millard Fillmore (11) Franklin Pierce (19) James Buchanan (13) Abraham Lincoln (9) Andrew Johnson (12) Ulysses S. Grant (17) Rutherford B. Hayes (20) James Garfield (22) Chester Arthur (15) Grover Cleveland 1 (23) Benjamin Harrison (14) Grover Cleveland 2 (25) William McKinley (24) Theodore Roosevelt (29) William Howard Taft (27) Woodrow Wilson (30) Warren Harding (26) Calvin Coolidge (28) Herbert Hoover (31) Franklin D. Roosevelt (32) Harry S. Truman (33) Dwight Eisenhower (34) John F. Kennedy (37) Lyndon B. Johnson (36) Richard Nixon (39) Gerald Ford (35) James Carter (38) Ronald Reagan (40) George H.W. Bush (41) William Clinton (42) George W. Bush (43) Barack Obama (44)
Model is calibrated: individuals with large sigma are far from the truth (figure: τ vs. σ)
Alternative Models • Many heuristic methods from voting theory • E.g., Borda count method • Suppose we have 10 items • assign a count of 10 to first item, 9 for second item, etc • add counts over individuals • order items by the Borda count • i.e., rank by average rank across people
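The Borda steps listed above can be sketched in a few lines. The function name `borda_order` is an illustrative choice, and the five input orderings are the example individual orderings from the "Goal" slide.

```python
def borda_order(orderings):
    """Aggregate orderings with the Borda count.

    With n items, the first item of each ordering gets n points, the second
    n - 1, and so on. Points are summed over individuals and items are
    ranked by total (ties keep the order in which items were first seen).
    """
    n_items = len(orderings[0])
    scores = {}
    for ordering in orderings:
        for pos, item in enumerate(ordering):
            scores[item] = scores.get(item, 0) + (n_items - pos)
    return sorted(scores, key=lambda item: -scores[item])

individual_orderings = [
    ["A", "D", "B", "C"],
    ["D", "A", "B", "C"],
    ["B", "A", "D", "C"],
    ["A", "C", "B", "D"],
    ["A", "B", "D", "C"],
]
print(borda_order(individual_orderings))  # ['A', 'B', 'D', 'C']
```

Every individual contributes equally here, which is the contrast with a model that estimates a separate noise level per person.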
Experiment 2 • 78 participants • 17 problems each with 10 items • Chronological Events • Physical Measures • Purely ordinal problems, e.g. • Ten Amendments • Ten Commandments
Ordering states west-east (figure: τ vs. σ, R = 0.961): Oregon (1) Utah (2) Nebraska (3) Iowa (4) Alabama (6) Ohio (5) Virginia (7) Delaware (8) Connecticut (9) Maine (10)
Ordering Ten Commandments (figure: τ vs. σ, R = 0.722): Worship any other God (1) Make a graven image (7) Take the Lord's name in vain (2) Break the Sabbath (3) Dishonor your parents (4) Murder (6) Commit adultery (8) Steal (5) Bear false witness (9) Covet (10)
Average results over 17 Problems. Figure: mean τ (y-axis) by individuals (x-axis); series: Thurstonian Model, Borda count, Mode, Individuals.
Effect of Group Composition • How many individuals do we need to average over?
Experts vs. Crowds • Can we find experts in the crowd? Can we form small groups of experts? • Approach • Form a group for some particular task • Select individuals with the smallest sigma (“experts”) based on previous tasks • Vary the number of previous tasks
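One way to read the selection step is sketched below: average each person's inferred sigma over the previous tasks and keep the people with the smallest averages. Averaging across tasks is an assumption made for this sketch (the slides do not say how sigmas from several tasks are combined), and the function name `select_experts` and the sigma values are hypothetical.

```python
def select_experts(sigma_by_person, group_size):
    """Pick the `group_size` people with the smallest mean sigma.

    sigma_by_person: dict mapping person id -> list of inferred sigmas,
                     one per previous task
    """
    mean_sigma = {
        person: sum(sigmas) / len(sigmas)
        for person, sigmas in sigma_by_person.items()
    }
    return sorted(mean_sigma, key=mean_sigma.get)[:group_size]

# Hypothetical sigmas from two previous tasks.
previous = {"p1": [0.5, 0.7], "p2": [1.5, 1.2], "p3": [0.9, 0.4]}
print(select_experts(previous, 2))  # ['p1', 'p3']
```

No ground truth is needed for this selection, since sigma is inferred from the orderings alone.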
Group Composition based on prior performance. Figure: τ by group size (best individuals first), for # previous tasks T = 0, T = 2, T = 8.
Methods for Selecting Experts • Endogenous: no feedback required • Exogenous: selecting people based on actual performance (figures: τ for each method)
Model incorporating overall person ability Overall ability j individuals Task specific ability j individuals m tasks
Average results over 17 Problems: new model (figure: mean τ)
Another ordering problem: order A, B, C, D in time. Video: http://www.youtube.com/watch?v=29VGZtnCD30&feature=related
Experiment 3 • 26 participants • 6 videos • 3 videos with stereotyped event sequences (e.g., wedding) • 3 “unpredictable” videos (e.g., the example video) • extracted 10 stills for testing • Method • study video • followed by immediate ordering test of 10 items
Two other examples: τ = 1 and τ = 0
Overall Results (figure: mean τ)