
Jonathan Huang Carlos Guestrin Carnegie Mellon University ICML 2010 Haifa, Israel

Presentation Transcript


  1. Learning Hierarchical Riffle Independent Groupings from Rankings Jonathan Huang Carlos Guestrin Carnegie Mellon University ICML 2010 Haifa, Israel

  2. American Psychological Association Elections • Each ballot in election data is a ranked list of candidates [Diaconis, 89] • 5738 full ballots (1980 election) • 5 candidates: • William Bevan • Ira Iscoe • Charles Kiesler • Max Siegle • Logan Wright [Figure: probability of each of the 120 permutations; first-order matrix Prob(candidate i was ranked j), ranks vs. candidates] • Candidate 3 has the most 1st-place votes, and many last-place votes
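
The first-order matrix on this slide summarizes a ranking distribution by Prob(candidate i was ranked j). A minimal sketch of estimating it from ballots, assuming each ballot is stored as a tuple of all candidates listed best first (the function name, the data layout, and the toy ballots are illustrative assumptions, not from the talk):

```python
from typing import Dict, List, Sequence, Tuple

def first_order_matrix(ballots: Sequence[Tuple[int, ...]],
                       candidates: Sequence[int]) -> List[List[float]]:
    """Estimate M[i][j] = Prob(candidates[i] was ranked j) from full ballots.

    Each ballot lists every candidate exactly once, best first, so the
    (0-based) rank of a candidate is its position in the tuple.
    """
    index: Dict[int, int] = {c: i for i, c in enumerate(candidates)}
    n = len(candidates)
    counts = [[0] * n for _ in range(n)]
    for ballot in ballots:
        for rank, cand in enumerate(ballot):
            counts[index[cand]][rank] += 1
    m = len(ballots)
    return [[c / m for c in row] for row in counts]

# Three hypothetical ballots over candidates 1..3.
M = first_order_matrix([(1, 2, 3), (2, 1, 3), (1, 3, 2)], [1, 2, 3])
print(M[0])  # candidate 1: ranked first in 2 of 3 ballots, second in 1
```

Because every full ballot puts each candidate in exactly one rank and fills each rank with exactly one candidate, every row and every column of the estimated matrix sums to 1.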

  3. Factorial Possibilities • n items means n! rankings [Figure: probability of each of the 120 permutations] • (Not to mention sample complexity issues…) • Possible learning biases for taming complexity: • Parametric? (e.g., Mallows, Plackett-Luce, …) • Sparsity? [Reid79, Jagabathula08] • Independence/graphical models? [Huang09]
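
To make the factorial blow-up concrete, this small sketch counts rankings, and the free parameters of an unrestricted distribution over them (one probability per ranking, minus one for normalization), for the item counts of the datasets used later in the talk: 5 APA candidates, 10 sushi types, 14 Irish candidates. The helper names are illustrative.

```python
import math

def num_rankings(n: int) -> int:
    """Number of rankings (permutations) of n items: n!."""
    return math.factorial(n)

def full_model_parameters(n: int) -> int:
    """Free parameters of an arbitrary distribution over all n! rankings."""
    return num_rankings(n) - 1

for n in (5, 10, 14):
    print(n, num_rankings(n), full_model_parameters(n))
```

This prints 120, 3628800 and 87178291200 rankings for n = 5, 10 and 14; even 5738 APA ballots are a thin sample once n grows past a handful of items.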

  4. Full independence on rankings • Graphical model for the joint ranking of 6 items, with {A,B,C} and {D,E,F} independent • Mutual exclusivity leads to a fully connected model! [Figure: graphical models over nodes A to F, where each node is the rank of an item, e.g., the rank of item C] • Ranks of {A,B,C}: a permutation of {1,2,3} • Ranks of {D,E,F}: a permutation of {4,5,6}

  5. First-order independence condition • Sparsity: any permutation putting A, B, or C in ranks 4, 5, or 6 has zero probability! • But… such sparsity is unlikely to exist in real data [Figure: first-order matrices Prob(candidate i was ranked j), ranks vs. candidates, for a "not independent" and an "independent" distribution over A to F]
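
The sparsity claim can be checked by brute force. Under full independence with {A,B,C} in ranks 1 to 3 and {D,E,F} in ranks 4 to 6, only rankings whose first three positions hold exactly A, B and C can have nonzero probability. The sketch below counts them among all 6! rankings; the variable names are illustrative.

```python
from itertools import permutations

items = "ABCDEF"
group = set("ABC")

# A ranking is a tuple of items, best first. Under full independence the
# only rankings with nonzero probability put all of A, B, C in ranks 1-3.
support = [r for r in permutations(items) if set(r[:3]) == group]
total = sum(1 for _ in permutations(items))

print(len(support), "of", total)  # 36 of 720
```

The 36 = 3! · 3! surviving rankings are exactly the pairs (ranking of {A,B,C}, ranking of {D,E,F}); the other 684 are forced to probability zero, which is the hard sparsity the slide argues is unlikely in real data.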

  6. Drawing independent fruits/veggies • Full independence: fruit/veggie positions fixed ahead of time! • Veggies in ranks {1,2}, fruits in ranks {3,4} (veggies always better than fruits) • Draw veggie rankings and fruit rankings independently • Form the joint ranking of veggies and fruits [Figure: veggies Artichoke and Broccoli, fruits Cherry and Dates; each joint ranking places the two veggies, in their sampled order, ahead of the two fruits, in their sampled order]

  7. Riffled independence • Riffled independence model [Huang, Guestrin, 2009]: • Draw an interleaving of veggie/fruit positions (according to a distribution) [Figure: example interleavings of two veggie slots and two fruit slots, e.g., Veggie > Veggie > Fruit > Fruit, combined with the veggie ranking and the fruit ranking; the example joint ranking shown is Artichoke > Cherry > Broccoli > Dates]
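
A minimal sketch of this generative process, assuming each distribution is given as a dictionary from outcomes to probabilities (all parameter values below are hypothetical). An interleaving is stored as a mask with `True` where a veggie goes; full independence from the previous slide is the special case where the interleaving distribution puts all its mass on Veggie > Veggie > Fruit > Fruit.

```python
import random
from typing import Dict, List, Sequence, Tuple

def merge(rank_a: Sequence[str], rank_b: Sequence[str],
          mask: Sequence[bool]) -> List[str]:
    """Slot t takes the next item of rank_a if mask[t] is True,
    otherwise the next item of rank_b."""
    it_a, it_b = iter(rank_a), iter(rank_b)
    return [next(it_a) if take_a else next(it_b) for take_a in mask]

def draw(dist: Dict[Tuple, float], rng: random.Random) -> Tuple:
    """Draw one key of dist with probability proportional to its value."""
    keys = list(dist)
    return rng.choices(keys, weights=[dist[k] for k in keys])[0]

def sample_riffle(veggie_dist, fruit_dist, interleave_dist, rng):
    """Draw an interleaving, draw the veggie and fruit rankings
    independently, then merge them into one joint ranking."""
    mask = draw(interleave_dist, rng)
    return merge(draw(veggie_dist, rng), draw(fruit_dist, rng), mask)

V, F = True, False
# Hypothetical parameters, for illustration only.
veggie_dist = {("Artichoke", "Broccoli"): 0.7, ("Broccoli", "Artichoke"): 0.3}
fruit_dist = {("Cherry", "Dates"): 0.5, ("Dates", "Cherry"): 0.5}
interleave_dist = {(V, V, F, F): 0.4, (V, F, V, F): 0.3, (F, V, V, F): 0.3}

print(sample_riffle(veggie_dist, fruit_dist, interleave_dist, random.Random(0)))
```

With the interleaving Veggie > Fruit > Veggie > Fruit, the rankings Artichoke > Broccoli and Cherry > Dates merge into Artichoke > Cherry > Broccoli > Dates, the joint ranking shown on the slide.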

  8. Riffle Shuffles • Riffle shuffles correspond to distributions over interleavings • Riffle shuffle (dovetail shuffle): • Cut the deck into two piles • Interleave the piles [Figure: interleaving distribution]
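
An interleaving distribution lives on the C(p+q, p) ways to interleave a pile of p cards with a pile of q cards. As one concrete example, the sketch below uses the dropping rule of the Gilbert-Shannon-Reeds shuffle model with the cut size held fixed; it is named here only as an illustration and is not claimed to be the distribution pictured on the slide. Dropping from each pile with probability proportional to its remaining size makes every interleaving equally likely.

```python
import random
from itertools import combinations
from math import comb
from typing import List, Tuple

def all_interleavings(p: int, q: int) -> List[Tuple[bool, ...]]:
    """Every way to interleave p cards (True) with q cards (False)."""
    n = p + q
    return [tuple(t in slots for t in range(n))
            for slots in combinations(range(n), p)]

def riffle_drop(p: int, q: int, rng: random.Random) -> Tuple[bool, ...]:
    """Drop cards one at a time, choosing a pile with probability
    proportional to its remaining size. Each interleaving then has
    probability p! q! / (p + q)! = 1 / comb(p + q, p)."""
    left, right, mask = p, q, []
    while left + right > 0:
        take_left = rng.random() < left / (left + right)
        mask.append(take_left)
        if take_left:
            left -= 1
        else:
            right -= 1
    return tuple(mask)

print(len(all_interleavings(2, 2)), comb(4, 2))  # 6 6
```

A learned riffle independent model is not restricted to this uniform case; its interleaving distribution is an arbitrary distribution over the masks returned by `all_interleavings`.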

  9. American Psych. Assoc. Election (1980) • Best KL split: {12345} → {1345}, {2} [Figure: probability of each of the 120 permutations] • Candidate 3 fully independent vs. candidate 2 riffle independent • Minimize: KL(empirical || riffle indep. approx.)

  10. Irish House of Parliament election 2002 • 64,081 votes, 14 candidates • Two main parties: • Fianna Fail (FF) • Fine Gael (FG) • Minor parties: • Independents (I) • Green Party (GP) • Christian Solidarity (CS) • Labour (L) • Sinn Fein (SF) [Figure: Ireland] [Gormley, Murphy, 2006]

  11. Approximating the Irish Election • n = 14 candidates [Figure: first-order marginals Prob(candidate i was ranked j), ranks vs. candidates grouped by party (FF, FG, I, GP, CS, SF, L); "true" first-order marginals vs. riffle independent approximation] • Major parties riffle independent of minor parties? • Sinn Fein and Christian Solidarity marginals are not well captured by a single riffle independent split!

  12. Back to Fruits and Vegetables [Figure: splitting {banana, apple, orange, broccoli, carrot, lettuce, candy, cookies} into fruits {banana, apple, orange} and vegetables {broccoli, carrot, lettuce} leaves {candy, cookies} with no obvious place (??)] • Need to factor out {candy, cookies} first!

  13. Hierarchical Decompositions • All food items {banana, apple, orange, broccoli, carrot, lettuce, candy, cookies} split into healthy foods {banana, apple, orange, broccoli, carrot, lettuce} and junk food {candy, cookies} • Healthy foods split into fruits {banana, apple, orange} and vegetables {broccoli, carrot, lettuce} • Fruits and vegetables are marginally riffle independent

  14. Generative process for hierarchy • Rank fruits • Rank vegetables • Interleave fruits/vegetables • Rank junk food • Interleave healthy foods with junk food [Figure: these steps arranged as a tree, with an arrow labeled "better"] • Hierarchical riffle independent models: • encode intuitive independence constraints • can be learned with lower sample complexity • have interpretable parameters/structure
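
The generative process can be sketched as a recursion over the hierarchy: rank each leaf, then interleave the two results at every internal node, bottom-up. The `Leaf`/`Split` classes and the uniform leaf and interleaving distributions below are hypothetical placeholders; a learned model would plug in estimated distributions instead.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Sequence, Union

Sampler = Callable[[random.Random], List[str]]
Interleaver = Callable[[int, int, random.Random], Sequence[bool]]

@dataclass
class Leaf:
    sample_ranking: Sampler           # draws a ranking of this leaf's items

@dataclass
class Split:
    left: "Node"
    right: "Node"
    sample_interleaving: Interleaver  # mask: True where a left item goes

Node = Union[Leaf, Split]

def sample(node: Node, rng: random.Random) -> List[str]:
    """Rank each subtree, then interleave the two rankings."""
    if isinstance(node, Leaf):
        return node.sample_ranking(rng)
    left = sample(node.left, rng)
    right = sample(node.right, rng)
    mask = node.sample_interleaving(len(left), len(right), rng)
    it_l, it_r = iter(left), iter(right)
    return [next(it_l) if m else next(it_r) for m in mask]

def uniform_ranking(items: List[str]) -> Sampler:
    """Placeholder leaf distribution: every ranking equally likely."""
    return lambda rng: rng.sample(items, len(items))

def uniform_interleaving(p: int, q: int, rng: random.Random) -> List[bool]:
    """Placeholder interleaving distribution: pick p slots uniformly."""
    slots = set(rng.sample(range(p + q), p))
    return [t in slots for t in range(p + q)]

fruits = Leaf(uniform_ranking(["banana", "apple", "orange"]))
vegetables = Leaf(uniform_ranking(["broccoli", "carrot", "lettuce"]))
healthy = Split(fruits, vegetables, uniform_interleaving)
junk = Leaf(uniform_ranking(["candy", "cookies"]))
all_foods = Split(healthy, junk, uniform_interleaving)

print(sample(all_foods, random.Random(0)))
```

Each internal node only needs a distribution over interleavings of its two children, which is where the lower parameter count of a hierarchy comes from compared with a full distribution over all 8! rankings.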

  15. Contributions • Structured representation: introduction of a hierarchical model based on riffled independence factorizations • Structure learning objectives: an objective function for learning model structure from ranking data • Structure learning algorithms: efficient algorithms for optimizing the proposed objective

  16. Learning a Hierarchy • Exponentially many hierarchies, each encoding a distinct set of independence assumptions [Figure: seven example hierarchies over items {1,2,3,4,5}] • Problem statement: given i.i.d. ranked data, learn the hierarchical structure that generated the data

  17. Learning a Hierarchy • Our approach: top-down partitioning of the item set X = {1, …, n} • Binary splits at each stage [Figure: example: {12345} → {1234}, {5}; then {1234} → {1}, {234}]
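
The top-down recursion can be sketched independently of how a split is chosen. Here `find_split` is a hypothetical callable standing in for a split-finding routine (for example the KL-based search or the anchors heuristic described in later slides); a set it declines to split becomes a leaf.

```python
from typing import Callable, FrozenSet, Optional, Tuple, Union

Items = FrozenSet[int]
Tree = Union[Items, Tuple["Tree", "Tree"]]
SplitFn = Callable[[Items], Optional[Tuple[Items, Items]]]

def learn_hierarchy(items: Items, find_split: SplitFn) -> Tree:
    """Split the item set in two, then recurse on each side.
    A set that find_split returns None for becomes a leaf."""
    if len(items) <= 1:
        return items
    split = find_split(items)
    if split is None:
        return items
    a, b = split
    return (learn_hierarchy(a, find_split), learn_hierarchy(b, find_split))

# Hypothetical splitter that reproduces the example on the slide.
EXAMPLE = {
    frozenset({1, 2, 3, 4, 5}): (frozenset({1, 2, 3, 4}), frozenset({5})),
    frozenset({1, 2, 3, 4}): (frozenset({1}), frozenset({2, 3, 4})),
}
tree = learn_hierarchy(frozenset({1, 2, 3, 4, 5}), EXAMPLE.get)
print(tree)
```

Because each call commits to one binary split, the search never enumerates whole hierarchies; the cost is a greedy procedure whose quality depends entirely on the split finder.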

  18. Hierarchical Decomposition • Best KL hierarchy: {12345} → {1345}, {2}; then {1345} → {13}, {45} • Groups: {2} community psychologists, {13} research psychologists, {45} clinical psychologists • Minimize: KL(empirical || hierarchical model) • KL(true, best hierarchy) = 6.76e-02 • TV(true, best hierarchy) = 1.44e-01 [Figure: probability of each of the 120 permutations; "true" vs. learned first-order matrices, ranks vs. candidates]

  19. KL-based objective function • KL objective: • Minimize KL(true distribution || riffle independent approx.), which is the same as maximizing log-likelihood • Algorithm: • For each binary partitioning [A,B] of the item set: • Estimate parameters (ranking probabilities for A and B, and interleaving probabilities) • Compute the log-likelihood of the data • Return the maximum likelihood partition • Need to search over exponentially many subsets! • If the hierarchical structure of A or B is unknown, we might not have enough samples to learn the parameters!
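
A brute-force sketch of this algorithm for small n, under two assumptions not stated on the slide: rankings are full (tuples of items, best first), and each factor's parameters are estimated by empirical frequencies, its maximum likelihood estimate. Since a ranking maps one-to-one to (relative ranking of A, relative ranking of B, interleaving), the log-likelihood splits into three empirical terms. The loop visits all 2^(n-1) - 1 bipartitions, which is the exponential search the slide warns about.

```python
import math
from collections import Counter
from itertools import combinations
from typing import FrozenSet, Hashable, Iterable, List, Tuple

Ranking = Tuple[Hashable, ...]

def decompose(ranking: Ranking, A: FrozenSet) -> Tuple[Ranking, Ranking, Tuple[bool, ...]]:
    """(relative ranking of A, relative ranking of B, interleaving mask)."""
    rel_a = tuple(x for x in ranking if x in A)
    rel_b = tuple(x for x in ranking if x not in A)
    mask = tuple(x in A for x in ranking)
    return rel_a, rel_b, mask

def riffle_log_likelihood(data: List[Ranking], A: FrozenSet) -> float:
    """Log-likelihood under the riffle independent model for split [A, B],
    with every factor set to its empirical frequencies."""
    parts = [decompose(r, A) for r in data]
    m = len(data)
    ll = 0.0
    for k in range(3):  # the A factor, the B factor, the interleaving
        counts = Counter(p[k] for p in parts)
        ll += sum(c * math.log(c / m) for c in counts.values())
    return ll

def best_split(data: List[Ranking], items: Iterable[Hashable]):
    """Return (log-likelihood, A, B) of the maximum likelihood split."""
    items = sorted(items)
    first, rest = items[0], items[1:]
    best = None
    # Fixing `first` in A visits each unordered bipartition exactly once.
    for size in range(len(rest)):
        for others in combinations(rest, size):
            A = frozenset((first,) + others)
            ll = riffle_log_likelihood(data, A)
            if best is None or ll > best[0]:
                best = (ll, A, frozenset(items) - A)
    return best
```

For example, on the four hypothetical rankings `abcd`, `abdc`, `bacd`, `badc` the split A = {a, b} attains 8·log(1/2), the largest log-likelihood any model can reach on that data. Ties between splits are common on tiny samples, and the leaf factors here are unstructured, so large A or B sets need many samples, matching the slide's last point.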

  20. Finding fully independent sets by clustering • Compute pairwise mutual informations • Partition the resulting graph [Figure: graph over items, partitioned into sets A and B] • Why this won't work (for riffled independence): • Pairs of candidates on opposite sides of the split can be strongly correlated in general (if I vote up a Democrat, I am likely to vote down a Republican) • Pairwise measures are unlikely to be effective for detecting riffled independence!

  21. Higher order measure of riffled independence • Key insight: riffled independence means absolute rankings in A are not informative about relative rankings in B • Idea: measure mutual information between singleton rankings and pairwise rankings, e.g., between the preference over Democrat i and the relative preference over Republicans j & k • If i and (j,k) lie on opposite sides, mutual information = 0
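
The contrast between these two slides shows up even for the uniform distribution over rankings of a, b, c, d, which is riffle independent for every split. In the sketch below, the absolute ranks of a and c are dependent (two items cannot share a rank), while the absolute rank of a carries no information about whether c is ranked ahead of d. The plug-in estimator `empirical_mi` and the helper names are assumptions; the talk does not say which estimator is used.

```python
import math
from collections import Counter
from itertools import permutations
from typing import Hashable, List, Tuple

def empirical_mi(pairs: List[Tuple[Hashable, Hashable]]) -> float:
    """Plug-in mutual information (in nats) between the two coordinates."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(c / n * math.log(c * n / (px[x] * py[y]))
               for (x, y), c in joint.items())

def pairwise_mi(data, i, j):
    """MI between the absolute ranks of items i and j."""
    return empirical_mi([(r.index(i), r.index(j)) for r in data])

def triplet_mi(data, i, j, k):
    """MI between the absolute rank of i and whether j is ranked ahead of k."""
    return empirical_mi([(r.index(i), r.index(j) < r.index(k)) for r in data])

# Uniform over all 24 rankings: riffle independent for A={a,b}, B={c,d}.
uniform = list(permutations("abcd"))
print(pairwise_mi(uniform, "a", "c"))      # log(4/3) ≈ 0.2877, not zero
print(triplet_mi(uniform, "a", "c", "d"))  # 0.0
```

A pairwise clustering would therefore see an edge between a and c even though they lie on opposite sides of a valid split, while the tripletwise measure correctly reports zero.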

  22. Tripletwise objective function • Objective function: [Figure: item sets A and B; the objective's equation was not recovered] • Triplets with all items in set A play no role in the objective • Tripletwise measure: no longer obviously a graph cut problem… • Estimating the measure: what's the sample complexity?
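
The objective's equation did not survive extraction. Reading the slide's remarks together with the previous slide's zero-MI property as describing a sum of tripletwise mutual informations over triplets that straddle the split, a hedged sketch of that form follows. `mi(i, j, k)` is a hypothetical callable, for instance a plug-in estimate of the mutual information between the rank of i and the relative order of j and k; since that quantity is symmetric in j and k, each unordered pair is counted once. Under riffled independence every term is zero at the true split, so lower values favor a split.

```python
from itertools import combinations
from typing import Callable, Hashable, Iterable

def tripletwise_objective(A: Iterable[Hashable], B: Iterable[Hashable],
                          mi: Callable[[Hashable, Hashable, Hashable], float]) -> float:
    """Sum mi(i, j, k) over triplets with i on one side of the split and
    the pair j, k on the other. Triplets whose three items share a side
    are never visited, so they play no role in the value."""
    a_items, b_items = sorted(A), sorted(B)
    total = 0.0
    for single_side, pair_side in ((a_items, b_items), (b_items, a_items)):
        for i in single_side:
            for j, k in combinations(pair_side, 2):
                total += mi(i, j, k)
    return total

# With a constant stub mi, the value just counts the straddling triplets:
# 2 * C(3, 2) + 3 * C(2, 2) = 9 for A = {1, 2}, B = {3, 4, 5}.
print(tripletwise_objective({1, 2}, {3, 4, 5}, lambda i, j, k: 1.0))
```

Unlike a pairwise graph cut, each term couples three items, which is why the slide notes that minimizing this is no longer obviously a graph cut problem.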

  23. Estimating the objective • Objective function not directly available: need to estimate the mutual information terms • Strongly connected: … • Theorem: If A and B are riffle independent and each strongly connected, then OBJ is minimized exactly at [A,B] with prob. at least … given … samples.

  24. Efficient Splitting: Anchors heuristic • Given two elements of A (the anchor elements), we can decide whether each of the remaining elements is in A or B [Figure: anchor elements; mutual information is large for some elements and small for others]
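
A sketch of the anchors idea, under assumptions the slide does not spell out: the anchors a1, a2 are taken as given and known to lie in A, the size k = |A| is known, and `mi(x, a1, a2)` is a hypothetical callable scoring the dependence between the rank of x and the relative order of the anchors. An item of B should score near zero, since its absolute rank says nothing about a relative ranking within A, so the k - 2 highest-scoring items join A. This needs only n - 2 scores instead of a search over all subsets.

```python
from typing import Callable, Hashable, List, Set, Tuple

def anchor_split(items: List[Hashable], a1: Hashable, a2: Hashable, k: int,
                 mi: Callable[[Hashable, Hashable, Hashable], float]
                 ) -> Tuple[Set[Hashable], Set[Hashable]]:
    """Grow A from anchors a1, a2 (assumed in A, with |A| = k assumed known)
    by adding the k - 2 items whose scores mi(x, a1, a2) are largest."""
    others = [x for x in items if x not in (a1, a2)]
    ranked = sorted(others, key=lambda x: mi(x, a1, a2), reverse=True)
    A = {a1, a2, *ranked[:k - 2]}
    B = set(items) - A
    return A, B

# Hypothetical scores: item 3 depends on the anchors' order; 4 and 5 do not.
scores = {3: 0.8, 4: 0.02, 5: 0.01}
print(anchor_split([1, 2, 3, 4, 5], 1, 2, 3, lambda x, a, b: scores[x]))
```

With estimated rather than exact scores, the "near zero" items of B are only approximately zero, which is where the strong connectivity and sample size conditions of the surrounding theorems come in.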

  25. Efficient Splitting: Anchors heuristic • Theorem: the anchors heuristic is guaranteed to recover the riffle independent split (under certain strong connectivity assumptions) • In practice, the anchor elements a1, a2 are unknown!

  26. Learning a chain on synthetic data • 16 items, 4 items in each leaf [Plot: log-likelihood (about -6300 to -5400) vs. log10(# samples), from 1 to 5; curves: true structure, learned structure, random 1-chain (with learned parameters)]

  27. Anchors on Irish election data • Irish election hierarchy (first four splits): • {1,2,3,4,5,6,7,8,9,10,11,12,13,14} → {12} (Sinn Fein), {1,2,3,4,5,6,7,8,9,10,11,13,14} • {1,2,3,4,5,6,7,8,9,10,11,13,14} → {11} (Christian Solidarity), {1,2,3,4,5,6,7,8,9,10,13,14} • {1,2,3,4,5,6,7,8,9,10,13,14} → {1,4,13} (Fianna Fail), {2,3,5,6,7,8,9,10,14} • {2,3,5,6,7,8,9,10,14} → {2,5,6} (Fine Gael), {3,7,8,9,10,14} (Independents, Labour, Green) [Figure: "true" vs. learned first-order matrices, ranks vs. candidates] • Running time: brute force optimization: 70.2s; anchors method: 2.3s

  28. Meath log-likelihood comparison [Plot: log-likelihood (about -5000 to -4500) vs. # samples (logarithmic scale, 50 to 2000); curves: optimized hierarchy, optimized 1-chain]

  29. Bootstrapped substructures: Irish data [Plot: success rate (0 to 1) vs. # samples (76, 171, 389, 882, 2001); curves: major party {FF, FG} leaves recovered, top partition recovered, all leaves recovered, full tree recovered]

  30. Sushi ranking • Dataset: 5000 preference rankings of 10 types of sushi • Types: • 1. Ebi (shrimp) • 2. Anago (sea eel) • 3. Maguro (tuna) • 4. Ika (squid) • 5. Uni (sea urchin) • 6. Sake (salmon roe) • 7. Tamago (egg) • 8. Toro (fatty tuna) • 9. Tekka-maki (tuna roll) • 10. Kappa-maki (cucumber roll) [Figure: first-order matrix Prob(sushi i was ranked j), ranks vs. sushi] • Fatty tuna (Toro) is a favorite! • No one likes cucumber roll!

  31. Sushi hierarchy • {1,2,3,4,5,6,7,8,9,10} → {2} (sea eel), {1,3,4,5,6,7,8,9,10} • {1,3,4,5,6,7,8,9,10} → {4} (squid), {1,3,5,6,7,8,9,10} • {1,3,5,6,7,8,9,10} → {5,6} (sea urchin, salmon roe), {1,3,7,8,9,10} • {1,3,7,8,9,10} → {1} (shrimp), {3,7,8,9,10} • {3,7,8,9,10} → {3,8,9} (tuna, fatty tuna, tuna roll), {7,10} (egg, cucumber roll)

  32. Riffled Independence… • is a natural notion of independence for rankings • can be exploited for efficient inference, low sample complexity • approximately holds in many real datasets • Hierarchical riffled independence • captures more structure in data • structure can be learned efficiently: • related to clustering and to graphical model structure learning • efficient algorithm & polynomial sample complexity result Acknowledgements: Brendan Murphy and Claire Gormley provided the Irish voting datasets. Discussions with Marina Meila provided important initial ideas upon which this work is based.
