Applications of Voting Theory to Information Mashups
Alfredo Alba, Varun Bhagwan, Julia Grace, Daniel Gruhl, Kevin Haas, Meenakshi Nagarajan, Jan Pieper, Christine Robson, Nachiketa Sahoo
Julia Grace, IBM Almaden Research
Overview
• BBC approached IBM in 2007
• Goal: create a better music chart that is more reflective of current tastes and trends in popular music
• Billboard charts are no longer relevant
  • Do not reflect music listened to and purchased online
• Looked to online music communities for data
  • Page views, music listens, blog posts
• We needed a way of combining these sources
• Y.A.M.? (Yet another mashup?)
Overview
• Traditional mashup
  • Google Maps + Craigslist
  • Music mashups: interweaving 2 tracks
  • Always the same modalities
  • Similar, homogeneous data sets
  • Combine “like” data by simple summation
• Information mashup
  • Data from disparate online music communities
  • Different modalities [views, listens, posts]
Overview
• New means of combining/mashing our data
• New methodology for mashups
• Our approach
  • Voting theory
  • Think of our data sources as constituents in an election
Music Mashup
• End goal: gauge popularity
• Challenging
  • Diverse data silos
  • Different sites have different demographics and user bases
  • Data volumes vary widely
    • MySpace: 13,697,565
    • Bebo: 10,194
  • Data itself comes in different flavors
• How do you represent the “voices” of each of these music communities in a single Top-10 list?
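To make the volume gap concrete, the arithmetic below computes each site's share if the two figures on this slide were simply summed. It is a minimal sketch: the slide does not say exactly what the figures count, so treating them as directly summable raw counts is an assumption.

```python
# Figures quoted on the slide. Treating them as raw counts that a
# simple summation would add together is an assumption.
volumes = {"MySpace": 13_697_565, "Bebo": 10_194}

total = sum(volumes.values())

# Share of the combined total contributed by each site.
shares = {site: count / total for site, count in volumes.items()}

for site, share in shares.items():
    print(f"{site}: {share:.4%}")
```

Under that assumption Bebo contributes well under 0.1% of the combined total, so its community would have almost no voice in a plain sum.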
Voting Theory
• Voting systems
  • Designed to combine many “voices” into a single decision that is representative of all communities
  • Different voting systems have different priorities, resulting in different outcomes
• You have to choose the voting system that is right for your circumstances
• We are not going to invent a new voting system
  • Examine several well-known systems
Example: US Presidential Election
• The US presidential election uses a delegate system (the Electoral College)
• Intended to keep states with larger populations from always drastically swaying elections
• This methodology was chosen because, at the time of implementation, that was what was important
• Bush vs. Gore, 2000 presidential election
How to Choose a Voting System?
• Voting theory: how “good” a voting system is varies from person to person and situation to situation
• A metric is needed to gauge the quality of a voting system for your circumstances
How to Choose a Voting System?
• Example: delegate system in the United States
  • Equal voice for each state by population was the priority
• How to evaluate the quality of a Top-10 list?
  • Ideally we would create lists and perform a massive user study to determine the best voting system
  • This is not feasible and does not scale
  • So we need some heuristics to gauge the quality of our lists
• Fortunately, this is a solved problem…
• Voting theory employs a “Social Welfare Function” to gauge the quality of a voting system
Social Welfare Functions
• What is a Social Welfare Function?
  • Definition: “Mathematical means to quantify the attributes that you prioritize in a voting system (e.g. all communities have a voice, most popular candidate wins)”
• Simple example
  • Situation: people only care if their first choice wins the election
  • Resulting Social Welfare Function: measure how many people had their first choice picked
• We will use a Social Welfare Function to determine which voting system is best for combining the data in our mashup into our Top-10 list of music
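The simple example above can be sketched in a few lines of Python. This is an illustration only: the function name `first_choice_welfare`, the ballots, and "Artist C" are hypothetical, and the score is reported as a fraction of voters rather than a raw count.

```python
def first_choice_welfare(ballots, winner):
    """Fraction of voters whose first choice matches the elected winner."""
    if not ballots:
        return 0.0
    satisfied = sum(1 for ballot in ballots if ballot and ballot[0] == winner)
    return satisfied / len(ballots)


# Hypothetical ballots, each ordered from most to least preferred.
ballots = [
    ["Rihanna", "Coldplay"],
    ["Coldplay", "Rihanna"],
    ["Rihanna", "Artist C"],
    ["Artist C", "Rihanna"],
]

score = first_choice_welfare(ballots, "Rihanna")
print(score)
```

A higher score means more voters got their first choice, so candidate voting systems can be compared by the score their winner earns.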
Well-Established Social Welfare Functions
• Spearman Footrule: Christine
  • Type A personality
  • Preservation of position in the rankings
  • Entire ranking reflected accurately
    • Middle-range artists should be in the middle, low-range towards the end, etc.
  • For example: Christine ranked Coldplay #2, so she will be happy if Coldplay is #2 in the final list
• Precision Optimal Aggregation: Julia
  • Representation (not rank)
  • For example: Julia had Rihanna in her list, so she would like Rihanna to be in the final list
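A rough sketch of the two measures, under stated assumptions. The footrule is taken in its usual form, the sum of absolute position differences between a source list and the final list, where lower is better. Placing artists that are missing from the final list one position past its end is an assumption, as is reading "precision" as the fraction of a source's artists that appear in the final list. The lists and the names "Artist A" and "Artist B" are hypothetical.

```python
def spearman_footrule(source, combined):
    """Sum of |position in source - position in combined| (lower is better).

    Assumption: an artist missing from `combined` is treated as sitting
    one place past its end.
    """
    position = {artist: i for i, artist in enumerate(combined, start=1)}
    missing = len(combined) + 1
    return sum(
        abs(i - position.get(artist, missing))
        for i, artist in enumerate(source, start=1)
    )


def precision(source, combined):
    """Fraction of the source's artists that appear anywhere in `combined`."""
    if not source:
        return 0.0
    present = set(combined)
    return sum(1 for artist in source if artist in present) / len(source)


# Hypothetical lists, ordered from rank #1 downward.
christine = ["Artist A", "Coldplay", "Rihanna"]
julia = ["Rihanna", "Artist B", "Coldplay"]
combined = ["Rihanna", "Coldplay", "Artist A"]

print(spearman_footrule(christine, combined), precision(christine, combined))
print(spearman_footrule(julia, combined), precision(julia, combined))
```

In this toy case Christine sees every one of her artists in the final list but in shuffled positions, while Julia loses one artist entirely, which is the distinction between caring about rank and caring about representation.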
Voting Systems
• We evaluated 8 well-established voting systems
• Important to keep in mind
  • We are not electing a single candidate, we are creating a rank-ordered list
  • Position of artists matters just as much as who is #1
Voting Systems
• Total vote (i.e. election by popular vote)
  • Tally counts, listens, etc. regardless of “type” of data
  • Easy to understand, very transparent
  • Modalities with very large amounts of data tend to dominate the vote
• Weighted votes
  • Use a multiplier so that postings count more than listens, delegate, count rank
• Semi-proportional
  • Each source gets the same number of votes regardless of how many people vote
• Delegates
  • Each source gets a set number of votes, decided in advance
• Simple Rank (Nauru)
  • Every candidate gets a position vote; the candidate with the smallest total is the winner
• Inverse Rank
  • Close to Simple Rank, except each position counts as 1/position and the biggest total wins
  • Gives more weight to being close to the top of the list
• Run-off
  • When ½ the sources agree on a candidate, that candidate is elected
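Four of these systems are simple enough to sketch directly. The code below is one plausible reading of the bullet descriptions, not the implementation used in the project: the counts are hypothetical, "semi-proportional" is interpreted as normalizing every source to the same total, and an artist missing from a source is given a position one past that source's last place in the simple-rank tally.

```python
from collections import defaultdict

# Hypothetical counts per source; not the data used in the talk.
counts = {
    "views": {"Rihanna": 9000, "Coldplay": 5000, "Artist C": 1000},
    "listens": {"Coldplay": 300, "Artist C": 200, "Rihanna": 100},
    "posts": {"Artist C": 30, "Coldplay": 20, "Rihanna": 10},
}


def ranked(scores, highest_first=True):
    """Order artists by score, breaking ties alphabetically."""
    sign = -1 if highest_first else 1
    return sorted(scores, key=lambda artist: (sign * scores[artist], artist))


def total_vote(counts):
    """Sum raw counts across sources regardless of modality."""
    totals = defaultdict(float)
    for source in counts.values():
        for artist, n in source.items():
            totals[artist] += n
    return ranked(totals)


def semi_proportional(counts):
    """Give every source the same total weight (assumed reading)."""
    totals = defaultdict(float)
    for source in counts.values():
        source_total = sum(source.values())
        for artist, n in source.items():
            totals[artist] += n / source_total
    return ranked(totals)


def source_positions(source):
    """Map each artist to its 1-based position within one source."""
    return {artist: i for i, artist in enumerate(ranked(source), start=1)}


def simple_rank(counts):
    """Sum positions across sources; the smallest total wins."""
    artists = {artist for source in counts.values() for artist in source}
    totals = {artist: 0 for artist in artists}
    for source in counts.values():
        position = source_positions(source)
        worst = len(position) + 1  # assumed position for unlisted artists
        for artist in artists:
            totals[artist] += position.get(artist, worst)
    return ranked(totals, highest_first=False)


def inverse_rank(counts):
    """Sum 1/position across sources; the largest total wins."""
    totals = defaultdict(float)
    for source in counts.values():
        for artist, p in source_positions(source).items():
            totals[artist] += 1 / p
    return ranked(totals)


for system in (total_vote, semi_proportional, simple_rank, inverse_rank):
    print(system.__name__, system(counts))
```

With these made-up numbers the total vote follows the high-volume "views" source and puts Rihanna first, while the other three systems put Coldplay first because it places well in every source.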
Election Setup
• Data preparation: crawled, extracted, cleaned, mined, analyzed…
• Applied a voting system
  • Total Vote, Nauru, Run-off, etc.
• Output: Top-10 list of popular artists
• Tested Top-10 list against SWF
  • Is Julia happy?
  • Is Christine happy?
Total Votes
• Total Votes: simple summing
• YouTube video view counts are so high they dominate all other communities
[Figure: Top-10 list (#1 to #10) with a key showing the contribution of each source to each artist's combined ranking, plus the Precision Optimal Aggregation SWF and Spearman Footrule SWF scores]
• Explanation: YouTube dominates all other music communities; it was coincidental that Bebo was also able to contribute to the rankings
• YouTube and Bebo are nearly sole contributors to Rihanna being #1
Nauru
• Election system used on the Pacific island nation of Nauru
• All communities contribute!
[Figure: Top-10 list (#1 to #10) with a key showing the contribution of each source to each artist's combined ranking, plus the Precision Optimal Aggregation SWF and Spearman Footrule SWF scores]
• Nauru maximized the Precision Optimal Aggregation SWF
• Significantly more even distribution of sources
Run-off
• From the top, select artists one at a time from each source in a fixed order
• All communities contribute!
[Figure: Top-10 list (#1 to #10) with a key showing the contribution of each source to each artist's combined ranking, plus the Precision Optimal Aggregation SWF and Spearman Footrule SWF scores]
• Run-off maximized the Spearman Footrule SWF
• Significantly more even distribution of sources
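The selection rule on this slide, taking artists from the top of each source one at a time in a fixed order, can be sketched as a round-robin pick. This follows only the one-line description here; the function name `round_robin_select`, the rule for skipping artists already chosen, and the example rankings are assumptions.

```python
def round_robin_select(source_rankings, k=10):
    """Visit sources in a fixed order, taking each one's best unused artist."""
    combined = []
    seen = set()
    cursors = [0] * len(source_rankings)

    while len(combined) < k:
        progressed = False
        for i, ranking in enumerate(source_rankings):
            # Skip artists that an earlier pick already placed in the list.
            while cursors[i] < len(ranking) and ranking[cursors[i]] in seen:
                cursors[i] += 1
            if cursors[i] < len(ranking):
                artist = ranking[cursors[i]]
                combined.append(artist)
                seen.add(artist)
                cursors[i] += 1
                progressed = True
                if len(combined) == k:
                    break
        if not progressed:
            break  # every source is exhausted
    return combined


# Hypothetical per-source rankings, best first.
sources = [
    ["Rihanna", "Coldplay", "Artist C"],
    ["Coldplay", "Artist D", "Rihanna"],
    ["Artist C", "Rihanna", "Artist E"],
]

top4 = round_robin_select(sources, k=4)
print(top4)
```

Because each source gets a turn before any source gets a second one, every community places at least one artist near the top, which fits the even distribution of sources noted on the slide.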
http://www.bbc.co.uk/soundindex/
Lessons Learned
• Choosing a voting methodology depends on what you prioritize
• Think hard about your Social Welfare Function
  • Deciding factor in how to combine data
  • How you measure the success of your mashup
Conclusion
• Novel approach to mashups
• We feel this is the future of information mashups from different modalities
Thank you
• Any questions?
• Julia Grace (jhgrace@us.ibm.com)