
SpotRank: A Robust Voting System for Social News Websites

Thomas Largillier, Guillaume Peyronnet, Sylvain Peyronnet (Univ Paris-Sud LRI, Nalrem Medias, Univ Paris-Sud LRI). WICOW’10. January 26, 2011. Presented by Somin Kim.



Presentation Transcript


  1. SpotRank: A Robust Voting System for Social News Websites • Thomas Largillier, Guillaume Peyronnet, Sylvain Peyronnet • Univ Paris-Sud LRI, Nalrem Medias, Univ Paris-Sud LRI • WICOW’10 • January 26, 2011 • Presented by Somin Kim

  2. Outline • Introduction • Related Work • SpotRank Algorithm • Experiments • Conclusion

  3. Introduction • On social news websites, users share content they find on the web and vote for the news items they like most • A vote for a news item is treated as a recommendation • News items with a sufficient number of recommendations are displayed on the front page

  4. Introduction • It is tempting for a user to resort to malicious techniques to gain visibility for his own websites • Reaching the front page of a website such as Digg is very attractive: it can bring thousands of unique visitors within one day • The top users act together to get the websites they support displayed on the front page, by: using daily mailing lists; posting hundreds of links; voting for themselves

  5. Outline • Introduction • Related Work • SpotRank Algorithm • Experiments • Conclusion

  6. Related Work • Spam countermeasures for social websites: identification-based methods (detection of spam and spammers); rank-based methods (demotion of spam); limit-based methods (preventing spam by making spam content difficult to publish) • Related fields of research: machine-learning-based ranking frameworks for social media; detection of click fraud in pay-per-click advertising; giving users a good selection of news • We focus on techniques that demote votes that are malicious or cast by users known to be malicious

  7. Outline • Introduction • Related Work • SpotRank Algorithm (Framework and principle; Proposing a spot; Voting for a spot; Detecting cabals) • Experiments • Conclusion

  8. SpotRank Algorithm: Framework and principle • Some notations: • U: a community of users who use the voting system • S: the set of spots • Spot: news or content proposed by any user • V: the set of all votes • A vote is a triple (u, s, v) where u, v ∈ U and s ∈ S

  9. SpotRank Algorithm: Framework and principle • Two votes do not necessarily have the same value • Each vote is assigned a score that depends on many factors • The higher the score of a spot, the closer the spot is to first place • Pertinence: the pertinence of a user depends on the pertinence of the spots he voted for, and vice versa

  10. SpotRank Algorithm: Framework and principle • Figure: voting process of SpotRank

  11. SpotRank Algorithm: Proposing a spot • When a user proposes a spot, its score must be initialized • n: the number of spots proposed by the user in the last 24 hours • m: the number of spots posted from the user’s IP address in the last 20 minutes • With this formula, we prevent effective “spot bombing” by spammers

  12. SpotRank Algorithm: Voting for a spot • Once a spot has been proposed, it can be “pushed” to the front page according to its score • The base value of a vote is the pertinence of the voter • This value is then modified according to several criteria to give the score of the vote • Voting is the most important part of the system, and the one on which spammers will concentrate • We propose a set of filters whose aim is to counter all the attacks a spammer could think of

  13. SpotRank Algorithm: Voting for a spot • Base value of a vote: pertinence • Pert(u) is the mean pertinence of the spots u voted for • Pert(s) is the score of spot s divided by the number of votes it received
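The two pertinence definitions on this slide can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: the function names, the `(score, n_votes)` tuple layout, and the fallback values for spots without votes and users without voting history are assumptions.

```python
from statistics import mean


def spot_pertinence(score: float, n_votes: int) -> float:
    """Pert(s): the spot's score divided by the number of votes it received."""
    # Assumption: a spot that has no votes yet gets pertinence 0.
    return score / n_votes if n_votes else 0.0


def user_pertinence(voted_spots: list[tuple[float, int]], default: float = 0.0) -> float:
    """Pert(u): mean pertinence of the spots u voted for.

    voted_spots holds one (score, n_votes) pair per spot the user voted for.
    """
    # Assumption: a user with no voting history gets a caller-chosen default.
    if not voted_spots:
        return default
    return mean(spot_pertinence(score, n) for score, n in voted_spots)
```

Because Pert(u) depends on Pert(s) and spot scores in turn depend on voter pertinence, a real system would have to recompute these values as votes arrive; the sketch only evaluates one step of that mutual dependency.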

  14. SpotRank Algorithm: Voting for a spot • High-frequency voting • A typical spammer votes for a lot of spots in a short amount of time • α4 is a reasonable time interval between two votes

  15. SpotRank Algorithm: Voting for a spot • Abusive one-way voting • A typical spammer uses several accounts: one clean account to propose spots, and several disposable accounts to vote for the spots proposed by the clean account • Users who vote for only one specific user will see their votes become useless
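The one-way voting rule can be illustrated as a multiplicative factor on the vote. The slide gives no formula, so everything here is an assumption: the function name, the 0/1 factor, and especially the hypothetical `min_votes` grace threshold, which is added only so that a newcomer's very first vote is not cancelled.

```python
def one_way_factor(proposers_voted_for: list[str], min_votes: int = 3) -> float:
    """Return 0.0 if all of a user's votes went to a single proposer, else 1.0.

    proposers_voted_for lists, for each vote the user cast, the account that
    proposed the spot. min_votes is a hypothetical grace threshold (not from
    the slides) below which the rule is not applied.
    """
    if len(proposers_voted_for) < min_votes:
        return 1.0
    # A single distinct proposer means the account only ever supports one user.
    return 0.0 if len(set(proposers_voted_for)) == 1 else 1.0
```

For example, a disposable account that cast five votes, all for spots proposed by the same clean account, gets factor 0.0, so its votes contribute nothing.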

  16. SpotRank Algorithm: Voting for a spot • Quick voting • A typical spammer proposes a spot and quickly votes for it • A spammer will not stay long on a given website • To avoid quick voting, we block any vote during the first minute after spot s appears on the site, and after that we use a stair function time(s) • t: current time
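A stair function of this kind can be sketched as follows. Only the one-minute block comes from the slide; the later steps (0.5 until five minutes, 0.75 until fifteen minutes, then full weight) and the assumption that the factor grows with the age of the spot are hypothetical values chosen for illustration.

```python
# (age limit in seconds, factor applied while the spot is younger than the limit)
# Only the first step (votes blocked during the first 60 s) is from the slides;
# the other steps are illustrative assumptions.
DEFAULT_STEPS = ((60, 0.0), (300, 0.5), (900, 0.75))


def time_factor(spot_created_at: float, t: float, steps=DEFAULT_STEPS) -> float:
    """Stair function: weight of a vote cast at time t on a spot created earlier."""
    age = t - spot_created_at
    for limit, factor in steps:
        if age < limit:
            return factor
    # Older than the last step: the vote counts fully.
    return 1.0
```

With these assumed steps, a vote cast 30 seconds after the spot appears is worth nothing, one cast after two minutes counts for half, and one cast after an hour counts fully.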

  17. SpotRank Algorithm: Voting for a spot • Multiple avatars and physical community • SpotRank demotes votes for a given spot if they come from the same IP address • A typical spammer has many accounts, and sometimes automatic voting mechanisms as well • These voting bots are often located on only a few servers, so they share the same IP address (or only a very few IP addresses) • n: the number of previous votes from this IP address

  18. SpotRank Algorithm: Voting for a spot • Avoiding the voting list effect • A group of people can unite their efforts to promote their own spots • This is classically done through daily mailing lists • If a user u votes for a user u’ and both users are in the same cluster, then the value of the vote is weighted by the inverse of the size of this cluster
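The cluster weighting stated on this slide maps directly to code. A minimal sketch, assuming the clusters have already been computed and are available as a hypothetical `cluster_of` mapping from user to the set of members of that user's cluster:

```python
def cabal_weight(voter: str, proposer: str,
                 cluster_of: dict[str, frozenset[str]]) -> float:
    """Weight of a vote by `voter` for a spot proposed by `proposer`.

    If both users belong to the same cluster, the vote is weighted by the
    inverse of the cluster size; otherwise it keeps full weight.
    """
    cluster = cluster_of.get(voter)  # None when the voter is in no cluster
    if cluster is not None and proposer in cluster:
        return 1.0 / len(cluster)
    return 1.0


# Usage: a four-member cluster divides internal votes by four.
ring = frozenset({"u1", "u2", "u3", "u4"})
cluster_of = {member: ring for member in ring}
```

The larger the voting ring, the less each internal vote is worth, so growing a mailing list does not increase its members' combined influence on their own spots.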

  19. SpotRank Algorithm: Voting for a spot • Summary: computation of the actual score of a vote

  20. SpotRank Algorithm: Voting for a spot • Computation of the score of a spot • The score of a spot is simply the sum of the scores of all votes for this spot plus the initial score of the spot • The score of a spot s is updated each time a user votes for it, and also periodically, since the time decay varies over time • Time decay is used to promote new spots over old, strong spots
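The sum described on this slide can be sketched with a small container class. The `Spot` class and its field names are illustrative assumptions; the time decay is deliberately left out because its formula is not part of this transcript.

```python
from dataclasses import dataclass, field


@dataclass
class Spot:
    """A proposed spot with its initial score and the scores of its votes."""
    initial_score: float
    vote_scores: list[float] = field(default_factory=list)

    def add_vote(self, vote_score: float) -> float:
        """Record an already-filtered vote score and return the updated spot score."""
        self.vote_scores.append(vote_score)
        return self.score()

    def score(self) -> float:
        """Spot score: initial score plus the sum of all vote scores."""
        return self.initial_score + sum(self.vote_scores)
```

Calling `score()` from a periodic job as well as from `add_vote` would mirror the two update triggers named on the slide, once a time-dependent term is added.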

  21. SpotRank Algorithm: Detecting cabals • We propose to group together people who massively vote for each other • We use the following algorithm, which should be run regularly to identify new cabals and update the existing ones
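The algorithm itself did not survive in this transcript. As a stand-in, and explicitly not the authors' method, one simple way to group users who massively vote for each other is to link two users when each has voted for the other at least `min_mutual` times (a hypothetical threshold) and take the connected components of the resulting graph:

```python
from collections import Counter, defaultdict


def find_cabals(votes, min_mutual: int = 3) -> list[frozenset[str]]:
    """Group users linked by strong mutual voting.

    votes is an iterable of (voter, proposer) pairs. Two users are linked when
    each voted for the other's spots at least min_mutual times (assumed rule).
    """
    counts = Counter((u, v) for u, v in votes if u != v)
    graph = defaultdict(set)
    for (u, v), c in counts.items():
        if c >= min_mutual and counts.get((v, u), 0) >= min_mutual:
            graph[u].add(v)
            graph[v].add(u)

    seen, cabals = set(), []
    for start in graph:
        if start in seen:
            continue
        # Depth-first traversal collects one connected component.
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(graph[node] - component)
        seen |= component
        cabals.append(frozenset(component))
    return cabals
```

Running such a routine on a schedule and replacing the stored clusters with its output would correspond to the "identify new cabals and update the existing ones" step.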

  22. Outline • Introduction • Related Work • SpotRank Algorithm • Experiments (Log analysis of spotrank.fr; Human evaluation) • Conclusion

  23. Experiments • To collect data about the behavior of SpotRank, spotrank.fr was launched • The data were collected from 09/07/2009 to 10/26/2009 • 15600 visits, 43000 page views • Average time spent by a visitor on the website: 2:37 minutes • We estimated that at least 10 to 15% of accounts belong to spammers

  24. Experiments: Log analysis of spotrank.fr • Figure: % of users with regard to pertinence (panels: 2009/07/23, 2009/09/08, 2009/10/26) • As time passes and the number of users grows, the pertinence of the users tends to spread out more • Two categories of users: non-relevant users, with pertinence(u) < 10 (mainly spammers), and relevant users, with pertinence(u) > 50 (except newcomers)

  25. Experiments: Log analysis of spotrank.fr • Figure: % of low- and high-pertinence users with regard to time (over 3 months) • The percentage of non-relevant users, including spammers, decreases while the percentage of relevant users increases

  26. Experiments: Log analysis of spotrank.fr • Figure: # users versus # proposed spots • The majority of users propose only a few spots (fewer than 3) • A few people have an oddly high number of proposed spots; most of them are spammers

  27. Experiments: Log analysis of spotrank.fr • Figure: % users with regard to # votes • Most users don’t vote a lot • The people who vote the most are clearly the ones we suspect to be spammers

  28. Experiments: Log analysis of spotrank.fr • Figure: # votes versus their scores • Most votes have a very low score • Most legitimate users seem to have votes with a score between 5 and 50

  29. Experiments: Human evaluation • We compared the top “stories” of spotrank.fr with those of two other major social news websites in France • Survey protocol • Periodically collect the first five spots on each website • Generate a webpage containing a shuffled list of the 15 news items • Send each webpage to a volunteer, who has to answer for each news item: • Yes: it is relevant for the news item to appear on the front page of a social news website • No: it is not relevant for the news item to appear on the front page of a social news website • DnK: he is not able to determine whether the news item deserves to be on the front page • Err: the news item was not accessible when he tried

  30. Experiments: Human evaluation • Figure: # answers of each type • The ranking given by SpotRank is of higher quality than that of the two other websites • The filtering of SpotRank gives clearer results

  31. Experiments: Human evaluation • Figure: rank with regard to the number of Yes, No, and DnK answers • The user satisfaction survey shows clearly that the filtering of SpotRank is perceived to be of high quality

  32. Outline • Introduction • Related Work • SpotRank Algorithm • Experiments • Conclusion

  33. Conclusion • We presented a robust voting system for social news websites that demotes the effect of manipulation • SpotRank clearly outperforms real competitors in a real-life web ecosystem
