Reputation Systems for Open Collaboration. CACM 2010. Bo Adler, Luca de Alfaro, Ashutosh Kulshreshtha, Ian Pye. Reviewed by: Minghao Yan.
Introduction
• Open collaboration:
  • egalitarian, meritocratic, self-organizing
  • efficient, but with challenges
    • quality: spam, vandalism
    • trust: how much can you rely on the content?
• Reputation systems:
  • compute reputation scores for objects within a domain, based on the objects' own content or on external ratings
  • help stem abuse
  • offer indications of content quality
  • regulate people's interactions in open collaboration
• Relevance to our course content
  • recommendation systems
  • PageRank and HITS are "page" reputation systems
Content-driven vs. User-driven
WikiTrust
• a reputation system for wiki authors and content
• goals:
  • incentivize users to make lasting contributions
  • help increase the quality of content and spot vandalism
  • offer a guide to the quality of content
• consists of:
  • user reputation system
    • gain reputation: when a user's edits are preserved by later authors
    • lose reputation: when a user's edits are later undone by other users
  • content reputation system
    • gain reputation: when text is revised by a high-reputation user
    • lose reputation: when text is disturbed by edits
User Reputation System
• assumptions:
  • content evolves as a sequence of revisions made by different authors
  • it is possible to compare two revisions and measure their difference
  • it is possible to track unchanged content across revisions
• user reputation:
  • depends on the quality and quantity of the contributions a user makes
• contribution quality:
  • good quality: the change is preserved in subsequent revisions
  • bad quality: the change is rolled back in subsequent revisions
• how do we measure how good a contribution is?
Contribution Quality
• relies on an edit distance function d:
  • d(r, r') = how many words have been deleted, inserted, replaced, or displaced in going from r to r'
  • language independent
• revisions involved in judging b:
  • a: a past revision
  • b: the current revision (the one being judged)
  • c: a future revision
• -1 <= q(b | a, c) <= 1
  • q(b | a, c) = 1: revision b is fully preserved
  • q(b | a, c) = -1: revision b is fully reverted
• newly created revisions cannot be judged until later revisions exist
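The slide gives only the bounds of q. A minimal sketch of one formula that satisfies them is q(b | a, c) = (d(a, c) - d(b, c)) / d(a, b). This formula is an assumption here, not something stated on the slide. The distance below is a plain word-level Levenshtein distance and does not model displaced words, so it is a simplification of the d described above. The names `word_edit_distance` and `contribution_quality` are illustrative.

```python
def word_edit_distance(r1: str, r2: str) -> int:
    """Word-level Levenshtein distance (insert, delete, replace).

    Simplification: the distance on the slide also counts displaced
    (moved) words; this sketch does not model moves.
    """
    a, b = r1.split(), r2.split()
    prev = list(range(len(b) + 1))
    for i, wa in enumerate(a, start=1):
        curr = [i]
        for j, wb in enumerate(b, start=1):
            cost = 0 if wa == wb else 1
            curr.append(min(prev[j] + 1,          # delete wa
                            curr[j - 1] + 1,      # insert wb
                            prev[j - 1] + cost))  # replace or keep
        prev = curr
    return prev[-1]


def contribution_quality(a: str, b: str, c: str) -> float:
    """Assumed formula: q(b | a, c) = (d(a, c) - d(b, c)) / d(a, b)."""
    d_ab = word_edit_distance(a, b)
    if d_ab == 0:
        return 0.0  # b changed nothing, so there is nothing to judge
    q = (word_edit_distance(a, c) - word_edit_distance(b, c)) / d_ab
    return max(-1.0, min(1.0, q))  # clamp as a safety net


past = "the cat sat"
current = "the cat sat down"
print(contribution_quality(past, current, "the cat sat down quietly"))
print(contribution_quality(past, current, past))
```

In the first call the future revision keeps the added word, so the edit counts as preserved; in the second the future revision equals the past one, so the edit counts as reverted.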
User Reputation
• only non-negative reputation values are considered
• a new user is assigned a reputation close to 0
• revisions used when judging a revision:
  • 5 subsequent, 5 preceding, 2 previous by high-reputation authors, and 2 previous with high average text reputation
  • why? to make the system difficult to subvert
• calculating user reputation:
  • r(B) = k * d(a,b) * q(b | a,c) * log(r(C))
  • r(B) is the reputation increment of author B of revision b
  • r(C) is the reputation of author C of revision c
  • why a logarithm? it balances the influence that users of different reputation have on the update
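The update rule above can be sketched as follows. Two details are assumptions made for this illustration: the code uses log(1 + r(C)) so that a judge with reputation near 0 contributes a weight near 0 rather than a negative one, and the scaling constant `k = 0.1` is an arbitrary placeholder. Clamping at zero reflects the "only non-negative reputation values" bullet.

```python
import math


def reputation_increment(d_ab: float, q: float, r_judge: float, k: float = 0.1) -> float:
    """Increment for author B: k * d(a, b) * q(b | a, c) * log(1 + r(C)).

    d_ab    : size of B's edit, d(a, b)
    q       : contribution quality q(b | a, c), in [-1, 1]
    r_judge : reputation r(C) of the author of the future revision c
    k       : scaling constant (hypothetical value)
    """
    return k * d_ab * q * math.log1p(r_judge)


def apply_update(r_author: float, increment: float) -> float:
    """Reputation never drops below zero."""
    return max(0.0, r_author + increment)


# A preserved 10-word edit judged by authors of different reputation.
for r_c in (0.0, 10.0, 1000.0):
    print(r_c, reputation_increment(d_ab=10, q=1.0, r_judge=r_c))
```

A judge with 100 times more reputation moves the author's score by only a few times more, which is the balancing effect the logarithm is meant to provide.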
User Reputation
• resistant to manipulation
  • the only way to damage a user's reputation is to revert that user's revisions
• maintains fairness and resists Sybil attacks
  • increase the reputation of B only if C has a higher reputation
  • Sybil attack: creating fake identities to gain reputation
• evaluation
  • how well does user reputation predict the quality of future contributions?
  • recall is high: high-reputation users are unlikely to be reverted
  • precision is low: many novice authors make good contributions
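To make the evaluation bullets concrete, here is a small sketch that treats "author reputation below a threshold" as a prediction that an edit will be reverted, then computes precision and recall. The data is made-up toy data and the threshold is arbitrary; neither comes from the paper.

```python
def precision_recall(edits: list[tuple[float, bool]], threshold: float) -> tuple[float, float]:
    """edits: (author_reputation, was_reverted) pairs.

    An edit is flagged as "likely to be reverted" when the author's
    reputation is below the threshold.
    """
    flagged = [reverted for rep, reverted in edits if rep < threshold]
    total_reverted = sum(1 for _, reverted in edits if reverted)
    true_pos = sum(1 for reverted in flagged if reverted)
    precision = true_pos / len(flagged) if flagged else 0.0
    recall = true_pos / total_reverted if total_reverted else 0.0
    return precision, recall


# Hypothetical toy data: every revert comes from a low-reputation author,
# but most low-reputation authors are not reverted.
toy_edits = [
    (0.1, True), (0.2, False), (0.3, False), (0.5, True), (0.4, False),
    (5.0, False), (8.0, False), (9.0, False),
]
print(precision_recall(toy_edits, threshold=1.0))
```

In this toy set all reverted edits are flagged (high recall), while most flagged edits turn out fine (low precision), which mirrors the pattern described above.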
Content Reputation
• informative, robust, explainable
• how? based on the extent to which the content has been revised and on the reputation of the authors of the revisions
  • edited part: assigned a small fraction of the author's reputation
  • unchanged part: gains reputation
• tweaks
  • deleting or re-arranging text leaves a low-reputation mark
  • raise reputation only up to the author's own reputation
  • associate each word with the last few editing authors who raised the text's reputation
  • handle block moves
  • weight by edit distance
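A minimal sketch of the basic rule above, at word granularity. The fractions `new_frac` and `gain` are hypothetical constants chosen for illustration, and the tweaks (deletion marks, per-word author lists, block moves, edit distance weighting) are not modeled.

```python
def update_text_reputation(
    word_reps: list[float],
    edited: list[bool],
    author_rep: float,
    new_frac: float = 0.2,   # hypothetical: share of author reputation given to new text
    gain: float = 0.3,       # hypothetical: how far unchanged text moves toward the author
) -> list[float]:
    """Return per-word reputation after one revision by an author."""
    updated = []
    for rep, is_edited in zip(word_reps, edited):
        if is_edited:
            # New or changed text starts with a small fraction of the author's reputation.
            updated.append(new_frac * author_rep)
        elif rep < author_rep:
            # Unchanged text gains reputation, but never beyond the author's own.
            updated.append(rep + gain * (author_rep - rep))
        else:
            # Text already above the author's reputation is left as it is.
            updated.append(rep)
    return updated


print(update_text_reputation([1.0, 6.0, 0.0], [False, False, True], author_rep=5.0))
```

Moving unchanged text a fixed fraction of the way toward the author's reputation is one simple way to respect the "only up to the author's own reputation" cap.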
Crowdsensus
• a reputation system for analyzing user edits to Google Maps
• goals
  • measure the accuracy of users who contribute information
  • reconstruct the likely correct listing information
• design space
  • relies on the existence of ground truth
  • user reputation is not visible
  • the notion of identity is stronger
  • global computation is possible
Crowdsensus
• input
  • triple (u, a, v): user u asserts that attribute a has value v
• structure: fixpoint graph algorithm
  • vertices are users and attributes
  • for each (u, a, v), insert an edge labeled v from u to a and back
  • each user vertex is associated with a truthfulness value q_u
• iterations
  • all q_u are initialized to an a-priori default
  • each user vertex sends (q_u, v) pairs to the attribute vertices
  • an attribute inference algorithm derives a probability distribution over the values (v1, v2, ..., vn)
  • each attribute vertex sends back to the user vertex the probability that the asserted value vi is correct
  • a truthfulness inference algorithm estimates the truthfulness of each user
  • repeat for another iteration
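The iteration above can be sketched as a small fixpoint loop. The slides do not specify the actual attribute or truthfulness inference algorithms, so this sketch substitutes two simple stand-ins: a truthfulness-weighted vote for attribute inference, and the mean probability of a user's claims for truthfulness inference. The function name, the prior of 0.7, and the sample data are all hypothetical.

```python
from collections import defaultdict


def crowdsensus_sketch(triples, prior=0.7, iterations=10):
    """triples: list of (user, attribute, value) assertions.

    Returns (truthfulness per user, value distribution per attribute).
    Stand-in inference steps; not the production Crowdsensus algorithms.
    """
    q = {u: prior for u, _, _ in triples}  # a-priori truthfulness q_u
    belief = {}
    for _ in range(iterations):
        # Attribute inference (stand-in): truthfulness-weighted vote.
        weights = defaultdict(lambda: defaultdict(float))
        for u, a, v in triples:
            weights[a][v] += q[u]
        belief = {
            a: {v: w / sum(vals.values()) for v, w in vals.items()}
            for a, vals in weights.items()
        }
        # Truthfulness inference (stand-in): mean probability that
        # the values a user asserted are correct.
        claimed = defaultdict(list)
        for u, a, v in triples:
            claimed[u].append(belief[a][v])
        q = {u: sum(p) / len(p) for u, p in claimed.items()}
    return q, belief


# Hypothetical assertions about one business listing.
triples = [
    ("u1", "phone", "555-0100"), ("u2", "phone", "555-0100"), ("u3", "phone", "555-0199"),
    ("u1", "hours", "9-5"), ("u2", "hours", "9-5"), ("u3", "hours", "9-6"),
]
truthfulness, belief = crowdsensus_sketch(triples)
print(truthfulness)
print(belief["phone"])
```

Because users who agree with the emerging consensus gain truthfulness, and truthful users weigh more in the next vote, the two estimates reinforce each other across iterations.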
Crowdsensus
• heart of Crowdsensus: the attribute inference algorithm
  • standard algorithm: Bayesian inference
  • works poorly on real cases
    • pieces of information are not independent
    • business attributes have different characteristics
• complete system
  • handles attributes with multiple correct values
  • deals with spam
  • protects the system from abuse
  • is integrated with other data pipeline components
Design Space
• content-driven vs. user-driven
• is the reputation system visible to users?
• weak identity vs. strong identity
• existence of ground truth
  • affects which algorithm is used
• chronological vs. global reputation updates
  • a global model can use information in the graph topology (PageRank, HITS)
  • a chronological model can leverage the past and the future to prevent attacks (e.g., Sybil attacks)
Conclusion
• reputation systems are the online equivalent of the body of laws that regulates people's interactions in the real world
• reputation systems give users ways to evaluate content and improve the level of trust
• the design of a reputation system should take different aspects into account
• reputation systems should be robust and resistant to attacks (otherwise there is no trust)
• reputation systems with a population-dynamic approach
• reputation systems with multiple goals
Pros
• well-defined characteristics and goals of reputation systems
• discussion of design aspects and their influence on reputation systems
• detailed description of the WikiTrust implementation tweaks that protect the system from abuse and attacks
• the comparison of two content-driven systems is well illustrated and supports the discussion of system design considerations
• good evaluation measures of system accuracy on real wiki data
Cons
• lacks a deeper explanation of the algorithms in Crowdsensus
• lacks evidence on real data that the Crowdsensus algorithm performs better than standard Bayesian inference
• lacks a comparison of the performance of user-driven and content-driven models, and of how the two can work together