
Collaborative work beneath the surface

nola-duffy

Presentation Transcript


  1. Collaborative work beneath the surface • Visitors only look at article pages • But much of Wikipedia is made up of other pages • Conflict resolution, coordination, policies and procedures

  2. Types of work • Direct work (article pages): immediately consumable • Indirect work (talk, user, procedure pages): coordination, conflict • Maintenance work: reverts, vandalism
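One way to operationalize the three work types is a simple classifier over edit records. This is a minimal sketch, assuming each edit carries a hypothetical page label and a revert flag. The labels, the `Edit` fields, and the rule that a revert counts as maintenance whichever page it touches are illustrative assumptions, not the study's coding scheme.

```python
from dataclasses import dataclass

# Hypothetical page labels; real Wikipedia data identifies pages by namespace.
ARTICLE = "article"
INDIRECT = {"article_talk", "user", "user_talk", "procedure"}


@dataclass
class Edit:
    namespace: str   # hypothetical label such as "article" or "user_talk"
    is_revert: bool  # True if the edit restores an earlier revision


def work_type(edit: Edit) -> str:
    """Classify one edit as direct, indirect, or maintenance work."""
    # Assumption: reverts count as maintenance whichever page they touch.
    if edit.is_revert:
        return "maintenance"
    if edit.namespace == ARTICLE:
        return "direct"
    if edit.namespace in INDIRECT:
        return "indirect"
    return "other"  # pages outside the three categories on the slide


print(work_type(Edit("user_talk", False)))  # indirect
```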

  3. Less direct work • Decrease in proportion of edits to article pages (figure: 70%)

  4. More indirect work • Increase in proportion of edits to user talk pages (figure: 8%)

  5. More indirect work • Increase in proportion of edits to user talk pages • Increase in proportion of edits to procedure pages (figure: 11%)

  6. More maintenance work • Increase in proportion of edits that are reverts (figure: 7%)

  7. More wasted work • Increase in proportion of edits that are reverts • Increase in proportion of edits reverting vandalism (figure: 1-2%)

  8. Global level • Coordination costs are growing • Less direct work (articles) • More indirect work (article talk, user, procedure) • More maintenance work (reverts, vandalism) Kittur, Suh, Pendleton, & Chi, 2007
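The trends on slides 3 to 8 are shares of all edits within each time period. Here is a minimal sketch of that computation, assuming edits have already been reduced to hypothetical `(year, category)` pairs. The toy counts in the usage are invented for illustration and are not the study's data.

```python
from collections import Counter, defaultdict


def shares_by_year(edits):
    """Map each year to {category: fraction of that year's edits}.

    edits: iterable of (year, category) pairs, e.g. (2005, "article").
    """
    counts = defaultdict(Counter)
    for year, category in edits:
        counts[year][category] += 1
    shares = {}
    for year in sorted(counts):
        total = sum(counts[year].values())
        shares[year] = {cat: n / total for cat, n in counts[year].items()}
    return shares


# Invented toy data, not the study's counts.
toy = ([(2003, "article")] * 8 + [(2003, "user_talk")] * 2
       + [(2006, "article")] * 6 + [(2006, "procedure")] * 4)
print(shares_by_year(toy))
```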

  9. Article lifespan • How do articles change over time? • High levels of discussion and coordination (Kittur et al., 2007; Viégas et al., 2007) • When does this happen? • Hypothesis 1: early, when articles are growing • Hypothesis 2: late, when articles are more stable

  10. Article lifespan

  11. User lifespan • How do users change over time?

  12. Centralization in Wikipedia • How much centralization? • “Gang of 500” (Jimmy Wales, 2004) • Small group of ~500 does half the work • Masses do the work (Aaron Swartz, 2006) • New users add most of the words

  13. Hypotheses • Masses dominate • Elite privileged group • Shift from elites to masses • Technology adoption (Rogers, 1962)

  14. Elites • Admins • Editing status (fixed-size) • Editing status (scaling)

  15. Admins • Waxing and waning of admin influence (figure: proportion of all edits) • Nature News, 2/2007; Kittur, Chi, Pendleton, Suh, & Mytkowicz, 2007

  16. Admins • Similar pattern for changed words (figure: proportion of words changed)
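Admin influence can be tracked as the admins' share of edits and of changed words in each period. This is a sketch under stated assumptions: the `(period, user, words_changed)` record format is hypothetical, and the admin set is treated as fixed, although real admin status changes over time.

```python
from collections import defaultdict


def admin_share(edits, admins):
    """Per period: (admin share of edits, admin share of changed words).

    edits: iterable of (period, user, words_changed) tuples (hypothetical format).
    admins: set of admin user names, assumed fixed for simplicity.
    """
    totals = defaultdict(lambda: {"edits": 0, "admin_edits": 0,
                                  "words": 0, "admin_words": 0})
    for period, user, words in edits:
        t = totals[period]
        t["edits"] += 1
        t["words"] += words
        if user in admins:
            t["admin_edits"] += 1
            t["admin_words"] += words
    return {
        period: (t["admin_edits"] / t["edits"],
                 t["admin_words"] / t["words"] if t["words"] else 0.0)
        for period, t in sorted(totals.items())
    }
```

Computing both shares in one pass mirrors slides 15 and 16, which plot edit share and word share separately.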

  17. Elites • Admins • Editing status (fixed-size) • Editing status (scaling)

  18. Editing status (fixed size)

  19. Elites • Admins • Editing status (fixed-size) • Editing status (scaling)

  20. Editing status (scaling) • Proportional influence of elites is still high • Even though the absolute number of elites is growing

  21. Summary: Centralization • Centralized elite influence is waning • Decline in admin influence • Decline in data-driven “Gang of 500” • Decentralized proportional influence remains high • Top 1/3/5% of users account for ~50/70/80% of edits • The “Bourgeoisie”
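The two elite definitions differ in how the group is sized. A fixed-size elite is a set number of top editors, while a scaling elite is a set percentage of all editors. Here is a minimal sketch of both shares, assuming a hypothetical mapping from editor to edit count. `top_n_share` and `top_fraction_share` are illustrative names.

```python
def top_n_share(edit_counts, n):
    """Fixed-size elite: share of all edits made by the n most active editors."""
    counts = sorted(edit_counts.values(), reverse=True)
    return sum(counts[:n]) / sum(counts)


def top_fraction_share(edit_counts, fraction):
    """Scaling elite: share of edits made by the top `fraction` of editors.

    The elite size is rounded to the nearest whole editor, with a minimum of one.
    """
    k = max(1, round(fraction * len(edit_counts)))
    return top_n_share(edit_counts, k)


# Hypothetical counts: one heavy editor and 99 one-edit editors.
counts = {"big": 901, **{f"u{i}": 1 for i in range(99)}}
print(top_fraction_share(counts, 0.01))
```

As the community grows, `top_n_share` with a fixed `n` can fall even while `top_fraction_share` stays high, which is the contrast between slides 18 and 20.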

  22. Challenges for Wikipedia • Coordination costs • Organization structure • Conflict

  23. Characterizing conflict

  24. Conflict at the article level • What leads to conflict in articles? • Build a characterization model of article conflict • Identify page features and metrics associated with conflict • Automatically identify high-conflict articles

  25. Page metrics • Chose metrics for identifying conflict in articles • Easily computable, scalable

  26. Defining conflict • Operational definition of conflict • Revisions tagged as controversial • Conflict revision count
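The page metrics and the conflict revision count are simple tallies over a page's revision history. Here is a minimal sketch, assuming hypothetical `Revision` records. The `tagged_controversial` flag stands in for however controversy tags are actually detected, and the same function would be applied separately to an article and to its talk page.

```python
from dataclasses import dataclass


@dataclass
class Revision:
    user: str                   # account name, or IP address if anonymous
    anonymous: bool
    minor: bool
    tagged_controversial: bool  # hypothetical flag for a controversy tag


def page_metrics(revisions):
    """Easily computable per-page counts used as candidate conflict predictors."""
    return {
        "revisions": len(revisions),
        "minor_edits": sum(r.minor for r in revisions),
        "unique_editors": len({r.user for r in revisions}),
        "anonymous_edits": sum(r.anonymous for r in revisions),
    }


def conflict_revision_count(revisions):
    """Operational conflict measure: revisions carrying the controversy tag."""
    return sum(r.tagged_controversial for r in revisions)
```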

  27. Machine learning • Predict conflict from page metrics • Training set of “controversial” pages • Support vector machine regression predicting the number of controversial revisions (SMOreg; Smola & Schölkopf, 1998) • Not just conflict vs. no conflict, but how much conflict

  28. Performance: Cross-validation • 5-fold cross-validation, R² = 0.897

  29. Performance: Cross-validation • 5-fold cross-validation, R² = 0.897
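Cross-validated R² can be computed by pooling out-of-fold predictions. This sketch is hedged in two ways. Folds are assigned round-robin rather than shuffled, and a one-feature least-squares line (`fit_line`, a hypothetical stand-in) replaces the SMOreg support vector regressor named on slide 27.

```python
def kfold_r2(xs, ys, fit, predict, k=5):
    """Pooled out-of-fold R^2 from k-fold cross-validation.

    fit(train_xs, train_ys) returns a model; predict(model, x) returns a float.
    Folds are assigned round-robin (index % k) rather than by shuffling.
    """
    preds = [0.0] * len(xs)
    for fold in range(k):
        train = [i for i in range(len(xs)) if i % k != fold]
        held_out = [i for i in range(len(xs)) if i % k == fold]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        for i in held_out:
            preds[i] = predict(model, xs[i])
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


def fit_line(xs, ys):
    """Hypothetical stand-in model: least-squares line y = a + b * x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b


def predict_line(model, x):
    a, b = model
    return a + b * x
```

Because every prediction comes from a model that never saw that point, the pooled R² estimates how well the model generalizes rather than how well it fits its training data.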

  30. Determinants of conflict • Highly weighted metrics of conflict model: • Revisions (talk) • Minor edits (talk) • Unique editors (talk) • Revisions (article) • Unique editors (article) • Anonymous edits (talk) • Anonymous edits (article)

  31. Identifying untagged articles • Detect conflicts for unlabeled articles • Majority of articles have never been conflict tagged • Testing model generalization • Applied model to untagged articles • Sample of 28 articles rated by 13 expert Wikipedians • Significant positive correlation with predicted scores • By rank correlation, p < 0.013 (Spearman’s rho)
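Spearman's rho compares the model's predicted conflict scores with the expert ratings by rank alone. Here is a minimal sketch that uses average ranks for ties. It returns only the coefficient, not the p-value reported on the slide.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 0-based positions i..j, made 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation of the average ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5
```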

  32. Characterizing conflict

  33. Conflict at the user level • How can we identify conflict between users? • Reverts between users as a proxy for user conflict • Force-directed layout to cluster users • Group similar viewpoints • Find conflicts between groups
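Treating reverts as conflict edges yields a weighted user graph, which a force-directed layout can then position. Here is a minimal sketch of the graph construction only, assuming a hypothetical list of `(reverting_user, reverted_user)` pairs. Dropping self-reverts and ignoring direction are illustrative choices.

```python
from collections import Counter


def revert_graph(reverts):
    """Undirected weighted graph of reverts between pairs of users.

    reverts: iterable of (reverting_user, reverted_user) pairs (hypothetical format).
    Returns a Counter mapping frozenset({u, v}) to the number of reverts between them.
    """
    edges = Counter()
    for reverter, reverted in reverts:
        if reverter != reverted:  # self-reverts say nothing about conflict
            edges[frozenset((reverter, reverted))] += 1
    return edges
```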

  34. Dokdo/Takeshima opinion groups (figure: Groups A, B, C, D)

  35. Terri Schiavo (figure: user groups labeled Anonymous (vandals/spammers), Sympathetic to husband, Mediators, Sympathetic to parents)

  36. Cognitive atlas

  37. Visualizing hypotheses

  38. Distributed collaboration • Lots of people • Each doing a little bit of work • Leads to a high-quality outcome (i.e., “wisdom of crowds”) • (illustration: Francis Galton, scale, ox)

  39. Distributed collaboration • Applications of distributed collaboration: • Judging: weight of an ox, temperature of a room • Search: Google PageRank • Predicting: Iowa Electronic Markets, Las Vegas, HP • Filtering: Digg, Reddit • Organizing: del.icio.us • Common characteristics: • Independent judgments • Independent aggregation
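The two shared characteristics, independent judgments and independent aggregation, can be illustrated with a small simulation. This sketch uses hypothetical parameters (true value, crowd size, noise level): each guess is drawn independently, and the guesses are then averaged. Averaging carries a useful guarantee. By the triangle inequality, the error of the mean is never larger than the average individual error.

```python
import random
import statistics


def crowd_estimate(true_value, n, noise_sd, seed=0):
    """Simulate n independent noisy guesses and aggregate them by their mean."""
    rng = random.Random(seed)
    # Independent judgments: each guess is drawn without seeing the others.
    guesses = [rng.gauss(true_value, noise_sd) for _ in range(n)]
    # Independent aggregation: a fixed rule combines guesses after the fact.
    aggregate = statistics.mean(guesses)
    mean_individual_error = statistics.mean(abs(g - true_value) for g in guesses)
    return guesses, aggregate, mean_individual_error


guesses, aggregate, typical_error = crowd_estimate(1200, 800, 100)
print(abs(aggregate - 1200), typical_error)
```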

  40. Wikipedia and the wisdom of crowds • But these are not characteristic of Wikipedia: • Independent judgments: instead, high coordination costs (Kittur et al., 2007) • Independent aggregation: instead, competitive aggregation (everyone is editing the same information) • To the extent that judgments and aggregation of individual tasks are not independent and instead require coordination and engender conflict, having more editors may not be beneficial and may even be harmful

  41. Travesty of the commoners? • Increasing group size generally has negative consequences: • Increased coordination costs • Increased anonymity and social loafing • Decreased attribution and individual reward • More negative social relations • Greater conflict and misbehavior • Loss of control • Cognitive overload (see Bettenhausen, 1991; Levine & Moreland, 1990)

  42. Wilkinson & Huberman, 2007 • Examined featured vs. non-featured articles • Controlling for PageRank (i.e., popularity) • Featured articles have more edits and more editors • “More work, better outcome”: Wikipedia is similar to other distributed collaboration systems • Nature News (2/27/07)

  43. Problem: Distribution of work • However, articles can have different distributions of work, even with the same numbers of edits and editors • If an article has 1000 edits and 100 editors, it could have: • 1 editor making 901 edits and 99 editors making 1 edit each • 100 editors making 10 edits each

  44. Capturing skew • Gini coefficient: measures inequality of a distribution • Measure the Gini coefficient for each article • Count how many edits each editor makes, then compute the Gini coefficient of those counts • If an article is driven by a few editors, Gini → 1 • If an article is driven by many editors, Gini → 0 http://en.wikipedia.org/wiki/Gini_coefficient
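Here is a minimal sketch of the per-article Gini computation, using the sorted-rank form of the standard definition. The helper names are illustrative. The usage reproduces the two hypothetical 1000-edit, 100-editor articles from slide 43. With n editors the largest possible value is (n - 1)/n, so "Gini → 1" is approached only as the number of editors grows.

```python
from collections import Counter


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly equal)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    # Sorted-rank form: G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n
    weighted = sum(i * x for i, x in enumerate(xs, start=1))
    return 2 * weighted / (n * total) - (n + 1) / n


def article_gini(editor_of_each_edit):
    """Count edits per editor for one article, then take the Gini of the counts."""
    return gini(Counter(editor_of_each_edit).values())


skewed = ["e0"] * 901 + [f"e{i}" for i in range(1, 100)]  # 1 x 901, 99 x 1
even = [f"e{i}" for i in range(100) for _ in range(10)]   # 100 x 10
print(round(article_gini(skewed), 3))  # 0.891
print(round(article_gini(even), 3))    # 0.0
```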

  45. Old results

  46. P(Featured | Gini quintile)
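P(Featured | Gini quintile) can be estimated by sorting articles by Gini, cutting them into five equal-count bins, and taking the share of featured articles in each bin. Here is a minimal sketch, assuming hypothetical `(gini, is_featured)` pairs. It does not reproduce the slide's data.

```python
def featured_by_quintile(articles):
    """Estimate P(featured | Gini quintile).

    articles: list of (gini, is_featured) pairs (hypothetical input).
    Bin 0 holds the lowest Gini values; ties at bin edges fall by sort order.
    """
    ranked = sorted(articles, key=lambda a: a[0])
    n = len(ranked)
    result = []
    for q in range(5):
        chunk = ranked[q * n // 5:(q + 1) * n // 5]
        featured = sum(1 for _, is_featured in chunk if is_featured)
        result.append(featured / len(chunk) if chunk else float("nan"))
    return result
```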

  47. Unique editors x Edits

  48. New results • Sampled articles at a variety of quality levels • Quality levels defined and rated by expert Wikipedians • Hundreds of thousands of articles rated

  49. Cross-sectional analysis • 900 articles sampled from Start through Featured quality levels • Higher quality associated with higher Gini and more editors
