
ZOT! To Wikipedia Vandalism. PAN-10 @ CLEF 2010 Shared Task #2. Rebecca Maessen and James White


Presentation Transcript


  1. ZOT! To Wikipedia Vandalism. PAN-10 @ CLEF 2010 Shared Task #2. Rebecca Maessen and James White. Data Set • The low number of vandalism edits in the PAN@CLEF data gave little information on patterns distinguishing vandalism from regular edits • Added 5,276 manually classified vandalism edits from the research of West et al. • Combined data set: 20,283 edits • 31% “ill-intentioned” edits, compared to 6% before

  2. Algorithms • Logistic regression on word vector and W-J48 decision trees on metadata features • Logistic regression and W-J48 decision trees after combining the features • Ensemble methods: Bagging, Boosting and Random Forest

  3. Combining word vector and metadata attributes
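The slide content for this step did not survive in the transcript, so the following is only a minimal sketch of what combining the two feature families could look like: a bag-of-words count vector is concatenated with a few metadata attributes into one row per edit. The field names (`text`, `comment`, `anonymous`) and the three metadata attributes are assumptions chosen for illustration, not the features the authors used.

```python
from collections import Counter


def build_vocabulary(texts, max_words=1000):
    """Return the most frequent lowercase tokens across all texts."""
    counts = Counter(tok for t in texts for tok in t.lower().split())
    return [word for word, _ in counts.most_common(max_words)]


def word_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]


def metadata_features(edit):
    """Hypothetical metadata attributes (assumed, not taken from the slides)."""
    text = edit["text"]
    letters = sum(c.isalpha() for c in text)
    upper = sum(c.isupper() for c in text)
    return [
        1.0 if edit["anonymous"] else 0.0,      # edit made without an account
        float(len(edit["comment"])),            # length of the edit comment
        upper / letters if letters else 0.0,    # share of uppercase letters
    ]


def combined_features(edit, vocabulary):
    """Concatenate word counts and metadata into a single feature row."""
    return word_vector(edit["text"], vocabulary) + metadata_features(edit)


# Two made-up edits, purely for demonstration.
edits = [
    {"text": "YOU ALL SUCK", "comment": "", "anonymous": True},
    {"text": "Added a citation for the population figure",
     "comment": "add source", "anonymous": False},
]
vocab = build_vocabulary(e["text"] for e in edits)
rows = [combined_features(e, vocab) for e in edits]
```

Each row can then be handed to a single learner, which is one way to let a decision tree or logistic regression see both kinds of evidence at once.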

  4. Ensemble Methods

  5. 10-fold cross-validation on bagging with W-J48graft
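As a rough illustration of the evaluation named on this slide, the sketch below runs 10-fold cross-validation over a bagged classifier and reports the F-measure. It deliberately swaps in a one-feature threshold rule (a decision stump) as the base learner in place of W-J48graft, and the toy data, the number of bagged models, and all function names are assumptions made for the example.

```python
import random


def train_stump(rows, labels):
    """Fit a one-feature threshold rule (depth-1 tree) with the fewest training errors."""
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({r[f] for r in rows}):
            for positive_above in (True, False):
                errors = sum(
                    ((r[f] > threshold) == positive_above) != bool(y)
                    for r, y in zip(rows, labels)
                )
                if best is None or errors < best[0]:
                    best = (errors, f, threshold, positive_above)
    _, f, threshold, positive_above = best
    return lambda r: int((r[f] > threshold) == positive_above)


def train_bagging(rows, labels, n_models, rng):
    """Train each model on a bootstrap sample and combine them by majority vote."""
    n = len(rows)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        models.append(train_stump([rows[i] for i in idx], [labels[i] for i in idx]))
    return lambda r: int(2 * sum(m(r) for m in models) > len(models))


def f_measure(truth, predicted):
    """Harmonic mean of precision and recall for the positive (vandalism) class."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def cross_validate(rows, labels, k=10, n_models=11, seed=0):
    """Pool held-out predictions over k folds and score them once."""
    rng = random.Random(seed)
    order = list(range(len(rows)))
    rng.shuffle(order)
    folds = [order[i::k] for i in range(k)]
    truth, predicted = [], []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in order if i not in held_out]
        model = train_bagging([rows[i] for i in train],
                              [labels[i] for i in train], n_models, rng)
        truth.extend(labels[i] for i in fold)
        predicted.extend(model(rows[i]) for i in fold)
    return f_measure(truth, predicted)


# Toy data: the first feature separates the classes, the second is noise.
toy_rows = [[x, x % 3] for x in range(40)]
toy_labels = [1 if x >= 20 else 0 for x in range(40)]
score = cross_validate(toy_rows, toy_labels)
```

Pooling the held-out predictions before scoring gives one F-measure for the whole data set; averaging per-fold scores is a common alternative and may give slightly different numbers.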

  6. ROC and AUC plot
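The plot itself is not part of the transcript. For readers unfamiliar with it, the following sketch shows one standard way to obtain ROC points and the area under the curve from classifier scores: sort edits by score, sweep the decision threshold downward, and integrate with the trapezoid rule. The labels and scores are invented for the example and are not results from the presentation.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) pairs, one per distinct score."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), reverse=True)
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        score = ranked[i][0]
        # Consume all items tied at this score before emitting a point.
        while i < len(ranked) and ranked[i][0] == score:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / negatives, tp / positives))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoid rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Invented example: 1 = vandalism, 0 = regular edit.
example_labels = [1, 1, 0, 1, 0, 0]
example_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
example_auc = auc(roc_points(example_labels, example_scores))
```

In this example eight of the nine vandalism/regular pairs are ranked correctly, so the area comes out at 8/9; a classifier that assigns every edit the same score would get 0.5.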

  7. Conclusion • Results were better than expected beforehand • Achieved an F-measure that is as good as results from previous work • Decision trees are a favorable approach to the Wikipedia vandalism problem • Top-down feature analysis and statistical information from word vectors are both relevant to classifying vandalism
