
Vandalism Detection in Wikipedia using Trustworthy Ranking and Semantic Context Analysis



Presentation Transcript


  1. Vandalism Detection in Wikipedia using Trustworthy Ranking and Semantic Context Analysis • Presented by: • Deepika Sethi • Raga Sowmya

  2. Introduction • What is vandalism • Problem: in general, around 8% of Wikipedia edits are vandalism • Types of vandalism • Ex.: abusive words, changing dates

  3. Research goals and problem statement • Detection of vandalism using context-aware approaches • Two approaches: 1. Trustworthy Ranking based on a trustworthy search engine 2. Semantic Context Analysis using an ontology

  4. Motivating Example • These days criminals, adults, teens, and even kids are vandalizing web pages. • As a result, bogus information is published, which misleads users. This should stop somewhere!

  5. Approach 1: Trustworthy Ranking based on a trustworthy search engine

  6. Extract words • Use a search engine to collect the top-ranked documents for each word • Check the co-occurrence probability of each word with its corresponding page • A probability that is too low might imply the word is out of context and the edit is vandalism • Feature extraction followed by classification trained on labeled data
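
The co-occurrence check on this slide can be sketched roughly as follows. This is only an illustration under stated assumptions, not the presenters' implementation: the search-engine lookup is replaced by a hypothetical in-memory dictionary (`top_documents`), the co-occurrence probability is assumed to be the fraction of retrieved documents that mention every word of the page title, and the `threshold` value is arbitrary.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())

def cooccurrence_probability(page_terms, documents):
    """Fraction of documents that contain every term of the page title."""
    if not documents:
        return 0.0
    wanted = set(page_terms)
    hits = sum(1 for doc in documents if wanted <= set(tokenize(doc)))
    return hits / len(documents)

def flag_out_of_context(edit_text, page_title, top_documents, threshold=0.2):
    """Return edited words whose co-occurrence with the page is below threshold.

    top_documents is a hypothetical stand-in for the search engine: it maps
    each word to the documents ranked highest for that word.
    """
    page_terms = tokenize(page_title)
    flagged = []
    for word in tokenize(edit_text):
        prob = cooccurrence_probability(page_terms, top_documents.get(word, []))
        if prob < threshold:
            flagged.append(word)
    return flagged

# Toy data invented for illustration only.
top_documents = {
    "curry": [
        "Butter chicken is a curry of chicken in tomato sauce.",
        "A classic curry recipe: butter chicken with rice.",
    ],
    "stupid": [
        "That was a stupid joke.",
        "Stupid mistakes to avoid when coding.",
    ],
}

print(flag_out_of_context("curry stupid", "Butter chicken", top_documents))
```

In this toy run "curry" co-occurs with the page title in every retrieved document, while "stupid" never does, so only "stupid" is flagged as possibly out of context.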

  7. Facebook • Classifier: Decision Tree

  8. Butter chicken • Classifier : Bayes Net

  9. Powerlifting • Classifier: Decision Tree

  10. Folic acid • Classifier : Adaboost

  11. United States Dollar • Classifier : Decision Tree

  12. Approach 2: Semantic Context Analysis using an ontology

  13. Extract words • Use DBpedia to check the relationships of each word with its corresponding page • Calculate the semantic distance of each word from its corresponding page • A distance that is too high might imply the word is out of context and the edit is vandalism • Feature extraction followed by classification trained on labeled data
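
One way to picture the semantic-distance step is as a shortest-path search over a graph of related concepts. The sketch below is an assumption-laden illustration: the slides do not say how the distance is computed, so breadth-first path length is swapped in here, and the DBpedia lookup is replaced by a hypothetical hand-written graph (`toy_graph`). The `max_distance` cutoff is likewise arbitrary.

```python
from collections import deque

def semantic_distance(graph, source, target):
    """Number of edges on the shortest path from source to target, or None."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None  # no relationship found at all

def flag_distant_words(words, page, graph, max_distance=2):
    """Return words that are unreachable or farther than max_distance from page."""
    flagged = []
    for word in words:
        distance = semantic_distance(graph, word, page)
        if distance is None or distance > max_distance:
            flagged.append(word)
    return flagged

# Hypothetical relationship graph standing in for DBpedia links.
toy_graph = {
    "Folic acid": ["Vitamin", "Pregnancy"],
    "Vitamin": ["Folic acid", "Nutrition"],
    "Pregnancy": ["Folic acid"],
    "Nutrition": ["Vitamin", "Food"],
    "Food": ["Nutrition", "Pizza"],
    "Pizza": ["Food"],
}

print(flag_distant_words(["Vitamin", "Pizza", "Robot"], "Folic acid", toy_graph))
```

Here "Vitamin" is one hop from the page, "Pizza" is four hops away, and "Robot" has no path, so the last two are flagged.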

  14. Butter Chicken • Classifier: Decision Tree

  15. Facebook • Classifier : Decision Tree

  16. Folic Acid • Classifier: Adaboost

  17. United States Dollar • Classifier : Bayes Net

  18. Powerlifting • Classifier: Decision Tree

  19. Comparison to related work • Current vandalism detection methods are based on either machine-learning or rule-based approaches. • Machine-learning approaches use a set of features and a set of training data to determine whether an edit is vandalism or not. • Rule-based approaches manifest themselves in the form of automatic Wikipedia bots. Ex.: deleting all the content of an article
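
To make the machine-learning idea concrete, here is a minimal sketch of learning a vandalism rule from labeled features. It uses a one-level decision tree (a decision stump), which is far simpler than the Decision Tree, Bayes Net, and AdaBoost classifiers named on the result slides. The feature layout (lowest co-occurrence probability in the edit, number of words changed) and the training rows are hypothetical.

```python
def train_stump(samples):
    """Pick the (feature, threshold) split with the fewest training errors.

    samples is a list of (features, label) pairs; label is True for vandalism.
    The stump predicts vandalism when features[feature] <= threshold.
    """
    best = None
    n_features = len(samples[0][0])
    for feature in range(n_features):
        for threshold in sorted({x[feature] for x, _ in samples}):
            errors = sum((x[feature] <= threshold) != label for x, label in samples)
            if best is None or errors < best[0]:
                best = (errors, feature, threshold)
    _, feature, threshold = best
    return feature, threshold

def predict(stump, features):
    """Apply a trained stump to one feature tuple."""
    feature, threshold = stump
    return features[feature] <= threshold

# Hypothetical training rows: (lowest co-occurrence probability, words changed).
training = [
    ((0.05, 3), True),
    ((0.10, 12), True),
    ((0.60, 4), False),
    ((0.85, 9), False),
]

stump = train_stump(training)
print(stump)
print(predict(stump, (0.02, 5)), predict(stump, (0.70, 5)))
```

On this toy data the stump learns to split on the first feature, treating edits with a very low co-occurrence probability as vandalism.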

  20. Conclusion • In our project, we implemented two approaches and compared them. • We observed that the first approach gives better results than the second approach.
