Vandalism Detection in Wikipedia using Trustworthy Ranking and Semantic Context Analysis • Presented by: • Deepika Sethi • Raga Sowmya
Introduction • What is vandalism? • Problem: in general, around 8% of Wikipedia edits are vandalism • Types of vandalism • Examples: abusive words, changed dates
Research goals and problem statement • Detection of vandalism using context-aware approaches • Two approaches: 1. Trustworthy Ranking based on a Trustworthy search engine 2. Semantic Context Analysis using Ontology
Motivating Example • These days criminals, adults, teens and even kids are vandalizing web pages. • As a result, bogus information is published, which misleads users. This has to stop somewhere!
Approach 1: Trustworthy Ranking based on a Trustworthy search engine
• Extract words • Use a search engine to collect the top-ranked documents for each word • Check the co-occurrence probability of each word with its corresponding page • A probability that is too low might imply that the word is out of context, and therefore vandalism • Feature extraction, followed by classification with a classifier trained on the data
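The co-occurrence check in the steps above can be sketched as follows. This is a minimal illustration, not the presenters' implementation: the function names, the 0.2 threshold, and the toy documents are all assumptions, and the search engine step is replaced by a hand-written dictionary of "retrieved" documents per word.

```python
import re


def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cooccurrence_probability(page_title, documents):
    """Fraction of retrieved documents that contain every token of the page title."""
    if not documents:
        return 0.0
    title_tokens = set(tokenize(page_title))
    hits = sum(1 for doc in documents if title_tokens <= set(tokenize(doc)))
    return hits / len(documents)


def flag_out_of_context(word_to_docs, page_title, threshold=0.2):
    """Return the words whose co-occurrence probability is below the threshold.

    The threshold is a hypothetical value chosen for this toy example.
    """
    return sorted(
        word
        for word, docs in word_to_docs.items()
        if cooccurrence_probability(page_title, docs) < threshold
    )


# Toy stand-in for "top-ranked documents for each word" (assumed data).
retrieved = {
    "curry": [
        "Butter chicken is a curry of chicken in a spiced tomato sauce.",
        "A curry recipe: butter, chicken, cream and tomato.",
        "Thai green curry with tofu.",
    ],
    "carburetor": [
        "A carburetor mixes air and fuel for an engine.",
        "How to clean a motorcycle carburetor.",
    ],
}

suspicious = flag_out_of_context(retrieved, "Butter chicken")
print(suspicious)
```

In this sketch the per-word probability would be one of the features passed to the classifier, rather than a final verdict on its own.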
Facebook • Classifier: Decision Tree
Butter Chicken • Classifier: Bayes Net
Powerlifting • Classifier: Decision Tree
Folic Acid • Classifier: AdaBoost
United States Dollar • Classifier: Decision Tree
Approach 2: Semantic Context Analysis using Ontology • Extract words • Use DBpedia to check the relationships between each word and its corresponding page • Calculate the semantic distance between each word and its corresponding page • A distance that is too high might imply that the word is out of context, and therefore vandalism • Feature extraction, followed by classification with a classifier trained on the data
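One simple way to picture a semantic distance is the number of hops between two concepts in a relationship graph. The sketch below assumes that reading; the slides do not say which distance measure was used. The graph is a hand-made toy rather than data fetched from DBpedia, and the function names and the `max_distance` cutoff are hypothetical.

```python
from collections import deque


def build_undirected(edges):
    """Build an adjacency map from (concept, concept) relationship pairs."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def semantic_distance(graph, source, target):
    """Shortest-path hop count between two concepts, or None if unconnected."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def is_out_of_context(graph, word, page, max_distance=2):
    """Treat a word as out of context if it is unconnected or too far from the page."""
    dist = semantic_distance(graph, word, page)
    return dist is None or dist > max_distance


# Toy relationship graph (assumed data, not real DBpedia triples).
graph = build_undirected([
    ("Butter chicken", "Indian cuisine"),
    ("Indian cuisine", "Curry"),
    ("Curry", "Spice"),
    ("Carburetor", "Engine"),
])

print(semantic_distance(graph, "Curry", "Butter chicken"))
print(is_out_of_context(graph, "Carburetor", "Butter chicken"))
```

As with the first approach, the distance would serve as a feature for the classifier; the fixed cutoff here only makes the toy example easy to read.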
Butter Chicken • Classifier: Decision Tree
Facebook • Classifier: Decision Tree
Folic Acid • Classifier: AdaBoost
United States Dollar • Classifier: Bayes Net
Powerlifting • Classifier: Decision Tree
Comparison to related work • Current vandalism detection methods are based on either machine-learning or rule-based approaches. • Machine-learning approaches use a set of features and a set of training data to determine whether an edit is vandalism or not. • Rule-based approaches take the form of automatic Wikipedia bots that apply fixed rules, for example flagging an edit that deletes all the content of an article.
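To make the rule-based idea concrete, here is a minimal sketch of a single blanking rule. It is an illustration only and not the logic of any actual Wikipedia bot; the function name and the 10% cutoff are assumptions.

```python
def is_blanking(old_text, new_text, max_remaining_ratio=0.1):
    """Flag an edit that removes (almost) all of an article's content.

    max_remaining_ratio is a hypothetical cutoff: the edit is flagged when the
    new revision keeps at most this fraction of the old revision's length.
    """
    old_len = len(old_text.strip())
    new_len = len(new_text.strip())
    if old_len == 0:
        # Nothing to blank in an empty article.
        return False
    return new_len / old_len <= max_remaining_ratio


article = "Folic acid is a B vitamin. " * 20
print(is_blanking(article, ""))
print(is_blanking(article, article + "It is found in leafy greens."))
```

A rule like this is cheap and transparent but only catches the patterns someone wrote down, which is the contrast with the feature-based classifiers used in the two approaches.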
Conclusion • In our project, we implemented two approaches and compared them. • We observed that the first approach (Trustworthy Ranking) gives better results than the second approach (Semantic Context Analysis).