Vandalism Detection in Wikipedia using Trustworthy Ranking and Semantic Context Analysis

Presented by: • Deepika Sethi • Raga Sowmya

Introduction • What is vandalism: deliberate activity that compromises Wikipedia's integrity • Problem: in general, around 8% of Wikipedia edits are vandalism • Types of vandalism • Examples: abusive words, changing dates

Research goals and problem statement • Detection of vandalism using context-aware approaches • Two approaches: 1. Trustworthy Ranking based on a trustworthy search engine 2. Semantic Context Analysis using an ontology

Motivating Example • These days criminals, adults, teens, and even kids vandalize web pages. • As a result, bogus information is published, which misleads users. This should stop somewhere!

Approach 1: Trustworthy Ranking based on a Trustworthy search engine

Extract words • Use a search engine to collect the top-ranked documents for each word • Check the co-occurrence probability of each word with its corresponding page • A probability that is too low might imply that the word is out of context and the edit is vandalism • Feature extraction, followed by classification trained on data
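The steps above can be sketched roughly as follows. This is a minimal illustration, not the presenters' implementation: the slides do not say how the co-occurrence probability was computed, so here it is assumed to be the fraction of a word's top-ranked documents that also mention the page title. The names `cooccurrence_probability` and `flag_out_of_context`, the sample documents, and the threshold of 0.2 are all hypothetical, and no real search engine is queried.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cooccurrence_probability(page_title, documents):
    """Fraction of documents that mention every token of the page title.

    Assumption: this ratio stands in for the co-occurrence probability
    named on the slide; the original formula is not given.
    """
    if not documents:
        return 0.0
    title_tokens = set(tokenize(page_title))
    hits = sum(1 for doc in documents if title_tokens <= set(tokenize(doc)))
    return hits / len(documents)


def flag_out_of_context(page_title, docs_by_word, threshold=0.2):
    """Return the words whose co-occurrence probability is below the threshold."""
    return sorted(
        word
        for word, docs in docs_by_word.items()
        if cooccurrence_probability(page_title, docs) < threshold
    )


# Hypothetical stand-in for the top-ranked search results of each edited word.
docs_by_word = {
    "curry": [
        "Butter chicken is a curry made with tomato and cream.",
        "A mild curry: how to cook butter chicken at home.",
        "Curry powder buying guide.",
    ],
    "spaceship": [
        "The spaceship reached orbit after launch.",
        "Spaceship design for long missions.",
    ],
}

flagged = flag_out_of_context("Butter chicken", docs_by_word)
print(flagged)
```

In this toy data, "curry" co-occurs with the page title in two of three documents, while "spaceship" never does, so only the latter falls below the assumed threshold. In the approach described on the slide, such probabilities would become features for a trained classifier rather than a final verdict.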

Facebook • Classifier: Decision Tree

Butter Chicken • Classifier: Bayes Net

Powerlifting • Classifier: Decision Tree

Folic Acid • Classifier: AdaBoost

United States Dollar • Classifier: Decision Tree

Approach 2: Semantic Context Analysis using Ontology

Extract words • Use DBpedia to check the relationships of each word with its corresponding page • Calculate the semantic distance of each word from its corresponding page • A distance that is too high might imply that the word is out of context and the edit is vandalism • Feature extraction, followed by classification trained on data
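One way to picture the semantic distance step is as a shortest-path search over a graph of relationships. The sketch below is an assumption-laden illustration: the slides do not define the distance measure, so hop count over an undirected graph is used here, a small hand-written edge list stands in for DBpedia (which is not queried), and `max_distance=3` as well as treating unreachable words as out of context are arbitrary choices made for the example.

```python
from collections import deque


def build_undirected(edges):
    """Build an adjacency map in which every relationship works both ways."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def semantic_distance(graph, source, target):
    """Number of relationship hops on the shortest path, or None if unreachable."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in graph.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return None


def is_out_of_context(graph, word, page, max_distance=3):
    """Flag a word that is unreachable from the page or too many hops away."""
    dist = semantic_distance(graph, word, page)
    return dist is None or dist > max_distance


# Hypothetical relationships standing in for an ontology lookup.
edges = [
    ("Folic acid", "Vitamin"),
    ("Vitamin", "Nutrition"),
    ("Nutrition", "Spinach"),
    ("Spaceship", "Rocket"),
]
graph = build_undirected(edges)

print(semantic_distance(graph, "Spinach", "Folic acid"))
print(is_out_of_context(graph, "Spaceship", "Folic acid"))
```

Breadth-first search is used because every relationship counts as one hop; a weighted measure would need a different algorithm. As in the first approach, the distances would serve as classifier features rather than as a decision on their own.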

Butter Chicken • Classifier: Decision Tree

Facebook • Classifier: Decision Tree

Folic Acid • Classifier: AdaBoost

United States Dollar • Classifier: Bayes Net

Powerlifting • Classifier: Decision Tree

Comparison to related work • Current vandalism detection methods are based on either machine-learning or rule-based approaches. • Machine-learning approaches use a set of features and a set of training data to determine whether an edit is vandalism or not. • Rule-based approaches manifest themselves in the form of automatic Wikipedia bots that react to fixed patterns, e.g. an edit deleting all the content of an article.
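To make the contrast concrete, here is a small sketch of a rule-based check built on simple edit features. It is only an illustration: the feature names, the `blanking_rule` function, and the sample texts are hypothetical and do not describe any actual Wikipedia bot. In a machine-learning approach the same feature dictionary would be passed to a trained classifier instead of a hand-written rule.

```python
def extract_features(old_text, new_text):
    """Compute a few size-based features for one edit (illustrative only)."""
    old_len = len(old_text.strip())
    new_len = len(new_text.strip())
    # Share of the old content that disappeared; 0.0 if the page grew.
    fraction_removed = max(0.0, 1 - new_len / old_len) if old_len else 0.0
    return {
        "old_length": old_len,
        "new_length": new_len,
        "fraction_removed": fraction_removed,
    }


def blanking_rule(features):
    """Rule: an edit that removes all content of a non-empty article is suspicious."""
    return features["old_length"] > 0 and features["new_length"] == 0


blanked = extract_features("Folic acid is a B vitamin.", "   ")
normal = extract_features("Folic acid is a B vitamin.", "Folic acid is a B vitamin (B9).")

print(blanking_rule(blanked))
print(blanking_rule(normal))
```

A rule like this is easy to audit but only catches the pattern it encodes, which is the usual motivation for learning a classifier from many features and labeled edits.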

Conclusion • In our project, we implemented two approaches and compared them. • We observed that the first approach gives better results than the second approach.
