16 April 2012. Measuring the Quality of Web Content using Factual Information. WebQuality 2012 workshop at WWW 2012. Elisabeth Lex, Michael Voelske, Marcelo Errecalde, Edgardo Ferretti, Leticia Cagnina, Christopher Horn, Benno Stein and Michael Granitzer.
Agenda • Motivation • Approach • Results • Summary and Outlook
Motivation • People's decisions are often based on Web content • Lacking quality control, no verification • Inaccurate, incorrect information • No fact checking • Measures needed to capture credibility and quality aspects • With respect to facts!
Approach • Measure information quality based on factual information • 3 Approaches: • Use simple statistics about the facts obtained from text • Exploit relational information contained in facts • Use semantic relationships like meronymy and hypernymy • First approach: • Use simple statistical features about facts in a document • Indicates how informative a document is • Derive facts from Web content using Open Information Extraction
Definition of Factual Density • Fact Count • Factual Density
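The slide names the two measures, but their formulas did not survive in this transcript. The following is a minimal sketch, assuming that the fact count is the number of facts extracted from a document and that factual density is the fact count divided by the document size. Measuring size in whitespace-separated words is an assumption, and the `Fact` type, the function names and the example sentences are hypothetical.

```python
from typing import List, Tuple

# A fact is assumed to be a binary relation (e1, relation, e2),
# as produced by an Open Information Extraction system.
Fact = Tuple[str, str, str]


def fact_count(facts: List[Fact]) -> int:
    """Number of facts extracted from one document."""
    return len(facts)


def factual_density(facts: List[Fact], text: str) -> float:
    """Fact count normalised by document size (assumed here: word count)."""
    size = len(text.split())
    if size == 0:
        return 0.0  # avoid division by zero for empty documents
    return fact_count(facts) / size


# Hand-written illustration, not real extractor output.
text = "Graz is located in Austria. Graz has a university."
facts = [("Graz", "is located in", "Austria"), ("Graz", "has", "a university")]
density = factual_density(facts, text)
```

Normalising by size is what distinguishes the density from the raw count, so a long document does not score higher simply because it contains more sentences.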
Experiments • Wikipedia: 1000 Featured and Good articles versus 1000 Non-Featured (randomly selected) • Featured: a comprehensive coverage of the major facts in the context of the article's subject • Baseline: Word Count [Blumenstock 2008] • Featured articles are longer than non-featured • Bias: longer docs contain more facts • Evaluation: 2 Datasets • Unbalanced: articles differ in length • Balanced: articles similar in length
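The word-count baseline and the idea of a length-balanced dataset can be sketched as follows. The slides do not say how the balanced set was built, so the greedy pairing in `balance_by_length`, the `tolerance` parameter and the threshold used in the example are assumptions for illustration only.

```python
from typing import List, Tuple


def word_count(text: str) -> int:
    """Document length in whitespace-separated words."""
    return len(text.split())


def predict_featured(text: str, threshold: int) -> bool:
    """Word-count baseline: long articles are predicted to be featured."""
    return word_count(text) > threshold


def balance_by_length(featured: List[str], other: List[str],
                      tolerance: float = 0.1) -> List[Tuple[str, str]]:
    """Greedily pair each featured article with an unused non-featured one
    whose word count differs by at most `tolerance` (relative to the
    featured article's length). Unmatched articles are dropped."""
    pairs = []
    unused = sorted(other, key=word_count)
    for doc in sorted(featured, key=word_count):
        n = word_count(doc)
        for i, cand in enumerate(unused):
            if abs(word_count(cand) - n) <= tolerance * n:
                pairs.append((doc, unused.pop(i)))
                break
    return pairs


# Synthetic documents of 100, 50, 98 and 10 words.
featured = ["a " * 100, "b " * 50]
other = ["c " * 98, "d " * 10]
pairs = balance_by_length(featured, other)
```

On a set balanced this way the baseline has little to work with, because paired articles have nearly the same length; that is the setting in which a length-normalised measure such as factual density becomes informative.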
Experiments – Relational Features • Approach 2: exploiting relational information contained in facts • Extract relational features from articles • Use relations from ReVerb: binary relations (e1, relation, e2) • Use them to train a classifier to discriminate between featured/good and non-featured
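One way to turn binary relations into classifier input is a bag of relation phrases per article. The sketch below is an illustration under assumptions: the slides do not name the feature representation or the classifier, so the relation-phrase counts and the small Naive Bayes model are swapped-in choices, and the triples are hand-written examples rather than ReVerb output.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]  # (e1, relation, e2)


def relation_features(facts: List[Fact]) -> Counter:
    """Bag of relation phrases: how often each relation occurs in a document."""
    return Counter(rel.lower() for _, rel, _ in facts)


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing."""

    def fit(self, docs: List[Counter], labels: List[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        self.vocab = {f for d in docs for f in d}
        self.prior = {c: math.log(labels.count(c) / len(labels))
                      for c in self.classes}
        self.counts: Dict[str, Counter] = {c: Counter() for c in self.classes}
        for d, c in zip(docs, labels):
            self.counts[c].update(d)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, doc: Counter) -> str:
        def score(c: str) -> float:
            denom = self.totals[c] + len(self.vocab)
            # Relations never seen in training are ignored.
            return self.prior[c] + sum(
                n * math.log((self.counts[c][f] + 1) / denom)
                for f, n in doc.items() if f in self.vocab
            )
        return max(self.classes, key=score)


# Toy training data: one article per class (illustrative only).
train = [
    ([("Mozart", "was born in", "Salzburg"),
      ("Mozart", "composed", "operas")], "featured"),
    ([("The band", "is", "great"),
      ("The album", "is", "popular")], "non-featured"),
]
model = NaiveBayes().fit([relation_features(f) for f, _ in train],
                         [label for _, label in train])
label = model.predict(relation_features([("Haydn", "was born in", "Rohrau")]))
```

Only the relation phrase is used as a feature here; the entity arguments e1 and e2 could be added as further features in the same way.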
Summary • Simple fact-related measure: Factual Density • Based on Factual Density, featured/good articles can be separated from non-featured if article length is similar • If articles differ in length, use word count! For future work, a combination of both • Plan to incorporate edit history: more editors, higher factual density • Preliminary experiments with relational features • Promising results, more work in this direction • Goal here is to bring semantics into the field of Information Quality • We expect this to unlock several IQ dimensions, e.g. generality vs. specificity
Thank you for your attention! • Elisabeth Lex • elex@know-center.at