This review examines the importance, features, and challenges of fact checking and automatic fact checking systems. It explores methods such as deep learning and reasoning techniques, and discusses the integration of existing tools. The review also highlights the process of collecting information, analyzing claims, and matching them with evidence.
Review on Fact Checking and Automatic Fact Checking Systems
Deep Learning Research & Application Center
17 October 2017
Claire Li
Fact Checking
• Factual claims are those that can be verified, e.g. "The room measures ten feet by twelve feet"
• Fact checking is a form of knowledge-based news content verification: it assigns a truth value (possibly a degree of truth) to a factual claim made in a particular context
• Important features include context, time, speaker, multiple sources (URLs), evidence, etc.
• Fact-checkers reach verdicts on factual claims by investigating relevant data and documents, then publish their verdicts
• An automatic fact checking system typically
• Pre-restricts the task to claims that can be fact-checked objectively (scope of the task), for example
• Spots check-worthy factual claims; how: related publications [1], [2]
• Verifies check-worthy factual claims automatically; how: related publications [1], [2]; websites: fullfact.org
• Hybrid technologies: deep learning approaches + reasoning techniques over a world knowledge base
• Integrating existing tools
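The features the slide lists (context, time, speaker, sources, evidence, verdict) can be collected into one record per claim. A minimal sketch, assuming hypothetical field names not taken from any specific system:

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class FactualClaim:
    """A verifiable factual claim plus the context features the slide
    lists: context, time, speaker, sources (URLs), evidence, verdict."""
    text: str
    speaker: Optional[str] = None
    date: Optional[str] = None
    context: Optional[str] = None
    source_urls: List[str] = field(default_factory=list)
    evidence: List[str] = field(default_factory=list)
    verdict: Optional[str] = None  # e.g. "true", "mostly true", ..., "false"

# Example claim from the slide, annotated with a verdict
claim = FactualClaim(
    text="The room measures ten feet by twelve feet",
    context="illustrative example",
)
claim.verdict = "true"
```

Keeping sources and evidence as lists reflects the "multiple sources" feature above: one claim can accumulate several URLs and evidence snippets before a verdict is assigned.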
Unsuitable claims for the task of automatic fact-checking [2]
• assessing causal relations, e.g. whether a statistic should be attributed to a particular law
• concerning the future, e.g. speculation about oil prices
• not concerning facts, e.g. whether a politician supports certain policies (opinions, beliefs)
• statements whose verdict relies on data not available online, e.g. requiring personal communications
• and more…
Automatic Fact-checking System
• Monitor Model: collect information and data
• Claim Spotting Model: extract natural language sentences from textual/audio sources; separate factual claims from opinions, beliefs, hyperboles, and questions
• Claim Verdict Model: analyze claims and extract evidence; match claims with evidence; validation and explanations
• Create & publish
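The three-stage pipeline above can be sketched as a chain of small functions. This is a toy illustration, not any published system's code: the spotting heuristics and the stubbed verdict label are assumptions for demonstration.

```python
import re

def monitor(sources):
    """Monitor Model: pull natural-language sentences out of raw documents."""
    for doc in sources:
        for sentence in re.split(r"(?<=[.?!])\s+", doc):
            if sentence.strip():
                yield sentence.strip()

def spot_claims(sentences):
    """Claim Spotting Model (toy heuristics): keep factual-looking
    statements, drop questions and opinion markers."""
    for s in sentences:
        if s.endswith("?"):
            continue  # questions are not factual claims
        if s.lower().startswith(("i think", "i believe")):
            continue  # opinion markers
        yield s

def verdict(claim):
    """Claim Verdict Model: match the claim against evidence and return
    a label (stubbed as 'unverified' here)."""
    return (claim, "unverified")

results = [verdict(c) for c in
           spot_claims(monitor(["The sky is blue. Is it raining? I think so."]))]
```

In a real system each stage would be a learned model (the later slides use LSTMs for the verdict stage), but the dataflow stays the same: monitor, spot, verdict, publish.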
Spot claims worth checking [1][2]
• Match statements to already fact-checked claims (a K-nearest-neighbor / semantic-similarity problem between statements)
• Create a Google Custom Search Engine over a claim corpus [4]
• Google's structured data with ClaimReview markup
• Construct a publicly available local database of fact-checked claims from fact-checking websites
• Hoax-Slayer: archives of fact-checked claims
• PolitiFact: more than 6,000 fact-checks
• Google's Schema.org: more than 7,000 fact-checks with ClaimReview markup
• Channel 4: http://blogs.channel4.com/factcheck/
• Washington Post
• Calculate semantic similarities between sentences based on word2vec
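The last bullet, word2vec-based sentence similarity, is commonly done by averaging word vectors and comparing with cosine similarity. A minimal sketch with hand-written toy vectors; in practice the vectors would come from a trained word2vec model (e.g. gensim's `KeyedVectors`), not a hard-coded dictionary:

```python
import math

# Toy 3-dimensional word vectors (an assumption for illustration only).
VECTORS = {
    "world": [0.1, 0.8, 0.2],
    "earth": [0.1, 0.7, 0.3],
    "flat":  [0.0, 0.9, 0.1],
    "obama": [0.9, 0.1, 0.0],
}

def sentence_vector(sentence):
    """Average the vectors of known words -- a standard word2vec baseline."""
    vecs = [VECTORS[w] for w in sentence.lower().split() if w in VECTORS]
    if not vecs:
        return [0.0, 0.0, 0.0]
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

# A new statement scores high against a paraphrased, already-checked claim
sim = cosine(sentence_vector("the world is flat"),
             sentence_vector("the earth is flat"))
```

Claims whose similarity to an archived fact-check exceeds a threshold can then reuse that stored verdict instead of being re-checked from scratch.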
ClaimReview is a subtype of Review
• "A fact-checking review of claims made in some creative work."
• claimReviewed is a property of ClaimReview
• "A short summary of the specific claims reviewed in a ClaimReview."
• The author property on Review indicates the organization behind the review
• claimReviewSiteLogo on the (Claim)Review gives the fact-checking organization's logo
• The itemReviewed property identifies the document that carries the claims being reviewed (which could include, as shown here, offline newspaper articles)
• The rating records how the claim was judged in the article
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2014-07-23",
  "url": "http://www.politifact.com/texas/statements/2014/jul/23/rick-perry/rick-perry-claim-about-3000-homicides-illegal-immi/",
  "author": {
    "@type": "Organization",
    "url": "http://www.politifact.com/",
    "sameAs": "https://twitter.com/politifact"
  },
  "claimReviewed": "More than 3,000 homicides were committed by \"illegal aliens\" over the past six years.",
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "1",
    "bestRating": "6",
    "worstRating": "1",
    "alternateName": "False",
    "image": "http://static.politifact.com/mediapage/jpgs/politifact-logo-big.jpg"
  },
  "itemReviewed": {
    "@type": "CreativeWork",
    "author": {
      "@type": "Person",
      "name": "Rick Perry",
      "jobTitle": "Former Governor of Texas",
      "image": "https://upload.wikimedia.org/wikipedia/commons/thumb/1/15/Gov._Perry_CPAC_February_2015.jpg/440px-Gov._Perry_CPAC_February_2015.jpg"
    },
    "datePublished": "2014-07-17",
    "name": "The St. Petersburg Times interview [...]"
  }
}
</script>
<script type="application/ld+json">
{
  "@context": "http://schema.org",
  "@type": "ClaimReview",
  "datePublished": "2016-06-22",
  "url": "http://example.com/news/science/worldisflat.html",
  "itemReviewed": {
    "@type": "CreativeWork",
    "author": {
      "@type": "Organization",
      "name": "Square World Society",
      "sameAs": "https://example.flatworlders.com/we-know-that-the-world-is-flat"
    },
    "datePublished": "2016-06-20"
  },
  "claimReviewed": "The world is flat",
  "author": {
    "@type": "Organization",
    "name": "Example.com science watch"
  },
  "reviewRating": {
    "@type": "Rating",
    "ratingValue": "1",
    "bestRating": "5",
    "worstRating": "1",
    "alternateName": "False"
  }
}
</script>
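A crawler building a local database of fact checks can harvest such markup by scanning pages for `application/ld+json` blocks and keeping the `ClaimReview` objects. A minimal sketch (the regex-based extraction is a simplification; a real crawler would use an HTML parser):

```python
import json
import re

# A page fragment in the same shape as the examples above
HTML = '''<script type="application/ld+json">
{"@context": "http://schema.org",
 "@type": "ClaimReview",
 "claimReviewed": "The world is flat",
 "reviewRating": {"@type": "Rating", "ratingValue": "1",
                  "bestRating": "5", "worstRating": "1",
                  "alternateName": "False"}}
</script>'''

def extract_claim_reviews(html):
    """Pull (claim, verdict) pairs out of a page's ld+json blocks."""
    pattern = r'<script type="application/ld\+json">(.*?)</script>'
    reviews = []
    for block in re.findall(pattern, html, re.DOTALL):
        data = json.loads(block)
        if data.get("@type") == "ClaimReview":
            reviews.append((data["claimReviewed"],
                            data["reviewRating"]["alternateName"]))
    return reviews
```

The extracted pairs are exactly what the claim-matching step needs: a corpus of already-verdicted claims to compare new statements against.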
Spot claims worth checking
• Identify new factual claims in fresh text that have not been fact-checked before
• Use machine learning algorithms to detect "check-worthy claims"; related publications
• ClaimBuster: a platform that scores political sentences to assess how check-worthy they are
• uses a human-labeled dataset of check-worthy factual claims from U.S. general election debate transcripts
• learns from labelled check-worthy sentences, identifies features they have in common, then looks for those features in new sentences
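A toy stand-in for such a learned scorer, to make the idea concrete: it rewards features that check-worthy claims in labelled data tend to share (numeric content) and penalizes questions and opinion markers. The specific features and weights here are illustrative assumptions, not ClaimBuster's actual model, which learns them from the labelled debate sentences.

```python
import re

def check_worthiness(sentence):
    """Score a sentence in [0, 1]: higher = more check-worthy (toy heuristic)."""
    score = 0.5
    if re.search(r"\d", sentence):
        score += 0.3  # numbers and statistics are often check-worthy
    if sentence.rstrip().endswith("?"):
        score -= 0.4  # questions are not factual claims
    if re.search(r"\b(i think|i believe|in my opinion)\b", sentence.lower()):
        score -= 0.3  # opinion markers
    return max(0.0, min(1.0, score))
```

A trained classifier replaces these hand-written rules with weights fitted to the human labels, but the input/output contract is the same: sentence in, check-worthiness score out.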
Claim Verdict Model
• Information retrieval with free, open source search engines
• Given the spotted claims, search for documents containing relevant fact checks or evidence
• A ranking and classification problem
• Apache Lucene, in Java, cross-platform
• fuzzy search: e.g. roam~0.8 finds terms similar in spelling to "roam"
• proximity query: e.g. "barack michelle"~10
• range query: title:{Aida TO Carmen}
• phrase query: e.g. "new york"
• used by Infomedia, Bloomberg, and Twitter's real-time search
• Apache Solr (better for text search) and Elasticsearch (better for complex time-series search and aggregations)
• Solr/Elasticsearch are built on top of Lucene
• basic query: text:obama, all docs whose text field contains "obama"
• phrase query: text:"obama michelle"
• proximity query: text:"big analytics"~1 matches "big analytics" and "big data analytics"
• Boolean query: solr AND search OR facet NOT highlight
• range query: age:[18 TO 30]
• used by Netflix, eBay, Instagram, and Amazon CloudSearch
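The phrase and proximity queries above all run on a positional inverted index. A minimal sketch of that underlying idea; this is not Lucene's implementation (whose slop semantics are more refined), just an illustration of why "big analytics"~1 can match "big data analytics":

```python
from collections import defaultdict

class TinyIndex:
    """Minimal positional inverted index: term -> doc id -> positions."""

    def __init__(self):
        self.postings = defaultdict(lambda: defaultdict(list))

    def add(self, doc_id, text):
        """Record the position of every term in the document."""
        for pos, term in enumerate(text.lower().split()):
            self.postings[term][doc_id].append(pos)

    def proximity(self, term_a, term_b, slop):
        """Doc ids where the terms occur with at most `slop` words between them."""
        hits = set()
        common = self.postings[term_a].keys() & self.postings[term_b].keys()
        for doc_id in common:
            for pa in self.postings[term_a][doc_id]:
                for pb in self.postings[term_b][doc_id]:
                    if abs(pa - pb) - 1 <= slop:
                        hits.add(doc_id)
        return hits

idx = TinyIndex()
idx.add(1, "big data analytics platform")      # "big" and "analytics" one word apart
idx.add(2, "big trends in modern web analytics")
```

With slop 1, document 1 matches ("big [data] analytics") while document 2 does not, mirroring the behaviour of the Solr/Elasticsearch proximity query on the slide.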
Claim Verdict Model [1][2]
[Pipeline diagram: Monitor Model → Claim Spotting Model → Claim Verdict Model (LSTMs) → Create & publish; verdict labels: True, Mostly true, Half true, Half false, Mostly false, False]
Claim Verdict Model: Claim Validation
• Verdict labels: true, mostly true, half true, half false, mostly false, false
Related works
• Computational Fact Checking from Knowledge Networks, PLoS ONE, 2015
• Toward Automated Fact-Checking: Detecting Check-worthy Factual Claims by ClaimBuster, 2017
• Fully Automated Fact Checking Using External Sources, 2017
• ClaimBuster: The First-ever End-to-end Fact-checking System, 2017
• Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking, EMNLP 2017
Reference
[1] Sarah Cohen, Chengkai Li, Jun Yang, and Cong Yu. 2011. Computational journalism: A call to arms to database researchers. In Proceedings of the Conference on Innovative Data Systems Research, volume 2011, pages 148–151.
[2] Fact Checking: Task Definition and Dataset Construction. In Proceedings of the ACL 2014 Workshop on Language Technologies and Computational Social Science, Baltimore, MD.
[3] N. Hassan, B. Adair, J. T. Hamilton, C. Li, M. Tremayne, J. Yang, and C. Yu. The quest to automate fact-checking. In Computation+Journalism Symposium, 2015.
[4] Creating a Custom Search Engine with configuration files. https://developers.google.com/custom-search/docs/basics