Problem Statement of Factual Claim Validation - Domains, topics, and task specification. Deep Learning Research & Application Center, 13 November 2017, Claire Li.
Fact checking - claim validation
• Most popular fact-checking web sites
  • What are the popular domains
  • What are the topics of greatest concern
• Proposed problem scope & task
  • Domains and topics
  • Task specification
• Agreement-based source-claim iterative models for truth discovery
The 6 best sites for manual political fact checking on the Internet
• PolitiFact
  • For finding the truth in American politics
• FactCheck.org
  • For getting political facts
  • Monitors the factual accuracy of what is said by the president and top administration officials, as well as congressional and party leaders
  • Focuses on claims that are false or misleading; all stories provide their sources
• Washington Post's Fact Checker
  • Assesses claims made by politicians or political advocacy groups and gives out Pinocchios (mostly true, half true, half false, whoppers) according to each claim's level of accuracy
• OpenSecrets
  • Tracks money in U.S. politics and its effect on elections and public policy
  • Offers RESTful APIs for machine-readable access to the data displayed on OpenSecrets, plus OpenData available for academic research
• The Sunlight Foundation
  • Uses the power of the Internet to catalyze greater government openness and transparency
  • Its open-government work now takes place at the local, state, federal and international levels
• Citizens for Responsibility and Ethics in Washington (CREW)
  • Provides a necessary knowledge base for building a better political system
Popular Fact Check Websites for Urban Legends, Religion, Education, Economy
• Snopes.com
  • Ratings: true, false, mostly true, mostly false, mixture, legend, unproven
  • On the domain of urban legends and rumors since 2007
  • Topics include (counts in parentheses): college (90), computers (198), religion (72), food (234), travel, business (450), political news, horrors (234), crimes (432), history (108), media matters (324)
  • Political figures: Trump (922), Obama (338), Hillary (324)
• TruthOrFiction.com
  • Ratings: truth, fiction, mostly truth, mostly fiction, truth & fiction, commentary, unproven
  • Email Reality Check
  • Topics include (counts in parentheses): education (51), government (380), food/drink (150), religion (360), health/medical (80), holidays, immigration (40), politics (740), business (60), crimes (280), terrorism (210), viruses, war
  • Political figures: Obama (210), Trump (20)
• ABC News
  • Ratings: negative ruling, positive ruling, in between
  • Determines the accuracy of claims by politicians, public figures, advocacy groups and institutions engaged in the public debate
  • Topics include economy, immigration, environment, education, health
• Wikipedia
  • Wikipedia founder Jimmy Wales has launched WikiTribune, a crowd-funded news platform that aims to combat fake news with the help of journalists and fact-checkers
  • Initially launched in English, with hopes to expand to other languages
• Media Bias/Fact Check (MBFC)
  • Ratings: true, mostly true, mostly false, blatant lie
  • News bias categories: least biased, left bias, left-center bias (slight to moderate liberal bias), right-center bias, right bias (moderately to strongly biased toward conservative causes)
Snopes.com
• Example 1 (travel)
  • Individuals fleeing danger can request to be "unlisted" in a hotel so no one can find them.
  • Unproven; further analysis of what's true and what's false is provided
• Example 2 (horrors)
  • Poisoned Halloween candy
  • False
• Example 3 (religion)
  • Christ Church in Alexandria, Virginia, is "ripping out" a plaque dedicated to George Washington because it might offend people
  • Mostly false; further analysis of what's true and what's false is provided
• Example 4 (computers)
  • Accepting a friend request from a stranger will provide hackers with access to your computer and online accounts
  • False
• Example 5 (college)
  • A student mistook examples of unsolved math problems for a homework assignment and solved them
  • True
• Example 6 (food)
  • Drinking cocktails from a copper mug can cause copper poisoning
  • Mixture; further analysis of what's true and what's false is provided
TruthOrFiction.com
• Example 1 (government)
  • Bill Clinton's Love Child, Danney Williams, Found Dead: Fiction!
• Example 2 (food)
  • McDonald's McRib Sandwiches Made From Inverted Pork Rectums: Fiction!
• Example 3 (religion)
  • Sign at Swiss Hotel: Jewish Guests Must Shower Before Using Swimming Pool: Truth!
• Example 4 (terrorism)
  • Seddique Mateen Visits White House; Meets with President Obama: Unproven!
ABC News examples
• Health
  • Do more people die in Australia than Sweden due to poorly heated homes? Verdict: overstated
• Education
  • Do Australian taxpayers subsidise over half the cost of each student's higher education? Verdict: incorrect
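Real-time fact checking (即時核查) in Hong Kong & China
2017/03/21: Hong Kong media try real-time fact checking of the Chief Executive election forum for the first time
• HK01 (香港01) https://www.hk01.com/
  • "The whole team is about 20 people from different departments. Four more senior colleagues listen closely to what the candidates say and, as soon as they hear something doubtful, assign reporters to verify it immediately."
  • Example: Carrie Lam (林鄭月娥) said that visits to districts by government secretaries and bureau directors involve "absolutely no police arrangements", but HK01 found that police deployment was requested on multiple occasions for officials' district visits
• Initium Media (端傳媒) https://theinitium.com/
  • "Initium Media invited more than ten experts from different fields to watch the live broadcast together; when an expert noticed a problem in what a candidate said, reporters looked up the data."
  • Example: checking the claim by former Financial Secretary John Tsang (曾俊華) that "Hong Kong is one of the world's three major financial centres", Initium pointed out that Hong Kong ranked fourth in the Financial Centres Index ranking published in April of the previous year
  • Example: Carrie Lam also said the asset limit for Home Ownership Scheme (居屋) flats was only a little over HK$50,000, when in fact it is HK$850,000
• In China, fact checking has not yet become a dedicated post; reporters and editors naturally take on part of the fact-checking work
  • 2017-07-25, 愛讀網: ONE Lab (ONE實驗室) became the first media outlet in China to set up a fact-checking post, although it has in fact only one fact checker, Liu Yang (劉洋)
  • 2017-10-12, Sina (北京新浪網): the ONE Lab team led by Li Haipeng (李海鵬) disbanded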
News helper http://newshelper.g0v.tw/
Problem Scope & Task
• Political news
  • Two-step aim: provide an automatic live fact-checking / truth discovery tool with domain and task specifications
  • To retrieve fact-checked or world-knowledge-based claims with their truth labels
    • Based on similarity calculation, provide journalists with evidence and the associated lists of web sources for novel claims
    • Based on truth discovery models, return the veracity label and score of each data value as well as the trustworthiness scores of the sources [VERA]
    • Build a large dataset for training an RNN for a practical, domain-specific system
  • To automatically suggest truth labels with confidence scores for novel claims
• Urban legends & rumors
  • Topics: food, computers, education, health, famous figures
Task Specification
• Real-time fact checking (即時核查)
• Chief Executive election forum (特首選舉論壇)
• City Forum (城市論壇)
Claim Validation Approaches
Approaches: measurement of relatedness and reliability
• Semantic similarity based, for repeated and paraphrased claims
  • Calculate the semantic similarity between the given claim and the already fact-checked ones; return the label of the K nearest neighbors
• For novel claim validation
  • Deep learning model
    • Features: claims, evidence, web sources with trustworthiness scores, speaker, etc.
    • The trustworthiness (accuracy) of a web source is the probability that it contains the correct value for a fact, e.g. for the fact of Barack Obama's nationality
  • How to calculate web source trustworthiness?
    • Agreement-based source-claim iterative models
      • Input: claims with true/false labels
      • Output: disagreeing and agreeing web sources with trustworthiness scores
    • Truth discovery model: VERA, the first attempt to demonstrate truth discovery in action on Web data and Twitter data
  • Search a knowledge base or the Google search engine for world-knowledge claims (e.g. population, GDP rate)
    • Wolfram Alpha search API
    • Wikipedia (calculation needed)
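The similarity-based route for repeated and paraphrased claims can be sketched as a nearest-neighbor lookup over already fact-checked claims. This is a minimal sketch, not the deck's implementation: it swaps in bag-of-words cosine similarity as a stand-in for a real semantic similarity measure, and the names `knn_label`, `CHECKED` and the `min_sim` cut-off are hypothetical. The example claims and labels are the Snopes examples listed earlier.

```python
import math
import re
from collections import Counter

# Already fact-checked claims with labels (Snopes examples from the deck).
CHECKED = [
    ("Drinking cocktails from a copper mug can cause copper poisoning", "mixture"),
    ("Accepting a friend request from a stranger will provide hackers "
     "with access to your computer and online accounts", "false"),
    ("A student mistook examples of unsolved math problems for a "
     "homework assignment and solved them", "true"),
]


def bag_of_words(text):
    """Lower-case word counts; a crude stand-in for a semantic representation."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def knn_label(claim, checked, k=1, min_sim=0.3):
    """Majority label among the k most similar checked claims.

    Returns None when no neighbor reaches min_sim (an assumed cut-off),
    i.e. the claim is treated as novel and needs the other route.
    """
    query = bag_of_words(claim)
    scored = sorted(
        ((cosine(query, bag_of_words(text)), label) for text, label in checked),
        reverse=True,
    )
    votes = Counter(label for score, label in scored[:k] if score >= min_sim)
    return votes.most_common(1)[0][0] if votes else None


print(knn_label("Copper mugs can cause copper poisoning", CHECKED))
print(knn_label("Putin is the president of Russia", CHECKED))
```

Returning `None` for claims with no sufficiently similar neighbor is one way to hand novel claims over to the truth discovery route; a practical system would replace the word-count vectors with sentence embeddings or a dedicated similarity tool.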
Claim Validation Architecture with Truth Discovery
[Figure: architecture diagram. Recoverable labels, in the order they appear: trustworthiness scores of the web sources; Claim; Truth value; Claim Query API; Information Extraction (tools: SEMILAR); Truth Discovery; Multi-Source & Evidence Discovery (tools: TextRunner, DeepDive, TwitIE); Entity/relationship Extraction; Data Fusion; Agreement-based Source-claim Iterative Models.]
Agreement-based Source-claim Iterative Models for Truth Discovery
• Problem of truth discovery
  • Given a set of assertions claimed by multiple sources, label each claimed value as true or false and compute the reliability of each source
  • E.g., we combine evidence for claims from web sources of different trustworthiness to verify the claims
  • A web source may be an individual web page or a whole web site
• General principle of truth discovery
  • If a web source frequently provides trustworthy information, it is assigned a high reliability
  • Meanwhile, if a piece of information is supported by web sources with high reliability, it has a high chance of being selected as the truth
Iterative computation of source trustworthiness and claim belief
Input: claims (si, dj, vk), where source si asserts value vk for data item dj
Output label: true/false
• d1: Russia.CurrentPresident; v1: Medvedev, v2: Putin, v3: Yeltsin
• d2: USA.currentPresident; v4: Clinton, v5: Obama
• d3: France.currentPresident; v6: Hollande, v7: Sarkozy
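The claim triples can be held in a small record type and grouped by data item, so that conflicting values for the same item sit side by side with the sources supporting each. This is a sketch only: the data items and values come from the slide, but the source ids and which source asserts which value are hypothetical, as are the names `Claim` and `supporters`.

```python
from collections import defaultdict
from typing import NamedTuple


class Claim(NamedTuple):
    source: str     # si: who makes the assertion
    data_item: str  # dj: what the assertion is about
    value: str      # vk: the asserted value


# Source assignments below are invented for illustration.
CLAIMS = [
    Claim("s1", "Russia.CurrentPresident", "Putin"),
    Claim("s2", "Russia.CurrentPresident", "Medvedev"),
    Claim("s3", "Russia.CurrentPresident", "Putin"),
    Claim("s1", "USA.currentPresident", "Obama"),
    Claim("s2", "USA.currentPresident", "Clinton"),
    Claim("s3", "France.currentPresident", "Hollande"),
]


def supporters(claims):
    """Map data item -> value -> set of sources asserting that value."""
    table = defaultdict(lambda: defaultdict(set))
    for claim in claims:
        table[claim.data_item][claim.value].add(claim.source)
    return table


for item, values in supporters(CLAIMS).items():
    for value, sources in values.items():
        print(item, value, sorted(sources))
```

Values under the same data item are mutually exclusive, which is what lets a truth discovery model label one of them true and the rest false.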
Iterative computation of source trustworthiness and claim belief
[Figure: sources s1 to s7, evidence items e1 to e7, and Claim 1, Claim 2, Claim 3 (each T/F). The mapping of labels to columns is not recoverable from the extracted text; values are listed in the order they appear.]
Source s trustworthiness Ts (select s by Ts, threshold 13)
  Iteration 1:     1   2   1   2   2   1   1
  Iteration 2:     3   5   1   7   7   5   1
  Iteration 3:     8  13   1  26  26  19   1
Evidence e confidence Ce (select e by Ce, threshold 21)
  Initialization:  1   1   1   1   1   1   1
  Iteration 1:     3   1   2   2   5   2   1
  Iteration 2:     8   1   7   5  19   7   1
  Iteration 3:    21   1  26  13  71  26   1
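The iteration values on this slide are consistent with a simple alternating update: a source's trustworthiness is the sum of the confidences of the evidence it asserts, and an evidence item's confidence is the sum of the trustworthiness of the sources asserting it. The sketch below is an inference from the numbers, not a rule stated in the deck. The edge list `ASSERTS` is an assumption, chosen because it is consistent with the listed values, and the labels `src1`..`src7` and `ev1`..`ev7` follow column position rather than the slide's own labels.

```python
# Which evidence items each source asserts (assumed edge list, see above).
ASSERTS = {
    "src1": ["ev1"],
    "src2": ["ev1", "ev4"],
    "src3": ["ev2"],
    "src4": ["ev3", "ev5"],
    "src5": ["ev5", "ev6"],
    "src6": ["ev5"],
    "src7": ["ev7"],
}


def run_iterations(asserts, rounds):
    """Alternate trust and confidence updates, without normalization."""
    evidence = sorted({e for items in asserts.values() for e in items})
    confidence = {e: 1 for e in evidence}  # initialization: all ones
    history = []
    for _ in range(rounds):
        # A source is as trustworthy as the evidence it asserts is believed.
        trust = {
            s: sum(confidence[e] for e in items) for s, items in asserts.items()
        }
        # Evidence is as credible as the sources asserting it are trusted.
        confidence = {
            e: sum(trust[s] for s, items in asserts.items() if e in items)
            for e in evidence
        }
        history.append((trust, confidence))
    return history


history = run_iterations(ASSERTS, rounds=3)
trust, confidence = history[-1]

# Thresholds 13 and 21 come from the slide; treating them as inclusive
# lower bounds is an assumption.
trusted_sources = [s for s, t in trust.items() if t >= 13]
selected_evidence = [e for e, c in confidence.items() if c >= 21]

for number, (t, c) in enumerate(history, start=1):
    print(f"Iteration {number}: Ts={list(t.values())} Ce={list(c.values())}")
print(trusted_sources, selected_evidence)
```

Because the scores grow without bound, implementations of this kind of update usually rescale them after each round (for example by dividing by the maximum); the sketch omits that step so its output can be compared directly with the raw numbers on the slide.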
Other Information
• Top 15 most popular celebrity gossip websites, July 2017: http://www.ebizmba.com/articles/gossip-websites
• Websites that Post Fake and Satirical Stories: http://www.factcheck.org/2017/07/websites-post-fake-satirical-stories/
• The list of questionable sites: https://mediabiasfactcheck.com/fake-news/
• The list of satire sites: https://mediabiasfactcheck.com/satire/
• The list of conspiracy-pseudoscience sites: https://mediabiasfactcheck.com/conspiracy/
Related works
• TextRunner: Open Information Extraction on the Web. University of Washington, Computer Science and Engineering. NAACL HLT Demonstration Program.
• A Review of Data Fusion Techniques. The Scientific World Journal, Volume 2013, Hindawi Publishing Corporation.
• VERA: A Platform for Veracity Estimation over Web Data. WWW'16 Companion, April 11–15, 2016, Montréal, Québec, Canada. ACM 978-1-4503-4144-8/16/04.
• A Survey on Truth Discovery Methods for Big Data. International Journal of Computational Intelligence Research, ISSN 0973-1873, Volume 13, Number 7 (2017), pp. 1799-1810.
• Veracity of Big Data. CIKM 2015 tutorial.
• Hong Kong Government News (香港政府新聞網): http://www.news.gov.hk
• Mirror Media Group (明鏡集團網): http://www.mirrormediagroup.com/index.html
• BBC Chinese (BBC中文網): http://www.bbc.com/zhongwen/trad
• Yahoo News: https://hk.news.yahoo.com
• Google fact-checking algorithm; website authority