
Deep Learning Research & Application Center October 2017 Claire Li


Presentation Transcript


  1. Factual Claim Validation Models Deep Learning Research & Application Center October 2017 Claire Li

  2. Available fact checking tools • ClaimBuster • Google search API and other free ones • Claim Validation Model with RNN

  3. Available fact checking tools • Automated fact-checking projects vary in the kinds of sources, the kinds of claims, and the topics they deal with

  4. Narrow scope is the key to practical tools for fact-checkers • ClaimBuster • currently targets political sentences • based on machine learning models • framed as a ranking and classification task • fake news detection as a stance classification task

  5. Claim Validation with ClaimBuster • Monitors & retrieves sentences • Scoring sentences: classification & scoring models over token features and tokens' PoS tags • Retrieve evidence: context from the Google search engine; answers from Wolfram Alpha & the Google Answer Box; verdicts from the above • Similarity calculation: token similarity & semantic similarity from SEMILAR
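A minimal sketch of the sentence-scoring step, assuming simple surface features (digits, mid-sentence capitalized tokens, comparative words) stand in for ClaimBuster's token/PoS feature set; the real system uses a trained classifier, and this heuristic is purely illustrative.

```python
import re

# Hypothetical surface features standing in for ClaimBuster's
# trained token/PoS feature model; scores are 0-1 check-worthiness.
def claim_score(sentence):
    tokens = sentence.split()
    if not tokens:
        return 0.0
    features = 0
    features += sum(bool(re.search(r"\d", t)) for t in tokens)   # numeric tokens
    features += sum(t[0].isupper() for t in tokens[1:])          # mid-sentence capitals
    features += sum(t.lower() in {"most", "least", "more", "fewer",
                                  "increased", "decreased"} for t in tokens)
    return min(1.0, features / len(tokens) * 2)

sentences = [
    "I think the weather is lovely today.",
    "Unemployment fell from 7.2% to 4.1% under the last administration.",
]
# Rank sentences by check-worthiness, most claim-like first
ranked = sorted(sentences, key=claim_score, reverse=True)
```

The factual sentence with numbers outranks the opinion, which is the ranking behavior the slide describes.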

  6. Claim Validation with ClaimBuster • Given a factual claim which has been scored: • Search a repository for similar claims that have already been fact-checked by professionals (claim matcher) • a semantic similarity match (3-10) spots the matched fact-checked claims • returns the truth rating if a match is found; otherwise fall back to the steps below • ClaimBuster is not able to produce a verdict itself, so it • processes search engine results for evidence based on similarity to the input claim • uses question-answering systems: it translates the natural language claim into questions and queries external knowledge bases (Google Answer Box and Wolfram Alpha) with the derived questions
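The claim-matcher step can be sketched as a similarity search over a repository of fact-checked claims. This toy version uses cosine similarity over token counts on a 0-1 scale (ClaimBuster's semantic measure reports 3-10 for a good match); the repository contents and threshold are invented for the example.

```python
import math
from collections import Counter

# Invented mini-repository of professionally fact-checked claims
fact_checked = {
    "Crime has doubled in the last decade.": "false",
    "The city budget grew 10 percent last year.": "true",
}

def cosine(a, b):
    # cosine similarity over lowercase token counts
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[t] * vb[t] for t in va)
    na = math.sqrt(sum(v * v for v in va.values()))
    nb = math.sqrt(sum(v * v for v in vb.values()))
    return dot / (na * nb) if na and nb else 0.0

def claim_matcher(claim, threshold=0.5):
    best = max(fact_checked, key=lambda c: cosine(claim, c))
    if cosine(claim, best) >= threshold:
        return best, fact_checked[best]       # matched claim + truth rating
    return None, None  # no match: fall back to evidence retrieval / QA

matched, rating = claim_matcher("Crime doubled in the last decade.")
```

When no stored claim clears the threshold, the matcher returns nothing and the pipeline falls through to search-engine evidence and question answering, as the slide describes.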

  7. Search in a repository for similar claims that have already been fact-checked by professionals; the claim matcher returns, e.g.:
  • claim (string): the matched fact-checked claim
  • host (string): the source of the fact-check
  • search (string): the search measure which yielded the fact-match
  • similarity_rating (number): 3-10 for a good match
  • speaker (string): speaker of the fact-checked claim
  • truth_rating (string): true, false, pants on fire, indeterminate
  • url (string): the URL location of the matched fact-check
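An illustrative claim-matcher response assembled from these fields; the claim text, host, speaker, and URL values are invented placeholders, not real ClaimBuster output.

```python
# Hypothetical claim-matcher response; all values are placeholders
match = {
    "claim": "The national debt doubled over the last eight years.",
    "host": "politifact.com",
    "search": "semantic similarity",
    "similarity_rating": 7,          # 3-10 indicates a good match
    "speaker": "John Doe",
    "truth_rating": "false",
    "url": "https://example.org/fact-check/123",
}
# A caller would accept the truth rating only for a good match
is_good_match = 3 <= match["similarity_rating"] <= 10
```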

  8. Processes search engine results for evidence based on similarity to the input claim, e.g.:
  • sentence (string): the anchor sentence, i.e. the one with a high similarity score to the input claim
  • context (array[string]): some sentences to the left of the anchor + the anchor sentence + some sentences to the right of the anchor
  • similarity_rating (number): 0-1, measured between the input claim and the anchor
  • url (string): URL of the context
  • host (string): the hostname of the URL
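The context construction above can be sketched directly: pick the page sentence most similar to the claim as the anchor, then return a window of sentences around it. Token-overlap (Jaccard) similarity here is a stand-in for ClaimBuster's actual measure.

```python
# Toy evidence-context builder: anchor = most similar sentence,
# context = left window + anchor + right window
def overlap(a, b):
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def build_context(claim, sentences, window=1):
    i = max(range(len(sentences)), key=lambda k: overlap(claim, sentences[k]))
    lo, hi = max(0, i - window), min(len(sentences), i + window + 1)
    return {
        "sentence": sentences[i],                           # the anchor
        "context": sentences[lo:hi],                        # left + anchor + right
        "similarity_rating": overlap(claim, sentences[i]),  # 0-1
    }

page = [
    "The report was released in March.",
    "Unemployment fell to 4.1 percent last year.",
    "Analysts disputed the methodology.",
]
ev = build_context("Unemployment fell to 4.1 percent.", page)
```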

  9. Use question-answering systems, e.g.:
  • answer_box_html (string, optional): complete raw HTML from which the justification was extracted from the Google Answer Box
  • justification (string): either the text scraped from the Google Answer Box or the Wolfram Alpha response
  • question (string): the question derived from the input claim and subsequently fed to the question-answering system specified in the source parameter
  • source (string): either Google Answer Boxes or the Wolfram Alpha API
  • truth_rating (string, optional): true, false, pants on fire, or indeterminate, if inferable

  10. Use a world knowledge base of fact-checked statements • Google Answer Box: What is the time in Hong Kong? • Wolfram Alpha: How many undocumented people are in the United States?
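A sketch of the Wolfram Alpha side of this step, assuming the Wolfram|Alpha Short Answers endpoint and a placeholder app ID; only the request URL is built here, and the claim-to-question rewriting that precedes it is not shown.

```python
from urllib.parse import urlencode

APP_ID = "YOUR_APP_ID"  # placeholder; obtain a real one from Wolfram

def wolfram_query_url(question):
    # Build a Short Answers API request for a derived question
    return ("https://api.wolframalpha.com/v1/result?"
            + urlencode({"appid": APP_ID, "i": question}))

url = wolfram_query_url("How many undocumented people are in the United States?")
# Fetching the plain-text answer would be:
#   urllib.request.urlopen(url).read().decode()
```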

  11. Google Custom Search API & Wolfram|Alpha API pricing • By default, the Google Custom Search API has a quota of 100 queries per day. If you exceed this quota, you can upgrade to 1,000 queries per day for one month for $5
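A quota-aware request builder for the Google Custom Search JSON API can make the limit concrete; the key and engine ID are placeholders, and the in-memory counter is a simplification (a real deployment would track quota per calendar day).

```python
from urllib.parse import urlencode

API_KEY, CX = "YOUR_KEY", "YOUR_CX"   # placeholders
FREE_QUOTA = 100                      # free-tier queries per day
queries_today = 0

def search_url(claim):
    # Build a Custom Search request, refusing past the free quota
    global queries_today
    if queries_today >= FREE_QUOTA:
        raise RuntimeError("daily free quota exhausted; upgrade required")
    queries_today += 1
    return ("https://www.googleapis.com/customsearch/v1?"
            + urlencode({"key": API_KEY, "cx": CX, "q": claim}))

url = search_url("unemployment rate 2017")
```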

  12. Free Open Source Search Engines
  • Information retrieval from free open source search engines: given spotted claims, search for documents containing relevant fact-checks or evidence (a ranking and classification problem)
  • Apache Lucene, in Java, cross-platform
  • fuzzy search: e.g. roam~0.8 finds terms similar in spelling to "roam" with similarity 0.8
  • proximity query: e.g. "Barack michellea"~10
  • range query: e.g. title:{Aida TO Carmen}
  • phrase query: e.g. "new york"
  • used by Infomedia, Bloomberg, and Twitter's real-time search
  • Apache Solr (better for text search) and Elasticsearch (better for complex time series search and aggregations)
  • Solr/Elasticsearch are built on top of Lucene
  • basic query: text:obama, all docs whose text field contains obama
  • phrase query: text:"Obama michellea"
  • proximity query: text:"big analytics"~1 matches "big analytics" and "big data analytics"
  • boolean query: solr AND search OR facet NOT highlight
  • range query: age:[18 TO 30]
  • used by Netflix, eBay, Instagram, and Amazon CloudSearch
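The query flavors above can be sent to Solr's standard select handler as URL parameters. This sketch assumes a local Solr instance with a core named "claims" (both are assumptions); it only builds the request URLs.

```python
from urllib.parse import urlencode

# Assumed local Solr core named "claims"
SOLR = "http://localhost:8983/solr/claims/select?"

def solr_url(q, rows=10):
    # Standard select request: query string, result count, JSON output
    return SOLR + urlencode({"q": q, "rows": rows, "wt": "json"})

examples = [
    'text:obama',                              # basic term query
    'text:"Obama michellea"',                  # phrase query
    'text:"big analytics"~1',                  # proximity query
    'roam~0.8',                                # fuzzy query
    'age:[18 TO 30]',                          # range query
    'solr AND search OR facet NOT highlight',  # boolean query
]
urls = [solr_url(q) for q in examples]
```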

  13. Claim Validation Model with RNN [1][2] • [Pipeline diagram: Monitor Model → Claim Spotting Model (LSTMs) → Claim Verdict Model (LSTMs) → Create & publish; verdict labels: True, Mostly true, Half true, Half false, Mostly false, False]
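A toy single-cell sketch of the LSTM-based verdict model in [1][2]: an LSTM cell reads token values one step at a time, and a softmax over scores derived from the final hidden state picks one of the six verdict labels. The scalar state, tied constant weights, and per-label scoring are all illustrative stand-ins for a trained network.

```python
import math

LABELS = ["true", "mostly true", "half true",
          "half false", "mostly false", "false"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h, c, w=0.1):
    # One LSTM cell step with scalar state and tied toy weights
    i = sigmoid(w * x + w * h)      # input gate
    f = sigmoid(w * x + w * h)      # forget gate
    o = sigmoid(w * x + w * h)      # output gate
    g = math.tanh(w * x + w * h)    # candidate state
    c = f * c + i * g
    return o * math.tanh(c), c

def verdict(token_vals):
    h = c = 0.0
    for x in token_vals:            # read the claim token by token
        h, c = lstm_step(x, h, c)
    # Toy softmax over per-label scores from the final hidden state
    scores = [h * (k + 1) for k in range(len(LABELS))]
    zs = [math.exp(s) for s in scores]
    probs = [z / sum(zs) for z in zs]
    return LABELS[max(range(len(probs)), key=probs.__getitem__)], probs

label, probs = verdict([0.5, -0.2, 0.8])
```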

  14. Claim Validation Model: extraction of evidence

  15. Claim Verdict Model: Claim Validation True, mostly true, half true, half false, mostly false, false

  16. Works with RNN
  • CNN- and LSTM-based Claim Classification in Online User Comments (COLING 2016)
  • Turing at SemEval-2017 Task 8: Sequential Approach to Rumour Stance Classification with Branch-LSTM (SemEval 2017)
  • Fake News Detection using Stacked Ensemble of Classifiers (nlpj 2017)
  • Identification and Verification of Simple Claims about Statistical Properties (EMNLP 2015)
