Deep Learning Research & Application Center 7 September 2017 Claire Li

Literature Review on Detection of Fake News

Characteristics of Fake news articles • Prototype demo • Review on Fake News Detection (FND) • Features • Dataset • Models/Approaches • Performance • Review on Rumours Detection

Kaggle Fake News Data Types Types and counts of news articles: • bias: 443 • bs: 11,492 • conspiracy: 430 • fake: 19 • hate: 246 • junksci: 102 • satire: 146 • state: 121

Top-30 word list & Top-100 word list • For Type=fake (total 7,318 words; 1,754 unique words): Trump: 51; Clinton: 31; Hillary: 26; Chronicles: 20; Election: 19; Philippines: 18; White: 15; House: 15; President: 13; college: 11; Filipino: 10

Top-20 frequent bi-grams from Type=fake • This Article: 4; Executive Order: 4; join The: 4; Lady Gaga: 4; The Resistance: 8; United States: 9; White House: 15; Donald Trump: 16; Adobo Chronicles: 20; The adobo: 20
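A minimal sketch of how top-k word and bi-gram lists like the ones above could be produced. The tokenizer (a simple regular expression that keeps case) and the function names are assumptions for illustration, not the preprocessing actually used for these slides.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: runs of letters and apostrophes, case preserved.
    return re.findall(r"[A-Za-z']+", text)

def top_ngrams(texts, n=1, k=30):
    """Return the k most frequent n-grams across texts as (ngram, count) pairs."""
    counts = Counter()
    for text in texts:
        tokens = tokenize(text)
        # zip over n shifted copies of the token list yields consecutive n-grams
        counts.update(zip(*(tokens[i:] for i in range(n))))
    return [(" ".join(gram), count) for gram, count in counts.most_common(k)]

articles = ["White House says no", "the White House"]
print(top_ngrams(articles, n=2, k=1))
```

Calling `top_ngrams(articles, n=1, k=30)` gives the word list, and `n=2, k=20` gives the bi-gram list for one article type.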

Prototype • Fake news from the Kaggle dataset (53.8 MB), real news from Signal Media News (1.05 GB) • Training data • 8,000 fake and 8,000 real news articles, all in English • Test data • 2,000 fake and 2,000 real news articles, disjoint from the training set • Model: • Embedding layer: pretrained 100-dimensional GloVe word embeddings • Hidden layer: stacked LSTM with output dimension 100 and dropout regularization • Output layer: sigmoid • Training history • Prediction Examples

Sample data analysis from type=bs with LIWC2015 Characteristics of fake news articles [7]: • tend to be shorter in terms of word count • seem to adopt a more personal, disclosing tenor • higher authenticity is associated with more honest, personal, and disclosing text, while lower scores suggest a more guarded, distanced form of discourse • convey less clout • i.e., less expertise/confidence • tone is more likely to be negative • greater anxiety, sadness, or hostility • show less analytical thinking • more informal, personal, and narrative thinking

If a word shows up often in “fake” articles and rarely in “real” articles, its fake-to-real ratio score will be high.
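One way to compute such a fake-to-real ratio score is sketched below. The slide does not give the exact formula, so the relative-frequency ratio, the add-one smoothing, and the whitespace tokenization are all assumptions.

```python
from collections import Counter

def fake_to_real_ratio(fake_docs, real_docs, smoothing=1.0):
    """Score each word by (relative frequency in fake) / (relative frequency in real)."""
    fake = Counter(w for doc in fake_docs for w in doc.lower().split())
    real = Counter(w for doc in real_docs for w in doc.lower().split())
    vocab = set(fake) | set(real)
    # Assumed add-one style smoothing so words unseen in one class do not divide by zero.
    fake_total = sum(fake.values()) + smoothing * len(vocab)
    real_total = sum(real.values()) + smoothing * len(vocab)
    return {
        w: ((fake[w] + smoothing) / fake_total) / ((real[w] + smoothing) / real_total)
        for w in vocab
    }

scores = fake_to_real_ratio(["hoax hoax shock", "hoax news"],
                            ["news report", "news update"])
for word, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(f"{word}: {score:.2f}")
```

Words with a score well above 1 are over-represented in the fake class, and words well below 1 are over-represented in the real class.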

History of Datasets on FND Based on literature reviews up to 2017: • In 2014, 221 annotated statements • A fake news detection and fact-checking dataset used by Vlachos and Riedel (2014) [5] • In 2016, 300 labeled rumors from PolitiFact • Ferreira and Vlachos (2016) [6] released the Emergent dataset for rumor detection • LIAR, the first large-scale publicly available dataset for fake news detection, includes 12.8K human-labeled short statements from POLITIFACT.COM’s API • Kaggle competition dataset (53.8M) • Tencent Weibo dataset for Chinese

Reviews on Fake News Detection • Fake News Detection on Social Media: A Data Mining Perspective, arXiv preprint, arXiv:1708.01967v2 [cs.SI], 2017 • CSI: A Hybrid Deep Model for Fake News Detection, arXiv preprint, arXiv:1703.06959, 2017 • Liar, Liar Pants on Fire: A New Benchmark Dataset for Fake News Detection, ACL 2017

1. Fake News Detection on Social Media: A Data Mining Perspective, arXiv preprint, arXiv:1708.01967v2 [cs.SI], 2017 • A narrow definition of fake news is news articles that are intentionally and verifiably false and could mislead readers • The following concepts are not fake news according to the above definition: • Satire news with proper context, which has no intent to mislead or deceive consumers and is unlikely to be misperceived as factual; • Rumors that did not originate from news events; • Conspiracy theories, which are difficult to verify as true or false; • Misinformation that is created unintentionally; and • Hoaxes that are only motivated by fun or to scam targeted individuals. • Fake news is written and published with the intent to mislead in order to gain financially or politically, often with sensationalist, exaggerated, or patently false headlines that grab attention. [wiki]

Features News article features • Publisher (author): author name, domain, age • Content: headline, text, image • Common linguistic features • Lexical features: total words, characters per word, frequency of large words, and unique words • Syntactic features: n-grams, BoW, punctuation, POS • Domain-specific linguistic features • quoted words (e.g. adv. in news domain), external links, number of graphs, and the average length of graphs (e.g. IPO documents) • Visual-based features (image/video) • clarity score • measures the ambiguity of a query in relation to the collection in which the query issuer is seeking information • coherence score, similarity distribution histogram, diversity score, and clustering score • image ratio, multi-image ratio, hot image ratio, long image ratio

Features • An engagement eit = {ui, pi, t} represents that a user ui spreads news article a using post pi at time t • Social context features, such as user social engagements on social media • User-based: characteristics of the users who post messages • Individual user: registration age, number of followers/followees, number of tweets the user has authored • Group-level user: ‘% of verified users’ and ‘average number of followers’ • Post-based: information from the posts to infer the veracity of news • post level • Stance features: supporting, denying, can’t decide, etc. • Credibility features: degree of reliability • Topic features • represent a document using topic words generated with LDA by calculating the conditional probability of a word wi given a topic zj • temporal level • temporal variations of post-level feature values • group level • the average credibility scores are used to evaluate the credibility of news
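The group-level user features named above (‘% of verified users’ and ‘average number of followers’) can be sketched as a simple aggregation over the users who engaged with one news article. The field names `verified` and `followers` are hypothetical, chosen only for this illustration.

```python
def group_user_features(users):
    """Aggregate individual user records into group-level features for one article."""
    if not users:
        # No engagements: return neutral values rather than dividing by zero.
        return {"pct_verified": 0.0, "avg_followers": 0.0}
    n = len(users)
    return {
        "pct_verified": 100.0 * sum(1 for u in users if u["verified"]) / n,
        "avg_followers": sum(u["followers"] for u in users) / n,
    }

engaged_users = [
    {"verified": True, "followers": 1200},
    {"verified": False, "followers": 300},
    {"verified": False, "followers": 0},
    {"verified": True, "followers": 500},
]
print(group_user_features(engaged_users))
```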

Social Context Features • Network-based • The stance network can be built with nodes indicating all the tweets relevant to the news and edges indicating the weights of similarity of stances • The co-occurrence network, which is built based on the user engagements by counting whether those users write posts relevant to the same news articles • The friendship network indicates the following/followee structure of users who post related tweets • The diffusion network, where nodes represent the users; a diffusion path between two users ui and uj exists if and only if (1) uj follows ui, and (2) uj posts about a given news item only after ui does so
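The diffusion-network rule above (an edge from ui to uj exists if and only if uj follows ui and uj posts only after ui) can be sketched directly. The input shapes, a set of (follower, followee) pairs and a dict of first-post times, are assumptions made for this illustration.

```python
def build_diffusion_edges(follows, post_times):
    """Build directed diffusion edges (u_i, u_j) for one news article.

    follows: set of (follower, followee) pairs
    post_times: dict mapping user -> time of that user's first post about the news
    """
    edges = set()
    for follower, followee in follows:
        # Both users must have posted about the news.
        if follower not in post_times or followee not in post_times:
            continue
        # Condition (2): the follower posts strictly after the followee.
        if post_times[follower] > post_times[followee]:
            edges.add((followee, follower))  # information flows u_i -> u_j
    return edges

follows = {("bob", "alice"), ("carol", "bob"), ("alice", "carol")}
post_times = {"alice": 1, "bob": 2, "carol": 3}
print(sorted(build_diffusion_edges(follows, post_times)))
```

Here alice follows carol but posted first, so no edge from carol to alice is created.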

Datasets • Sources: news agency homepages (Agence France-Presse, Reuters), search engines, and social media websites (Twitter, Facebook, Weibo). • Publicly available datasets • BuzzFeedNews • News published on Facebook by 9 news agencies over a week close to the 2016 U.S. election, from September 19 to 23 and September 26 and 27 • Every post and the linked article were fact-checked claim-by-claim by 5 BuzzFeed journalists • 1,627 articles: 826 mainstream, 356 left-wing, and 545 right-wing articles. • LIAR • from the fact-checking website PolitiFact • 12,836 human-labeled short statements from news releases, TV/radio interviews, campaign speeches, etc. • labels: pants-fire, false, barely-true, half-true, mostly-true, and true • BS Detector • collected from BS Detector, a tool developed for checking news veracity • CREDBANK • a large-scale crowdsourced dataset of approximately 60 million tweets that cover 96 days starting from October 2015, with associated credibility annotations

Datasets

News Content Models Fact checking is a knowledge-based approach to news content verification and can be categorized as • Expert-oriented • relies on human domain experts to investigate relevant data and documents to construct the verdicts of claim veracity, e.g., PolitiFact, Snopes • Crowdsourcing-oriented • exploits the “wisdom of the crowd”: ordinary people annotate news content, and the annotations are then aggregated to produce an overall assessment of the news veracity, e.g., Fiskkit, the ‘for real’ account of LINE • Computational-oriented • provides an automatic, scalable system to classify true and false claims using the open web and structured knowledge graphs (e.g. Google Knowledge Graph Search API) • identifying check-worthy claims • discriminating the veracity of fact claims

News Content Models Style-based: tries to capture manipulation in the writing style (e.g. appealing to and persuading readers) of news content • Deception-oriented model: captures the deceptive statements or claims in news content • Deep syntax models using probabilistic context-free grammars (PCFG), with which sentences can be transformed into rules that describe the syntax structure • Rhetorical structure theory, capturing the differences between deceptive and truthful sentences • Deep network models: CNN, RNN • Objectivity-oriented: captures style signals (e.g., misleading and deceptive clickbait titles) that can indicate a decreased objectivity of news content

Social Context Models • Stance-based: in favor of, neutral toward, or against • Using explicit user stances or implicit stances automatically extracted from social media posts • Learn latent stance with Latent Dirichlet Allocation (LDA) • Propagation-based: the credibility of a news event is highly related to the credibility of relevant social media posts • homogeneous credibility networks • consist of a single type of entity, such as posts or events • heterogeneous credibility networks • involve different types of entities, such as posts, sub-events, and events

Performance Evaluation Metrics For Precision, Recall, F1, and Accuracy, the higher the value, the better the performance.
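A minimal sketch of the four metrics for binary fake/real classification, computed from the confusion-matrix counts. Treating label 1 as the positive (fake) class is an assumption of this example.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, F1, and accuracy for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0]
print(classification_metrics(y_true, y_pred))
```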

RELATED AREAS • Rumor Classification • Truth Discovery • Detecting true facts from multiple conflicting sources • Clickbait Detection • Spammer and Bot Detection • Capture malicious users who spread ads, disseminate pornography, deliver viruses, and conduct phishing • E.g., social bots automatically retweet posts without verifying the facts

2. CSI: A Hybrid Deep Model for Fake News Detection • Features • the text of an article, • the user response it receives, and • user information: the source characteristics of the users promoting it • the number of engagements, where an engagement is a user response to an article ai • ∆t: the time between engagements • Xu: user features of the user ui that engaged with aj at time t • Dataset

CSI: A Hybrid Deep Model for Fake News • Dataset: Twitter and Weibo • Models: LSTM

CSI: A Hybrid Deep Model for Fake News Performance

3. Liar, Liar, Pants on Fire: A New Benchmark Dataset for Fake News Detection, ACL 2017 • Features • Textual • Speaker-related meta-data: party affiliations, current job, home state, and prior credit history • A credit history vector h = {19, 32, 34, 58, 33}, which corresponds to the speaker’s counts of “pants on fire”, “false”, “barely true”, “half true”, “mostly true” for historical statements. • Difficulties: detection of short statements (17.9 tokens on average) from categories such as TV/radio interviews and posts on Facebook or Twitter

Dataset • LIAR, a new publicly available dataset, includes 12,836 short statements from POLITIFACT.COM (2007-2016) labeled for truthfulness, subject, context/venue, speaker, home state, party, and prior credit history • 6 labels for truthfulness: pants-fire (1,050), false, barely-true, half-true, mostly-true, and true

The categories include news releases, TV/radio interviews, campaign speeches, TV ads, tweets, debates, Facebook posts, etc. • Top-10 subjects: economy, healthcare, taxes, federal-budget, education, jobs, state-budget, candidates-biography, elections, and immigration.

Model

Performance

Rumors Detection • Detection and Resolution of Rumours in Social Media: A Survey, arXiv preprint arXiv:1704.0065, 2017 • Call Attention to Rumors: Deep Attention Based Recurrent Neural Networks for Early Rumor Detection, arXiv:1704.05973, 2017 • Detecting Rumors from Microblogs with Recurrent Neural Networks, Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence (IJCAI-16)

1. Detection and Resolution of Rumours in Social Media: A Survey, arXiv preprint arXiv:1704.0065, 2017 Rumour Data Collection Strategies • filtering by keywords to collect data related to an event • e.g., for the rumour about whether Obama is Muslim, using keywords like ‘Obama’ and ‘muslim’ to filter the posts • defining a bounding box to collect data posted from predefined geographical locations • listing a set of users of interest to track their posts
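The three collection strategies above can be sketched as filters over already-downloaded posts. The post fields (`text`, `lat`, `lon`, `user`) and the (south, west, north, east) box layout are hypothetical and do not correspond to any particular platform API.

```python
def matches_keywords(text, keywords):
    """True if every keyword occurs in the text (case-insensitive)."""
    lowered = text.lower()
    return all(k.lower() in lowered for k in keywords)

def in_bounding_box(lat, lon, box):
    """box = (south, west, north, east); does not handle boxes crossing the antimeridian."""
    south, west, north, east = box
    return south <= lat <= north and west <= lon <= east

def collect(posts, keywords=None, box=None, users=None):
    """Keep posts that pass every filter that was supplied."""
    selected = []
    for post in posts:
        if keywords is not None and not matches_keywords(post["text"], keywords):
            continue
        if box is not None and not in_bounding_box(post["lat"], post["lon"], box):
            continue
        if users is not None and post["user"] not in users:
            continue
        selected.append(post)
    return selected

posts = [
    {"text": "Is Obama Muslim?", "lat": 40.7, "lon": -74.0, "user": "a"},
    {"text": "Obama speech today", "lat": 51.5, "lon": -0.1, "user": "b"},
]
print(collect(posts, keywords=["Obama", "muslim"]))
```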

Different Classification System • Strong correlations between rumor support and veracity • a majority of users support true rumors, while a higher number of users deny false rumors • The lack of an official source and personal involvement are the most important factors • Tweets from established users with a larger follower network were spread the most

Detection Approaches • Dataset • PHEME dataset • 1,972 rumours and 3,830 non-rumours associated with five breaking news stories • Approaches • Using pre-identified rumours, classify new tweets as being related to one of the known rumours or not • Rumours will provoke tweets from skeptical users who question or enquire about their veracity • a piece of information that has a number of enquiring tweets associated with it is taken to be rumourous • 0.410 in precision and 0.065 in recall • A Conditional Random Fields (CRF) classifier learns context throughout a breaking news story to determine if a tweet constitutes a rumour • 0.667 in precision and 0.556 in recall

Tracking Approaches • Dataset • Qazvinian et al. 2011 [2] dataset • over 10,000 tweets associated with 5 different rumours, each tweet annotated for relevance towards the rumour as related or unrelated • Features • content • unigrams, bigrams and their part-of-speech (POS) tags • network • retweets, replies, comments • Twitter-specific memes • content features inferred from hashtags and URLs • Approaches • Qazvinian [2] uses a Bayesian classifier and achieves a mean average precision of 96.5% • Hamidian and Diab [3] use the Tweet Latent Vector (TLV), which creates a latent vector representative of a tweet to overcome the limited length and context of a tweet by linking it to the Semantic Textual Similarity (STS) model • precision score of 97.2%

Rumour Stance Classification • Dataset • PHEME stance dataset with stance labels of support, deny, query, comment • Qazvinian et al. 2011: over 10,000 tweets with stance labels of affirm, deny, neutral, unrelated • Features • content • unigrams, bigrams and their part-of-speech (POS) tags • network • retweets • time-related information • pragmatics • named entities, events, sentiment and emoticons (smiley) • Twitter conversations • tweet domain features • meta-data about users, hashtags and event-specific keywords

Rumour Stance Classification Approaches • Bayesian classifiers • J48 decision tree [3] • 82.9% F-1 measure • Random Forest classification • 87% in precision, 96.9% in recall, 91.7% in F1-measure and 88.4% in accuracy • Conditional Random Fields (CRF) • Hybrid of machine learning classifiers (SVM) and manually created rules • LSTMs
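As a quick consistency check, the Random Forest figures above agree with the definition of F1 as the harmonic mean of precision and recall. The function name below is only illustrative.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall (0.0 if both are zero)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Random Forest stance classifier: 87% precision, 96.9% recall
print(round(f1_from_precision_recall(0.87, 0.969), 3))  # 0.917
```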

Rumour Veracity Classification • Dataset • RumourEval 2017 • 300 rumours annotated for veracity as one of true, false or unverified • Each rumour includes a stream of tweets associated with it • Twitter and Sina Weibo • liuyanbaike.com (a Chinese rumour debunking platform) • Features • message-based • the length of a message, whether the message contains exclamation/question marks, number of positive/negative sentiment words, whether the message contains a hashtag, and whether it is a retweet • user-based • registration age, number of followers, number of followees and the number of tweets the user has authored in the past • topic-based • the fraction of tweets that contain URLs, the fraction of tweets with hashtags • propagation-based features • depth of the retweet tree or the number of initial tweets on a topic
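The message-based features listed above can be sketched as a small extractor. The tiny sentiment word sets are hypothetical placeholders for a real lexicon, and detecting a retweet by the "RT @" prefix is an assumption.

```python
import re

# Hypothetical mini-lexicons; a real system would load a sentiment dictionary.
POSITIVE_WORDS = {"good", "great", "safe"}
NEGATIVE_WORDS = {"bad", "fake", "dead"}

def message_features(text, positive=POSITIVE_WORDS, negative=NEGATIVE_WORDS):
    """Extract message-based features from a single post."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return {
        "length": len(text),
        "has_exclamation": "!" in text,
        "has_question": "?" in text,
        "n_positive": sum(1 for t in tokens if t in positive),
        "n_negative": sum(1 for t in tokens if t in negative),
        "has_hashtag": re.search(r"#\w+", text) is not None,
        "is_retweet": text.startswith("RT @"),  # assumed retweet convention
    }

print(message_features("RT @user: Is this fake? #rumour"))
```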

Rumour Veracity Classification Features • Temporal features • capture how rumours spread over time • Structural features • model the connectivity between users who posted about the rumour • Linguistic features • obtained through the Linguistic Inquiry and Word Count (LIWC) dictionaries • Client-based features • information about the software that was used to perform the messaging • Location-based features • information relating to whether the message was sent from within the same country where the event happened or not • Network features • creating a social network based on reviews or comments attached to the source tweet • Negation words (comprehensibility category); past, present, and future POS in the tweets (time-orientation category); discrepancy, swear and exclusion features (writing style category); and, finally, home, leisure, religion and sex topic features (topic category)

Rumour Veracity Classification Approaches • Machine learning: Bayesian networks, SVM, decision trees based on J48 (89.2% accuracy, 89.1% precision, 89.1% recall and 89.1% F1-measure) • Random Forest classifier • accuracy (90%), precision (93.5%), recall (89.2%) and F1-measure (89.3%) • Decision trees with J48 lead to 77.4%, SVM with the RBF kernel to 77.9% and random forests to 81.5%. • [4] reports the correlation between features and veracity of rumours using logistic regression • features like mention of numbers, the source the rumour originated from, and hyperlinks correlate positively with true rumours, while rumours containing wishes correlate positively with false rumours. Rumours that include images correlate negatively with being true

Rumours detection related applications • Hoaxy • A Platform for Tracking Online Misinformation • PHEME • research project into establishing the veracity of claims made on the internet • RumorLens • aid journalists in finding posts that spread or correct a particular rumor on Twitter by exploring the size of the audiences that those posts have reached • TwitterTrails • Interactive, web-based tool that allows users to investigate the origin and propagation characteristics of a rumor and its denial on Twitter • RumourFlow • multiple visualizations and modeling tools that are integrated to reveal rumour contents and participants’ activity, both within a rumour and across different rumours • REVEAL • CrossCheck • Emergent

References [1] “CREDBANK: A Large-scale Social Media Corpus With Associated Credibility Annotations”. ICWSM 2015. [2] Vahed Qazvinian, Emily Rosengren, Dragomir R. Radev, and Qiaozhu Mei. 2011. Rumor Has It: Identifying Misinformation in Microblogs. In Proceedings of EMNLP. 1589–1599. [3] Sardar Hamidian and Mona T Diab. 2015. Rumor Detection and Classification for Twitter Data. In Proceedings of the Fifth International Conference on Social Media Technologies, Communication, and Informatics (SOTICS). [4] Cheng Chang, Yihong Zhang, Claudia Szabo, and Quan Z Sheng. 2016. Extreme User and Political Rumor Detection on Twitter. In Advanced Data Mining and Applications: 12th International Conference, ADMA 2016, Gold Coast, QLD, Australia, December 12-15, 2016, Proceedings. Springer, 751–763. [5] Andreas Vlachos and Sebastian Riedel. 2014. Fact checking: Task definition and dataset construction. Proceedings of the ACL 2014 Workshop on Language Technology and Computational Social Science. [6] William Ferreira and Andreas Vlachos. 2016. Emergent: a novel data-set for stance classification. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. ACL. [7] Automated Fake News Detection Using Linguistic Analysis and Machine Learning.
