Applications of news analytics in finance: a review

Applications of news analytics in finance: a review. Gautam Mitra Co-author Leela Mitra. Summary and scope.

Presentation Transcript


  1. Applications of news analytics in finance: a review Gautam Mitra Co-author Leela Mitra

  2. Summary and scope • In this talk we set out a structured (reading) guide to the published research outputs: Journal papers, white papers, case studies which are emerging in the domain of “news analytics” applied to finance. • We aim to provide insight into the subtle interplay of information technology (including AI), the quantitative models and behavioural biases in the context of trading and investment decisions. • Applications such as low frequency and high frequency trading are presented; some desirable/potential applications are discussed.

  3. Outline • Introduction • News data • Data sources • Pre analysis of data • Determining sentiment scores • General overview • Das and Chen • Lo • Models and applications in summary form • (abnormal) Returns • Volatility and risk control • Desirable industry applications • Summary and discussions

  4. Introduction • News. • Market Environment. • Sentiment. • Investment Decisions. • Risk Control.

  5. Introduction • Traders [ High Frequency ] • Fund Managers [ Low Frequency ] • Desktop • Market Data • NewsWire • Data WareHouse • DataMart

  6. Introduction: R & D Challenge, Identify Killer Application • Smart investors rapidly analyse/digest information. • News stories/announcements. • Stock price moves (market reactions). • Act promptly to take trading/investment decisions. • Can a machine act intelligently (AI) to compete with or outsmart humans?

  7. Introduction • At the least, can we have IT/AI tools which help humans make good investment decisions? Intelligence Amplification (gearing… an engineering concept) • Thus three disciplines converge: • Information Systems • AI, in particular Natural Language Processing • Financial Engineering / Quantitative Modelling (including behavioural finance)

  8. Introduction [Flow diagram] • Inputs: Mainstream News, Pre-News, Web 2.0 Social Media, (Numeric) financial market data • Pre-Analysis: Classifiers, Sentiment Scores • Consolidated Datamart • Analysis: Updated beliefs, Ex-ante view of market environment • Quant Models • Return Predictions • Fund Management / Trading Decisions • Volatility estimates and risk control • Overall flow: Data → analysis → Datamart → quant models

  9. Outline • Introduction • News data • Data sources • Pre analysis of data • Determining sentiment scores • General overview • Das and Chen • Lo • Models and applications in summary form • (abnormal) Returns • Volatility and risk control • Desirable industry applications • Summary and discussions

  10. News data: Data sources • Sources of news/informational flows (Leinweber) • News: Mainstream media, reputable sources. • Newswires to traders' desks. • Newspapers, radio and TV. • Pre-News: Source data • SEC reports and filings. Government agency reports. • Scheduled announcements, macro economic news, industry stats, company earnings reports… • Social media: Blogs, websites and message boards • Quality can vary significantly • Barriers to entry low • Human behaviour and agendas

  11. News data: Data sources • Web based news • Individual investors pay more attention than institutional investors (Das and Rieger) • “Collective Intelligence”: the collective opinion of a large group of people (with no ulterior motives) may be useful. • SEC does monitor message boards • Far from perfect vetting of information. • Financial news can be split between • Scheduled news (Synchronous) • Unscheduled news (Asynchronous, event driven)

  12. News data: Data sources • Scheduled news (Synchronous) • Arrives at pre scheduled times • Much of pre news • Structured format • Often basic numerical format • Typically macro economic announcements and earnings announcements

  13. News data: Data sources • Macro economic announcements • Widely used in automated trading • Impact the largest and most liquid markets (foreign exchange, Govt. debt, futures markets) • Naturally affect trading strategies. • Speed and accuracy are key... technology requirements substantial • Providers in this space • Trade the News, Need to Know News, Market News International, Thomson Reuters, Dow Jones, Bloomberg… • Earnings announcements • Directly influence stock prices • Widely anticipated and used in trading strategies

  14. News data: Data sources • Unscheduled news (Asynchronous, event driven) • Arrives unexpectedly over time • Mainstream news and social media • Unstructured, qualitative, textual form • Non-numeric • Difficult to process quickly and quantitatively • May contain information about effect and cause of an event • To be applied in quant models needs to be converted to an input time series
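A minimal sketch of the last bullet above, converting irregularly arriving scored stories into a regular input time series. The bucket size, the `(minute, score)` tuple layout and the function name `to_time_series` are illustrative assumptions, not something specified in the slides.

```python
from collections import defaultdict

def to_time_series(stories, bucket_minutes=60):
    """Aggregate (minute_timestamp, sentiment_score) pairs into fixed buckets.

    Returns a time-ordered list of (bucket_start, mean_score, story_count).
    Buckets with no stories are simply absent from the output.
    """
    buckets = defaultdict(list)
    for minute, score in stories:
        # Floor each timestamp to the start of its bucket.
        bucket_start = minute // bucket_minutes * bucket_minutes
        buckets[bucket_start].append(score)
    return [
        (start, sum(scores) / len(scores), len(scores))
        for start, scores in sorted(buckets.items())
    ]

# Hypothetical stories: minutes since the start of the day, score in [-1, 1].
series = to_time_series([(5, 0.5), (50, -0.5), (70, 1.0)])
```

Keeping the story count next to the mean score lets a downstream model tell a quiet hour from a busy one with mixed sentiment.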

  15. Outline • Introduction • News data • Data sources • Pre analysis of data • Determining sentiment scores • General overview • Das and Chen • Lo • Models and applications in summary form • (abnormal) Returns • Volatility and risk control • Desirable industry applications • Summary and discussions

  16. News data: Pre analysis of data • Collecting, cleaning and analysing news data … challenging • Major newswire providers collect news from a wide range of sources e.g. Factiva database from Dow Jones, news from 400 sources • Tagging – Machine readable meta data • Major newswire providers tag incoming news stories • Reporters tag stories as they enter them into the system • Machine learning techniques also used to identify relevant tags (RavenPack) • Unstructured stories into basic machine readable form • Tags held in XML (standard for meta-data exchange) • Reveals story’s topic areas and other useful meta data

  17. News data: Pre analysis of data • Need to identify news which is relevant and current • “Information events” distinguish stories reporting on old news from genuinely “new” news • Tetlock et al. event study shows “information leakage”

  18. News data: Pre analysis of data • Need to identify news which is relevant and current • Reuters gives, for each article • Relevance scores … measure how much the article is about a particular company • Novelty/uniqueness scores … measure the repetition among articles • RavenPack • Distinguish stories which are events • Carry first mention of a particular theme • Stories which are not events are excluded • To minimise number of duplicate stories
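The relevance and first-mention filters above could be sketched as follows. The dictionary keys (`time`, `company`, `theme`, `relevance`) and the 0.8 cut-off are hypothetical; the vendors' actual field names and scales are not given in the slides.

```python
def select_events(stories, min_relevance=0.8):
    """Keep only relevant stories that carry the first mention of a theme.

    stories: list of dicts with hypothetical keys
             'time', 'company', 'theme', 'relevance'.
    """
    seen = set()
    kept = []
    for story in sorted(stories, key=lambda s: s["time"]):
        if story["relevance"] < min_relevance:
            continue  # story is not sufficiently about the company
        key = (story["company"], story["theme"])
        if key in seen:
            continue  # repeat of an earlier theme, treated as old news
        seen.add(key)
        kept.append(story)
    return kept

stories = [
    {"time": 1, "company": "XYZ", "theme": "earnings", "relevance": 0.9},
    {"time": 2, "company": "XYZ", "theme": "earnings", "relevance": 0.95},
    {"time": 3, "company": "XYZ", "theme": "merger", "relevance": 0.3},
]
events = select_events(stories)  # keeps only the story at time 1
```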

  19. News data: Pre analysis of data • Classification of news • Tagged stories provide hundreds of event types • Need to distinguish what types of news are relevant to our application • Market may react differently to different types of news • e.g. Moniz et al. find market reacts more strongly to earnings news than strategic news • Different news is available for different assets • Larger companies with more liquid stock tend to have higher news coverage

  20. News data: Pre analysis of data • Classification of news • Accounting related news • Earnings • Announcements of earnings • Restatements of Operating Results etc. • Trading updates • Announcements of Sales/Trading Statement etc. • Strategic news • M&A Related • M&A Rumours and discussion • M&A Transaction announcements etc. • Restructuring issues etc.

  21. News data: Pre analysis of data • Relationship of different news items / Independence of news… important consideration • Seasonality of news (Hafez, Lo, Moniz) • Need to be able to identify unexpected newsflow from variation due to seasonality • Hourly, daily and weekly seasonality • Intraday - larger volumes of newsflow just before opening of European, US and Asian stockmarkets (Hafez)
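One simple way to separate unexpected newsflow from seasonal variation is to compare each observed count with the typical count for that hour of the day. This is only an illustrative sketch with made-up numbers; the function names are assumptions, and the cited authors may use different adjustments.

```python
from collections import defaultdict
from statistics import mean

def hourly_profile(counts):
    """Mean story count for each hour of the day.

    counts: list of (hour_of_day, story_count) observations.
    """
    by_hour = defaultdict(list)
    for hour, n in counts:
        by_hour[hour].append(n)
    return {hour: mean(ns) for hour, ns in by_hour.items()}

def abnormal_volume(counts, profile):
    """Ratio of observed to typical volume; values above 1 mean heavier flow than usual."""
    return [n / profile[hour] if profile.get(hour) else 0.0 for hour, n in counts]

# Made-up history: busy before the open (08:00), quiet mid-afternoon (14:00).
history = [(8, 100), (8, 120), (8, 80), (14, 10), (14, 20), (14, 30)]
profile = hourly_profile(history)
ratios = abnormal_volume([(8, 150), (14, 30)], profile)
```

In this toy data, 150 stories at 08:00 and 30 stories at 14:00 are equally unusual once the hourly pattern is removed, although the raw counts differ by a factor of five. The same idea extends to day-of-week effects by keying the profile on (weekday, hour).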

  22. News data: Pre analysis of data Illustration of Seasonality (Hafez, RavenPack)

  23. Outline • Introduction • News data • Data sources • Pre analysis of data • Determining sentiment scores • General overview • Das and Chen • Lo • Models and applications in summary form • (abnormal) Returns • Volatility and risk control • Desirable industry applications • Summary and discussions

  24. Determining sentiment scores • Informational content of news: Converting qualitative data into a quantitative form … challenging • Distinguish the sentiment of stories (positive/negative) • scale of positivity / negativity … sentiment scores • Consider the story’s context and language • How positively/negatively a human interprets the story… emotive content • Expert classification • Psychosocial dictionaries e.g. General Inquirer • Different groups of people are affected by events differently or have different interpretations of the same events … conflicts may arise

  25. Determining sentiment scores • Market based measures (Lo, Moniz et al. and Lavrenko) • Markets’ lagged relative change in returns/volatility for a particular asset (asset class) • Machine learning and natural language techniques can be used to determine sentiment of incoming stories … sentiment indices over time • Index validation - To use an index we must be able to find a relationship with relevant market variables

  26. Outline • Introduction • News data • Data sources • Pre analysis of data • Determining sentiment scores • General overview • Das and Chen • Lo • Models and applications in summary form • (abnormal) Returns • Volatility and risk control • Desirable industry applications • Summary and discussions

  27. Das and Chen • extract investor sentiment from stock message boards • for Morgan Stanley High Tech (MSH) Index • Web scraper program downloads tech sector message board messages • Five algorithms with different conceptual underpinnings are used to classify each message • Voting scheme is then applied

  28. Das and Chen • Three supplementary databases • Dictionary – nature of the word: noun, adjective, adverb. • Lexicon - collection of hand picked words which form variables for statistical inference within the algorithms • Grammar – training corpus of base messages used in determining in-sample statistical information. Applied for use on the out-of-sample messages • Lexicon and grammar jointly determine the context of the sentiment

  29. Das and Chen • Five algorithms: (=Classifiers) 1. Naïve classifier • Based on word count of positive and negative connotation words 2. Vector distance classifier • Each of the D words in the lexicon is assigned a dimension in vector space • Each training message is pre classified as positive, negative or neutral • Each new message is classified by comparison to the cluster of pre trained vectors and is assigned the same classification as that vector with which it has the smallest angle
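The first two classifiers above can be sketched in a few lines. The lexicon words and training messages are invented for illustration, and the treatment of messages with no lexicon words (returned as neutral) is an assumption added here rather than a detail from the slides.

```python
import math

# Hypothetical hand-picked lexicon, split by connotation.
POSITIVE = {"buy", "strong", "beat", "up"}
NEGATIVE = {"sell", "weak", "miss", "down"}
LEXICON = sorted(POSITIVE | NEGATIVE)  # one vector dimension per lexicon word

def naive_classify(message):
    """Naive classifier: net count of positive minus negative words."""
    words = message.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"

def to_vector(message):
    """Lexicon word counts for a message, in LEXICON order."""
    words = message.lower().split()
    return [words.count(term) for term in LEXICON]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def vector_distance_classify(message, training):
    """Assign the label of the training message with the smallest angle.

    training: list of (message, label); smallest angle = largest cosine.
    """
    vec = to_vector(message)
    if not any(vec):
        return "neutral"  # assumption: no lexicon words means no signal
    best = max(training, key=lambda item: cosine(vec, to_vector(item[0])))
    return best[1]

training = [("strong buy", "positive"), ("weak sell", "negative")]
label = vector_distance_classify("buy buy strong results", training)
```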

  30. Das and Chen 3. Discriminant based classifier • The naïve classifier (NC) weights all words within the lexicon equally. The discriminant based classification method replaces this simple word count with a weighted word count. • The weights determine how well a particular lexicon word discriminates between the different message categories 4. Adjective-adverb phrase classifier • This is based on the assumption that phrases which use adjectives and adverbs emphasize sentiment and require greater weight. • Uses a word count but uses only those words within phrases containing adjectives and adverbs.

  31. Das and Chen 5. Bayesian classifier • Given the class of each message in the training set we can determine the frequency with which a lexical word appears in a particular class. • For a new message we are able to compute the probability it falls within a particular class given its component lexicon words • The message is classified as being from the category with the highest probability. • Voting scheme … final classification based on achieving majority amongst classifiers • Reduces number of messages classified • Enhances classification accuracy
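A rough sketch of the Bayesian classifier and the voting scheme described above. It uses a standard naive Bayes formulation with add-one smoothing, which is an assumption here; the slides do not say how zero counts are handled. All names and training messages are illustrative.

```python
import math
from collections import Counter, defaultdict

def train_bayes(training, lexicon):
    """Count lexicon-word occurrences per class from (message, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    for message, label in training:
        class_counts[label] += 1
        for word in message.lower().split():
            if word in lexicon:
                word_counts[label][word] += 1
    return word_counts, class_counts

def bayes_classify(message, model, lexicon):
    """Return the class with the highest posterior probability (computed in logs)."""
    word_counts, class_counts = model
    total = sum(class_counts.values())
    scores = {}
    for label, n in class_counts.items():
        # Add-one smoothing so unseen words do not force a zero probability.
        denom = sum(word_counts[label].values()) + len(lexicon)
        log_p = math.log(n / total)
        for word in message.lower().split():
            if word in lexicon:
                log_p += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = log_p
    return max(scores, key=scores.get)

def majority_vote(labels):
    """Final label if a strict majority of classifiers agree, else None (unclassified)."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) / 2 else None

lexicon = {"buy", "sell", "strong", "weak"}
model = train_bayes(
    [("strong buy", "positive"), ("buy now", "positive"),
     ("weak sell", "negative"), ("sell sell", "negative")],
    lexicon,
)
final = majority_vote(["positive", "positive", "positive", "negative", "neutral"])
```

Returning `None` when there is no majority mirrors the trade-off on the slide: fewer messages are classified, but those that are have agreement from most classifiers.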

  32. Das and Chen • Ambiguity - stock message board messages are often highly ambiguous • Use General Inquirer … determine optimism score • Filter in and consider only the most highly optimistic stories in the positive category • Filter in and consider only the most highly pessimistic stories in the negative category • Number of false positives in classification declines • Disagreement – 0 no disagreement; 1 high disagreement
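One simple measure with the stated property (0 when all messages agree, 1 when opinion is evenly split) is sketched below. The formula is an assumption chosen to match the 0 to 1 scale on the slide, not a definition taken from the slides, and `n_buy` / `n_sell` are hypothetical counts of messages classified positive and negative.

```python
def disagreement(n_buy, n_sell):
    """Disagreement in [0, 1]: 0 if all messages agree, 1 if evenly split."""
    total = n_buy + n_sell
    if total == 0:
        return 0.0  # no classified messages, so nothing to disagree about
    return 1.0 - abs(n_buy - n_sell) / total

unanimous = disagreement(10, 0)  # all positive
split = disagreement(5, 5)       # evenly divided
```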

  33. Das and Chen • Relationship between sentiment indices and market variables? Nature of sentiment index? • Positive sentiment bias • Fig shows histogram of normalised sentiment for a stock… positively skewed • RavenPack find positive bias in classifiers … more marked in bull markets

  34. Das and Chen • Relationship between sentiment indices and market variables • Sentiment and stock levels are related … determining precise nature of price relationship is difficult • Sentiment inversely related to disagreement • Disagreement rises, sentiment falls • Sentiment correlated to posting volume • Discussion increases, indicates optimism about stock is rising • Strong relationship between message volume and volatility (Antweiler and Frank (2004) also) • Strong relationship between trading volume and volatility

  35. Outline • Introduction • News data • Data sources • Pre analysis of data • Determining sentiment scores • General overview • Das and Chen • Lo • Models and applications in summary form • (abnormal) Returns • Volatility and risk control • Desirable industry applications • Summary and discussions

  36. Lo • Reuters NewsScope Event Indices (NEI) are constructed • to have predictive power for returns and realised volatility • integrated framework, returns and volatility used in calibrating indices • News data • Reuters news alerts - quick news flashes issued when newsworthy events occur – timely and relevant • Tags machine readable • Headlines concise, small vocabulary… good for machine learning analysis

  37. Lo • The following parameters are used • List of keywords and phrases with real valued weights $w$ • A rolling “sentiment window” of size $r$ (say 5/10 minutes) • A rolling calibration window of size $R$ (say 90 days) • $f_t$ is the vector of keyword frequencies over the sentiment window $[t-r,\, t]$ • Raw score is defined as the weighted sum $s_t = w \cdot f_t$; this will tend to be high when news volume is high … hence a normalised score

  38. Lo • Normalised score • At all times t in the R days of the calibration window record • raw score • news volume • Normalised score determined by comparing current raw score against raw scores where news volume equals current news volume • e.g. $S_t = 0.92$: of the times in the calibration window when news volume was at its current level, the raw score was below its current value 92% of the time.
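The raw and normalised scores can be sketched as below. Treating the raw score as a weighted sum of keyword frequencies is an assumption read off the parameter list, and the keyword weights, function names and sample history are invented for illustration.

```python
def raw_score(weights, frequencies):
    """Weighted sum of keyword frequencies over the sentiment window.

    weights:     dict keyword -> real-valued weight
    frequencies: dict keyword -> count in the current sentiment window
    Keywords without a weight contribute nothing.
    """
    return sum(weights.get(word, 0.0) * n for word, n in frequencies.items())

def normalised_score(current_raw, current_volume, history):
    """Fraction of past observations, at the same news volume, with a lower raw score.

    history: list of (raw_score, news_volume) recorded over the calibration window.
    Returns None when no past observation has the current volume.
    """
    peers = [raw for raw, volume in history if volume == current_volume]
    if not peers:
        return None
    return sum(raw < current_raw for raw in peers) / len(peers)

weights = {"upgrade": 1.5, "downgrade": -2.0}             # hypothetical weights
window = {"upgrade": 2, "downgrade": 1, "merger": 3}       # counts in the last r minutes
history = [(0.2, 6), (0.5, 6), (1.4, 6), (0.9, 6), (3.0, 12)]

s = raw_score(weights, window)                              # 1.5*2 - 2.0*1 = 1.0
S = normalised_score(s, sum(window.values()), history)     # 3 of 4 peers lower -> 0.75
```

Conditioning on equal news volume is what removes the tendency of the raw score to rise simply because more stories arrive. In practice exact volume matches may be sparse, so binning volumes would be a natural variation.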

  39. Lo • Model calibration • Determine keywords • Create list of keywords by hand • Tool to extract news from periods when scores are high… determine whether keywords are legitimate or need adjusting • Optimal weights for intraday return sentiment index • regress word frequencies against intraday returns • Optimal weights for intraday volatility sentiment index • regress word frequencies against (deseasonalised) intraday realised volatility

  40. Lo • Model calibration • Determining optimal weights is a more general classification problem • Other techniques… machine learning… perceptron algorithm, support vector machines…
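As an illustration of treating weight calibration as a classification problem, the sketch below trains a basic perceptron on keyword-frequency vectors labelled by the sign of the subsequent move. The data, the two-keyword feature layout and the labelling rule are invented for the example and are not taken from the slides.

```python
def train_perceptron(samples, n_features, epochs=20, learning_rate=1.0):
    """Classic perceptron update rule.

    samples: list of (frequency_vector, label) with label +1 (up) or -1 (down).
    Returns (weights, bias). Stops early once an epoch makes no mistakes.
    """
    weights = [0.0] * n_features
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for x, y in samples:
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            if y * activation <= 0:  # misclassified (or on the boundary)
                weights = [w + learning_rate * y * xi for w, xi in zip(weights, x)]
                bias += learning_rate * y
                mistakes += 1
        if mistakes == 0:
            break
    return weights, bias

def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else -1

# Hypothetical features: counts of ["upgrade", "downgrade"] in each window.
samples = [([2, 0], 1), ([1, 0], 1), ([0, 1], -1), ([0, 3], -1)]
weights, bias = train_perceptron(samples, n_features=2)
```

The learned weights play the same role as the keyword weights above: positive for words that precede upward moves, negative for words that precede downward moves.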

  41. Lo • Index validation – to establish empirical significance of indices… event study analysis • Event is defined when (return/volatility sentiment) index exceeds a threshold value (0.995) • Remove events that follow in less than one hour of another event … consider only “new” events • Tests null hypothesis: Distribution of returns / deseasonalised realised volatility is the same before / after an event. • Visual inspection • t-test for equality of means • Levene’s test for change in standard deviation • Chi-squared goodness of fit
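The event definition and the test for equality of means could be sketched as follows. Two interpretive assumptions are made: an event is dropped if it follows any earlier threshold crossing by less than an hour, and the t statistic is computed in its unequal-variance (Welch) form. The slides do not settle either point, and the sample series is invented.

```python
import math
from statistics import mean, variance

def detect_events(index_series, threshold=0.995, min_gap_minutes=60):
    """Times at which the index exceeds the threshold, keeping only "new" events.

    index_series: list of (minute, index_value), sorted by time.
    """
    events = []
    last_crossing = None
    for minute, value in index_series:
        if value > threshold:
            if last_crossing is None or minute - last_crossing >= min_gap_minutes:
                events.append(minute)
            last_crossing = minute  # every crossing resets the one-hour clock
    return events

def welch_t(before, after):
    """t statistic for a difference in means, without assuming equal variances."""
    standard_error = math.sqrt(
        variance(before) / len(before) + variance(after) / len(after)
    )
    return (mean(after) - mean(before)) / standard_error

series = [(0, 0.5), (10, 0.999), (30, 0.998), (100, 0.997), (200, 0.2)]
events = detect_events(series)  # the crossing at minute 30 is dropped
t_stat = welch_t([1, 2, 3], [2, 3, 4])
```

The statistic would then be compared with a t distribution to obtain a p-value; Levene's test and the chi-squared goodness of fit test are not shown here.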

  42. Lo • Index validation – to establish empirical significance of indices… event study analysis

  43. Lo • Index validation – to establish empirical significance of indices… event study analysis

  44. RavenPack Sentiment Scores

  45. Reuters NewsScope Sentiment Engine

  46. Outline • Introduction • News data • Data sources • Pre analysis of data • Determining sentiment scores • General overview • Das and Chen • Lo • Models and applications in summary form • (abnormal) Returns • Volatility and risk control • Desirable industry applications • Summary and discussions

  47. Model & Applications… (abnormal) Returns Average Stock Price Reaction to Negative News Events Source: Macquarie Quant Research – May 2009

  48. Model & Applications… (abnormal) Returns Average Stock Price Reaction to Positive News Events Source: Macquarie Quant Research – May 2009

  49. Model & Applications… (abnormal) Returns • Traders and quant managers … identify and exploit asset mispricings before they correct … generate alpha • News data can be used • Stock picking and generating trading signal • Factor models • Exploit behavioural biases in investor decisions

  50. Model & Applications… (abnormal) Returns • Stock picking and generating trading signal • Li (2006) simple ranking procedure • … identify stocks with positive and negative sentiment • 10-K SEC filings for non-financial firms 1994 – 2005 • Risk sentiment measure – count the number of times the words risk, risks, risky, uncertain, uncertainty and uncertainties appear in the management discussion and analysis section • Strategy long in low risk sentiment stocks • short in high risk sentiment stocks • … reasonable level of returns • Leinweber (2010) – event studies based on Reuters NewsScope Sentiment Engine
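The risk sentiment count and the long/short ranking can be sketched as below. The six words are those listed on the slide; the tickers, filing text, function names and the choice of a raw (not length-adjusted) count are illustrative assumptions.

```python
import re

RISK_WORDS = {"risk", "risks", "risky", "uncertain", "uncertainty", "uncertainties"}

def risk_sentiment(mdna_text):
    """Number of risk-related words in a management discussion and analysis section."""
    tokens = re.findall(r"[a-z]+", mdna_text.lower())
    return sum(token in RISK_WORDS for token in tokens)

def long_short(filings, n=1):
    """Rank tickers by risk sentiment; long the n lowest, short the n highest.

    filings: dict ticker -> MD&A text.
    """
    ranked = sorted(filings, key=lambda ticker: risk_sentiment(filings[ticker]))
    return ranked[:n], ranked[-n:]

# Made-up filings for three hypothetical tickers.
filings = {
    "AAA": "Demand is uncertain and risks remain; the outlook is risky.",
    "BBB": "Sales grew steadily.",
    "CCC": "One risk is currency.",
}
longs, shorts = long_short(filings)
```

Longer filings will naturally contain more risk words, so in practice the count would probably be scaled by document length or measured as a change from the previous year's filing.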