The role of News Analytics in financial engineering: a review and the road ahead. Gautam Mitra, 7 December 2011, London
Outline • Introduction (What… Why… How; A commercial) • News data (Data sources; Information contents/Metadata; Summary information/Views; Information/modelling architecture) • Models and Applications (Abnormal returns; News enhanced trading strategies; Risk control) • Case studies (Risk control; News analytics toolkit; Momentum study) • Summary, Conclusion
WHAT: News analytics, a working definition • News analytics refers to the measurement of the various qualitative and quantitative attributes of textual news stories. Some of these attributes are sentiment, relevance, and novelty. Expressing news stories as numbers permits the manipulation of … information in a mathematical and statistical way. < Taken from Wiki > • A news story is about an event.
WHY: the research problem = the business problem • The world of financial analytics is concerned with three leading problems: (i) pricing of assets in a temporal setting; (ii) making optimum investment decisions (low frequency) or optimum trading decisions (high frequency); (iii) controlling risk at different time exposures.
HOW: the message • The finance industry focuses on three major applications: > High frequency: trading strategies > Low frequency: investment strategies > Risk control • By enlarging the information set with quantified news, the legacy models for these applications can be enhanced. • Knowledge from three disciplines is required: > Information engineering > AI … knowledge engineering > Financial engineering
Introduction • News • Market Environment • Sentiment [Behavioural finance < greed..fear..irrational exuberance >……… Wall Street 1 Wall Street 2 => money never sleeps ]
Introduction [ neoclassical models for choice or decision making ] • Trading Strategies/Decisions • Investment Decisions • Risk Control Decisions
Introduction • Smart investors rapidly analyse/digest information: news stories/announcements and stock price moves (market reactions). • They act promptly to take trading/investment decisions. • Can a machine act intelligently (AI) to compete with or outsmart humans? • R&D challenge: identify the killer application.
Commercial • Read The Handbook of News Analytics in Finance, by Gautam Mitra and Leela Mitra < for an instant understanding ...! > • Or look up http://www.bis.gov.uk/foresight/our-work/projects/current-projects/computer-trading : The Future of Computer Trading in Financial Markets • Our report: Automated analysis of news to compute market sentiment: its impact on liquidity and trading... Gautam Mitra, Dan diBartolomeo, Ashok Banerjee, Xiang Yu.
News data: Data sources • Which asset classes? • FX (currency) • Commodities • Fixed income (bonds) • Stocks (equities) • Wall Street proverb: ‘Stocks are stories, bonds are mathematics’
[Diagram: market participants. Labels: News Data Feed Providers; Market Data Feed Providers; Tertiary Market Participants (Customers: Institutional Customers, Retail Customers); Main Market Participants (Broker-Dealers & Market Makers, Retail Brokers & Market Makers, ECN, Exchange)]
News data: Data sources • Traders [ high frequency ] • Fund managers [ low frequency ] • Desktop • Market data • Newswire • Web < blogs, Twitter, message boards > • Data warehouse • Data mart
News data: Data sources • Sources of news/informational flows (Leinweber) • News: mainstream media, reputable sources (newswires to traders' desks; newspapers, radio and TV) • Pre-news: source data (SEC reports and filings; government agency reports; scheduled announcements, macroeconomic news, industry stats, company earnings reports…) • Web-based news: social media (blogs, websites and message boards); quality can vary significantly; barriers to entry are low; human behaviour and agendas play a part
News data: Data sources • Financial news can be split between scheduled news (synchronous) and unscheduled news (asynchronous, event driven) • Scheduled news (synchronous): arrives at pre-scheduled times; covers much of pre-news; structured format < XML..XBRL >; often in a basic numerical format; typically macroeconomic announcements and earnings announcements
News data: Data sources • Unscheduled news (asynchronous, event driven): arrives unexpectedly over time; comes from mainstream news and social media; unstructured, qualitative, textual form; non-numeric; difficult to process quickly and quantitatively; may contain information about the cause and effect of an event • To be applied in quant models, it needs to be converted into an input time series
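The last point, turning asynchronous news into an input time series, can be sketched as follows. This is a minimal illustration, not the method of any particular vendor: it assumes each story has already been scored, and the function name `sentiment_time_series`, the 30-minute bucket and the relevance weighting are all hypothetical choices.

```python
import math
from datetime import datetime, timedelta


def sentiment_time_series(events, start, end, bucket=timedelta(minutes=30)):
    """Aggregate irregular news events into a regular time series.

    events: iterable of (timestamp, relevance, sentiment) tuples (assumed layout).
    Returns a list of (bucket_start, story_count, relevance_weighted_sentiment).
    Buckets with no news get a count of 0 and a neutral score of 0.0.
    """
    n_buckets = math.ceil((end - start) / bucket)
    counts = [0] * n_buckets
    weighted_sum = [0.0] * n_buckets
    weight_total = [0.0] * n_buckets
    for ts, relevance, sentiment in events:
        if not (start <= ts < end):
            continue  # ignore stories outside the sampling period
        i = int((ts - start) / bucket)
        counts[i] += 1
        weighted_sum[i] += relevance * sentiment
        weight_total[i] += relevance
    series = []
    for i in range(n_buckets):
        score = weighted_sum[i] / weight_total[i] if weight_total[i] > 0 else 0.0
        series.append((start + i * bucket, counts[i], score))
    return series


# Illustrative data only: three scored stories over a 90-minute period.
start = datetime(2011, 12, 7, 9, 0)
end = datetime(2011, 12, 7, 10, 30)
events = [
    (datetime(2011, 12, 7, 9, 5), 1.0, -0.8),
    (datetime(2011, 12, 7, 9, 20), 0.5, -0.2),
    (datetime(2011, 12, 7, 10, 10), 1.0, 0.6),
]
series = sentiment_time_series(events, start, end)
for bucket_start, count, score in series:
    print(bucket_start.time(), count, round(score, 3))
```

The story count per bucket doubles as a simple news flow (intensity) measure alongside the sentiment score.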
Information contents/Metadata • Key attributes include: • Entity Recognition • Relevance • Novelty • Event categories • Sentiment • Pre-analysis extracts/computes/mines these attributes; sentiment scores are then created using text analysis and AI classifiers. • This is the (news) metadata. • The news flow (intensity) also influences the resulting sentiment.
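One way to picture a single item of news metadata is as a small typed record. The sketch below is an assumption for illustration: the class name `NewsMetadata`, the value ranges (relevance and novelty in 0..1, sentiment in -1..1) and the helper `usable_signal` are hypothetical and do not reproduce any data provider's schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class NewsMetadata:
    """One scored news item (hypothetical layout, not a vendor schema)."""
    timestamp: datetime
    company_id: str
    relevance: float       # assumed range 0..1
    novelty: float         # assumed range 0..1
    event_category: str
    sentiment: float       # assumed range -1 (negative) .. +1 (positive)

    def __post_init__(self):
        # Reject records whose scores fall outside the assumed ranges.
        if not 0.0 <= self.relevance <= 1.0:
            raise ValueError("relevance must lie in [0, 1]")
        if not 0.0 <= self.novelty <= 1.0:
            raise ValueError("novelty must lie in [0, 1]")
        if not -1.0 <= self.sentiment <= 1.0:
            raise ValueError("sentiment must lie in [-1, 1]")


def usable_signal(record, min_relevance=0.5, min_novelty=0.5):
    """Return the sentiment of relevant, novel stories; 0.0 for the rest."""
    if record.relevance >= min_relevance and record.novelty >= min_novelty:
        return record.sentiment
    return 0.0


# Illustrative record for a fictional company.
rec = NewsMetadata(
    timestamp=datetime(2011, 12, 7, 9, 5, tzinfo=timezone.utc),
    company_id="ACME",
    relevance=0.9,
    novelty=0.8,
    event_category="product-recall",
    sentiment=-0.7,
)
print(usable_signal(rec))
```

Filtering on relevance and novelty before using the sentiment score is a design choice made here for illustration; the thresholds are arbitrary.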
Information/modelling architecture [Diagram: information value chain, data … information … knowledge. Inputs: mainstream news; pre-news; Web 2.0 social media; (numeric) financial market data. Pre-analysis (classifiers & others) produces the news metadata: entity recognition, relevance, novelty, events, sentiment score, news flow/intensity. These are held in a consolidated data mart. Analysis with quant models: return predictions; fund management/trading decisions; volatility estimates and risk control. Result: updated beliefs, ex-ante view of market environment.]
Analysis .. synthesis .. mining: entity recognition • Identify entities such as companies in news stories using point-in-time sensitive information: • Short names • Long names • Common abbreviations • Common misspellings • Securities identifiers • Subsidiaries
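A minimal sketch of dictionary-based, point-in-time entity recognition follows. Everything in the alias table is invented for illustration (the company "ACME", its identifier "ACME.L", the former name "OldCo"); a production system would draw these from a maintained symbology database.

```python
import re
from datetime import date

FOREVER = date(9999, 12, 31)

# Hypothetical alias table: (alias, company_id, valid_from, valid_to).
ALIASES = [
    ("acme corporation", "ACME", date(2010, 1, 1), FOREVER),   # long name
    ("acme", "ACME", date(2010, 1, 1), FOREVER),               # short name
    ("amce", "ACME", date(2010, 1, 1), FOREVER),               # common misspelling
    ("acme.l", "ACME", date(2010, 1, 1), FOREVER),             # securities identifier
    ("acme robotics", "ACME", date(2010, 1, 1), FOREVER),      # subsidiary
    ("oldco", "ACME", date(2000, 1, 1), date(2009, 12, 31)),   # name before a rename
]


def find_entities(text, as_of):
    """Return the company ids whose aliases, valid on `as_of`, appear in `text`."""
    lowered = text.lower()
    found = set()
    for alias, company_id, valid_from, valid_to in ALIASES:
        if not (valid_from <= as_of <= valid_to):
            continue  # alias was not in use on the story date
        # Match whole tokens only, so "acme" does not fire inside "acmeology".
        pattern = r"(?<!\w)" + re.escape(alias) + r"(?!\w)"
        if re.search(pattern, lowered):
            found.add(company_id)
    return found


print(find_entities("Amce shares fell after the Acme Robotics recall", date(2011, 12, 7)))
print(find_entities("OldCo announces results", date(2008, 1, 15)))
```

The validity dates are what makes the lookup point-in-time sensitive: a story is matched only against the names that were in use when it was published.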
Analysis .. synthesis .. mining: relevance • Calculate the relevance of a story to a given company: • Mentions in the text • Positioning in the story (headline vs. last paragraph) • Total number of companies mentioned • Detect roles played by companies in the story • Represent the context numerically
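The first three ingredients (mentions, positioning, number of companies) can be combined into a single number in many ways. The sketch below is one hypothetical scoring rule, not a vendor formula: the function name `relevance_score`, the headline weight of 0.5, the 1/(position) paragraph weights and the division by the number of companies are all assumptions.

```python
def relevance_score(headline, paragraphs, aliases, n_companies, headline_weight=0.5):
    """Score in [0, 1] for how relevant a story is to one company (illustrative).

    headline: story headline; paragraphs: body paragraphs in order;
    aliases: names under which the company may appear;
    n_companies: total number of companies mentioned in the story.
    """
    aliases = [a.lower() for a in aliases]

    def mentions(text):
        lowered = text.lower()
        return any(a in lowered for a in aliases)

    in_headline = 1.0 if mentions(headline) else 0.0
    # Earlier paragraphs count more: weights 1, 1/2, 1/3, ...
    weights = [1.0 / (i + 1) for i in range(len(paragraphs))]
    total = sum(weights)
    hit = sum(w for w, p in zip(weights, paragraphs) if mentions(p))
    body_share = hit / total if total else 0.0
    raw = headline_weight * in_headline + (1.0 - headline_weight) * body_share
    # Relevance is shared among all companies named in the story.
    return raw / max(1, n_companies)


lead = relevance_score(
    "Acme recalls vehicles",
    ["Acme said on Monday it would recall vehicles.", "Analysts were surprised."],
    ["Acme"], n_companies=1)
tail = relevance_score(
    "Carmakers under pressure",
    ["Sales fell across the sector.", "Acme was also affected."],
    ["Acme"], n_companies=1)
print(round(lead, 3), round(tail, 3))
```

A company named in the headline and first paragraph scores well above one mentioned only in the last paragraph, which is the behaviour the positioning bullet asks for.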
Analysis .. synthesis .. mining: novelty • Is the news story "new" or novel? • Elementize the various characteristics of a news story • Distinguish between similar vs. duplicate stories • Define a time window between stories • Example: Toyota’s vehicle recall (news flow in the first 30 minutes)
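A rough sketch of a novelty check follows, combining a time window with a text-similarity measure. The use of Jaccard similarity on word sets, the 24-hour window and the 0.9/0.5 thresholds are assumptions chosen for illustration; the slides do not specify how similarity is measured.

```python
from datetime import datetime, timedelta


def jaccard(a, b):
    """Jaccard similarity of the word sets of two texts (1.0 = identical sets)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    if not wa and not wb:
        return 1.0
    return len(wa & wb) / len(wa | wb)


def classify_novelty(story, earlier_stories, window=timedelta(hours=24),
                     duplicate_at=0.9, similar_at=0.5):
    """Label a (timestamp, text) story as 'duplicate', 'similar' or 'novel'.

    Only earlier stories that fall inside the time window are compared.
    """
    ts, text = story
    best = 0.0
    for prev_ts, prev_text in earlier_stories:
        if timedelta(0) <= ts - prev_ts <= window:
            best = max(best, jaccard(text, prev_text))
    if best >= duplicate_at:
        return "duplicate"
    if best >= similar_at:
        return "similar"
    return "novel"


# Illustrative news flow for one company.
t0 = datetime(2011, 12, 7, 9, 0)
first = (t0, "Carmaker announces vehicle recall")
repeat = (t0 + timedelta(minutes=10), "Carmaker announces vehicle recall")
update = (t0 + timedelta(minutes=20), "Carmaker announces vehicle recall in Europe")
late = (t0 + timedelta(days=2), "Carmaker announces vehicle recall")

print(classify_novelty(first, []))
print(classify_novelty(repeat, [first]))
print(classify_novelty(update, [first]))
print(classify_novelty(late, [first]))
```

The same text arriving two days later falls outside the window and is treated as novel again, which shows why the window length matters.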
Analysis .. synthesis .. mining: event categories • Company news and events are categorized: • Identify actionable events • The more detailed the event, the better • Differentiate between scheduled and unscheduled news events • Distinguish between explanatory and predictive inputs
Summary information and views

Thomson Reuters News Analytics: equity coverage and available data
Coverage (equity):
  All equities ............ 34,037 (100.0%)
  Active companies ........ 32,719 (96.1%)
  Inactive companies ...... 1,318 (3.9%)
Equity coverage by region:
  Americas ................ 14,785
  APAC .................... 11,055
  EMEA .................... 8,197
Equity coverage updates: updated bi-weekly for recent changes (de-listings, M&A, IPOs).
History: available from January 2003 (history kept for delisted companies; symbology changes tracked).

RavenPack News Analytics: equity coverage by region
  All equities ............ 28,279 (100%)
  Americas ................ 11,950 (42.24%)
  Asia .................... 8,858 (31.31%)
  Europe .................. 5,859 (20.71%)
  Oceania ................. 436 (5.08%)
  Africa .................. 186 (0.66%)
For the most up-to-date list of supported companies, download the companies.csv file at: https://ravenpack.com/newsscores/
Historical data:
  Data format: comma-separated values (.csv) files
  Date/time info: in Coordinated Universal Time (UTC)
  Archive range: since Jan 1, 2005
  Archive packaging: monthly .csv files compressed in .zip files on a per-year basis
Summary information • Other suppliers • Deutsche Boerse < Alpha Flash > • Bloomberg ‘Black box newsfeed’ • Dow Jones Elementized Newsfeed
Summary information and views • Tetlock et al. event study shows “information leakage”
Summary information and views [Chart: Average Stock Price Reaction to Negative News Events] Source: Macquarie Quant Research, May 2009
Summary information and views [Chart: Average Stock Price Reaction to Positive News Events] Source: Macquarie Quant Research, May 2009
Summary information and views Illustration of Seasonality (Hafez, RavenPack)
Model & Applications… (abnormal) Returns • Traders and quant managers … identify and exploit asset mispricings before they correct … generate alpha • News data can be used for: • Stock picking and generating trading signals • Factor models • Exploiting behavioural biases in investor decisions
Model & Applications… (abnormal) Returns • Stock picking and generating trading signals • Sentiment reversal as a buy signal: J. Kittrell uses a sequence of P, N scores as a means of testing sentiment reversal. • Momentum strategy enhanced by news sentiment scores: Macquarie research; Sinha also reports results with Thomson Reuters data.
Model & Applications… (abnormal) Returns • Behavioural biases • Odean and Barber (2007) find evidence that individual investors have a tendency to buy attention-grabbing stocks. Professional investors are better equipped to assess a wider range of stocks, so they are less prone to buying attention-grabbing stocks. • Da, Engelberg and Gao also consider how the amount of attention a stock receives affects the cross-section of returns. They use the frequency of Google searches for a particular company as a measure of attention, and find some evidence that changes in investor attention can predict the cross-section of returns.
Model & Applications… (abnormal) Returns • Stock picking and generating trading signals • Li (2006): simple ranking procedure … identify stocks with positive and negative sentiment • 10-K SEC filings for non-financial firms, 1994 – 2005 • Risk sentiment measure: count the number of times the words risk, risks, risky, uncertain, uncertainty and uncertainties appear in the management discussion and analysis section • Strategy: long in low risk sentiment stocks, short in high risk sentiment stocks … reasonable level of returns • Leinweber (2010): event studies based on Reuters NewsScope Sentiment Engine
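The word-count measure and the long/short ranking described for Li (2006) can be sketched in a few lines. This is a simplified illustration based only on the description above: it uses raw counts, and the function names, the 20% bucket size and the sample data are assumptions rather than details of the original study.

```python
import re

# The six words listed in the description of the risk sentiment measure.
RISK_WORDS = {"risk", "risks", "risky", "uncertain", "uncertainty", "uncertainties"}


def risk_sentiment(mdna_text):
    """Count occurrences of the risk words in a management discussion text."""
    tokens = re.findall(r"[a-z]+", mdna_text.lower())
    return sum(1 for token in tokens if token in RISK_WORDS)


def long_short_buckets(counts, fraction=0.2):
    """Rank firms by risk sentiment; go long the lowest and short the highest.

    counts: dict mapping firm id -> risk sentiment count.
    fraction: share of firms in each leg (hypothetical choice).
    """
    ranked = sorted(counts, key=lambda firm: (counts[firm], firm))
    n = max(1, int(len(ranked) * fraction))
    return ranked[:n], ranked[-n:]


print(risk_sentiment("Risks remain; the outlook is uncertain. Risk-free rates fell."))

# Illustrative counts for five fictional firms.
counts = {"A": 1, "B": 7, "C": 3, "D": 0, "E": 12}
long_leg, short_leg = long_short_buckets(counts)
print(long_leg, short_leg)
```

Note that a plain word count also picks up neutral phrases such as "risk-free"; handling such cases is outside the scope of this sketch.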
News Enhanced Algorithmic Trading • Information/modelling architecture • Modelling architecture • Pre-trade and post-trade analysis • Characterize asset behaviour/dynamics by: • Asset price/return • Asset (price) volatility • Asset (price) liquidity • Construct trading models using these measures
[Diagram: predictive analysis model. Inputs: market data (bid, ask, execution price, time bucket) and news metadata (time stamp, company ID, relevance, novelty, sentiment score, event category…). Outputs: price/returns, volatility, liquidity.]
[Diagram: three stages. Pre-trade analysis (ex-ante decision model): a market data feed and a news metadata feed enter predictive analytics, which produces price, volatility and liquidity. Automated algo-strategies: low latency execution algorithms issue trade orders. Post-trade analysis (ex-post analysis model): market data and news data are analysed and a report is produced.]
Applications: Risk management • Traditionally, historic asset price data has been used to estimate risk measures. • These ex post, retrospective measures fail to account for developments in the market environment, investor sentiment and knowledge. • When there are significant changes in the market environment, traditional measures can fail to capture the true level of risk (Mitra, Mitra and diBartolomeo 2009; diBartolomeo and Warrick 2005). • Incorporating measures or observations of the market environment in risk estimation is therefore important.
EQUITY PORTFOLIO RISK (VOLATILITY) ESTIMATION USING MARKET INFORMATION AND SENTIMENT. Leela Mitra. Co-authors: Gautam Mitra and Dan diBartolomeo. Sponsored by:
Case study: Outline • Problem setting • Model description • Updating the model using quantified news • Study I • Study II • Discussion and conclusions
Introduction & background • Tetlock et al. (2007) note there are three main sources of information: • Analyst forecasts • Publicly disclosed accounting variables • Linguistic descriptions of operating environments • If the first two are incomplete, the third may give us relevant information • Tetlock et al. (2007) introduce “news” to a fundamental factor model
Problem setting • Three main types of factor models: • Macroeconomic: use economic variables as factors (Chen, Roll and Ross; Sharpe) • Fundamental: based on firm-specific (cross-sectional) attributes (BARRA and Fama-French) • Statistical: factors are unobservable and derived via calibration, often orthogonal • They differ on the sources of risk (uncertainty), but can be shown to be rotations of each other.
Problem setting • Need for models to update risk structure as the environment changes • diBartolomeo and Warrick (2005) update covariance estimates using option implied volatility • Traders respond quickly in an intelligent fashion [Diagram labels: changes to market environment; traders react; changes in option implied volatility; changes in asset covariance matrix]
Model description • An extension of diBartolomeo & Warrick (2005) • In two parts: • A “basic” statistical factor model • Factor variance estimates are updated for changes in option implied volatility
Model description • We construct a statistical factor model using principal component analysis to find orthogonal factors • Update the asset variances using option implied volatility data
Model description • For each asset for which we have option implied volatility data • We wish to identify the new factor variances and asset specific variances implied by updated asset variances • Solve this set of simultaneous equations to derive the values, subject to some further conditions
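The simultaneous equations referred to here are not reproduced in this transcript. As a hedged sketch only, a standard way to write them for a model with orthogonal factors is shown below; the symbols are assumptions introduced for illustration ($\beta_{ik}$ is the loading of asset $i$ on factor $k$, $\sigma^2_{F_k}$ the variance of factor $k$, $\sigma^2_{\varepsilon_i}$ the specific variance of asset $i$, and a tilde marks an updated value), and holding the loadings fixed during the update is likewise an assumption.

```latex
% Variance of asset i under K orthogonal factors (standard decomposition)
\sigma_i^2 \;=\; \sum_{k=1}^{K} \beta_{ik}^2 \, \sigma_{F_k}^2 \;+\; \sigma_{\varepsilon_i}^2

% One equation per asset i with option implied volatility data:
% the updated asset variance on the left is known, the updated factor
% and specific variances on the right are the unknowns.
\tilde{\sigma}_i^2 \;=\; \sum_{k=1}^{K} \beta_{ik}^2 \, \tilde{\sigma}_{F_k}^2 \;+\; \tilde{\sigma}_{\varepsilon_i}^2 ,
\qquad i \in \{\text{assets with option data}\}
```

With one equation per asset but one unknown specific variance per asset plus $K$ unknown factor variances, the system is underdetermined on its own, which is consistent with the need for further conditions.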
Model description • Further conditions • Allow for structure that is expected of principal component factors • Assume factor variances do not decline substantially from one period to the next • Similarly assume asset specific variances do not decline substantially from one period to the next
Study I • Period: 17 January 2008 to 23 January 2008 • EURO STOXX 50 • Market sentiment worsened • Option implied volatility measures surged • A few key events: • Large interest rate cut • George Bush announced a stimulus plan • Soc Gen hit by the Jerome Kerviel rogue trader scandal