This presentation explores the challenges and opportunities of utilizing big data in European statistics. It discusses strategies for deriving new insights using micro-data, the need to connect and integrate data, and the skills required for statisticians in the era of big data. The presentation also highlights the interest and potential cooperation among central banks in exploring the usefulness of big data.
ECB-UNRESTRICTED
Aurel Schubert, Director-General Statistics
Strategies for European statistics: New insights using micro data
ESS Big Data Workshop 2016, Ljubljana, 13-14 October 2016
Disclaimer: The opinions expressed in this presentation are not necessarily those of the European Central Bank (ECB) or the European System of Central Banks (ESCB)
Overview
1 Big Data – The challenge of moving to more granular data
2 Big Data – Discovery and piloting
3 Strategy for new insights
New policy needs drive new statistics
• Statistics needs to stay relevant
• An increasingly heterogeneous world
Increasing responsibilities and new challenges for statistics
Paradigm shift – Moving to more granular data
Quality, reliability, consistency
• Example: Micro-level statistics
  • Security-by-security statistics
  • Holdings of individual securities
  • Money market statistics reporting
  • Loan-by-loan register (AnaCredit)
  • Register of Financial Institutions
  • Individual supervisory data
Need to connect the dots: Linking and integrating
Data management, database infrastructure
• Data Quality / Methodology
• Common Platform, Data marts
• Linking Data Warehouse(s)
• Analytical toolbox, user sandbox
Semantics
• Map and link data (sets)
• Information Model / Data Dictionary
• Standards, Identifiers, Master Data
• Standardisation of Micro Data
Statistical analysis
• Best practice in analytics
• Data Discovery
• Business Analysis
Communication and outreach
• Visualisation & Presentations
• Communication
• Outreach to frequent users
• Explainers
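The "Standards, Identifiers, Master Data" and "Map and link data (sets)" items can be illustrated with a minimal sketch of linking two micro datasets on a shared identifier. Everything here is an assumption made for illustration: the function name `link_records`, the field name `entity_id`, and the records themselves are hypothetical and do not describe any ECB system.

```python
def link_records(holdings, register, key="entity_id"):
    """Join holdings to register records on a shared identifier.

    Returns the linked rows plus the holdings that found no match,
    so that gaps in identifier coverage stay visible.
    """
    by_id = {row[key]: row for row in register}
    linked, unmatched = [], []
    for row in holdings:
        match = by_id.get(row[key])
        if match is None:
            unmatched.append(row)
        else:
            # Combine master data (register) with the granular record.
            linked.append({**match, **row})
    return linked, unmatched


# Hypothetical master data and granular records (made-up values).
register = [
    {"entity_id": "A1", "sector": "bank"},
    {"entity_id": "B2", "sector": "insurer"},
]
holdings = [
    {"entity_id": "A1", "security": "SEC-001", "amount": 100},
    {"entity_id": "C3", "security": "SEC-002", "amount": 50},
]

linked, unmatched = link_records(holdings, register)
```

Returning the unmatched records separately is a design choice in this sketch: when datasets come from different silos, the records that fail to link are often the first indication that identifiers are not yet standardised.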
The human dimension – extended Swiss knife
Core skills of a statistician
• Economics – Understanding of the economic phenomena
• Statistics – Statistics methodology and concepts
• Research – Modelling, algorithms and error terms
• Lawyer – Drafting legal regulations and guidelines
• IT – Building infrastructures, programming and databases
• Project management – Planning and implementing
• Detective – Assuring quality and detecting errors
• Coordinator – STC/ESCB and country knowledge
• Analyst – Analysing results
• Communicator – Presenting results and methodology
New skills
• Data science – Large datasets, engineering and mathematics
• Technical skills – Machine learning and data mining
• IT skills – Hadoop, Spark, NoSQL, Python
• Visualisations – Patterns, discovery, new tools (Tableau)
Big Data – Discovery and piloting
Should statisticians play a role in, contribute to and develop the concept of “Big data”, or is it only a temporary phenomenon?
IFC Survey on Big Data
• Aim was to assess central banks’ experiences and interest in exploring big data related to financial and economic topics of interest to central banks
• IFC on-line survey with 69 responses (83% response rate)
• Big data is not just about large data sets
• “Pretty Big” Data
Big Data – Discovery and piloting
• At senior policy level, there is significant interest in big data within the central banking community (66%)
• Despite the interest, central banks have limited experience in the use of big data (30%)
• Central banks are interested in cooperating on specific topics to explore the usefulness of big data (71%)
Big Data – Discovery and piloting
• Big data can be useful for central banking purposes and is perceived as useful for supporting central banking policies
• Central banks are interested in cooperating in a structural approach
• Explore synergies to overcome barriers and challenges
Big Data – Discovery and piloting
IFC way forward
• Define and contribute to a “big data” roadmap
• Share and contribute to selected big data pilot projects
  • Administrative dataset (e.g. corporate balance sheet data)
  • Web search dataset (e.g. Google type search info)
  • Commercial dataset (e.g. credit card operations)
  • Financial market data (e.g. frequency trading, price spreads)
Strategic directions for statistics
• Managing micro & big data to derive useful statistics
• New semantics and methodology (standardisations)
• Linking and integrating datasets (overcoming silos)
• New and efficient production platforms
• New skill-sets and staff training
• International collaborations
• Communication and outreach
Thank you for your attention Any questions?
Annex: ECB & Google search data
• ECB receives weekly data from Google search machines in a CSV file
• The data is an index of weekly volume changes of Google queries by geographic location and category
• Google search data is more accurate and uses much larger samples than Google Trends
• Google search data includes the following 14 countries: Austria, France, Italy, Slovenia, USA, Belgium, Germany, Netherlands, Spain, United Kingdom, Denmark, Ireland, Portugal, Sweden
• Google search data includes 26 categories and 269 subcategories, e.g. Finance is a category and Banking is a subcategory
• The data are normalised to start at 1: one can see the relative change in Google searches by category, but nothing can be said about the absolute search volumes
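The point about normalisation can be made concrete with a minimal sketch. It assumes the index is formed by dividing each weekly value by the first one, which is one common way to get a series "starting at 1"; the slides do not state the provider's actual method. The function name and all numbers are hypothetical.

```python
def normalise_to_first(values):
    """Rescale a series so that its first observation equals 1."""
    if not values:
        return []
    base = values[0]
    if base == 0:
        raise ValueError("first observation must be non-zero")
    return [v / base for v in values]


# Hypothetical weekly query volumes (made-up numbers): the two
# categories differ by a factor of 100 in absolute terms.
banking = [2000, 2200, 1900, 2500]
insurance = [20, 22, 19, 25]

banking_index = normalise_to_first(banking)
insurance_index = normalise_to_first(insurance)

# Both indices are identical, so the relative changes are visible
# but the absolute volumes cannot be recovered from the index alone.
same_index = banking_index == insurance_index
```

In this example the two categories produce the same index even though one is searched a hundred times more often, which is why such data supports statements about relative change only.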
Annex: ECB & uses of Google search data/big data
• Findings of the ECB Statistics Paper Series released on this topic
• “Nowcasting GDP with electronic payments data” by John W. Galbraith and Greg Tkacz
  • Electronic payment transactions and cheques can be used to formulate nowcasts of current gross domestic product growth
  • Assesses this technique and finds that debit card transactions contribute most to forecast accuracy
• “Social media sentiment and consumer confidence” by Piet J. H. Daas and Marco J. H. Puts
  • What is the relationship between the changes in Dutch consumer confidence and the Dutch public social media?
  • The changes in social media sentiment have the same underlying phenomenon as Dutch consumer confidence
  • Could be used as an indicator for changes in consumer confidence and as an early indicator
• “Quantifying the effects of online bullishness on international financial markets” by Huina Mao, Scott Counts and Johan Bollen
  • The researchers develop a measure of investor sentiment, based on Twitter content and Google search queries
  • Twitter and Google bullishness are positively correlated with investor sentiment
  • Twitter bullishness is able to predict increases in stock returns
  • The results appear to support the investor sentiment hypothesis in behavioural finance
Annex: ECB & uses of Google search data/big data (cont’d)
• Pipeline publications by the ECB staff
• “Big data – the hunt for timely insights and decision certainty: Central banking reflections on the use of big data for policy purposes” by Per Nymand-Andersen
  • Big data might lead to new economic theories, with statistical algorithms applied to multiple big data sources from various disciplines finding new causations
  • Big data as an opportunity for central banks to apply expertise in testing existing and new models, data sets and theories; to explore new data sources; and to obtain new, timely knowledge from the feedback loop between monetary policy and market reactions
  • Central banks need to start by taking a structural approach to systematically testing the use of non-official big data sources
• “Predicting euro area unemployment rate using Google data: Central banks interest and use of big data” by Per Nymand-Andersen and Heikki Koivupalo
  • Explores how Google search data has been used for macro-economic and financial purposes within the literature
  • Tests how Google search data can be used for predicting the euro area unemployment rate in advance of the official statistics
  • Demonstrates that applying Google data within a simple model can improve the predictability of the euro area unemployment rate
  • Further testing is needed with the Google search data to establish its usefulness for the central banking statistical and analytical toolkit
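As a rough illustration of what "applying Google data within a simple model" could look like, the sketch below regresses the change in an unemployment rate on the change in a search index and compares held-out errors with a no-change benchmark. This is not the authors' model: the single-regressor specification, the function names and every number are assumptions, and the data are synthetic, so the output says nothing about the papers' actual findings.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = a + b * x with one regressor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("regressor has no variation")
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    a = my - b * mx
    return a, b


def rmse(errors):
    """Root mean squared error of a list of forecast errors."""
    return (sum(e * e for e in errors) / len(errors)) ** 0.5


# Synthetic data for illustration only (not real statistics).
search_change = [0.10, -0.05, 0.20, 0.00, -0.10, 0.15, 0.05, -0.20]
unemp_change = [0.06, -0.02, 0.09, 0.01, -0.04, 0.08, 0.02, -0.11]

n_train = 6  # fit on the first six periods, evaluate on the rest
a, b = fit_line(search_change[:n_train], unemp_change[:n_train])

# Errors of the search-based model on the held-out periods.
model_errors = [
    y - (a + b * x)
    for x, y in zip(search_change[n_train:], unemp_change[n_train:])
]
# The no-change benchmark predicts a zero change every period.
naive_errors = list(unemp_change[n_train:])

print("model RMSE:", rmse(model_errors))
print("naive RMSE:", rmse(naive_errors))
```

Evaluating on held-out periods matters here: in-sample, a model with an intercept and slope can never do worse than the no-change benchmark, so only an out-of-sample comparison says anything about predictive value.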