Data Analysis in Official Statistics
Emilio Di Meglio, Eurostat Unit B2 (Methodology and Research)
Q2010
What is data analysis?
• Data analysis is the process of looking at and summarizing data with the intent to extract useful information and develop conclusions.
• The systematic study of data so that its meaning, structure, relationships, origins, etc. are understood.
• The process of systematically applying statistical and logical techniques to describe, summarize, and compare data.
• Data → Information → Conclusions
Data Analysis and Official Statistics
There are two contrasting views:
• Official statistics should give only ‘objective facts’. The presentation of statistics should therefore refrain from using particular models for elaborate analysis and interpretation.
• Analysis can give statistics producers valuable insights, both for communicating with users and for future improvements of the statistics and their production process.
Two options for data analysis in Official Statistics
• Data analysis to better communicate results
  • Explore relationships, causes and effects for better storytelling
  • Interactive visualization tools
  • Web 2.0
  • Virtual reality
• Data analysis to improve production processes and quality
  • Explore patterns, relationships, causes and effects to obtain feedback on the production process
  • Gather elements to improve quality
Data Analysis to improve processes
A better understanding of the data can improve data production processes. Some possibilities:
• Detection of outliers in a multidimensional space
• Simplification of the survey questionnaire
• Targeted quality improvement
• Thoughtful design of traditional tables and graphs, to track processes in their essential dimensions
• Advanced analysis with interpretation and conclusions on causes, etc.
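As an illustration of the first possibility, detecting outliers in a multidimensional space, the sketch below flags records by their Mahalanobis distance from the sample mean. It is a minimal two-variable example and not a method taken from the presentation: the function names, the threshold of 2.5 and the data are all assumptions made up for illustration. The last record looks unremarkable on each variable taken separately, yet it breaks the relationship between the two.

```python
import math


def mahalanobis_2d(points):
    """Return the Mahalanobis distance of each 2-D point from the sample mean."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    # Sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum((x - mx) ** 2 for x, _ in points) / (n - 1)
    syy = sum((y - my) ** 2 for _, y in points) / (n - 1)
    sxy = sum((x - mx) * (y - my) for x, y in points) / (n - 1)
    det = sxx * syy - sxy ** 2
    if det <= 0:
        raise ValueError("covariance matrix is singular")
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        # d^2 = [dx dy] * inverse(cov) * [dx dy]^T, written out for the 2x2 case.
        d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
        distances.append(math.sqrt(d2))
    return distances


def flag_outliers(points, threshold=2.5):
    """Indices of points whose distance exceeds the (assumed) threshold."""
    return [i for i, d in enumerate(mahalanobis_2d(points)) if d > threshold]


# Made-up records: two strongly related indicators, plus one record (the last)
# that is within the usual range on each variable but off the common trend.
records = [
    (1, 1.1), (2, 1.9), (3, 3.2), (4, 3.9), (5, 5.1),
    (6, 6.0), (7, 6.8), (8, 8.1), (9, 9.0), (3, 8.0),
]

print(flag_outliers(records))
```

Because the mean and covariance used here are the classical estimates, they are themselves pulled by outliers; a robust estimate of location and scatter would be the natural refinement.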
Which methods?
Data analysis is a wide subject; we need to restrict the scope to some aspects:
• Exploratory Data Analysis
• Robust methods
• Visual techniques
• Standardization
• Easy-to-share solutions
Some useful techniques
• Multidimensional Data Analysis (MDA) techniques
  • Principal Component Analysis
  • Correspondence Analysis
  • Classification
• Modelling techniques used in an exploratory framework
  • Regression methods
  • Logistic regression
• Visualization techniques
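To make the first technique in the list concrete, here is a small sketch of Principal Component Analysis that extracts only the first component, using power iteration on the sample covariance matrix. It is an illustrative simplification rather than a production implementation: the function names and the indicator values are invented, and a statistical package would normally compute all components through an eigen or singular value decomposition.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of a list of equal-length numeric rows."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(p)]
        for i in range(p)
    ]


def first_principal_component(rows, iterations=200):
    """Return (loadings, share of total variance) for the first component."""
    cov = covariance_matrix(rows)
    p = len(cov)
    # Start from a uniform unit vector; this assumes it is not orthogonal
    # to the leading eigenvector, which holds for the data used below.
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' C v gives the variance along the component.
    eigenvalue = sum(
        v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)
    )
    total_variance = sum(cov[i][i] for i in range(p))
    return v, eigenvalue / total_variance


# Made-up indicators: the second is exactly twice the first,
# the third varies only slightly.
indicators = [
    (1, 2, 0.5), (2, 4, 0.4), (3, 6, 0.6),
    (4, 8, 0.5), (5, 10, 0.4), (6, 12, 0.6),
]

loadings, share = first_principal_component(indicators)
print(loadings, share)
```

On these data nearly all of the variance lies along a single direction, which is the kind of redundancy between indicators that an exploratory analysis is meant to reveal.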
An example of data analysis use: improving HICP samples (C. De Gregorio)
Correspondence analysis on some indicators built on air transport data
[Figure: Air transport prices, panels for Country 1, Country 2, Country 3 and Country 4]
What are the needs?
• Tools
  • Plenty of tools for performing data analysis techniques already exist; they need to be inventoried and integrated into production.
  • Most of these methods are implemented in the main statistical packages.
  • Open source software, and R in particular, can improve the exchange of good practices.
• Best practices
  • Guidelines for the standard application of techniques and the interpretation of results.
ESS cooperative project on data analysis
It has been decided to launch a cooperative action to improve the use of data analysis techniques.
Workshop EDAVIS 27/28 May.
Objectives:
• Knowledge building: literature review, case studies
• Methodological development and pilot testing of methods
• Development of tools: common library of tools
• Exchange of knowledge and tools, guidelines
• Training courses and knowledge-transfer actions
• Enabling international comparability of analysis and visualisation methods
Some proposals
• Development of diagnostic tools
  • To highlight observations which deviate from the trends
  • To identify outliers
  • To inform about the validity of model assumptions
  • To indicate whether classical or robust estimates are required
  • To give an insight into data structure and data quality
  • To visualise missing values in the data and possible patterns of missing values
• Development of visualisation tools for data exploration
• Development of visualisation tools for spatially dependent data
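Two of the proposed diagnostics can be sketched in a few lines: tabulating patterns of missing values, and comparing classical z-scores with robust ones based on the median and the median absolute deviation (MAD). This is a hedged illustration, not the tools proposed in the project: the function names, field names and data are invented, and the constant 0.6745 with a cut-off of 3.5 follows the commonly used "modified z-score" convention.

```python
from collections import Counter
from statistics import mean, median, stdev


def missing_patterns(records, fields):
    """Count how often each combination of missing fields occurs."""
    patterns = Counter()
    for rec in records:
        missing = tuple(f for f in fields if rec.get(f) is None)
        patterns[missing] += 1
    return patterns


def classical_z(values):
    """Z-scores based on the mean and the sample standard deviation."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def robust_z(values):
    """Modified z-scores based on the median and the MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [0.6745 * (v - med) / mad for v in values]


# Made-up survey records; None marks a missing value.
fields = ["turnover", "employees", "region"]
records = [
    {"turnover": 120, "employees": 10, "region": "A"},
    {"turnover": None, "employees": 8, "region": "B"},
    {"turnover": None, "employees": None, "region": "A"},
    {"turnover": 95, "employees": 7, "region": None},
    {"turnover": None, "employees": 12, "region": "C"},
]
for pattern, count in missing_patterns(records, fields).most_common():
    print(pattern or "(complete)", count)

# Made-up values with one gross error in the last position.
values = [10, 11, 9, 10, 12, 10, 11, 50]
classical_flags = [i for i, z in enumerate(classical_z(values)) if abs(z) > 3]
robust_flags = [i for i, z in enumerate(robust_z(values)) if abs(z) > 3.5]
print(classical_flags, robust_flags)
```

In this small sample the gross error inflates the standard deviation enough to hide itself from the classical rule, while the robust scores still single it out. A disagreement of this kind is one signal that robust estimates are required.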
To conclude
• Exploratory data analysis techniques have good potential for supporting statistical production processes and improving the quality of outputs
• Need to integrate existing methods and tools into statistics production
• Facilitate their use in NSIs
• Use of open source solutions for easier sharing of methods and tools (R?)
• Best practices