International Seminar on Rapid Estimates, Ottawa - May 2009. Building Flash Estimates for Selected PEEIs. Foreword. In 2005 the Economic and Financial Committee stressed the need to improve timeliness: “more use of flash estimation techniques for European aggregates should be considered”
International Seminar on Rapid Estimates, Ottawa - May 2009 Building Flash Estimates for Selected PEEIs
Foreword • In 2005 the Economic and Financial Committee stressed the need to improve timeliness • “more use of flash estimation techniques for European aggregates should be considered” • In 2006 Eurostat launched a call for proposals to develop methods and tools to produce flash estimates of 3 short-term economic indicators (GDP, IPI and LCI) • A joint proposal from France (Insee), Germany (Destatis), Italy (Istat) and the United Kingdom (ONS) was accepted in November 2006. • This presentation covers the joint project (objectives and results), which ran from January 2007 to October 2008
Outline • Foreword • Eurostat Grant “Flash estimates for certain Principal European Economic Indicators” • What is a flash estimate? • What is a “good model”? • The variable selection problem • Some results for LCI, IPI and GDP
Eurostat Call for Proposal • Launched in June 2006 • “Flash estimates for certain PEEIs” • Monthly IPI for Euro-area and EU25 at 30 days (42) • Quarterly GDP for Euro-area and EU25 at 30 days (45) • Quarterly LCI for Euro-area and EU25 at 45 days (74) • It must be seen as an EXPLORATORY project • Explore the possibilities (and techniques) to improve timeliness through flash estimation techniques • Explore also the advantages and drawbacks: implementation, maintenance, communication etc.
A Joint Proposal • A proposal by 4 NSIs • France (INSEE), Germany (DESTATIS), Italy (ISTAT), United Kingdom (ONS) • Work divided into 2 phases • Phase 1 (9 months), Preparatory work • Literature, collecting experiences, preparing databases, developing methodologies etc. • Phase 2 (12 months) • Simulations, evaluation and comparison of models, bibliography
What is a Flash Estimate? • We had strong discussions, not to say arguments, on this point … • A “flash estimate” is like a giraffe: easy to recognize but not so easy to define • A first definition (Eurostat – Barcellan’s paper) • “A flash estimate is an early estimate produced and published as soon as possible after the end of the reference period, using a more incomplete set of information than the set used for traditional estimates” • A clear and secure definition but quite restrictive • No way to produce our flash estimates according to this definition
A more flexible definition • An agreement • Pure AR models (no new information) can be used as benchmarks but are not accepted for FE • Still a disagreement • Can a model be based on soft data (BCS) only? • A compromise • FE models must incorporate hard data (as much as possible). • Easier of course for quarterly data (monthly hard data available for at least part of the quarter)
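The benchmark role of a pure AR model can be sketched as follows. This is a minimal illustration and not the project's code: it swaps in an AR(1) with intercept, re-fitted by OLS each period, in place of a full ARIMA specification, and it runs on a simulated series. The function name `ar1_benchmark_rmse` and all numbers are assumptions made for the example.

```python
import numpy as np

def ar1_benchmark_rmse(y, first_eval):
    """Recursive one-step-ahead RMSE of an AR(1) with intercept.

    The model is re-fitted by OLS on y[:t] before forecasting y[t],
    so each forecast uses only information available at that date.
    """
    errors = []
    for t in range(first_eval, len(y)):
        past = y[:t]
        design = np.column_stack([np.ones(t - 1), past[:-1]])
        (c, phi), *_ = np.linalg.lstsq(design, past[1:], rcond=None)
        errors.append(y[t] - (c + phi * past[-1]))
    return float(np.sqrt(np.mean(np.square(errors))))

# Simulated growth-rate series (assumption: AR(1), phi = 0.6, shock sd = 0.3)
rng = np.random.default_rng(0)
y = np.zeros(200)
for t in range(1, len(y)):
    y[t] = 0.6 * y[t - 1] + rng.normal(scale=0.3)

rmse = ar1_benchmark_rmse(y, first_eval=100)
print(round(rmse, 2))
```

A candidate flash-estimate model would then be judged by whether its out-of-sample RMSE is clearly below this benchmark figure.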
What is a “good” model? • Simple (few variables) • You can handle it during production …. • Interpretable • You can explain it and …. “sell” it • Good “statistical properties” including robustness • You do not want to change your model every month • Good estimations (small revisions) • You publish! Credibility (“Trustful statistics”, Peter Everaers). • “I prefer no data than misleading data” (Drummond) • Mazzi & Montana’s paper (Eurostat) • “The selected model should be as simple as possible, statistically sound, easy to use in the regular production process ….”
Selecting a batch of explanatory variables • A-priori selection based on several criteria • Timeliness • Economic theory, expert knowledge • Available hard information on the period to “flash-estimate” • Soft data are very timely • Coincident and leading variables: opinion on the current and expected production • Hard data may be leading • New orders • But the selection can be quite large
The variable selection problem • Example: estimating the EA13 GDP • Potential explanatory variables: • IPI, New orders, Energy prices, HICP, Unemployment, Business surveys (Industry, Retail trade, Construction, Services) etc. Easy to find at least 20 variables (monthly and/or quarterly) • 13 countries + EA, current value and 2 lags? • It comes to 20*(13+1)*3=840 potential variables • More than 20 billion possible models with 4 explanatory variables!
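The arithmetic on this slide can be checked in a few lines. The sketch assumes that the factor 3 stands for the contemporaneous value plus 2 lags, and that a "model with 4 explanatory variables" means an unordered choice of 4 distinct candidates; the variable names are illustrative only.

```python
import math

n_indicators = 20   # indicator families quoted on the slide
n_areas = 13 + 1    # 13 countries plus the euro-area aggregate
n_timings = 3       # assumption: contemporaneous value plus 2 lags

n_candidates = n_indicators * n_areas * n_timings
n_models_4 = math.comb(n_candidates, 4)  # unordered subsets of size 4

print(n_candidates)  # 840
print(n_models_4)    # 20596787190
```

So the "more than 20 billion" figure corresponds to about 20.6 billion distinct 4-variable subsets, before any choice of functional form.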
Variable selection methods • You must reduce the set of explanatory variables • Trial and Error approach rarely works • NIESR approach (J. Mitchell) • Drastically restrict the set (Expert knowledge) and then evaluate all possible models • GETS (General to specific) approach • Start from an over-parameterized model and use statistical tests to reduce it • Dynamic Factor Analysis approach • Summarize the set of variables with uncorrelated factors and use some of them in the model • Cluster Analysis approach
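The "restrict, then evaluate all possible models" idea can be illustrated with a small exhaustive search. This is only a sketch under stated assumptions, not the NIESR implementation: the short list of 6 candidates, the simulated data, the function name `rank_subsets` and the in-sample RMSE criterion are all choices made for the example.

```python
from itertools import combinations
import numpy as np

def rank_subsets(y, X, names, k):
    """Fit OLS (with intercept) on every k-variable subset, sorted by in-sample RMSE."""
    results = []
    for cols in combinations(range(X.shape[1]), k):
        design = np.column_stack([np.ones(len(y)), X[:, list(cols)]])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        rmse = float(np.sqrt(np.mean((y - design @ beta) ** 2)))
        results.append((rmse, tuple(names[c] for c in cols)))
    return sorted(results)

# Simulated example: only x0 and x2 actually drive the target (assumption)
rng = np.random.default_rng(0)
names = [f"x{i}" for i in range(6)]  # hypothetical expert short list
X = rng.normal(size=(80, 6))
y = 0.5 * X[:, 0] - 0.3 * X[:, 2] + rng.normal(scale=0.1, size=80)

ranked = rank_subsets(y, X, names, k=2)
print(len(ranked))   # number of 2-variable models evaluated
print(ranked[0][1])  # best pair by in-sample RMSE
```

With 6 candidates there are only 15 two-variable models, which is why the prior restriction matters. In practice an out-of-sample or revision-based criterion would probably be preferred to in-sample RMSE, which always favours larger models.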
Which dependent variable? • Annual (red) or quarterly (black) growth rate? • No big difference; note the level shift … • RMSE of an ARIMA model on the linearized quarterly growth rate of SA data: 0.07
Some Remarks • No real problem • A lot of available information (monthly or quarterly) at t+45 • Quite a large number of models with good statistical properties and excellent R-squared (>0.7) • Pure AR models do not perform as well. • A mix of “monthly” indicators from Business Surveys and other “hard” data.
A “difficult case”: the IPI • Annual (red) or monthly (black) growth rate?
The target variable • Gian Paolo Oneto’s remark on volatility … • RMSE of an ARIMA model on the linearized monthly growth rate of SA data: 0.7 (1.1 on the annual growth rate). It is huge! • Very few (not to say no) hard data available at t+30 • Difficult to propose a good and simple enough model. • Simple models only explain a small part of the volatility. Very difficult to publish the flash estimate.
An “interesting case”: the GDP • Annual (red) or quarterly (black) growth rate? • RMSE of an ARIMA model on the linearized quarterly growth rate of SA data : 0.3 (quite large)
Some Remarks • Lots of available information (monthly or quarterly) at t+30 • You can reduce the RMSE and find models with an RMSE close to 0.2 but it is still too much. • Why should we take the risk of a big revision to gain only 15 days? • You can do better but with more complex models that are difficult to handle in production
Last considerations • Note that you look for the “best” model to flash estimate the “worst” figure (the first estimation will always be revised) • To do that, you use “non homogeneous” data (a mix of first, second, third … releases) • It is always better to estimate several models • In case an X-variable is not available • Because “pooled estimations” are better • It is often better to do the estimation at the National level and then aggregate, but there are then more models to maintain
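The two reasons for keeping several models can be sketched together. The example below assumes the simplest possible pooling, an equal-weight average of whichever model forecasts could be computed; the slides do not specify the weighting scheme, and the model names and forecast values are invented for illustration.

```python
def pooled_flash(forecasts):
    """Equal-weight average of the available model forecasts.

    A value of None marks a model whose X-variable had not been
    released in time, so that model is simply left out of the pool.
    """
    available = [f for f in forecasts.values() if f is not None]
    if not available:
        raise ValueError("no model could be estimated")
    return sum(available) / len(available)

# Hypothetical quarterly growth forecasts, in percent (made-up numbers)
forecasts = {
    "surveys_only": 0.5,
    "ipi_plus_surveys": 0.25,
    "new_orders": None,  # input not yet released
}

flash = pooled_flash(forecasts)
print(flash)  # 0.375
```

The same function still returns a figure when one input is missing, which is the operational argument for maintaining more than one model.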
Conclusion • Quite disappointing, isn’t it? • The solution seems to be to speed up the production process and respect the first (and restrictive) definition of flash estimates. • But it puts the burden on the National institutes and it could be very costly. • Anyway, in France • We try to get early estimates of the IPI from a restricted sample; • We are working on a GDP flash at t+30 which would respect the way we compute QNA