Monitoring year-to-year variation in structural business statistics
Contribution to Q2008 – Rome, 9 July 2008. Session: Editing and Imputation I
Guy.Vekeman@ec.europa.eu, ESTAT – G1
Structural Business Statistics (SBS)
• Yearly statistics covering the ‘Business Economy’ (NACE sections C-K)
• Geographical coverage: EEA + candidate countries
• Many characteristics: financial, employment…
• Multiple breakdowns: activity, size class, region
Actors
• Produced by National Statistical Institutes (NSIs), using uniform definitions (but data collection methodologies may vary)
• Role of ESTAT: collecting, validating data flows, confidentiality treatment, publishing data series
Data validation: a ‘macro editing’ tool
Main causes of variation in the aggregates of the data flows:
• performance of the individual enterprises
• change in the composition of the set of enterprises
• raw data error (misreporting)
• data processing error (editing flaw)
Essential characteristics of a macro-editing tool
• It should not overload correspondents with false alerts
• It needs a suitable threshold to single out influential anomalies
Previous practice
• Symmetric [-20%, +20%] confidence interval
• Applied to all (but a few) characteristics
• Possibly generating hundreds (if not thousands) of “anomalous” variations
• Skilful application required by the ESTAT database manager
• Small aggregates vary more -> unreasonable burden for the NSIs of small countries
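The previous practice can be illustrated with a minimal sketch. The function name `flag_fixed_band` and its arguments are hypothetical; only the symmetric ±20% band comes from the slide.

```python
def flag_fixed_band(previous, current, tolerance=0.20):
    """Return True when the year-to-year change lies outside [-tolerance, +tolerance].

    `tolerance` defaults to the 20% band named on the slide; the function
    itself is an illustration, not ESTAT's actual implementation.
    """
    if previous == 0:
        raise ValueError("relative change is undefined when the previous value is zero")
    change = (current - previous) / previous
    return abs(change) > tolerance


# A sector with 4 enterprises gaining a single one (+25%) is flagged,
# while a sector growing from 4000 to 4400 enterprises (+10%) is not.
small_sector_flagged = flag_fixed_band(4, 5)
large_sector_flagged = flag_fixed_band(4000, 4400)
```

The fixed band ignores the size of the aggregate, which is why small aggregates, and hence small countries, attract a disproportionate share of alerts.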
Factors influencing evolution of SBS data
• Macro-economic (can be compensated for)
  • Economic growth
  • Inflation PPI/CPI (SBS data are in current prices)
  • Currency fluctuations
• Micro-economic
  • Prospering of enterprises
  • Business demography in the sector
• Administrative: business register related
  • Registering enterprises / deregistering merged, closed down or suspended units
  • Activity classification of enterprises
Heuristics: Basic assumptions
• Assumptions:
  • The year-to-year variations (YTYV) of individual enterprises form a set of random observations.
  • Enterprises are very unevenly distributed in size, and the YTYV of a large corporation is influential on the sector average.
  • Economies of scale come to our rescue, since large YTYV are more typical of small companies.
• Variance of the average YTYV ~ 1/n
• Standard deviation of the average: $\sigma/\sqrt{n}$
• Knowing economic growth $G$ and inflation $I$, the change in the aggregates could be estimated.
• So can we expect $V_t \in \left[ V_{t-1} (1+G_t)(1+I_t)\left(1 \pm 2\sigma/\sqrt{n_{t-1}}\right) \right]$ with 95% probability?
• No, because of several sources of bias.
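The naive interval asked about above can be sketched as follows. This is only the idealised starting point that the slide itself goes on to reject; the function name and the numeric inputs are hypothetical, and σ in particular is not known in practice.

```python
import math


def naive_expectation_interval(v_prev, growth, inflation, sigma, n_prev):
    """Interval V_{t-1} * (1+G_t) * (1+I_t) * (1 +/- 2*sigma/sqrt(n_{t-1})).

    v_prev    : aggregate value in year t-1
    growth    : real economic growth G_t as a fraction (0.02 = 2%)
    inflation : inflation I_t as a fraction
    sigma     : assumed standard deviation of enterprise-level YTYV (hypothetical)
    n_prev    : number of enterprises in year t-1
    """
    centre = v_prev * (1 + growth) * (1 + inflation)
    half_width = 2 * sigma / math.sqrt(n_prev)
    return centre * (1 - half_width), centre * (1 + half_width)


# Illustrative numbers only: the relative width shrinks as n_prev grows.
few = naive_expectation_interval(1000.0, 0.02, 0.03, sigma=0.5, n_prev=100)
many = naive_expectation_interval(1000.0, 0.02, 0.03, sigma=0.5, n_prev=10000)
```

With 100 enterprises the relative half-width is 2 × 0.5 / 10 = 10%; with 10,000 enterprises it drops to 1%.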
Heuristics: sources of bias
• Non-financial business economy (NACE C-K \ J): not a full coverage
• Stratification by NACE: non-random sample -> heavily biased sector evolution. Moreover:
  • We use a single ‘inflation’ number (CPI) instead of an array of sectoral PPIs
  • GDP is a sum of values added; other characteristics may evolve differently
Result of bias: expectation value => expectation interval
Heuristics: variability of characteristics
• A few characteristics can be negative or close to zero:
  • Change in stocks or work in progress (frequently)
  • Gross operating surplus (rarely)
  • Value added (almost never)
• Consequences:
  • Volatile characteristics -> large % YTYV
  • Increased variance of the characteristics
• Measures taken:
  • Dropping volatile characteristics
  • Widening the confidence limits of the expectation interval (lack of predictability ≈ extra bias source)
Heuristics: Bringing it together
• (Standard) confidence interval limited by a standard lower boundary (SLB) and a standard upper boundary (SUB)
• Adapted boundaries: a function of the number of enterprises in year t-1
  • SLB / ( ) ; SUB * ( )
• 2σ implies 95% confidence limits, leaving 5% anomalies (too many) … but we have no idea about σ.
  => 2σ is considered a parameter: we fit the value 4 to obtain an 80% reduction in the number of ‘anomalies’ compared with previous practice.
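A minimal sketch of how the adapted boundaries might behave. The content of the parentheses is missing from the slide, so the widening factor `1 + k / sqrt(n_prev)` used here is an assumption inferred from the preceding heuristic, with `k` standing in for the fitted parameter 2σ = 4. The function name and the example SLB/SUB values are hypothetical.

```python
import math


def adapted_boundaries(slb, sub, n_prev, k=4.0):
    """Widen the standard boundaries for aggregates built from few enterprises.

    ASSUMPTION: the widening factor is 1 + k / sqrt(n_prev); the slide does
    not show the actual expression. k = 4 is the fitted value quoted there.
    """
    widening = 1 + k / math.sqrt(n_prev)
    return slb / widening, sub * widening


# Hypothetical standard boundaries, symmetric on a log scale (0.8 = 1 / 1.25).
small = adapted_boundaries(0.8, 1.25, n_prev=16)     # factor 2.0
large = adapted_boundaries(0.8, 1.25, n_prev=10000)  # factor 1.04
```

Under this assumption an aggregate of 16 enterprises is tested against [0.4, 2.5], while an aggregate of 10,000 enterprises stays close to the standard boundaries.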
Heuristics: Method applied
• Standard confidence interval:
  • width depending on the characteristic
  • tuned using CPI and/or growth data (compare in national currency)
  • symmetrical on a log scale
• Tuned interval for ‘Business demography’ characteristics:
  • SLB / ( ) < $n_t / n_{t-1}$ < SUB * ( )
• Tuned interval for financial characteristics:
  • [SLB / (1+…) * (1 + real growth) * (1 + inflation rate) ; SUB * (1+…) * (1 + real growth) * (1 + inflation rate)]
• Tuned interval for employment characteristics:
  • [SLB / (1+…) * (1 + real growth) ; SUB * (1+…) * (1 + real growth)]
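The three tuned intervals can be sketched in one function. The elided term `(1+…)` is not given on the slide; the code assumes it is `1 + k / sqrt(n_prev)` with k = 4, which is a guess based on the earlier heuristic. All function names, the `kind` labels and the example numbers are hypothetical.

```python
import math


def tuned_interval(kind, slb, sub, n_prev, real_growth=0.0, inflation=0.0, k=4.0):
    """Return (lower, upper) bounds for the ratio value_t / value_{t-1}.

    ASSUMPTION: the elided '(1+...)' term is 1 + k / sqrt(n_prev).
    kind: 'demography' (no drift), 'employment' (real growth only),
          'financial' (real growth and inflation).
    """
    widening = 1 + k / math.sqrt(n_prev)
    if kind == "demography":
        drift = 1.0
    elif kind == "employment":
        drift = 1 + real_growth
    elif kind == "financial":
        drift = (1 + real_growth) * (1 + inflation)
    else:
        raise ValueError(f"unknown characteristic kind: {kind!r}")
    return slb / widening * drift, sub * widening * drift


def is_anomalous(ratio, kind, slb, sub, n_prev, **tuning):
    """Flag a year-to-year ratio that falls outside the tuned interval."""
    lower, upper = tuned_interval(kind, slb, sub, n_prev, **tuning)
    return not (lower < ratio < upper)


# Hypothetical turnover aggregate: 400 enterprises, 2% growth, 3% inflation.
bounds = tuned_interval("financial", 0.8, 1.25, 400, real_growth=0.02, inflation=0.03)
flagged = is_anomalous(1.7, "financial", 0.8, 1.25, 400, real_growth=0.02, inflation=0.03)
```

Because both bounds are multiplied by the same growth and inflation terms, the interval stays symmetric on a log scale around the expected drift rather than around 1.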
Implementation and discussion
• Deterministic method => programmed in Access for distribution
• Test more tolerant on small aggregates => reduced burden for small Member States (MS), confirmed in the ‘2003-04 field test’
• Raising awareness of influential changes
• ‘Macro-editing tool’ signalling suspicious aggregates:
  • Business demographic change?
  • Micro-data to be reviewed? Selective editing of the ‘suspect’ subset.
• Same ‘macro-editing tool’ at the front end (NSI) and back end (ESTAT) -> shorter validation cycle
• Field test: the number of anomalies varies between 0.37% and 4.6% (!)
• Low correlation (0.15) between ‘country size’ (number of inhabitants) and anomaly frequency: small and large MS are treated on an equal footing.