1 / 20

UNITED NATIONS - ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS

UNITED NATIONS - ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS Work Session on Statistical Data Editing Use of administrative data for selective editing: the case of business investments Marco Di Zio (Istat) Paolo Forestieri (Istat) Ugo Guarnera (Istat)

chynna
Download Presentation

UNITED NATIONS - ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. UNITED NATIONS - ECONOMIC COMMISSION FOR EUROPE CONFERENCE OF EUROPEAN STATISTICIANS Work Session on Statistical Data Editing Use of administrative data for selective editing: the case of business investments Marco Di Zio (Istat) Paolo Forestieri (Istat) Ugo Guarnera (Istat) Massimiliano Iommi (Istat) Antonio Regano (Istat) Paris, 28-30/04/2014

  2. Outline Motivations Selective editing using SeleMix Application of SeleMix to investment data Validation Conclusions and future research Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  3. Motivations (1/4) • Investments (Gross Fixed Capital Formation) is of great interest in National Accounts because: • They are an important component of final demand (and then of GDP) • They are an input to estimate capital stock and consumption of fixed capital (that are needed to produce productivity and profitability indicators) • Traditionally National Accounts department in Istat makes some editing to SBS data (it is not a simple “user” of SBS data) • However the selective editing procedure we describe is not “National Accounts specific” Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  4. Motivations (2/4) • Recently administrative data are entering the process of SBS in Istat • Istat currently has access to a rich set of administrative data regarding variables reported in profit and loss accounts and (only for corporations and limited companies) in balance sheets. • Investment data are reported only in the Notes to Financial Statements • Notes comprising a summary of significant accounting policies and details of the values reported in the profit and loss account and in balance sheet • Istat has access to the explanatory notes of corporations and limited companies only in the form of non-standardized text files (one for each company) • No statistical database that could be used to produce SBS data on investment or to automatically correct data • investment values are only gathered through the SBS survey • A valuable source to recover the “true” value of investment for a limited number of firms Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  5. Motivations (3/4) Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  6. Motivations (4/4) • Recovering investment data from the explanatory notes is a time consuming process. • The goal of selective editing framework is just to minimize the number of checks by focusing on the units where editing has the highest expected benefit. • Typically, selective editing is useful if at the review stage it is possible to recover with a high reliability the ‘true’ value • In the case of investments, the revision of the values of the units is done by consulting the Explanatory Notes. Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  7. Selective editing using SeleMix (1/3) • The selective editing method is based on explicitly modelling both true (error-free) data and error mechanism (see Di Zio and Guarnera, 2013, for details). • The model assumptions can be summarized as follows • True data are thought of as n realizations from a random p-vector Y that, conditional on a set of q covariates Xis normally distributed with mean vector BX and covariance matrix Σ • The intermittent nature of the error is modelled through a Bernoullian r.v • Conditional on the presence of error, we assume a Gaussian additive error with zero mean and covariance matrix proportional to Σ • The distribution of true data conditional on observed data is a mixture of a mass density corresponding to absence of error and a Gaussian distribution corresponding to presence of error • The model parameters can be estimated by maximizing the likelihood function based on the observed data Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  8. Selective editing using SeleMix (2/3) • The selective editing strategy consists in using the estimated distribution of true data conditional on observed data to build up a score function. • Score function = difference between observed and anticipated value • The anticipated value is obtained by means of a weighted average of the observed value and a synthetic value. • The weights are given by the probability of being in error. • The synthetic value is in turn the weighted average of the observed value and a robust estimate of the regressed value. • Once all the observations have been ordered according to this score function, it is possible to estimate the residual error remaining in data after the correction of the first k units (k=1,..,n). • The number of most critical units to be edited can be chosen so that the estimate of the residual error is below a prefixed threshold • Selemix allows to explicitly relate the efforts in editing activities to the accuracy of the target estimates. Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  9. Selective editing using SeleMix (3/3) Y*true data Y observed data X covariates (no error) B regression coefficients U residuals I Bernoullian variable: True data model: Error model: Distribution of observed data: Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  10. Application of SeleMix to investment data (1/3) • Data source: Istat Annual Survey on Economic and financial accounts of large enterprises, year 2010 • Actually a census, covering all enterprises operating in Italy with at least 100 employees • Industrial and services sectors excluding financial services • EC Structural Business Statistics. • Units: responding enterprises (5280 observations) • Target variable: gross investment in tangible goods • Covariates: investment of the same firm in the previous year (or, in the case it is not available, the variable depreciation for the current year) • Covariates are considered not affected by errors since they are treated and officially released Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  11. Application of SeleMix to investment data (2/3) • Estimation domains: • Model estimated separately for each NACE Rev2 section • Impact of errors evaluated at the level of 64 industries • corresponding to the classification of economic activity A*64 that is used to disseminate National Accounts data • Threshold used for the selection of critical units is chosen such that the estimated relative residual error for each industry is 5% • threshold chosen taking into account the reasonable number of units that can be checked with the available resources • The model is estimated on the subset of data with positive values of the analyzed variables • predictions are computed also for units whose observed value is zero • A weight for taking into account the non-response is computed and used in the score function Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  12. Application of SeleMix to investment data (3/3) • Hit-rate not very high but the number of selected units very low • Notwithstanding the low number of selected units, quite an high impact on the estimated value: efficient procedure Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  13. Validation (1/5) • The assessment of the quality of the procedure cannot be based only on its efficiency • It is also necessary to quantify the impact of the residual errors on the estimates. • The ideal validation of the procedure would consist in revising all the not selected observations of the sample by looking at their reported values in the notes • We could find the error left in each not selected unit (residual error) • We could compute the impact of residual errors on the released estimate • The manual revision of all not selected observations is not feasible in practice unless a big investment in resources is planned. • Alternative strategy: Manual revision of all the not selected observations in some industries • The analysis of the results in each industry is quite informative, because the threshold used (set equal to 5%) is referred to each industry Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  14. Validation (2/5) • Manually revised three indutries • Industry 17 and 32 satisfactory results • Post editing residual percent error (RE) close to the expected one (5%) • Strong improvement obtained selecting few units • Industry 12 not very satisfactory: • The selective procedure does not select any observation • the error is 12.6%. Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  15. Validation (3/5) • What did happen in industry 12? • The error is due almost entirely to one company that was not selected by the editing procedure: • The company made the same misclassification error in 2009 and 2010 and the error in 2009 was not detected • The company registered as investment of the year both expenditures made in 2010 and reclassifications across different asset categories made during the year • Actually only expenditures of the year should be registred as investment of the year • The company followed the same criteria (then made the same error) when compiling the SBS questionnaire both in 2009 and 2010 and the error in 2009 was not detected. • The values of expenditures and reclassifications in 2010 are quite similar to the values in 2009 and are not outliers in the cross-section distribution Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  16. Validation (4/5) • Reported only to illustrate the type of error. Eni is not the firm we referred to in the previous slide Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  17. Validation (5/5) • From the point of view of the statistical model: • The current value of investment is not atypical conditional on the historical value. • In addition, since 2009 data were treated and officially released, they were used as error-free covariates in the model. • This makes the selection of this error almost impossible. • SeleMix may deal with the case where there is not any auxiliary variable. • In this application, we could have considered also the investments of the previous year a variable with errors • the observation would have been selected only if the values of the investments were jointly considered outliers Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  18. Summing up • The selective editing procedure proved to be quite efficient: strong improvements in the results obtained selecting few units • The selective editing procedure is considered by the survey managers feasible in terms of costs. • The analysis performed to validate the method is encouraging and should be extended to other estimation domains. • A result of the validation analysis is that it is difficult to select an observation with an erroneous investment which it is not atypical with respect to the historical value. Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  19. Future developments • Using other administrative variables as covariate • Expenditures for amortizable goods reported in VAT declarations • A derived variable based on the assets at the end of the year minus assets at the beginning plus depreciation and revaluation • Identifying observations with an erroneous investment which it is not atypical with respect to the historical value • Increasing hit-rate. • Experimenting with an alternative validation procedure, based on the use of samples of not selected observations. • PPS sampling proportional to the scores estimated with SeleMix • Revising all the observations belonging to the domains where the residual error estimated with the previous scheme is considerably far from the prefixed threshold • Removing errors from the historical values • Understanding whether there are other important error mechanisms affecting the quality of the procedure. Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

  20. Thank you Use of administrative data for selective editing the case of business investments, Di Zio et al, Paris, 28-30/4/2014

More Related