
Summary

Automatic Editing Data. A New Version of DIA System. Prepared by J.M. Gomez. Presented by D. Lorca, National Statistical Institute of Spain. DIA system: generalized software for automatic editing and imputation of qualitative data based on the Fellegi-Holt methodology.


Presentation Transcript


  1. Automatic Editing Data. A New Version of DIA System. Prepared by J.M. Gomez. Presented by D. Lorca, National Statistical Institute of Spain

  2. Summary. DIA system: generalized software for automatic editing and imputation of qualitative data based on the Fellegi-Holt methodology • A heuristic algorithm extends the DIA system to continuous and integer data • The treatment of systematic errors is modified to avoid possible re-imputations

  3. The Heuristic Algorithm • It solves the error localisation (EL) problem without having to calculate the Complete Set of Edits (CSE) • In most real cases it avoids having to break down the Set of Explicit Edits (SEE)

  4. The Heuristic Algorithm. When a record R0 fails an edit, we determine the minimum set (MS) of variables to impute by working not with the CSE but with the SEE plus, if necessary, a small set of implicit edits (SIE) required specifically to impute the erroneous record R0
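To make the idea of a minimum set concrete, here is a minimal brute-force sketch of error localisation that works directly on explicit edits. It is not the DIA heuristic, whose details the slides do not give; it simply tries subsets of variables in increasing size and keeps the first one for which some assignment passes every edit. The variables, domains and edits are hypothetical toy data.

```python
from itertools import combinations, product

# Hypothetical toy domains and explicit edits (not taken from the slides).
DOMAINS = {
    "age": ["child", "adult"],
    "marital": ["single", "married"],
    "employed": ["no", "yes"],
}

# Each edit is a failure condition: a record fails the edit when every
# listed variable takes a value in the listed set.
EDITS = [
    {"age": {"child"}, "marital": {"married"}},
    {"age": {"child"}, "employed": {"yes"}},
]

def fails(record, edit):
    """True if the record satisfies the edit's failure condition."""
    return all(record[var] in values for var, values in edit.items())

def passes_all(record, edits):
    return not any(fails(record, e) for e in edits)

def minimum_set(record, edits, domains):
    """Smallest set of variables whose change lets the record pass all edits.

    Exhaustive search, exponential in the number of variables; usable only
    on toy examples.
    """
    variables = list(domains)
    for size in range(len(variables) + 1):
        for subset in combinations(variables, size):
            for values in product(*(domains[v] for v in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if passes_all(candidate, edits):
                    return set(subset), candidate
    return None

r0 = {"age": "child", "marital": "married", "employed": "yes"}
ms, repaired = minimum_set(r0, EDITS, DOMAINS)
print(ms, repaired)
```

Here R0 fails both edits, yet changing the single variable `age` repairs it. An exhaustive search like this cannot scale to a survey with hundreds of variables, which is the motivation for a heuristic that adds only the implicit edits a given record needs.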

  5. The Heuristic Algorithm • Labour Survey: Variables: 146; Explicit edits: 1,500; Valid values: 3,521

  6. The Heuristic Algorithm

  7. Treatment of systematic errors. The DIA system contains a module for processing systematic errors: Rules of Deterministic Imputation (RDIs). Example: we assume that a systematic error has occurred if a record has the values A=1, B=2 and C=3, and in that case we impute the value ‘Blank’ to the variable B
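The example rule can be sketched in a few lines. This is only an illustration of how an RDI behaves, not the DIA system's rule syntax; the dictionary layout, the `apply_rdi` helper and the use of `None` for ‘Blank’ are assumptions made for the sketch.

```python
BLANK = None  # assumed stand-in for the 'Blank' value

# The rule from the slide: if A=1, B=2 and C=3, impute Blank to B.
RDI = {"condition": {"A": 1, "B": 2, "C": 3}, "impute": {"B": BLANK}}

def apply_rdi(record, rdi):
    """Return a copy of the record, imputed if it matches the rule."""
    if all(record.get(var) == val for var, val in rdi["condition"].items()):
        return {**record, **rdi["impute"]}
    return dict(record)

matched = apply_rdi({"A": 1, "B": 2, "C": 3}, RDI)
untouched = apply_rdi({"A": 1, "B": 5, "C": 3}, RDI)
print(matched)    # B is set to Blank
print(untouched)  # the rule does not fire, record unchanged
```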

  8. Treatment of systematic errors. RDI example: On the left of the equals sign we express the systematic error, and on the right we specify the imputation

  9. Treatment of systematic errors. Current version: • First, the DIA system executes the RDIs • Then the DIA system imputes data following the Fellegi-Holt methodology. Because the two processes run separately, a value imputed by one can be imputed again by the other. To avoid such re-imputations we define a new edit, the Deterministic Imputation Edit (DIE)

  10. Treatment of systematic errors. Steps to convert an RDI into a DIE: 1) The failure condition imposed on the Deterministic Imputation (DI) variable in the RDI becomes a failure condition imposed on a new variable, the image of the DI variable (IMA_DIA), in the DIE 2) The complement (¬) of the imputation in the RDI becomes a failure condition imposed on the DI variable in the DIE
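The two steps can be sketched as follows, using the earlier rule (A=1, B=2, C=3, impute ‘Blank’ to B). This is one reading of the slide, under stated assumptions: the image variable is taken to hold the originally reported value of the DI variable and is never itself imputed, its name `IMA_B` is a hypothetical naming, and the domain of B is invented for the example.

```python
BLANK = "blank"
DOMAIN_B = {1, 2, 3, BLANK}  # hypothetical domain of the DI variable B

def rdi_to_die(condition, di_var, imputed_value, di_domain):
    """Convert an RDI (condition plus imputed value) into a DIE failure condition."""
    die = {}
    for var, val in condition.items():
        if var == di_var:
            # Step 1: the condition on the DI variable moves to its image.
            die["IMA_" + var] = {val}
        else:
            die[var] = {val}
    # Step 2: the complement of the imputed value becomes the
    # failure condition on the DI variable itself.
    die[di_var] = set(di_domain) - {imputed_value}
    return die

def with_image(record, di_var):
    """Attach the image variable, a copy of the reported DI value (assumption)."""
    return {**record, "IMA_" + di_var: record[di_var]}

def fails(record, edit):
    return all(record[var] in values for var, values in edit.items())

die = rdi_to_die({"A": 1, "B": 2, "C": 3}, "B", BLANK, DOMAIN_B)
record = with_image({"A": 1, "B": 2, "C": 3}, "B")
failed_before = fails(record, die)   # the systematic error is detected
record["B"] = BLANK                  # the only change to B that clears the edit
failed_after = fails(record, die)
print(die, failed_before, failed_after)
```

Under these assumptions the record fails the DIE exactly when the RDI would have fired, and the only value of B that clears the failure is the one the RDI would have imputed, so the rule can be handled as an ordinary edit.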

  11. Treatment of systematic errors. RDI example: DIE example: Both edits express the same condition, and the DIE matches the normal form of edit required by the Fellegi-Holt model

  12. Treatment of systematic errors • The DIA system calculates the MS of variables to impute taking both types of error into account together • Since the MS cannot contain repeated variables, the possibility of re-imputation disappears

  13. Conclusions (I) • The heuristic algorithm presented makes it possible to extend the DIA system to quantitative data • In most real cases it avoids having to break down the SEE into several subsets, which reduces the number of imputations

  14. Conclusions (II) • The DIE makes it possible to integrate edits expressing systematic errors with edits expressing random errors under the Fellegi-Holt model, so the DIA system can be applied to both types of error simultaneously, avoiding possible re-imputations
