Automatic Editing Data. A New Version of DIA System
Prepared by J.M. Gomez
Presented by D. Lorca
National Statistical Institute of Spain
Summary
• DIA system: generalized software for the automatic editing and imputation of qualitative data, based on the Fellegi-Holt methodology
• A heuristic algorithm to extend the DIA system to continuous and integer data
• A modified treatment of systematic errors to avoid possible re-imputations
The Heuristic Algorithm
• It solves the error localisation (EL) problem without having to calculate the Complete Set of Edits (CSE)
• In most real cases, it avoids having to break down the Set of Explicit Edits (SEE) into subsets
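To illustrate solving EL directly from the explicit edits, here is a minimal Python sketch. It is not the DIA heuristic, whose details the slides do not give. Instead it swaps in a brute-force search: it tries candidate sets of variables in increasing size and checks whether re-imputing only those variables can satisfy every explicit edit. This sidesteps the CSE at an exponential cost. Edits are assumed to be in Fellegi-Holt normal form, {variable: set of failing values}. The domains, the edits and the record R0 are all hypothetical.

```python
from itertools import combinations, product

# Hypothetical example: an edit in Fellegi-Holt normal form maps each
# involved variable to the set of values for which the edit fails.
DOMAINS = {"A": (1, 2), "B": (1, 2, 3), "C": (1, 2)}
EDITS = [{"A": {1}, "B": {1}}, {"B": {2, 3}, "C": {1}}]


def fails(record, edit):
    """A record fails an edit when every involved variable takes a failing value."""
    return all(record[var] in values for var, values in edit.items())


def can_be_fixed(record, free_vars, domains, edits):
    """True if re-imputing only free_vars can make the record pass every edit."""
    for combo in product(*(domains[v] for v in free_vars)):
        candidate = {**record, **dict(zip(free_vars, combo))}
        if not any(fails(candidate, e) for e in edits):
            return True
    return False


def localise_errors(record, domains, edits):
    """Brute-force EL (not the DIA heuristic): smallest fixable set of variables."""
    names = sorted(domains)
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            if can_be_fixed(record, subset, domains, edits):
                return set(subset)
    return None  # the edits cannot all be satisfied


R0 = {"A": 1, "B": 2, "C": 1}
print(localise_errors(R0, DOMAINS, EDITS))  # {'C'}
```

R0 fails only the second edit, which involves B and C. Even so, re-imputing B alone cannot work, because B=1 triggers the first edit while A=1. The feasibility check detects this without generating any implied edit.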
The Heuristic Algorithm
When a record R0 fails an edit, we determine the minimum set (MS) of variables to impute. We work not with the CSE but with the SEE plus, if necessary, a small set of implicit edits (SIE) specifically required to impute the erroneous record R0.
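An implicit edit can be derived from explicit ones with the Fellegi-Holt generation step on a chosen variable. The slides do not say how the DIA heuristic selects which implicit edits enter the SIE for R0. The Python sketch below therefore only shows how one such edit is derived. The function name implied_edit and the edits are hypothetical.

```python
def implied_edit(edits, generating_var, domain):
    """One Fellegi-Holt generation step on generating_var (sketch).

    Each edit is in normal form {var: set of failing values} and must involve
    generating_var. If the edits' sets for generating_var cover its whole
    domain, the implied edit drops generating_var and intersects the sets of
    every other variable. Returns None when no implied edit results.
    """
    if any(generating_var not in e for e in edits):
        raise ValueError("every edit must involve the generating variable")
    if set().union(*(e[generating_var] for e in edits)) != set(domain):
        return None  # the generating variable's domain is not covered
    others = set().union(*(e.keys() for e in edits)) - {generating_var}
    result = {}
    for var in sorted(others):
        # Variables absent from an edit accept every value, so only the
        # edits that involve var restrict the intersection.
        result[var] = set.intersection(*(e[var] for e in edits if var in e))
        if not result[var]:
            return None  # empty intersection: no record can fail this edit
    return result


# Hypothetical explicit edits: "A=1 and B=1" fails, "B in {2, 3} and C=1" fails.
EDITS = [{"A": {1}, "B": {1}}, {"B": {2, 3}, "C": {1}}]
print(implied_edit(EDITS, "B", (1, 2, 3)))  # {'A': {1}, 'C': {1}}
```

Take the hypothetical record R0 = (A=1, B=2, C=1), which fails only the second explicit edit. The implied edit "A=1 and C=1" explains why re-imputing B alone cannot repair R0: that edit fails and does not involve B.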
The Heuristic Algorithm
• Labour Survey:
  Variables: 146
  Explicit edits: 1,500
  Valid values: 3,521
Treatment of systematic errors
The DIA system contains a module for processing systematic errors: the Rules of Deterministic Imputation (RDIs).
Example: suppose that a systematic error arises when a record has the values A=1, B=2 and C=3, and that in this case we impute the value 'Blank' to variable B.
Treatment of systematic errors
RDI example: the left-hand side of the equals sign expresses the systematic error and the right-hand side specifies the imputation.
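Here is a minimal Python sketch of the RDI from the example above. It assumes a rule is stored as a pair (left-hand side, right-hand side) of dictionaries. This representation and the function name apply_rdi are hypothetical, not the DIA system's own format.

```python
# Hypothetical representation of an RDI as (left-hand side, right-hand side):
# the left side describes the systematic error, the right side the imputation.
RDI = ({"A": 1, "B": 2, "C": 3}, {"B": "Blank"})


def apply_rdi(record, rdi):
    """Return a copy of record with the RDI's imputation applied if it matches."""
    condition, imputation = rdi
    if all(record.get(var) == value for var, value in condition.items()):
        return {**record, **imputation}
    return dict(record)


print(apply_rdi({"A": 1, "B": 2, "C": 3}, RDI))  # {'A': 1, 'B': 'Blank', 'C': 3}
print(apply_rdi({"A": 1, "B": 1, "C": 3}, RDI))  # {'A': 1, 'B': 1, 'C': 3}
```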
Treatment of systematic errors
Current version:
• First, the DIA system executes the RDIs
• Then, the DIA system imputes the data following the Fellegi-Holt methodology
Because the two processes run separately, a variable already imputed by an RDI can be imputed again in the Fellegi-Holt step. To avoid such re-imputations we define a new edit, the Deterministic Imputation Edit (DIE).
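A small Python sketch can show the re-imputation risk of the two-stage process. Everything in it is hypothetical: the domains, the reliability weights, the random-error edit "B=Blank and D=5 is invalid", and the weighted brute-force search that stands in for the Fellegi-Holt error localisation.

```python
from itertools import combinations, product

# Hypothetical setup: domains, reliability weights, RDI and random-error edit.
DOMAINS = {"A": (1, 2), "B": (1, 2, 3, "Blank"), "C": (1, 2, 3), "D": (4, 5)}
WEIGHTS = {"A": 10, "B": 1, "C": 10, "D": 2}
RDI = ({"A": 1, "B": 2, "C": 3}, {"B": "Blank"})
EDITS = [{"B": {"Blank"}, "D": {5}}]  # B=Blank together with D=5 is invalid


def fails(record, edit):
    return all(record[v] in vals for v, vals in edit.items())


def fellegi_holt_stage(record):
    """Cheapest set of variables whose change makes the record pass EDITS."""
    best = None
    for size in range(len(DOMAINS) + 1):
        for subset in combinations(sorted(DOMAINS), size):
            weight = sum(WEIGHTS[v] for v in subset)
            if best is not None and weight >= best[0]:
                continue  # cannot beat the cheapest repair found so far
            for combo in product(*(DOMAINS[v] for v in subset)):
                candidate = {**record, **dict(zip(subset, combo))}
                if not any(fails(candidate, e) for e in EDITS):
                    best = (weight, list(subset), candidate)
                    break
    return best


def two_stage(record):
    """Current version: RDIs first, then Fellegi-Holt imputation."""
    imputations = []  # one entry per imputation, so repeats are visible
    condition, imputation = RDI
    if all(record[v] == x for v, x in condition.items()):
        record = {**record, **imputation}
        imputations += list(imputation)
    _, changed, record = fellegi_holt_stage(record)
    imputations += changed
    return record, imputations


final, imputations = two_stage({"A": 1, "B": 2, "C": 3, "D": 5})
print(imputations)  # ['B', 'B']
print(final)  # {'A': 1, 'B': 1, 'C': 3, 'D': 5}
```

The RDI sets B to 'Blank', and that new value fails the random-error edit. B is the cheapest variable to change, so the Fellegi-Holt stage imputes B a second time.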
Treatment of systematic errors
Steps to convert an RDI into a DIE:
1) The failure condition imposed on the Deterministic Imputation (DI) variable in the RDI becomes a failure condition imposed, in the DIE, on a new variable called the image of the DI variable (IMA_DIA).
2) The complement (¬) of the imputation in the RDI becomes a failure condition imposed on the DI variable in the DIE.
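The two conversion steps can be sketched in Python for a rule with a single DI variable. Edits use the normal form {variable: set of failing values}. The following are hypothetical conventions: the function name rdi_to_die, the domain of B, and naming the image variable "IMA_" plus the DI variable (IMA_B for the example RDI).

```python
# Hypothetical domain for the DI variable B of the example RDI.
DOMAINS = {"B": (1, 2, 3, "Blank")}
RDI = ({"A": 1, "B": 2, "C": 3}, {"B": "Blank"})


def rdi_to_die(rdi, domains):
    """Convert an RDI with a single DI variable into a normal-form DIE."""
    condition, imputation = rdi
    [(di_var, imputed_value)] = imputation.items()  # assumes one DI variable
    die = {}
    for var, value in condition.items():
        if var == di_var:
            # Step 1: the failure condition on the DI variable moves to its
            # image, named "IMA_" + variable here (a hypothetical convention).
            die["IMA_" + var] = {value}
        else:
            die[var] = {value}
    # Step 2: the complement of the imputation becomes the failure
    # condition on the DI variable itself.
    die[di_var] = set(domains[di_var]) - {imputed_value}
    return die


print(rdi_to_die(RDI, DOMAINS))
```

For the example RDI the result is the edit A=1, IMA_B=2, C=3, B ∈ {1, 2, 3}. A record fails it when the original value of B was 2 together with A=1 and C=3, but the final value of B is not 'Blank'.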
Treatment of systematic errors
RDI example:
DIE example:
Both edits express the same condition, and the DIE matches the normal form of edit required by the Fellegi-Holt model.
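The claim that both edits express the same condition can be checked exhaustively on small hypothetical domains. The Python sketch below rests on several assumptions. A and C are not changed between the original and the final record. The original value of B is stored in an image variable, here called IMA_B. The DIE is written by hand from the two conversion steps. The sketch then compares "fails the DIE" with "violates the RDI" for every combination.

```python
from itertools import product

# Hypothetical domains; A and C are assumed unchanged between the original
# and the final record, and IMA_B holds the original value of B.
DOMAINS = {"A": (1, 2), "B": (1, 2, 3, "Blank"), "C": (1, 2, 3)}
RDI = ({"A": 1, "B": 2, "C": 3}, {"B": "Blank"})
DIE = {"A": {1}, "C": {3}, "IMA_B": {2}, "B": {1, 2, 3}}


def fails(record, edit):
    return all(record[v] in vals for v, vals in edit.items())


def violates_rdi(a, original_b, c, final_b):
    """True when the RDI was triggered but its imputation was not applied."""
    condition, imputation = RDI
    triggered = condition == {"A": a, "B": original_b, "C": c}
    return triggered and final_b != imputation["B"]


def edits_agree():
    """Compare the DIE with the RDI on every (original, final) combination."""
    for a, original_b, c, final_b in product(
        DOMAINS["A"], DOMAINS["B"], DOMAINS["C"], DOMAINS["B"]
    ):
        record = {"A": a, "B": final_b, "C": c, "IMA_B": original_b}
        if fails(record, DIE) != violates_rdi(a, original_b, c, final_b):
            return False
    return True


print(edits_agree())  # True
```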
Treatment of systematic errors
• The DIA system calculates the MS of variables to impute taking both types of errors into account together
• Since the MS cannot contain repeated variables, re-imputations can no longer occur
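Here is a Python sketch of joint localisation. The edit set holds a hypothetical DIE for the example RDI plus one hypothetical random-error edit ("B=Blank and D=5 is invalid"). The weights, the domains and the weighted brute-force search are assumptions that stand in for the DIA system's own procedure. The image variable IMA_B is kept out of the imputable variables because it records the original value.

```python
from itertools import combinations, product

# Hypothetical setup: domains, reliability weights and edits are illustrative.
DOMAINS = {"A": (1, 2), "B": (1, 2, 3, "Blank"), "C": (1, 2, 3), "D": (4, 5)}
WEIGHTS = {"A": 10, "B": 1, "C": 10, "D": 2}
EDITS = [
    {"A": {1}, "C": {3}, "IMA_B": {2}, "B": {1, 2, 3}},  # DIE for the example RDI
    {"B": {"Blank"}, "D": {5}},  # random-error edit: B=Blank with D=5 is invalid
]


def fails(record, edit):
    return all(record[v] in vals for v, vals in edit.items())


def first_fix(record, subset):
    """A copy of record passing all edits by changing only subset, or None."""
    for combo in product(*(DOMAINS[v] for v in subset)):
        candidate = {**record, **dict(zip(subset, combo))}
        if not any(fails(candidate, e) for e in EDITS):
            return candidate
    return None


def joint_localisation(record):
    """Cheapest set of imputable variables that repairs both error types at once."""
    best = None
    names = sorted(DOMAINS)  # IMA_B is not in DOMAINS, so it is never imputed
    for size in range(len(names) + 1):
        for subset in combinations(names, size):
            weight = sum(WEIGHTS[v] for v in subset)
            if best is not None and weight >= best[0]:
                continue
            fixed = first_fix(record, subset)
            if fixed is not None:
                best = (weight, set(subset), fixed)
    return best


weight, ms, fixed = joint_localisation({"A": 1, "B": 2, "C": 3, "D": 5, "IMA_B": 2})
print(sorted(ms), fixed)  # ['B', 'D'] {'A': 1, 'B': 'Blank', 'C': 3, 'D': 4, 'IMA_B': 2}
```

The cheapest repair imputes B once, to 'Blank', which satisfies the DIE. It also imputes D once, to 4, which satisfies the random-error edit. The result is a set of distinct variables, so B cannot be imputed twice.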
Conclusions (I)
• The heuristic algorithm presented makes it possible to extend the DIA system to quantitative data
• In most real cases, it avoids having to break down the SEE into several subsets, which reduces the number of imputations
Conclusions (II)
• The DIE makes it possible to integrate edits expressing systematic errors with edits expressing random errors according to the Fellegi-Holt model. The DIA system can thus be applied to both types of errors simultaneously, avoiding possible re-imputations