THE USE OF EVALUATION DATA SETS WHEN IMPLEMENTING SELECTIVE EDITING Karin Lindgren Statistics Sweden
selekt • The process of implementing and using selective editing in surveys has made some progress • We have been working with selective editing in practice • I will talk about some experiences and problems that can occur
selekt When implementing selekt we try to minimize the relative pseudo bias (RPB) in the essential combinations of domain and variable in the statistical output: RPB_dj(Q) = |Y_dj(Q) − Y_dj(100)| / Y_dj(100), where d is the domain, j is the variable, Q is the proportion of records investigated, Y_dj(Q) is the estimate after editing the top Q percent of records, and Y_dj(100) is the estimate based on fully edited data
selekt • To calculate RPB we use data from a completed survey round and simulate selective editing on the raw data using different levels of Q • For the top Q percent of the units, the unedited values are replaced with edited values. All values for a unit are replaced, with no consideration of which edit rule caused the high score. The assumption is that all errors in a record are found when the unit is manually investigated (a sketch of the simulation follows below)
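A minimal sketch of this simulation in Python, assuming two pandas DataFrames with matching unit indices, raw (unedited values) and edited (fully edited values), plus a score Series and design weights; the names are illustrative and not part of SELEKT. Estimates for a domain d would be obtained the same way after restricting the data to the units of that domain.

def simulate_rpb(raw, edited, score, weight, variable, q):
    # Units in the top q percent by selective-editing score
    n_edit = int(round(len(raw) * q / 100))
    flagged = score.sort_values(ascending=False).index[:n_edit]

    simulated = raw.copy()
    # Replace every variable of a flagged unit, mirroring the assumption
    # that all errors in a record are found once it is investigated
    simulated.loc[flagged] = edited.loc[flagged]

    est_q = (simulated[variable] * weight).sum()   # partially edited estimate
    est_full = (edited[variable] * weight).sum()   # fully edited estimate
    return (est_q - est_full) / est_full

# Hypothetical usage: RPB for turnover when the top 20 percent are edited
# simulate_rpb(raw, edited, score, weight, "turnover", 20)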
Evaluation data set • One of the problems has been to determine and define a data set that can be regarded as the final data and used as the benchmark in the evaluation
Using a finalized data set in evaluation • In theory it would be natural to use the data set as it looked at the point of publication • The objective of the selective editing would then be to reduce the editing in a way that keeps the estimates similar to the published ones • In practice this approach is not always optimal
Problems with using a finalized data set in evaluation • The data set has undergone both micro and macro editing • The major objective of implementing selective editing is often to reduce the micro editing • RPB based on the finalized data set is therefore not the target indicator
A suitable evaluation data set The target parameter is RPB based on a data set that can be defined as 100 percent micro edited and 0 percent macro edited. The objective of the selective editing is really to reduce the editing in a way that keeps the estimates as similar as possible to the estimates we get from the 100 percent micro edited data.
Micro edited data as evaluation data When simulating selective editing at different levels of Q, if a unit is flagged, all its variables are changed from their raw values to their edited values. If edited values originating from macro editing are used, the simulation will be misleading whenever the edit rules creating the score do not involve all the variables whose values were changed.
Micro edited data as evaluation data • It can be challenging to recreate the data as it looked after micro editing, especially if the micro and macro editing have taken place partly during the same period • It requires some method of tracking changes in the data, e.g. versioning (see the sketch below)
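As an illustration of such tracking, the Python sketch below assumes a hypothetical edit log in which every correction is tagged with the editing phase it belongs to; replaying only the micro entries on the raw data reconstructs the 100 percent micro edited evaluation set. The log layout is an assumption, not an existing Statistics Sweden structure.

def rebuild_micro_edited(raw, edit_log):
    # edit_log: DataFrame with columns unit_id, variable, new_value, phase
    data = raw.copy()
    micro = edit_log[edit_log["phase"] == "micro"]   # drop macro-editing changes
    for row in micro.itertuples(index=False):
        data.loc[row.unit_id, row.variable] = row.new_value
    return data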
Replacing only flagged items Another approach is to replace the raw values with edited values only for the variables that are part of the edit rules that flagged the unit. This overlooks the behaviour of the editing staff, who correct obviously erroneous items, or items the respondents tell them are wrong, even when those items are not flagged.
Replacing only flagged items This approach can also be a way to find variables that are poorly monitored by the existing edit rules (see the sketch below).
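A sketch of the item-level replacement, continuing the illustrative Python setting: rule_vars is a hypothetical mapping from each edit rule to the variables it involves, and triggered maps each unit to the rules it fired. Comparing the result with whole-record replacement also reveals values that were changed although no rule fired, i.e. poorly monitored variables.

def replace_flagged_items(raw, edited, flagged_units, triggered, rule_vars):
    # triggered: {unit_id: [names of rules fired for the unit]}
    # rule_vars: {rule name: [variables the rule involves]}
    simulated = raw.copy()
    for unit in flagged_units:
        items = set()
        for rule in triggered.get(unit, ()):
            items.update(rule_vars[rule])
        for var in items:                     # only variables behind the flag
            simulated.loc[unit, var] = edited.loc[unit, var]
    return simulated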
Some advice • Before initiating the implementation of selective editing, review the existing edit rules; • Create a matrix of the relations between variables and edit rules (sketched below);
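One possible way to build such a matrix, sketched in Python with made-up rule and variable names: rows are variables, columns are edit rules, and a row that is all False exposes a variable that no rule monitors.

import pandas as pd

rule_vars = {                                   # illustrative rules only
    "R1_turnover_vs_employees": ["turnover", "employees"],
    "R2_turnover_change":       ["turnover"],
    "R3_wages_vs_employees":    ["wages", "employees"],
}
variables = ["turnover", "employees", "wages", "inventory"]

matrix = pd.DataFrame(
    {rule: [var in vs for var in variables] for rule, vs in rule_vars.items()},
    index=variables,
)
# Variables not covered by any edit rule (here: inventory)
print(matrix.loc[~matrix.any(axis=1)])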
Some advice • Investigate whether important variables are poorly covered by the existing edit rules, with changes made mainly through macro editing or for other reasons; • In that case, construct additional edit rules;
Some advice • If possible, carry out at least one survey round using the new edit rules before implementing the selective editing. Make sure to save the raw data, the data as it looked after micro editing, and the data as it looked when the published estimates were created; • Use the data sets from that survey round with the new improved edit rules and simulate the selective editing by setting only flagged variables to their edited values;
Some advice • Use RPB to set a suitable threshold (see the sketch below). Keep in mind that it can be very hard to achieve a low RPB in all combinations if the survey covers many important variables and domains. In that case it is more realistic to concentrate on key variables, and in some combinations we may have to accept a somewhat higher RPB.
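One way to turn the simulated RPB values into an editing level, again only a sketch with an assumed data layout: choose the smallest Q whose worst absolute RPB over the key domain/variable combinations stays within the chosen tolerance.

def smallest_acceptable_q(rpb_by_q, key_combinations, tolerance=0.01):
    # rpb_by_q: {q: {(domain, variable): rpb}} from the simulation above
    for q in sorted(rpb_by_q):
        worst = max(abs(rpb_by_q[q][combo]) for combo in key_combinations)
        if worst <= tolerance:
            return q
    return 100  # no level short of full editing meets the tolerance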