Data Editing, Coding, and Just a Little Imputation
Katherine (Jenny) Thompson
Office of Statistical Methods and Research for Economic Programs
Katherine.J.Thompson@census.gov
(301) 763-4941
The Basics: What Is Editing?
Editing procedures review reported or keyed data for errors and pinpoint “inconsistent” values
• For “industry”
• For respondent
Editing does not change the data. Items that fail edits are
• referred to an analyst; or
• automatically imputed (replaced with consistent values)
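As an illustration of the point that editing flags data without changing it, here is a minimal sketch of two common kinds of edit check. The item names (`payroll_q1`, `payroll_rest`, `payroll_annual`, `employees`) and the ratio bounds are hypothetical and are not taken from any actual survey.

```python
from typing import Dict, List


def run_edits(record: Dict[str, float]) -> List[str]:
    """Return the names of the edits this record fails.

    The record is only read, never modified: editing detects problems,
    and correction is left to an analyst or to imputation.
    """
    failures = []

    # Balance edit (hypothetical items): details should sum to the total.
    if record["payroll_q1"] + record["payroll_rest"] != record["payroll_annual"]:
        failures.append("balance:payroll")

    # Ratio edit (hypothetical bounds): payroll per employee should fall
    # inside a range considered plausible for the unit's industry.
    if record["employees"] > 0:
        ratio = record["payroll_annual"] / record["employees"]
        if not (5.0 <= ratio <= 500.0):
            failures.append("ratio:payroll_per_employee")

    return failures


record = {
    "employees": 10,
    "payroll_q1": 100.0,
    "payroll_rest": 300.0,
    "payroll_annual": 4000.0,  # likely a keying error: details sum to 400
}
failed = run_edits(record)
print(failed)
```

The failed-edit list is what would be passed on to an analyst or to an automatic imputation step; the reported values stay exactly as received.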
The Basics: What Is Imputation?
Imputation is the replacement of a missing or incorrectly reported item using logical edits or statistical procedures.
In other words, imputation replaces a missing or incorrect data item with an “educated guess.”
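The two routes named above (logical edits and statistical procedures) can be sketched as follows. This is an illustrative example only: the item names (`total`, `part_a`, `part_b`), the method labels, and the choice of an industry mean as the statistical procedure are assumptions, not the method of any particular program.

```python
from statistics import mean
from typing import Dict, List, Optional, Tuple

Record = Dict[str, Optional[object]]


def impute_total(record: Record, donors: List[Record]) -> Tuple[Optional[float], str]:
    """Return (value, method) for the hypothetical item 'total'."""
    if record["total"] is not None:
        return record["total"], "reported"

    # Logical imputation: the total is fully determined by its details.
    parts = [record["part_a"], record["part_b"]]
    if all(p is not None for p in parts):
        return sum(parts), "logical"

    # Statistical imputation (assumed model): mean of reported totals
    # from units in the same industry.
    same_industry = [
        d["total"] for d in donors
        if d["industry"] == record["industry"] and d["total"] is not None
    ]
    if same_industry:
        return mean(same_industry), "industry_mean"

    # No usable donors: leave the item missing rather than guess blindly.
    return None, "not_imputed"


donors = [
    {"industry": "A", "total": 100.0},
    {"industry": "A", "total": 200.0},
    {"industry": "B", "total": 900.0},
]
r1 = {"industry": "A", "total": None, "part_a": 40.0, "part_b": 60.0}
r2 = {"industry": "A", "total": None, "part_a": None, "part_b": 60.0}
result1 = impute_total(r1, donors)
result2 = impute_total(r2, donors)
print(result1, result2)
```

Returning the method alongside the value makes it possible to record later which imputation model produced each item.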
The Basics: What Is Coding?
Coding is the assignment of recognizable values to flags that describe key characteristics of the unit or item, such as
• Industry (unit level)
• Response status (unit or item level)
• Source of data correction (item level)
• Imputation model (item level)
We Begin With Coding
Before we can evaluate whether a response is reasonable, we have to know where it comes from:
• Classification variable value(s), e.g., industry, state
• Frame information may be erroneous, or
• The unit may have changed classification value
Each unit must be assigned classification code(s) before editing/imputation.
We End With Coding
At the end of the processing cycle, we want to know
• How the data were changed,
• Where the data were changed,
• Why (if possible) the data were changed, and
• The final status of the reporting unit (respondent, non-respondent).
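One way to keep track of how, where, and why data were changed is to store flags next to each item. The sketch below is a hypothetical layout: the source codes (`R`, `A`, `I`), the class names, and the rule used to derive unit status are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ItemAudit:
    reported: Optional[float]   # value as received; never overwritten
    final: Optional[float]      # value used in tabulation
    source: str = "R"           # hypothetical codes: R reported, A analyst, I imputed
    reason: str = ""            # why the value was changed, if known


@dataclass
class UnitRecord:
    unit_id: str
    items: Dict[str, ItemAudit] = field(default_factory=dict)

    def change(self, item: str, new_value: float, source: str, reason: str) -> None:
        """Record a correction: where (item), how (source), and why (reason)."""
        audit = self.items[item]
        audit.final = new_value
        audit.source = source
        audit.reason = reason

    def status(self) -> str:
        # Assumed rule: a unit counts as a respondent if at least one
        # item is still carried as reported.
        kept = any(
            a.source == "R" and a.reported is not None
            for a in self.items.values()
        )
        return "respondent" if kept else "non-respondent"


unit = UnitRecord("0001", {
    "sales": ItemAudit(reported=500.0, final=500.0),
    "payroll": ItemAudit(reported=None, final=None),
})
unit.change("payroll", 120.0, source="I", reason="missing item")
print(unit.items["payroll"], unit.status())
```

Keeping `reported` separate from `final` means the original response is still available for review after processing.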
Some Edit Definitions
Editing: Procedures for detecting “incorrect” keyed or respondent data.
Micro-Editing: Editing at the individual record (questionnaire) level.
Macro-Editing: Editing at the tabulated value level.
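The difference between the two levels can be sketched in a few lines. The checks themselves (a simple range check per record, and a comparison of each industry total with a prior-period total) are assumed examples, as are the field names and thresholds.

```python
from collections import defaultdict
from typing import Dict, List


def micro_edit(records: List[dict], max_value: float) -> List[int]:
    """Micro-edit: examine each record and return the ids that fail a range check."""
    return [r["id"] for r in records if r["sales"] < 0 or r["sales"] > max_value]


def macro_edit(records: List[dict], prior_totals: Dict[str, float],
               tolerance: float) -> List[str]:
    """Macro-edit: examine tabulated totals and return the industries whose
    total moved by more than `tolerance` (as a fraction) from the prior period.
    Prior totals are assumed to be positive.
    """
    totals: Dict[str, float] = defaultdict(float)
    for r in records:
        totals[r["industry"]] += r["sales"]

    flagged = []
    for industry, total in sorted(totals.items()):
        prior = prior_totals[industry]
        if abs(total - prior) / prior > tolerance:
            flagged.append(industry)
    return flagged


records = [
    {"id": 1, "industry": "A", "sales": 100.0},
    {"id": 2, "industry": "A", "sales": 110.0},
    {"id": 3, "industry": "B", "sales": 900.0},
    {"id": 4, "industry": "B", "sales": -5.0},
]
micro_failures = micro_edit(records, max_value=1000.0)
macro_failures = macro_edit(records, {"A": 200.0, "B": 500.0}, tolerance=0.25)
print(micro_failures, macro_failures)
```

In this toy data, record 3 passes the record-level range check but is the main reason the industry B total is flagged at the tabulation level, which is the kind of case a macro-edit is meant to surface.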