
Census Data Editing: Structure and Within Record Editing

This workshop covers structure editing techniques including geography edits, hierarchy of records, and correspondence between housing and population records. It emphasizes the importance of accurate data capture and editing relationships within households to ensure reliable census data. Key focus areas include coverage checks, proper record order, and eliminating errors in enumeration areas.


Presentation Transcript


  1. Census Data Editing: Structure and Within Record Editing
     UNSD Regional Workshop on Census Data Processing for the English speaking African Countries: Contemporary technologies for data capture, methodology and practice of data editing. Dar es Salaam, Tanzania, 9-13 June 2008

  2. Part I: Structure Editing

  3. Summary Part I: Structure Edits
     • What are structure edits?
     • Geography edits
     • Hierarchy of records
     • Correspondence between housing and population records
     • Editing relationships in a household
     • Family nuclei

  4. What are structure edits?
     Structure edits check coverage and relationships between different units: persons, households, housing units, enumeration areas, etc. Specifically, they check that:
     • all households and collective quarters records within an enumeration area are present and are in the proper order;
     • all occupied housing units have person records, but vacant units have no person records;
     • households have neither duplicate person records nor missing person records;
     • enumeration areas have neither duplicate nor missing housing records.

  5. Geography edits
     • Each EA must have the right geographic codes (city, province, region...)
     • Every housing unit in an EA should be entered, and every record must have a valid EA code
     • The capture process must check this before editing of data commences
     • If errors remain, it is best to find the right code, for example by returning to the enumeration documents and correcting manually
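
A minimal sketch of the geography check described above, assuming each record carries `ea`, `region` and `province` fields and that a reference table of valid EA codes is available. The field names, the codes and the `EA_REFERENCE` table are hypothetical. The function only lists failures, since the slide recommends resolving them against the enumeration documents.

```python
# Hypothetical reference table: EA code -> (region, province) codes.
EA_REFERENCE = {
    "EA001": ("R1", "P10"),
    "EA002": ("R1", "P11"),
}

def geography_errors(records):
    """Return (index, message) pairs for records that fail the geography edit."""
    errors = []
    for i, rec in enumerate(records):
        ea = rec.get("ea")
        if ea not in EA_REFERENCE:
            errors.append((i, f"invalid EA code: {ea!r}"))
            continue
        found = (rec.get("region"), rec.get("province"))
        if found != EA_REFERENCE[ea]:
            errors.append((i, f"geographic codes {found} do not match EA {ea}"))
    return errors

records = [
    {"ea": "EA001", "region": "R1", "province": "P10"},  # passes
    {"ea": "EA009", "region": "R1", "province": "P10"},  # unknown EA code
    {"ea": "EA002", "region": "R1", "province": "P10"},  # wrong province for EA002
]
errors = geography_errors(records)
```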

  6. Hierarchy of records

  7. Hierarchy of records
     Example record sequence:
     1_EA
       2_Housing unit
         4_Individual
         4_Individual
       2_Housing unit
       3_Collective living quarter
         4_Individual
         4_Individual
     1_EA

  8. Hierarchy of records
     • Type 1 (EA) followed by a new Type 1 (if the original EA is empty), or Type 2 (Housing Unit), or Type 3 (Collective Living Quarter)
     • Particular case of homeless people: create a dummy housing record to make structural checking easier
     • Type 2 (Housing Unit) followed by Type 1, 2 or 3 (if the original dwelling is vacant) or Type 4 (if the original dwelling is occupied)
     • Type 3 (Collective Living Quarter) followed by Type 4 (Individual). If not occupied, is an empty CLQ allowed?
     • Type 4 (Individual) followed by Type 4 (another individual in the same dwelling or collective living quarter), or Type 2 or 3 (another dwelling or CLQ), or Type 1 (new EA)
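
The record-order rules above can be sketched as a table of allowed transitions between record types. This is an illustration only: the function name and the `allow_empty_clq` switch are assumptions, the switch standing in for the open question of whether an empty CLQ is allowed. The occupancy conditions (vacant versus occupied) are not checked here.

```python
# Record types: 1 = EA, 2 = housing unit, 3 = collective living quarter, 4 = individual.
# For each record type, the set of types that may follow it.
ALLOWED_NEXT = {
    1: {1, 2, 3},
    2: {1, 2, 3, 4},
    3: {4},
    4: {1, 2, 3, 4},
}

def hierarchy_errors(record_types, allow_empty_clq=False):
    """Return (position, previous_type, current_type) for each invalid transition."""
    allowed = {rtype: set(nxt) for rtype, nxt in ALLOWED_NEXT.items()}
    if allow_empty_clq:
        # An empty CLQ may then be followed by a new EA, housing unit or CLQ.
        allowed[3] |= {1, 2, 3}
    errors = []
    for pos in range(1, len(record_types)):
        prev, cur = record_types[pos - 1], record_types[pos]
        if cur not in allowed.get(prev, set()):
            errors.append((pos, prev, cur))
    return errors

# EA, housing unit with two persons, vacant unit, CLQ with two persons, next EA.
valid_sequence = [1, 2, 4, 4, 2, 3, 4, 4, 1]
bad_sequence = [1, 4, 3, 2]
```

Here `hierarchy_errors(bad_sequence)` reports an individual directly after an EA record and a CLQ with no individuals; with `allow_empty_clq=True` only the first problem is reported.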

  9. Correspondence between housing and population records
     • An occupied unit should have at least one person and a vacant unit should have no people: if a Type 2 (Housing Unit) record of category “vacant” is followed by a Type 4 (Individual) record, change the category to “occupied”
     • The number of occupants recorded on the Housing Unit form should be exactly the same as the number of individual records in the household. If not, change the number on the Housing Unit form
     • Population records should be sequenced (numbered)
     • If a Type 3 (CLQ) record of category “Hospital” is followed by multiple Type 4 (Individual) records of category “Retirement home”, change the category of the CLQ to “Retirement home”
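
As a rough sketch, the housing-unit rules above could be applied to one unit and its person records as follows. The dictionary keys (`category`, `occupants`, `seq`) are hypothetical. Returning early for an occupied unit with no person records is a choice made for this example, since the slide states no automatic fix for that case.

```python
def edit_housing_unit(unit, persons):
    """Apply the correspondence edits in place; return notes describing each change."""
    notes = []
    # A vacant unit followed by person records is recoded as occupied.
    if unit["category"] == "vacant" and persons:
        unit["category"] = "occupied"
        notes.append("category changed from vacant to occupied")
    # An occupied unit without person records is only flagged, not changed.
    if unit["category"] == "occupied" and not persons:
        notes.append("occupied unit has no person records: refer for review")
        return notes
    # The occupant count on the housing record follows the person records.
    if unit["occupants"] != len(persons):
        notes.append(f"occupants changed from {unit['occupants']} to {len(persons)}")
        unit["occupants"] = len(persons)
    # Person records are numbered in sequence.
    for seq, person in enumerate(persons, start=1):
        person["seq"] = seq
    return notes

unit = {"category": "vacant", "occupants": 0}
persons = [{"name": "A"}, {"name": "B"}]
notes = edit_housing_unit(unit, persons)
```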

  10. Editing relationships in a household
      Each individual has a relation to the first person:
      • 1st person (or Head, or reference person)
      • Spouse
      • Child of the 1st person or of his/her spouse
      • Parent
      • Other relative
      • Friend
      • Lodger
      • ...

  11. Editing relationships in a household
      Household with potential inconsistencies in age reporting

  12. Family nuclei
      • Father: Sex should be male and Age should be > minimum age
      • Mother: Sex should be female and Age should be > minimum age
      • Child: Age under a maximum limit?

  13. Part II: Within Record Editing

  14. Summary Part II: Within Record Edits
      • Validity and Consistency Checks
      • Top-down Editing versus Multiple-variable Editing
      • Example of Multiple-Variable Editing
      • Methods of Correcting and Imputing Data
      • Example of Hot Deck for Sample Household (Sex Only)
      • Example of Hot Deck for Sample Household (Sex and Age)
      • Issues Related to Hot Deck
      • Methods of Correcting and Imputing Data: General Principles
      • Edit Trails and the Use of Imputation Flags

  15. Validity and Consistency Checks
      • Validity checks are performed to see if the values of individual variables are plausible or lie within a reasonable range
        Examples:
        • 0<=AGE<=110
        • SEX=Female or SEX=Male
      • Consistency checks are performed to ensure that there is coherence between two or more variables
        Examples:
        • Head of Household should have AGE>=15
        • A child should be younger than the head of household
        • A person with AGE<15 should never be married
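
The example checks above translate fairly directly into code. The sketch below assumes each person is a dictionary with hypothetical keys `age`, `sex`, `relationship` and `marital`; validity checks look at one variable at a time, while consistency checks compare variables within the household.

```python
def validity_errors(person):
    """Single-variable checks: each value must lie in its valid range."""
    errors = []
    age = person.get("age")
    if not (isinstance(age, int) and 0 <= age <= 110):
        errors.append("AGE not in 0..110")
    if person.get("sex") not in ("Male", "Female"):
        errors.append("SEX is neither Male nor Female")
    return errors

def consistency_errors(household):
    """Multi-variable checks; returns (person_index, message) pairs."""
    errors = []
    heads = [p for p in household if p.get("relationship") == "head"]
    head_age = heads[0].get("age") if heads else None
    for i, p in enumerate(household):
        age = p.get("age")
        if age is None:
            continue  # a missing age is a validity failure, handled separately
        rel = p.get("relationship")
        if rel == "head" and age < 15:
            errors.append((i, "head of household younger than 15"))
        if rel == "child" and head_age is not None and age >= head_age:
            errors.append((i, "child not younger than head of household"))
        if age < 15 and p.get("marital") == "married":
            errors.append((i, "person under 15 recorded as married"))
    return errors

household = [
    {"relationship": "head", "age": 40, "sex": "Male", "marital": "married"},
    {"relationship": "child", "age": 45, "sex": "Female", "marital": "single"},
    {"relationship": "child", "age": 12, "sex": "Female", "marital": "married"},
]
failures = consistency_errors(household)
```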

  16. Top-down Editing versus Multiple-Variable Editing
      • Top-down Editing approach starts by editing the top priority variable (not necessarily the first variable on the questionnaire) and moves sequentially through all items in decreasing priority
      • During the editing process, some edits change the value of an item more than once; this can introduce one or more errors in the dataset
        • Example: Child’s age first imputed on basis of mother’s age. Later child’s age re-imputed on basis of reported years of schooling, which might be inconsistent with mother’s age
        • In this case, child’s age should keep being re-imputed till it is consistent
      • Important to avoid circular editing!

  17. Top-down Editing versus Multiple-Variable Editing
      • Multiple-variable editing approach uses a set of rules that state the relationship between variables
      • Each statement is tested against the data to see if it is true
      • Edit system keeps track of all false statements relating to invalid entries or inconsistencies
      • Assessment is then made on how to change the record so that it will pass all edits, and then a decision is made
      • Fellegi-Holt principle of “minimum change” should be used
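
To illustrate the “minimum change” idea, the sketch below tests a record against a small set of edit rules, keeps track of the failed rules, and then searches by brute force for the smallest set of fields that could be changed so that every rule passes. This is not the Fellegi-Holt algorithm itself, only a toy search over hypothetical fields, domains and rules.

```python
from itertools import combinations, product

# Hypothetical value domains for three fields of a person record.
DOMAINS = {
    "age": range(0, 111),
    "marital": ["single", "married"],
    "relationship": ["head", "spouse", "child"],
}

# Each edit is (description, rule); a rule returns True when the record passes.
EDITS = [
    ("under 15 must not be married",
     lambda r: not (r["age"] < 15 and r["marital"] == "married")),
    ("head must be 15 or older",
     lambda r: not (r["relationship"] == "head" and r["age"] < 15)),
]

def failed_edits(record):
    """Return the descriptions of all edits the record fails."""
    return [name for name, rule in EDITS if not rule(record)]

def minimum_change_fields(record):
    """Return a smallest set of fields whose change lets the record pass all edits."""
    fields = list(DOMAINS)
    for size in range(len(fields) + 1):          # try 0 fields, then 1, then 2, ...
        for subset in combinations(fields, size):
            for values in product(*(DOMAINS[f] for f in subset)):
                candidate = {**record, **dict(zip(subset, values))}
                if not failed_edits(candidate):
                    return set(subset)
    return None

record = {"age": 10, "marital": "married", "relationship": "head"}
to_change = minimum_change_fields(record)
```

The record fails both edits. Changing marital status alone still leaves a head under 15, and changing relationship alone still leaves a married person under 15, so the search settles on age as the single field to change. The new value would then come from an imputation method.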

  18. Example of Multiple-Variable Editing: Head of household and spouse have same sex

  19. Example of Multiple-Variable Editing: Head of household and spouse have same sex

  20. Methods of Correcting and Imputing Data
      • The process of imputation changes one or more responses or missing values in a record or several records to ensure that internally coherent records result
      • Before using any imputation method, the best strategy is to start with a manual study of responses; imputation can then handle the remaining unresolved edit failures
      • Two methods of imputation: Cold Deck and Hot Deck
      • Cold Deck Imputation:
        • Used mainly for missing or unknown values (not for inconsistent/invalid values)
        • Values are imputed on a proportional basis from a distribution of valid responses (e.g., from a previous census)
        • In doing so, cold deck draws values from a fixed (but possibly outdated) distribution of values
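
A minimal sketch of cold deck imputation as described above: missing values are drawn proportionally from a fixed distribution that never changes during processing. The distribution shown is invented for the example and does not come from any census.

```python
import random

# Hypothetical fixed distribution of valid SEX responses (e.g., from a previous census).
COLD_DECK_SEX = {"Male": 0.49, "Female": 0.51}

def cold_deck_impute(records, field, distribution, rng):
    """Fill missing values of `field` by weighted draws; return the number imputed."""
    values = list(distribution)
    weights = list(distribution.values())
    imputed = 0
    for rec in records:
        if rec.get(field) is None:  # only missing values are touched
            rec[field] = rng.choices(values, weights=weights, k=1)[0]
            imputed += 1
    return imputed

records = [{"sex": "Male"}, {"sex": None}, {"sex": None}]
n_imputed = cold_deck_impute(records, "sex", COLD_DECK_SEX, random.Random(0))
```

Passing a seeded `random.Random` instance keeps the run reproducible, which helps when an edit has to be re-run and audited.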

  21. Methods of Correcting and Imputing Data
      • Hot Deck or Dynamic Imputation:
        • Used for both missing data and inconsistent/invalid items
        • Uses one or more variables to estimate the likely response based on data about individuals with similar characteristics
        • The “donor set” (or imputation matrix) constantly changes through updating; therefore, imputations dynamically change during the process of editing all the records
        • Thus, hot deck draws from a distribution that dynamically changes with each imputation and eventually (through modifications) “approaches” the distribution of the current data set
        • Caution: if two different items in a particular record have unknown values, hot deck may not use the same “donor” to impute both missing values; in this case, it is preferable to use the same donor for both items
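
The dynamic behaviour described above can be sketched with an imputation matrix for age, indexed by sex and relationship. The starting values and field names are hypothetical. In this sketch only reported ages update the matrix; imputed ages do not, which is one possible design choice rather than a rule taken from the slides.

```python
def hot_deck_age(records, matrix):
    """Impute missing ages in file order, updating the donor matrix along the way."""
    for rec in records:
        cell = (rec["sex"], rec["relationship"])
        if rec["age"] is None:
            rec["age"] = matrix[cell]   # take the most recent donor for this cell
            rec["age_imputed"] = True
        else:
            matrix[cell] = rec["age"]   # a reported age becomes the new donor
            rec["age_imputed"] = False
    return records

# Hypothetical initial imputation matrix for age by (sex, relationship).
matrix = {
    ("Male", "head"): 40,
    ("Female", "spouse"): 35,
    ("Male", "child"): 10,
    ("Female", "child"): 9,
}
records = [
    {"sex": "Male", "relationship": "head", "age": 52},
    {"sex": "Female", "relationship": "spouse", "age": None},  # gets initial value 35
    {"sex": "Male", "relationship": "child", "age": 14},
    {"sex": "Male", "relationship": "child", "age": None},     # gets 14 from the record above
    {"sex": "Male", "relationship": "head", "age": None},      # gets 52 from the first record
]
hot_deck_age(records, matrix)
```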

  22. Example of Hot Deck for Sample Household (Sex Only)

  23. Example of Hot Deck for Age (Sex and Relationship)
      Initial Imputation Matrix For Age Based on Sex and Relationship

  24. Example of Hot Deck for Age (Sex and Relationship)

  25. Example of Hot Deck for Age (Sex and Relationship)
      Initial Imputation Matrix For Age Based on Sex and Relationship
      Dynamic Imputation Matrix After Multiple Changes

  26. Issues Related to Hot Deck
      • Devise dynamic imputation matrices based on people living in the same small geographic area since they tend to be homogeneous with respect to many characteristics, i.e., different imputation matrices for different geographic areas should be created
      • Sometimes the simplest approaches are best: for example, for a missing housing attribute, it may be preferable to use the value of a neighboring household rather than using a complex imputation matrix that may result in the assignment of a value from outside the neighborhood
      • Before using dynamic imputation, an effort should be made to use related items instead. For example, if marital status is missing for an individual and there exists a spouse for that individual, then the value “married” should be assigned
      • One should edit key items such as age and sex first so that these can be used in other imputation matrices for lower priority items
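
The marital status example above is a deterministic rule that can run before any dynamic imputation. A minimal sketch, assuming relationships are coded relative to the head so that the head and the spouse are each other's partner; the field names and codes are hypothetical.

```python
def assign_marital_from_spouse(household):
    """Set missing marital status to 'married' when the person's spouse is present."""
    has_spouse = any(p["relationship"] == "spouse" for p in household)
    has_head = any(p["relationship"] == "head" for p in household)
    changed = 0
    for p in household:
        if p.get("marital") is not None:
            continue  # reported values are left alone
        partner_present = (
            (p["relationship"] == "head" and has_spouse)
            or (p["relationship"] == "spouse" and has_head)
        )
        if partner_present:
            p["marital"] = "married"
            changed += 1
    return changed

household = [
    {"relationship": "head", "marital": None},
    {"relationship": "spouse", "marital": "married"},
    {"relationship": "child", "marital": None},  # left for a later imputation step
]
n_changed = assign_marital_from_spouse(household)
```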

  27. Issues Related to Hot Deck
      • Construct imputation matrices based on research from administrative sources or previous censuses and surveys
      • Standardized imputation matrices (i.e., with standard dimensions such as age and sex, e.g., for language) can streamline the process since they can be tested and applied quickly
      • BUT if language is missing, first look to the language of others in the same household or to race, ethnicity, birthplace before using dynamic imputation; i.e., an attempt should be made to use related information to assign values before resorting to imputation
      • Some editing teams keep more than one value per cell in imputation matrices to protect against the same value being imputed multiple times; e.g., in the case of 4 male children in a household, all with ages unknown, different values will be assigned
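
One way to keep more than one value per cell, sketched under the assumption that each cell stores a fixed number of recent donor values and hands them out in rotation. The class name and the starting ages are hypothetical.

```python
from collections import deque

class DonorCell:
    """One imputation-matrix cell holding several donor values."""

    def __init__(self, initial_values):
        # maxlen makes the oldest donor drop out when a new one is added.
        self.values = deque(initial_values, maxlen=len(initial_values))

    def add_donor(self, value):
        """Store a newly reported value as the most recent donor."""
        self.values.append(value)

    def next_value(self):
        """Return a donor value, then rotate so the next call returns a different one."""
        value = self.values[0]
        self.values.rotate(-1)
        return value

# Cell for (Male, child) seeded with four hypothetical ages.
cell = DonorCell([5, 8, 11, 14])
ages_for_four_sons = [cell.next_value() for _ in range(4)]
```

With four stored values, four male children with unknown ages in the same household each receive a different age instead of the same one four times.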

  28. Issues Related to Hot Deck
      • Imputation matrices that are too big (with too many dimensions) cannot be updated thoroughly, leading to inefficiencies and inaccuracies
      • Imputation matrices that are too small (with too few dimensions or too few groupings within dimensions) may lead to the same donor value being used repeatedly in imputation before the matrix is updated
      • Some items such as occupation and industry are notoriously difficult to edit since the large number of categories can make dynamic imputation very cumbersome; in such cases, it may be counter-productive to impute and preferable to use “not stated”

  29. Methods of Correcting and Imputing Data: General Principles
      • Imputed record should closely resemble the failed edit record; impute for a minimum number of variables
      • Imputed record should satisfy all edits
      • All imputed values should be flagged, and methods and sources of imputation should be clearly specified
      • Both un-imputed and imputed values should be stored to allow for evaluation of the degree and effects of imputation

  30. Edit Trails and the Use of Imputation Flags
      • Important to generate an edit trail showing all data changes and substituted values with their tallies
      • Counters of several types are essential to process planning and management: i) number of cases of each type of error; ii) non-response rates for each item; iii) imputation rates for each item, ...
      • Imputation flags are binary flags that change from an initial value of 0 to 1 if the original value of the data is changed in any way; flags should be added onto each item that is imputed
      • Although a separate file with imputation flags takes up considerable space, this information is critical for planning of future censuses; e.g., as a means to investigate the age threshold below which a female with “child ever born” triggers a query edit, and to decide if the threshold should be modified for future rounds
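
The flagging and edit-trail ideas above can be sketched as follows. Each record keeps its current values, a copy of the original values and one binary flag per item. Every change is logged to a trail and tallied by item and method. The record layout and function names are assumptions made for the example.

```python
from collections import Counter

def with_flags(values):
    """Wrap a raw record: current values, untouched originals, and 0/1 flags per item."""
    return {
        "values": dict(values),
        "original": dict(values),
        "flags": {item: 0 for item in values},
    }

def impute(record, item, new_value, method, trail, tallies):
    """Change one item, set its flag to 1, and log the change."""
    old_value = record["values"][item]
    if old_value == new_value:
        return  # nothing changed, so the flag stays as it is
    record["values"][item] = new_value
    record["flags"][item] = 1
    trail.append({"item": item, "old": old_value, "new": new_value, "method": method})
    tallies[(item, method)] += 1

def imputation_rate(records, item):
    """Share of records in which `item` was imputed."""
    return sum(r["flags"][item] for r in records) / len(records)

trail, tallies = [], Counter()
records = [with_flags({"age": None, "sex": "Female"}),
           with_flags({"age": 27, "sex": "Male"})]
impute(records[0], "age", 30, "hot deck", trail, tallies)
rate = imputation_rate(records, "age")
```

Keeping `original` alongside `values` is what allows the degree and effects of imputation to be evaluated afterwards.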

  31. THANK YOU!
