Multiple Indicator Cluster Surveys: Data Dissemination and Further Analysis Workshop. Data Quality from MICS4
Looking at data quality – Why? • Confidence in survey results • Identify limitations in results • Inform dissemination and policy formulation • All surveys are subject to errors
Data quality • Two types of errors in surveys • Sampling errors • Non-sampling errors: all other types of errors, arising at any stage of the survey process other than the sample design, including • Management decisions • Data processing • Fieldwork performance, etc. • All survey stages are interconnected and play roles in non-sampling errors
Data quality • Sampling errors can be estimated before data collection, and measured after data collection • Non-sampling errors are more difficult to control and/or identify
Data quality • MICS incorporates several features to minimize non-sampling errors: A series of recommendations for quality assurance, including • Roles and responsibilities of fieldwork teams • Easy-to-use data processing programs • Training length and content • Editing and supervision guidelines • Survey tools • Failure to comply with principles behind these recommendations leads to problems in data quality
Data quality • Survey tools to monitor and improve quality, assess quality, identify non-sampling errors • Field check tables to quantitatively identify non-sampling errors during data collection and to improve quality • Possible with simultaneous data entry, when data collection is not too rapid • Data quality tables to be produced at the time of final report
Data quality • Results from data quality tables used to • Identify departures from expected patterns • Identify departures from recommended procedures • Check internal consistency • Completeness • Produce indicators of performance
Data quality analyses • Various surveys from different regions used to • compile • aggregate • graphically illustrate results in data quality tables
Age distribution by sex (DQ1) • Age heaping • Out-transference • Omission • Extent of missing cases • Sex ratios
Low percentages in household lists with missing data on age • Evidence of out-transference and/or heaping for under-5s • Out-transference from age 15 – for women • Large heaping on age 50 – possible out-transference but also digit preference for males and females
Completion rates by age - women and under-5s (DQ2, DQ3, DQ4, DQ5) • Fieldwork performance – re-visits, good planning • Completion rates need to be high, but also uniform by age and background characteristics • Low completion rates in younger women, for better-off groups, for less accessible or challenging areas are likely to bias results
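The uniformity check described above can be sketched in a few lines. This is only an illustration, not the MICS tabulation program: the function names, the age-group labels and the counts are hypothetical, and the completion rate is assumed to be completed interviews as a percentage of eligible respondents in each group.

```python
def completion_rates(eligible, completed):
    """Completion rate (percent) per group: completed / eligible * 100.

    Both arguments are dicts keyed by group label (hypothetical layout).
    Groups with no eligible respondents are skipped.
    """
    return {
        group: 100.0 * completed.get(group, 0) / n
        for group, n in eligible.items()
        if n > 0
    }


def rate_spread(rates):
    """Difference (percentage points) between the best and worst group."""
    return max(rates.values()) - min(rates.values())


# Made-up counts of eligible and interviewed women by age group.
eligible = {"15-19": 200, "20-24": 180, "25-29": 150}
completed = {"15-19": 150, "20-24": 171, "25-29": 147}

rates = completion_rates(eligible, completed)
spread = rate_spread(rates)
```

A large spread, driven here by the youngest group, is the kind of non-uniform pattern that the slide warns is likely to bias results even when the overall rate looks acceptable.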
Severe heaping on age 5 or, more probably, out-transference (C3, C4, C8, C9) • Completion rates are high for all ages
High ratios of women age 50-54 to 45-49 • Completion rates generally high in all age groups, but very low for young women in one country (C8) • Completion rates by quintiles not alarmingly different
Completeness of reporting – various (DQ6) • Missing information is problematic in surveys • Rule of thumb: Keep missing/don’t know/other cases to less than 10 percent – larger percentages may lead to biased results • No tolerance for missing information on some key variables, such as age, date of last birth
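As a rough illustration of the 10 percent rule of thumb, the sketch below computes the share of missing/don't know/other responses for one variable. The response codes and function names are assumptions made for the example; actual MICS datasets use their own numeric codes.

```python
# Hypothetical codes standing in for missing / don't know / other responses.
INCOMPLETE_CODES = {"missing", "dk", "other"}


def percent_incomplete(responses, incomplete_codes=INCOMPLETE_CODES):
    """Percent of responses that are None or carry an incomplete code."""
    if not responses:
        raise ValueError("no responses supplied")
    n_incomplete = sum(
        1 for r in responses if r is None or r in incomplete_codes
    )
    return 100.0 * n_incomplete / len(responses)


def breaks_rule_of_thumb(responses, threshold=10.0):
    """True when incomplete cases are not below the threshold percentage."""
    return percent_incomplete(responses) >= threshold


# Made-up data: 3 incomplete answers out of 20, then 1 out of 20.
poor = ["yes"] * 17 + ["dk", None, "missing"]
good = ["yes"] * 19 + [None]
```

For key variables such as age or date of last birth, where the slide allows no tolerance, the threshold would be set so that any incomplete case is flagged.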
Good results for salt testing • On other key variables, C2 and C5 look very problematic • Poor performance on dates has an impact on (almost) all other indicators in the survey – from eligibility to calculation of indicators
Completeness of anthropometric data (DQ7) • Assessing data quality of anthropometric indicators is relatively easy • Many tools have been developed for this purpose • Expected patterns, recommended procedures, completeness • Completeness of anthropometric data influenced by • Birth date reporting • Children not weighed, measured • Bad quality measurements
Large percentages of children excluded from analysis in two surveys (C2, C3) • Both incomplete date of birth and poor quality measurements can be responsible • Low percentages of children not weighed
Completeness of anthropometric data by age may be a concern, if large differences exist – may lead to biases in the anthropometric indicators • Usually U- or J-shaped distributions – large ranges in completion by age in some surveys (e.g. C2)
Heaping in anthropometric data (DQ8) • “Digit preference” – failure to record decimal points, or round… • Or even worse, truncate • Is known to have significant impact on results • Systematic truncation of measurement results can lead to biases of up to 5-10 percent in anthropometric indicators • May be due to insufficient training, use of non-recommended equipment
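Excess ratio: percent of measurements reported with a decimal digit of 0 or 5, divided by 20 (the percent expected if all ten digits were equally likely) • Should be close to 1.0 • Lower for weight than for height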
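A minimal sketch of the excess ratio, assuming measurements are recorded to one decimal place and that the digit of interest is that first decimal. The function name and the sample values are hypothetical.

```python
def excess_ratio(measurements):
    """Percent of values whose first decimal digit is 0 or 5, divided by 20.

    With ten equally likely digits, 20 percent of values would end in
    0 or 5, so a ratio near 1.0 suggests no digit preference.
    """
    if not measurements:
        raise ValueError("no measurements supplied")
    heaped = 0
    for value in measurements:
        # Recover the first decimal digit; round() absorbs float noise.
        tenths = round(value * 10) % 10
        if tenths in (0, 5):
            heaped += 1
    percent_heaped = 100.0 * heaped / len(measurements)
    return percent_heaped / 20.0


# Made-up weights (kg): 4 of the 10 values end in .0 or .5.
weights = [10.0, 10.5, 11.2, 9.8, 12.0, 8.5, 7.3, 9.1, 10.4, 11.6]
heaped_ratio = excess_ratio(weights)

# Evenly spread decimals: 10.0, 10.1, ..., 19.9.
uniform_ratio = excess_ratio([x / 10 for x in range(100, 200)])
```

In the made-up weights, twice as many values as expected end in 0 or 5, which is the pattern the slide attributes to rounding or truncation.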
Observation of documents (DQ10-DQ12) • Objective is to see the maximum percentage of specified documents, and copy information from documents onto questionnaires • Better quality when majority of information is coming from documents
No significant pattern, but overall, low percentages of documents seen • This is in addition to those cases where the documents do not exist in the first place
Respondent for under-5 questionnaire (DQ13) • Respondent for under-5 questionnaire (DQ13) • Random selection for child discipline module (DQ14) • Sex ratios among children ever born and living (DQ16)
Mothers are found and interviewed • Correct selection of children for child discipline module • Sex ratios among children ever born are expected to be around 1.05, within the range 1.02 to 1.07 – see C2, C8 • Expected pattern: higher sex ratios among children deceased – C8?
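The sex ratio check can be illustrated as follows. The function names and counts are hypothetical; the ratio is taken as sons per daughter among children ever born, and the 1.02 to 1.07 range is the one quoted above.

```python
def sex_ratio(sons, daughters):
    """Sons per daughter among children ever born."""
    if daughters <= 0:
        raise ValueError("daughters must be positive")
    return sons / daughters


def within_expected_range(ratio, low=1.02, high=1.07):
    """True when the ratio falls in the expected range (inclusive)."""
    return low <= ratio <= high


# Made-up totals of sons and daughters ever born.
typical = sex_ratio(1050, 1000)
suspicious = sex_ratio(1120, 1000)
```

A ratio outside the range does not by itself prove an error, but it flags a survey for closer inspection, for example for omission of daughters.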
Percentage of children incorrectly selected for the child discipline module