Data Quality Assurance in Bangladesh Demographic & Health Surveys
Nitai Chakraborty, Professor, Department of Statistics, Biostatistics & Informatics, University of Dhaka
Organization of the Presentation
- Data Quality Assurance: Concept & coverage
- Components of the data quality assurance steps undertaken in Demographic & Health Surveys (DHS) during field implementation & data processing
Data Quality Assurance: Concept & coverage
Quality assurance can be understood as a total quality management paradigm that examines the survey process at each step of design and implementation in order to improve the survey output in terms of relevance, accuracy, coherence and comparability.
The major areas of a survey where a “quality assurance” protocol should be strictly administered are:
- Selection of the survey institution
- Sampling design & sample size
- Design and testing of the data collection instruments
- Recruitment & training of field staff
- Data quality control during field implementation & the data entry process
Data quality control steps during field implementation & data entry process
The components of data quality assurance in most DHS surveys during field implementation and data entry are:
- Supervision & monitoring of field work through “Quality control” teams
- Monitoring data quality & performance of field staff through field-check tables
- Double entry verification
- Secondary editing
Monitoring of field work through “Quality control” teams
All DHS surveys employed quality control teams that worked in the field for the entire duration of the field work, circulating among all interviewing teams, to ensure that:
- Senior staff closely observe the field interviewers during the first few days of field work and give them immediate feedback.
- All completed questionnaires are thoroughly edited within a day of the interview, or at least before the team leaves the sample cluster.
- Supervisors and field editors thoroughly scrutinize all questionnaires and tactfully discuss all errors with the interviewer.
In most DHS surveys, one of the supervisor’s responsibilities is to conduct re-interviews with approximately 5 percent of the households covered in the survey. The purpose of the re-interviews is to ensure that interviewers are visiting the selected households and are not intentionally leaving out eligible household members or misreporting their ages in order to reduce their workload.
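The 5 percent re-interview rule can be illustrated with a short sketch. The slide does not say how households are chosen for re-interview, so simple random selection within a cluster is an assumption here, and the function name and household identifiers are hypothetical.

```python
import random


def select_reinterviews(household_ids, fraction=0.05, seed=None):
    """Pick roughly `fraction` of the households for re-interview.

    Assumption: simple random selection without replacement; the real
    DHS procedure may select households differently.
    """
    ids = list(household_ids)
    if not ids:
        return []
    rng = random.Random(seed)  # seeded generator so the draw can be reproduced
    k = max(1, round(fraction * len(ids)))  # always re-interview at least one
    return sorted(rng.sample(ids, k))


if __name__ == "__main__":
    # Hypothetical cluster of 100 households numbered 1..100.
    print(select_reinterviews(range(1, 101), seed=2024))
```

Fixing the seed makes the selection auditable: a second person can re-run the draw and confirm that the supervisor visited the households that were actually selected.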
Monitoring data quality with Field-Check tables
- Field-check tables are one way of monitoring data quality while the field work is still in progress.
- All DHS surveys, including BMMS, used the Census and Survey Processing System (CSPro) software package for data processing.
- Field-check tables, each focusing on an important aspect of data quality, are produced regularly using a CSPro data processing application.
- Use of the field-check tables is especially crucial during the early stages of fieldwork, when the option remains to retrain personnel, modify procedures or re-interview problem clusters.
- The field-check tables are run by the supervisors every week, starting after the first batch of cluster data has been entered. As fieldwork progresses and becomes more settled and routine, the checks are run every two weeks.
Monitoring data quality with Field-Check tables (contd.)
Table FC-1: Household response rate
Monitors the performance of interview teams in terms of non-response to the household questionnaire. If a team or interviewer shows an exceptional pattern of non-response, the supervisor should be informed and remedial action taken.
Table FC-2: Eligible women per household
One way for interviewers to reduce their workload is to deliberately omit eligible women from the household, or to estimate their ages to be either above or below the cutoff ages for eligibility (15-49). Table FC-2 monitors the number of eligible women per household.
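A rough sketch of how the quantities behind Tables FC-1 and FC-2 might be tabulated by team. The record layout (keys `team`, `completed`, `eligible_women`) is hypothetical, and the response rate is taken here simply as completed interviews over selected households, which is a simplification rather than the DHS definition. The actual tables are produced in CSPro, not Python.

```python
from collections import defaultdict


def household_summary(households):
    """Summarise response rate and eligible women per household by team.

    `households` is an iterable of dicts with hypothetical keys:
    'team', 'completed' (bool) and 'eligible_women' (int).
    """
    totals = defaultdict(lambda: {"selected": 0, "completed": 0, "women": 0})
    for hh in households:
        t = totals[hh["team"]]
        t["selected"] += 1
        if hh["completed"]:
            t["completed"] += 1
            t["women"] += hh["eligible_women"]

    summary = {}
    for team, t in totals.items():
        summary[team] = {
            # Simplified rate: completed / selected, in percent.
            "response_rate": 100.0 * t["completed"] / t["selected"],
            # Mean number of eligible women in completed households.
            "women_per_household": (
                t["women"] / t["completed"] if t["completed"] else 0.0
            ),
        }
    return summary
```

A team whose response rate or number of eligible women per household is clearly out of line with the other teams would be a candidate for the follow-up described above.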
Monitoring data quality with Field-Check tables (contd.)
Table FC-3: Age displacement
Examines whether interviewers are intentionally displacing the age of young women from the eligible range (15 and over) to an ineligible age (14 and under). An age ratio of less than 100 indicates a deficit of women 15 years old compared with those 14 and 16 years old, and might indicate intentional displacement.
Table FC-4
Similar to FC-3, this table examines whether interviewers are displacing women across the upper age eligibility boundary, i.e. from ages under 50 to ages 50 and over. An age ratio of less than 100 indicates a deficit of women 49 years old compared with those 48 and 50 years old, and might indicate intentional displacement.
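The age ratio used in Tables FC-3 and FC-4 can be sketched as follows. The slides do not state the formula, so this assumes the ratio is 100 times the number of women at the boundary age divided by the average of the numbers at the two adjacent ages; the function name is hypothetical.

```python
def age_ratio(counts, boundary_age):
    """Return the age ratio at `boundary_age`, or None if undefined.

    `counts` maps single years of age to the number of women listed
    at that age. Assumed definition (not confirmed by the slides):
        100 * N(age) / ((N(age - 1) + N(age + 1)) / 2)
    """
    neighbours = counts.get(boundary_age - 1, 0) + counts.get(boundary_age + 1, 0)
    if neighbours == 0:
        return None  # nothing to compare against
    return 100.0 * counts.get(boundary_age, 0) / (neighbours / 2.0)


if __name__ == "__main__":
    # Hypothetical household-listing counts for one interviewing team.
    listed = {14: 60, 15: 40, 16: 40, 48: 30, 49: 30, 50: 30}
    print("FC-3 (age 15):", age_ratio(listed, 15))
    print("FC-4 (age 49):", age_ratio(listed, 49))
```

Under this definition a smooth age distribution gives a value near 100, so a ratio well below 100 at age 15 or 49 signals the deficit described above.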
Monitoring data quality with Field-Check tables (contd.)
Table FC-5: Birth displacement
Some interviewers intentionally displace the birth dates of children from the fourth or fifth year to the sixth year before the year of the survey, so as to decrease the length and difficulty of their assigned interviewing task. This practice seriously undermines the quality of the data. Field-check Table 5 measures the performance of interviewers regarding displacement of births from calendar years after the cutoff date (say, January 2004) to before the cutoff date. If significant displacement has occurred, the birth year ratio will be much lower than 100, the value observed when the number of births changes smoothly from the year before the cutoff (2003) to the year after the cutoff (2005).
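A minimal sketch of a birth year ratio by interviewer. It assumes the ratio is 100 times the births in the cutoff year divided by the average of births in the years immediately before and after, which matches the statement that a smooth series gives 100 but is not spelled out on the slide. The `(interviewer_id, birth_year)` record layout is hypothetical.

```python
from collections import Counter


def birth_year_ratios(births, cutoff_year):
    """Birth year ratio at `cutoff_year` for each interviewer.

    `births` is an iterable of (interviewer_id, birth_year) pairs.
    Assumed definition: 100 * B(x) / ((B(x - 1) + B(x + 1)) / 2).
    Returns None for an interviewer with no births in adjacent years.
    """
    counts = Counter(births)  # keyed by (interviewer_id, birth_year)
    interviewers = sorted({interviewer for interviewer, _ in counts})
    ratios = {}
    for i in interviewers:
        before = counts[(i, cutoff_year - 1)]
        after = counts[(i, cutoff_year + 1)]
        at_cutoff = counts[(i, cutoff_year)]
        total = before + after
        ratios[i] = 200.0 * at_cutoff / total if total else None
    return ratios
```

With the cutoff year 2004, an interviewer who records 15, 5 and 10 births for 2003, 2004 and 2005 gets a ratio of 40, the kind of value that would prompt a closer look at that interviewer's questionnaires.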
Monitoring data quality with Field-Check tables (contd.)
Table FC-7: Completeness of date/age information for births
One of the main objectives of the survey is to estimate mortality rates for different age groups of children. This is why data are collected on the age at death of deceased children. Interviewers are required to record at least an approximate age at death for all deceased children. Field-check Table 7 monitors the performance of interviewers regarding birth date completeness. The table is divided into two parts, one for surviving and one for deceased children, since information about deceased children is typically less complete.
Monitoring data quality with Field-Check tables (contd.)
Table FC-8: Heaping on age at death
A common problem in the collection of data on age at death is “heaping” at 12 months of age, i.e. a large number of deaths are reported at 12 months relative to the number reported at months 9, 10 and 11, or at months 13, 14 and 15. Such heaping can result in underestimation of the infant mortality rate (based on deaths in months 0-11) and overestimation of the child mortality rate (based on deaths in months 12-23 and years 2-4).
Heaping of deaths at 12 months of age is the result of two frequently encountered interviewing situations:
- The respondent reports age at death as "one year", even though the death may have occurred at 10 months, 16 months, etc. Some interviewers then record "1 year", or simply convert "1 year" to 12 months and record that without probing. Both are incorrect.
- The respondent initially reports that she does not know the age but, when encouraged to recall it, reports a preferred number of months (e.g. 12 rather than 11 or 13).
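One simple way to quantify heaping at 12 months is to compare deaths reported at 12 months with the average over the neighbouring months named above (9-11 and 13-15). This index is an illustrative assumption, not necessarily the measure printed in Table FC-8, and the function name is hypothetical.

```python
def heaping_ratio_at_12(deaths_by_month):
    """Ratio of deaths at 12 months to the mean of months 9-11 and 13-15.

    `deaths_by_month` maps age at death in completed months to a count.
    Returns None when no deaths are reported in the neighbouring months.
    A value near 1 suggests little heaping; larger values suggest more.
    """
    neighbours = [9, 10, 11, 13, 14, 15]
    mean_neighbours = sum(deaths_by_month.get(m, 0) for m in neighbours) / len(neighbours)
    if mean_neighbours == 0:
        return None
    return deaths_by_month.get(12, 0) / mean_neighbours


if __name__ == "__main__":
    # Hypothetical counts for one team: a clear spike at 12 months.
    reported = {9: 2, 10: 2, 11: 2, 12: 8, 13: 2, 14: 2, 15: 2}
    print(heaping_ratio_at_12(reported))
```

Because death counts per team are small, such a ratio is noisy; it is better read as a prompt to review how interviewers probe for age at death than as a precise measure.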
Monitoring data quality with Field-Check tables (contd.)
Table FC-9: Underreporting of infant deaths
Underreporting of births and deceased children seriously undermines data quality. This table is useful in determining whether gross underreporting of infant deaths is occurring. However, there is no certain way to determine whether an individual interviewer or team is omitting births of deceased children, because sampling fluctuations and genuine regional differences can produce differences among teams and individuals that are unrelated to data quality.
- Generally, if the neonatal to infant mortality ratio falls below 0.45, or is significantly lower in one or more teams relative to the others, then omission of neonatal deaths is suspected.
- Also, if the ratio of infant deaths to total births is substantially lower in one or more teams than in the other teams, then omission of infant deaths is suspected.
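The two checks in Table FC-9 can be sketched by team as below. The sketch uses raw counts of deaths and births as a stand-in for the mortality ratios, which is an assumption, and the input layout (`neonatal_deaths`, `infant_deaths`, `births`) is hypothetical. Only the 0.45 threshold comes from the slide.

```python
def flag_underreporting(team_counts, threshold=0.45):
    """Compute FC-9 style ratios for each team.

    `team_counts` maps a team id to a dict with hypothetical keys
    'neonatal_deaths', 'infant_deaths' and 'births'.
    A team is marked suspect when its neonatal-to-infant ratio is
    below `threshold` (0.45 in the source).
    """
    report = {}
    for team, c in team_counts.items():
        nn_ratio = (
            c["neonatal_deaths"] / c["infant_deaths"] if c["infant_deaths"] else None
        )
        infant_per_birth = c["infant_deaths"] / c["births"] if c["births"] else None
        report[team] = {
            "neonatal_to_infant": nn_ratio,
            "infant_deaths_per_birth": infant_per_birth,
            "suspect": nn_ratio is not None and nn_ratio < threshold,
        }
    return report
```

As the slide cautions, a flag from a check like this is only a starting point, since sampling fluctuation and real regional differences can produce low ratios without any omission.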
Monitoring data quality with Field-Check tables (contd.)
Table FC-14: Maternal Mortality Module
Although the DHS Maternal Mortality Module is an optional “add-on” to the survey, it is widely used. Field-check Table 14 provides data on three key elements of the module:
(1) the proportion of a respondent’s sisters who have died but for whom no age at death is given;
(2) the proportion of sisters who died at age 12 or older but for whom there is no information as to whether the death should be classified as a pregnancy-related death; and
(3) the proportion of sisters who died at age 12 and above for whom the timing of the death, in terms of number of years ago, is missing.
The target values for the three elements are <2%, 0% and 0%, respectively.
Quality control at data entry stage
Double entry verification
- CSPro supports both dependent and independent verification (double keying) to ensure the accuracy of the data entry operation.
- For each PSU, the data are usually entered twice, on separate computers.
- The data entry supervisor uses the CSPro menu to run the verification program. This program takes as input the original main data file and the verification data file, and scans them for differences.
- When differences are found, they are written to a file and printed out by the supervisor.
- Each difference is checked against the questionnaire to determine which entry is correct.
- Both the verification and the main data entry files are corrected as necessary, and the verification program is re-run to ensure that no differences remain.
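The comparison step of independent verification can be illustrated in plain Python. This is not the CSPro verification program; it only sketches the idea of scanning two independently keyed files for differences, assuming each file has been loaded into a hypothetical `{case_id: {field: value}}` dictionary.

```python
def compare_entries(main_records, verify_records):
    """List the differences between two independently keyed data sets.

    Both arguments map a case id to a dict of field -> value.
    Each difference is a tuple (case_id, field, main_value, verify_value);
    `field` is None when a whole case is missing from one of the files.
    """
    differences = []
    for case_id in sorted(set(main_records) | set(verify_records)):
        main = main_records.get(case_id)
        verify = verify_records.get(case_id)
        if main is None or verify is None:
            differences.append((case_id, None, main, verify))
            continue
        for field in sorted(set(main) | set(verify)):
            if main.get(field) != verify.get(field):
                differences.append((case_id, field, main.get(field), verify.get(field)))
    return differences
```

An empty list corresponds to the final state described above, where both files have been corrected against the questionnaires and the re-run finds no remaining differences.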
Quality control at data entry stage (contd.)
Secondary editing
- After the raw data are backed up, the supervisor runs the secondary editing application built into the CSPro data entry system.
- The data entry application in CSPro can be designed so that it identifies inconsistencies at the time the data are entered. However, some inconsistencies require a more in-depth analysis, as well as the attention of subject-matter specialists or senior staff, to be properly resolved.
- DHS chooses to make these checks after data entry is complete, in a secondary editing application. Running this application produces a report of the inconsistencies identified.
- The office editors & subject-matter specialists analyze the messages and decide whether the data need to be changed, according to the “Secondary Editing Guidelines” manual.
Quality control at data entry stage (contd.)
EVENT table
As part of the secondary editing program, the DHS software compiles an event table. The event table facilitates consistency checking of the date and interval responses in the questionnaire. It allows up to 30 entries, beginning with the woman's date of birth, followed by the date of her first marriage/union, the birth dates of each of her children from the birth history, the date of sterilization (if appropriate), the date of conception of a current pregnancy (if appropriate) and the date of the interview.
Resolving inconsistencies in the responses, particularly those involving date and interval information, requires a detailed understanding of the nature and overall objectives of the survey questionnaire, as well as the interrelationships among specific questions. Senior staff & subject-matter specialists are employed to resolve the inconsistencies.
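A simplified illustration of the kind of check an event table supports: listing a woman's dated events in the order they are expected to occur and reporting any date that falls before the previous one. The representation of dates as `(year, month)` tuples and the assumption of a strict expected order are both hypothetical; the actual DHS secondary editing application is written in CSPro and applies more detailed rules.

```python
def check_event_order(events):
    """Report events whose date precedes the previous dated event.

    `events` is a list of (label, date) pairs in the order the events
    are expected to occur; `date` is a (year, month) tuple, or None
    when the event does not apply or the date is missing.
    """
    problems = []
    last_label, last_date = None, None
    for label, date in events:
        if date is None:
            continue  # skip events that are not applicable or undated
        if last_date is not None and date < last_date:
            problems.append(f"{label} {date} precedes {last_label} {last_date}")
        last_label, last_date = label, date
    return problems


if __name__ == "__main__":
    # Hypothetical event table for one respondent.
    table = [
        ("woman's birth", (1980, 5)),
        ("first union", (1998, 3)),
        ("birth of child 1", (1997, 1)),
        ("sterilization", None),
        ("interview", (2007, 4)),
    ]
    for message in check_event_order(table):
        print(message)
```

A message from such a check is not automatically an error: a birth before first union, for example, can be genuine, which is why the slide stresses that senior staff and subject-matter specialists decide how each inconsistency is resolved.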
Quality control at data entry stage (contd.)
The crucial events for a woman for which consistency checks are usually carried out include:
- Date of respondent's birth
- Date of first union
- Date of birth of each child
- Date of sterilization
- Date of conception of current pregnancy (date of interview minus months of current pregnancy)
- Date of interview
- Age of the respondent
- Age of child at last birthday (if child alive)
- Age of child at death (if child died)
- Time since last period
- Duration of use of current method (calendar Col1/)
- Duration of amenorrhea (live births in last five years)
- Duration of abstinence (live births in last five years)
- Time since last sexual intercourse (if ever had sex)
- Age at first sexual intercourse (if ever had sex)