Data Quality Control by Naila Baig Ansari, Research Fellow, Dept of Community Health Sciences, The Aga Khan University, Karachi, Pakistan
Who am I? Education: MSc (Epidemiology), The Aga Khan University, 2001. Thesis: Care and feeding practices and their association with stunting among young children residing in Karachi's squatter settlements. BBA (Management), The College of William and Mary, Williamsburg, VA, USA, 1989. Research interest: Nutritional and behavioral epidemiology, methodological issues in dietary assessment methods, household food security and gender-related issues, care and feeding practices, management of data and questionnaire design
Learning Objectives • To know the steps necessary for ensuring quality assurance and control of data at various stages of a study • To understand the difference between pilot testing and pre-testing • To understand the importance of designing data collection instruments • To understand how data can be managed using an audit trail and the various techniques that can be used to inspect your dataset after it has been entered
Performance Objectives • Know the difference between quality assurance and quality control and ways to ensure them • Know the objectives of a pilot test and a pre-test • Understand how data collection instruments should be designed and coded • Be able to manage data using an audit trail • Be able to inspect datasets for errors and rectify them
Data Quality Control • Quality Assurance: activities to ensure quality of data before data collection • Quality Control: monitoring and maintaining the quality of data during the conduct of the study • Data Management: handling and processing of data throughout the study
Steps in Quality Assurance • Specify the study hypothesis • Specify the general design to test the study hypothesis (develop an overall study protocol) • Choose or prepare specific instruments • Develop procedures for data collection and processing (develop operation manuals) • Train staff (certify staff) • Using certified staff, pretest and pilot-study the data collection and processing instruments and procedures
Quality Assurance: Standardization of procedures • Why is standardization important? • To achieve the highest possible level of uniformity in data collection procedures across the entire study population • Preparation of a written manual of operations • Detailed descriptions of exactly how the procedures specific to each data collection instrument are to be carried out (BP example) • Q by Q (question by question) instructions for interviews
Quality Assurance: Training of Staff • Aim to make each staff person thoroughly familiar with the procedures under his/her responsibility • Training leads to certification of the staff member to perform a specific procedure
Quality Assurance: Pretesting and Pilot testing • Pretesting • Involves assessing specific procedures on a sample in order to detect major flaws • Pilot testing • Formal rehearsal of study procedures • Attempts to reproduce the whole flow of operations in a sample as similar as possible to study participants
Pretesting and Pilot testing results • Pretesting of questionnaire used to assess: • flow of questions, • presence of sensitive questions, • appropriateness of categorization of variables, • clarity of the q by q instructions to the interviewer • Pilot testing • In addition to the above, flow of process
Quality Assurance: Data Management • Designing data collection instruments • Layout, questions to ask, sequence of questions, phrasing of questions, response categories, skip patterns • Collect and record "raw", not processed, information (e.g. age) • Codebook: the link between the questionnaire and the data entered in the computer
Quality Assurance: Use of a Code book • Variable names • Up to 8 characters a-z and 0-9, must start with a letter • Combination of question number and description (eg. q3age) • Meaning: • short text description describing the meaning of the variable • SPSS software can incorporate this info as variable labels and display it in the output
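The naming rule above can be checked automatically before data entry starts. The following is a minimal Python sketch, not part of the original slides. It assumes lowercase letters only (the slide says a-z), and the codebook entries are hypothetical examples.

```python
import re

# Rule from the slide: up to 8 characters, a-z and 0-9, must start with a letter.
# Assumption: uppercase letters are rejected because the slide lists only a-z.
VARIABLE_NAME = re.compile(r"[a-z][a-z0-9]{0,7}")


def is_valid_variable_name(name: str) -> bool:
    """Return True if the name follows the codebook naming rule."""
    return VARIABLE_NAME.fullmatch(name) is not None


# Hypothetical codebook: variable name -> short description (variable label).
codebook = {
    "q3age": "Age of respondent in completed years",
    "q4sex": "Sex of respondent",
    "3income": "Monthly household income",  # invalid: starts with a digit
}

# Collect the names that break the rule so they can be fixed in the codebook.
invalid_names = [name for name in codebook if not is_valid_variable_name(name)]
print(invalid_names)
```

Running such a check on the whole codebook before building the data entry form catches naming problems while they are still cheap to fix.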
Quality Assurance: Use of a Code book • Codes • Use numerical codes where possible • Decide in advance on codes for no response and missing values • Question could not be asked or was not applicable (e.g. pregnancy outcome) • Question was asked but respondent did not reply (e.g. salary) • Respondent replied "don't know"
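As an illustration of predefined missing-value codes, here is a small Python sketch. The specific codes (97, 98, 99) are hypothetical and not taken from the slides. Whatever codes are chosen must lie outside the range of valid answers for the variable.

```python
from collections import Counter

# Hypothetical missing-value codes, fixed in the codebook before data entry.
MISSING_CODES = {
    97: "not applicable",
    98: "no reply",
    99: "don't know",
}


def split_valid_and_missing(values):
    """Separate valid answers from coded missing values and count the latter."""
    valid = [v for v in values if v not in MISSING_CODES]
    missing = Counter(MISSING_CODES[v] for v in values if v in MISSING_CODES)
    return valid, missing


# Example: answers to a question whose valid codes are 1 and 2.
answers = [1, 2, 99, 97, 1, 98, 99]
valid, missing = split_valid_and_missing(answers)
print(valid)
print(dict(missing))
```

Keeping the three reasons apart, rather than using one generic "missing" code, makes it possible to report later how many respondents refused a question and how many were never asked it.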
Quality Control Observation of procedures and performance of staff members for identification of obvious protocol deviations • Strategies include: • Over-the-shoulder observation of staff • Taping all interviews and reviewing a random sample • Ongoing field supervision • field editing by interviewer as well as field supervisor • Office editing which includes coding • log book maintenance • Statistical assessment of trends over time in the performance of each observer/interviewer/technician
Data Management: Audit trail • Researcher should be able to trace each piece of information back to the original document: • ID included in the original documents and in the dataset • All corrections must be documented and explained • All modifications to the dataset must be documented by command files • Each analysis must be documented by a command file • Purpose of the audit trail is to • protect yourself against mistakes, errors, waste of time and loss of information • enable external audit (revision)
Data Management: Handling of Data • Entering data • Use a professional data entry program such as EpiData • Preparations • complete the codebook • examine questionnaires for obvious inconsistencies and skip-pattern errors
Data Management: Handling of Data • Error prevention: • Set up a data entry form resembling your questionnaire • Define valid values before entering data • double data entry by two different operators • compare contents to get list of discrepancies (EpiInfo) • correct errors in both files and run new comparison
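The comparison step of double data entry can be sketched in a few lines of Python. This is an illustration of the idea only, not how EpiInfo or EpiData implement it. The data layout (a dictionary keyed by questionnaire ID) and the variable names are assumptions made for the example.

```python
def compare_entries(first, second):
    """List discrepancies between two independently entered files.

    Each file is a dict: questionnaire ID -> {variable name: value}.
    Returns tuples of (ID, variable, value in first file, value in second file).
    """
    discrepancies = []
    for record_id in sorted(set(first) | set(second)):
        a = first.get(record_id)
        b = second.get(record_id)
        if a is None or b is None:
            # The questionnaire was entered by only one operator.
            discrepancies.append((record_id, "<record>", a is not None, b is not None))
            continue
        for variable in sorted(set(a) | set(b)):
            if a.get(variable) != b.get(variable):
                discrepancies.append((record_id, variable, a.get(variable), b.get(variable)))
    return discrepancies


# Hypothetical entries by two different operators.
operator_1 = {1: {"q3age": 23, "q4sex": 2}, 2: {"q3age": 31, "q4sex": 1}}
operator_2 = {1: {"q3age": 32, "q4sex": 2}, 2: {"q3age": 31, "q4sex": 1}}

for item in compare_entries(operator_1, operator_2):
    print(item)
```

Each listed discrepancy is resolved by going back to the paper questionnaire. The comparison is then re-run until the list is empty.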
First Inspection of Data: Error Finding • Add variable and value labels to your data using a syntax command • Searching for errors • make printouts of the codebook generated from the data, an overview of variables, and simple frequency tables of appropriate variables • compare the generated codebook with the original codebook and check that the label information is correct • inspect the generated summary/frequency tables for illegal or improbable minimum and maximum values and for inconsistencies (e.g. age of 250 years, pregnant male, 23-year-old woman with a 19-year-old son) • Calculate the error rate • randomly select 10% (or at least 40) of your questionnaires, re-enter them into a new file, and compare the new file with the original entries
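The range and consistency checks, and the size of the re-entry sample, can be sketched as follows. This is a hedged Python illustration. The valid age range (0 to 120), the coding of sex (1 = male, 2 = female) and pregnancy (1 = yes), and all variable names are assumptions for the example, not values from the slides.

```python
import random


def find_problems(record):
    """Return a list of illegal or inconsistent values in one record."""
    problems = []
    age = record.get("age")
    # Assumed valid range for age in years.
    if age is not None and not (0 <= age <= 120):
        problems.append("age out of range")
    # Assumed coding: sex 1 = male, pregnant 1 = yes.
    if record.get("sex") == 1 and record.get("pregnant") == 1:
        problems.append("pregnant male")
    return problems


def reentry_sample_size(total):
    """10% of the questionnaires (rounded up), but at least 40, capped at the total."""
    ten_percent = (total + 9) // 10
    return min(total, max(40, ten_percent))


def select_for_reentry(ids, seed=None):
    """Randomly pick the questionnaire IDs to re-enter into a new file."""
    ids = list(ids)
    return random.Random(seed).sample(ids, reentry_sample_size(len(ids)))


print(find_problems({"age": 250, "sex": 1, "pregnant": 1}))
print(reentry_sample_size(1000), reentry_sample_size(250))
```

One simple way to express the error rate, offered here as an assumption, is the number of discrepant fields found in the re-entered sample divided by the number of fields re-entered.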
Correction of errors - Documentation • If errors are discovered • Make corrections in a command file (SPSS syntax file); this provides full documentation of the changes made to the dataset • If errors are discovered when comparing files after double data entry • you can make corrections directly in the data entered, provided you end this step with a comparison of the two entered and corrected files
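The slides recommend an SPSS syntax file for corrections. The same idea can be shown in Python as a rough analogue: every correction is written down in a script, together with the old value and the reason, and applied to a copy of the data. The record ID, variable name and values below are hypothetical.

```python
import copy

# Each correction names the record, the variable, the value expected in the
# raw data, the corrected value, and the reason. All entries are hypothetical.
CORRECTIONS = [
    {"id": 17, "variable": "q3age", "old": 250, "new": 25,
     "reason": "typing error, checked against questionnaire"},
]


def apply_corrections(dataset, corrections):
    """Apply documented corrections to a copy of the dataset and return a log."""
    corrected = copy.deepcopy(dataset)  # the raw data stay untouched
    log = []
    for c in corrections:
        record = corrected[c["id"]]
        current = record[c["variable"]]
        if current != c["old"]:
            # Refuse to overwrite a value that is not the one documented.
            raise ValueError(f"ID {c['id']}: expected {c['old']}, found {current}")
        record[c["variable"]] = c["new"]
        log.append(f"ID {c['id']}: {c['variable']} {c['old']} -> {c['new']} ({c['reason']})")
    return corrected, log


raw = {17: {"q3age": 250, "q4sex": 2}}
clean, log = apply_corrections(raw, CORRECTIONS)
print(log)
```

Because the script checks the old value before changing it, re-running it against the wrong version of the dataset fails loudly instead of silently altering data.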
Correction of errors - Documentation • Split the process into distinct and well-defined steps, and make sure that your documentation is consistent from one step to the next • Archive • once you have a "clean", documented version of your primary data, save one copy in a safe place and do your work with another copy
Analysis • Make sure you use the right dataset • it is recommended to create command files for analysis that start with the command reading the dataset • Late discovery of errors and inconsistencies
Backing up vs Archiving • Backing up • an everyday activity • purpose is to enable you to restore your data and documents in case of destruction or loss • covers not only datasets, but also command files modifying your data and written documents such as the protocol, log book and other documenting information • Archiving • takes place once or a few times during the life of the project • purpose is to preserve your data and documents for the more distant future, possibly to allow other researchers access to the information