Maintaining data quality: fundamental steps
Agenda • The whole process • Questionnaire design • Data collection • Software design • Data entry
The whole process • Questionnaire design: asking the right questions, in the right way; structuring the questionnaire effectively • Data collection: veracity; quality of the survey; quality of the filled-in questionnaires • Software design: minimize data-entry errors; check that the data is error-free • Data entry and management: organize data effectively; clean the data
Agenda • The whole process • Questionnaire design • Data collection • Software design • Data entry
Questionnaire design • Include clear skip patterns wherever needed. • The software designer will then need to build those into the data-entry software. • Grids • Single/multiple options • Interviewer checkpoints • When coding your questions, make sure that all options are included. • For example, if there is any chance, even a small one, that people will answer “I don’t know”, do include the code “-999” for that question.
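A minimal sketch of the "all options are included" check above, in Python. The question name, option codes, and record layout are hypothetical; the point is that the explicit "don't know" (-999) and missing (-777) codes are part of the allowed set, so anything outside it gets flagged:

```python
# Hypothetical allowed codes per question, including the explicit
# "don't know" (-999) and "missing" (-777) codes from the slides.
ALLOWED = {"water_source": {1, 2, 3, 4, 5, -999, -777}}

def invalid_answers(records, allowed=ALLOWED):
    """Return (record_index, question, value) for every out-of-range code."""
    problems = []
    for i, rec in enumerate(records):
        for question, value in rec.items():
            if question in allowed and value not in allowed[question]:
                problems.append((i, question, value))
    return problems

records = [
    {"water_source": 2},
    {"water_source": -999},   # "I don't know" -- a valid, pre-coded answer
    {"water_source": 7},      # not a listed option: flag it
]
print(invalid_answers(records))  # -> [(2, 'water_source', 7)]
```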
Pilot and translate the survey • Pilot: in non-research areas, but in a similar setting • Depending on how ready the questionnaire is, run 30 to 40 pilot interviews • You can also pilot some sections more intensively • Translation: back-translation is MANDATORY
Agenda • The whole process • Questionnaire design • Data collection • Software design • Data entry
Data collection: surveyors • Selection • Training: before the survey, and ongoing • Before the survey: • Classroom and field sessions • Questionnaire + field instructions + behavior in the field • Training on the issue of interest • Also, if you have time to write an instruction manual, it is useful • Keep going to the field with them and run refresher trainings (e.g. when you notice they prompt too much) • Maintain motivation: go out with them, bonuses, etc. • STAY IN THE FIELD WITH THEM
Data collection: quality checks • Team structure • One supervisor for every five surveyors • A field monitor, if your team is big, to help you manage the team • Monitoring in the field • Accompaniments by the supervisor: all the time • Accompaniments by the monitor: 75% of the time • Accompaniments by yourself: maybe 15% of the time • Back-checks by the field monitor: 15% of questionnaires, some sections (mandatory!) • Do some back-checks yourself • Analyze the data from back-checks right away! • If you use a survey company, you still need to do your own back-checks and some accompaniments
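Analyzing back-check data right away can be as simple as tallying, per surveyor, how often the back-check answer disagrees with the original interview. A sketch, with entirely hypothetical field names and data layout:

```python
from collections import defaultdict

def backcheck_mismatches(original, backcheck, fields):
    """original/backcheck: {respondent_id: {field: value}}, keyed the same way.
    The original record also carries a 'surveyor' field (hypothetical name).
    Returns {surveyor: (mismatches, comparisons)} so you can spot who needs
    retraining or whose questionnaires should go back to the field."""
    stats = defaultdict(lambda: [0, 0])
    for rid, bc in backcheck.items():
        orig = original[rid]
        surveyor = orig["surveyor"]
        for f in fields:
            stats[surveyor][1] += 1          # one comparison made
            if orig[f] != bc[f]:
                stats[surveyor][0] += 1      # one disagreement found
    return {s: tuple(v) for s, v in stats.items()}

original = {
    101: {"surveyor": "A", "hh_size": 4, "owns_land": 1},
    102: {"surveyor": "B", "hh_size": 6, "owns_land": 0},
}
backcheck = {
    101: {"hh_size": 4, "owns_land": 1},
    102: {"hh_size": 5, "owns_land": 0},   # hh_size disagrees for surveyor B
}
print(backcheck_mismatches(original, backcheck, ["hh_size", "owns_land"]))
# -> {'A': (0, 2), 'B': (1, 2)}
```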
Questionnaire quality: scrutiny • Scrutinize questionnaires • Have surveyors and supervisors do it • But also do it yourself! • If you have a project assistant, ask them to scrutinize 100% but still scrutinize 50% or so yourself (at least the trickiest sections) • Examples of mistakes only you can catch: activity codes, logical consistency • When scrutinizing, write in all codes, even for answers that are not pre-coded • “-777” for missing, “-999” for “I don’t know” • If you find too much missing data, or inconsistent data, send the surveyors back to the field
Agenda • The whole process • Questionnaire design • Data collection • Software design • Data entry
Data management: goals • Quality • Timing • Timing is important, and you need to monitor the Data Entry Officers (DEOs) or the data-entry (DE) company carefully to make sure they stick to the timelines. But by no means should you sacrifice any quality-check step: if you save time on those steps, you will lose time later.
Data entry software • Software • Start thinking about it as soon as the questionnaire is close to final • It can be built by the survey company or outsourced to someone else (less expensive, or someone you trust more) • The goal is for the DEO to be able to make as few mistakes as possible
Data entry software • Software development: send the developer a detailed spreadsheet with instructions for each question (the range of acceptable values, logical checks, etc.). The more detailed it is, the more time you will save later. • Software testing: when the designer delivers the software, test it yourself by entering a batch of questionnaires (e.g. pilot questionnaires, or invented responses; just make sure you exercise every part of the software). • Check the output: then look at the output carefully, make sure it looks fine, and send it to the professors you work with to make sure they are satisfied with it.
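The per-question instructions in that spreadsheet can be sketched as a small spec that the entry software checks on every record: an acceptable range per question, plus cross-question logical checks. Question names, ranges, and the specific cross-check are all hypothetical examples:

```python
# Hypothetical entry-check spec, mirroring the spreadsheet sent to the
# developer: acceptable range per question.
SPEC = {
    "age": {"min": 0, "max": 110},
    "years_schooling": {"min": 0, "max": 25},
}

def check_entry(record, spec=SPEC):
    """Return a list of human-readable problems with one entered record."""
    errors = []
    for q, rule in spec.items():
        v = record.get(q)
        if v is None:
            errors.append(f"{q}: missing")
        elif not (rule["min"] <= v <= rule["max"]):
            errors.append(f"{q}: {v} outside [{rule['min']}, {rule['max']}]")
    # Example of a logical cross-check: schooling years cannot exceed age.
    if record.get("years_schooling", 0) > record.get("age", 0):
        errors.append("years_schooling exceeds age")
    return errors

print(check_entry({"age": 30, "years_schooling": 12}))   # -> []
print(check_entry({"age": 10, "years_schooling": 12}))   # -> ['years_schooling exceeds age']
```

The same spec can double as documentation for the developer and as a test harness when you try out the delivered software.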
Checking output • When checking the output, imagine yourself analyzing the data! • All fields need to be numerical (except text fields, like comments or “other – specify”). Again, there is not much you can do with text fields at analysis time. • One example: questions with multiple-choice responses (say the question is “Where do you take your water from?” with 5 options: well, tap, etc.) • This question should be treated as 5 questions (1. Do you take your water from the well? Yes or no. 2. Do you take your water from the tap? Yes or no. Etc.) • The response to each of these questions is a binary variable (either 1 (yes) or 0 (no)). • This becomes obvious once you put yourself in the shoes of the person who will analyze the data (among others, you!). If it is treated as a single question and the DEO enters “1, 2, 5” in the one response field, you cannot do anything with that data!
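The one-question-per-option rule above can be sketched in a few lines: a raw multi-response string like "1, 2, 5" is exploded into one binary variable per option. The option codes and labels are hypothetical, taken from the water-source example:

```python
# Hypothetical option codes for "Where do you take your water from?"
OPTIONS = {1: "well", 2: "tap", 3: "river", 4: "rain", 5: "vendor"}

def explode_multiresponse(raw, options=OPTIONS):
    """Turn a raw entry like '1, 2, 5' into one 0/1 variable per option."""
    chosen = {int(tok) for tok in raw.split(",") if tok.strip()}
    return {f"water_{name}": int(code in chosen) for code, name in options.items()}

print(explode_multiresponse("1, 2, 5"))
# -> {'water_well': 1, 'water_tap': 1, 'water_river': 0, 'water_rain': 0, 'water_vendor': 1}
```

Each resulting variable can then be tabulated or averaged directly, which is exactly what the single "1, 2, 5" field does not allow.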
Agenda • The whole process • Questionnaire design • Data collection • Software design • Data entry
Data entry • Timing: data entry should start as soon as possible after data collection starts – and before collection is over! • Double entry: mandatory. Must be written into the contract. • One output • Two outputs, reconciled • Error checking: check the error rate on a regular basis (batches of 200 or 300 questionnaires), and before you do any cleaning • Payment to the DE company: put a clause in the contract that the first payment is made only after 200 or so questionnaires have been delivered to you, you have checked the error rate, and it is below 0.5%. Pay only after that. • Get bad data re-entered entirely, whatever the nature of the errors
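Reconciling the two outputs of double entry amounts to a field-by-field comparison: any field where the two independent entries disagree is resolved against the paper questionnaire. A minimal sketch, assuming both entries are keyed by questionnaire ID with identical field names:

```python
def reconcile(entry_a, entry_b):
    """entry_a/entry_b: {questionnaire_id: {field: value}} from the two
    independent data-entry passes. Returns a list of
    (questionnaire_id, field, value_a, value_b) conflicts to resolve
    against the physical questionnaire."""
    conflicts = []
    for qid in entry_a:
        for field in entry_a[qid]:
            va, vb = entry_a[qid][field], entry_b[qid][field]
            if va != vb:
                conflicts.append((qid, field, va, vb))
    return conflicts

first_pass  = {1: {"age": 30, "sex": 1}, 2: {"age": 45, "sex": 2}}
second_pass = {1: {"age": 31, "sex": 1}, 2: {"age": 45, "sex": 2}}
print(reconcile(first_pass, second_pass))  # -> [(1, 'age', 30, 31)]
```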
Error rate checking • What is it? For each batch, re-enter a sample of data fields and compare them with the data delivered by the company (for those fields) • You need approximately 3,000 fields per batch • How to do it? • Divide your data into sub-sections (of about 25 questions each) • In some cases you will receive your data split into tabs – you can use those tabs as sub-sections, if they are small enough • For each sub-section, randomly select 5% of the questionnaires in your batch • Enter the data from that section of the selected questionnaires (using an Excel spreadsheet, or the data-entry software) • Compare your dataset with the original data (use Stata, Excel, or comparison software), and check on the physical questionnaire who made the mistake • Error rate: number of errors made by the company / number of fields (one error is one field with a mistake, not one question!) • Calculate the error rate for each section, and overall
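The steps above can be sketched in Python: draw a reproducible 5% random sample of questionnaires, then compute the error rate as differing fields over total fields checked. The data layout (a dict keyed by questionnaire ID and field name) is an assumption for the sketch:

```python
import random

def sample_questionnaires(ids, fraction=0.05, seed=0):
    """Randomly select a fraction of questionnaire IDs for re-entry.
    A fixed seed keeps the sample reproducible and auditable."""
    rng = random.Random(seed)
    k = max(1, round(len(ids) * fraction))
    return rng.sample(ids, k)

def error_rate(company, reentered):
    """company/reentered: {(questionnaire_id, field): value} for the
    sampled fields. One error = one field that differs -- a question
    spanning several fields can contribute several errors."""
    errors = sum(1 for key, v in reentered.items() if company[key] != v)
    return errors / len(reentered)

company   = {(1, "age"): 30, (1, "sex"): 1, (2, "age"): 40, (2, "sex"): 2}
reentered = {(1, "age"): 30, (1, "sex"): 1, (2, "age"): 41, (2, "sex"): 2}
print(error_rate(company, reentered))  # -> 0.25
```

Running this per sub-section and once over the pooled fields gives the per-section and overall rates; the overall rate is what the 0.5% payment threshold applies to.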
Data cleaning and organizing • Clean your data in a separate file • Rename and label variables • Check for logical errors • Look at ranges and outliers • Do basic data summaries • Check for duplicate data • Check for missing data • Look at the distribution of data by surveyor/team
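Several of the checks listed above (duplicates, missing data, ranges and outliers) can be run in one pass over the dataset. A minimal sketch on a list of row dicts; the ID field name and the missing codes follow the conventions used earlier in the deck, everything else is hypothetical:

```python
from collections import Counter

MISSING_CODES = (-777, -999)   # the deck's codes for missing / "I don't know"

def basic_checks(rows, id_field="hh_id", ranges=None):
    """rows: list of {field: value} dicts. Returns a small report with
    duplicate IDs, per-field missing counts, and out-of-range values."""
    report = {}
    ids = [r[id_field] for r in rows]
    report["duplicate_ids"] = sorted(i for i, n in Counter(ids).items() if n > 1)
    fields = [f for f in rows[0] if f != id_field]
    report["missing"] = {
        f: sum(1 for r in rows if r.get(f) in (None,) + MISSING_CODES)
        for f in fields}
    report["out_of_range"] = []
    for f, (lo, hi) in (ranges or {}).items():
        for r in rows:
            v = r.get(f)
            # Missing codes are legitimate values, not range violations.
            if v is not None and v not in MISSING_CODES and not (lo <= v <= hi):
                report["out_of_range"].append((r[id_field], f, v))
    return report

rows = [
    {"hh_id": 1, "age": 30},
    {"hh_id": 2, "age": -777},   # coded as missing
    {"hh_id": 2, "age": 200},    # duplicate ID and impossible age
]
print(basic_checks(rows, ranges={"age": (0, 110)}))
# -> {'duplicate_ids': [2], 'missing': {'age': 1}, 'out_of_range': [(2, 'age', 200)]}
```

Keeping this in a script, rather than editing values by hand, preserves the "clean your data in a separate file" principle: the raw entry file stays untouched and every fix is reproducible.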