Survey Methodology: Survey data entry/cleaning. EPID 626 Lecture 10
To do or not to do: Contracting the work • During study planning, you should decide whether to do the data entry, management, and analysis yourself, or whether to contract with someone else to do it • What are the advantages and disadvantages? • When might you want to? When might you not want to?
Contracting • Advantages • Specialized expertise • Potential ability to access national network of personnel • Reduction of load on study personnel • Third party (without financial or professional stake in results) increases legitimacy of the results
Contracting • Disadvantages • Generally more expensive • Is this true? Discuss profits vs. expertise and efficiency • Lose direct control over quality of data and study conduct • May be more difficult to interpret data without having done the analysis
DIY: Now what? • Data analysis plan • Data entry • Data diagnostics • Data cleaning • Data setup
Data analysis plan (DAP) • Design from the protocol and the survey instrument • Note: they may be discrepant • Aim: • Resolve discrepancies before you start working with the data • Establish a clear plan for data management and analysis
DAP elements • Summarize methods • For each survey objective, identify and describe the relevant variables • Identify the analysis methods • Software • Statistical methods, tests, significance levels, definitions
DAP elements (2) • Describe plan for handling: • missing values • out-of-range values • zeros if doing log transformations • data collapsing • Describe subgroup or by-group analyses
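The handling rules in the DAP can be written down as code before any data arrive. The following is a minimal sketch, not a prescribed method: the missing-value codes, the valid range, and the `offset` added before the log transformation are all illustrative assumptions that the DAP itself should specify.

```python
import math

# Hypothetical missing-value codes; the real ones come from the survey instrument.
DEFAULT_MISSING_CODES = frozenset({"", "NA", "-9", "999"})

def clean_value(raw, low, high, missing_codes=DEFAULT_MISSING_CODES):
    """Return (value, flag), where flag is 'ok', 'missing', or 'out_of_range'."""
    if raw is None or raw.strip() in missing_codes:
        return None, "missing"
    value = float(raw)
    if not (low <= value <= high):
        # Keep the value so it can be queried; do not silently drop it.
        return value, "out_of_range"
    return value, "ok"

def log_transform(value, offset=1.0):
    """Compute log(value + offset).

    Adding a constant is one common convention for zeros, assumed here;
    math.log still raises ValueError if value + offset is not positive.
    """
    if value is None:
        return None
    return math.log(value + offset)
```

For example, `clean_value("150", 0, 120)` flags an age of 150 as out of range rather than discarding it, which leaves the decision to the plan instead of the programmer.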
DAP elements (3) • Set up dummy tables and graphs • Review this DAP carefully and pass it around
Data entry • Design a database that resembles the survey instrument in layout and format • Pretest it extensively • Designer should be present at the beginning of data entry to fix bugs • Double data entry? • Avoid necessity of interpretation by entry personnel
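If double data entry is used, the two passes have to be compared field by field. A rough sketch of such a comparison follows; it assumes each pass is a list of dictionaries with a shared identifier field (named `id` here purely for illustration).

```python
def compare_entries(first, second, id_field="id"):
    """Compare two data-entry passes.

    Returns a list of (record_id, field, first_value, second_value) tuples.
    A record found in only one pass is reported with field set to None.
    """
    first_by_id = {row[id_field]: row for row in first}
    second_by_id = {row[id_field]: row for row in second}
    discrepancies = []
    for record_id in sorted(set(first_by_id) | set(second_by_id)):
        a = first_by_id.get(record_id)
        b = second_by_id.get(record_id)
        if a is None or b is None:
            discrepancies.append((
                record_id,
                None,
                "present" if a is not None else "absent",
                "present" if b is not None else "absent",
            ))
            continue
        for field in sorted(set(a) | set(b)):
            if a.get(field) != b.get(field):
                discrepancies.append((record_id, field, a.get(field), b.get(field)))
    return discrepancies
```

Each discrepancy should be resolved against the paper form, not by guessing which pass is right.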
You and Your Data Your first eight hours together
First things first • Virus-check the files • Write-protect the original data • Back up files and CRFs (case report forms) • On-site: hard drives, diskettes, safes • Off-site: safe deposit box
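Write-protecting and backing up can be scripted so the steps are not skipped. The sketch below is one possible approach using only the Python standard library; the function names and the single backup directory are assumptions, and it does not replace off-site copies or virus checking.

```python
import hashlib
import os
import shutil
import stat

def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def protect_and_back_up(original, backup_dir):
    """Copy the original file, verify the copy, then make the original read-only."""
    os.makedirs(backup_dir, exist_ok=True)
    backup = os.path.join(backup_dir, os.path.basename(original))
    shutil.copy2(original, backup)  # copy2 also preserves timestamps
    if sha256_of(original) != sha256_of(backup):
        raise OSError("backup does not match original: " + backup)
    # Remove write permission for everyone (on Windows this sets the read-only flag).
    os.chmod(original, stat.S_IRUSR | stat.S_IRGRP | stat.S_IROTH)
    return backup
```

Recording the checksum alongside the backup makes it possible to confirm later that the original has not changed.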
First things first (2) • Import data • Error-prone; be very careful here • Validate and verify the data
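Because import is error-prone, it helps to check the header and the shape of every row as the file is read. The following is a hedged sketch for a comma-separated file; the expected column list is something you would supply from your own data dictionary, and other formats need other checks.

```python
import csv

def import_csv(handle, expected_columns):
    """Read a CSV file object and separate well-formed rows from suspect ones.

    Returns (rows, problem_lines). A row is suspect if it has too few or
    too many fields compared with the header.
    """
    reader = csv.DictReader(handle)
    if reader.fieldnames != list(expected_columns):
        raise ValueError("unexpected header: %r" % (reader.fieldnames,))
    rows, problem_lines = [], []
    for row in reader:
        # DictReader stores surplus fields under the key None and fills
        # absent fields with the value None.
        if None in row or None in row.values():
            problem_lines.append(reader.line_num)
        else:
            rows.append(row)
    return rows, problem_lines
```

Note that `csv.DictReader` skips completely blank lines, so also compare the number of rows read with the number of forms you expect.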
Validating and verifying data • Run frequencies for categorical variables • Run univariate statistics for continuous variables • Examine key variables (those used in the evaluation of primary objectives) • Look at variables by group (sex, age, etc.)
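These first checks need nothing more than counts and simple summaries. A minimal sketch in plain Python is shown below; it assumes the data are a list of dictionaries and that missing values have already been removed from the continuous variable. A statistics package would normally do the same job.

```python
import statistics
from collections import Counter, defaultdict

def frequencies(rows, field):
    """Count how often each value of a categorical variable occurs."""
    return Counter(row[field] for row in rows)

def univariate(values):
    """Basic univariate statistics for a non-empty list of numbers."""
    values = sorted(values)
    return {
        "n": len(values),
        "min": values[0],
        "max": values[-1],
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # The sample standard deviation needs at least two values.
        "sd": statistics.stdev(values) if len(values) > 1 else None,
    }

def by_group(rows, group_field, value_field):
    """Univariate statistics of value_field within each level of group_field."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_field]].append(row[value_field])
    return {level: univariate(values) for level, values in groups.items()}
```

Unexpected categories in the frequencies, or a minimum or maximum that is not plausible, are usually the first sign of an entry or import problem.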
Validating and verifying data (2) • Recode missing values • Calculate checks for error-prone variables • Ex. Check dates against time-to variables • Check anything that the interviewer had to calculate, such as a total score • Derive any key variables that need to be calculated from other variables, and verify them too
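Checks like these amount to recomputing a value and comparing it with what was recorded. The sketch below illustrates the idea; every field name in it (`enrollment_date`, `event_date`, `days_to_event`, `item_scores`, `total_score`) is hypothetical and stands in for whatever the instrument actually collects.

```python
from datetime import date

def check_record(record):
    """Return a list of problems found in one record (empty if none)."""
    problems = []

    # Check the recorded time-to variable against the two dates.
    elapsed = (record["event_date"] - record["enrollment_date"]).days
    if elapsed != record["days_to_event"]:
        problems.append(
            "days_to_event is %d but the dates give %d"
            % (record["days_to_event"], elapsed)
        )

    # Check a total that the interviewer calculated by hand.
    computed_total = sum(record["item_scores"])
    if computed_total != record["total_score"]:
        problems.append(
            "total_score is %d but the items sum to %d"
            % (record["total_score"], computed_total)
        )
    return problems

example = {
    "enrollment_date": date(2020, 1, 1),
    "event_date": date(2020, 1, 31),
    "days_to_event": 30,
    "item_scores": [2, 3, 1],
    "total_score": 6,
}
```

Here `check_record(example)` returns an empty list; a mismatch is reported rather than corrected, so the source of the error can be investigated.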
Validating and verifying data (3) • Rearrange, combine, or separate datasets as needed for analysis • Ex. Split demographic data, primary outcome, secondary outcome data • Annotate a survey instrument with variable names • Create a data dictionary • Include variable name, type, length, and description or label
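A first draft of the data dictionary can be generated from the data and then completed by hand. This is a rough sketch under the assumption that the dataset is a non-empty list of dictionaries and that descriptions are supplied in a separate `labels` mapping; both names are illustrative.

```python
def build_data_dictionary(rows, labels):
    """Build one dictionary entry per variable: name, type, length, label.

    Type is inferred from the non-missing values; length is the longest
    value when written as text.
    """
    entries = []
    for name in rows[0]:
        values = [row[name] for row in rows if row.get(name) is not None]
        type_names = sorted({type(value).__name__ for value in values})
        entries.append({
            "name": name,
            "type": "/".join(type_names) or "unknown",
            "length": max((len(str(value)) for value in values), default=0),
            "label": labels.get(name, ""),
        })
    return entries
```

A variable whose inferred type is mixed (for example `int/str`) or whose label is empty is worth a second look before analysis starts.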
Validating and verifying data (4) • Look for obvious errors • Ex. Spelling of medication or medical condition • Be very careful about correcting them • Document any changes • Think about a query system • May need interviewer to resolve errors
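One way to document changes is to route every correction through a function that records it. The following is a minimal sketch of such a change log, assuming records are dictionaries with an `id` field; the log structure is an illustration, not a standard.

```python
from datetime import datetime, timezone

def apply_correction(record, field, new_value, reason, editor, log):
    """Change one field of a working-copy record and append an entry to the log."""
    log.append({
        "id": record.get("id"),
        "field": field,
        "old": record.get(field),
        "new": new_value,
        "reason": reason,
        "editor": editor,
        "when": datetime.now(timezone.utc).isoformat(),
    })
    record[field] = new_value
    return record

change_log = []
working_copy = {"id": "17", "medication": "asprin"}
apply_correction(
    working_copy, "medication", "aspirin",
    "spelling confirmed with interviewer", "data manager", change_log,
)
```

Corrections are applied to a working copy only; the write-protected original stays as it was received.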
Validating and verifying data (5) • Run rough crosstabs for reference • Ex. Number by sex, group, and age • Use to track observations • Create data listings • Very useful for reference and to identify problems in the data • Check data coming from different sources • Be very careful with merging
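Most merging problems come from duplicated or unmatched identifiers, so it is worth inspecting the keys before combining anything. A small sketch of such a pre-merge check follows; the key name `id` is an assumption.

```python
from collections import Counter

def check_merge_keys(left, right, key="id"):
    """Report duplicated and unmatched keys in two datasets before merging."""
    left_counts = Counter(row[key] for row in left)
    right_counts = Counter(row[key] for row in right)
    return {
        "duplicates_left": sorted(k for k, n in left_counts.items() if n > 1),
        "duplicates_right": sorted(k for k, n in right_counts.items() if n > 1),
        "only_left": sorted(set(left_counts) - set(right_counts)),
        "only_right": sorted(set(right_counts) - set(left_counts)),
    }
```

If all four lists are empty, a one-to-one merge on that key is safe; otherwise each listed identifier should be explained before the merge is run.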
Validating and verifying data (6) • Aside: Variable naming • Should be meaningful and descriptive • But be careful about overly descriptive names • Long variable names are difficult to manipulate • If meaning appears obvious, people won’t look it up • Back all of this up in the same way you backed up the original data