Preparing Data for Analysis: Part II
There’s More to Do BEFORE You Analyze Data
MCRC Biostatistics Didactic Workshop
Yona Keich Cloonan, PhD, Senior Statistician
Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases
Epidemiology Data Center, Department of Epidemiology, Graduate School of Public Health
OVERVIEW
We previously discussed:
• Data Storage
• File Formats
• Data Dictionaries
• Special Missing Values
These are all aspects of Data Management.
OVERVIEW
Today we will discuss additional Data Management strategies:
• Detailed Documentation
• Variable Naming
• Data Entry
Applied to:
• Laboratory Measures
• Questionnaires
LAB MEASURES
Review: Special Missing Value Codes Indicate WHY Items Are Missing
• For laboratory values:
  • If a value is below or above the limit of detection, enter a special missing value code
  • Avoid < or > signs (or other symbols) in numeric fields
  • If symbols are used, the field will be imported as text rather than as a number
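A short Python sketch (standard library only) shows why symbols cause trouble. The first function roughly mimics how an import routine decides whether a column is numeric: a single entry such as "<0.5" is enough to make the whole column text. The clean-up step then puts the number, or nothing, in the numeric field and records WHY a value is absent as a separate code. The codes "L" and "H" are hypothetical stand-ins for SAS-style special missing values (.L, .H). Use the codes defined in your own data dictionary.

```python
# Hypothetical special missing codes (in the spirit of SAS .L / .H);
# use the codes your own data dictionary defines.
BELOW_LIMIT = "L"
ABOVE_LIMIT = "H"

def _parses_as_number(text):
    try:
        float(text)
        return True
    except ValueError:
        return False

def column_is_numeric(entries):
    """Roughly what an import routine checks: do all non-blank entries parse as numbers?"""
    for entry in entries:
        if entry.strip() and not _parses_as_number(entry):
            return False
    return True

def clean_lab_value(raw):
    """Return (numeric value or None, special missing code or None) for one raw entry."""
    text = raw.strip()
    if text.startswith("<"):
        return None, BELOW_LIMIT
    if text.startswith(">"):
        return None, ABOVE_LIMIT
    if not text:
        return None, None      # missing, reason not recorded
    return float(text), None   # any other non-numeric entry raises ValueError for review

raw_crp = ["1.2", "<0.5", "3.8", ">500"]
print(column_is_numeric(raw_crp))  # False: the symbols force a text column
print([clean_lab_value(v) for v in raw_crp])
# [(1.2, None), (None, 'L'), (3.8, None), (None, 'H')]
```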
LAB MEASURES: MrOS Cytokines Dataset
Two documentation files:
1. Data Dictionary
2. Details of Data File Contents
  • Brief description of study design & sample
  • Assay descriptions
    • Detectable limits of assay
    • Useful for writing methods sections
  • Variable list
  • Explanation of special missing values
LAB MEASURES: MrOS Cytokines Dataset
Three variables per cytokine:
• Numeric value (level of cytokine)
• Categorical variable (cytokine value is within, below, or above the detectable limits of the assay)
• Flag (identifies extrapolated values)
LAB MEASURES
• Suppose CRP levels are below the detectable limits of the assay.
• Suppose CRP levels are missing due to an inadequate specimen.
• In each case: CYCRP = ? CYCRPC = ?
• Can we include the subject in the calculation of means? Quartiles?
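In both scenarios the numeric field is empty, so only the category variable tells them apart. The Python sketch below keeps that distinction visible. It reports how many subjects fall in each category and computes a mean from in-range values only. The names CYCRP and CYCRPC come from these slides. The category codes and the records are hypothetical, not the actual MrOS coding. Whether below-limit subjects should instead be imputed or assigned to the lowest quartile is an analysis decision this sketch does not make.

```python
from collections import Counter
from statistics import mean

# Hypothetical category codes; they are NOT the actual MrOS codes.
WITHIN, BELOW, ABOVE, NO_RESULT = "within", "below", "above", "no result"

crp = [
    {"CYCRP": 1.2, "CYCRPC": WITHIN},
    {"CYCRP": None, "CYCRPC": BELOW},      # below the detectable limit
    {"CYCRP": 3.8, "CYCRPC": WITHIN},
    {"CYCRP": None, "CYCRPC": NO_RESULT},  # inadequate specimen
    {"CYCRP": 2.0, "CYCRPC": WITHIN},
]

def describe(records):
    """Count subjects by category and average only the in-range values."""
    counts = Counter(r["CYCRPC"] for r in records)
    in_range = [r["CYCRP"] for r in records if r["CYCRPC"] == WITHIN]
    return {
        "n": len(records),
        "by_category": dict(counts),
        "mean_in_range": mean(in_range) if in_range else None,
    }

print(describe(crp))
```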
LAB MEASURES: MrOS Cytokines Dataset
Variable name = prefix + cytokine + suffix
• Prefix: CY = Cytokines
• Cytokine: e.g. IL6SR, TNF
• Suffix: C = Category, E = Extrapolation (no suffix = numeric value)
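Because the names follow a rule, they can be generated rather than typed by hand. This minimal Python sketch assumes the three suffixes listed here: none for the numeric value, C for the category, and E for the extrapolation flag. It returns each name with a short description, which can also serve as the variable list for the data dictionary.

```python
PREFIX = "CY"  # cytokines
# Suffix -> what the variable holds; "" (no suffix) is the numeric value.
SUFFIXES = {"": "numeric value", "C": "category", "E": "extrapolation flag"}

def cytokine_variables(cytokine):
    """Return {variable name: description} for one cytokine."""
    return {PREFIX + cytokine + suffix: f"{cytokine} {label}"
            for suffix, label in SUFFIXES.items()}

for cytokine in ["IL6SR", "TNF"]:
    for name, label in cytokine_variables(cytokine).items():
        print(f"{name:<10}{label}")
```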
QUESTIONNAIRES: Naming Conventions
Variable names should be…
• Unique
• Consistent
• Descriptive
• …yet brief
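Some of these properties can be checked automatically. The Python sketch below flags three things: duplicate names, compared case-insensitively because some packages (SAS, for example) ignore case; characters outside a conservative pattern (a letter followed by letters, digits, or underscores); and names longer than a length limit. The 16-character default is an arbitrary assumption, so set it to your software's limit or your team's convention. Whether a name is descriptive and consistent still needs human review.

```python
import re

MAX_LENGTH = 16  # assumed limit; use your software's limit or your team's convention
NAME_PATTERN = re.compile(r"[A-Za-z][A-Za-z0-9_]*")  # conservative: letter, then letters/digits/_

def check_names(names, max_length=MAX_LENGTH):
    """Return a list of problems: duplicates, invalid characters, overly long names."""
    problems = []
    seen = set()
    for name in names:
        key = name.lower()  # compare case-insensitively, as some packages do
        if key in seen:
            problems.append(f"{name}: duplicate")
        seen.add(key)
        if not NAME_PATTERN.fullmatch(name):
            problems.append(f"{name}: invalid characters")
        if len(name) > max_length:
            problems.append(f"{name}: longer than {max_length} characters")
    return problems

print(check_names(["qlcdt", "qlcrun", "QLCRUN", "qlc_child_running_difficulty"]))
# ['QLCRUN: duplicate', 'qlc_child_running_difficulty: longer than 16 characters']
```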
QUESTIONNAIRES: Naming Conventions
Descriptive names vs. question numbers:
• Question numbers may change with updated versions
• Question numbers may be OK for standardized measures
• Descriptive names may become too long
QUESTIONNAIRES: Naming Conventions
• Prefixes identify the questionnaire or measure
  • qlc = PedsQL, Child Report
• Suffixes are abbreviations for commonly used data types
  • dt = Date, mm/dd/yyyy
Example:
• VARIABLE NAME: qlcdt (prefix qlc + suffix dt)
• VARIABLE DESCRIPTION: Date of PedsQL Child Interview, mm/dd/yyyy
QUESTIONNAIRES: Naming Conventions
Example (descriptive name):
• qlcrun = prefix (qlc) + descriptive name (run)
QUESTIONNAIRES: Naming Conventions
Example (question number):
• qlca2 = prefix (qlc) + assigned section letter/number (a = Section A) + question # (2)
QUESTIONNAIRES: Naming Conventions
• Section A, Physical Functioning: abbreviation phy
• Section B, Emotional Functioning: abbreviation emo
QUESTIONNAIRES: Naming Conventions
Quick Notes
• Record naming conventions with your data dictionary
QUESTIONNAIRES: Naming Conventions
Use the rules to name variables!
Example:
• qlcphyrun = prefix (qlc) + section abbreviation (phy) + descriptive item name (run)
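The naming rules can be written as code so that every name is built the same way. In this minimal Python sketch, the prefixes, section letters, and section abbreviations come from these slides, while the item name "sad" is hypothetical. An unknown prefix or section raises an error instead of silently producing a name that is not in the data dictionary.

```python
# Rules taken from the slides; the item name "sad" in the usage lines is hypothetical.
PREFIXES = {"qlc": "PedsQL, Child Report", "qlp": "PedsQL, Parent Report"}
SECTIONS = {"a": "phy", "b": "emo"}  # section letter -> section abbreviation

def _check(prefix, section):
    if prefix not in PREFIXES:
        raise ValueError(f"unknown prefix {prefix!r}; document it first")
    if section not in SECTIONS:
        raise ValueError(f"unknown section {section!r}; document it first")

def name_by_question(prefix, section, question):
    """prefix + assigned section letter + question number, e.g. qlca2."""
    _check(prefix, section)
    return f"{prefix}{section}{question}"

def name_by_description(prefix, section, item):
    """prefix + section abbreviation + descriptive item name."""
    _check(prefix, section)
    return f"{prefix}{SECTIONS[section]}{item}"

print(name_by_question("qlc", "a", 2))         # qlca2
print(name_by_description("qlp", "b", "sad"))  # qlpemosad
```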
QUESTIONNAIRES: Data Dictionary
QUESTIONNAIRES: Annotation
QUESTIONNAIRES: Naming Conventions
Don’t forget!
• Multiple versions of a questionnaire may be available; record which version was used (e.g. qlcvernum, qlcverage)
• We already incorporated the respondent into the questionnaire prefix:
  • qlc for child report
  • qlp for parent report
QUESTIONNAIRES
Form header: Subject ID: __ __ __ __   Visit #: ___   Date (mm/dd/yyyy): __ __/__ __/__ __ __ __
Quick Notes
• Label every page
• Label every form
• Label every sample
• Include information that uniquely identifies each subject at each encounter:
  • Subject ID
  • Date
  • Visit # (if >1 visit)
• DO NOT include identifiable information
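Labels can be checked after the forms are entered. This minimal Python sketch uses hypothetical field names (form, subject_id, visit, date) and made-up records. It reports forms missing any label, dates that are not valid mm/dd/yyyy, and the same form entered twice for one subject, visit, and date. Forms from a single encounter (for example qlc and qlp) legitimately share a subject, visit, and date, so the form itself is part of the key.

```python
from collections import Counter
from datetime import datetime

REQUIRED = ("form", "subject_id", "visit", "date")  # hypothetical field names

def is_valid_date(text):
    """True if text is a real calendar date written as mm/dd/yyyy."""
    try:
        datetime.strptime(text, "%m/%d/%Y")
        return True
    except ValueError:
        return False

def check_form_labels(forms):
    """Flag forms with missing labels, malformed dates, or duplicate entries."""
    problems = []
    for i, f in enumerate(forms):
        missing = [k for k in REQUIRED if f.get(k) in (None, "")]
        if missing:
            problems.append(f"record {i}: missing {', '.join(missing)}")
        elif not is_valid_date(f["date"]):
            problems.append(f"record {i}: date {f['date']!r} is not mm/dd/yyyy")
    counts = Counter(tuple(f.get(k) for k in REQUIRED) for f in forms)
    problems += [f"duplicate entry: {key}" for key, n in counts.items() if n > 1]
    return problems

forms = [  # made-up example records
    {"form": "qlc", "subject_id": "0001", "visit": 1, "date": "03/05/2012"},
    {"form": "qlp", "subject_id": "0001", "visit": 1, "date": "03/05/2012"},
    {"form": "qlc", "subject_id": "0001", "visit": 1, "date": "03/05/2012"},
    {"form": "qlc", "subject_id": "", "visit": 2, "date": "09/04/2012"},
    {"form": "qlc", "subject_id": "0002", "visit": 1, "date": "13/01/2012"},
]
for problem in check_form_labels(forms):
    print(problem)
```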
QUESTIONNAIRES: Additional Considerations
• Data Collection
  • Administration guidelines
  • Interviewer- or self-administered
• Data Entry
  • Data entry forms (Excel vs. Access)
  • Double entry
• Data Manipulation
  • Reverse scoring
  • Summary scores
  • Missing data
QUESTIONNAIRES: Data Entry
Some Basic Guidelines
Be straightforward
• Enter data as it appears on the form in front of you
Make it easy
• Create a step-by-step data entry protocol
• Make your data entry form look like your paper form
Save scoring for later
• Reverse coding can be automated in the data clean-up phase
• Use automated scoring algorithms
• Never recode the original variables! Create new ones.
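Following "never recode the original variables," reverse coding can write new variables alongside the entered ones. This Python sketch rests on several assumptions: a 0-4 response scale, a hypothetical "r" suffix for reverse-coded items, hypothetical item names, and a rule that a summary score is computed only when at least half of the items are answered. Check your instrument's scoring manual for its actual range, its reverse-scored items, and its missing-data rule.

```python
# Assumptions: a 0-4 response scale, a hypothetical "r" suffix for reverse-coded
# items, and hypothetical item names. Check your instrument's scoring manual.
SCALE_MIN, SCALE_MAX = 0, 4

def reverse_code(record, items, suffix="r"):
    """Return a copy of the record with reverse-coded items added as NEW variables."""
    scored = dict(record)  # the entered values are never overwritten
    for item in items:
        value = record.get(item)
        scored[item + suffix] = None if value is None else SCALE_MIN + SCALE_MAX - value
    return scored

def summary_score(record, items, min_fraction=0.5):
    """Mean of the answered items, or None if fewer than min_fraction were answered."""
    answered = [record[item] for item in items if record.get(item) is not None]
    if not answered or len(answered) < min_fraction * len(items):
        return None
    return sum(answered) / len(answered)

entered = {"qlcphyrun": 1, "qlcphywalk": 0, "qlcphylift": None}
scored = reverse_code(entered, ["qlcphyrun", "qlcphywalk", "qlcphylift"])
print(scored)
print(summary_score(scored, ["qlcphyrunr", "qlcphywalkr", "qlcphyliftr"]))  # 3.5
```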
QUESTIONNAIRES: Data Entry
Quality Control Basics
• Double entry
• Acceptable value ranges, enforced at the time of entry
• Data checks after data entry
  • Review ranges of data for outliers & impossible values
  • Document how to treat outliers
    • Do you create an indicator variable?
    • Do you set the value to missing for analysis?
  • Logical checks (e.g. a height of 5’6” for a 5-year-old?!)
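Here is a minimal Python sketch of the post-entry checks. The variable names, ranges, and the logical rule (heights over 55 inches flagged for children under 6) are all hypothetical. The sketch compares two independent entry passes field by field, then reports out-of-range and implausible values. It leaves the entered values unchanged, so the decision to create an indicator variable or set a value to missing can be made and documented separately.

```python
# Hypothetical acceptable ranges (inclusive); define the real ones in your data dictionary.
RANGES = {"age_yr": (2, 18), "height_in": (30, 80)}

def compare_double_entry(first, second):
    """Return (subject, variable, entry 1, entry 2) wherever the two passes disagree."""
    diffs = []
    for subject in sorted(set(first) | set(second)):
        rec1, rec2 = first.get(subject, {}), second.get(subject, {})
        for var in sorted(set(rec1) | set(rec2)):
            if rec1.get(var) != rec2.get(var):
                diffs.append((subject, var, rec1.get(var), rec2.get(var)))
    return diffs

def check_record(record):
    """Return messages for out-of-range values and failed logical checks."""
    messages = []
    for var, (low, high) in RANGES.items():
        value = record.get(var)
        if value is not None and not low <= value <= high:
            messages.append(f"{var}={value} outside {low}-{high}")
    # Hypothetical logical check: a child under 6 taller than 55 inches is implausible.
    age, height = record.get("age_yr"), record.get("height_in")
    if age is not None and height is not None and age < 6 and height > 55:
        messages.append(f"height_in={height} implausible for age {age}")
    return messages

entry1 = {"001": {"age_yr": 5, "height_in": 66}, "002": {"age_yr": 12, "height_in": 58}}
entry2 = {"001": {"age_yr": 5, "height_in": 66}, "002": {"age_yr": 12, "height_in": 85}}
print(compare_double_entry(entry1, entry2))  # [('002', 'height_in', 58, 85)]
for subject, record in entry2.items():
    print(subject, check_record(record))
# 001 ['height_in=66 implausible for age 5']
# 002 ['height_in=85 outside 30-80']
```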
DATA SHARING
General Guidelines for Sharing Data
• Do you have IRB approval to share data? (If not, get it!)
• Share the data & password only with individuals who need to work directly with the data file
• Encrypt emailed data files
  • Share the password by phone or in person
  • Do NOT email the password
• REMOVE identifiable information
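Scripting the removal of identifiable information means it is applied the same way every time data are shared. In this minimal Python sketch the list of identifying fields is hypothetical. What counts as identifiable for your study is set by your IRB and applicable regulations, so treat any fixed list as a starting point. Encryption and password handling are not shown.

```python
# Hypothetical identifying fields; build the real list from your data dictionary and IRB guidance.
IDENTIFIERS = {"name", "address", "phone", "email", "birth_date", "mrn"}

def deidentify(records, identifiers=IDENTIFIERS):
    """Return new records with identifying fields dropped; the originals are untouched."""
    return [{field: value for field, value in rec.items() if field.lower() not in identifiers}
            for rec in records]

def remaining_identifiers(records, identifiers=IDENTIFIERS):
    """Sanity check before sharing: list any identifying field still present."""
    return sorted({field for rec in records for field in rec if field.lower() in identifiers})

raw = [{"subject_id": "0001", "name": "Jane Doe", "mrn": "123456", "qlcphyrun": 1}]
shared = deidentify(raw)
print(shared)                         # [{'subject_id': '0001', 'qlcphyrun': 1}]
print(remaining_identifiers(shared))  # []
```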