
Preparing Data for Analysis: Part II There’s More to do BEFORE you Analyze Data

Preparing Data for Analysis: Part II There’s More to do BEFORE you Analyze Data. MCRC Biostatistics Didactic Workshop Yona Keich Cloonan, PhD Senior Statistician Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases Epidemiology Data Center,


Presentation Transcript


  1. Preparing Data for Analysis: Part II
     There’s More to do BEFORE you Analyze Data
     MCRC Biostatistics Didactic Workshop
     Yona Keich Cloonan, PhD, Senior Statistician
     Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases
     Epidemiology Data Center, Department of Epidemiology, Graduate School of Public Health

  2. OVERVIEW
     We previously discussed…
     • Data Storage
     • File Formats
     • Data Dictionaries
     • Special Missing Values
     These are all aspects of Data Management

  3. OVERVIEW
     Today we will discuss additional Data Management Strategies
     • Detailed Documentation
     • Variable Naming
     • Data Entry
     • Applied to…
       • Laboratory Measures
       • Questionnaires

  4. LAB MEASURES
     Review… Special Missing Value Codes Indicate WHY Items are Missing
     • For Laboratory Values
       • If below or above the limit of detection, provide a special missing value code
       • Avoid < or > signs (or other symbols) in numeric fields
         • If symbols are used, the item will be imported as text rather than as a number
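As a minimal sketch of the point above, the following Python function splits a raw lab entry such as "<0.5" into a numeric value and a separate category code, so that the numeric field never contains symbols. The function name parse_lab_value and the category labels are hypothetical; they are not taken from the workshop or from the MrOS dataset.

```python
from typing import Optional, Tuple

# Hypothetical category labels; a real study would define its own codes
# in the data dictionary.
WITHIN, BELOW, ABOVE, MISSING = "within", "below", "above", "missing"


def parse_lab_value(raw: str) -> Tuple[Optional[float], str]:
    """Split a raw lab entry into (numeric value, category code)."""
    text = raw.strip()
    if not text:
        return None, MISSING      # nothing recorded
    if text.startswith("<"):
        return None, BELOW        # below the limit of detection
    if text.startswith(">"):
        return None, ABOVE        # above the limit of detection
    return float(text), WITHIN    # plain number, stored as a number
```

Keeping the reason in its own field means the numeric column can be imported as a number, while the category column still records why a value is absent.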

  5. LAB MEASURES
     MrOS Cytokines Dataset
     • Two Documentation Files
       • 1. Data Dictionary
       • 2. Details of Data File Contents
         • Brief Description of Study Design & Sample
         • Assay Descriptions
           • Detectable limits of assay
           • Useful for writing methods sections
         • Variable List
         • Explanation of Special Missing Values

  6. LAB MEASURES
     MrOS Cytokines Dataset
     • Three Variables per Cytokine:
       • Numeric Value (level of cytokine)
       • Categorical Variable (cytokine value is within, below or above detectable limits of assay)
       • Flag (identifies extrapolated values)

  7. LAB MEASURES
     • Suppose CRP Levels are below detectable limits of assay
     • Suppose CRP Levels are missing due to inadequate specimen?
     • CYCRP = ?
     • CYCRPC = ?
     • Can we include subject in calculation of Means? Quartiles?
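One possible reading of the questions above can be sketched in Python. A subject below the detectable limit has no numeric value to average, but is known to rank below every measured value; a subject with an inadequate specimen gives no information at all. The sample records, the category labels, and the choice to rank below-limit subjects lowest are assumptions made for illustration. The slide poses the questions without stating the answers.

```python
from statistics import mean

# Hypothetical records: (subject ID, CYCRP numeric value, CYCRPC category).
records = [
    ("S1", 2.0, "within"),
    ("S2", None, "below"),    # below detectable limit of assay
    ("S3", 4.0, "within"),
    ("S4", None, "missing"),  # inadequate specimen
    ("S5", 6.0, "within"),
]


def mean_within_limits(rows):
    """Mean of the values actually measured within assay limits."""
    return mean(value for _, value, cat in rows if cat == "within")


def rank_for_quartiles(rows):
    """Subject IDs ordered lowest to highest; missing specimens excluded."""
    order = {"below": 0, "within": 1, "above": 2}
    usable = [r for r in rows if r[2] in order]
    usable.sort(key=lambda r: (order[r[2]], r[1] if r[1] is not None else 0.0))
    return [subject for subject, _, _ in usable]
```

Under these assumptions, S2 can be placed in the lowest group for quartiles but does not contribute to the mean, and S4 is left out of both.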

  8. LAB MEASURES
     MrOS Cytokines Dataset
     • Prefix: CY = Cytokines
     • Cytokine: IL6SR, TNF
     • Suffix: C = Category, E = Extrapolation

  9. QUESTIONNAIRES
     Naming Conventions
     • Variable names should be…
       • Unique
       • Consistent
       • Descriptive
       • …yet Brief

  10. QUESTIONNAIRES
      Naming Conventions
      • Descriptive Names vs Question Number
        • Question # may change with updated versions
        • Question # may be ok for standardized measures
        • Descriptive names may become too long

  11. QUESTIONNAIRES
      Naming Conventions
      • Prefixes
        • Identify questionnaire or measure
        • qlc = PedsQL, Child Report
      • Suffixes
        • Abbreviations for commonly used data types
        • dt = Date, mm/dd/yyyy
      Example
      • VARIABLE NAME: qlcdt (prefix qlc + suffix dt)
      • VARIABLE DESCRIPTION: Date of PedsQL Child Interview, mm/dd/yyyy

  12. QUESTIONNAIRES
      Naming Conventions
      Example (DESCRIPTIVE)
      • qlcrun = prefix (qlc) + descriptive name (run)

  13. QUESTIONNAIRES
      Naming Conventions
      Example (QUESTION #)
      • qlca2 = prefix (qlc) + assigned section letter/number (a) + question # (2)
      (Figure labels: Section A, Section B)

  14. QUESTIONNAIRES
      Naming Conventions
      • Physical Functioning (Section A)
        • abbreviation: phy
      • Emotional Functioning (Section B)
        • abbreviation: emo

  15. QUESTIONNAIRES Naming Conventions Quick Notes • Record naming conventions with your data dictionary

  16. QUESTIONNAIRES
      Naming Conventions
      Use the rules to name variables!
      Example
      • qlcpsyrun = prefix (qlc) + section abbreviation (psy) + descriptive item name (run)
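The naming rules above can be sketched as a small Python helper that assembles a variable name from its parts and checks that the resulting names stay unique and brief. The function names, the 12-character limit, and the lower-casing step are illustrative assumptions, not rules stated in the workshop.

```python
from collections import Counter

MAX_LENGTH = 12  # hypothetical limit used here to keep names brief


def make_name(prefix: str, item: str, section: str = "") -> str:
    """Build a variable name as prefix + section abbreviation + item."""
    name = f"{prefix}{section}{item}".lower()
    if len(name) > MAX_LENGTH:
        raise ValueError(f"{name!r} is longer than {MAX_LENGTH} characters")
    return name


def duplicate_names(names):
    """Return the names that occur more than once, sorted alphabetically."""
    counts = Counter(names)
    return sorted(name for name, n in counts.items() if n > 1)
```

For example, make_name("qlc", "run", "psy") reproduces the name qlcpsyrun from the slide, and make_name("qlc", "dt") reproduces qlcdt. Running duplicate_names over the full variable list before data entry begins is one way to confirm the "unique" requirement.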

  17. QUESTIONNAIRES Data Dictionary

  18. QUESTIONNAIRES Annotation

  19. QUESTIONNAIRES
      Naming Conventions
      Form header: Subject ID: __ __ __ __   Visit #: ___   Date (mm/dd/yyyy): __ __/__ __/__ __ __ __
      • Don’t forget!
        • Multiple versions may be available (qlcvernum, qlcverage)
      • We already incorporated respondent into the questionnaire prefix
        • qlc for child report
        • qlp for parent report

  20. QUESTIONNAIRES
      Form header: Subject ID: __ __ __ __   Visit #: ___   Date (mm/dd/yyyy): __ __/__ __/__ __ __ __
      Quick Notes
      • Label every page
      • Label every form
      • Label every sample
      • Include information which uniquely identifies each subject at each encounter
        • Subject ID
        • Date
        • Visit # (if >1 visit)
      • DO NOT include identifiable information
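One way to confirm that each record is uniquely identified at each encounter is to look for repeated subject/visit combinations after data entry. The sketch below assumes records are stored as dictionaries with the hypothetical field names subject_id and visit.

```python
from collections import Counter


def duplicate_encounters(rows):
    """Return (subject_id, visit) pairs that appear on more than one record."""
    counts = Counter((row["subject_id"], row["visit"]) for row in rows)
    return sorted(key for key, n in counts.items() if n > 1)


# Hypothetical entered forms; subject 0001 has visit 1 entered twice.
forms = [
    {"subject_id": "0001", "visit": 1},
    {"subject_id": "0001", "visit": 2},
    {"subject_id": "0002", "visit": 1},
    {"subject_id": "0001", "visit": 1},
]
```

An empty result means every form maps to exactly one subject at one visit; any pair returned needs to be checked against the paper forms.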

  21. QUESTIONNAIRES
      Additional Considerations
      • Data Collection
        • Administration Guidelines
        • Interviewer- or Self-Administered
      • Data Entry
        • Data Entry Forms (Excel vs Access)
        • Double Entry
      • Data Manipulation
        • Reverse Scoring
        • Summary Scores
        • Missing Data

  22. QUESTIONNAIRES
      Data Entry: Some Basic Guidelines
      Be Straightforward
      • Enter data as it appears on the form in front of you
      Make It Easy
      • Create a step-by-step data entry protocol
      • Make your data entry form look like your paper form
      Save Scoring for Later
      • Reverse Coding can be automated in the data clean-up phase
      • Use automated scoring algorithms
      • Never recode the original variables! Create new ones.
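The last guideline, creating new variables rather than recoding the originals, can be sketched in Python as follows. The 0 to 4 response scale, the _r suffix, and the function names are assumptions for illustration; a real scoring algorithm should follow the instrument's own scoring manual.

```python
def reverse_score(value, low=0, high=4):
    """Reverse a response on a low..high scale; None (missing) stays None."""
    if value is None:
        return None
    if not low <= value <= high:
        raise ValueError(f"{value} is outside the {low}-{high} scale")
    return low + high - value


def add_reversed(record, names, suffix="_r"):
    """Return a copy of record with new reverse-scored variables added."""
    scored = dict(record)  # copy, so the entered data is never altered
    for name in names:
        scored[name + suffix] = reverse_score(record[name])
    return scored
```

Calling add_reversed({"qlcrun": 1}, ["qlcrun"]) returns a record containing both qlcrun (still 1) and qlcrun_r (3), so the value entered from the paper form remains available for checking.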

  23. QUESTIONNAIRES
      Data Entry
      • Quality Control Basics
        • Double Entry
        • Acceptable Value Ranges - at time of entry
        • Data Checks - after time of data entry
          • Review ranges of data for outliers & impossible values
          • Document how to treat outliers
            • Do you create an indicator variable?
            • Do you set value to missing for analysis?
          • Logical Checks (e.g. height 5’6” for a 5-year-old?!)
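The three kinds of checks listed above (double entry, acceptable ranges, logical checks) can be sketched with a few small Python functions. The field names, the numeric limits, and the 140 cm threshold for children under 6 are hypothetical values chosen only to make the example concrete.

```python
# Hypothetical acceptable ranges for each field.
RANGES = {"age_years": (0, 120), "height_cm": (30, 250)}


def out_of_range(record, ranges=RANGES):
    """Fields whose values fall outside their acceptable range."""
    return [field for field, (low, high) in ranges.items()
            if field in record and not low <= record[field] <= high]


def entry_mismatches(first, second):
    """Fields where the first and second data entry disagree."""
    fields = first.keys() | second.keys()
    return sorted(f for f in fields if first.get(f) != second.get(f))


def implausible_height(record, max_cm_under_6=140):
    """Logical check: a child under 6 recorded as taller than the threshold."""
    return record["age_years"] < 6 and record["height_cm"] > max_cm_under_6


# A 5-year-old entered as 168 cm (about 5'6") on the first pass.
first_entry = {"age_years": 5, "height_cm": 168}
second_entry = {"age_years": 5, "height_cm": 108}
```

In this example the range check alone passes, because 168 cm is a valid height for some subjects. Only the double-entry comparison and the logical check catch the problem, which is why the slide lists them separately.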

  24. DATA SHARING
      • General Guidelines for Sharing Data
        • Do you have IRB approval to share data? (If not, get it!)
        • Share data & password only with individuals who need to work directly with the data file
        • Encrypt emailed data files
          • Share password by phone or in person
          • Do NOT email the password
        • REMOVE identifiable information
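The final guideline can be sketched as a step that drops identifying fields before a file is shared. The list of identifier field names below is a hypothetical example; which fields count as identifiable depends on the study and its IRB protocol.

```python
# Hypothetical identifier fields; the real list comes from the study protocol.
IDENTIFIERS = {"name", "date_of_birth", "address", "phone"}


def deidentify(record, identifiers=IDENTIFIERS):
    """Return a copy of record without the identifying fields."""
    return {key: value for key, value in record.items()
            if key not in identifiers}
```

For example, deidentify({"subject_id": "0001", "name": "Jane Doe", "qlcrun": 1}) keeps subject_id and qlcrun and drops name. The original record is left unchanged, so the identified copy stays with the study team.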
