
Preparing Data for Analysis What to do BEFORE you Analyze Data

Preparing Data for Analysis What to do BEFORE you Analyze Data. MCRC Biostatistics Didactic Workshop Yona Keich Cloonan, PhD Senior Statistician Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases Epidemiology Data Center, Department of Epidemiology



Presentation Transcript


  1. Preparing Data for Analysis: What to do BEFORE you Analyze Data MCRC Biostatistics Didactic Workshop Yona Keich Cloonan, PhD Senior Statistician Multidisciplinary Clinical Research Center for Rheumatic and Musculoskeletal Diseases Epidemiology Data Center, Department of Epidemiology Graduate School of Public Health

  2. OUTLINE • What Not to Do • Data Storage • Documentation • Sharing Data • Consultation Services

  3. What Not to Do

  4. DATA STORAGE Data is like Laundry. Separate It. • Identifiable Information • Study Key • Screening Database • Study Data

  5. DATA STORAGE Data is like Laundry. Separate It. Identifiable Information: Name, Address, Phone Number, Fax Number, Email Address, Social Security #, Health Plan #, Medical Record #, Account Number, License Number, Vehicle Identifier, Device Identifier, URL or IP Address, Biometric Identifier, Full Face Photos • Stored as separate password-protected file • Accessed & used by limited number of study personnel • Password-protected • For example: • Contact Information is kept in a stand-alone database • Name, Parent name(s), Address, Phone number, Email address, Pediatrician • Needed only for recruitment and scheduling • Accessed only by recruitment and scheduling staff • NOT needed by analysts; NOT accessed by analysts

  6. DATA STORAGE Data is like Laundry. Separate It. Study Key • Links Study IDs to identifiable data (e.g. medical record #, patient names) • Stored as a separate password-protected data file • Used by limited staff • Accessed on an as-needed basis • NOT accessed by analysts • The Study Key may be deleted after study completion • Permanently breaks link between identifiable information and study data • IRB protocol may specify timeline for destruction of study key • (e.g. 5 years after the final manuscript associated with the grant is completed)
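The separation of identifiable information, study key, and study data can be sketched in a few lines of Python. This is a minimal illustration only: the field names (`name`, `mrn`, `age`, `crp_mg_l`), the `S0001` ID format, and the function `separate_identifiers` are assumptions made for the example and do not come from the workshop materials.

```python
def separate_identifiers(records, identifiable_fields):
    """Split raw records into a study key and de-identified study data."""
    study_key = []   # links Study ID to identifiable fields; store separately
    study_data = []  # Study ID plus non-identifiable fields only
    for n, record in enumerate(records, start=1):
        study_id = f"S{n:04d}"  # hypothetical ID format
        key_row = {"study_id": study_id}
        data_row = {"study_id": study_id}
        for field, value in record.items():
            target = key_row if field in identifiable_fields else data_row
            target[field] = value
        study_key.append(key_row)
        study_data.append(data_row)
    return study_key, study_data


# Made-up records for illustration
raw = [
    {"name": "Example Person A", "mrn": "000001", "age": 54, "crp_mg_l": 3.2},
    {"name": "Example Person B", "mrn": "000002", "age": 61, "crp_mg_l": 7.9},
]
study_key, study_data = separate_identifiers(raw, {"name", "mrn"})
```

In this sketch `study_key` and `study_data` would be written to separate password-protected files, and only `study_data` would go to analysts. Deleting the file that holds `study_key` is what breaks the link.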

  7. DATA STORAGE Data is like Laundry. Separate It. Screening Database • Stored as separate password-protected data file or database • Individuals are tracked using unique (non-identifiable) Screening IDs • Store information on eligibility • Track reasons for non-participation • Compare demographic characteristics of participants and non-participants • Keep track of your screening process to calculate participation rates. • This is often forgotten & creates a huge headache • Participation rates are needed for manuscripts (and grants)

  8. DATA STORAGE Data is like Laundry. Separate It. Screening Database • Why Bother? • Estimate Staffing Requirements • How many contact attempts are required to reach people? • How many people must you approach in order to enroll a study participant? • For instance, • For every 10 people you approach, you may screen only 5 • For every 3 people you screen, you may enroll 1 • Track & Improve Recruitment • Are any sites or staff particularly successful at recruiting? • Are any recruitment sites better able to reach specific demographics? • Are they using recruitment methods that other sites could implement? • Will you choose to drop certain sites that are less successful?
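The staffing arithmetic in the example above can be worked through as follows. The ratios (5 screened per 10 approached, 1 enrolled per 3 screened) come from the slide; the enrollment target of 50 and the function name `approaches_needed` are assumptions for illustration.

```python
import math
from fractions import Fraction


def approaches_needed(target_enrolled, screen_rate, enroll_rate):
    """Work backwards from an enrollment target to screening and approach counts."""
    screened = math.ceil(target_enrolled / enroll_rate)
    approached = math.ceil(screened / screen_rate)
    return screened, approached


# 5 of every 10 approached are screened; 1 of every 3 screened is enrolled
screened, approached = approaches_needed(50, Fraction(5, 10), Fraction(1, 3))
print(screened, approached)
```

Under these ratios, each enrolled participant costs about 6 approaches, which is the kind of figure a screening database lets you estimate from your own data.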

  9. DATA STORAGE Data is like Laundry. Separate It. Screening Database • Individuals are assigned Screening IDs • A Screening ID is NOT a Study ID • Screening ID: assigned to all individuals who are screened • Study ID: assigned only to individuals who are enrolled • In the slide's example, 8 individuals were screened • Of these, 6 individuals were enrolled & assigned a Study ID • -888 is used for the two individuals who were screened but NOT enrolled • -888 is a special missing value code defined as ‘n/a’
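A small sketch of the ID scheme described above: every screened individual gets a Screening ID, only enrolled individuals get a Study ID, and -888 marks 'n/a'. The counts (8 screened, 6 enrolled) and the -888 code come from the slide. Which two individuals were not enrolled, the integer ID format, and the function name `assign_ids` are assumptions.

```python
NOT_APPLICABLE = -888  # special missing value code meaning 'n/a'


def assign_ids(enrolled_flags):
    """Give every screened person a Screening ID; give a Study ID only if enrolled."""
    rows = []
    next_study_id = 1
    for screening_id, enrolled in enumerate(enrolled_flags, start=1):
        if enrolled:
            study_id = next_study_id
            next_study_id += 1
        else:
            study_id = NOT_APPLICABLE
        rows.append({"screening_id": screening_id, "study_id": study_id})
    return rows


# 8 screened, 6 enrolled; positions of the two non-enrolled are made up
flags = [True, True, False, True, True, True, False, True]
rows = assign_ids(flags)
for row in rows:
    print(row["screening_id"], row["study_id"])
```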

  10. DATA STORAGE Screening Database Flowchart: Approached → Contact Made / Unable To Reach • The exact definition of an ‘approach’ is study-specific. • Examples include: • Mailing study invitations until individual calls in • Mailing introductory study invitation, followed by phone call(s) • Approaching individuals at a specific clinic to ascertain interest & complete screening and consent processes

  11. DATA STORAGE Screening Database Flowchart: Approached → Contact Made / Unable To Reach • The max # of ‘approach attempts’ is also study-specific • For Example: • Non-response after 2 mailings and 3 follow-up phone messages defines ‘Unable to Reach’ • Due to limited resources, or • Due to concern that patients may feel harassed

  12. DATA STORAGE Screening Database Flowchart: Approached → Contact Made / Unable To Reach → Screened / Not Screened → Eligible / Ineligible → Enrolled / Not Enrolled • Basic Demographics: Case Status, Age, Sex, Recruitment Site • Expanded Demographics: Case Status, Age, Sex, Recruitment Site, Race, Ethnicity, Insurance Status

  13. DATA STORAGE Screening Database Flowchart: Approached → Contact Made / Unable To Reach → Screened / Not Screened → Eligible / Ineligible → Enrolled / Not Enrolled • Reason: • Not Interested • Time Constraints • Travel/Distance • Privacy/Confidentiality • Too Risky* • Unwilling to be Randomized* • Other • Refused • *if Clinical Trial
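If the screening database records the furthest stage each person reached, the flowchart counts and the reasons for non-participation can be tallied directly. The sketch below is illustrative: the status labels, the `reason` field, the sample log, and the definition of participation rate as enrolled divided by eligible are all assumptions, and a given study may define the rate differently.

```python
from collections import Counter


def summarize_screening(log):
    """Count how many approached individuals reached each flowchart stage."""
    final = Counter(row["status"] for row in log)
    approached = len(log)
    contact_made = approached - final["unable_to_reach"]
    screened = contact_made - final["not_screened"]
    eligible = screened - final["ineligible"]
    return {
        "approached": approached,
        "contact_made": contact_made,
        "screened": screened,
        "eligible": eligible,
        "enrolled": final["enrolled"],
    }


def reasons_for_nonparticipation(log):
    """Tally the recorded reasons, skipping rows without one."""
    return Counter(row["reason"] for row in log if row.get("reason"))


# Made-up screening log keyed by Screening ID
log = [
    {"screening_id": 1, "status": "enrolled"},
    {"screening_id": 2, "status": "unable_to_reach"},
    {"screening_id": 3, "status": "not_screened", "reason": "Not Interested"},
    {"screening_id": 4, "status": "ineligible"},
    {"screening_id": 5, "status": "not_enrolled", "reason": "Time Constraints"},
    {"screening_id": 6, "status": "enrolled"},
    {"screening_id": 7, "status": "not_screened", "reason": "Not Interested"},
    {"screening_id": 8, "status": "unable_to_reach"},
    {"screening_id": 9, "status": "not_screened", "reason": "Travel/Distance"},
    {"screening_id": 10, "status": "not_enrolled", "reason": "Privacy/Confidentiality"},
]
counts = summarize_screening(log)
reasons = reasons_for_nonparticipation(log)
participation_rate = counts["enrolled"] / counts["eligible"]  # assumed definition
```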

  14. DATA STORAGE Data is like Laundry. Separate It. Study Data • Questionnaire Items, Laboratory Values, Medical Abstraction Data • Stored as a separate password-protected database • Does NOT include identifiable information • Individuals are tracked by unique (non-identifiable) Study IDs

  15. DATA STORAGE Data File ‘Do’s • DO… • Include unique ID in every data file or spreadsheet (so you can link multiple files together) • Include variable names in the first row (i.e. header row) • Have 1 record per subject (for cross-sectional studies) • Have 1 record per visit per subject (if multiple visits) • Use missing value codes • Create numeric variables to identify groups (1 = case; 0 = control) • Use as little text as possible • Substitute text with numeric value codes • Take the time to set up a database ahead of time • Provide variable descriptions (i.e. data dictionary) • Password-Protect your data files
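Several of these recommendations (a header row, a unique ID in every file, one record per visit per subject) are what make files linkable. A minimal sketch using Python's standard `csv` module follows. The file contents, the variable names, and the function names `read_rows` and `link_visits` are invented for illustration.

```python
import csv
import io

# One record per subject (hypothetical variables; 1 = case, 0 = control)
demographics_csv = """study_id,sex,case_status
1,0,1
2,1,0
"""

# One record per visit per subject
visits_csv = """study_id,visit,pain_score
1,1,6
1,2,4
2,1,3
"""


def read_rows(text):
    """Read CSV text whose first row holds the variable names."""
    return list(csv.DictReader(io.StringIO(text)))


def link_visits(demographics, visits):
    """Attach each subject's demographics to every one of their visit records."""
    by_id = {}
    for row in demographics:
        if row["study_id"] in by_id:
            raise ValueError(f"duplicate study_id {row['study_id']}")
        by_id[row["study_id"]] = row
    return [{**by_id[v["study_id"]], **v} for v in visits]


linked = link_visits(read_rows(demographics_csv), read_rows(visits_csv))
```

The duplicate check enforces the one-record-per-subject rule for the demographics file. Note that `csv.DictReader` returns every value as text, so numeric conversion is a separate step.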

  16. DATA STORAGE Data File Don’ts • DON’T… • Include symbols or spaces in variable names ( _ is ok) • Leave blank cells • Insert spaces between groups of subjects • Add sub-headers to identify groups of subjects • Color-code (e.g. pink = female; blue = male) • Display descriptive statistics within your data file
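Two of these rules, no symbols or spaces in variable names and no blank cells, are easy to check automatically before a file is handed to an analyst. The sketch below assumes the sheet has already been read into a header list and a list of rows. The function name `check_sheet`, the naming pattern, and the sample data are assumptions.

```python
import re

# Letters, digits, and underscores only, not starting with a digit (assumed rule)
VALID_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def check_sheet(header, rows):
    """Report variable names with symbols or spaces, and any blank cells."""
    problems = []
    for name in header:
        if not VALID_NAME.fullmatch(name):
            problems.append(f"bad variable name: {name!r}")
    # Data rows start at spreadsheet row 2, below the header row
    for row_no, row in enumerate(rows, start=2):
        for name, cell in zip(header, row):
            if str(cell).strip() == "":
                problems.append(f"blank cell: row {row_no}, column {name!r}")
    return problems


header = ["study_id", "sex", "pain score", "crp%"]
rows = [[1, 0, 3, 2.5], [2, 1, "", 4.1]]
problems = check_sheet(header, rows)
```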

  17. What Not to Do

  18. What Not to Do

  19. DOCUMENTATION Types of Documentation • Operations manuals • Protocols • Assignment of IDs • Questionnaire administration • Data collection procedures • Data entry guidelines • Recruitment scripts • Rules for file storage • Paper filing system • Server subdirectories • File naming conventions • Data dictionaries • Variable naming conventions • Scoring algorithms

  20. DOCUMENTATION Don’t wait until the end of the study! • Start BEFORE you collect data • Why bother? • To make sure… • you are gathering the information that you need • you do not have repeated items on different forms • question items are clear and easily administered • items are not missing from standardized measures • You’ll save time in the end

  21. DOCUMENTATION • Create a Data Dictionary • Assign Unique Variable Names • Assign numeric values to categorical data • Give description for each variable • Specify range of valid data values • Define missing value codes
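One way to make a data dictionary do double duty is to store it in a machine-readable form and validate records against it. The following is a sketch under assumptions: the variables, value codes, valid range, the -999 missing value code, and the function name `validate` are all hypothetical.

```python
# Hypothetical data dictionary: description, value codes or valid range,
# and missing value codes for each variable
DATA_DICTIONARY = {
    "sex": {
        "description": "Sex of participant",
        "codes": {0: "male", 1: "female"},
        "missing": {-999},
    },
    "age": {
        "description": "Age in years at enrollment",
        "range": (18, 90),
        "missing": {-999},
    },
}


def validate(record, dictionary):
    """Return a list of problems found when checking one record."""
    errors = []
    for name, value in record.items():
        spec = dictionary.get(name)
        if spec is None:
            errors.append(f"{name}: not in data dictionary")
        elif value in spec.get("missing", ()):
            continue  # a defined missing value code is always acceptable
        elif "codes" in spec and value not in spec["codes"]:
            errors.append(f"{name}: {value} is not a defined code")
        elif "range" in spec and not (spec["range"][0] <= value <= spec["range"][1]):
            errors.append(f"{name}: {value} is outside the valid range")
    return errors


ok = validate({"sex": 1, "age": 45}, DATA_DICTIONARY)
bad = validate({"sex": 3, "age": -999, "bmi": 22}, DATA_DICTIONARY)
```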

  22. DOCUMENTATION Use Missing Value Codes • Indicate why an item is missing • Help distinguish questions that were accidentally skipped from those that were not completed for a valid reason • Example of codes used for null data and laboratory data • About Laboratory Values • For laboratory results that are below or above the level of detection, provide a numeric code • Avoid < or > signs (or other symbols) in numeric fields • If symbols are used, the item will be imported as text rather than as a number
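The point about laboratory values can be illustrated with a small conversion routine that replaces entries such as `<0.5` with numeric codes so the column stays numeric. The specific codes (-777, -778, -999) and the function name `code_lab_value` are assumptions; the slide's own code table is not reproduced in this transcript. Negative codes only work for measurements that cannot themselves be negative.

```python
BELOW_DETECTION = -777  # hypothetical code: below the level of detection
ABOVE_DETECTION = -778  # hypothetical code: above the level of detection
MISSING = -999          # hypothetical code: value not recorded


def code_lab_value(raw):
    """Convert a raw lab entry to a number, using codes instead of symbols."""
    text = str(raw).strip()
    if text == "":
        return MISSING
    if text.startswith("<"):
        return BELOW_DETECTION
    if text.startswith(">"):
        return ABOVE_DETECTION
    return float(text)


raw_values = ["3.2", "<0.5", ">200", ""]
coded = [code_lab_value(v) for v in raw_values]
```

Any analysis would then need to exclude or otherwise handle the coded values before computing summaries, since they are not real measurements.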

  23. DATA SHARING • General Guidelines for Sharing Data • Do you have IRB approval to share data? (If not, get it!) • Share data & password only with individuals who need to work directly with the data file • Encrypt emailed data files • Share password by phone or in person • Do NOT email the password • REMOVE identifiable information

  24. CONSULTATION SERVICES • Consults Include… • Sample Size / Power Calculations • Data Management • Data Entry, Form Design, Database Development • Analysis • Analytic Plan, Statistical Analysis, Methods Section, Results Section

  25. CONSULTATION SERVICES • Consultation is… • Available to Faculty & Trainees • Provided by MCRC Methodology Core • Steven H Belle, PhD Director • Bob Boudreau, PhD Associate Director • Yona Keich Cloonan, PhD Senior Statistician • Sharon Lawlor, MBA Director of Data Management • Tamara Haller Data Manager To request services, please click on the link MCRC Consultation Request Form and follow the instructions. http://www.dom.pitt.edu/mcrc/request_form.aspx
