Educational Research 101: How to Manage Your Data and Prepare for the Statistical Consultation. Francis S. Nuthalapaty, MD; H. Lee Higdon III, PhD. 2009 APGO Faculty Development Seminar.
Case Study: The wrong way • Statistician was consulted after the data had been collected. • Study question was not clearly defined. • Variables were not defined. • Data dictionary was not developed. • Data were not cleaned/validated. • Result: a statistician who is asked to perform a miracle!
Case Study: Lesson Arrangements to consult with a statistician should be made before you start enrolling and collecting data on patients! In fact, they should be made before protocol development to prevent issues downstream.
Learning Objectives • Describe the continuum of data management • List data collection instruments / approaches • Understand how to create a data dictionary • Describe methods to validate data • Describe various data analytic tools • Describe how to decide on statistical approaches
Question Where does data management fit into the research process?
The Research Process • Question • Literature search • Objective / Hypothesis • Study design • IRB • Study conduct • Data analysis • Dissemination of results
Data Management Pearl “No study is better than the quality of its data” - Friedman “…get it right the first time” - Crerand
Steps in Data Management • Definition • Acquisition • Data Entry • Validation • Analysis
Data Definitions • Identifying your data • Identifying your data types • Naming your data variables • Creating a data dictionary
Data Types • Types of variables: • Qualitative: nominal, ordinal • Quantitative: interval, ratio
Data Variable Names • Make the name descriptive (easier to remember) • Keep it short (fewer than 10 characters) • Use lower case • Avoid spaces; use an underscore (“_”) instead • Use numbers to indicate sequences
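The naming rules above can be checked mechanically before a database is built. The sketch below is one possible reading of the slide, not a standard: the function name `check_variable_name`, the example names, and the extra requirement that a name start with a letter are all assumptions made for illustration.

```python
import re

# Assumed pattern: a lower-case letter first, then lower-case letters,
# digits, or underscores. The "starts with a letter" part is an added
# assumption, not a rule from the slide.
NAME_PATTERN = re.compile(r"^[a-z][a-z0-9_]*$")
MAX_LENGTH = 9  # the slide asks for fewer than 10 characters


def check_variable_name(name):
    """Return a list of rule violations for a proposed variable name."""
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append("too long")
    if " " in name:
        problems.append("contains a space")
    if name != name.lower():
        problems.append("contains upper case")
    # Only report the catch-all if no specific rule already fired.
    if not problems and not NAME_PATTERN.match(name):
        problems.append("unexpected characters")
    return problems


# Hypothetical names: two sequenced measurements and one poor choice.
for candidate in ["bp_sys1", "bp_sys2", "Systolic BP visit 1"]:
    print(candidate, check_variable_name(candidate))
```

An empty list means the name passes every check; the numbered suffixes show the "use numbers to indicate sequences" rule.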
Data Variable Formats • Variable formats: • Numeric • String
Data Variable Values • Possible responses for a variable • Numeric format: • 0 = no / 1 = yes • String format: • a = no / b = yes
Note on Missing Values • What about variables with no response? • Leave it blank • Assign a period “.” • Assign a value (usually out of the expected response range) • Avoid text
Data Dictionaries / Code Books • Brings together all data elements: • Data types / formats • Variable names • Expected response values (range) • Comments • Self-generated vs. computer generated • “Rosetta Stone” for the database
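A data dictionary can be as simple as a table with one row per variable. The sketch below writes one as a Python dictionary and uses it to screen entered values. The variables (`age`, `smoker`, `pgy`), their ranges, and the missing-value code `-9` are hypothetical examples, not taken from the seminar.

```python
# Assumed missing-value code, chosen to fall outside every expected range.
MISSING = -9

# Hypothetical data dictionary: type, format, expected responses, comment.
DATA_DICTIONARY = {
    "age": {
        "type": "ratio",
        "format": "numeric",
        "range": (18, 50),
        "comment": "Age in years at enrollment",
    },
    "smoker": {
        "type": "nominal",
        "format": "numeric",
        "values": {0: "no", 1: "yes"},
        "comment": "Current smoker, self-reported",
    },
    "pgy": {
        "type": "ordinal",
        "format": "numeric",
        "values": {1: "PGY-1", 2: "PGY-2", 3: "PGY-3", 4: "PGY-4"},
        "comment": "Postgraduate training year",
    },
}


def is_valid(variable, value):
    """True if value is the missing code or an expected response."""
    entry = DATA_DICTIONARY[variable]
    if value == MISSING:
        return True
    if "values" in entry:  # coded (categorical) variable
        return value in entry["values"]
    low, high = entry["range"]  # numeric variable with an expected range
    return low <= value <= high


print(is_valid("age", 30), is_valid("age", 300), is_valid("smoker", MISSING))
```

Keeping the expected responses next to the variable names is what lets the dictionary act as the "Rosetta Stone": the same table documents the database and drives the checks.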
Data Acquisition Pick the best method for the environment
Data Acquisition Methods • Interviews • Questionnaires • Assessments • MCQ examinations • OSCE / OSAT • Laboratory studies
Data Acquisition Environments • Observational encounters • Structured research encounters • Self-report
Data Acquisition Problems • Major types of data issues: • Missing data • Incorrect data • Excess variability
Data Acquisition Problems • Reasons for poor data quality: • Researcher-dependent data: • Insufficient time • Inadequate training • Lack of focus on study tasks • Poor communication • Protocol deviation
Data Acquisition Problems • Reasons for poor data quality: • Subject-dependent data: • Inadequate instruction • Poor comprehension • Sensitive or stigmatized behaviors
Data Acquisition Options • Paper forms • Direct entry • Computer assisted data acquisition
Data Acquisition: Paper Forms • Advantages: • Controlled distribution and return • Comments • Double data entry • Disadvantages: • Anonymity • Manual quality checks • Data entry time / errors
Data Acquisition: Direct Entry • Options: • MS Excel, MS Access • Epi Info – free on the web • Direct entry into statistical software • Pros / Cons: • No data transcription • Errors
Data Acquisition • Computer assisted data acquisition: • Automated data collection • OCR forms • Computer-based case report forms / questionnaires • Computer-assisted self-interviews • Mobile computing device diaries
Data Acquisition: CASI • Special Focus: Health Behaviors • Factors which may affect reporting: • Sensitive or stigmatized behaviors • Age discrepancy between participant and interviewer • Lack of privacy • Lack of comprehension of self-administered questionnaires
Data Acquisition: CASI • Computer-assisted self-interview (CASI): • Computer-based interview • Can incorporate audio, video, and text • Respondent listens to or reads questions on screen • Submits answers through keypad or touch screen
Data Acquisition: CASI • Benefits of CASI: • Interview conducted in privacy • Standardized interview • Computer controlled branching • Automated consistency and range checking • Multilingual administration
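Computer-controlled branching and automated range checking can be illustrated with a toy questionnaire. This is a sketch only: the three questions, their ranges, and the skip rule are invented for illustration, and the interview is driven by scripted answers rather than a real keypad or touch screen.

```python
# Hypothetical questionnaire: each question carries its expected range.
QUESTIONS = {
    "smokes": {"text": "Do you currently smoke? (0 = no, 1 = yes)", "range": (0, 1)},
    "cigs_day": {"text": "How many cigarettes per day?", "range": (1, 100)},
    "age": {"text": "What is your age in years?", "range": (18, 99)},
}


def next_question(current, answer):
    """Branching rule: non-smokers skip the cigarette count."""
    if current == "smokes":
        return "cigs_day" if answer == 1 else "age"
    if current == "cigs_day":
        return "age"
    return None  # end of interview


def in_range(question, answer):
    """Range check against the question's expected responses."""
    low, high = QUESTIONS[question]["range"]
    return low <= answer <= high


def run_interview(scripted_answers):
    """Walk the questionnaire using pre-supplied answers instead of live input."""
    responses = {}
    question = "smokes"
    while question is not None:
        answer = scripted_answers[question]
        if not in_range(question, answer):
            raise ValueError(f"{question}: {answer} is outside the expected range")
        responses[question] = answer
        question = next_question(question, answer)
    return responses


print(run_interview({"smokes": 0, "age": 34}))
print(run_interview({"smokes": 1, "cigs_day": 10, "age": 34}))
```

A real CASI system would re-prompt the respondent instead of raising an error; the exception here only makes the range check visible.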
Steps in Data Management • Definition • Acquisition • Data Entry • Validation • Analysis
Data Validation • Are all of the data present? • Are the responses within the expected range? • Do the data make sense?
Data Validation • Are all of the data present? • Visually examine the data cells • Frequencies
Data Validation • Are the responses within the expected range? • Frequencies • Maximum / minimum values • Descriptive statistics • Means • Standard deviations
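The checks listed above (missing values, frequencies, minimum / maximum, means, standard deviations) can be sketched with the Python standard library. The BMI and smoking values below are made-up illustrations, and the expected range is an assumption that would normally come from the data dictionary.

```python
from collections import Counter
from statistics import mean, stdev

# Hypothetical entered values; None stands for a blank cell.
bmi = [22.4, 27.1, None, 31.0, 24.8, 248.0, 29.3]
EXPECTED_RANGE = (15.0, 60.0)  # assumed plausible BMI range in kg/m^2


def summarize(values, expected_range):
    """Count missing cells and compute the descriptive checks from the slide."""
    present = [v for v in values if v is not None]
    low, high = expected_range
    return {
        "n_missing": len(values) - len(present),
        "minimum": min(present),
        "maximum": max(present),
        "mean": mean(present),
        "sd": stdev(present),
        "out_of_range": [v for v in present if not low <= v <= high],
    }


report = summarize(bmi, EXPECTED_RANGE)
print(report)

# Frequencies for a coded variable expose unexpected codes (here, a 3).
smoker = [0, 1, 0, 0, 3, 1, None]
frequencies = Counter(smoker)
print(frequencies)
```

An implausible maximum or an inflated standard deviation is often the first sign of a keying error such as a misplaced decimal point.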
Once an outlier is found, the chart can be consulted for clarification.
Data Distribution Definitions by SPSS 16.0
Who is Represented in the Data? • Sample test of proportions • Percent of gender • Percent of ethnicity • Sample test of means • Age • BMI • Does our data reflect the population at large or a subset?
Who is not? • Compare data of the included and excluded individuals • Are they similar for: • Age (continuous – Student t test) • BMI (continuous – Student t test) • Ethnicity (discrete/categorical – Chi-square test) • Gender (discrete/categorical – Chi-square test)
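The two comparisons named above can be sketched without a statistics package. The code computes a pooled-variance Student t statistic for a continuous variable and a Pearson chi-square statistic for a 2x2 table. The ages and counts are invented, the chi-square uses no continuity correction, and no p-value is computed for the t statistic; in practice a statistical package would handle both tests.

```python
from math import erfc, sqrt
from statistics import mean, variance


def student_t(sample_a, sample_b):
    """Pooled-variance (Student) t statistic and its degrees of freedom."""
    n_a, n_b = len(sample_a), len(sample_b)
    df = n_a + n_b - 2
    pooled = ((n_a - 1) * variance(sample_a) + (n_b - 1) * variance(sample_b)) / df
    t = (mean(sample_a) - mean(sample_b)) / sqrt(pooled * (1 / n_a + 1 / n_b))
    return t, df


def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = erfc(sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical ages (years) of included and excluded individuals.
age_included = [24, 31, 28, 35, 29, 33]
age_excluded = [26, 30, 27, 36, 32, 25]
print(student_t(age_included, age_excluded))

# Hypothetical counts: rows = included / excluded, columns = female / male.
gender_table = [[30, 20], [12, 18]]
print(chi_square_2x2(gender_table))
```

A small t statistic or a large chi-square p-value suggests the excluded individuals resemble those included on that variable, though small samples limit what such a comparison can show.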
Steps in Data Management • Definition • Acquisition • Data Entry • Validation • Analysis
Data Analysis • Choose the right tool for the job • Commonly used statistical tests: • If the data are normally distributed (i.e., follow a bell-shaped curve), then we use parametric statistical tests. • If the data (1) are not “bell-shaped”, (2) come from small samples (generally fewer than 30 per group), or (3) contain outliers, then we use nonparametric statistical tests.
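The decision rule above can be written as a small helper. This is a sketch of the slide's three criteria only: the function names are hypothetical, the 1.5 x IQR outlier rule is an assumed rule of thumb that the slide does not specify, and the judgment about whether the data look bell-shaped is left to the analyst (passed in as `looks_normal`).

```python
from statistics import quantiles

MIN_GROUP_SIZE = 30  # from the slide: fewer than 30 per group counts as small


def has_outliers(values):
    """Flag values beyond 1.5 x IQR from the quartiles (assumed rule of thumb)."""
    q1, _, q3 = quantiles(values, n=4)
    spread = 1.5 * (q3 - q1)
    return any(v < q1 - spread or v > q3 + spread for v in values)


def suggest_test_family(groups, looks_normal):
    """Return 'parametric' or 'nonparametric' following the three criteria."""
    if not looks_normal:
        return "nonparametric"
    if any(len(group) < MIN_GROUP_SIZE for group in groups):
        return "nonparametric"
    if any(has_outliers(group) for group in groups):
        return "nonparametric"
    return "parametric"


# Hypothetical groups: two large groups, then one with an extreme value.
group_a = list(range(40))
group_b = list(range(5, 45))
print(suggest_test_family([group_a, group_b], looks_normal=True))
print(suggest_test_family([group_a, group_b + [1000]], looks_normal=True))
print(suggest_test_family([group_a[:10], group_b], looks_normal=True))
```

The helper only suggests a family of tests; the specific test still depends on the study design and is worth settling in the statistical consultation.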