Data Analysis and Surveying 101: Basic research methods and biostatistics as they apply to the Theresa Jackson Hughes, MPH American College Health Association December 2006
What we will cover today • Research Methods • Sampling Frame and Sampling • Generalizability • Bias • Reliability and Validity • Levels of measurement • Biostatistics • Statistical significance • Other key terms • Appropriate statistical tests • Fun examples from the Spring 2005 dataset! Get excited! It’s data time!!!
“To do successful research, you don't need to know everything, you just need to know of one thing that isn't known.” • Arthur Schawlow • “That's the nature of research - you don't know what in hell you're doing.” • Harold "Doc" Edgerton • “If we knew what it was we were doing, it would not be called research, would it?” • Albert Einstein
What exactly is research? • “Scientific research is systematic, controlled, empirical, and critical investigation of natural phenomena guided by theory and hypotheses about the presumed relations among such phenomena.” • Kerlinger, 1986 • Research is an organized and systematic way of finding answers to questions
Important Components of Empirical Research • Problem statement, research questions, purposes, benefits • Theory, assumptions, background literature • Variables and hypotheses • Operational definitions and measurement • Research design and methodology • Instrumentation, sampling • Data analysis • Conclusions, interpretations, recommendations
Sampling • What is your population of interest? • To whom do you want to generalize your results? • All students (18 and over) • Undergraduates only • Greeks • Athletes • Other • Can you sample the entire population?
Sampling • A sample is “a smaller (but hopefully representative) collection of units from a population used to determine truths about that population” (Field, 2005) • Why sample? • Resources (time, money) and workload • Gives results with known accuracy that can be calculated mathematically • The sampling frame is the list from which the potential respondents are drawn • Registrar’s office • Class rosters • Must assess sampling frame errors
Types of Samples • Probability (Random) Samples • Simple random sample • Systematic random sample • Stratified random sample • Proportionate • Disproportionate • Cluster sample • Non-Probability Samples • Convenience sample • Purposive sample • Quota
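A minimal sketch of three of the probability samples listed above, using only Python's standard library. The sampling frame, the `class_level` labels, and all function names are hypothetical and chosen for illustration; a real frame would come from a source such as the registrar's list.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every unit in the frame has the same chance of selection."""
    return rng.sample(frame, n)

def systematic_sample(frame, n, rng):
    """Pick a random start, then take every k-th unit from the frame."""
    step = len(frame) // n
    start = rng.randrange(step)
    return [frame[start + i * step] for i in range(n)]

def proportionate_stratified_sample(frame, stratum_of, fraction, rng):
    """Sample the same fraction from each stratum, so the sample
    mirrors the frame's composition."""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for units in strata.values():
        sample.extend(rng.sample(units, round(len(units) * fraction)))
    return sample

# Hypothetical frame: 600 undergraduates and 400 graduate students.
frame = [(i, "undergrad" if i < 600 else "graduate") for i in range(1000)]
rng = random.Random(2006)  # fixed seed so the draw is reproducible

srs = simple_random_sample(frame, 100, rng)
systematic = systematic_sample(frame, 100, rng)
stratified = proportionate_stratified_sample(frame, lambda u: u[1], 0.10, rng)
```

A disproportionate stratified sample would use a different fraction per stratum, for example to oversample a small group.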
Sample Size • Depends on expected response rate • Average 85% for paper • FINAL SAMPLE DESIRED / .85 = SAMPLE • Average 25% for web • FINAL SAMPLE DESIRED / .25 = SAMPLE
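The formula above can be sketched as a small helper. The function name and the target of 400 completed surveys are assumptions for illustration; the 85% and 25% response rates are the averages quoted on the slide.

```python
import math

def invitations_needed(final_sample, response_rate):
    """Number of people to invite so that the expected number of
    completed surveys reaches final_sample."""
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    # Round up: a fractional person cannot be invited.
    return math.ceil(final_sample / response_rate)

# Hypothetical target of 400 completed surveys.
paper_invites = invitations_needed(400, 0.85)  # paper survey
web_invites = invitations_needed(400, 0.25)    # web survey
print(paper_invites, web_invites)  # prints "471 1600"
```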
Bias and Error • Systematic Error or Bias: unknown or unacknowledged error created during the design, measurement, sampling, procedure, or choice of problem studied • Error tends to go in one direction • Examples: Selection, Recall, Social desirability • Random Error • Unrelated to true measures • Example: Momentary fatigue
Reliability and Validity • Reliability • The extent to which a test is repeatable and yields consistent scores • Affected by random error • Validity • The extent to which a test measures what it is supposed to measure • A subjective judgment made on the basis of experience and empirical indicators • Asks “Is the test measuring what you think it’s measuring?” • Affected by systematic error/bias
Reliability vs. Validity • In order to be valid, a test must be reliable; but reliability does not guarantee validity.
Levels of Measurement • Nominal • Gender: Male, Female • Vaccinations: Yes, No, Unsure • Ordinal • Personal health status: Excellent, Very good, Good, Fair, Poor • Last 30 days: Never used, Not in last 30 days, 1-2 days, 3-5 days, 6-9 days, 10-19 days, 20-29 days, All 30 days • Interval • Body Mass Index (BMI) • Ratio • Number of drinks • Number of sexual partners • Perception percentages • Blood alcohol concentration (BAC)
“It is commonly believed that anyone who tabulates numbers is a statistician. This is like believing that anyone who owns a scalpel is a surgeon.” • R. Hooke • “Torture numbers, and they'll confess to anything.” • Gregg Easterbrook • “98% of all statistics are made up.” • Author Unknown
Types of Statistics • Descriptive statistics • Describe the basic features of data in a study • Provide summaries about the sample and measures • Inferential statistics • Investigate questions, models, and hypotheses • Infer population characteristics based on sample • Make judgments about what we observe
Descriptive Statistics • Central Tendency • Mode • Median • Mean • Variation • Range • Variance • Standard Deviation • Frequency
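As a rough illustration of these measures, the sketch below summarizes a small made-up list of responses with Python's standard library. The `drinks` values are hypothetical and are not taken from the Spring 2005 dataset.

```python
import statistics
from collections import Counter

# Hypothetical responses: number of drinks reported by ten students.
drinks = [0, 2, 3, 3, 4, 5, 5, 5, 8, 10]

summary = {
    # Central tendency
    "mode": statistics.mode(drinks),
    "median": statistics.median(drinks),
    "mean": statistics.mean(drinks),
    # Variation (sample variance and standard deviation, n - 1 denominator)
    "range": max(drinks) - min(drinks),
    "variance": statistics.variance(drinks),
    "std_dev": statistics.stdev(drinks),
}

# Frequency: how many respondents gave each answer.
frequency = Counter(drinks)

for name, value in summary.items():
    print(f"{name}: {value}")
```

For categorical (nominal/ordinal) variables, the frequency table and the mode are usually the meaningful summaries; the mean and standard deviation assume interval or ratio data.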
Descriptive Statistics Examples • Categorical Variables (Nominal/Ordinal)
Descriptive Statistics Examples • Continuous Variables (Interval/Ratio)
Hypotheses • Null hypotheses • Presumed true until statistical evidence in the form of a hypothesis test indicates otherwise • There is no effect/relationship • There is no difference in means • Alternative hypotheses • Tested using inferential statistics • There is an effect/relationship • There is a difference in means
Alpha, Beta, Power, Effect Size • Alpha – probability of making a Type I error • Reject null when null is true • Level of significance: the threshold the p value is compared against • Beta – probability of making a Type II error • Fail to reject null when null is false • Power – probability of correctly rejecting a false null • 1 – Beta • Effect Size • Measure of the strength of the relationship between two variables
Test of the mean of one continuous variable • College students report drinking an average of 5 drinks the last time they “partied”/socialized • Hypotheses • Ho: µ = 5 • HA: µ ≠ 5 • Test: Two-tailed one-sample t-test • Result: Reject null
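A minimal sketch of the one-sample t-test for Ho: µ = 5, assuming a small made-up sample. The `drinks` values and the function name are hypothetical; the critical value 2.262 is the standard two-tailed t-table entry for alpha = .05 with 9 degrees of freedom.

```python
import math
import statistics

def one_sample_t(sample, mu0):
    """Return the t statistic and degrees of freedom for Ho: mu = mu0."""
    n = len(sample)
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

# Hypothetical sample of ten students (not from the Spring 2005 dataset).
drinks = [3, 4, 2, 5, 3, 4, 6, 3, 2, 4]

t_stat, df = one_sample_t(drinks, mu0=5)

# Two-tailed critical value from a t table: alpha = .05, df = 9.
T_CRITICAL = 2.262
reject_null = abs(t_stat) > T_CRITICAL
print(round(t_stat, 3), df, reject_null)
```

With these made-up numbers the sample mean is 3.6, so the statistic is negative and large enough in absolute value to reject the null. Statistical packages report an exact p value instead of relying on a table lookup.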
Test of a single proportion of one categorical variable • 20% of college students report their health is excellent • Hypotheses • Ho: p = 0.20 • HA: p ≠ 0.20 (two-tailed) • Test: Z-test for a single proportion • Result: Reject null
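The z-test for a single proportion can be sketched as follows. The counts (260 of 1,000 respondents reporting excellent health) are hypothetical, as is the function name; the test uses the normal approximation, which is reasonable only for fairly large samples.

```python
import math
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """Two-tailed z-test of Ho: p = p0 using the normal approximation."""
    p_hat = successes / n
    # The standard error is computed under the null hypothesis.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Hypothetical data: 260 of 1,000 students report excellent health.
z, p_value = one_proportion_z(successes=260, n=1000, p0=0.20)
reject_null = p_value < 0.05
print(round(z, 2), reject_null)
```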
Test of a relationship between two continuous variables • There is a relationship between the number of drinks students report drinking the last time they drank and the number of sex partners they have had within the last school year • Hypotheses • Ho: ρ = 0 • HA: ρ ≠ 0 • Test: Pearson Product Moment Correlation • Result: Reject null
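A minimal sketch of the Pearson correlation and the t statistic used to test Ho: ρ = 0. The paired values and function names are hypothetical; the critical value 2.447 is the standard two-tailed t-table entry for alpha = .05 with n - 2 = 6 degrees of freedom.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_for_r(r, n):
    """t statistic for Ho: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical paired responses from eight students.
drinks = [1, 2, 3, 4, 5, 6, 7, 8]
partners = [0, 1, 1, 2, 1, 3, 2, 4]

r = pearson_r(drinks, partners)
t_stat = t_for_r(r, len(drinks))

# Two-tailed critical value from a t table: alpha = .05, df = 6.
T_CRITICAL = 2.447
reject_null = abs(t_stat) > T_CRITICAL
print(round(r, 3), reject_null)
```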
Test of the difference between two means • Men and women report significantly different numbers of sexual partners over the past 12 months • Hypotheses • Ho: µ1 = µ2 • HA: µ1 ≠ µ2 • Test: Independent Samples t-test OR One-way ANOVA • Result: Reject null
Test of the difference between two or more means • Mean BAC reported differs across student residences • Hypotheses • Ho: µ1 = µ2 = µ3 = µ4 = µ5 = µ6 • HA: µi ≠ µj for at least one pair i, j • Test: One-way ANOVA • Result: Reject null
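The F statistic behind a one-way ANOVA can be sketched as the ratio of between-group to within-group variance. For brevity the example assumes three residence groups rather than the six on the slide, and the BAC values and function name are made up for illustration.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, means))
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, means) for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical mean BAC reports from three types of residence.
residence_hall = [0.04, 0.05, 0.06]
off_campus = [0.05, 0.06, 0.07]
greek_housing = [0.08, 0.09, 0.10]

f_stat, df_between, df_within = one_way_anova_f(
    [residence_hall, off_campus, greek_housing]
)
print(round(f_stat, 2), df_between, df_within)
```

A significant F only says that at least one pair of means differs; a post hoc comparison is needed to find which pair.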
Test for a relationship between two categorical variables • Is there an association between being a member of a fraternity/sorority and ever being diagnosed with depression? • Hypotheses • Ho: There is no association between being a member of a fraternity/sorority and ever being diagnosed with depression. • HA: There is an association between being a member of a fraternity/sorority and ever being diagnosed with depression. • Test: Chi-square test for independence • Result: Fail to reject null
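A minimal sketch of the chi-square test for independence on a 2x2 table. The counts and the function name are hypothetical. With 1 degree of freedom the chi-square statistic is the square of a standard normal variable, so the p value can be taken from the normal distribution; no continuity correction is applied here.

```python
import math
from statistics import NormalDist

def chi_square_2x2(table):
    """Chi-square test of independence for a 2x2 table of counts.
    Returns (chi2, p_value) with 1 degree of freedom."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected

    # With df = 1, chi2 is the square of a standard normal z.
    p_value = 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))
    return chi2, p_value

# Hypothetical counts: rows = Greek member / non-member,
# columns = ever diagnosed with depression / never diagnosed.
table = [[18, 102],
         [150, 730]]

chi2, p_value = chi_square_2x2(table)
reject_null = p_value < 0.05
print(round(chi2, 3), reject_null)
```

With these made-up counts the p value is well above .05, so the null of no association is not rejected, which matches the direction of the result on the slide.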
Important Points to Remember • A significant association does not indicate causation • Statistical significance is not always the same as practical significance • Multiple factors contribute to whether your results are significant • It gets easier and easier as you practice!