Katheryne Downes, MPH Statistical Data Analyst/Research Specialist Office of Clinical Research/GME

Presentation Transcript


  1. Katheryne Downes, MPH Statistical Data Analyst/Research Specialist Office of Clinical Research/GME

  2. A Little Study Design Terminology: Descriptive Studies • Case Study: a single patient is reviewed in detail. • Case Series: similar to a case study, just expanded to a small handful of patients. • *Ecological Studies: describe what's going on at the population (summary) level. All data are collected at the same time, and no individual-level data are collected. • *Cross-Sectional Studies: similar in many ways to ecological studies, but examine individual-level data instead of population-level data. *These two designs can also support some limited analytic statistics.

  3. A Little Study Design Terminology: Analytic Studies 1. Cohort Studies: identify people by their exposure status (yes/no) and follow them to determine whether they develop the outcome (yes/no). A great design for unusual or rare exposures. a. Retrospective: starts in the past… E ------ O? b. Prospective: starts in the present… E ------ O? *Cohort studies are sometimes used for purely descriptive purposes when we aren't sure what phenomenon may occur.

  4. A Little Study Design Terminology: Analytic Studies 2. Case-Control Studies: identify subjects by their disease/outcome status and then look backward to determine whether they had the exposure of interest (E? ------ O). 3. Randomized Controlled Trials: Ah, yes… the Golden Child of research. Randomize: Expose ------ Outcome? | Don't Expose ------ Outcome?

  5. Study Design Quiz: A young epidemiologist (& budding statistician!) was assigned to investigate an outbreak of an unusual fungus in the lungs of patients undergoing bronchoscopy. There are about 15 patients, and she'll need to do a thorough review of the patients' records to gather information on how these events may have taken place. (She'll eventually spend DAYS in the medical records department and countless hours crawling through ventilation ductwork and over the hospital roof… but that's another story…) Is it: A: A prospective cohort study B: A case-control study C: A cross-sectional study D: A case series E: A study on crazy epidemiologists. Why did you select your answer?

  6. Recap: 15 patients underwent bronchoscopy and ended up with really weird fungus growing in their lungs, followed by an in-depth review of their charts. A: A prospective cohort study: NO. We'd need both exposed and unexposed groups. B: A case-control study: NO. We'd need both disease AND no-disease groups. C: A cross-sectional study: NO. We'd need everyone who underwent bronchoscopy. D: A case series: YES!! E: A prospective study on crazy epidemiology interns: NO. It would be a case study on crazy epidemiology interns.

  7. Another Example… A group of researchers is interested in whether ventilator-associated pneumonia (VAP) is associated with the use of a particular endotracheal (ET) tube type. They begin by identifying all patients diagnosed with VAP in 2007 and also identify a similar group that did NOT develop VAP. They then compare the frequency of tube types between these two groups. Is it: A: A retrospective cohort study B: A case-control study C: A prospective cohort study. Why did you select your answer?

  8. Recap: Is VAP associated with a certain ET tube type? Start by looking at patients with and without VAP… then look at their tube type. A: A retrospective cohort study: NO. Patients need to be identified by exposure status in a cohort study. B: A case-control study: YES!! C: A prospective cohort study: NO. Again, patients would be identified by exposure status.

  9. Almost done! (with this section, anyway) A research group is interested in the impact of methadone use during pregnancy on infant outcomes. They have decided to follow a large group of pregnant women classified as either methadone users or non-users and will later gather information on gestational age (GA), birthweight, Apgar scores, etc. Is it: A: A prospective cohort study B: A case series C: An ecological study D: A case-control study. Why did you choose your answer?

  10. Recap: LOTS of pregnant women, some of them on methadone. What happens to all the babies? A: A prospective cohort study: YES!!! B: A case series: NO. This is for small groups and unusual phenomena. Maybe a small group on extremely high doses? C: An ecological study: NO. We have individual-level data here, and we have a timeline. D: A case-control study: NO. That's identification by outcome status; we're identifying by exposure status.

  11. We’ve now made it to…The Stats Section! Questions So Far?

  12. Basic Stats: Data Types • Categorical: the data have "categories" instead of numeric values (ex: male/female, disease/no disease, red/orange/yellow). • Dichotomous: a categorical variable with only two possible categories. • Continuous: the variable can take on a range of possible values (weight, BP, height, etc.).

  13. Categorical and Continuous Data Remember… Categorical data: yes/no, male/female, disease/no disease Continuous data: weight, height, scores, blood values, etc.

  14. Drill! • BMI: Continuous • Disease (Yes/No): Categorical • Temperature: Continuous • Test Score (1-10): Continuous

  15. Drill! • Test (positive/negative): Categorical • Height: Continuous • Survival (months): Continuous • Gender: Categorical

  16. Descriptive Statistics

  17. Describing the data… Data: 2, 7, 7, 8, 9, 11, 15 • Mode: the most frequently occurring number (7) • Mean: the average, 59/7 ≈ 8.4 • Median: put the numbers in order and take the middle number, or the average of the two middle numbers (8); AKA the 50th percentile.

  18. Drill! Data: 1, 1, 3, 5, 6 Mean, Median, Mode? Mean = 3.2, Median = 3, Mode = 1
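A minimal sketch of the slide 17 and drill calculations, assuming Python's standard `statistics` module; `describe_center` is a hypothetical helper name, not something from the slides.

```python
from statistics import mean, median, mode

def describe_center(data):
    """Return (mean, median, mode) for a list of numbers."""
    return mean(data), median(data), mode(data)

# Slide 17 data and the drill data from slide 18
for label, data in [("Slide 17", [2, 7, 7, 8, 9, 11, 15]), ("Drill", [1, 1, 3, 5, 6])]:
    m, md, mo = describe_center(data)
    print(f"{label}: mean = {m:.2f}, median = {md}, mode = {mo}")
```

For the slide 17 data this gives a mean of 59/7 (about 8.43), a median of 8, and a mode of 7.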

  19. Basic Stats: Descriptive Stats for Continuous Data • N, or n: we need to know how many people were in the sample. Results drawn from a sample with n=5 aren't very likely to be reliable; a sample of n=100 will make you feel a little more comfortable. • Central tendency: mean, median, mode • Variation: standard deviation, variance, standard error
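The variation measures in this list can be computed the same way. A sketch, again assuming the standard `statistics` module and reusing the slide 17 data; `describe_spread` is a hypothetical helper name.

```python
import math
from statistics import stdev, variance

def describe_spread(data):
    """Return n, sample variance, sample SD, and standard error of the mean."""
    n = len(data)
    var = variance(data)      # sample variance (divides by n - 1)
    sd = stdev(data)          # square root of the sample variance
    se = sd / math.sqrt(n)    # standard error of the mean = SD / sqrt(n)
    return n, var, sd, se

n, var, sd, se = describe_spread([2, 7, 7, 8, 9, 11, 15])
print(f"n = {n}, variance = {var:.2f}, SD = {sd:.2f}, SE = {se:.2f}")
```

Because the standard error divides the SD by the square root of n, it shrinks as the sample grows, which is one reason n=100 feels more comfortable than n=5.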

  20. Descriptive Stats: Continuous. Normally distributed? • Normal: mean, SD • Not normal: median, range or 95% CI • *Special case: survival data are usually described with the median and its confidence interval.

  21. The Empirical Rule: How do we know if a distribution is "normal"? • Visual inspection (boxplots are very helpful) • Kolmogorov-Smirnov test (sorry, no vodka involved) • Other tests
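The empirical rule says that for normal data roughly 68%, 95%, and 99.7% of observations fall within 1, 2, and 3 SDs of the mean. A rough numeric check of that rule can complement a boxplot; it is not a substitute for a formal test such as Kolmogorov-Smirnov. The sketch below uses simulated data, and the seed, sample sizes, distributions, and the `empirical_rule_check` helper are all arbitrary choices for illustration.

```python
import random
from statistics import mean, stdev

def empirical_rule_check(data):
    """Fraction of observations within 1, 2, and 3 SDs of the mean.

    For roughly normal data, expect about 0.68, 0.95, and 0.997.
    """
    m, s = mean(data), stdev(data)
    return [sum(abs(x - m) <= k * s for x in data) / len(data) for k in (1, 2, 3)]

random.seed(1)
normal_like = [random.gauss(120, 15) for _ in range(10_000)]   # simulated, roughly normal
skewed = [random.expovariate(1 / 30) for _ in range(10_000)]   # simulated, right-skewed

print("normal-like:", [round(f, 3) for f in empirical_rule_check(normal_like)])
print("skewed:     ", [round(f, 3) for f in empirical_rule_check(skewed)])
```

The skewed sample puts well over 68% of its values within 1 SD of the mean, a hint that mean and SD are not the best summaries for it.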

  22. Basic Descriptive Stats for Categorical Data • Remember: you can't take an average of yes/no (maybe?). (Well, some people have tried to put that in papers…) • So, how do we describe categorical data? • N, or n • Frequencies • Percentages
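Frequencies and percentages take only a few lines. A sketch using `collections.Counter`; the yes/no responses and the `frequency_table` helper are hypothetical.

```python
from collections import Counter

def frequency_table(values):
    """Return {category: (count, percent)}, most common category first."""
    counts = Counter(values)
    n = len(values)
    return {cat: (k, 100 * k / n) for cat, k in counts.most_common()}

# Hypothetical disease yes/no responses, for illustration only
disease = ["yes", "no", "no", "yes", "no", "no", "no", "yes"]
print(f"n = {len(disease)}")
for cat, (k, pct) in frequency_table(disease).items():
    print(f"{cat}: {k} ({pct:.1f}%)")
```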

  23. Question: Best approximation of the actual value for non-normally distributed data? A: mean +/- standard error of the mean B: median +/- standard deviation C: median +/- confidence interval

  24. Before we move on to statistical tests, some basic terminology… Independent Variable: a predictor, a variable of interest. Dependent Variable: the thing you're trying to predict, or the outcome of interest. Ex: I'm conducting a study to determine whether administering antibiotic "x" approximately 12 hrs before surgery reduces post-operative infection rates. What's the independent variable? What's the dependent variable?

  25. Another Example… Does the pre-surgery serum albumin level affect 90-day survival in patients receiving an LVAD? What's the dependent variable? What's the independent variable?

  26. Statistical Tests: Continuous • (Student's) t-test: compares 2 groups on a continuous variable • Paired t-test: compares 1 group, before and after, on a continuous variable • ANOVA: compares 3+ groups on a continuous variable *Post-hoc tests REQUIRED to find out which groups differ*
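A sketch of the two t statistics using only the standard library. The helper names and example numbers are hypothetical. Turning t into a p-value needs the t distribution, which the standard library lacks, so in practice a package such as SciPy (`scipy.stats.ttest_ind`, `scipy.stats.ttest_rel`) would handle that step.

```python
import math
from statistics import mean, stdev, variance

def student_t(group1, group2):
    """Pooled-variance (Student's) two-sample t statistic and degrees of freedom."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1) + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean(group1) - mean(group2)) / se, n1 + n2 - 2

def paired_t(before, after):
    """Paired t statistic: a one-sample t-test on the (after - before) differences."""
    diffs = [post - pre for pre, post in zip(before, after)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / math.sqrt(n)), n - 1

t, df = student_t([1, 2, 3], [4, 5, 6])        # hypothetical groups
print(f"two-sample t = {t:.3f}, df = {df}")
t, df = paired_t([10, 12, 14], [11, 14, 17])   # hypothetical before/after values
print(f"paired t = {t:.3f}, df = {df}")
```

The pooled (Student's) version assumes equal variances in the two groups; Welch's version of the t-test drops that assumption.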

  27. How the Guinness Brewery Changed History…

  28. Statistical Tests: Non-Parametric (the rebels!) • Mann-Whitney U: compares 2 groups on a continuous variable (non-parametric version of the t-test) • Wilcoxon signed-rank: compares 1 group, before and after, on a continuous variable (non-parametric version of the paired t-test) • Kruskal-Wallis: compares 3+ groups on a continuous variable (non-parametric version of ANOVA) *Post-hoc tests REQUIRED*
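The Mann-Whitney U statistic can be computed by comparing every observation in one group with every observation in the other and counting which is larger; this pairwise form is equivalent to the usual rank-sum formula. A sketch with a hypothetical helper name and data; p-values would normally come from a package such as SciPy (`scipy.stats.mannwhitneyu`).

```python
def mann_whitney_u(group1, group2):
    """Return (U1, U2): U1 counts pairs where group1 > group2, ties count 0.5."""
    u1 = 0.0
    for x in group1:
        for y in group2:
            if x > y:
                u1 += 1
            elif x == y:
                u1 += 0.5
    u2 = len(group1) * len(group2) - u1   # U1 + U2 always equals n1 * n2
    return u1, u2

print(mann_whitney_u([1, 2, 3], [4, 5, 6]))   # hypothetical, fully separated groups
print(mann_whitney_u([3, 5], [3, 4]))         # hypothetical groups with a tie
```

Because U depends only on the ordering of the values, a skewed distribution or an extreme outlier does not distort it the way it can distort a mean.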

  29. Statistical Tests: Categorical • Chi-Square: categorical data with expected cell counts of 5 or more • McNemar: paired proportions • Fisher Exact: categorical data with expected cell counts below 5
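A sketch of the Pearson chi-square statistic for a 2x2 table, which also returns the expected counts needed for the "5 or more" check. The helper name is hypothetical; the example counts are those implied by the H1N1 vaccine drill in this deck (90 of 100 exposed vs. 10 of 50 unexposed developing antibodies). Note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2x2 tables by default, so its statistic will differ from this uncorrected one.

```python
def chi_square_2x2(table):
    """Pearson chi-square statistic (no continuity correction) for [[a, b], [c, d]].

    Also returns the expected counts (row total * column total / n).
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2) for j in range(2)
    )
    return stat, expected

stat, expected = chi_square_2x2([[90, 10], [10, 40]])
ok_for_chi_square = all(e >= 5 for row in expected for e in row)
print(f"chi-square = {stat:.2f} (df = 1), all expected counts >= 5: {ok_for_chi_square}")
```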

  30. Details, Details…. If all expected cell counts are 5 or higher, you can use the Chi-Square. If at least one cell has an expected count of 4 or lower, you should use the Fisher Exact test. ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Q: Umm, what's a "cell"? Phospholipid bilayers!?!? A: In this case, a cell is one of the four compartments of a 2x2 table, e.g., a table with columns Group 1 and Group 2 and rows Males and Females (Males in Group 1, Males in Group 2, Females in Group 1, Females in Group 2).
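When some expected counts are small, the Fisher exact test computes the p-value directly from the hypergeometric distribution (all row and column totals held fixed) instead of relying on the chi-square approximation. A sketch using `math.comb` (Python 3.8+); the helper name and example tables are hypothetical, and the two-sided rule used here (sum the probabilities of all tables no more likely than the observed one) is one common convention.

```python
from math import comb

def fisher_exact_two_sided(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, col1, n = a + b, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability that cell a equals x, given the margins
        return comb(col1, x) * comb(n - col1, row1 - x) / comb(n, row1)

    lo, hi = max(0, row1 + col1 - n), min(row1, col1)
    p_obs = prob(a)
    # Small tolerance so tables tied with the observed one are not lost to rounding
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))

print(fisher_exact_two_sided([[3, 1], [1, 3]]))   # hypothetical small table
```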

  31. Analytic Statistics for Categorical Variables Q: But what about normality and that crazy Kolmo-whatchamacallit vodka test??!?! A: The tests for categorical variables don't have any normality assumptions built in, so your data can look as crazy as can be and you'll be fine (just check those expected cell counts)!

  32. Drill! A study is being conducted to evaluate the effectiveness of a new diet pill. There are two groups: one is receiving a placebo, the other the experimental drug. The outcome is BMI and is assumed to be normally distributed. Q: What type of data? Q: How would you summarize the data? Q: What type of statistical test would you run? A: Continuous A: Mean +/- SD A: Two-group (Student's) t-test

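  33. Other Stats: Relative Risk/Odds Ratios (2x2 table: rows E = exposed, NE = not exposed; columns D = disease, ND = no disease; a = E & D, b = E & ND, c = NE & D, d = NE & ND) Relative Risk: used in cohort studies, where you have the incidence. RR = IR in exposed / IR in unexposed = [a/(a+b)] / [c/(c+d)] Odds Ratio: used in case-control studies to approximate the relative risk. OR = ad/bc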
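Both formulas translate directly into code. A sketch with hypothetical helper names, using the 2x2 layout rows = exposed/not exposed and columns = disease/no disease; the cell counts a = 90, b = 10, c = 10, d = 40 are the ones implied by the RR and OR drills in this deck.

```python
def relative_risk(a, b, c, d):
    """RR = [a/(a+b)] / [c/(c+d)]: incidence in exposed / incidence in unexposed."""
    return (a / (a + b)) / (c / (c + d))

def odds_ratio(a, b, c, d):
    """OR = (a*d) / (b*c): the cross-product ratio of the 2x2 table."""
    return (a * d) / (b * c)

a, b, c, d = 90, 10, 10, 40
print(f"RR = {relative_risk(a, b, c, d):.1f}")   # RR = 4.5
print(f"OR = {odds_ratio(a, b, c, d):.1f}")      # OR = 36.0
```

The same counts give RR = 4.5 but OR = 36 because the outcome is common here (90% of the exposed group); the OR approximates the RR well only when the outcome is rare.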

  34. Drill! A group of patients is identified based on their exposure status to the H1N1 vaccine. They are followed to determine whether they successfully develop antibodies to the novel virus. Q: What type of study is this? Q: What type of data does the outcome represent? Q: Name two statistical methods that could be used to evaluate this association. A: Prospective cohort study A: Categorical (yes/no outcome) A: Chi-square (or Fisher exact) and relative risk

  35. Drill continued… So, here's the data: calculate the RR!!! RR = (90/100) / (10/50) = 0.90 / 0.20 = 4.5

  36. Drill! A group of patients is classified based on whether or not they have stomach cancer. They are then asked about their hot pepper consumption over the past 5 years (high vs. low consumption). Q: What type of study is this? Q: What type of statistics could be used to evaluate the association? A: Case-control study A: Chi-square/Fisher exact or odds ratio

  37. Drill continued… So, here's the data: calculate the OR!!! OR = (90 × 40) / (10 × 10) = 3600 / 100 = 36

  38. Regression Analysis • Everything we've looked at so far is termed "univariate analysis", meaning we look at the effect of just ONE variable at a time. But what if there are a lot of different risk factors? What if they interact with each other? • Regression analysis is used when we want to look at how several predictor variables jointly relate to the outcome of interest. It lets us estimate the effect of each variable on the outcome while ALL the others are controlled.

  39. Regression Analysis… The regression type is based on the OUTCOME type (not the predictor variables). • Two basic types: • LOGISTIC regression: the outcome is "dichotomous" • LINEAR regression: the outcome is "continuous" • In both types of regression, you can enter BOTH continuous and categorical predictors.
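To make "controlling for the others" concrete, here is a sketch of multiple linear regression by ordinary least squares, solving the normal equations (X'X)b = X'y with plain Gaussian elimination. The helper names and toy data are hypothetical; a real analysis would use a statistics package (for example statsmodels or scikit-learn) that also reports standard errors and p-values. Logistic regression needs an iterative fit and is not shown.

```python
def solve(A, y):
    """Solve A x = y by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(v)] for row, v in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):   # back substitution
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def linear_regression(predictors, outcome):
    """OLS coefficients [intercept, b1, b2, ...].

    Each slope is the expected change in the outcome per unit change in that
    predictor, holding the other predictors constant.
    """
    X = [[1.0] + list(row) for row in predictors]   # prepend intercept column
    p = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
    Xty = [sum(r[i] * v for r, v in zip(X, outcome)) for i in range(p)]
    return solve(XtX, Xty)

# Toy data built so that outcome = 1 + 2*x1 + 3*x2 exactly
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in X]
print([round(b, 6) for b in linear_regression(X, y)])
```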

  40. Hypothesis Testing… • Null hypothesis: assumes that all the groups behave similarly, with no meaningful differences. • Alternative hypothesis: there IS a difference. • One-sided: Group A is better than Group B • Two-sided: Group A is different from Group B Note: this is the main type of hypothesis testing. There are some variations in which the logic is flipped on its head: equivalence testing and non-inferiority testing are just two of them…

  41. Hypothesis Testing

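  42. Hypothesis Testing • Type I Error: incorrectly rejecting the null; its probability is alpha (usually 0.05 or 0.01). • Type II Error: incorrectly failing to reject the null; its probability is beta (power = 1 - beta, commonly set at 80%). Common reasons for a Type II error: • 1: Sample size too small!!! • 2: Observed difference was smaller than the specified difference • P-value: the probability of observing a result at least as extreme as the one observed if the null hypothesis were true (i.e., if only chance were at work).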
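The link between sample size and Type II error can be made concrete with the standard normal-approximation formula for comparing two means: n per group = 2 × (z(1 - alpha/2) + z(power))² / d², where d is the difference to detect divided by the SD. A sketch using `statistics.NormalDist` (Python 3.8+); the helper name is hypothetical, and exact t-based calculations give slightly larger numbers.

```python
import math
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-sample comparison of means.

    effect_size = (difference to detect) / SD; the result is rounded up.
    """
    z = NormalDist()                      # standard normal distribution
    z_alpha = z.inv_cdf(1 - alpha / 2)    # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)             # about 0.84 for 80% power
    return math.ceil(2 * (z_alpha + z_beta) ** 2 / effect_size ** 2)

for d in (1.0, 0.5, 0.25):
    print(f"effect size {d}: {n_per_group(d)} per group")
```

Halving the difference to detect roughly quadruples the required sample, which is why a too-small sample and a smaller-than-planned difference are the two classic reasons for a Type II error.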

  43. Drill! A large randomized multicenter trial finds no difference. Why? A: Inclusion criteria too strict B: Populations too different across the centers C: The clinical difference is smaller than the expected difference

  44. Hypothesis Testing: 95% CI. A 95% CI gives a range of plausible values for the true value. In hypothesis testing, we check whether the interval contains the value that corresponds to the null… Sooooo… for relative risks or odds ratios, we're looking at the ratio of risks (or odds) for two groups. Q: If the risk is the same in both groups, the ratio = ? Q: What value are we looking for in the associated 95% CI? A: Yes, we're looking for the value "1". If that value is in the confidence interval, then "no difference" is within the range of plausible true values and the result wouldn't be significant.
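As a sketch of the "is 1 in the interval?" check, here is an approximate 95% CI for a relative risk using the common log-scale (Katz) method, applied to the counts implied by the H1N1 drill in this deck (a = 90, b = 10, c = 10, d = 40). The helper name is hypothetical, and other interval methods give slightly different limits.

```python
import math

def rr_confidence_interval(a, b, c, d, z=1.96):
    """Relative risk with an approximate 95% CI computed on the log scale."""
    rr = (a / (a + b)) / (c / (c + d))
    se_log = math.sqrt(1 / a - 1 / (a + b) + 1 / c - 1 / (c + d))
    lower = math.exp(math.log(rr) - z * se_log)
    upper = math.exp(math.log(rr) + z * se_log)
    return rr, lower, upper

rr, lower, upper = rr_confidence_interval(90, 10, 10, 40)
significant = not (lower <= 1 <= upper)   # the null value for a ratio is 1
print(f"RR = {rr:.1f}, 95% CI {lower:.2f} to {upper:.2f}, significant: {significant}")
```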

  45. Hypothesis Testing: 95% CI What about a paired t-test? Q: What type of data is the test used for? Q: What's the null value in this case? Q: So, what value are we looking for in the CI? A: Remember, this test is generally used for before/after measurements of a continuous variable. If before = after, then after - before = 0. Therefore, we're looking for a value of "0" in the CI. If we find it, the result is considered non-significant.

  46. Trials and Studies RCT: reduces bias and evens out the distribution of confounding factors, but sometimes can't be used. Double Blind: neither the doctor nor the patient knows what the patient is getting; reduces observational bias. Cohort Study: patients are identified by exposure status and followed for the outcome. Case-Control Study: patients are identified by outcome status (case or control), and we look back for exposures.

  47. Drill ! For a cohort study, what type of ratio can be calculated? A: Relative Risk For a case-control study, what type of ratio can be calculated? A: Odds Ratio

  48. Drill ! What’s the formula for Relative Risk? A: IR in exposed/IR in unexposed What’s the formula for Odds Ratio? A: ad/bc

  49. Misc… Meta-Analysis: combines the data from several different studies. Often used when the individual studies are too small and underpowered. Be careful when the studies are too different from each other. Prevalence: # of current cases / total population (at a given point in time) Incidence: # of new cases / total population at risk (over a specified time period)
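One common way to combine studies is the fixed-effect inverse-variance method, which weights each study's estimate (here a log relative risk) by 1/SE². The sketch below, with a hypothetical helper and made-up study results, shows why pooling helps underpowered studies: the pooled SE is smaller than any single study's. When studies are too different from each other, a random-effects model is usually preferred over this fixed-effect version.

```python
import math

def fixed_effect_pool(log_estimates, standard_errors):
    """Inverse-variance fixed-effect pooled estimate and its standard error.

    Each study is weighted by 1 / SE**2, so more precise studies count more.
    """
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * est for w, est in zip(weights, log_estimates)) / sum(weights)
    return pooled, math.sqrt(1 / sum(weights))

# Three hypothetical small studies: relative risks with SEs on the log scale
log_rrs = [math.log(1.8), math.log(1.4), math.log(2.1)]
ses = [0.40, 0.35, 0.50]
pooled, se = fixed_effect_pool(log_rrs, ses)
print(f"pooled RR = {math.exp(pooled):.2f}, "
      f"95% CI {math.exp(pooled - 1.96 * se):.2f} to {math.exp(pooled + 1.96 * se):.2f}")
```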

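  50. Test Diagnostics (2x2 table: a = test+ & diseased, b = test+ & not diseased, c = test- & diseased, d = test- & not diseased) • Sensitivity: positives / all diseased = a/(a+c) = 90% • Specificity: negatives / all not diseased = d/(b+d) = 90% • PPV: diseased / all positives = a/(a+b) = 50% • NPV: not diseased / all negatives = d/(c+d) = 98.8% • Accuracy: correct results / all = (a+d)/(a+b+c+d) = 90%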
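A sketch of these formulas, assuming the 2x2 layout implied by them (test result in rows, disease status in columns). The counts are hypothetical, chosen because 1,000 people with 10% prevalence and a test with 90% sensitivity and specificity reproduce every percentage on the slide; `diagnostic_metrics` is a hypothetical helper name.

```python
def diagnostic_metrics(a, b, c, d):
    """a = test+ & diseased, b = test+ & not diseased,
    c = test- & diseased, d = test- & not diseased."""
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
        "accuracy": (a + d) / (a + b + c + d),
    }

# Hypothetical counts: 1,000 people, 100 of them diseased (10% prevalence)
metrics = diagnostic_metrics(a=90, b=90, c=10, d=810)
for name, value in metrics.items():
    print(f"{name}: {value:.1%}")
```

This prints 90.0%, 90.0%, 50.0%, 98.8%, and 90.0%. Sensitivity and specificity describe the test itself, but PPV and NPV also depend on prevalence: with this same test, PPV is only 50% because just 10% of the population is diseased.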
