
Lecture 5 – Categorical Data and Survival Analyses


aggie

Presentation Transcript


  1. Lecture 5 – Categorical Data and Survival Analyses

  2. OUTLINE • Definition • Common CDA • Descriptive summaries • Tests of Association • Modeling • Extensions • Other examples in CDA

  3. What is Categorical Data Analysis? • Statistical analysis of data that are categorical (cannot be summarized with mean +/- SD) • Includes dichotomous, ordinal, nominal outcomes • Examples: Disease prevalence, Discharge location, Treatment adherence (yes/no)

  4. Examples of Studies with CDA • MI after CABG • Diagnostic studies looking at the sensitivity and specificity of a new test/procedure • Discharge location after a new surgical intervention

  5. How to analyze words? • Order vs. no order • Break down mean +/- SD for two groups • Do the same: break down outcome %’s for two groups

  6. How to analyze words? • Comparing length of stay after CABG: • New Trt = 19.2 +/- 2.7 • SOC = 21.3 +/- 3.3 • Comparing prevalence of MI: • New Trt = 16% • SOC = 24% • Are these differences statistically significant? clinically significant?

  7. Choice of End Point • Some designs have a binary response variable • MI after 3 years • Overall Survival • Time to CVD • Time to recurrent MI • Can Dichotomize as 1 year rate (Yes/No)

  8. What is Categorical Data Analysis? • paper example

  9. Common CDA • Descriptive summaries • Tests for association • Modeling

  10. Descriptive summaries

  11. Let’s Talk Data…

  12. Descriptive Summaries in CDA • Nominal – categorical data measured in unordered categories (summarize with %’s) • Ordinal – categorical data measured in ordered categories (summarize with %’s) • Continuous – quantitative data measured on a continuum (summarize with many measures)

  13. Types of Data • Nominal – categorical data measured in unordered categories: Race, Blood Type, Gender • Ordinal – categorical data measured in ordered categories: Cancer Stages, Socio-economic Status (low, medium, high), Likert (unlikely, neutral, likely) • Continuous – quantitative data measured on a continuum: Serum Creatinine, Height/Weight/BMI, Diastolic Blood Pressure, Tumor measurements

  14. What the data might look like…

  15. Compare Categorical Outcomes between groups • How to assess if a predictor is associated with a categorical outcome? • Intuitive?: Get the %’s of the outcome prevalence within each predictor group. • Example: New Trt and MI. • New Trt response rate = 16% • SOC response rate = 24%

  16. Contingency Tables [tables broken down by Group]

  17. CDA Summary with Contingency Table • Research question: Is there a relationship between Group and Attacked Heart? • Better to convert the table into percentages (easier to see)

  18. What the data might look like…

  19. Step 1. Break down the frequencies

  20. Step 2. Get the different %’s

  21. Row vs. Column %’s: It’s your choice • Row %’s: • 40% of New trt patients had MI vs. 80% of Old trt patients had MI • Col %’s: • 75% of No MI were in the New trt group vs. 33% of MI were in New trt group • P-value for test of association is the same!
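
The row and column percentages on this slide can be reproduced with a short sketch. The 2x2 counts below are an assumption: the original table is not in this transcript, and these counts (20 patients per group) were chosen only because they reproduce the percentages quoted above.

```python
# Hypothetical 2x2 table; the counts are an assumption chosen to match
# the percentages quoted on the slide (the original table is not shown).
# Rows: treatment group; columns: MI status.
table = {
    "New trt": {"MI": 8, "No MI": 12},
    "Old trt": {"MI": 16, "No MI": 4},
}

def row_percents(tbl):
    """Percent of each row (group) that falls in each column (outcome)."""
    out = {}
    for row, cells in tbl.items():
        total = sum(cells.values())
        out[row] = {col: 100 * n / total for col, n in cells.items()}
    return out

def col_percents(tbl):
    """Percent of each column (outcome) that comes from each row (group)."""
    cols = list(next(iter(tbl.values())).keys())
    col_totals = {c: sum(tbl[r][c] for r in tbl) for c in cols}
    return {r: {c: 100 * tbl[r][c] / col_totals[c] for c in cols} for r in tbl}

row_pct = row_percents(table)
col_pct = col_percents(table)

# Row %'s: share of each group with MI.
print(round(row_pct["New trt"]["MI"], 1), round(row_pct["Old trt"]["MI"], 1))
# Column %'s: share of No MI and of MI patients who were in the New trt group.
print(round(col_pct["New trt"]["No MI"], 1), round(col_pct["New trt"]["MI"], 1))
```

Both views come from the same four counts, which is why the test of association gives the same p-value whichever set of percentages is reported.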

  22. Tests for Association

  23. CDA tests for Association • Is there a significant association between Group and MI? • What is a good way to test for an association between the two?

  24. Test for significant differences • The most common tests are the Chi-square test and Fisher’s Exact test. • Research question: Is there an association between treatment group and MI? • To answer this: Compare what you would expect if there were no association to what you observed

  25. Expect if no relationship?

  26. Expect if no relationship?

  27. Same % with MI by Group

  28. Test for significant differences • Having exactly the same response % in each group would favor “no association” • There is another general way to calculate what you “expect” • Use Row totals, Column totals, Grand total to calculate “Expected” frequencies

  29. Observed vs. Expected Frequencies • Observed frequencies = actual counts • “Expected” frequencies = (Row total x Column total) / Grand total (why?)
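
One way to answer the "why?": if there is no association, the chance of landing in a given cell is (Row total / Grand total) x (Column total / Grand total), and multiplying by the Grand total gives the expected count. A minimal sketch follows; the observed counts are hypothetical (the study table is not reproduced in this transcript).

```python
def expected_counts(observed):
    """Expected cell counts under no association: row total * column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]

# Hypothetical observed counts (an assumption, not the slide's table).
# Rows: New trt, Old trt; columns: MI, No MI.
observed = [[8, 12],
            [16, 4]]

expected = expected_counts(observed)
print(expected)  # [[12.0, 8.0], [12.0, 8.0]]
```

Note that the expected table has the same MI percentage in both rows, which is exactly what "no association" means.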

  30. What you actually observed in Study

  31. “Expected” frequencies

  32. Chi-square test • Quantify if the actual frequencies are far enough away from the Expected (assuming no association) • We can quantify using the Chi-square test statistic • We can get the p-value to determine if there is a significant association.

  33. Chi-square test for association in RxC table • H0: There is no association between rows and columns • The classic Pearson’s chi-squared test of independence • For a 2x2 table, df = (2-1) x (2-1) = 1 • Conservatively, we require expected counts ≥ 5 in all cells (i, j)
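
The test statistic itself is not shown in this transcript. The standard form of Pearson's statistic is given below as an assumption about what the slide displayed; the symbols O_ij, E_ij, n_i+, n_+j and N are notation introduced here.

```latex
\chi^2 \;=\; \sum_{i=1}^{R}\sum_{j=1}^{C} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} \;=\; \frac{n_{i+}\, n_{+j}}{N},
\qquad
df \;=\; (R-1)(C-1)
```

Here O_ij is the observed count in row i and column j, n_i+ and n_+j are the row and column totals, and N is the grand total.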

  34. Chi-square Test • Associated P-value for this Chi-square value is p=0.0098. • Thus, we conclude group and MI are significantly associated (given α = 0.05).
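
A minimal sketch of the calculation, using only the standard library. The observed counts are an assumption: they are not shown in this transcript, and were chosen because they are consistent with the percentages and the p-value quoted in the slides. No continuity correction is applied.

```python
import math

def pearson_chi_square(observed):
    """Return (statistic, df) for Pearson's chi-square test of independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df

def chi2_pvalue_df1(stat):
    """Upper-tail p-value for df = 1 only.

    A chi-square variable with 1 df is a squared standard normal, so
    P(X > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    """
    return math.erfc(math.sqrt(stat / 2))

# Hypothetical counts (assumption). Rows: New trt, Old trt; columns: MI, No MI.
observed = [[8, 12],
            [16, 4]]

stat, df = pearson_chi_square(observed)
p_value = chi2_pvalue_df1(stat)
print(round(stat, 3), df, round(p_value, 4))
```

For tables larger than 2x2 the p-value needs the general chi-square distribution, which this sketch does not implement.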

  35. “Expected” frequencies

  36. Fisher’s Exact Test • Fisher’s Exact test will test similar hypotheses as the Chi-square test. • Use Fisher’s Exact test when assumptions of Chi-square test are not satisfied. • That is, when you have Expected < 5 (basically implying when cell sample size is small).
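
For a 2x2 table, Fisher's Exact test can be sketched directly from the hypergeometric distribution. The function name and the small example table below are hypothetical; the two-sided p-value here sums the probabilities of all tables (with the same margins) that are no more likely than the observed one, which is one common convention.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_observed = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))

# Hypothetical small table: every expected count is 2, so the
# chi-square rule of thumb (expected >= 5) is not met.
p_value = fisher_exact_two_sided(3, 1, 1, 3)
print(round(p_value, 4))
```

Because the p-value is computed exactly from the possible tables, it does not rely on the large-sample approximation behind the Chi-square test.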

  37. Confidence Intervals for %’s

  38. Confidence Interval for %’s You conduct your follow-up after CABG study and accrue 40 patients. After 3 years, 20 of the 40 patients have had an MI. Q1. What is your best guess at the true (population) MI rate at 3 years? A. Based on your sample, 20/40 = 50%

  39. Sampling Variability [figure: Population (MI at 3 yrs = ?) • Sample (MI = 50%) • Inference from sample to population]

  40. Sampling Variability [figure: Population (MI at 3 yrs = ?) • Sample (MI = 44%) • Inference from sample to population]

  41. Confidence Interval for %’s A good way to make inference about the range of plausible values of the population % is to calculate a Confidence Interval (CI). Q2. How much precision do you have in terms of estimating the MI rate at 3 yrs. in the population based on your sample?

  42. 95% Confidence Intervals • 95% Confidence Interval for Mean: • 95% Confidence Interval for Proportion (Standard “Wald” CI):
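
The formulas on this slide are not in the transcript. The usual large-sample textbook forms are given below as an assumption about what was shown; the symbols (sample mean, sample SD s, sample proportion p-hat, sample size n) are notation introduced here.

```latex
\text{Mean:}\quad \bar{x} \;\pm\; 1.96 \cdot \frac{s}{\sqrt{n}}
\qquad\qquad
\text{Proportion (Wald):}\quad \hat{p} \;\pm\; 1.96 \cdot \sqrt{\frac{\hat{p}\,(1-\hat{p})}{n}}
```

For small samples the 1.96 in the interval for the mean is typically replaced by the corresponding t quantile with n - 1 degrees of freedom.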

  43. Confidence Interval for %’s Q2. How much precision do you have in terms of estimating the MI rate in the population based on your sample? (Remember, 20 of 40 total had MI) A. A 95% Wilson CI for population MI rate is (35.2%, 64.8%). Thus, if we repeated our study over and over again, each time drawing a sample of 40 patients and computing a 95% CI, approximately 95% of those intervals would contain the true population MI rate at 3 yrs.
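
A minimal sketch comparing the Wald and Wilson (score) intervals for the 20-of-40 example. The function names are hypothetical, and z = 1.959964 is the usual two-sided 95% normal quantile.

```python
import math

Z_95 = 1.959964  # two-sided 95% standard normal quantile

def wald_ci(events, n, z=Z_95):
    """Standard Wald interval: p-hat +/- z * sqrt(p-hat * (1 - p-hat) / n)."""
    p = events / n
    half_width = z * math.sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

def wilson_ci(events, n, z=Z_95):
    """Wilson score interval for a binomial proportion."""
    p = events / n
    denom = 1 + z ** 2 / n
    center = (p + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return center - half_width, center + half_width

wald = wald_ci(20, 40)
wilson = wilson_ci(20, 40)
print([round(100 * x, 1) for x in wald])    # [34.5, 65.5]
print([round(100 * x, 1) for x in wilson])  # [35.2, 64.8]
```

With p-hat at 50% the two intervals are close; they tend to differ more when the proportion is near 0% or 100% or when n is small.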

  44. Confidence Interval for %’s What’s interesting is that there are “lucky” and “unlucky” combinations of p (response rate) and N (sample size). That is, for a given sample size: * for some p you may have higher ability to make inference * for some p you may have less ability! Not to scald the Wald, but not all CI’s are created equal. Paper

  45. Modeling in CDA

  46. Modeling in CDA • Modeling is done with variations of Logistic Regression: • Dichotomous • Ordinal (Proportional odds) • Nominal (Generalized logit) • Conditional (Matched-pairs) • Exact (small sample size/rare outcome) • Longitudinal (GEE, GLMM) • Simple (1 predictor) vs. Multivariable (>1 predictor/adjusted)

  47. Why use adjusted analysis? • Do you think patient demographics or clinical characteristics at baseline would affect MI? • What if half of the patients are all <30 yrs. old and half are all >80 yrs. old? • What are some possible confounders of response? Effect modifiers? • These are testable in adjusted analyses.

  48. You may not need adjusted analysis. • Trials typically have well-defined, specific patient populations of interest. • Thus, inclusion/exclusion criteria might have removed variability from potential confounders • A well-designed, well-executed trial usually does not require intensive and complex analysis.

  49. What is Logistic Regression? • In a nutshell: A statistical method used to model outcomes (typically, but not only, dichotomous or binary ones) using predictor variables. Used when the research question is focused on whether or not an event occurred, rather than when it occurred (time course information is not used).
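
To make the idea concrete, here is a sketch of the simplest case: a logistic regression with one binary predictor (treatment group). In that special case the maximum-likelihood fit has a closed form: the intercept is the log odds in the reference group and the slope is the log odds ratio. The counts are hypothetical (an assumption, not taken from the slides), and real analyses would use statistical software rather than this shortcut.

```python
import math

# Hypothetical counts (assumption): events and non-events per group.
mi_old, no_mi_old = 16, 4    # reference group (Old trt)
mi_new, no_mi_new = 8, 12    # comparison group (New trt)

# Closed-form fit of: log(odds of MI) = intercept + slope * new_trt
intercept = math.log(mi_old / no_mi_old)                   # log odds, reference group
slope = math.log((mi_new / no_mi_new) / (mi_old / no_mi_old))  # log odds ratio
odds_ratio = math.exp(slope)

def predicted_probability(new_trt):
    """Inverse logit of the linear predictor; new_trt is 0 (Old) or 1 (New)."""
    linear_predictor = intercept + slope * new_trt
    return 1 / (1 + math.exp(-linear_predictor))

p_old = predicted_probability(0)
p_new = predicted_probability(1)
print(round(odds_ratio, 3), round(p_old, 2), round(p_new, 2))
```

The predicted probabilities simply recover the observed MI percentage in each group; the model becomes useful once additional predictors are included for adjustment.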
