1 / 50

Applications of Statistics in Research

Applications of Statistics in Research. Bandit Thinkhamrop, Ph.D.(Statistics) Department of Biostatistics and Demography Faculty of Public Health Khon Kaen University. Begin at the conclusion. Type of the study outcome: Key for selecting appropriate statistical methods. Study outcome

renef
Download Presentation

Applications of Statistics in Research

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Applications of Statistics in Research Bandit Thinkhamrop, Ph.D.(Statistics) Department of Biostatistics and Demography Faculty of Public Health Khon Kaen University

  2. Begin at the conclusion

  3. Type of the study outcome: Key for selecting appropriate statistical methods • Study outcome • Dependent variable or response variable • Focus on primary study outcome if there are more • Type of the study outcome • Continuous • Categorical (dichotomous, polytomous, ordinal) • Numerical (Poisson) count • Even-free duration

  4. Continuous outcome • Primary target of estimation: • Mean (SD) • Median (Min:Max) • Correlation coefficient: r and ICC • Modeling: • Linear regression The model coefficient = Mean difference • Quantile regression The model coefficient = Median difference • Example: • Outcome = Weight, BP, score of ?, level of ?, etc. • RQ: Factors affecting birth weight

  5. Categorical outcome • Primary target of estimation : • Proportion or Risk • Modeling: • Logistic regression The model coefficient = Odds ratio(OR) • Example: • Outcome = Disease (y/n), Dead(y/n), cured(y/n), etc. • RQ: Factors affecting low birth weight

  6. Numerical (Poisson) count outcome • Primary target of estimation : • Incidence rate (e.g., rate per person time) • Modeling: • Poisson regression The model coefficient = Incidence rate ratio (IRR) • Example: • Outcome = Total number of falls Total time at risk of falling • RQ: Factors affecting tooth elderly fall

  7. Event-free duration outcome • Primary target of estimation : • Median survival time • Modeling: • Cox regression The model coefficient = Hazard ratio (HR) • Example: • Outcome = Overall survival, disease-free survival, progression-free survival, etc. • RQ: Factors affecting survival

  8. Continuous Categorical Count Survival Mean Median Proportion (Prevalence Or Risk) Rate per “space” Median survival Risk of events at T(t) Linear Reg. Logistic Reg. Poisson Reg. Cox Reg. The outcome determine statistics

  9. Parameter estimation [95%CI] Hypothesis testing [P-value] Statistics quantify errors for judgments

  10. Parameter estimation [95%CI] Hypothesis testing [P-value] Statistics quantify errors for judgments 7

  11. Selection bias Information bias Confounding bias • Research Design • Prevent them • Minimize them Caution about biases

  12. If data available: SB & IB can be assessed CB can be adjusted using multivariable analysis Caution about biases Selection bias (SB) Information bias (IB) Confounding bias (CB)

  13. Generate a mock data set • General format of the data layout

  14. Mean (SD) Generate a mock data set • Continuous outcome example

  15. Common types of the statistical goals • Single measurements (no comparison) • Difference (compared by subtraction) • Ratio (compared by division) • Prediction (diagnostic test or predictive model) • Correlation (examine a joint distribution) • Agreement (examine concordance or similarity between pairs of observations)

  16. Continuous Categorical Count Survival Answer the research question based on lower or upper limit of the CI Back to the conclusion Appropriate statistical methods Mean Median Proportion (Prevalence or Risk) Rate per “space” Median survival Risk of events at T(t) Magnitude of effect 95% CI P-value

  17. Always report the magnitude of effect and its confidence interval • Absolute effects: • Mean, Mean difference • Proportion or prevalence, Rate or risk, Rate or Risk difference • Median survival time • Relative effects: • Relative risk, Rate ratio, Hazard ratio • Odds ratio • Other magnitude of effects: • Correlation coefficient(r), Intra-class correlation (ICC) • Kappa • Diagnostic performance • Etc.

  18. 2+2+0+2+14 = 20 2+2+0+2+14 = 20 = 4 5 5 0 2 2 2 14 Variance = SD2 X X X Standard deviation = SD Touch the variability (uncertainty) to understand statistical inference

  19. Measure of central tendency X X X Measure of variation Touch the variability (uncertainty) to understand statistical inference

  20. Degree of freedom Standard deviation (SD) = The average distant between each data item to their mean

  21. Same mean BUT different variation Heterogeneous data Skew distribution Heterogeneous data Symmetry distribution Homogeneous data Symmetry distribution

  22. Facts about Variation • Because of variability, repeated samples will NOT obtain the same statistic such as mean or proportion: • Statistics varies from study to study because of the role of chance • Hard to believe that the statistic is the parameter • Thus we need statistical inference to estimate the parameter based on the statistics obtained from a study • Data varied widely = heterogeneous data • Heterogeneous data requires large sample size to achieve a conclusive finding

  23. The Histogram

  24. The Frequency Curve

  25. Area Under The Frequency Curve

  26. Right Skew X1 X2 X3 X1 XX Xn Symmetry Left Skew Normally distributed  Central Limit Theorem

  27. X1 X2 X3 X1 XX Xn Distribution of the raw data Distribution of thesampling mean Central Limit Theorem 

  28. X1 XX Xn Distribution of thesampling mean  Large sample (Theoretical) Normal Distribution Central Limit Theorem Distribution of the raw data

  29. X1 XX Xn X Many , , SE X XX  Large sample Standard deviation of the sampling mean Standard error (SE) Estimated by Standardized for whatever n, Mean = 0, Standard deviation = 1 SE = SD n  Central Limit Theorem Many X, , SD

  30. (Theoretical) Normal Distribution

  31. (Theoretical) Normal Distribution

  32. 99.73% of AUC Mean ± 3SD

  33. 95.45% of AUC Mean ± 2SD

  34. 68.26% of AUC Mean ± 1SD

  35. n = 25 X = 52 SD = 5 Population Parameter estimation [95%CI] Hypothesis testing [P-value] Sample

  36. 5 = 1 5 Z = 2.58 Z = 1.96 Z = 1.64

  37. n = 25 X = 52 SD = 5 SE = 1 Sample Z = 2.58 Z = 1.96 Z = 1.64 Population Parameter estimation [95%CI] : 52-1.96(1) to 52+1.96(1) 50.04 to 53.96 We are 95% confidence that the population mean would lie between 50.04 and 53.96

  38. n = 25 X = 52 SD = 5 SE = 1 Hypothesis testing Z = 55 – 52 1 3 Sample Population H0 :  = 55 HA :  55

  39. Z = 55 – 52 1 52 55 -3SE +3SE Hypothesis testing H0 :  = 55 HA :  55 If the true mean in the population is 55, chance to obtain a sample mean of 52 or more extreme is 0.0027. 3 P-value = 1-0.9973 = 0.0027

  40. P-value is the magnitude of chance NOT magnitude of effect • P-value < 0.05 = Significant findings • Small chance of being wrong in rejecting the null hypothesis • If in fact there is no [effect], it is unlikely to get the [effect] = [magnitude of effect] or more extreme • Significance DOES NOT MEAN importance • Any extra-large studies can give a very small P-value even if the [magnitude of effect] is very small

  41. P-value is the magnitude of chance NOT magnitude of effect • P-value > 0.05 = Non-significant findings • High chance of being wrong in rejecting the null hypothesis • If in fact there is no [effect], the [effect] = [magnitude of effect] or more extreme can be occurred chance. • Non-significance DOES NOT MEAN no difference, equal, or no association • Any small studies can give a very large P-value even if the [magnitude of effect] is very large

  42. P-value vs. 95%CI(1) An example of a study with dichotomous outcome A study compared cure rate between Drug A and Drug B Setting: Drug A = Alternative treatment Drug B = Conventional treatment Results: Drug A: n1 = 50, Pa = 80% Drug B: n2 = 50, Pb = 50% Pa-Pb = 30%(95%CI: 26% to 34%; P=0.001)

  43. P-value vs. 95%CI(2) Pa > Pb Pb > Pa Pa-Pb = 30% (95%CI: 26% to 34%; P< 0.05)

  44. P-value vs. 95%CI(3) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99

  45. Tips #6 (b)P-value vs. 95%CI(4) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99 There were statistically significant different between the two groups.

  46. Tips #6 (b)P-value vs. 95%CI(5) Adapted from: Armitage, P. and Berry, G. Statistical methods in medical research. 3rd edition. Blackwell Scientific Publications, Oxford. 1994. page 99 There were no statistically significant different between the two groups.

  47. P-value vs. 95%CI(4) • Save tips: • Always report 95%CI with p-value, NOT report solely p-value • Always interpret based on the lower or upper limit of the confidence interval, p-value can be an optional • Never interpret p-value > 0.05 as an indication of no difference or no association, only the CI can provide this message.

  48. Q & A Thank you

More Related