Biostatistics: Common Methods and Interpretation in Clinical Medicine

This course covers the basic concepts and common statistical methods in biostatistics, with a focus on their interpretation in clinical medicine. Topics include error checking, data transformation, descriptive analysis, hypothesis testing, and more.

ericksonb

Presentation Transcript


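  1. Common Statistical Methods in Clinical Medicine • 郭浩然 (How-Ran Guo) • Institute of Environmental Medicine, College of Medicine, National Cheng Kung University • Department of Occupational and Environmental Medicine, National Cheng Kung University Hospital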

  2. Course Contents • Basic concepts in biostatistics. • Common statistical methods. • Interpretation of statistical tests.

  3. What is statistics? • Statistics: The science whereby inferences are made about a specific random phenomenon on the basis of relatively limited material. • Mathematical statistics. • Applied statistics.

  4. What is biostatistics? The branch of applied statistics that concerns the application of statistical methods to medical and biological problems. • Figures (numbers). • Data.

  5. Evaluation of the Vita-Stat • Vita-Stat is an automatic blood pressure measuring device. • Origin of the problem: A friend of Dr. Rosner had his diastolic pressure measured by a machine as 115 mm Hg several times, and even as high as 130 mm Hg. • A measurement at a clinic was 90 mm Hg.

  6. Basic Steps of Data Analysis • Error checking. • Data transformation. • Descriptive analysis. • Hypothesis testing. • Univariate analysis. • Multi-variate analysis.

  7. Error Checking • Errors. • Measurement error. • Transcription error. • Coding error. • Key-in error. • Error checking. • Frequency table or plot. • Range check. • Logic check. • Repeat measurement.

  8. Variables • Dichotomous data. • Based on count data. • “Yes” vs. “no” in most cases. • Categorical data. • Nominal data. • Ordinal data. • Continuous data.

  9. Data Transformation • Continuous to categorical. • Related to hypothesis. • Natural or biological scale. • Convenience scale. • Equal participants. • Log transformation.

  10. Descriptive Statistics • Measurement of central location. • Mean. • Median. • Mode. • Measurement of spread. • Standard deviation. • Range. • Q1 to Q3. • Coefficient of variation.

  11. Coefficient of Variation 100% x (s/mean) • Relate the arithmetic mean and the standard deviation. • No unit.

  12. Variance and Standard Deviation • Variance: The sum of the squares of the deviations divided by (n-1). • Standard deviation: The square root of variance. • Standard error?
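
The definitions on slides 10 to 12 can be sketched with Python's standard library. The readings below are hypothetical values chosen only for illustration, and the standard error line assumes the usual definition (standard deviation divided by the square root of n), which the slide raises as a question but does not spell out.

```python
import math
import statistics

# Hypothetical diastolic blood pressure readings (mm Hg); illustrative only.
readings = [78, 82, 85, 90, 95]

mean = statistics.mean(readings)           # arithmetic mean
variance = statistics.variance(readings)   # sum of squared deviations / (n - 1)
sd = statistics.stdev(readings)            # square root of the variance
cv = 100 * sd / mean                       # coefficient of variation, in percent (no unit)
se = sd / math.sqrt(len(readings))         # assumed definition: standard error of the mean

print(f"mean={mean}, variance={variance}, sd={sd:.2f}, CV={cv:.1f}%, SE={se:.2f}")
```

The standard deviation describes the spread of individual readings, while the standard error describes the uncertainty of the sample mean, so the latter shrinks as n grows.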

  13. Measurements • Frequency. • Effect. • Association. • Stability.

  14. Measurements of Frequency • Prevalence. • Incidence. • Cumulative incidence. • Mortality. • Case fatality. • Survival. • Ratio.

  15. Measurements of Effect • Absolute risk. • Risk difference. • Regression coefficient of a linear model. • Relative risk. • Rate ratio • Odds ratio. • Risk: cumulative incidence

  16. Fourfold (Two by Two) Table • The ultimate form of data in an analytical epidemiologic study. • By convention, the columns indicate the outcome status and the rows indicate the exposure status.

                   disease (outcome)
      exposure     yes     no      total            PT
      yes          a       b       a+b              PTe
      no           c       d       c+d              PTu
      total        a+c     b+d     T (a+b+c+d)

  17. Cohort Study

                   outcome (disease)
      exposure     yes     no      total            PT
      yes          a       b       a+b              PTe
      no           c       d       c+d              PTu
      total        a+c     b+d     a+b+c+d

      • (incidence) rate difference = a/PTe − c/PTu • risk difference = a/(a+b) − c/(c+d) • risk ratio = [a/(a+b)] ÷ [c/(c+d)] • (incidence) rate ratio = (a/PTe) ÷ (c/PTu)
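
As a rough sketch of how the cohort-study measures are computed from a fourfold table, the function below takes the cell counts a, b, c, d and the person-time totals. The function name, the parameter names, and the example counts are hypothetical, not taken from the slides.

```python
def cohort_measures(a, b, c, d, pt_exposed, pt_unexposed):
    """Effect measures from a cohort-study fourfold table.

    a, b: exposed subjects with / without the outcome
    c, d: unexposed subjects with / without the outcome
    pt_exposed, pt_unexposed: person-time at risk in each group
    """
    risk_exposed = a / (a + b)        # cumulative incidence, exposed
    risk_unexposed = c / (c + d)      # cumulative incidence, unexposed
    rate_exposed = a / pt_exposed     # incidence rate, exposed
    rate_unexposed = c / pt_unexposed # incidence rate, unexposed
    return {
        "risk_difference": risk_exposed - risk_unexposed,
        "risk_ratio": risk_exposed / risk_unexposed,
        "rate_difference": rate_exposed - rate_unexposed,
        "rate_ratio": rate_exposed / rate_unexposed,
    }


# Hypothetical counts and person-years, for illustration only.
m = cohort_measures(a=30, b=70, c=10, d=90, pt_exposed=500, pt_unexposed=1000)
for name, value in m.items():
    print(f"{name}: {value:.3f}")
```

Risk-based measures use the number of subjects as the denominator, whereas rate-based measures use person-time, so the two ratios can differ when follow-up is unequal between groups.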

  18. Case-Control Study

                   disease
      exposure     case    control   total
      yes          a       b         a+b
      no           c       d         c+d
      total        a+c     b+d       a+b+c+d

      • odds ratio = (a/c) ÷ (b/d) = ad/bc

  19. Odds Ratio

                   disease
      exposure     case    control
      yes          a       b
      no           c       d

      • odds = p/(1−p) • exposure odds in cases = [a/(a+c)] ÷ [c/(a+c)] = a/c • exposure odds in controls = [b/(b+d)] ÷ [d/(b+d)] = b/d • exposure odds ratio = (a/c) ÷ (b/d) = ad/bc

  20. Cross-Sectional Study

                   outcome (disease)
      exposure     yes     no      total
      yes          a       b       a+b
      no           c       d       c+d
      total        a+c     b+d     a+b+c+d

      • prevalence rate ratio = [a/(a+b)] ÷ [c/(c+d)] • odds ratio = ad/bc

  21. Odds Ratio vs. Rate Ratio

                   outcome (disease)
      exposure     yes     no      total
      yes          a       b       a+b
      no           c       d       c+d
      total        a+c     b+d     a+b+c+d

      • rate ratio = [a/(a+b)] ÷ [c/(c+d)] • odds ratio = ad/bc • when a << b and c << d, a/(a+b) ≈ a/b and c/(c+d) ≈ c/d, so OR ≈ RR
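
The rare-outcome condition (a much smaller than b, and c much smaller than d) can be illustrated numerically. The two tables below are hypothetical: both have the same ratio of risks, but only in the rare-outcome table does the odds ratio come close to it.

```python
def risk_ratio(a, b, c, d):
    # Ratio of the outcome proportions in exposed vs. unexposed subjects.
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    # Cross-product ratio ad/bc.
    return (a * d) / (b * c)


# Hypothetical fourfold tables (a, b, c, d); illustrative only.
common = (60, 40, 30, 70)    # outcome is common in both groups
rare = (6, 994, 3, 997)      # outcome is rare in both groups

for label, table in (("common", common), ("rare", rare)):
    print(f"{label}: RR={risk_ratio(*table):.3f}, OR={odds_ratio(*table):.3f}")
```

With the common outcome the odds ratio (3.5) overstates the ratio of risks (2.0); with the rare outcome the two nearly coincide.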

  22. R by C Contingency Table

                   Column
      Row          Col1    Col2    . .     total
      R1           C11     C12     . .     Rm1
      R2           C21     C22     . .     Rm2
      . .          . .     . .     . .     . .
      total        Cm1     Cm2     . .     grand total

  23. Linear Regression Y = a + bX • Y: outcome • X: exposure (predictor; continuous or dichotomous) • a: intercept (the baseline risk in many cases) • b: regression coefficient • b indicates the incremental change of Y associated with each one-unit increase in X. • When Y is the risk of an outcome, b indicates the risk difference associated with a one-unit increase in X.
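
A minimal sketch of estimating a and b in Y = a + bX follows. It assumes ordinary least squares as the fitting method, which the slide does not name, and the data points are hypothetical.

```python
def least_squares(xs, ys):
    """Return (a, b) for the line Y = a + bX fitted by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx               # change in Y per one-unit increase in X
    a = mean_y - b * mean_x     # value of Y when X = 0
    return a, b


# Hypothetical exposure (xs) and outcome (ys) values; illustrative only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
a, b = least_squares(xs, ys)
print(f"Y = {a:.1f} + {b:.1f} X")
```

Here each one-unit increase in X is associated with an increase of about 0.6 in Y.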

  24. Logistic Regression logit(p) = a + bX • logit(p) = ln[p/(1−p)] • X: exposure (predictor) • a: intercept (indicating baseline risk in many cases) • b: regression coefficient • e^b indicates the odds ratio associated with each one-unit increase in X.
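
The link between the coefficient b and the odds ratio can be checked with a short sketch. It takes logit(p) to be the natural log of the odds, ln[p/(1−p)]; the coefficient values are hypothetical and no model is actually fitted here.

```python
import math


def logit(p):
    # Natural log of the odds.
    return math.log(p / (1 - p))


def inverse_logit(x):
    # Converts a value on the logit scale back to a probability.
    return 1 / (1 + math.exp(-x))


# Hypothetical intercept and regression coefficient; illustrative only.
a, b = -2.0, 0.7

p0 = inverse_logit(a)        # predicted risk when X = 0
p1 = inverse_logit(a + b)    # predicted risk when X = 1
odds_ratio = (p1 / (1 - p1)) / (p0 / (1 - p0))

print(f"risk at X=0: {p0:.3f}, risk at X=1: {p1:.3f}")
print(f"odds ratio: {odds_ratio:.4f}, exp(b): {math.exp(b):.4f}")
```

The odds ratio for a one-unit increase in X equals exp(b) regardless of the intercept, which is why the coefficient is usually reported on that scale.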

  25. Measurements of Association • All measurements of effect. • Correlation coefficient. • Kappa statistics. (Po-Pe) / (1-Pe) Po=observed probability of concordance Pe=expected probability of concordance

  26. Correlation Coefficient (r) • A value from −1 to 1. • A negative value indicates a negative association, and a positive value indicates a positive association. • |r| < 0.4 indicates a poor correlation. • 0.4 < |r| < 0.75 indicates a fair to good correlation. • |r| > 0.75 indicates an excellent correlation. • 0 indicates no association.

  27. Kappa Statistics (k)

                      2nd survey
      1st survey      yes     no      total
      yes             a       b       a+b
      no              c       d       c+d
      total           a+c     b+d     T

      • a1 = (a+b)/T, a2 = (c+d)/T, b1 = (a+c)/T, b2 = (b+d)/T • Pe = a1×b1 + a2×b2 • Po = (a+d)/T • k = (Po − Pe)/(1 − Pe)
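
The kappa formula can be written out as a short function. The function name and the survey counts below are hypothetical; the arithmetic follows the definitions of Po and Pe on the slide.

```python
def kappa_statistic(a, b, c, d):
    """Kappa for a two-by-two agreement table.

    a: yes on both surveys      b: yes on 1st, no on 2nd
    c: no on 1st, yes on 2nd    d: no on both surveys
    """
    total = a + b + c + d
    p_observed = (a + d) / total                        # Po: observed concordance
    a1, a2 = (a + b) / total, (c + d) / total           # 1st-survey marginals
    b1, b2 = (a + c) / total, (b + d) / total           # 2nd-survey marginals
    p_expected = a1 * b1 + a2 * b2                      # Pe: concordance expected by chance
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical repeat-survey counts; illustrative only.
kappa = kappa_statistic(a=40, b=10, c=5, d=45)
print(f"kappa = {kappa:.2f}")
```

In this example the observed concordance is 0.85 and the chance-expected concordance is 0.50, giving a kappa of 0.70.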

  28. Hypothesis Testing • To test the statistical significance of the study result. • The test is based on observed data, but the hypothesis is formulated before the data collection. • An approach to reach an objective interpretation of data. • Statistical significance does not necessarily correspond to clinical significance.

  29. Types of Hypotheses • Null hypothesis: There is no association between the predictor and the outcome tested. • difference in parameter =0 • relative risk=1 • regression slope=0 • correlation coefficient=0 • Alternative hypothesis: There is an association between the predictor and the outcome tested.

  30. Null vs. Alternative Hypotheses • The null hypothesis is tested. • The alternative hypothesis is not tested directly. • We reject the null hypothesis if the statistical test result is significant, but rejecting the null hypothesis does not by itself mean that we accept the alternative hypothesis. • The validity of the alternative hypothesis is built up by repeated rejection of the null hypothesis.

  31. Types of Hypotheses • One-tailed (sided) hypothesis: specifies the direction of the association • relative risk>1 • relative risk<1 • Two-tailed (sided) hypothesis: does not specify the direction of the association • difference in parameter≠0 • relative risk≠1 • regression slope≠0 • correlation coefficient≠0

  32. One vs. Two Tailed Hypotheses • For the same test statistic, the p value of a two-tailed test is twice that of a one-tailed test. • We should have a scientific basis for conducting a one-tailed test. • Whether a one-tailed or two-tailed test is conducted should also be determined before the testing.
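
The factor of two between one-tailed and two-tailed p values can be seen in a small sketch. It assumes a z test, so that the test statistic follows the standard normal distribution under the null hypothesis; the value of z is hypothetical.

```python
from statistics import NormalDist

z = 1.80  # hypothetical test statistic; illustrative only

standard_normal = NormalDist()  # mean 0, standard deviation 1

# One-tailed: probability of a value at least this large in the specified direction.
one_tailed = 1 - standard_normal.cdf(z)

# Two-tailed: probability of a value at least this extreme in either direction.
two_tailed = 2 * (1 - standard_normal.cdf(abs(z)))

print(f"one-tailed p = {one_tailed:.4f}, two-tailed p = {two_tailed:.4f}")
```

With this statistic the one-tailed p value falls below 0.05 while the two-tailed p value does not, which is why the choice has to be made before testing.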

  33. Types of Statistical Tests • Parametric tests. • Based on assumptions of the distribution of the variables. • May lead to wrong conclusions. • Non-parametric tests. • Not based on assumptions of the distribution of the variables. • Can always be applied. • May lose power. • Less sensitive to extreme values.

  34. Variables • Dichotomous • based on count data • “yes” vs. “no” in most cases • OR and RR • Categorical data • nominal data • ordinal data • Continuous data

  35. Dichotomous Variable • 2 by 2 • R by C • Chi-square test • McNemar’s test (for correlated proportion)

  36. Dichotomous Variable

                      Patients   Controls
      Smokers         60         20
      Non-smokers     40         80
      N = 200

      • Case-Control Study: OR = (60 × 80) ÷ (20 × 40) = 6 • Cohort Study: RR = (60/80) ÷ (40/120) = 2.25 • Chi-square test
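
The numbers on this slide can be reproduced with a few lines of arithmetic. The chi-square part is a sketch under two assumptions the slide does not state: it uses the Pearson statistic without continuity correction, and it obtains the p value for 1 degree of freedom from the complementary error function.

```python
import math

# Counts from the slide: smokers (a, b) and non-smokers (c, d),
# with patients in the first column and controls in the second.
a, b, c, d = 60, 20, 40, 80
n = a + b + c + d

odds_ratio = (a * d) / (b * c)                 # case-control reading of the table
risk_ratio = (a / (a + b)) / (c / (c + d))     # cohort reading of the table

# Pearson chi-square for a 2 by 2 table, no continuity correction (assumption).
chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# For 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"OR={odds_ratio:.2f}, RR={risk_ratio:.2f}, chi2={chi2:.2f}, p={p_value:.2e}")
```

The same table yields different effect estimates depending on the study design, while the chi-square test of association is the same in both readings.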

  37. Categorical • Contingency Table (chi-square test) • Kappa statistics (for reproducibility) • Rank correlation

  38. Continuous • Two groups • two-sample t test • paired t test • correlation coefficient • Multiple groups • ANOVA (analysis of variance)

  39. Non-parametric tests • Chi-square test: Fisher’s exact test • Two-sample t test: Wilcoxon rank sum test • Paired t test: Wilcoxon signed rank test • Pearson correlation coefficient: Spearman rank correlation • ANOVA: Kruskal-Wallis test

  40. Multi-variate (Multiple) Regression • Linear regression • Logistic regression • Poisson regression • Cox proportional hazards models • Adjust for effects of other predictors • Evaluate the effects of more than one predictor at the same time

  41. Right Time to Select Study Groups [Figure: time axis from exposure to outcome, with selection points A, B, C, and D marked.] • Prospective: A • Retrospective: B, C, D • Cross-sectional: B, C (A, D)

  42. Measurements of stability • p value. • Confidence interval.

  43. Normal Distribution • If X follows a normal distribution with a mean of μ and a variance of σ², and Z = (X − μ) ÷ σ, then Z follows a normal distribution with a mean of 0 and a variance of 1; Z ~ N(0, 1). • If x̄ is the sample mean of X from n subjects, then x̄ follows a normal distribution with a mean of μ and a variance of σ²/n; x̄ ~ N(μ, σ²/n). • Z = (x̄ − μ) ÷ √(σ²/n) = (x̄ − μ) ÷ (σ/√n).

  44. t Distribution • When the variance of the underlying normal distribution (σ²) is unknown, it can be estimated by the sample variance s², but the value t = (x̄ − μ) ÷ (s/√n) does not follow a normal distribution. • The value t follows a t distribution with n − 1 degrees of freedom (df = n − 1). • The test on the basis of a t distribution is called “t test.” • The test that uses Z as the test statistic is called “z test.”
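
A one-sample t statistic can be computed as in the sketch below. The sample values and the hypothesized mean are hypothetical; the statistic is the sample mean minus the hypothesized mean, divided by s/√n, with n − 1 degrees of freedom.

```python
import math
import statistics

# Hypothetical diastolic blood pressure readings (mm Hg); illustrative only.
sample = [78, 82, 85, 90, 95]
mu_0 = 80  # hypothetical mean under the null hypothesis

n = len(sample)
sample_mean = statistics.mean(sample)
s = statistics.stdev(sample)       # sample standard deviation, divisor n - 1

t = (sample_mean - mu_0) / (s / math.sqrt(n))
df = n - 1

print(f"t = {t:.3f} with df = {df}")
```

The resulting t is then compared with the critical value of the t distribution for the chosen α and these degrees of freedom.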

  45. Central Limit Theorem • Let X1, …, Xn be a random sample from a population with mean μ and variance σ². Then, for a large n, the sample mean x̄ approximately follows N(μ, σ²/n), even if the underlying distribution of individual observations in the population is not normal.
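
The theorem can be illustrated with a small simulation. The choice of an exponential population (mean 1, variance 1), the sample size, the number of repeats, and the seed are all assumptions made for this sketch.

```python
import math
import random
import statistics

random.seed(0)       # fixed seed so the run is repeatable
n = 50               # observations per sample
repeats = 2000       # number of samples drawn

# Skewed, non-normal population: exponential with mean 1 and variance 1.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(repeats)
]

mean_of_means = statistics.mean(sample_means)
sd_of_means = statistics.stdev(sample_means)

print(f"mean of sample means: {mean_of_means:.3f} (theory: 1.000)")
print(f"SD of sample means:   {sd_of_means:.3f} (theory: {1 / math.sqrt(n):.3f})")
```

Although individual observations are strongly skewed, the sample means cluster around μ with a spread close to σ/√n.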

  46. Critical-value Method

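  47. Critical-value Method • By the symmetry of the t distribution, p(t ≦ t_{n−1, α}) = p(t ≧ t_{n−1, 1−α}); for example, p(t ≦ t_{99, 0.05}) = p(t ≧ t_{99, 0.95}). • From the t table: df = 60, t = 1.671; df = 120, t = 1.658; df = 99, t = X. • Linear interpolation: (1.671 − X) ÷ (X − 1.658) = (99 − 60) ÷ (120 − 99), so X = 1.663. • Therefore t_{99, 0.05} = −1.663.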
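
The interpolation on this slide can be reproduced as follows. The helper function is a hypothetical name; the two tabulated values (1.671 at 60 and 1.658 at 120) are the ones given on the slide.

```python
def interpolate(x, x0, y0, x1, y1):
    """Linear interpolation of y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Tabulated upper 5% critical values from the slide.
t_upper = interpolate(99, 60, 1.671, 120, 1.658)

# By symmetry, the lower 5% critical value is the negative of the upper one.
t_lower = -t_upper

print(f"upper critical value: {t_upper:.5f}, lower critical value: {t_lower:.5f}")
```

The interpolated value is about 1.66255, which the slide reports to three decimals as 1.663. Linear interpolation is only an approximation, so a value read from software or a fuller table may differ slightly in the third decimal.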

  48. The p Value • The α level at which we would be indifferent between accepting and rejecting Ho given the sample data at hand. • The α level at which the given value of the test statistic would be on the borderline between the acceptance and rejection regions. • The probability of obtaining a test statistic as extreme as or more extreme than the one actually obtained, given that the null hypothesis is true.

  49. The p Value • It is a measurement of the statistical significance of the test, but not the clinical significance. • It is a measurement of the stability of the point estimate, not the magnitude of the effect. • Direct comparisons between p values have little value.
