This course covers the basic concepts and common statistical methods in biostatistics, with a focus on their interpretation in clinical medicine. Topics include error checking, data transformation, descriptive analysis, hypothesis testing, and more.
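Common Statistical Methods in Clinical Medicine • 郭浩然 • Institute of Environmental Medicine, College of Medicine, National Cheng Kung University • Department of Occupational and Environmental Medicine, National Cheng Kung University Hospital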
Course Contents • Basic concepts in biostatistics. • Common statistical methods. • Interpretation of statistical tests.
What is statistics? • Statistics: The science whereby inferences are made about specific random phenomena on the basis of relatively limited material. • Mathematical statistics. • Applied statistics.
What is biostatistics? The branch of applied statistics that concerns the application of statistical methods to medical and biological problems. • Figures (numbers). • Data.
Evaluation of the Vita-Stat • Vita-Stat is an automatic blood pressure measuring device. • Origin of the problem: A friend of Dr. Rosner had his diastolic pressure measured by a machine as 115 mm Hg several times, and even as high as 130 mm Hg. • A measurement at a clinic was 90 mm Hg.
Basic Steps of Data Analysis • Error checking. • Data transformation. • Descriptive analysis. • Hypothesis testing. • Univariate analysis. • Multi-variate analysis.
Error Checking • Errors. • Measurement error. • Transcription error. • Coding error. • Key-in error. • Error checking. • Frequency table or plot. • Range check. • Logic check. • Repeat measurement.
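A range check and a logic check can be sketched in a few lines. The records, field names (`age`, `sex`, `pregnant`), and the plausible age range of 0 to 120 below are illustrative assumptions, not data from the course.

```python
# Hypothetical records; field names and limits are illustrative assumptions.
records = [
    {"id": 1, "age": 34, "sex": "F", "pregnant": True},
    {"id": 2, "age": 340, "sex": "M", "pregnant": False},  # likely key-in error
    {"id": 3, "age": 51, "sex": "M", "pregnant": True},    # impossible combination
]


def range_check(rows, field, low, high):
    """Return the ids of rows whose value lies outside [low, high]."""
    return [r["id"] for r in rows if not (low <= r[field] <= high)]


def logic_check(rows):
    """Return the ids of rows with a logically impossible combination."""
    return [r["id"] for r in rows if r["sex"] == "M" and r["pregnant"]]


out_of_range = range_check(records, "age", 0, 120)
illogical = logic_check(records)
print("Out of range:", out_of_range)
print("Failed logic check:", illogical)
```

Flagged records would then be compared against the source documents or remeasured rather than silently corrected.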
Variables • Dichotomous data. • Based on count data. • “Yes” vs. “no” in most cases. • Categorical data. • Nominal data. • Ordinal data. • Continuous data.
Data Transformation • Continuous to categorical. • Related to hypothesis. • Natural or biological scale. • Convenience scale. • Equal participants. • Log transformation.
Descriptive Statistics • Measurement of central location. • Mean. • Median. • Mode. • Measurement of spread. • Standard deviation. • Range. • Q1 to Q3. • Coefficient of variation.
Coefficient of Variation • CV = 100% × (s / mean). • Relates the arithmetic mean and the standard deviation. • No unit.
Variance and Standard Deviation • Variance: The sum of the squares of the deviations divided by (n-1). • Standard deviation: The square root of variance. • Standard error?
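The descriptive measures above can be computed with Python's standard `statistics` module. This is a minimal sketch; the blood pressure readings are hypothetical, and the standard error is taken here as s/√n, the standard deviation of the sample mean.

```python
import math
import statistics

# Hypothetical diastolic blood pressure readings (mm Hg).
values = [78, 82, 85, 90, 90, 95, 110]

n = len(values)
mean = statistics.mean(values)
median = statistics.median(values)
mode = statistics.mode(values)
variance = statistics.variance(values)  # sum of squared deviations / (n - 1)
sd = statistics.stdev(values)           # square root of the variance
cv = 100 * sd / mean                    # coefficient of variation, in percent
se = sd / math.sqrt(n)                  # standard error of the mean

print(f"mean={mean}, median={median}, mode={mode}")
print(f"variance={variance:.2f}, sd={sd:.2f}, cv={cv:.1f}%, se={se:.2f}")
```

The standard deviation describes the spread of individual observations, while the standard error describes the precision of the sample mean and shrinks as n grows.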
Measurements • Frequency. • Effect. • Association. • Stability.
Measurements of Frequency • Prevalence. • Incidence. • Cumulative incidence. • Mortality. • Case fatality. • Survival. • Ratio.
Measurements of Effect • Absolute risk. • Risk difference. • Regression coefficient of a linear model. • Relative risk. • Rate ratio • Odds ratio. • Risk: cumulative incidence
Fourfold (Two by Two) Table • The ultimate form of data in an analytical epidemiologic study. • By convention, the columns indicate the outcome status and the rows indicate the exposure status.
                 disease (outcome)
exposure     yes     no      total     PT
yes          a       b       a+b       PTe
no           c       d       c+d       PTu
total        a+c     b+d     T (= a+b+c+d)
Cohort Study
                 outcome (disease)
exposure     yes     no      total     PT
yes          a       b       a+b       PTe
no           c       d       c+d       PTu
total        a+c     b+d     a+b+c+d
• (incidence) rate difference = a/PTe − c/PTu • risk difference = a/(a+b) − c/(c+d) • risk ratio = [a/(a+b)] / [c/(c+d)] • (incidence) rate ratio = (a/PTe) / (c/PTu)
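The cohort measures can be sketched as a small function, with the risk ratio computed as [a/(a+b)] ÷ [c/(c+d)] and the rate ratio as (a/PTe) ÷ (c/PTu). The counts and person-time values are hypothetical, and PTe and PTu are assumed to be person-time at risk in the exposed and unexposed groups.

```python
def cohort_measures(a, b, c, d, pt_exposed, pt_unexposed):
    """Effect measures for a cohort study laid out as a fourfold table.

    a, b: exposed subjects with / without the outcome
    c, d: unexposed subjects with / without the outcome
    pt_exposed, pt_unexposed: person-time at risk (assumed meaning of PTe, PTu)
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    rate_exposed = a / pt_exposed
    rate_unexposed = c / pt_unexposed
    return {
        "risk_difference": risk_exposed - risk_unexposed,
        "risk_ratio": risk_exposed / risk_unexposed,
        "rate_difference": rate_exposed - rate_unexposed,
        "rate_ratio": rate_exposed / rate_unexposed,
    }


# Hypothetical cohort: 30 of 100 exposed and 10 of 100 unexposed develop disease.
result = cohort_measures(a=30, b=70, c=10, d=90,
                         pt_exposed=400.0, pt_unexposed=450.0)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```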
Case-Control Study
                 disease
exposure     case    control   total
yes          a       b         a+b
no           c       d         c+d
total        a+c     b+d       a+b+c+d
• odds ratio = (a/c) / (b/d) = ad/bc
Odds Ratio
                 disease
exposure     case    control
yes          a       b
no           c       d
• odds = p / (1 − p) • exposure odds in cases = [a/(a+c)] / [c/(a+c)] = a/c • exposure odds in controls = b/d • exposure odds ratio = (a/c) / (b/d) = ad/bc
Cross-Sectional Study
                 outcome (disease)
exposure     yes     no      total
yes          a       b       a+b
no           c       d       c+d
total        a+c     b+d     a+b+c+d
• prevalence rate ratio = [a/(a+b)] / [c/(c+d)] • odds ratio = ad/bc
Odds Ratio vs. Rate Ratio
                 outcome (disease)
exposure     yes     no      total
yes          a       b       a+b
no           c       d       c+d
total        a+c     b+d     a+b+c+d
• rate ratio = [a/(a+b)] / [c/(c+d)] • odds ratio = ad/bc • When a << b and c << d, a+b ≈ b and c+d ≈ d, so OR ≈ RR.
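A short numerical comparison illustrates why the odds ratio approximates the ratio [a/(a+b)] ÷ [c/(c+d)] only when the outcome is rare. Both tables below are hypothetical.

```python
def risk_ratio(a, b, c, d):
    """[a/(a+b)] / [c/(c+d)]: ratio of outcome proportions, exposed vs. unexposed."""
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """ad/bc: ratio of outcome odds, exposed vs. unexposed."""
    return (a * d) / (b * c)


# Hypothetical tables given as (a, b, c, d).
rare = (10, 9990, 5, 9995)     # outcome affects about 0.1% of subjects
common = (600, 400, 300, 700)  # outcome affects 30% to 60% of subjects

for label, table in (("rare", rare), ("common", common)):
    print(f"{label}: RR={risk_ratio(*table):.3f}, OR={odds_ratio(*table):.3f}")
```

In the rare-outcome table the two measures agree to about two decimal places; in the common-outcome table the odds ratio (3.5) clearly overstates the ratio of proportions (2.0).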
R by C Contingency Table
             Column
Row      Col 1    Col 2    . . .    total
R1       C11      C12      . . .    Rm1
R2       C21      C22      . . .    Rm2
. . .    . . .    . . .    . . .    . . .
total    Cm1      Cm2      . . .    grand total
Linear Regression • Y = a + bX • Y: outcome. • X: exposure (predictor; continuous or dichotomous). • a: intercept (the baseline risk in many cases). • b: regression coefficient. • Indicates the incremental change of Y associated with each one-unit increase in X. • When Y is the risk of an outcome, b indicates the risk difference associated with a one-unit increase in X.
Logistic Regression • logit(p) = a + bX • logit(p) = ln[p/(1−p)] • X: exposure (predictor). • a: intercept (indicating baseline risk in many cases). • b: regression coefficient. • e^b indicates the odds ratio associated with each one-unit increase in X.
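With a single dichotomous predictor, the fitted logistic model reproduces the observed proportions, so the coefficients can be computed by hand. The sketch below assumes a hypothetical 2 by 2 table, takes the logit as the natural log of the odds, and shows that e^b equals the odds ratio ad/bc. The names `intercept` and `slope` stand for a and b in the model.

```python
import math


def logit(p):
    """Natural log of the odds p / (1 - p)."""
    return math.log(p / (1 - p))


# Hypothetical counts: rows are exposed / unexposed, columns are outcome yes / no.
a, b = 60, 20   # exposed: with outcome, without outcome
c, d = 40, 80   # unexposed: with outcome, without outcome

p_exposed = a / (a + b)
p_unexposed = c / (c + d)

intercept = logit(p_unexposed)               # log odds when X = 0
slope = logit(p_exposed) - logit(p_unexposed)  # change in log odds from X = 0 to X = 1

odds_ratio_from_model = math.exp(slope)
odds_ratio_from_table = (a * d) / (b * c)
print(f"e^b = {odds_ratio_from_model:.3f}, ad/bc = {odds_ratio_from_table:.3f}")
```

For continuous or multiple predictors the coefficients have no closed form and are estimated by maximum likelihood, but e^b keeps the same interpretation.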
Measurements of Association • All measurements of effect. • Correlation coefficient. • Kappa statistic: k = (Po − Pe) / (1 − Pe), where Po = observed probability of concordance and Pe = expected probability of concordance.
Correlation Coefficient (r) • A value from −1 to 1. • A negative value indicates a negative association, and a positive value indicates a positive association. • |r| < 0.4 indicates a poor correlation. • 0.4 < |r| < 0.75 indicates a fair to good correlation. • |r| > 0.75 indicates an excellent correlation. • 0 indicates no linear association.
Kappa Statistics (k)
                 2nd survey
1st survey   yes     no      total
yes          a       b       a+b
no           c       d       c+d
total        a+c     b+d     T
• a1 = (a+b)/T, a2 = (c+d)/T, b1 = (a+c)/T, b2 = (b+d)/T • Pe = a1·b1 + a2·b2 • Po = (a+d)/T • k = (Po − Pe) / (1 − Pe)
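The kappa calculation follows directly from the definitions of Po and Pe. A minimal sketch with hypothetical survey counts:

```python
def kappa(a, b, c, d):
    """Kappa for two yes/no surveys of the same subjects.

    a: yes on both surveys      b: yes on 1st, no on 2nd
    c: no on 1st, yes on 2nd    d: no on both surveys
    """
    total = a + b + c + d
    p_observed = (a + d) / total
    a1, a2 = (a + b) / total, (c + d) / total  # 1st survey: yes, no
    b1, b2 = (a + c) / total, (b + d) / total  # 2nd survey: yes, no
    p_expected = a1 * b1 + a2 * b2
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical counts: 85 of 100 subjects gave the same answer twice.
k = kappa(a=40, b=10, c=5, d=45)
print(f"kappa = {k:.2f}")
```

Here the observed concordance is 0.85, but half of that agreement is expected by chance alone (Pe = 0.5), so kappa is 0.7.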
Hypothesis Testing • To test the statistical significance of the study result. • The test is based on the observed data, but the hypothesis is formulated before data collection. • An approach to reach an objective interpretation of data. • Statistical significance does not necessarily correspond to clinical significance.
Types of Hypotheses • Null hypothesis: There is no association between the predictor and the outcome tested. • difference in parameter =0 • relative risk=1 • regression slope=0 • correlation coefficient=0 • Alternative hypothesis: There is an association between the predictor and the outcome tested.
Null vs. Alternative Hypotheses • The null hypothesis is tested. • The alternative hypothesis is not tested directly. • We reject the null hypothesis if the statistical test result is significant, but rejecting the null hypothesis does not by itself prove the alternative hypothesis. • The validity of the alternative hypothesis is built up by repeated rejection of the null hypothesis.
Types of Hypotheses • One-tailed (sided) hypothesis: specifies the direction of the association. • relative risk>1 • relative risk<1 • Two-tailed (sided) hypothesis: does not specify the direction of the association. • difference in parameter≠0 • relative risk≠1 • regression slope≠0 • correlation coefficient≠0
One vs. Two Tailed Hypotheses • In a two-tailed test, the p value is twice that of a one-tailed test for the same test statistic. • We should have a scientific basis for conducting a one-tailed test. • Whether a one-tailed or a two-tailed test is used should also be determined before the testing.
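For a test statistic that follows a symmetric distribution, the two-tailed p value is twice the one-tailed p value. A minimal sketch for a z statistic, using `math.erfc` from the standard library to obtain standard normal tail probabilities; the value z = 1.96 is only an example.

```python
import math


def p_one_tailed(z):
    """Upper-tail probability P(Z >= z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def p_two_tailed(z):
    """Two-sided probability P(|Z| >= |z|) for a standard normal Z."""
    return math.erfc(abs(z) / math.sqrt(2))


z = 1.96  # example test statistic
print(f"one-tailed p = {p_one_tailed(z):.4f}")
print(f"two-tailed p = {p_two_tailed(z):.4f}")
```

A result with z = 1.96 is therefore borderline at α = 0.05 in a two-tailed test but comfortably significant in a one-tailed test, which is why the choice has to be justified and fixed in advance.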
Types of Statistical Tests • Parametric tests. • Based on assumptions about the distribution of the variables. • May lead to wrong conclusions when the assumptions do not hold. • Non-parametric tests. • Not based on assumptions about the distribution of the variables. • Can always be applied. • May lose power. • Less sensitive to extreme values.
Variables • Dichotomous • based on count data • “yes” vs. “no” in most cases • OR and RR • Categorical data • nominal data • ordinal data • Continuous data
Dichotomous Variable • 2 by 2 • R by C • Chi-square test • McNemar’s test (for correlated proportion)
Dichotomous Variable
                 Patients    Controls
Smokers          60          20
Non-Smokers      40          80
N = 200
• Case-Control Study: OR = (60 × 80) / (20 × 40) = 6 • Cohort Study: RR = (60/80) ÷ (40/120) = 2.25 • Chi-square test
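The figures on this slide can be reproduced as follows. The chi-square statistic is sketched here as the Pearson statistic without continuity correction (an assumption; the slide does not say which variant is intended), and its p value uses the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic (no continuity correction) for a 2 by 2 table."""
    total = a + b + c + d
    observed = [[a, b], [c, d]]
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


# Counts from the slide: smokers (60 patients, 20 controls),
# non-smokers (40 patients, 80 controls).
a, b, c, d = 60, 20, 40, 80

odds_ratio = (a * d) / (b * c)
relative_risk = (a / (a + b)) / (c / (c + d))
chi2 = chi_square_2x2(a, b, c, d)
p_value = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df

print(f"OR = {odds_ratio:.2f}, RR = {relative_risk:.2f}")
print(f"chi-square = {chi2:.2f}, p = {p_value:.2e}")
```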
Categorical • Contingency Table (chi-square test) • Kappa statistics (for reproducibility) • Rank correlation
Continuous • Two groups • two-sample t test • paired t test • correlation coefficient • Multiple groups • ANOVA (analysis of variance)
Non-parametric tests • Chi-square test: Fisher’s exact test • Two-sample t test: Wilcoxon rank sum test • Paired t test: Wilcoxon signed-rank test • Pearson correlation coefficient: Spearman rank correlation • ANOVA: Kruskal-Wallis test
Multi-variate (Multiple) Regression • Linear regression • Logistic regression • Poisson regression • Cox proportional hazards models • Adjust for effects of other predictors • Evaluate the effects of more than one predictor at the same time
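Right Time to Select Study Groups • Prospective: A. • Retrospective: B, C, D. • Cross-sectional: B, C (A, D). [Figure: time axis running from exposure to outcome, with points A, B, C, and D marking when the study groups are selected.]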
Measurements of stability • p value. • Confidence interval.
Normal Distribution • If X follows a normal distribution with a mean of μ and a variance of σ², and Z = (X − μ) ÷ σ, then Z follows a normal distribution with a mean of 0 and a variance of 1; Z ~ N(0, 1). • If x̄ is the sample mean of X from n subjects, then x̄ follows a normal distribution with a mean of μ and a variance of σ²/n; x̄ ~ N(μ, σ²/n). • Z = (x̄ − μ) ÷ √(σ²/n) = (x̄ − μ) ÷ (σ/√n).
t Distribution • When the variance of the underlying normal distribution (σ²) is unknown, it can be estimated by the sample variance s², but the value t = (x̄ − μ) ÷ (s/√n) does not follow a normal distribution. • The value t follows a t distribution with n − 1 degrees of freedom (df = n − 1). • The test on the basis of a t distribution is called the “t test.” • The test that uses Z as the test statistic is called the “z test.”
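A one-sample t statistic, t = (x̄ − μ) ÷ (s/√n) with n − 1 degrees of freedom, can be sketched as follows. The readings and the hypothesized mean of 80 mm Hg are illustrative assumptions.

```python
import math
import statistics

# Hypothetical diastolic blood pressure readings (mm Hg).
readings = [78, 82, 85, 90, 90, 95, 110]
mu_0 = 80  # hypothesized population mean (assumption for illustration)

n = len(readings)
x_bar = statistics.mean(readings)
s = statistics.stdev(readings)           # sample standard deviation, divisor n - 1
t = (x_bar - mu_0) / (s / math.sqrt(n))  # test statistic
df = n - 1                               # degrees of freedom

print(f"t = {t:.3f} with {df} degrees of freedom")
```

The statistic is then compared with the critical value of the t distribution with 6 degrees of freedom from a t table.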
Central Limit Theorem • Let X1, …, Xn be a random sample from a population with mean μ and variance σ². Then, for large n, the sample mean x̄ approximately follows N(μ, σ²/n), even if the underlying distribution of individual observations in the population is not normal.
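A small simulation illustrates the theorem. The sketch draws samples from an exponential distribution with mean 1 and variance 1, which is strongly skewed; the sample size, number of replicates, and seed are arbitrary choices for illustration.

```python
import random
import statistics

random.seed(0)     # fixed seed so the run is reproducible

n = 50             # observations per sample
replicates = 2000  # number of samples drawn

# Each entry is the mean of one sample of n exponential observations.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(n)])
    for _ in range(replicates)
]

mean_of_means = statistics.mean(sample_means)
var_of_means = statistics.variance(sample_means)

print(f"mean of sample means:     {mean_of_means:.3f} (theory: 1.000)")
print(f"variance of sample means: {var_of_means:.4f} (theory: {1 / n:.4f})")
```

A histogram of `sample_means` would look close to a bell curve even though the individual observations are far from normally distributed.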
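Critical-value Method • p(t ≦ t(n−1, α)) = p(t ≧ t(n−1, 1−α)); for example, p(t ≦ t(99, 0.05)) = p(t ≧ t(99, 0.95)). • From the t table: df = 60, t = 1.671; df = 120, t = 1.658; df = 99, t = X. • Linear interpolation: (1.671 − X) ÷ (X − 1.658) = (99 − 60) ÷ (120 − 99), so X = 1.663. • t(99, 0.05) = −1.663.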
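The interpolation on this slide is ordinary linear interpolation between two tabulated critical values. A minimal sketch using the slide's table entries (the helper name `interpolate` is illustrative):

```python
def interpolate(x, x0, y0, x1, y1):
    """Linearly interpolate y at x between the points (x0, y0) and (x1, y1)."""
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Tabulated upper 5% critical values: 1.671 at 60 df and 1.658 at 120 df.
t_99 = interpolate(99, 60, 1.671, 120, 1.658)

print(f"t(99, 0.95) is approximately {t_99:.3f}")
print(f"t(99, 0.05) is approximately {-t_99:.3f}")
```

Linear interpolation is only an approximation, since critical values do not change linearly with the degrees of freedom; statistical software gives the exact value directly.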
The p Value • The α level at which we would be indifferent between accepting and rejecting H0 given the sample data at hand. • The α level at which the given value of the test statistic would be on the borderline between the acceptance and rejection regions. • The probability of obtaining a test statistic as extreme as or more extreme than the one actually observed, given that the null hypothesis is true.
The p Value • It is a measurement of the statistical significance of the test, but not the clinical significance. • It is a measurement of the stability of the point estimate, not the magnitude of the effect. • Direct comparisons between p values have little value.