
Medical Biometry I

Medical Biometry I. (Biostatistics 511) Discussion Section Week 9 Phillip Keung. Discussion Outline. Brief Review on Two Definitions: Rejection Region (or equivalently, Hypothesis Testing) Confidence Interval (CI) Two-sample t Test: Paired t Test and the corresponding CI


Presentation Transcript


  1. Medical Biometry I (Biostatistics 511) Discussion Section Week 9 Phillip Keung Biostat 511

  2. Discussion Outline
     • Brief Review on Two Definitions:
       • Rejection Region (or equivalently, Hypothesis Testing)
       • Confidence Interval (CI)
     • Two-sample t Test:
       • Paired t Test and the corresponding CI
       • Unpaired t Test and the corresponding CI
     • Binomial Proportion:
       • One-sample test and the CI
       • Two-sample (unpaired) test and the CI

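  4. Hypothesis Testing
     • The main goal is to determine a Rejection Region.
     • Given a null hypothesis, the Rejection Region is the set of observed values for which we should reject the null hypothesis.
     • E.g.
       • We observed the sample mean $\bar{X}$ of $n$ observations.
       • We are given that, under the null hypothesis $H_0: \mu = \mu_0$, $\frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ is approximately N(0, 1) distributed, and $\sigma$ is known.
       • For a two-sided test at level $\alpha = 0.05$, we would reject the null hypothesis if the observed $\bar{X} < \mu_0 - 1.96\,\sigma/\sqrt{n}$ or $\bar{X} > \mu_0 + 1.96\,\sigma/\sqrt{n}$ (i.e. the Rejection Region).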
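As a rough illustration of the rejection region for a sample mean with known standard deviation, the following sketch computes the two critical values and checks an observed mean against them. The function names and the numbers (null mean 100, standard deviation 15, n = 25) are hypothetical and not taken from the slides.

```python
from math import sqrt
from statistics import NormalDist


def critical_values(mu0, sigma, n, alpha=0.05):
    """Return (lower, upper); H0 is rejected when the sample mean lies outside."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    half_width = z * sigma / sqrt(n)
    return mu0 - half_width, mu0 + half_width


def reject_null(xbar, mu0, sigma, n, alpha=0.05):
    """True when the observed mean falls in the two-sided rejection region."""
    lower, upper = critical_values(mu0, sigma, n, alpha)
    return xbar < lower or xbar > upper


# Hypothetical numbers: null mean 100, known sigma 15, n = 25 observations.
lower, upper = critical_values(100, 15, 25)
print(round(lower, 2), round(upper, 2))
print(reject_null(107, 100, 15, 25))
```

With these made-up inputs, an observed mean of 107 lies above the upper critical value, so the null hypothesis would be rejected at the 5% level.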

  5. Confidence Interval
     • The CI does not require a null hypothesis.
     • Given the observed value (e.g. average of n observations), the CI gives an interval estimate for the parameter of interest (e.g. mean).
     • E.g.
       • We are interested in a parameter $\mu$.
       • We observed $\bar{X}$.
       • We are given that $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ is (approximately) N(0, 1) distributed, and $\sigma$ is known.
       • Then the 95% CI for $\mu$ is $\left(\bar{X} - 1.96\,\sigma/\sqrt{n},\ \bar{X} + 1.96\,\sigma/\sqrt{n}\right)$.
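A minimal sketch of the known-variance confidence interval for a mean, assuming the normal approximation described on this slide. The function name and the example numbers (sample mean 107, standard deviation 15, n = 25) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(xbar, sigma, n, level=0.95):
    """Interval estimate for the mean when sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sigma / sqrt(n)
    return xbar - half_width, xbar + half_width


# Hypothetical numbers: observed mean 107, known sigma 15, n = 25.
low, high = z_confidence_interval(107, 15, 25)
print(round(low, 2), round(high, 2))
```

Note that no null value appears anywhere in the calculation: the interval is centered on the observed mean.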

  6. Intuitive Comparison between Rejection Region and CI
     • For a hypothesis test, we use the null value, $\mu_0$, to judge whether the sample estimate of the mean is “extreme”, given that the null hypothesis is true. We “center” our interval about the null value, $\mu_0$.
     • For a confidence interval, we do not necessarily have a “null value” when we construct the interval, so we use our best estimate of the parameter (i.e., $\bar{X}$) to center our interval.
     • For a two-sided $\alpha$-level hypothesis test (remember the alternative hypothesis!) and a two-sided $100(1-\alpha)\%$ confidence interval, the length of the intervals will be the same!
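The comparison can be written out for the known-variance case. The notation below ($z_{1-\alpha/2}$ for the standard normal quantile, $\sigma$ for the known standard deviation, $n$ for the sample size) is an assumption chosen for this illustration.

```latex
\[
\underbrace{\left(\mu_0 - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\;
                  \mu_0 + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right)}_{\text{do not reject } H_0 \text{ if } \bar{X} \text{ falls here}}
\qquad
\underbrace{\left(\bar{X} - z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}},\;
                  \bar{X} + z_{1-\alpha/2}\frac{\sigma}{\sqrt{n}}\right)}_{100(1-\alpha)\%\ \text{CI for } \mu}
\]
\[
\text{length of each interval} = 2\, z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}}
\]
```

Because the two intervals have the same half-width and differ only in their center, $\mu_0$ lies inside the CI exactly when $\bar{X}$ lies outside the rejection region.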

  8. Two-sample t-tests: Paired t-test
     • We have two sets of observations. The first set (of size $n_1$) is a random sample from some population on a characteristic, X. The second set (of size $n_2$) is a random sample from another population on the same characteristic, X.
     • The two sets of data are paired. That is, the ith observation in the first set is paired with the ith observation in the second set (e.g. two measurements on the same patient). Note that the sample size is the same in both sets due to the pairing ($n_1 = n_2 = n$).
     • We are interested in comparing the means of the two populations. We typically wish to investigate one of the two items below:
       • $H_0: \mu_1 = \mu_2$, or equivalently $H_0: \mu_1 - \mu_2 = 0$
       • CI: estimate $\mu_1 - \mu_2$.

  9. Two-sample t-test: Paired t-test
     • Hypothesis test: $H_0: \mu_1 - \mu_2 = 0$
     • Let $X_{1i}$ be the ith observation in the first dataset and $X_{2i}$ the ith observation in the second dataset. We first compute the differences $d_i = X_{1i} - X_{2i}$. Then
       • $\bar{d}$ = average of the $d_i$, and $s_d^2$ = sample variance of the $d_i$.
     • If we assume that, under the null hypothesis, the statistic $T = \frac{\bar{d}}{s_d/\sqrt{n}}$ has a t-distribution with df = n – 1, then we can perform our paired two-sample hypothesis test using the one-sample t-test on these differences.
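The reduction of the paired test to a one-sample test on the differences can be sketched as follows. The function name and the before/after measurements are hypothetical, and the critical value 2.776 is the 97.5% t quantile for 4 degrees of freedom taken from a standard t table rather than computed in the code.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (t, df) for H0: mean difference = 0, using paired data."""
    if len(first) != len(second):
        raise ValueError("paired data need equal sample sizes")
    d = [x1 - x2 for x1, x2 in zip(first, second)]  # within-pair differences
    n = len(d)
    t = mean(d) / (stdev(d) / sqrt(n))  # one-sample t statistic on d
    return t, n - 1


# Hypothetical before/after measurements on five patients.
before = [140, 152, 138, 147, 160]
after = [135, 150, 136, 140, 151]
t, df = paired_t_statistic(before, after)
# Two-sided 5% test: compare |t| with the tabulated quantile for df = 4.
print(round(t, 3), df, abs(t) > 2.776)
```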

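  10. Two-sample t-test: Paired t-test
      • 95% CI for $\mu_1 - \mu_2$
      • Consider it as a one-sample case with the sample of differences $d_i$.
      • The 95% CI is $\left(\bar{d} - t_{n-1,\,0.975}\, s_d/\sqrt{n},\ \bar{d} + t_{n-1,\,0.975}\, s_d/\sqrt{n}\right)$, where $t_{n-1,\,0.975}$ is the 97.5% quantile of a t-distribution with n – 1 degrees of freedom.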
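A small sketch of the paired confidence interval, treating the differences as a single sample. The function name and data are hypothetical, and the t quantile is passed in as an argument (2.776 for 4 degrees of freedom, from a t table) because the Python standard library has no t distribution.

```python
from math import sqrt
from statistics import mean, stdev


def paired_mean_ci(first, second, t_crit):
    """CI for the mean difference; t_crit is the t quantile for df = n - 1."""
    d = [x1 - x2 for x1, x2 in zip(first, second)]
    half_width = t_crit * stdev(d) / sqrt(len(d))
    return mean(d) - half_width, mean(d) + half_width


# Hypothetical before/after measurements on five patients (df = 4).
before = [140, 152, 138, 147, 160]
after = [135, 150, 136, 140, 151]
low, high = paired_mean_ci(before, after, t_crit=2.776)
print(round(low, 2), round(high, 2))
```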

  12. Two-sample t-test: Unpaired t-test
      • We have two independent sets of observations. The first set consists of a random sample of size $n_1$ from one population and the second set consists of a random sample of size $n_2$ from another population. Note that the sample size is not required to be the same in the two samples.
      • The samples are (approximately) normally distributed OR the sample size is reasonably large in each dataset.
      • We are typically interested in comparing the means for the two populations.
        • Hypothesis test: $H_0: \mu_1 - \mu_2 = 0$, or equivalently $H_0: \mu_1 = \mu_2$
        • CI: an estimate for $\mu_1 - \mu_2$.

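  13. Two-sample t Test: Unpaired t Test
      • Hypothesis test: $H_0: \mu_1 - \mu_2 = 0$?
      • We assume that $\bar{X}_1$ and $\bar{X}_2$ are approximately normal and that $\bar{X}_1$ and $\bar{X}_2$ are independent.
      • Recall that sums of independent normal rv’s are normal with mean equal to the sum of the component means and variance equal to the sum of the component variances.
      • Hence, $\bar{X}_1 - \bar{X}_2 \approx N\!\left(\mu_1 - \mu_2,\ \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}\right)$.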

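  14. Two-sample t-test: Unpaired t-test
      • Hypothesis test: $H_0: \mu_1 - \mu_2 = 0$?
      • Test statistic: $T = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$, where $s_1^2$ and $s_2^2$ are the sample variances.
      • For $\alpha = 0.05$, reject the null hypothesis if $T < -t_{df,\,0.975}$ or if $T > t_{df,\,0.975}$, where $t_{df,\,0.975}$ is the 97.5% quantile of a t-distribution with some degrees of freedom, df.
      • We will let Stata calculate the df for us.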
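For readers curious about what the software does, here is a sketch of the unequal-variances t statistic together with the Satterthwaite approximation for the degrees of freedom. The slides do not state which df formula is used, so the Satterthwaite formula is an assumption (it is one common choice), and the function name and data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def unequal_variance_t(sample1, sample2):
    """Return (t, df) for H0: mu1 - mu2 = 0 without assuming equal variances."""
    n1, n2 = len(sample1), len(sample2)
    v1 = variance(sample1) / n1  # estimated variance of the first sample mean
    v2 = variance(sample2) / n2  # estimated variance of the second sample mean
    t = (mean(sample1) - mean(sample2)) / sqrt(v1 + v2)
    # Satterthwaite approximation (assumed here; not stated in the slides).
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical samples of different sizes.
t, df = unequal_variance_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10, 12])
print(round(t, 3), round(df, 2))
```

The df is generally not an integer, which is one reason it is convenient to leave the quantile lookup to statistical software.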

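  15. Two-sample t-test: Unpaired t-test
      • 95% CI for $\mu_1 - \mu_2$
      • Recall that $T = \frac{(\bar{X}_1 - \bar{X}_2) - (\mu_1 - \mu_2)}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$ has approximately a t-distribution with df degrees of freedom.
      • The 95% CI is $\left(\bar{X}_1 - \bar{X}_2 - t_{df,\,0.975}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\ \bar{X}_1 - \bar{X}_2 + t_{df,\,0.975}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}\right)$.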
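A minimal sketch of the unpaired confidence interval under the unequal-variances approach. The function name and data are hypothetical. The t quantile is supplied by the caller, since it depends on a df that would normally come from software; in the toy example the two samples have equal sizes and equal sample variances, for which the Satterthwaite df works out to 4 and the tabulated 97.5% quantile is 2.776.

```python
from math import sqrt
from statistics import mean, variance


def unequal_variance_ci(sample1, sample2, t_crit):
    """CI for mu1 - mu2; t_crit is looked up for the chosen level and df."""
    se = sqrt(variance(sample1) / len(sample1) + variance(sample2) / len(sample2))
    diff = mean(sample1) - mean(sample2)
    return diff - t_crit * se, diff + t_crit * se


# Hypothetical toy data (df = 4, so t_crit = 2.776 from a t table).
low, high = unequal_variance_ci([1, 2, 3], [2, 3, 4], t_crit=2.776)
print(round(low, 3), round(high, 3))
```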

  16. Comments on Two-sample t-tests
      • The paired two-sample t-test cannot be applied to unpaired data: there is no natural pairing of the observations, and the sample sizes may differ between the two sets.
      • We do not apply unpaired t-tests to paired data, because the observations are not independent. Paired t-tests are often more efficient than unpaired t-tests.
      • The choice of test should depend on the study design.
      • There are a few variations of the unpaired two-sample t-test. The most robust version, which we use here, is the one that makes no restrictive assumptions about the variances of the two populations (i.e., the unequal variances t-test).

  18. Binomial Proportion: One-sample Test
      • Remember that a Binomial Proportion is an average of a set of Binary(p) (or Bernoulli(p)) variables (e.g. Non-diseased/Diseased patients with disease prevalence equal to p).
      • We want to test: $H_0: p = p_0$
      • With an observed Binomial Proportion $\hat{p}$ and the data $X_1, \dots, X_n$, you can calculate the sample variance and then apply the previous one-sample test formula to this hypothesis testing question.
      • But recall the normal approximation to a binomial distribution and the mean-variance relationship for a binomial distribution (i.e. the variance of a binomial distribution depends entirely on its mean).
      • $E(\hat{p}) = p$, $\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}$, where $\hat{p} = \frac{1}{n}\sum_{i=1}^{n} X_i$

  19. Binomial Proportion: One-sample Test
      • Therefore, under the null hypothesis, $Z = \frac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$ is (approximately) N(0, 1) distributed.
      • For a two-sided test at level $\alpha = 0.05$, we reject the null hypothesis if we observe a calculated Z-value less than -1.96 or greater than +1.96.
      • The test statistic involves the sum of the “successes” and the associated binomial variance (assuming the null hypothesis is true). Written in terms of the number of successes $X = n\hat{p}$ rather than the proportion, the statistic $\frac{X - np_0}{\sqrt{np_0(1-p_0)}}$ takes the identical value.
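The one-sample proportion test can be sketched as below, computing the statistic both from the proportion and from the count of successes to show that the two forms agree. The function names and the numbers (60 successes in 100 trials, null value 0.5) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z(successes, n, p0):
    """Z statistic based on the sample proportion, using the null variance."""
    p_hat = successes / n
    return (p_hat - p0) / sqrt(p0 * (1 - p0) / n)


def count_z(successes, n, p0):
    """The same statistic written in terms of the number of successes."""
    return (successes - n * p0) / sqrt(n * p0 * (1 - p0))


# Hypothetical data: 60 successes out of 100, testing H0: p = 0.5.
z = proportion_z(60, 100, 0.5)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
print(round(z, 3), round(count_z(60, 100, 0.5), 3), round(p_value, 4))
print(abs(z) > 1.96)  # reject at the two-sided 5% level?
```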

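  20. Binomial Proportion: One-sample CI
      • A 95% CI for p can be obtained based on $\frac{\hat{p} - p}{\sqrt{p(1-p)/n}} \approx N(0, 1)$.
      • Intuitively, the 95% “CI” for p is $\left(\hat{p} - 1.96\sqrt{\frac{p(1-p)}{n}},\ \hat{p} + 1.96\sqrt{\frac{p(1-p)}{n}}\right)$.
      • Unfortunately, this is not a CI, since p is unknown here.
      • Recall the example in which the sample variance is used to estimate the unknown population variance, and that $\hat{p}$ is a very good estimate of p.
      • We replace every unknown p with $\hat{p}$ and obtain a valid (approximate) 95% CI for p: $\left(\hat{p} - 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$.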
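A short sketch of the plug-in interval described above, in which the unknown p in the standard error is replaced by the sample proportion. The function name and the data (60 successes in 100 trials) are hypothetical, and the interval relies on the large-sample normal approximation.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, level=0.95):
    """Approximate CI for p, plugging the sample proportion into the variance."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    half_width = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


# Hypothetical data: 60 successes out of 100.
low, high = proportion_ci(60, 100)
print(round(low, 3), round(high, 3))
```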

  22. Binomial Proportion: Two-sample Test and Confidence Interval
      • We have two sets of Binary($p_i$) (or Bernoulli($p_i$)) samples, where i = 1, 2.
      • We want to test: $H_0: p_1 = p_2$
      • Once you realize that the Binomial Proportions are just averages, you can immediately apply the two-sample unpaired t-test here, and obtain the corresponding 95% CI for $p_1 - p_2$.
      • You need the following (approximate) distribution for the test and the CI: $\hat{p}_1 - \hat{p}_2 \approx N\!\left(p_1 - p_2,\ \frac{p_1(1-p_1)}{n_1} + \frac{p_2(1-p_2)}{n_2}\right)$
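The two-sample comparison of proportions can be sketched with the same plug-in idea. This version uses the unpooled variance estimate for both the Z statistic and the CI, which is an assumption made for the illustration; some software instead pools the two samples when estimating the variance under the null hypothesis. The function name and the counts are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_summary(x1, n1, x2, n2, level=0.95):
    """Return (z, (low, high)) for p1 - p2 using unpooled plug-in variances."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)  # unpooled (assumed)
    diff = p1 - p2
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return diff / se, (diff - z_crit * se, diff + z_crit * se)


# Hypothetical data: 45/100 successes in group 1, 30/100 in group 2.
z, (low, high) = two_proportion_summary(45, 100, 30, 100)
print(round(z, 3), round(low, 3), round(high, 3))
```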

  23. Binomial Proportion: Two-sample Test and Confidence Interval
      • In practice, just ask Stata to do the algebra for you. Do not try to calculate these quantities with your calculator.
      • Recall that we made a big assumption: the sample size is large. If we cannot have a large sample size, Fisher’s Exact Test can be an alternative. Again, ask Stata or other statistical software to do it for you. The computation takes a lot of time.
      • Questions?
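To give a sense of what the software computes for Fisher's Exact Test, here is a small sketch for a 2x2 table. It enumerates every table with the same margins and sums the hypergeometric probabilities that are no larger than that of the observed table, which is one common definition of the two-sided p-value (an assumption here, since the slides do not define it). The function name and the table are hypothetical.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    smallest = max(0, col1 - row2)
    largest = min(row1, col1)
    p_observed = prob(a)
    # Sum over tables at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(x) for x in range(smallest, largest + 1)
               if prob(x) <= p_observed * (1 + 1e-9))


# Hypothetical small table: 3 of 4 successes in one group, 1 of 4 in the other.
print(round(fisher_exact_two_sided(3, 1, 1, 3), 4))
```

The enumeration grows with the table margins, which is why this is left to software for anything beyond very small samples.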
