
MT2004



Presentation Transcript


  1. MT2004 Olivier GIMENEZ Telephone: 01334 461827 E-mail: olivier@mcs.st-and.ac.uk Website: http://www.creem.st-and.ac.uk/olivier/OGimenez.html

  2. 9. Distributions derived from normal distributions In the previous section, we assumed that the variance of the whole population was known Unlikely to be the case… So we need methods that handle the case where both the mean and the variance of the whole population are unknown To develop the theory underlying such methods, we first need to introduce some other distributions related to the normal distribution Namely, the χ², t and F distributions

  3. 9.1 χ² distributions

  4. 9.1 χ² distributions

  5. 9.1 2 distributions Upper quantile = value above which some specified proportion of the area of a p.d.f. lies

  6. 9.1 2 distributions The 5% upper quantile of a 25 is x such Pr(25 x) = 0.05

  7. 9.1 2 distributions The 5% upper quantile of a 25 is x such Pr(25 x) = 0.05 or alternatively Pr(25 x) = 0.95 i.e. the lower 95% quantile

  8. 9.1 2 distributions Pr(25 x) = 0.95 (the lower 95% quantile) is obtained using the R command: > qchisq(0.95,5) # cumulative d. f. [1] 11.07050

  9. 9.1 2 distributions Example: Suppose that X, Y, and Z are coordinates in 3-dimensional space which are independently distributed as N(0,1), with all measurements in cm. What is the probability that the point (X,Y,Z) lies more than 3 cm from the origin?

  10. 9.1 2 distributions Example: Suppose that X, Y, and Z are coordinates in 3-dimensional space which are independently distributed as N(0,1), with all measurements in cm. What is the probability that the point (X,Y,Z) lies more than 3 cm from the origin?

  11. 9.1 2 distributions Example: Suppose that X, Y, and Z are coordinates in 3-dimensional space which are independently distributed as N(0,1), with all measurements in cm. What is the probability that the point (X,Y,Z) lies more than 3 cm from the origin?

  12. 9.1 2 distributions

  13. 9.1 2 distributions

  14. 9.1 2 distributions

  15. 9.2 The F distributions

  16. 9.2 The F distributions The 5% upper quantile of an Fdf1,df2 is the value x such that Pr(Fdf1,df2 ≥ x) = 0.05 Use tables or the R command qf(0.95,df1,df2) (the lower 95% quantile)

  17. 9.2 The F distributions So if we have a table with the upper quantiles, we can also get the lower quantiles as follows. Remember that: Upper quantile = value above which some specified proportion of the area of a p.d.f. lies Lower quantile = value below which some specified proportion of the area of a p.d.f. lies

  18. 9.2 The F distributions So if we have a table with the upper quantiles, we can also get the lower quantiles as follows.

  19. 9.2 The F distributions So if we have a table with the upper quantiles, we can also get the lower quantiles as follows: the upper (1−α) quantile of Fn,k, i.e. the lower α quantile of Fn,k, is the reciprocal of the upper α quantile of Fk,n

  20. 9.2 The F distributions Example: Given that F3,2;0.025 = 39.17, find F2,3;0.975 (i.e. the lower 0.025 = 1 − 0.975 quantile of the F2,3 distribution) F2,3;0.975 = 1/F3,2;0.025 = 1/39.17 = 0.0255
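The reciprocal relation can be checked numerically. The slides use R's qf; here is an equivalent sketch in Python with scipy.stats (Python is our illustrative choice, not part of the slides):

```python
# ppf is the quantile function (inverse c.d.f.) of the F distribution.
from scipy.stats import f

f_3_2_upper = f.ppf(0.975, dfn=3, dfd=2)  # upper 2.5% quantile of F(3,2)
f_2_3_lower = f.ppf(0.025, dfn=2, dfd=3)  # lower 2.5% quantile of F(2,3)
print(round(f_3_2_upper, 2))              # ≈ 39.17
print(round(f_2_3_lower, 4))              # ≈ 0.0255
print(round(1 / f_3_2_upper, 4))          # same value: the reciprocal relation
```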

  21. 9.2 The F distributions Example: Given that F3,2;0.025 = 39.17, find F2,3;0.975 (i.e. the lower 0.025 = 1 − 0.975 quantile of the F2,3 distribution) F2,3;0.975 = 1/F3,2;0.025 = 1/39.17 = 0.0255 R commands (defining the plotting grid x first) > x = seq(0.01, 5, 0.01) > par(mfrow=c(2,1)) > plot(x,df(x,2,3),xlab="",ylab="",type='l') > title("pdf F(2,3)") > plot(x,df(x,3,2),xlab="",ylab="",type='l') > title("pdf F(3,2)")

  22. 9.3 The t distributions

  23. 9.3 The t distributions The shape of the p.d.f. of tn depends on n

  24. 9.3 The t distributions Looks like a normal distribution, but with more of the probability in the tails; see the graph for t1 e.g. (top left)

  25. 9.3 The t distributions

  26. 9.3 The t distributions tn;α is the upper α quantile of the t distribution with n degrees of freedom

  27. 9.3 The t distributions Use tables or R, e.g. qt(0.95,8) (= 1.859548) gives the lower 95% quantile of the t distribution with 8 degrees of freedom (i.e. the upper 5% quantile). For large n the quantiles approach those of the standard normal (qt(0.95,5000) = 1.645158…)
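The R calls qt(p, df) correspond to t.ppf(p, df) in Python's scipy.stats. A quick cross-check of the quantiles above, including the normal limit (Python used here for illustration only):

```python
# Quantiles of the t distribution and their large-n normal limit.
from scipy.stats import norm, t

q8 = t.ppf(0.95, 8)        # upper 5% quantile of t with 8 d.f.
q5000 = t.ppf(0.95, 5000)  # very large d.f.: close to the normal quantile
z = norm.ppf(0.95)         # standard normal 95% quantile
print(q8, q5000, z)        # ≈ 1.8595, 1.6452, 1.6449
```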

  28. 10 Using t distributions To derive the distribution of the statistic for testing hypotheses about the mean of a normal population with unknown variance, we need a key result on the joint distribution of the sample mean and the sample variance Remember that:

  29. 10 Using t distributions To derive the distribution of the statistic for testing hypotheses about the mean of a normal population with unknown variance, we need a key result on the joint distribution of the sample mean and the sample variance

  30. 10 Using t distributions The quantity T depends on the population mean μ but not on the unknown variance σ². So this statistic will be useful to test hypotheses about the population mean of normal populations with unknown variance

  31. 10.2 One-sample t-tests and confidence intervals One-sample t-tests: 39 observations on pulse rates (heart beats/minute) of Indigenous Peruvians had sample mean 70.31 and sample variance 90.219. We assume normality. Question: at the 1% significance level, could this data set be considered a random sample from a population with mean 75? In other words (Step 1 of the hypothesis-testing strategy): H0: μ = 75 against H1: μ ≠ 75 Your turn. Perform step 2 (find a ‘good test statistic’) and step 3 (derive its distribution)

  32. 10.2 One-sample t-tests and confidence intervals One-sample t-tests: 39 observations on pulse rates (heart beats/minute) of Indigenous Peruvians had sample mean 70.31 and sample variance 90.219. Step 1: H0: μ = 75 against H1: μ ≠ 75 Step 2: ΣXi/n − μ0 (the sample mean minus the hypothesised mean) is a good candidate since it takes ‘extreme’ values if H1 is true, and moderate values if H0 is true. Step 4: it’s a 2-sided test, so we will reject H0 if tobs ≤ −tn-1;α/2 or tobs ≥ tn-1;α/2 (graphical representation)

  33. 10.2 One-sample t-tests and confidence intervals One-sample t-tests: 39 observations on pulse rates (heart beats/minute) of Indigenous Peruvians had sample mean 70.31 and sample variance 90.219. Step 1: H0: μ = 75 against H1: μ ≠ 75 Step 2: ΣXi/n − μ0 (the sample mean minus the hypothesised mean) is a good candidate since it takes ‘extreme’ values if H1 is true, and moderate values if H0 is true. If the test is one-sided with H1: μ < μ0, we reject if tobs ≤ −tn-1;α If the test is one-sided with H1: μ > μ0, we reject if tobs ≥ tn-1;α

  34. 10.2 One-sample t-tests and confidence intervals One-sample t-tests: 39 observations on pulse rates (heart beats/minute) of Indigenous Peruvians had sample mean 70.31 and sample variance 90.219. So we will reject if tobs ≥ 2.7045 or if tobs ≤ −2.7045; here tobs = (70.31 − 75)/√(90.219/39) ≈ −3.08, so we reject H0 P-value using R: > 2*pt(tobs,38) # tobs < 0, so we double the c.d.f. at tobs (2-sided test) [1] 0.003799049
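The whole test can be reproduced from the summary statistics alone. A minimal sketch in Python with scipy.stats (the slides use R; Python here is our illustrative equivalent):

```python
# One-sample t-test from summary statistics:
# n = 39, sample mean 70.31, sample variance 90.219, H0: mu = 75.
import math
from scipy.stats import t

n, xbar, s2, mu0 = 39, 70.31, 90.219, 75
tobs = (xbar - mu0) / math.sqrt(s2 / n)  # observed t statistic
pvalue = 2 * t.cdf(tobs, df=n - 1)       # tobs < 0: double the lower tail
print(round(tobs, 4))                    # ≈ -3.0836
print(round(pvalue, 6))                  # ≈ 0.003799, matching the slide
```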

  35. 10.2 One-sample t-tests and confidence intervals Confidence interval: 39 observations on pulse rates (heart beats/minute) of Indigenous Peruvians had sample mean 70.31 and sample variance 90.219. We’d like to build a 99% confidence interval for μ; we’re looking for the values of μ for which we would accept H0 We know that:

  36. 10.2 One-sample t-tests and confidence intervals Confidence interval: 39 observations on pulse rates (heart beats/minute) of Indigenous Peruvians had sample mean 70.31 and sample variance 90.219. So we would accept any value of μ lying inside this interval; 75 is outside the confidence interval, so we would reject H0 at the 1% significance level

  37. 10.2 One-sample t-tests and confidence intervals Confidence interval: With R, a 95% confidence interval is obtained as follows (the standard error is √(s²/n) = √(90.219/39)): > cil = 70.31 - qt(0.975,38)*sqrt(90.219/39) > ciu = 70.31 + qt(0.975,38)*sqrt(90.219/39) > c(cil,ciu) [1] 67.23099 73.38901 And the 99% confidence interval is obtained as > c(70.31 - qt(0.995,38)*sqrt(90.219/39), 70.31 + qt(0.995,38)*sqrt(90.219/39))

  38. 10.3 Paired t-tests Consider two samples of observations (Xi,Yi) Consider the case where the two measurements (Xi,Yi) are made on the same unit i We wish to test whether the two population means are equal Example: measurement of left and right wing length of birds These should not be treated as independent! Obviously, the length of the left wing and the length of the right wing both tend to be large for large birds: dependent measurements Idea: work with the differences between the two measurements on each unit, i.e. Xi − Yi, in order to go back to a one-sample t-test

  39. 10.3 Paired t-tests • Example: corneal thickness in microns for both eyes of patients who have glaucoma in one eye • Glaucoma 488 478 480 426 440 410 458 460 • Healthy 484 478 492 444 436 398 464 476 • Obviously, the corneal thickness is likely to be similar in the two eyes of any patient – dependent observations • Consider di = glaucomai – healthyi. We will assume that this new random sample is drawn from a normal distribution N(μd,σ²), and we wish to test: H0: μd = 0 vs H1: μd ≠ 0 • Σdi = −32 ; Σdi² = 936

  40. t -t/2 t/2 10.3 Paired t-tests • Example: corneal thickness in microns for both eyes of patients who have glaucoma in one eye • H0: d=0 vs H1: d0 • di = -32 ; di2 = 936, s2 = 115.43 and t7;0.025 = 2.3646 (see Tables) tobs > - t7;0.025 and tobs < t7;0.025 meaning that tobs is in the region of acceptance of H0

  41. 10.3 Paired t-tests • Example: corneal thickness in microns for both eyes of patients who have glaucoma in one eye • H0: μd = 0 vs H1: μd ≠ 0 • Σdi = −32 ; Σdi² = 936, s² = 115.43 and t7;0.025 = 2.3646 (see tables) tobs > −t7;0.025 and tobs < t7;0.025, meaning that tobs is in the region of acceptance of H0 At the 5% significance level, we fail to reject H0, so there is apparently no difference in corneal thickness between the healthy eye and the diseased eye
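The paired analysis is just a one-sample t-test on the differences, and statistical libraries can do it directly. A sketch in Python with scipy.stats.ttest_rel (an illustrative equivalent of the hand calculation above, not part of the slides):

```python
# Paired t-test on the corneal-thickness data: internally this tests
# whether the mean of d_i = glaucoma_i - healthy_i differs from 0.
from scipy.stats import ttest_rel

glaucoma = [488, 478, 480, 426, 440, 410, 458, 460]
healthy = [484, 478, 492, 444, 436, 398, 464, 476]
result = ttest_rel(glaucoma, healthy)
print(round(result.statistic, 3))  # ≈ -1.053, well inside (-2.3646, 2.3646)
print(result.pvalue > 0.05)        # True: fail to reject H0 at the 5% level
```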

  42. 10.4 Two-sample t-tests Now we want to deal with two sets of data and compare, e.g., their means We assume that the two random samples are drawn from normal distributions with unknown but equal variances. More formally:

  43. 10.4 Two-sample t-tests We assume that the two random samples are drawn from normal distributions with unknown but equal variances. We know that the distributions of the sample means of the two samples are: so that (using results on sums of normal r.v.’s) As usual, we’d like to relate this distribution to a standard normal random variable…

  44. 10.4 Two-sample t-tests We assume that the two random samples are drawn from normal distributions with unknown but equal variances. We have that: Obviously, if we assume that σ is known, we can test hypotheses about the difference in means between the two groups (see the one-sample case – z-test). But we assume that σ is unknown, so we need to do again what we did for the one-sample t-test (test about the mean with unknown variance).

  45. 10.4 Two-sample t-tests More precisely, first find the distribution of: We note that: where

  46. 10.4 Two-sample t-tests Similarly, we have that: where

  47. 10.4 Two-sample t-tests Putting the two latter results together, we have that, using the additivity of χ² r.v.’s: Note that the above quantity can be written as (n+m−2)Sp²/σ², where Sp² = ((n−1)SX² + (m−1)SY²)/(n+m−2) is called the pooled sample variance.

  48. 10.4 Two-sample t-tests Remember that we have:

  49. 10.4 Two-sample t-tests So let the test statistic T be which is actually the ratio of the following distributions: i.e. a t distribution with n+m−2 degrees of freedom!

  50. 10.4 Two-sample t-tests Now we can see that T can be re-written as follows: or: The quantity T depends on the population means μX and μY but not on the unknown variance σ². This statistic is thus useful to test hypotheses about the difference in means between the 2 populations.
