MT2004



  1. MT2004 Olivier GIMENEZ Telephone: 01334 461827 E-mail: olivier@mcs.st-and.ac.uk Website: http://www.creem.st-and.ac.uk/olivier/OGimenez.html

  2. So far: data-driven statistical methods, i.e. using data to answer questions directly. The rest of the module (from Section 7 onwards) deals with: CAPTURING PATTERNS IN THE DATA USING MODELS (modelling step); ANALYSING THESE MODELS TO ANSWER QUESTIONS (estimation and inference step).

  3. 7. Basic normal theory and the Central Limit Theorem. 7.1 Basic properties of normal distributions (see Section 2.2.2).

  4. 7.1.1 Linear transformation of a normal r.v. Let X be a random variable with mean $\mu$ and variance $\sigma^2$. Let $Y = a + bX$; then $E(Y) = ?$ and $V(Y) = ?$

  5. 7.1.1 Linear transformation of a normal r.v. Let X be a random variable with mean $\mu$ and variance $\sigma^2$. Let $Y = a + bX$; then $E(Y) = a + b\,E(X) = a + b\mu$ and $V(Y) = b^2 V(X) = b^2\sigma^2$.

  6. 7.1.1 Linear transformation of a normal r.v. Now, if X is normally distributed with mean $\mu$ and variance $\sigma^2$, then $Y = a + bX$ (with $b \neq 0$) is normally distributed too. In other words, any linear transformation of a normal r.v. is itself normally distributed. More precisely, using the previous slide, $X \sim N(\mu, \sigma^2) \Rightarrow Y = a + bX \sim N(a + b\mu,\ b^2\sigma^2)$. Proof: homework.
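The mean and variance formulas for $Y = a + bX$ can be checked by simulation. This is a Python sketch using only the standard library. The values $\mu = 3$, $\sigma = 2$, $a = 1$, $b = -0.5$ and the seed are arbitrary illustrative choices, not taken from the slides. A simulation only checks the two moments numerically; it does not replace the proof left as homework.

```python
import random
import statistics

# Illustrative (assumed) values: X ~ N(mu, sigma^2) and Y = a + b*X
mu, sigma = 3.0, 2.0
a, b = 1.0, -0.5

rng = random.Random(2004)                      # fixed seed so the run is reproducible
xs = [rng.gauss(mu, sigma) for _ in range(100_000)]
ys = [a + b * x for x in xs]                   # apply the linear transformation

print("sample mean of Y:", round(statistics.fmean(ys), 3), " theory a + b*mu:", a + b * mu)
print("sample var of Y: ", round(statistics.pvariance(ys), 3), " theory b^2*sigma^2:", b**2 * sigma**2)
```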

  7. 7.1.1 Linear transformation of a normal r.v. Now, suppose that $X \sim N(\mu, \sigma^2)$, and consider $Z = \frac{X - \mu}{\sigma}$. What is the distribution of Z?

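  8. 7.1.1 Linear transformation of a normal r.v. Write $Z = \frac{X - \mu}{\sigma} = -\frac{\mu}{\sigma} + \frac{1}{\sigma}X$, i.e. $Z = a + bX$ with $a = -\mu/\sigma$ and $b = 1/\sigma$, and remember that $X \sim N(\mu, \sigma^2) \Rightarrow Y = a + bX \sim N(a + b\mu,\ b^2\sigma^2)$. So by identification, $a + b\mu = -\mu/\sigma + \mu/\sigma = 0$ and $b^2\sigma^2 = \sigma^2/\sigma^2 = 1$. Finally, $Z \sim N(0, 1)$.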

  9. 7.1.1 Linear transformation of a normal r.v. Result: $X \sim N(\mu, \sigma^2) \Rightarrow Z = \frac{X - \mu}{\sigma} \sim N(0, 1)$.

  10. 7.1.1 Linear transformation of a normal r.v. This is a very useful result for working out probabilities associated with any normal distribution. Idea: transform to the standard normal distribution N(0,1) and use the published tables of probabilities associated with N(0,1). E.g.:

      z           0.0      0.5      1.0      2.0      2.5      3.0
      Pr(Z > z)   0.5000   0.30854  0.15866  0.02275  0.00621  0.00135

  (See Table 5 of the K & Y Tables.)
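The tabulated upper-tail values $\Pr(Z > z)$ can be reproduced in Python. This sketch uses the standard-library `statistics.NormalDist` as an assumed stand-in for the printed tables; `table` is just an illustrative name.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal N(0, 1)

# Upper-tail probabilities Pr(Z > z) = 1 - Phi(z) for the z values in the table
table = {z: 1 - Z.cdf(z) for z in [0.0, 0.5, 1.0, 2.0, 2.5, 3.0]}

for z, upper_tail in table.items():
    print(f"z = {z:3.1f}   Pr(Z > z) = {upper_tail:.5f}")
```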

  11. 7.1.1 Linear transformation of a normal r.v. Example: Calculate the probability that a random variable $X \sim N(3, 4)$ takes a value between 4 and 5.

  12. 7.1.1 Linear transformation of a normal r.v. Example: Calculate the probability that a random variable $X \sim N(3, 4)$ takes a value between 4 and 5. We wish to compute $\Pr(4 \le X \le 5)$. The variance is 4, so $\sigma = 2$, and the result above gives $Z = (X - 3)/2 \sim N(0, 1)$. So $\Pr(4 \le X \le 5) = \Pr\left(\frac{4 - 3}{2} \le \frac{X - 3}{2} \le \frac{5 - 3}{2}\right) = \Pr(1/2 \le Z \le 1)$. Finally, $\Pr(4 \le X \le 5) = \Pr(Z \le 1) - \Pr(Z \le 1/2) = (1 - 0.15866) - (1 - 0.30854) = 0.14988$.
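The same probability can be obtained either directly from $N(3, 4)$ or after standardising. This Python sketch assumes the standard-library `statistics.NormalDist` for normal c.d.f. values. `NormalDist` is parameterised by the standard deviation, so $N(3, 4)$ becomes `sigma=2`.

```python
from statistics import NormalDist

X = NormalDist(mu=3, sigma=2)   # N(3, 4): variance 4, so standard deviation 2
Z = NormalDist()                # N(0, 1)

direct = X.cdf(5) - X.cdf(4)                               # Pr(4 <= X <= 5) computed directly
standardised = Z.cdf((5 - 3) / 2) - Z.cdf((4 - 3) / 2)     # Pr(1/2 <= Z <= 1)

print(round(direct, 5), round(standardised, 5))
```

Both routes give the same number, which is the point of the standardisation result.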

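  13. 7.1.2 Sums of independent normal random variables. Sums of (normal) r.v's occur frequently in statistical theory (e.g. the sample mean, the sample variance...). What is their distribution? If $X_1, X_2$ are independent with $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, 2$, then $X_1 + X_2 \sim N(\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2)$. Extension: if $X_1, \dots, X_n$ are independent r.v's with $X_i \sim N(\mu_i, \sigma_i^2)$, $i = 1, \dots, n$, and $a_1, \dots, a_n$ are constants, then $\sum_{i=1}^n a_i X_i \sim N\left(\sum_{i=1}^n a_i \mu_i,\ \sum_{i=1}^n a_i^2 \sigma_i^2\right)$.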
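A Monte Carlo check of the rule for $\sum_i a_i X_i$ is sketched below in Python with the standard library. The three means, standard deviations and coefficients are hypothetical values chosen for illustration. The code compares the simulated mean and variance with $\sum a_i\mu_i$ and $\sum a_i^2\sigma_i^2$. As a rough check of normality, it also tests one probability: about $\Phi(1) \approx 0.8413$ of the draws should fall below the mean plus one standard deviation.

```python
import random
import statistics
from statistics import NormalDist

# Hypothetical parameters for three independent normal r.v's and the constants a_i
mus    = [1.0, -2.0, 0.5]
sigmas = [1.0, 3.0, 2.0]
coefs  = [2.0, 1.0, -1.0]

theory_mean = sum(a * m for a, m in zip(coefs, mus))             # sum a_i mu_i
theory_var = sum(a**2 * s**2 for a, s in zip(coefs, sigmas))     # sum a_i^2 sigma_i^2

rng = random.Random(42)
sums = [sum(a * rng.gauss(m, s) for a, m, s in zip(coefs, mus, sigmas))
        for _ in range(100_000)]

print("mean:", round(statistics.fmean(sums), 3), "theory:", theory_mean)
print("var: ", round(statistics.pvariance(sums), 3), "theory:", theory_var)

# Under N(theory_mean, theory_var), a fraction Phi(1) of draws lies below mean + 1 sd
cut = theory_mean + theory_var ** 0.5
frac_below = sum(v <= cut for v in sums) / len(sums)
print("fraction below mean + 1 sd:", round(frac_below, 4), "Phi(1):", round(NormalDist().cdf(1), 4))
```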

  14. 7.1.2 Sums of independent normal random variables

  15. 7.1.2 Sums of independent normal random variables

  16. 7.1.2 Sums of independent normal random variables

  17. 7.2 The Central Limit Theorem. We've just seen that the mean of n independent, identically normally distributed r.v's is itself normally distributed. The Central Limit Theorem (CLT) states that the mean of n i.i.d. r.v's from any distribution with finite mean $\mu$ and finite variance $\sigma^2$ is approximately normally distributed, $\bar{X} \approx N(\mu, \sigma^2/n)$, for large enough n.
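A quick simulation illustrates the CLT for a clearly non-normal distribution. The setup is an illustrative assumption: an Exponential(1) parent (mean 1, variance 1, strongly skewed), n = 50, 10,000 replicates and a fixed seed. The Python sketch compares the spread of the sample means with $\sigma/\sqrt{n}$. It also checks that roughly 68% of the means fall within one such standard deviation of $\mu$, as a normal distribution would predict.

```python
import random
import statistics

rng = random.Random(7)          # fixed seed for reproducibility
n = 50                          # observations per sample
reps = 10_000                   # number of simulated sample means
mu, sigma = 1.0, 1.0            # mean and sd of the Exponential(1) parent (assumed example)

# Each entry is the mean of n i.i.d. Exponential(1) draws
means = [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]
se = sigma / n**0.5             # CLT standard deviation of the sample mean

print("mean of sample means:", round(statistics.fmean(means), 4), "(CLT:", mu, ")")
print("sd of sample means:  ", round(statistics.pstdev(means), 4), "(CLT:", round(se, 4), ")")

# Under normality, about 68.3% of the sample means lie within one sd of mu
within = sum(abs(m - mu) <= se for m in means) / reps
print("proportion within 1 sd:", round(within, 3))
```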

  18. 7.2 The Central Limit Theorem

  19. 7.2 The Central Limit Theorem

  20. 7.2 The Central Limit Theorem Example: A bridge can hold at most 400 vehicles if they are bumper-to-bumper and stationary. The mean weight of vehicles using the bridge is 2.5 tonnes with a standard deviation of 2.0 tonnes. What is the probability that the maximum design load of 1100 tonnes will be exceeded in a traffic jam?

  21. 7.2 The Central Limit Theorem. Example (continued): let $X_i$ be the weight of vehicle i, $i = 1, \dots, n$. Here $n = 400$, $\mu = 2.5$ t and $\sigma = 2.0$ t. The probability that the maximum design load of 1100 tonnes will be exceeded in a traffic jam is $\Pr\left(\sum_i X_i > 1100\right)$. We'd like to use the CLT: if $X_1, \dots, X_n$ are i.i.d. r.v's with mean $\mu$ and variance $\sigma^2$, then for large n, $\sum_{i=1}^n X_i \approx N(n\mu,\ n\sigma^2)$.

  22. 7.2 The Central Limit Theorem
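The bridge calculation can be sketched in Python. The sketch assumes, as the CLT requires, that the 400 vehicle weights are independent and identically distributed. Under that assumption the total weight is approximately $N(400 \times 2.5,\ 400 \times 2.0^2) = N(1000, 1600)$, with standard deviation 40, so the standardised load limit is $(1100 - 1000)/40 = 2.5$.

```python
from statistics import NormalDist

n, mu, sigma = 400, 2.5, 2.0     # number of vehicles, mean and sd of a vehicle's weight (tonnes)
limit = 1100                     # maximum design load (tonnes)

# CLT (assuming i.i.d. weights): total S = X1 + ... + Xn is approximately N(n*mu, n*sigma^2)
total = NormalDist(mu=n * mu, sigma=sigma * n**0.5)
p_exceed = 1 - total.cdf(limit)  # Pr(S > 1100)

z = (limit - n * mu) / (sigma * n**0.5)
print("z =", z, " Pr(S > 1100) =", round(p_exceed, 5))
```

The resulting probability, about 0.00621, is the value of $\Pr(Z > 2.5)$ listed in the table on slide 10.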

  23. How to use the tables? Table of the standard normal distribution: the values inside the table are areas under the N(0,1) density between $-\infty$ and z, i.e. $\Phi(z) = P(Z \le z)$ with $Z \sim N(0,1)$. Example 1: to determine $\Phi(1.96) = P(Z \le 1.96)$, i.e. the area under the curve between $-\infty$ and 1.96, look in the cell where the row labelled 1.90 meets the column labelled 0.06: the area under the curve is 0.975. Example 2: find z such that $\Phi(z) = 0.95$. Since $P(Z \le 1.64) = 0.9495$ and $P(Z \le 1.65) = 0.9505$, interpolating gives $z = 1.645$. Example 3: by symmetry of the N(0,1) density, $\Phi(-1.23) = 1 - \Phi(1.23) = 1 - 0.8907 = 0.1093$.
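The three table look-ups can be reproduced in Python with `statistics.NormalDist`. Its `cdf` and `inv_cdf` methods play the roles of $\Phi$ and $\Phi^{-1}$. This is an assumed substitute for the printed tables, whose values it matches.

```python
from statistics import NormalDist

Phi = NormalDist().cdf          # Phi(z) = P(Z <= z)
Phi_inv = NormalDist().inv_cdf  # inverse c.d.f. (quantile function)

print(round(Phi(1.96), 4))                              # Example 1
print(round(Phi_inv(0.95), 3))                          # Example 2
print(round(Phi(-1.23), 4), round(1 - Phi(1.23), 4))    # Example 3, via symmetry
```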

  24. 7.3 Approximating other distributions by normal distributions. The CLT also provides the justification for approximating several other distributions by a normal distribution. We consider two examples: the Binomial and the Poisson distributions. The Binomial probability function is $P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}$, $x = 0, 1, \dots, n$. It becomes hard to evaluate for large n, as the factorials in the binomial coefficient 'explode'. However, the CLT can be used to overcome this problem.

  25. 7.3 Approximating other distributions by normal distributions. We first note that if $X \sim \text{Bin}(n, p)$, then X can be written as a sum of n independent binomial r.v's: $X = X_1 + \dots + X_n$, where $X_i \sim \text{Bin}(1, p)$. Each $X_i$ has mean p and variance $p(1-p)$. Thus, for large n, the CLT implies that $X \approx N(np,\ np(1-p))$. Alternatively: $\frac{X - np}{\sqrt{np(1-p)}} \approx N(0, 1)$. In practice, the approximation will be good enough when
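This is a numerical check that $X \sim \text{Bin}(n, p)$ has mean $np$ and variance $np(1-p)$, as the sum-of-$\text{Bin}(1, p)$ argument implies. The Python sketch uses `math.comb`, which works with exact integers, so the large binomial coefficients cause no overflow for moderate n. The values n = 200 and p = 0.4 are chosen to match the bird example; the helper `binom_pmf` is an illustrative name.

```python
from math import comb

def binom_pmf(x, n, p):
    """P(X = x) for X ~ Bin(n, p); math.comb computes the coefficient exactly."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

n, p = 200, 0.4
pmf = [binom_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
var = sum((x - mean) ** 2 * px for x, px in enumerate(pmf))

print("total probability:", round(sum(pmf), 10))
print("mean:", round(mean, 6), " np =", n * p)
print("var: ", round(var, 6), " np(1-p) =", n * p * (1 - p))
```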

  26. 7.3 Approximating other distributions by normal distributions. Recap: $X \sim \text{Bin}(n, p)$; $X = X_1 + \dots + X_n$, where $X_i \sim \text{Bin}(1, p)$; each $X_i$ has mean p and variance $p(1-p)$. Example: the probability of annual survival of a bird species is 0.4. Suppose we are studying a population of n = 200 individuals. What is the probability that less than 50% of the population survives the current year?

  27. 7.3 Approximating other distributions by normal distributions. Example (continued): let $X_i$ be the random variable 'individual i survives the year'. We have $X_i \sim \text{Bin}(1, 0.4)$, so each $X_i$ has mean 0.4 and variance $0.4(1 - 0.4) = 0.24$. Then $X = X_1 + \dots + X_n$ is the total number of surviving individuals, $X \sim \text{Bin}(200, 0.4)$. Since 50% of the population is 100 individuals, we need $P(X < 100)$. Via the CLT: $X \approx N(200 \times 0.4,\ 200 \times 0.4 \times 0.6) = N(80, 48)$.

  28. 7.3 Approximating other distributions by normal distributions. So that $P(X < 100) \approx P\left(Z < \frac{100 - 80}{\sqrt{48}}\right) = P(Z < 2.89)$, with $Z \sim N(0,1)$. Using tables for the standard normal distribution, $P(Z < 2.89) = 0.9981$ (rounding z up to 3 gives $P(Z < 3) = 0.9987$). Without invoking the CLT, we would need to compute $P(X < 100) = P(X = 0) + P(X = 1) + \dots + P(X = 99)$.
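The exact sum $P(X = 0) + \dots + P(X = 99)$ is tedious by hand but easy by computer, so the CLT approximation can be compared with it directly. This Python sketch uses only the standard library: `math.comb` for the binomial coefficients and `statistics.NormalDist` for $\Phi$.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 200, 0.4

# Exact binomial probability P(X < 100) = P(X = 0) + ... + P(X = 99)
exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(100))

# CLT: X is approximately N(np, np(1-p)) = N(80, 48)
z = (100 - n * p) / sqrt(n * p * (1 - p))
approx = NormalDist().cdf(z)

print(f"z = {z:.3f}, normal approximation = {approx:.4f}, exact = {exact:.4f}")
```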

  29. 7.3 Approximating other distributions by normal distributions. The Poisson probability function is $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$, $x = 0, 1, 2, \dots$ It becomes hard to evaluate for large values of $\lambda$, as $\lambda^x$ and $x!$ get huge. However, the CLT can be used to overcome this problem. We first note that if $X_1, X_2$ are independent with $X_i \sim \text{Pois}(\lambda_i)$, $i = 1, 2$, then $X_1 + X_2 \sim \text{Pois}(\lambda_1 + \lambda_2)$ (to be proved in Honours).

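  30. 7.3 Approximating other distributions by normal distributions. We note that if $X \sim \text{Pois}(\lambda)$, then X can be written as a sum of n independent Poisson r.v's: $X = X_1 + \dots + X_n$, where $X_i \sim \text{Pois}(\lambda/n)$. Each $X_i$ has mean $\lambda/n$ and variance $\lambda/n$ (mean = variance: homework). Thus, for large $\lambda$, the CLT implies that $X \approx N(\lambda, \lambda)$, so that $\frac{X - \lambda}{\sqrt{\lambda}} \approx N(0, 1)$.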
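Both the 'mean = variance' property and the difficulty of evaluating the Poisson probability function for large $\lambda$ can be explored numerically. The sketch below evaluates the probability function on the log scale with `math.lgamma`, so that $\lambda^x$ and $x!$ are never formed explicitly. This is a standard numerical device, not something taken from the slides. It then checks that the mean and variance of $\text{Pois}(100)$ both equal 100. The choice $\lambda = 100$, the truncation at 300 and the helper name `pois_pmf` are illustrative assumptions, and a numerical check is not the homework proof.

```python
from math import exp, lgamma, log

def pois_pmf(x, lam):
    """P(X = x) for X ~ Pois(lam), computed as exp(-lam + x*log(lam) - log(x!))."""
    return exp(-lam + x * log(lam) - lgamma(x + 1))

lam = 100.0
xs = range(0, 301)                     # mass above 300 is negligible when lam = 100
pmf = [pois_pmf(x, lam) for x in xs]

mean = sum(x * px for x, px in zip(xs, pmf))
var = sum((x - mean) ** 2 * px for x, px in zip(xs, pmf))

print("total probability:", round(sum(pmf), 10))
print("mean:", round(mean, 6), " variance:", round(var, 6))   # both close to lam
```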

  31. 7.3 Approximating other distributions by normal distributions Example: Find the probability that a Poisson distributed r.v. with mean 25 takes a value in the range 26 to 30.

  32. 7.3 Approximating other distributions by normal distributions. If $X \sim \text{Pois}(25)$, we need to calculate $P(26 \le X \le 30)$. The CLT tells us that $Z = \frac{X - 25}{\sqrt{25}} = \frac{X - 25}{5} \approx N(0, 1)$. Using tables for the standard normal distribution, $P(26 \le X \le 30) \approx P(0.2 \le Z \le 1) = P(Z \le 1) - P(Z \le 0.2) = 0.8413 - 0.5793 = 0.262$.
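The normal approximation here treats the integer endpoints 26 and 30 as if X were continuous, so it can differ noticeably from the exact Poisson probability. The Python sketch below computes both, using the standard library only; the log-scale helper `pois_pmf` is an illustrative addition. For $\lambda = 25$ the exact value comes out at about 0.31, against about 0.262 for the approximation. For this narrow range of values the approximation is therefore only rough.

```python
from math import exp, lgamma, log, sqrt
from statistics import NormalDist

lam = 25.0

def pois_pmf(x, lam):
    """P(X = x) for X ~ Pois(lam), evaluated on the log scale."""
    return exp(-lam + x * log(lam) - lgamma(x + 1))

exact = sum(pois_pmf(x, lam) for x in range(26, 31))          # P(26 <= X <= 30)

Phi = NormalDist().cdf
approx = Phi((30 - lam) / sqrt(lam)) - Phi((26 - lam) / sqrt(lam))   # P(0.2 <= Z <= 1)

print(f"normal approximation = {approx:.4f}, exact Poisson = {exact:.4f}")
```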

  33. 8. Practical Applications of Normal Distributions. Why are normal distributions so important? 1 – The CLT shows that sums of i.i.d. r.v's tend towards normality, even if the r.v's themselves are non-normal. 2 – Many data sets are adequately described by a normal distribution: heights of people, IQ scores… 3 – Normal distributions are easy to work with mathematically (integrals, tables…). 4 – Statistical procedures based on the normality assumption are often insensitive to small violations of that assumption (e.g. ANOVA, covered in a later section). 5 – Many non-normal distributions can be transformed to approximate normality.

  34. 8.1 Testing for normality. Before using the normal distribution as a model for data (e.g. to perform a test about the mean of a population), we need to decide whether the random sample under investigation could have been drawn from a normal distribution. There are analytical tests (Pearson, Kolmogorov…), but here we will focus on a graphical method. We won't be able to prove normality; we can only fail to reject the hypothesis that the data come from a normal distribution (hypothesis-testing philosophy, finite random sample).

  35. 8.1 Testing for normality. First idea: use a histogram and compare it with what we would expect for a normal distribution, i.e. bell-shaped, symmetric, with a single peak (unimodal)…

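  36. 8.1 Testing for normality. [Figure: histograms of random samples (n = 30) from $N(3, \sigma^2 = 25)$, each overlaid with the density curve of $N(\bar{x}, s^2)$, where $\bar{x} = \sum_i x_i/n$ and $s^2$ is the sample variance.] It is difficult to conclude about normality because of sampling variability.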

  37. 8.1 Testing for normality. Second idea: remember that any normal r.v. is a linear transformation of a standard normal r.v. So if $y_1, \dots, y_n$ is a random sample from any normal distribution ($N(\mu, \sigma^2)$, say) and $z_1, \dots, z_n$ is a random sample from N(0,1), plot the sorted y values against the sorted z values. We would get something close to a straight line, because $Y = \mu + \sigma Z$.
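Here is a simulated version of this second idea as a Python sketch. It sorts a sample of size 30 from $N(3, 25)$ and an independent sample from $N(0, 1)$. It then summarises how close the pairs are to a straight line with a correlation coefficient and a least-squares line. The correlation summary, the helper `corr` and the seed are illustrative additions; the slides rely on a visual check. With only 30 points, the fitted intercept and slope are rough, noisy estimates of $\mu = 3$ and $\sigma = 5$, which is the variability problem the slides point out.

```python
import random
from statistics import fmean

def corr(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu_u, mu_v = fmean(u), fmean(v)
    suv = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    suu = sum((a - mu_u) ** 2 for a in u)
    svv = sum((b - mu_v) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

rng = random.Random(1)
n = 30
y = sorted(rng.gauss(3.0, 5.0) for _ in range(n))   # sample from N(3, 25): sd = 5
z = sorted(rng.gauss(0.0, 1.0) for _ in range(n))   # independent sample from N(0, 1)

# Least-squares line y = intercept + slope * z through the sorted pairs
slope = (sum((a - fmean(z)) * (b - fmean(y)) for a, b in zip(z, y))
         / sum((a - fmean(z)) ** 2 for a in z))
intercept = fmean(y) - slope * fmean(z)

print("correlation of sorted pairs:", round(corr(z, y), 3))
print("fitted line: y =", round(intercept, 2), "+", round(slope, 2), "* z")
```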

  38. 8.1 Testing for normality. [Figure: plots of sorted random samples (n = 30) from $N(3, \sigma^2 = 25)$ against sorted samples from N(0,1).] Again, it is difficult to conclude about normality because of sampling variability.

  39. 8.1 Testing for normality. Third idea: to overcome the problem of variability, use an 'idealised' (theoretical) average sample from N(0,1): the normal scores.

  40. 8.1 Testing for normality. If $Z \sim N(0,1)$, by definition of the cumulative distribution function and of the (lower) quantiles $z_q$, we have: $P(Z \le z_{0.1}) = \Phi(z_{0.1}) = 0.10$, $P(Z \le z_{0.2}) = \Phi(z_{0.2}) = 0.20$, …, $P(Z \le z_{1.0}) = \Phi(z_{1.0}) = 1.00$ (with $z_{1.0} = +\infty$). This means that, on average, we expect 10% of the data points to lie below the 10% quantile, 20% below the 20% quantile, …, and 100% below the 100% quantile.

  41. 8.1 Testing for normality. Consider a sample of 10 points, e.g. from N(0,1). We get 10 probability intervals corresponding to the quantiles: [0, 0.1], [0.1, 0.2], [0.2, 0.3], [0.3, 0.4], [0.4, 0.5], [0.5, 0.6], [0.6, 0.7], [0.7, 0.8], [0.8, 0.9], [0.9, 1.0]. For convenience, consider the mid-point of each interval, $(i - 0.5)/10$, $i = 1, \dots, 10$. The normal scores are obtained by computing $\Phi^{-1}((i - 0.5)/10)$, $i = 1, \dots, 10$, where $\Phi$ is the c.d.f. of N(0,1).

  42. 8.1 Testing for normality. Consider a sample of 10 points, e.g. from N(0,1). First interval: mid-point 0.05, so $\Phi^{-1}((1 - 0.5)/10) = \Phi^{-1}(0.05) = -1.645$.

  43. 8.1 Testing for normality. Consider a sample of 10 points, e.g. from N(0,1). Second interval: mid-point 0.15, so $\Phi^{-1}((2 - 0.5)/10) = \Phi^{-1}(0.15) = -1.036$.

  44. 8.1 Testing for normality Consider a sample of 10 points e.g. from N(0,1) Finally…
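The full set of normal scores for n = 10 can be computed with the inverse c.d.f. `NormalDist().inv_cdf` from Python's standard library, used here as an assumed stand-in for tables of $\Phi^{-1}$.

```python
from statistics import NormalDist

n = 10
Phi_inv = NormalDist().inv_cdf

# Normal scores: Phi^{-1}((i - 0.5)/n) for i = 1, ..., n
scores = [Phi_inv((i - 0.5) / n) for i in range(1, n + 1)]
print([round(s, 3) for s in scores])
```

The scores are symmetric about 0, as expected from the symmetry of N(0,1).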

  45. 8.1 Testing for normality. Idea: plot the sorted observed y values against the normal scores, then check visually for linearity. Example: early in the 20th century, Colonel L.A. Waddell collected 32 skulls from Tibet, in 2 groups: 17 from graves in Sikkim and 15 from a battlefield near Lhasa. Here are the maximum skull length measurements (in mm) for the Lhasa group: 182, 180, 191, 184, 181, 173, 189, 175, 196, 200, 185, 174, 195, 197, 182. Before doing anything with these data (e.g. testing for a difference in mean skull length between the 2 groups), we need to check for normality.

  46. Example

  47. Example

  48. Example

  49. Example

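  50. Example (R code):

      x <- seq(1, 15)
      y <- qnorm((x - 0.5) / 15)   # normal scores Φ^{-1}((i - 0.5)/15): the theoretical quantiles
      o <- c(173, 174, 175, 180, 181, 182, 182, 184, 185, 189, 191, 195, 196, 197, 200)  # sorted Lhasa skull lengths (mm)
      plot(y, o, xlab = "Theoretical quantiles", ylab = "Sample quantiles")
      # For n = 15, qqnorm(o) uses the same (i - 0.5)/n points and draws the same plot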
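For readers not using R, here is a Python sketch of the same normal-scores computation for the Lhasa skull lengths. It sorts the raw measurements itself rather than typing them in sorted order. As a numerical companion to the visual check (an addition, not part of the slides), it also reports the correlation between the sorted lengths and the normal scores. This correlation is close to 1 when the points lie near a straight line. The helper `corr` is an illustrative name.

```python
from statistics import NormalDist, fmean

lhasa = [182, 180, 191, 184, 181, 173, 189, 175, 196, 200, 185, 174, 195, 197, 182]
o = sorted(lhasa)                      # sample quantiles
n = len(o)
scores = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]   # theoretical quantiles

def corr(u, v):
    """Pearson correlation coefficient of two equal-length sequences."""
    mu_u, mu_v = fmean(u), fmean(v)
    suv = sum((a - mu_u) * (b - mu_v) for a, b in zip(u, v))
    suu = sum((a - mu_u) ** 2 for a in u)
    svv = sum((b - mu_v) ** 2 for b in v)
    return suv / (suu * svv) ** 0.5

for s, value in zip(scores, o):
    print(f"{s:7.3f}  {value}")
r = corr(scores, o)
print("correlation:", round(r, 3))
```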
