Importance of the normal distribution

Importance of the normal distribution (Session 09)

Learning Objectives
At the end of this session you will be able to:
• discuss reasons why the normal probability distribution is important
• state the Central Limit Theorem and its value in approximating Binomial and Poisson probabilities by normal probabilities
• explain how the assumption of normality for a given random variable can be checked

Importance of Normal Distribution
• Many measurements can be closely approximated by the normal distribution, since many variables show normal variation as the result of many minor influences acting up and down
• Data which are not normal can often be transformed into a normal random variable
• The normal distribution underpins many ideas in inference. We have seen that probability statements about any normally distributed variable can be made via N(0,1)

The Central Limit Theorem (CLT)
• One of the key reasons why the normal distribution is important is the Central Limit Theorem (CLT).
• This theorem states that the mean of a random sample from any distribution with finite variance has an approximately normal distribution, provided that the sample size is sufficiently large.

Consequences of the Central Limit Theorem
• Many statistical techniques are based on the assumption that the sample mean follows a normal distribution
• As a consequence of the Central Limit Theorem, this assumption remains reasonable as long as the sample size is large enough, e.g. greater than about 30.
• The CLT also implies that binomial and Poisson probabilities approach normal probabilities as n (for the binomial) or the mean (for the Poisson) becomes large (see below).
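The effect of sample size on the distribution of the sample mean can be explored with a small simulation. The sketch below is illustrative only: the exponential distribution, the number of repetitions and the seed are assumptions chosen for the example, not part of the session material. It draws repeated samples of size n and reports the mean, standard deviation and skewness of the resulting sample means; skewness near 0 is consistent with a symmetric, bell-shaped distribution.

```python
import random
import statistics


def sample_means(n, reps=5000, seed=1):
    """Return `reps` sample means, each computed from n draws of an
    exponential variable with mean 1 (a strongly skewed distribution,
    chosen here purely for illustration)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(rng.expovariate(1.0) for _ in range(n))
        for _ in range(reps)
    ]


def skewness(values):
    """Sample skewness: close to 0 for a symmetric distribution."""
    m = statistics.fmean(values)
    s = statistics.pstdev(values)
    return statistics.fmean(((v - m) / s) ** 3 for v in values)


# As n grows, the skewness of the sample means should shrink towards 0
# and their standard deviation should shrink like 1 / sqrt(n).
for n in (1, 5, 30, 100):
    means = sample_means(n)
    print(
        f"n={n:3d}  mean={statistics.fmean(means):.3f}  "
        f"sd={statistics.stdev(means):.3f}  skewness={skewness(means):.3f}"
    )
```

Replacing `rng.expovariate(1.0)` with draws from another distribution is a simple way to see how quickly (or slowly) the approximation improves for different shapes.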

Normal approximation to the binomial distribution
• Recall that the form of the binomial distribution for p = 0.5 closely resembles the normal distribution
• This is because the binomial probabilities are symmetric when p = 0.5
• However, even with p ≠ 0.5, the normal approximation holds for large n, because a binomial random variable is the sum of n independent Bernoulli random variables (so the proportion of successes is their mean) and the CLT then applies
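To see how the approximation behaves when p is not 0.5, exact binomial probabilities can be compared with their normal counterparts. This is a minimal sketch under stated assumptions: the values p = 0.2 and n = 10, 50, 200 are arbitrary, the function names are hypothetical, and the continuity correction of 0.5 is a standard refinement that is not mentioned in the slides.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(R <= k) for R ~ Binomial(n, p)."""
    return sum(
        math.comb(n, r) * p ** r * (1 - p) ** (n - r) for r in range(k + 1)
    )


def normal_approx_cdf(k, n, p):
    """Normal approximation to P(R <= k), using mean np and variance
    np(1-p), with a continuity correction of 0.5 (an addition here)."""
    mean = n * p
    sd = math.sqrt(n * p * (1 - p))
    return NormalDist(mean, sd).cdf(k + 0.5)


p = 0.2  # deliberately not 0.5, so the binomial is skewed
for n in (10, 50, 200):
    k = int(n * p)  # evaluate the probability at the mean
    exact = binom_cdf(k, n, p)
    approx = normal_approx_cdf(k, n, p)
    print(f"n={n:3d}  P(R<={k})  exact={exact:.4f}  normal={approx:.4f}")
```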

Normal approximation to the Poisson distribution
• Recall from the previous session (slides 8-12) that as the Poisson parameter λ becomes large, the shape of the Poisson distribution becomes bell-shaped and symmetrical
• This is again a consequence of the CLT: λ is the mean of the Poisson distribution, and a Poisson variable with large λ can be viewed as the sum of many independent Poisson variables with smaller means

More formally…
If $\hat{p}$ is the average of a series of n independent Bernoulli random variables (0,1 variables), each with success probability p, then

$$Z = \frac{\hat{p} - p}{\sqrt{p(1-p)/n}}$$

has approximately a normal distribution with mean 0 and variance 1 (standard normal) when the sample size n is large. Note that $\hat{p} = r/n$, where r = number of successes in n trials, i.e. r is a binomial random variable.

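and further …
The same result is true for the Poisson average, i.e. Z defined below can be approximated by the standard normal distribution for large values of λ.

$$Z = \frac{X - \lambda}{\sqrt{\lambda}}$$

Here X is a Poisson random variable with mean (and variance) λ.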
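The quality of the Poisson approximation for increasing means can be checked numerically. The sketch below is a rough illustration, assuming the standardisation of a Poisson count by its mean and standard deviation (both determined by the Poisson mean); the chosen means, the function names and the continuity correction of 0.5 are assumptions added for the example.

```python
import math
from statistics import NormalDist


def poisson_cdf(k, lam):
    """Exact P(X <= k) for X ~ Poisson(lam), summing the probabilities
    term by term to avoid large factorials."""
    term = math.exp(-lam)  # P(X = 0)
    total = term
    for x in range(1, k + 1):
        term *= lam / x  # P(X = x) from P(X = x - 1)
        total += term
    return total


def z_approx_cdf(k, lam):
    """Standard normal approximation to P(X <= k): standardise with
    mean lam and standard deviation sqrt(lam), with a continuity
    correction of 0.5 (an addition here)."""
    z = (k + 0.5 - lam) / math.sqrt(lam)
    return NormalDist().cdf(z)


for lam in (2, 10, 50, 100):
    k = lam  # evaluate the probability at the mean
    exact = poisson_cdf(k, lam)
    approx = z_approx_cdf(k, lam)
    print(f"mean={lam:3d}  P(X<={k})  exact={exact:.4f}  normal={approx:.4f}")
```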

Checking for normality
Thus the normal distribution plays an important role in statistics. Most of the techniques covered in Modules H2 and H8 are based on assuming that the key response of interest follows a normal distribution. We therefore need to be able to check whether measurements on a given random variable follow a normal distribution. This is done by producing a normal probability plot.

Normal Probability Plot
Statistics software packages generally have a facility for producing this plot. Below is the plot for maize cob weights. In this plot, the Y-axis corresponds to the values you would expect from an actual normal distribution, and the X-axis corresponds to your data. Points that fall close to a straight line therefore indicate that the normality assumption is reasonable. What do you deduce from the graph below?
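The coordinates behind such a plot can be computed by hand, which may help in reading one. The sketch below is only an approximation of what a statistics package does: the plotting positions (i - 0.5)/n are one common convention among several, the function names are hypothetical, and the data are simulated rather than the maize cob weights. It pairs each sorted observation (X) with the value expected from a fitted normal distribution (Y), then uses the correlation between the two as a rough measure of how straight the plot is.

```python
import random
from statistics import NormalDist, fmean, stdev


def normal_plot_points(data):
    """Return (x, y) pairs: x = sorted data, y = values expected from a
    normal distribution with the same mean and standard deviation."""
    xs = sorted(data)
    n = len(xs)
    fitted = NormalDist(fmean(xs), stdev(xs))
    # Plotting positions (i - 0.5) / n are an assumed convention.
    ys = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(xs, ys))


def straightness(points):
    """Pearson correlation of the plotted points; values very close
    to 1 indicate a nearly straight line."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(9)
# Simulated data for illustration only (not real measurements).
normal_data = [rng.gauss(100, 15) for _ in range(200)]
skewed_data = [rng.expovariate(1 / 100) for _ in range(200)]

normal_r = straightness(normal_plot_points(normal_data))
skewed_r = straightness(normal_plot_points(skewed_data))
print(f"normal sample: r = {normal_r:.4f}")
print(f"skewed sample: r = {skewed_r:.4f}")
```

A curved pattern in the plotted points, and a correspondingly lower correlation, suggests that a transformation of the data may be worth considering before assuming normality.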
