The ‘Normal’ Distribution

Objectives • Review the Normal Distribution • Properties of the Standard Normal Distribution • Review the Central Limit Theorem • Use Normal Distribution in an inferential fashion

Theoretical Distribution • Empirical distributions • based on data • Example: empirical distribution for a bootstrapped regression coefficient • Theoretical distribution • based on mathematics • derived from model or estimated from data • Example: Standard Normal

The Normal Distribution • What is it? Why do we care? • The important thing is that distributions are tied to probabilities, and it is the probability which will be of interest to us • If we know something about the distribution of events, then we can estimate the likelihood of our particular event of interest (data)

What’s the big deal with the normal one? • We believe that the variables of interest to us are normally distributed in the population • This may actually be a rather bold assumption • See Micceri, Wilcox • Assuming a normal distribution allows us to take advantage of its properties and make inferences from our sample to the population • The theoretical sampling distributions of various statistics do seem to be normally distributed • The central limit theorem concerns the sampling distribution • Most of the stats we use have normality as an assumption in some form • Though many researchers misunderstand it

Normal Probability Distribution • Symmetrical, bell-shaped curve • Also known as Gaussian distribution • Point of inflection = 1 standard deviation from mean • This is, despite what some seem to think, all a ‘normal’ distribution is: a continuous probability distribution

Normal Probability Distribution • Since we know the shape of the curve, we can (using calculus) calculate the area under the curve • The percentage of that area can be used to determine the probability that a given value could be pulled from a given distribution • The area under the curve tells us about the probability; in other words, we can obtain an observed p-value for our result (data) by treating it as a normally distributed outcome • Issue: • Each normal distribution with its own values of μ and σ would need its own calculation of the area under various points on the curve
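
The issue above can be illustrated with a minimal sketch in Python, using the standard library's `statistics.NormalDist`. The two distributions and the scores 118 and 62 are hypothetical values chosen for illustration, and `area_below` is a helper name introduced here. Each distribution gets its own area calculation, yet both scores sit 1.2 standard deviations above their means, so the areas agree with a single calculation on N(0,1).

```python
from statistics import NormalDist

def area_below(x, mu, sigma):
    """Area under the normal curve (mean mu, SD sigma) to the left of x."""
    return NormalDist(mu, sigma).cdf(x)

# Two hypothetical normal distributions, each with its own mu and sigma.
p_a = area_below(118, mu=100, sigma=15)  # 118 is 1.2 SDs above 100
p_b = area_below(62, mu=50, sigma=10)    # 62 is 1.2 SDs above 50

# The same area, read once from the standard normal at z = 1.2.
p_z = area_below(1.2, mu=0, sigma=1)

print(round(p_a, 4), round(p_b, 4), round(p_z, 4))
```

All three printed values should match, which is what motivates working with one standardized curve rather than one calculation per distribution.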

Normal Probability Distribution: Standard Normal Distribution, N(0,1) • We often use the standard normal distribution as a result • “Bell-shaped” • Mean of 0 • Standard deviation of 1 • Possesses an infinite number of possible values.

Normal Probability Distribution • Because the distribution is continuous, the probability of any single exact value is zero; probabilities are areas over ranges of values (the curve itself approaches but never quite reaches zero) • Curve has a total area or probability = 1 • For normal distributions: ±1 SD ≈ 68%, ±2 SD ≈ 95%, ±3 SD ≈ 99.7% • Note: not all bell-shaped symmetrical distributions are normal distributions
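
The areas within 1, 2, and 3 standard deviations of the mean can be checked numerically. The following is a small sketch using Python's standard library; `area_within` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)

def area_within(k):
    """Area under the standard normal curve between -k and +k SDs."""
    return std_normal.cdf(k) - std_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within ±{k} SD: {area_within(k):.4f}")
# within ±1 SD: 0.6827
# within ±2 SD: 0.9545
# within ±3 SD: 0.9973
```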

Normal Distribution • The standard normal distribution will allow us to make claims about the probabilities of values related to our own data • How do we apply the standard normal distribution to our data?

Z-score • If we know the population mean and population standard deviation, for any value of X we can compute a z-score by subtracting the population mean and dividing the result by the population standard deviation

Important z-score info • Z-score tells us how far above or below the mean a value is in terms of standard deviations • It is a linear transformation of the original scores • Multiplication (or division) of and/or addition to (or subtraction from) X by a constant • Relationship of the observations to each other remains the same • Z = (X − μ)/σ • X = σZ + μ
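
As a minimal sketch of the two formulas on this slide, the Python below standardizes a few scores and then converts them back. The scores, the mean of 70, and the standard deviation of 8 are hypothetical, and `to_z` and `from_z` are names introduced here.

```python
def to_z(x, mu, sigma):
    """Z = (X - mu) / sigma"""
    return (x - mu) / sigma

def from_z(z, mu, sigma):
    """X = sigma * Z + mu"""
    return sigma * z + mu

# Hypothetical population parameters and raw scores.
mu, sigma = 70, 8
scores = [62, 70, 82]

z_scores = [to_z(x, mu, sigma) for x in scores]
recovered = [from_z(z, mu, sigma) for z in z_scores]

print(z_scores)   # [-1.0, 0.0, 1.5]
print(recovered)  # [62.0, 70.0, 82.0]
```

Because the transformation is linear, the ordering of the observations and their relative spacing are unchanged; only the units differ.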

Example: GRE • Say we have GRE scores (Verbal) that are normally distributed with mean 500 and standard deviation 100. • Find the probability that a randomly selected GRE score is greater than 620. • We want to know what’s the probability of getting a score 620 or beyond. • p(z > 1.2) • Result: The probability of randomly getting a score of 620 or greater is ~.12
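
The GRE example can be worked through in a few lines of Python as a check on the arithmetic. This sketch uses the mean, standard deviation, and cutoff from the slide; the variable names are introduced here.

```python
from statistics import NormalDist

mu, sigma, cutoff = 500, 100, 620

# Step 1: convert the cutoff to a z-score.
z = (cutoff - mu) / sigma

# Step 2: upper-tail area of the standard normal beyond z.
p_upper = 1 - NormalDist().cdf(z)

# The same area computed directly on the N(500, 100) scale.
p_direct = 1 - NormalDist(mu, sigma).cdf(cutoff)

print(z, round(p_upper, 4), round(p_direct, 4))  # 1.2 0.1151 0.1151
```

The value 0.1151 rounds to the ~.12 reported in the result.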

Extension: Standard Scores • Often units based on z-scores are presented instead of the z-score itself • First convert whatever score you have to a z-score. Then: • New score = (new s.d.)(z) + new mean • Example: T scores have a mean of 50 and an s.d. of 10 • Then T = 10(z) + 50 • Examples of standard scores: IQ, GRE, SAT
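
The conversion from z-scores to another standard-score scale can be sketched as follows. The z-scores are hypothetical, and `rescale` is a helper name introduced here; the T-score mean of 50 and s.d. of 10 come from the slide.

```python
def rescale(z, new_mean, new_sd):
    """New score = new_sd * z + new_mean"""
    return new_sd * z + new_mean

# Hypothetical z-scores, already standardized.
z_scores = [-1.5, 0.0, 1.2]

# T scores: mean 50, s.d. 10.
t_scores = [rescale(z, new_mean=50, new_sd=10) for z in z_scores]

print(t_scores)  # [35.0, 50.0, 62.0]
```

Any other scale works the same way by swapping in its own mean and standard deviation.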

Extension: Interval Estimates • With the standard normal we can create interval estimates for particular scores of interest • Note that Howell’s wording on p.77 is not typically how we are using confidence intervals and would be incorrect unless we are dealing with the population of scores (which he is in his example) • The reason is that our methods provide one of an infinite number of CIs, x% of which ‘capture’ the parameter. • Our typical methods assume a fixed parameter and ‘random intervals’, not a fixed interval into which a random parameter might fall. • However, the formula for an interval estimate given there is one you’ll see many variations on
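
The formula itself is not reproduced on this slide. A common form, assumed for this sketch, is mean ± z × s.d., where z is the standard normal value that cuts off the desired central area. The Python below applies it to a known population of scores; the mean of 500 and standard deviation of 100 are illustrative values, and `central_interval` is a name introduced here.

```python
from statistics import NormalDist

def central_interval(mu, sigma, coverage=0.95):
    """Limits containing the central `coverage` share of a N(mu, sigma) population.

    Assumes the population mean and SD are known; this describes where
    individual scores fall, not a confidence interval for a parameter.
    """
    tail = (1 - coverage) / 2
    z_crit = NormalDist().inv_cdf(1 - tail)  # about 1.96 for 95%
    return mu - z_crit * sigma, mu + z_crit * sigma

lower, upper = central_interval(mu=500, sigma=100)
print(round(lower, 1), round(upper, 1))
```

Variations on this form typically swap in a different center, a different spread (such as a standard error), or a different critical value.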

Summary: Normal Distribution • Assuming our data is normally distributed allows us to use the properties of the normal distribution to assess the likelihood of some outcome • This gives us a means by which to determine whether we might think one hypothesis is more plausible than another (even if we don’t get a direct likelihood of either hypothesis)
