Using the 68-95-99.7 Rule. Normal Quantile Plots.
Learning Objectives By the end of this lecture, you should be able to: • Do various calculations involving areas under the density curve using the 68-95-99.7 rule • Identify the mathematical technique used to help confirm (though not guarantee!) that our distribution is indeed Normal.
A few numbers worth memorizing (though not just yet) • Because we use the Normal distribution SO much, it is worth memorizing the approximate areas from the Normal table that correspond to a few different z-scores. • I say approximate, because the values are rounded off. • Look at the areas to the left of each z-score shown here, but don’t memorize them just yet: z = -2: about 2.3%; z = -1: about 16%; z = +1: about 84%; z = +2: about 98%. • What I do want you to memorize are the 3 numbers shown in a famous ‘rule’ on the next slide.
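If you would like to see where these rounded areas come from, here is a minimal sketch that computes the area to the left of each z-score. It assumes Python's standard-library `statistics.NormalDist` as a stand-in for the Normal table; the helper name `area_below` is made up for this illustration.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def area_below(z):
    """Proportion of observations lying below (to the left of) z."""
    return standard_normal.cdf(z)

for z in (-2, -1, 1, 2):
    print(f"z = {z:+d}: about {100 * area_below(z):.1f}% lie below")
# z = -2: about 2.3% lie below
# z = -1: about 15.9% lie below
# z = +1: about 84.1% lie below
# z = +2: about 97.7% lie below
```

Rounded to whole percentages, these match the values on the slide.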
The 68-95-99.7% Rule for Normal Distributions • This is essentially a “shortcut” for a mental ballpark of the areas under the Normal curve. It is definitely worth memorizing. • The area between -1 and +1 standard deviations corresponds to about 68% of the observations. • The area between -2 and +2 standard deviations corresponds to about 95% of the observations. • The area between -3 and +3 standard deviations corresponds to about 99.7% of the observations. You WILL be asked to use these numbers on quizzes and exams. Please note that on your exams you will not be provided with the three numbers (68, 95, 99.7).
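As a quick check on the three numbers, the sketch below computes the area within 1, 2, and 3 standard deviations of the mean. It again assumes Python's `statistics.NormalDist`; the function name `area_within` is an illustrative choice, not something from the lecture.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # defaults to mean 0, standard deviation 1

def area_within(k):
    """Proportion of observations between -k and +k standard deviations."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"between -{k} and +{k}: {100 * area_within(k):.2f}%")
# between -1 and +1: 68.27%
# between -2 and +2: 95.45%
# between -3 and +3: 99.73%
```

The rule's 68, 95, and 99.7 are simply these values rounded off.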
Examples: The 68-95-99.7% ‘Shortcut’ Rule for Normal Distributions The z=0 line (black line) is very helpful in doing many of these calculations. • Now let’s play around with these numbers by answering some questions. All numbers refer to z-scores (i.e. standard deviations from the mean): • What percentage of observations lie between -1 and +1? • Answer: As we just discussed, about 68% of observations lie between -1 and +1 standard deviations. • What percentage lie between 0 and +1? • Answer: Recall that z=0 splits the curve in half, with 50% of observations on each side. Because the curve is symmetric, if -1 to +1 is 68%, then 0 to +1 is half of that, which is 34%. • This is an important one. Make sure you understand how to do it! • There are a few ways to think of it. One is to look at the area between z=0 (the black line) and z=+1. Note that it is half of the area between -1 and +1. • If you need to visualize it (and you should!), then shade in the area between z=0 and z=+1. • What percentage of observations lie below +1? • Answer: To do this, look at your z=0 line. Make sure you recognize that the area to the left of z=0 represents 50% of observations. Now, what percentage of observations lie between 0 and +1? Recall from the previous question that this is 34%. Therefore, with 50% below 0 and 34% from 0 to +1, the area to the left of +1 represents 84% of observations.
Examples: The 68-95-99.7% ‘Shortcut’ Rule for Normal Distributions • More examples: • What percentage of observations lies between -2 and +1? • Answer: Use your midline! I would solve this by adding the area between -2 and 0 (half of 95%) to the area between 0 and +1 (half of 68%): 47.5% + 34% = 81.5%. • What percentage of observations lies between 0 and +3? • Answer: Half of the area between -3 and +3 (99.7%), which is 49.85%. • What percentage of observations lies below -2? • Answer: While this too can be answered in a few different ways, I would like you to make sure you can do it this way: • Look at the area between -2 and +2. Our ‘shortcut’ tells us that this contains 95% of observations. • This means that the area above +2 and the area below -2 together comprise 5% of observations. Because the curve is symmetric, the area above +2 comprises 2.5% of observations, and the area below -2 also comprises 2.5% of observations. • Answer: 2.5% • What percentage of observations lies above +3? • Answer: Use the same technique as was just discussed: • Between -3 and +3 makes up 99.7%. • Therefore below -3 and above +3 together make up 0.3%. • Therefore below -3 is 0.15% and above +3 is 0.15%. • Answer: 0.15%
Examples: The 68-95-99.7% ‘Shortcut’ Rule for Normal Distributions • One more! • What percentage of observations lies below +2 standard deviations? • Answer: Repeat the process from before to determine the area in each tail, that is, above +2 and below -2. That value was 2.5% per tail. If 2.5% of observations lie above +2, then 97.5% of observations lie below it. • Answer: 97.5%
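The midline reasoning used in these examples can be sketched in a few lines of code. This is only an illustration of the arithmetic, using nothing but the three memorized numbers; the names `RULE`, `midline_to`, `percent_below`, and `percent_between` are invented for this sketch, and it only handles whole z-scores from -3 to +3.

```python
# The three memorized numbers: percent of observations within 1, 2, 3 SDs.
RULE = {1: 68.0, 2: 95.0, 3: 99.7}

def midline_to(z):
    """Percent between the z = 0 midline and a whole z-score (sign ignored)."""
    if z == 0:
        return 0.0
    return RULE[abs(z)] / 2  # symmetry: half of the -z to +z area

def percent_below(z):
    """Percent below z: start from 50% at the midline, then add or subtract."""
    if z >= 0:
        return 50.0 + midline_to(z)
    return 50.0 - midline_to(z)

def percent_between(low, high):
    """Percent of observations between two whole z-scores."""
    return percent_below(high) - percent_below(low)

print(round(percent_between(0, 1), 2))    # 34.0
print(round(percent_below(1), 2))         # 84.0
print(round(percent_between(-2, 1), 2))   # 81.5
print(round(percent_between(0, 3), 2))    # 49.85
print(round(percent_below(-2), 2))        # 2.5
print(round(100 - percent_below(3), 2))   # 0.15
print(round(percent_below(2), 2))         # 97.5
```

Each printed value matches the corresponding answer worked out by hand above.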
The 68-95-99.7% ‘Shortcut’ Rule for Normal Distributions Example: women’s heights in inches follow N(µ, σ) = N(64.5, 2.5), that is, mean µ = 64.5 and standard deviation σ = 2.5. [Figure: Normal curve of women’s heights, with the inflection point marked.] • What percentage of women are between 62 and 67 inches tall? • Answer: This range corresponds to -1 to +1 SDs (64.5 ± 2.5), that is, about 68%. • What is the range of heights between which about 95% of women fall? • Answer: About -2 to +2 SDs (64.5 ± 5), so about 59.5 to 69.5 inches tall. • What is the range of heights between which nearly all (over 99%) of women fall? • Answer: A quick answer would be simply to pick the -3 to +3 SD range (64.5 ± 7.5), which is 57 to 72 inches and covers about 99.7% of women.
The 68-95-99.7% ‘Shortcut’ Rule for Normal Distributions Women’s heights in inches again follow N(µ, σ) = N(64.5, 2.5). • More Examples: • What percentage are taller than 67 inches? • Answer: If 68% of all women are between 62 and 67 inches tall, this means that 32% are outside of that range. Because the curve is symmetric, 16% are shorter than 62 inches, and 16% are taller than 67 inches. • What percentage are shorter than 59.5 inches? • Answer: If 95% of all women are between 59.5 and 69.5 inches, then 5% are outside of that range. In other words, 2.5% are shorter than 59.5 inches and 2.5% are taller than 69.5 inches.
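The height examples can be sketched the same way: convert each height to a z-score, then reason in standard deviations. The sketch below assumes Python's `statistics.NormalDist` with the N(64.5, 2.5) distribution from the slide; the helper names `z_score` and `range_within` are illustrative only.

```python
from statistics import NormalDist

# Women's heights in inches, N(64.5, 2.5) as given on the slide.
heights = NormalDist(mu=64.5, sigma=2.5)

def z_score(x):
    """Number of standard deviations that height x lies from the mean."""
    return (x - heights.mean) / heights.stdev

def range_within(k):
    """Heights lying k standard deviations below and above the mean."""
    return (heights.mean - k * heights.stdev, heights.mean + k * heights.stdev)

print(z_score(62), z_score(67))   # -1.0 1.0
print(range_within(2))            # (59.5, 69.5)
print(range_within(3))            # (57.0, 72.0)

# Computed tail areas, for comparison with the shortcut's 16% and 2.5%.
print(f"taller than 67: {100 * (1 - heights.cdf(67)):.1f}%")    # 15.9%
print(f"shorter than 59.5: {100 * heights.cdf(59.5):.1f}%")     # 2.3%
```

The computed tails differ slightly from the shortcut answers because 68 and 95 are themselves rounded.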
Is the distribution truly Normal? • Deciding whether data does indeed show a Normal (or close to Normal) distribution is a very important question. • All the examples we’ve been discussing above involving z-scores assume that the data is Normal. If the data was not Normal, all of our answers and calculations would be flawed. • Recall that there are many other types of distributions that are not Normal. Some examples include skewed, bimodal, Binomial (later in the quarter), Poisson, etc. • Each type of distribution has its own characteristic formulas, calculations, inference techniques, etc. Again, because the Normal distribution is one of the most commonly encountered distributions, we will spend lots of time discussing it. • So how do you decide if a distribution is Normal? • You might be tempted to say “look at a graph”. And this is not entirely false: When examining data, a chart is a great (if not the BEST) place to start! • However, as humans, we are easily fooled. There are many histograms (and related density curves) that look Normal, but in fact, are not. • Fortunately, we do have a statistical tool that can help confirm (though not guarantee) that our dataset does indeed appear to be Normal.
Normal Quantile Plot • The Normal quantile plot is a graph that helps us determine if a distribution is indeed Normal. • It is a mathematical plot that we can create using our statistical software package of choice. • Here is the method (which is provided for interest only): • The data points are ranked from smallest to largest, and the percentile ranks are converted to z-scores with Table A. The z-scores are then used for the x axis, against which the data values are plotted on the y axis of the Normal quantile plot. • If the distribution is indeed Normal, the points will fall close to a straight line, indicating a good match between the data and a Normal distribution. • Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.
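For the curious, the method just described can be sketched as follows. This is not how any particular software package does it: the percentile formula `(rank - 0.5) / n` is one common convention that is assumed here (packages differ), `NormalDist().inv_cdf` stands in for reading Table A backwards, and the function name and the small dataset are made up for illustration.

```python
from statistics import NormalDist

def normal_quantile_points(data):
    """Return (z-score, data value) pairs: the points of a Normal quantile plot."""
    ordered = sorted(data)              # step 1: rank the data
    n = len(ordered)
    standard_normal = NormalDist()
    points = []
    for rank, value in enumerate(ordered, start=1):
        percentile = (rank - 0.5) / n   # step 2: percentile rank (assumed convention)
        z = standard_normal.inv_cdf(percentile)  # step 3: percentile -> z-score
        points.append((z, value))       # x axis: z-score, y axis: data value
    return points

# Hypothetical measurements, for illustration only.
sample = [5.1, 4.8, 5.6, 5.0, 5.3]
for z, value in normal_quantile_points(sample):
    print(f"z = {z:+.2f}   value = {value}")
```

Plotting these pairs with any charting tool gives the Normal quantile plot; the closer the points sit to a straight line, the better the match to a Normal distribution.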
[Figures: two example Normal quantile plots.] • Good fit to a straight line: the distribution of rainwater pH values is close to Normal. • Curved pattern: the data are not Normally distributed. Instead, they show a right skew: a few individuals have particularly long survival times. Normal quantile plots are complex to do by hand, but they are standard features in most statistical software.
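To get a feel for “straight” versus “curved” without the pictures, here is a rough sketch that measures how close the plot's points are to a line using the correlation between z-scores and data values. Several things here are assumptions for illustration: the correlation is just one convenient way to put a number on straightness and is not part of the method in this lecture, the `(i - 0.5) / n` percentile convention is assumed, and both datasets are synthetic (they are not the rainwater or survival-time data).

```python
import math
from statistics import NormalDist

def quantile_plot_axes(data):
    """Return (z-scores, sorted values): the x and y axes of a Normal quantile plot."""
    n = len(data)
    zs = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return zs, sorted(data)

def straightness(zs, values):
    """Pearson correlation of the plotted points (1.0 means perfectly straight)."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_v = sum(values) / n
    cov = sum((z - mean_z) * (v - mean_v) for z, v in zip(zs, values))
    var_z = sum((z - mean_z) ** 2 for z in zs)
    var_v = sum((v - mean_v) ** 2 for v in values)
    return cov / math.sqrt(var_z * var_v)

# Synthetic data built from evenly spaced percentiles (not real measurements).
n = 50
percentiles = [(i - 0.5) / n for i in range(1, n + 1)]
normal_like = [NormalDist(mu=10.0, sigma=2.0).inv_cdf(p) for p in percentiles]
right_skewed = [-100.0 * math.log(1.0 - p) for p in percentiles]  # long right tail

print(f"Normal-like data:  r = {straightness(*quantile_plot_axes(normal_like)):.3f}")
print(f"Right-skewed data: r = {straightness(*quantile_plot_axes(right_skewed)):.3f}")
```

The Normal-like data give a correlation essentially equal to 1, while the right-skewed data give a noticeably lower value because the largest observations bend the plot upward.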
The Normal quantile plot supports Normality, but does NOT guarantee it! • Two key points here: • If the plot IS straight, then you have supported the idea that your dataset is Normal. However, you have NOT guaranteed it! • However, if the plot shows a systematic departure from a straight line, then your data is NOT Normal! • This concept (supportive tests) will come up with certain other statistical tests that we discuss down the road.
Shortcut Rule or Z-Table? • Students have often been confused as to which should be used. • Whenever possible, use your z-table, as you will get a much more accurate result. In particular, if you are given z-scores that are not anywhere near whole numbers (e.g. 2.332), then there is no shortcut to use! The shortcut can only be used with whole (integer) z-scores between -3 and +3. • The main purpose of learning the ‘shortcut’ rule (in addition to the fact that it comes up on all kinds of exams) is to encourage you to develop an understanding of what you are trying to do rather than just jumping to calculators and z-tables. • For this course, you will be asked to do both.
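To see the difference in accuracy, here is a small sketch that mimics a z-table lookup. It assumes a table that lists z to two decimal places and areas to four, and it uses Python's `statistics.NormalDist` in place of the printed table; the function name `table_area_below` is invented for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()

def table_area_below(z):
    """Mimic a z-table: round z to two decimals, report the area to four."""
    return round(standard_normal.cdf(round(z, 2)), 4)

# A z-score with no shortcut available: look up 2.33 in the table.
print(table_area_below(2.332))   # 0.9901

# A whole z-score: the shortcut says 97.5% below +2, the table is more precise.
print(table_area_below(2.0))     # 0.9772
```

For z = +2 the shortcut's 97.5% is close to the table's 97.72%, which is fine for a mental ballpark but not a substitute for the table.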