The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE

Chapter 2: Modeling Distributions of Data Section 2.2 Normal Distributions The Practice of Statistics, 4th edition - For AP* STARNES, YATES, MOORE

Normal Distributions *One particularly important class of density curves are the Normal curves, which describe Normal distributions. *All Normal curves are symmetric, single-peaked, and bell-shaped. *A Specific Normal curve is described by giving its mean µ and standard deviation σ. Two Normal curves, showing the mean µ and standard deviation σ.

Definition: • A Normal distribution is described by a Normal density curve. Any particular Normal distribution is completely specified by two numbers: its mean µ and standard deviation σ. • The mean of a Normal distribution is the center of the symmetric Normal curve. • The standard deviation is the distance from the center to the change-of-curvature points on either side. • We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ,σ). Normal Distributions Normal distributions are good descriptions for some distributions of real data. Normal distributions are good approximations of the results of many kinds of chance outcomes. Many statistical inference procedures are based on Normal distributions.

The 68-95-99.7 Rule Although there are many Normal curves, they all have properties in common. • Definition:The 68-95-99.7 Rule (“The Empirical Rule”) • In the Normal distribution with mean µ and standard deviation σ: • Approximately 68% of the observations fall within σ of µ. • Approximately 95% of the observations fall within 2σ of µ. • Approximately 99.7% of the observations fall within 3σ of µ.

Example 1: The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for 7th grade students in Gary, Indiana, is close to Normal. Suppose the distribution is N(6.84, 1.55). a) Sketch the Normal density curve for this distribution. b) What percent of ITBS vocabulary scores are less than 3.74? Show your work. c) What percent of the scores are between 5.29 and 9.94? Show your work.

The Standard Normal Distribution All Normal distributions are the same if we measure in units of size σ from the mean µ as center. Definition: The standard Normal distribution is the Normal distribution with mean 0 and standard deviation 1. If a variable x has any Normal distribution N(µ,σ) with mean µ and standard deviation σ, then the standardized variable has the standard Normal distribution, N(0,1).

Because all Normal distributions are the same when we standardize, we can find areas under any Normal curve from a single table. The Standard Normal Table • Definition:The Standard Normal Table • Table A is a table of areas under the standard Normal curve. The table entry for each value z is the area under the curve to the left of z. Suppose we want to find the proportion of observations from the standard Normal distribution that are less than 0.81. We can use Table A: P(z < 0.81) = .7910

Finding area to the right What if we wanted to find the proportion of observations from the standard Normal distribution that are greater than −1.78? To find the area to the right of z = −1.78, locate −1.7 in the left-hand column of Table A, then locate the remaining digit 8 as .08 in the top row. The corresponding entry is .0375. (See the excerpt from Table A below.) This is the area to the left of z = −1.78. To find the area to the right of z = −1.78, we use the fact that the total area under the standard Normal density curve is 1. So the desired proportion is 1 − 0.0375 = 0.9625. A common student mistake is to look up a z-value inTable A and report the entry corresponding to that z-value, regardless of whether the problem asks for the area to the left or to the right of that z-value. To prevent making this mistake, always sketch the standard Normal curve, mark the z-value, and shade the area of interest. And before you finish, make sure your answer is reasonable in the context of the problem.

Example 2: Find the proportion of observations from the standard Normal distribution that are between –1.25 and 0.81. Can you find the same proportion using a different approach? 1 –(0.1056+0.2090) = 1 – 0.3146 = 0.6854

How to Solve Problems Involving Normal Distributions • State: Express the problem in terms of the observed variable x. • Plan: Draw a picture of the distribution and shade the area of interest under the curve. • Do: Perform calculations. • *Standardizex to restate the problem in terms of a standard Normal variable z. • *Use Table A and the fact that the total area under the curve is 1 to find the required area under the standard Normal curve. • Conclude: Write your conclusion in the context of the problem.

Example 3: On the driving range, Tiger Woods practices his swing with a particular club by hitting many, many balls. When Tiger hits his driver, the distance the ball travels follows a Normal distribution with mean 304 yards and standard deviation 8 yards. What percent of Tiger’s drives travel at least 290 yards? • STATE: Let x = the distance that Tiger’s ball travels. The variable x has a Normal distribution with μ = 304 and σ = 8. We want the proportion of Tiger’s drives with x ≥ 290. • PLAN: • DO:DO: N(304, 8) • Normalcdf(290, 999999, 304, 8) = 0.9599 • CONCLUDE: About 96% of Tiger Woods’s drives on the range travel at least 290 yards.

In a Normal distribution, the proportion of observations with x ≥ 290 is the same as the proportion with x > 290. There is no area under the curve exactly above the point 290 on the horizontal axis, so the areas under the curve with x ≥ 290 and x > 290 are the same. This isn’t true of the actual data. Tiger may hit a drive exactly 290 yards. The Normal distribution is just an easy-to-use approximation, not a description of every detail in the actual data. The key to doing a Normal calculation is to sketch the area you want, then match that area with the area that the table gives you.

Example 4: When Tiger Woods hits his driver, the distance the ball travels can be described by N(304, 8). What percent of Tiger’s drives travel between 305 and 325 yards? STATE: As in the previous example, we let x = the distance that Tiger’s ball travels. PLAN: DO: N(304, 8) Normalcdf(305, 325, 304, 8) = 0.4459 CONCLUDE: About 45%of Tiger’s drives travel between 305 and 325 yards.

Finding a value when given a proportion The previous two examples illustrated the use of Table A to find what proportion of the observations satisfies some condition, such as “Tiger’s drive travels between 305 and 325 yards.” Sometimes, we may want to find the observed value that corresponds to a given percentile. To do this, use Table A backwards. Find the given proportion in the body of the table, read the corresponding z from the left column and top row, then “unstandardize” to get the observed value.

Example 5: High levels of cholesterol in the blood increase the risk of heart disease. For 14-year-old boys, the distribution of blood cholesterol is approximately Normal with mean μ = 170 milligrams of cholesterol per deciliter of blood (mg/dl) and standard deviation σ = 30 mg/dl. What is the first quartile of the distribution of blood cholesterol? STATE: Let x = the cholesterol level of a 14-year-old boy. PLAN: DO: N(170, 30) invNorm(0.25, 170, 30) = 149.77 CONCLUDE: The first quartile of blood cholesterol levels in 14-year-old boys is about 150 mg/dl.

Assessing Normality The Normal distributions provide good models for some distributions of real data. Many statistical inference procedures are based on the assumption that the population is approximately Normally distributed. Consequently, we need a strategy for assessing Normality. • Plot the data. • *Make a dotplot, stemplot, or histogram and see if the graph is approximately symmetric and bell-shaped. • Check whether the data follow the 68-95-99.7 rule. • *Count how many observations fall within one, two, and three standard deviations of the mean and check to see if these percents are close to the 68%, 95%, and 99.7% targets for a Normal distribution.

Normal Probability Plots Most software packages can construct Normal probability plots. These plots are constructed by plotting each observation in a data set against its corresponding percentile’s z-score. AP EXAM TIP: Normal probability plots are not included on the AP Statistics course outline. However, these graphs are very useful tools for assessing Normality. You may use them on the AP exam if you wish—just be sure that you know what you’re looking for (linear pattern). Interpreting Normal Probability Plots If the points on a Normal probability plot lie close to a straight line, the plot indicates that the data are Normal. Systematic deviations from a straight line indicate a non-Normal distribution. Outliers appear as points that are far away from the overall pattern of the plot.

How can we determine shape from a Normal probability plot? Look at the Normal probability plot of the guinea pig survival data. Imagine drawing a line through the leftmost points, which correspond to the smaller observations. The larger observations fall systematically to the right of this line. That is, the right-of-center observations have much larger values than expected based on their percentiles and the corresponding z-scores from the standard Normal distribution. This Normal probability plot indicates that the guinea pig survival data are strongly right-skewed. In a right-skewed distribution, the largest observations fall distinctly to the right of a line drawn through the main body of points. Similarly, left-skewness is evident when the smallest observations fall to the left the line.

The Practice of Statistics, 4 th edition - For AP* STARNES, YATES, MOORE