Section 1.3 The Normal Distribution
Strategy for Exploring Data • Always plot your data • make a graph, usually a histogram or a stemplot • Look for the overall pattern and for major deviations, such as outliers • Remember: Shape, Center, and Spread • Calculate a numerical summary to briefly describe center and spread • Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve
Density Curves • A density curve is a curve that: • 1) always sits on or above the horizontal axis • 2) has area exactly 1 underneath it • A density curve describes the overall pattern of a distribution and is a mathematical model for the distribution • a mathematical model is an idealized description • A mathematical model gives a compact picture of the overall pattern of the data but ignores minor irregularities as well as any outliers
Density Curves • It is easier to work with a smooth curve than with a histogram • Remember, histograms depend on our choice of classes (i.e., bin width) • The area under the curve and above any range of values is the proportion of all observations that fall in that range • Density curves, like distributions, come in many shapes • Symmetric, skewed (left or right), etc.
The mean and median of a Density Curve • The median of a density curve is the equal-areas point • The point that divides the area under the curve in half • The mean of a density curve is the balance point • The point at which the curve would balance if made of solid material
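The area, equal-areas point, and balance point of a density curve can be checked numerically. The sketch below is only an illustration: the right-skewed density f(x) = 2(1 − x) on [0, 1] and all function names are assumptions chosen for the example, not part of the slides.

```python
def density(x):
    """Hypothetical right-skewed density f(x) = 2(1 - x) on [0, 1], zero elsewhere."""
    return 2.0 * (1.0 - x) if 0.0 <= x <= 1.0 else 0.0


def area(lo, hi, steps=10_000):
    """Midpoint-rule approximation of the area under the density between lo and hi."""
    width = (hi - lo) / steps
    return sum(density(lo + (i + 0.5) * width) for i in range(steps)) * width


def balance_point(steps=10_000):
    """Mean of the density: midpoint-rule integral of x * f(x) over [0, 1]."""
    width = 1.0 / steps
    total = 0.0
    for i in range(steps):
        x = (i + 0.5) * width
        total += x * density(x)
    return total * width


def equal_areas_point(tolerance=1e-9):
    """Median of the density: bisect until half the area lies to the left."""
    lo, hi = 0.0, 1.0
    while hi - lo > tolerance:
        mid = (lo + hi) / 2.0
        if area(0.0, mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


total_area = area(0.0, 1.0)        # a density curve has total area 1
share_below_half = area(0.0, 0.5)  # proportion of observations between 0 and 0.5
median = equal_areas_point()
mean = balance_point()
print(total_area, share_below_half, median, mean)
```

For this right-skewed curve the mean (about 0.333) lies to the right of the median (about 0.293), pulled toward the long tail.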
Word of Caution • Because a density curve is an idealized description of the distribution of data, we need to distinguish between the mean and standard deviation of the density curve and the mean and standard deviation computed from the actual (sample) observations • actual observations: mean x̄, standard deviation s • density curve: mean µ, standard deviation σ
Normal Distributions • Normal curves are density curves that are: • symmetric • single-peaked • bell-shaped • Normal curves describe Normal distributions • A Normal distribution is described by giving its mean µ and its standard deviation σ • The mean is equal to the median (which property makes this true?)
Normal Distributions • Changing µ without changing σ moves the Normal curve along the horizontal axis without changing its spread (µ controls location) • The standard deviation σ controls the spread of a Normal curve • the larger σ is, the larger the spread of the curve • We abbreviate the Normal distribution with mean µ and standard deviation σ as N(µ, σ)
Normal distributions • Normal distributions are good descriptions for some distributions of real data • examples: manufacturing fill rates, crop yields, etc. • Normal distributions are good approximations to the results of many kinds of chance outcomes • examples: tossing a coin 1,000 times • Many statistical inference procedures based on Normal distributions work well for other roughly symmetric distributions
The 68 - 95 - 99.7 Rule • In the Normal distribution with mean µ and standard deviation σ: • about 68% of the observations fall within 1σ of µ • about 95% of the observations fall within 2σ of µ • about 99.7% of the observations fall within 3σ of µ • By remembering these numbers, you can think about Normal distributions without constantly making detailed calculations
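The three percentages in the rule can be reproduced from the standard Normal curve. This is a minimal sketch using Python's standard-library `statistics.NormalDist`; the helper name `share_within` is an assumption made for illustration.

```python
from statistics import NormalDist

standard = NormalDist(mu=0.0, sigma=1.0)


def share_within(k):
    """Proportion of a Normal distribution within k standard deviations of its mean."""
    return standard.cdf(k) - standard.cdf(-k)


shares = {k: share_within(k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} standard deviation(s): {share:.4f}")
```

The computed shares are roughly 0.6827, 0.9545, and 0.9973, which the rule rounds to 68%, 95%, and 99.7%.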
Example • The distribution of weights of 9 oz bags of potato chips is approximately Normal with mean = 9.12 oz and standard deviation = 0.15 oz • N( 9.12 , 0.15 ) • range for 68% of data: • 9.12 - 0.15 = 8.97 and 9.12 + 0.15 = 9.27 → ( 8.97 , 9.27 ) • range for 95% of data: • 8.97 – 0.15 = 8.82 and 9.27 + 0.15 = 9.42 → ( 8.82 , 9.42 ) • range for 99.7% of data: • 8.82 – 0.15 = 8.67 and 9.42 + 0.15 = 9.57 → ( 8.67 , 9.57 )
Example Cont. • About what percent of bags weigh more than 9.12 ounces? • We expect that 50% of the bags weigh more than 9.12 oz • About what percent of bags weigh more than 9.42 ounces? • We expect that 2.5% of the bags weigh more than 9.42 oz • About what percent of bags weigh less than 8.67 ounces? • We expect that 0.15% of the bags weigh less than 8.67 oz
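The chip example can be sketched in code as follows. The variable names are assumptions for illustration; the intervals come straight from the 68 - 95 - 99.7 rule, while the tail proportions are computed from the Normal curve itself, so they differ slightly from the rounded rule-of-thumb answers.

```python
from statistics import NormalDist

chips = NormalDist(mu=9.12, sigma=0.15)  # bag weights in ounces

# Intervals within 1, 2, and 3 standard deviations of the mean.
intervals = {
    k: (round(chips.mean - k * chips.stdev, 2), round(chips.mean + k * chips.stdev, 2))
    for k in (1, 2, 3)
}

# Tail proportions asked about in the example.
above_mean = 1.0 - chips.cdf(9.12)   # more than 9.12 oz
above_9_42 = 1.0 - chips.cdf(9.42)   # more than 9.42 oz (two sd above the mean)
below_8_67 = chips.cdf(8.67)         # less than 8.67 oz (three sd below the mean)

print(intervals)
print(f"{above_mean:.4f} {above_9_42:.4f} {below_8_67:.5f}")
```

The rule gives 2.5% and 0.15% for the two tails; the computed values are close to 2.3% and 0.13%, because 95% and 99.7% are themselves rounded figures.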
The z-score • If x is an observation from a distribution that has mean µ and standard deviation σ, the standardized value of x is often called a z-score • equation: z = (x − µ)/σ
The z-score • A z-score tells us how many standard deviations the original observation falls away from the mean, and in what direction • observations larger than the mean are positive when standardized • observations smaller than the mean are negative when standardized
Chip Example continued… • Standardized weight: z = (weight − 9.12)/0.15 • z-score for bag weighing 9.3 ounces: z = (9.3 − 9.12)/0.15 = 1.2 • z-score for bag weighing 8.7 ounces: z = (8.7 − 9.12)/0.15 = −2.8
Standard Normal Distribution • Standardizing a variable that has a Normal distribution produces a new variable that has the standard Normal distribution • The standard Normal distribution is the Normal distribution N(0, 1) with mean 0 and standard deviation 1 • If a variable x has a Normal distribution N(µ, σ) with mean µ and standard deviation σ, then the standardized variable z = (x − µ)/σ has the standard Normal distribution
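Standardizing is a one-line calculation. The sketch below applies it to the two chip weights; the function name `z_score` is an assumption for illustration.

```python
def z_score(x, mu, sigma):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mu) / sigma


MU, SIGMA = 9.12, 0.15  # chip-bag weights in ounces

z_heavy = z_score(9.3, MU, SIGMA)  # bag heavier than the mean, so z is positive
z_light = z_score(8.7, MU, SIGMA)  # bag lighter than the mean, so z is negative
print(round(z_heavy, 2), round(z_light, 2))
```

The 9.3 oz bag sits about 1.2 standard deviations above the mean and the 8.7 oz bag about 2.8 standard deviations below it.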
Normal Distribution Calculations • An area under a density curve is a proportion of the observations in a distribution • any question about what proportion of observations lies in some range of values can be answered by finding an area under the curve • because all Normal distributions are the same when we standardize, we can find areas under any Normal curve from a single table (Table A)
The Standard Normal Table • Table A is a table of areas under the standard Normal curve • the table entry for each value z is the area under the curve to the left of z • example: z = 2.56 has area 0.9948 to its left, so the area to its right is 1 − 0.9948 = 0.0052
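A software stand-in for a table lookup can be sketched with the standard Normal cumulative distribution function. The helper `table_a` is hypothetical; it rounds z to two decimals and the area to four, on the assumption that the printed table uses that precision.

```python
from statistics import NormalDist

standard = NormalDist()  # N(0, 1)


def table_a(z):
    """Area under the standard Normal curve to the left of z, rounded like a printed table."""
    return round(standard.cdf(round(z, 2)), 4)


left_area = table_a(2.56)             # area to the left of z = 2.56
right_area = round(1.0 - left_area, 4)  # total area is 1, so subtract for the right tail
print(left_area, right_area)
```

Because the entry is always a left-hand area, any right-hand area is found by subtracting the entry from 1.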
Finding Normal Proportions • Step 1: state the problem in terms of the observed variable x • Step 2: standardize x to restate the problem in terms of a standard Normal variable z • Remember to draw a picture • Step 3: find the required area under the standard Normal curve using Table A and the fact that the total area under the curve is 1
Chip Example continued… • What proportion of all 9-ounce bags of potato chips weighs less than 9.3 ounces? • N(9.12, 0.15) • standardized weight corresponding to 9.3 ounces: z = (9.3 − 9.12)/0.15 = 1.2 • See Graph on page 62, figure 1.23(a) • area from Table A: 0.8849 (about 88.49%)
Chip Example continued… • What proportion of all 9-ounce bags of potato chips weighs less than 8.7 ounces? • N(9.12, 0.15) • standardized weight corresponding to 8.7 ounces: z = (8.7 − 9.12)/0.15 = −2.8 • area from Table A: 0.0026 (about 0.26%)
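The two chip proportions can be reproduced by following the same steps in code: standardize, then take the area to the left under the standard Normal curve. This is a sketch with assumed names; it also checks that working directly with N(9.12, 0.15) gives the same area.

```python
from statistics import NormalDist

MU, SIGMA = 9.12, 0.15
standard = NormalDist()                  # N(0, 1)
chips = NormalDist(mu=MU, sigma=SIGMA)   # N(9.12, 0.15)


def proportion_below(x):
    """Standardize x, then return the area to the left of z under N(0, 1)."""
    z = (x - MU) / SIGMA
    return standard.cdf(z)


below_9_3 = proportion_below(9.3)   # z is about 1.2
below_8_7 = proportion_below(8.7)   # z is about -2.8
direct_9_3 = chips.cdf(9.3)         # same area without standardizing first
print(f"{below_9_3:.4f} {below_8_7:.4f} {direct_9_3:.4f}")
```

The agreement between `below_9_3` and `direct_9_3` illustrates why one table for N(0, 1) is enough for every Normal distribution.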
Example 1.20 • The annual rate of return on stock indexes is approximately Normal • Since 1945, the S&P 500 stock index has had a mean yearly return of about 12%, with a standard deviation of 16.5% • The market is down for the year if the return on the index is less than zero • In what proportion of years is the market down?
Step 1: • State problem in terms of the observed variable x • The annual rate of return for the S&P 500 is our variable x, which has the N(12 ,16.5) distribution. We want to find the proportion of years with x < 0.
Step 2: • Standardize x to restate the problem in terms of a standard Normal variable z • x < 0 is the same as (x − 12)/16.5 < (0 − 12)/16.5, that is, z < −0.73 • Draw the picture!
Step 3: • Find the required area under the standard Normal curve using Table A and the fact that the total area under the curve is 1 • Find the z-score to the first decimal place in the left-hand column labeled “z” • Follow that row to the right until you are under the column that equals the second decimal place of z. • This value is the proportion of all values from the distribution that are less than your observed z-score. z = -0.73 → Area = 0.2327
Conclusion: • Interpret the result from Step 3 in terms of the original question of interest • The S&P 500 is down on an annual basis about 23.3% of the time. • By simply taking 100 % - 23.3 % = 76.7 %, we can also conclude that this stock index is up on an annual basis about 76.7% of the time.
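The three steps of Example 1.20 can be traced in a short sketch. The names are assumptions for illustration; the code reports both the table-style answer (z rounded to two decimals) and the area for the unrounded z.

```python
from statistics import NormalDist

MU, SIGMA = 12.0, 16.5      # annual S&P 500 return in percent, N(12, 16.5)
standard = NormalDist()     # N(0, 1)

# Step 2: standardize the boundary x = 0.
z = (0.0 - MU) / SIGMA
z_table = round(z, 2)       # a printed table is indexed by z to two decimals

# Step 3: area to the left of z.
down_table = standard.cdf(z_table)  # table-style answer
down_exact = standard.cdf(z)        # answer without rounding z first
up_table = 1.0 - down_table         # total area under the curve is 1

print(z_table, f"{down_table:.4f} {down_exact:.4f} {up_table:.4f}")
```

Rounding z to −0.73 changes the answer only in the third decimal place, so the conclusion of roughly 23% down years is unaffected.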
Example 1.21 • What percent of years have annual rates of return between 12% and 50%?
Step 1: • State problem in terms of the observed variable x • We want the proportion of years with 12 ≤ x ≤ 50
Step 2: • Standardize x to restate the problem in terms of a standard Normal variable z • 12 ≤ x ≤ 50 is the same as (12 − 12)/16.5 ≤ (x − 12)/16.5 ≤ (50 − 12)/16.5, that is, 0 ≤ z ≤ 2.30 • Draw the picture!
Step 3: • Find the required area under the standard Normal curve using Table A and the fact that the total area under the curve is 1 • the area between 0 and 2.30 is the area below 2.30 minus the area below 0. Use the picture to visualize! area between 0 and 2.30 = area below 2.30 – area below 0.00 = 0.9893 – 0.5000 = 0.4893
Conclusion: • Interpret the result from Step 3 in terms of the original question of interest • About 49 % of years have annual rates of return between 12 % and 50 %
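An area between two values is the difference of two left-hand areas. The sketch below applies that to Example 1.21; the function name `proportion_between` is an assumption for illustration.

```python
from statistics import NormalDist

MU, SIGMA = 12.0, 16.5      # annual S&P 500 return in percent, N(12, 16.5)
standard = NormalDist()     # N(0, 1)


def proportion_between(low, high):
    """Area between two x values: area below the upper z minus area below the lower z."""
    z_low = round((low - MU) / SIGMA, 2)    # rounded as for a table lookup
    z_high = round((high - MU) / SIGMA, 2)
    return standard.cdf(z_high) - standard.cdf(z_low)


between_12_and_50 = proportion_between(12.0, 50.0)
print(f"{between_12_and_50:.4f}")
```

The result is about 0.4893, matching the subtraction 0.9893 − 0.5000 carried out by hand.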
General Information • The proportion of observations with x < 0 is the same as the proportion with x ≤ 0 (property of continuous curves) • There is no area under the curve at an exact value • for example: the proportion of years with 0% return is 0, even if there is such a year in the actual data • Sometimes we encounter a value of z more extreme than those appearing in Table A • for practical purposes, we can act as if there is zero area outside the range of Table A
“Backward” Normal Calculations • We may want to find the observed value with a given proportion of the observations above or below it • To do this: • find the given proportion in the inside of the table, read the corresponding z from the left column and top row, then “unstandardize” to get the observed value • general formula to unstandardize a z-score: x = µ + zσ
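How little area lies beyond the edge of a table can be checked directly. In this sketch the cutoff of 3.5 is a hypothetical stand-in for wherever a particular printed table stops.

```python
from statistics import NormalDist

standard = NormalDist()  # N(0, 1)

cutoff = 3.5  # hypothetical table edge, chosen only for illustration
area_beyond_cutoff = standard.cdf(-cutoff)   # area to the left of z = -3.5
area_beyond_both = 2.0 * area_beyond_cutoff  # by symmetry, both tails together

print(f"{area_beyond_cutoff:.6f} {area_beyond_both:.6f}")
```

Both tails together hold well under one tenth of one percent of the area, which is why treating it as zero does little harm in practice.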
Example 1.22 • Miles per gallon ratings of compact cars (2001 model year) follow approximately the N(25.7, 5.88) distribution • How many miles per gallon must a vehicle get to place in the top 10% of all 2001 model year compact cars?
State the problem • We want to find the miles per gallon rating x with area 0.1 to its right under the Normal curve with mean µ = 25.7 and standard deviation σ = 5.88. (That’s the same as finding the miles per gallon rating x with area 0.9 to its left)
Use the table • Look in the body of Table A for the entry closest to 0.9. • This is the entry corresponding to z = 1.28 • So z = 1.28 is the standardized value with area 0.9 to its left
Unstandardize • Transform the solution from the z back to the original x scale. x = 25.7 + (1.28)(5.88) = 33.2
Conclusion • Interpret the result in terms of the original question of interest • A compact car must receive a rating of at least 33.2 miles per gallon to place in the highest 10% of all 2001 model year compact cars.
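The backward calculation can be sketched with the inverse of the standard Normal cumulative distribution function, which plays the role of reading the table from the inside out. The variable names are assumptions for illustration.

```python
from statistics import NormalDist

MU, SIGMA = 25.7, 5.88      # miles per gallon, N(25.7, 5.88)
standard = NormalDist()     # N(0, 1)

# Top 10% means area 0.10 to the right, i.e. area 0.90 to the left.
z = standard.inv_cdf(0.90)

# Unstandardize: x = mu + z * sigma.
mpg_cutoff = MU + z * SIGMA

# Same answer by working with N(25.7, 5.88) directly.
mpg_direct = NormalDist(mu=MU, sigma=SIGMA).inv_cdf(0.90)

print(round(z, 2), round(mpg_cutoff, 1), round(mpg_direct, 1))
```

The inverse function returns z close to 1.28, the value read from the table, and the cutoff rounds to 33.2 miles per gallon.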