Learn how to describe and analyze distributions using graphical and numerical tools. Explore the concept of density curves and understand the center and spread of distributions. Discover the characteristics of normal distributions and their importance in statistical analysis.
Introduction So far we have two types of tools for describing distributions: graphical and numerical. We also have a strategy for exploring data on a single quantitative variable: Always plot your data: make a histogram. Remember to label and scale it for good communication! Look for (and verbally describe) the overall pattern (shape, center, spread, peaks) and for striking deviations, such as outliers or gaps, in the variable’s distribution. Based on the results of the graphical analysis, choose either the five-number summary or the mean and standard deviation to briefly describe center and spread numerically. Be aware of the limitations of numerical summaries. NOW we add: if the overall pattern of a large number of observations is very regular, we can describe it by a smooth curve.
Density Curves Think of drawing a curve through the tops of the bars in a histogram, smoothing out the irregular ups and downs of the bars. • Most histograms show the counts of observations in each class by the heights of their bars and therefore by the areas of the bars. We set up curves to show the proportion of observations in any region by areas under the curve. • Choose the scale so that the total area under the curve is exactly 1. We then have a density curve. • A histogram is a plot of data obtained from a sample. We use this histogram to understand the actual distribution of the population from which the sample was selected. • The density curve is intended to reflect the idealized shape of the population distribution.
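As a rough illustration of the rescaling described above, the following Python sketch turns histogram counts into bar heights whose areas are proportions, so that the total area is 1. The data values, the class width, and the names `data`, `bin_width`, and `density` are made up for the example and do not come from the slides.

```python
# Hypothetical sample of 20 observations (illustrative values only).
data = [61, 64, 65, 66, 67, 67, 68, 68, 69, 69,
        70, 70, 70, 71, 71, 72, 72, 73, 74, 76]

bin_width = 3   # width of each class
low = 60        # left edge of the first class
n_bins = 6      # classes [60,63), [63,66), ..., [75,78)

# Ordinary histogram: count the observations in each class.
counts = [0] * n_bins
for x in data:
    counts[(x - low) // bin_width] += 1

# Density scale: choose bar heights so that the AREA of each bar
# (height * width) equals the proportion of observations in its class.
density = [c / (len(data) * bin_width) for c in counts]

# The areas of all bars now add up to 1 (up to floating-point rounding).
total_area = sum(h * bin_width for h in density)

print(counts)
print(total_area)
```

On this scale the area over any set of classes is the proportion of observations falling there, which is the property a density curve idealizes.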
Center and Spread of Density Curves • Density curves help us better understand our measures of center and spread. Areas under a density curve represent proportions of the total number of observations. • The median is the point with half the observations on either side. So the median of a density curve is the equal-areas point, the point with half the area under the curve to its left and the remaining half of the area to its right. • The quartiles divide the area under the curve into quarters. One-fourth of the area under the curve is to the left of the first quartile, and three-fourths of the area is to the left of the third quartile. • You can roughly locate the median and quartiles of any density curve by eye by dividing the area under the curve into four equal parts.
Center and Spread of Density Curves • If we think of the observations as weights stacked on a seesaw, the mean is the point at which the seesaw would balance. This fact is also true of density curves. The mean is the point at which the curve would balance if made of solid material. • A symmetric curve balances at its center because the two sides are identical. • The mean and median of a symmetric density curve are equal. We know that the mean of a skewed distribution is pulled toward the long tail. The mean of a density curve is the point at which it would balance.
Center and Spread of Density Curves Median and Mean of a Density Curve The median of a density curve is the equal-areas point, the point that divides the area under the curve in half. The mean of a density curve is the balance point, or center of gravity, at which the curve would balance if made of solid material. The median and mean are the same for a symmetric density curve. They both lie at the center of the curve. The mean of a skewed curve is pulled away from the median in the direction of the long tail.
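To make the equal-areas point and the balance point concrete, here is a minimal numerical sketch in Python. It assumes a simple right-skewed density, f(x) = 2(1 - x) on the interval from 0 to 1, chosen only for illustration; the slides do not use this curve. The integrals are approximated with thin rectangles.

```python
def density(x):
    """Right-skewed triangular density on [0, 1]: f(x) = 2 * (1 - x)."""
    return 2.0 * (1.0 - x)

n = 100_000                      # number of thin rectangles
dx = 1.0 / n
mids = [(i + 0.5) * dx for i in range(n)]   # midpoint of each rectangle

# Total area under the curve (should be 1 for a density curve).
total_area = sum(density(x) * dx for x in mids)

# Mean = balance point = sum of x * (area of the rectangle at x).
mean = sum(x * density(x) * dx for x in mids)

# Median = equal-areas point: accumulate area from the left until
# half of the total area has been reached.
area = 0.0
median = None
for x in mids:
    area += density(x) * dx
    if area >= 0.5:
        median = x
        break

print(round(total_area, 6), round(median, 4), round(mean, 4))
```

For this curve the median comes out near 0.293 and the mean near 0.333, so the mean sits to the right of the median, on the side of the long tail.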
Normal Distributions The mathematical, ideal versions of Normal distributions are perfectly symmetric, bell-shaped distributions with a single peak. In the ideal version, the peak corresponds to the mean, median, and mode of the distribution. There are infinitely many Normal distributions. • BE CAUTIOUS! Not all symmetric, bell-shaped distributions are Normal, so do not assume that every bell-shaped curve is Normally distributed. This is addressed in courses in Statistics. Keep in mind that: • No real-world data set matches these idealized curves exactly. • You cannot judge the Normality of data on the basis of visual examination alone; we will use the methods discussed here only when TOLD that the variable is Normally distributed.
Normal Distributions Normal curves are symmetric, single-peaked, and bell-shaped. Their tails fall off quickly, so that we do not expect outliers. Because Normal distributions are symmetric, the mean and median lie together at the peak in the center of the curve. Normal curves have the special property that giving the mean and the standard deviation completely specifies the curve. The mean fixes the center of the curve, and the standard deviation determines its shape. • Changing the mean of a Normal distribution does not change its shape, only its location on the axis. • Changing the standard deviation does change the shape of a Normal curve.
Characteristics of Normal Distributions The curve drops smoothly on both sides, flattening near but never touching the x-axis. The points of inflection (where the curve changes from concave down to concave up) occur on either side of the mean, median, and mode, at about 60% of the height of the highest point, and they enclose about 2/3 of the total area. The inflection points are located horizontally at 1 standard deviation, σ (sigma), on either side of the mean, μ (mu).
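The "about 60%" and "about 2/3" figures can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later) on the standard Normal curve, with μ = 0 and σ = 1; the helper `second_difference` is a name introduced here for illustration.

```python
from statistics import NormalDist

curve = NormalDist(mu=0, sigma=1)   # standard Normal curve

# Height of the curve one standard deviation from the mean,
# as a fraction of the peak height (mathematically exp(-1/2)).
height_ratio = curve.pdf(1) / curve.pdf(0)

# Area enclosed between the two inflection points at mu - sigma and mu + sigma.
area_within_1sd = curve.cdf(1) - curve.cdf(-1)

def second_difference(x, h=1e-3):
    """Numerical estimate of the curvature (second derivative) at x."""
    return (curve.pdf(x + h) - 2 * curve.pdf(x) + curve.pdf(x - h)) / h**2

# Curvature is negative (concave down) just inside 1 sigma
# and positive (concave up) just outside it.
inside = second_difference(0.9)
outside = second_difference(1.1)

print(round(height_ratio, 4), round(area_within_1sd, 4))
print(inside < 0, outside > 0)
```

The height ratio is about 0.61 and the enclosed area about 0.68, consistent with the rounded figures on the slide.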
There Are Infinitely Many Normal Distributions Each specific Normal curve is described completely by its mean, μ, and standard deviation, σ. μ gives the location along the x-axis; σ determines the shape. The total area under the curve is 1; probabilities and percentages are found by determining areas over intervals.
Normal Density Curves Here is a summary of basic facts about Normal curves. • Normal Density Curves • The Normal curves are symmetric, bell-shaped curves that have these properties: • A specific Normal curve is completely described by giving its mean and its standard deviation. • The mean determines the center of the distribution. It is located at the center of symmetry of the curve. • The standard deviation determines the shape of the curve. It is the distance from the mean to the change-of-curvature points on either side.
The 68-95-99.7 Rule There are many Normal curves, each described by its mean and standard deviation. All Normal curves share many properties. In particular, the standard deviation is the natural unit of measurement for Normal distributions. This fact is reflected in the following rule. • The 68-95-99.7 Rule • In any Normal distribution, approximately • 68% of the observations fall within one standard deviation of the mean. • 95% of the observations fall within two standard deviations of the mean. • 99.7% of the observations fall within three standard deviations of the mean.
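The three percentages in the rule are rounded areas under the Normal curve. As a quick check, this Python sketch computes them with `statistics.NormalDist` from the standard library; the function name `area_within` and the second set of parameters are illustrative choices, not part of the slides.

```python
from statistics import NormalDist

def area_within(k, mu=0.0, sigma=1.0):
    """Area under the Normal(mu, sigma) curve within k standard deviations of mu."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(mu + k * sigma) - dist.cdf(mu - k * sigma)

for k in (1, 2, 3):
    print(k, round(area_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973

# The areas do not depend on which Normal curve is used
# (arbitrary illustrative mean and standard deviation):
same = [round(area_within(k, mu=100.0, sigma=15.0), 4) for k in (1, 2, 3)]
print(same)
```

Because the answer is the same for every mean and standard deviation, the standard deviation acts as the natural unit of measurement for any Normal distribution.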
The 68-95-99.7 Rule • The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for 7th-grade students in Gary, Indiana, is Normal with mean 6.84 and standard deviation 1.55. • Sketch the Normal density curve for this distribution. • What percent of ITBS vocabulary scores are less than 3.74? The score 3.74 is two standard deviations below the mean (6.84 − 2 × 1.55 = 3.74). Given that the ITBS vocabulary scores for 7th graders in Gary, Indiana, are Normal with µ = 6.84 and σ = 1.55, we would expect about 2.5% of their ITBS vocabulary scores to be less than 3.74.
The 68-95-99.7 Rule • The distribution of Iowa Test of Basic Skills (ITBS) vocabulary scores for 7th-grade students in Gary, Indiana, is Normal with mean 6.84 and standard deviation 1.55. • What percent of the scores are between 5.29 and 9.94? The score 5.29 is one standard deviation below the mean and 9.94 is two standard deviations above it, so the area is the central 68% plus the 13.5% that lies between one and two standard deviations above the mean. Given that the ITBS vocabulary scores for 7th graders in Gary, Indiana, are Normal with µ = 6.84 and σ = 1.55, we would expect about 81.5% of their ITBS vocabulary scores to be between 5.29 and 9.94.
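The two ITBS questions can be cross-checked in Python. The sketch below uses `statistics.NormalDist` from the standard library with the mean and standard deviation given on the slide; the variable names are illustrative. The 68-95-99.7 rule treats the interval from 5.29 to 9.94 as the central 68% plus the 13.5% band between one and two standard deviations above the mean, which gives 81.5%. The computed areas differ slightly from the rule's answers because the rule uses rounded percentages.

```python
from statistics import NormalDist

itbs = NormalDist(mu=6.84, sigma=1.55)   # parameters from the slide

# How many standard deviations each cutoff lies from the mean.
z_374 = (3.74 - itbs.mean) / itbs.stdev   # about -2
z_529 = (5.29 - itbs.mean) / itbs.stdev   # about -1
z_994 = (9.94 - itbs.mean) / itbs.stdev   # about +2

# Area to the left of 3.74 (rule of thumb: about 2.5%).
pct_below_374 = itbs.cdf(3.74)

# Area between 5.29 and 9.94 (rule of thumb: 68% + 13.5% = 81.5%).
pct_between = itbs.cdf(9.94) - itbs.cdf(5.29)

print(round(z_374, 2), round(z_529, 2), round(z_994, 2))
print(round(pct_below_374, 3), round(pct_between, 3))
```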
Empirical Rule Usage is very limited!!! The empirical rule is a quick and easy way to approximate areas under a Normal curve, but it only works if we are interested in areas bounded at exactly 1, 2, or 3 standard deviations from the mean. To look at other areas (probabilities), we need a different method. The most typical way to do this involves using a standard Normal table. This is the method used in most MATH 220 sections, so we will cover that next. But for today, we will practice the 68-95-99.7 rule approach.
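For a cutoff that is not a whole number of standard deviations from the mean, software can play the role of the table. The sketch below is one possible approach using `statistics.NormalDist` from the Python standard library. The cutoff of 6.00 is a hypothetical value chosen for illustration, applied to a Normal distribution with mean 6.84 and standard deviation 1.55.

```python
from statistics import NormalDist

mu, sigma = 6.84, 1.55        # assumed Normal parameters for the example
cutoff = 6.00                 # hypothetical cutoff, not a multiple of sigma

# Step 1: standardize, i.e. express the cutoff in standard-deviation units.
z = (cutoff - mu) / sigma

# Step 2: look up the area to the left of z under the standard Normal curve.
# NormalDist() with no arguments is the standard Normal (mean 0, sd 1).
area_left_table_style = NormalDist().cdf(z)

# Equivalent shortcut: ask the Normal(mu, sigma) curve directly.
area_left_direct = NormalDist(mu, sigma).cdf(cutoff)

print(round(z, 2), round(area_left_table_style, 4))
```

Both routes give the same area; standardizing first mirrors what is done by hand with a standard Normal table.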
Your turn: 68 – 95 - 99.7 (Empirical) rule examples The distribution of heights of young men in the US is nearly Normal with a mean of 70 inches and a standard deviation of 2.5 inches. Use the 68-95-99.7 rule to answer the questions that follow. • Start by labeling the variable on the x-axis and then marking the mean and the values 1, 2, and 3 standard deviations on either side of it on a sketched Normal curve.
68 – 95 - 99.7 (Empirical) rule examples • About what percent of US young men are taller than 75 inches? Shade the relevant area on curve. Find the relevant area/probability. Write an appropriate contextual sentence that answers the question asked.
68 – 95 - 99.7 (Empirical) rule examples • About what percent of US young men are taller than 75 inches? Shade the relevant area on curve. Given that the heights of young men from the US are nearly Normal with µ = 70 inches and σ = 2.5 inches, we would expect about 2.5% of these young men to have heights above 75 inches.
68 – 95 - 99.7 (Empirical) rule examples • Between what values do the heights of the middle 95% of US young men fall? Shade the relevant area on curve. Find the relevant area/probability. Write an appropriate contextual sentence that answers the question asked.
68 – 95 - 99.7 (Empirical) rule examples • Between what values do the heights of the middle 95% of US young men fall? Since the heights of young men from the US are nearly Normal with µ = 70 inches and σ = 2.5 inches, we would expect the middle 95% of such heights to be between 65 inches and 75 inches.
68 – 95 - 99.7 (Empirical) rule examples • Approximately how short are the shortest 16% of US young men? Shade the relevant area on curve. Find the relevant area/probability. Write an appropriate contextual sentence that answers the question asked.
68 – 95 - 99.7 (Empirical) rule examples • Approximately how short are the shortest 16% of US young men? Since the heights of young men from the US are nearly Normal with µ = 70 inches and σ = 2.5 inches, we would expect that the shortest 16% of these men have heights less than 67.5 inches.
68 – 95 - 99.7 (Empirical) rule examples • Approximately what percent of US young men are taller than 67.5 inches? Shade the relevant area on curve. Find the relevant area/probability. Write an appropriate contextual sentence that answers the question asked.
68 – 95 - 99.7 (Empirical) rule examples • Approximately what percent of US young men are taller than 67.5 inches? Given that the heights of young men from the US are nearly Normal with µ = 70 inches and σ = 2.5 inches, we would expect that about 84% of these young men would have heights above 67.5 inches.
68 – 95 - 99.7 (Empirical) rule examples • What is the approximate probability that a randomly chosen young man from the US has a height between 67.5 and 75 inches? Shade the relevant area on curve. Find the relevant area/probability. Write an appropriate contextual sentence that answers the question asked.
68 – 95 - 99.7 (Empirical) rule examples • What is the approximate probability that a randomly chosen young man from the US has a height between 67.5 and 75 inches? Since the heights of young men from the US are nearly Normal with µ = 70 inches and σ = 2.5 inches, there is about a 0.815 probability that a young man randomly selected from this group would have a height between 67.5 inches and 75 inches.
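The five height answers can be compared with computed Normal areas. This is a minimal sketch using `statistics.NormalDist` from the Python standard library with µ = 70 and σ = 2.5 as given; the variable names are illustrative. The computed values are close to, but not identical to, the rule-of-thumb answers, since 68, 95, and 99.7 are rounded.

```python
from statistics import NormalDist

heights = NormalDist(mu=70, sigma=2.5)   # inches, parameters from the slides

# Taller than 75 inches (rule: about 2.5%).
taller_than_75 = 1 - heights.cdf(75)

# Middle 95% of heights (rule: 65 to 75 inches).
middle_95 = (heights.inv_cdf(0.025), heights.inv_cdf(0.975))

# Height below which the shortest 16% fall (rule: 67.5 inches).
shortest_16_cutoff = heights.inv_cdf(0.16)

# Taller than 67.5 inches (rule: about 84%).
taller_than_67_5 = 1 - heights.cdf(67.5)

# Between 67.5 and 75 inches (rule: about 0.815).
between = heights.cdf(75) - heights.cdf(67.5)

print(round(taller_than_75, 3))
print(tuple(round(v, 1) for v in middle_95))
print(round(shortest_16_cutoff, 1))
print(round(taller_than_67_5, 3))
print(round(between, 3))
```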