
STAT 110 - Section 5 Lecture 17

Professor Hao Wang, University of South Carolina, Spring 2012.


Presentation Transcript


  1. STAT 110 - Section 5 Lecture 17, Professor Hao Wang, University of South Carolina, Spring 2012

  2. Last time • Mean, median, quartiles • Five number summary

  3. USC’s Points Scored (average is 79.5). 70 70 75 77 78 79 79 82 82 83 86 93 How can we measure the spread?

  4. USC’s Points Scored (average is 79.5). 70 70 75 77 78 79 79 82 82 83 86 93 The deviations are the differences between the values and the mean, (value - mean): (70-79.5) = -9.5, (70-79.5) = -9.5, (75-79.5) = -4.5, (77-79.5) = -2.5, (78-79.5) = -1.5, (79-79.5) = -0.5, (79-79.5) = -0.5, (82-79.5) = 2.5, (82-79.5) = 2.5, (83-79.5) = 3.5, (86-79.5) = 6.5, (93-79.5) = 13.5. The average of these deviations is zero; this is true for any data set!
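Why the deviations always average to zero can be seen in one line of algebra. The notation here is ours, not the slides': write the observations as x_1, …, x_n and their mean as x̄.

```latex
\sum_{i=1}^{n} (x_i - \bar{x})
  = \sum_{i=1}^{n} x_i - n\bar{x}
  = n\bar{x} - n\bar{x}
  = 0,
\qquad \text{because } \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i .
```

For the USC data the negative deviations add to -28.5 and the positive ones to +28.5, so they cancel exactly.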

  5. USC’s Points Scored (average is 79.5). 70 70 75 77 78 79 79 82 82 83 86 93 The absolute deviations are the absolute values of the differences between the values and the mean: |70-79.5| = 9.5, |70-79.5| = 9.5, |75-79.5| = 4.5, |77-79.5| = 2.5, |78-79.5| = 1.5, |79-79.5| = 0.5, |79-79.5| = 0.5, |82-79.5| = 2.5, |82-79.5| = 2.5, |83-79.5| = 3.5, |86-79.5| = 6.5, |93-79.5| = 13.5. The average of these values is 57 / 12 = 4.75. That is the average distance of an observation from the mean.
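The absolute-deviation calculation on this slide can be checked with a short Python sketch. Python is our choice of tool (the slides do not use code), and `mean_absolute_deviation` is a hypothetical helper name.

```python
# USC's points scored, as listed on the slides
scores = [70, 70, 75, 77, 78, 79, 79, 82, 82, 83, 86, 93]

def mean_absolute_deviation(values):
    """Hypothetical helper: average distance of the observations from their mean."""
    m = sum(values) / len(values)
    return sum(abs(x - m) for x in values) / len(values)

mean = sum(scores) / len(scores)
print(mean)                               # 79.5
print(sum(x - mean for x in scores))      # 0.0: plain deviations cancel
print(mean_absolute_deviation(scores))    # 4.75
```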

  6. Unfortunately, much of the math behind the statistical methods we will use later in the semester is based on calculus… and “calculus hates absolute values”! So we need some other way of making values positive instead of just using absolute values…

  7. USC’s Points Scored (average is 79.5). 70 70 75 77 78 79 79 82 82 83 86 93 The squared deviations are: (70-79.5)² = 90.25, (70-79.5)² = 90.25, (75-79.5)² = 20.25, (77-79.5)² = 6.25, (78-79.5)² = 2.25, (79-79.5)² = 0.25, (79-79.5)² = 0.25, (82-79.5)² = 6.25, (82-79.5)² = 6.25, (83-79.5)² = 12.25, (86-79.5)² = 42.25, (93-79.5)² = 182.25. The average of these squared deviations (dividing their sum, 459, by n-1 = 11 instead of n = 12) is about 41.73. This is called the variance. That seems huge! And think about the units it would have (points squared).

  8. Standard Deviation Variance = 459 / 11 = 41.7272… We need to take the square root: s = √41.7272… ≈ 6.46 points. USC’s Points Scored (average is 79.5). 70 70 75 77 78 79 79 82 82 83 86 93
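The variance and square-root steps above can be sketched in Python by hand, dividing the sum of squared deviations by n - 1 exactly as the slides do. The helper name `sample_variance` is hypothetical.

```python
import math

# USC's points scored, as listed on the slides
scores = [70, 70, 75, 77, 78, 79, 79, 82, 82, 83, 86, 93]

def sample_variance(values):
    """Hypothetical helper: sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    m = sum(values) / n
    squared_deviations = sum((x - m) ** 2 for x in values)  # 459 for the USC data
    return squared_deviations / (n - 1)

var = sample_variance(scores)
s = math.sqrt(var)
print(f"variance = {var:.4f}, s = {s:.2f}")  # variance = 41.7273, s = 6.46
```

Taking the square root puts s back in the original units (points), which is why we report s rather than the variance.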

  9. Standard Deviation standard deviation (s) – approximately “the average distance of the observations from their mean”
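Written as a formula (our notation, with observations x_1, …, x_n and mean x̄), the calculation from the USC example is:

```latex
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2}
\qquad\text{e.g.}\qquad
s = \sqrt{\frac{459}{12-1}} = \sqrt{41.7272\ldots} \approx 6.46 .
```

Because the deviations are squared before averaging, s is only "approximately" the average distance from the mean; it weights large deviations more heavily than the mean absolute deviation (4.75 for the same data) does.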

  10. Standard Deviation • In practice, we use a calculator or computer software to calculate standard deviation! • In Excel the function is STDEV. • One online calculator found using Google is: http://www.calculatorpro.com/standard-deviation-calculator
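Python's standard-library `statistics` module is another option (our suggestion; the slides mention only Excel and an online calculator). `statistics.stdev` divides by n - 1, like Excel's STDEV; `statistics.pstdev` divides by n, so it gives a slightly smaller value.

```python
import statistics

# USC's points scored, as listed on the slides
scores = [70, 70, 75, 77, 78, 79, 79, 82, 82, 83, 86, 93]

print(round(statistics.stdev(scores), 2))   # 6.46, divides by n - 1 (matches the slides)
print(round(statistics.pstdev(scores), 2))  # 6.18, divides by n instead
```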

  11. Standard Deviation • s measures spread about the mean • Use s to describe the spread of a distribution only when you use the mean to describe the center. • s = 0 only when there is no spread. • No spread happens only when all observations have the same value. • As the observations become more spread out about their mean, s gets larger.
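A quick check of the last three bullets, using small made-up data sets (hypothetical values, not from the slides): identical observations give s = 0, and s grows as the values move farther from their mean of 80.

```python
import statistics

no_spread = [80, 80, 80, 80]      # all observations the same
some_spread = [78, 79, 81, 82]    # close to the mean of 80
more_spread = [70, 75, 85, 90]    # farther from the mean of 80

for data in (no_spread, some_spread, more_spread):
    print(data, round(statistics.stdev(data), 2))
```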

  12. Consider the data set 1 2 5 8 14 If we change observation 14 to 114… A – standard deviation will get larger B – the standard deviation will not change C – standard deviation will get smaller
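One way to check an answer to this question is to compute s before and after the change. A minimal Python sketch using the slide's data:

```python
import statistics

original = [1, 2, 5, 8, 14]
changed = [1, 2, 5, 8, 114]   # 14 replaced by 114

print(round(statistics.stdev(original), 2))  # 5.24  (mean 6)
print(round(statistics.stdev(changed), 2))   # 49.27 (mean 26)
```

The single far-away value inflates every squared deviation around the new mean, so s grows roughly tenfold here.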

  13. Consider the data sets… Set 1 1 2 5 8 14 Set 2 101 102 105 108 114 A – Set 1 has a larger standard deviation B – Both sets have the same standard deviation C – Set 2 has a larger standard deviation
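Set 2 is Set 1 with 100 added to every value. Shifting all observations moves the mean by the same amount but leaves every deviation (value - mean) unchanged, so s cannot change. A short Python check:

```python
import math
import statistics

set1 = [1, 2, 5, 8, 14]
set2 = [x + 100 for x in set1]   # 101, 102, 105, 108, 114

print(statistics.mean(set1), statistics.mean(set2))   # 6 106
print(round(statistics.stdev(set1), 2), round(statistics.stdev(set2), 2))
```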

  14. Choosing Numerical Descriptions • mean vs. median • - The mean is strongly influenced by a few extreme observations. • - The median is not.
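Reusing the data set from slide 12 shows the difference: changing 14 to 114 drags the mean from 6 to 26, while the median stays at 5.

```python
import statistics

original = [1, 2, 5, 8, 14]
changed = [1, 2, 5, 8, 114]   # one extreme observation

for data in (original, changed):
    print(data, "mean:", statistics.mean(data), "median:", statistics.median(data))
```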

  15. If the distribution is - symmetric: mean = median - skewed right: mean > median - skewed left: mean < median

  16. Choosing Numerical Descriptions • standard deviation vs. quartiles • - The standard deviation is made larger by outliers. • - The quartiles are much less sensitive to a few extreme observations. • If the distribution is • - roughly symmetric: mean and standard deviation • - skewed: five-number summary • - has outliers: five-number summary
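To see the contrast with the USC scores, the sketch below replaces 93 with a hypothetical outlier of 193 and compares the five-number summary with s. The helper `five_number_summary` is hypothetical and uses the median-of-halves rule for quartiles; calculators and software such as Excel may use slightly different quartile rules.

```python
import statistics

def five_number_summary(values):
    """Hypothetical helper: (min, Q1, median, Q3, max).

    Q1 and Q3 are the medians of the lower and upper halves of the sorted
    data; when n is odd, the middle observation is left out of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

scores = [70, 70, 75, 77, 78, 79, 79, 82, 82, 83, 86, 93]
with_outlier = scores[:-1] + [193]   # hypothetical: 93 changed to 193

for label, data in (("original", scores), ("with outlier", with_outlier)):
    print(label, five_number_summary(data), "s =", round(statistics.stdev(data), 2))
```

Q1, the median, and Q3 stay at 76.0, 79.0, and 82.5; only the maximum moves. The standard deviation, by contrast, jumps from about 6.46 to over 30.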

  17. Best Described by A – Mean and SD B – Median and SD C – Mean and five no. D – Median and five no.

  18. Important to Remember • Numerical summaries do not disclose the presence of multiple peaks or gaps. • A picture will help you detect skewness and outliers. • Always start with a graph of your data!

  19. Chapter 13 – Normal Distributions • Always plot your data: make a graph, usually a histogram or stemplot. • Look for the overall pattern (shape, center, spread) and for striking deviations such as outliers. • Choose either the five-number summary or the mean and standard deviation to briefly describe center and spread in numbers. • Sometimes the overall pattern of a large number of observations is so regular that we can describe it by a smooth curve.

  20. Density Curves

  21. Density Curves density curve – a curve that is never negative and has an area of exactly one underneath it. Choose the density curve so that the proportion of observations in any interval is represented by the area under the curve above that interval.
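Areas under a density curve can be approximated numerically. The sketch below assumes a hypothetical density f(x) = 2x on [0, 1] (not one of the curves from the slides) and adds up thin rectangles with the midpoint rule: the total area comes out to 1, and the area between 0 and 0.5 gives the proportion of observations in that interval.

```python
def density(x):
    """Hypothetical density curve: f(x) = 2x on [0, 1], zero elsewhere."""
    return 2 * x if 0 <= x <= 1 else 0.0

def area(f, a, b, steps=10_000):
    """Approximate the area under f between a and b with the midpoint rule."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

print(round(area(density, 0, 1), 4))    # 1.0  (total area under a density curve)
print(round(area(density, 0, 0.5), 4))  # 0.25 (proportion between 0 and 0.5)
```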

  22. Consider the density curve shown at the right. The total area under the curve is: A – 0.5 B – 1.0 C – 1.5 D – 2.0

  23. The shaded area (the proportion of the density between 0 and 0.5) is approximately: A – 0.05 B – 0.25 C – 0.5 D – 0.75 E – 0.90

  24. Center of a Density Curve

  25. Center of a Density Curve • median: the equal-areas point • mean: the balance point • on a symmetric curve: • mean = median • both lie at the center of the curve • on a skewed curve: • the mean is pulled away from the median in the direction of the long tail
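A worked example of the skewed case, using a right-skewed density that does not appear on the slides: f(x) = e^{-x} for x ≥ 0. The median is the equal-areas point and the mean is the balance point:

```latex
\int_0^{m} e^{-x}\,dx = 1 - e^{-m} = \tfrac{1}{2}
  \;\Longrightarrow\; m = \ln 2 \approx 0.693,
\qquad
\mu = \int_0^{\infty} x\, e^{-x}\,dx = 1 .
```

The mean (1) lies to the right of the median (about 0.693), pulled toward the long right tail.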

  26. A – Mean=Red, Median=Blue B – Mean=Blue, Median=Red

  27. Normal Density Curves

  28. Normal Density Curves normal curves – symmetric, bell-shaped curves with these properties: 1. It’s completely described by giving the mean and standard deviation. 2. The mean determines the center of the distribution. 3. The standard deviation determines the location of the “inflection points”
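For reference, the formula behind these properties (not shown on the slides) is the normal density with mean μ and standard deviation σ. Differentiating twice shows where the inflection points sit:

```latex
f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
\qquad
f'(x) = -\frac{x-\mu}{\sigma^2}\, f(x),
\qquad
f''(x) = \left(\frac{(x-\mu)^2}{\sigma^4} - \frac{1}{\sigma^2}\right) f(x).
```

Since f(x) > 0, f''(x) = 0 exactly when (x - μ)² = σ², i.e. at x = μ - σ and x = μ + σ: the curve changes from curving down to curving up one standard deviation on either side of the mean.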

  29. Normal Density Curves

  30. Pictured at the right are two different normal distributions. What differs between the two distributions? • Mean • Standard deviation • Both

  31. What differs between the two normal distributions to the right? • Mean • Standard deviation • Both

  32. The 68-95-99.7 Rule 68% of the data falls within 1 std deviation of the mean 95% of the data falls within 2 std deviations of the mean 99.7% of the data falls within 3 std deviations of the mean
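The three percentages are rounded areas under the normal curve. A sketch using Python's `statistics.NormalDist` (our choice of tool, available in Python 3.8+) computes the exact areas for the standard normal curve:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

def within(k):
    """Proportion of a normal distribution within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sd: {within(k):.1%}")   # 68.3%, 95.4%, 99.7%
```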

  33. The 68-95-99.7 Rule

  34. The starting salaries in a field are approximately normally distributed with a mean of $40,000 and a standard deviation of $5,000. • What can we say about the percent of people who make between $30,000 and $50,000? • A) Could be any percent • B) Is approximately 68% • C) Must be at least 75% • D) Must be at least 88.9% • E) Is approximately 95%
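A sketch of the reasoning: $30,000 and $50,000 are each $10,000, that is 2 standard deviations, from the $40,000 mean, so the 68-95-99.7 rule points to about 95%. The Python check below uses `statistics.NormalDist` (our choice of tool).

```python
from statistics import NormalDist

mean, sd = 40_000, 5_000
salaries = NormalDist(mu=mean, sigma=sd)
low, high = 30_000, 50_000

z_low = (low - mean) / sd     # -2.0
z_high = (high - mean) / sd   #  2.0
share = salaries.cdf(high) - salaries.cdf(low)
print(z_low, z_high, f"{share:.1%}")   # -2.0 2.0 95.4%
```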

  35. Apply the 68-95-99.7 Rule • The Health and Nutrition Examination Study of 1976-1980 (HANES) studied the heights of adults (aged 18-24) and found that the heights follow a normal distribution with the following parameters: • Women: mean (μ) = 65.0 inches, standard deviation (σ) = 2.5 inches • Men: mean (μ) = 70.0 inches, standard deviation (σ) = 2.8 inches
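Applying the rule means marking off 1, 2, and 3 standard deviations on each side of the mean. A small Python sketch with the HANES parameters from this slide (`rule_intervals` is a hypothetical helper name):

```python
def rule_intervals(mu, sigma):
    """Hypothetical helper: (percent, low, high) for the 68-95-99.7 rule."""
    return [(pct, mu - k * sigma, mu + k * sigma)
            for k, pct in ((1, 68), (2, 95), (3, 99.7))]

heights = {"women": (65.0, 2.5), "men": (70.0, 2.8)}   # (mean, sd) in inches

for group, (mu, sigma) in heights.items():
    for pct, low, high in rule_intervals(mu, sigma):
        print(f"{group}: about {pct}% between {low:.1f} and {high:.1f} inches")
```

For women this gives 62.5 to 67.5, 60.0 to 70.0, and 57.5 to 72.5 inches; for men, 67.2 to 72.8, 64.4 to 75.6, and 61.6 to 78.4 inches.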

  36. Observations expressed in terms of standard deviations above or below the mean are called Standard Scores. • The standard score is the number of standard deviations above or below the mean at which an observation is located. • If the observation is below the mean, the standard score will be negative. • If the observation is above the mean, the standard score will be positive.
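As a formula (our notation), the standard score of an observation x from a distribution with mean μ and standard deviation σ is shown below, illustrated with the HANES numbers for a woman who is 70 inches tall:

```latex
z = \frac{x - \mu}{\sigma},
\qquad
z = \frac{70 - 65.0}{2.5} = 2.0 .
```

She is 2 standard deviations above the mean for women; a man of the same height (μ = 70.0) would have z = 0.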

  37. Use standard score • Jennie scored 600 on the verbal part of the SAT. Her friend Gerald took the ACT and scored a 21 on the verbal part. SAT scores are normally distributed with mean 500 and standard deviation 100. ACT scores are normally distributed with mean 18 and standard deviation 6. Assuming that both tests measure the same kind of ability, who has the higher score?
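Converting both scores to standard scores puts them on the same scale. A minimal Python sketch (`standard_score` is a hypothetical helper name):

```python
def standard_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

jennie = standard_score(600, mean=500, sd=100)   # SAT verbal
gerald = standard_score(21, mean=18, sd=6)       # ACT verbal

print(jennie, gerald)   # 1.0 0.5
print("Jennie" if jennie > gerald else "Gerald", "has the higher standard score")
```

Jennie is 1 standard deviation above the SAT mean, while Gerald is only half a standard deviation above the ACT mean.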
