
Chapter 1 Section 1.2


brody


  1. Chapter 1, Section 1.2: Describing Distributions with Numbers

  2. Parameter - • Fixed value about a population • Typically unknown

  3. Statistic - • Value calculated from a sample

  4. Measures of Central Tendency • Mean - the arithmetic average • Use μ to represent a population mean (a parameter) • Use x̄ to represent a sample mean (a statistic) • Formula: $\bar{x} = \frac{\sum x}{n}$. This is on the formula sheet, so you do not have to memorize it. • Σ is the capital Greek letter sigma; it means to sum the values that follow

  5. Measures of Central Tendency • Median - the middle of the data; 50th percentile • Observations must be in numerical order • Is the single middle value if n is odd • Is the average of the middle two values if n is even • NOTE: n denotes the sample size

  6. Measures of Central Tendency • Mode – the observation that occurs the most often • Can be more than one mode • If all values occur only once – there is no mode • Not used as often as mean & median
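
The mode rules above can be sketched in a few lines of Python. This is only an illustration: the function name `modes` and the sample values are assumptions made for the example, not part of the slides.

```python
from collections import Counter

def modes(values):
    """Return every value that occurs most often (empty list if there is no mode)."""
    counts = Counter(values)
    if not counts:
        return []
    top = max(counts.values())
    if top == 1:
        return []  # all values occur only once: no mode
    return sorted(v for v, c in counts.items() if c == top)

print(modes([2, 3, 3, 4, 8]))     # [3]
print(modes([2, 3, 3, 4, 4, 8]))  # [3, 4]  (more than one mode)
print(modes([2, 3, 4, 8, 12]))    # []      (no mode)
```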

  7. Measures of Central Tendency • Range - the difference between the largest and smallest observations. • This is only one number! If the data run from 3 to 8, the range is not "3-8" but 5

  8. Suppose we are interested in the number of lollipops that are bought at a certain store. A sample of 5 customers buys the following numbers of lollipops: 2 3 4 8 12. Find the median. The numbers are in order & n is odd, so find the middle observation. The median is 4 lollipops!

  9. Suppose we have a sample of 6 customers that buy the following numbers of lollipops: 2 3 4 6 8 12. The numbers are in order & n is even, so find the middle two observations (4 and 6). Now, average these two values. The median is 5 lollipops!
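
The odd and even cases of the median can be sketched together in Python. The helper name `median_of` is an assumption for illustration; the data are the two lollipop samples from the slides.

```python
def median_of(values):
    """Median by the rule on the slides: middle value, or mean of the middle two."""
    ordered = sorted(values)          # observations must be in numerical order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # n odd: single middle value
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2  # n even: average the middle two

print(median_of([2, 3, 4, 8, 12]))     # 4
print(median_of([2, 3, 4, 6, 8, 12]))  # 5.0
```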

  10. Suppose we have a sample of 6 customers that buy the following numbers of lollipops: 2 3 4 6 8 12. Find the mean. To find the mean number of lollipops, add the observations and divide by n.
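
As a quick check of "add the observations and divide by n", here is a small Python sketch using the six lollipop counts from this slide. The function name `sample_mean` is an assumption for illustration.

```python
def sample_mean(values):
    """Sample mean: sum of the observations divided by the sample size n."""
    return sum(values) / len(values)

lollipops = [2, 3, 4, 6, 8, 12]
print(sum(lollipops), len(lollipops))     # 35 6
print(round(sample_mean(lollipops), 2))   # 5.83
```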

  11. What would happen to the median & mean if the 12 lollipops were 20? Data: 2 3 4 6 8 20. The median is 5. The mean is 7.17. What happened?

  12. What would happen to the median & mean if the 20 lollipops were 50? Data: 2 3 4 6 8 50. The median is 5. The mean is 12.17. What happened?

  13. Resistant - • Statistics that are not affected by outliers • Is the median resistant? YES • Is the mean resistant? NO
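
The lollipop experiment can be replayed in Python to see resistance directly: the largest value changes from 12 to 20 to 50 while the other five observations stay fixed. This is a sketch using the standard `statistics` module; the helper name `center_summary` is an assumption.

```python
from statistics import mean, median

def center_summary(data):
    """Return (median, mean rounded to 2 decimals) for one data set."""
    return median(data), round(mean(data), 2)

base = [2, 3, 4, 6, 8]
for largest in (12, 20, 50):
    med, avg = center_summary(base + [largest])
    print(largest, med, avg)
# 12 5.0 5.83
# 20 5.0 7.17
# 50 5.0 12.17
```

The median stays at 5 in every case, while the mean is dragged upward by the single large value.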

  14. Look at the following data set: 22 23 24 25 25 26 29 30. Find the mean. Now find how each observation deviates from the mean; each difference is the deviation from the mean. What is the sum of the deviations from the mean? 0. Will this sum always equal zero? YES
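
A short Python sketch of this slide's calculation, using its eight observations. The variable names are illustrative assumptions.

```python
data = [22, 23, 24, 25, 25, 26, 29, 30]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]   # observation minus mean

print(mean)             # 25.5
print(deviations)       # [-3.5, -2.5, -1.5, -0.5, -0.5, 0.5, 3.5, 4.5]
print(sum(deviations))  # 0.0
```

The sum is zero for any data set because the mean is the balance point of the observations. With other data, floating-point rounding may leave a tiny nonzero remainder instead of an exact 0.0.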

  15. Look at the following data set: 21 23 23 24 25 25 26 26 26 27 27 27 27 28 30 30 30 31 32 32. Create a histogram with the data (use an x-scale of 2). Then find the mean and median. Mean = 27, Median = 27. Look at the placement of the mean and median in this symmetrical distribution.

  16. Look at the following data set: 22 29 28 22 24 25 28 21 25 23 24 23 26 36 38 62 23. Create a histogram with the data (use an x-scale of 8). Then find the mean and median. Mean = 28.176, Median = 25. Look at the placement of the mean and median in this right-skewed distribution.

  17. Look at the following data set: 21 46 54 47 53 60 55 55 60 56 58 58 58 58 62 63 64. Create a histogram with the data. Then find the mean and median. Mean = 54.588, Median = 58. Look at the placement of the mean and median in this left-skewed distribution.

  18. Go to java view

  19. Recap: • In a symmetrical distribution, the mean and median are equal. • In a skewed distribution, the mean is pulled in the direction of the skewness. • In a symmetrical distribution, you should report the mean! • In a skewed distribution, the median should be reported as the measure of center!
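
The recap can be checked against the three data sets from the preceding slides. The sketch below uses Python's standard `statistics` module; the helper `center` and the variable names are assumptions for illustration.

```python
from statistics import mean, median

symmetric = [21, 23, 23, 24, 25, 25, 26, 26, 26, 27,
             27, 27, 27, 28, 30, 30, 30, 31, 32, 32]
skewed_right = [22, 29, 28, 22, 24, 25, 28, 21, 25,
                23, 24, 23, 26, 36, 38, 62, 23]
skewed_left = [21, 46, 54, 47, 53, 60, 55, 55, 60,
               56, 58, 58, 58, 58, 62, 63, 64]

def center(data):
    """Return (mean rounded to 3 decimals, median)."""
    return round(mean(data), 3), median(data)

for name, data in [("symmetric", symmetric),
                   ("skewed right", skewed_right),
                   ("skewed left", skewed_left)]:
    avg, med = center(data)
    # The mean sits above the median when the long tail is on the right,
    # and below the median when the long tail is on the left.
    print(name, "mean:", avg, "median:", med)
```

The values match the slides: mean 27 and median 27 for the symmetric set, mean 28.176 above median 25 for the right-skewed set, and mean 54.588 below median 58 for the left-skewed set.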

  20. Quartiles • Arrange the observations in increasing order and locate the median M in the ordered list of observations. • The first quartile Q1 is the median of the 1st half of the observations. • The third quartile Q3 is the median of the 2nd half of the observations.

  21. 16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73 • Q1 = 25 • median = 34 • Q3 = 41

  22. What if there is an odd number of observations? 16 19 24 25 25 33 33 34 34 • The median is the middle value, 25. • When dividing the data in half, leave out the middle number.
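
The quartile rule from these slides (median of each half, leaving out the middle value when n is odd) can be sketched as follows. The function name `quartiles` is an assumption, and other software may use a different quartile convention and give slightly different answers.

```python
from statistics import median

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule from the slides."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # For odd n the middle observation belongs to neither half.
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

even_n = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
odd_n = [16, 19, 24, 25, 25, 33, 33, 34, 34]

print(quartiles(even_n))  # (25.0, 34.0, 41.0)
print(quartiles(odd_n))   # (21.5, 25, 33.5)
```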

  23. The interquartile range (IQR) • The distance between the first and third quartiles. • IQR = Q3 – Q1 • Always positive

  24. Outlier: • We call an observation an outlier if it falls more than 1.5 x IQR above the third quartile or below the first quartile. • Let's look back at the same data: 16 19 24 25 25 33 33 34 34 37 37 40 42 46 49 73 • Q1 = 25, Q3 = 41, IQR = 41 - 25 = 16 • Lower cutoff: 25 - 1.5 x 16 = 1 • Upper cutoff: 41 + 1.5 x 16 = 65

  25. Since 73 is above the upper cutoff, we will call it an outlier.
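
The 1.5 x IQR rule applied to this data set, as a small Python sketch. The function name `outlier_cutoffs` is an assumption; Q1 = 25 and Q3 = 41 are the quartiles given on the slides.

```python
def outlier_cutoffs(q1, q3):
    """Return (lower cutoff, upper cutoff) from the 1.5 x IQR rule."""
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr

data = [16, 19, 24, 25, 25, 33, 33, 34, 34, 37, 37, 40, 42, 46, 49, 73]
lower, upper = outlier_cutoffs(25, 41)
outliers = [x for x in data if x < lower or x > upper]

print(lower, upper)  # 1.0 65.0
print(outliers)      # [73]
```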

  26. Five-number summary • Minimum • Q1 • Median • Q3 • Maximum

  27. If you plot these five numbers on a graph, you have a boxplot.

  28. Advantages of boxplots • Ease of construction • Convenient handling of outliers • Construction is not subjective (unlike histograms) • Used with medium or large size data sets (n > 10) • Useful for comparative displays

  29. Disadvantages of boxplots • Do not retain the individual observations • Should not be used with small data sets (n < 10)

  30. How to construct a boxplot • Find the five-number summary: Min, Q1, Med, Q3, Max • Draw a box from Q1 to Q3 • Draw the median as a line inside the box • Extend whiskers to the min & max

  31. Modified boxplots • Display outliers • Fences mark off the outliers • Whiskers extend to the largest (smallest) data value inside the fence • ALWAYS use modified boxplots in this class!!!

  32. Modified Boxplot • The interquartile range (IQR) is the range (length) of the box: Q3 - Q1 • The fences are at Q1 - 1.5 IQR and Q3 + 1.5 IQR; they should not be seen on the plot. • Any observation outside the fences is an outlier! Put a dot for each outlier.

  33. Modified Boxplot • Draw each "whisker" from the quartile to the most extreme observation that is still within the fence!

  34. A report from the U.S. Department of Justice gave the following percent increase in federal prison populations in 20 northeastern & mid-western states in 1999. 5.9 1.3 5.0 5.9 4.5 5.6 4.1 6.3 4.8 6.9 4.5 3.5 7.2 6.4 5.5 5.3 8.0 4.4 7.2 3.2 Create a modified boxplot. Describe the distribution. Use the calculator to create a modified boxplot.
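
The slide asks for a calculator, but the numbers behind a modified boxplot can also be sketched in Python. The function name `modified_boxplot_stats` and the dictionary keys are assumptions for illustration; quartiles follow the median-of-halves rule used in these slides, so other software may report slightly different values.

```python
from statistics import median

def modified_boxplot_stats(values):
    """Quartiles, whisker ends, and outliers for a modified boxplot."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[half + 1:] if n % 2 else ordered[half:])
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "whiskers": (inside[0], inside[-1]),  # most extreme values inside the fences
        "outliers": [x for x in ordered if x < low_fence or x > high_fence],
    }

prison = [5.9, 1.3, 5.0, 5.9, 4.5, 5.6, 4.1, 6.3, 4.8, 6.9,
          4.5, 3.5, 7.2, 6.4, 5.5, 5.3, 8.0, 4.4, 7.2, 3.2]
stats = modified_boxplot_stats(prison)
print(stats["whiskers"])  # (3.2, 8.0)
print(stats["outliers"])  # [1.3]
```

Under this rule Q1 is about 4.45, the median about 5.4, and Q3 about 6.35, so the fences fall near 1.6 and 9.2 and only the 1.3 observation is flagged as an outlier.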

  35. Evidence suggests that a high indoor radon concentration might be linked to the development of childhood cancers. The data that follows is the radon concentration in two different samples of houses. The first sample consisted of houses in which a child was diagnosed with cancer. Houses in the second sample had no recorded cases of childhood cancer. (see data on note page) Create parallel boxplots. Compare the distributions.

  36. [Figure: parallel boxplots of radon concentration for the Cancer and No Cancer groups; axis marked at 100 and 200] The median radon concentration for the no cancer group is lower than the median for the cancer group. The range of the cancer group is larger than the range for the no cancer group. Both distributions are skewed right. The cancer group has outliers at 39, 45, 57, and 210. The no cancer group has outliers at 55 and 85.

  37. Assignment 1.2

  38. Variability

  39. Why is the study of variability important? • Allows us to distinguish between usual & unusual values • In some situations, we want more or less variability, for example: scores on standardized tests, time bombs, medicine

  40. Measures of Variability • range (max - min) • interquartile range (Q3 - Q1) • deviations • variance • standard deviation (σ, the lower case Greek letter sigma)

  41. Suppose that we have these data values: 24 34 26 30 37 16 28 21 35 29 Find the mean. Find the deviations. What is the sum of the deviations from the mean?

  42. 24 34 26 30 37 16 28 21 35 29 Square the deviations: Find the average of the squared deviations:
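
The steps on this slide and the previous one can be followed in Python with the same ten values. The variable names are illustrative assumptions.

```python
data = [24, 34, 26, 30, 37, 16, 28, 21, 35, 29]

mean = sum(data) / len(data)
deviations = [x - mean for x in data]
squared = [d ** 2 for d in deviations]

print(mean)                      # 28.0
print(sum(deviations))           # 0.0
print(sum(squared))              # 384.0
print(sum(squared) / len(data))  # 38.4  (average of the squared deviations)
```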

  43. The average of the squared deviations is called the variance. • Population variance (a parameter): σ² • Sample variance (a statistic): s²

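  44. Calculation of variance of a sample: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$, where x̄ is the sample mean and the denominator n - 1 is the degrees of freedom (df)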

  45. A standard deviation is a measure of the average deviation from the mean.

  46. Calculation of standard deviation: $s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$, the square root of the sample variance
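
Python's standard `statistics` module provides both versions of the variance, so the two denominators can be compared directly. The sketch below reuses the ten data values from the earlier variability slides.

```python
from statistics import pvariance, variance, stdev

data = [24, 34, 26, 30, 37, 16, 28, 21, 35, 29]

pop_var = pvariance(data)   # divides the sum of squared deviations by n
samp_var = variance(data)   # divides by n - 1 (the degrees of freedom)
samp_sd = stdev(data)       # square root of the sample variance

print(round(pop_var, 3))    # 38.4
print(round(samp_var, 3))   # 42.667
print(round(samp_sd, 3))    # 6.532
```

The sample variance is larger than the plain average of the squared deviations because 384 is divided by 9 rather than 10.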

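  47. Degrees of Freedom (df) • n deviations contain (n - 1) independent pieces of information about variability, because the deviations must sum to zero: once n - 1 of them are known, the last one is determined.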

  48. Which measure(s) of variability is/are resistant? IQR

  49. Activity (worksheet)

  50. Linear transformation rule • When multiplying a random variable by a constant or adding a constant to it, the mean and median change by both operations. • When multiplying a random variable by a constant or adding a constant to it, the standard deviation changes only by the multiplication. • Formulas: $\mu_{a+bX} = a + b\mu_X$ and $\sigma_{a+bX} = |b|\sigma_X$
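
The rule can be checked numerically. In the sketch below the constants a = 10 and b = 2 are hypothetical values chosen for illustration, the helper name `linear_transform` is an assumption, and the data are the ten values from the earlier variability slides.

```python
from math import isclose
from statistics import mean, median, pstdev

def linear_transform(data, a, b):
    """Apply y = a + b * x to every observation."""
    return [a + b * x for x in data]

data = [24, 34, 26, 30, 37, 16, 28, 21, 35, 29]
a, b = 10, 2  # hypothetical constants
new = linear_transform(data, a, b)

# Mean and median change by both the multiplication and the addition.
print(isclose(mean(new), a + b * mean(data)))      # True
print(isclose(median(new), a + b * median(data)))  # True
# Standard deviation changes only by the multiplication (by |b|).
print(isclose(pstdev(new), abs(b) * pstdev(data)))  # True
```

Here the mean moves from 28 to 66 and the median from 28.5 to 67, while the standard deviation simply doubles; adding 10 has no effect on the spread.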
