
SLIDES PREPARED By Lloyd R. Jaisingh Ph.D. Morehead State University Morehead KY

4-2. Chapter 4. Data Description


Presentation Transcript


    1. 4-1 SLIDES PREPARED By Lloyd R. Jaisingh Ph.D. Morehead State University Morehead KY

    2. 4-2 Chapter 4 Data Description – Numerical Measures of Position for Ungrouped Univariate Data

    3. 4-3 Outline Do I Need to Read This Chapter? 4-1 The z-Score or Standard Score 4-2 Percentiles It’s a Wrap

    4. 4-4 Objectives Introduction of some basic statistical measurements of position. Introduction of some graphical displays to explain these measures of position.

    5. 4-5 Introduction A measure of location or position for a collection of data values is a number that is meant to convey the idea of the relative position of a data value in the data set. The most commonly used measures of location for sample data are the z-score and percentiles.

    6. 4-6 4-1 The z-Score Explanation of the term – z-score: The z-score for a sample value in a data set is obtained by subtracting the mean of the data set from the value and dividing the result by the standard deviation of the data set. NOTE: When computing the value of the z-score, the data values can be population values or sample values. Hence we can compute either a population z-score or a sample z-score.

    7. 4-7 4-1 The z-Score The Sample z-score for a value x is given by the following formula: $z = \dfrac{x - \bar{x}}{s}$, where $\bar{x}$ is the sample mean and s is the sample standard deviation.

    8. 4-8 4-1 The z-Score The Population z-score for a value x is given by the following formula: $z = \dfrac{x - \mu}{\sigma}$, where $\mu$ is the population mean and $\sigma$ is the population standard deviation.

    9. 4-9 Quick Tip: The z-score is the number of standard deviations the data value falls above (positive z-score) or below (negative z-score) the mean for the data set.

    10. 4-10 Quick Tip: The z-score is affected by an outlying value in the data set, since the outlier (very small or very large value relative to the size of the other values in the data set) directly affects the value of the mean and the standard deviation.
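
The sensitivity described in this tip can be illustrated with a short sketch. The data values below are hypothetical (they do not come from the slides), and `sample_z_score` is a helper name introduced only for this illustration.

```python
from statistics import mean, stdev


def sample_z_score(x, data):
    """Sample z-score: (x - sample mean) / sample standard deviation."""
    return (x - mean(data)) / stdev(data)


# Hypothetical data, chosen only to illustrate the tip.
values = [10, 12, 11, 13, 12]
with_outlier = values + [60]  # one very large value added

z_before = sample_z_score(13, values)
z_after = sample_z_score(13, with_outlier)

print(f"z-score of 13 without the outlier: {z_before:.2f}")
print(f"z-score of 13 with the outlier:    {z_after:.2f}")
```

In this made-up sample, the single large value pulls the mean above 13 and inflates the standard deviation, so the z-score of 13 changes from positive to negative.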

    11. 4-11 The z-Score -- Example Example: What is the z-score for the value of 14 in the following sample values? 3 8 6 14 4 12 7 10

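    12. 4-12 The z-score -- Example (Continued) Solution: The sample mean is $\bar{x} = 64/8 = 8$ and the sample standard deviation is $s = \sqrt{102/7} \approx 3.8173$, so $z = (14 - 8)/3.8173 \approx 1.57$. Thus, the data value of 14 is 1.57 standard deviations above the mean of 8, since the z-score is positive.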
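
The arithmetic in this example can be checked with a minimal sketch using Python's standard `statistics` module. The variable names are illustrative, and `stdev` uses the n - 1 (sample) denominator.

```python
from statistics import mean, stdev

# Sample values from the example slide.
sample = [3, 8, 6, 14, 4, 12, 7, 10]
x = 14

x_bar = mean(sample)   # sample mean
s = stdev(sample)      # sample standard deviation (n - 1 denominator)
z = (x - x_bar) / s    # sample z-score

print(f"mean = {x_bar}, s = {s:.4f}, z = {z:.2f}")
```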

    13. 4-13 The z-Score – Why do we use the z-score as a measure of relative position?

    14. 4-14 The z-score Observe that the distance between the mean of 8 and the value of 14 is 1.57 × s = 5.99 ≈ 6. Observe that if we add the mean of 8 to this value of 6, we will get 8 + 6 = 14, the data value. Thus, this shows that the value of 14 is 1.57 standard deviations above the mean value of 8.

    15. 4-15 The z-score

    16. 4-16 The z-Score -- Example Example: What is the z-score for the value of 95 in the following sample values? 96 114 100 97 101 102 99 95 90

    17. 4-17 The z-Score -- Example (Continued) Solution: First compute the sample mean and sample standard deviation. These values are, respectively, 99.3333 and 6.5955. Verify. Thus, z-score = (95 – 99.3333)/6.5955 = -0.6570 ≈ -0.66. Thus, the data value of 95 is located 0.66 standard deviations below the mean value of 99.3333, since the z-score is negative.
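
One way to carry out the "Verify" step is sketched below. It also computes, purely for contrast, the z-score obtained if the same nine values were treated as a whole population (the distinction noted on the earlier z-score slide). The population figure is not part of the slide's solution.

```python
from statistics import mean, stdev, pstdev

# Sample values from the example slide.
sample = [96, 114, 100, 97, 101, 102, 99, 95, 90]
x = 95

x_bar = mean(sample)
s = stdev(sample)        # sample standard deviation (n - 1 denominator)
sigma = pstdev(sample)   # population standard deviation (n denominator)

z_sample = (x - x_bar) / s
z_population = (x - x_bar) / sigma  # only if the data were a full population

print(f"mean = {x_bar:.4f}, s = {s:.4f}")
print(f"sample z-score     = {z_sample:.4f}")
print(f"population z-score = {z_population:.4f}")
```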

    18. 4-18 4-2 Percentiles Explanation of the term – percentiles: Percentiles are numerical values that divide an ordered data set into 100 groups of values with at most 1% of the data values in each group. When we discuss percentiles, we generally present the discussion through the kth percentile. Let the kth percentile be denoted by Pk.

    19. 4-19 4-2 Percentiles Explanation of the term – kth percentile: the kth percentile for an ordered array of numerical data is a numerical value Pk (say) such that at most k% of the data values are smaller than Pk, and at most (100 – k)% of the data values are larger than Pk. The idea of the kth percentile is illustrated on the next slide.

    20. 4-20 The kth Percentile

    21. 4-21 Quick Tip: In order for a percentile to be determined, the data set first must be ordered from the smallest to the largest value. There are 99 percentiles in a data set.

    22. 4-22 Display of the 99th Percentile

    23. 4-23 Percentile Corresponding to a Given Data Value The percentile corresponding to a given data value, say x, in a set is obtained by using the following formula.

    24. 4-24 Example: The shoe sizes, in whole numbers, for a sample of 12 male students in a statistics class were as follows: 13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, and 9. What is the percentile rank for a shoe size of 12? Percentile Corresponding to a Given Data Value

    25. 4-25 Solution: First, we need to arrange the values from smallest to largest. The ordered array is given below: 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13. Observe that the number of values below the value of 12 is 9. Percentile Corresponding to a Given Data Value

    26. 4-26 Solution (continued): The total number of values in the data set is 12. Thus, using the formula, the corresponding percentile is: Percentile Corresponding to a Given Data Value

    27. 4-27 Example: In the previous example, what is the percentile rank for a shoe size of 10 ? Recall, the ordered array was: 8, 8, 9, 9, 9, 10, 10, 11, 11, 12, 13, 13. Observe that the number of values below the value of 10 is 5. Percentile Corresponding to a Given Data Value

    28. 4-28 Solution (continued): Recall, the total number of values in the data set was 12. Thus, using the formula, the corresponding percentile is: Percentile Corresponding to a Given Data Value
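
The formula and the computed percentiles did not survive in this transcript. The sketch below assumes a common textbook convention, percentile = (number of values below x + 0.5) / (total number of values) × 100. That choice is an assumption, not confirmed by the transcript, and `percentile_rank` is a name introduced only for this illustration.

```python
def percentile_rank(x, data):
    """Percentile for value x.

    ASSUMPTION: uses (values below x + 0.5) / n * 100, a common textbook
    convention; the transcript does not show the slide's own formula.
    """
    below = sum(1 for v in data if v < x)
    return (below + 0.5) / len(data) * 100


# Shoe sizes from the example slides.
shoe_sizes = [13, 11, 10, 13, 11, 10, 8, 12, 9, 9, 8, 9]

for size in (12, 10):
    below = sum(1 for v in shoe_sizes if v < size)
    rank = percentile_rank(size, shoe_sizes)
    print(f"size {size}: {below} values below, percentile = {rank:.2f}")
```

Under this assumed formula, a shoe size of 12 falls at roughly the 79th percentile and a shoe size of 10 at roughly the 46th.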

    29. 4-29 Assume that we want to determine what data value falls at some general percentile Pk. The following steps will enable you to find a general percentile Pk for a data set. Step 1: Order the data set from smallest to largest. Step 2: Compute the position c of the percentile. To compute the value of c, use the following formula: $c = \dfrac{n \times k}{100}$, where n is the number of values in the data set. Procedure for Finding a Data Value for a Given Percentile

    30. 4-30 Procedure for Finding a Data Value for a Given Percentile

    31. 4-31 Procedure for Finding a Data Value for a Given Percentile

    32. 4-32 Example: The data given below represents the 19 countries with the largest numbers of total Olympic medals – excluding the United States, which had 101 medals – for the 1996 Atlanta games. Find the 65th percentile for the data set. 63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15. Percentile Corresponding to a Given Data Value

    33. 4-33 Solution: First, we need to arrange the data set in order. The ordered set is: 15, 15, 15, 15, 17, 17, 19, 20, 21, 22, 23, 25, 27, 35, 37, 41, 50, 63, 65. Next, compute the position of the percentile. Here n = 19, k = 65. Thus, c = (19 × 65)/100 = 12.35. Since c is not a whole number, we round up to 13. Percentile Corresponding to a Given Data Value

    34. 4-34 Solution (continued): Thus, the 13th value in the ordered data set will correspond to the 65th percentile. That is P65 = 27. Question: Why does a percentile measure relative position? Percentile Corresponding to a Given Data Value

    35. 4-35 Question: Why does a percentile measure Relative Position?

    36. 4-36 Question: Why does a percentile measure Relative Position?

    37. 4-37 Example: Find the 25th percentile for the following data set: 6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14. Solution: First, we need to arrange the data set in order. The ordered set is: 2, 6, 8, 10, 10, 10, 11, 11, 12, 12, 13, 13, 13, 14, 16, 18. Percentile Corresponding to a Given Data Value

    38. 4-38 Solution (continued): Next, compute the position of the percentile. Here n = 16, k = 25. Thus, c = (16 × 25)/100 = 4.0. Since c is a whole number, the 25th percentile will be the average of the values located at the 4th and 5th positions in the ordered set. Thus, P25 = (10 + 10)/2 = 10. Percentile Corresponding to a Given Data Value
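
The two worked examples suggest the following rule, sketched here as an inference from the solutions rather than a quotation of the procedure slides: compute c = n × k / 100; if c is not a whole number, round up and take the value at that position; if c is a whole number, average the values at positions c and c + 1. The function name `percentile_value` is introduced only for this illustration.

```python
def percentile_value(data, k):
    """Return the kth percentile (1 <= k <= 99) using the rule inferred
    from the worked examples: position c = n * k / 100."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ordered = sorted(data)                   # Step 1: order the data
    whole, remainder = divmod(len(ordered) * k, 100)  # Step 2: c = n*k/100
    if remainder == 0:
        # c is a whole number: average positions c and c + 1 (1-based).
        return (ordered[whole - 1] + ordered[whole]) / 2
    # c is not whole: round up to position whole + 1 (1-based).
    return ordered[whole]


# 1996 medal counts, excluding the United States (19 countries).
medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
          17, 17, 20, 19, 22, 15, 15, 15, 15]
# Second example data set (16 values).
values = [6, 12, 18, 12, 13, 8, 13, 11, 10, 16, 13, 11, 10, 10, 2, 14]

p65 = percentile_value(medals, 65)
p25 = percentile_value(values, 25)
print(f"P65 = {p65}, P25 = {p25}")
```

Integer arithmetic (`divmod`) is used to decide whether c is whole, which avoids floating-point rounding surprises.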

    39. 4-39 Deciles and quartiles are special percentiles. Deciles divide an ordered data set into 10 equal parts. Quartiles divide the ordered data set into 4 equal parts. We usually denote the deciles by D1, D2, D3, … , D9. We usually denote the quartiles by Q1, Q2, and Q3. Special Percentiles – Deciles and Quartiles

    40. 4-40 Deciles

    41. 4-41 Quartiles

    42. 4-42 Quick Tip: There are 9 deciles and 3 quartiles. Q1 = first quartile = P25 Q2 = second quartile = P50 Q3 = third quartile = P75 D1 = first decile = P10 D2 = second decile = P20 . . . D9 = ninth decile = P90

    43. 4-43 Quick Tip: P50 = D5 = Q2 = median, i.e., the 50th percentile, the 5th decile, the 2nd quartile, and the median are all equal to one another. Finding deciles and quartiles is equivalent to finding the corresponding percentiles.

    44. 4-44 OUTLIERS

    45. 4-45 Procedure to Check for OUTLIERS

    46. 4-46 Procedure to Check for OUTLIERS

    47. 4-47 Procedure to Check for OUTLIERS

    48. 4-48 Procedure to Check for OUTLIERS

    49. 4-49 Example: The data below represent the 20 countries with the largest number of total Olympic medals, including the United States, which had 101 medals for the 1996 Atlanta games. Determine whether the number of medals won by the United States is an outlier relative to the numbers for the other countries. The data is given on the next slide.

    50. 4-50 Example (continued): Data values – 63, 65, 50, 37, 35, 41, 25, 23, 27, 21, 17, 17, 20, 19, 22, 15, 15, 15, 15, 101. Solution: First, we need to arrange the data set in order. The ordered set is – 15 15 15 15 17 17 19 20 21 22 23 25 27 35 37 41 50 63 65 101. Next we need to determine the first and third quartiles. Verify that Q1 = P25 = 17 and Q3 = P75 = 39.

    51. 4-51 Example (continued): Thus the IQR = 39 – 17 = 22. Now, Q1 – 1.5 × IQR = 17 – (1.5 × 22) = -16, and Q3 + 1.5 × IQR = 39 + (1.5 × 22) = 72. Since 101 > 72, the value of 101 is an outlier relative to the rest of the values in the data set (based on the procedure presented here). That is, the number of medals won by the United States is an outlier relative to the numbers won by the other 19 countries for the 1996 Atlanta Olympic Games.
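
The outlier check in this example can be sketched as follows. The quartiles are computed with the percentile rule inferred from the earlier worked examples (c = n × k / 100, averaging two positions when c is whole). The helper names are introduced only for this illustration.

```python
def percentile_value(data, k):
    """kth percentile via position c = n * k / 100 (rule inferred from
    the worked examples in the slides)."""
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * k, 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]


# 1996 medal counts for the 20 countries, including the United States.
medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
          17, 17, 20, 19, 22, 15, 15, 15, 15, 101]

q1 = percentile_value(medals, 25)
q3 = percentile_value(medals, 75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values outside the fences are flagged as outliers.
outliers = [v for v in medals if v < lower_fence or v > upper_fence]

print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
print(f"fences: [{lower_fence}, {upper_fence}]")
print(f"outliers: {outliers}")
```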

    52. 4-52 Pictorial Representation for the OUTLIER of the Number of Olympic Medals Won by the United States in 1996 Atlanta Games.

    53. 4-53 BOX PLOTS Explanation of the term – box plot: A box plot is a graphical display that involves a five-number summary of a distribution of values, consisting of the minimum value, the first quartile, the median, the third quartile, and the maximum value.

    54. 4-54 BOX PLOTS A horizontal box-plot is constructed by drawing a box between the quartiles Q1 and Q3. Horizontal lines are then drawn from the middle of the sides of the box to the minimum and maximum values.

    55. 4-55 BOX PLOTS These horizontal lines are called whiskers. A vertical line inside the box marks the median. Outliers are usually indicated by a dot or an asterisk.
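
The five-number summary behind a box plot can be computed with a short sketch, here applied to the 1996 medal counts used in the outlier example. The quartiles and median use the percentile rule inferred from the earlier worked examples, and the function names are illustrative.

```python
def percentile_value(data, k):
    """kth percentile via position c = n * k / 100 (rule inferred from
    the worked examples in the slides)."""
    ordered = sorted(data)
    whole, remainder = divmod(len(ordered) * k, 100)
    if remainder == 0:
        return (ordered[whole - 1] + ordered[whole]) / 2
    return ordered[whole]


def five_number_summary(data):
    """Return (minimum, Q1, median, Q3, maximum)."""
    return (
        min(data),
        percentile_value(data, 25),
        percentile_value(data, 50),
        percentile_value(data, 75),
        max(data),
    )


# 1996 medal counts for the 20 countries, including the United States.
medals = [63, 65, 50, 37, 35, 41, 25, 23, 27, 21,
          17, 17, 20, 19, 22, 15, 15, 15, 15, 101]

summary = five_number_summary(medals)
print("min, Q1, median, Q3, max =", summary)
```

The box spans Q1 to Q3, the line inside the box sits at the median, and the value 101 would be drawn as a separate dot or asterisk because it was identified as an outlier.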

    56. 4-56 Example of a Box Plot for the Olympic (1996) Medal Count Data

    57. 4-57 Information That Can Be Obtained From a Box Plot

    58. 4-58 Information That Can Be Obtained From a Box Plot – Looking at the Median

    59. 4-59 Information That Can Be Obtained From a Box Plot – Looking at the Length of the Whiskers

    60. 4-60 Box Plot Displaying Positive Skewness

    61. 4-61 Box Plot Displaying a Symmetrical Distribution

    62. 4-62 Box Plot Displaying a Negative Skewness
