
MM150 Unit 9 Seminar Statistics II


tasha-vang

Presentation Transcript


  1. MM150 Unit 9 Seminar Statistics II

  2. 9.1 Measures of Central Tendency

  3. Averages • Several different types of averages can be calculated for a given set of data. • All averages, in general, are called measures of central tendency. • The four most common measures of central tendency are mean, median, mode, and midrange.

  4. Mean – To find the arithmetic mean, or mean, sum the data scores and then divide by the number of data scores.
     Example: Find the mean of the data scores 5, 6, 2, 9, 8
     (5 + 6 + 2 + 9 + 8)/5 = 30/5 = 6
     Median – To find the median, put the data scores in ascending or descending order and then find the middle data score. If there is an even number of data scores, rank the scores and then find the mean of the middle two.
     Example: Find the median of the data scores 5, 7, 2, 9, 8
     Put the scores in ascending order: 2, 5, 7, 8, 9
     The middle score, 7, is the median.
     Example: Find the median of the data scores 4, 7, 2, 9
     Put the scores in ascending order: 2, 4, 7, 9
     Find the mean of 4 and 7: (4 + 7)/2 = 11/2 = 5.5

  5. Mode – The mode is the data score that occurs most frequently.
     • Example: Find the mode of the data scores 6, 4, 9, 8, 6, 5
     • It may help to put the scores in ascending order: 4, 5, 6, 6, 8, 9
     • You can see that the data score 6 occurs most often.
     • A data set can have no mode (each score occurs once), and a data set can be bimodal, which means it has 2 modes.
     Midrange – The midrange is the value halfway between the greatest and least data score. To find it, take the mean of the greatest and least data score.
     • Example: Find the midrange of the data scores 6, 4, 9, 8, 6, 5
     • It may help to put the scores in ascending order: 4, 5, 6, 6, 8, 9
     • The midrange is (4 + 9)/2 = 13/2 = 6.5
     • *Please read on page 362 of your text when each is the ‘better’ average.
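The four averages defined above can be checked with a short Python sketch. It uses the standard library `statistics` module; `midrange` is a helper name introduced here for illustration, not something from the seminar.

```python
from statistics import mean, median, multimode

def midrange(scores):
    """Value halfway between the least and greatest data score."""
    return (min(scores) + max(scores)) / 2

scores = [6, 4, 9, 8, 6, 5]

print(mean([5, 6, 2, 9, 8]))    # sum of the scores divided by how many there are
print(median([5, 7, 2, 9, 8]))  # odd count: the middle ranked score
print(median([4, 7, 2, 9]))     # even count: mean of the two middle scores
print(multimode(scores))        # list of the most frequent score(s)
print(midrange(scores))         # mean of the least and greatest score
```

`multimode` returns a list, so a bimodal data set yields two values; when every score occurs once it returns all of them, which corresponds to the "no mode" case on the slide.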

  6. EVERYONE: page 366 #21

  7. EVERYONE: page 366 #21
     Data: 1, 2, 2, 3, 3, 3, 4, 5, 5, 7
     Mean: (1 + 2 + 2 + 3 + 3 + 3 + 4 + 5 + 5 + 7)/10 = 35/10 = 3.5 weeks
     Median: (3 + 3)/2 = 3 weeks
     Mode: 3 weeks
     Midrange: (1 + 7)/2 = 4 weeks

  8. Mean Example
     Todd is taking a math class where his end of term grade is based on 4 exams, each having the same number of points and weighted the same. He scored 98, 82, and 87 on the first three exams in his class. What does he need to score on the 4th exam to get at least a 90% for the final grade? The instructor uses mean as the average.
     (98 + 82 + 87 + x)/4 = 90
     (267 + x)/4 = 90
     267 + x = 360
     x = 93
     Todd must score a 93 or higher on the test.
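The same algebra can be written as a small Python sketch: multiply the target mean by the number of exams, then subtract the points already earned. The function name `score_needed` is hypothetical and used only for this illustration.

```python
def score_needed(previous_scores, target_mean, total_exams):
    """Score required on the one remaining exam to reach the target mean."""
    required_total = target_mean * total_exams   # 90 * 4 = 360
    return required_total - sum(previous_scores)  # 360 - 267

needed = score_needed([98, 82, 87], target_mean=90, total_exams=4)
print(needed)
```

This assumes exactly one exam remains and that all exams carry equal weight, as in Todd's class.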

  9. Measures of Position • There are 2 measures of position, percentiles and quartiles. • They are used to make comparisons for a large amount of data. • Percentile – There are 99 percentiles that divide the data up into 100 equal parts. • When you are reported to be in the 98th percentile, this does not mean your score is a 98%. This means that you outperformed about 98% of the population.

  10. Measures of Position (con’t)
      Quartile – Quartiles divide ranked data into 4 equal parts. The first quartile, Q1, is at 25%, the second, Q2 (the median), is at 50%, and the third, Q3, is at 75%.
      (minimum) | 25% | Q1 | 25% | Q2 (median) | 25% | Q3 | 25% | (maximum)
      Q1, Q2, Q3 divide ranked scores into four equal parts.

  11. Example
      Determine Q1, Q2, and Q3 of the data below:
      15, 10, 19, 18, 11, 15, 13, 18, 19, 17, 19, 15, 16, 13, 15, 16, 13, 12, 14
      First put the data in ascending order:
      10, 11, 12, 13, 13, 13, 14, 15, 15, 15, 15, 16, 16, 17, 18, 18, 19, 19, 19
      Find the median: 15 is Q2.
      Find the median of the lower half for Q1: 13.
      Find the median of the upper half for Q3: 18.
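A minimal Python sketch of the median-of-halves procedure used in this example follows. The function name `quartiles` is an assumption for illustration. It leaves the middle score out of both halves when the number of scores is odd, which matches the slide. Library routines such as `statistics.quantiles` use other conventions and may return slightly different values.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q2, Q3) using the median of the lower and upper halves."""
    ranked = sorted(scores)
    half = len(ranked) // 2
    lower = ranked[:half]
    # With an odd count, skip the middle score so it belongs to neither half.
    upper = ranked[half + 1:] if len(ranked) % 2 else ranked[half:]
    return median(lower), median(ranked), median(upper)

data = [15, 10, 19, 18, 11, 15, 13, 18, 19, 17,
        19, 15, 16, 13, 15, 16, 13, 12, 14]
q1, q2, q3 = quartiles(data)
print(q1, q2, q3)
```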

  12. 9.2 Measures of Dispersion

  13. 2 Measures of Dispersion
      Range – The range is the difference between the greatest and least data score.
      Range = (maximum value) – (minimum value)
      Example: Find the range of the data scores 55, 59, 51, 64, 60
      Put the data scores in ascending order: 51, 55, 59, 60, 64
      The range is 64 – 51 = 13
      Standard deviation – The standard deviation tells us how much the data differ from the mean. See the next PowerPoint slide for an example of standard deviation.

  14. Standard Deviation, cont’d

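  15. Standard Deviation Example
      Find the standard deviation of 11, 15, 18, 9, 12
      1.) Find the mean of the data scores: (11 + 15 + 18 + 9 + 12)/5 = 65/5 = 13
      2.) Make a chart with 3 columns.
      3.–5.) Fill in the chart: list each data score, subtract the mean from it, and square the difference.
          Data   Data – Mean   (Data – Mean)²
          11     –2            4
          15     2             4
          18     5             25
          9      –4            16
          12     –1            1
      6.) Add the squares: 4 + 4 + 25 + 16 + 1 = 50
      7.) Divide 50 by n – 1, where n is the number of data scores. So divide 50 by 4, which is 12.5
      8.) Find the square root of the number found in step 7. √12.5 ≈ 3.5355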

  16. Everyone Example Calculate the standard deviation for the data set: 15, 16, 20, 13

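  17. EVERYONE solution
      1. Find the mean: (15 + 16 + 20 + 13)/4 = 64/4 = 16
      2. Make a 3-column table.
      3.–5. Fill in the table: list each data score, subtract the mean from it, and square the difference.
          Data   Data – Mean   (Data – Mean)²
          15     –1            1
          16     0             0
          20     4             16
          13     –3            9
      6. Add the squares: 1 + 0 + 16 + 9 = 26
      7. 26/3 ≈ 8.667
      8. √8.667 ≈ 2.944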
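The eight-step procedure from the two standard deviation examples can be sketched in Python as follows. The function name `sample_std_dev` is chosen here for illustration. It divides by n – 1, as the slides do; the standard library's `statistics.stdev` follows the same convention.

```python
from math import sqrt

def sample_std_dev(scores):
    """Sample standard deviation, following the slide's steps."""
    n = len(scores)
    mean = sum(scores) / n                           # step 1: the mean
    squared = [(x - mean) ** 2 for x in scores]      # steps 2-5: chart columns
    total = sum(squared)                             # step 6: sum of squares
    return sqrt(total / (n - 1))                     # steps 7-8

print(round(sample_std_dev([11, 15, 18, 9, 12]), 4))
print(round(sample_std_dev([15, 16, 20, 13]), 3))
```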

  18. 9.3 The Normal Curve

  19. Types of Distributions • Rectangular Distribution • All observed values occur with the same frequency. • J-shaped distribution • The frequency is either constantly increasing or constantly decreasing.

  20. Types of Distributions (con’t) • Bimodal • Two nonadjacent values occur more frequently than any other values in the set of data. • Skewed to left or right • Has more of a “tail” on one side than the other. • The greatest frequency appears on the left or the right of the curve.

  21. Skewed Distributions • In figure (a) the greatest frequency appears on the left so the mode would be on the left side of the curve. • All the data would be used to determine the mean. The values on the right side of the curve in (a) would increase the value of the mean. So the value of the mean would be farther to the right of the mode. • The median would be between the mean and the mode.

  22. Normal Distributions: Properties
      1. The graph of a normal distribution is called a normal curve.
      2. The normal curve is bell-shaped and symmetric about the mean.
      3. The mean, median, and mode of a normal distribution all have the same value and all occur at the center of the distribution.

  23. The Empirical Rule
      In any normal distribution:
      1. Approximately 68% of all the data lies within one standard deviation of the mean (in both directions).
      2. Approximately 95% of all the data lies within two standard deviations of the mean (in both directions).
      3. Approximately 99.7% of all the data lies within three standard deviations of the mean (in both directions).

  24. The Empirical Rule

  25. Empirical Rule Example
      Suppose that the weights of newborn infants are normally distributed. If approximately 2000 infants are born at Sarasota Memorial Hospital each year, determine the approximate number of infants who are expected to weigh:
      a) within one standard deviation of the mean.
      b) within two standard deviations of the mean.
      Solution:
      a) By the empirical rule, 68% of the infants weigh within one standard deviation of the mean.
         (0.68)(2000) = 1360 infants are expected to weigh within one standard deviation of the mean.
      b) By the empirical rule, 95% of the infants weigh within two standard deviations of the mean.
         (0.95)(2000) = 1900 infants are expected to weigh within two standard deviations of the mean.
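The arithmetic in this example amounts to multiplying the population size by the empirical-rule percentage. A short Python sketch is shown below; the names `EMPIRICAL_RULE` and `expected_within` are hypothetical.

```python
# Approximate share of normally distributed data within k standard
# deviations of the mean, as stated by the empirical rule.
EMPIRICAL_RULE = {1: 0.68, 2: 0.95, 3: 0.997}

def expected_within(population, k):
    """Approximate count of data values within k standard deviations."""
    return round(EMPIRICAL_RULE[k] * population)

print(expected_within(2000, 1))  # infants within one standard deviation
print(expected_within(2000, 2))  # infants within two standard deviations
```

The counts are approximations, since the empirical rule itself gives rounded percentages.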

  26. z-Scores • z-scores determine how far, in terms of standard deviations, a given score is from the mean of the distribution.

  27. Example: z-scores • A normal distribution has a mean of 50 and a standard deviation of 5. Find z-scores for the following values. • a) 55 b) 60 c) 43 • a) A score of 55 is one standard deviation above the mean.

  28. Example: z-scores continued • b) A score of 60 is 2 standard deviations above the mean. • c) A score of 43 is 1.4 standard deviations below the mean.
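The three z-scores in this example follow from the usual formula z = (value – mean)/(standard deviation). A minimal Python sketch, with `z_score` as an illustrative function name:

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

for value in (55, 60, 43):
    print(value, z_score(value, mean=50, std_dev=5))
```

A positive result means the score is above the mean and a negative result means it is below, which matches parts a) through c).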

  29. 9.4 Linear Correlation and Regression

  30. Linear Correlation • Linear correlation is used to determine whether there is a relationship between two quantities and, if so, how strong the relationship is. • The linear correlation coefficient, r, is a unitless measure that describes the strength of the linear relationship between two variables. • If the value is positive, as one variable increases, the other increases. • If the value is negative, as one variable increases, the other decreases. • The variable, r, will always be a value between –1 and 1 inclusive.

  31. Scatter Diagrams • A visual aid used with correlation is the scatter diagram, a plot of points (bivariate data). • The independent variable, x, generally is a quantity that can be controlled. • The dependent variable, y, is the other variable. • The value of r is a measure of how far a set of points varies from a straight line. • The greater the spread, the weaker the correlation and the closer the r value is to 0. • The smaller the spread, the stronger the correlation and the closer the r value is to 1 or –1.

  32. Correlation

  33. Linear Correlation Coefficient • The formula to calculate the correlation coefficient (r) is as follows:
      r = [n(Σxy) – (Σx)(Σy)] / [√(n(Σx²) – (Σx)²) · √(n(Σy²) – (Σy)²)]

  34. Example: Words Per Minute versus Mistakes
      There are five applicants applying for a job as a medical transcriptionist. The following shows the results of the applicants when asked to type a chart. Determine the correlation coefficient between the words per minute typed and the number of mistakes.
      Applicant   Words per Minute   Mistakes
      Ellen       24                 8
      George      67                 11
      Phillip     53                 12
      Kendra      41                 10
      Nancy       34                 9

  35. Solution
      • We will call the words typed per minute, x, and the mistakes, y.
      • List the values of x and y and calculate the necessary sums.
      WPM (x)   Mistakes (y)   x²      y²    xy
      24        8              576     64    192
      67        11             4489    121   737
      53        12             2809    144   636
      41        10             1681    100   410
      34        9              1156    81    306
      Σx = 219   Σy = 50   Σx² = 10,711   Σy² = 510   Σxy = 2,281

  36. Solution continued • The n in the formula represents the number of pieces of data. Here n = 5.

  37. Solution continued

  38. Solution continued • Since 0.86 is fairly close to 1, there is a fairly strong positive correlation. • This result implies that the more words typed per minute, the more mistakes made.
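The result r ≈ 0.86 can be reproduced with a short Python sketch. It assumes the usual computational formula for the linear correlation coefficient, built from the sums n, Σx, Σy, Σxy, Σx², and Σy²; the function name `correlation_coefficient` is illustrative.

```python
from math import sqrt

def correlation_coefficient(xs, ys):
    """Linear correlation coefficient r from the five standard sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

words_per_minute = [24, 67, 53, 41, 34]
mistakes = [8, 11, 12, 10, 9]
r = correlation_coefficient(words_per_minute, mistakes)
print(round(r, 2))
```

For these data the numerator is 5(2,281) – (219)(50) = 455 and the denominator is √5,594 · √50, matching the sums listed in the solution.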

  39. Linear Regression • Linear regression is the process of determining the linear relationship between two variables. • The line of best fit (regression line or the least squares line) is the line such that the sum of the squares of the vertical distances from the line to the data points (on a scatter diagram) is a minimum.

  40. The Line of Best Fit • Equation: y = mx + b, where
      m = [n(Σxy) – (Σx)(Σy)] / [n(Σx²) – (Σx)²] and b = [Σy – m(Σx)] / n

  41. Example • Use the data in the previous example to find the equation of the line that relates the number of words per minute and the number of mistakes made while typing a chart. • Graph the equation of the line of best fit on a scatter diagram that illustrates the set of bivariate points.

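  42. Solution • From the previous results, we know that n = 5, Σx = 219, Σy = 50, Σxy = 2,281, and Σx² = 10,711, so the slope is
      m = [5(2,281) – (219)(50)] / [5(10,711) – (219)²] = 455/5,594 ≈ 0.081

  43. Solution • Now we find the y-intercept, b.
      b = [Σy – m(Σx)] / n = [50 – 0.081(219)] / 5 ≈ 6.452
      Therefore the line of best fit is y = 0.081x + 6.452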

  44. Solution continued • To graph y = 0.081x + 6.452, plot at least two points and draw the graph.
      x     y
      10    7.262
      20    8.072
      30    8.882
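A Python sketch of the line-of-best-fit calculation follows, assuming the standard least squares formulas for the slope m and the y-intercept b. The function name `line_of_best_fit` is hypothetical. The slide's intercept of 6.452 is obtained by rounding the slope to 0.081 before computing b; carrying the unrounded slope through gives about 6.437, so the sketch prints both.

```python
def line_of_best_fit(xs, ys):
    """Least squares slope m and intercept b for y = mx + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = (sum_y - m * sum_x) / n
    return m, b

words_per_minute = [24, 67, 53, 41, 34]
mistakes = [8, 11, 12, 10, 9]
m, b = line_of_best_fit(words_per_minute, mistakes)
print(round(m, 3), round(b, 3))  # unrounded slope carried into b

# Reproduce the slide: round the slope first, then compute the intercept.
m_slide = round(m, 3)
b_slide = (sum(mistakes) - m_slide * sum(words_per_minute)) / len(mistakes)
for x in (10, 20, 30):
    print(x, round(m_slide * x + b_slide, 3))
```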

  45. Solution continued

  46. Example: page 407 #24
      a) Draw a scatter diagram.
      b) Determine the value of r, rounded to the nearest thousandth.
      c) Determine whether a correlation exists at … = 0.05
      d) Determine whether a correlation exists at …. = 0.01

  47. Example: continued a) The first thing to do is plot the points. Here we have (6,13), (8,11), (11,9), (14,10) and (17,7).

  48. Example: continued

  49. Example: page 407 #32 Determine the equation of the line of best fit from the data in the exercise indicated. Round both the slope and y intercept to the nearest hundredth.
