Statistics

Presentation Transcript


  1. Statistics Debra Kuyatt, MBA AMBA 600

  2. Statistics Example: Stage 12, Tour de France, 2003. (The Stage 12, 2003 Tour de France data is given on the slides at the end of this presentation.) Originally from: http://tdf.olntv.com

  3. Definitions

  4. Definitions Population: The entire collection of items that you want to describe. The items in this collection must share a measurable feature. Using statistics, you want to find the characteristics of this group. Example: All of the riders in stage 12 of the Tour de France in 2003. Sample: A representative group drawn from the population, from which you make inferences about the entire population. Example: Ten randomly picked riders.

  5. Definitions Discrete random variable: A number that comes from counting. Example: How many stages did each rider complete? Continuous random variable: A number that comes from measuring. Example: How long did each rider take to complete each stage?

  6. Measures of Central Tendency

  7. Measures of Central Tendency The arithmetic mean is written $\bar{X}$ for the sample mean or $\mu$ for the population mean. The mean is also known as the average and is calculated by adding all of the values together and dividing by the number of values. The equation for the arithmetic mean is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \dots + X_n}{n}$, where $n$ is the sample size. Unlike the median and the quartiles, the arithmetic mean can be computed without first ordering the data.
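The mean formula above can be sketched in a few lines of Python. The three times below are a hypothetical sample chosen for illustration (they are not the full Stage 12 data set), and `hms` is a helper name introduced here to convert a time into seconds.

```python
def hms(hours, minutes, seconds):
    """Convert a time given as hours, minutes, seconds into total seconds."""
    return hours * 3600 + minutes * 60 + seconds

# Hypothetical sample of three finishing times (not the real data set).
sample_times = [hms(1, 5, 45), hms(1, 7, 22), hms(1, 9, 38)]

# Arithmetic mean: add every value, then divide by the number of values n.
n = len(sample_times)
sample_mean = sum(sample_times) / n

minutes, seconds = divmod(sample_mean, 60)
print(f"mean = {sample_mean} s = {int(minutes)} min {seconds:.0f} s")
```

Note that the list does not have to be sorted for this calculation.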

  8. Measures of Central Tendency The arithmetic mean of the time in the 12th stage of the 2003 Tour de France is 1 hour, 7 minutes, 6 seconds.

  9. Measures of Central Tendency Mode: The value that appears most frequently. There are two modes of the time in the 12th stage of the 2003 Tour de France: 1 hour, 5 minutes, and 45 seconds, and 1 hour, 9 minutes, and 38 seconds.
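A data set with two modes, like the one described above, can be handled with the standard library's `statistics.multimode` (Python 3.8 or later). The list below is a small hypothetical sample in seconds, not the real Stage 12 data.

```python
from statistics import multimode

# Hypothetical times in seconds: 3945 s is 1:05:45 and 4178 s is 1:09:38.
times = [3945, 4042, 4178, 3945, 4178, 4100]

# multimode returns every value tied for the highest frequency,
# in the order each value first appears in the data.
modes = multimode(times)
print(modes)
```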

  10. Measures of Central Tendency Median: The middle value in an ordered list. Find the position of the middle value using this equation: Median position = (n + 1) / 2. There are 165 riders, so the median is at position 166 / 2 = 83. The 83rd time value in the 12th stage is 1 hour, 7 minutes, 22 seconds.
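As a minimal sketch of the position rule above: the function names are illustrative, and the even-length branch (averaging the two middle values) is the usual convention rather than something stated on the slide.

```python
def median_position(n):
    """1-based position of the median in an ordered list of n values."""
    return (n + 1) / 2

def median_value(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two values on either side of the middle.
    return (ordered[mid - 1] + ordered[mid]) / 2

print(median_position(165))  # position of the median among 165 riders
```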

  11. Measures of Central Tendency Quartiles divide the values into 4 equal sections. Like the median, these statistics point to positions in the ordered data. The values should be ordered from lowest to highest. The first quartile is the value below which 25% of the values fall. Its position is Q1 position = (n + 1) / 4. Since there are 165 riders in the 12th stage, the first quartile is at position 166 / 4 = 41.5. Since this position is exactly halfway between 41 and 42, you take the value that is halfway between the 41st and 42nd values. These two values are identical, so the first quartile is 1 hour, 5 minutes, and 45 seconds.

  12. Measures of Central Tendency The third quartile is the value below which 75% of the values fall. Its position is Q3 position = 3 * (n + 1) / 4. Since there are 165 riders in the 12th stage, the third quartile is at position 498 / 4 = 124.5. Since this position is exactly halfway between 124 and 125, you take the value that is halfway between the 124th and 125th values. The 124th value is 1 hour, 8 minutes, and 38 seconds, while the 125th value is 1 hour, 8 minutes, and 39 seconds, so the third quartile is 1 hour, 8 minutes, and 38.5 seconds.
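The quartile positions and the "halfway between" step can be sketched as follows. `value_at_position` is a hypothetical helper that interpolates linearly between neighbouring values; only the 124th and 125th times quoted above come from the presentation.

```python
import math

def quartile_positions(n):
    """1-based positions of Q1 and Q3 using the (n + 1) rule."""
    return (n + 1) / 4, 3 * (n + 1) / 4

def value_at_position(ordered, pos):
    """Value at a possibly fractional 1-based position in an ordered list."""
    below = ordered[math.floor(pos) - 1]
    above = ordered[math.ceil(pos) - 1]
    fraction = pos - math.floor(pos)
    return below + (above - below) * fraction

q1_pos, q3_pos = quartile_positions(165)

# 124th and 125th values from the slide, in seconds (1:08:38 and 1:08:39).
q3_seconds = 4118 + (4119 - 4118) * (q3_pos - math.floor(q3_pos))
print(q1_pos, q3_pos, q3_seconds)
```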

  13. Measures of Variation

  14. Measures of Variation The range is the difference between the smallest and largest values. Again, ordering the list makes these two values easy to find. The smallest value in the 12th stage is 58 minutes, 32 seconds, and the largest value is 1 hour, 11 minutes, and 19 seconds, so the range is 12 minutes and 47 seconds.

  15. Measures of Variation The interquartile range is the difference between the third quartile and the first quartile. The first quartile is 1 hour, 5 minutes, and 45 seconds. The third quartile is 1 hour, 8 minutes, and 38.5 seconds. So, the interquartile range is 1 hour, 8 minutes, and 38.5 seconds minus 1 hour, 5 minutes, and 45 seconds, which equals 2 minutes and 53.5 seconds.
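The time arithmetic behind the range and the interquartile range is easier to check after converting everything to seconds. This is a small sketch using the values quoted on the slides; `hms` and `as_min_sec` are helper names introduced here.

```python
def hms(hours, minutes, seconds):
    """Convert hours, minutes, seconds into total seconds."""
    return hours * 3600 + minutes * 60 + seconds

def as_min_sec(total_seconds):
    """Split a duration in seconds into (whole minutes, remaining seconds)."""
    minutes, seconds = divmod(total_seconds, 60)
    return int(minutes), seconds

# Range: largest value minus smallest value.
value_range = hms(1, 11, 19) - hms(0, 58, 32)

# Interquartile range: third quartile minus first quartile.
iqr = hms(1, 8, 38.5) - hms(1, 5, 45)

print(as_min_sec(value_range), as_min_sec(iqr))
```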

  16. Measures of Variation The sample variance is the sum of the squared differences around the mean divided by the number of values in the sample minus 1. The symbol designating the sample variance is $S^2$, while the symbol designating the population variance is $\sigma^2$. Here is the variance equation: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$

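  17. Measures of Variation The variance for the 12th stage of the 2003 Tour de France is given as 5 minutes and 14 seconds. The variance shows how the data fluctuate around the mean. Remember that the variance results in squared values, so this figure appears to be about 5.23 squared minutes (its square root is about 2.29 minutes) rather than an elapsed time.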

  18. Measures of Variation So, a more useful measure of variation is the standard deviation, which is simply the square root of the variance and is expressed in the same units as the data. The standard deviation of the 12th stage of the Tour de France is 2 minutes and 17 seconds. The majority of the racers' times were within 2 minutes and 17 seconds of the mean: 112 of the 165 riders, or 68%, came within one standard deviation of the mean.
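A rough sketch of the variance and standard deviation calculations, using `statistics.variance` and `statistics.stdev` (both divide by n - 1). The three sample times are hypothetical; only the 2 minute 17 second standard deviation is taken from the slides.

```python
from statistics import stdev, variance

# Hypothetical sample in seconds (not the real Stage 12 data).
sample = [3945.0, 4042.0, 4178.0]

sample_variance = variance(sample)  # units: seconds squared
sample_sd = stdev(sample)           # units: seconds

# The slide's standard deviation, 2 min 17 s, expressed in minutes.
sd_minutes = (2 * 60 + 17) / 60
# Squaring it gives the variance in squared minutes, not in minutes.
variance_minutes_squared = sd_minutes ** 2

print(sample_variance, round(sample_sd, 2), round(variance_minutes_squared, 2))
```

Because squaring changes the units, the variance cannot be read as an elapsed time; the standard deviation can.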

  19. Measures of Variation Another measure of variation is the coefficient of variation. This measure is simply the standard deviation divided by the mean, expressed as a percentage. The coefficient of variation for the 12th stage of the 2003 Tour de France is 3.40%. This low percentage indicates that the spread around the mean is small relative to the mean itself.
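The coefficient of variation can be checked directly from the mean and standard deviation quoted earlier in the presentation. A minimal sketch, with both values converted to seconds:

```python
# Values from the slides, converted to seconds.
mean_seconds = 1 * 3600 + 7 * 60 + 6  # 1 hour, 7 minutes, 6 seconds
sd_seconds = 2 * 60 + 17              # 2 minutes, 17 seconds

# Coefficient of variation: standard deviation / mean, as a percentage.
cv_percent = sd_seconds / mean_seconds * 100

print(f"CV = {cv_percent:.2f}%")
```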

  20. Probability

  21. Probability Probability is expressed as a fraction, decimal, or percentage. The denominator of the fraction is the total number of items in the population under consideration. The numerator of the fraction is the number of items in the population that have the characteristic the researcher wants to find. For example, the probability that a randomly selected rider in the 12th stage of the Tour de France is French is 12/55: there are 36 Frenchmen among the 165 riders in the 12th stage, and 36/165 reduces to 12/55. Express fractions in reduced terms.

  22. Probability The easiest way to picture probability is through tables. Riders in Stage 12 of the 2003 Tour de France

  23. Probability Simple probability, also called marginal probability, uses the colored cells. For example, 36/165, or 12 out of 55, of the riders in stage 12 of the Tour de France are French.

  24. Probability Mathematically, you express simple probability as P(A), meaning the probability that A will occur. For example, P(French) = 12/55.

  25. Probability Riders in the 12th stage of the 2003 Tour de France. Joint probability involves two events. For example, 1 of the 165 racers (0.61%) is a Frenchman who finished in the top 20 of the 12th stage.

  26. Probability You express joint probability as P(A and B), read as the probability that both A and B will occur. For example, P(rider is French and rider finished in the top 20) = 1/165.

  27. Probability The general addition rule for probability is P(A or B) = P(A) + P(B) - P(A and B). The joint probability is subtracted so that riders who satisfy both conditions are not counted twice. For example, P(rider finished in the top 20 or rider is French) = 20/165 + 36/165 - 1/165 = 55/165 = 1/3 (reduced).
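The simple, joint, and addition-rule probabilities above can be reproduced with exact fractions. This sketch uses Python's `fractions.Fraction`, which reduces results automatically; the counts (165 riders, 36 French, 20 in the top 20, 1 in both groups) come from the slides.

```python
from fractions import Fraction

total_riders = 165

p_french = Fraction(36, total_riders)  # simple (marginal) probability
p_top20 = Fraction(20, total_riders)   # simple (marginal) probability
p_both = Fraction(1, total_riders)     # joint: French and in the top 20

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_either = p_french + p_top20 - p_both

print(p_french, p_both, p_either)
```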

  28. Normal Distribution

  29. Normal Distribution Calculating probabilities of continuous random variables: You calculate the area under the distribution curve to find the probability that an event will occur. There are an infinite number of normal curves (one for each combination of mean and standard deviation), and the mathematical equation for calculating the area is cumbersome (see formula 7-4 on page 217). So, a standardized curve was created.

  30. Normal Distribution This picture is taken from http://www.economics.soton.ac.uk/courses/ec106/Standardized%20Normal%20Distribution%20Tables.htm

  31. Normal Distribution Properties of the Normal Distribution • It is bell-shaped (and thus symmetrical) in appearance. • Mean = median = mode. • Its "middle spread" (the interquartile range) is approximately 1.33 standard deviations. • Its associated random variable has an infinite range (continuous, not discrete).
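The "middle spread" property can be checked numerically with `statistics.NormalDist` (Python 3.8 or later). This is only a sketch: it computes the interquartile range of the standard normal curve, which comes out near 1.35 standard deviations, so the 1.33 quoted above is a rounded approximation.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

# Z values below which 25% and 75% of the area lies.
z_q1 = standard_normal.inv_cdf(0.25)
z_q3 = standard_normal.inv_cdf(0.75)

# Interquartile range expressed in standard deviations.
middle_spread = z_q3 - z_q1
print(round(middle_spread, 3))
```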

  32. Normal Distribution Use the transformation equation and the table of "Z" values to find the probability that "X" will occur. Transformation equation: $Z = \frac{X - \mu}{\sigma}$, where $\mu$ is the mean and $\sigma$ is the standard deviation.

  33. Normal Distribution Pretend that the probability density function for the 12th stage of the Tour de France is normal. (It is actually slightly skewed to the left.) Remember that the mean is 1 hour, 7 minutes, 6 seconds, and the standard deviation is 2 minutes and 17 seconds.

  34. Normal Distribution What is the probability that a rider will take less than 1 hour, 9 minutes, and 23 seconds to finish the race? Z = (1 hour, 9 minutes, 23 seconds - 1 hour, 7 minutes, 6 seconds) / 2 minutes, 17 seconds, so Z = 1.00. Check the Z table for the intersection of row 1.0 with column .00. The intersection number is .3413. This is the area between the mean and Z, to the right of the mean. We add .5000 (the area to the left of the mean) to this number to get the total area to the left of Z, so there is an 84.13% chance that a rider will finish in less than 1 hour, 9 minutes, and 23 seconds.
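The table lookup above can be cross-checked in code. This sketch assumes, as the presentation does, that the times are normally distributed; it uses `statistics.NormalDist` in place of a printed Z table, and `hms` is a helper name introduced here.

```python
from statistics import NormalDist

def hms(hours, minutes, seconds):
    """Convert hours, minutes, seconds into total seconds."""
    return hours * 3600 + minutes * 60 + seconds

mean_time = hms(1, 7, 6)  # mean from the slides
sd_time = hms(0, 2, 17)   # standard deviation from the slides
x = hms(1, 9, 23)         # time of interest

# Transformation equation: Z = (X - mean) / standard deviation.
z = (x - mean_time) / sd_time

# Area under the standard normal curve to the left of Z.
p_less = NormalDist().cdf(z)
print(f"Z = {z:.2f}, P(X < x) = {p_less:.4f}")
```

The cumulative value already includes the .5000 to the left of the mean, so no separate addition step is needed.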

  35. Normal Distribution What is the probability that a rider will take more than 1 hour to finish the race? Z = (1 hour - 1 hour, 7 minutes, 6 seconds) / 2 minutes, 17 seconds, so Z = -3.11 (the Z value is to the left of the mean). Look up the intersection of row 3.1 with column .01 (not all tables extend this far). The intersection number is .49906. This is the area under the curve between the Z value (Z = -3.11) and the mean (Z = 0). The probability is 49.906% that a rider will take more than 1 hour but less than 1 hour, 7 minutes, 6 seconds to finish stage 12. There is also a 50% probability that a rider will complete stage 12 in more than 1 hour, 7 minutes, 6 seconds. So, the probability that the rider will finish in more than 1 hour is the sum of 49.906% and 50%, or 99.906%.

  36. Normal Distribution What is the probability that a rider will take more than 1 hour but less than 1 hour, 8 minutes to finish the race? Calculate the probability that the rider will take less than 1 hour, 8 minutes to finish, and then subtract the probability that the rider will take less than 1 hour to finish to find the appropriate area. The probability that a rider will take less than 1 hour to finish is 100% - 99.906% = 0.094%. Z = (1 hour, 8 minutes - 1 hour, 7 minutes, 6 seconds) / 2 minutes, 17 seconds, so Z = 0.39. Look at the Z table for the intersection of row 0.3 with column .09. The intersection number is .1517; add .5000 to this to get .6517. Now subtract 0.094% from 65.17% to get 65.076%.
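The "more than" and "between" cases can be sketched the same way. To stay comparable with a printed Z table, this example rounds each Z value to two decimals before looking up the area; that rounding is a choice made here, and unrounded Z values give slightly different probabilities. Normality of the times is assumed, as in the presentation.

```python
from statistics import NormalDist

MEAN = 1 * 3600 + 7 * 60 + 6  # 1 hour, 7 minutes, 6 seconds
SD = 2 * 60 + 17              # 2 minutes, 17 seconds
standard_normal = NormalDist()

def z_score(x_seconds):
    """Z value rounded to two decimals, as a printed Z table would use."""
    return round((x_seconds - MEAN) / SD, 2)

z_low = z_score(1 * 3600)            # 1 hour
z_high = z_score(1 * 3600 + 8 * 60)  # 1 hour, 8 minutes

# P(X > 1 hour): everything to the right of z_low.
p_more_than_hour = 1 - standard_normal.cdf(z_low)

# P(1 hour < X < 1 hour 8 min): area left of z_high minus area left of z_low.
p_between = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(z_low, z_high, round(p_more_than_hour, 5), round(p_between, 4))
```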

  37. Tour de France 2003 data

  38-49. Stage 12 Data (data tables not preserved in this transcript)