Statistics Debra Kuyatt, MBA AMBA 600
Statistics Example: Stage 12, Tour de France, 2003
(Stage 12, 2003 Tour de France data is given on the slides at the end of this presentation.) Originally from: http://tdf.olntv.com
Definitions
Definitions
Population: The entire collection of items that you want to describe. The items in this collection must share a measurable feature. Using statistics, you want to find the characteristics of this group. Example: all of the riders in stage 12 of the 2003 Tour de France.
Sample: A representative group drawn from the population, from which you make inferences about the entire population. Example: ten randomly picked riders.
Definitions
Discrete random variable: A number that comes from counting. Example: How many stages did each rider complete?
Continuous random variable: A number that comes from measuring. Example: How long did each rider take to complete each stage?
Measures of Central Tendency
Measures of Central Tendency
The arithmetic mean is written $\bar{X}$ for the sample mean and $\mu$ for the population mean. The mean is also known as the average and is calculated by adding all of the values together and dividing by the number of values. The equation for the sample mean is
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i = \frac{X_1 + X_2 + \dots + X_n}{n}$$
where $n$ is the sample size. Unlike the median and the quartiles, the arithmetic mean does not require the values to be ordered before you compute it.
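The mean calculation can be sketched in a few lines of Python. The times below are made-up values in seconds, not the actual Stage 12 results, and the function name `arithmetic_mean` is only illustrative.

```python
def arithmetic_mean(values):
    """Add all of the values together and divide by the number of values."""
    if not values:
        raise ValueError("the mean of an empty list is undefined")
    return sum(values) / len(values)


# Hypothetical finishing times in seconds (NOT the real Stage 12 data).
sample_times = [3945, 4026, 4042, 4118, 4178]

# The list does not need to be sorted before taking the mean.
mean_seconds = arithmetic_mean(sample_times)
print(mean_seconds)
```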
Measures of Central Tendency
The arithmetic mean of the times in the 12th stage of the 2003 Tour de France is 1 hour, 7 minutes, 6 seconds.
Measures of Central Tendency
Mode: The value that appears most frequently. There are two modes of the times in the 12th stage of the 2003 Tour de France: 1 hour, 5 minutes, and 45 seconds, and 1 hour, 9 minutes, and 38 seconds.
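A data set with two modes can be illustrated with `statistics.multimode` from the Python standard library (available from Python 3.8). The list below is a made-up sample in seconds, chosen only so that the two modal times from the slide each appear twice; it is not the real Stage 12 data.

```python
from statistics import multimode

# Hypothetical finishing times in seconds (NOT the real Stage 12 data).
# 3945 s is 1:05:45 and 4178 s is 1:09:38.
sample_times = [3945, 3945, 4026, 4118, 4178, 4178]

# multimode returns every value tied for the highest count,
# in the order each value first appears in the data.
modes = multimode(sample_times)
print(modes)
```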
Measures of Central Tendency
Median: This statistic points to the middle value in an ordered list. Find the position of the middle value using this equation:
Median position = $(n+1)/2$
There are 165 riders, so the median position is 166/2, or 83. The 83rd time value in the 12th stage is 1 hour, 7 minutes, 22 seconds, so that is the median.
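The position rule for the median can be sketched as follows. The helper names `median_position` and `median` are illustrative, and the short lists in the usage lines are hypothetical values, not rider times.

```python
import math


def median_position(n):
    """1-based position of the middle value in an ordered list of n values."""
    return (n + 1) / 2


def median(values):
    """Order the values, then read off the value at the median position.

    When the position falls halfway between two values (even n),
    take the value halfway between them.
    """
    ordered = sorted(values)
    pos = median_position(len(ordered))
    lower = ordered[math.floor(pos) - 1]  # convert 1-based position to 0-based index
    upper = ordered[math.ceil(pos) - 1]
    return (lower + upper) / 2


print(median_position(165))   # position of the median among 165 riders
print(median([3, 1, 2]))      # odd count: the single middle value
print(median([4, 1, 3, 2]))   # even count: halfway between the two middle values
```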
Measures of Central Tendency
Quartiles divide the values into 4 equal sections. Like the median, these statistics point to values in the ordered data. The values should be ordered from lowest to highest.
The first quartile is the value where 25% of the numbers are smaller than this value.
Q1 position = $(n+1)/4$
Since there are 165 riders in the 12th stage, the first quartile position is 166/4, or 41.5. Since this position is exactly halfway between 41 and 42, you take the value that is halfway between the 41st and 42nd values. The 41st and 42nd values are identical, so the first quartile is 1 hour, 5 minutes, and 45 seconds.
Measures of Central Tendency
The third quartile is the value where 75% of the numbers are smaller than this value.
Q3 position = $3(n+1)/4$
Since there are 165 riders in the 12th stage, the third quartile position is 498/4, or 124.5. Since this position is exactly halfway between 124 and 125, you take the value that is halfway between the 124th and 125th values. The 124th value is 1 hour, 8 minutes, and 38 seconds, while the 125th value is 1 hour, 8 minutes, and 39 seconds, so the third quartile is 1 hour, 8 minutes, and 38.5 seconds.
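The quartile positions and the halfway rule can be checked with a short Python sketch. It uses only the figures quoted on the slides (165 riders, and the 124th and 125th times); the helper names `to_seconds` and `quartile_positions` are illustrative.

```python
def to_seconds(hours, minutes, seconds):
    """Convert a finishing time to seconds so it can be used in arithmetic."""
    return hours * 3600 + minutes * 60 + seconds


def quartile_positions(n):
    """1-based positions of Q1 and Q3 in an ordered list of n values."""
    return (n + 1) / 4, 3 * (n + 1) / 4


q1_pos, q3_pos = quartile_positions(165)

# Position 124.5 lies halfway between the 124th and 125th ordered values,
# so Q3 is the value halfway between those two times.
t124 = to_seconds(1, 8, 38)
t125 = to_seconds(1, 8, 39)
q3_seconds = (t124 + t125) / 2

print(q1_pos, q3_pos, q3_seconds)
```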
Measures of Variation
Measures of Variation
The range is the difference between the smallest and largest values. Again, the list must be ordered. The smallest value in the 12th stage is 58 minutes, 32 seconds, and the largest value is 1 hour, 11 minutes, and 19 seconds, so the range is 12 minutes and 47 seconds.
Measures of Variation
The interquartile range is the difference between the third quartile and the first quartile. The first quartile is 1 hour, 5 minutes, and 45 seconds. The third quartile is 1 hour, 8 minutes, and 38.5 seconds. So, the interquartile range is 1 hour, 8 minutes, and 38.5 seconds minus 1 hour, 5 minutes, and 45 seconds, which equals 2 minutes and 53.5 seconds.
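The range and interquartile range quoted above can be reproduced by converting each time to seconds first. This is a minimal sketch using only the figures from the slides; the helper names are illustrative.

```python
def to_seconds(hours, minutes, seconds):
    """Convert a finishing time to seconds."""
    return hours * 3600 + minutes * 60 + seconds


def minutes_and_seconds(total_seconds):
    """Split a duration in seconds into (whole minutes, remaining seconds)."""
    return divmod(total_seconds, 60)


# Range: largest value minus smallest value.
range_seconds = to_seconds(1, 11, 19) - to_seconds(0, 58, 32)

# Interquartile range: third quartile minus first quartile.
iqr_seconds = to_seconds(1, 8, 38.5) - to_seconds(1, 5, 45)

print(minutes_and_seconds(range_seconds))  # 12 minutes, 47 seconds
print(minutes_and_seconds(iqr_seconds))    # 2 minutes, 53.5 seconds
```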
Measures of Variation
The sample variance is the sum of the squared differences around the mean divided by the sample size minus 1. The symbol designating the sample variance is $S^2$, while the symbol designating the population variance is $\sigma^2$. Here is the sample variance equation:
$$S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$$
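The sample variance formula translates directly into Python. The data below are hypothetical numbers chosen for easy arithmetic, not rider times, and `sample_variance` is an illustrative name; the standard library's `statistics.variance` computes the same quantity and is used here only as a cross-check.

```python
import statistics


def sample_variance(values):
    """Sum of squared differences around the mean, divided by n - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("sample variance needs at least two values")
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)


# Hypothetical data (NOT the Stage 12 times).
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(sample_variance(data))
print(statistics.variance(data))  # should agree with the line above
```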
Measures of Variation
The variance for the 12th stage of the 2003 Tour de France is reported as 5 minutes and 14 seconds. The variance shows how the data fluctuate around the mean. Remember that the variance results in squared values, so this figure is in squared units (about 5.23 squared minutes) and is not an elapsed time.
Measures of Variation
So, a more useful measure of variation is the standard deviation, which is simply the square root of the variance and is expressed in the same units as the data. The standard deviation of the 12th stage of the Tour de France is 2 minutes and 17 seconds. The majority of the racers' times were within 2 minutes and 17 seconds of the mean: 112 riders, or 68% of the 165 riders, came within one standard deviation of the mean.
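The "within one standard deviation" interval can be worked out from the reported mean (1 hour, 7 minutes, 6 seconds) and standard deviation (2 minutes, 17 seconds). The sketch below is illustrative only; `count_within_one_sd` is a hypothetical helper, and the short list passed to it is made up because the full Stage 12 data are not reproduced here.

```python
def as_hms(total_seconds):
    """Split seconds into (hours, minutes, seconds)."""
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return hours, minutes, seconds


def count_within_one_sd(values, mean, sd):
    """Count the values that lie within one standard deviation of the mean."""
    return sum(1 for v in values if mean - sd <= v <= mean + sd)


mean_s = 1 * 3600 + 7 * 60 + 6   # reported mean, in seconds
sd_s = 2 * 60 + 17               # reported standard deviation, in seconds

low, high = mean_s - sd_s, mean_s + sd_s
print(as_hms(low), as_hms(high))  # bounds of the one-standard-deviation interval

# Share of riders reported inside that interval: 112 of 165.
share_within = 112 / 165
print(round(share_within * 100, 1))

# Hypothetical times in seconds, only to show the counting helper.
print(count_within_one_sd([3800, 3900, 4026, 4100, 4200], mean_s, sd_s))
```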
Measures of Variation
Another measure of variation is the coefficient of variation. This measure is simply the standard deviation divided by the mean, expressed as a percentage. The coefficient of variation for the 12th stage of the 2003 Tour de France is 3.40%. This low percentage indicates that the relative size of the spread around the mean is very low.
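The 3.40% figure follows from the reported mean and standard deviation once both are expressed in seconds. A minimal sketch, with `coefficient_of_variation` as an illustrative name:

```python
def coefficient_of_variation(sd, mean):
    """Standard deviation divided by the mean, expressed as a percentage."""
    return sd / mean * 100


mean_s = 1 * 3600 + 7 * 60 + 6   # 1 hour, 7 minutes, 6 seconds
sd_s = 2 * 60 + 17               # 2 minutes, 17 seconds

cv_percent = coefficient_of_variation(sd_s, mean_s)
print(f"{cv_percent:.2f}%")
```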
Probability
Probability
Probability is expressed as a fraction, decimal, or percentage. The denominator of the fraction is the total number of items in the population under consideration. The numerator of the fraction is the number of items in that population that have the characteristic the researcher wants to find. For example, the probability that a randomly selected rider in the 12th stage of the Tour de France is French is 12/55: there are 36 Frenchmen among the 165 riders in the 12th stage, and 36/165 reduces to 12/55. Express fractions in reduced terms.
Probability
The easiest way to picture probability is through tables.
Table: Riders in Stage 12 of the 2003 Tour de France
Probability
Simple probability, also called marginal probability, uses the colored cells. For example, 36/165, or 12 out of 55, of the riders in stage 12 of the Tour de France are French.
Probability
Mathematically, you express simple probability as P(A), meaning the probability that A will occur. For example, P(rider is French) = 12/55.
Probability
Riders in the 12th stage of the 2003 Tour de France. Joint probability involves two events. For example, 1/165 of the racers (or 0.61%) are Frenchmen who finished in the top 20 of the 12th stage.
Probability
You express joint probability as P(A and B) and read it as the probability that both A and B will occur. For example, P(rider is French and rider finished in the top 20) = 1/165.
Probability
The general addition rule for probability is P(A or B) = P(A) + P(B) - P(A and B). For example, P(rider finished in the top 20 or rider is French) = 20/165 + 36/165 - 1/165 = 55/165 = 1/3 (reduced).
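The simple, joint, and general-addition probabilities above can be verified with exact fractions. This sketch uses Python's `fractions.Fraction`, which reduces fractions automatically; the variable names are illustrative and the counts are the ones quoted on the slides.

```python
from fractions import Fraction

total_riders = 165
french_riders = 36
top_20_riders = 20
french_in_top_20 = 1

# Simple (marginal) probabilities.
p_french = Fraction(french_riders, total_riders)
p_top_20 = Fraction(top_20_riders, total_riders)

# Joint probability: French AND in the top 20.
p_both = Fraction(french_in_top_20, total_riders)

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
p_either = p_french + p_top_20 - p_both

print(p_french, p_both, p_either)
```

Subtracting the joint probability keeps the one French rider in the top 20 from being counted twice.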
Normal Distribution
Normal Distribution
Calculating probabilities of continuous random variables: You calculate the area under the distribution curve to find the probability that an event will occur. There are infinitely many normal curves, and the mathematical equation for calculating the area is cumbersome (see formula 7-4 on page 217). So, a standardized curve was created.
Normal Distribution
This picture is taken from http://www.economics.soton.ac.uk/courses/ec106/Standardized%20Normal%20Distribution%20Tables.htm
Normal Distribution
Properties of the Normal Distribution
• It is bell-shaped (and, thus, symmetrical) in appearance.
• Mean = median = mode.
• Its "middle spread" (interquartile range) is equal to 1.33 standard deviations.
• Its associated random variable has an infinite range (continuous, not discrete).
Normal Distribution
Use the transformation equation and the table of Z values to find the probability associated with a value X.
Transformation equation: $Z = (X - \mu) / \sigma$, where $\mu$ is the mean and $\sigma$ is the standard deviation.
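The transformation and the table lookup can both be sketched in Python. The table in this deck lists the area between the mean (Z = 0) and a given Z; the function `area_mean_to_z` below computes that same area from the error function `math.erf` instead of reading it from a printed table. Both function names are illustrative, and the example input uses the mean and standard deviation reported earlier in the deck (4026 and 137 seconds).

```python
import math


def z_score(x, mu, sigma):
    """Transformation equation: Z = (X - mu) / sigma."""
    return (x - mu) / sigma


def area_mean_to_z(z):
    """Area under the standard normal curve between 0 and |z|.

    This corresponds to the entry a mean-to-Z table gives for that Z.
    """
    return 0.5 * math.erf(abs(z) / math.sqrt(2))


# 1 hour, 9 minutes, 23 seconds is 4163 seconds.
z = z_score(4163, 4026, 137)
print(z)
print(round(area_mean_to_z(z), 4))
```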
Normal Distribution
Pretend that the probability density function for the 12th stage of the Tour de France is normal. (It is actually slightly skewed to the left.) Remember that the mean is 1 hour, 7 minutes, 6 seconds, and the standard deviation is 2 minutes and 17 seconds.
Normal Distribution
What is the probability that a rider will take less than 1 hour, 9 minutes, and 23 seconds to finish the race?
Z = (1 hour, 9 minutes, 23 seconds - 1 hour, 7 minutes, 6 seconds) / 2 minutes, 17 seconds
Z = 1.00
Check the Z table for the intersection of row 1.0 with column .00. The intersection number is .3413. This is the area between the mean and Z. We add .5000 (the area to the left of the mean) to this number to get the total area to the left of Z, so there is an 84.13% chance that a rider will finish in less than 1 hour, 9 minutes, and 23 seconds.
Normal Distribution
What is the probability that a rider will take more than 1 hour to finish the race?
Z = (1 hour - 1 hour, 7 minutes, 6 seconds) / 2 minutes, 17 seconds
Z = -3.11 (the Z value is to the left of the mean)
Check the Z table for the intersection of row 3.1 with column .01 (not all tables extend this far). The intersection number is .49906. This is the area under the curve between the Z value (Z = -3.11) and the mean (Z = 0). The probability is 49.906% that a rider will take more than 1 hour but less than 1 hour, 7 minutes, 6 seconds to finish stage 12. There is also a 50% probability that a rider will complete stage 12 in more than 1 hour, 7 minutes, 6 seconds. So, the probability that the rider will finish in more than 1 hour is the sum of 49.906% and 50%: 99.906%.
Normal Distribution
What is the probability that a rider will take more than 1 hour but less than 1 hour, 8 minutes to finish the race?
Calculate the probability that the rider will take less than 1 hour, 8 minutes to finish, and then subtract the probability that the rider will take less than 1 hour to finish to find the appropriate area. The probability that a rider will take less than 1 hour to finish is 0.094% (100% - 99.906%).
Z = (1 hour, 8 minutes - 1 hour, 7 minutes, 6 seconds) / 2 minutes, 17 seconds
Z = 0.39
Look at the Z table for the intersection of row 0.3 with column .09. The intersection number is .1517; add .5000 to this to get .6517. Now subtract 0.094% from 65.17% to get 65.076%.
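The three worked examples can be cross-checked without a printed table by using `statistics.NormalDist` from the Python standard library (Python 3.8 and later). This is a sketch under the same assumption the slides make, namely that the stage times are treated as normal with the reported mean and standard deviation. Because the code does not round Z to two decimals, the results can differ slightly from the table-based answers, most visibly in the third example.

```python
from statistics import NormalDist


def to_seconds(hours, minutes, seconds):
    """Convert a finishing time to seconds."""
    return hours * 3600 + minutes * 60 + seconds


# Assumed normal model built from the reported mean and standard deviation.
stage = NormalDist(mu=to_seconds(1, 7, 6), sigma=to_seconds(0, 2, 17))

# P(time < 1:09:23): area to the left of the value.
p_under_1_09_23 = stage.cdf(to_seconds(1, 9, 23))

# P(time > 1:00:00): one minus the area to the left.
p_over_1_hour = 1 - stage.cdf(to_seconds(1, 0, 0))

# P(1:00:00 < time < 1:08:00): difference of two left-hand areas.
p_between = stage.cdf(to_seconds(1, 8, 0)) - stage.cdf(to_seconds(1, 0, 0))

for label, p in [("under 1:09:23", p_under_1_09_23),
                 ("over 1:00:00", p_over_1_hour),
                 ("between 1:00:00 and 1:08:00", p_between)]:
    print(f"{label}: {p:.3%}")
```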
Tour de France 2003 data
Stage 12 Data