Chapter 5 Statistical Reasoning
5.1 – exploring data Chapter 5
Data set The payrolls for three small companies are shown in the table. Figures include year-end bonuses. Each company has 15 employees. Sanela wonders if the companies have similar "average" salaries. • What is the range of salaries for each company? • Media Focus Advertising (Company A): 245 000 – 27 300 = 217 700 • Computer Rescue (Company B): 362 000 – 52 300 = 309 700 • Auto Value Sales (Company C): 97 500 – 55 250 = 42 250 Are there any outliers for any of the companies?
Measures of central tendency What are the measures of central tendency (mean, median, and mode)? • Mean: • Company A: • 27 300 + 28 500 + 33 400 + 36 200 + 39 500 + 42 500 + 47 400 + 57 500 + 61 000 + 61 000 + 65 000 + 71 000 + 86 000 + 162 000 + 245 000 = 1 063 300 • 1 063 300/15 ≈ 70 887 • Company B: • 52 300 + 52 700 + 53 100 + 53 800 + 55 200 + 55 900 + 56 500 + 59 000 + 59 200 + 62 500 + 63 000 + 96 500 + 96 500 + 112 000 + 362 000 = 1 290 200 • 1 290 200/15 ≈ 86 013 • Company C: • 55 250 + 55 250 + 56 900 + 57 300 + 57 900 + 58 200 + 58 300 + 58 900 + 61 500 + 62 300 + 62 800 + 62 800 + 64 400 + 66 900 + 97 500 = 936 200 • 936 200/15 ≈ 62 413
Median and mode What are the measures of central tendency (mean, median, and mode)? Median: the middle term when the data is arranged in order (with 15 salaries, the 8th term) • Company A: 57 500 • Company B: 59 000 • Company C: 58 900 Mode: the value that appears most often • Company A: 61 000 • Company B: 96 500 • Company C: 55 250 and 62 800 (each appears twice, so the data is bimodal) Mean: • Company A: 70 887 • Company B: 86 013 • Company C: 62 413
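The range, mean, median, and mode can be cross-checked with Python's standard `statistics` module. This is a minimal sketch that assumes Python 3.8 or later, which `multimode` needs. The salary lists are copied from the worked sums above. `multimode` is used so that ties are all reported, such as the two modes for Company C.

```python
from statistics import mean, median, multimode

# Payroll data from the table (Company A = Media Focus Advertising,
# B = Computer Rescue, C = Auto Value Sales).
payrolls = {
    "A": [27300, 28500, 33400, 36200, 39500, 42500, 47400, 57500,
          61000, 61000, 65000, 71000, 86000, 162000, 245000],
    "B": [52300, 52700, 53100, 53800, 55200, 55900, 56500, 59000,
          59200, 62500, 63000, 96500, 96500, 112000, 362000],
    "C": [55250, 55250, 56900, 57300, 57900, 58200, 58300, 58900,
          61500, 62300, 62800, 62800, 64400, 66900, 97500],
}


def summarize(salaries):
    """Return the range and the three measures of central tendency."""
    return {
        "range": max(salaries) - min(salaries),
        "mean": round(mean(salaries)),   # rounded to the nearest dollar
        "median": median(salaries),      # middle (8th) value of 15
        "mode": multimode(salaries),     # every value tied for most frequent
    }


for company, salaries in payrolls.items():
    print(company, summarize(salaries))
```

Notice how far each company's mean sits above its median. In Companies A and B, a few very large salaries (245 000 and 362 000) pull the mean upward, while the median stays near the typical salary.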
Paulo needs a new battery for his car. He is trying to decide between two different brands. Both brands are the same price. He obtains data for the lifespan, in years, of 30 batteries of each brand, as shown below. • What's the range? • Brand X: 3.1 to 8.2, so the range is 8.2 – 3.1 = 5.1 years • Brand Y: 4.5 to 7.0, so the range is 7.0 – 4.5 = 2.5 years Which brand would you choose? Range is a measure of dispersion. Dispersion is a measure of how spread out the data in a set is. It has a value of zero if all of the data in a set is identical, and it increases as the data becomes more spread out. • What's the mode? • Brand X: 5.7 • Brand Y: 5.6, 5.7, 5.8, 5.9, 6.0, 6.8 Is the mode useful in this comparison?
handout Today we will be examining sets of data. Answer all of the questions in the handout to the best of your ability, because this is a summative assessment.
5.2 – Frequency Tables, histograms, and frequency polygons Chapter 5
example The following data represents the flow rates of the Red River from 1950 to 1999, as recorded at the Redwood Bridge in Winnipeg, Manitoba.
Example (continued) Determine the water flow rate that is associated with serious flooding by creating a frequency distribution. A frequency distribution is a set of intervals (shown in a table or graph), usually of equal width, into which data is organized. Each interval is associated with a frequency that indicates the number of measurements in that interval. • What's the lowest water flow rate? • 159 m³/s • What's the highest? • 4587 m³/s The range is 4587 – 159 = 4428. We can divide this into 10 equal parts: 4428/10 = 442.8. Rounding up to a convenient number, let's make the interval width 450 m³/s, starting the first interval at 150.
Example (continued) Instead, you could have created a histogram. A histogram is the graph of a frequency distribution. Equal intervals of values are marked on a horizontal axis, and the frequency of each interval is shown by the area of the rectangle drawn for it. We know that there were nine floods. Based on the histogram, the flow rate was greater than 1950 m³/s in only 12 years. These 12 years must include the nine flood years, so we could predict that floods occur when the flow rate is greater than 1950 m³/s.
Example (continued) Or, we could create a frequency polygon. A frequency polygon is the graph of a frequency distribution, produced by joining the midpoints of the intervals with straight lines. Most of the data is in the first four intervals, and the most common water flow is between 1500 and 1950 m³/s. After this, the frequencies drop off dramatically. There were six years where the flow rate was around 2625, 3075, or 4425 m³/s. These must have been flood years. The other three floods should have occurred when the flow rate was around 2175 m³/s. Assuming that the flow rate in three of those years was 2175 m³/s or greater, floods should occur when the flow rate is 2175 m³/s or greater.
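Building the frequency distribution by hand is mechanical, so here is a minimal Python sketch of the grouping step. The intervals are an assumption: width 450 starting at 150. That choice reproduces the midpoints quoted for the frequency polygon (2175, 2625, 3075, and 4425 m³/s). Each interval includes its lower bound and excludes its upper bound. The function name `frequency_distribution` and the sample values are hypothetical.

```python
def frequency_distribution(data, start, width, num_intervals):
    """Group data into equal-width intervals [low, high).

    Returns a list of (low, high, midpoint, frequency) tuples; the
    midpoints are the points a frequency polygon joins.
    """
    counts = [0] * num_intervals
    for x in data:
        index = int((x - start) // width)
        if not 0 <= index < num_intervals:
            raise ValueError(f"{x} lies outside the intervals")
        counts[index] += 1
    table = []
    for k, count in enumerate(counts):
        low = start + k * width
        high = low + width
        table.append((low, high, (low + high) / 2, count))
    return table


# Hypothetical flow rates (m³/s), for illustration only; the real
# Red River data is in the original table.
sample = [159, 1600, 1700, 2200, 4587]
for low, high, mid, freq in frequency_distribution(sample, 150, 450, 10):
    print(f"{low}-{high} (midpoint {mid}): {freq}")
```

With these bounds, the lowest (159) and highest (4587) recorded flow rates both fall inside the ten intervals.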
example The magnitude of an earthquake is measured using the Richter scale. Examine the histograms for the frequency of earthquake magnitudes in Canada from 2005 to 2009. Which of these years could have had the most damage from earthquakes?
Example (continued) We could use a frequency table: 2008 had the greatest number of earthquakes with the potential for minor damage. Four of the years had three earthquakes with magnitudes from 5.0 to 5.9, while 2008 had five earthquakes with these magnitudes. Therefore, 2008 could have had the most damage from earthquakes.
Example (continued) We could have made a frequency polygon. Both 2008 and 2009 had the strongest earthquakes, registering from 6.0 to 6.9 on the Richter scale. The number of earthquakes in the three highest intervals was greater in 2008 than in 2009, so 2008 could have had the most damage from earthquakes.
Pg. 249-253, # 1, 3, 5, 6, 8, 10, 11, 12 Independent Practice
5.3 – standard deviation Chapter 5
Using Excel to find mean and standard deviation We will go over how to find the mean and standard deviation using Excel. It's important that you pay close attention as I go through the steps, since you will need this information to complete today's summative assessment.
Handout Today we will be finding the mean and the standard deviation of a set of data. Answer all of the questions in the handout to the best of your ability, because this is a summative assessment.
Standard deviation Standard deviation is a measure of dispersion: it describes how the data in a set is spread out around the mean. The mean, $\bar{x}$, can be expressed as $\bar{x} = \frac{\sum x}{n}$, where $x$ is each term in a set of data, and $n$ is the number of terms in the data. Standard deviation, σ, can be expressed as $\sigma = \sqrt{\frac{\sum (x - \bar{x})^2}{n}}$. What does Σ mean?
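The two formulas translate almost word for word into code. This is a minimal Python sketch with illustrative data. It divides by n, which matches σ (σx in the TI-84's 1-Var Stats output). The calculator also shows Sx, which divides by n − 1 instead.

```python
from math import sqrt


def mean(data):
    """x̄ = Σx / n"""
    return sum(data) / len(data)


def std_dev(data):
    """σ = sqrt(Σ(x − x̄)² / n), the population standard deviation."""
    x_bar = mean(data)
    return sqrt(sum((x - x_bar) ** 2 for x in data) / len(data))


data = [2, 4, 4, 4, 5, 5, 7, 9]   # illustrative values
print(mean(data), std_dev(data))  # 5.0 2.0
```

Here Σ(x − x̄)² = 9 + 1 + 1 + 1 + 0 + 0 + 4 + 16 = 32. Dividing by n = 8 gives 4, and the square root of 4 is 2.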
example Brendan works part-time in the canteen at his local community centre. One of his tasks is to unload delivery trucks. He wondered about the accuracy of the mass measurements given on two cartons that contained sunflower seeds. He decided to measure the masses of the 20 bags in the two cartons. One carton contained 227 g bags, and the other carton contained 454 g bags. How can measures of dispersion be used to determine if the accuracy of measurement is the same for both bag sizes?
Example (continued) We can find the mean, $\bar{x}$, and standard deviation, σ, using our calculator. Press STAT and choose EDIT, then enter all of the terms into L1. Next, press STAT, move to CALC, and choose 1-Var Stats. The calculator then gives us all of the summary information about the data set. The accuracy is not the same for both bag sizes: the 454 g bags are more accurate. Try it with these two sets: Set 1: Set 2:
example Angele conducted a survey to determine the number of hours per week that Grade 11 males in her school play video games. She determined that the mean was 12.84 h, with a standard deviation of 2.16 h. Janessa conducted a similar survey of Grade 11 females in her school and organized her results in this frequency table. Compare the results. First, we need to use the midpoint of each Hours interval, since we can't enter a range into the calculator. Midpoints: 4, 6, 8, 10, 12, 14
For grouped data like this, there is an extra step. Enter the Hours midpoints into L1 and the Frequency values into L2. Then press STAT, move to CALC, and choose 1-Var Stats. Type L1 (2nd 1), a comma, and L2 (2nd 2), and press ENTER. For the girls, we find: Compared to the boys: Boys have a higher mean (they play more video games), and they have a lower standard deviation (so they're more consistent than the girls).
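Entering L1 and L2 together tells the calculator to count each midpoint as many times as its frequency. Here is a minimal Python sketch of the same grouped-data calculation, dividing by n as σ does. The frequencies below are hypothetical, because Janessa's actual table is not reproduced here.

```python
from math import sqrt


def grouped_stats(midpoints, frequencies):
    """Mean and σ for grouped data: each midpoint counts `frequency` times."""
    n = sum(frequencies)
    x_bar = sum(m * f for m, f in zip(midpoints, frequencies)) / n
    variance = sum(f * (m - x_bar) ** 2
                   for m, f in zip(midpoints, frequencies)) / n
    return x_bar, sqrt(variance)


midpoints = [4, 6, 8, 10, 12, 14]
frequencies = [1, 2, 3, 3, 2, 1]   # hypothetical, not Janessa's data
x_bar, sigma = grouped_stats(midpoints, frequencies)
print(f"mean = {x_bar:.2f} h, standard deviation = {sigma:.2f} h")
```

Because midpoints stand in for every value in an interval, grouped results are estimates of the true mean and standard deviation.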
Pg. 261-265, # 2, 4, 5, 7, 9, 10, 12, 13, 14 Independent Practice
5.4 – The normal distribution Chapter 5
Normal distribution If we rolled a single die 50 000 times, what do you think the graph would look like? What about two dice? In pairs, roll two dice 50 times: one person rolls, and the other keeps a frequency diagram of the sums.
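Rolling two dice 50 000 times by hand isn't practical, so here is a small Python simulation sketch. The seed and the text-histogram scale (one `#` per 250 rolls) are arbitrary choices. A single die gives a roughly flat graph. The sum of two dice peaks at 7 and tapers off symmetrically on both sides. That shape is actually triangular rather than truly normal, but it shows the symmetric, single-peaked pattern.

```python
import random
from collections import Counter


def simulate_two_dice(rolls, seed=None):
    """Roll two fair dice `rolls` times; return a Counter of the sums."""
    rng = random.Random(seed)
    return Counter(rng.randint(1, 6) + rng.randint(1, 6) for _ in range(rolls))


def exact_two_dice():
    """Count how many of the 36 equally likely outcomes give each sum."""
    return Counter(a + b for a in range(1, 7) for b in range(1, 7))


results = simulate_two_dice(50_000, seed=11)
for total in range(2, 13):
    # One '#' per 250 rolls gives a rough text histogram.
    print(f"{total:2d} {'#' * (results[total] // 250)}")
```

Comparing the simulation to `exact_two_dice()` shows why 7 is most common: 6 of the 36 outcomes sum to 7, but only 1 sums to 2 or to 12.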
Normal distribution A normal curve (also called a bell curve) is a symmetrical curve that represents the normal distribution. A normal distribution is data that, when graphed as a histogram or a frequency polygon, results in a unimodal distribution that is symmetric about the mean.
example Heidi is opening a new snowboard shop near a local ski resort. She knows that the recommended length of a snowboard is related to a person's height. Her research shows that most of the snowboarders who visit this resort are males, 20 to 39 years old. To ensure that she stocks the most popular snowboard lengths, she collects height data for 1000 Canadian men, 20 to 39 years old. How can she use the data to help her stock her store with boards that are the appropriate lengths? First, enter the data into your calculator. Remember to use midpoints for the height intervals (e.g., 60.5, 61.5, etc.).
example • Heights within one standard deviation of the mean: • 69.52 – 2.98 = 66.54 ≈ 66.5 inches • 69.52 + 2.98 = 72.50 inches • Heights within two standard deviations of the mean: • 69.52 – 2(2.98) = 63.56, so use the interval boundary 63.5 inches • 69.52 + 2(2.98) = 75.48 ≈ 75.5 inches Number of men within one standard deviation (from 66.5 to 72.5 in): 116 + 128 + 147 + 129 + 115 + 63 = 698 men Number of men within two standard deviations (from 63.5 to 75.5 in): 30 + 52 + 64 + 116 + 128 + 147 + 129 + 115 + 63 + 53 + 29 + 20 = 946 men She surveyed 1000 men, so 698/1000, or 69.8%, of the men are 66.5 to 72.5 inches tall, while 946/1000, or 94.6%, are 63.5 to 75.5 inches tall.
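The bounds and percentages can be checked with a few lines of Python. This is a sketch that uses only the frequencies listed in the worked example. Adding the twelve listed frequencies gives 946, which corresponds to 94.6%. Both results sit close to the 68% and 95% expected for a normal distribution.

```python
mean, sd = 69.52, 2.98
surveyed = 1000


def bounds(k):
    """Heights k standard deviations below and above the mean."""
    return mean - k * sd, mean + k * sd


# Frequencies copied from the worked example (one-inch intervals).
within_one = [116, 128, 147, 129, 115, 63]
within_two = [30, 52, 64] + within_one + [53, 29, 20]

for k, counts in ((1, within_one), (2, within_two)):
    low, high = bounds(k)
    share = sum(counts) / surveyed
    print(f"{low:.2f} to {high:.2f} in: {sum(counts)} men ({share:.1%})")
```

The exact bounds (66.54 to 72.50 and 63.56 to 75.48) are rounded to the nearest interval boundaries (66.5, 72.5, 63.5, 75.5) before counting.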
example Jim raises Siberian husky sled dogs at his kennel. He knows, from the data he has collected over the years, that the weights of adult male dogs are normally distributed, with a mean of 52.5 lbs and a standard deviation of 2.4 lbs. Jim used this information to sketch a normal curve, with: • 68% of the data within one standard deviation of the mean • 95% of the data within two standard deviations of the mean • 99.7% of the data within three standard deviations of the mean • What percent of dogs at Jim's kennel would you expect to have a weight between 47.7 lbs and 54.9 lbs?
Example (continued) • We want to find the percentage of dogs that weigh between 47.7 lbs and 54.9 lbs. • Shade in the area on the provided graph. • We can see that 68% of dogs will weigh between 50.1 and 54.9 lbs (within one standard deviation of the mean), so we just need to know what percent falls between 47.7 and 50.1. • How can we figure it out? • What percent falls between 47.7 and 52.5? • 95/2 = 47.5% • What percent is between 50.1 and 52.5? • 68/2 = 34% • So, what percent is left between 47.7 and 50.1? • 47.5% – 34% = 13.5% So, 68% + 13.5% = 81.5% of the dogs weigh between 47.7 lbs and 54.9 lbs.
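The "split the curve into halves" reasoning can be sketched in Python. Each whole number of standard deviations maps to the area between it and the mean: half of 68%, 95%, and 99.7%. The function below assumes both bounds sit exactly a whole number of standard deviations (at most 3) from the mean, as in Jim's problem. It raises an error otherwise. The function name is hypothetical.

```python
# Percent of the area between the mean and k standard deviations,
# from the 68-95-99.7 rule: 68/2, 95/2, 99.7/2.
HALF_AREA = {0: 0.0, 1: 34.0, 2: 47.5, 3: 49.85}


def empirical_percent_between(low, high, mu, sigma):
    """Approximate % of a normal distribution between low and high.

    Assumes both bounds are a whole number (at most 3) of standard
    deviations from the mean, as the 68-95-99.7 rule requires.
    """
    def signed_area(x):
        raw = (x - mu) / sigma
        k = round(raw)
        if abs(raw - k) > 1e-6 or abs(k) > 3:
            raise ValueError(f"{x} is not a whole number (<= 3) of SDs from the mean")
        area = HALF_AREA[abs(k)]
        return area if k >= 0 else -area   # below the mean counts as negative

    return signed_area(high) - signed_area(low)


print(empirical_percent_between(47.7, 54.9, mu=52.5, sigma=2.4))  # 81.5
```

Here 47.7 is two standard deviations below the mean (−47.5) and 54.9 is one above (+34). The difference, 34 − (−47.5), is the 81.5% found above.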
example Sometimes, we use the symbol μ to represent the mean. We can make a histogram of this data on our calculator to check whether it is normally distributed. First, enter all of the data into a STAT list. Then use 1-Var Stats to find that: μ = 2.526 σ = 0.482 Median = 2.55 (scroll down in the 1-Var Stats output) When the median and the mean are close, that suggests the data may be normally distributed.
Example Making a histogram on your calculator: Press 2nd Y= (STAT PLOT), then press ENTER to open Plot1, and switch it to ON. Use the arrows to navigate to the histogram picture, and press ENTER. Press ZOOM 9 (ZoomStat) to draw the graph. It's also helpful to change the x-scale (Xscl) under WINDOW to the standard deviation (0.482, in this case), then press GRAPH.
example Now that we have a graph, we can see that it looks like it's probably normally distributed. However, to make sure, we need to check that it has the right percents. Normally distributed data has approximately 68% of the data within one standard deviation of the mean, approximately 95% within two standard deviations, and approximately 99.7% within three. These are important to remember. We can check this on the calculator using the TRACE function. On the graph screen, press TRACE, and then use the arrow keys to find the number of terms in each bar. • In the two tallest bars, we have: • n = 18 • n = 17 • These bars are within one standard deviation, so 18 + 17 = 35 terms are within one standard deviation. • 35/50 = 70% are within one standard deviation. • In the next two bars, we have: • n = 7 and n = 6 • 35 + 7 + 6 = 48 terms are within two standard deviations • 48/50 = 96% • In the last two bars, we have: • n = 1 and n = 1 • 48 + 2 = 50 • 50/50 = 100% are within three standard deviations 70%, 96%, and 100% are close to the percents for a normal distribution, so we can say that this data approximates a normal distribution.
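The same check can be done directly on the data values rather than on histogram bars. Here is a minimal Python sketch (Python 3.8 or later). It counts the fraction of values within 1, 2, and 3 population standard deviations of the mean. Bar counts depend on where the calculator places the bar edges, so the two methods can differ slightly. The test values are illustrative, not the cellphone data.

```python
from statistics import mean, pstdev


def empirical_rule_check(data):
    """Fraction of values within 1, 2 and 3 standard deviations of the mean."""
    mu, sigma = mean(data), pstdev(data)
    n = len(data)
    return [sum(1 for x in data if abs(x - mu) <= k * sigma) / n
            for k in (1, 2, 3)]


# Compare the result with roughly [0.68, 0.95, 0.997] for normal data.
print(empirical_rule_check([1, 2, 3, 4, 5]))
```

For the small illustrative list, only 60% of the values fall within one standard deviation. That is a reminder that a handful of values rarely matches the 68-95-99.7 pattern closely.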
Example (continued) b) If Shirley purchases this cellphone, what is the likelihood that it will last for more than three years? Sketch a frequency diagram, and show one standard deviation (0.482) above the mean (2.526): 2.526 + 0.482 = 3.008, which is about 3 years. • Using the normal distribution percents (68%, 95%, 99.7%), we know that 50% of the data should fall below the mean of 2.526. • 68%/2 = 34%, so approximately 34% should fall within one standard deviation above the mean. • So, approximately what percent would last longer than 3 years? • 100% – 50% – 34% = 16% • There is about a 16% chance that her cellphone will last more than three years.
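The 16% answer treats 3 years as exactly one standard deviation above the mean. To see how close that approximation is, Python's `statistics.NormalDist` (Python 3.8 or later) can compute the area to the right of 3 exactly. This is a sketch that assumes the lifespans really follow a normal distribution with μ = 2.526 and σ = 0.482.

```python
from statistics import NormalDist

lifespan = NormalDist(mu=2.526, sigma=0.482)
p_more_than_3 = 1 - lifespan.cdf(3)   # area to the right of 3 years
print(f"{p_more_than_3:.1%}")
```

The exact area is about 16.3%, so the 68-95-99.7 estimate of 16% is a good approximation here.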
Pg. 279-282, # 1, 3, 4-7, 9, 12-16 Independent Practice
5.5 – Z-SCORES Chapter 5
Z-scores A z-score is a standardized value that indicates the number of standard deviations a data value is above or below the mean. It tells you how far a value is from the mean: the larger the z-score is in absolute value, the farther the value is from the mean. A positive z-score is above the mean, and a negative z-score is below it. The standard normal distribution is a normal distribution that has a mean of zero and a standard deviation of one. When we do problems involving z-scores, we need to use the z-score tables on pages 580-581.
example Hailey and Serge belong to a running club in Vancouver. Part of their training involves a 200 m sprint. Below are normally distributed times for the 200 m sprint in Vancouver and on a recent trip to Lake Louise. At higher altitudes, run times improve. Determine at which location Hailey's run time was better, when compared with the results. For any score, x, we have x = μ + zσ, where z represents the number of standard deviations of the score from the mean. Solving for z: $z = \frac{x - \mu}{\sigma}$
Example (continued) Vancouver: Lake Louise: Hailey's time was better than the mean in both places. However, her z-score was lower for Lake Louise, so her better time, compared with the other results, was in Lake Louise. Try it: Use z-scores to figure out which of Serge's run times was better. In this case, is a score that is more negative a good or bad thing?
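The z-score formula and its inverse, x = μ + zσ, can be written as two small Python functions. This is a minimal sketch. The sprint times in the example are hypothetical placeholders, because the Vancouver and Lake Louise results are in the original table.

```python
def z_score(x, mu, sigma):
    """How many standard deviations x is above (+) or below (-) the mean."""
    return (x - mu) / sigma


def score_from_z(z, mu, sigma):
    """Invert the z-score: x = mu + z * sigma."""
    return mu + z * sigma


# Hypothetical 200 m times (seconds), not the values from the table.
print(z_score(24.0, mu=25.0, sigma=0.5))       # -2.0
print(score_from_z(-2.0, mu=25.0, sigma=0.5))  # 24.0
```

For run times, a more negative z-score means a faster run compared with the other runners at that location.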
example IQ tests are sometimes used to measure a person's intellectual capacity at a particular time. IQ scores are normally distributed, with a mean of 100 and a standard deviation of 15. If a person scores 119 on an IQ test, how does this score compare with the scores of the general population? Diagram: z = (119 – 100)/15 ≈ 1.27 Now that we have the z-score, we can find the percentile of the score, which tells you the percent of people who would score lower. The value in the z-score table for z = 1.27 is 0.8980. This means that a person who scores 119 scores higher than 89.80% of the population. We say that this person is in the 89.8th percentile.
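The z-score table lookup corresponds to the normal cumulative distribution function. Here is a minimal sketch using Python's `statistics.NormalDist` (Python 3.8 or later).

```python
from statistics import NormalDist

iq = NormalDist(mu=100, sigma=15)
z = (119 - 100) / 15
percentile = iq.cdf(119) * 100   # percent of scores below 119
print(f"z = {z:.2f}, percentile = {percentile:.1f}")
```

The unrounded z-score (1.2667) gives about 0.8974, or roughly the 89.7th percentile. The table value 0.8980 comes from rounding z to 1.27 first, so the two methods agree to within a tenth of a percent.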
example Athletes should replace their running shoes before the shoes lose their ability to absorb shock. Running shoes lose their shock absorption after a mean distance of 640 km, with a standard deviation of 160 km. Zack is an elite runner and wants to replace his shoes at a distance when only 25% of people would replace their shoes. At what distance should he replace his shoes? Diagram: What is the accompanying z-score? Now, we use the z-score table in the opposite way: look for the number closest to 0.25 in the middle of the table. 0.2500 isn't in the table, but it falls between 0.2483 and 0.2514. That means that the z-score falls between –0.68 and –0.67, so we can say that z = –0.675. Then x = μ + zσ = 640 + (–0.675)(160) = 640 – 108 = 532. He should replace his shoes after 532 km.
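Using the table "in the opposite way" is an inverse cumulative distribution lookup. Here is a minimal sketch with Python's `statistics.NormalDist.inv_cdf` (Python 3.8 or later), assuming shoe lifetimes are normal with μ = 640 km and σ = 160 km.

```python
from statistics import NormalDist

wear = NormalDist(mu=640, sigma=160)
distance = wear.inv_cdf(0.25)   # distance with 25% of values below it
print(round(distance))          # 532
```

The exact z-score for 0.25 is about −0.6745, very close to the −0.675 interpolated from the table. The distance works out to about 532.1 km.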
example The ABC Company produces bungee cords. When the manufacturing process is running well, the lengths of the bungee cords produced are normally distributed, with a mean of 45.2 cm and a standard deviation of 1.3 cm. Bungee cords that are shorter than 42.0 cm or longer than 48.0 cm are rejected by the quality control workers. If 20 000 bungee cords are manufactured each day, how many bungee cords would you expect the quality control workers to reject? Minimum = 42.0 cm: z = (42.0 – 45.2)/1.3 ≈ –2.46 Maximum = 48.0 cm: z = (48.0 – 45.2)/1.3 ≈ 2.15 Look up the percentiles for these values: for z = –2.46, the table gives 0.0069. We want to find the area to the right of the maximum, so the process is different: for z = 2.15, the area to the right is 1 – 0.9842 = 0.0158. There is 0.69% below the minimum and 1.58% above the maximum. They will reject approximately 0.69% + 1.58% = 2.27% of the bungee cords. (0.0227)(20 000) = 454 So, they would reject approximately 454 cords.
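The two tail areas can be computed in a few lines. Here is a sketch with Python's `statistics.NormalDist` (Python 3.8 or later), assuming cord lengths are normal with μ = 45.2 cm and σ = 1.3 cm.

```python
from statistics import NormalDist

cords = NormalDist(mu=45.2, sigma=1.3)
p_short = cords.cdf(42.0)       # area to the left of the minimum
p_long = 1 - cords.cdf(48.0)    # area to the right of the maximum
rejected = (p_short + p_long) * 20_000
print(round(rejected))
```

This gives about 451 cords. The table method gives 454 because it rounds both z-scores to two decimal places before looking them up.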
example A manufacturer of personal music players has determined that the mean life of the players is 32.4 months, with a standard deviation of 6.3 months. What length of warranty should be offered if the manufacturer wants to restrict repairs to less than 1.5% of all the players sold? • 1.5% = 0.015 • Find the z-score for 0.0150, using your calculator • z = –2.17 • x = μ + zσ = 32.4 + (–2.17)(6.3) ≈ 18.7 months • Rounding down, so that fewer than 1.5% of players fail within the warranty, they should offer an 18-month warranty.
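Here is a minimal Python sketch of the warranty calculation, assuming player lifetimes are normal with μ = 32.4 months and σ = 6.3 months. The cutoff is found with `NormalDist.inv_cdf` (Python 3.8 or later). It is then rounded down to a whole month, so the share of players failing under warranty stays below 1.5%.

```python
import math
from statistics import NormalDist

life = NormalDist(mu=32.4, sigma=6.3)
cutoff = life.inv_cdf(0.015)    # about 18.7 months
warranty = math.floor(cutoff)   # round down so fewer than 1.5% fail in time
print(warranty)                 # 18
```

Rounding up to 19 months would push the expected repair rate slightly above 1.5%, which is why the answer rounds down.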
Pg. 292-294, # 1-4, 6, 8, 10, 12, 13, 15, 17, 20, 23 Independent Practice
5.6 – Confidence intervals Chapter 5
Pg. 302-304, # Independent Practice