Explore applications of statistics in agriculture, biology, business, discrimination, and more with mean, median, mode, range, and box plots. Enhance your data analysis skills!
Year 1 Unit 1 Strand 2: Concept 1 Data Analysis
Statistics in your daily lives • For example: • “In a poll of 1000 U.S. Citizens, 47% were in favor of immigration reform.” • “The unemployment rate in May 2005 was 9.3%.” • “Two out of three dentists recommend Crest.”
Where am I going to use this?
• Agriculture: What varieties of wheat are best suited for the Canadian prairies? What are the best combinations of inputs (such as fertilizer and herbicides)?
• Archaeology: How old is an archaeological site? How should the overlapping claims by First Nations be resolved based upon evidence of habitation?
• Biology: How many northern spotted owls are present in the old growth forests of BC? How many black bears are there in the interior of BC? Are they endangered?
• Business: How does ICBC decide upon the price of insurance for a particular make and model of car and driver class? What type of students are most likely to default upon their student loans? Do inducements (e.g. prizes) to open bank accounts produce any long term business? Is it true that the companies most likely to be taken over are those that have been achieving poor returns? How is the consumer price index determined?
• Discrimination: How are claims of discrimination in the workforce resolved? How are claims of "equal pay for work of equal value" resolved?
• Environmental studies: What impact will a proposed industrial plant have on the surrounding ecology? Is there an increase in birth defects near nuclear power plants? Do strong electric or magnetic fields induce higher cancer rates among people living close to them?
• Fisheries: How many salmon return to spawn in the Fraser River? How many salmon should be caught?
• Forestry: How much wood is there in a forest due to be felled? When should we fell trees in order to maximize economic return?
• Genetics: Does the data support genetic theories about how various characteristics are inherited? Is there evidence for or against the theories expounded in the book The Bell Curve that certain racial groups have lower or higher IQs?
• Industry: How should inspection plans be implemented for quality control? How should processes be modified to reduce defectives?
• Medicine: What are the important risk factors for heart disease? Does drinking coffee increase your cancer risk? Does diet affect intelligence? Does Vitamin C prevent colds?
• Marketing: What makes advertisements irritating? Is an irritating ad a bad ad? Why was "New Coke" launched?
1.1 Measures of Center • Mean, median, and mode are referred to as measures of central tendency; the range measures the spread of the data. • The measures of central tendency can also be referred to as averages.
Key Terms
Data Set: 5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 8, 9, 10, 10, 10, 11, 11, 13, 18
• Mean: The average of all of the data (add all numbers together and divide by the number of data points you have). 164 ÷ 20 = 8.2. The mean is 8.2.
• Median: The middle number when the data is put IN ORDER from smallest to biggest. Note: if there are two numbers in the middle, add them together and divide by 2. (7 + 7) ÷ 2 = 7. The median is 7.
• Range: Gives the spread of the data (the highest number minus the lowest number). 18 - 5 = 13. The range is 13.
• Mode: The number you see the MOST. (There can be more than one mode or no mode at all.) The mode is 6.
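The four key terms can be checked by machine. The sketch below is one possible way to do it, using Python's standard `statistics` module on the data set from this slide; the variable names are illustrative and not part of the original lesson.

```python
from statistics import mean, median, multimode

# Data set from the Key Terms slide
data = [5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 8, 9, 10, 10, 10, 11, 11, 13, 18]

data_mean = mean(data)              # sum of the values divided by how many there are
data_median = median(data)          # middle value; averages the two middle values for an even count
data_modes = multimode(data)        # list of the most frequent value(s); can hold more than one
data_range = max(data) - min(data)  # highest number minus lowest number

print("mean:", data_mean)
print("median:", data_median)
print("mode(s):", data_modes)
print("range:", data_range)
```

`multimode` returns a list because, as the slide notes, a data set can have more than one mode.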
EX. 1 Find the mean, median, mode, and range of the data set: 29, 38, 32, 37, 29
• Mean: 165 ÷ 5 = 33 • Median: 32 • Mode: 29 • Range: 38 - 29 = 9
Ex. 2 Which set of data should you pick so that your range is 30, your median is 50, and your mode is 62?
a. 30, 43, 45, 45, 47, 50, 65, 67, 68, 70
b. 40, 42, 42, 42, 50, 50, 62, 65, 65, 70
c. 40, 41, 45, 47, 48, 50, 53, 62, 62, 70
d. 40, 41, 45, 47, 50, 50, 62, 62, 62, 70
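Ex. 2 can be worked by testing each option against the three conditions. The following is a minimal sketch of that check; the helper name `matches` and the dictionary layout are assumptions made for illustration.

```python
from statistics import median, multimode

# The four answer choices from Ex. 2
options = {
    "a": [30, 43, 45, 45, 47, 50, 65, 67, 68, 70],
    "b": [40, 42, 42, 42, 50, 50, 62, 65, 65, 70],
    "c": [40, 41, 45, 47, 48, 50, 53, 62, 62, 70],
    "d": [40, 41, 45, 47, 50, 50, 62, 62, 62, 70],
}

def matches(values, want_range=30, want_median=50, want_mode=62):
    """Return True when the set has the requested range, median, and single mode."""
    return (
        max(values) - min(values) == want_range
        and median(values) == want_median
        and multimode(values) == [want_mode]
    )

matching = [label for label, values in options.items() if matches(values)]
print(matching)
```

Only one option passes all three tests; the others fail on the range, the median, or the mode.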
Ex. 3 Brian's first six biology test scores during his senior year in high school were 91, 94, 96, 98, 100, and 100. What score could he receive on the last test to ensure his test median is 98? He could score a 98, 99, or 100.
Ex. 4 The most repeated score on the math test was 85%. What measure of central tendency does this represent? The mode.
Ex. 5 What measure of central tendency is best used in determining batting averages for baseball players? The mean.
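The answer to Ex. 3 can be confirmed by trying every possible seventh score. This sketch assumes scores are whole numbers from 0 to 100, which the slide does not state explicitly.

```python
from statistics import median

first_six = [91, 94, 96, 98, 100, 100]

# Assumption: the last test is scored as a whole number between 0 and 100.
valid_scores = [
    score for score in range(0, 101)
    if median(first_six + [score]) == 98
]
print(valid_scores)
```

With seven scores the median is the fourth value in order, so any last score of 98 or more leaves 98 in the middle position.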
1.2 Box and Whisker Plots • Box and Whisker Plots are also called Box Plots or Five Number Summaries. • There are two parts: the box and the whiskers. Let's examine them more closely.
Analyzing Box Plots [Figure: box plot with the Lower Extreme, Lower Quartile, Median, Upper Quartile, and Upper Extreme labeled]
• Box and Whisker plots divide the data into 4 groups (quartiles). • Each quartile represents 25% of the data. [Figure: box plot with its four sections numbered 1 to 4, each marked 25%]
Example 1: Use the box and whisker plot to find the following values. • Lower extreme: 58 • Upper extreme: 66 • Median: 62 • Lower Quartile: 60 • Upper Quartile: 64
Example 1: • What percent of students are taller than 62 inches? 50% • What percent of students are at least 60 inches tall? 75%
Five Number Summary • Upper Extreme (UE): The biggest number. • Lower Extreme (LE): The smallest number. • Median: the middle number • Upper Quartile (UQ): The median of the numbers bigger than THE MEDIAN. • Lower Quartile (LQ): The median of the numbers less than THE MEDIAN.
Other Key Terms • Interquartile Range (IQR): The UQ minus the LQ. Gives the size of the box. • Outlier: any number that is far away from the rest of your data
Constructing Box and Whisker Plots • Example 1: Construct a box and whisker plot for the following data: 5, 8, 10, 17, 25. [Figure: box plot labeled Lower Extreme, Lower Quartile, Median, Upper Quartile, Upper Extreme] • But what if there are more than 5 numbers?
What is a quartile? • Take your strip of paper and fold it into four. • What value do you think represents the median? • The LE and UE? • The LQ and UQ?
How to make a Box and Whisker Plot Steps: • Put the data in order from least to greatest. • Find the Upper and Lower Extreme. • Find the Median. • Find the Upper and Lower Quartile. • Make a number line. • Mark the values above the number line with a dot. • Draw a box around the quartile values. • Draw a vertical line through the median value. • Extend whiskers from each quartile to the extreme data points.
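The numeric part of these steps (ordering the data and finding the five values) can be sketched in code. The function below follows the definitions given in this unit: the quartiles are the medians of the lower and upper halves, leaving out the middle value when the count is odd. This is one convention among several; software packages often compute quartiles slightly differently. The function name and the choice of the data set from section 1.1 are illustrative.

```python
from statistics import median

def five_number_summary(values):
    """Return (LE, LQ, median, UQ, UE) using the median-of-halves method.

    Assumes at least two data points.
    """
    ordered = sorted(values)              # step 1: least to greatest
    n = len(ordered)
    lower_half = ordered[: n // 2]        # values below the median position
    upper_half = ordered[(n + 1) // 2 :]  # values above the median position
    return (
        ordered[0],          # lower extreme
        median(lower_half),  # lower quartile
        median(ordered),     # median
        median(upper_half),  # upper quartile
        ordered[-1],         # upper extreme
    )

# Data set from section 1.1, reused here for illustration
data = [5, 5, 5, 5, 6, 6, 6, 6, 6, 7, 7, 8, 9, 10, 10, 10, 11, 11, 13, 18]
summary = five_number_summary(data)
le, lq, med, uq, ue = summary
iqr = uq - lq  # interquartile range: the size of the box

print("LE, LQ, median, UQ, UE:", summary)
print("IQR:", iqr)
```

The box is then drawn from LQ to UQ with a line at the median, and the whiskers run out to LE and UE.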
EX. 2 Construct a box-and-whisker plot of the test scores shown. [Stem-and-leaf plot of the test scores: stems 4 5 6 7 8 9 10; leaves 4 6 8 1 2 3 5 5 5 8 9 9 0 2 8 8 9 2 3 0]
• LE: 44 • LQ: 72.5 • M: 78.5 • UQ: 88.5 • UE: 100
• What percentage of the scores are between 44 and 72.5? 25%
• What percentage of the scores are greater than 78.5? 50%
1.3 Statistics: Stem-and-Leaf Plots • Stem-and-Leaf Plots: A convenient method to display every piece of data by showing the digits of each number. • In a stem-and-leaf plot, the greatest common place value of the data is used to form stems. • The numbers in the next greatest place-value position are then used to form the leaves.
Stem-and-Leaf Plots • Leaf: The last digit on the right of the number. • Stem: The digit or digits that remain when the leaf is dropped. • Look at the number 84. The leaf is the last digit: the number 4. The stem is the remaining digits when the leaf is dropped: the number 8. The stem with the leaf forms the number 84. Stem | Leaf: 8 | 4 = 84
Example 1: Age of United States Presidents at their First Inauguration (through the 40th presidency):
57 61 57 57 58 57 61 54 68 51 49 64 50 48 65 52 56 46 54 49 50 47 55 54 42 51 56 55 51 54 51 60 62 43 55 56 61 52 69 64
Notice that the data (numerical facts) are numbers between 42 and 69. Create the stems by listing the numbers from 4 to 6. Make sure the leaves are in numerical order from least to greatest.
Stem | Leaf
4 | 2 3 6 7 8 9 9
5 | 0 0 1 1 1 1 2 2 4 4 4 4 5 5 5 6 6 6 7 7 7 7 8
6 | 0 1 1 1 2 4 4 5 8 9
Key: 5 | 7 means 57
It is easy to interpret or analyze information from the stem-and-leaf plot of the presidents' ages.
• How many presidents were at least 51 years old at their inauguration? 31
• What age is the youngest president to be inaugurated? 42
• What is the age of the oldest president to be inaugurated? 69
• How many presidents were 40-49 years old at their inauguration? 7
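A stem-and-leaf plot like the one for the presidents' ages can be built by splitting each number into its tens digit (stem) and ones digit (leaf). The sketch below assumes two-digit whole numbers; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict

# Ages at first inauguration, as listed in Example 1
ages = [
    57, 61, 57, 57, 58, 57, 61, 54, 68, 51,
    49, 64, 50, 48, 65, 52, 56, 46, 54, 49,
    50, 47, 55, 54, 42, 51, 56, 55, 51, 54,
    51, 60, 62, 43, 55, 56, 61, 52, 69, 64,
]

def stem_and_leaf(values):
    """Map each stem (tens digit) to its leaves (ones digits), least to greatest."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(ages)
for stem in sorted(plot):
    print(stem, "|", " ".join(str(leaf) for leaf in plot[stem]))

# The questions on this slide, answered from the same data
at_least_51 = sum(1 for age in ages if age >= 51)
in_forties = len(plot[4])
print(at_least_51, min(ages), max(ages), in_forties)
```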
Example 2: The following are scores obtained by two classes of 25 grade five students on a math test. Compare the two sets of scores by using back-to-back stem-and-leaf plots. What conclusions might you draw by studying the data displayed in this way? Class A 73 75 42 93 88 62 62 37 73 76 96 54 80 75 69 66 81 79 83 56 69 88 80 52 59 Class B 65 80 67 80 87 44 82 71 91 93 75 76 79 80 87 83 54 56 57 82 62 69 75 80 91
Class B | Stem | Class A
 | 3 | 7
4 | 4 | 2
7, 6, 4 | 5 | 2, 4, 6, 9
9, 7, 5, 2 | 6 | 2, 2, 6, 9, 9
9, 6, 5, 5, 1 | 7 | 3, 3, 5, 5, 6, 9
7, 7, 3, 2, 2, 0, 0, 0, 0 | 8 | 0, 0, 1, 3, 8, 8
3, 1, 1 | 9 | 3, 6
Key: 5 | 7 means 57. Class B leaves are read outward from the stem.
Find the mean, median, mode and range for each set of data.
• Class B: range 49 • median 79 • mode 80 • mean 74.64
• Class A: range 59 • median 73 • mode 62, 69, 73, 75, 80, 88 • mean 70.72
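The summary values for the two classes can be recomputed directly from the raw scores in Example 2. This is a minimal sketch using the standard `statistics` module; the `summarize` helper is an illustrative name.

```python
from statistics import mean, median, multimode

class_a = [73, 75, 42, 93, 88, 62, 62, 37, 73, 76, 96, 54, 80,
           75, 69, 66, 81, 79, 83, 56, 69, 88, 80, 52, 59]
class_b = [65, 80, 67, 80, 87, 44, 82, 71, 91, 93, 75, 76, 79,
           80, 87, 83, 54, 56, 57, 82, 62, 69, 75, 80, 91]

def summarize(scores):
    """Return the range, median, mode(s), and mean of a list of scores."""
    return {
        "range": max(scores) - min(scores),
        "median": median(scores),
        "modes": sorted(multimode(scores)),
        "mean": round(mean(scores), 2),
    }

summary_a = summarize(class_a)
summary_b = summarize(class_b)
print("Class A:", summary_a)
print("Class B:", summary_b)
```

Comparing the two dictionaries side by side is one way to start answering which class scored better.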
What comparisons can you make from the data in the back-to-back stem-and-leaf plot for Class B and Class A? • Which class scored better? • What does the shape of the stem and leaf plot tell you about the distribution of the data?
1.4 Histograms • The histogram is a graph that displays the data by using vertical bars of various heights to represent the frequencies.
How is a Histogram like a stem and leaf plot? Here is a stem and leaf plot for a group of 52 pennies sorted by mint year.
Draw a rectangle over each value on the horizontal axis with a height corresponding to the frequency of that value:
Mint Year of Pennies • Remove the dots, shade the rectangles, and add a vertical scale to indicate the frequency of each interval on the horizontal scale:
How to Make a Histogram Step 1: Make a Frequency Table • Put the data in order from least to greatest. • Find the range (highest number - lowest number). • Pick your intervals (all intervals must be the same width). Try to have 4-6 intervals. • Tally the data.
Histograms • Use common sense in determining the number of intervals to use. [Figures: one histogram with too many categories and one with too few categories]
Example of a Good Histogram [Figure: histogram; horizontal axis: Number of Cigarettes Smoked per Day (5, 8, 11, 14, 17, 20); vertical axis: Frequency (0 to 6)]
Example 1: • Russell asked each of the people in his mountain-climbing club how many times they had been mountain climbing. The results are shown below. Make a frequency table to show the data. 1, 4, 5, 6, 7, 8, 10, 15, 17, 21, 22, 32, 32, 37, 40, 43, 51, 55
Data: 1, 4, 5, 6, 7, 8, 10, 15, 17, 21, 22, 32, 32, 37, 40, 43, 51, 55
• Range: 55 - 1 = 54
• What would be an appropriate interval for the data? 10s
Interval | Total
0-9 | 6
10-19 | 3
20-29 | 2
30-39 | 3
40-49 | 2
50-59 | 2
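The tally step can be sketched in code. The function below groups the hiking data into equal-width intervals of 10; the function name and the way intervals are stored as (low, high) pairs are illustrative choices, and it assumes non-negative whole numbers.

```python
hikes = [1, 4, 5, 6, 7, 8, 10, 15, 17, 21, 22, 32, 32, 37, 40, 43, 51, 55]

def frequency_table(values, width):
    """Count how many values fall in each equal-width interval."""
    start = (min(values) // width) * width  # lower edge of the first interval
    counts = {}
    for low in range(start, max(values) + 1, width):
        counts[(low, low + width - 1)] = 0  # include empty intervals too
    for value in values:
        low = (value // width) * width
        counts[(low, low + width - 1)] += 1
    return counts

table = frequency_table(hikes, width=10)
print("Interval | Total")
for (low, high), total in table.items():
    print(f"{low}-{high} | {total}")
```

An interval width of 10 gives six intervals for a range of 54, which fits the "4-6 intervals" guideline.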
Step 2: Making the histogram • Draw your x and y-axis. • Label the horizontal axis to show the intervals. • Label the vertical axis to show frequency • Draw your bars. All bars must have the same width. The height of the bar shows the frequency for the given interval.
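As a rough stand-in for drawing the bars by hand, the frequency table for the hiking example can be printed as a text histogram. This sketch draws the bars sideways (one row per interval) rather than vertically, which is a simplification of the steps above; every `#` has the same width, so bar length alone shows the frequency.

```python
# Frequency table from the mountain-climbing example
frequencies = {
    "0-9": 6,
    "10-19": 3,
    "20-29": 2,
    "30-39": 3,
    "40-49": 2,
    "50-59": 2,
}

def text_histogram(freqs):
    """Return one line per interval, with a bar of '#' marks for its frequency."""
    label_width = max(len(label) for label in freqs)
    return [
        f"{label:>{label_width}} | {'#' * count} ({count})"
        for label, count in freqs.items()
    ]

lines = text_histogram(frequencies)
print("Number of Hikes | Frequency")
for line in lines:
    print(line)
```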
Data for Number of Hikes [Figure: empty histogram set-up; horizontal axis: Number of Hikes (0-9, 10-19, 20-29, 30-39, 40-49, 50-59); vertical axis: Frequency (0 to 7)]
1.5 Scatter Plots • Scatter Plot: A scatter plot shows the relationship between TWO data sets. • This relationship is also called correlation. • There are three types of correlation: • Positive Correlation • Negative Correlation • No Correlation
Positive Correlation • The pattern of the dots slants UP and to the right. The two data sets INCREASE TOGETHER.
Negative Correlation • The pattern of the dots slants DOWN and to the right. When one data set INCREASES, the other DECREASES.
No Correlation • The dots are spread out. There is no pattern.
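The slides classify correlation by eye. One common numeric stand-in, which is not part of this unit, is Pearson's correlation coefficient r: it is positive when the dots slant up, negative when they slant down, and near zero when there is no pattern. The sketch below computes r by hand; the three small data sets are invented purely for illustration, and the 0.3 cut-off for "no correlation" is an arbitrary assumption.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient; assumes neither list is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

def describe(r, threshold=0.3):
    """Label r as positive, negative, or no correlation (threshold is an assumption)."""
    if r > threshold:
        return "positive"
    if r < -threshold:
        return "negative"
    return "no correlation"

# Invented illustration data, not taken from the slides
study_hours, grades = [1, 2, 3, 4, 5], [60, 65, 72, 78, 85]
car_age, car_value = [1, 2, 3, 4, 5, 6], [20, 17, 15, 12, 10, 8]
xs, ys = [1, 2, 3, 4, 5], [2, 5, 1, 5, 2]

print(describe(pearson_r(study_hours, grades)))
print(describe(pearson_r(car_age, car_value)))
print(describe(pearson_r(xs, ys)))
```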
Examples: What type of relationship is shown by each scatter plot? [Figures: three scatter plots showing, in order, Positive, Negative, and No Correlation]
Determine whether a scatter plot of the data might show a positive, negative, or no correlation.
• study time, higher grades: Positive
• height, intelligence: No Correlation
• shoe size, salary: No Correlation
• age of car, value of car: Negative
• miles per gallon, gas expense: Negative
• education, salary: Positive
Determine whether a scatter plot of the data might show a positive, negative, or no correlation.
• wrist circumference, appetite: No Correlation
• birthdate, ring size: No Correlation
• wind-chill, ice cream sales: Negative
• age of tree, number of rings: Positive
• amount of snowfall, shovel sales: Positive
• hair length, hat size: No Correlation