Spread • Set 1: 9, 10, 11, 8, 12 • Set 2: 5, 2, 10, 18, 15 • In what ways do Set 1 and Set 2 differ?
We have 4 ways of calculating how “Spread Out” our data is • Range • IQR (interquartile range) • Variance • Standard Deviation*** • ***The most used • For describing individual data points we have • Z-scores • Percentiles
Range • The range is the distance between the smallest and largest values in the data. • Range = Maximum Value - Minimum Value • e.g. 2, 4, 5, 6, 7, 8, 9 • Range = 9 - 2 = 7
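The range calculation is a one-liner. A minimal sketch; the function name `data_range` is hypothetical (chosen to avoid shadowing Python's built-in `range`):

```python
def data_range(values):
    """Range = maximum value - minimum value."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)


print(data_range([2, 4, 5, 6, 7, 8, 9]))   # 7, the example above
print(data_range([9, 10, 11, 8, 12]))      # 4, Set 1
print(data_range([5, 2, 10, 18, 15]))      # 16, Set 2
```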
IQR (Interquartile Range) • The IQR is the width of the box in a box and whisker plot. • e.g. 2 3 4 5 6 7 8 9 10 11 12 • Q1 = 4, Q2 (the median) = 7, Q3 = 10 • IQR = Q3 - Q1 = 10 - 4 = 6 • The semi-IQR is half the IQR (in this case 3).
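The quartiles in this example can be reproduced by taking the median of each half of the sorted data, leaving out the overall median when the count is odd. This is a sketch assuming that convention; calculators and software use several slightly different quartile conventions, so other tools may give different Q1 and Q3 on some data sets. The helper names `median` and `quartiles` are hypothetical.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    if n % 2 == 1:
        return sorted_vals[mid]
    return (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-each-half convention."""
    s = sorted(values)
    n = len(s)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = s[:half]           # values below the median position
    upper = s[half + n % 2:]   # skip the middle value when n is odd
    return median(lower), median(s), median(upper)


data = list(range(2, 13))      # 2, 3, ..., 12
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q2, q3)    # 4 7 10
print(iqr, iqr / 2)  # 6 3.0  (IQR and semi-IQR)
```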
Remember: Box and Whisker Plot • e.g. 2 3 4 5 6 7 8 9 10 11 12 • [Box and whisker plot of this data on a number line from 1 to 12, with Q1, Q2 and Q3 marked] • IQR = Q3 - Q1 = 10 - 4 = 6 • The box represents the middle half of the data, and the whiskers represent the lower and upper quarters of the data. The line in the box is at the median.
Modified Box and Whisker Plot and Outliers • A modified box and whisker plot draws the outliers as separate points, beyond the ends of the whiskers. • An outlier is any data point that lies more than 1.5 times the box length (1.5 × IQR) below Q1 or above Q3. • E.G. 2, 3, 3, 4, 5, 7, 8, 12, 25 • [Modified box and whisker plot of this data on a number line from 2 to 25, with Q1, Q2 and Q3 marked]
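The outlier rule can be checked on the example data. This sketch assumes the same median-of-each-half quartile convention as the IQR example and treats a point as an outlier when it lies strictly beyond the fences Q1 - 1.5 × IQR and Q3 + 1.5 × IQR. The quartile values it prints come from that convention, not from the original plot. The function names are hypothetical.

```python
def median(sorted_vals):
    """Median of an already-sorted, non-empty list."""
    n = len(sorted_vals)
    mid = n // 2
    return sorted_vals[mid] if n % 2 else (sorted_vals[mid - 1] + sorted_vals[mid]) / 2


def quartiles(values):
    """(Q1, Q2, Q3) using the median of each half, excluding the middle value."""
    s = sorted(values)
    if len(s) < 2:
        raise ValueError("need at least two values")
    half = len(s) // 2
    return median(s[:half]), median(s), median(s[half + len(s) % 2:])


def outliers(values, k=1.5):
    """Points more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]


data = [2, 3, 3, 4, 5, 7, 8, 12, 25]
print(quartiles(data))  # (3.0, 5, 10.0), so IQR = 7 and the fences are -7.5 and 20.5
print(outliers(data))   # [25]
```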
What problems do you see with Range and IQR? • Many data points are ignored: the range uses only the two extreme values, and the IQR uses only Q1 and Q3. Ignoring extreme values is a strength of the median, but when measuring how spread out data is, we end up ignoring the very feature we are looking for!
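Variance • Variance is the mean squared deviation. • The deviation is the difference between a data point and the mean, $x_i - \mu$. • Variance for population data: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$ • Variance for sample data: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$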
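The two variance formulas differ only in the divisor. A minimal sketch computing both directly from the definition; the function name `variance` and its `sample` flag are hypothetical, not a library API:

```python
def variance(values, sample=False):
    """Mean squared deviation; divide by n - 1 instead of n for sample data."""
    n = len(values)
    if n < (2 if sample else 1):
        raise ValueError("not enough data points")
    mean = sum(values) / n
    squared_devs = sum((x - mean) ** 2 for x in values)
    return squared_devs / (n - 1 if sample else n)


set1 = [9, 10, 11, 8, 12]
print(variance(set1))               # 2.0  (population, sigma^2)
print(variance(set1, sample=True))  # 2.5  (sample, s^2)
```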
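Why do we square $(x_i - \bar{x})$? • Some $(x_i - \bar{x})$ values are greater than 0 and others are less than 0. They always sum (and so average) to 0, no matter what the spread is. Squaring makes every term non-negative, so larger deviations in either direction increase the total.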
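A quick check of this claim on the two sets from the first slide: the raw deviations cancel out, while the squared deviations do not. The helper `deviations` is hypothetical. With less convenient data, floating-point rounding can leave a tiny non-zero sum instead of exactly 0.

```python
def deviations(values):
    """Return x_i - mean for each data point."""
    mean = sum(values) / len(values)
    return [x - mean for x in values]


set1 = [9, 10, 11, 8, 12]
set2 = [5, 2, 10, 18, 15]
for data in (set1, set2):
    devs = deviations(data)
    print("deviations:", devs)
    print("  sum:", sum(devs))                            # 0.0 for both sets
    print("  sum of squares:", sum(d * d for d in devs))  # 10.0 for Set 1, 178.0 for Set 2
```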
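Standard Deviation (ungrouped data) • Our most used measure of spread; it is the square root of the variance. • Population: $\sigma = \sqrt{\frac{\sum (x_i - \mu)^2}{N}}$ • Sample: $s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$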
Using Your Calculator • σx or σn give the standard deviation for population data. • Sx or σn-1 give the standard deviation for sample data. The latter always returns a higher number: one would expect a sample to show less spread than the population it comes from, so dividing by n - 1 instead of n gives it a “boost”.
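Python's standard `statistics` module draws the same distinction as the calculator keys: `statistics.pstdev` divides by n (population) and `statistics.stdev` divides by n - 1 (sample). The mapping to the σn and σn-1 keys is by analogy; this sketch just shows the sample value coming out larger on Set 1.

```python
import statistics

data = [9, 10, 11, 8, 12]
pop_sd = statistics.pstdev(data)   # divides by n, like the σn (σx) key
samp_sd = statistics.stdev(data)   # divides by n - 1, like the σn-1 (Sx) key
print(round(pop_sd, 3), round(samp_sd, 3))  # 1.414 1.581
```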
The question from the first slide • For X = {8, 9, 10, 11, 12}: μ = 10, σ ≈ 1.41 • For X = {2, 5, 10, 15, 18}: μ = 10, σ ≈ 5.97 • Both sets have the same mean, but Set 2 is far more spread out.
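These values can be reproduced with the population formulas, since each set is treated as the whole data set here. A short sketch using Python's standard `statistics` module:

```python
import statistics

sets = {"Set 1": [9, 10, 11, 8, 12], "Set 2": [5, 2, 10, 18, 15]}
results = {label: (statistics.mean(d), statistics.pstdev(d)) for label, d in sets.items()}
for label, (mu, sigma) in results.items():
    print(label, "mean:", mu, "population SD:", round(sigma, 2))
# Set 1 mean: 10 population SD: 1.41
# Set 2 mean: 10 population SD: 5.97
```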
Z-scores • A Z-score for a data point is the number of standard deviations it lies above (positive) or below (negative) the mean: $z = \frac{x - \mu}{\sigma}$. For example, if the mean is 5 and the standard deviation is 1, a data point with a value of 7 has a Z-score of (7 - 5)/1 = 2. • We will use Z-scores later in the year as we use the Normal Distribution to solve Percentile and Binomial Experiment problems.
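The Z-score is a single subtraction and division. A minimal sketch; `z_score` is a hypothetical helper name:

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


print(z_score(7, 5, 1))  # 2.0, the example above
print(z_score(3, 5, 1))  # -2.0, two standard deviations below the mean
```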
Percentile • Percent means “per hundred”. • 60% of the data has a value less than or equal to the 60th percentile. E.G. If you are in the 80th percentile for height, 80 percent of the population has a height that is less than or equal to yours.
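Following the “less than or equal to” wording above, the percentile a value falls at can be computed as the percentage of data points at or below it. This is a sketch of that one simple definition; textbooks and software define percentiles in several slightly different ways. The heights below are hypothetical data made up for illustration.

```python
def percentile_rank(values, x):
    """Percent of the data with a value less than or equal to x."""
    if not values:
        raise ValueError("need at least one value")
    at_or_below = sum(1 for v in values if v <= x)
    return 100 * at_or_below / len(values)


# Hypothetical heights in cm, for illustration only
heights = [150, 155, 160, 165, 170, 175, 180, 185, 190, 195]
print(percentile_rank(heights, 185))  # 80.0, so 185 cm is at the 80th percentile
```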
HWK • Page 148 • 1-5, 6,7 (notice it is a frequency table) • 9, 10,14,17