Learn how to measure the center and spread of symmetric data sets using mean, median, range, and standard deviation.
2.4 Describing Distributions Numerically – cont. Describing Symmetric Data
Recall: 2 characteristics of a data set to measure • center measures where the “middle” of the data is located • variability measures how “spread out” the data is
Measure of Center When Data Approx. Symmetric • mean (arithmetic mean) • notation
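The notation on this slide did not survive extraction. A sketch of the standard notation for the sample mean, assuming the observations are labeled x_1, ..., x_n (these labels are an assumption, not recovered from the slide):

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
```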
Connection Between Mean and Histogram • A histogram balances when supported at the mean. • Mean x̄ = 140.6
Mean: balance point • Median: 50% of the area in each half • Right histogram: mean 55.26 yrs, median 57.7 yrs
Properties of Mean, Median 1. The mean and median are unique; that is, a data set has only 1 mean and 1 median (the mean and median are not necessarily equal). 2. The mean uses the value of every number in the data set; the median does not.
Think about mean and median • Six people in a room have a median age of 45 years and mean age of 45 years. • One person who is 40 years old leaves the room. • Questions: • What is the median age of the 5 people remaining in the room? Can’t answer • What is the mean age of the 5 people remaining in the room? 46 (45 × 6 = 270; 270 − 40 = 230; 230/5 = 46)
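One way to see why the mean question has an answer and the median question does not: a small sketch with two made-up rooms. The ages below are invented for illustration (they are not from the slides); both rooms have mean 45 and median 45 and both include a 40-year-old.

```python
from statistics import mean, median

# Hypothetical ages, invented for illustration: both rooms have
# mean 45 and median 45, and both include one 40-year-old.
room_a = [40, 40, 45, 45, 50, 50]
room_b = [40, 41, 42, 48, 49, 50]

def after_leaving(ages, leaver=40):
    """Return the ages left after one person of age `leaver` leaves."""
    remaining = list(ages)
    remaining.remove(leaver)  # removes only the first match
    return remaining

for name, room in (("A", room_a), ("B", room_b)):
    rest = after_leaving(room)
    print(name, "before:", mean(room), median(room),
          "after:", mean(rest), median(rest))
```

In both rooms the new mean is 46, but the new median is 45 in room A and 48 in room B, so the median cannot be determined from the information given.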
Example: class pulse rates • 53 64 67 67 70 76 77 77 78 83 84 85 85 89 90 90 90 90 91 96 98 103 140
2014, 2018 baseball salaries
Year   n     mean         median       max
2018   877   $4,512,768   $1,450,000   $34,083,333
2014   848   $3,932,912   $1,456,250   $28,000,000
Disadvantage of the mean • Can be greatly influenced by just a few observations that are much greater or much smaller than the rest of the data
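A quick illustration using the class pulse rates from the earlier example: dropping the single largest observation (140) moves the mean much more than the median. This is only a sketch using Python's standard `statistics` module.

```python
from statistics import mean, median

# Class pulse rates from the earlier example.
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]

# Same data with the single largest observation (140) dropped.
without_max = sorted(pulse)[:-1]

print(round(mean(pulse), 2), median(pulse))              # 84.48 85
print(round(mean(without_max), 2), median(without_max))  # 81.95 84.5
```

The mean drops by about 2.5 beats while the median moves by only 0.5.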
Skewness: comparing the mean and median • Skewed to the right (positively skewed) • mean > median
Skewed to the left (negatively skewed) • mean < median • Example: mean = 86.92; median = 98.45
Symmetric data • mean, median approx. equal
Describing Variability of Symmetric Data
Describing Symmetric Data (cont.) • Measure of center for symmetric data: mean • Measure of variability for symmetric data?
Ways to measure variability 1. range = largest − smallest • OK sometimes; in general, too crude; sensitive to one large or small observation
The Sample Standard Deviation, a measure of spread around the mean • Square the deviation of each observation from the mean; find the square root of the “average” of these squared deviations
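Written as a formula, the verbal recipe above looks like this. The symbols x_i, x̄ and n are the usual ones and are assumed here; the "average" is in quotes because the divisor is n − 1 rather than n.

```latex
s^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2,
\qquad
s = \sqrt{s^2}
```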
Calculations … Women height (inches) • Mean = 63.4 • Sum of squared deviations from mean = 85.2 • (n − 1) = 13; (n − 1) is called the degrees of freedom (df) • s² = variance = 85.2/13 = 6.55 inches squared • s = standard deviation = √6.55 = 2.56 inches
We’ll never calculate these by hand, so make sure you know how to get the standard deviation using your calculator, Excel, or other software. • Mean ± 1 s.d. • Sample standard deviation s and sample variance s²
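As one example of the "other software" route, here is a minimal Python sketch. It first redoes the women's-heights arithmetic from the quoted totals (the raw heights are not given in the slides), then compares a hand-rolled n − 1 calculation against the standard library's `statistics.stdev` on the class pulse rates. The helper name `sample_sd` is made up for this illustration.

```python
import math
from statistics import mean, stdev

def sample_sd(data):
    """Sample standard deviation, dividing by n - 1 (hypothetical helper)."""
    xbar = mean(data)
    ss = sum((x - xbar) ** 2 for x in data)  # sum of squared deviations
    return math.sqrt(ss / (len(data) - 1))

# Women's heights: only the totals are quoted (SS = 85.2, n - 1 = 13).
s2 = 85.2 / 13
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))  # 6.55 2.56

# Library route on the class pulse rates from the earlier example.
pulse = [53, 64, 67, 67, 70, 76, 77, 77, 78, 83, 84, 85,
         85, 89, 90, 90, 90, 90, 91, 96, 98, 103, 140]
print(stdev(pulse), sample_sd(pulse))  # the two values should agree
```

`statistics.stdev` divides by n − 1; `statistics.pstdev` is the version that divides by n.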
Remarks 1. Note that s² and s are always greater than or equal to zero. 2. The larger the value of s (or s²), the greater the spread of the data. • When does s = 0? When does s² = 0?
Remarks (cont.) 3. The standard deviation is the most commonly used measure of risk in finance and business • Stocks, Mutual Funds, etc. 4. Variance • s² sample variance • σ² population variance • Units are squared units of the original data • square $, square gallons ??
Remarks 6): Why divide by n − 1 instead of n? • degrees of freedom • each observation has 1 degree of freedom • however, when you estimate an unknown population parameter like μ, you lose 1 degree of freedom
Remarks 6) (cont.): Why divide by n − 1 instead of n? Example • Suppose we have 3 numbers whose average is 9 • x1= x2= • then x3 must be • once we have selected x1 and x2, x3 is determined since the average is 9 • 3 numbers but only 2 “degrees of freedom”
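A small sketch of the same idea in code. The slide leaves x1 and x2 blank, so the values 7 and 12 below are hypothetical and chosen only for illustration; any two values would do.

```python
# Hypothetical values for illustration; the slide leaves x1 and x2 blank.
x1, x2 = 7, 12
n, average = 3, 9

# The total must equal n * average, so the last value is forced.
x3 = n * average - x1 - x2
print(x3)  # 8

# Equivalently: deviations from the mean always sum to zero, so once
# n - 1 deviations are known, the last one is determined.
deviations = [x - average for x in (x1, x2, x3)]
print(sum(deviations))  # 0
```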
Example
      #1     #2     #3     #4
      32     33     38     37
      41     35     39     42
      44     45     39     45
      47     50     40     46
      50     52     56     47
      53     54     57     48
      56     58     58     50
      59     59     61     67
      68     64     62     68
x̄     50     50     50     50
s     10.6   10.6   10.6   10.6
m     50     52     56     47     (m = median)
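The summary rows can be checked in a few lines. This sketch assumes the values above are read row by row into four data sets of nine observations each; the dictionary name `datasets` is just a label for this illustration.

```python
from statistics import mean, median, stdev

# The four data sets, read column by column from the example above.
datasets = {
    "#1": [32, 41, 44, 47, 50, 53, 56, 59, 68],
    "#2": [33, 35, 45, 50, 52, 54, 58, 59, 64],
    "#3": [38, 39, 39, 40, 56, 57, 58, 61, 62],
    "#4": [37, 42, 45, 46, 47, 48, 50, 67, 68],
}

for name, values in datasets.items():
    # mean, sample standard deviation (n - 1 divisor), median
    print(name, mean(values), round(stdev(values), 1), median(values))
```

All four sets share the same mean (50) and the same s (10.6) even though their medians and shapes differ, so x̄ and s alone do not pin down a distribution.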
Review: Properties of s and s² • s and s² are always greater than or equal to 0 • when does s = 0? s² = 0? • The larger the value of s (or s²), the greater the spread of the data • The standard deviation of a set of measurements is an estimate of the likely size of the chance error in a single measurement