500 likes | 508 Views
3-1. Chapter 3. Data Description. Outline. 3-2. 3-1 Introduction 3-2 Measures of Central Tendency 3-3 Measures of Variation 3-4 Measures of Position 3-5 Exploratory Data Analysis. Objectives. 3-3.
E N D
3-1 Chapter 3 Data Description
Outline 3-2 • 3-1 Introduction • 3-2 Measures of Central Tendency • 3-3 Measures of Variation • 3-4 Measures of Position • 3-5 Exploratory Data Analysis
Objectives 3-3 • Summarize data using the measures of central tendency, such as the mean, median, mode, and midrange. • Describe data using the measures of variation, such as the range, variance, and standard deviation.
Objectives 3-4 • Identify the position of a data value in a data set using various measures of position, such as percentiles, deciles and quartiles.
3-2 Measures of Central Tendency 3-6 • Astatisticis a characteristic or measure obtained by using the data values from a sample. • Aparameteris a characteristic or measure obtained by using the data values from a specific population.
3-2 The Mean (arithmetic average) 3-7 • Themeanis defined to be the sum of the data values divided by the total number of values. • We will compute two means: one for the sample and one for a finite population of values. • The mean, in most cases, is not an actual data value.
3-2 The Sample Mean 3-9 T h e s y m b o l X r e p r e s e n t s t h e s a m p l e m e a n . X i s r e a d a s " X - b a r " . T h e G r e e k s y m b o l i s r e a d a s " s i g m a " a n d i t m e a n s " t o s u m " . X X . . . + X + + X = 1 2 n n X . = n
3-2 The Sample Mean - Example 3-10 T h e a g e s i n w e e k s o f a r a n d o m s a m p l e o f s i x k i t t e n s a t a n a n i m a l s h e l t e r a r e 3 , 8 , 5 , 1 2 , 1 4 , a n d 1 2 . F i n d t h e a v e r a g e a g e o f t h i s s a m p l e . T h e s a m p l e m e a n i s X 3 + 8 + 5 + 1 2 + 1 4 + 1 2 X = = n 6 5 4 = 9 w e e k s . = 6
3-2 The Population Mean 3-11 T h e G r e e k s y m b o l r e p r e s e n t s t h e p o p u l a t i o n m m e a n . T h e s y m b o l i s r e a d a s " m u " . m N i s t h e s i z e o f t h e f i n i t e p o p u l a t i o n . X X . . . + X + + = m 1 2 N N X . = N
A s m a l l c o m p a n y c o n s i s t s o f t h e o w n e r , t h e m a n a g e r , t h e s a l e s p e r s o n , a n d t w o t e c h n i c i a n s . T h e s a l a r i e s a r e l i s t e d a s $ 5 0 , , , , , a n d r e s p e c t i v e l y A s s u m e t h i s i s t h e p o p u l a t i o n T h e n t h e p o p u l a t i o n m e a n w i l l b e X N 3-2 The Population Mean-Example 3-12 0 0 0 2 0 0 0 0 1 2 0 0 0 , 9 , 0 0 0 9 , 0 0 0 . ( . ) = m 5 0 , 0 0 0 + 2 0 , 0 0 0 + 1 2 , 0 0 0 + 9 , 0 0 0 + 9 , 0 0 0 = 5 = $ 2 0 , 0 0 0 .
3-2 The Median 3-20 • When a data set is ordered, it is called adata array. • Themedianis defined to be the midpoint of the data array. • The symbol used to denote the median isMD.
3-2 The Median - Example 3-21 • The weights (in pounds) of seven army recruits are 180, 201, 220, 191, 219, 209, and 186. Find the median. • Arrange the data in order and select the middle point.
3-2 The Median - Example 3-22 • Data array:180, 186, 191,201, 209, 219, 220. • The median,MD = 201.
3-2 The Median 3-23 • In the previous example, there was an odd number of values in the data set. In this case it is easy to select the middle number in the data array.
3-2 The Median 3-24 • When there is aneven numberof values in the data set, the median is obtained by taking theaverage of the two middle numbers.
3-2 The Median -Example 3-25 • Six customers purchased the following number of magazines: 1, 7, 3, 2, 3, 4. Find the median. • Arrange the data in order and compute the middle point. • Data array: 1, 2, 3, 3, 4, 7. • The median, MD = (3 + 3)/2 = 3.
3-2 The Median -Example 3-26 • The ages of 10 college students are: 18, 24, 20, 35, 19, 23, 26, 23, 19, 20. Find the median. • Arrange the data in order and compute the middle point.
3-2 The Median -Example 3-27 • Data array:18, 19, 19, 20,20,23, 23, 24, 26, 35. • The median,MD = (20 + 23)/2 = 21.5.
3-2 The Mode 3-38 • Themodeis defined to be the value that occurs most often in a data set. • A data set can have more than one mode. • A data set is said to have no mode if all values occur with equal frequency.
3-2 The Mode -Examples 3-39 • The following data represent the duration (in days) of U.S. space shuttle voyages for the years 1992-94. Find the mode. • Data set: 8, 9, 9, 14, 8, 8, 10, 7, 6, 9, 7, 8, 10, 14, 11, 8, 14, 11. • Ordered set: 6, 7, 7, 8, 8, 8, 8, 8, 9, 9, 9, 10, 10, 11, 11, 14, 14, 14.Mode = 8.
3-2 The Mode -Examples 3-40 • Six strains of bacteria were tested to see how long they could remain alive outside their normal environment. The time, in minutes, is given below. Find the mode. • Data set: 2, 3, 5, 7, 8, 10. • There isno modesince each data value occurs equally with a frequency of one.
3-2 The Mode -Examples 3-41 • Eleven different automobiles were tested at a speed of 15 mph for stopping distances. The distance, in feet, is given below. Find the mode. • Data set: 15, 18, 18, 18, 20, 22, 24, 24, 24, 26, 26. • There aretwo modes (bimodal). The values are 18 and 24. Why?
3-2 The Midrange 3-45 • The midrangeis found by adding the lowest and highest values in the data set and dividing by 2. • The midrange is a rough estimate of the middle value of the data. • The symbol that is used to represent the midrange isMR.
3-2 The Midrange -Example 3-46 • Last winter, the city of Brownsville, Minnesota, reported the following number of water-line breaks per month. The data is as follows: 2, 3, 6, 8, 4, 1. Find the midrange.MR = (1 + 8) / 2 = 4.5. • Note:Extreme values influence the midrange and thus may not be a typical description of the middle.
3-2 The Weighted Mean 3-47 • Theweighted meanis used when the values in a data set are not all equally represented. • Theweighted meanof a variable Xisfound by multiplying each value by its corresponding weight and dividing the sum of the products by the sum of the weights.
3-2 The Weighted Mean 3-48 T h e w e i g h t e d m e a n w X w X . . . w X w X + + + X = = 1 1 2 2 n n w w . . . w w + + + 1 2 n w h e r e w , w , . . . , w a r e t h e w e i g h t s 1 2 n f o r t h e v a l u e s X , X , . . . , X . 1 2 n
3-3 Measures of Variation - Range 3-53 • The rangeis defined to be the highest value minus the lowest value. The symbol R is used for the range. • R = highest value – lowest value. • Extremely large or extremely small data values can drastically affect the range.
T h e v a r i a n c e i s t h e a v e r a g e o f t h e s q u a r e s o f t h e d i s t a n c e e a c h v a l u e i s f r o m t h e m e a n . T h e s y m b o l f o r t h e p o p u l a t i o n v a r i a n c e i s ( i s t h e G r e e k l o w e r c a s e l e t t e r s i g m a ) 2 s s m s = m 3-3 Measures of Variation - Population Variance 3-54 ( X ) - 2 , w h e r e 2 = N X i n d i v i d u a l v a l u e = p o p u l a t i o n m e a n N = p o p u l a t i o n s i z e
3-3 Measures of Variation - Population Standard Deviation 3-55 T h e s t a n d a r d d e v i a t i o n i s t h e s q u a r e r o o t o f t h e v a r i a n c e . ( X ) 2 - m = . 2 s s = N
3-3 Measures of Variation -Example 3-56 • Consider the following data to constitute the population: 10, 60, 50, 30, 40, 20. Find the mean and variance.
3-3 Measures of Variation -Example 3-56 • Consider the following data to constitute the population: 10, 60, 50, 30, 40, 20. Find the mean and variance. • Themean = (10 + 60 + 50 + 30 + 40 + 20)/6 = 210/6 = 35. • Thevariance 2= 1750/6 = 291.67. See next slide for computations.
m m X 2 X - ( X - ) 1 0 - 2 5 6 2 5 6 0 + 2 5 6 2 5 6 0 + 2 5 6 2 5 5 0 + 1 5 2 2 5 5 0 + 1 5 2 2 5 3 0 - 5 2 5 3 0 - 5 2 5 4 0 + 5 2 5 2 0 - 1 5 2 2 5 3-3 Measures of Variation -Example 3-57 X 2 m m X – ( X – ) 1 0 - 2 5 6 2 5 4 0 + 5 2 5 2 0 - 1 5 2 2 5 2 1 0 1 7 5 0 2 1 0 1 7 5 0
The unbias ed estimat or of the population variance o r the samp le varianc e is a statistic whose valu e approxim ates the expected v alue of a population variance. It is deno ted by s , where 2 - å ( X X ) 2 = s , and 2 - n 1 X = sample mean n = sample size 3-3 Measures of Variation - Sample Variance 3-58
3-3 Measures of Variation - Sample Standard Deviation 3-59 The sample standard deviation is the squ are root of t he sample variance. - å ( X X ) 2 = s = s 2 - n 1
s 2 3-3Shortcut Formulafor the Sample Variance and the Standard Deviation 3-60 - å å ( X ) / n 2 X 2 = - n 1 - å å ( X ) / n X 2 2 s = - n 1
3-3 Sample Variance -Example 3-61 • Find the variance and standard deviation for the following sample: 16, 19, 15, 15, 14.
3-3 Sample Variance -Example 3-61 • Find the variance and standard deviation for the following sample: 16, 19, 15, 15, 14. • X = 16 + 19 + 15 + 15 + 14 = 79. • X2 = 162 + 192 + 152 + 152 + 142 = 1263.
3-3 Sample Variance -Example 3-62 X ( ) / n X 2 2 s = 2 n 1 1263 (79) / 5 2 = 3.7 4 s = 3.7 1 . 9
s s = × × CVar 100% or CVar = 100%. m X 3-3 Coefficient of Variation 3-67 • The coefficient of variation is defined to be the standard deviation divided by the mean. The result is expressed as a percentage.
For samples : - X X = z . s For population s : - m X = z . s 3-4 Measures of Position — z-score 3-72 • The z score represents the number of standard deviations a data value falls above or below the mean.
3-4 z-score - Example 3-73 • A student scored 65 on a statistics exam that had a mean of 50 and a standard deviation of 10. Compute the z-score. • z = (65 – 50)/10 = 1.5. • That is, the score of 65 is 1.5 standard deviationsabovethe mean. • Above -since the z-score is positive.
3-4 Measures of Position - Percentiles 3-74 • Percentiles divide the distribution into 100 groups. • ThePkpercentile is defined to be that numerical value such that at most k% of the values are smaller than Pk and at most (100 – k)% are larger than Pk in an ordered data set.
3-4 Percentile Formula 3-75 • The percentile corresponding to a given value (X) is computed by using the formula: number of values below X + 0.5 Percentile 100% total number of values
3-4 Percentiles - Example 3-76 • A teacher gives a 20-point test to 10 students. Find the percentile rank of a score of 12. Scores: 18, 15, 12, 6, 8, 2, 3, 5, 20, 10. • Ordered set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. • Percentile = [(6 + 0.5)/10](100%) = 65th percentile. Student did better than 65% of the class.
3-4 Percentiles - Finding the value Corresponding to a Given Percentile 3-77 • Procedure:Let p be the percentile and n the sample size. • Step 1:Arrange the data in order. • Step 2:Compute c = (np)/100. • Step 3:If c is not a whole number, round up to the next whole number. If cis a whole number, use the value halfway between cand c+1.
3-4 Percentiles - Finding the value Corresponding to a Given Percentile 3-78 • Step 4:The value of c is the position value of the required percentile. • Example: Find the value of the 25th percentile for the following data set: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20. • Note: the data set is already ordered. • n = 10, p = 25, so c = (1025)/100 = 2.5. Hence round up to c = 3.
3-4 Percentiles - Finding the value Corresponding to a Given Percentile 3-79 • Thus, the value of the 25th percentile is the value X = 5. • Find the 80th percentile. • c = (1080)/100 = 8. Thus the value of the 80th percentile is the average of the 8th and 9th values. Thus, the 80th percentile for the data set is (15 + 18)/2 = 16.5.
3-4 Special Percentiles - Deciles and Quartiles 3-80 • Decilesdivide the data set into 10 groups. • Deciles are denoted by D1, D2, …, D9 with the corresponding percentiles being P10, P20, …, P90 • Quartilesdivide the data set into 4 groups.
3-4 Special Percentiles - Deciles and Quartiles 3-81 • Quartiles are denoted by Q1, Q2, and Q3 with the corresponding percentiles being P25, P50, and P75. • The median is the same as P50 or Q2.
3-4 Outliers and the Interquartile Range (IQR) 3-82 • Anoutlieris an extremely high or an extremely low data value when compared with the rest of the data values. • TheInterquartile Range, IQR = Q3 – Q1.