Chapter 5: Understanding and Comparing Distributions
The Big Picture • Below is a histogram of the Average Wind Speed at Hopkins Forest in Western Massachusetts, for every day in 1989.
The Big Picture (cont) • The distribution is: • The high value may be an outlier • Median daily wind speed = 1.90 mph • IQR = 1.78 mph
The Five-Number Summary • The five-number summary of a distribution reports its median, quartiles, and minimum and maximum. • Example: The five-number summary for the daily wind speed (minimum, Q1, median, Q3, maximum) is: 0.20, 1.15, 1.90, 2.93, 9.67
Daily Wind Speed: Making Boxplots • A boxplot is a graphical display of the five-number summary. • Boxplots are particularly useful when comparing groups.
Constructing Boxplots • Five-number summary: 0.20, 1.15, 1.90, 2.93, 9.67 • Draw a single vertical axis spanning the range of the data. • Draw short horizontal lines at the lower and upper quartiles and at the median. • Then connect them with vertical lines to form a box.
Constructing Boxplots (cont.) • Five-number summary: 0.20, 1.15, 1.90, 2.93, 9.67 • Sketch “fences” around the main part of the data. • The upper fence is 1.5 IQRs above the upper quartile. • The lower fence is 1.5 IQRs below the lower quartile. • Note: the fences only help with constructing the boxplot and should not appear in the final display.
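The fence rule above can be sketched in a few lines of Python. This is only an illustration: the helper name `fences` is an assumption made for this example, and the numbers are the wind-speed five-number summary quoted on the slide.

```python
# Five-number summary for daily wind speed (mph), as quoted on the slide.
minimum, q1, median, q3, maximum = 0.20, 1.15, 1.90, 2.93, 9.67


def fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence), placed k IQRs beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


lower_fence, upper_fence = fences(q1, q3)
print(f"IQR         = {q3 - q1:.2f} mph")
print(f"Lower fence = {lower_fence:.2f} mph")
print(f"Upper fence = {upper_fence:.2f} mph")
print("Maximum beyond upper fence:", maximum > upper_fence)
```

With these values the upper fence sits at about 5.60 mph, so the maximum of 9.67 mph falls outside it, which is consistent with the remark that the high value may be an outlier.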
Constructing Boxplots (cont.) • Use the fences to grow “whiskers.” • Draw lines from the ends of the box up and down to the minimum and maximum data values found within the fences. • If a data value falls outside one of the fences, we do not connect it with a whisker.
Constructing Boxplots (cont.) • Add the outliers by displaying any data values beyond the fences with special symbols. • We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.
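A minimal sketch of the outlier and far-outlier rule, assuming the wind-speed quartiles from the slides (Q1 = 1.15, Q3 = 2.93). The function name `classify` and its labels are hypothetical, chosen only for this illustration.

```python
def classify(value, q1, q3):
    """Label a value as 'inside', 'outlier', or 'far outlier'."""
    iqr = q3 - q1
    # Far outliers lie more than 3 IQRs beyond a quartile.
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "far outlier"
    # Ordinary outliers lie beyond the 1.5-IQR fences.
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "outlier"
    return "inside"


q1, q3 = 1.15, 2.93  # wind-speed quartiles (mph) from the slides

# Classify the minimum and maximum of the wind-speed data.
for speed in (0.20, 9.67):
    print(f"{speed:.2f} mph -> {classify(speed, q1, q3)}")
```

Under this rule the minimum (0.20 mph) lies inside the fences, while the maximum (9.67 mph) is more than 3 IQRs above Q3 and would be drawn with the far-outlier symbol.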
Wind Speed: Making Boxplots (cont.) • Let us compare the histogram and boxplot for daily wind speeds:
Comparing Groups • It is always more interesting to compare groups. • With histograms, note the shapes, centers, and spreads of the two distributions. • What does this graphical display tell you?
Comparing Groups (cont) • Boxplots hide the details while displaying the overall summary information. • We often plot them side by side for groups or categories we wish to compare.
What About Outliers? • If there are any clear outliers and you are reporting the mean and standard deviation, report them both with the outliers present and with the outliers removed. • Note: The median and IQR are not likely to be affected by the outliers.
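To illustrate why the mean and standard deviation are reported both ways, here is a small sketch. The data are hypothetical (not from the slides), and the `iqr` helper assumes the split-into-halves quartile method, leaving the median out of both halves when the number of values is odd.

```python
from statistics import mean, median, stdev


def iqr(values):
    """IQR using medians of the lower and upper halves of the sorted data."""
    data = sorted(values)
    half = len(data) // 2  # assumption: the middle value is excluded when n is odd
    return median(data[len(data) - half:]) - median(data[:half])


# Hypothetical data: eight ordinary values plus one clear outlier (40).
without_outlier = [2, 3, 3, 4, 4, 5, 5, 6]
with_outlier = without_outlier + [40]

for label, data in (("with outlier", with_outlier),
                    ("outlier removed", without_outlier)):
    print(f"{label:>15}: mean={mean(data):.2f}  sd={stdev(data):.2f}  "
          f"median={median(data):.1f}  IQR={iqr(data):.1f}")
```

In this made-up example the single outlier moves the mean and standard deviation substantially, while the median stays the same and the IQR changes only slightly.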
Timeplots: Order, Please! • For some data sets, we are interested in how the data behave over time. In these cases, we construct timeplots of the data.
Re-expressing Skewed Data to Improve Symmetry • One way to make a skewed distribution more symmetric is to re-express or transform the data. • Apply a simple function (e.g., logarithmic function).
Re-expressing Skewed Data to Improve Symmetry (cont.) • A logarithmic function was applied to each of the observations of the data displayed in the previous slide. • Note the change in skewness from the raw data (previous slide) to the transformed data (left).
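The effect of a log re-expression can be checked numerically. The sketch below uses hypothetical right-skewed data (the slide's own data set is not reproduced here) and a simple moment-based skewness measure, which is one of several possible choices.

```python
import math


def skewness(values):
    """Moment-based skewness: mean cubed deviation divided by the SD cubed."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return m3 / m2 ** 1.5


# Hypothetical right-skewed data (not from the slides); all values are positive.
raw = [1, 2, 2, 3, 3, 4, 5, 8, 20, 100]
logged = [math.log10(x) for x in raw]

print(f"skewness of raw data:    {skewness(raw):.2f}")
print(f"skewness of log10 data:  {skewness(logged):.2f}")
```

For this example the skewness drops noticeably after taking logs, although the re-expressed data are still somewhat right-skewed. Note that the logarithm is only defined for positive values.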
What Can Go Wrong? • Avoid inconsistent scales • Beware of outliers • Be careful when comparing groups with very different spreads
What have we learned? • We’ve learned the value of comparing data groups and looking for patterns among groups and over time • We’ve seen that boxplots are very effective for comparing groups graphically • We’ve experienced the value of identifying and investigating outliers
Practice Exercise - Chapter 5 A survey conducted in a college intro stats class during Autumn 2003 asked students about the number of credit hours they were taking that quarter. The number of credit hours for a random sample of 16 students is 10 10 12 14 15 15 15 15 17 17 19 20 20 20 20 22
Practice Exercise - Chapter 5 (cont) a. Find the five number summary for the data above b. Find the IQR for the data c. From parts (a) and (b), are there any outliers in the data? d. Create a boxplot of these data.
Practice Exercise - Chapter 5 (cont) 10 10 12 14 15 15 15 15 17 17 19 20 20 20 20 22 a. Find the five-number summary: Min = 10, Max = 22, Median = (15 + 17)/2 = 16
Practice Exercise - Chapter 5 (cont) To find the quartiles, divide the data into 2 equal halves • 1st: 10 10 12 14 15 15 15 15 • 2nd: 17 17 19 20 20 20 20 22 • To find Q1 we find the median of the first set of numbers above: → Q1 = (14 + 15)/2 = 14.5 • To find Q3 we find the median of the second set of numbers: → Q3 = (20 + 20)/2 = 20
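The split-into-halves method can be sketched in Python as follows. The function name `five_number_summary` is an assumption for this example, and for an odd number of values the sketch leaves the middle value out of both halves, which is only one of several conventions in use.

```python
from statistics import median

# Credit hours for the 16 sampled students (from the exercise).
credit_hours = [10, 10, 12, 14, 15, 15, 15, 15,
                17, 17, 19, 20, 20, 20, 20, 22]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using medians of the two halves."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]               # first half of the sorted data
    upper = data[len(data) - half:]   # second half of the sorted data
    return min(data), median(lower), median(data), median(upper), max(data)


summary = five_number_summary(credit_hours)
print("min, Q1, median, Q3, max =", summary)
```

For the credit-hour data this gives 10, 14.5, 16, 20, and 22.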
Practice Exercise - Chapter 5 (cont) a. Five-number summary: Min = 10, Q1 = 14.5, Median = 16, Q3 = 20, Max = 22
Practice Exercise - Chapter 5 (cont) b. Find the IQR of the data. IQR = Q3 - Q1 = 20 - 14.5 = 5.5
Practice Exercise - Chapter 5 (cont) c. From parts (a) and (b), are there any outliers in the data? To determine if there are outliers we need to calculate the values of the fences. Lower fence = Q1 - 1.5 x IQR = 14.5 - 1.5 x 5.5 = 6.25
Practice Exercise - Chapter 5 (cont) Upper fence = Q3 + 1.5 x IQR = 20 + 1.5 x 5.5 = 28.25 • Are there any observations outside the fences? • None of the observations lie outside the fences, hence there are no outliers in the data.
Practice Exercise - Chapter 5 (cont) d. Create a boxplot of these data. • Min = 10 • Q1 = 14.5 • Median = 16 • Q3 = 20 • Max = 22 • Lower fence = 6.25 • Upper fence = 28.25
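As a check on part (d), the pieces needed to draw the boxplot can be computed directly from the data. This is a sketch rather than a plotting routine; the variable names are assumptions for this example, and the quartiles are those stated in the exercise.

```python
# Credit hours for the 16 sampled students (from the exercise).
credit_hours = [10, 10, 12, 14, 15, 15, 15, 15,
                17, 17, 19, 20, 20, 20, 20, 22]

# Quartiles and median as stated in the exercise.
q1, med, q3 = 14.5, 16, 20

iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Values inside the fences decide where the whiskers end;
# values outside would be plotted individually as outliers.
inside = [x for x in credit_hours if lower_fence <= x <= upper_fence]
outliers = [x for x in credit_hours if x < lower_fence or x > upper_fence]
whisker_low, whisker_high = min(inside), max(inside)

print(f"Box:      {q1} to {q3}, median line at {med}")
print(f"Fences:   {lower_fence} and {upper_fence} (not drawn)")
print(f"Whiskers: {whisker_low} to {whisker_high}")
print(f"Outliers: {outliers}")
```

Because every observation lies inside the fences, the whiskers run all the way to the minimum (10) and maximum (22), and no outlier symbols are needed.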