Learn how to summarize and display large sets of quantitative data using histograms and stem-and-leaf plots. Understand the concepts of shape, center, spread, outliers, mean, median, range, and more.
Chapter 4 Part 1 Displaying and Summarizing Quantitative Data
Objectives • Histogram • Stem-and-leaf plot • Dotplot • Shape • Center • Spread • Outliers • Mean • Median • Range • Interquartile range (IQR) • Percentile • 5-Number summary* • Resistant • Variance • Standard Deviation
Dealing With a Lot of Numbers… • Summarize the data. With a large set of quantitative data, a summary helps us grasp what the data tell us, so start by making a quantitative frequency table. • Display the summarized data. The best thing to do is to make a picture… • We can’t use bar charts or pie charts for quantitative data, since those displays are for categorical variables. • Therefore, display quantitative data using …
Histograms or stem-and-leaf plots These are summary graphs for a single variable. They are very useful to understand the pattern of variability in the data. • Line graphs: time plots Use when there is a meaningful sequence, like time. The line connecting the points helps emphasize any change over time. • Other graphs to reflect numerical summaries are Dotplots and Cumulative Frequency Curves
Quantitative Data HISTOGRAM
Histogram • To make a histogram we first need to organize the data using a quantitative frequency table. • Two types of quantitative data • Discrete – use ungrouped frequency table to organize. • Continuous – use grouped frequency table to organize.
Quantitative Frequency Tables – Ungrouped • An ungrouped frequency table simply lists the data values together with the frequency count with which each value occurs. • Commonly used with discrete quantitative data.
Quantitative Frequency Tables – Ungrouped • Example: The at-rest pulse rates for 16 athletes at a meet were 57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, and 58. Summarize the information with an ungrouped frequency distribution. Note: The (ungrouped) classes are the observed values themselves.
Quantitative Relative Frequency Tables - Ungrouped Note: The relative frequency for a class is obtained by computing f/n, where f is the frequency of the class and n is the total number of data values.
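The ungrouped table for the pulse-rate example can be sketched in a few lines of Python. This is only an illustration: the function name `ungrouped_frequency_table` and the printed layout are assumptions made here, not part of the original slides.

```python
from collections import Counter

# At-rest pulse rates for the 16 athletes in the example above.
pulse_rates = [57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, 58]


def ungrouped_frequency_table(values):
    """Return rows of (value, frequency f, relative frequency f/n), sorted by value."""
    n = len(values)
    counts = Counter(values)
    return [(value, f, f / n) for value, f in sorted(counts.items())]


table = ungrouped_frequency_table(pulse_rates)

# Print one row per observed value: the classes are the values themselves.
print(f"{'Pulse':>5} {'f':>3} {'f/n':>7}")
for value, f, rel in table:
    print(f"{value:>5} {f:>3} {rel:>7.4f}")
```

Each observed value is its own class, so the number of rows equals the number of distinct pulse rates.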
Quantitative Frequency Tables – Grouped • A grouped frequency table is obtained by constructing classes (or intervals) for the data, and then listing the corresponding number of values (frequency counts) in each interval. • Commonly used with continuous quantitative data. • Grouped frequency tables are used to construct histograms.
Quantitative Frequency Tables – Grouped There are several procedures that one can use to construct a grouped frequency table. • A frequency table should have a minimum of 5 classes and a maximum of 20 classes. • For small data sets, one can use between 5 and 10 classes. • For large data sets, one can use up to 20 classes.
Quantitative Frequency Tables – Grouped Example: The weights (in lbs) of 30 female students majoring in Physical Education on a college campus are as follows: 143, 113, 107, 151, 90, 139, 136, 126, 122, 127, 123, 137, 132, 121, 112, 132, 133, 121, 126, 104, 140, 138, 99, 134, 119, 112, 133, 104, 129, and 123. Summarize the data with a frequency distribution using seven classes. To display the data, make a histogram.
Quantitative Frequency Tables – Grouped Example Continued • Histogram – a graphical display of a frequency or a relative frequency table that uses classes and vertical (or horizontal) bars (rectangles) of various heights to represent the frequencies. • This histogram has to have seven classes. • Classes for the weights are along the x-axis and frequencies are along the y-axis. • The number at the top of each rectangle represents the frequency for the class.
Quantitative Frequency Tables – Grouped Example Continued Histogram with 7 classes for the weights.
Quantitative Frequency Tables – Grouped Example Continued • From the histogram, the classes (intervals) are 85 – 95, 95 – 105, 105 – 115, etc., with corresponding frequencies of 1, 3, 4, etc. • Observe that the upper class limit of 95 for the class 85 – 95 is also listed as the lower class limit for the class 95 – 105. • Since the value of 95 cannot be included in both classes, we will use the convention that the upper class limit is not included in the class.
Quantitative Frequency Tables – Grouped Example Continued • That is, the class 85 – 95 should be interpreted as containing the values from 85 up to, but not including, 95. • Using these observations, the grouped frequency distribution is constructed from the histogram.
Quantitative Frequency Tables – Grouped Example Continued • In the grouped frequency distribution, the relative frequencies do not add up to exactly 1. This is due to rounding to four decimal places. • The same should be noted for the cumulative relative frequency column.
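As a rough sketch, the grouped table for the weights can be rebuilt in Python using the classes read off the histogram (85 – 95, 95 – 105, and so on, upper limit excluded). The function name `grouped_frequency_table` is an assumption made here, and the cumulative column is formed by adding the already rounded relative frequencies, which is one plausible way the rounding effect noted above can arise.

```python
# Weights (lbs) of the 30 students from the example above.
weights = [143, 113, 107, 151, 90, 139, 136, 126, 122, 127,
           123, 137, 132, 121, 112, 132, 133, 121, 126, 104,
           140, 138, 99, 134, 119, 112, 133, 104, 129, 123]


def grouped_frequency_table(values, start, width, num_classes):
    """Rows of (lower, upper, f, rel. freq., cumulative rel. freq.).

    A value v belongs to a class when lower <= v < upper, so the
    upper class limit is NOT included in the class.
    """
    n = len(values)
    rows = []
    cumulative = 0.0
    for k in range(num_classes):
        lower = start + k * width
        upper = lower + width
        f = sum(1 for v in values if lower <= v < upper)
        rel = round(f / n, 4)                       # rounded to four decimals
        cumulative = round(cumulative + rel, 4)     # sum of the rounded values
        rows.append((lower, upper, f, rel, cumulative))
    return rows


rows = grouped_frequency_table(weights, start=85, width=10, num_classes=7)
for lower, upper, f, rel, cum in rows:
    print(f"{lower:>3} - {upper:<3} {f:>2} {rel:.4f} {cum:.4f}")
```

With these inputs the rounded relative frequencies total 0.9999 rather than 1, which is the rounding effect the slide describes.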
Creating a Histogram It is an iterative process: try and try again. What bin size should you use? • Not too many bins with either 0 or 1 counts • Not so summarized that you lose all the information • Not so detailed that it is no longer a summary Rule of thumb: Start with 5 to 10 bins. Look at the distribution and refine your bins. (There isn’t a unique or “perfect” solution.)
[Figure: two histograms of the same data set, one not summarized enough and one too summarized.]
Histograms Frequency Distributions, Example 2
Lower Class Limits Lower class limits are the smallest numbers that can actually belong to the different classes.
Upper Class Limits Upper class limits are the largest numbers that can actually belong to the different classes.
Class Boundaries Class boundaries are the numbers used to separate classes, but without the gaps created by class limits. In the example: -0.5, 99.5, 199.5, 299.5, 399.5, 499.5
Class Midpoints Class midpoints (class marks) can be found by adding the lower class limit to the upper class limit and dividing the sum by two. In the example: 49.5, 149.5, 249.5, 349.5, 449.5
Class Width Class width is the difference between two consecutive lower class limits or two consecutive lower class boundaries. In the example: 100 for each of the five classes
Summary of Terminology • Classes – Non-overlapping intervals the data is divided into. • Class Limits – The smallest and largest observed values in a given class. • Class Boundaries – Fall halfway between the upper class limit for the smaller class and the lower class limit for the larger class. Used to close the gap between classes. • Class Width – The difference between the class boundaries for a given class. • Class Mark – The midpoint of a class.
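The terminology can be checked numerically. The sketch below assumes the class limits 0 – 99, 100 – 199, ..., 400 – 499, which are inferred here from the boundaries and midpoints quoted in the example rather than stated on the slides; the function names are likewise illustrative.

```python
# Assumed class limits (lower, upper), inferred from the quoted boundaries.
class_limits = [(0, 99), (100, 199), (200, 299), (300, 399), (400, 499)]


def class_boundaries(limits):
    """Boundaries fall halfway across the gap between adjacent classes."""
    gap = limits[1][0] - limits[0][1]   # lower limit of 2nd class minus upper limit of 1st
    half = gap / 2
    return [limits[0][0] - half] + [upper + half for _, upper in limits]


def class_midpoints(limits):
    """Midpoint (class mark) = (lower limit + upper limit) / 2."""
    return [(lower + upper) / 2 for lower, upper in limits]


def class_widths(boundaries):
    """Width = difference between the two boundaries of each class."""
    return [hi - lo for lo, hi in zip(boundaries, boundaries[1:])]


boundaries = class_boundaries(class_limits)
midpoints = class_midpoints(class_limits)
widths = class_widths(boundaries)

print("Boundaries:", boundaries)
print("Midpoints: ", midpoints)
print("Widths:    ", widths)
```

Note that the width is 100, not 99: it is measured between boundaries (or between consecutive lower limits), not between the two limits of a single class.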
Constructing A Frequency Table 1. Decide on the number of classes (should be between 5 and 20). 2. Calculate the class width (round up): class width = (highest value – lowest value) / (number of classes). 3. Starting point: Begin by choosing a lower limit of the first class. 4. Using the lower limit of the first class and the class width, proceed to list the lower class limits. 5. List the lower class limits in a vertical column and proceed to enter the upper class limits. 6. Go through the data set putting a tally in the appropriate class for each data value.
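The six steps can be sketched in Python using the student weights from the earlier grouped example. Several choices here are assumptions rather than part of the procedure as stated: the data are whole numbers, the first lower limit is taken to be the lowest value, and "round up" is implemented as floor plus one so that the highest value is still covered when the quotient is already a whole number. The resulting classes differ from the 85 – 95 classes used in the histogram, which shows that the choice of classes is not unique.

```python
import math

# Weights (lbs) from the grouped frequency example.
weights = [143, 113, 107, 151, 90, 139, 136, 126, 122, 127,
           123, 137, 132, 121, 112, 132, 133, 121, 126, 104,
           140, 138, 99, 134, 119, 112, 133, 104, 129, 123]


def build_frequency_table(values, num_classes):
    """Follow steps 1-6 for integer data; returns (width, rows)."""
    lowest, highest = min(values), max(values)

    # Step 2: class width, rounded up (floor + 1 also bumps exact quotients).
    width = math.floor((highest - lowest) / num_classes) + 1

    # Steps 3-5: lower limits start at the lowest value; upper limits follow.
    lower_limits = [lowest + k * width for k in range(num_classes)]
    upper_limits = [lower + width - 1 for lower in lower_limits]

    # Step 6: tally each value into its class.
    tallies = [0] * num_classes
    for v in values:
        tallies[(v - lowest) // width] += 1

    return width, list(zip(lower_limits, upper_limits, tallies))


width, table = build_frequency_table(weights, num_classes=7)
print("class width:", width)
for lower, upper, f in table:
    print(f"{lower:>3} - {upper:<3} {f}")
```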
Histogram Then to complete the Histogram, graph the Frequency Table data.
Frequency Histogram vs Relative Frequency Histogram A frequency histogram is a bar graph in which the horizontal scale represents the classes of data values and the vertical scale represents the frequencies.
Frequency Histogram vs Relative Frequency Histogram A relative frequency histogram has the same shape and horizontal scale as a frequency histogram, but the vertical scale is marked with relative frequencies.
Histograms - Facts • Histograms are useful when the data values are quantitative. • A histogram gives an estimate of the shape of the distribution of the population from which the sample was taken. • If relative frequencies are plotted along the vertical axis instead, the shape of the histogram is the same as when frequencies are used.
Making Histograms on the TI-83/84 Use of Stat Plots on the TI-83/84 Raw Data: 548, 405, 375, 400, 475, 450, 412, 375, 364, 492, 482, 384, 490, 492, 490, 435, 390, 500, 400, 491, 945, 435, 848, 792, 700, 572, 739, 572
Frequency Table Data:
Class Limits      Frequency
350 to < 450      11
450 to < 550      10
550 to < 650      2
650 to < 750      2
750 to < 850      2
850 to < 950      1
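For readers without a calculator at hand, the same tally can be reproduced from the raw data and drawn as a rough text histogram. This is a stand-in sketch, not the TI-83/84 procedure; the function name `tally_classes` and the `#` bars are assumptions made for illustration.

```python
# Raw data from the TI-83/84 example.
raw_data = [
    548, 405, 375, 400, 475, 450, 412,
    375, 364, 492, 482, 384, 490, 492,
    490, 435, 390, 500, 400, 491, 945,
    435, 848, 792, 700, 572, 739, 572,
]


def tally_classes(values, start, width, num_classes):
    """Count values in classes [start + k*width, start + (k+1)*width)."""
    counts = [0] * num_classes
    for v in values:
        k = (v - start) // width
        if 0 <= k < num_classes:      # ignore anything outside the classes
            counts[k] += 1
    return counts


counts = tally_classes(raw_data, start=350, width=100, num_classes=6)

# One bar per class; bar length equals the class frequency.
for k, f in enumerate(counts):
    lower = 350 + 100 * k
    print(f"{lower} to < {lower + 100} | {'#' * f} ({f})")
```

The long bars on the first two classes and the short tail toward 950 are what the calculator histogram would show as well.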
Quantitative Data Stem and leaf Plot
Stem-and-Leaf Plots • What is a stem-and-leaf plot? A stem-and-leaf plot is a data plot that uses part of a data value as the stem to form groups or classes and part of the data value as the leaf. • When is it used? Most often used for small or medium-sized data sets. For larger data sets, histograms do a better job. • Note: A stem-and-leaf plot has an advantage over a grouped frequency table or histogram, since a stem-and-leaf plot retains the actual data by showing them in graphic form.
Stemplots Include a key that shows how to read the stemplot, e.g. 0|9 = 9. How to make a stemplot: • Separate each observation into a stem, consisting of all but the final (rightmost) digit, and a leaf, which is that remaining final digit. Stems may have as many digits as needed. Use only one digit for each leaf: either round or truncate the data values to one digit after the stem. • Write the stems in a vertical column with the smallest value at the top, and draw a vertical line at the right of this column. • Write each leaf in the row to the right of its stem, in increasing order out from the stem. Original data: 9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70
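The three steps can be sketched in Python for the original data listed above. The sketch assumes non-negative whole numbers with the units digit as the leaf; the function name `stemplot` is illustrative, and showing stems that have no leaves (1 and 6 here) is a common convention adopted in this sketch, not something the steps require.

```python
from collections import defaultdict

# Original data from the stemplot slide.
data = [9, 9, 22, 32, 33, 39, 39, 42, 49, 52, 58, 70]


def stemplot(values):
    """Return the stemplot as a list of text lines, smallest stem first."""
    leaves = defaultdict(list)
    for v in sorted(values):              # sorting puts leaves in increasing order
        stem, leaf = divmod(v, 10)        # stem = all but the last digit, leaf = last digit
        leaves[stem].append(leaf)

    lines = []
    # Include every stem in the range so gaps in the data stay visible.
    for stem in range(min(leaves), max(leaves) + 1):
        row = " ".join(str(leaf) for leaf in leaves.get(stem, []))
        lines.append(f"{stem} | {row}".rstrip())
    return lines


print("Key: 0|9 = 9")
for line in stemplot(data):
    print(line)
```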
Stem-and-Leaf Plot • Example: Consider the following values – 96, 98, 107, 110, and 112. Construct a stem-and-leaf plot by using the units digits as the leaves. Stem-and-leaf plot for the data values. Key: 09|6 = 96
Stem | Leaf
09 | 6 8
10 | 7
11 | 0 2
Example: Stem-and-Leaf Plot A sample of the number of admissions to a psychiatric ward at a local hospital during the full phases of the moon is as follows: 22, 30, 21, 27, 31, 36, 20, 28, 25, 33, 21, 38, 32, 35, 26, 19, 43, 30, 30, 34, 27, and 41. Display the data in a stem-and-leaf plot with the leaves represented by the units digits. Key: 1|9 = 19
Stem | Leaf
1 | 9
2 | 0 1 1 2 5 6 7 7 8
3 | 0 0 0 1 2 3 4 5 6 8
4 | 1 3
Variations of the StemPlot • Splitting Stems – (too few stems or classes) Split stems to double the number of stems when all the leaves would otherwise fall on just a few stems. • Each stem appears twice. • Leaves 0-4 go on the 1st stem and leaves 5-9 go on the 2nd stem. • Example: data – 120, 121, 121, 123, 124, 124, 125, 125, 125, 126, 126, 128, 129, 130, 132, 132, 133, 134, 134, 134, 135, 137, 138, 138, 138, 139
StemPlot
12 | 0113445556689
13 | 0223444578889
StemPlot (splitting stems)
12 | 011344
12 | 5556689
13 | 0223444
13 | 578889
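Splitting stems amounts to grouping by the pair (stem, lower or upper half). A minimal sketch for the example data, assuming non-negative whole numbers; the function name `split_stemplot` is illustrative, and half-stems with no leaves are simply omitted in this version.

```python
# Example data from the splitting-stems slide.
data = [120, 121, 121, 123, 124, 124, 125, 125, 125, 126, 126, 128, 129,
        130, 132, 132, 133, 134, 134, 134, 135, 137, 138, 138, 138, 139]


def split_stemplot(values):
    """Stemplot with each stem split in two: leaves 0-4, then leaves 5-9."""
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        half = 0 if leaf <= 4 else 1          # 0 = first copy of the stem, 1 = second
        rows.setdefault((stem, half), []).append(leaf)

    # Sorting the (stem, half) keys lists the 0-4 row before the 5-9 row.
    return [f"{stem} | {''.join(str(leaf) for leaf in leaves)}"
            for (stem, half), leaves in sorted(rows.items())]


for line in split_stemplot(data):
    print(line)
```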
Stem-and-Leaf plots versus Histograms • Stem-and-leaf displays show the distribution of a quantitative variable, like histograms do, while preserving the individual values. • Stem-and-leaf displays contain all the information found in a histogram and, when carefully drawn, satisfy the area principle and show the distribution.
Example: Stem-and-Leaf Plot • Compare the histogram and stem-and-leaf display for the pulse rates of 24 women at a health clinic. Which graphical display do you prefer? Key: 5|6 = 56
Quantitative Data DOTPLOTS
Dot Plots • A dot plot is a plot that displays a dot for each value in a data set along a number line. If there are multiple occurrences of a specific value, then the dots will be stacked vertically.
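A text-only dot plot gives a feel for the stacking rule. The sketch below reuses the athletes' pulse rates from the ungrouped frequency example and assumes whole-number data; the function name `text_dotplot` and the use of `*` for dots are choices made here for illustration.

```python
from collections import Counter

# At-rest pulse rates from the ungrouped frequency example.
pulse_rates = [57, 57, 56, 57, 58, 56, 54, 64, 53, 54, 54, 55, 57, 55, 60, 58]


def text_dotplot(values):
    """Return lines of a vertical-stack dot plot over an integer number line."""
    counts = Counter(values)
    axis = range(min(values), max(values) + 1)   # every integer on the number line

    lines = []
    # Draw from the top level down so repeated values stack vertically.
    for level in range(max(counts.values()), 0, -1):
        row = "".join(" * " if counts[v] >= level else "   " for v in axis)
        lines.append(row.rstrip())

    # Axis labels, one 3-character column per integer value.
    lines.append("".join(f"{v:<3}" for v in axis).rstrip())
    return lines


for line in text_dotplot(pulse_rates):
    print(line)
```

Values that never occur (for example 59) still get a column, so gaps in the data remain visible.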
Dotplots A dotplot is a simple display. It just places a dot along an axis for each case in the data. The dotplot to the right shows Kentucky Derby winning times, plotting each race as its own dot. You might see a dotplot displayed horizontally or vertically.
Shape, Center, and Spread When describing a distribution, always discuss: • shape • center • spread
What is the Shape of the Distribution? • Does the histogram have a single, central hump or several separated bumps (discuss possible modes)? • Is the histogram symmetric or skewed? • Does it have any unusual features?
1. Humps Does the histogram have a single, central hump or several separated bumps? • Humps in a histogram are called modes or peaks. • A histogram with one main peak is considered unimodal; histograms with two peaks are bimodal; histograms with three or more peaks are called multimodal.
Humps (cont.) A bimodal histogram has two apparent peaks.
Humps (cont.) A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform or rectangular. • Every class has approximately equal frequency. • A uniform distribution is symmetric, with the added property that the bars are the same height.