Chapter 4 Displaying Quantitative Data
Quantitative variables • Quantitative variables record measurements or amounts of something. They must have units, or at least be variables whose numbers act as numerical values.
Types of Displays • Histogram • Stem and Leaf Displays • Dotplots
Histogram • A histogram uses adjacent bars to show the distribution of values in a quantitative variable. • It looks very similar to a bar graph, but there are differences. • The horizontal axis is a continuous number scale, not just a set of category labels.
An example • The histogram shown below gives the number of children who visited a particular zoo.
Histogram • A histogram is more convenient than a dotplot or a stem-and-leaf plot because you don't have to represent each data point. However, you don't get to see the value of each data point, so a table of the data and summary statistics would help people interpret it.
Be Careful • A histogram gives the number of data points that fall into each of a set of equal-width intervals. Choose the intervals with care: the choice can change the shape of the graph and misrepresent the true data.
1st graph • The first graph uses intervals of size 10, yielding the intervals 40-50, 50-60, etc. In this case, Yemen had a life expectancy of 50 and was placed in the 50-60 column. Usually, borderline values are placed in the higher column.
2nd Graph • In the second graph, the intervals are 40-45, 45-50, 50-55, etc. This affects the shape of the graph.
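The two graphs differ only in how the values are sorted into intervals. The sketch below illustrates this. It assumes left-closed intervals, so a value on a boundary (such as 50) goes into the higher interval, as described above. The function name `bin_counts` and the life-expectancy values are hypothetical and were made up for illustration. Intervals that contain no values are simply left out of the result.

```python
import math

def bin_counts(values, start, width):
    """Count values in equal-width, left-closed intervals [lo, lo + width).

    A value exactly on a boundary (e.g. 50 with width 10) is counted in the
    higher interval. Returns (lower edge, count) pairs; empty intervals are omitted.
    """
    counts = {}
    for x in values:
        k = math.floor((x - start) / width)  # index of the interval holding x
        lo = start + k * width               # lower edge of that interval
        counts[lo] = counts.get(lo, 0) + 1
    return sorted(counts.items())

# Hypothetical life-expectancy values, for illustration only.
ages = [44, 47, 50, 52, 53, 56, 58, 61]
print(bin_counts(ages, 40, 10))  # [(40, 2), (50, 5), (60, 1)]
print(bin_counts(ages, 40, 5))   # [(40, 1), (45, 1), (50, 3), (55, 2), (60, 1)]
```

Both calls use the same eight values. With intervals of width 10 there is a single tall bar. With width 5 the values spread across five shorter bars.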
Stem and Leaf Displays • Shows quantitative data values in a way that sketches the distribution of the data. • The stem-and-leaf plot below shows the number of students enrolled in a dance class in each of the past 12 years. • The numbers of students were 81, 84, 85, 86, 93, 94, 97, 100, 102, 103, 110, and 111.
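Here is a minimal sketch of how the dance-class enrollments could be turned into a stem-and-leaf display. It assumes non-negative whole numbers, with the last digit as the leaf and all earlier digits as the stem. The function name `stem_and_leaf` is hypothetical. Stems that have no leaves are still printed, so gaps in the data stay visible.

```python
def stem_and_leaf(values):
    """Return stem-and-leaf rows for non-negative whole numbers.

    Stem = every digit except the last (v // 10); leaf = last digit (v % 10).
    """
    rows = {}
    for v in sorted(values):
        rows.setdefault(v // 10, []).append(v % 10)
    width = len(str(max(rows)))  # right-align stems of different lengths
    lines = []
    for stem in range(min(rows), max(rows) + 1):  # keep empty stems so gaps show
        leaves = " ".join(str(leaf) for leaf in rows.get(stem, []))
        lines.append(f"{stem:>{width}} | {leaves}".rstrip())
    return lines

enrollment = [81, 84, 85, 86, 93, 94, 97, 100, 102, 103, 110, 111]
for line in stem_and_leaf(enrollment):
    print(line)
```

For the enrollment data this prints the stems 8, 9, 10, and 11 with leaves `1 4 5 6`, `3 4 7`, `0 2 3`, and `0 1`.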
Dotplot • Graphs a dot for each case against a single axis • Graph the following numbers: 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, etc.
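A text dotplot can be sketched in a few lines. This version is drawn sideways: each row is one value on the axis, and each `o` is one case. A hand-drawn dotplot would instead stack the dots above a horizontal axis. Only the values listed explicitly above are used here; the "etc." values are not filled in. The function name `text_dotplot` is hypothetical.

```python
from collections import Counter

def text_dotplot(values, symbol="o"):
    """Sideways dotplot: one row per distinct value, one symbol per case."""
    counts = Counter(values)
    width = max(len(str(v)) for v in counts)  # align the value labels
    return [f"{v:>{width}} | {symbol * counts[v]}" for v in sorted(counts)]

data = [5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10]
for row in text_dotplot(data):
    print(row)
```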
Dotplot with two sets of data Example
Shape • To describe the shape of a distribution, look for • Symmetry versus skewness • Single versus multiple modes
Symmetrical • A distribution is symmetric if the two halves on either side of the center look approximately like mirror images of each other.
Symmetrical • Symmetrical Histogram
Dotplot • The dots on either side of the center form approximate mirror images
Stem and leaf • Example
Skewed • A distribution is skewed if it is not symmetric and one tail stretches out farther than the other. • Skewed left: the longer tail stretches to the left. • Skewed right: the longer tail stretches to the right.
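The slides judge skewness by eye. One common numerical check, which is not part of these slides, is the moment coefficient of skewness g1 = m3 / m2^1.5, where m2 and m3 are the average squared and cubed deviations from the mean. A positive g1 suggests a longer right tail, a negative g1 a longer left tail, and a value near 0 a roughly symmetric shape. The function name `skewness` and the sample values below are hypothetical. The sketch assumes the values are not all equal.

```python
def skewness(values):
    """Moment coefficient of skewness g1 = m3 / m2**1.5 (population moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in values) / n  # average cubed deviation
    return m3 / m2 ** 1.5

right_tailed = [1, 2, 2, 3, 3, 3, 4, 4, 12]  # hypothetical: one long right tail
print(skewness(right_tailed) > 0)                   # True  (skewed right)
print(skewness([-x for x in right_tailed]) < 0)     # True  (skewed left)
print(skewness([1, 2, 3, 4, 5]))                    # 0.0   (symmetric)
```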
Examples • Skewed right
Skewed left • Left
All three • Examples
Funny example • http://www.herkimershideaway.org/apstatistics/ymmsum99/ymm111.htm
New seating chart • http://www.random.org/integers/
Stem-and-Leaf Revisited Compare the histogram and stem-and-leaf display for the pulse rates of 24 women at a health clinic. Which graphical display do you prefer?
Think Before You Draw, Again • Remember the “Make a picture” rule? • Now that we have options for data displays, you need to Think carefully about which type of display to make. • Before making a stem-and-leaf display, a histogram, or a dotplot, check the Quantitative Data Condition: The data are values of a quantitative variable whose units are known.
Constructing a Stem-and-Leaf Display • First, cut each data value into leading digits (“stems”) and trailing digits (“leaves”). • Use the stems to label the bins. • Use only one digit for each leaf: either round or truncate the data values to one decimal place after the stem.
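The "one digit per leaf" step can be sketched as follows. The function takes a value and the place value of the leaf digit (`leaf_unit`, e.g. `"0.1"` for the first decimal place). It truncates or rounds away any extra digits and then splits what remains into a stem and a single-digit leaf. The function name and the `mode` parameter are hypothetical. The sketch assumes non-negative values passed as strings, so that Python's `decimal` module keeps them exact; plain floats can mis-split values such as 0.3 because of binary rounding.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def split_stem_leaf(value, leaf_unit="0.1", mode="truncate"):
    """Split a non-negative value into (stem, leaf) with a one-digit leaf.

    mode="truncate" drops extra digits; mode="round" rounds half up.
    """
    rounding = ROUND_DOWN if mode == "truncate" else ROUND_HALF_UP
    # Scale so the leaf digit becomes the units digit, then drop/round the rest.
    scaled = (Decimal(value) / Decimal(leaf_unit)).quantize(Decimal("1"), rounding=rounding)
    whole = int(scaled)
    return whole // 10, whole % 10  # (stem, leaf)

for v in ["3.14", "3.27", "4.05"]:
    print(v, split_stem_leaf(v), split_stem_leaf(v, mode="round"))
```

Truncating 3.27 gives stem 3 and leaf 2, while rounding gives leaf 3. Either choice is fine as long as it is applied consistently to every value.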
Center • A value that attempts the impossible by summarizing the entire distribution with a single number, a “typical” value. • Measures include the mean and median.
Spread • A numerical summary of how tightly the values are clustered around the center. • Measures of spread include the IQR and standard deviation.
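As a preview of these measures, the sketch below applies Python's standard `statistics` module to the dance-class enrollments listed earlier. IQR values depend on the quartile convention. This sketch uses `statistics.quantiles` with its default `"exclusive"` method, which may not match the method your textbook or calculator uses. The standard deviation shown is the sample standard deviation.

```python
import statistics

enrollment = [81, 84, 85, 86, 93, 94, 97, 100, 102, 103, 110, 111]

# Center
mean = statistics.mean(enrollment)
median = statistics.median(enrollment)

# Spread
q1, _, q3 = statistics.quantiles(enrollment, n=4)  # default method="exclusive"
iqr = q3 - q1
sd = statistics.stdev(enrollment)                    # sample standard deviation

print(f"mean={mean}, median={median}, IQR={iqr}, SD={sd:.2f}")
# mean=95.5, median=95.5, IQR=17.5, SD=10.10
```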
Mode • A hump or local high point in the shape of the distribution of a variable is called a mode. The apparent location of modes can change as the scale of a histogram is changed.
Unimodal • Having one mode.
Bimodal • Distribution with two modes
Uniform A histogram that doesn’t appear to have any mode and in which all the bars are approximately the same height is called uniform:
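Counting humps can be sketched as a rough rule applied to a list of bar heights. A bar counts as a mode if it is higher than its neighbors, and runs of equal-height bars are merged so a flat-topped hump counts once. If every bar is the same height, the result is 0, matching the uniform case. The function name `count_modes` is hypothetical. Only exactly equal bars count as flat, so small wiggles will register as extra humps; this is a rough aid, not a replacement for looking at the picture. Because the heights come from the bins, changing the bin width can change the count.

```python
def count_modes(bar_heights):
    """Roughly count the humps (local highs) in a list of histogram bar heights."""
    # Merge runs of equal heights so a flat-topped hump is counted once.
    heights = [h for i, h in enumerate(bar_heights) if i == 0 or h != bar_heights[i - 1]]
    if len(heights) <= 1:
        return 0  # all bars the same height (uniform): no mode
    modes = 0
    for i, h in enumerate(heights):
        higher_than_left = i == 0 or h > heights[i - 1]
        higher_than_right = i == len(heights) - 1 or h > heights[i + 1]
        if higher_than_left and higher_than_right:
            modes += 1
    return modes

print(count_modes([1, 2, 5, 2, 1]))  # 1: unimodal
print(count_modes([1, 3, 1, 4, 2]))  # 2: bimodal
print(count_modes([4, 4, 4, 4]))     # 0: uniform
```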
Anything Unusual? • Do any unusual features stick out? • Sometimes it’s the unusual features that tell us something interesting or exciting about the data. • You should always mention any stragglers, or outliers, that stand off away from the body of the distribution. • Are there any gaps in the distribution? If so, we might have data from more than one group.
Outliers The following histogram has outliers—there are three cities in the leftmost bar:
Outliers • Are extreme values that do not appear to belong to the rest of the data. They may be unusual values that deserve further investigation, or they may be just mistakes; there’s no obvious way to tell. Do not delete them. Outliers can affect many statistical analyses, so you should always be alert to them.
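The slides note that there is no obvious way to tell whether an outlier is a mistake. One common convention, which is not taken from these slides, is the 1.5 × IQR rule. It flags any value more than 1.5 IQRs below Q1 or above Q3 as a candidate for a closer look. The sketch below only flags values and never removes them, consistent with the advice above. The function name `flag_outliers` and the data are hypothetical. The quartiles use the default method of Python's `statistics.quantiles`.

```python
import statistics

def flag_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]; the data are left unchanged."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [x for x in values if x < low or x > high]

data = [10, 11, 12, 12, 13, 13, 14, 15, 40]  # hypothetical values
print(flag_outliers(data))  # [40]
```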
Outliers • Away from the main portion of data
Where is the Center of the Distribution? • If you had to pick a single number to describe all the data, what would you pick? • It’s easy to find the center when a histogram is unimodal and symmetric: it’s right in the middle. • On the other hand, it’s not so easy to find the center of a skewed histogram or a histogram with more than one mode. • For now, we will “eyeball” the center of the distribution. In the next chapter we will find the center numerically.
How Spread Out is the Distribution? • Variation matters, and Statistics is about variation. • Are the values of the distribution tightly clustered around the center or more spread out? • In the next two chapters, we will talk about spread…
Comparing Distributions • Often we would like to compare two or more distributions instead of looking at one distribution by itself. • When looking at two or more distributions, it is very important that the histograms have been put on the same scale. Otherwise, we cannot really compare the two distributions. • When we compare distributions, we talk about the shape, center, and spread of each distribution.
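"The same scale" can be made concrete by binning both groups with one shared set of intervals. The sketch below takes its common left edge and number of bins from the combined data, so each count position means the same interval in both groups. The function name `shared_histograms` and the ages are hypothetical and were made up for illustration; they are not the heart-attack data in the example that follows.

```python
import math

def shared_histograms(a, b, width):
    """Bin two data sets with the SAME interval edges so their counts are comparable."""
    combined = list(a) + list(b)
    start = math.floor(min(combined) / width) * width             # common left edge
    n_bins = math.floor((max(combined) - start) / width) + 1      # common number of bins

    def counts(values):
        c = [0] * n_bins
        for x in values:
            c[math.floor((x - start) / width)] += 1
        return c

    return start, counts(a), counts(b)

group_a = [52, 67, 71, 78, 84]  # hypothetical ages
group_b = [45, 49, 58, 63, 70]  # hypothetical ages
print(shared_histograms(group_a, group_b, 10))
# (40, [0, 1, 1, 2, 1], [2, 1, 1, 1, 0])
```

Both groups share the intervals 40-50, 50-60, and so on up to 80-90. Any difference between the two count lists therefore reflects the data and not the binning.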
Example Compare the following distributions of ages for female and male heart attack patients:
Timeplots: Order, Please! • For some data sets, we are interested in how the data behave over time. In these cases, we construct timeplots of the data.
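A rough text stand-in for a timeplot is sketched below. It puts the observations in time order (the key step) and then places a marker farther right for larger values, with time running down the page rather than across it. It assumes the data arrive as `(time, value)` pairs with non-negative values. The function name `text_timeplot`, the `scale` parameter, and the weekly counts are hypothetical. A real timeplot would also typically connect consecutive points with line segments.

```python
def text_timeplot(observations, scale=1):
    """Sideways timeplot: one row per time point, in time order, marker at value/scale."""
    rows = []
    for t, value in sorted(observations):  # sort by time: order is what matters
        rows.append(f"{t:>3} | " + " " * round(value / scale) + "*")
    return rows

weekly_counts = [(3, 4), (1, 2), (2, 5)]  # hypothetical (week, count) pairs, out of order
for row in text_timeplot(weekly_counts):
    print(row)
```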
*Re-expressing Skewed Data to Improve Symmetry (cont.) One way to make a skewed distribution more symmetric is to re-express or transform the data by applying a simple function (e.g., logarithmic function). Note the change in skewness from the raw data (Figure 4.11) to the transformed data (Figure 4.12):
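The effect of a log re-expression can be checked with a few lines of code. The values below are hypothetical and chosen so the effect is easy to see; they are not the data behind Figures 4.11 and 4.12. As a rough symmetry check, the sketch compares the mean and median. In right-skewed data the long tail pulls the mean well above the median, and after re-expression the two should be close. The logarithm requires all values to be positive.

```python
import math
import statistics

raw = [1, 10, 100, 1000, 10000]         # hypothetical, strongly right-skewed
logged = [math.log10(x) for x in raw]   # re-express with a base-10 logarithm

for name, values in [("raw", raw), ("log10", logged)]:
    print(name, "mean:", statistics.mean(values), "median:", statistics.median(values))
# raw:   the mean (2222.2) sits far above the median (100)
# log10: the mean and median are both 2.0
```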
What Can Go Wrong? • Don’t make a histogram of a categorical variable; bar charts or pie charts should be used for categorical data. • Don’t look for shape, center, and spread of a bar chart.
What Can Go Wrong? (cont.) • Don’t use bars in every display; save them for histograms and bar charts. • Below is a badly drawn timeplot and the proper histogram for the number of eagles sighted in a collection of weeks: