Topic 1 Organizing Information Pictorially Using Charts and Graphs
Characteristics of the individuals under study are called variables • Some variables have values that are attributes or characteristics … those are called qualitative or categorical variables • Some variables have values that are numeric measurements … those are called quantitative variables • The suggested approaches to analyzing problems vary by the type of variable
Examples of categorical variables • Gender • Zip code • Blood type • States in the United States • Brands of televisions • Categorical variables have category values … those values cannot be added, subtracted, etc.
Examples of quantitative variables • Temperature • Height and weight • Sales of a product • Number of children in a family • Points achieved playing a video game • Quantitative variables have numeric values … those values can be added, subtracted, etc.
A simple data set is blue, blue, green, red, red, blue, red, blue • A frequency table for this qualitative data is: blue 4, green 1, red 3 • The most commonly occurring color is blue
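The counting behind a frequency table can be sketched in a few lines of Python. This is only an illustration using the eight colors from the slide; the variable names are chosen here and are not part of the original material.

```python
from collections import Counter

# The simple data set from the slide
colors = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]

# Count how many times each category occurs
frequency = Counter(colors)

# Print the frequency table, one category per row
for color, count in sorted(frequency.items()):
    print(f"{color:<6} {count}")

# The category with the highest count
most_common_color, most_common_count = frequency.most_common(1)[0]
print("Most common:", most_common_color)
```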
The relative frequencies are the proportions (or percents) of the observations out of the total • A relative frequency distribution lists • Each of the categories • The relative frequency for each category
A relative frequency table for this qualitative data is: blue 0.5, green 0.125, red 0.375 • A relative frequency table can also be constructed with percents (50%, 12.5%, and 37.5% for the above table)
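As a rough sketch, the relative frequencies for the same eight colors can be computed by dividing each count by the total number of observations. The names below are illustrative only.

```python
from collections import Counter

# The simple data set from the earlier slide
colors = ["blue", "blue", "green", "red", "red", "blue", "red", "blue"]
total = len(colors)

# Relative frequency = count for the category / total number of observations
relative_frequency = {
    color: count / total for color, count in Counter(colors).items()
}

# Show each category as a proportion and as a percent
for color, proportion in relative_frequency.items():
    print(f"{color:<6} {proportion:.3f}  {proportion:.1%}")
```

The proportions always add up to 1 (or 100%), which is a quick check that no observation was missed.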
Bar graphs for categorical data • Bar graphs for our simple data (using Excel) • Frequency bar graph • Relative frequency bar graph
Comparative Bar Graph • An example side-by-side bar graph comparing educational attainment in 1990 versus 2003
Pie Chart • An example of a pie chart
Histogram for quantitative data • Quantitative data sometimes cannot be put directly into frequency tables since they do not have any obvious categories • Categories are created using classes, or intervals of numbers • The data is then put into the classes
For ages of adults, a possible set of classes is 20 – 29, 30 – 39, 40 – 49, 50 – 59, 60 and older • For the class 30 – 39 • 30 is the lower class limit • 39 is the upper class limit • The class width is the difference between the lower class limits of two consecutive classes • For the class 30 – 39, the class width is 40 – 30 = 10
All the classes have the same width, except for the last class • The class “60 and older” is an open-ended class because it has no upper limit • Classes with no lower limit are also called open-ended classes
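One possible way to express these classes in code is shown below. It assumes ages are whole numbers of at least 20, and it computes the class width as the difference between consecutive lower class limits, which matches the 40 – 30 = 10 calculation. The function name `age_class` is made up for this sketch.

```python
# Lower class limits taken from the slide; the last class is open-ended
lower_limits = [20, 30, 40, 50, 60]

# Class width: difference between consecutive lower class limits
class_widths = [b - a for a, b in zip(lower_limits, lower_limits[1:])]

def age_class(age):
    """Return the class label for a whole-number adult age."""
    if age < 20:
        raise ValueError("the classes on the slide start at age 20")
    if age >= 60:
        return "60 and older"      # open-ended class: no upper limit
    lower = (age // 10) * 10       # lower class limit, e.g. 30 for age 37
    upper = lower + 9              # upper class limit, e.g. 39 for age 37
    return f"{lower}-{upper}"

print(class_widths)
print(age_class(37))
print(age_class(64))
```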
The classes and the number of values in each can be put into a frequency table • In this table, there are 1147 subjects between 30 and 39 years old
Good practices for constructing tables for continuous variables • The classes should not overlap • The classes should not have any gaps between them • The classes should have the same width (except for possible open-ended classes at the extreme low or extreme high ends) • The class boundaries should be “reasonable” numbers • The class width should be a “reasonable” number
Just as for discrete data, a histogram can be created from the frequency table • Instead of individual data values, the categories are the classes – the intervals of data
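The step from raw values to a histogram can be sketched as follows. The ages are hypothetical, made up for illustration, and are not the data behind the frequency table in the slides. The sketch sorts each age into its class and then draws a simple text histogram with one `#` per observation.

```python
from bisect import bisect_right

# Hypothetical ages (assumption: not the data used in the slides)
ages = [23, 35, 31, 47, 52, 38, 29, 64, 41, 36, 71, 25]

lower_limits = [20, 30, 40, 50, 60]
labels = ["20-29", "30-39", "40-49", "50-59", "60 and older"]

# Frequency table: count the values that fall into each class
counts = [0] * len(labels)
for age in ages:
    if age < lower_limits[0]:
        raise ValueError("age below the lowest class limit")
    # Index of the last lower limit that is <= age
    index = bisect_right(lower_limits, age) - 1
    counts[index] += 1

# Text histogram: the bar length is the class frequency
histogram_rows = [
    f"{label:<12} | {'#' * count}" for label, count in zip(labels, counts)
]
for row in histogram_rows:
    print(row)
```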
Stemplots • A stemplot (also called a stem-and-leaf plot) is a different way to represent data that is similar to a histogram • To draw a stem-and-leaf plot, each data value must be broken up into two components • The stem consists of all the digits except for the rightmost one • The leaf consists of the rightmost digit • For the number 173, for example, the stem would be “17” and the leaf would be “3”
Stemplots • In the stem-and-leaf plot below • The smallest value is 56 • The largest value is 180 • The second largest value is 178
Stemplots • To draw a stemplot • Write all the values in ascending order • Find the stems and write them vertically in ascending order • For each data value, write its leaf in the row next to its stem • The resulting leaves will also be in ascending order • The list of stems with their corresponding leaves is the stem-and-leaf plot
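The steps above can be sketched in Python as follows. The function name `stemplot` and the sample values are assumptions made for this illustration. The sketch also lists stems that have no leaves, which is a design choice so that gaps in the data stay visible.

```python
from collections import defaultdict

def stemplot(values):
    """Return the rows of a stem-and-leaf plot for non-negative integers."""
    rows = defaultdict(list)
    for value in sorted(values):          # write the values in ascending order
        stem, leaf = divmod(value, 10)    # leaf = rightmost digit, stem = the rest
        rows[stem].append(leaf)
    # Stems in ascending order, including stems with no leaves
    stems = range(min(rows), max(rows) + 1)
    return [
        f"{stem:>3} | {''.join(str(leaf) for leaf in rows[stem])}"
        for stem in stems
    ]

# Hypothetical data, made up for illustration
for row in stemplot([73, 56, 61, 78, 67, 82, 73]):
    print(row)
```

Because the values are sorted first, the leaves in each row come out in ascending order automatically.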
Comparative Stemplots If we wanted to compare two sets of data, we could draw two stem-and-leaf plots using the same stem, with leaves going left (for one set of data) and right (for the other set)
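A back-to-back layout of this kind might be produced as in the sketch below. The function name `back_to_back` and the two small data sets are hypothetical. The left-hand leaves are reversed so that they increase outward from the shared stem.

```python
def back_to_back(left_values, right_values):
    """Return rows of a comparative stemplot sharing one column of stems."""
    def leaves_by_stem(values):
        rows = {}
        for value in sorted(values):
            stem, leaf = divmod(value, 10)
            rows.setdefault(stem, []).append(str(leaf))
        return rows

    left = leaves_by_stem(left_values)
    right = leaves_by_stem(right_values)
    stems = range(min(min(left), min(right)), max(max(left), max(right)) + 1)
    # Pad the left side so the stems line up in one column
    width = max(len(leaves) for leaves in left.values())

    lines = []
    for stem in stems:
        left_part = "".join(reversed(left.get(stem, [])))
        right_part = "".join(right.get(stem, []))
        lines.append(f"{left_part:>{width}} | {stem} | {right_part}")
    return lines

# Hypothetical data sets, made up for illustration
for line in back_to_back([61, 65, 72], [58, 63, 63, 74]):
    print(line)
```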
A useful way to describe a variable is by the shape of its distribution • Some common distribution shapes are • Uniform • Bell-shaped (or normal) • Skewed right • Skewed left
A variable has a uniform distribution when • Each of the values tends to occur with the same frequency • The histogram looks flat
A variable has a bell-shaped distribution when • Most of the values fall in the middle • The frequencies tail off to the left and to the right • It is symmetric
A variable has a skewed right distribution when • The distribution is not symmetric • The tail to the right is longer than the tail to the left • The arrow from the middle to the long tail points right
A variable has a skewed left distribution when • The distribution is not symmetric • The tail to the left is longer than the tail to the right • The arrow from the middle to the long tail points left
The two graphs show the same data … the difference seems larger for the graph on the left • The vertical scale is truncated on the left
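The effect of a truncated vertical scale can be put into numbers. In the sketch below the two values and the axis starting point are hypothetical. The function compares how tall the two bars are drawn when the axis starts at zero and when it starts just below the smaller value.

```python
def apparent_ratio(value_a, value_b, axis_start=0.0):
    """Ratio of the drawn bar heights when the vertical axis begins at axis_start."""
    return (value_b - axis_start) / (value_a - axis_start)

# Hypothetical values, made up for illustration
a, b = 50.0, 55.0

full_scale = apparent_ratio(a, b)          # axis starts at 0
truncated = apparent_ratio(a, b, 45.0)     # axis starts at 45

print(f"Axis from 0:  second bar is {full_scale:.2f} times as tall")
print(f"Axis from 45: second bar is {truncated:.2f} times as tall")
```

With these numbers a 10% difference is drawn as one bar twice the height of the other once the axis is truncated.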
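The gazebo on the right is twice as large in each dimension as the one on the left • However, it looks much more than twice as large as the one on the left: doubling both the width and the height of a picture quadruples its area • (Figure: two gazebos, labeled Original and “Twice” as large)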
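The arithmetic behind this distortion is short enough to check directly. This sketch assumes a picture scaled by the same factor in every dimension; the function name `size_ratios` is made up here.

```python
def size_ratios(scale):
    """How length, area, and volume grow when every dimension is multiplied by scale."""
    return {
        "length": scale,
        "area": scale ** 2,     # a flat picture grows with the square of the scale
        "volume": scale ** 3,   # a solid object grows with the cube of the scale
    }

ratios = size_ratios(2)
print(ratios)
```

So a picture that is "twice as large" in each dimension covers four times the area, and the object it depicts would have eight times the volume.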