
Exploring Data Distributions with Graphs

This chapter introduces different graphs for displaying distributions of categorical and quantitative variables. Learn how to construct bar graphs, histograms, stemplots, and more.

alatham

Presentation Transcript


  1. Chapter 3 Looking at Data: Distributions • Introduction • 3.1 Displaying Distributions with Graphs

  2. 3.1 Displaying Distributions with Graphs • Variables • Examining Distributions of Variables • Graphs for Categorical Variables • Bar graphs • Pie charts • Graphs for Quantitative Variables • Histograms • Stemplots • Time plots

  3. Statistics Statistics is the science of learning from data. The first step in dealing with data is to organize your thinking about the data. An exploratory data analysis is the process of using statistical tools and ideas to examine data in order to describe their main features. Exploring Data • Begin by examining each variable by itself. Then move on to study the relationships among the variables. • Begin with a graph or graphs. Then add numerical summaries of specific aspects of the data.

  4. Variables We construct a set of data by first deciding which cases (also called observations, individuals, or units) we want to study. For each case, we record information about characteristics that we call variables. • Individual: an object described by data. • Variable: a characteristic of the individual. • Categorical variable: places an individual into one of several groups or categories. • Quantitative variable: takes numerical values for which arithmetic operations make sense.

  5. Quantitative Variables • Quantitative variables can be counts, measurements, or rates. • See Example 3.6 on page 84 (in the Chapter 3 introduction) for why rates are important. • Rate = (number of occurrences of the event / number of possible occurrences in the population) × X, where X is a large number such as 10,000 or 100,000. • Murder rate in NH County = (number of murders in NH County / population of NH County) × 100,000. • In 2012 there were 9 murders and the population estimate was 209,234, so the raw proportion is 9/209,234 = 0.00004301. Multiplying by 100,000 gives 0.00004301 × 100,000 = 4.3 murders per 100,000 people. • Guilford County: 26/500,879 × 100,000 = 5.2 murders per 100,000 people.
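
The rate arithmetic on this slide can be checked with a short Python sketch. The function name `rate_per` and its `per` parameter are illustrative choices, not something from the textbook or JMP; the counts and population figures are the ones quoted on the slide.

```python
def rate_per(occurrences, population, per=100_000):
    """Return the number of occurrences per `per` members of the population."""
    if population <= 0:
        raise ValueError("population must be positive")
    return occurrences / population * per

# Counts and population estimates as quoted on the slide
nh_rate = rate_per(9, 209_234)
guilford_rate = rate_per(26, 500_879)

print(f"NH County:       {nh_rate:.1f} murders per 100,000")
print(f"Guilford County: {guilford_rate:.1f} murders per 100,000")
```

Scaling by 100,000 turns a proportion that is hard to read into a number that can be compared across counties of different sizes.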

  6. Distribution of a Variable To examine a single variable, we graphically display its distribution. • The distribution of a variable tells us what values it takes and how often it takes each of these values. • Distributions can be displayed using a variety of graphical tools. The proper choice of graph depends on the kind of variable and on how easy the graph is to draw. JMP makes easy work of graphing! • Categorical variable: pie chart (I don’t recommend pie charts!) or bar graph (these are fine!). • Quantitative variable: histogram or stemplot.

  7. Categorical Variables • The distribution of a categorical variable lists the categories and gives the count or percent of individuals who fall into each category. • Pie charts show the distribution of a categorical variable as a “pie” whose slices are sized by the counts or percents for the categories. They are hard to draw and hard to interpret! • Bar graphs represent each category as a bar whose height shows the category count or percent.
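
As a rough illustration of what "lists the categories and gives the count or percent" means, here is a minimal Python sketch. The function name `categorical_distribution` and the `majors` data are hypothetical, made up for this example; the text bars only stand in for a real bar graph.

```python
from collections import Counter

def categorical_distribution(values):
    """Map each category to (count, percent of all individuals)."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, 100 * n / total) for cat, n in counts.items()}

# Hypothetical data: declared majors of eight students
majors = ["Biology", "Math", "Biology", "Nursing",
          "Math", "Biology", "Nursing", "Biology"]
dist = categorical_distribution(majors)

# Crude text bar graph: one '#' per individual, tallest bar first
for cat, (n, pct) in sorted(dist.items(), key=lambda kv: -kv[1][0]):
    print(f"{cat:<8} {'#' * n} {n} ({pct:.1f}%)")
```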

  8. Pie Charts and Bar Graphs

  9. Quantitative Variables • The distribution of a quantitative variable tells us what values the variable takes on and the frequency with which it takes on those values. • Histograms show the distribution of a quantitative variable by using bars whose height represents the number of individuals who take on a value within a particular class. • Stemplots separate each observation into a stem and a leaf that are then plotted to display the distribution while maintaining the original values of the variable – original values are not hidden as in the histogram. • Time plots plot each observation (on the vertical axis) against the time at which it was measured (on the horizontal axis).

  10. Stemplots • To construct a stemplot: • Separate each observation into a stem (first part of the number) and a leaf (the remaining part of the number). • Write the stems in a vertical column; draw a vertical line to the right of the stems. • Write each leaf in the row to the right of its stem; order leaves if desired.
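
The three construction steps can be sketched in Python. This is one possible reading, assuming non-negative whole-number data with the tens (and above) as the stem and the ones digit as the leaf; the function name `stemplot` and the sample weights are hypothetical.

```python
from collections import defaultdict

def stemplot(values):
    """Return stemplot rows: stem = all but the last digit, leaf = last digit."""
    if not values:
        return []
    rows = defaultdict(list)
    for v in sorted(values):            # sorting keeps the leaves in order
        stem, leaf = divmod(int(v), 10)
        rows[stem].append(leaf)
    lines = []
    # Include stems with no leaves so gaps in the data stay visible
    for stem in range(min(rows), max(rows) + 1):
        leaves = "".join(str(d) for d in rows.get(stem, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines

# Hypothetical weights in pounds; key: 20 | 3 means 203 pounds
for line in stemplot([152, 203, 155, 100, 116, 125, 130]):
    print(line)
```

Printing the empty stems is a deliberate choice here: skipping them would hide gaps that help reveal the shape of the distribution.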

  11. Stemplots Example: Weight data, Introductory Statistics class. Stems = 10s, leaves = 1s. Key: 20|3 means 203 pounds.
      Stems | Leaves
      10 | 0166
      11 | 009
      12 | 0034578
      13 | 00359
      14 | 08
      15 | 00257
      16 | 555
      17 | 000255
      18 | 000055567
      19 | 245
      20 | 3
      21 | 025
      22 | 0
      23 |
      24 |
      25 |
      26 | 0

  12. Stemplots • If there are very few stems (when the data cover only a very small range of values), then we may want to create more stems by splitting the original stems. • Example: If all of the data values were between 150 and 179, then we may choose to use the following stems: 15, 15, 16, 16, 17, 17. Leaves 0–4 would go on each upper stem (first “15”), and leaves 5–9 would go on each lower stem (second “15”).
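
Splitting stems can be sketched the same way. The Python below is an illustration under the slide's rule (leaves 0 to 4 on the upper copy of a stem, leaves 5 to 9 on the lower copy), assuming non-negative whole numbers; `split_stemplot` and the data values are hypothetical.

```python
def split_stemplot(values):
    """Stemplot with every stem written twice: leaves 0-4 first, then 5-9."""
    if not values:
        return []
    rows = {}
    for v in sorted(values):
        stem, leaf = divmod(int(v), 10)
        half = 0 if leaf <= 4 else 1     # 0 = upper stem, 1 = lower stem
        rows.setdefault((stem, half), []).append(leaf)
    low, high = min(rows)[0], max(rows)[0]
    lines = []
    for stem in range(low, high + 1):
        for half in (0, 1):
            leaves = "".join(str(d) for d in rows.get((stem, half), []))
            lines.append(f"{stem:>3} | {leaves}".rstrip())
    return lines

# Hypothetical values, all between 150 and 179
for line in split_stemplot([150, 153, 157, 161, 168, 169, 172, 179]):
    print(line)
```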

  13. Histograms • For quantitative variables that take many values and/or large datasets: • Divide the possible values into classes of equal width. • Count how many observations fall into each class (counts may be converted to percents). • Draw a picture representing the distribution: each bar's height equals the number (or percent) of observations in its class. • JMP does all three of the above with a couple of clicks: Analyze -> Distribution -> choose the variable to be plotted.
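
For readers without JMP, the first two steps (equal-width classes, then counts) can be sketched in Python. This is not what JMP does internally; `class_counts`, its parameters, and the weights are hypothetical, and each class is taken to include its lower endpoint but not its upper one.

```python
def class_counts(values, start, width, n_classes):
    """Count observations in classes [start + i*width, start + (i+1)*width)."""
    counts = [0] * n_classes
    for v in values:
        i = int((v - start) // width)
        if 0 <= i < n_classes:           # values outside every class are ignored
            counts[i] += 1
    return counts

# Hypothetical weights in pounds
weights = [100, 116, 125, 130, 152, 155, 175, 180, 185, 203]
counts = class_counts(weights, start=100, width=20, n_classes=6)

# Text histogram: bar length equals the class count
for i, n in enumerate(counts):
    low = 100 + 20 * i
    pct = 100 * n / len(weights)
    print(f"{low}-{low + 19}: {'#' * n} {n} ({pct:.0f}%)")
```

The choice of class width changes how the picture looks, so it is worth trying a few widths before describing the shape.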

  14. Histograms Example: Weight data, Introductory Statistics class

  15. Examining Distributions • In any graph of data, look for the overall pattern and for striking deviations from that pattern. • You can describe the overall pattern by its shape, center, and spread. • An important kind of deviation is an outlier, an individual that falls outside the overall pattern.

  16. Examining Distributions - Shape • A distribution is symmetric if the right and left sides of the graph are approximately mirror images of each other. • A distribution is skewed to the right (right-skewed) if the right side of the graph (containing the half of the observations with larger values) is much longer than the left side. • A distribution is skewed to the left (left-skewed) if the left side of the graph is much longer than the right side. (Figure panels: symmetric, skewed right, skewed left.)

  17. Outliers An important kind of deviation is an outlier. Outliers are observations that lie outside the overall pattern of a distribution. Always look for outliers and try to explain them. This overall pattern is fairly symmetrical, except for two states that clearly do not belong to the main trend: Alaska and Florida have unusual representation of the elderly in their populations. Large gaps in the distribution are places to look for outliers.
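
The remark about large gaps suggests a simple screening aid, sketched below in Python. The function `largest_gaps` and the data are hypothetical, and a large gap only marks a place to look; it is not a rule for declaring an observation an outlier.

```python
def largest_gaps(values, k=1):
    """Return the k largest gaps between neighboring sorted values
    as (gap, lower value, upper value) tuples, largest first."""
    s = sorted(values)
    gaps = [(b - a, a, b) for a, b in zip(s, s[1:])]
    return sorted(gaps, reverse=True)[:k]

# Hypothetical weights in pounds
data = [150, 155, 160, 162, 170, 220, 260]
for gap, low, high in largest_gaps(data, k=2):
    print(f"gap of {gap} between {low} and {high}")
```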

  18. Time Plots A time plot shows behavior of a quantitative variable over time. • Time is always on the horizontal axis, and the variable being plotted is on the vertical axis. • Look for an overall pattern (trend), and deviations from this trend. Connecting the data points by lines may emphasize this trend. • Look for patterns that repeat at known regular intervals (seasonal variations) • Go over the US Regular Retail Gas Prices in JMP

  19. Time Plots Look at the gas price data…

  20. HW: Use JMP whenever possible to draw the graphs. Begin reading the Introduction to Chapter 3 and Section 3.1; work through the Examples in 3.1; do Exercises 3.7, 3.10-3.14, 3.21, 3.24, 3.25, 3.27, 3.32, 3.33-3.36, 3.38 (JMP), 3.39
