Slides by JOHN LOUCKS St. Edward’s University

  1. Slides by JOHN LOUCKS St. Edward’s University

  2. Chapter 2, Part B: Descriptive Statistics: Tabular and Graphical Presentations • Exploratory Data Analysis: Stem-and-Leaf Display • Crosstabulations and Scatter Diagrams

  3. Exploratory Data Analysis • The techniques of exploratory data analysis consist of simple arithmetic and easy-to-draw pictures that can be used to summarize data quickly. • One such technique is the stem-and-leaf display.

  4. Stem-and-Leaf Display • A stem-and-leaf display shows both the rank order and the shape of the distribution of the data. • It is similar to a histogram turned on its side, but it has the advantage of showing the actual data values. • The leading digits of each data item are arranged to the left of a vertical line. • To the right of the vertical line, we record the last digit of each item in rank order. • Each line in the display is referred to as a stem. • Each digit on a stem is a leaf.

  5. Example: Hudson Auto Repair The manager of Hudson Auto would like to have a better understanding of the cost of parts used in the engine tune-ups performed in the shop. She examines 50 customer invoices for tune-ups. The costs of parts, rounded to the nearest dollar, are listed on the next slide.

  6. Example: Hudson Auto Repair • Sample of Parts Cost ($) for 50 Tune-ups

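  7. Stem-and-Leaf Display
       5 | 2 7
       6 | 2 2 2 2 5 6 7 8 8 8 9 9 9
       7 | 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9
       8 | 0 0 2 3 5 8 9
       9 | 1 3 7 7 7 8 9
      10 | 1 4 5 5 9
      (The numbers to the left of the line, such as 5, are stems; each digit to the right, such as the 2 in the first row, is a leaf.)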
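
The construction described on slides 4 and 7 can be sketched in Python. This is a minimal illustration, not part of the original slides: the raw listing from slide 6 did not survive in this transcript, so the 50 costs below are reconstructed from the display itself, and `stem_and_leaf` is a hypothetical helper that assumes non-negative whole-dollar values with a leaf unit of 1.

```python
from collections import defaultdict

# Hudson Auto parts costs ($), reconstructed from the display above
# (listed in sorted order; the original invoice order is unknown).
costs = [
    52, 57,
    62, 62, 62, 62, 65, 66, 67, 68, 68, 68, 69, 69, 69,
    71, 71, 72, 72, 73, 74, 74, 75, 75, 75, 76, 77, 78, 79, 79, 79,
    80, 80, 82, 83, 85, 88, 89,
    91, 93, 97, 97, 97, 98, 99,
    101, 104, 105, 105, 109,
]

def stem_and_leaf(values):
    """Return {stem: [leaves]} for non-negative integers, leaf unit 1."""
    display = defaultdict(list)
    for v in sorted(values):          # sorting puts the leaves in rank order
        stem, leaf = divmod(v, 10)    # leading digit(s) and last digit
        display[stem].append(leaf)
    # Keep empty stems so that gaps in the data stay visible.
    return {s: display.get(s, []) for s in range(min(display), max(display) + 1)}

for stem, leaves in stem_and_leaf(costs).items():
    print(f"{stem:>3} | {' '.join(map(str, leaves))}")
```

Sorting first is what yields rank order within each stem; `divmod(v, 10)` does the split into leading digit(s) and last digit.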

  8. Stretched Stem-and-Leaf Display • If we believe the original stem-and-leaf display has condensed the data too much, we can stretch the display by using two stems for each value of the leading digit(s). • Whenever a stem value is stated twice, the first value corresponds to leaf values of 0-4, and the second value corresponds to leaf values of 5-9.

  9. Stretched Stem-and-Leaf Display
       5 | 2
       5 | 7
       6 | 2 2 2 2
       6 | 5 6 7 8 8 8 9 9 9
       7 | 1 1 2 2 3 4 4
       7 | 5 5 5 6 7 8 9 9 9
       8 | 0 0 2 3
       8 | 5 8 9
       9 | 1 3
       9 | 7 7 7 8 9
      10 | 1 4
      10 | 5 5 9
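
A sketch of the stretched display, again assuming whole-dollar values and a leaf unit of 1; `stretched_stem_and_leaf` is a hypothetical name. The costs are rebuilt from the stems and leaves of slide 7, and each stem is split into a 0-4 line and a 5-9 line as slide 8 describes.

```python
# Hudson Auto parts costs rebuilt from the stem-and-leaf display on slide 7:
# each stem maps to the string of its leaves.
display = {5: "27", 6: "2222567888999", 7: "1122344555678999",
           8: "0023589", 9: "1377789", 10: "14559"}
costs = [10 * stem + int(d) for stem, leaves in display.items() for d in leaves]

def stretched_stem_and_leaf(values):
    """Return (stem, leaves) rows with two rows per stem:
    the first row holds leaves 0-4, the second holds leaves 5-9."""
    groups = {}
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        key = (stem, 0 if leaf <= 4 else 1)   # 0 -> low half, 1 -> high half
        groups.setdefault(key, []).append(leaf)
    first, last = min(groups)[0], max(groups)[0]
    return [(s, groups.get((s, half), []))
            for s in range(first, last + 1) for half in (0, 1)]

for stem, leaves in stretched_stem_and_leaf(costs):
    print(f"{stem:>3} | {' '.join(map(str, leaves))}")
```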

  10. Stem-and-Leaf Display • Leaf Units • A single digit is used to define each leaf. • In the preceding example, the leaf unit was 1. • Leaf units may be 100, 10, 1, 0.1, and so on. • Where the leaf unit is not shown, it is assumed to equal 1. • Multiplying a stem-and-leaf number (the stem followed by its leaf) by the leaf unit recovers the original data value, apart from any digits smaller than the leaf unit that were dropped.

  11. Stem-and-Leaf Display • Although the stem-and-leaf display may appear to offer the same information as a histogram, it has two primary advantages: 1. The stem-and-leaf display is easier to construct by hand. 2. Within a class interval, the stem-and-leaf display provides more information than the histogram because the stem-and-leaf shows the actual data.

  12. Example: Leaf Unit = 0.1 • If we have data with values such as 8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8, a stem-and-leaf display of these data will be:
      Leaf Unit = 0.1
       8 | 6 8
       9 | 1 4
      10 | 2
      11 | 0 7

  13. Example: Leaf Unit = 10 • If we have data with values such as 1806, 1717, 1974, 1791, 1682, 1910, 1838, a stem-and-leaf display of these data will be:
      Leaf Unit = 10
      16 | 8
      17 | 1 9
      18 | 0 3
      19 | 1 7
      The 82 in 1682 is rounded down to 80 and is represented as an 8.
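
The leaf-unit rule can be expressed as one extra parameter. In this sketch (hypothetical helper names, non-negative data assumed), each value is divided by the leaf unit and truncated, which reproduces both examples above, including 1682 becoming stem 16, leaf 8. `Decimal` is used because binary floating point can make a quotient such as 8.6 / 0.1 come out just below 86, which would truncate to the wrong leaf.

```python
from collections import defaultdict
from decimal import Decimal

def stem_and_leaf(values, leaf_unit=1):
    """Return {stem: sorted leaves} using the given leaf unit (e.g. 0.1, 1, 10).

    Assumes non-negative data. Each value is divided by the leaf unit and
    truncated, so digits smaller than the leaf unit are dropped.
    """
    unit = Decimal(str(leaf_unit))        # exact decimal arithmetic, no float drift
    display = defaultdict(list)
    for v in values:
        q = int(Decimal(str(v)) / unit)   # e.g. 1682 / 10 -> 168.2 -> 168
        stem, leaf = divmod(q, 10)
        display[stem].append(leaf)
    return {s: sorted(display[s]) for s in sorted(display)}

def recover(stem, leaf, leaf_unit=1):
    """Hypothetical helper: multiply the stem-and-leaf number by the leaf unit."""
    return (10 * stem + leaf) * Decimal(str(leaf_unit))

print(stem_and_leaf([8.6, 11.7, 9.4, 9.1, 10.2, 11.0, 8.8], leaf_unit=0.1))
print(stem_and_leaf([1806, 1717, 1974, 1791, 1682, 1910, 1838], leaf_unit=10))
print(recover(16, 8, leaf_unit=10))   # 1680: the original 1682 lost its last digit
```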

  14. Crosstabulations and Scatter Diagrams • Thus far we have focused on presentations that are used to summarize the data for one variable at a time. • Often a manager is interested in presentations that help in understanding the relationship between two variables. • Crosstabulations and scatter diagrams are two methods for summarizing the data for two variables simultaneously.

  15. Crosstabulation • A cross-tabulation is a tabular summary of data for two variables that provides insight about the relationship between them. • Cross-tabulation can be used when one variable is qualitative and the other is quantitative, when both variables are qualitative, or when both variables are quantitative. • The left and top margin labels define the classes for the two variables.
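
To make the margin-label idea concrete, here is a small sketch that builds a cross-tabulation from individual records. The records are hypothetical and are not the Finger Lakes sample; the $99,000 cutoff simply mirrors the price classes used in the Finger Lakes example. The point it shows is that a quantitative variable (price) is first assigned to classes, and then each (class, category) cell is counted.

```python
from collections import Counter

# Hypothetical (style, price in $) records for illustration only;
# these are NOT the Finger Lakes Homes data.
sales = [
    ("Colonial", 85_000), ("Log", 120_000), ("Split", 99_000),
    ("Split", 150_000), ("A-Frame", 70_000), ("Colonial", 101_000),
]

ROWS = ["<= $99,000", "> $99,000"]              # price classes (left margin)
COLS = ["Colonial", "Log", "Split", "A-Frame"]  # style classes (top margin)

def price_class(price, cutoff=99_000):
    """Assign the quantitative price variable to one of the two row classes."""
    return ROWS[0] if price <= cutoff else ROWS[1]

# Count how many records fall into each (price class, style) cell.
counts = Counter((price_class(price), style) for style, price in sales)

print(f"{'Price Range':<12}" + "".join(f"{c:>10}" for c in COLS) + f"{'Total':>8}")
for r in ROWS:
    cells = [counts[(r, c)] for c in COLS]
    print(f"{r:<12}" + "".join(f"{n:>10}" for n in cells) + f"{sum(cells):>8}")
```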

  16. Crosstabulation • Example: Finger Lakes Homes • The number of Finger Lakes homes sold for each style and price for the past two years is shown below. Home Style is a qualitative variable; Price Range is a quantitative variable.
                         Home Style
      Price Range     Colonial   Log   Split   A-Frame   Total
      ≤ $99,000          18        6     19       12        55
      > $99,000          12       14     16        3        45
      Total              30       20     35       15       100

  17. Crosstabulation • Insights Gained from the Preceding Crosstabulation • The greatest number of homes in the sample (19) are split-level style and priced at less than or equal to $99,000. • Only three homes in the sample are A-Frame style and priced at more than $99,000.

  18. Crosstabulation • The Total column in the right margin (≤ $99,000: 55; > $99,000: 45) is the frequency distribution for the price variable. • The Total row in the bottom margin (Colonial: 30; Log: 20; Split: 35; A-Frame: 15) is the frequency distribution for the home style variable.

  19. Crosstabulation: Row or Column Percentages • Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables. • The right and bottom margins of the cross-tabulation provide the frequency distributions for the two variables. • Dividing the totals in the right and bottom margins by the grand total gives the relative frequency distributions; multiplying these by 100 gives the percent frequency distributions. • Because the percentages are rounded, the values in a row or column may not add exactly to the total for that row or column.
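
Using the Finger Lakes Homes counts from slide 16, the margin totals and the percent frequency distributions described above can be computed as follows. This is a minimal sketch; the dictionary layout (rows keyed by price range, cells in Colonial, Log, Split, A-Frame order) is an arbitrary choice.

```python
# Cell counts from the Finger Lakes Homes crosstabulation (rows: price range).
STYLES = ["Colonial", "Log", "Split", "A-Frame"]
table = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Right margin: row totals give the frequency distribution for price.
price_freq = {price: sum(cells) for price, cells in table.items()}
# Bottom margin: column totals give the frequency distribution for home style.
style_freq = dict(zip(STYLES, (sum(col) for col in zip(*table.values()))))
grand_total = sum(price_freq.values())

# Relative frequency = margin total / grand total; percent = relative x 100.
price_pct = {k: 100 * v / grand_total for k, v in price_freq.items()}
style_pct = {k: 100 * v / grand_total for k, v in style_freq.items()}

print(price_freq, style_freq, grand_total)
print(price_pct, style_pct)
```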

  20. Crosstabulation: Row Percentages
                         Home Style
      Price Range     Colonial   Log     Split   A-Frame   Total
      ≤ $99,000        32.73     10.91   34.55   21.82     100
      > $99,000        26.67     31.11   35.56    6.67     100
      Note: row totals are actually 100.01 due to rounding.
      (Colonial and > $99K)/(All > $99K) × 100 = (12/45) × 100 = 26.67

  21. Crosstabulation: Column Percentages
                         Home Style
      Price Range     Colonial   Log     Split   A-Frame
      ≤ $99,000        60.00     30.00   54.29   80.00
      > $99,000        40.00     70.00   45.71   20.00
      Total           100       100     100     100
      (Colonial and > $99K)/(All Colonial) × 100 = (12/30) × 100 = 40.00
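
The row and column percentages on slides 20 and 21 can be checked with a short sketch over the same Finger Lakes counts, rounding to two decimals as the slides do. The rounding is also why each row of percentages sums to 100.01 rather than 100.

```python
# Finger Lakes Homes counts; cells are in Colonial, Log, Split, A-Frame order.
table = {
    "<= $99,000": [18, 6, 19, 12],
    "> $99,000": [12, 14, 16, 3],
}

# Row percentages: divide each cell by its row total.
row_pct = {
    price: [round(100 * n / sum(cells), 2) for n in cells]
    for price, cells in table.items()
}

# Column percentages: divide each cell by its column total.
col_totals = [sum(col) for col in zip(*table.values())]
col_pct = {
    price: [round(100 * n / t, 2) for n, t in zip(cells, col_totals)]
    for price, cells in table.items()
}

for price in table:
    print(price, "row %:", row_pct[price], "column %:", col_pct[price])
```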

  22. Crosstabulation: Simpson’s Paradox • Data in two or more cross-tabulations are often aggregated to produce a summary cross-tabulation. • We must be careful in drawing conclusions about the relationship between the two variables in the aggregated cross-tabulation. • Simpson’s Paradox: in some cases, the conclusions based on an aggregated cross-tabulation can be completely reversed when we look at the unaggregated data. • To check for it, mark the patterns in the aggregated data (for example, with color), decompose the data into separate cross-tabulations, and mark each separate cross-tabulation the same way. If the patterns are not the same, Simpson’s paradox is present.
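
The decomposition check can be sketched in Python. The counts below are assumed for illustration and do not come from these slides: they are the widely reproduced kidney-stone treatment figures (successes out of patients for two treatments, split by stone size). In each separate cross-tabulation treatment A has the higher success rate, yet after aggregation treatment B does, which is exactly the reversal the slide warns about.

```python
# Assumed illustrative data: (successes, patients) for two treatments,
# split by stone size. These counts are not from the slides.
tables = {
    "small stones": {"A": (81, 87), "B": (234, 270)},
    "large stones": {"A": (192, 263), "B": (55, 80)},
}

def best(table):
    """Return the treatment with the highest success rate in one table."""
    return max(table, key=lambda t: table[t][0] / table[t][1])

# Aggregate the separate cross-tabulations into one summary table.
aggregated = {}
for table in tables.values():
    for treatment, (successes, patients) in table.items():
        s, n = aggregated.get(treatment, (0, 0))
        aggregated[treatment] = (s + successes, n + patients)

winners = {name: best(table) for name, table in tables.items()}
overall = best(aggregated)

# Simpson's paradox: every separate table favors a different treatment
# than the aggregated table does.
paradox = all(w != overall for w in winners.values())
print(winners, overall, aggregated, paradox)
```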

  23. Scatter Diagram and Trendline • A scatter diagram is a graphical presentation of the relationship between two quantitative variables. • One variable is shown on the horizontal x axis and the other variable is shown on the vertical y axis. • The general pattern of the plotted points suggests the overall relationship between the variables. • A trendline is an approximation of the relationship.

  24. Scatter Diagram and Trendline • A Positive Relationship [Figure: scatter diagram of y versus x in which y tends to increase as x increases]

  25. Scatter Diagram and Trendline • A Negative Relationship [Figure: scatter diagram of y versus x in which y tends to decrease as x increases]

  26. Scatter Diagram and Trendline • No Apparent Relationship [Figure: scatter diagram of y versus x with randomly scattered points]

  27. Example: Panthers Football Team • Scatter Diagram and Trendline • The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored.
      x = Number of Interceptions:   1    3    2    1    3
      y = Number of Points Scored:  14   24   18   17   30

  28. Scatter Diagram and Trendline [Figure: scatter diagram with trendline; horizontal axis x = Number of Interceptions (0 to 4), vertical axis y = Number of Points Scored (0 to 35)]

  29. Example: Panthers Football Team • Insights Gained from the Preceding Scatter Diagram • The scatter diagram and trendline indicate a positive relationship between the number of interceptions and the number of points scored. • Higher points scored are associated with a higher number of interceptions. • The relationship is not perfect; not all plotted points in the scatter diagram lie on a straight line.
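
The slides do not say how the trendline is fitted; a common choice (and the usual spreadsheet linear trendline) is the least squares line, which this sketch assumes. With the five Panthers observations from slide 27 it gives a positive slope of 5.75 points per interception and an intercept of 9.1, and the nonzero residuals reflect the imperfect relationship noted above. `least_squares_line` is a hypothetical helper name.

```python
from statistics import mean

# Panthers data from slide 27: interceptions (x) and points scored (y).
x = [1, 3, 2, 1, 3]
y = [14, 24, 18, 17, 30]

def least_squares_line(x, y):
    """Return (intercept, slope) of the least squares line y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

b0, b1 = least_squares_line(x, y)
# Residuals: vertical distances of the points from the trendline.
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"trendline: y = {b0:.2f} + {b1:.2f}x")
print("residuals:", [round(r, 2) for r in residuals])
```

A positive slope corresponds to the positive relationship on slide 24; a negative slope would correspond to slide 25.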

  30. Tabular and Graphical Procedures
      Data
        • Qualitative Data
            Tabular Methods: Frequency Distribution, Relative Frequency Distribution, Percent Frequency Distribution, Crosstabulation
            Graphical Methods: Bar Graph, Pie Chart
        • Quantitative Data
            Tabular Methods: Frequency Distribution, Relative Frequency Distribution, Percent Frequency Distribution, Cumulative Frequency Distribution, Cumulative Relative Frequency Distribution, Cumulative Percent Frequency Distribution, Crosstabulation
            Graphical Methods: Dot Plot, Histogram, Ogive, Stem-and-Leaf Display, Scatter Diagram with Trendline

  31. End of Chapter 2, Part B
