310 likes | 566 Views
Chapter-2: Descriptive Statistics:. Tabular and Graphical Presentations Part A. Exploratory Data Analysis Cross-tabulations; Scatter Diagrams. Tabular and Graphical Procedures. Data. Qualitative Data. Quantitative Data. Tabular Methods. Graphical Methods. Tabular Methods.
E N D
Chapter-2: Descriptive Statistics: Tabular and Graphical PresentationsPart A Exploratory Data Analysis Cross-tabulations; Scatter Diagrams
Tabular and Graphical Procedures Data Qualitative Data Quantitative Data Tabular Methods Graphical Methods Tabular Methods Graphical Methods • Bar Graph • Pie Chart • Frequency • Distribution • Rel. Freq. Dist. • Percent Freq. • Distribution • Crosstabulation • Dot Plot • Histogram • Ogive • Scatter • Diagram • Frequency • Distribution • Rel. Freq. Dist. • Cum. Freq. Dist. • Cum. Rel. Freq. • Distribution • Stem-and-Leaf • Display • Crosstabulation
Stem-and-Leaf Display • A stem-and-leaf display is an exploratory data analysis method that shows both the rank order • and shape of the distribution of the data. • It is similar to a histogram, but it has the • advantage of showing the actual data values.
Example: Hudson Auto Repair • Sample of Parts Cost for 50 Tune-ups
Stem-and-Leaf Display 5 6 7 8 9 10 2 7 2 2 2 2 5 6 7 8 8 8 9 9 9 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9 0 0 2 3 5 8 9 1 3 7 7 7 8 9 1 4 5 5 9
Steps to Construct a Stem-and-Leaf Display • Arrange the first digits of each data to the • left of a vertical line. • To the right of the vertical line, record the last • digit for each item in rank order. • Each line in the display is referred to as a stem. • Each digit on a stem is a leaf.
Stem-and-Leaf Display 5 6 7 8 9 10 2 7 2 2 2 2 5 6 7 8 8 8 9 9 9 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9 0 0 2 3 5 8 9 1 3 7 7 7 8 9 1 4 5 5 9 a stem a leaf
Stretched Stem-and-Leaf Display • If we believe the display has condensed the data too much, we can stretch thedisplay by using two stems for each leading digit(s). • Whenever stretched, the first value of a stem corresponds to leaf values of 0 - 4, and the second • value of a stem corresponds to leaf values of 5 - 9.
Stretched Stem-and-Leaf Display 5 5 6 6 7 7 8 8 9 9 10 10 2 7 2 2 2 2 5 6 7 8 8 8 9 9 9 1 1 2 2 3 4 4 5 5 5 6 7 8 9 9 9 0 0 2 3 5 8 9 1 3 7 7 7 8 9 1 4 5 5 9
Stem-and-Leaf Display—Some notes • Leaf Units • A single digit is used to define each leaf. • In the preceding example, the leaf unit was 1. • Leaf units may be 100, 10, 1, 0.1, and so on. • Where the leaf unit is not shown, it is assumed • to equal 1.
Example: Leaf Unit = 0.1 If we have data with values such as 8.6 11.7 9.4 9.1 10.2 11.0 8.8 a stem-and-leaf display of these data will be Leaf Unit = 0.1 8 9 10 11 6 8 1 4 2 0 7
Example: Leaf Unit = 10 If we have data with values such as 1806 1717 1974 1791 1682 1910 1838 a stem-and-leaf display of these data will be Leaf Unit = 10 16 17 18 19 8 The 82 in 1682 is rounded down to 80 and is represented as an 8. 1 9 0 3 1 7
Crosstabulations and Scatter Diagrams • Thus far we have focused on methods that are used • to summarize the data on one variable at a time. • Cross tabulationand a scatter diagram are methods for • summarizing the data on two (or more) variables • simultaneously. • Help us understand the relationship between two variables.
Crosstabulation • A crosstabulation is a tabular summary of data for two variables. • Crosstabulation can be used when: • one variable is qualitative and the other is • quantitative, • both variables are qualitative, or • both variables are quantitative. • The left and top margin labels define the classes for • the two variables.
Cross-tabulation • Example: The number of Homes sold by style and price for the past two years is shown below. qualitative variable quantitative variable Home Style Price Range Colonial Log Split A-Frame Total 18 6 19 12 55 45 < $99,000 > $99,000 12 14 16 3 30 20 35 15 Total 100
Cross-tabulation • Insights Gained from Preceding Cross-tabulation • The greatest number of homes in the sample (19) • are a split-level style and priced at less than or • equal to $99,000. • Only three homes in the sample are an A-Frame • style and priced at more than $99,000.
Cross-tabulation: Row or Column Percentages • Converting the entries in the table into row percentages or column percentages can provide additional insight about the relationship between the two variables.
Cross-tabulation Frequency distribution for the price variable Home Style Price Range Colonial Log Split A-Frame Total 18 6 19 12 55 45 < $99,000 > $99,000 12 14 16 3 30 20 35 15 Total 100 Frequency distribution for the home style variable
Cross-tabulation: Row Percentages Home Style Price Range Colonial Log Split A-Frame Total 32.73 10.91 34.55 21.82 100 100 < $99,000 > $99,000 26.67 31.11 35.56 6.67 (Colonial and > $99K)/(All >$99K) x 100 = (12/45) x 100
Simpson’s Paradox • Often Times data in two or more cross-tabulations are aggregated to produce a summary cross-tabulation. • We must be careful in drawing conclusions about the • relationship between the two variables in the • aggregated cross-tabulation. • Simpson’ Paradox: In some cases the conclusions based upon an aggregated cross-tabulation can be completely different from the un-aggregated data.
Scatter Diagram and Trend line • A scatter diagram is a graphical presentation of the • relationship between two quantitative variables. • One variable is shown on the horizontal axis and the • other variable is shown on the vertical axis. • The general pattern of the plotted points suggests the • overall relationship between the variables. • A trend line is an approximation of the relationship.
Scatter Diagram • A Positive Relationship y x
Scatter Diagram • A Negative Relationship y x
Scatter Diagram • No Apparent Relationship y x
Example: Panthers Football Team • The Panthers football team is interested in investigating the relationship, if any, between interceptions made and points scored. x = Number of Interceptions y = Number of Points Scored 14 24 18 17 30 1 3 2 1 3
35 30 25 20 15 10 5 0 1 0 2 3 4 Scatter Diagram y Number of Points Scored x Number of Interceptions
Example: Panthers Football Team • Insights Gained from the Preceding Scatter Diagram • The scatter diagram indicates a positive relationship • between the number of interceptions and the • number of points scored. • Higher points scored are associated with a higher • number of interceptions. • The relationship is not perfect; all plotted points in • the scatter diagram are not on a straight line.