Chapter Two. Graphical and Tabular Descriptive Techniques. Introduction & Re-cap…. Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful information is produced.
Chapter Two Graphical and Tabular Descriptive Techniques
Introduction & Re-cap… • Descriptive statistics involves arranging, summarizing, and presenting a set of data in such a way that useful information is produced. • Its methods make use of graphical techniques and numerical descriptive measures (such as averages) to summarize and present the data. (Diagram: Data → Statistics → Information)
Populations & Samples • The graphical & tabular methods presented here apply to both entire populations and samples drawn from populations. (Diagram: a sample is a subset of a population.)
Definitions… • A variable is some characteristic of a population or sample. • E.g. student grades. • Typically denoted with a capital letter: X, Y, Z… • The values of the variable are the range of possible values for that variable. • E.g. student marks (0..100) • Data are the observed values of a variable. • E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
Types of Data & Information • Data (at least for purposes of Statistics) fall into three main groups: • Interval Data • Nominal Data • Ordinal Data
Interval Data… • Interval data are real numbers, e.g. heights, weights, prices, etc. • Also referred to as quantitative or numerical. • Arithmetic operations can be performed on interval data, so it’s meaningful to talk about 2*Height, or Price + $1, and so on.
Nominal Data… • The values of nominal data are categories. • E.g. responses to questions about marital status, coded as: • Single = 1, Married = 2, Divorced = 3, Widowed = 4 • Because the numbers are arbitrary, arithmetic operations don’t make any sense (e.g. does Widowed ÷ 2 = Married?!) • Nominal data are also called qualitative or categorical.
Ordinal Data… • Ordinal data appear to be categorical in nature, but their values have an order, a ranking: • E.g. College course rating system: • poor = 1, fair = 2, good = 3, very good = 4, excellent = 5 • While it’s still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like: • excellent > poor or fair < very good • That is, order is maintained no matter what numeric values are assigned to each category, provided the codes preserve the ranking.
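A minimal sketch of the point about order, assuming two numeric codings of the course-rating scale. The second coding and all names below are hypothetical and chosen only for illustration; they are not from the text.

```python
# Two numeric codings of the same ordinal rating scale.
# coding_b is a hypothetical alternative that keeps the same ranking.
coding_a = {"poor": 1, "fair": 2, "good": 3, "very good": 4, "excellent": 5}
coding_b = {"poor": 6, "fair": 18, "good": 23, "very good": 45, "excellent": 88}

def ranking(coding):
    """Return the categories sorted by their assigned numeric code."""
    return sorted(coding, key=coding.get)

# The ranking of the categories is identical under both codings...
same_order = ranking(coding_a) == ranking(coding_b)

# ...but arithmetic is not stable: "2*fair = very good" holds only for coding_a.
double_fair_a = 2 * coding_a["fair"] == coding_a["very good"]
double_fair_b = 2 * coding_b["fair"] == coding_b["very good"]

print(same_order, double_fair_a, double_fair_b)
```

Comparisons such as excellent > poor give the same answer under either coding, which is why only ranking-based calculations are safe for ordinal data.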
Types of Data & Information… (Decision tree) Is the data categorical? • No: Interval Data • Yes: Categorical Data. Is it ordered? • Yes: Ordinal Data • No: Nominal Data
E.g. Representing Student Grades… (Decision tree) Is the data categorical? • No: Interval Data, e.g. {0..100} • Yes: Categorical Data. Is it ordered? • Yes: Ordinal Data, e.g. {F, D, C, B, A} (rank order to data) • No: Nominal Data, e.g. {Pass | Fail} (NO rank order to data)
Calculations for Types of Data • As mentioned above: • All calculations are permitted on interval data. • Only calculations involving a ranking process are allowed for ordinal data. • No calculations are allowed for nominal data, save counting the number of observations in each category. • This lends itself to the following “hierarchy of data”…
Hierarchy of Data… • Interval • Values are real numbers. • All calculations are valid. • Data may be treated as ordinal or nominal. • Ordinal • Values must represent the ranked order of the data. • Calculations based on an ordering process are valid. • Data may be treated as nominal but not as interval. • Nominal • Values are the arbitrary numbers that represent categories. • Only calculations based on the frequencies of occurrence are valid. • Data may not be treated as ordinal or interval.
Graphical & Tabular Techniques for Nominal Data… • The only allowable calculation on nominal data is to count the frequency of each value of the variable. • We can summarize the data in a table, called a frequency distribution, that presents the categories and their counts. • A relative frequency distribution lists the categories and the proportion with which each occurs. • Refer to Example 2.1
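A minimal sketch of building a frequency distribution and a relative frequency distribution for nominal data. The list of responses is hypothetical (it is not the data from Example 2.1); only the category names are taken from the example.

```python
from collections import Counter

# Hypothetical survey responses, for illustration only.
responses = [
    "Accounting", "Finance", "Marketing/Sales", "Accounting",
    "Other", "Finance", "Accounting", "General management",
]

# Frequency distribution: category -> count.
frequency = Counter(responses)

# Relative frequency distribution: category -> proportion of all observations.
n = len(responses)
relative_frequency = {area: count / n for area, count in frequency.items()}

# Print the table, most frequent category first.
for area, count in frequency.most_common():
    print(f"{area:20s} {count:3d} {relative_frequency[area]:.3f}")
```

The counts are what a bar chart would display, and the proportions are what a pie chart would display.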
Example 2.1 • This is on page 24 of your text. The text includes the data. • The student placement office at a university conducted a survey of last year’s business school graduates to determine the general areas in which the graduates found jobs. The areas of employment are: • Accounting • Finance • General management • Marketing/Sales • Other
Nominal Data (Frequency) Bar Charts are often used to display frequencies…
Nominal Data (Relative Frequency) Pie Charts show relative frequencies…
Nominal Data • It is all the same information (based on the same data), just a different presentation.
Graphical Techniques for Interval Data • There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical). • The most important of these graphical methods is the histogram. • The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.
Example 2.4 • This is on page 34 of your textbook. • A long-distance telephone company wanted to acquire information about the monthly bills of new subscribers in the first month after signing with the company. A survey of the first month’s bills of 200 new residential subscribers was conducted, and the data recorded.
Building a Histogram… • Collect the Data (Example 2.4) • Create a frequency distribution for the data… • How? • a) Determine the number of classes to use… • How? • Refer to Table 2.6: With 200 observations, we should have between 7 & 10 classes… Alternatively, we could use Sturges’ formula: Number of class intervals = 1 + 3.3 log(n), where log is the base-10 logarithm.
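A minimal sketch of Sturges’ formula applied to the 200 phone bills, reading "log" as the base-10 logarithm. The function name is hypothetical.

```python
import math

def sturges_classes(n):
    """Sturges' formula: 1 + 3.3 * log10(n)."""
    return 1 + 3.3 * math.log10(n)

# For n = 200 observations the formula suggests roughly 8.6 classes,
# which falls inside the 7-to-10 range quoted from Table 2.6.
suggested = sturges_classes(200)
print(round(suggested, 2))
```

The result is a guideline rather than a rule; any whole number of classes near it is a reasonable choice.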
Building a Histogram… • Collect the Data • Create a frequency distribution for the data… • How? • a) Determine the number of classes to use. [8] • b) Determine how large to make each class… • How? • Look at the range of the data, that is, • Range = Largest Observation – Smallest Observation • Range = $119.63 – $0 = $119.63 • Then each class width becomes: • Range ÷ (# classes) = 119.63 ÷ 8 ≈ 15
Building a Histogram… • Collect the Data • Create a frequency distribution for the data… • How? • a) Determine the number of classes to use. [8] • b) Determine how large to make each class. [15] • c) Place the data into each class… • each item can only belong to one class; • classes contain observations greater than their lower limits and less than or equal to their upper limits.
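A minimal sketch of steps b) and c): computing the class width from the range and placing each observation in exactly one class, where a class holds values greater than its lower limit and less than or equal to its upper limit. The list of bills is hypothetical, and so is the choice to put a value equal to the overall lower limit (a $0 bill) in the first class.

```python
import math

NUM_CLASSES = 8
LOWER = 0.0
LARGEST, SMALLEST = 119.63, 0.0          # range endpoints from the example

# Class width: range / number of classes, rounded up to a convenient value.
width = math.ceil((LARGEST - SMALLEST) / NUM_CLASSES)   # 14.95... -> 15

def class_index(value):
    """Index of the class (lower, upper] that contains value."""
    if value <= LOWER:
        return 0                          # assumption: $0 goes in the first class
    idx = math.ceil((value - LOWER) / width) - 1
    return min(idx, NUM_CLASSES - 1)

# Hypothetical bills, for illustration only.
bills = [0.0, 15.0, 15.01, 42.19, 89.99, 90.0, 119.63]

counts = [0] * NUM_CLASSES
for bill in bills:
    counts[class_index(bill)] += 1

for i, count in enumerate(counts):
    print(f"({LOWER + i * width:g}, {LOWER + (i + 1) * width:g}]  {count}")
```

Note that a bill of exactly $15 lands in the first class and $15.01 in the second, so no observation is counted twice.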
Building a Histogram… • Collect the Data • Create a frequency distribution for the data. • Draw the Histogram…
Building a Histogram… • Collect the Data • Create a frequency distribution for the data. • Draw the Histogram.
Interpret… • (18+28+14=60)÷200 = 30%, i.e. nearly a third of the phone bills are more than $75 (the three highest classes of width 15). • About half (71+37=108) of the bills are “small”, i.e. $30 or less. • There are only a few telephone bills in the middle range.
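The arithmetic behind this interpretation can be sketched as follows, using only the class counts quoted on the slide. The variable names are hypothetical; the middle count is derived by subtraction rather than read from the data.

```python
n = 200                        # number of bills in the sample

high_counts = [18, 28, 14]     # the three highest classes, as quoted
low_counts = [71, 37]          # the two lowest classes, as quoted

high_share = sum(high_counts) / n     # 60 / 200
low_share = sum(low_counts) / n       # 108 / 200

# Whatever is left must fall in the middle classes.
middle_count = n - sum(high_counts) - sum(low_counts)
middle_share = middle_count / n

print(f"high: {high_share:.0%}, low: {low_share:.0%}, middle: {middle_share:.0%}")
```

The small remaining share supports the statement that few bills fall in the middle range.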
Shapes of Histograms… • Symmetry • A histogram is said to be symmetric if, when we draw a vertical line down the center of the histogram, the two sides are identical in shape and size. (Figure: three example symmetric histograms, Frequency vs. Variable)
Shapes of histograms • There are four typical shape characteristics. • Symmetry
Shapes of histograms Skewness Negatively skewed Positively skewed
Shapes of Histograms… • Skewness • A skewed histogram is one with a long tail extending to either the right (positively skewed) or the left (negatively skewed). (Figure: a positively skewed and a negatively skewed histogram, Frequency vs. Variable)
Shapes of Histograms… • Modality • A unimodal histogram is one with a single peak, while a bimodal histogram is one with two peaks. • A modal class is the class with the largest number of observations. (Figure: a bimodal and a unimodal histogram, Frequency vs. Variable)
Shapes of Histograms… • Bell Shape • A special type of symmetric unimodal histogram is one that is bell shaped. • Many statistical techniques require that the population be bell shaped. • Drawing the histogram helps verify the shape of the population in question. (Figure: a bell-shaped histogram, Frequency vs. Variable)
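A rough sketch of finding the modal class and counting peaks from a list of class frequencies. The frequencies are hypothetical, and the peak count is a naive rule (a class strictly higher than both neighbours), not a method from the text; with noisy real data, judging modality by eye from the histogram is usually more reliable.

```python
# Hypothetical class frequencies, for illustration only.
frequencies = [3, 9, 14, 8, 4, 10, 15, 6]

# Modal class: the class with the largest number of observations.
modal_index = max(range(len(frequencies)), key=frequencies.__getitem__)

def count_peaks(freqs):
    """Count classes strictly higher than both neighbours.
    Missing neighbours at the ends are treated as frequency 0."""
    padded = [0] + list(freqs) + [0]
    return sum(
        1
        for i in range(1, len(padded) - 1)
        if padded[i - 1] < padded[i] > padded[i + 1]
    )

peaks = count_peaks(frequencies)
shape = {1: "unimodal", 2: "bimodal"}.get(peaks, "multimodal or flat")
print(modal_index, peaks, shape)
```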
Interpreting a Histogram: • In this class we will use 4 steps in interpreting a histogram: • What is the range of the typical data? What is the most frequent data? • How many observations were in the sample? What were the highest and lowest values collected? • Are there any “outliers” or unusual data? If so, what were the value(s)? • What is the shape of the histogram? • The interpretation should be in terms of the variable of interest. All statements refer to this variable, in the correct units. All parts of the interpretation should be phrased in complete sentences.
INTERPRETING A HISTOGRAM (Example) It is desired to describe the daily sales of a newspaper. A sample of sales for 70 days is obtained, and these are shown below. The sales are in 1000’s. Obtain a histogram of these sales, and completely describe the histogram.
The histogram below was created using Minitab. This newspaper typically sold about 100,000 copies per day. Sales between 85,000 and 115,000 were quite frequent. For this sample of 70 days’ sales, the smallest number of newspapers sold was about 75,000 and the largest was about 150,000. There was an unusually large number of newspapers sold on one day: the day on which 150,000 newspapers were sold is atypical. Finally, due to the atypical large value, the histogram is slightly skewed to the right, or positively skewed. Without this value, the histogram would be reasonably symmetric.
Interpreting histograms (Comparing two) • Example 2.5: (p 41) Selecting an investment • An investor is considering investing in one of two investments. • The returns on these investments were recorded. • From the two histograms, how can the investor interpret: • The expected returns • The spread of the returns (the risk involved with each investment)
Example 2.5 - Histograms (Figure: histograms of Return on investment A and Return on investment B, frequency 0 to 18 on the vertical axis, returns from -15 to 75 on the horizontal axis, with the center for A and the center for B marked.) Interpretation: The center of the returns of Investment A is slightly lower than that for Investment B.
Example 2.5 - Histograms (Figure: the same two histograms; sample size = 50 for each investment.) Interpretation: The spread of returns for Investment A is less than that for Investment B.
Example 2.5 - Histograms (Figure: the same two histograms.) Interpretation: Both histograms are slightly positively skewed. There is a possibility of large returns.
Providing information • Example 2.5: Conclusion • It seems that investment A is better, because: • Its expected return is only slightly below that of investment B. • The risk from investing in A is smaller. • The possibility of having a high rate of return exists for both investments.
Histogram Comparison… • Compare & contrast the following histograms based on data from Example 2.6 (p 43) & Example 2.7 (p 44). • The two courses have very different histograms: • unimodal vs. bimodal • spread of the marks (narrower | wider)
Graphing the Relationship Between Two Interval Variables… • We are frequently interested in how two interval variables are related. • To explore this relationship, we employ a scatter diagram, which plots two variables against one another. • The independent variable is labeled X and is usually placed on the horizontal axis, while the other, dependent variable, Y, is mapped to the vertical axis.
Scatter Diagram… • Example 2.9 (page 58) A real estate agent wanted to know to what extent the selling price of a home is related to its size… • Collect the data • Determine the independent variable (X – house size) and the dependent variable (Y – selling price) • Use Excel to create a “scatter diagram”…
Scatter Diagram… • It appears that in fact there is a relationship, that is, the greater the house size the greater the selling price…
Patterns of Scatter Diagrams… • Linearity and direction are two concepts we are interested in. (Figure: three scatter diagrams: Positive Linear Relationship, Negative Linear Relationship, Weak or Non-Linear Relationship)
Time Series Data… • Observations measured at the same point in time are called cross-sectional data. • Observations measured at successive points in time are called time-series data. • Time-series data are often graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.
Line Chart… • From Example 2.10, plot the total amounts of U.S. income tax for the years 1987 to 2002…
Line Chart… • From ’87 to ’92, the tax was fairly flat. Starting in ’93, there was a rapid increase in taxes until 2001. Finally, there was a downturn in 2002.
Summary • From this chapter you are expected to remember: • Working definitions of types of data: quantitative, qualitative, interval, nominal, ordinal • Working definitions of frequency and relative frequency distributions, classes, histogram, skewness, symmetry • How to make a histogram. • How to interpret a histogram. • How to compare two histograms.