DESCRIPTIVE STATISTICS
UNIT 2: Measures of Central Tendency
• The Median
• The Mode
• The Mean
• Problems and Exercises
Statistics vs. Data Presentations
Frequency distributions and histograms are common, useful ways of collecting and presenting (quantitative) data, both of a population and of a sample. By contrast, statistics are numbers that characterize that sample data. Statistics are often collected and presented in conjunction with frequency distributions and histograms, but this is not necessary.
Measures of Central Tendency
The most frequently cited, and often most significant, statistic of a data set is its “center” in some sense of the word. For quantitative data (only), there are three standard statistics of this kind, referred to as “measures of central tendency”:
• The MEDIAN
• The MODE
• … and most importantly: The MEAN
Population vs. Samples
With the introduction of sample statistics, it is necessary to clarify whether we are referring to a population or a sample. In the notation used in this unit, the sample mean is denoted $\bar{X}$ and the population mean is denoted μ. Also, the expected value of a sample mean is equal to the population mean (making the sample mean an unbiased estimator of the population mean). Therefore, $E[\bar{X}] = \mu$.
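The unbiasedness statement above can be checked on a tiny case. The following is a minimal sketch, assuming a hypothetical population of four values and samples of size 2 drawn without replacement (neither comes from the unit's data). It lists every possible sample and averages the sample means.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 4 values (illustrative only).
population = [2, 4, 6, 8]
mu = mean(population)  # population mean

# Every possible sample of size n = 2, drawn without replacement.
samples = list(combinations(population, 2))
sample_means = [mean(s) for s in samples]

# Each sample is equally likely, so averaging the sample means
# gives the expected value of the sample mean.
expected_sample_mean = mean(sample_means)

print("population mean:", mu)
print("average of all sample means:", expected_sample_mean)
```

Individual sample means differ from the population mean, but their average over all possible samples matches it.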
The MEDIAN
To get the median of a data set:
• The data set must be ordered from smallest to biggest, but not grouped together as when creating a frequency distribution.
• The MEDIAN observation is the one that is exactly in the middle of the ordered data set, and the median value of the variable is that of the median observation.
The MEDIAN (cont.)
When there is an even total of n observations in the sample data set:
• There are n/2 observations below the median.
• There are n/2 observations above the median.
• The median observation is not actually a part of the data set; rather, it is an average of the two middle observations in the ordered data set.
• The median value of the variable is calculated as the average of the values of the two middle observations.
When there is an odd total of n observations in the sample data set:
• There are (n – 1)/2 observations below the median.
• There are (n – 1)/2 observations above the median.
• The median observation comprises 1/n of the data set.
• The median observation is the [(n – 1)/2 + 1]th observation in the ordered data set.
• The median value of the variable is that of the median observation.
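The even and odd cases above can be sketched in a few lines. This is an illustrative helper, not part of the unit's Excel material, and the ages below are hypothetical values chosen so that the two middle observations are both 19.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)  # the data must be ordered first
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd n: the [(n - 1)/2 + 1]th observation, i.e. index (n - 1)/2
        # when counting from zero.
        return ordered[mid]
    # Even n: average the values of the two middle observations.
    return (ordered[mid - 1] + ordered[mid]) / 2


ages_even = [17, 18, 19, 19, 21, 40]  # hypothetical, n = 6
ages_odd = [10] + ages_even           # one younger passenger added, n = 7

print(median(ages_even))
print(median(ages_odd))
```

With this data, adding the younger passenger changes which observation is the median but not the median value.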
The MEDIAN (Example: even number of observations)
n = 20 passengers. The median is the average age of the 10th and 11th ordered passengers, with 10 passengers younger and 10 passengers older: Median Age = (19 + 19)/2 = 19.
* Same example and data set as in previous unit.
The MEDIAN (Example: odd number of observations)
Now there is an additional 21st passenger. The median age is now that of the 11th passenger, with (21 – 1)/2 = 10 passengers older and 10 younger than him/her: Median Age = Age of 11th passenger = 19.
* Same example and data set, except that a 21st passenger, age 10, is added.
About the examples for calculating the Median …
• In the first example (even n), the ages of the two middle observations were the same (19), so calculating an average was not really necessary.
• In the second example (odd n), an extra, younger passenger was added to the data set. The median observation was now a single passenger, but the median stayed the same (19).
• These outcomes are a consequence of the particular sample data set used, and do not hold in general.
• On the other hand, it is worth noting that adding extremely large or small observations need not have an effect on the median. This point will be re-emphasized later in this unit.
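The last point can be illustrated with a short sketch using Python's standard `statistics` module. The ages are hypothetical, and the 95-year-old is an invented extreme observation.

```python
from statistics import mean, median

ages = [17, 18, 19, 19, 21]      # hypothetical sample
ages_with_outlier = ages + [95]  # add one extremely large observation

# The median depends only on the middle of the ordered data.
print(median(ages), median(ages_with_outlier))

# The mean uses every value, so the outlier pulls it upward.
print(mean(ages), mean(ages_with_outlier))
```

Here the median is the same before and after, while the mean rises noticeably. With a different data set the median could shift as well, just far less than the mean.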
More About the Median
• The median can be a particularly useful, and even preferred, measure of central tendency when the data are skewed, or when there is no obvious “center” to the data.
Do an internet search of the word “median” and try to find a recent news item that discusses “median ______ (something)”. Examples might be “median income”, “median growth”, etc. Speculate with your classmates why the median was used in this particular case, instead of another measure of central tendency.
The MODE
• The mode is the value of the variable that occurs most frequently among the observations.
• Unlike when calculating the median, one need not re-order the data from smallest to largest when calculating the mode.
• Graphically, the mode is the highest or longest bar of a (relative) frequency distribution.
• It is possible to have two or more modes for a frequency distribution.
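The definition above, including the possibility of several modes, can be sketched as follows. The helper name `modes` and the sample values are illustrative assumptions, not taken from the unit.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency."""
    counts = Counter(values)        # no ordering of the data is needed
    highest = max(counts.values())  # height of the tallest bar
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([17, 19, 19, 21, 21, 40]))  # two values tie for the mode
print(modes(["red", "blue", "red"]))    # also works for qualitative data
```

Returning a list rather than a single value keeps the two-mode case explicit instead of silently picking one of the tied values.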
The MODE (Example)
* Same example and data set as in previous unit.
Properties of the MODE
• The value of the mode depends critically on whether and how the data are grouped.
• The mode is unaffected by extremely high or low outlying data.
• The mode is a very useful way of characterizing qualitative data.
Use the data from the voting results (right). On MS Excel (link below), create both a bar chart and a pie chart to show the relative frequency distribution. Identify the mode.
The MEAN
• The mean is a mathematical calculation that requires no ordering or grouping into frequencies.
• The mean, commonly referred to as the average, is simply the total of the observation values divided by n, the number of observations.
As notation, if X1 is the first observation, X2 is the second observation, and so on, until Xn is the nth observation, the sample mean is equal to $(X_1 + X_2 + \dots + X_n)/n$.
The MEAN: Notation
The mean is calculated as: $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$
How the mean is denoted:
• For a sample: mean (X) is denoted $\bar{X}$ (pronounced “X bar”)
• For a population: mean (X) is denoted μ (pronounced “mu”, the Greek letter M)
Sometimes, because it is easier to enter on computers, the (population or sample) mean is written as mean (X), avg (X) (for average of X), or E[X] (for the expectation of X).
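As a minimal sketch of the calculation, the function below sums the observation values and divides by n. The name `sample_mean` and the three values are hypothetical.

```python
def sample_mean(xs):
    """Total of the observation values divided by n, the number of observations."""
    n = len(xs)
    if n == 0:
        raise ValueError("mean of an empty data set is undefined")
    return sum(xs) / n


observations = [2, 4, 9]  # X1, X2, X3 (illustrative values)
print(sample_mean(observations))  # (2 + 4 + 9) / 3
```

No ordering or grouping of the data is needed; the result is the same whatever order the observations arrive in.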
The MEAN (Example)
In the example, each passenger represents one observation (for a total of n = 20 observations), and the age of each passenger represents the observation’s value (i.e. the Xi).
* Same example and data set as in previous unit.
* Note that in this example the mean of 28.35 is much larger than the mode or the median of 19. It is affected by the small number of very old passengers, offset against the many, but only slightly younger-than-average, teenagers. See the Information page.
The MEAN (visually)
One can think of the bars of a frequency distribution as weights on a balance beam, and the average as the point at which the fulcrum keeps the distribution in balance. Mean = 28.35
* Same example and data set as in previous unit.
Calculating the MEAN from a frequency distribution
One need not have a frequency distribution or an ordered data set to calculate the mean. However, the mean can also be calculated from a frequency distribution. Using Xi to indicate the ith distinct value, fi its frequency, and n the total number of observations, the mathematical formulas are:
For a frequency distribution: $\bar{X} = \frac{1}{n}\sum_{i} f_i X_i$
For a relative frequency distribution: $\bar{X} = \sum_{i} \frac{f_i}{n} X_i$
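The three routes to the mean (raw data, frequencies, relative frequencies) can be compared in a short sketch. The ages below are hypothetical and are not the passenger data from the unit.

```python
from collections import Counter
from math import isclose

raw = [17, 18, 19, 19, 19, 21, 40, 62]  # hypothetical ages
n = len(raw)

# (a) Directly from the raw data.
mean_raw = sum(raw) / n

# (b) From a frequency distribution: each distinct value weighted by its count.
freq = Counter(raw)
mean_freq = sum(f * x for x, f in freq.items()) / n

# (c) From a relative frequency distribution: weights are the shares f / n.
rel_freq = {x: f / n for x, f in freq.items()}
mean_rel = sum(r * x for x, r in rel_freq.items())

print(mean_raw, mean_freq, mean_rel)
```

All three agree, up to floating-point rounding in the relative-frequency version, which is why the comparison there is best done with a tolerance such as `math.isclose`.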
Calculating the MEAN from a frequency distribution
From a frequency distribution:
* The mean age is calculated to be 28.35, the same as before.
Calculating the MEAN from a frequency distribution
From a relative frequency distribution:
* The mean age is calculated to be 28.35, the same as before.
Task: Calculating the Mean
Use the raw data on daily customers to calculate the mean from (a) the raw data, (b) a frequency distribution, and (c) a relative frequency distribution. (You should get the same answer each time!) The data is reproduced on the MS Excel link (below). Once completed, calculate both the median and the mode from this sample data.
Means, Medians and Modes, and The Symmetry or Asymmetry of Distributions
Whether one uses the mean, median or mode as the preferred measure of central tendency depends both upon the data set and the intentions of the person evaluating the data set. However, there is a noteworthy relationship between these statistics and the (positive or negative) skewness of the distribution:
• Symmetric Distributions: mean = median = mode
• Positively Skewed Distributions: mean > median > mode
• Negatively Skewed Distributions: mean < median < mode
Means, Medians and Modes, and The Symmetry or Asymmetry of Distributions
The previous unit provided examples of a symmetric, a positively skewed, and a negatively skewed distribution. The three data sets are reproduced, but in an ordered list format (in one table, at left). The mean, median and mode of each distribution are calculated below.
* Note: The MS Excel formulas needed to get these statistics are shown below.
Symmetrical Distributions
Many distributions are symmetrical around their means, such as the distribution and histogram shown below. In this example, 25 students are given a score out of a maximum of 10. Mean = Median = Mode = 7.00
* For symmetrical distributions, mean = median = mode.
Positively Skewed Distributions
Some distributions are positively skewed, with extremely high observations above their means, such as the distribution and histogram shown below. In this example, 25 students are given a score out of a maximum of 10. Mode (3.00) < Median (4.00) < Mean (4.48)
* For positively skewed distributions, mode < median < mean.
Negatively Skewed Distributions
Some distributions are negatively skewed, with extremely low observations below their means, such as the distribution and histogram shown below. In this example, 25 students are given a score out of a maximum of 10. Mean (5.76) < Median (6.00) < Mode (7.00)
* For negatively skewed distributions, mean < median < mode.
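The three orderings can be reproduced on small data sets with Python's standard `statistics` module. The scores below are hypothetical and were chosen to show each pattern; they are not the 25-student data sets from the slides.

```python
from statistics import mean, median, mode

# Hypothetical scores out of 10, chosen to illustrate each shape.
symmetric = [5, 6, 6, 7, 7, 7, 8, 8, 9]
positive = [2, 3, 3, 3, 3, 4, 4, 4, 5, 5, 6, 8, 10]  # long tail of high scores
negative = [12 - x for x in positive]                # mirror image: long tail of low scores


def summary(scores):
    """Return (mean, median, mode) for a list of scores."""
    return mean(scores), median(scores), mode(scores)


for name, scores in [("symmetric", symmetric),
                     ("positive skew", positive),
                     ("negative skew", negative)]:
    m, med, mo = summary(scores)
    print(f"{name}: mean={m:.2f} median={med} mode={mo}")
```

In the skewed cases the mean is pulled furthest toward the long tail, the mode stays at the peak, and the median falls between them.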
End of Unit 2 Questions and Problems
The following problems require the computation of statistics of central tendency using MS Excel. The problems are linked to actual Excel spreadsheets, where students should do their work.
Problem 1
The following table shows course scores of n = 70 students. Calculate the (a) mean, (b) median, and (c) mode from this data. Do this problem on the pre-formatted MS Excel spreadsheet, which can be accessed via the spreadsheet icon in the navigation pane below.
Summary Descriptive Statistics on MS Excel
You’ve been working too hard! The MS Excel Data Analysis Add-In will actually calculate all of these measures of central tendency for you. All one must do is to place the univariate data into a single column and then follow the steps outlined on the next pages (link to MS Excel). Click on the MS Excel icon (below) to see how!
MS Excel
1. With your data in a column, choose Data > Data Analysis > Descriptive Statistics and click OK.
2. Select your input range (your data), click the “Summary Statistics” box, and click OK.
3. On a new Excel sheet, the Data Analysis add-in creates the following table. (It has been slightly reformatted here. The original one is in black/white and keeps the original column width, which is too narrow.)
Try this yourself on the test score data you worked with earlier in this unit. (Click on the MS Excel icon below.)