This chapter discusses the concepts of mean, variance, and weighted mean in data characterization. It explains how to compute the mean and variance, when to use weighted mean, and the importance of standard deviation. The empirical rule for data distribution is also mentioned.
Chapter 3 Data Characterization BUS304 – Data Characterization
Today: Mean and Variance
Mean: also called "average"
• Formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
• Characterizes the center of the data distribution.
• The most commonly used measure.
• Sample mean $\bar{x}$: the average derived from a sample.
• Population mean $\mu$: the average derived from the population.
• Ways to compute the mean:
  – Use a calculator.
  – Use Excel (function: AVERAGE).
Exercise: compute the mean weight for the Chargers' offense players and defense players. Which mean should be higher? Why? Are they population means or sample means?
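As a minimal sketch of the mean calculation, the same arithmetic can be done in Python. The weights below are made-up illustration values, not actual Chargers roster data, and the variable names are hypothetical.

```python
from statistics import mean

# Hypothetical player weights in pounds (illustration only, not real roster data).
offense_weights = [310, 225, 205, 315, 250]
defense_weights = [300, 245, 200, 290, 240]

# The mean is the sum of the values divided by the number of values.
offense_mean = sum(offense_weights) / len(offense_weights)

# statistics.mean performs the same calculation.
defense_mean = mean(defense_weights)

print(offense_mean)  # 261.0
print(defense_mean)  # 255
```

Whether each result is a sample mean or a population mean depends on whether the list covers every player in the group or only some of them; the arithmetic is the same.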
Sensitivity to outliers
Compute the mean for the following 2 groups of data:
• Household income in community a (unit: $10,000):
• Household income in community b (unit: $10,000):
If the mayor decides to provide more public facilities to poor communities, and the decision is based on whether the mean income in the community is below $50,000 per year, does such a decision make sense?
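To see why the mean can mislead here, consider a small sketch with invented incomes (in units of $10,000). These numbers are assumptions chosen for illustration; they are not the data from the slide.

```python
# Hypothetical household incomes in units of $10,000 (illustration only).
incomes = [3, 4, 4, 5, 4]

# The same community with one very wealthy household added.
incomes_with_outlier = incomes + [100]


def mean(values):
    """Arithmetic mean: sum of the values divided by their count."""
    return sum(values) / len(values)


print(mean(incomes))               # 4.0
print(mean(incomes_with_outlier))  # 20.0
```

A single outlier moves the mean from $40,000 to $200,000, although the other five households are unchanged and none of them earns more than $50,000.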
Compute the mean from a frequency table
Below is a frequency table showing the number of days the teams take to finish their projects.
• How many days on average does a team take to finish one project?
• Create a histogram using the data in the table, and locate the mean on the graph.
• How would you describe the shape of the histogram?
• What is the relationship between the mean and the peak?
• Use relative frequency to find the mean.
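Compute the mean from a histogram
[Figure: histogram with horizontal-axis values 5, 15, 25, 35, 45, 55]
• The histogram conveys the same information as the frequency table.
• Mathematical expression, where $x_i$ is a value (or class midpoint) and $f_i$ is its frequency:
$$\bar{x} = \frac{\sum_i f_i x_i}{n} \quad \text{if sample}, \qquad \mu = \frac{\sum_i f_i x_i}{N} \quad \text{if population}$$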
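The frequency-table calculation can be sketched as follows. The values 5 through 55 are assumed to be class midpoints, and the frequencies are hypothetical because the original table is not reproduced here.

```python
# Assumed class midpoints and hypothetical frequencies (illustration only).
midpoints = [5, 15, 25, 35, 45, 55]
frequencies = [2, 5, 8, 3, 1, 1]

n = sum(frequencies)  # total number of observations

# Method 1: sum of (frequency * value), divided by the number of observations.
mean_from_counts = sum(f * x for f, x in zip(frequencies, midpoints)) / n

# Method 2: weight each value by its relative frequency (f / n).
relative_frequencies = [f / n for f in frequencies]
mean_from_relative = sum(p * x for p, x in zip(relative_frequencies, midpoints))

print(mean_from_counts)              # 24.5
print(round(mean_from_relative, 6))  # 24.5
```

Both methods give the same answer, because dividing each frequency by n before summing is the same as dividing the total by n afterwards.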
Weighted Mean
• The simple mean assumes that each piece of information is weighted equally.
  – E.g. the average score of the students.
• Sometimes, different data should be given different weights.
  – One may be more important than the other.
  – E.g. some instructors assign 60% to the homework score and 40% to the final exam. If a student's homework score is 84 and the exam score is 70, compute the student's final score (the weighted mean of the homework score and the exam score). This teacher thinks homework reveals more comprehensive information about a student's knowledge, and hence puts more weight on it.
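The homework and exam example works out as below. The helper name `weighted_mean` is a hypothetical choice for this sketch; the scores and weights come from the slide.

```python
def weighted_mean(values, weights):
    """Weighted mean: sum of (weight * value) divided by the sum of the weights."""
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


scores = [84, 70]      # homework score, final exam score
weights = [0.6, 0.4]   # 60% homework, 40% final exam

final_score = weighted_mean(scores, weights)
print(f"{final_score:.1f}")  # 78.4
```

The simple mean of the two scores would be 77; the weighted mean is higher because the stronger homework score counts for more.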
When to use weighted mean?
• Some other examples of weighted mean:
  – A student's GPA. A course with more credits carries more weight.
  – An economic growth indicator (some industries affect the economy more than others).
  – Crunch-time leader: the player who performs best in the last few minutes of the game. This can reveal a person's performance under pressure.
  – Expectation (you will see this in Chapter 4). E.g. in a gambling game, if you lose one dollar with 60% chance and gain one dollar with 40% chance, the expectation is 60% × (−$1) + 40% × $1 = −$0.20.
• Other examples? (average Cal State tuition)
Always think about whether you should use the weighted mean or the simple mean.
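The gambling example is a weighted mean in which the probabilities act as the weights. A minimal sketch, with hypothetical variable names:

```python
# Outcomes of the game in dollars and their probabilities (from the slide).
outcomes = [-1.0, 1.0]
probabilities = [0.6, 0.4]

# Probabilities must sum to 1, so no division by the total weight is needed.
assert abs(sum(probabilities) - 1.0) < 1e-12

expectation = sum(p * x for p, x in zip(probabilities, outcomes))
print(f"{expectation:.2f}")  # -0.20
```

On average the player loses 20 cents per game.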
Break
Variance
• A measure of data spread.
• Also called "the average of squared deviations from the mean".
• The larger the variance, the wider (fatter) the histogram.
• Sample variance: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
• Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Note the difference!
Steps to compute the variance
• Identify whether the data are from a population or a sample (the formulae are different).
• Use the following table to compute the deviation:
  – Find the mean.
  – Find the distance of each value from the mean (fill out the 2nd column), e.g. 5 − mean = 1.167.
  – Find the squared distance (the 3rd column), e.g. (1.167)² = 1.36.
  – Add up the 3rd column.
  – Divide by the population size, or by the sample size − 1.
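The steps above can be sketched in Python. The data set is hypothetical: it was chosen only so that the first value, 5, has a deviation of about 1.167, matching the worked entry on the slide. The slide's actual table is not reproduced here.

```python
from statistics import pvariance, variance

# Hypothetical data (illustration only); the mean is 23 / 6, about 3.833.
data = [5, 3, 4, 2, 6, 3]
n = len(data)

# Step 1: find the mean.
mean = sum(data) / n

# Step 2: distance of each value from the mean (the 2nd column).
deviations = [x - mean for x in data]

# Step 3: squared distance (the 3rd column).
squared_deviations = [d ** 2 for d in deviations]

# Step 4: add up the 3rd column.
total = sum(squared_deviations)

# Step 5: divide by the population size, or by the sample size minus 1.
population_variance = total / n
sample_variance = total / (n - 1)

print(round(deviations[0], 3))          # 1.167
print(round(squared_deviations[0], 2))  # 1.36
print(round(population_variance, 3), round(sample_variance, 3))

# Cross-check against the standard library.
assert abs(population_variance - pvariance(data)) < 1e-9
assert abs(sample_variance - variance(data)) < 1e-9
```

The sample variance is always the larger of the two, because the same total is divided by a smaller number.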
Comparing variance vs. histogram
Find the variance for the following groups of sample data:
• Compare the mean and variance.
• Create the histogram to compare the distribution.
What does variance mean?
• Variance indicates variation:
  – The larger the variance, the more spread out the data.
  – Indicates unpredictability.
• E.g.
  – Weather data: when the weather changes dramatically, it is hard to predict tomorrow's temperature. (Looking at temperature data, which has the larger variance, Chicago or San Diego?)
  – Stocks: a larger variance means more risk on returns.
  – A person's performance: consistency. emotional…
• Other examples?
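Use a frequency table to compute the population variance:
• Compute the weighted average of the squared deviations from the mean, using the frequencies as weights:
$$\sigma^2 = \frac{\sum_i f_i (x_i - \mu)^2}{N}$$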
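A minimal sketch of the population variance computed from a frequency table, treated as a weighted average of squared deviations. The values and frequencies are hypothetical, since the slide's table is not reproduced here.

```python
# Hypothetical frequency table (illustration only).
values = [1, 2, 3]
frequencies = [1, 2, 1]

N = sum(frequencies)  # population size

# Population mean: frequency-weighted average of the values.
mu = sum(f * x for f, x in zip(frequencies, values)) / N

# Population variance: frequency-weighted average of the squared deviations.
population_variance = sum(
    f * (x - mu) ** 2 for f, x in zip(frequencies, values)
) / N

print(mu)                   # 2.0
print(population_variance)  # 0.5
```

Expanding the table into the raw list [1, 2, 2, 3] and applying the population formula directly gives the same result.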
Standard Deviation
• Square root of the variance.
• An indicator of data deviation. It is in the same units as the data, so it can be directly compared to the mean.
• Sample variance: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$; sample standard deviation: $s = \sqrt{s^2}$
• Population variance: $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$; population standard deviation: $\sigma = \sqrt{\sigma^2}$
Exercise: compute the standard deviation from the histogram on slide no. 5 and locate it on the histogram.
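As a short sketch, the standard deviation is the square root of the variance, and Python's `statistics` module offers `pstdev` (population) and `stdev` (sample) as a cross-check. The data are hypothetical.

```python
import math
from statistics import pstdev, stdev

# Hypothetical data (illustration only); the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
n = len(data)

mean = sum(data) / n
sum_of_squares = sum((x - mean) ** 2 for x in data)  # 32.0

# Population standard deviation: divide by n, then take the square root.
population_sd = math.sqrt(sum_of_squares / n)

# Sample standard deviation: divide by n - 1, then take the square root.
sample_sd = math.sqrt(sum_of_squares / (n - 1))

print(population_sd)        # 2.0
print(round(sample_sd, 3))  # 2.138

# Cross-check against the standard library.
assert math.isclose(population_sd, pstdev(data))
assert math.isclose(sample_sd, stdev(data))
```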
Empirical Rule
[Figure: bell curve marked with the 68%, 95%, and 99.7% regions]
• If the data are bell shaped (most of the time), then
  – about 68% of all data will fall in the range $\mu \pm 1\sigma$
  – about 95% of all data will fall in the range $\mu \pm 2\sigma$
  – about 99.7% of all data will fall in the range $\mu \pm 3\sigma$
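The three percentages can be checked numerically under the assumption that "bell shaped" means an exactly normal distribution. In that case the share of data within k standard deviations of the mean is erf(k / √2). The function name below is hypothetical.

```python
import math


def share_within(k):
    """Share of a normal distribution within k standard deviations of the mean."""
    return math.erf(k / math.sqrt(2))


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {share_within(k):.1%}")
# within 1 standard deviation(s): 68.3%
# within 2 standard deviation(s): 95.4%
# within 3 standard deviation(s): 99.7%
```

Real data are only approximately bell shaped, so the observed shares will usually differ somewhat from these values.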