Measures of Central Tendency

“To Be or Not to Be Normal”

TOPICS • Normal Distributions • Skewness & Kurtosis • Normal Curves and Probability • Z-scores • Confidence Intervals • Hypothesis Testing • The t-distribution

Is this normal?

Normal Distributions • Are your curves normal? • Why do we care about normal curves? • What do normal curves tell us? Answer: The curves tell us something about the distribution of the population. They also allow us to make statistical inferences about the probability of certain outcomes within some margin of error.

The normal distribution • A distribution is easily depicted in a graph where the height of the line is determined by the frequency of cases for the values beneath it. • If a distribution is close to normal, most cases cluster near its middle.
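To put a number on "most cases cluster near the middle," here is a minimal sketch using Python's standard-library `statistics.NormalDist` on a standard normal curve (mean 0, standard deviation 1). It computes the share of cases falling within 1, 2, and 3 standard deviations of the mean.

```python
from statistics import NormalDist

# Standard normal curve: mean 0, standard deviation 1
curve = NormalDist(mu=0, sigma=1)

def share_within(k: float) -> float:
    """Proportion of cases within k standard deviations of the mean."""
    return curve.cdf(k) - curve.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} SD: {share_within(k):.3f}")
```

The results line up with the familiar 68-95-99.7 rule: roughly 68%, 95%, and 99.7% of cases.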

The Normal Curve • Bell-shaped distribution or curve • Perfectly symmetrical about the mean: mean = median = mode • Tails are asymptotic: they get closer and closer to the horizontal axis but never reach it.
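A quick check of these three properties, sketched with `statistics.NormalDist`. The mean of 100 and standard deviation of 15 are arbitrary illustrative values, not anything from the lecture.

```python
from statistics import NormalDist

# Hypothetical normal curve; mu and sigma are illustrative values only
mu, sigma = 100.0, 15.0
curve = NormalDist(mu, sigma)

# Symmetry: the curve has the same height at equal distances above and below the mean
print(curve.pdf(mu - 10), curve.pdf(mu + 10))

# Median: the value that splits the area in half sits at the mean
print(curve.inv_cdf(0.5))

# Mode: the curve is higher at the mean than one point away from it
print(curve.pdf(mu) > curve.pdf(mu + 1))

# Asymptotic tails: ten SDs out, the height is tiny but still above zero
print(curve.pdf(mu + 10 * sigma) > 0)
```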

Skewness and Sample Distributions Not all curves are normal, even if still bell-shaped

Skewness • Formula for skewness
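The slide's own formula is not reproduced here, so the sketch below assumes one common definition: the moment-based coefficient, the average cubed deviation divided by the standard deviation cubed. Other textbook formulas (for example, Pearson's mean-minus-median version) give different numbers but share the same sign convention.

```python
def skewness(values: list[float]) -> float:
    """Moment-based skewness: m3 / m2 ** 1.5 (population-style moments)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in values) / n  # average cubed deviation
    return m3 / m2 ** 1.5

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 1, 1, 2, 2, 3, 10]  # one long right tail
print(skewness(symmetric), skewness(right_skewed))
```

A positive value signals a longer right tail, a negative value a longer left tail, and a perfectly symmetrical distribution gives 0.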

Kurtosis (It’s not a disease) • Beyond skewness, kurtosis tells us how peaked or flat our distribution is, and how heavy its tails are, compared with a normal curve. • The kurtosis value for a normal distribution equals 3. Anything above this is leptokurtic (more peaked, heavier tails) and anything below is platykurtic (flatter, lighter tails).
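A minimal sketch, assuming the moment-based definition (average fourth-power deviation divided by the variance squared). It compares simulated normal draws with flat, uniform draws. Many statistical packages report *excess* kurtosis instead (this value minus 3), in which case a normal curve shows 0.

```python
import random

def kurtosis(values: list[float]) -> float:
    """Moment-based kurtosis: m4 / m2 ** 2 (a normal curve gives about 3)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m4 / m2 ** 2

random.seed(1)
normal_draws = [random.gauss(0, 1) for _ in range(100_000)]
flat_draws = [random.uniform(-1, 1) for _ in range(100_000)]  # platykurtic: flatter than normal
print(round(kurtosis(normal_draws), 2), round(kurtosis(flat_draws), 2))
```

The normal draws should come out near 3, and the uniform draws near 1.8, below the normal benchmark.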

Back to normal distributions • The power of normal distributions, or those close to it, is that we can predict probabilistically where cases will fall within a distribution. • For example, given the population parameters of human height, what are the odds that someone will grow to more than eight feet? • Answer: likely less than a .025 probability
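Here is a minimal sketch of how that probability could be computed. It assumes height is normally distributed with a mean of 69 inches and a standard deviation of 3 inches. Those parameters are illustrative assumptions, not figures from this lecture, and the normal model is only an approximation that far out in the tail.

```python
from math import erfc, sqrt

def prob_above(cutoff: float, mean: float, sd: float) -> float:
    """Upper-tail probability P(X > cutoff) for a normal distribution."""
    z = (cutoff - mean) / sd
    return 0.5 * erfc(z / sqrt(2))  # erfc keeps precision far out in the tail

# Illustrative assumptions only: mean height 69 inches, SD 3 inches
p = prob_above(96, mean=69, sd=3)  # 8 feet = 96 inches
print(f"{p:.1e}")
```

Under these assumptions, eight feet sits 9 standard deviations above the mean, so the probability is far below .025.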

What does Andre the Giant do to the sample distribution? What is the probability of finding someone like Andre in the population? Are you ready for more inferential statistics? Answer: Oh boy, yes!! (Figure: Sample Distribution)
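One way to see what a single extreme case does to a sample: it drags the mean toward the tail, skewing the distribution to the right, while the median barely moves. The class heights and the value standing in for Andre below are made-up numbers for illustration, not real measurements.

```python
from statistics import mean, median

# Hypothetical class heights in inches (illustrative values only)
heights = [66, 68, 69, 70, 71]
with_andre = heights + [88]  # add one hypothetical extreme case

print(mean(heights), median(heights))        # center before the extreme case
print(mean(with_andre), median(with_andre))  # center after the extreme case
```

Here the mean jumps from 68.8 to 72 while the median moves only from 69 to 69.5.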

Normal Curves and probability • We have answered the question of what Andre and the Sumo wrestler would do to the distribution • But what about the probability of finding someone the same height as Andre in the population? • What is the probability of finding someone the same height as Dr. Peña or Dr. Boehmer?

More on normal curves and probability (Figure: normal curve marking where Andre would fall and where Dr. Boehmer would fall)

Z-Scores (no sleeping!!) • With z-scores, we can standardize each observation's distance from the mean, so that observations from different samples can be compared. • The basic unit of the z-score is the standard deviation.
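The z-score is the observation minus the mean, divided by the standard deviation. Below is a minimal sketch comparing scores from two hypothetical classes with different means and spreads. All numbers are made up for illustration.

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """How many standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

# Hypothetical example: scores from two classes with different spreads
print(z_score(85, mean=75, sd=5))   # class A
print(z_score(90, mean=85, sd=10))  # class B
```

The raw score of 90 looks better, but the 85 is 2 standard deviations above its class mean while the 90 is only half a standard deviation above its own.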

We can use the z-score to express each observation as a distance from the mean. How far is a given observation from the mean when its z-score = 2? Answer: 2 standard deviations. Approximately what percentage of cases is a given case higher than if its z-score = 2? Answer: about 98% (97.7%)
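The percentage comes from the area under the standard normal curve below the z-score. A minimal sketch using `statistics.NormalDist`:

```python
from statistics import NormalDist

def share_below(z: float) -> float:
    """Proportion of a normal distribution that lies below a given z-score."""
    return NormalDist().cdf(z)

print(f"{share_below(2):.4f}")
```

This gives about 0.977, so a case at z = 2 is higher than roughly 97.7% of cases.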

Random Sampling Error • Ever hear a poll report a margin of error? What is that? • Random Sampling Error = standard deviation / square root of the sample size • In other words, as the variance of the population increases, so does the chance that a sample will not reflect the population parameters; as the sample size increases, that chance shrinks.
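The formula above, as a minimal Python sketch. The population standard deviation of 10 is a hypothetical value chosen for round numbers.

```python
from math import sqrt

def random_sampling_error(sd: float, n: int) -> float:
    """Random sampling error (standard error of the mean): sd / sqrt(n)."""
    return sd / sqrt(n)

# Hypothetical population SD of 10: quadrupling n halves the error
for n in (25, 100, 400):
    print(n, random_sampling_error(10, n))
```

Because n sits under a square root, cutting the error in half takes four times as many cases.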

Standard Error • The term random sampling error covers both the general chance of error when sampling and the error in a specific sample statistic, such as the mean. For the latter, we typically use the term Standard Error. • The standard error of a sample statistic such as the mean describes how far sample means typically fall from the mean of the population from which they are drawn. Formally, it is the standard deviation of the sampling distribution of the mean.

Standard Error Example: What if most humans weighed 200 pounds and only 1 million people globally weighed 250 pounds? The random sampling error would be low, since collecting a sample made up heavily of those heavier humans would be unlikely. Because of this low variance, there would not be much error from sampling in general.

Standard Error • Example continued. Now, when we take a sample, each sample has a mean. If a population has low variance, so should the samples. We should see this reflected in low standard error in the mean of the sample, the sample statistic. • Of course, higher variance in the population also causes higher error in samples taken from it.
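A small simulation can illustrate the weight example. Draw many samples from a low-variance population and from a high-variance population, both centered on 200 pounds. Then compare how much the sample means bounce around. The two standard deviations (2 and 40 pounds) are hypothetical, and the populations are modeled as normal purely for convenience.

```python
import random
from statistics import fmean, stdev

def sd_of_sample_means(pop_sd: float, n: int = 100, reps: int = 1000) -> float:
    """Draw `reps` samples of size n from a normal population centered on 200 lb
    and return the standard deviation of their means (an empirical standard error)."""
    means = [fmean(random.gauss(200, pop_sd) for _ in range(n)) for _ in range(reps)]
    return stdev(means)

random.seed(0)
low = sd_of_sample_means(pop_sd=2)    # hypothetical low-variance population
high = sd_of_sample_means(pop_sd=40)  # hypothetical high-variance population
print(round(low, 3), round(high, 3))
```

The two results should land near 2/√100 = 0.2 and 40/√100 = 4.0, matching the standard deviation / square root of n formula.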

Some more notation • Random Sampling Error • Error in a sample’s mean is the Standard Error

Central Limit Theorem Remember that if we took an infinite number of samples from a population, the means of these samples would be normally distributed (approximately so, provided each sample is reasonably large), even if the population itself is not normal. Hence, the larger the sample, the more likely the sample mean will fall close to the population mean.
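A minimal simulation sketch. It draws repeated samples from a strongly right-skewed population and shows that the sample means still pile up in a roughly symmetric, normal-looking shape around the population mean. The population is an exponential distribution with mean 1, chosen purely as an assumption for illustration.

```python
import random
from statistics import fmean

def skewness(values: list[float]) -> float:
    """Moment-based skewness; 0 for a symmetric distribution."""
    m = fmean(values)
    m2 = fmean((x - m) ** 2 for x in values)
    m3 = fmean((x - m) ** 3 for x in values)
    return m3 / m2 ** 1.5

random.seed(42)
# A strongly right-skewed population: exponential with mean 1 (skewness 2)
raw = [random.expovariate(1.0) for _ in range(5000)]
# Means of 2,000 samples of size 100 drawn from that same population
sample_means = [fmean(random.expovariate(1.0) for _ in range(100)) for _ in range(2000)]

print(round(skewness(raw), 2), round(skewness(sample_means), 2))
print(round(fmean(sample_means), 3))  # close to the population mean of 1
```

The raw draws should show skewness near 2, while the sample means come out close to 0 and center on the population mean.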

Confidence Intervals • We can use what we know about standard deviations from the mean to calculate a range of values around a sample mean that is likely to contain the population mean. • This range is based on probability: at a 95% confidence level, intervals built this way capture the population mean 95% of the time, leaving a .05 (5%) chance of error.
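One way to see what the .95 means is to simulate it. Take many samples from a hypothetical population whose mean we know, build a ±1.96 standard-error interval around each sample mean, and count how often the interval captures the true mean. The population mean of 50, SD of 10, sample size of 100, and treating the population SD as known are all assumptions for this sketch.

```python
import random
from math import sqrt
from statistics import fmean

random.seed(7)
POP_MEAN, POP_SD, N = 50.0, 10.0, 100  # hypothetical population and sample size
se = POP_SD / sqrt(N)  # standard error, treating the population SD as known

hits = 0
trials = 2000
for _ in range(trials):
    sample_mean = fmean(random.gauss(POP_MEAN, POP_SD) for _ in range(N))
    low, high = sample_mean - 1.96 * se, sample_mean + 1.96 * se
    if low <= POP_MEAN <= high:  # did this interval capture the true mean?
        hits += 1

coverage = hits / trials
print(coverage)  # expected to land near 0.95
```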

How Confident Are You? • Are you 100% sure? • Social scientists use 95% as a threshold to test whether or not results are a product of chance. • That is, we accept a 1 in 20 chance of being wrong. • What do you MEAN? We build a 95% confidence interval so that we can be 95% confident the population mean lies within that range.

Confidence Interval (CI) $CI = \bar{Y} \pm Z\sigma$, where $\bar{Y}$ = mean, Z = z-score associated with a 95% CI, σ = standard error

Building a CI: Assume the following

CI Why do we use 1.96?
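For a 95% CI, 2.5% of the area is left in each tail, so we want the z-score that has 97.5% of the standard normal curve below it. A minimal sketch with `statistics.NormalDist`:

```python
from statistics import NormalDist

# For 95% confidence, 2.5% of the area is left in each tail,
# so we want the z-score with 97.5% of the curve below it.
z = NormalDist().inv_cdf(0.975)
print(round(z, 2))  # 1.96

# Check: the area between -z and +z
print(round(NormalDist().cdf(z) - NormalDist().cdf(-z), 3))  # 0.95
```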

Calculating a 95% CI • Let’s look at the class population distribution of height • Is it a normal or skewed distribution? • Let’s build a 95% CI around the mean height of the class
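A minimal sketch of the class exercise. The heights below are hypothetical stand-ins, not real class data, and the standard error is estimated from the sample standard deviation. With a sample this small, the t-distribution (listed in the TOPICS) would give a slightly wider interval than the large-sample 1.96 used here.

```python
from math import sqrt
from statistics import mean, stdev

def ci95(values: list[float]) -> tuple[float, float]:
    """95% CI for the mean: mean ± 1.96 * (sample SD / sqrt(n))."""
    m = mean(values)
    se = stdev(values) / sqrt(len(values))  # estimated standard error
    return m - 1.96 * se, m + 1.96 * se

# Hypothetical class heights in inches, not real class data
heights = [62, 64, 65, 66, 67, 68, 69, 70, 71, 73]
low, high = ci95(heights)
print(f"95% CI: {low:.1f} to {high:.1f} inches")
```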

Why do we care about CIs? • We use CIs for hypothesis testing • For instance, we want to know if there is an income difference between El Paso and Boston • We want to know whether or not taking a class at Kaplan makes a difference in our GRE scores

Mean Difference Testing (Figure: income levels for Las Cruces, Boston, and El Paso compared with the USA mean)
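One common way to test a mean difference is to build a 95% CI around the difference between two independent sample means. If that interval excludes zero, the difference is statistically significant at the 95% level. This sketch assumes large samples, so 1.96 applies. The means and standard errors are hypothetical numbers, not real income data for any city.

```python
from math import sqrt

def diff_ci95(mean1: float, se1: float, mean2: float, se2: float) -> tuple[float, float]:
    """95% CI for the difference between two independent sample means (large samples)."""
    diff = mean1 - mean2
    se_diff = sqrt(se1 ** 2 + se2 ** 2)  # standard errors combine in quadrature
    return diff - 1.96 * se_diff, diff + 1.96 * se_diff

# Hypothetical sample means and standard errors (dollars), not real income data
low, high = diff_ci95(mean1=70_000, se1=1_500, mean2=50_000, se2=1_200)
print(round(low), round(high))
print("significant at 95%" if low > 0 or high < 0 else "not significant at 95%")
```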
