EDUC 502: Introduction to Statistics Lesson 3: variability 1/24/12
Review from last class • Percentile • The percentage of scores at or below a given score • To find it, count the scores at or below the score of interest, divide by the total number of scores, and multiply by 100 • Quartiles • Didn’t actually cover last week, but important to know • The quartiles split the distribution into four parts, each containing 25% of the scores • Draw on board with a distribution
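As a quick check on the percentile definition, here is a minimal Python sketch. The function name `percentile_rank` and the scores are made up for illustration; they are not from the lesson.

```python
def percentile_rank(scores, value):
    """Percentage of scores at or below `value`."""
    # Count the scores at or below the value of interest
    at_or_below = sum(1 for s in scores if s <= value)
    # Divide by the total number of scores and express as a percentage
    return 100 * at_or_below / len(scores)


# Hypothetical class scores (illustrative data only)
scores = [55, 60, 65, 70, 70, 75, 80, 85, 90, 95]

print(percentile_rank(scores, 70))  # 5 of 10 scores are at or below 70 -> 50.0
```

A score of 70 sits at the 50th percentile here because half of the scores are at or below it.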
What is variability? • Variability is how much scores fluctuate in the sample or population • There are a variety of ways of looking at the variability
Range • = Maximum score – Minimum score • Gives a very rough measure of variability, but it is based on only two data points, so a single outlier can make it misleading
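A small Python sketch of how one outlier changes the range. The scores and the helper name `score_range` are invented for illustration.

```python
def score_range(scores):
    """Range = maximum score minus minimum score."""
    return max(scores) - min(scores)


# Hypothetical scores (illustrative data only)
scores = [70, 72, 75, 78, 80]
with_outlier = scores + [20]  # same scores plus one very low score

print(score_range(scores))        # 80 - 70 = 10
print(score_range(with_outlier))  # 80 - 20 = 60
```

Five of the six scores are identical in both lists, yet the range is six times larger once the outlier is included.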
Variability around the mean To get a better description of how much scores vary from one another, we calculate how much they vary around the mean. First, we need to understand that each score is a certain distance from the mean ($x - \bar{x}$), and that we can’t just average these distances directly: the positive and negative distances cancel, so their sum will always equal zero.
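The cancellation can be seen in a short Python sketch. The scores are made up and chosen so the arithmetic is exact.

```python
# Hypothetical scores (illustrative data only)
scores = [2, 4, 6, 8, 10]

mean = sum(scores) / len(scores)         # 30 / 5 = 6.0
deviations = [x - mean for x in scores]  # distance of each score from the mean

print(deviations)       # [-4.0, -2.0, 0.0, 2.0, 4.0]
print(sum(deviations))  # 0.0, the negatives and positives cancel
```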
Variance • One way to solve the problem of the distances canceling each other out is to square each distance from the mean • Thus we get $s^2 = \sum (x - \bar{x})^2 / n$ • This still isn’t perfect, though: because we squared everything, the variance is in squared units and looks larger than the typical distance of a score from the mean
Standard Deviation • Because we squared the differences, we now take the square root to bring the measure of variability back to the original units of the scores • $s = \sqrt{\sum (x - \bar{x})^2 / n}$
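The two formulas (dividing by n) can be worked through step by step in Python. This is a minimal sketch with made-up scores; the cross-check uses `statistics.pvariance` and `statistics.pstdev` from the standard library, which also divide by n.

```python
import math
import statistics

# Hypothetical scores (illustrative data only)
scores = [2, 4, 6, 8, 10]
n = len(scores)

mean = sum(scores) / n                           # 6.0
squared_devs = [(x - mean) ** 2 for x in scores] # [16.0, 4.0, 0.0, 4.0, 16.0]

variance = sum(squared_devs) / n                 # 40 / 5 = 8.0 (squared units)
sd = math.sqrt(variance)                         # back in the original units

print(variance)  # 8.0
print(sd)

# Cross-check against the standard library's divide-by-n versions
assert math.isclose(variance, statistics.pvariance(scores))
assert math.isclose(sd, statistics.pstdev(scores))
```

The variance of 8 is in squared score units; the standard deviation (the square root of 8, a little under 3) is on the same scale as the scores themselves.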
Population Variance and SD • The formulas are essentially the same, but we use different symbols to indicate when we are talking about samples vs populations • Sample = Latin alphabet • Population = Greek alphabet • Variance • $\sigma_X^2 = \sum (X - \mu)^2 / N$ • SD • $\sigma_X = \sqrt{\sum (X - \mu)^2 / N}$ • The problem is we very rarely have data from the population, so the question is, are our sample measures good enough to use to estimate the population? • NO!
Estimating population variance and SD • The sample statistics are biased estimates of the population values: on average they underestimate the population variance and SD • So we need an unbiased (or at least less biased) estimator • To do this we simply divide by n – 1 instead of just n • n – 1 is the degrees of freedom • Degrees of Freedom (df) • The number of values in a calculation that are free to vary • If we have a sample of 50 and we know the mean, then only 49 values can vary. Once we know 49 of them, we know what the last value must be. • There are proofs to show that this really is less biased, but we won’t go over them
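The lesson skips the proofs, but a rough simulation can make the bias visible. The sketch below is illustrative only: the population (10,000 simulated scores with mean 100 and SD 15), the sample size of 5, the number of repetitions, and the function names are all assumptions, not part of the lesson.

```python
import random
import statistics


def variance_n(sample):
    """Variance with n in the denominator (the biased version)."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / len(sample)


def variance_n_minus_1(sample):
    """Variance with n - 1 (the degrees of freedom) in the denominator."""
    mean = sum(sample) / len(sample)
    return sum((x - mean) ** 2 for x in sample) / (len(sample) - 1)


# Degrees of freedom: with n = 5 and a known mean of 6, once four values
# are known the fifth is forced.
known_mean = 6
free_values = [2, 4, 6, 8]
last_value = 5 * known_mean - sum(free_values)  # 30 - 20 = 10

# Simulated population (assumed for illustration only)
random.seed(0)
population = [random.gauss(100, 15) for _ in range(10_000)]
pop_var = statistics.pvariance(population)

# Draw many small samples and average each estimator
biased, corrected = [], []
for _ in range(2_000):
    sample = random.sample(population, 5)
    biased.append(variance_n(sample))
    corrected.append(variance_n_minus_1(sample))

print("population variance:      ", round(pop_var, 1))
print("average dividing by n:    ", round(statistics.fmean(biased), 1))
print("average dividing by n - 1:", round(statistics.fmean(corrected), 1))
```

Averaged over many samples, the divide-by-n version should come out noticeably below the population variance, while the n – 1 version should land much closer to it.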
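SD in relation to the normal curve • When we assume the distribution is normal, we know what percentage of scores falls within a given number of SDs of the mean • Between the mean and 1 SD on either side = about 34% of scores (so ±1 SD covers about 68%) • Between the mean and 2 SD on either side = about 47.5% of scores (so ±2 SD covers about 95%)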
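These percentages can be checked against the standard normal curve using `statistics.NormalDist` from the Python standard library. This is a minimal sketch; the slide's figures are rounded, and the exact one-sided areas are closer to 34.1% and 47.7%.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, SD 1

# Area between the mean and 1 or 2 SDs on one side
one_side_1sd = z.cdf(1) - 0.5
one_side_2sd = z.cdf(2) - 0.5

# Area within 1 or 2 SDs on both sides of the mean
within_1sd = z.cdf(1) - z.cdf(-1)
within_2sd = z.cdf(2) - z.cdf(-2)

print(f"mean to 1 SD: {one_side_1sd:.1%}, within ±1 SD: {within_1sd:.1%}")
print(f"mean to 2 SD: {one_side_2sd:.1%}, within ±2 SD: {within_2sd:.1%}")
```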