Lecture 7: Bivariate Statistics
Properties of Standard Deviation • Variance is just the square of the S.D. • If a constant is added to every score, it has no impact on the S.D. • If every score is multiplied by a constant c, the dispersion changes: the S.D. is multiplied by |c| and the variance by c². The sample standard deviation is S = √( Σ(X − M)² / (n − 1) ), where S = standard deviation, X = an individual score, M = the mean of all scores, and n = the sample size (number of scores).
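To make these properties concrete, here is a minimal NumPy sketch; the scores are invented for illustration:

```python
import numpy as np

scores = np.array([4, 8, 6, 5, 3, 7], dtype=float)
s = scores.std(ddof=1)  # sample S.D., with (n - 1) in the denominator

print(np.isclose(scores.var(ddof=1), s**2))         # variance equals the S.D. squared
print(np.isclose((scores + 10).std(ddof=1), s))     # adding a constant leaves the S.D. unchanged
print(np.isclose((scores * 3).std(ddof=1), 3 * s))  # multiplying by 3 triples the S.D.
```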
Distributions and Standard Deviations • Example: A distribution has a mean of 40 and a standard deviation of 5. 68% of the distribution can be found between what two values? • 95% of the distribution can be found between what two values?
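A worked answer using the empirical rule (roughly 68% of a normal distribution lies within one S.D. of the mean, and roughly 95% within two):

```python
mean, sd = 40, 5
print((mean - sd, mean + sd))          # about 68% of scores fall in (35, 45)
print((mean - 2 * sd, mean + 2 * sd))  # about 95% of scores fall in (30, 50)
```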
Standard Error of the Mean • Standard error is an estimate of how much the mean would vary over many samples drawn from the same population. • It is calculated from a single sample: it estimates the standard deviation of the sampling distribution of the mean. • A smaller S.E. suggests that our sample mean is likely a good estimate of the population mean.
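A small illustrative computation of the standard error, S.E. = s / √n, using invented scores and NumPy:

```python
import numpy as np

sample = np.array([12, 15, 11, 14, 13, 16, 12, 15], dtype=float)  # made-up scores
se = sample.std(ddof=1) / np.sqrt(len(sample))  # S.E. = s / sqrt(n)
print(se)
```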
Common Data Representations • Histograms • Simple graphs of the frequency of groups of scores. • Stem-and-Leaf Displays • Another way of displaying dispersion, particularly useful when you do not have large amounts of data. • Box Plots • Yet another way of displaying dispersion: the box spans the 25th to 75th percentiles, the line within the box marks the median, and the “whiskers” show the range of values (min and max).
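The lecture's own examples presumably come from SPSS; as a rough equivalent, this matplotlib sketch draws a histogram and a box plot of simulated scores:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
scores = rng.normal(loc=50, scale=10, size=200)  # simulated scores

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(8, 3))
ax1.hist(scores, bins=15)  # histogram: frequencies of grouped scores
ax2.boxplot(scores)        # box plot: quartile box, median line, whiskers
plt.show()
```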
Estimation and Hypothesis Tests: The Normal Distribution • A key assumption for many variables (or specifically, their scores/values) is that they are normally distributed. • In large part, this is because the most common statistics (the chi-square, t, and F tests) rest on this assumption.
Why do we make this assumption? • Central Limit Theorem • Errors can be viewed as a sum of many independent random effects, thus individual scores will tend to be normally distributed. • Even if Y is not normally distributed, the distribution of the sample mean will tend to be normal as the sample size increases. • Y = µ + ε • A given score (Y) is the sum of the mean of the population (µ) and some error (ε)
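A quick NumPy simulation of the Central Limit Theorem: even though the population here is exponential (clearly non-normal), the distribution of sample means concentrates around µ with spread σ/√n. The population and the sample sizes are arbitrary choices for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
# Draw 10,000 samples of size 50 from a skewed exponential population
# (mean 1, S.D. 1) and compute each sample's mean.
sample_means = rng.exponential(scale=1.0, size=(10_000, 50)).mean(axis=1)

print(sample_means.mean())  # close to the population mean, 1.0
print(sample_means.std())   # close to 1 / sqrt(50) ≈ 0.141
```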
The z-score • Infinitely many normal distributions are possible, one for each combination of mean and variance, but all are related to a single standard distribution. • Standardizing a group of scores changes the scale to one of standard deviation units. • This allows for comparisons with scores that were originally on a different scale.
z-scores (continued) • A z-score tells us where a score is located within a distribution: specifically, how many standard deviation units the score is above or below the mean. • Properties • The mean of a set of z-scores is zero (why?) • The variance (and therefore standard deviation) of a set of z-scores is 1.
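A short NumPy check of these two properties on invented scores:

```python
import numpy as np

scores = np.array([55, 60, 70, 80, 85], dtype=float)
z = (scores - scores.mean()) / scores.std(ddof=1)  # standardize

print(round(z.mean(), 10))  # 0.0: the mean of z-scores is zero
print(z.std(ddof=1))        # 1.0: their standard deviation is one
```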
Area under the normal curve • Example: you have a variable x with a mean of 500 and an S.D. of 15. How common is a score of 525? • z = (525 − 500) / 15 ≈ 1.67 • If we look up a z-statistic of 1.67 in a z-score table, we find that the proportion of scores less than our value is .9525. • In other words, a score of 525 exceeds 95.25% of the population (p < .05). • Z-Score Calculator
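The same lookup can be done in code; a sketch using SciPy's normal CDF in place of a z-table:

```python
from scipy.stats import norm

z = (525 - 500) / 15  # ≈ 1.67
print(norm.cdf(z))    # ≈ 0.952: proportion of scores below 525
```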
Issues with Normal Distributions • Skewness: asymmetry of the distribution (a long tail on one side) • Kurtosis: how peaked the distribution is and how heavy its tails are relative to a normal curve
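For a numeric illustration, SciPy reports both statistics; the exponential sample below is just an arbitrary example of a skewed distribution:

```python
import numpy as np
from scipy.stats import skew, kurtosis

rng = np.random.default_rng(1)
x = rng.exponential(size=1_000)  # a right-skewed sample

print(skew(x))      # positive: long right tail
print(kurtosis(x))  # excess kurtosis; roughly 0 for a normal distribution
```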
Correlation: hypothesis testing for an association between two metric variables
Checking for simple linear relationships • Pearson’s correlation coefficient • Measures the extent to which two variables are linearly related • Basically, the correlation coefficient is the average of the cross products of the corresponding z-scores.
Correlations • Ranges from −1 to +1: +1 indicates a perfect positive linear relationship, −1 a perfect negative one, and 0 no linear relationship. • Remember: correlation ONLY measures linear relationships, not all relationships!
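A sketch of the z-score definition of Pearson's r on invented paired data, checked against NumPy's built-in:

```python
import numpy as np

x = np.array([1, 2, 3, 4, 5], dtype=float)
y = np.array([2, 4, 5, 4, 5], dtype=float)

zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
r = (zx * zy).sum() / (len(x) - 1)  # mean cross product of paired z-scores

print(r)
print(np.corrcoef(x, y)[0, 1])      # NumPy's Pearson r agrees
```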
Correlation Example • General Social Survey 1993 • Education and Age
The t-test: hypothesis testing for the equality of means between two independent groups
Alternative Hypotheses Revisited • Null hypothesis: H0: μ1 = μc • Alternative hypotheses: • H1: μ1 < μc (one-tailed) • H1: μ1 > μc (one-tailed) • H1: μ1 ≠ μc (two-tailed) • How do we test to see if the means of two independent samples are, in fact, different?
The t-test • t = (M1 − M2) / SDM • Where: M = group mean; SDM = standard error of the difference between means; N = number of subjects in a group; s = standard deviation of a group; df = degrees of freedom
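The lecture works in SPSS; as an illustrative parallel, SciPy exposes both versions of the test. The data below are simulated, not the GSS values:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
group1 = rng.normal(loc=20, scale=5, size=30)  # simulated scores, group 1
group2 = rng.normal(loc=17, scale=5, size=30)  # simulated scores, group 2

t_eq, p_eq = stats.ttest_ind(group1, group2)                     # equal variances assumed
t_sep, p_sep = stats.ttest_ind(group1, group2, equal_var=False)  # separate-variance (Welch)
print(t_eq, p_eq)
print(t_sep, p_sep)
```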
Degrees of freedom • d.f. = the number of independent pieces of information from the data collected in a study. • Example: choosing 10 numbers that add up to 100. Once nine are chosen freely, the tenth is fixed by the total, so only N − 1 = 9 choices are independent. • In statistics, further restrictions reduce the degrees of freedom. • In the t-test, since we estimate two means, the degrees of freedom are reduced by two: df = n1 + n2 − 2.
t distribution • As the degrees of freedom increase (towards infinity), the t distribution approaches the z distribution (i.e., a normal distribution) • Because N plays such a prominent role in the calculation of the t-statistic, note that for very large N’s, the sample standard deviation (s) begins to closely approximate the population standard deviation (σ)
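A quick numerical check of this convergence, comparing t critical values with the z critical value via SciPy:

```python
from scipy.stats import t, norm

for df in (5, 30, 100, 1000):
    print(df, t.ppf(0.975, df))  # two-tailed .05 critical value shrinks as df grows
print("z:", norm.ppf(0.975))     # 1.96: the limiting normal value
```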
Assumptions Underlying the Independent Sample t-test • Assumption of Normality • Assumption of Homogeneity of Variance • The outputs for the t-test in SPSS correspond to the standard t-test (equal variances assumed) and a separate-variance t-test (equal variances not assumed)
Practical Example: • Do men and women watch different amounts of TV per week? • General Social Survey 1993