This document provides a comprehensive review for the midterm exam in Sociology. It covers topics such as statistical inference, measurement and sampling, measures of central tendency and variation, probability distribution, confidence intervals, significance tests, and types of errors in hypothesis tests.
Sociology 601: Midterm review, October 15, 2009
• Basic information for the midterm
  • Date: Tuesday, October 20, 2009
  • Start time: 2 pm
  • Place: usual classroom, Art/Sociology 3221
  • Bring a sheet of notes, a calculator, and two pens or pencils
  • Notify me if you anticipate any timing problems
• Review for midterm
  • terms
  • symbols
  • steps in a significance test
  • testing differences in groups
  • contingency tables and measures of association
  • equations
Important terms from chapter 1
Terms for statistical inference:
• population
• sample
• parameter
• statistic
Key idea: You use a sample to make inferences about a population.
Important terms from chapter 2
2.1) Measurement:
• variable
• interval scale
• ordinal scale
• nominal scale
• discrete variable
• continuous variable
2.2-2.4) Sampling:
• simple random sample
• probability sampling
• stratified sampling
• cluster sampling
• multistage sampling
• sampling error
Key idea: Statistical inferences depend on measurement and sampling.
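To make the difference between a simple random sample and a stratified sample concrete, here is a minimal Python sketch. The sampling frame (60 "urban" and 40 "rural" households), the seed, and the function names are hypothetical and exist only for illustration. The stratified version uses proportionate allocation, which is one of several possible allocation rules.

```python
import random

def simple_random_sample(frame, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(frame, n)

def stratified_sample(frame, stratum_of, n, rng):
    """Proportionate stratified sample: split the frame into strata, then draw
    a simple random sample within each stratum in proportion to its share of
    the frame. (Rounding can make the total differ slightly from n.)"""
    strata = {}
    for unit in frame:
        strata.setdefault(stratum_of(unit), []).append(unit)
    sample = []
    for members in strata.values():
        k = round(n * len(members) / len(frame))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical sampling frame: 60 urban and 40 rural household IDs
frame = [("urban", i) for i in range(60)] + [("rural", i) for i in range(40)]
rng = random.Random(601)  # fixed seed so the draw is reproducible

srs = simple_random_sample(frame, 10, rng)
strat = stratified_sample(frame, lambda unit: unit[0], 10, rng)
print(sum(1 for s, _ in strat if s == "urban"), "urban,",
      sum(1 for s, _ in strat if s == "rural"), "rural")
```

The simple random sample can, by chance, over- or under-represent rural households. The stratified sample always contains 6 urban and 4 rural households here.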
Important terms from chapter 3
3.1) Tabular and graphic description
• frequency distribution
• relative frequency distribution
• histogram
• bar graph
3.2-3.4) Measures of central tendency and variation
• mean
• median
• mode
• proportion
• standard deviation
• variance
• interquartile range
• quartile, quintile, percentile
Important terms from chapter 3
Key ideas:
1.) Statistical inferences are often made about a measure of central tendency.
2.) Measures of variation help us judge how certain we can be about an inference.
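The measures of central tendency and variation listed above can be checked with Python's standard `statistics` module. This is a minimal sketch on eight hypothetical scores. Note that textbooks and software use slightly different rules for quartiles. `statistics.quantiles` uses its "exclusive" method by default, which may not match the hand method in A&F.

```python
import statistics

# Hypothetical scores for eight respondents
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)          # 5
median = statistics.median(data)      # 4.5 (average of the two middle scores)
mode = statistics.mode(data)          # 4 (most frequent score)
variance = statistics.variance(data)  # sample variance: divides by n - 1
sd = statistics.stdev(data)           # square root of the sample variance

# Quartiles (default "exclusive" method); IQR = Q3 - Q1
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

print(mean, median, mode, round(variance, 4), round(sd, 4), iqr)
```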
Important terms from chapter 4
• probability distribution
• sampling distribution
• sample distribution
• normal distribution
• standard error
• central limit theorem
• z-score
Key ideas:
1.) If we know what the population is like, we can predict what a sample might be like.
2.) A sample statistic gives us a best guess of the population parameter.
3.) If we work carefully, a sample can tell us how confident to be about our sample statistic.
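One way to see the sampling distribution and the central limit theorem at work is to simulate them. The following minimal Python sketch draws many samples of size 25 from a deliberately skewed, hypothetical population (exponential, mean 1, standard deviation 1). It then shows that the sample means center on the population mean and that their spread approximates the standard error σ/√n. The seed, sample size, number of replications, and the observed mean of 1.3 are all arbitrary choices for illustration.

```python
import random
import statistics

rng = random.Random(2009)   # fixed seed for reproducibility
n = 25                      # size of each sample
reps = 2000                 # number of repeated samples

# Hypothetical skewed population: exponential with mean 1 and sd 1
pop_mean, pop_sd = 1.0, 1.0

# Sampling distribution of the sample mean: draw many samples, keep each mean
sample_means = [
    statistics.fmean(rng.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

# Central limit theorem: the means center on the population mean, their
# spread is close to the standard error sigma / sqrt(n), and their
# distribution is roughly normal even though the population is skewed.
standard_error = pop_sd / n ** 0.5   # 0.2
print(round(statistics.fmean(sample_means), 3),
      round(statistics.stdev(sample_means), 3),
      standard_error)

# z-score of one (hypothetical) observed sample mean
ybar = 1.3
z = (ybar - pop_mean) / standard_error
```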
Important terms from chapter 5
• point estimator
• estimate
• unbiased
• efficient
• confidence interval
Key ideas:
1.) We have a standard set of equations we use to make estimates.
2.) These equations are used because they have specific desirable properties (for example, being unbiased and efficient).
3.) A confidence interval is centered on your best guess (the point estimate) of a parameter.
4.) The width of a confidence interval tells you how close that best guess will typically be to the parameter.
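A large-sample confidence interval is the point estimate plus or minus a z multiple of its standard error. The minimal Python sketch below applies this to the proportion from the PRTESTI example in this review (n = 832, p̂ = .53). It estimates the standard error from p̂ itself, as is usual for an interval. The function name is hypothetical.

```python
import math
from statistics import NormalDist

def large_sample_ci(estimate, standard_error, level=0.95):
    """Point estimate plus or minus z * standard error."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # about 1.96 for 95%
    margin = z * standard_error
    return estimate - margin, estimate + margin

n, p_hat = 832, 0.53
se = math.sqrt(p_hat * (1 - p_hat) / n)   # estimated standard error
low, high = large_sample_ci(p_hat, se)
print(f"{low:.4f} to {high:.4f}")
```

The result agrees with the interval Stata reports for this problem (.4960864 to .5639136).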
Important terms from chapter 6
6.1-6.3) Statistical inference: Significance tests
• assumptions
• hypothesis
• test statistic
• p-value
• conclusion
• null hypothesis
• one-sided test
• two-sided test
• z-statistic
Key idea from chapter 6
A significance test is a ritualized way to ask about a population parameter.
1.) Clearly state assumptions.
2.) Hypothesize a value for a population parameter.
3.) Calculate a sample statistic and convert it to a test statistic.
4.) Estimate how unlikely it is for the hypothesized population to produce such a sample statistic (the p-value).
5.) Decide whether the hypothesis can be rejected.
More important terms from chapter 6
6.4, 6.7) Decisions and types of errors in hypothesis tests
• type I error
• type II error
• power
6.5-6.6) Small sample tests
• t-statistic
• binomial distribution
• binomial test
Key ideas:
1.) Modeling decisions and population characteristics can affect the probability of a mistaken inference.
2.) Small sample tests follow the same principles as large sample tests, but require different assumptions and techniques.
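The link between α, type II error, and power can be made concrete for a one-sided z test of a mean. This is a minimal Python sketch that assumes σ is known. The values μ0 = 500, σ = 100, and n = 100 echo A&F problem 6.8. The true mean of 520 is a hypothetical alternative chosen only for illustration.

```python
from statistics import NormalDist

def power_one_sided_z(mu0, mu_a, sigma, n, alpha=0.05):
    """Power of an upper-tail z test of H0: mu = mu0 when the true mean is
    mu_a > mu0 and sigma is known."""
    std_normal = NormalDist()
    se = sigma / n ** 0.5
    z_crit = std_normal.inv_cdf(1 - alpha)   # reject H0 when z > z_crit
    # Under the alternative, the z statistic is centered at (mu_a - mu0) / se
    shift = (mu_a - mu0) / se
    return 1 - std_normal.cdf(z_crit - shift)

alpha = 0.05                                  # P(type I error) when H0 is true
power = power_one_sided_z(500, 520, 100, 100, alpha)
beta = 1 - power                              # P(type II error) when mu = 520
print(round(power, 3), round(beta, 3))
```

Raising n shrinks the standard error, which increases power. Lowering α does the opposite.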
Significance tests, Step 1: Assumptions
• An assumption that the sample was drawn at random.
  • This is essentially a universal assumption for all significance tests.
• An assumption about whether the variable has two outcome categories (proportion) or is measured on an interval scale (mean).
• An assumption that justifies a normal sampling distribution. This assumption varies from test to test:
  • Some tests assume a normal population distribution.
  • Other tests assume different minimum sample sizes.
  • Some tests do not make this assumption.
• Declare the α level at the start, if you use one.
Significance tests, Step 2: Hypothesis
• State the hypothesis as a null hypothesis.
• Remember that the null hypothesis is about the population from which you draw your sample.
• Write the equation for the null hypothesis.
• The null hypothesis can imply a one- or two-sided test.
• Be sure the statement and the equation are consistent.
Significance tests, Step 3: Test statistic
For the test statistic, write:
• the equation,
• your work, and
• the answer.
• Full disclosure maximizes partial credit.
• I recommend four significant digits at each computational step, but present three as the answer.
Significance tests, Step 4: p-value
Calculate an appropriate p-value for the test statistic.
• Use the correct table for the type of test.
• Use the correct degrees of freedom, if applicable.
• Use the correct p-value for a one- or two-sided test, as you declared in the hypothesis step.
Significance tests, Step 5: Conclusion
Write a conclusion that includes:
• the p-value and your decision to reject H0 or not;
• a statement of what your decision means;
• a discussion of the substantive importance of your sample statistic.
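The five steps can be traced through a one-sample z test for a proportion. This minimal Python sketch uses the inputs of the PRTESTI example in this review (A&F problem 6.12: n = 832, p̂ = .53, p0 = .5). The function name and the layout of the returned dictionary are hypothetical. As in Stata's test output, the standard error is computed from the hypothesized value p0.

```python
import math
from statistics import NormalDist

def one_sample_proportion_z_test(p_hat, p0, n):
    # Step 1 (assumptions): random sample, two-category outcome, and n large
    # enough that the sampling distribution of p-hat is roughly normal.
    # Step 2 (hypothesis): H0: pi = p0, tested against the alternatives below.
    # Step 3 (test statistic): the standard error uses p0, the value under H0.
    se0 = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se0
    # Step 4 (p-value): tail areas of the standard normal distribution.
    upper = 1 - NormalDist().cdf(z)
    return {
        "z": z,
        "p_lower": 1 - upper,                       # Ha: pi < p0
        "p_two_sided": 2 * min(upper, 1 - upper),   # Ha: pi != p0
        "p_upper": upper,                           # Ha: pi > p0
    }

result = one_sample_proportion_z_test(0.53, 0.50, 832)
print({k: round(v, 4) for k, v in result.items()})
# Step 5 (conclusion): compare the p-value for the alternative you declared
# with alpha, then state what the decision means substantively.
```

The values should match the PRTESTI output: z = 1.731, with p = .0418 (upper), .0835 (two-sided), and .9582 (lower).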
Useful Stata outputs
• Immediate test for a sample mean using TTESTI:
. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)

One-sample t test
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |     100         508          10         100    488.1578    527.8422
------------------------------------------------------------------------------
Degrees of freedom: 99

                               Ho: mean(x) = 500

     Ha: mean < 500             Ha: mean != 500             Ha: mean > 500
       t =   0.8000               t =   0.8000               t =   0.8000
   P < t =   0.7872           P > |t| =   0.4256           P > t =   0.2128
Useful Stata outputs
• Immediate test for a sample proportion using PRTESTI:
. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)

One-sample test of proportion                   x: Number of obs =       832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032                      .4960864    .5639136
------------------------------------------------------------------------------
Ho: proportion(x) = .5

     Ha: x < .5                 Ha: x != .5                 Ha: x > .5
     z =  1.731                  z =  1.731                  z =  1.731
 P < z =  0.9582             P > |z| =  0.0835              P > z =  0.0418
Useful Stata outputs
• Comparison of two means using TTESTI:
. ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    4252        18.1    .1978304        12.9    17.71215    18.48785
       y |    6764        32.6     .221294        18.2    32.16619    33.03381
---------+--------------------------------------------------------------------
combined |   11016    27.00323    .1697512     17.8166    26.67049    27.33597
---------+--------------------------------------------------------------------
    diff |               -14.5    .2968297                -15.08184   -13.91816
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom:  10858.6

                   Ho: mean(x) - mean(y) = diff = 0

     Ha: diff < 0               Ha: diff != 0               Ha: diff > 0
       t = -48.8496               t = -48.8496               t = -48.8496
   P < t =   0.0000           P > |t| =   0.0000           P > t =   1.0000
Two Independent Groups: Large Samples, Means
• It is important to be able to recognize the parts of the equation, what they mean, and why they are used.
• Equal variance assumption? NO
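Because no equal-variance assumption is made, each group contributes its own s²/n to the variance of the difference. The minimal Python sketch below rebuilds the key numbers of the two-sample TTESTI example in this review from its summary inputs: the standard error of the difference, the test statistic, and Satterthwaite's degrees of freedom. The p-value uses a normal approximation, which is an assumption of this sketch (reasonable at these sample sizes) rather than the t distribution Stata uses.

```python
import math
from statistics import NormalDist

def two_sample_means_unequal(n1, mean1, sd1, n2, mean2, sd2):
    """Compare two independent means without assuming equal variances."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2   # each group's s^2 / n
    se_diff = math.sqrt(v1 + v2)
    stat = (mean1 - mean2) / se_diff
    # Satterthwaite's approximate degrees of freedom
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    # Normal approximation to the two-sided p-value (large samples)
    p_two_sided = 2 * (1 - NormalDist().cdf(abs(stat)))
    return se_diff, stat, df, p_two_sided

se, t, df, p = two_sample_means_unequal(4252, 18.1, 12.9, 6764, 32.6, 18.2)
print(round(se, 4), round(t, 2), round(df, 1), p)
```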
Two Independent Groups: Large Samples, Proportions
• Equal variance assumption? YES (if the proportions are equal, then so are the variances).
Two Independent Groups: Small Samples, Means
7.3) Difference of two small sample means:
• Equal variance assumption: SOMETIMES (for ease); NO (in computer programs)
• df = N1 + N2 - 2 (when equal variances are assumed)
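Both of the equal-variance cases above rely on a pooled estimate. This minimal Python sketch covers them under these assumptions. For two proportions, the z test pools the two samples because H0 implies equal proportions and therefore equal variances. For two small-sample means, the t statistic pools the within-group sums of squares and uses df = N1 + N2 - 2. All counts and scores are hypothetical, and the function names are illustrative.

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Large-sample z test of H0: pi1 = pi2 using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se0 = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se0
    return z, 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

def pooled_t(group1, group2):
    """Small-sample t statistic for two means assuming equal variances."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((y - m1) ** 2 for y in group1)
    ss2 = sum((y - m2) ** 2 for y in group2)
    df = n1 + n2 - 2
    s2_pooled = (ss1 + ss2) / df
    se = math.sqrt(s2_pooled * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# Hypothetical data, for illustration only
z, p = two_proportion_z(80, 200, 150, 300)
t, df = pooled_t([5, 7, 6, 8], [9, 11, 10, 12])
print(round(z, 3), round(p, 3), round(t, 3), df)
```

The t statistic would then be compared with a t table at df = 6.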
Two Independent Groups: Small Samples, Proportions
Fisher's exact test
• via Stata, SAS, or SPSS
• calculates the exact probability of every possible outcome (every table with the observed row and column totals)
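To show what "exact" means here, the following minimal Python sketch computes Fisher's exact test for a 2x2 table by listing every table with the same margins. With the margins fixed, the top-left count follows a hypergeometric distribution. The two-sided p-value adds the probabilities of all tables that are no more likely than the observed one, which is one common two-sided convention; software may differ. The 2x2 counts are hypothetical.

```python
from math import comb

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability of the table whose top-left cell is x
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    lo, hi = max(0, col1 - row2), min(row1, col1)   # feasible top-left counts
    p_obs = prob(a)
    # Small tolerance so tables tied with the observed one are included
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-7))

# Hypothetical table: 3 of 4 "yes" in group 1, 1 of 4 "yes" in group 2
p = fisher_exact_2x2(3, 1, 1, 3)
print(round(p, 4))
```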
Dependent Samples:
• Means:
• Proportions:
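The slide lists these two cases without formulas. A common approach, assumed here, works from the pairs themselves. For dependent means, analyze the within-pair differences as a single sample (t = mean difference / (s_d / √n), df = n - 1). For dependent proportions, use a McNemar-type z based only on the discordant pairs. This is a minimal Python sketch with hypothetical data and illustrative function names.

```python
import math
import statistics

def paired_t(before, after):
    """Dependent means: treat the within-pair differences as one sample."""
    diffs = [y2 - y1 for y1, y2 in zip(before, after)]
    n = len(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    return statistics.mean(diffs) / se, n - 1

def mcnemar_z(n12, n21):
    """Dependent proportions: only the discordant pairs matter.
    n12 = pairs 'yes' then 'no'; n21 = pairs 'no' then 'yes'."""
    return (n12 - n21) / math.sqrt(n12 + n21)

# Hypothetical data, for illustration only
t, df = paired_t([10, 12, 9, 14, 11], [12, 13, 11, 15, 12])
z = mcnemar_z(25, 10)
print(round(t, 3), df, round(z, 3))
```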
Chapter 8: Analyzing associations
• Contingency tables and their terminology:
  • marginal distributions and joint distributions
  • conditional distribution of R, given a value of E (as counts or percentages in A&F)
  • marginal, joint, and conditional probabilities (as proportions in A&F)
• "Are two variables statistically independent?"
Descriptive statistics you need to know
• How to draw and interpret contingency tables (crosstabs)
• Frequency and probability/percentage terms:
  • marginal
  • conditional
  • joint
• Measures of relationships:
  • odds, odds ratios
  • gamma and tau-b
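Joint, marginal, and conditional proportions all come from the same crosstab and differ only in what you divide by. The minimal Python sketch below uses a hypothetical table with rows for the explanatory variable E and columns for the response R. Joint proportions divide by N. Marginal proportions divide a row or column total by N. Conditional proportions divide by the row total.

```python
# Hypothetical crosstab (counts): rows = values of E, columns = values of R
table = {
    "men":   {"yes": 30, "no": 70},
    "women": {"yes": 45, "no": 55},
}
rows = list(table)
cols = ["yes", "no"]
n = sum(sum(r.values()) for r in table.values())

row_totals = {r: sum(table[r].values()) for r in rows}
col_totals = {c: sum(table[r][c] for r in rows) for c in cols}

# P(E = r and R = c)
joint = {(r, c): table[r][c] / n for r in rows for c in cols}
# P(R = c)
marginal_R = {c: col_totals[c] / n for c in cols}
# P(R = c | E = r)
conditional = {(r, c): table[r][c] / row_totals[r] for r in rows for c in cols}

print(joint[("women", "yes")], marginal_R["yes"], conditional[("women", "yes")])
```

If E and R were statistically independent, the conditional distributions of R would be the same in every row. Here 30% of men and 45% of women say "yes".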
Observed and expected cell counts
• fo, the observed cell count, is the number of cases in a given cell.
• fe, the expected cell count, is the number of cases we would predict in a cell if the variables were independent of each other.
• fe = row total * column total / N
• The equation for fe adjusts for the row and column totals, so a cell in a row or column with a small total gets a small expected count.
Chi-squared test of independence
• Assumptions: 2 categorical variables, random sampling, fe >= 5 in each cell
• Ho: the variables are statistically independent (crudely, the score on one variable is unrelated to the score on the other).
• Test statistic: $\chi^2 = \sum \frac{(f_o - f_e)^2}{f_e}$, summed over all cells
• p-value from the $\chi^2$ table, df = (r-1)(c-1)
• Conclusion: reject or do not reject Ho based on the p-value and, if you declared one in advance, the α-level. Then describe what your conclusion means.
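The expected counts and the chi-squared test above can be computed together. This is a minimal Python sketch for a hypothetical 2x3 table. To stay within the standard library, it computes the p-value from the closed-form chi-squared upper tail, which holds only for even df. That restriction is an assumption of this sketch, and for odd df you would use a χ² table or a statistics package.

```python
import math

def chi_squared_independence(observed):
    """Pearson chi-squared test of independence for an r x c table of counts.
    Returns expected counts, the statistic, df, and a p-value (even df only)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # fe = row total * column total / N  (check that every fe >= 5)
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    stat = sum((fo - fe) ** 2 / fe
               for obs_row, exp_row in zip(observed, expected)
               for fo, fe in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    if df % 2:
        raise ValueError("this sketch only computes p-values for even df")
    # Upper tail of chi-squared with even df: exp(-x/2) * sum_{k < df/2} (x/2)^k / k!
    half = stat / 2
    p = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))
    return expected, stat, df, p

# Hypothetical 2 x 3 table of counts
observed = [[20, 30, 50],
            [30, 30, 40]]
expected, stat, df, p = chi_squared_independence(observed)
print(expected, round(stat, 3), df, round(p, 3))
```

With p well above .05, this hypothetical table gives no reason to reject independence.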
Probabilities, odds, and odds ratios
• Given a probability, you can calculate the odds and the log odds.
  • odds = p / (1-p)
    • 50/50 = 1.0
    • range: 0 to ∞
  • log odds = log(p / (1-p)) = log(p) - log(1-p)
    • 50/50 = 0.0
    • range: -∞ to +∞
  • odds ratio = [ p1 / (1-p1) ] / [ p2 / (1-p2) ]
• Given the odds, you can calculate a probability: p = odds / (1 + odds)
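The conversions above are one-line functions. This minimal Python sketch uses natural logarithms for the log odds and hypothetical probabilities of .8 and .5.

```python
import math

def odds(p):
    return p / (1 - p)              # 0 to infinity; 1.0 when p = .5

def log_odds(p):
    return math.log(p / (1 - p))    # -infinity to +infinity; 0 when p = .5

def odds_ratio(p1, p2):
    return odds(p1) / odds(p2)

def prob_from_odds(o):
    return o / (1 + o)

print(odds(0.5), log_odds(0.5), round(odds(0.8), 6),
      round(odds_ratio(0.8, 0.5), 6), round(prob_from_odds(4.0), 6))
```

A probability of .8 corresponds to odds of 4 (4 to 1), and converting odds of 4 back gives .8.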
Measures of association with ordinal data
• concordant observations C:
  • in a pair, one is higher on both x and y
• discordant observations D:
  • in a pair, one is higher on x and lower on y
• ties:
  • in a pair, same on x or same on y
• gamma (ignores ties): γ = (C - D) / (C + D)
• tau-b is a gamma that adjusts for ties
• gamma often increases when tables are collapsed into fewer categories
• τb and γ both have standard errors in computer output
• τb can be interpreted as a correlation coefficient
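Gamma and tau-b can both be computed by classifying every pair of observations. This minimal Python sketch uses hypothetical ordinal scores. For tau-b it uses the standard form (C - D) / √[(pairs not tied on x)(pairs not tied on y)], so ties shrink the denominator's advantage and tau-b comes out smaller than gamma.

```python
import math
from itertools import combinations

def concordance_counts(x, y):
    """Count concordant, discordant, and tied pairs of observations."""
    c = d = ties_x_only = ties_y_only = ties_both = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x2 - x1, y2 - y1
        if dx == 0 and dy == 0:
            ties_both += 1
        elif dx == 0:
            ties_x_only += 1
        elif dy == 0:
            ties_y_only += 1
        elif (dx > 0) == (dy > 0):
            c += 1      # one observation is higher on both x and y
        else:
            d += 1      # higher on x but lower on y
    return c, d, ties_x_only, ties_y_only, ties_both

def gamma(x, y):
    c, d, *_ = concordance_counts(x, y)
    return (c - d) / (c + d)              # ties are ignored

def tau_b(x, y):
    c, d, tx, ty, _ = concordance_counts(x, y)
    # pairs not tied on x: c + d + ty; pairs not tied on y: c + d + tx
    return (c - d) / math.sqrt((c + d + ty) * (c + d + tx))

# Hypothetical ordinal scores (1 = low, 2 = medium, 3 = high)
x = [1, 1, 2, 2, 3, 3]
y = [1, 2, 1, 3, 2, 3]
print(concordance_counts(x, y), round(gamma(x, y), 3), round(tau_b(x, y), 3))
```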