Sociology 601: Midterm review, October 15, 2009 • Basic information for the midterm • Date: Tuesday October 20, 2009 • Start time: 2 pm. • Place: usual classroom, Art/Sociology 3221 • Bring a sheet of notes, a calculator, two pens or pencils • Notify me if you anticipate any timing problems • Review for midterm • terms • symbols • steps in a significance test • testing differences in groups • contingency tables and measures of association • equations
Important terms from chapter 1 Terms for statistical inference: • population • sample • parameter • statistic Key idea: You use a sample to make inferences about a population
Important terms from chapter 2 2.1) Measurement: • variable • interval scale • ordinal scale • nominal scale • discrete variable • continuous variable 2.2-2.4) Sampling: • simple random sample • probability sampling • stratified sampling • cluster sampling • multistage sampling • sampling error Key idea: Statistical inferences depend on measurement and sampling.
Important terms from chapter 3 3.1) Tabular and graphic description • frequency distribution • relative frequency distribution • histogram • bar graph 3.2-3.4) Measures of central tendency and variation • mean • median • mode • proportion • standard deviation • variance • interquartile range • quartile, quintile, percentile
Important terms from chapter 3 Key ideas: 1.) Statistical inferences are often made about a measure of central tendency. 2.) Measures of variation help us estimate certainty about an inference.
Important terms from Chapter 4 • probability distribution • sampling distribution • sample distribution • normal distribution • standard error • central limit theorem • z-score Key ideas: 1.) If we know what the population is like, we can predict what a sample might be like. 2.) A sample statistic gives us a best guess of the population parameter. 3.) If we work carefully, a sample can also tell us how confident to be about our sample statistic.
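As a quick sketch of how the standard error and z-score tie these terms together (standard large-sample formulas for a mean, not tied to any particular problem):

\sigma_{\bar{y}} = \frac{\sigma}{\sqrt{n}}, \qquad z = \frac{\bar{y} - \mu}{\sigma_{\bar{y}}}

The central limit theorem is what lets us treat the sampling distribution of the sample mean as approximately normal when n is large.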
Important terms from chapter 5 • point estimator • estimate • unbiased • efficient • confidence interval Key ideas: 1.) We have a standard set of equations we use to make estimates. 2.) These equations are used because they have specific desirable properties. 3.) A confidence interval provides your best guess of a parameter. 4.) A confidence interval provides your best guess of how close your best guess (in part 3.)) will typically be to the parameter.
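For example, the familiar large-sample confidence interval for a mean combines the point estimate with a multiple of its standard error (standard textbook form; z = 1.96 gives 95% confidence):

\bar{y} \pm z \, \frac{s}{\sqrt{n}}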
Important terms from chapter 6 6.1 – 6.3) Statistical inference: Significance tests • assumptions • hypothesis • test statistic • p-value • conclusion • null hypothesis • one-sided test • two-sided test • z-statistic
Key Idea from chapter 6 A significance test is a ritualized way to ask about a population parameter. 1.) Clearly state assumptions 2.) Hypothesize a value for a population parameter 3.) Calculate a sample statistic. 4.) Estimate how unlikely it is for the hypothesized population to produce such a sample statistic. 5.) Decide whether the hypothesis can be thrown out.
More important terms from chapter 6 6.4, 6.7) Decisions and types of errors in hypothesis tests • type I error • type II error • power 6.5-6.6) Small sample tests • t-statistic • binomial distribution • binomial test Key ideas: 1.) Modeling decisions and population characteristics can affect the probability of a mistaken inference. 2.) Small sample tests have the same principles as large sample tests, but require different assumptions and techniques.
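For the small-sample binomial test, Stata's immediate command bitesti takes the number of trials, the number of successes, and the null proportion; the numbers below are made up for illustration:

. * exact binomial test: 6 successes in 20 trials, Ho: pi = .5 (illustrative numbers)
. bitesti 20 6 .5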
Significance tests, Step 1: assumptions • An assumption that the sample was drawn at random. • this is pretty much a universal assumption for all significance tests. • An assumption about whether the variable has two outcome categories (a proportion) or many intervals (a mean). • An assumption that enables us to assume a normal sampling distribution. This assumption varies from test to test. • Some tests assume a normal population distribution. • Other tests assume different minimum sample sizes. • Some tests do not make this assumption. • Declare your α level at the start, if you use one.
Significance Tests, Step 2: Hypothesis • State the hypothesis as a null hypothesis. • Remember that the null hypothesis is about the population from which you draw your sample. • Write the equation for the null hypothesis. • The null hypothesis can imply a one- or two-sided test. • Be sure the statement and equation are consistent.
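For instance, for the one-sample mean problem used in the Stata output later in this review (A&F 6.8, with mu0 = 500), a two-sided hypothesis step would read:

H_0: \mu = 500, \qquad H_a: \mu \neq 500 \text{ (two-sided)}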
Significance Tests, Step 3: Test statistic For the test statistic, write: • the equation, • your work, and • the answer. • Full disclosure maximizes partial credit. • I recommend four significant digits at each computational step, but present three as the answer.
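A worked sketch using the A&F 6.8 numbers that appear in the Stata output below (n = 100, ȳ = 508, s = 100, μ0 = 500):

se = \frac{s}{\sqrt{n}} = \frac{100}{\sqrt{100}} = 10, \qquad t = \frac{\bar{y} - \mu_0}{se} = \frac{508 - 500}{10} = 0.800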
Significance tests, Step 4: p-value Calculate an appropriate p-value for the test-statistic. • Use the correct table for the type of test; • Use the correct degrees of freedom if applicable; • Use a correct p-value for a one- or two-sided test, as you declared in the hypothesis step.
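If you want to double-check a table lookup, Stata's distribution functions will reproduce it. For the t = 0.80, df = 99 example in the output below, the two-sided p-value matches the P > |t| = 0.4256 line:

. * two-sided p-value for t = 0.80 with df = 99
. display 2*ttail(99, 0.80)
. * for a large-sample z statistic, use the standard normal instead:
. display 2*(1 - normal(abs(0.80)))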
Significance Tests, Step 5: Conclusion Write a conclusion: • state the p-value and your decision to reject H0 or not; • state what your decision means in words; • discuss the substantive importance of your sample statistic.
Useful STATA outputs • immediate test for sample mean using TTESTI:

. * for example, in A&F problem 6.8, n=100 Ybar=508 sd=100 and mu0=500
. ttesti 100 508 100 500, level(95)

One-sample t test
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |     100         508          10         100    488.1578    527.8422
------------------------------------------------------------------------------
Degrees of freedom: 99

                         Ho: mean(x) = 500

  Ha: mean < 500          Ha: mean != 500            Ha: mean > 500
    t =   0.8000             t =   0.8000              t =   0.8000
  P < t =  0.7872         P > |t| =  0.4256          P > t =  0.2128
Useful STATA outputs • immediate test for sample proportion using PRTESTI:

. * for proportion: in A&F problem 6.12, n=832 p=.53 and p0=.5
. prtesti 832 .53 .50, level(95)

One-sample test of proportion                      x: Number of obs =      832
------------------------------------------------------------------------------
    Variable |       Mean   Std. Err.                     [95% Conf. Interval]
-------------+----------------------------------------------------------------
           x |        .53   .0173032                      .4960864    .5639136
------------------------------------------------------------------------------
                         Ho: proportion(x) = .5

  Ha: x < .5              Ha: x != .5                Ha: x > .5
    z =  1.731              z =  1.731                 z =  1.731
  P < z = 0.9582          P > |z| = 0.0835           P > z = 0.0418
Useful STATA outputs • comparison of two means using ttesti:

. ttesti 4252 18.1 12.9 6764 32.6 18.2, unequal

Two-sample t test with unequal variances
------------------------------------------------------------------------------
         |     Obs        Mean    Std. Err.   Std. Dev.   [95% Conf. Interval]
---------+--------------------------------------------------------------------
       x |    4252        18.1    .1978304        12.9    17.71215    18.48785
       y |    6764        32.6     .221294        18.2    32.16619    33.03381
---------+--------------------------------------------------------------------
combined |   11016    27.00323    .1697512     17.8166    26.67049    27.33597
---------+--------------------------------------------------------------------
    diff |               -14.5    .2968297               -15.08184   -13.91816
------------------------------------------------------------------------------
Satterthwaite's degrees of freedom: 10858.6

                 Ho: mean(x) - mean(y) = diff = 0

  Ha: diff < 0            Ha: diff != 0              Ha: diff > 0
    t = -48.8496            t = -48.8496               t = -48.8496
  P < t =  0.0000         P > |t| =  0.0000          P > t =  1.0000
Two Independent Groups: Large Samples, Means • It is important to be able to recognize the parts of the equation, what they mean, and why they are used. • Equal variance assumption? NO
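As a reminder, the standard large-sample test statistic for a difference of two means (no equal-variance assumption; a sketch of the usual textbook form):

z = \frac{\bar{y}_1 - \bar{y}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}

The denominator is the estimated standard error of the difference; being able to name that piece is the point about recognizing the parts of the equation.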
Two Independent Groups: Large Samples, Proportions • Equal variance assumption? YES (if proportions are equal then so are variances). • df = N1 + N2 - 2
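A sketch of the usual pooled form (the pooled proportion is used precisely because equal proportions under H0 imply equal variances):

\hat{\pi} = \frac{n_1 \hat{\pi}_1 + n_2 \hat{\pi}_2}{n_1 + n_2}, \qquad z = \frac{\hat{\pi}_1 - \hat{\pi}_2}{\sqrt{\hat{\pi}(1 - \hat{\pi})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}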
Two Independent Groups: Small Samples, Means • 7.3) Difference of two small-sample means • Equal variance assumption? SOMETIMES (for ease), NO (in computer programs)
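When the equal-variance assumption is made, the pooled-variance t statistic is (a sketch of the usual textbook form):

s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad t = \frac{\bar{y}_1 - \bar{y}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \qquad df = n_1 + n_2 - 2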
Two Independent Groups: Small Samples, Proportions Fisher’s exact test • via Stata, SAS, or SPSS • calculates the exact probability of all possible tables with the observed margins (no large-sample approximation)
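In Stata, the immediate command tabi runs Fisher's exact test on a table typed in by row; the cell counts below are made up for illustration:

. * 2x2 table entered row by row; the exact option requests Fisher's exact test
. tabi 5 1 \ 2 8, exact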
Dependent Samples: • Means: • Proportions:
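A sketch of the standard dependent-samples forms (means: a one-sample t test on the within-pair differences d; proportions: a z test based on the discordant pairs, where n12 and n21 count the pairs that switch categories in each direction):

t = \frac{\bar{d}}{s_d / \sqrt{n}} \ (df = n - 1), \qquad z = \frac{n_{12} - n_{21}}{\sqrt{n_{12} + n_{21}}}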
Chapter 8: Analyzing associations • Contingency tables and their terminology: • marginal distributions and joint distributions • conditional distribution of R, given a value of E (as counts or percentages in A & F) • marginal, joint, and conditional probabilities (as proportions in A & F) • “Are two variables statistically independent?”
Descriptive statistics you need to know • How to draw and interpret contingency tables (crosstabs) • Frequency and probability/ percentage terms • marginal • conditional • joint • Measures of relationships: • odds, odds ratios • gamma and tau-b
Observed and expected cell counts • fo, the observed cell count, is the number of cases in a given cell. • fe, the expected cell count, is the number of cases we would predict in a cell if the variables were independent of each other. • fe = row total * column total / N • the equation for fe scales the expected count to the row and column totals, so cells in rows or columns with small totals get proportionally smaller expected counts.
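A quick made-up example: for a cell whose row total is 40 and column total is 30 in a table with N = 200,

f_e = \frac{40 \times 30}{200} = 6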
Chi-squared test of independence • Assumptions: 2 categorical variables, random sampling, fe >= 5 • Ho: variables are statistically independent (crudely, the score for one variable is independent of the score for the other.) • Test statistic: χ² = Σ (fo – fe)² / fe • p-value from the χ² table, df = (r-1)(c-1) • Conclusion: reject or do not reject based on the p-value and prior α-level, if necessary. Then, describe your conclusion.
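In Stata, tabi with the chi2 option runs the test directly on a typed-in table; the counts below are made up for illustration:

. * chi-squared test of independence for a 2x2 table entered row by row
. tabi 30 20 \ 10 40, chi2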
Probabilities, odds, and odds ratios. • Given a probability, you can calculate an odds and a log odds. • odds = p / (1-p); for a 50/50 probability the odds = 1.0; odds range from 0 to +∞ • log odds = log (p / (1-p)) = log(p) – log(1-p); for a 50/50 probability the log odds = 0.0; log odds range from –∞ to +∞ • odds ratio = [ p1 / (1-p1) ] / [ p2 / (1-p2) ] • Given an odds, you can calculate a probability: p = odds / (1 + odds)
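A quick worked example with an illustrative probability of .75:

\text{odds} = \frac{.75}{1 - .75} = 3.0, \qquad \log \text{odds} = \log(3.0) \approx 1.10, \qquad p = \frac{3.0}{1 + 3.0} = .75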
Measures of association with ordinal data • concordant observations C: • in a pair, one observation is higher on both x and y • discordant observations D: • in a pair, one observation is higher on x and lower on y • ties • in a pair, same on x or same on y • gamma (ignores ties) • tau-b is a gamma that adjusts for “ties” • gamma often increases with more collapsed tables • gamma and tau-b both have standard errors in computer output • tau-b can be interpreted as a correlation coefficient
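Gamma's formula follows directly from the concordant and discordant counts defined above:

\hat{\gamma} = \frac{C - D}{C + D}

In Stata, both measures come from a crosstab, e.g. tabulate x y, gamma taub (the variable names here are placeholders).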