Sociology 601 Class 23: November 17, 2009
• Homework #8
• Review
  • spurious, intervening, & interaction effects
  • Stata regression commands & output
• F-tests and inferences (A&F 11.4)
Review: Types of 3-variable Causal Models
• Spurious
  • x2 causes both x1 and y
  • e.g., age causes both marital status and earnings
• Intervening
  • x1 causes x2, which causes y
  • e.g., marital status causes more hours worked, which raises annual earnings
• No statistical difference between these models: in both, controlling for x2 reduces the x1 coefficient, so the choice between them rests on causal ordering, not on the regression output.
• Statistical interaction effects: the relationship between x1 and y depends on the value of another variable, x2
  • e.g., the relationship between marital status and earnings is different for men and women.
Review: Causal Models with earnings & marital status
• bivariate relationship:
  1. married → earnings
• spuriousness:
  2. married ← age → earnings
• intervening:
  3. married → hours → earnings
• interaction effect:
  4. married → earnings, with gender changing the strength of that relationship
Review: Stata Commands
• describe
• summarize
• tab
• tab xcat, sum(yvar)
• drop if / keep if
• gen / replace
• ttest
• regress
• predict / predict, residuals
• histogram / scatter (scattergram)
• graph box yvar, over(xvar)
Review: Regression models using Stata
• see: http://www.bsos.umd.edu/socy/vanneman/socy601/conrinc.do
Review: Regression models with Earnings, Marital status and Age
• bivariate relationship:

. * association of earnings and marital status:
. regress conrinc married

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  1,   723) =   31.29
       Model |  1.9321e+10     1  1.9321e+10           Prob > F      =  0.0000
    Residual |  4.4645e+11   723   617501240           R-squared     =  0.0415
-------------+------------------------------           Adj R-squared =  0.0402
       Total |  4.6577e+11   724   643334846           Root MSE      =   24850

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |    10383.4   1856.279     5.59   0.000     6739.057    14027.74
       _cons |   35065.27   1380.532    25.40   0.000     32354.94     37775.6
------------------------------------------------------------------------------

• spuriousness (partial):

. * age makes the marriage-earnings relationship partly spurious:
. regress conrinc married age

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  2,   722) =   36.20
       Model |  4.2454e+10     2  2.1227e+10           Prob > F      =  0.0000
    Residual |  4.2332e+11   722   586315863           R-squared     =  0.0911
-------------+------------------------------           Adj R-squared =  0.0886
       Total |  4.6577e+11   724   643334846           Root MSE      =   24214

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   8243.081   1840.613     4.48   0.000     4629.489    11856.67
         age |   702.0977   111.7749     6.28   0.000     482.6551    921.5403
       _cons |   8836.284   4387.025     2.01   0.044     223.4344    17449.13
------------------------------------------------------------------------------
Review: Regression models with Earnings, Marital status and Hours Worked
• Intervening variable relationship (hours worked):

. * hours worked explains some of how marital status increases earnings:
. regress conrinc married age hrs1

      Source |       SS       df       MS              Number of obs =     664
-------------+------------------------------           F(  3,   660) =   25.02
       Model |  4.4322e+10     3  1.4774e+10           Prob > F      =  0.0000
    Residual |  3.8970e+11   660   590458672           R-squared     =  0.1021
-------------+------------------------------           Adj R-squared =  0.0980
       Total |  4.3402e+11   663   654637868           Root MSE      =   24299

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   7328.527   1934.225     3.79   0.000     3530.551     11126.5
         age |   631.5836   117.8463     5.36   0.000     400.1848    862.9824
        hrs1 |   281.3472   71.47315     3.94   0.000     141.0051    421.6894
       _cons |  -232.1376   5465.426    -0.04   0.966    -10963.86    10499.58
------------------------------------------------------------------------------

• But: problem with N! (664 cases instead of 725, because cases with missing hrs1 are dropped from the regression)
• Create new hours worked:

. gen hrs=hrs1
(101 missing values generated)
. replace hrs=hrs2 if hrs1>=.
(24 real changes made, 2 to missing)
. replace hrs=0 if hrs1>=. & wrkstat>=3
(101 real changes made)
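The three recoding commands rely on Stata's convention that a missing value sorts above every number, so `hrs1>=.` reads as "hrs1 is missing". A minimal Python sketch of the same row-by-row logic follows, assuming missing values are represented as `None`; the names `recode_hours` and `stata_ge` are hypothetical and not part of the course do-file.

```python
from typing import Optional


def stata_ge(value: Optional[float], threshold: float) -> bool:
    """Mimic Stata's `value >= threshold`: a missing value counts as larger than any number."""
    return value is None or value >= threshold


def recode_hours(hrs1: Optional[float],
                 hrs2: Optional[float],
                 wrkstat: Optional[float]) -> Optional[float]:
    """Apply the three commands from the slide to one respondent."""
    # gen hrs = hrs1
    hrs = hrs1
    # replace hrs = hrs2 if hrs1 >= .
    if hrs1 is None:
        hrs = hrs2
    # replace hrs = 0 if hrs1 >= . & wrkstat >= 3
    # (the condition tests hrs1, not the new hrs variable)
    if hrs1 is None and stata_ge(wrkstat, 3):
        hrs = 0
    return hrs


print(recode_hours(40, None, 1))    # hrs1 present: kept as is
print(recode_hours(None, 35, 2))    # hrs1 missing: filled from hrs2
print(recode_hours(None, None, 5))  # hrs1 missing, wrkstat >= 3: set to 0
```

Because the third command tests hrs1 rather than hrs, a respondent with hrs1 missing and wrkstat of 3 or higher ends at 0 in this sketch even when hrs2 had a value.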
Review: Regression models with Earnings, Marital status and Hours Worked
• Intervening variable relationship (revised hours worked):

. regress conrinc married age hrs

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  3,   721) =   36.27
       Model |  6.1081e+10     3  2.0360e+10           Prob > F      =  0.0000
    Residual |  4.0469e+11   721   561294582           R-squared     =  0.1311
-------------+------------------------------           Adj R-squared =  0.1275
       Total |  4.6577e+11   724   643334846           Root MSE      =   23692

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   7465.107   1805.967     4.13   0.000     3919.526    11010.69
         age |   640.1643    109.891     5.83   0.000     424.4197    855.9089
         hrs |   278.3368   48.31685     5.76   0.000     183.4783    373.1954
       _cons |  -493.7634    4587.79    -0.11   0.914    -9500.786    8513.259
------------------------------------------------------------------------------

• b(married) reduced to 7465.1 from 8243.1 (N = 725 for both)
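To see how much of the marriage coefficient each control accounts for, the three b(married) values reported so far can be compared directly. The sketch below only redoes that arithmetic in Python; the helper name `pct_reduction` is hypothetical, and the percentages are descriptive summaries, not a formal test.

```python
def pct_reduction(before: float, after: float) -> float:
    """Percent by which a coefficient shrinks when a control variable is added."""
    return 100.0 * (before - after) / before


# b(married) copied from the three regressions (N = 725 in each).
b_bivariate = 10383.4      # regress conrinc married
b_with_age = 8243.081      # regress conrinc married age
b_with_age_hrs = 7465.107  # regress conrinc married age hrs

# Share of the bivariate coefficient removed by controlling for age.
age_share = pct_reduction(b_bivariate, b_with_age)
# Share of the age-adjusted coefficient removed by also controlling for hours.
hrs_share = pct_reduction(b_with_age, b_with_age_hrs)

print(f"controlling for age:   {age_share:.1f}% smaller")
print(f"controlling for hours: {hrs_share:.1f}% smaller")
```

On these numbers age accounts for roughly a fifth of the bivariate coefficient and hours worked for a further tenth or so of what remains.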
Review: Regression models with Earnings, Marital status, Age, and Hours worked
Review: Regression models with Earnings and Marital status, separately by Gender
• Statistical Interaction Effect:

. * association of earnings and marital status for men:
. regress conrinc married if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  1,   723) =   31.29
       Model |  1.9321e+10     1  1.9321e+10           Prob > F      =  0.0000
    Residual |  4.4645e+11   723   617501240           R-squared     =  0.0415
-------------+------------------------------           Adj R-squared =  0.0402
       Total |  4.6577e+11   724   643334846           Root MSE      =   24850

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |    10383.4   1856.279     5.59   0.000     6739.057    14027.74
       _cons |   35065.27   1380.532    25.40   0.000     32354.94     37775.6
------------------------------------------------------------------------------

. * association of earnings and marital status for women:
. regress conrinc married if sex==2

      Source |       SS       df       MS              Number of obs =     749
-------------+------------------------------           F(  1,   747) =    0.26
       Model |   106732224     1   106732224           Prob > F      =  0.6129
    Residual |  3.1118e+11   747   416578779           R-squared     =  0.0003
-------------+------------------------------           Adj R-squared = -0.0010
       Total |  3.1129e+11   748   416164546           Root MSE      =   20410

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   755.3387   1492.253     0.51   0.613     -2174.17    3684.848
       _cons |      26201   1038.855    25.22   0.000     24161.57    28240.42
------------------------------------------------------------------------------
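The two outputs show very different marriage coefficients for men and women. One common informal check, not shown in the slides, is an approximate z statistic for the difference between two coefficients estimated on independent samples. The sketch below assumes that independence and uses only numbers copied from the outputs; a single model with an interaction term would be the more usual formal test.

```python
import math

# b(married) and its standard error, copied from the two regressions.
b_men, se_men = 10383.4, 1856.279      # regress conrinc married if sex==1
b_women, se_women = 755.3387, 1492.253  # regress conrinc married if sex==2

# Difference between the two marriage coefficients.
diff = b_men - b_women

# Standard error of the difference, assuming the two samples are independent.
se_diff = math.sqrt(se_men ** 2 + se_women ** 2)

# Approximate z statistic for H0: the two coefficients are equal.
z = diff / se_diff

print(f"difference = {diff:.1f}, se = {se_diff:.1f}, z = {z:.2f}")
```

The z value comes out near 4, well beyond 1.96, which is consistent with reading the gender difference as a statistical interaction.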
Inferences: F-tests of global model
• H0: β1 = β2 = ... = βk = 0
  • α or β0? (the intercept is not included in H0)
• F-tests of H0:
  • Calculate new test statistic, F
  • ratio of “explained variance” / “unexplained variance”
  • F-distribution: ratio of two independent chi-square variables, each divided by its df
  • df1 (numerator); df2 (denominator)
  • if df1 = 1, then F = t²
  • Table D, pages 671-673
• Global F-test less useful (almost always significant unless you have a really bad model or very small N).
• Base for F-test comparing regression models (later)
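The claim that F = t² when df1 = 1 can be checked against the bivariate regression of conrinc on married reported earlier, which has a single predictor. The sketch below recomputes t from the printed coefficient and standard error; small gaps from the printed values are rounding in the Stata output.

```python
# Bivariate model: regress conrinc married (one predictor, so df1 = 1).
coef_married = 10383.4
se_married = 1856.279
f_reported = 31.29  # F(1, 723) from the Stata header

# t statistic for the single slope, then its square.
t = coef_married / se_married
t_squared = t ** 2

print(f"t = {t:.2f}, t squared = {t_squared:.2f}, reported F = {f_reported}")
```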
F-test: Method 1, Stata output

. regress conrinc married age hrs

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  3,   721) =   36.27
       Model |  6.1081e+10     3  2.0360e+10           Prob > F      =  0.0000
    Residual |  4.0469e+11   721   561294582           R-squared     =  0.1311
-------------+------------------------------           Adj R-squared =  0.1275
       Total |  4.6577e+11   724   643334846           Root MSE      =   23692

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     married |   7465.107   1805.967     4.13   0.000     3919.526    11010.69
         age |   640.1643    109.891     5.83   0.000     424.4197    855.9089
         hrs |   278.3368   48.31685     5.76   0.000     183.4783    373.1954
       _cons |  -493.7634    4587.79    -0.11   0.914    -9500.786    8513.259
------------------------------------------------------------------------------

• df1 = 3 (= k = number of slope parameters: β(married), β(age), β(hrs))
• df2 = 721 [= N - (k+1) = 725 - (3+1)]
• critical value F(3,721) = 2.60 (α = .05); observed F = 36.27 >> 2.60, so reject H0
F-test: Method 3, using SSE and Model SS
• F = (Model SS / df1) / (SSE / df2) = (6.1081e+10 / 3) / (4.0469e+11 / 721) = 2.0360e+10 / 561294582 = 36.27
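The calculation from sums of squares can be reproduced in a few lines. The sketch below uses the rounded values printed by Stata, so the result matches 36.27 only to rounding. The second function uses the standard identity F = (R²/k) / ((1 - R²)/(N - (k+1))), which is textbook regression algebra and an assumption here, since it does not appear in these slides. Both function names are hypothetical.

```python
def f_from_sums_of_squares(model_ss: float, residual_ss: float, k: int, n: int):
    """Global F from Model SS and SSE, with df1 = k and df2 = n - (k + 1)."""
    df1 = k
    df2 = n - (k + 1)
    f_stat = (model_ss / df1) / (residual_ss / df2)
    return f_stat, df1, df2


def f_from_r_squared(r2: float, k: int, n: int) -> float:
    """The same F written in terms of R-squared (standard identity, not from the slides)."""
    return (r2 / k) / ((1.0 - r2) / (n - (k + 1)))


# Values from: regress conrinc married age hrs
model_ss = 6.1081e10
residual_ss = 4.0469e11
n, k = 725, 3

f_ss, df1, df2 = f_from_sums_of_squares(model_ss, residual_ss, k, n)
r2 = model_ss / (model_ss + residual_ss)
f_r2 = f_from_r_squared(r2, k, n)

print(f"df1 = {df1}, df2 = {df2}")
print(f"F from SS = {f_ss:.2f}, R-squared = {r2:.4f}, F from R-squared = {f_r2:.2f}")
```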
Inferences: βi
• H0: βi = 0
• what we are usually most interested in
• test statistic: t = bi / se(bi), with df = N - (k+1)
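The t column in the Stata output is each coefficient divided by its standard error, and the confidence interval is the coefficient plus or minus a critical t value times that standard error. The sketch below checks both for b(married) in the model with age and hrs. Rather than assuming a critical value, it backs out the one implied by the printed interval.

```python
# b(married) from: regress conrinc married age hrs  (df = 725 - 4 = 721)
coef, se = 7465.107, 1805.967
ci_low, ci_high = 3919.526, 11010.69

# Test statistic for H0: beta(married) = 0.
t = coef / se

# Half-width of the printed 95% interval, and the critical t value it implies.
half_width = (ci_high - ci_low) / 2
implied_crit = half_width / se

print(f"t = {t:.2f}")
print(f"implied critical value = {implied_crit:.3f}")
```

With 721 degrees of freedom the implied critical value sits just above the normal-curve value of 1.96, as expected for a large sample.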
Next: Regression with Dummy Variables
• Agresti and Finlay 12.3
  • (skim 12.1-12.2 on analysis of variance)
• Example: marital status, 3 categories
  • currently married
  • never married
  • widowed
  • separated
  • divorced