Sociology 601 Class 25: November 24, 2009. Homework 9 Review dummy variable example from ASR (finish) regression results for dummy variables Quadratic effects example: earnings and age plotting F-tests comparing models Example from Sociology of Religion.
Sociology 601 Class 25: November 24, 2009 • Homework 9 • Review • dummy variable example from ASR (finish) • regression results for dummy variables • Quadratic effects • example: earnings and age • plotting • F-tests comparing models • Example from Sociology of Religion
Review: Regression with Dummy Variables
Create dummy variables for age: why? Age is an interval variable, so what advantage is there in creating a series of dummies?
gen byte age25=0 if age<.   /* new variable, age25, will be missing if age is missing */
replace age25=1 if age>=25 & age<=29
gen byte age30=0 if age<.
replace age30=1 if age>=30 & age<=34
gen byte age35=0 if age<.
replace age35=1 if age>=35 & age<=39
gen byte age40=0 if age<.
replace age40=1 if age>=40 & age<=44
gen byte age45=0 if age<.
replace age45=1 if age>=45 & age<=49
gen byte age50=0 if age<.
replace age50=1 if age>=50 & age<=54
* check age dummies (agecheck should =1 for all cases)
egen byte agecheck=rowtotal(age25-age50)
tab agecheck, missing
Stata Shortcut for Dummy Variables
gen byte agecat= floor(age/5)*5
tab agecat, gen(age)
* floor function deletes decimal places:
* e.g., at age 23: floor(23/5)*5 = floor(4.6)*5 = 4*5 = 20
* check age dummies (agecheck should =1 for all cases)
egen byte agecheck=rowtotal(age1-age6)
tab agecheck, missing
drop if age<25 | age>54
Regression with Age Dummy Variables
. regress conrinc age2-age6 if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  5,   719) =   12.79
       Model |  3.8044e+10     5  7.6089e+09           Prob > F      =  0.0000
    Residual |  4.2773e+11   719   594895739           R-squared     =  0.0817
-------------+------------------------------           Adj R-squared =  0.0753
       Total |  4.6577e+11   724   643334846           Root MSE      =   24390

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        age2 |   8220.236   3143.413     2.62   0.009     2048.872     14391.6
        age3 |    16495.6   3122.571     5.28   0.000     10365.16    22626.05
        age4 |    17274.8    3112.55     5.55   0.000     11164.03    23385.57
        age5 |   21532.53   3288.812     6.55   0.000      15075.7    27989.35
        age6 |   20013.57   3406.607     5.87   0.000     13325.48    26701.66
       _cons |    26954.2   2325.541    11.59   0.000     22388.54    31519.86
------------------------------------------------------------------------------

Changing the omitted (reference) category from age1 to age6 gives the same R-squared and overall F, but different b’s and t’s (although the same relative order):
. regress conrinc age1-age5 if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  5,   719) =   12.79
       Model |  3.8044e+10     5  7.6089e+09           Prob > F      =  0.0000
    Residual |  4.2773e+11   719   594895739           R-squared     =  0.0817
-------------+------------------------------           Adj R-squared =  0.0753
       Total |  4.6577e+11   724   643334846           Root MSE      =   24390

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        age1 |  -20013.57   3406.607    -5.87   0.000    -26701.66   -13325.48
        age2 |  -11793.33   3266.455    -3.61   0.000    -18206.26   -5380.405
        age3 |  -3517.968   3246.403    -1.08   0.279    -9891.531    2855.595
        age4 |  -2738.771   3236.766    -0.85   0.398    -9093.413    3615.872
        age5 |   1518.956   3406.607     0.45   0.656     -5169.13    8207.043
       _cons |   46967.77   2489.343    18.87   0.000     42080.52    51855.02
------------------------------------------------------------------------------
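As a hedged aside, Stata 11 and later can fit the same two models with factor-variable notation instead of hand-built dummies. This is a minimal sketch, not the method used on the slides: it assumes agecat was created as on the shortcut slide (values 25, 30, ..., 50) and that the sample is already restricted to ages 25-54.

```stata
* Assumes agecat exists (see the shortcut slide) and Stata 11 or later.
* Reference category = youngest group (agecat == 25, the default base level);
* this should reproduce "regress conrinc age2-age6 if sex==1".
regress conrinc i.agecat if sex==1

* Reference category = oldest group (agecat == 50);
* this should reproduce "regress conrinc age1-age5 if sex==1".
regress conrinc ib50.agecat if sex==1
```

The ib50. prefix only changes which category is omitted, so R-squared and the overall F should be identical across the two runs, as on the slide.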
Plot Earnings by Age
. tab age, sum(conrinc)

            |  Summary of respondent income in
     age of |          constant dollars
 respondent |        Mean   Std. Dev.       Freq.
------------+------------------------------------
         25 |   16277.936   10757.323          47
         26 |     22712.5   12540.689          46
         27 |   21188.725   11802.539          40
         28 |   25593.444    18395.24          54
         29 |   27021.244   17314.169          45
         30 |   29687.902   16242.466          61
         31 |   30723.709   21631.857          55
         32 |   30218.871   19739.067          62
         33 |   26096.263   15751.154          57
         34 |    30685.51       20528          51
         35 |   37709.106   26704.259          47
         36 |   29178.255   21877.287          51
         37 |   33702.843    20378.26          70
         38 |   39046.871   30994.531          62
         39 |   40338.326   29449.024          43
         40 |   35442.909   23448.711          55
         41 |   38218.979   31804.641          48
         42 |   34377.678   26582.113          59
         43 |   37867.069   25189.647          58
         44 |   34885.268    23017.34          41
         45 |   35212.378   20559.449          45
         46 |   41641.308   28233.297          39
         47 |    39708.14   29503.584          50
         48 |   41391.807   26493.252          57
         49 |   38324.964   23601.741          55
         50 |   42443.892   29193.688          37
         51 |   37255.357   25395.935          42
         52 |   35165.655   20471.181          29
         53 |   44005.892   30812.439          37
         54 |   36918.065   26556.129          31
------------+------------------------------------
      Total |   33571.775   24047.119        1474
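The table gives the numbers to plot but not a plotting command. One possible way to draw mean earnings by single year of age is sketched below; the variable name meaninc and the axis titles are illustrative choices, not taken from the slides.

```stata
* Sketch only: plot the mean of conrinc at each year of age.
* preserve/restore keeps the respondent-level data intact.
preserve
* one row per age, holding mean income (meaninc is a hypothetical name)
collapse (mean) meaninc=conrinc, by(age)
twoway connected meaninc age, ytitle("Mean income (constant dollars)") xtitle("Age of respondent")
restore
```

A plot like this makes it easier to see whether earnings rise and then level off with age, which is the pattern a quadratic term is meant to capture.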
Regression Test for Curvilinearity
• test whether x has a curvilinear relationship with y:
  • testing for a quadratic relationship is the most common, but not the only, method of testing for curvilinearity.
• $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$
• test whether $\beta_2 \neq 0$
  • if $\beta_2 > 0$, then U-shape curve (or part)
  • if $\beta_2 < 0$, then inverted-U curve (or part)
  • if $\beta_2$ is not significantly different from 0 in either direction, then revert to the linear equation by dropping $x^2$
• $\beta_1$ is rather irrelevant in this test
  • if p > .05 for both $\beta_2$ and $\beta_1$ in the quadratic model, that does not mean there is no linear relationship; re-estimate the linear model and test $\beta_1$ there.
Curvilinear Regression Equation: $\beta_2$
$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$
$\beta_2$ (quadratic coefficient) determines how steeply the curve accelerates; compare:
$y = 2x^2$ ; $y = x^2$ ; $y = 0.5x^2$
Curvilinear Regression Equation: $\beta_2 < 0$
$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$
if $\beta_2$ (quadratic coefficient) < 0, then the curve is an inverted U; compare:
$y = -2x^2$ ; $y = -x^2$ ; $y = -0.5x^2$
Curvilinear Regression Equation: Inflexion Point = Maximum | Minimum
$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$
inflexion point (the turning point of the curve) = value of x when y is a maximum or minimum = $-\beta_1 / (2\beta_2)$
• $y = -20x^2 + 800x + 62000$: inflexion = -800 / (-20 * 2) = 20 (i.e., below observed x values)
• $y = -100x^2 + 8000x - 90000$: inflexion = -8000 / (-100 * 2) = 40 (i.e., within the x range)
• $y = -20x^2 + 2400x + 800$: inflexion = -2400 / (-20 * 2) = 60 (i.e., above observed values)
Curvilinear Regression Equation: Inflexion Point = Maximum | Minimum
$y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + e_i$
for completeness, when $\beta_2$ is positive:
inflexion point = value of x when y is a maximum or minimum = $-\beta_1 / (2\beta_2)$
• $y = 20x^2 - 800x + 50000$: inflexion = -(-800) / (20 * 2) = 20 (i.e., below observed x values)
• $y = 100x^2 - 8000x + 205000$: inflexion = -(-8000) / (100 * 2) = 40 (i.e., within the x range)
• $y = 20x^2 - 2400x + 114000$: inflexion = -(-2400) / (20 * 2) = 60 (i.e., above observed values)
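The formula for the maximum or minimum comes from setting the slope of the quadratic to zero. A short derivation, using the same symbols as the slides and ignoring the error term:

```latex
\begin{aligned}
\hat{y} &= \beta_0 + \beta_1 x + \beta_2 x^2 \\
\frac{d\hat{y}}{dx} &= \beta_1 + 2\beta_2 x = 0 \\
x^{*} &= -\frac{\beta_1}{2\beta_2}
\end{aligned}
```

The second derivative is $2\beta_2$, so $x^{*}$ is a maximum when $\beta_2 < 0$ (inverted U) and a minimum when $\beta_2 > 0$ (U shape).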
Example: Regression with Curvilinear Age
. gen int agesq=age*age
. summarize age agesq

    Variable |       Obs        Mean    Std. Dev.       Min        Max
-------------+--------------------------------------------------------
         age |      1860    38.84355    8.309941         25         54
       agesq |      1860    1577.839     655.309        625       2916

. regress conrinc age agesq if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  2,   722) =   32.08
       Model |  3.8016e+10     2  1.9008e+10           Prob > F      =  0.0000
    Residual |  4.2776e+11   722   592463841           R-squared     =  0.0816
-------------+------------------------------           Adj R-squared =  0.0791
       Total |  4.6577e+11   724   643334846           Root MSE      =   24341

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   4764.733   1134.778     4.20   0.000     2536.875    6992.591
       agesq |  -50.27083   14.30126    -3.52   0.000    -78.34785   -22.19381
       _cons |  -65221.92   21786.08    -2.99   0.003    -107993.6   -22450.29
------------------------------------------------------------------------------

t(agesq) = -3.52; p < .001, so: curvilinear;
b(agesq) is negative, so: inverted U;
inflexion point = -b(age) / (2 * b(agesq)) = -4764.7 / (2 * -50.27) = 47.4, so maximum earnings at about age 47 and a half.
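The hand calculation of the age of maximum earnings can also be done in Stata after the regression. The following is a minimal sketch rather than part of the original slides; the label peak and the axis titles are illustrative.

```stata
* Refit the quadratic model so that _b[] holds its coefficients
regress conrinc age agesq if sex==1

* Age at maximum predicted earnings, -b_age / (2 * b_agesq),
* with a delta-method standard error and confidence interval
nlcom (peak: -_b[age] / (2*_b[agesq]))

* Draw the fitted quadratic over the observed age range (25 to 54)
twoway function y = _b[_cons] + _b[age]*x + _b[agesq]*x^2, range(25 54) xtitle("Age") ytitle("Predicted income")
```

The nlcom estimate should agree with the value computed by hand above, and its confidence interval shows how precisely the turning point is located.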
Cubic Polynomials
• Occasionally (actually, rarely), it is worthwhile to investigate whether a more complex polynomial would better describe the curvilinear relationship.
• Add a cubic term ($x^3$) to the previous quadratic equation:
  • $y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \beta_3 x_i^3 + e_i$
• Test $\beta_3 \neq 0$
  • if you can’t show $\beta_3 \neq 0$, then revert to the quadratic model
  • if the p-value for $\beta_3$ is > .05, then don’t interpret $\beta_2$ and $\beta_1$ from the cubic model
  • if $\beta_3 \neq 0$, then the curve can have two bends (although not necessarily over the range of observed x’s)
Cubic Polynomials: Earnings and Age Example
. regress conrinc age agesq agecu if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  3,   721) =   21.36
       Model |  3.8020e+10     3  1.2673e+10           Prob > F      =  0.0000
    Residual |  4.2775e+11   721   593278929           R-squared     =  0.0816
-------------+------------------------------           Adj R-squared =  0.0778
       Total |  4.6577e+11   724   643334846           Root MSE      =   24357

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   3971.837    8901.06     0.45   0.656    -13503.26    21446.93
       agesq |  -29.64795   230.0667    -0.13   0.897    -481.3286    422.0327
       agecu |  -.1739568   1.936886    -0.09   0.928    -3.976566    3.628653
       _cons |  -55354.68     112007    -0.49   0.621    -275253.4    164544.1
------------------------------------------------------------------------------

• Note: after age cubed is entered, none of the coefficients is statistically significant (even though age and age squared were significant in the quadratic model).
• So, since the coefficient for age cubed is not statistically significant, revert to the quadratic model (DON’T conclude that age has no relationship with earnings!)
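The slides do not show how agecu was created. One plausible way is sketched below, assuming age is restricted to 25-54 as elsewhere in the deck; the storage type matters because age cubed is much larger than age squared.

```stata
* 54^3 = 157,464 exceeds the largest value an int can hold (32,740),
* so store the cubic term as a long rather than an int
gen long agecu = age^3

regress conrinc age agesq agecu if sex==1

* Wald test of the cubic term alone; with one coefficient this F
* is the square of the t statistic shown in the regression table
test agecu
```

Using int was fine for agesq, whose maximum here is 2,916, but the same choice for the cubic term would set the larger values to missing.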
Inferences: F-tests Comparing models
Comparing Regression Models, Agresti & Finlay, p 409:
$$F = \frac{(R_c^2 - R_r^2)/(k - g)}{(1 - R_c^2)/[N - (k + 1)]}, \qquad df_1 = k - g, \quad df_2 = N - (k + 1)$$
Where:
• $R_c^2$ = R-square for complete model,
• $R_r^2$ = R-square for reduced model,
• k = number of explanatory variables in complete model,
• g = number of explanatory variables in reduced model, and
• N = number of cases.
Example: F-tests Comparing models
• Complete model: men’s earnings on
  • age,
  • age squared,
  • age cubed,
  • education, and
  • currently married dummy.
• Reduced model: men’s earnings on
  • education and
  • currently married dummy.
• The F-test comparing the two models tests whether the age variables, as a group, have a significant relationship with earnings after controlling for education and marital status.
Example: F-tests Comparing models
• Complete model: men’s earnings
. regress conrinc age agesq agecu educ married if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  5,   719) =   45.08
       Model |  1.1116e+11     5  2.2233e+10           Prob > F      =  0.0000
    Residual |  3.5461e+11   719   493199914           R-squared     =  0.2387
-------------+------------------------------           Adj R-squared =  0.2334
       Total |  4.6577e+11   724   643334846           Root MSE      =   22208

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
         age |   5627.049   8127.377     0.69   0.489    -10329.18    21583.27
       agesq |  -75.30909   210.0421    -0.36   0.720    -487.6781    337.0599
       agecu |   .1985975   1.768176     0.11   0.911    -3.272807    3.670003
        educ |   3555.331   317.9738    11.18   0.000     2931.063    4179.599
     married |   8664.627   1690.098     5.13   0.000      5346.51    11982.74
       _cons |  -127148.4   102508.3    -1.24   0.215    -328399.8    74103.01
------------------------------------------------------------------------------

• Note: none of the three age coefficients is, by itself, statistically significant.
• $R_c^2$ = .2387; k = 5.
Example: F-tests Comparing models
• Reduced model: men’s earnings
. regress conrinc educ married if sex==1

      Source |       SS       df       MS              Number of obs =     725
-------------+------------------------------           F(  2,   722) =   80.20
       Model |  8.4666e+10     2  4.2333e+10           Prob > F      =  0.0000
    Residual |  3.8111e+11   722   527850916           R-squared     =  0.1818
-------------+------------------------------           Adj R-squared =  0.1795
       Total |  4.6577e+11   724   643334846           Root MSE      =   22975

------------------------------------------------------------------------------
     conrinc |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
        educ |   3650.611   328.1065    11.13   0.000     3006.454    4294.767
     married |   10721.42   1716.517     6.25   0.000     7351.457    14091.38
       _cons |   -16381.3   4796.807    -3.42   0.001    -25798.65   -6963.944
------------------------------------------------------------------------------

• $R_r^2$ = .1818; g = 2.
Inferences: F-tests Comparing models
$$F = \frac{(0.2387 - 0.1818)/(5 - 2)}{(1 - 0.2387)/(725 - 6)} = \frac{0.0569/3}{0.7613/719} = 17.91$$
df1 = 5 - 2 = 3; df2 = 725 - 6 = 719
F = 17.91, df = (3, 719), p < .001 (Agresti & Finlay, table D, page 673)
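The same comparison can be run in Stata without a table lookup. This is a hedged sketch, not part of the original slides: test performs the joint test after the complete model, and the scalar name Fhand is a hypothetical label for the hand calculation from the two R-squared values.

```stata
* Complete model
regress conrinc age agesq agecu educ married if sex==1

* Joint test that the three age coefficients are all zero: F(3, 719)
test age agesq agecu

* Hand calculation from the R-squared values reported on the slides
scalar Fhand = ((0.2387 - 0.1818)/(5 - 2)) / ((1 - 0.2387)/(725 - 6))
display Fhand

* Upper-tail p-value for that F statistic with (3, 719) degrees of freedom
display Ftail(3, 719, Fhand)
```

The F reported by test and the hand-calculated value should agree up to the rounding of the two R-squared values.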
Next: Regression with Interaction Effects • Examples with earnings: • married x gender • age x gender • age x education • marital status x gender