Statistics Workshop Qualitative Variables in Regression Spring 2009 Bert Kritzer
Extending Regression
• Qualitative predictors
• Dichotomous dependent variables
• Nonlinear relationships
• Time series data
• Panel models
• “Limited” dependent variables
  • Nominal (including dichotomous) dependent variables
  • Count variables
  • “Selection” models
    • Tobit
    • Switching
• Mutual causation models
Dummy Variable: Region South Nonsouth
CODING DUMMY VARIABLES
• Dichotomy
  • one category coded as 0 and one as 1
  • coefficient represents deviation of the category coded 1 from that coded 0
• k (more than two) categories
  • choose one category as a “base”, which is always coded 0
  • create k-1 variables, each coded 1 for one category (other than the base) and 0 for all others
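The coding rule above can be sketched in a few lines of Python. This is a minimal illustration, not code from the workshop; the function name `dummy_code` and the sample category labels are hypothetical.

```python
def dummy_code(values, base):
    """Build 0/1 dummy variables for every category except `base`.

    Returns a dict mapping each non-base category to a list that is 1
    where the observation falls in that category and 0 otherwise.
    Observations in the base category are 0 on every dummy.
    """
    categories = sorted(set(values) - {base})
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Dichotomy: one dummy, coded 1 for South and 0 for Nonsouth.
region = ["South", "Nonsouth", "South"]
print(dummy_code(region, base="Nonsouth"))

# k = 4 hypothetical categories: k - 1 = 3 dummies, base "a" is all zeros.
group = ["a", "b", "c", "d", "b"]
print(dummy_code(group, base="a"))
```

With the base category omitted, each coefficient on a dummy is read as that category's deviation from the base.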
Region example
• Code South as 1 and Nonsouth as 0
• For Nonsouth, South = 0, yielding
• For South, South = 1, yielding
• Which we can rewrite as
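In generic notation, the algebra this slide walks through can be sketched as follows. The symbols $a$, $b_1$, $b_2$ and the presence of a quantitative predictor $X$ are assumptions made for illustration; they are not taken from the slide.

```latex
\begin{align*}
\text{Model:}\quad & Y = a + b_1 X + b_2\,\mathit{South} \\
\text{Nonsouth } (\mathit{South}=0):\quad & Y = a + b_1 X \\
\text{South } (\mathit{South}=1):\quad & Y = a + b_1 X + b_2 \\
\text{Rewritten:}\quad & Y = (a + b_2) + b_1 X
\end{align*}
```

Under this form the two regions share the slope $b_1$ and differ only in intercept, by $b_2$.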
Dummy Variable Results Note: south=1 for south, 0 for nonsouth
k Categories: Bazemore v. Friday, 478 U.S. 385 (1986)
• Four ranks
  • “Chairman”
  • Agent
  • Associate Agent
  • Assistant Agent
• Chose one to omit
  • Assistant Agent
• Create dummy variables for “Chairman”, Agent, and Associate Agent
Regression in Wage Discrimination Cases: Bazemore v. Friday, 478 U.S. 385 (1986)
University of Wisconsin 1997 Gender Equity Pay Study: College of Letters & Science

Regression Model: MODEL1
Dependent Variable: LNSAL ln(Salary)

Analysis of Variance

Source     DF    Sum of Squares   Mean Square   F Value   Prob>F
Model       54   45.42831         0.84127       43.187    0.0001
Error      781   15.21362         0.01948
C Total    835   60.64193

Root MSE    0.13957    R-square   0.7491
Dep Mean   11.03245    Adj R-sq   0.7318
C.V.        1.26508

Parameter Estimates

Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|   Variable Label
INTERCEP   1    11.163387            0.07575212       147.367                 0.0001       Intercept
GENDER     1    -0.021302            0.01263912        -1.685                 0.0923       Male
WHITE      1    -0.010214            0.01651535        -0.618                 0.5364       White/Unknown
PROF       1     0.175458            0.01853981         9.464                 0.0001       Full Professor
ASST       1    -0.193622            0.02286049        -8.470                 0.0001       Assistant Prof
ANYDOC     1     0.017376            0.03510405         0.495                 0.6208       Any Terminal Degree
COH2       1    -0.085045            0.02458236        -3.460                 0.0006       Hired 1980-88
COH3       1    -0.153097            0.03408703        -4.491                 0.0001       Hired 1989-93
COH4       1    -0.168758            0.04543305        -3.714                 0.0002       Hired 1994-98
DIFYRS     1     0.003513            0.00156769         2.241                 0.0253       YRS SINCE DEG BEFORE UW
INASTYRS   1    -0.018596            0.00380222        -4.891                 0.0001       YRS AS INSTR/ASST PROF
ASSOYRS    1    -0.020570            0.00244673        -8.407                 0.0001       YRS AS UW ASSOC
FULLYRS    1     0.003528            0.00146692         2.405                 0.0164       YRS AS UW FULL PROF
LNRATIO    1     0.481871            0.21528902         2.238                 0.0255       ln(mkt ratio)

PLUS 41 DEPARTMENT “FIXED EFFECTS”
Equity Study: Fixed Effects

DEPARTMENT FIXED EFFECTS

Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
AFRLANG    1    -0.037307            0.07287210       -0.512                  0.6088
ANTHRO     1    -0.042490            0.05677832       -0.748                  0.4545
AFRAMER    1     0.067777            0.06028682        1.124                  0.2613
ARTHIST    1    -0.009346            0.06446204       -0.145                  0.8848
ASTRON     1     0.025805            0.05767292        0.447                  0.6547
BOTANY     1    -0.023055            0.06263077       -0.368                  0.7129
COMMUN     1    -0.043242            0.06234593       -0.694                  0.4882
CHEM       1     0.007705            0.04325153        0.178                  0.8587
CLASSICS   1    -0.013697            0.07344295       -0.186                  0.8521
COMMDIS    1     0.035164            0.05853836        0.601                  0.5482
COMPLIT    1    -0.027078            0.07883924       -0.343                  0.7313
COMPUT     1     0.198201            0.04934743        4.016                  0.0001
EASIALG    1    -0.053194            0.06957342       -0.765                  0.4448
ECON       1     0.169280            0.05319197        3.182                  0.0015
ENGLISH    1    -0.053755            0.05584121       -0.963                  0.3360
FRENITAL   1    -0.073378            0.05724591       -1.282                  0.2003
GEOG       1    -0.014052            0.05781558       -0.243                  0.8080
GEOLOGY    1     0.007804            0.05502894        0.142                  0.8873
GERMAN     1    -0.079744            0.06744970       -1.182                  0.2375
HEBREW     1     0.016752            0.09408135        0.178                  0.8587
HISTORY    1    -0.031301            0.05059288       -0.619                  0.5363
HISTSC     1     0.047905            0.07102221        0.675                  0.5002
JOURNAL    1    -0.045840            0.05939580       -0.772                  0.4405
LIBRYSC    1    -0.079658            0.06446705       -1.236                  0.2170
LINGUIS    1    -0.105136            0.07404040       -1.420                  0.1560
MATH       1    -0.034484            0.04433476       -0.778                  0.4369
METEOR     1    -0.020649            0.05059822       -0.408                  0.6833
MUSIC      1    -0.084759            0.06710503       -1.263                  0.2069
PHILOS     1    -0.060066            0.05534808       -1.085                  0.2782
PHYSICS    1     0.035945            0.04208888        0.854                  0.3934
POLISC     1     0.001526            0.04407509        0.035                  0.9724
PSYCH      1     0.043498            0.04718937        0.922                  0.3569
SCAND      1    -0.068544            0.09877777       -0.694                  0.4879
SLAVIC     1     0.081673            0.06944784        1.176                  0.2399
SOCWORK    1     0.038894            0.05518913        0.705                  0.4812
SOCIOL     1     0.034492            0.04455797        0.774                  0.4391
SASIAN     1    -0.146444            0.07595848       -1.928                  0.0542
SPANPORT   1    -0.102875            0.06176804       -1.666                  0.0962
THEATRE    1    -0.076231            0.06933522       -1.099                  0.2719
URBPLAN    1    -0.013524            0.05830072       -0.232                  0.8166
ZOOL       1    -0.055001            0.05418789       -1.015                  0.3104
Alternate Coding Schemes
• 0-1 coding is most common
• Another coding scheme is -1, 0, +1
  • Create k-1 dummies
  • Always code base category as -1
  • Code each dummy +1 for its own category and 0 for every other category (except the base, which is -1)
  • Reduces to -1, +1 for a dichotomy
• Sometimes see -½, 0, +½
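The -1, 0, +1 scheme described above can be sketched the same way as 0-1 coding. This is an illustrative sketch only; the function name `effect_code` and the category labels are hypothetical.

```python
def effect_code(values, base):
    """Build -1/0/+1 coded variables for every category except `base`.

    Each variable is +1 for its own category, -1 for the base category,
    and 0 for every other category.
    """
    categories = sorted(set(values) - {base})

    def code(value, category):
        if value == base:
            return -1
        return 1 if value == category else 0

    return {c: [code(v, c) for v in values] for c in categories}


# Three hypothetical categories with "a" as the base: two coded variables.
print(effect_code(["a", "b", "c"], base="a"))

# For a dichotomy the scheme reduces to -1 / +1.
print(effect_code(["South", "Nonsouth"], base="Nonsouth"))
```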
Dummy Variables & Standardized Coefficients Source: David Baldus et al., Comparative Review of Death Sentences: An Empirical Study of the Georgia Experience, 74 J. Crim. L. & Criminology 661, 684-85 (1983).
THE CONCEPT OF INTERACTIONS
• Models so far assume that the effect of one variable does not depend on the value of another variable.
  • Parallel lines in tort reform example
• What if we wanted to allow effects to be different?
  • Predictors “interact”
  • Add multiplicative terms
Slope “Dummies”
• Most straightforward when one variable is quantitative and the other is qualitative
  • Region and Liberalism
• What you produce are multiple lines with different slopes
Region example
• Code South as 1 and Nonsouth as 0
• For Nonsouth, South = 0, yielding
• For South, South = 1, yielding
• Which we can rewrite as
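With a slope dummy, the same steps can be sketched in generic notation. The coefficient symbols $a$, $b_1$, $b_2$, $b_3$ and the unnamed dependent variable $Y$ are assumptions for illustration; $\mathit{Lib}$ stands for the liberalism measure named on the preceding slide.

```latex
\begin{align*}
\text{Model:}\quad & Y = a + b_1\,\mathit{Lib} + b_2\,\mathit{South} + b_3\,(\mathit{South}\times\mathit{Lib}) \\
\text{Nonsouth } (\mathit{South}=0):\quad & Y = a + b_1\,\mathit{Lib} \\
\text{South } (\mathit{South}=1):\quad & Y = a + b_1\,\mathit{Lib} + b_2 + b_3\,\mathit{Lib} \\
\text{Rewritten:}\quad & Y = (a + b_2) + (b_1 + b_3)\,\mathit{Lib}
\end{align*}
```

Here $b_2$ is the difference in intercepts and $b_3$ is the difference in slopes between South and Nonsouth.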
RECASTING INTERACTION MODELS AS “CONDITIONAL” MODELS
Lib_Nonsouth = Lib for Nonsouth and 0 for South
Lib_South = 0 for Nonsouth and Lib for South
Conditional & Multiplicative Models Compared Multiplicative: Conditional:
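A short numerical check can illustrate how the two forms relate. This is a sketch under assumed notation: the multiplicative model is written as Y = a + b1·Lib + b2·South + b3·South·Lib and the conditional model as Y = a + c0·South + c1·Lib_Nonsouth + c2·Lib_South. The coefficient values are made up, and the function names are hypothetical.

```python
def predict_multiplicative(lib, south, a, b1, b2, b3):
    """Y = a + b1*Lib + b2*South + b3*(South*Lib)."""
    return a + b1 * lib + b2 * south + b3 * south * lib


def predict_conditional(lib, south, a, c0, c1, c2):
    """Y = a + c0*South + c1*Lib_Nonsouth + c2*Lib_South."""
    lib_nonsouth = lib * (1 - south)  # Lib for Nonsouth, 0 for South
    lib_south = lib * south           # 0 for Nonsouth, Lib for South
    return a + c0 * south + c1 * lib_nonsouth + c2 * lib_south


# Made-up coefficients for the multiplicative form.
a, b1, b2, b3 = 2.0, 0.5, -1.0, 0.25

# Implied conditional coefficients: each region gets its own slope.
c0, c1, c2 = b2, b1, b1 + b3

for lib in (0.0, 1.0, 3.5):
    for south in (0, 1):
        m = predict_multiplicative(lib, south, a, b1, b2, b3)
        c = predict_conditional(lib, south, a, c0, c1, c2)
        assert abs(m - c) < 1e-12
```

The two forms give identical predictions. The multiplicative form reports the difference in slopes directly as b3, while the conditional form reports each region's slope directly.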
Dichotomous Dependent Variables
• Predicting the probability of an outcome
  • Observe the realization of the probability
• Linear model
  • Dummy variable as dependent variable (coded 0 and 1)
  • Out of range results
  • The rubber band problem
• Logistic regression
• Probit analysis (probit regression)
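The out-of-range problem with a linear model can be seen in a small sketch. The data below are made up purely for illustration, and the helper names are hypothetical. The logistic function is shown only as a transformation that keeps values between 0 and 1; no logistic regression is fitted here.

```python
import math

# Made-up illustrative data: x is a predictor, y is a 0/1 outcome.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 0, 0, 1, 1, 1, 1]


def ols_fit(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = ols_fit(x, y)


def linear_probability(xi):
    """Predicted 'probability' from the linear model; not bounded."""
    return intercept + slope * xi


def logistic(z):
    """Logistic transformation; always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


# The linear model predicts below 0 at x = 1 and above 1 at x = 8.
print(linear_probability(1), linear_probability(8))
```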
Logistic Regression: Interpretation

                      Odds
20, male, white       .537/1
20, female, white     .310/1
20, male, black       .986/1
50, male, white       .283/1
Measuring Impact on Probability: Logistic Regression

b = .4

             up       down       up       down
P_initial    .5                  .8
odds         .5/.5 = 1           .8/.2 = 4
L_initial    0.0                 1.386
L_shifted    .4       -.4        1.786    0.986
odds         1.49     .67        5.97     2.68
P_shifted    .598     .401       .857     .728
b*           .098     .098       .057     .072
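The steps in this table (probability to odds, odds to log odds, shift by b, then back to a probability) can be sketched as follows. The function name `shifted_probability` is hypothetical; the inputs b = .4 and starting probabilities .5 and .8 come from the slide.

```python
import math


def shifted_probability(p_initial, b):
    """Shift the log odds of p_initial by b and convert back to a probability."""
    log_odds = math.log(p_initial / (1 - p_initial))  # L_initial
    shifted = log_odds + b                            # L_shifted
    return 1.0 / (1.0 + math.exp(-shifted))           # P_shifted


for p in (0.5, 0.8):
    for b in (0.4, -0.4):
        p_new = shifted_probability(p, b)
        # Absolute change in probability for the same coefficient b.
        print(p, b, round(p_new, 3), round(abs(p_new - p), 3))
```

The same coefficient moves the probability by more when the starting point is .5 than when it is .8, which is why a logistic coefficient has no single translation into a change in probability.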
STOPPED-Logistic Regression

                      Probability   Odds
20, male, white       34.9%         .537/1
20, female, white     23.6%         .310/1
20, male, black       49.6%         .986/1
50, male, white       13.3%         .283/1
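Probabilities and odds are two ways of stating the same quantity: odds = p / (1 - p). A minimal sketch of the conversion, with hypothetical function names:

```python
def odds_from_probability(p):
    """Odds in favor of the outcome, expressed 'to 1'."""
    return p / (1 - p)


def probability_from_odds(odds):
    """Inverse conversion: odds (to 1) back to a probability."""
    return odds / (1 + odds)


# A probability of 34.9% corresponds to odds of roughly .54 to 1.
print(round(odds_from_probability(0.349), 3))
```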
Estimates of Probability of Being Stopped by Age, Race & Gender
Probit Logit Curve
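The two curves can be compared numerically with the standard library alone. This is a sketch, not the figure from the slide: the probit curve is the standard normal CDF, the logit curve is the logistic function, and the scale factor 1.702 is a commonly cited approximation constant that is assumed here rather than taken from the slides.

```python
import math


def normal_cdf(z):
    """Standard normal CDF (the probit curve)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Logistic function (the logit curve)."""
    return 1.0 / (1.0 + math.exp(-z))


SCALE = 1.702  # assumed rescaling so the logistic curve tracks the normal CDF

# Largest gap between the two curves over a grid from -4 to 4.
grid = [i / 10 for i in range(-40, 41)]
max_gap = max(abs(normal_cdf(z) - logistic_cdf(SCALE * z)) for z in grid)
print(round(max_gap, 4))
```

After rescaling, the curves differ by less than one percentage point everywhere on this grid, which is consistent with the two models giving similar predicted probabilities.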
STOPPED-Probit

                      Logistic Regression   Probit
20, male, white       34.9%                 34.1%
20, female, white     23.6%                 23.6%
20, male, black       49.6%                 47.0%
50, male, white       13.3%                 13.3%
Estimates of Probability of Being Stopped by Age, Race & Gender: Logistic Regression & Probit Compared (panels: Logistic Regression; Probit)
Logistic Regression Uses
• Discrimination
  • Hiring
  • Promotion
• Capital punishment
  • jury sentencing: impact of race of defendant/victim
  • death qualified jurors & conviction proneness
  • prosecutor’s decision to seek the death penalty
• Other sentencing discrimination
  • Jail vs. Probation
McCleskey v. Kemp, 481 U.S. 279 (1987)
• Georgia case
• Racial discrimination in capital sentencing
• Study of capital sentencing in Georgia by David Baldus (U of Iowa)
• Study found differential sentencing based on race of the victim: killers of whites were more likely to be sentenced to death
Logistic Regression from Baldus Study Source: David Baldus et al., Equal Justice and the Death Penalty: A Legal and Empirical Analysis 319-20 (1990)
Baldus’s Interval Estimates Source: David Baldus et al., Equal Justice and the Death Penalty: A Legal and Empirical Analysis 321 (1990)