
Statistics Workshop Qualitative Variables in Regression Spring 2009 Bert Kritzer



  1. Statistics Workshop: Qualitative Variables in Regression, Spring 2009, Bert Kritzer

  2. Extending Regression • Qualitative predictors • Dichotomous dependent variables • Nonlinear relationships • Time series data • Panel models • “Limited” dependent variables • Nominal (including dichotomous) dependent variables • Count variables • “Selection” models • Tobit • Switching • Mutual causation models

  3. Dummy Variable: Region South Nonsouth

  4. CODING DUMMY VARIABLES • Dichotomy • one category coded as 0 and one as 1 • coefficient represents deviation of the category coded 1 from that coded 0 • k (more than two) categories • Choose one category as a “base”, which is always coded 0 • Create k-1 variables, each coded 1 for one category (other than the base) and 0 for all others

  5. Region example • Code South as 1 and Nonsouth as 0 • For Nonsouth, South = 0, yielding • For South, South = 1, yielding • Which we can rewrite as
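The equations on this slide did not survive in the transcript. A sketch of what the bullets describe, assuming a model with a single dummy predictor; the symbols $\hat{Y}$ (predicted outcome), $a$ (intercept) and $b$ (coefficient on South) are stand-ins and are not taken from the original slide:

```latex
\begin{align*}
\hat{Y} &= a + b\,\mathrm{South} \\
\text{Nonsouth } (\mathrm{South}=0):\quad \hat{Y} &= a \\
\text{South } (\mathrm{South}=1):\quad \hat{Y} &= a + b
\end{align*}
```

Under these assumptions, $b$ is the difference between the South and Nonsouth predictions, which matches slide 4's description of the coefficient as the deviation of the category coded 1 from the category coded 0.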

  6. Dummy Variable Results Note: south=1 for south, 0 for nonsouth

  7. k Categories: Bazemore v. Friday, 478 U.S. 385 (1986) • Four ranks • “Chairman” • Agent • Associate Agent • Assistant Agent • Chose one to omit • Assistant Agent • Create dummy variables for “Chairman”, Agent, and Associate Agent
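The k-1 coding rule can be sketched in a few lines of Python. This is an illustration only: the function name `dummy_code` and the list `RANKS` are hypothetical, and the rank labels are copied from the slide above, with Assistant Agent as the omitted base.

```python
def dummy_code(value, categories, base):
    """Return k-1 indicator variables (0/1) for `value`, omitting `base`.

    The base category gets 0 on every dummy; every other category gets 1
    on its own dummy and 0 on the rest.
    """
    if base not in categories:
        raise ValueError(f"base {base!r} is not one of the categories")
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return {c: int(value == c) for c in categories if c != base}


# Rank labels taken from the slide; the variable name is illustrative.
RANKS = ["Chairman", "Agent", "Associate Agent", "Assistant Agent"]

for rank in RANKS:
    print(rank, dummy_code(rank, RANKS, base="Assistant Agent"))
```

Each dummy's coefficient is then read as the deviation of that rank from the omitted rank, just as in the two-category case.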

  8. Regression in Wage Discrimination Cases: Bazemore v. Friday, 478 U.S. 385 (1986)

  9. Explicating Bazemore Results

  10. University of Wisconsin 1997 Gender Equity Pay Study

  11. University of Wisconsin 1997 Gender Equity Pay Study, College of Letters & Science Regression

      Model: MODEL1
      Dependent Variable: LNSAL ln(Salary)

      Analysis of Variance

      Source     DF   Sum of Squares   Mean Square   F Value   Prob>F
      Model      54         45.42831       0.84127    43.187   0.0001
      Error     781         15.21362       0.01948
      C Total   835         60.64193

      Root MSE    0.13957    R-square   0.7491
      Dep Mean   11.03245    Adj R-sq   0.7318
      C.V.        1.26508

      Parameter Estimates

      Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|   Variable Label
      INTERCEP    1            11.163387       0.07575212                 147.367       0.0001   Intercept
      GENDER      1            -0.021302       0.01263912                  -1.685       0.0923   Male
      WHITE       1            -0.010214       0.01651535                  -0.618       0.5364   White/Unknown
      PROF        1             0.175458       0.01853981                   9.464       0.0001   Full Professor
      ASST        1            -0.193622       0.02286049                  -8.470       0.0001   Assistant Prof
      ANYDOC      1             0.017376       0.03510405                   0.495       0.6208   Any Terminal Degree
      COH2        1            -0.085045       0.02458236                  -3.460       0.0006   Hired 1980-88
      COH3        1            -0.153097       0.03408703                  -4.491       0.0001   Hired 1989-93
      COH4        1            -0.168758       0.04543305                  -3.714       0.0002   Hired 1994-98
      DIFYRS      1             0.003513       0.00156769                   2.241       0.0253   YRS SINCE DEG BEFORE UW
      INASTYRS    1            -0.018596       0.00380222                  -4.891       0.0001   YRS AS INSTR/ASST PROF
      ASSOYRS     1            -0.020570       0.00244673                  -8.407       0.0001   YRS AS UW ASSOC
      FULLYRS     1             0.003528       0.00146692                   2.405       0.0164   YRS AS UW FULL PROF
      LNRATIO     1             0.481871       0.21528902                   2.238       0.0255   ln(mkt ratio)

      PLUS 41 DEPARTMENT “FIXED EFFECTS”

  12. Logged Dependent Variables: Interpretation as Percent Change

  13. Basis of Percentage Interpretation
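The body of slides 12 and 13 is not preserved in the transcript. As a hedged illustration of the standard result they appear to refer to: when the dependent variable is ln(Salary), a coefficient b on a dummy variable corresponds to a 100·(exp(b) − 1) percent difference in salary, and for small b this is close to 100·b. The sketch below applies that formula to three coefficients from the Letters & Science regression on slide 11; the function name `percent_change` is hypothetical.

```python
import math


def percent_change(b):
    """Percent difference in Y implied by coefficient b when the outcome is ln(Y)."""
    return 100.0 * (math.exp(b) - 1.0)


# Coefficients copied from the slide 11 output (dependent variable: ln(Salary)).
coefficients = {"PROF": 0.175458, "ASST": -0.193622, "GENDER": -0.021302}

for name, b in coefficients.items():
    approx = 100.0 * b            # quick reading: coefficient times 100
    exact = percent_change(b)     # exact conversion from the log scale
    print(f"{name}: b = {b:+.6f}, approx {approx:+.1f}%, exact {exact:+.1f}%")
```

The approximation and the exact figure nearly coincide for small coefficients such as GENDER and drift apart as the coefficient grows in magnitude, as with PROF and ASST.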

  14. Equity Study: Fixed Effects

      DEPARTMENT FIXED EFFECTS

      Variable   DF   Parameter Estimate   Standard Error   T for H0: Parameter=0   Prob > |T|
      AFRLANG     1            -0.037307       0.07287210                  -0.512       0.6088
      ANTHRO      1            -0.042490       0.05677832                  -0.748       0.4545
      AFRAMER     1             0.067777       0.06028682                   1.124       0.2613
      ARTHIST     1            -0.009346       0.06446204                  -0.145       0.8848
      ASTRON      1             0.025805       0.05767292                   0.447       0.6547
      BOTANY      1            -0.023055       0.06263077                  -0.368       0.7129
      COMMUN      1            -0.043242       0.06234593                  -0.694       0.4882
      CHEM        1             0.007705       0.04325153                   0.178       0.8587
      CLASSICS    1            -0.013697       0.07344295                  -0.186       0.8521
      COMMDIS     1             0.035164       0.05853836                   0.601       0.5482
      COMPLIT     1            -0.027078       0.07883924                  -0.343       0.7313
      COMPUT      1             0.198201       0.04934743                   4.016       0.0001
      EASIALG     1            -0.053194       0.06957342                  -0.765       0.4448
      ECON        1             0.169280       0.05319197                   3.182       0.0015
      ENGLISH     1            -0.053755       0.05584121                  -0.963       0.3360
      FRENITAL    1            -0.073378       0.05724591                  -1.282       0.2003
      GEOG        1            -0.014052       0.05781558                  -0.243       0.8080
      GEOLOGY     1             0.007804       0.05502894                   0.142       0.8873
      GERMAN      1            -0.079744       0.06744970                  -1.182       0.2375
      HEBREW      1             0.016752       0.09408135                   0.178       0.8587
      HISTORY     1            -0.031301       0.05059288                  -0.619       0.5363
      HISTSC      1             0.047905       0.07102221                   0.675       0.5002
      JOURNAL     1            -0.045840       0.05939580                  -0.772       0.4405
      LIBRYSC     1            -0.079658       0.06446705                  -1.236       0.2170
      LINGUIS     1            -0.105136       0.07404040                  -1.420       0.1560
      MATH        1            -0.034484       0.04433476                  -0.778       0.4369
      METEOR      1            -0.020649       0.05059822                  -0.408       0.6833
      MUSIC       1            -0.084759       0.06710503                  -1.263       0.2069
      PHILOS      1            -0.060066       0.05534808                  -1.085       0.2782
      PHYSICS     1             0.035945       0.04208888                   0.854       0.3934
      POLISC      1             0.001526       0.04407509                   0.035       0.9724
      PSYCH       1             0.043498       0.04718937                   0.922       0.3569
      SCAND       1            -0.068544       0.09877777                  -0.694       0.4879
      SLAVIC      1             0.081673       0.06944784                   1.176       0.2399
      SOCWORK     1             0.038894       0.05518913                   0.705       0.4812
      SOCIOL      1             0.034492       0.04455797                   0.774       0.4391
      SASIAN      1            -0.146444       0.07595848                  -1.928       0.0542
      SPANPORT    1            -0.102875       0.06176804                  -1.666       0.0962
      THEATRE     1            -0.076231       0.06933522                  -1.099       0.2719
      URBPLAN     1            -0.013524       0.05830072                  -0.232       0.8166
      ZOOL        1            -0.055001       0.05418789                  -1.015       0.3104

  15. Alternate Coding Schemes • 0-1 coding is most common • Another coding scheme is -1, 0, +1 • Create k-1 dummies • Always code base category as -1 • Code each dummy +1 for its own category and 0 for the remaining non-base categories • Reduces to -1, +1 for a dichotomy • Sometimes see -½, 0, +½
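The -1, 0, +1 scheme can be sketched the same way as 0-1 coding. The function name `effect_code` and the three-category example are hypothetical and chosen only for illustration; the South/Nonsouth pair comes from the earlier slides.

```python
def effect_code(value, categories, base):
    """Return k-1 variables coded -1/0/+1 for `value`, with `base` coded -1 on all of them."""
    if base not in categories:
        raise ValueError(f"base {base!r} is not one of the categories")
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    others = [c for c in categories if c != base]
    if value == base:
        # The base category is -1 on every variable.
        return {c: -1 for c in others}
    # A non-base category is +1 on its own variable and 0 elsewhere.
    return {c: (1 if c == value else 0) for c in others}


# Hypothetical three-category variable.
for category in ["A", "B", "C"]:
    print(category, effect_code(category, ["A", "B", "C"], base="C"))

# With two categories the scheme reduces to -1/+1, as the slide notes.
for region in ["South", "Nonsouth"]:
    print(region, effect_code(region, ["South", "Nonsouth"], base="Nonsouth"))
```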

  16. Dummy Variables & Standardized Coefficients Source: David Baldus et al., Comparative Review of Death Sentences: An Empirical Study of the Georgia Experience, 74 J. Crim. L. & Criminology 661, 684-85 (1983).

  17. THE CONCEPT OF INTERACTIONS • Models so far assume that the effect of one variable does not depend on the value of another variable. • Parallel lines in tort reform example • What if we wanted to allow effects to be different? • Predictors “interact” • Add multiplicative terms

  18. Slope “Dummies” • Most straightforward when one variable is quantitative and the other is qualitative • Region and Liberalism • What you produce are multiple lines with different slopes

  19. Slope Dummy

  20. Region example • Code South as 1 and Nonsouth as 0 • For Nonsouth, South = 0, yielding • For South, South = 1, yielding • Which we can rewrite as
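The equations for this slide are missing from the transcript. A sketch of the slope-dummy model the bullets appear to describe, assuming liberalism (Lib) is the quantitative predictor from slide 18 and that the model includes both the South dummy and its product with Lib; the coefficient names $b_0$ through $b_3$ are stand-ins, not the original notation:

```latex
\begin{align*}
\hat{Y} &= b_0 + b_1\,\mathrm{Lib} + b_2\,\mathrm{South} + b_3\,(\mathrm{South}\times\mathrm{Lib}) \\
\text{Nonsouth } (\mathrm{South}=0):\quad \hat{Y} &= b_0 + b_1\,\mathrm{Lib} \\
\text{South } (\mathrm{South}=1):\quad \hat{Y} &= (b_0 + b_2) + (b_1 + b_3)\,\mathrm{Lib}
\end{align*}
```

Under these assumptions, $b_2$ shifts the intercept for the South and $b_3$ is the difference in slopes between the two regions, so the lines are parallel only when $b_3 = 0$.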

  21. INTERPRETING RESULTS INVOLVING INTERACTIONS

  22. Slope Dummy: Separate Graphs

  23. RECASTING INTERACTION MODELS AS “CONDITIONAL” MODELS • LibNonsouth = Lib for Nonsouth and 0 for South • LibSouth = 0 for Nonsouth and Lib for South

  24. Conditional & Multiplicative Models Compared Multiplicative: Conditional:
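The two equations being compared are not preserved here. The sketch below shows one way the two parameterizations can describe the same fitted lines, assuming the conditional model keeps the South dummy and replaces Lib and South×Lib with the two region-specific variables defined on slide 23. All coefficient values and function names are hypothetical.

```python
def multiplicative(lib, south, b0, b1, b2, b3):
    """Interaction model: a common Lib slope plus a South-specific adjustment."""
    return b0 + b1 * lib + b2 * south + b3 * south * lib


def conditional(lib, south, c0, c_south, c_lib_nonsouth, c_lib_south):
    """Conditional model: a separate Lib slope for each region."""
    lib_nonsouth = lib * (1 - south)   # Lib for Nonsouth, 0 for South
    lib_south = lib * south            # 0 for Nonsouth, Lib for South
    return c0 + c_south * south + c_lib_nonsouth * lib_nonsouth + c_lib_south * lib_south


# Hypothetical multiplicative coefficients.
b0, b1, b2, b3 = 10.0, 2.0, -3.0, 1.5

# Implied conditional coefficients: the South slope is b1 + b3.
c0, c_south, c_lib_nonsouth, c_lib_south = b0, b2, b1, b1 + b3

for south in (0, 1):
    for lib in (0.0, 1.0, 2.5):
        m = multiplicative(lib, south, b0, b1, b2, b3)
        c = conditional(lib, south, c0, c_south, c_lib_nonsouth, c_lib_south)
        assert abs(m - c) < 1e-12
        print(f"south={south} lib={lib}: multiplicative={m:.2f} conditional={c:.2f}")
```

The predictions are identical; the practical difference is that the conditional form reports each region's slope directly, while the multiplicative form reports the Nonsouth slope and the South's deviation from it.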

  25. Dichotomous Dependent Variables • Predicting the probability of an outcome • Observe the realization of the probability • Linear model • Dummy variable as dependent variable (coded 0 and 1) • Out of range results • The rubber band problem • Logistic regression • Probit analysis (probit regression)

  26. Logistic Curve

  27. Logistic Regression Model
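The model equation is not preserved in the transcript. In its standard form, logistic regression models the log odds (logit) L = a + b·X as linear in the predictors and converts it to a probability with P = 1 / (1 + exp(−L)). The sketch below contrasts this with using the linear prediction directly as a probability, which is the "out of range" problem listed on slide 25. The coefficients are hypothetical and are reused for both columns only to show the range of each function, not to represent fitted models.

```python
import math


def logistic(L):
    """Convert a logit (log odds) into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-L))


def logit(p):
    """Convert a probability into log odds; the inverse of logistic()."""
    return math.log(p / (1.0 - p))


a, b = 0.5, 0.1  # hypothetical intercept and slope

for x in (-10, 0, 10):
    linear = a + b * x
    print(f"x = {x:>3}: linear prediction = {linear:+.2f}, "
          f"logistic(a + b*x) = {logistic(linear):.3f}")
```

The linear prediction drops below 0 and rises above 1 at the extremes, while the logistic transformation always stays inside the unit interval.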

  28. Logistic Regression Output

  29. Logistic Regression: Interpretation

      Profile              Odds
      20, male, white      .537/1
      20, female, white    .310/1
      20, male, black      .986/1
      50, male, white      .283/1
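Odds and probabilities are two scales for the same quantity: odds = P / (1 − P) and P = odds / (1 + odds). A minimal sketch of the conversion, applied to the odds listed on this slide; the function names are hypothetical.

```python
def odds_to_probability(odds):
    """Convert odds (expressed as x to 1) into a probability."""
    return odds / (1.0 + odds)


def probability_to_odds(p):
    """Convert a probability into odds (expressed as x to 1)."""
    return p / (1.0 - p)


# Odds copied from the slide.
profiles = [
    ("20, male, white", 0.537),
    ("20, female, white", 0.310),
    ("20, male, black", 0.986),
    ("50, male, white", 0.283),
]

for label, odds in profiles:
    print(f"{label}: odds {odds:.3f}/1 -> probability {odds_to_probability(odds):.1%}")
```

For the first three profiles this gives about 34.9%, 23.7% and 49.6%, in line with the probabilities on slide 32. The odds listed for the fourth profile (.283) convert to about 22%, which does not match the 13.3% shown there, so one of those two figures may have been garbled in the transcript.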

  30. Interpreting Coefficients

  31. Measuring Impact on Probability: Logistic Regression

      b = .4
                     up        down      up        down
      P initial      .5                  .8
      odds           .5/.5 = 1           .8/.2 = 4
      L initial      0.0                 1.386
      L shifted      .4        -.4       1.786     0.986
      odds           1.49      .67       5.97      2.68
      P shifted      .598      .401      .857      .728
      b*             .098      .098      .057      .072
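The steps in this table can be retraced in code: convert the starting probability to a logit, shift the logit by b = .4 up or down, and convert back. The sketch below is an illustration with a hypothetical function name; its results agree with the slide to within rounding in the third decimal place.

```python
import math


def shifted_probability(p_initial, shift):
    """Move a probability by `shift` units on the logit scale and convert back."""
    logit_initial = math.log(p_initial / (1.0 - p_initial))
    logit_shifted = logit_initial + shift
    return 1.0 / (1.0 + math.exp(-logit_shifted))


b = 0.4  # coefficient from the slide

for p_initial in (0.5, 0.8):
    for direction, shift in (("up", b), ("down", -b)):
        p_shifted = shifted_probability(p_initial, shift)
        change = abs(p_shifted - p_initial)
        print(f"P initial = {p_initial}, {direction}: "
              f"P shifted = {p_shifted:.3f}, change in probability = {change:.3f}")
```

The same coefficient moves the probability by nearly ten points when the starting point is .5 but by less when it is .8, which is why a logistic coefficient has no single translation into probability units.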

  32. STOPPED-Logistic Regression

      Profile              Probability   Odds
      20, male, white      34.9%         .537/1
      20, female, white    23.6%         .310/1
      20, male, black      49.6%         .986/1
      50, male, white      13.3%         .283/1

  33. Estimates of Probability of Being Stopped by Age, Race & Gender

  34. Interactions in Logistic Regression Models

  35. Estimates of Probability of Being Stopped with Interaction

  36. The Probit Alternative

  37. The Probit Model

  38. Probit Logit Curve
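The curves themselves are not preserved in the transcript. As a sketch of the comparison the slide title suggests: probit uses the standard normal cumulative distribution function where logistic regression uses the logistic function. The factor of 1.6 used below to put the two on a comparable scale is a rough rule of thumb and an assumption of this example, not something taken from the slides.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function (the probit link)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def logistic_cdf(z):
    """Logistic cumulative distribution function (the logit link)."""
    return 1.0 / (1.0 + math.exp(-z))


SCALE = 1.6  # assumed rough rescaling between probit and logit units

for z in (-2.0, -1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}: probit P = {normal_cdf(z):.3f}, "
          f"logit P at {SCALE}*z = {logistic_cdf(SCALE * z):.3f}")
```

Both curves are S-shaped and symmetric around P = .5; they differ mainly in the tails, where the logistic curve approaches 0 and 1 more slowly.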

  39. STOPPED-Probit

      Profile              Logistic Regression   Probit
      20, male, white      34.9%                 34.1%
      20, female, white    23.6%                 23.6%
      20, male, black      49.6%                 47.0%
      50, male, white      13.3%                 13.3%

  40. Estimates of Probability of Being Stopped by Age, Race & Gender Logistic Regression & Probit Compared Logistic Regression Probit

  41. Logistic Regression Uses • Discrimination • Hiring • Promotion • Capital punishment • jury sentencing: impact of race of defendant/victim • death qualified jurors & conviction proneness • prosecutor’s decision to seek the death penalty • Other sentencing discrimination • Jail vs. Probation

  42. McCleskey v. Kemp, 481 U.S. 279 (1987) • Georgia case • Racial discrimination in capital sentencing • Study of capital sentencing in Georgia by David Baldus (U of Iowa) • Study found differential sentencing based on race of the victim: killers of whites were more likely to be sentenced to death

  43. Logistic Regression from Baldus Study Source: David Baldus et al., Equal Justice and the Death Penalty: A Legal and Empirical Analysis 319-20 (1990)

  44. Baldus’s Interval Estimates Source: David Baldus et al., Equal Justice and the Death Penalty: A Legal and Empirical Analysis 321 (1990)
