Learn how to use dummy variables in multiple regression analysis and interpret coefficients. Explore different applications of dummy variables and their importance in statistical analysis.
Multiple Regression Analysis with Qualitative Information • Dummy variables as an independent variable • Dummy variable trap • Importance of the "reference group" • Using dummy variables to test for equal means • Dummy variables for • Multiple categories • Ordinal variables • Interaction terms allowing different slope across groups • Testing for equal coefficients across groups • Dummy variables as dependent variable • Linear Probability Model • Heteroskedasticity and other issues • Interpretation of coefficients
Dummy variable as independent variable • Dummy variables can be used to represent qualitative information • Examples: gender, race, industry, occupation, year, month, … • Can be measured with a set of "dummy variables" • 1 if true; 0 if false • Example: A single dummy independent variable • Coefficient on the dummy = the wage gain/loss if the person is a woman rather than a man (holding education fixed) • Dummy variable: = 1 if the person is a woman, = 0 if the person is a man
Dummy variable as independent variable • Graphical illustration • Alternative interpretation of the coefficient: the difference in mean wage between men and women with the same level of education • The dummy produces an intercept shift
Dummy variable trap • The above model cannot be estimated because of perfect collinearity. • male+female=1 and is perfectly collinear with intercept • Infinite number of parameters yield same sum of squared errors – no unique estimates that minimize SSE. • To "fix" dummy variable trap, must omit one of the dummies or the intercept.
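The collinearity behind the dummy variable trap can be checked numerically. The following is a minimal sketch on simulated data (the sample size, seed, and variable names are assumptions for illustration, not taken from the slides): with an intercept and both dummies, the design matrix has fewer independent columns than parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
female = rng.integers(0, 2, size=n)   # simulated: 1 = woman, 0 = man
male = 1 - female
const = np.ones(n)

# Intercept plus both dummies: male + female equals the constant column
X_trap = np.column_stack([const, male, female])
# Omit one dummy, so men become the reference group
X_ok = np.column_stack([const, female])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print("trap:  rank", rank_trap, "of", X_trap.shape[1], "columns")
print("fixed: rank", rank_ok, "of", X_ok.shape[1], "columns")
```

A rank below the number of columns means X'X is singular, so no unique least-squares solution exists; dropping one dummy (or the intercept) restores full column rank.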
Wage equation as example. • Estimated wage equation with intercept shift • What would coefficient be if • male dummy replaced female dummy? • Intercept was dropped, but male & female dummies included? Does the above regression imply that women are discriminated against? • Omitted variables bias • Walmart class action gender discrimination case Holding education, experience, and tenure fixed, women earn $1.81 less per hour than men
Comparing means of subpopulations described by dummies • Not holding other factors constant, women earn $2.51 per hour less than men, i.e. the difference between the mean wage of men and that of women is $2.51 • Simple regression can be used to test whether the difference in means is significant • The wage difference between men and women is larger if no other factors are controlled for • Part of the difference in wages is due to differences in education, experience, and tenure between men and women • -2.51 without controls vs. -1.81 with controls
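That a simple regression on a single dummy reproduces the difference in group means can be verified directly. This is a sketch on simulated wages (the numbers 7.0, -2.5, and the noise level are invented for illustration and are not the estimates quoted on the slide):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
female = rng.integers(0, 2, size=n)
wage = 7.0 - 2.5 * female + rng.normal(0, 3, size=n)   # simulated wages

# OLS of wage on an intercept and the female dummy, no other controls
X = np.column_stack([np.ones(n), female])
intercept, coef_female = np.linalg.lstsq(X, wage, rcond=None)[0]

mean_men = wage[female == 0].mean()
mean_women = wage[female == 1].mean()
diff_means = mean_women - mean_men

print(intercept, mean_men)       # intercept equals the mean wage of men
print(coef_female, diff_means)   # dummy coefficient equals the difference in means
```

The t-statistic on the dummy in this regression is then a test of equal means (under the usual homoskedasticity assumption).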
Dummy variables for treatment effects • Effects of training grants on hours of training • Dependent variable: hours of training per employee • Regressor of interest: dummy variable indicating whether the firm received a training grant • This is an example of program evaluation • Treatment group (= grant receivers) vs. control group (= no grant) • Is the effect of treatment on the outcome of interest causal? • Not if treatment is endogenous • Treatment is endogenous if cov(treatment, error) ≠ 0
Dummy variables in log regressions • Using dummy explanatory variables in equations for log(y) • Dummy indicating whether the house is of colonial style • As the dummy for colonial style changes from 0 to 1, the house price increases by about 5.4 percent
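In a log(y) equation, 100 times the dummy coefficient is only an approximation to the percentage change; the exact figure is 100·(exp(coefficient) − 1). A short sketch, assuming the colonial coefficient is 0.054 (the value implied by the 5.4 figure above):

```python
import math

coef_colonial = 0.054   # assumed coefficient on the colonial dummy in the log(price) equation

approx_pct = 100 * coef_colonial                  # usual approximation
exact_pct = 100 * (math.exp(coef_colonial) - 1)   # exact percentage change in price

print(f"approximate: {approx_pct:.2f}%  exact: {exact_pct:.2f}%")
```

For small coefficients the two are close; the gap grows as the coefficient gets larger in absolute value.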
Dummy variables for multiple categories • Define membership in each category by a dummy variable • Leave out one category (which becomes the base category or reference group) • Could leave out intercept instead. • How would coefficients change if marrmale was made reference group? • What hypotheses do t-statistics on dummies test?
Incorporating ordinal information using dummy variables • Example: City credit ratings and municipal bond interest rates • Dependent variable: municipal bond rate • Regressor: credit rating from 0-4 (0 = worst, 4 = best) • Entering the rating as a single regressor would probably not be appropriate, as the credit rating only contains ordinal information • A better way to incorporate this information is to define dummies, one for each rating level other than the reference level • Other examples: • Education groups • Age groups • Monthly or seasonal effects
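Turning an ordinal rating into a set of dummies might look as follows. This is a sketch with a hypothetical vector of ratings (the data and the choice of rating 0 as reference group are assumptions for illustration):

```python
import numpy as np

credit_rating = np.array([0, 1, 2, 3, 4, 2, 4, 1])   # hypothetical city ratings, 0 = worst

# One dummy per rating level 1..4; rating 0 is the omitted reference group
levels = np.array([1, 2, 3, 4])
dummies = (credit_rating[:, None] == levels[None, :]).astype(int)

# Design matrix: intercept plus the four rating dummies
X = np.column_stack([np.ones(len(credit_rating)), dummies])
print(X)
```

Each dummy coefficient is then the difference in the bond rate relative to cities with rating 0, without forcing the step from one rating to the next to have the same effect.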
Interactions involving dummy variables • Interactions with dummies allow different slopes across groups • Example: wage equation with a female dummy, educ, and the interaction term female × educ • Labelled parts of the equation: interaction term; intercept for men; slope for men; intercept for women; slope for women • Interesting hypotheses • The return to education is the same for men and women • The whole wage equation is the same for men and women
Interactions involving dummy variables • Graphical illustration • Interacting both the intercept and the slope with the female dummy enables one to model completely independent wage equations for men and women
Interactions involving dummy variables • Estimated wage equation with interaction term • Does this mean that there is no significant evidence of lower pay for women at the same levels of educ, exper, and tenure? • No: this is only the effect for educ = 0. To answer the question one has to recenter the interaction term, e.g. around educ = 12.5 (= average education) • No evidence against the hypothesis that the return to education is the same for men and women
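The effect of recentering can be illustrated on simulated data. In this sketch (all coefficients, the seed, and the simplified model with only female and educ are assumptions, not the estimated equation from the slide), the female coefficient in the recentered regression equals the male-female gap evaluated at educ = 12.5:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
female = rng.integers(0, 2, size=n)
educ = rng.integers(8, 19, size=n).astype(float)
# Simulated log wage with a female intercept shift and a female x educ interaction
logwage = (0.4 - 0.25 * female + 0.08 * educ
           - 0.005 * female * educ + rng.normal(0, 0.4, size=n))

def ols(X, y):
    """Return OLS coefficients."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

ones = np.ones(n)
# Raw interaction: the female coefficient is the gap at educ = 0
b_raw = ols(np.column_stack([ones, female, educ, female * educ]), logwage)

# Recentered interaction: the female coefficient is the gap at educ = 12.5
educ_c = educ - 12.5
b_ctr = ols(np.column_stack([ones, female, educ_c, female * educ_c]), logwage)

gap_at_12_5 = b_raw[1] + 12.5 * b_raw[3]
print(b_raw[1], b_ctr[1], gap_at_12_5)
```

The interaction coefficient itself is unchanged by recentering; only the female coefficient and its standard error change, which is why the recentered regression gives a direct test of the gap at a meaningful education level.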
Testing for differences in regression functions across groups • Variables: college grade point average; standardized aptitude test score; high school rank percentile; total hours spent in college courses • Unrestricted model (contains full set of interactions) • Restricted model (same regression for both groups) • F-test for equal regressions. How many degrees of freedom in numerator? Denominator?
Testing for differences in regression functions across groups • Null hypothesis: all interaction effects are zero, i.e. the same regression coefficients apply to men and women • Estimation of the unrestricted model • Tested individually, the hypothesis that the interaction effects are zero cannot be rejected
Multiple Regression Analysis with Qualitative Information • Joint test with F-statistic • Null hypothesis is rejected • Chow test: alternative way to compute F-statistic in the given case • Run separate regressions for men and for women; the unrestricted SSR is given by the sum of the SSR of these two regressions • Run regression for the restricted model and store SSR • Important: Test assumes a constant error variance across groups
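The Chow computation described above can be sketched as follows, on simulated data with k = 2 regressors and two groups (the data-generating process and names are assumptions for illustration). With k regressors plus an intercept, the numerator has k + 1 degrees of freedom and the denominator n − 2(k + 1):

```python
import numpy as np

def ssr(X, y):
    """Sum of squared OLS residuals from regressing y on X."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    return float(resid @ resid)

rng = np.random.default_rng(3)
n, k = 300, 2
group = rng.integers(0, 2, size=n)     # simulated group dummy
x = rng.normal(size=(n, k))
y = (1.0 + x @ np.array([0.5, -0.3]) + 0.8 * group
     + 0.4 * group * x[:, 0] + rng.normal(size=n))

X = np.column_stack([np.ones(n), x])

# Restricted model: one pooled regression for both groups
ssr_r = ssr(X, y)
# Unrestricted model: separate regressions, SSRs added up
ssr_ur = ssr(X[group == 0], y[group == 0]) + ssr(X[group == 1], y[group == 1])

q = k + 1                   # restrictions: intercept shift plus k slope interactions
df_denom = n - 2 * (k + 1)  # parameters in the unrestricted model: 2(k + 1)
F = ((ssr_r - ssr_ur) / q) / (ssr_ur / df_denom)
print(F, q, df_denom)
```

The resulting statistic is compared with the F(q, df_denom) critical value. The summed SSR from the two separate regressions is the same as the SSR from a single pooled regression that includes the group dummy and all of its interactions.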
The linear probability model • Linear regression when the dependent variable is binary • If the dependent variable only takes on the values 1 and 0, then E(y|x) = P(y=1|x), which gives the linear probability model (LPM) • In the linear probability model, the coefficients describe the effect of the explanatory variables on the probability that y=1
The linear probability model • Example: Labor force participation of married women • Dependent variable: = 1 if in labor force, = 0 otherwise • nwifeinc: non-wife income (in thousand dollars per year) • If the number of kids under six years increases by one, the probability that the woman works falls by 26.2 percentage points • Does not look significant (but is it "exogenous", i.e. Cov(kids, error) = 0?)
Multiple Regression Analysis with Qualitative Information • Example: Female labor participation of married women (cont.) • Graph for nwifeinc=50, exper=5, age=30, kidslt6=1, and kidsge6=0 • The maximum level of education in the sample is educ=17. For the given case, this leads to a predicted probability of being in the labor force of about 50% • Negative predicted probability at low education levels, but no problem because no woman in the sample has educ < 5
Multiple Regression Analysis with Qualitative Information • Disadvantages of the linear probability model • Predicted probabilities may be larger than one or smaller than zero • Marginal probability effects sometimes logically impossible • The linear probability model is necessarily heteroskedastic: Var(y|x) = P(y=1|x)[1 − P(y=1|x)], the variance of a Bernoulli variable • Heteroskedasticity-consistent standard errors need to be computed • Advantages of the linear probability model • Easy estimation and interpretation • Estimated effects and predictions are often reasonably good in practice
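A linear probability model with heteroskedasticity-consistent standard errors can be sketched in a few lines. This example uses simulated data and White's HC0 estimator computed by hand (the data-generating process is an assumption; other robust variants such as HC1 differ only by a degrees-of-freedom factor):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000
x = rng.uniform(0, 1, size=n)
p_true = 0.2 + 0.6 * x                         # simulated P(y=1|x), always inside [0, 1]
y = (rng.uniform(size=n) < p_true).astype(float)

# OLS on the binary outcome = linear probability model
X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
resid = y - X @ beta

# Usual OLS standard errors (valid only under homoskedasticity)
sigma2 = resid @ resid / (n - X.shape[1])
se_ols = np.sqrt(np.diag(sigma2 * XtX_inv))

# White (HC0) heteroskedasticity-consistent standard errors
meat = X.T @ (X * resid[:, None] ** 2)
se_hc0 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

fitted = X @ beta
print("coefficients:", beta)
print("OLS se:", se_ols, " robust se:", se_hc0)
print("fitted probabilities outside [0, 1]:", int(((fitted < 0) | (fitted > 1)).sum()))
```

The last line counts fitted values outside the unit interval, which is the first disadvantage listed above; whether any occur depends on the data.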
Multiple Regression Analysis with Qualitative Information • More on policy analysis and program evaluation • Example: Effect of job training grants on worker productivity • Dependent variable: the firm's scrap rate • Grant dummy: = 1 if firm received training grant, = 0 otherwise • No apparent effect of grant on productivity • Treatment group: grant receivers; control group: firms that received no grant • Grants were given on a first-come, first-served basis. This is not the same as giving them out randomly. It might be the case that firms with less productive workers saw an opportunity to improve productivity and applied first.
Multiple Regression Analysis with Qualitative Information • Self-selection into treatment as a source of endogeneity • In the given and in related examples, the treatment status is probably related to other characteristics that also influence the outcome • The reason is that subjects self-select into treatment depending on their individual characteristics and prospects • Experimental evaluation • In experiments, assignment to treatment is random • In this case, causal effects can be inferred using a simple regression • The dummy indicating whether or not there was treatment is unrelated to other factors affecting the outcome
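The contrast between random assignment and self-selection can be illustrated by simulation. In this sketch the true treatment effect is set to 1.0 and an unobserved factor called ability affects the outcome (all names and numbers are assumptions for illustration); units with low ability are more likely to select into treatment, loosely mirroring firms with less productive workers applying first:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20000
ability = rng.normal(size=n)   # unobserved factor that also affects the outcome
true_effect = 1.0

def dummy_coef(d, y):
    """Coefficient on the treatment dummy in a simple regression of y on d."""
    X = np.column_stack([np.ones(len(d)), d])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Random assignment: treatment is unrelated to ability
d_random = rng.integers(0, 2, size=n).astype(float)
y_random = 2.0 + true_effect * d_random + ability + rng.normal(size=n)

# Self-selection: low-ability units are more likely to take the treatment
d_self = (ability + rng.normal(size=n) < 0).astype(float)
y_self = 2.0 + true_effect * d_self + ability + rng.normal(size=n)

est_random = dummy_coef(d_random, y_random)
est_self = dummy_coef(d_self, y_self)
print("random assignment:", est_random)
print("self-selection:   ", est_self)
```

Under random assignment the simple regression recovers an estimate close to the true effect; under self-selection the dummy is correlated with the error term and the estimate is biased, here downward.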
Multiple Regression Analysis with Qualitative Information • Further example of an endogenous dummy regressor • Are nonwhite customers discriminated against? • Dependent variable: dummy indicating whether loan was approved • Regressors: race dummy, credit rating • It is important to control for other characteristics that may be important for loan approval (e.g. profession, unemployment) • Omitting important characteristics that are correlated with the nonwhite dummy will produce spurious evidence for discrimination