Learn about qualitative independent variables and how to incorporate dummy variables into regression models. Understand how to interpret regression results with dummy variables.
Chapter 15, part D Qualitative Independent Variables
VI. Qualitative Independent Variables
For most of our models we have restricted our independent variables to quantitative data, variables that can take any value in a range. Past examples include: Salary, G.P.A., # of Customers, Repair Cost ($).
Qualitative variables are those that fall into one of two or more categories (Gender, Political Party, Region of Country). They enter a regression as dummy variables.
A. A Dummy Variable
The simplest dummy variable is one for a qualitative variable with only two possibilities. You arbitrarily assign a value of 1 to one possibility and a value of 0 to the other.
Examples:
X=1 if Female; X=0 if Male
X=1 if Union worker; X=0 if Nonunion
X=1 if College Graduate; X=0 if not
B. Inclusion in a Regression
Problem #38 builds a model to relate Age (x1), Blood Pressure (x2) and Smoking (x3) to the Risk of Strokes (y). Smoking is a dummy variable: x3=1 if a smoker; x3=0 if a nonsmoker.
Output
Overall, what do you make of these results?
C. Interpretation
The estimated coefficient on the dummy for smoking is 8.74. Since x3=1 for a smoker and x3=0 for a nonsmoker, the estimated probability that a patient has a stroke in the next 10 years is 8.74 percentage points higher for a smoker than for a nonsmoker of the same age and blood pressure.
You can’t do much about your age, but if you lower your blood pressure by 10 points, you lower the estimated risk by about 2.5 percentage points. Hmmm, what should a person do?
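A minimal Python sketch of why the dummy coefficient acts as a pure shift in the estimated risk. Only the 8.74 coefficient is stated on the slide, and the 0.25 blood-pressure coefficient is implied by the "10 points, 2.5" remark. The intercept and age coefficient are hypothetical placeholders, since the regression output is not reproduced here, and the function names are illustrative.

```python
# B_SMOKER comes from the slide; B_PRESSURE is implied by "10 points -> 2.5".
# B0 and B_AGE are HYPOTHETICAL placeholders, not values from the output.
B0 = -90.0         # hypothetical intercept
B_AGE = 1.0        # hypothetical coefficient on age (x1)
B_PRESSURE = 0.25  # implied coefficient on blood pressure (x2)
B_SMOKER = 8.74    # coefficient on the smoking dummy (x3)


def predicted_risk(age, pressure, smoker):
    """Estimated stroke risk in percentage points; smoker must be 1 or 0."""
    return B0 + B_AGE * age + B_PRESSURE * pressure + B_SMOKER * smoker


def smoker_gap(age, pressure):
    """Difference in estimated risk between a smoker and a nonsmoker
    who share the same age and blood pressure."""
    return predicted_risk(age, pressure, 1) - predicted_risk(age, pressure, 0)


# The gap does not depend on age or pressure: it is always the dummy coefficient.
for age, pressure in [(50, 120), (70, 160)]:
    print(age, pressure, round(smoker_gap(age, pressure), 2))
```

Whatever values the placeholders take, they cancel in the subtraction, which is why the dummy coefficient can be read directly as the smoker/nonsmoker difference.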
D. Multi-level Dummy Variables
Many wage/salary regression models examine differences in a wage variable by region of the country. For example, we could divide the country into 4 regions and create one dummy variable per region, equal to 1 for a worker from that region and 0 for workers from all other regions.
Example
Suppose we have 3 workers in a set of data. Franklin is from the North, Elly May is from the South, and Chet is from the West. Our table of data might look like this:
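One way such a table could be built is sketched below in Python. The worker names and regions come from the example; the column order and the helper name `region_dummies` are assumptions made for illustration.

```python
# Column order is an assumption; the regions and workers come from the example.
REGIONS = ["North", "South", "East", "West"]
workers = {"Franklin": "North", "Elly May": "South", "Chet": "West"}


def region_dummies(region):
    """Return one 0/1 dummy per region: 1 for the worker's own region, else 0."""
    return {r: int(r == region) for r in REGIONS}


table = {name: region_dummies(region) for name, region in workers.items()}

# Print the data as a simple fixed-width table.
print(f"{'Worker':<10}" + "".join(f"{r:>7}" for r in REGIONS))
for name, row in table.items():
    print(f"{name:<10}" + "".join(f"{row[r]:>7}" for r in REGIONS))
```

Each worker has exactly one dummy equal to 1, and the "East" column is all zeros here only because none of the three workers is from the East.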
The Model
• If you have 4 levels for the qualitative variable “Region”, you can include only 3 of the dummies in an equation that has an intercept. The 4 dummies always sum to 1, the same as the intercept column, so including all 4 creates perfect multicollinearity and least squares cannot find a unique set of coefficients.
• The omission of one region creates a benchmark and allows you to compare all other regions to the one omitted.
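The problem with including all 4 dummies can be sketched in a few lines of Python. The coefficient values below are purely illustrative assumptions; the point is that two different coefficient sets produce identical fitted values for every region, so the sum of squared residuals cannot single out one of them.

```python
REGIONS = ["North", "South", "East", "West"]


def fitted(intercept, coefs, region):
    """Fitted wage for a worker from `region` when all four dummies are included."""
    return intercept + sum(coefs[r] * int(r == region) for r in REGIONS)


# Two ILLUSTRATIVE coefficient sets. set_b is set_a with 100 moved from the
# intercept onto every dummy, which is possible because the dummies sum to 1.
set_a = (100, {"North": 50, "South": -25, "East": 0, "West": -10})
set_b = (0, {"North": 150, "South": 75, "East": 100, "West": 90})

for region in REGIONS:
    print(region, fitted(*set_a, region), fitted(*set_b, region))
```

Any amount can be shifted between the intercept and the four dummies in this way without changing a single fitted value. Dropping one dummy removes that freedom and pins down the coefficients.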
Hypothetical Regression Results
Let’s say that we leave out “East” and we find the following:
Wage(Y) = 100 + 50(North) - 25(South) - 10(West)
Remember, “North”=1 only if a worker is from the North, and the other region dummies, “South” and “West”, are 0 for that worker. A worker from the East has all three dummies equal to 0, so the intercept, 100, is the estimated wage in the East.
Interpretation
Franklin is from the North, so “North”=1 and “South”=“West”=0. His estimated wage is then 100+50=$150. Thus we could say that a worker from the North, all else held constant, earns an estimated $50 more than a worker from the East, the omitted region.
Continued...
Elly May is from the South, so “South”=1 and “North”=“West”=0. Her estimated wage is then 100-25=$75, or $25 less than a comparable worker from the East.
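The same arithmetic can be sketched in Python for every worker in the example, using the hypothetical equation Wage(Y) = 100 + 50(North) - 25(South) - 10(West). The function name `estimated_wage` is illustrative.

```python
# Coefficients from the hypothetical regression; East is the omitted benchmark,
# so it has no coefficient and its estimated wage is just the intercept.
INTERCEPT = 100
COEFS = {"North": 50, "South": -25, "West": -10}


def estimated_wage(region):
    """Intercept plus the coefficient on the worker's region dummy (0 for East)."""
    return INTERCEPT + COEFS.get(region, 0)


workers = {"Franklin": "North", "Elly May": "South", "Chet": "West"}
for name, region in workers.items():
    wage = estimated_wage(region)
    gap = wage - estimated_wage("East")
    print(f"{name} ({region}): ${wage}, {gap:+d} vs. East")
```

Each region's coefficient is simply its estimated wage minus the estimated wage in the East, which is what makes the omitted region the benchmark.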