Multiple Regression. MR Example with dummy variables. Problem / Background.
Multiple Regression MR Example with dummy variables
Problem / Background • The manager of a small sales force wants to know whether average monthly salary is different for males and females in the sales force. He obtains data on monthly salary and experience (in months) for each of the 9 employees as shown on the next slide.
You can use the data in the table below for replicating results shown on the following slides
Creating a dummy variable for gender • Categorical data is included in regression analysis by using dummy variables • For example, we can assign a value of 0 for males and 1 for females in our data so that an MR model can be developed
What are dummy variables? • Dummy variables, also called indicator variables, allow us to include categorical data (like Gender) in regression models • A dummy variable can take only 2 values: 0 (absence of a category) and 1 (presence of a category) • In our example, we set the dummy variable gender to 1 when the employee is female and 0 when the employee is not female • When interpreting results for gender, we remember that when the dummy variable is 0 (not female), we are talking about males
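The coding rule above can be sketched in a few lines of Python. The gender labels in the list are made up for illustration and are not the employee data from the slides; the names `genders` and `female_dummy` are likewise hypothetical.

```python
# Illustrative labels only (assumption): NOT the 9 employees from the slides.
genders = ["M", "F", "F", "M"]

# Dummy variable: 1 when the category "female" is present, 0 otherwise.
female_dummy = [1 if g == "F" else 0 for g in genders]

for label, dummy in zip(genders, female_dummy):
    print(label, dummy)
```

A value of 0 here means "not female", which in this two-category example means male.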
Regression analysis: Salary vs. Gender. Fitted model: Salary = 9.7 - 1.175*Gender. Predicted salary for males: Salary = 9.7 - 1.175*0 = 9.7. Predicted salary for females: Salary = 9.7 - 1.175*1 = 8.525. However, the difference between male and female salaries is NOT statistically significant, because the p-value for gender (p = 0.389) is well above conventional significance levels such as 0.10.
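As a quick check of the arithmetic, the two predictions can be reproduced from the reported intercept and slope. This is only a sketch that plugs in the coefficients quoted on the slide; the function name `predict_salary` is an illustrative choice.

```python
# Coefficients quoted on the slide for the simple regression Salary vs. Gender.
INTERCEPT = 9.7        # predicted salary when gender = 0 (male)
GENDER_SLOPE = -1.175  # change in predicted salary when gender = 1 (female)

def predict_salary(gender):
    """Predicted salary; gender is 0 for male and 1 for female."""
    return INTERCEPT + GENDER_SLOPE * gender

male_salary = predict_salary(0)
female_salary = predict_salary(1)
print(f"male: {male_salary:.3f}  female: {female_salary:.3f}")
```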
More on the intercept and slope • The value of the intercept, 9.70, is the average salary for males (as we coded gender=1 for females and 0 for males) • The value of the slope, -1.175, tells us that the average female salary is lower than the average male salary by 1.175
Coding issues • What would have happened if we had used 0 for females and 1 for males in our data? Would our results be any different? • Not really: with that coding, the intercept would change to 8.525 (the average female salary), and the slope for gender would still have magnitude 1.175 but with a positive sign (reflecting that average male salary is higher than average female salary by 1.175). Predicted salaries from the model for males and females do not change, no matter how the dummy variable is coded
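The claim that the coding does not affect predictions can be checked directly. The sketch below compares the model under the original coding (female = 1) with the recoded model described on this slide (male = 1); the function names are illustrative.

```python
def predict_female_coded(is_female):
    """Original coding: gender = 1 for females, 0 for males."""
    return 9.7 - 1.175 * is_female

def predict_male_coded(is_male):
    """Reversed coding: gender = 1 for males, 0 for females."""
    return 8.525 + 1.175 * is_male

differences = {}
for label, is_female in (("male", 0), ("female", 1)):
    original = predict_female_coded(is_female)
    recoded = predict_male_coded(1 - is_female)
    differences[label] = abs(original - recoded)
    print(f"{label}: original={original:.3f}  recoded={recoded:.3f}")
```

Both codings give the same predicted salary for each group; only the meaning of the intercept and the sign of the slope change.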
Using additional information • The analyst decides to use additional information to explain employee salary: the employee’s experience at this company (months employed) • Gender is coded as 0 for males and 1 for females
Multiple regression: Salary vs. Gender and Experience • Is the model valid? YES; significance F is much smaller than 0.10 • Is gender significant (α = 0.10)? YES; the p-value for gender is smaller than 0.10
Is the multiple regression model better than the simple regression model? • Was gender significant in the simple regression model? • How do you explain the significant effect of gender in the multiple regression model? • What is the salary equation for men? • What is the salary equation for women?
More on dummy variables • For gender, we had only 2 categories – female and male – thus we used a single 0/1 variable for this • When there are more than 2 categories, the number of dummy variables that should be used equals the number of categories minus 1 • No. of Dummy Variables = No. of levels -1
Example: Salary vs. Job Grade • In this example, the categorical variable job grade has 3 levels, 1 (lowest grade), 2, and 3 (highest job grade)
Dummy variables for a categorical variable with 3 levels • We could create 3 dummy variables for job grade as follows: • Job_1=1 if job grade=1, zero otherwise • Job_2=1 if job grade=2, zero otherwise • Job_3=1 if job grade=3, zero otherwise • However, we should only use (any) 2 in the regression model to represent the three levels (the reason is technical: the three dummies always add up to 1, so once two of them are known the third is redundant and the model with an intercept cannot separate their effects)
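The redundancy can be seen by listing all three dummies for each job grade. In this small sketch (the helper name `all_level_dummies` is hypothetical), the three values always sum to 1, so Job_3 can always be computed from Job_1 and Job_2.

```python
def all_level_dummies(grade):
    """Return (Job_1, Job_2, Job_3) for a job grade of 1, 2, or 3."""
    return tuple(int(grade == level) for level in (1, 2, 3))

redundancy_holds = True
for grade in (1, 2, 3):
    job_1, job_2, job_3 = all_level_dummies(grade)
    # Job_3 carries no information beyond Job_1 and Job_2.
    if job_3 != 1 - job_1 - job_2:
        redundancy_holds = False
    print(grade, job_1, job_2, job_3, "sum =", job_1 + job_2 + job_3)
```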
Representing 3-level job grade with two dummy variables • In the scheme below, job grades 1 and 2 will be explicitly represented using their own dummy variable while grade 3 will become reference level: • For each employee, we create 2 new dummy variables called Job_1 and Job_2 • For employees whose Job grade=1, we set Job_1 equal to 1 and Job_2 equal to zero • For employees whose Job grade=2, we set Job_1 equal to zero and Job_2 equal to 1 • Employees whose Job grade=3 are represented when both Job_1 and Job_2 are equal to zero (thus job grade=3 becomes the ‘reference’ or ‘default’ category)
Representing 3-level Job Grade using dummy variables Job_1 and Job_2 Job Grade 3 is the reference category
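The coding scheme with job grade 3 as the reference category can be written out as a small lookup. This is a sketch of the scheme described above, not the data from the slides; `encode_job_grade` and `coding_table` are illustrative names.

```python
def encode_job_grade(grade):
    """Return the Job_1 and Job_2 dummies; grade 3 is the reference level."""
    if grade not in (1, 2, 3):
        raise ValueError("job grade must be 1, 2, or 3")
    return {"Job_1": int(grade == 1), "Job_2": int(grade == 2)}

coding_table = {grade: encode_job_grade(grade) for grade in (1, 2, 3)}

print("Grade  Job_1  Job_2")
for grade, dummies in coding_table.items():
    print(f"{grade:>5}  {dummies['Job_1']:>5}  {dummies['Job_2']:>5}")
```

Two dummies cover three levels, in line with the rule that the number of dummy variables equals the number of levels minus 1.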
Ready to use dummy variables? Drug Effectiveness Study: A pharmaceutical company wants to study the effectiveness of three different versions of a drug. The company refers to these versions as A, B and C. It runs a clinical study in which each patient is treated with one of the three versions. Data on drug effectiveness, age, and the version of the drug taken are provided for 36 patients in the spreadsheet. The company wants to know whether the three versions of the drug are equal in their effectiveness. It also wants to know whether age influences the effectiveness of these three versions. Use a multiple regression model with dummy variables for Drug version to answer these questions. Create dummy variables for Drug versions A and B, making version C the reference level. You can follow the method described on slides 15 / 16 for creating dummy variables.
Model Results • Your regression equation is right if it matches the one shown below: • Effectiveness = 22.29 + 0.66*Age + 10.25*Drug_A + 0.45*Drug_B • For a 50-year-old patient, predict effectiveness if she takes: • Drug_A • Drug_B • Drug_C
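One way to work the prediction exercise is to plug the values into the equation above, remembering that Drug_C is the reference level (both dummies equal to zero). The sketch below assumes only the coefficients shown on this slide; the function name `predict_effectiveness` is illustrative.

```python
# Coefficients from the regression equation on the slide.
INTERCEPT = 22.29
AGE_COEF = 0.66
DRUG_A_COEF = 10.25
DRUG_B_COEF = 0.45

def predict_effectiveness(age, version):
    """Predicted effectiveness for drug version 'A', 'B', or 'C' (reference)."""
    if version not in ("A", "B", "C"):
        raise ValueError("version must be 'A', 'B', or 'C'")
    drug_a = 1 if version == "A" else 0
    drug_b = 1 if version == "B" else 0
    return INTERCEPT + AGE_COEF * age + DRUG_A_COEF * drug_a + DRUG_B_COEF * drug_b

predictions = {v: predict_effectiveness(50, v) for v in ("A", "B", "C")}
for version, value in predictions.items():
    print(f"Drug_{version}: {value:.2f}")
```

The gap between versions A and C equals the Drug_A coefficient, and the gap between B and C equals the Drug_B coefficient, at any age.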