
Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly



Presentation Transcript


  1. Business Statistics: Communicating with Numbers By Sanjiv Jaggia and Alison Kelly

  2. Chapter 17 Learning Objectives (LOs) LO 17.1: Use dummy variables to capture a shift of the intercept. LO 17.2: Test for differences between the categories of a qualitative variable. LO 17.3: Use dummy variables to capture a shift of the intercept and/or slope. LO 17.4: Use a linear probability model to estimate a binary response variable. LO 17.5: Interpret the results from a logit model.

  3. Is There Evidence of Wage Discrimination? • Three Seton Hall professors recently learned in a court decision that they could pursue their lawsuit alleging the University paid higher salaries to younger instructors and male professors. • Mary Schweitzer works in human resources at another college and has been asked by the college to test for age and gender discrimination in salaries. • She gathers data on 42 professors, including the salary, experience, gender, and age of each.

  4. Is There Evidence of Wage Discrimination? • Using this data set, Mary hopes to: • Test whether salary differs by a fixed amount between males and females. • Determine whether there is evidence of age discrimination in salaries. • Determine if the salary difference between males and females increases with experience.

  5. 17.1 Dummy Variables LO 17.1 Use dummy variables to capture a shift of the intercept. • In previous chapters, all the variables used in regression applications have been quantitative. • In empirical work it is common to have some variables that are qualitative: the values represent categories that may have no implied ordering. • We can include these factors in a regression through the use of dummy variables. • A dummy variable for a qualitative variable with two categories assigns a value of 1 for one of the categories and a value of 0 for the other.

  6. Variables with Two Categories LO 17.1 • For example, suppose we are interested in determining the impact of gender on salary. We might first define a dummy variable d that has the following structure: Let d = 1 if gender = “female” and d = 0 if gender = “male.” • This allows us to include a measure for gender in a regression model and quantify the impact of gender on salary.

  7. Regression with a Dummy Variable LO 17.1

  8. Regression with a Dummy Variable LO 17.1

  9. Regression with a Dummy Variable LO 17.1 Graphically, we can see how the dummy variable shifts the intercept of the regression line.
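
The intercept shift described here can be illustrated with a minimal sketch on synthetic data. The numbers below (a base of 40, a slope of 1.5 per year of experience, and a shift of -5 when d = 1) are invented for illustration and are not the textbook's estimates; `np.linalg.lstsq` stands in for whatever regression software is used.

```python
import numpy as np

# Hypothetical, noise-free data: salary depends on experience (x) and a gender dummy (d).
x = np.array([1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0])
d = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])  # 1 = female, 0 = male
y = 40.0 + 1.5 * x - 5.0 * d  # assumed coefficients, for illustration only

# Design matrix: intercept column, experience, dummy.
X = np.column_stack([np.ones_like(x), x, d])
b0, b1, b2 = np.linalg.lstsq(X, y, rcond=None)[0]

# Both groups share the slope b1; only the intercept differs.
intercept_male = b0          # d = 0
intercept_female = b0 + b2   # d = 1
print(f"slope={b1:.2f}, intercept(d=0)={intercept_male:.2f}, "
      f"intercept(d=1)={intercept_female:.2f}")
```

The two fitted lines are parallel: the dummy coefficient b2 is the vertical distance between them at every level of experience.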

  10. Salaries, Gender, and Age LO 17.1

  11. Estimation Results LO 17.1

  12. Testing the Significance of Dummy Variables LO 17.2 Test for differences between the categories of a qualitative variable. • The statistical tests discussed in Chapter 15 remain valid for dummy variables as well. • We can perform a t-test for individual significance, form a confidence interval using the parameter estimate and its standard error, and conduct a partial F test for joint significance.

  13. Example 17.2 LO 17.2

  14. Multiple Categories LO 17.2

  15. Multiple Categories LO 17.2

  16. Avoiding the Dummy Variable Trap LO 17.2 • Given the intercept term, we exclude one of the dummy variables from the regression, where the excluded variable represents the reference category against which the others are assessed. • If we included as many dummy variables as categories, the dummy variables would sum to the intercept column, creating perfect multicollinearity, and such a model cannot be estimated. • So, we include one fewer dummy variable than the number of categories of the qualitative variable.
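
The trap can be seen directly in the design matrix. The sketch below uses a made-up three-category variable with six observations (all names are hypothetical) and checks the column rank with and without the extra dummy.

```python
import numpy as np

# Hypothetical qualitative variable with three categories, six observations.
category = np.array([0, 1, 2, 0, 1, 2])
dummies = np.eye(3)[category]            # one 0/1 column per category
intercept = np.ones((len(category), 1))

# Trap: intercept plus all three dummies. The dummy columns sum to the intercept column.
X_trap = np.hstack([intercept, dummies])
# Fix: drop one dummy; category 0 becomes the reference category.
X_ok = np.hstack([intercept, dummies[:, 1:]])

rank_trap = np.linalg.matrix_rank(X_trap)  # rank is below the number of columns
rank_ok = np.linalg.matrix_rank(X_ok)      # full column rank
print("trap:", rank_trap, "of", X_trap.shape[1], "columns")
print("ok:  ", rank_ok, "of", X_ok.shape[1], "columns")
```

When the rank is below the number of columns, the least-squares coefficients are not uniquely determined, which is why one category must serve as the reference.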

  17. 17.2 Interactions with Dummy Variables LO 17.3 Use dummy variables to capture a shift of the intercept and/or slope.

  18. Modeling Interaction LO 17.3

  19. Shifts in the Intercept and the Slope LO 17.3 Graphically, we can see how both the intercept and the slope might be impacted.
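
One common way to write a model in which both the intercept and the slope can shift is shown below. The symbols (x for the quantitative variable, d for the dummy, b for the estimates) are assumed here for illustration rather than taken from the slides.

```latex
\begin{align*}
y &= \beta_0 + \beta_1 x + \beta_2 d + \beta_3 x d + \varepsilon \\
d = 0:\quad \hat{y} &= b_0 + b_1 x \\
d = 1:\quad \hat{y} &= (b_0 + b_2) + (b_1 + b_3)\, x
\end{align*}
```

Under this form, b_2 captures the shift in the intercept and b_3 the shift in the slope; setting b_3 = 0 gives back the parallel-lines model with a dummy variable only.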

  20. Testing for Significance LO 17.3

  21. Example 17.4 LO 17.3 • We return to the introductory case, and are interested in whether gender impacts salary differently at different levels of experience. Does additional experience get a higher reward for one gender over the other? • Since age was not significant earlier, we consider three models, one with a dummy variable for gender, one with an interaction variable between gender and experience, and one with both a dummy variable and an interaction variable. • As before, we keep experience as a quantitative explanatory variable.

  22. Estimation Results LO 17.3

  23. Predicted Salaries LO 17.3 • The interaction term allows for male professors to have a different slope coefficient than female professors. • Conceptually, experience impacts the salary of each gender differently.

  24. 17.3 Binary Choice Models LO 17.4 Use a linear probability model to estimate a binary response variable. • So far, we have been considering models where dummy variables are used as explanatory variables. • There are, however, many applications where the variable of interest, the response variable, is binary. • Consumer choice literature has many applications including whether to buy a house, join a health club, or go to graduate school.

  25. The Linear Probability Model LO 17.4

  26. Weakness of LPM LO 17.4

  27. Example 17.5 LO 17.4

  28. Estimation Results LO 17.4

  29. Interpreting the Coefficients LO 17.4 • The coefficient of 0.0188 on Down Payment indicates that a 1 percentage point increase in the down payment increases the predicted probability of getting a loan by 0.0188, holding the income-to-loan ratio constant. • Similarly, a 1 percentage point increase in the income-to-loan ratio increases the predicted probability of getting a loan by 0.0258. • One shortcoming of the LPM: if the down payment and the income-to-loan ratio are 60% and 30%, respectively, then the predicted probability is a whopping 1.0338, an impossible probability!
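
The arithmetic behind the out-of-range prediction can be checked with a short sketch. The two slope coefficients are the ones quoted above. The intercept is not quoted in this text, so the value used here is an assumption backed out from the stated prediction of 1.0338 at (60, 30). The second applicant, at (5, 10), is hypothetical.

```python
# Slope coefficients as quoted on the slide.
b_down = 0.0188      # down payment (%)
b_income = 0.0258    # income-to-loan ratio (%)
# Assumption: intercept implied by the slide's own prediction of 1.0338 at (60, 30).
b0 = 1.0338 - b_down * 60 - b_income * 30

def lpm_probability(down, income_ratio):
    """Linear probability model: a plain linear combination, not bounded to [0, 1]."""
    return b0 + b_down * down + b_income * income_ratio

p_high = lpm_probability(60, 30)   # the slide's case, above 1
p_low = lpm_probability(5, 10)     # hypothetical applicant, below 0
print(round(b0, 4), round(p_high, 4), round(p_low, 4))
```

Because the fitted value is linear in the explanatory variables, nothing stops it from leaving the unit interval at either end.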

  30. The Logit Model LO 17.5 Interpret the results from a logit model. • In order to address the problem of the LPM that the predicted probabilities may be negative or greater than 1, we consider an alternative specification called the logistic model, more typically referred to as a logit model. • A logit model uses a nonlinear regression function that ensures that the predicted probability is always between 0 and 1. • However, this feature comes with a cost, as interpreting the coefficients becomes more complicated and estimation cannot be done by OLS.
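
The nonlinear function in question is the logistic function, P = exp(z) / (1 + exp(z)), where z is the usual linear index b0 + b1*x. A minimal sketch, with the function name and the sample values of z chosen here for illustration:

```python
import math

def logit_probability(z):
    """Logistic function exp(z) / (1 + exp(z)), written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# However extreme the linear index z, the result stays between 0 and 1.
for z in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(z, logit_probability(z))
```

The effect of a one-unit change in x on the probability depends on where z starts, which is why logit coefficients cannot be read as constant changes in probability the way LPM coefficients can.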

  31. Logistic Regression LO 17.5

  32. Logit versus Linear Probability Model LO 17.5

  33. Example 17.6 LO 17.5 • An educator wants to determine if a student’s interest in science is linked with the student’s GPA. • She uses Minitab to estimate a logit model where a student’s choice of field (1=science, 0=other) is predicted by GPA. • With a p-value of 0.0012, GPA is indeed a significant factor in predicting whether a student chooses science.

  34. Predicted Field Choice LO 17.5

  35. Example 17.7 LO 17.5

  36. Prediction Comparison LO 17.5 • Unlike the linear probability model, the logit model never predicts probabilities less than zero or greater than one. • Therefore, it is generally preferable to use the logit model rather than the linear probability model.
