Welcome to BUAD 310 Instructor: Kam Hamidieh Lecture 26, Wednesday April 30, 2014
Agenda & Announcement
• Do a big Multiple Regression problem. This will serve as a review for the final too.
• Cover a couple of minor but interesting topics.
• Go over slides on ANCOVA (Analysis of Covariance), Chapter 25, on your own. (Not on the final exam.)
BUAD 310 - Kam Hamidieh
Some Important Dates
• Case Study due today, April 30, by 5 PM.
• Homework 7, now posted, is due on Friday, May 2, by 5 PM. I will have extra office hours on Thursday, May 1, 3-5 PM.
• Final Exam on Thursday, May 8th, 11 AM – 1:00 PM, in room THH 101. See http://web-app.usc.edu/maps/
About The Final Exam
• Time: Thursday, May 8th, 2014, 11 AM – 1 PM
• Location: THH 101. See: http://web-app.usc.edu/maps/
• No cell phones are allowed!
• It is comprehensive. Anything from any time during the semester is fair game. However, 40% to 50% of the questions are on multiple regression.
• A high-level list of topics covering midterm 2 to now will be posted.
• You’ll have 40 questions.
• The cover sheet will be posted a few days before the exam. You’ll need to sign it.
• You’ll be allowed three cheat sheets, both sides okay, handwritten.
How To Study For the Final
• Review multiple regression *very* thoroughly from our slides and in-class exercises
• Retake our exam 1 & exam 2
• Take the final from last semester
• Rework our review problems for exams 1 and 2
• Retake last semester's exams 1 & 2
• Go over the slides and carefully rework the in-class exercises
• Rework homework problems
While you are doing the above, prepare your cheat sheets. All final exam related material will be posted.
Extra Office Hours
• My extra office hours:
  • Tuesday 5/6: 3:00-6:00 PM
  • Wednesday 5/7: 3:00-6:00 PM
• Courtney Paulson, our TA, office hours at ACC 303:
  • Monday, 5/5: 1:00-3:00
  • Tuesday, 5/6: 9:00-11:00
  • Wednesday, 5/7: 9:00-11:00 and 1:00-3:00
  • Thursday, 5/8: 9:00-11:00
What to Bring to the Final Exam
• Your three-page cheat sheet, both sides OK, handwritten, 8.5 by 11 inches
• Z and T tables (I will not have any extras!)
• Scantron sheet
• Pencils
• Eraser
• Calculator (must have natural log and exp functions)
From Last Time
• Multicollinearity:
  • When predictors are highly correlated, you can get unstable regression results.
  • You can check: plots, correlation matrix, VIF, regression output
• Confidence and Prediction Intervals:
  • Approximate 95% prediction interval:
  • Use software to get CI for mean response & PI for a single value
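As a reminder of what the VIF check measures, here is a minimal sketch for the special case of exactly two predictors, where VIF = 1 / (1 − r²) and r is the correlation between the two predictors. The function names and the small data lists are hypothetical illustrations, not course data; with more than two predictors you would let software compute the VIFs.

```python
def pearson_r(x, y):
    """Sample correlation between two equal-length lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def vif_two_predictors(x1, x2):
    """VIF for either predictor when the model has exactly two predictors."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictors: x2 nearly duplicates x1, x3 is uncorrelated with x4.
x1 = [1, 2, 3, 4, 5, 6]
x2 = [1.1, 2.0, 2.9, 4.2, 5.1, 5.8]
x3 = [1, 2, 3, 4]
x4 = [1, -1, -1, 1]

vif_collinear = vif_two_predictors(x1, x2)      # large: predictors overlap heavily
vif_uncorrelated = vif_two_predictors(x3, x4)   # exactly 1: no overlap at all
print(round(vif_collinear, 1), vif_uncorrelated)
```

A VIF near 1 says the predictor carries information the other does not; a large VIF says the two predictors are close to redundant, which is when the coefficient estimates become unstable.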
In Class Exercise 1
This will be handed out in class.
Set up for Simpson’s Paradox (Source: Introduction to the Practice of Statistics by Moore, McCabe, Craig, 7th ed.)
A customer service center has a goal of resolving customer questions in 10 minutes or less. Below are two representatives. The data were collected over a two-week period. Who has the better success rate, Arnold or Paul?
A Closer Look at the Data
Now the data are broken down by week. Who did better in week 1? Who did better in week 2? How do the week-by-week results compare to the combined results?
Simpson’s Paradox
• The combined results showed that Arnold did better. However, the disaggregated (weekly) data show that Paul beat Arnold every week. Strange?!?!
• An association or comparison that holds for several groups can reverse direction when the data are combined to form a single group. This reversal is called Simpson’s Paradox.
More on Simpson’s Paradox
• The results can be explained by a lurking variable: week.
• Simpson’s paradox is an extreme form of the fact that observed associations can be misleading when there are lurking variables.
• When we aggregated the weekly data in our example, we ignored the variable week, which then became a lurking variable.
• Conclusions that seem obvious when we look only at aggregated data can become quite different when the data are examined in more detail.
• See http://en.wikipedia.org/wiki/Simpson's_paradox for more examples.
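To see the reversal in numbers, here is a small sketch. The counts below are hypothetical, chosen only to reproduce the pattern described above (Paul wins each week, Arnold wins overall); they are not the table from the slide. The key feature is that Arnold took most of his calls in the easy week and Paul took most of his in the hard week.

```python
# Hypothetical counts: (calls resolved within 10 minutes, total calls).
# These are NOT the slide's data, just numbers that show the same reversal.
calls = {
    "Arnold": {"week1": (162, 180), "week2": (12, 20)},
    "Paul": {"week1": (19, 20), "week2": (126, 180)},
}


def weekly_rates(rep):
    """Success rate for each week separately."""
    return {week: resolved / total for week, (resolved, total) in calls[rep].items()}


def combined_rate(rep):
    """Success rate after pooling both weeks (this ignores the variable week)."""
    resolved = sum(r for r, _ in calls[rep].values())
    total = sum(t for _, t in calls[rep].values())
    return resolved / total


for rep in calls:
    print(rep, weekly_rates(rep), round(combined_rate(rep), 3))
```

With these made-up counts Paul has the higher rate in week 1 and in week 2, yet Arnold has the higher pooled rate, because the pooled rate weights each week by how many calls that representative happened to take.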
Properties of Estimators
• Throughout the course we have been estimating unknown parameters. Some examples:
• The estimators are random variables.
• Statisticians study the theoretical properties of the estimator.
Properties of Estimators
• An estimator used to estimate a parameter is unbiased if the mean of its sampling distribution is equal to the true value of the parameter being estimated.
• Theoretically:
  Unbiased if: E[ Estimator ] = population parameter
  Biased if: E[ Estimator ] ≠ population parameter
• All of the estimators we have talked about are unbiased (as long as our assumptions are met).
• Now I can explain why we divided by n-1 and not n in the definition of the sample standard deviation!
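Bias in Sample Standard Deviation
• It seems more “natural” to define the sample standard deviation, which estimates the population standard deviation, with n in the denominator:
  s = sqrt( Σ (xi − x̄)² / n )
• However, it can be shown that E[ s ] ≠ σ, that is, s as defined above is biased (it tends to be too small).
• Interestingly, if we replace n by n – 1 in the above definition of s, the squared estimator becomes unbiased: E[ s² ] = σ². (Strictly speaking, s itself still slightly underestimates σ, but the bias is small and shrinks as n grows.)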
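You can check the n versus n − 1 claim by simulation. The sketch below is only an illustration under assumed settings (normal population, σ = 2, samples of size 5, an arbitrary seed); none of these values come from the slides. It averages the two versions of the sample variance over many samples and compares the averages with σ² = 4.

```python
import random

random.seed(310)   # arbitrary seed so the run is repeatable

SIGMA = 2.0        # assumed true population standard deviation
N = 5              # a small sample size makes the bias easy to see
REPS = 20000       # number of simulated samples

sum_var_n = 0.0    # running total of variances computed with n
sum_var_n1 = 0.0   # running total of variances computed with n - 1
sum_sd_n1 = 0.0    # running total of standard deviations computed with n - 1

for _ in range(REPS):
    sample = [random.gauss(0.0, SIGMA) for _ in range(N)]
    xbar = sum(sample) / N
    ss = sum((x - xbar) ** 2 for x in sample)   # sum of squared deviations
    sum_var_n += ss / N
    sum_var_n1 += ss / (N - 1)
    sum_sd_n1 += (ss / (N - 1)) ** 0.5

mean_var_n = sum_var_n / REPS     # theory: (n - 1)/n * sigma^2 = 3.2
mean_var_n1 = sum_var_n1 / REPS   # theory: sigma^2 = 4
mean_sd_n1 = sum_sd_n1 / REPS     # theory: a little below sigma = 2

print(round(mean_var_n, 2), round(mean_var_n1, 2), round(mean_sd_n1, 2))
```

The n version should average noticeably below 4, the n − 1 version should average close to 4, and the average of s should sit slightly below 2, which is the small remaining bias mentioned above.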
Analysis of Covariance (On your own!)
• In the multiple regression models we have seen so far, all the variables are numerical; none are categorical.
• How do you incorporate categorical variables into your regression model?
• Analysis of covariance (ANCOVA) combines categorical and numerical predictor variables.
Example
We have a sample of salaries of n = 174 mid-level managers at a large firm. The variables are:
• Salary = in 1,000’s of dollars
• Experience = years of experience
• Group = 1 for men, 0 for women
• Sex = male/female
Is there a difference between the salaries of the two groups?
Example – Summary Statistics
Let
μmen = population mean salary for men in this firm
μwomen = population mean salary for women in this firm
Summary Statistics:
Do you think there is a statistically significant difference between the two groups? What should you do?
Example – 95% CI for Difference
Let
μmen = population mean salary for men in this firm
μwomen = population mean salary for women in this firm
Results of 95% confidence interval for the difference μmen - μwomen:
The 95% CI says that, on average, men make $830 to $8,600 more than the women in this firm.
Example – Separate Regressions
Note: Both lines were statistically significant with small r² values. (See next slide.)
• The line rises faster for women. How could you tell? (Steeper slope)
• If the lines were parallel, there would be a constant salary gap. Why? (For a given X, years of experience, one group would always be higher than the other.)
• Beyond about 11 years of experience, it appears that women have the higher mean salary. Why? (Red line is above blue line)
Simple Regression Outputs
From a statistical perspective, the lines could be overlapping: they could have the same slope and intercept. Once we adjust for years of experience, there is no difference in the salaries.
ANCOVA
• We can combine the separate regression analyses into one regression.
• By combining the analyses into one, we can obtain better estimates since we have more data.
• We’ll need two new variables: a dummy variable and an interaction variable.
Dummy Variables & Interaction
• Combining the separate regressions for men and women requires a dummy variable identifying whether a manager is male or female (Group = 1 for men; Group = 0 for women).
• A dummy variable is a numerical variable, usually consisting of 0s and 1s, used to code a categorical variable.
• The group that gets coded as zero forms the baseline group.
• The interaction term here is just a new variable formed as the product Group × Years.
• When there is interaction between two predictor variables, the effect on the response variable of one predictor variable depends on the specific value of another predictor variable.
• See http://www.webmd.com/news/20121127/grapefruit-some-medications-risky
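Building the two new columns is mechanical. The sketch below shows one way to do it in Python; the four manager records and the column name GroupxYears are hypothetical and only illustrate the coding rule Group = 1 for men, Group = 0 for women.

```python
# Hypothetical records: (salary in $1,000s, years of experience, sex).
managers = [
    (140.0, 5, "male"),
    (150.0, 12, "female"),
    (135.0, 3, "female"),
    (148.0, 10, "male"),
]


def add_dummy_and_interaction(records):
    """Return one dict per manager with the Group dummy and Group x Years columns."""
    rows = []
    for salary, years, sex in records:
        group = 1 if sex == "male" else 0   # women (coded 0) are the baseline group
        rows.append({
            "Salary": salary,
            "Years": years,
            "Group": group,
            "GroupxYears": group * years,   # interaction: equals Years for men, 0 for women
        })
    return rows


rows = add_dummy_and_interaction(managers)
for row in rows:
    print(row)
```

Notice that the interaction column is zero for every woman, which is why the baseline group's line uses only the intercept and the Years coefficient.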
Our Regression Model
Our regression model is:
μSalary|years, group = B0 + B1(Years) + B2(Group) + B3(Group × Years)
When Group = 0:
μSalary|years, group=0 = B0 + B1(Years) (How? Set Group = 0, so the last two terms drop out.)
Note: The intercept and the slope of Years are the intercept and slope of the equation for female managers.
When Group = 1:
μSalary|years, group=1 = (B0 + B2) + (B1 + B3)(Years) (How? Set Group = 1 and collect the constant terms and the Years terms.)
Note: The coefficient of the dummy variable (B2) is the difference between the intercepts of the two groups. The coefficient of the interaction (B3) is the difference between the slopes in the separate simple regressions.
Some Complications
• All we need to do is to add two more columns to our data and perform the multiple regression.
• The use of dummy and interaction variables does not change the conditions of the multiple regression model.
• However, when performing ANCOVA, the two groups must have the same variance. We’ll use plots to check. (How? See Slide 28.)
• Sometimes the interaction terms can introduce multicollinearity, but this can be detected with what we learned last time.
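In practice you would run the regression in statistical software. Purely to show that "add two columns and run multiple regression" is all that is happening, here is a self-contained sketch that fits least squares by solving the normal equations (X'X)b = X'y. The data are synthetic and noise-free, generated from assumed coefficients, so the fit should simply recover those coefficients; the helper names are hypothetical and this is not the firm's salary data.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):                   # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_ols(design_rows, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(design_rows[0])
    xtx = [[sum(r[i] * r[j] for r in design_rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(design_rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Assumed coefficients used only to generate synthetic, noise-free salaries.
TRUE_B = [131.0, 1.18, 4.61, -0.41]
pairs = [(years, group) for group in (0, 1) for years in (0, 4, 8, 12, 16, 20)]

# Columns: intercept, Years, Group (dummy), Group x Years (interaction).
X = [[1.0, years, group, group * years] for years, group in pairs]
y = [sum(b * x for b, x in zip(TRUE_B, row)) for row in X]

b_hat = fit_ols(X, y)
print([round(b, 3) for b in b_hat])
```

Because the synthetic salaries contain no noise, the fitted values match the generating coefficients up to rounding error. Real data would also need standard errors and t-statistics, which is what the software output provides.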
Performing the ANCOVA
Checking for Similar Variance
Residuals from the model were obtained. Comparing the two groups of residuals seems to confirm that the variance for the two groups is the same.
Estimated Regression Model
Estimated Mean Salary = 131 + 1.18(Years) + 4.61(Group) - 0.41(Group × Years)
Estimated Regression Model
Estimated Mean Salary = 131 + 1.18(Years) + 4.61(Group) - 0.41(Group × Years)
Estimated Coefficients: b0 = 131, b1 = 1.18, b2 = 4.61, b3 = -0.41
For Women: Estimated Mean Salary Women = 131 + 1.18(Years)
For Men: Estimated Mean Salary Men = (131 + 4.61) + (1.18 – 0.41)(Years) = 135.61 + 0.77(Years)
Interpretation:
b0 and b1 have the same interpretation as before but for women only.
b2: On average, male managers with no experience (Years = 0) make $4,611 more than women with no experience.
b3: The average salary increases by $410 per year of experience faster for women than for men.
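The arithmetic for the two group lines can be checked in a few lines of Python. The sketch uses the estimated coefficients above; the function names are hypothetical. It also computes where the two lines cross, which comes from setting 131 + 1.18(Years) equal to 135.61 + 0.77(Years).

```python
# Estimated coefficients from the fitted model (salary in $1,000s).
b0, b1, b2, b3 = 131.0, 1.18, 4.61, -0.41


def group_line(group):
    """Intercept and slope of the estimated line for Group = 0 (women) or 1 (men)."""
    return b0 + b2 * group, b1 + b3 * group


def estimated_mean_salary(years, group):
    """Estimated mean salary from the full model with dummy and interaction terms."""
    return b0 + b1 * years + b2 * group + b3 * group * years


women_intercept, women_slope = group_line(0)   # 131 and 1.18
men_intercept, men_slope = group_line(1)       # 135.61 and 0.77

# The lines cross where b2 + b3 * Years = 0, i.e. Years = -b2 / b3 (about 11.2).
crossover_years = -b2 / b3

print(round(men_intercept, 2), round(men_slope, 2), round(crossover_years, 1))
```

The crossing point of roughly 11 years agrees with the separate-regressions plot: below it the estimated line for men is higher, above it the estimated line for women is higher.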
Modifying Our Model
New Model: Estimated Mean Salary = 133 + 0.85(Years) + 1.02(Group)
Note that this new model implies that the slopes are the same for women and men.
Conclusion
• This model finds no statistically significant difference between the average salaries of male and female managers when comparing managers with equal years of experience.
• The result of the initial t-test can be explained by the difference in experience rather than by whether the manager is male or female. The coefficient for Group is not statistically significant.
A Follow Up…
Just as an aside, many comprehensive studies have shown that the wage gap exists. See for example:
http://www.npc.umich.edu/publications/working_papers/paper1/03-1.pdf
http://www.gao.gov/products/A83444