Regression Continued
Example: Y [team finish] = α + β X [spending] Values of the Y variable (team finish) are a function of some constant (α), plus some amount (β) of the X variable. The question we are interested in is how much change in the Y variable (team finish) is associated with a one-unit change in the X variable (spending). The answer lies in β (beta), which is known as the regression coefficient. In terms of the baseball example, it would be the amount of improvement in team finish (1 = first, 2 = second, ... and 7 = last) associated with an additional $1 million in spending on players' salaries. Q: Would we expect the relationship to be positive or negative?
Todd Donovan, using 1999 season data and a bivariate regression, found: Team finish = 4.4 - 0.029 x spending (in $ millions) This means that the slope of the relationship between spending and team finish was -0.029. In other words, each additional million dollars that a team spends is associated with a change of only about 0.03 of a division position (roughly 3 percent of one place). This result is significant at the .01 level. These results show that a team spending $70 million on players is predicted to finish close to second place (4.4 - 0.029 x 70 = 2.37). We can also show that any given team would have to spend about $35 million more to improve its team finish by one position (-0.029 x $35 million = -1.015). The correlation was -0.39, which means spending explains only 15 percent of the variation in team finish (r² = .15).
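The arithmetic behind these statements can be checked with a short sketch. The coefficients are the ones reported on the slide; the function and constant names are illustrative only and are not part of Donovan's analysis.

```python
# Coefficients as reported on the slide (Donovan, 1999 season data).
INTERCEPT = 4.4
SLOPE = -0.029        # change in team finish per $1 million of spending
CORRELATION = -0.39   # correlation between spending and team finish


def predicted_finish(spending_millions):
    """Predicted division finish (1 = first) for a payroll in $ millions."""
    return INTERCEPT + SLOPE * spending_millions


def spending_for_one_place():
    """Extra spending ($ millions) associated with moving up one position."""
    return -1.0 / SLOPE


def variance_explained(r):
    """Share of variance explained in a bivariate regression (r squared)."""
    return r ** 2


print(round(predicted_finish(70), 2))             # 2.37
print(round(spending_for_one_place(), 1))         # 34.5
print(round(variance_explained(CORRELATION), 2))  # 0.15
```

The second figure shows where the "$35 million" on the slide comes from: it is 1 / 0.029 rounded to a convenient number.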
Another Baseball Example • Testing Causality Between Team Performance and Payroll: The Cases of Major League Baseball and English Soccer • By Stephen Hall, Stefan Szymanski and Andrew S. Zimbalist • Journal of Sports Economics, 2002
Multiple Regression Multiple regression contains a single dependent variable and two or more independent variables. Multiple regression is particularly appropriate when the causes (independent variables) are inter-correlated, which again is usually the case.
II. Assumptions
• Normality of the Dependent Variable: Inference and hypothesis testing require that the error term e is normally distributed, so that Y is normally distributed around its expected value.
• Interval Level Measures: The dependent variable is measured at the interval level.
• The effects of the independent variables on Y are additive: For each independent variable Xi, the amount of change in E(Y) associated with a unit increase in Xi (holding all other independent variables constant) is the same, whatever the values of the other independent variables.
• The regression model is properly specified: There is no specification bias or error in the model, the functional form (linear or non-linear) is correct, and our assumptions about the variables are correct.
Multivariate Regression is a powerful tool to examine how multiple factors (independent variables) influence a dependent variable. It differs from bivariate regression in that it can identify the independent effect a variable has on a dependent variable by holding all other variables constant. What other variables would we include in the baseball model to predict winning %?
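To make "holding all other variables constant" concrete, here is a minimal sketch in plain Python. The numbers are made up for illustration (they are not baseball data): Y is constructed as exactly 1 + 2·X1 + 3·X2, and X1 and X2 are correlated with each other. The function names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cross_product(a, b):
    """Sum of products of deviations from the means of a and b."""
    mean_a, mean_b = mean(a), mean(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))


def bivariate_slope(x, y):
    """Slope from regressing y on a single predictor x."""
    return cross_product(x, y) / cross_product(x, x)


def two_predictor_ols(x1, x2, y):
    """Least-squares intercept and slopes for y = a + b1*x1 + b2*x2."""
    s11, s22 = cross_product(x1, x1), cross_product(x2, x2)
    s12 = cross_product(x1, x2)
    s1y, s2y = cross_product(x1, y), cross_product(x2, y)
    det = s11 * s22 - s12 ** 2  # zero if x1 and x2 are perfectly collinear
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    a = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return a, b1, b2


# Hypothetical data: X1 and X2 are correlated; Y = 1 + 2*X1 + 3*X2 exactly.
X1 = [1, 2, 3, 4, 5, 6]
X2 = [2, 1, 4, 3, 6, 5]
Y = [1 + 2 * u + 3 * v for u, v in zip(X1, X2)]

naive_b1 = bivariate_slope(X1, Y)
a, b1, b2 = two_predictor_ols(X1, X2, Y)
print("bivariate slope on X1:", round(naive_b1, 3))
print("multiple regression:", round(a, 3), round(b1, 3), round(b2, 3))
```

In this constructed case the bivariate slope on X1 comes out well above 2, because it also absorbs part of the effect of the correlated X2, while the multiple regression recovers the slopes of 2 and 3 used to build Y.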
Why do we need to hold the independent variables constant? • Figures 1 and 2 may help clarify. Each circle may be thought of as representing the variance of the variable. The overlap in the two circles indicates the proportion of variance in each variable that is shared with the other.
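[Figure 1: overlapping circles for Y, X1, and X2]
[Figure 2: overlapping circles for Y, X1, and X2, with shared region c]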
In Figure 1, the fact that X1 and X2 do not overlap means that they are not correlated with each other, but each is correlated with Y. This is great and means we don't need sophisticated analysis, just two separate bivariate regressions. In Figure 2, X1 and X2 are correlated. The area c is created by the correlation between X1 and X2; c represents the proportion of the variance in Y that is shared jointly with X1 and X2. How do we deal with c? We can't count it twice, or the explained variation could add up to more than 100%. The answer is multivariate regression.
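The double-counting problem can be illustrated with a small sketch. The data are hypothetical and noise-free (Y is built as exactly 1 + 2·X1 + 3·X2), and the multiple R² uses the standard two-predictor formula based on the three pairwise correlations; the helper names are illustrative.

```python
from math import sqrt


def pearson_r(a, b):
    """Pearson correlation between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    var_a = sum((u - mean_a) ** 2 for u in a)
    var_b = sum((v - mean_b) ** 2 for v in b)
    return cov / sqrt(var_a * var_b)


def multiple_r_squared(r_y1, r_y2, r_12):
    """R squared for two predictors, computed from pairwise correlations."""
    return (r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12) / (1 - r_12 ** 2)


# Hypothetical data: correlated predictors, Y = 1 + 2*X1 + 3*X2 exactly.
X1 = [1, 2, 3, 4, 5, 6]
X2 = [2, 1, 4, 3, 6, 5]
Y = [1 + 2 * u + 3 * v for u, v in zip(X1, X2)]

r_y1 = pearson_r(Y, X1)
r_y2 = pearson_r(Y, X2)
r_12 = pearson_r(X1, X2)

summed = r_y1 ** 2 + r_y2 ** 2  # counts the shared area c twice
joint = multiple_r_squared(r_y1, r_y2, r_12)
print("sum of bivariate r squared:", round(summed, 3))
print("multiple R squared:", round(joint, 3))
```

With these numbers, adding the two bivariate r² values gives well over 100 percent, while the multiple R² stays at 1 because the shared region is counted only once.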
Other Types of Regression
• Logit (logistic): dichotomous dependent variable
• Probit: dichotomous dependent variable
• Ordered logit or probit: ordinal dependent variable
• Multinomial logit/probit: nominal dependent variable (more than 2 categories)
• Conditional logit/probit: similar to multinomial probit
• Hierarchical Linear Models: for data that are clustered by time, space (and other conditions)