Instrumental Variables: 2-Stage and 3-Stage Least Squares Regression of a Linear System of Equations. 2009 LPGA Performance Statistics and Prize Winnings, www.lpga.com
S.J. Callan and J.M. Thomas (2007). “Modeling the Determinants of a Professional Golfer’s Tournament Earnings,” Journal of Sports Economics, Vol. 8, No. 4, pp. 394-411.
Data Description
• Prize Winnings and Performance Statistics for n = 146 professional women (LPGA) golfers for the 2009 season
• Exogenous Performance Variables:
  • Average Driving Distance
  • Percentage of Fairways reached on Drive
  • Percentage of Greens Reached in Regulation
  • Percentage of Sand Saves (in hole in 2 shots from close traps)
  • Average Putts per hole on greens reached in regulation
  • Number of Events, Events Completed, and Rounds
• Endogenous Result (Dependent & Independent) Variables:
  • Average Score per Round
  • Average Rank (Percentile in Tournaments)
  • Log(Prize Winnings)
Variables in Systems of Equations
• Endogenous Variables – Jointly dependent (response) variables that are system determined. They can also appear as predictor variables in other equations.
• Exogenous Variables – Independent variables that do not depend on the endogenous variables
• Predetermined Variables – Exogenous and lagged endogenous variables
• Instrumental Variables – Predetermined variables used to predict endogenous variables in first-stage regressions, with the predicted values being used in place of the endogenous predictors in the system of equations
System of Equations (Callan and Thomas, 2007)
• Average Score (per 18 holes) is related to the golfer's skills and experience (number of rounds played)
• Average Rank (transformed to percentile) in tournaments is related to average score and the number of events she competed in
• Season Earnings is related to average rank and the number of tournaments she completed
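The slide's equations did not survive extraction. Based on the model statements in the SAS program later in the deck, the system can be written roughly as follows. The coefficient symbols (β, γ, δ) and error terms (ε) are notation chosen here for illustration, not taken from the original slides.

```latex
\begin{align}
\text{SCORE}_i &= \beta_0 + \beta_1\,\text{DRIVE}_i + \beta_2\,\text{FAIRWAY}_i + \beta_3\,\text{GREEN}_i
                 + \beta_4\,\text{GIRPUTTS}_i + \beta_5\,\text{SANDSV}_i + \beta_6\,\text{ROUNDS}_i + \varepsilon_{1i} \\
\text{RANK}_i  &= \gamma_0 + \gamma_1\,\text{SCORE}_i + \gamma_2\,\text{EVENTS}_i + \varepsilon_{2i} \\
\ln(\text{PRIZE}_i) &= \delta_0 + \delta_1\,\text{RANK}_i + \delta_2\,\text{COMPLETE}_i + \varepsilon_{3i}
\end{align}
```

SCORE appears on the right-hand side of the second equation and RANK on the right-hand side of the third, which is what makes them endogenous predictors.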
Potential Problems with Endogenous Predictors
• When endogenous variables are included as predictors, they can be correlated with the error term for that equation, particularly when there are omitted variables that may be related to the outcome. This causes Ordinary Least Squares estimates to be biased and inconsistent.
• In equation 2, SCORE may be correlated with the error term without a variable measuring average course difficulty (Callan and Thomas, p. 402).
• In equation 3, RANK may be correlated with the error term without a variable measuring the golfer's human capital investment such as diet and concentration level (Callan and Thomas, p. 402).
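For intuition, the inconsistency can be written out for the simplest case of a single predictor. This is a standard textbook result rather than something shown on the slide, and the symbols are chosen here for illustration.

```latex
% Simple regression y_i = \beta_0 + \beta_1 x_i + \varepsilon_i
\operatorname*{plim}_{n \to \infty} \hat{\beta}_1^{\text{OLS}}
  = \beta_1 + \frac{\operatorname{Cov}(x, \varepsilon)}{\operatorname{Var}(x)}
```

The second term vanishes only when the predictor is uncorrelated with the error. If an omitted variable such as course difficulty moves both SCORE and RANK, the covariance is nonzero and collecting more data does not remove the bias.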
Model Building Process
1. Regress all endogenous variables (Score, Rank, and ln(Prize)) on all exogenous variables.
2. Obtain the predicted values for each endogenous variable, based on the regressions from 1.
3. In the system of equations, replace any “right hand side” endogenous predictors with their fitted values from 2.
• Note that software (e.g. SAS and STATA) will fit all the regressions in 1., even if that variable does not appear as a predictor (ln(Prize) in this example).
• This method provides correct coefficient estimates, but not a correct ANOVA table or correct standard errors.
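The steps above can be sketched in Python on simulated data (not the LPGA data). Everything in this sketch is an illustrative assumption: the variable names `z`, `u`, `x`, `y`, the true slope of 2.0, the sample size, and the small `ols` helper, which solves the normal equations with the standard library only. It shows that OLS is pulled away from the true slope by an omitted variable, that the two-step procedure recovers it, and that the residual variance from the second-stage fit differs from the one computed with the actual endogenous predictor.

```python
import random


def ols(X, y):
    """Least squares via the normal equations (X'X) b = X'y, Gauss-Jordan solve."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in X) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(X, y))] for i in range(k)]
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))  # partial pivoting
        A[c], A[p] = A[p], A[c]
        A[c] = [v / A[c][c] for v in A[c]]
        for r in range(k):
            if r != c:
                A[r] = [vr - A[r][c] * vc for vr, vc in zip(A[r], A[c])]
    return [A[i][k] for i in range(k)]


random.seed(2009)
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]  # exogenous instrument
u = [random.gauss(0, 1) for _ in range(n)]  # omitted variable (unobserved)
# x is endogenous: it depends on u, which also drives y
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [1.0 + 2.0 * xi + 2.0 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Naive OLS of y on x (ignores the endogeneity)
b_ols = ols([[1.0, xi] for xi in x], y)

# Steps 1-2: regress the endogenous predictor on the instrument, keep fitted values
g = ols([[1.0, zi] for zi in z], x)
x_hat = [g[0] + g[1] * zi for zi in z]

# Step 3: replace x by its fitted values in the structural equation
b_2sls = ols([[1.0, xh] for xh in x_hat], y)


def resid_var(b, regressor):
    """Residual mean square using the given regressor with coefficients b."""
    sse = sum((yi - b[0] - b[1] * ri) ** 2 for yi, ri in zip(y, regressor))
    return sse / (n - 2)


s2_naive = resid_var(b_2sls, x_hat)  # what the second-stage regression reports
s2_correct = resid_var(b_2sls, x)    # residuals recomputed with the actual x

print("OLS slope: ", round(b_ols[1], 3))
print("2SLS slope:", round(b_2sls[1], 3))
print("naive s^2: ", round(s2_naive, 3), " corrected s^2:", round(s2_correct, 3))
```

The gap between the two residual variances is why the standard errors printed by a hand-run second-stage regression should not be used as they are.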
First Stage Regressions for Score and Rank
The fitted (predicted) values for SCORE will be used in equation 2 in place of SCORE, and the fitted values for RANK in equation 3. Equation 1 has no right hand side endogenous variables.
Equation 1) - SCORE is related to SKILLS and experience
• All variables except average driving distance are significant.
• All else equal:
  • Average SCORE decreases as Percent Fairways Hit increases (a 10% increase in fairways hit corresponds to a 0.19 decrease in SCORE)
  • Average SCORE decreases by 1.36 with a 10% increase in Greens in Regulation
  • Average SCORE decreases by 0.16 with a 10% increase in Sand Saves
  • Average SCORE increases by 1.32 with a 0.1 increase in putts per Green in Regulation hole
  • Average SCORE decreases by 0.08 for a 10-round increase in Rounds played
Equation 2) - Rank is related to SCORE and Events
• Rank (as Percentile, with 100 meaning the golfer won every tournament she played in) is:
  • Negatively associated with predicted SCORE (decreases by 12.5 with a unit increase in average SCORE)
  • Positively associated with number of Events (increases by 0.28 with a unit increase in # of EVENTS played)
• Note: The estimated coefficients are correct, but the standard errors, t-tests, and Analysis of Variance are incorrect (see slide 11, “Robust Estimate of Variance of 2SLS Estimator”)
Equation 3) - ln(Prize) is related to Rank and Completed Events
• Prize Winnings (in log form):
  • Increase with (Predicted) Rank. A 10% increase in Rank (percentile) increases ln(Prize) by 0.56.
  • Increase with Completed Events. For each tournament completed, ln(Prize) increases by 0.080.
• Note: The estimated coefficients are correct, but the standard errors, t-tests, and Analysis of Variance are incorrect (see slide 11, “Robust Estimate of Variance of 2SLS Estimator”)
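Because the response is on the log scale, these effects can be translated into multiplicative changes in prize winnings by exponentiating. The arithmetic below uses only the two figures quoted on this slide and holds the other predictor fixed.

```latex
\begin{align}
\text{One more completed event:}\quad
  & e^{0.080} \approx 1.083 \;\Rightarrow\; \text{prize winnings about } 8.3\% \text{ higher} \\
\text{A 10\% increase in Rank (percentile):}\quad
  & e^{0.56} \approx 1.75 \;\Rightarrow\; \text{prize winnings about } 75\% \text{ higher}
\end{align}
```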
Robust Estimate of Variance of 2SLS Estimator
The exact same method is used for equation 3.
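The formula on this slide was not recovered. As a hedged sketch, the usual correction for 2SLS standard errors is shown below. The notation is chosen here: X holds the equation's predictors with the actual endogenous values, X̂ is the same matrix with the endogenous columns replaced by first-stage fitted values, and p is the number of coefficients. Whether the slide used exactly this form or a further heteroskedasticity-robust variant cannot be determined from the text.

```latex
\begin{align}
\hat{\beta}_{2SLS} &= (\hat{X}'\hat{X})^{-1}\hat{X}'y \\
e &= y - X\hat{\beta}_{2SLS}
   \qquad \text{(actual } X \text{, not } \hat{X}\text{)} \\
s^2 &= \frac{e'e}{n - p} \\
\widehat{\operatorname{Var}}(\hat{\beta}_{2SLS}) &= s^2\,(\hat{X}'\hat{X})^{-1}
\end{align}
```

A second-stage regression run by hand computes residuals from X̂ instead of X, which is why its reported standard errors, t-tests, and ANOVA table are not valid.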
3-Stage Least Squares
• Extension of 2-Stage Least Squares that allows for a covariance structure among the errors of the system of equations
• Residuals from 2SLS are obtained and used to estimate the within-individual (golfer) variance-covariance structure among the equations
• The response vector is stacked: the n responses from model 1 are stacked over the n responses from model 2, which are stacked over the n responses from model 3
• The X matrices are “blocked” out diagonally, with 0 matrices off the blocked diagonal
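The stacking described above can be written compactly. This is the standard 3SLS form with notation chosen here (it is not copied from the slides): e_j is the vector of 2SLS residuals from equation j, X̂_j is the predictor matrix of equation j with endogenous columns replaced by first-stage fitted values, and the divisor n in the covariance estimate is an assumption, since software may apply a degrees-of-freedom adjustment.

```latex
\begin{align}
y &= \begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix}, \qquad
\hat{X} = \begin{bmatrix} \hat{X}_1 & 0 & 0 \\ 0 & \hat{X}_2 & 0 \\ 0 & 0 & \hat{X}_3 \end{bmatrix}, \qquad
\hat{\Sigma} = \left[\hat{\sigma}_{jk}\right], \quad \hat{\sigma}_{jk} = \frac{e_j' e_k}{n} \\
\hat{\beta}_{3SLS} &= \left[\hat{X}'\left(\hat{\Sigma}^{-1} \otimes I_n\right)\hat{X}\right]^{-1}
                      \hat{X}'\left(\hat{\Sigma}^{-1} \otimes I_n\right) y
\end{align}
```

If the estimated covariance matrix is diagonal, the equations decouple and the estimator reduces to equation-by-equation 2SLS.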
Estimation Results
[Table: estimation results for EQ1, EQ2, and EQ3]
SAS Program
/* Read the 2009 LPGA data and compute log prize winnings */
data lpga2009;
  infile 'lpga2009.dat';
  input golfer drive fairway green putts sandsv prize lnprize events
        girputts complete aveposrank rounds strokes;
  lnprize1=log(prize);
run;

/* 2-Stage Least Squares: instruments are all exogenous variables */
proc syslin 2sls out=regout;
  instruments drive fairway green girputts sandsv rounds events complete;
  strokes: model strokes = drive fairway green girputts sandsv rounds;
    output residual=e1;
  rank: model aveposrank = strokes events;
    output residual=e2;
  prize: model lnprize1 = aveposrank complete;
    output residual=e3;
run;

/* 3-Stage Least Squares on the same system */
proc syslin 3sls data=lpga2009 itprint out=regout3;
  instruments drive fairway green girputts sandsv rounds events complete;
  strokes: model strokes = drive fairway green girputts sandsv rounds / xpx;
    output residual=e1;
  rank: model aveposrank = strokes events / xpx;
    output residual=e2;
  prize: model lnprize1 = aveposrank complete / xpx;
    output residual=e3;
run;
STATA Program
* Read the data and compute log prize winnings
insheet using lpga_2009_meq.csv
generate lnprize=ln(prize)

* 2-Stage Least Squares
reg3 (avestrokes=drive fairway green sandsvpct girputtshole rounds) ///
     (averagepospct=avestrokes events) (lnprize=averagepospct completed), ///
     2sls

* 3-Stage Least Squares
reg3 (avestrokes=drive fairway green sandsvpct girputtshole rounds) ///
     (averagepospct=avestrokes events) (lnprize=averagepospct completed), ///
     3sls