How do Lawyers Set Fees?
Learning Objectives • Model i.e. “Story” or question • Multiple regression review • Omitted variables (our first failure of the Gauss-Markov (GM) assumptions) • Dummy variables
Model • An example of how we can use the tools we have learned • Simple analyses that don’t have a complicated structure can often be useful • Question: Lawyers claim that they set fees to reflect the amount of legal work done • Our suspicion is that fees are set to reflect the amount of money at stake • Form of second degree price discrimination
Model • How to translate a story into econometrics and then test the story? • Our Idea: Fees are determined by the size of the award rather than the work done • Percentage fees • Price discrimination • Careful to consider alternatives: Insurance
Analysis • As always summarize and describe the data • Graph variables of interest (see over) • Regression to find percentage price rule
reg ins_allow award

      Source |       SS       df       MS              Number of obs =      91
-------------+------------------------------           F(  1,    89) =  133.05
       Model |  2.7940e+09     1  2.7940e+09           Prob > F      =  0.0000
    Residual |  1.8689e+09    89  20999331.5           R-squared     =  0.5992
-------------+------------------------------           Adj R-squared =  0.5947
       Total |  4.6629e+09    90  51810441.4           Root MSE      =  4582.5

------------------------------------------------------------------------------
   ins_allow |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       award |   .1519855   .0131763    11.53   0.000     .1258046    .1781665
       _cons |   5029.183    827.947     6.07   0.000      3384.07    6674.296
------------------------------------------------------------------------------
Formulate Story as Hypothesis • Story is that lawyers charge a fee based on the award • So the null hypothesis is that the coefficient on award is zero (no percentage fee), and the alternative is that it is not zero • H0: b = 0 H1: b ≠ 0 • This is the usual test of whether award is statistically significant • Stata reports it automatically
H0: b = 0 H1: b ≠ 0 • Calculate the test statistic assuming that H0 is true: t = (0.1519855 - 0)/0.0131763 = 11.53 • Either find the test statistic on the t distribution and calculate the p-value: Prob(|t| > 11.53) = 0.000 to three decimal places • Or compare with one of the traditional threshold (“critical”) values, using N-k degrees of freedom: at the 5% significance level this is about 1.99 with 89 degrees of freedom (1.96 in large samples) • |t| exceeds all the usual critical values and the p-value is below 0.0005 • So we reject the null hypothesis
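The arithmetic of the test can be checked by hand from the printed regression output. The sketch below only re-uses the coefficient, standard error and confidence interval that Stata reports; the variable names are illustrative. It also backs out the 5% critical value Stata used from the half-width of the reported interval, which should come out a little under 2.

```python
# Figures copied from the Stata output for: reg ins_allow award
coef_award = 0.1519855                    # estimated slope on award
se_award = 0.0131763                      # its standard error
ci_low, ci_high = 0.1258046, 0.1781665    # reported 95% confidence interval

hypothesised = 0.0                        # value of b under H0

# t statistic: distance from the hypothesised value, in standard errors
t_stat = (coef_award - hypothesised) / se_award

# Stata builds the interval as coef +/- t_crit * se, so the critical value
# it used can be recovered from the half-width of the interval.
t_crit = (ci_high - ci_low) / 2 / se_award

reject_h0 = abs(t_stat) > t_crit

print(f"t = {t_stat:.2f}, implied 5% critical value = {t_crit:.3f}, reject H0: {reject_h0}")
```

Because the statistic is more than eleven standard errors from zero, the conclusion does not depend on which conventional significance level is chosen.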
Type 1 error • Note how we set up the hypothesis test • The null was that the percentage charge is zero • A type 1 error is rejecting the null when it is true • The probability of a type 1 error is the significance level • So at the 5% level there is a 5% chance of saying that lawyers charge a % fee when they do not
Some Comments • You could formulate the test as one-sided • H0: b ≥ 0 H1: b < 0 • H0: b ≤ 0 H1: b > 0 • Exercise: do this and think about which is best • Could also test a particular value • H0: b = 0.2 H1: b ≠ 0.2
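As a rough illustration of testing a particular value, the same t formula can be applied with 0.2 in place of 0. The sketch uses only the numbers printed in the first regression; the helper name `t_statistic` is made up for this example.

```python
# Figures copied from the Stata output for: reg ins_allow award
coef_award = 0.1519855
se_award = 0.0131763
ci_low, ci_high = 0.1258046, 0.1781665    # reported 95% confidence interval

def t_statistic(estimate, hypothesised, std_err):
    """t statistic for H0: coefficient equals the hypothesised value."""
    return (estimate - hypothesised) / std_err

# H0: b = 0.2 against H1: b != 0.2
t_vs_20pct = t_statistic(coef_award, 0.2, se_award)

# Equivalent check: a two-sided 5% test rejects exactly when the
# hypothesised value lies outside the 95% confidence interval.
inside_ci = ci_low <= 0.2 <= ci_high

print(f"t = {t_vs_20pct:.2f}, 0.2 inside 95% interval: {inside_ci}")
```

The statistic is negative because the estimate lies below 0.2. For the one-sided versions, the same number is compared with a one-tailed critical value instead.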
Omitted Variables • Our first failure of the GM Theorem • Key practical issue • There are always some variables missing (R² < 1) • When does it matter? • When they are correlated with the included variables • OLS becomes biased and inconsistent • Often a way to undermine econometric results • Discuss in two ways • State the issue formally • Use the lawyers example
Formally • Suppose we have a model with z omitted: • y_i = a + β x_i + g z_i + u_i (true model) • y_i = a + b x_i + u_i (estimated) • Then we will have: • E(b) ≠ β • b is a biased estimator of the effect of x on y • It is also inconsistent: the bias does not disappear as N → ∞ • The bias is determined by the formula • E(b) = β + mg • β = direct effect of x on y • g = direct effect of z on y • m = slope from the regression of z on x (how z moves with x)
In Practice • OLS erroneously attributes the effect of the missing z to x • Violates GM assumption that E(u|x)=0 • From the formula, the bias will go away if • g=0 : the variable should be omitted as it doesn’t matter • m=0: the missing variable is unrelated to the included variable(s) • In any project ask: • are there missing variables that ought to be included (g≠0)? • could they be correlated with any included variables (m≠0) ? • What is the direction of bias?
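The bias formula has an exact in-sample counterpart: the slope from the short regression equals the long-regression slope plus the auxiliary slope times the coefficient on the omitted variable. The sketch below checks this on synthetic data. The data-generating values (2.0, 3.0, 0.5) are assumptions chosen for illustration and have nothing to do with the lawyers data.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic data only: these numbers are NOT from the lawyers data set.
n = 500
beta, g = 2.0, 3.0                                  # assumed true effects of x and z on y
x = [random.gauss(0, 1) for _ in range(n)]
z = [0.5 * xi + random.gauss(0, 1) for xi in x]     # z is correlated with x
y = [1.0 + beta * xi + g * zi + random.gauss(0, 1) for xi, zi in zip(x, z)]

def mean(v):
    return sum(v) / len(v)

def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

sxx, szz, sxz = cross(x, x), cross(z, z), cross(x, z)
sxy, szy = cross(x, y), cross(z, y)

# Short regression (z omitted): y on x
b_short = sxy / sxx

# Long regression: y on x and z, solved from the two normal equations
det = sxx * szz - sxz ** 2
beta_hat = (szz * sxy - sxz * szy) / det
g_hat = (sxx * szy - sxz * sxy) / det

# Auxiliary regression: z on x
m_hat = sxz / sxx

print(f"short-regression slope b  = {b_short:.4f}")
print(f"beta_hat + m_hat * g_hat  = {beta_hat + m_hat * g_hat:.4f}")
```

Setting the 0.5 in the construction of z to zero (so m is close to 0), or g to zero, should bring the short-regression slope back towards the true effect of x.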
Lawyers Example • Suppose we had the simple model of lawyers' fees as before • A criticism of this model is that it doesn’t take account of the work done by lawyers • i.e. measures of the quantity and quality of work are omitted variables • This invalidates the estimate of b • This is how you could undermine the study
Is the criticism valid? • These variables ought to be included as they plausibly affect the fee i.e. g≠0 • They could be correlated with the included award variable (m≠0) • It is plausible that more work may lead to a higher award • Or higher-award cases may require more work • Turns out not to matter in our case because award and trial are uncorrelated • Not always the case: use instrumental variables (IV)
Dummy Variables • Record classifications • Dichotomous: “yes/no” e.g. gender, trial, etc • Ordinal e.g. level of education • OLS doesn’t treat them differently • Need to be careful about how coefficients are interpreted • Illustrate with “trial” in the fee regression • Trial =1 iff case went to court • Trial =0 iff case settled before court
Our basic model is fee_i = β_1 + β_2 award_i + u_i • This can be interpreted as predicting fees based on awards i.e. E[fee_i] = β_1 + β_2 E[award_i] • Suspect that the fee is systematically different if the case goes to trial: fee_i = β_1 + β_2 award_i + β_3 Trial_i + u_i
Now the prediction becomes: • E[fee_i] = β_1 + β_2 E[award_i] + β_3 iff trial • E[fee_i] = β_1 + β_2 E[award_i] iff not • Note that “trial” disappears when it is zero • This translates into separate intercepts on the graph • β_3 is the extra € for bringing a case to trial • Testing if β_3 is significant is a test of a significant difference in fees between the two groups • For the price discrimination story: award is still significant
regress ins_allow award trial

      Source |       SS       df       MS              Number of obs =      91
-------------+------------------------------           F(  2,    88) =   78.43
       Model |  2.9871e+09     2  1.4936e+09           Prob > F      =  0.0000
    Residual |  1.6758e+09    88  19043267.3           R-squared     =  0.6406
-------------+------------------------------           Adj R-squared =  0.6324
       Total |  4.6629e+09    90  51810441.4           Root MSE      =  4363.9

------------------------------------------------------------------------------
   ins_allow |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       award |   .1489103   .0125847    11.83   0.000     .1239009    .1739197
       trial |   5887.706   1848.795     3.18   0.002     2213.616    9561.797
       _cons |   4798.368   791.7677     6.06   0.000     3224.896     6371.84
------------------------------------------------------------------------------
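To see what the dummy coefficient means, the fitted equation can be evaluated for a settled case and a trial case with the same award. The coefficients below are copied from the output above. The award of 20,000 is a hypothetical value chosen only for illustration, and `predicted_fee` is a made-up helper name.

```python
# Coefficients copied from: regress ins_allow award trial
CONS, B_AWARD, B_TRIAL = 4798.368, 0.1489103, 5887.706

def predicted_fee(award, trial):
    """Fitted fee for a given award; trial is 1 if the case went to court, else 0."""
    return CONS + B_AWARD * award + B_TRIAL * trial

award = 20_000  # hypothetical award, for illustration only
settled = predicted_fee(award, trial=0)
went_to_trial = predicted_fee(award, trial=1)

# The gap is the trial coefficient whatever award is plugged in:
# the two fitted lines are parallel and differ only in intercept.
gap = went_to_trial - settled

print(f"settled: {settled:.2f}  trial: {went_to_trial:.2f}  gap: {gap:.2f}")
```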
Interaction • While the intercept could be different, the slope could be also, i.e. the degree of price discrimination could differ between the two groups • Model this with an “interaction term”: fee_i = β_1 + β_2 award_i + β_3 Trial_i + β_4 award_i*Trial_i + u_i
Now the prediction becomes: • E[fee_i] = β_1 + (β_2 + β_4)*E[award_i] + β_3 iff trial • E[fee_i] = β_1 + β_2 E[award_i] iff not • Note that “trial” disappears when it is zero • This translates into separate intercepts and slopes on the graph • β_3 is the extra € for bringing a case to trial and β_4 is the extra % • Testing if β_4 is significant is a test of a significant difference in the % fee between the two groups
gen interact=trial*award
regress ins_allow award trial interact

      Source |       SS       df       MS              Number of obs =      91
-------------+------------------------------           F(  3,    87) =   52.34
       Model |  3.0004e+09     3  1.0001e+09           Prob > F      =  0.0000
    Residual |  1.6625e+09    87  19109443.6           R-squared     =  0.6435
-------------+------------------------------           Adj R-squared =  0.6312
       Total |  4.6629e+09    90  51810441.4           Root MSE      =  4371.4

------------------------------------------------------------------------------
   ins_allow |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
       award |   .1468693    .012842    11.44   0.000     .1213445    .1723941
       trial |   2444.119   4526.143     0.54   0.591    -6552.081    11440.32
    interact |   .0561776   .0673738     0.83   0.407    -.0777352    .1900904
       _cons |   4901.306   802.6927     6.11   0.000     3305.868    6496.745
------------------------------------------------------------------------------
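With the interaction term included, each group has its own fitted line. The sketch below reads the two intercepts and slopes off the point estimates printed above; `fee_line` is an illustrative helper, not part of any Stata workflow. Since neither trial nor interact is individually significant here, the numbers are point estimates only.

```python
# Coefficients copied from: regress ins_allow award trial interact
CONS, B_AWARD, B_TRIAL, B_INTERACT = 4901.306, 0.1468693, 2444.119, 0.0561776

def fee_line(trial):
    """Return (intercept, slope) of the fitted fee line for trial = 0 or 1."""
    intercept = CONS + B_TRIAL * trial
    slope = B_AWARD + B_INTERACT * trial
    return intercept, slope

settled_line = fee_line(0)   # intercept and slope for settled cases
trial_line = fee_line(1)     # intercept and slope for cases that went to court

print(f"settled: intercept {settled_line[0]:.3f}, slope {settled_line[1]:.4f}")
print(f"trial:   intercept {trial_line[0]:.3f}, slope {trial_line[1]:.4f}")
```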
Multiple Hypotheses • A little weird that the interact and trial variables are insignificant • Possible that they are jointly significant • Formally: H0: β_4 = 0 and β_3 = 0 H1: β_4 ≠ 0 or β_3 ≠ 0 (at least one is non-zero) • This is not the same as two t-tests in sequence • Use the F-test of a “Linear Restriction” • Turns out the t-test is a special case
Procedure • Estimate the model assuming the null is true i.e. impose the restriction • Record the R² for the restricted model • R²_r = 0.5992 • Estimate the unrestricted model i.e. assuming the null is false • Record the R² for the unrestricted model • R²_u = 0.6435 (compared with 0.5992 for the restricted model)
Form the Test statistic • F = [(R²_u - R²_r)/r] / [(1 - R²_u)/(N - K_u)] • r = number of restrictions (count equals signs) • N = number of observations • K_u = number of variables (and constant) in the unrestricted model • Compare with the critical value from F tables: F(r, N - K_u) • If the test statistic is greater than the critical value: reject H0 • F(2,87) = 3.15 at 5% significance level
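The joint test can be worked through with the two R² values recorded in the procedure. This sketch assumes the usual R²-based form of the F statistic, which is what the definitions of r, N and K_u point to. The function name is illustrative and the critical value is the one quoted in the text, not recomputed here.

```python
r2_restricted = 0.5992      # from: reg ins_allow award
r2_unrestricted = 0.6435    # from: regress ins_allow award trial interact
n_obs = 91
k_unrestricted = 4          # award, trial, interact, plus the constant
n_restrictions = 2          # coefficient on trial = 0 and coefficient on interact = 0

def f_statistic(r2_r, r2_u, r, n, k_u):
    """R-squared form of the F statistic for r linear restrictions."""
    numerator = (r2_u - r2_r) / r
    denominator = (1 - r2_u) / (n - k_u)
    return numerator / denominator

f_stat = f_statistic(r2_restricted, r2_unrestricted,
                     n_restrictions, n_obs, k_unrestricted)

critical_5pct = 3.15        # critical value quoted in the text for F(2, 87)
reject_h0 = f_stat > critical_5pct

print(f"F({n_restrictions}, {n_obs - k_unrestricted}) = {f_stat:.2f}, reject H0: {reject_h0}")
```

On these figures trial and interact are jointly significant even though neither t-test rejects on its own, which is the situation the multiple-hypotheses discussion anticipates.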
Comments/Intuition • Imposing a restriction cannot make the model explain more of the dependent variable • If it explains “a lot” less then we reject the restriction as being unrealistic • How much is “a lot”? • Compare the two R² (not “adjusted R²”) • Scale the difference • Compare to a threshold value • The critical value is a function of 3 parameters: df1, df2, significance level • Note that the test doesn’t say anything about the component hypotheses • Could do t-tests this way: with a single restriction F equals t squared • Stata automatically tests H0: β_2 = 0, …, β_k = 0 (the F statistic in the regression header)
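The claim that a t-test is a special case can be checked roughly with the printed output: dropping trial alone is a single restriction, so the F statistic should be close to the square of the t statistic on trial. The sketch uses the residual sums of squares form of the F statistic. Because Stata prints the sums of squares to only five significant figures, the match is approximate rather than exact.

```python
# Residual sums of squares copied from the two regression tables
ssr_restricted = 1.8689e9      # reg ins_allow award (trial left out)
ssr_unrestricted = 1.6758e9    # regress ins_allow award trial
df_unrestricted = 88           # 91 observations minus 3 estimated coefficients
n_restrictions = 1             # only the coefficient on trial is set to zero

# F statistic in residual-sum-of-squares form
f_single = ((ssr_restricted - ssr_unrestricted) / n_restrictions) / (
    ssr_unrestricted / df_unrestricted
)

# t statistic on trial, rebuilt from its coefficient and standard error
t_trial = 5887.706 / 1848.795

print(f"F = {f_single:.2f}, t squared = {t_trial ** 2:.2f}")
```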
Conclusions • We had four learning objectives • Model i.e. “Story” or question • Multiple regression review • Dummy variables • Omitted variables (the first failure of GM) • What’s Next? • More examples • More problems for OLS