What is the MPC?
Learning Objectives • Use linear regression to establish the relationship between two variables • A more formal approach to hypothesis testing
Consumption Function • Keynesian consumption function • Consumption today depends on income today • C = a + b*Y • b is the marginal propensity to consume (MPC) • Econometrics: quantify economic relationships • What are “a” and “b”?
Look at some data • Look at individual level data: individual.dta • Stata: scatter cons nmwage • This gives a scatter plot with the first variable (cons) on the vertical (y) axis and the second variable (nmwage) on the horizontal (x) axis
Two Obvious Facts • We observe many households at different income levels • There is clearly a positive relationship • cons depends on income, but households with the same income will not have the same consumption • Other factors influence consumption
How do we Calculate the MPC? • Draw a line • Many possible lines • Intuition tells us that an “average” line would be a better estimate • We will show why this intuition is correct later • Any line we draw (even the “best”) will not go through all the points • There will be deviations from the line
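One way to see such an “average” line against the data is to overlay a fitted straight line on the scatter plot. The sketch below assumes the individual.dta file and the variables cons and nmwage named in these slides; lfit draws a least-squares line, which is the kind of “best” line the lecture goes on to define.

```stata
* Load the individual-level data (file name as given in the slides)
use individual.dta, clear

* Scatter of consumption against income with a fitted straight line overlaid
twoway (scatter cons nmwage) (lfit cons nmwage)
```

Most points lie off the line, which is the deviation the last two bullets describe.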
Conditional Expectation • As an alternative to the line we could follow the logic of the gender example from the previous section and look at the conditional expectation • Recall we answered the question of gender discrimination by comparing the average wage of two groups • The expected wage conditional on being a man or a woman • We used the “summ if” command • Formally • E(hwage|gender==1)=6.701875 • E(hwage|gender==2)=5.451302
Conditional Expectation • We can apply the same logic to the consumption function • Divide households into two groups • Rich: nmwage>1000 • Poor: nmwage<=1000 • generate rich=(nmwage>1000) • Compare the average consumption of each group using summ if
Conditional Expectation • We get average consumption conditional on being rich or poor • E(cons|Rich)= 1297.3 • E(cons|Poor)= 599.89 • We can measure the marginal propensity to consume by also taking the average income of each group • E(nmwage|Rich)= 1611.698 • E(nmwage|Poor)= 711.9268
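The group means above can be produced with a few commands. This is a minimal sketch, assuming individual.dta is already loaded and contains cons and nmwage; the if !missing() qualifier is an addition not shown in the slides, included because Stata treats missing values as larger than any number, so a bare comparison would place households with missing income in the rich group.

```stata
* Remove any earlier version of the indicator so the sketch can be rerun
capture drop rich

* 1 if income is above 1000, 0 otherwise; missing incomes stay missing
generate rich = (nmwage > 1000) if !missing(nmwage)

* Average consumption and average income within each group
summarize cons nmwage if rich == 1
summarize cons nmwage if rich == 0
```

The Mean column of each summarize table gives the conditional averages used in the calculation.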
Conditional Expectation • As you move from “poor” to “rich” your income rises by: 1611 - 711 = 900 • And consumption rises by: 1297 - 599 = 698 • So an estimate of the MPC would be 698/900, which is about 0.78 • This is a simple and intuitive method that builds on the logic of the gender example • But…
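Written out with the unrounded group means reported on the previous slide, the two-group estimate is the ratio of the difference in average consumption to the difference in average income. The hat notation is only a label used here for the estimate.

```latex
\[
\widehat{\mathrm{MPC}}
  = \frac{E(\mathrm{cons}\mid \mathrm{Rich}) - E(\mathrm{cons}\mid \mathrm{Poor})}
         {E(\mathrm{nmwage}\mid \mathrm{Rich}) - E(\mathrm{nmwage}\mid \mathrm{Poor})}
  = \frac{1297.3 - 599.89}{1611.698 - 711.9268}
  = \frac{697.41}{899.7712}
  \approx 0.775
\]
```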
Obvious Problem • The division between rich and poor was entirely arbitrary • Not natural like gender • We throw away information by forcing individuals into one group or another • Why not have 3 groups, or any number of groups you like? • Intuitively, the more the better • 10 group example • But large numbers of groups would make calculations tedious and would always leave out some information
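A ten-group version of the same comparison might look like the sketch below. The slides do not say how the ten groups were formed, so splitting households into equal-sized income deciles is an assumption, and incgroup is a hypothetical variable name.

```stata
* Assign each household to one of 10 equal-sized income groups (deciles of nmwage)
xtile incgroup = nmwage, nq(10)

* Mean consumption and mean income within each group
tabstat cons nmwage, by(incgroup) statistics(mean)
```

Comparing adjacent rows gives nine separate two-group estimates of the MPC, which shows why the calculation becomes tedious as the number of groups grows.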
Compromise • Imagine there are an infinity of groups but the conditional means are all related • Specifically they have a linear relationship • E(cons|nmwage) = a + b*nmwage • From now on we will write in more general notation • $E(Y|X) = \beta_1 + \beta_2 X$
Comment • Note this is a restriction and it may not be true in the real world • We impose it on the model • Looks reasonable in the consumption example • If it isn't true then there might be a problem • Linear approximation • GIGO (garbage in, garbage out) • The relationship doesn’t have to be linear but it does have to be parametric • We will see more on this later
So to Recap… • We have data that appears to illustrate a relationship between two variables • Intuitively we will put a line through the data that represents the data in some way • What way? Two ways: • the line links all the conditional means • We choose the particular line that is closest to the data in a defined way • These turn out to be the same
Draw a line to represent the data • Show three data points for illustration
An Explanation • Change in notation to be more general • Y is the LHS or dependent variable • X is the RHS or independent variable • $E(Y|X_i)$ is the conditional mean, i.e. it does not describe every observation • $Y_i = E(Y|X_i) + u_i$ • $u_i$ represents the deviation of each individual observation from the conditional mean • Substituting the linear conditional mean: $Y_i = \beta_1 + \beta_2 X_i + u_i$
What is $u_i$? • Any factor other than income (X) which influences consumption (Y) • Individual tastes and unpredictability • Approximation error because of the assumption of a linear relationship • Later we will model this as a random variable • Perhaps with a normal distribution • Remember our warnings about the bell curve
OLS Estimation • Find the line of “best fit” • Use the method of Ordinary Least Squares (OLS) to estimate $\beta_1$ and $\beta_2$ • Objective: find estimates of $\beta_1$ and $\beta_2$ that minimise the distance between the regression line and the actual data points, i.e. make the deviations $u_i$ as small as possible • Minimise the sum of squared deviations, i.e. $\min \sum_i u_i^2$ • Aside: why not absolute deviations or others?
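Algebra of OLS • $\min \sum_i u_i^2$, i.e. $\min (u_1^2 + u_2^2 + u_3^2 + \dots + u_n^2)$ • $Y_i = \beta_1 + \beta_2 X_i + u_i \Rightarrow u_i = Y_i - \beta_1 - \beta_2 X_i$ • $\sum_i u_i^2 = \sum_i (Y_i - \beta_1 - \beta_2 X_i)^2 = S(\beta_1, \beta_2)$ • So the sum of squared errors is a function of $\beta_1$ and $\beta_2$ • $\min S(\beta_1, \beta_2) = \min \sum_i (Y_i - \beta_1 - \beta_2 X_i)^2$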
To find the minimum of a function: differentiate with respect to each argument and set the derivative equal to 0, i.e. find the point where the slope with respect to each argument is 0.
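Applied to the sum of squared deviations $S(\beta_1, \beta_2) = \sum_i (Y_i - \beta_1 - \beta_2 X_i)^2$, this recipe gives two first-order conditions. The block below is a standard textbook derivation, not taken from the slides; $\bar{X}$ and $\bar{Y}$ denote sample means and $b_1$, $b_2$ denote the values of $\beta_1$, $\beta_2$ that satisfy both conditions.

```latex
\begin{align*}
\frac{\partial S}{\partial \beta_1} &= -2 \sum_i \left( Y_i - \beta_1 - \beta_2 X_i \right) = 0 \\
\frac{\partial S}{\partial \beta_2} &= -2 \sum_i X_i \left( Y_i - \beta_1 - \beta_2 X_i \right) = 0
\end{align*}
Solving the two equations together:
\begin{align*}
b_2 &= \frac{\sum_i (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_i (X_i - \bar{X})^2} \\
b_1 &= \bar{Y} - b_2 \bar{X}
\end{align*}
```

The second line shows that the fitted line always passes through the point of sample means $(\bar{X}, \bar{Y})$.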
An Explanation • $b_1$, $b_2$ are the Ordinary Least Squares (OLS) estimators of the true population parameters $\beta_1$, $\beta_2$ • $b_2$ is the estimator of the slope coefficient: the slope coefficient measures the effect on Y of a one-unit change in X • $b_1$ is the estimator of the intercept: the value of Y that occurs if X = 0
OLS in Stata • regress cons nmwage

      Source |       SS       df       MS              Number of obs =    1330
-------------+------------------------------           F(  1,  1328) =  605.97
       Model |  98124170.1     1  98124170.1           Prob > F      =  0.0000
    Residual |   215041332  1328  161928.714           R-squared     =  0.3133
-------------+------------------------------           Adj R-squared =  0.3128
       Total |   313165502  1329  235639.956           Root MSE      =   402.4

------------------------------------------------------------------------------
        cons |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
      nmwage |   .7562304   .0307205    24.62   0.000     .6959644    .8164964
       _cons |   62.47876    25.9165     2.41   0.016     11.63701    113.3205
------------------------------------------------------------------------------

The Coef. column holds the estimated coefficients.
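Two of the reported statistics can be recovered by hand from other entries in the same output, which is a useful check on how to read the table. The t statistic for nmwage is the coefficient divided by its standard error, and R-squared is the Model sum of squares divided by the Total sum of squares.

```latex
\[
t = \frac{0.7562304}{0.0307205} \approx 24.62,
\qquad
R^2 = \frac{98124170.1}{313165502} \approx 0.3133
\]
```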
The Answer • The regression gives us a measure of the MPC • The OLS estimate of the MPC is 0.756 • What use is this? • Prediction • Causation • Statistical inference
Prediction • We can use this to make predictions • What would consumption be if income were 2500? • cons = 62.47876 + 0.7562304*2500 • This is equal to 1953 • Be careful: this is the predicted conditional mean • It is the point on the line at an income of 2500 • What people with an income of 2500 would consume on average • What they actually will consume is unknown because we don’t observe their $u_i$
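The same prediction can be computed from the stored regression results rather than by retyping the coefficients. This is a sketch assuming the data and variable names used in these slides; lincom is an addition not mentioned in the lecture, shown because it also reports a standard error for the predicted conditional mean.

```stata
* Fit the consumption function
regress cons nmwage

* Predicted conditional mean of consumption at an income of 2500
display _b[_cons] + _b[nmwage]*2500

* The same point prediction with a standard error and confidence interval
lincom _cons + 2500*nmwage
```

The interval from lincom describes uncertainty about the average consumption at that income, not about what any one household will consume.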
Predicted Consumption • [Figure: actual consumption and predicted consumption]
Causation • Remember all this only really identifies variables that move together • It doesn’t show causation • Need theory for that • Obvious in the gender example (wages don’t cause changes in gender) • Not obvious here: causation can run both ways
Statistical Inference • This estimate is generated from a sample • Recall that the issue is whether we can use this fact about the sample to make statements about the world (“population”) • The same issues of statistical inference arise in context of regression • OLS estimates are sample statistics just like the sample average wages in the gender example
More on the Residual ($u_i$) • The residual is the difference between the line (conditional expectation) and the actual data • Think of every individual's consumption as being made up of two bits • Conditional expectation • Residual • The conditional expectation is the same for everyone with the same X (income) • The residual is potentially different even for those with the same income
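The two bits can be computed for every household after running the regression. This is a minimal sketch assuming the data from these slides; conshat and uhat are hypothetical variable names.

```stata
* Fit the consumption function
regress cons nmwage

* Fitted values: the estimated conditional expectation for each household
predict conshat, xb

* Residuals: actual consumption minus the fitted value
predict uhat, residuals

* With an intercept in the model, OLS residuals average to zero
summarize uhat

* Inspect both parts for the first few households
list nmwage cons conshat uhat in 1/5
```

For any household, cons equals conshat plus uhat, so the listing shows the decomposition directly.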
Random Variable • The residual is unknown in advance so we model it as a random variable • Think of consumption as being determined by a systematic bit plus a roll of a die • See diagram • Actual consumption (expectation + residual) is distributed around the mean • All the means are linked
Empirical Distribution • We can use the hist command in Stata to look at this • Just as we got the distribution of hwage for men and women • hist cons, by(rich) norm • We could do the same for any income group • hist cons if nmwage<1100 & nmwage>900, norm • All OLS does is draw a line through all the means • Imagine laying all these distributions side by side
Linking the Means • We assume that there is a linear relationship between the means of all these distributions • Imagine taking each and lining them up in order of their average • Get the next diagram
Putting it all together • We usually assume that the residual is a normal random variable • Seems reasonable in this case • But remember our concerns about the normal distribution • So the full model is • $Y_i = \beta_1 + \beta_2 X_i + u_i$ • where $E(Y|X_i) = \beta_1 + \beta_2 X_i$ • and $u_i \sim N(0, \sigma^2)$
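One way to get a feel for the full model is to simulate data from it and check that OLS recovers the parameters. Everything in the sketch below is made up for illustration: the sample size, the income range, the “true” values 60 and 0.75, and the residual standard deviation of 400 are assumptions, not estimates from the lecture's data. Note that clear discards whatever data is in memory.

```stata
* Start from an empty dataset with a fixed seed so results are reproducible
clear
set seed 12345
set obs 1330

* Hypothetical income values between 500 and 2500
generate x = 500 + 2000*runiform()

* Normal residual with mean 0 and standard deviation 400 (made-up value)
generate u = rnormal(0, 400)

* Outcome built from made-up "true" parameters: beta1 = 60, beta2 = 0.75
generate y = 60 + 0.75*x + u

* OLS estimates should land close to 60 and 0.75, but not exactly on them
regress y x
```

Changing the seed gives a different sample and slightly different estimates, which is the sampling variation that statistical inference has to deal with.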