Lesson 11: Regressions Part II
Does watching television rot your mind?
• Zavodny, Madeline (2006): "Does watching television rot your mind? Estimates of the effect on test scores," Economics of Education Review, 25(5): 565–573.
• Television is one of the most omnipresent features of Americans' lives. The average American adult watches about 15 hours of television per week, accounting for almost one-half of free time.
• The substantial amount of time that most individuals spend watching television makes it important to examine its effects on society, including human capital accumulation and academic achievement.
Data & Regression model • This analysis uses three data sets to examine therelationship between television viewing and testscores: the National Longitudinal Survey of Youth1979 (NLSY), the HSB survey and the NELS. Eachsurvey includes test scores and a question about thenumber of hours of television watched by youngadults. Test score of individual i at time t
Regression results **p<0.01; *p<0.05; †p<0.1
Multiple Linear Regression Model
• The relationship between the variables is a linear function:
Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk + e
• Y: dependent (response) variable
• X1, …, Xk: independent (explanatory) variables
• b0: Y intercept
• b1, …, bk: slopes
• e: random error
Finance Application: multifactor pricing model
• It is assumed that the rate of return on a stock (R) is linearly related to the rate of return on some factor and the rate of return on the overall market (Rm):
Rit = b0 + bo Rot + b1 Rmt + e
• Rit: rate of return on a particular oil company stock i at time t
• Rot: rate of return on crude oil price at time t
• Rmt: rate of return on some major stock index at time t
Estimation by Method of moments: number of moment conditions needed
Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk + e
• Assumption #1: E(e) = 0 implies E(y) – b0 – b1E(x1) – b2E(x2) – … – bkE(xk) = 0
• Assumption #2: E(ex1) = 0 implies E[(y – b0 – b1x1 – … – bkxk)x1] = 0. Since Cov(e, x1) = E(ex1) – E(e)E(x1) = E(ex1), this assumption really says that e and x1 are uncorrelated.
• Assumption #3: E(ex2) = 0
• Assumption #4: E(ex3) = 0
• …
• Assumption #k+1: E(exk) = 0
• There are k+1 parameters to estimate, so we need k+1 moment conditions.
Estimation of b0, b1, b2, …, bk: Method of moments
• Two approaches:
• Solve b0, b1, b2, …, bk from the k+1 moment conditions in terms of covariances, variances and means; then plug in the sample analogs of these covariances, variances and means to produce the sample estimates b0, b1, b2, …, bk.
• Solve b0, b1, b2, …, bk directly from the sample analogs of the k+1 moment conditions, as in the sketch below.
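To make the second approach concrete, here is a minimal Python sketch on simulated data (the data and true coefficient values are illustrative, not from the lesson). The sample analogs of the k+1 moment conditions are exactly the normal equations (X'X)b = X'y:

```python
import numpy as np

# Approach 2 sketch: solve the sample analogs of the k+1 moment
# conditions directly. Data and true coefficients are simulated
# for illustration only.
rng = np.random.default_rng(0)
n, k = 200, 2
x = rng.normal(size=(n, k))
y = 1.0 + 0.5 * x[:, 0] - 2.0 * x[:, 1] + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])   # column of ones carries b0

# Sample moment conditions: (1/n) X'(y - Xb) = 0  <=>  (X'X) b = X'y
b = np.linalg.solve(X.T @ X, X.T @ y)
print(b)   # close to the true values (1.0, 0.5, -2.0)
```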
Estimation of b0, b1, b2, …, bk: Maximum Likelihood
• Assume the ei are independent and identically distributed, with a normal distribution of zero mean and variance s2. Denote the normal density for e by f(e) = f(y – b0 – b1x1 – b2x2 – … – bkxk).
• Choose b0, b1, b2, …, bk to maximize the joint likelihood:
L(b0, b1, b2, …, bk) = f(e1)·f(e2)·…·f(en)
To estimate b0, b1, …, bk using ML (Computer)
• We do not know b0, b1, b2, …, bk, nor do we know the ei; in fact, our objective is to estimate b0, b1, b2, …, bk.
• The ML procedure:
• Assume a combination of values for b0, b1, b2, …, bk. Compute the implied ei = yi – b0 – b1x1i – b2x2i – … – bkxki and f(ei) = f(yi – b0 – b1x1i – b2x2i – … – bkxki).
• Compute the joint likelihood conditional on the assumed values of b0, b1, b2, …, bk: L(b0, b1, b2, …, bk) = f(e1)·f(e2)·…·f(en)
• Try many more combinations of b0, b1, b2, …, bk and repeat the above two steps, using a computer program (such as Excel), as sketched below.
• Choose the combination of b0, b1, b2, …, bk that yields the largest joint likelihood.
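A minimal sketch of this search in Python, on simulated data; a numerical optimizer plays the role of "trying many combinations", and the log of the likelihood is maximized instead of the product of densities, which leaves the maximizer unchanged:

```python
import numpy as np
from scipy.optimize import minimize

# Simulated data for illustration (not from the lesson).
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 2))
y = 1.0 + 0.5 * x[:, 0] - 2.0 * x[:, 1] + rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

def neg_log_likelihood(theta):
    *b, log_sigma = theta
    sigma = np.exp(log_sigma)        # parameterize to keep sigma > 0
    e = y - X @ np.asarray(b)        # implied residuals for this guess
    # minus the log of f(e1)*...*f(en) for normal errors:
    return 0.5 * n * np.log(2 * np.pi * sigma**2) + np.sum(e**2) / (2 * sigma**2)

res = minimize(neg_log_likelihood, x0=np.zeros(X.shape[1] + 1))
print(res.x[:-1])   # ML estimates of b0, b1, b2
```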
To estimate b0, b1, …, bk using ML (Calculus)
• The ML procedure:
• Assume a combination of values for b0, b1, b2, …, bk. Compute the implied ei = yi – b0 – b1x1i – b2x2i – … – bkxki and f(ei) = f(yi – b0 – b1x1i – b2x2i – … – bkxki).
• Compute the joint likelihood conditional on the assumed values: L(b0, b1, b2, …, bk) = f(e1)·f(e2)·…·f(en)
• Choose b0, b1, b2, …, bk to maximize the likelihood function L(b0, b1, b2, …, bk) using calculus:
• Take the first derivative of L with respect to b0 and set it to zero.
• Take the first derivative of L with respect to each bj and set it to zero.
• Solve for b0, b1, b2, …, bk from the resulting k+1 equations, as written out below.
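For reference, the calculus version in formulas: since only the sum of squared errors depends on the b's, maximizing the log-likelihood is the same as minimizing the sum of squared errors, and the first-order conditions are the sample analogs of the k+1 moment conditions:

```latex
% Log-likelihood for normal errors:
\ln L(b_0,\dots,b_k)
  = -\frac{n}{2}\ln(2\pi\sigma^2)
    - \frac{1}{2\sigma^2}\sum_{i=1}^{n}
      \bigl(y_i - b_0 - b_1 x_{1i} - \dots - b_k x_{ki}\bigr)^2

% Setting the k+1 derivatives to zero gives the sample moment conditions:
\frac{\partial \ln L}{\partial b_0}
  = \frac{1}{\sigma^2}\sum_{i=1}^{n} e_i = 0,
\qquad
\frac{\partial \ln L}{\partial b_j}
  = \frac{1}{\sigma^2}\sum_{i=1}^{n} e_i\, x_{ji} = 0,
\quad j = 1,\dots,k
```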
Estimation: Ordinary least squares
• For each value of X, there is a group of Y values, and these Y values are normally distributed: Yi ~ N(E(Y|X1, X2, …, Xk), σi2), i = 1, 2, …, n
• The means of these normal distributions of Y values all lie on the straight line of regression: E(Y|X1, X2, …, Xk) = β0 + β1X1 + β2X2 + … + βkXk
• The standard deviations of these normal distributions are equal: σi2 = σ2, i = 1, 2, …, n, i.e., homoskedasticity.
Choosing the line that fits best: Ordinary Least Squares (OLS) Principle
• Straight lines can be described generally by yi = b0 + b1x1i + b2x2i + … + bkxki, i = 1, …, n
• Finding the best line with the smallest sum of squared differences is the same as
Min S(b0, b1, …, bk) = Σ[yi – (b0 + b1x1i + b2x2i + … + bkxki)]2
• It can be shown that the minimization yields the same sample moment conditions as discussed earlier in the method of moments; a library routine that performs the minimization is sketched below.
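A one-call version of the minimization, for illustration (simulated data again): NumPy's lstsq minimizes the sum of squared residuals directly and returns the same estimates as the moment-condition solution above.

```python
import numpy as np

# lstsq minimizes sum((y - X b)^2) over b, i.e. the S(b0, ..., bk)
# on the slide. Data are simulated for illustration only.
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 0.5, -2.0]) + rng.normal(size=50)

b, sse, rank, _ = np.linalg.lstsq(X, y, rcond=None)
print(b)    # estimates b0, b1, b2
print(sse)  # the minimized sum of squared residuals
```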
It can be shown that the OLS estimators are BLUE:
• Best: smallest variance
• Linear: linear combinations of the yi
• Unbiased: E(b0) = β0, E(b1) = β1, and so on
• Estimators
Interpretation of Coefficients
yi = b0 + b1x1i + b2x2i + … + bkxki + ei; Prediction: y* = b0 + b1x1 + b2x2 + … + bkxk
• Slope (bj): estimated Y changes by bj for each 1-unit increase in Xj, holding the other variables constant:
y* + Δy = b0 + b1x1 + … + bj(xj + 1) + … + bkxk, so Δy = bj
More generally, y* + Δy = b0 + b1x1 + … + bj(xj + Δxj) + … + bkxk, so Δy = bjΔxj and Δy/Δxj = bj
• Y-Intercept (b0): estimated value of Y when X1 = X2 = … = Xk = 0
Parameter Estimation Example
• You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) and newspaper circulation (000) on the number of ad responses (00).
• You've collected the following data:
Resp (y)  Size (x1)  Circ (x2)
1         1          2
4         8          8
1         3          1
3         5          7
2         6          4
4         10         6
Parameter Estimation Computer Output
Variable   DF  Parameter Estimate  Standard Error  T for H0: Param=0  Prob>|T|
INTERCEP   1   0.0640              0.2599          0.246              0.8214
ADSIZE     1   0.2049              0.0588          3.656              0.0399
CIRC       1   0.2805              0.0686          4.089              0.0264
• Slope (b1): the number of ad responses is expected to increase by .2049 (20.49) for each 1 sq. in. increase in ad size, holding circulation constant.
• Slope (b2): the number of ad responses is expected to increase by .2805 (28.05) for each 1-unit (1,000) increase in circulation, holding ad size constant.
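The output above can be reproduced (up to rounding) from the six observations with any regression routine; here is a sketch using statsmodels:

```python
import numpy as np
import statsmodels.api as sm

# The six observations from the slide: responses (00), ad size (sq. in.),
# circulation (000). OLS should reproduce the printed estimates
# (intercept 0.0640, ADSIZE 0.2049, CIRC 0.2805) up to rounding.
resp = np.array([1, 4, 1, 3, 2, 4])
size = np.array([1, 8, 3, 5, 6, 10])
circ = np.array([2, 8, 1, 7, 4, 6])

X = sm.add_constant(np.column_stack([size, circ]))
model = sm.OLS(resp, X).fit()
print(model.params)   # b0, b1, b2
print(model.tvalues)  # t statistics for H0: coefficient = 0
print(model.pvalues)
```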
Interpreting the Standard Error of the Estimate • Assumptions: • Observed Y values are normally distributed around each estimated value of Y* • Constant variance • se measures the dispersion of the points around the regression line • If se = 0, equation is a “perfect” estimator • se may be used to compute confidence intervals of the estimated value
Test of Slope Coefficient (bj)
• Tests whether there is a linear relationship between Xj and Y after the other variables are controlled for.
• Involves the population slope βj.
• Hypotheses:
• H0: βj = 0 (Xj should not appear in the linear relationship)
• H1: βj ≠ 0
• The theoretical basis is the sampling distribution of slopes.
Basis for Inference About the Population Regression Slope
• Let βj be a population regression slope and bj its least squares estimate based on n data points. Then, if the standard regression assumptions hold and it can also be assumed that the errors εi are normally distributed, the random variable t = (bj – βj) / sbj is distributed as Student's t with (n – k – 1) degrees of freedom. In addition, the central limit theorem enables us to conclude that this result is approximately valid for a wide range of non-normal distributions and large sample sizes n.
Confidence Intervals for the Population Regression Slope βj
• If the regression errors εi are normally distributed and the standard regression assumptions hold, a 100(1 – α)% confidence interval for the population regression slope βj is given by
bj – t(n–k–1),α/2 sbj < βj < bj + t(n–k–1),α/2 sbj
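A small sketch of this interval in Python, using the ADSIZE slope from the advertising example above (b1 = 0.2049, sb1 = 0.0588, n = 6, k = 2):

```python
from scipy import stats

# Confidence interval: b1 +/- t(n-k-1, alpha/2) * s_b1
b1, s_b1, n, k, alpha = 0.2049, 0.0588, 6, 2, 0.05

t_crit = stats.t.ppf(1 - alpha / 2, df=n - k - 1)
lower, upper = b1 - t_crit * s_b1, b1 + t_crit * s_b1
print(f"95% CI for beta_1: ({lower:.4f}, {upper:.4f})")
```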
Some cautions about the interpretation of significance tests
• Rejecting H0: βj = 0 and concluding that the relationship between xj and y is significant does not enable us to conclude that a cause-and-effect relationship is present between xj and y.
• Establishing causation requires:
• association
• an accurate time sequence
• ruling out other explanations for the correlation
• Correlation ≠ Causation
Some cautions about the interpretation of significance tests
• Just because we are able to reject H0: βj = 0 and demonstrate statistical significance does not enable us to conclude that the relationship between x and y is linear.
• A linear relationship is a very small subset of the possible relationships among variables.
• A test of linear versus nonlinear relationships requires additional analysis.
Evaluating the Model
yi = b0 + b1x1i + b2x2i + … + bkxki + ei
• Are the assumptions valid?
• Assumption #1: The relationship is linear.
• Assumption #2: All relevant explanatory variables are included.
• Assumption #3: The explanatory variables are uncorrelated with the error term.
• Assumption #4: The error term has a constant variance.
• Assumption #5: The errors are independent of each other.
Measures of Variation in Regression
• Total Sum of Squares (SST): measures variation of the observed Yi around the mean Ȳ
• Explained Variation (SSR): variation due to the relationship between X and Y
• Unexplained Variation (SSE): variation due to other factors
• SST = SSR + SSE
Variation in y (SST) = SSR + SSE
• SST: Σ(yi – ȳ)2 = Σ(yi* – ȳ)2 + Σ(yi – yi*)2 = SSR + SSE
• The cross-product term 2Σ(yi* – ȳ)(yi – yi*) = 0, as imposed in the estimation by the moment conditions E(ex) = 0.
Variation Measures (diagram)
• Total Sum of Squares: SST = Σ(Yi – Ȳ)2
• Explained Sum of Squares: SSR = Σ(Yi* – Ȳ)2
• Unexplained Sum of Squares: SSE = Σ(Yi – Yi*)2
• Fitted line: yi* = b0 + b1xi
Variation in y (SST) = SSR + SSE
• R2 (= r2, the coefficient of determination) measures the proportion of the variation in y that is explained by the variation in x: R2 = SSR/SST = 1 – SSE/SST
• R2 takes on any value between zero and one.
• R2 = 1: perfect match between the line and the data points.
• R2 = 0: there is no linear relationship between x and y.
Adjusted R-square
• (Unadjusted) R-square increases with the number of variables included.
• Thus, using R-square as the measure, we would always conclude that a model with more variables is better.
• However, adding a new variable is costly: an additional variable may add to the uncertainty of estimating y.
• Thus, we would like a measure that penalizes the addition of variables: adjusted R2 = 1 – (1 – R2)(n – 1)/(n – k – 1).
• For a fixed R2, adjusted R2 decreases with k; for a fixed k, adjusted R2 increases with R2, as the sketch below illustrates.
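A minimal sketch of both measures, using the sums of squares from the ANOVA table of Example 1 later in the lesson (SSE = 2,623,764, SST = 13,386,667, n = 12, k = 3), which reproduces the printed R-Sq = 80.4% and R-Sq(adj) = 73.1%:

```python
# R-square and adjusted R-square from the sums of squares. The penalty
# factor (n - 1) / (n - k - 1) grows with k, so for a fixed R-square
# the adjusted value falls as variables are added.
def r_squared(sse, sst):
    return 1 - sse / sst

def adjusted_r_squared(sse, sst, n, k):
    return 1 - (sse / (n - k - 1)) / (sst / (n - 1))

print(r_squared(2623764, 13386667))                   # ~0.804
print(adjusted_r_squared(2623764, 13386667, 12, 3))   # ~0.731
```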
International price discrimination • Cabolis, Christos, Sofronis Clerides, Ioannis Ioannou and Daniel Senft (2007): “A textbook example of international price discrimination,” Economics Letters, 95(1): 91-95.
Motivation
• International price comparisons have a long history in economics. Macroeconomists have used them extensively to test for purchasing power parity and the law of one price. International trade economists have been interested in international price differences as evidence of trade barriers, while industrial organization economists have studied issues of market structure. The popular and business press have also shown a keen interest and frequently report intercity price comparisons for standardized products such as the Big Mac or a Starbucks cappuccino.
• The paper documents the existence of very large differences in the prices of textbooks across countries.
Data
• Our data were collected from the Internet sites of Amazon.com in two distinct phases. In May 2002 we collected information on prices and characteristics of 268 books that were on sale on both the US and UK websites of Amazon, Inc. This data set includes both textbooks and general audience books and we refer to it as our "broad sample". In December 2002, we collected additional data on economics textbooks; this is our "econ sample". In this phase, we broadened our sample by including Canada in the search and collected more detailed information about each book.
• We tested for price differences by running a simple hedonic regression of price on book characteristics and on dummy variables that aim to capture differences across countries and book types (sketched below).
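A hedged sketch of such a hedonic regression in Python; the file name and the column names (price, pages, hardcover, country, textbook) are hypothetical stand-ins, not the authors' actual variables:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hedonic regression: log price on book characteristics plus country
# and book-type dummies. Data file and columns are hypothetical.
books = pd.read_csv("amazon_books.csv")

model = smf.ols(
    "np.log(price) ~ pages + hardcover + C(country) + C(textbook)",
    data=books,
).fit()
print(model.summary())   # country dummies capture cross-country price gaps
```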
Estimates from the broad sample (dependent variable: ln(p))
• Notes: Coefficients that are statistically different from zero at 5% and 1% are marked with "*" and "**", respectively.
Estimates from the Economics sample (dependent variable: ln(p))
• Notes: Coefficients that are statistically different from zero at 5% and 1% are marked with "*" and "**", respectively.
Testing for Linearity
• Key argument:
• If the value of y does not change linearly with the value of x, then the mean of y is the best predictor for the actual value of y. This implies y* = ȳ is preferable.
• If the value of y does change linearly with the value of x, then the regression model gives a better prediction for the value of y than the mean of y. This implies y = y* is preferable.
Testing for Linearity
• The Global F-test:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
H1: at least one βi ≠ 0 (at least one independent variable affects Y)
• Under the null, SSR is either zero or very small!
• Test statistic: F = (SSR/k) / (SSE/(n – k – 1)), distributed with k numerator degrees of freedom and n – k – 1 denominator degrees of freedom. Reject H0 if F > Fk,n–k–1,α.
• [Variation in y] = SSR + SSE. A large F results from a large SSR; then much of the variation in y is explained by the regression model, so the null hypothesis is rejected and the model is valid.
F-Test for Overall Significance
• Example: k = 2 regressors, with 2 and 12 degrees of freedom for the F-test (so n = 15); the p-value for the F-test comes from the regression output.
• H0: β1 = β2 = 0; H1: β1 and β2 not both zero; α = .05, df1 = 2, df2 = 12
• Critical value: F.05 = 3.885
• Decision: since the F test statistic falls in the rejection region (p-value < .05), reject H0.
• Conclusion: there is evidence that at least one independent variable affects Y.
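The critical value and p-value in this test come straight from the F distribution; a small sketch (the observed F value below is hypothetical, since the slide's output table did not survive extraction):

```python
from scipy import stats

# k = 2 regressors, df1 = 2 and df2 = n - k - 1 = 12, as in the slide.
alpha, df1, df2 = 0.05, 2, 12

f_crit = stats.f.ppf(1 - alpha, df1, df2)
print(f_crit)   # ~3.885, the critical value shown above

# For an observed F statistic, the p-value is the upper-tail area:
f_obs = 6.0     # hypothetical value, for illustration only
print(stats.f.sf(f_obs, df1, df2))   # reject H0 if this is below alpha
```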
Tests on a Subset of Regression Coefficients
• Consider a multiple regression model involving variables xj and zj, and the null hypothesis that the z-variable coefficients are all zero:
yi = b0 + b1x1i + … + bkxki + a1z1i + … + arzri + ei
H0: a1 = a2 = … = ar = 0
H1: at least one aj ≠ 0 (j = 1, …, r)
• Under the null, the SSR due to the z's is either zero or very small!
Tests on a Subset of Regression Coefficients
• Goal: compare the error sum of squares for the complete model with the error sum of squares for the restricted model.
• First run a regression for the complete model and obtain SSE.
• Next run a restricted regression that excludes the z variables (the number of variables excluded is r) and obtain the restricted error sum of squares SSE(r).
• Compute the F statistic F = [(SSE(r) – SSE)/r] / se2 and apply the decision rule for a significance level α.
• Note: se2 = SSE/(n – k – 1), where k here counts all variables (the x's and the z's) in the complete model.
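A minimal sketch of the subset test on simulated data (variable names illustrative); it computes the F statistic from the two error sums of squares and cross-checks it against statsmodels' built-in comparison:

```python
import numpy as np
import statsmodels.api as sm

# Full model: constant + k x-variables + r z-variables.
# Restricted model drops the z's. Data are simulated so the z's
# are truly irrelevant.
rng = np.random.default_rng(0)
n, k, r = 100, 2, 2
x = rng.normal(size=(n, k))
z = rng.normal(size=(n, r))
y = 1 + x @ np.array([0.5, -1.0]) + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x, z]))).fit()
restricted = sm.OLS(y, sm.add_constant(x)).fit()

# F = [(SSE(r) - SSE)/r] / [SSE/(n - (k+r) - 1)]
f_stat = ((restricted.ssr - full.ssr) / r) / (full.ssr / (n - k - r - 1))
print(f_stat)

# statsmodels performs the same comparison directly:
print(full.compare_f_test(restricted))   # (F, p-value, df_diff)
```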
EXAMPLE 1 • A market researcher for Super Dollar Super Markets is studying the yearly amount families of four or more spend on food. Three independent variables are thought to be related to yearly food expenditures (Food). Those variables are: total family income (Income) in $00, size of family (Size), and whether the family has children in college (College).
Example 1 continued
Note the following regarding the regression equation.
• The variable College is called a dummy or indicator variable. It can take only one of two possible outcomes: a child either is a college student or is not.
• Other examples of dummy variables include:
• gender,
• whether a part is acceptable or unacceptable,
• whether a voter will or will not vote for the incumbent governor.
• We usually code one value of the dummy variable as "1" and the other as "0."
EXAMPLE 1 continued
• Use a computer software package, such as Excel, to develop a correlation matrix.
• From the analysis provided by Excel, write out the regression equation: Y* = 954 + 1.09X1 + 748X2 + 565X3
• What food expenditure would you estimate for a family of 4, with no college students, and an income of $50,000 (which is input as 500)?
EXAMPLE 1 continued
The regression equation is: Food = 954 + 1.09 Income + 748 Size + 565 Student
Predictor  Coef   SE Coef  T     P
Constant   954    1581     0.60  0.563
Income     1.092  3.153    0.35  0.738
Size       748.4  303.0    2.47  0.039
Student    564.5  495.1    1.14  0.287
S = 572.7   R-Sq = 80.4%   R-Sq(adj) = 73.1%
Analysis of Variance
Source          DF  SS        MS       F      P
Regression      3   10762903  3587634  10.94  0.003
Residual Error  8   2623764   327970
Total           11  13386667
EXAMPLE 1 continued
From the regression output we note:
• The coefficient of determination is 80.4 percent. This means that more than 80 percent of the variation in the amount spent on food is accounted for by the variables income, family size, and student.
• Each additional $100 of income per year increases the estimated amount spent on food by about $1.09 per year (income is coded in hundreds of dollars).
• An additional family member will increase the amount spent per year on food by $748.
• A family with a college student will spend $565 more per year on food than one without a college student.
EXAMPLE 1 continued The estimated food expenditure for a family of 4 with a $500 (that is $50,000) income and no college student is $4,491. Y* = 954 + 1.09(500) + 748(4) + 565 (0) = 4491
EXAMPLE 1 continued
• Conduct a global test of hypothesis to determine whether any of the regression coefficients differ from zero: H0: β1 = β2 = β3 = 0.
• H0 is rejected if F > 4.07 (the critical value with 3 and 8 degrees of freedom at α = .05).
• From the computer output, the computed value of F is 10.94.
• Decision: H0 is rejected. Not all the regression coefficients are zero.
EXAMPLE 1 continued
• Conduct an individual test to determine which coefficients are not zero; for family size, the hypotheses are H0: βSize = 0 versus H1: βSize ≠ 0.
• Using the 5% level of significance, reject H0 if the p-value < .05.
• From the computer output, the only significant variable is Size (family size) based on the p-values; the check below reproduces them from the printed t statistics.
• The other variables can be omitted from the model.
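As a check, the printed p-values can be recovered from the t statistics and the residual degrees of freedom (n – k – 1 = 12 – 3 – 1 = 8); a short sketch:

```python
from scipy import stats

# Two-sided p-values from the t statistics in the output above,
# with 8 residual degrees of freedom.
for name, t in [("Income", 0.35), ("Size", 2.47), ("Student", 1.14)]:
    p = 2 * stats.t.sf(abs(t), df=8)
    print(name, round(p, 3))   # only Size falls below .05
```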