Univariate Linear Regression Problem • Model: Y=b0+b1X+e • Test: H0: β1=0. • Alternative: H1: β1>0. • The distribution of Y is normal under both null and alternative. • Under null, $\operatorname{var}(Y) = \sigma_0^2$. • Under alternative, β1>0, and $\operatorname{var}(Y) = \sigma_1^2$.
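Step 1: Choose the test statistic and specify its null distribution • The test statistic is $B_1$, the OLS estimate of the slope. • Use the conditions of the null to find: $B_1 \sim N\!\left(0,\ \sigma_0^2 / \sum_i (x_i - \bar{x})^2\right)$.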
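Bringing sample size into regression design • The sample size n is hidden in the regression results: it enters only through $\sum_i (x_i - \bar{x})^2$. That is, let: $\sum_{i=1}^{n} (x_i - \bar{x})^2 = n\sigma_X^2$, so that under the null $\operatorname{sd}(B_1) = \sigma_0 / (\sigma_X \sqrt{n})$.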
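Step 2: Define the critical value • For the univariate linear regression test: $c = E_0 + |z_\alpha| \dfrac{\sigma_0}{\sigma_X \sqrt{n}}$, where $E_0 = 0$ is the expected value of $B_1$ under the null.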
Step 3: Define the Rejection Rule • Each test is a right-sided test, so the rule is to reject $H_0$ when the test statistic is greater than the critical value, that is, when $B_1 > c$.
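Step 4: Specify the Distribution of Test Statistic under Alternative • Use the conditions of the alternative to find: $B_1 \sim N\!\left(E_1,\ \dfrac{\sigma_1^2}{n\sigma_X^2}\right)$, where $E_1 = \beta_1 > 0$.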
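Step 5: Define a Type II Error • For the univariate linear regression test, a Type II error is accepting $H_0$ when $H_1$ is true, that is, observing $B_1 \le c$ when $\beta_1 = E_1$.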
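Step 6: Find β • For a univariate linear regression test: $\beta = \Pr_1\{B_1 \le c\} = \Phi\!\left(\dfrac{c - E_1}{\sigma_1 / (\sigma_X \sqrt{n})}\right)$, where $\Phi$ is the standard normal CDF.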
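Steps 1 through 6 can be collected into one short function. This is a minimal sketch, assuming Python's standard `statistics.NormalDist` for the normal distribution and assuming $\sum (x_i - \bar{x})^2 = n\sigma_X^2$ as on the sample-size slide; the function name `type2_error_right_sided` and its parameter names are hypothetical, chosen for illustration.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal: mean 0, sd 1


def type2_error_right_sided(beta1_alt, sigma0, sigma1, sigma_x, n, alpha):
    """Pr{accept H0 | H1} for H0: beta1 = 0 vs H1: beta1 > 0, using B1.

    Assumes Sum (x_i - xbar)^2 = n * sigma_x**2, so that
    sd(B1) = sigma / (sigma_x * sqrt(n)) under each hypothesis.
    """
    se0 = sigma0 / (sigma_x * sqrt(n))  # sd of B1 under H0 (Step 1)
    se1 = sigma1 / (sigma_x * sqrt(n))  # sd of B1 under H1 (Step 4)
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha)  # |z_alpha|, e.g. 2.326 for 0.01
    critical = 0 + z_alpha * se0             # Step 2: c = E0 + |z_alpha| sd0
    # Steps 5-6: Type II error = Pr1{B1 <= c}, with B1 ~ N(beta1_alt, se1^2)
    return NormalDist(beta1_alt, se1).cdf(critical)
```

For instance, `type2_error_right_sided(4, 400, 400, sqrt(2000), 109, 0.01)` mirrors Example Question 3 with the sign of the slope flipped, and returns a value just under 0.01.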
Basic Insight • Notice that all three problems have the same basic structure. • That is, if you understand the solution of the one-sample test, then you can derive the answers to the other problems.
Step 7: Phrase requirement on β • For example, we seek to “choose n so that β=0.01.” • That is, “choose n so that $\Pr_1\{\text{Accept } H_0\} = \beta = 0.01$.”
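Step 7: Phrase requirement on β • Notice the parallel phrasing: $\Pr_0\{B_1 > c\} = \alpha$ gives $c = E_0 + |z_\alpha| \dfrac{\sigma_0}{\sigma_X \sqrt{n}}$, and $\Pr_1\{B_1 \le c\} = \beta$ gives $c = E_1 - |z_\beta| \dfrac{\sigma_1}{\sigma_X \sqrt{n}}$.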
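Step 7: Phrase requirement on β • That is, “choose n so that (note that $E_0 = 0$): $|z_\alpha| \dfrac{\sigma_0}{\sigma_X \sqrt{n}} = E_1 - |z_\beta| \dfrac{\sigma_1}{\sigma_X \sqrt{n}}$.”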
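Step 7: Phrase requirement on β • That is, choose n so that (after algebraic clearing out): $\sqrt{n} = \dfrac{|z_\alpha| \sigma_0 + |z_\beta| \sigma_1}{\sigma_X \, |E_1 - E_0|}$, rounding n up to the next whole number.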
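The closed form can be turned into a helper that rounds n up to the next whole observation. This is a hedged sketch: `required_n` is a hypothetical name, the normal quantiles come from Python's `statistics.NormalDist().inv_cdf`, and absolute values are taken throughout (assuming α, β ≤ 0.5) so the same function serves left- and right-sided tests.

```python
from math import ceil
from statistics import NormalDist


def required_n(e0, e1, sigma0, sigma1, sigma_x, alpha, beta):
    """Smallest n with Type II error <= beta for a one-sided slope test.

    Implements sqrt(n) = (|z_alpha| sigma0 + |z_beta| sigma1) / (sigma_x |E1 - E0|),
    assuming Sum (x_i - xbar)^2 = n * sigma_x**2 and alpha, beta <= 0.5.
    """
    z_alpha = abs(NormalDist().inv_cdf(alpha))  # |z_alpha|
    z_beta = abs(NormalDist().inv_cdf(beta))    # |z_beta|
    root_n = (z_alpha * sigma0 + z_beta * sigma1) / (sigma_x * abs(e1 - e0))
    return ceil(root_n ** 2)  # round up: n must be a whole number of observations


# Settings of Example Question 3: E1 = -4, sigma = 400, sigma_X^2 = 2000
print(required_n(0, -4, 400, 400, 2000 ** 0.5, 0.01, 0.01))
```

With these settings it prints 109, in agreement with the worked example; flipping the sign of E1 gives the same n because only $|E_1 - E_0|$ enters.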
Step 8: State the conclusion • The result for a left-sided test has to be worked through but is similar. You must remember to keep all entries positive, using $|z_\alpha|$, $|z_\beta|$, and $|E_1 - E_0|$. This is reasonable if both α and β are constrained to be less than or equal to 0.5. The restriction is not a hardship in practice.
Univariate Linear Regression • Note that, compared with the one-sample test, the $\sigma_0$ factor is changed to $\sigma_0/\sigma_X$. • There is a similar adjustment for the alternative standard deviation: $\sigma_1$ becomes $\sigma_1/\sigma_X$.
Example Problem Group • Two hundred values of an independent variable $x_i$ are chosen so that $\sum (x_i - \bar{x})^2$ is equal to 400,000. For each setting of $x_i$, the random variable $Y_i = \beta_0 + \beta_1 x_i + \sigma Z_i$ is observed. Here β0 and β1 are fixed but unknown parameters, σ=400, and the $Z_i$ are independent standard normal random variables.
Example Problem Group • The null hypothesis H0: β1=0 is tested at α=0.01 against the alternative H1: β1<0 (a left-sided test). The random variable B1 is the OLS estimate of β1.
Example Question 1 • When H0 is true, what is the standard deviation of B1, the OLS estimate of the slope? • $\operatorname{Var}(B_1) = \sigma^2 / \sum (x_i - \bar{x})^2 = 400^2 / 400{,}000 = 0.4$. • $\operatorname{sd}(B_1) = \sqrt{0.4} = 0.632$.
Example Question 2 • What is the probability of a Type II error in the test specified in the common section using B1, the OLS estimator of the slope, as test statistic when β1=-4, α=0.01, σ=400, and $\sum (x_i - \bar{x})^2$ is equal to 400,000?
Solution to Question 2 • The critical value is 0-2.326(0.632)=-1.47. • A Type II error occurs when B1>-1.47. • Under the alternative, B1 is normal with expected value -4 and standard deviation (standard error) 0.632, since σ=400 under both hypotheses. • Pr{B1>-1.47}=Pr{Z>(-1.47-(-4))/0.632}=Pr{Z>4.00}=0.000032. • The answer is 0.000032.
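The arithmetic in Questions 1 and 2 can be checked numerically. This sketch uses only Python's standard library; `NormalDist().inv_cdf(0.01)` returns the lower 1% point (about -2.326), which is why the critical value of this left-sided test lies below zero. Variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

sigma = 400.0
sxx = 400_000.0        # Sum (x_i - xbar)^2
alpha = 0.01
beta1_alt = -4.0       # slope under the alternative

sd_b1 = sqrt(sigma**2 / sxx)              # Question 1: sd(B1)
z_alpha = NormalDist().inv_cdf(alpha)     # lower-tail point, about -2.326
critical = 0 + z_alpha * sd_b1            # reject H0 when B1 < critical
# Type II error: accept H0 (B1 >= critical) although B1 ~ N(-4, sd_b1^2)
type2 = 1 - NormalDist(beta1_alt, sd_b1).cdf(critical)

print(f"sd(B1) = {sd_b1:.3f}")
print(f"critical value = {critical:.2f}")
print(f"Pr(Type II error) = {type2:.6f}")
```

The printed values should match the slides: 0.632, -1.47, and 0.000032.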
Example Question 3 • How many observations n are necessary so that the probability of a Type II error in the test specified in the common section is 0.01 when β1=-4, α=0.01, σ=400, and $\sum (x_i - \bar{x}_n)^2$ is equal to 2,000n?
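Outline of Solution to Problem 3 • For the σ0 term, use $(400^2/2000)^{0.5} = 8.94$. • Use the same value for the σ1 term. • Use $|z_{0.01}| = 2.326$ for both α and β. • Use $|E_1 - E_0| = |-4 - 0| = 4$. • The square root of the sample size is $2.326(8.94 + 8.94)/4 \approx 10.40$. • Since $10.40^2 \approx 108.2$, the sample size is 109 or more.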
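Because n has to be rounded up, it is worth confirming directly that 109 is the smallest sample size that meets the requirement. This hedged sketch evaluates the Type II error of the left-sided test for each candidate n under the settings of Question 3; the helper name `type2_left` and the search range are hypothetical choices.

```python
from math import sqrt
from statistics import NormalDist

SIGMA = 400.0          # sigma0 = sigma1 = 400
SIGMA_X = sqrt(2000)   # from Sum (x_i - xbar_n)^2 = 2000 n
ALPHA = 0.01
BETA1_ALT = -4.0


def type2_left(n):
    """Pr{B1 >= c} when beta1 = -4, for the left-sided test with n observations."""
    sd_b1 = SIGMA / (SIGMA_X * sqrt(n))
    critical = NormalDist().inv_cdf(ALPHA) * sd_b1   # c = 0 - 2.326 sd(B1)
    return 1 - NormalDist(BETA1_ALT, sd_b1).cdf(critical)


# Smallest n whose Type II error is at most 0.01 (power grows with n)
n_min = next(n for n in range(2, 1000) if type2_left(n) <= 0.01)
print(n_min, round(type2_left(n_min - 1), 4), round(type2_left(n_min), 4))
```

At n = 108 the Type II error is slightly above 0.01, and at n = 109 it drops just below, which is why the answer rounds up rather than to the nearest integer.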
Chapter 21: Residual Analysis • The assumptions of regression can be violated, so they should be checked. • Residuals are one way of checking the model: $R_i = Y_i - \hat{Y}_i$, where $\hat{Y}_i$ is the fitted value at $x_i$.
Checking the Assumptions • Check for normality (test of normality, histogram, Q-Q plots) • Check whether the variance is the same for all values of the independent variable (plot residuals against predicted values) • Check independence (plot residuals against a sequence variable) • Check for linearity (plot the dependent variable against the independent variable)
Residual Plots • Plot residuals against the independent variable. • The plot should be flat, indicating the same variance. • There should be no fanning-out pattern. • Check for influential observations. • Plot residuals against predicted values. • For univariate regression this is essentially the same as the above plot, because the predicted values are a linear function of the independent variable. There should be no pattern.
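All of these plots start from the fitted values and residuals $R_i = Y_i - \hat{Y}_i$. A minimal sketch of computing them by ordinary least squares with only Python's standard library; the helper names and the data are made up for illustration, and the printed columns are what one would plot (residual against x or against fitted value).

```python
from statistics import fmean


def ols_fit(x, y):
    """Return (b0, b1), the OLS intercept and slope for Y = b0 + b1 X + e."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def residuals(x, y):
    """Return (fitted, resid) with R_i = Y_i - fitted value at x_i."""
    b0, b1 = ols_fit(x, y)
    fitted = [b0 + b1 * xi for xi in x]
    resid = [yi - fi for yi, fi in zip(y, fitted)]
    return fitted, resid


# Hypothetical toy data, not the course data set
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]
fitted, resid = residuals(x, y)
for xi, fi, ri in zip(x, fitted, resid):
    print(f"x={xi:4.1f}  fitted={fi:6.2f}  residual={ri:+.3f}")
```

OLS residuals always sum to zero and are uncorrelated with x, so the plots are judged by their spread and shape, not their overall level.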
What to do if there is a problem? • Look for transformations of the independent variable, the dependent variable, or both. • With statistical software this is easy: use the Compute option on the menu bar.
Influential Points • An easier way to look for points that have a large impact on the slope is to plot the change in slope when each case is deleted against an arbitrary case sequence number.
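A sketch of the change-in-slope calculation, under the assumption that "change in slope" means the full-data OLS slope minus the slope refitted with one case deleted. Statistical packages report a related quantity, often called DFBETA, and may scale it or use the opposite sign. The function names and the data below are hypothetical.

```python
from statistics import fmean


def slope(x, y):
    """OLS slope of y on x."""
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx


def slope_changes(x, y):
    """Change in OLS slope when case i is deleted: b1(all) - b1(without i)."""
    b1_all = slope(x, y)
    changes = []
    for i in range(len(x)):
        x_del = x[:i] + x[i + 1:]
        y_del = y[:i] + y[i + 1:]
        changes.append(b1_all - slope(x_del, y_del))
    return changes


# Hypothetical data: the last case lies far from the line of the others
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 2.0, 2.9, 4.2, 5.0, 3.0]
changes = slope_changes(x, y)
for case, d in enumerate(changes, start=1):
    print(f"case {case}: change in slope = {d:+.3f}")
```

Plotted against case number, the sixth case shows up as a clear spike, which is the pattern the slide suggests looking for.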
Example • Data set on the web page • Aim: predict final exam score from midterm score • Dependent variable: final exam score • Independent variable: midterm score • Model, check assumptions, predict
Output • Model: Y= b0 + b1 X + e • $R^2$ = 0.508 • F statistic = 60.91, Significance = 0.0 • b1 = 1.391117, t statistic = 7.805, Significance = 0.0 • b0 = 238.95, t statistic = 8.329, Significance = 0.0
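In univariate regression the model F statistic equals the square of the slope's t statistic, which is consistent with the output above ($7.805^2 \approx 60.92$ against F = 60.91, the gap being rounding). A hedged sketch checking this identity on made-up data; `simple_regression_stats` is a hypothetical helper, not the output format of any particular package.

```python
from math import sqrt
from statistics import fmean


def simple_regression_stats(x, y):
    """Return (b0, b1, r2, t_b1, f) for Y = b0 + b1 X + e fitted by OLS."""
    n = len(x)
    xbar, ybar = fmean(x), fmean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    mse = sse / (n - 2)              # error variance estimate, n - 2 df
    t_b1 = b1 / sqrt(mse / sxx)      # t statistic for H0: beta1 = 0
    r2 = 1 - sse / syy
    f = (syy - sse) / mse            # regression mean square (1 df) over MSE
    return b0, b1, r2, t_b1, f


# Hypothetical toy data, not the midterm/final data set
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [3.1, 4.8, 7.2, 8.1, 10.9, 12.2, 13.8, 16.5]
b0, b1, r2, t_b1, f = simple_regression_stats(x, y)
print(f"b1={b1:.4f}  R^2={r2:.4f}  t={t_b1:.3f}  t^2={t_b1**2:.3f}  F={f:.3f}")
```

The identity holds because, with a single predictor, the regression sum of squares is $b_1^2 \sum (x_i - \bar{x})^2$, so $F = t^2$ exactly.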
Next Class • Multiple Regression! • Check web site for your data file