Univariate Linear Regression Problem

Univariate Linear Regression Problem • Model: Y=β0+β1X+e • Test: H0: β1=0. • Alternative: H1: β1>0. • The distribution of Y is normal under both the null and the alternative. • Under the null, var(Y)=σ0². • Under the alternative, β1>0, and var(Y)=σ1².

Step 1: Choose the test statistic and specify its null distribution • Use conditions of the null to find:

Bringing sample size into regression design • The sample size n is hidden in the regression results. That is, let:

Step 2: Define the critical value • For the univariate linear regression test:

Step 3: Define the Rejection Rule • Each test is a right-sided test, so the rule is to reject H0 when the test statistic is greater than the critical value.
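Steps 2 and 3 can be sketched in a few lines of Python. This is a minimal sketch, assuming the test statistic is the OLS slope B1 and that under H0 it is normal with mean 0 and standard deviation σ0/√Σ(xi-xbar)², the variance formula used in Example Question 1. The function names and the input numbers are illustrative and do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def critical_value_right(alpha, sigma0, sxx, null_mean=0.0):
    """Critical value for a right-sided test on the slope.

    Assumes B1 ~ Normal(null_mean, sigma0**2 / sxx) under H0,
    where sxx = sum((x_i - xbar)**2).
    """
    sd_b1 = sigma0 / sqrt(sxx)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper-alpha point of N(0, 1)
    return null_mean + z_alpha * sd_b1

def reject_h0(b1_observed, critical_value):
    """Right-sided rule: reject H0 when the statistic exceeds the critical value."""
    return b1_observed > critical_value

# Illustrative inputs (hypothetical, chosen only to exercise the functions).
cv = critical_value_right(alpha=0.01, sigma0=400.0, sxx=400_000.0)
print(round(cv, 3))
print(reject_h0(2.0, cv), reject_h0(1.0, cv))
```

For a left-sided test the same quantities apply with the sign of the z term reversed and the inequality flipped.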

Step 4: Specify the Distribution of Test Statistic under Alternative • Use conditions of the alternative to find:

Step 5: Define a Type II Error • For the univariate linear regression test:

Step 6: Find β • For a univariate linear regression test:
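The calculation in Steps 4 to 6 can be sketched as follows. This is a minimal sketch under assumptions not spelled out on the slides: B1 is normal with mean 0 and variance σ0²/Σ(xi-xbar)² under H0, and normal with mean β1 and variance σ1²/Σ(xi-xbar)² under H1. The function name and the input values are hypothetical; the inputs mirror the left-sided example later in the deck, reflected to the right side.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error_right(alpha, beta1_alt, sigma0, sigma1, sxx):
    """Pr{accept H0} under the alternative for a right-sided slope test.

    Assumes B1 ~ Normal(0, sigma0**2 / sxx) under H0 and
    B1 ~ Normal(beta1_alt, sigma1**2 / sxx) under H1, with beta1_alt > 0.
    """
    sd0 = sigma0 / sqrt(sxx)
    sd1 = sigma1 / sqrt(sxx)
    critical_value = NormalDist().inv_cdf(1.0 - alpha) * sd0
    # Type II error: the statistic falls at or below the critical value.
    return NormalDist(mu=beta1_alt, sigma=sd1).cdf(critical_value)

beta = type_ii_error_right(alpha=0.01, beta1_alt=4.0,
                           sigma0=400.0, sigma1=400.0, sxx=400_000.0)
print(f"{beta:.6f}")
```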

Basic Insight • Notice that all three problems have the same basic structure. • That is, if you understand the solution of the one sample test, then you can derive the answer to the other problems.

Step 7: Phrase requirement on β • For example, we seek to “choose n so that β=0.01.” • That is, “choose n so that Pr1{Accept H0}=β=0.01.”

Step 7: Phrase requirement on β • For example, we seek to “choose n so that

Step 7: Phrase requirement on β • Notice the parallel phrasing:

Step 7: Phrase requirement on β • That is, “choose n so that (note that E0=0):

Step 7: Phrase requirement on β • That is, choose n so that (after algebraic clearing out):

Step 8: State the conclusion • The result for a left-sided test has to be worked through separately but is similar. You must remember to keep all entries positive. This is reasonable if both α and β are constrained to be less than or equal to 0.5. The restriction is not a hardship in practice.

Univariate Linear Regression • Note that the σ0 factor is changed to σ0/σX. • There is a similar adjustment for the alternative standard deviation.
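Written out, the conclusion of Step 8 and the regression adjustment take the following form. This is a reconstruction that agrees with the outline of the solution to Example Question 3, not a formula copied from the slides. The definition of σX² as Σ(xi-xbar)²/n is an assumption, chosen so that var(B1)=σ²/Σ(xi-xbar)² becomes (σ/σX)²/n.

```latex
\sqrt{n} \;\ge\; \frac{|z_\alpha|\,\sigma_0 + |z_\beta|\,\sigma_1}{|E_1 - E_0|}
\qquad\text{with}\qquad
\sigma_0 \to \frac{\sigma_0}{\sigma_X},\quad
\sigma_1 \to \frac{\sigma_1}{\sigma_X},\quad
\sigma_X^2 = \frac{1}{n}\sum_{i=1}^{n}(x_i-\bar{x})^2 .
```

Taking absolute values of the z terms and of E1-E0 is what keeps all entries positive, which is why the same expression serves for a left-sided test.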

Example Problem Group • Two hundred values of an independent variable xi are chosen so that Σ(xi-xbar)² is equal to 400,000. For each setting of xi, the random variable Yi=β0+β1xi+σZi is observed. Here β0 and β1 are fixed but unknown parameters, σ=400, and the Zi are independent standard normal random variables.

Example Problem Group • The null hypothesis to be tested is H0: β1=0, α=0.01, and the alternative is H1: β1<0. The random variable B1 is the OLS estimate of β1.

Example Question 1 • When H0 is true, what is the standard deviation of B1, the OLS estimate of the slope? • Var(B1)=σ²/Σ(xi-xbar)²=400²/400,000=0.4. • sd(B1)=0.632.

Example Question 2 • What is the probability of a Type II error in the test specified in the common section using B1, the OLS estimator of the slope, as test statistic when β1=-4, α=0.01, σ=400, and Σ(xi-xbar)² is equal to 400,000?

Solution to Question 2 • The critical value is 0-2.326(0.632)=-1.47. • A Type II error occurs when B1>-1.47. • Under the alternative, B1 is normal with expected value -4 and standard deviation (standard error) 0.632. • Pr{B1>-1.47}=Pr{Z>(-1.47-(-4))/0.632}=Pr{Z>4.00}=0.000032. • The answer is 0.000032.
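The arithmetic in Questions 1 and 2 can be checked with the Python standard library. This is a sketch using the values given in the problem statement; the variable names are illustrative. Because the code carries more decimal places than the slides, the results should agree with the slide values only up to rounding.

```python
from math import sqrt
from statistics import NormalDist

sigma = 400.0        # error standard deviation from the problem statement
sxx = 400_000.0      # sum of squared deviations of the x values
alpha = 0.01
beta1_alt = -4.0     # slope under the alternative

sd_b1 = sqrt(sigma**2 / sxx)                   # Question 1
z_alpha = NormalDist().inv_cdf(1.0 - alpha)    # about 2.326
critical_value = 0.0 - z_alpha * sd_b1         # left-sided test
# Type II error: B1 lands above the critical value when beta1 = -4.
type_ii = 1.0 - NormalDist(mu=beta1_alt, sigma=sd_b1).cdf(critical_value)

print(round(sd_b1, 3), round(critical_value, 2), f"{type_ii:.6f}")
```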

Example Question 3 • How many observations n are necessary so that the probability of a Type II error in the test specified in the common section is 0.01 when β1=-4, α=0.01, σ=400, and Σ(xi-xbar_n)² is equal to 2,000n?

Outline of Solution to Question 3 • For the σ0 term, use (400²/2000)^0.5=8.94. • Use the same value for the σ1 term. • Use |z0.01|=2.326. • Use |E1-E0|=|-4-0|=4. • The square root of the sample size is 10.39. • The sample size is 109 or more.
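The outline above can be turned into a short calculation. This is a sketch that assumes the target Type II error probability is 0.01, as in Step 7, and that the sample-size condition is √n ≥ (|zα|σ0+|zβ|σ1)/|E1-E0| with each σ divided by σX=√2000. The function name `required_n` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist

def required_n(alpha, beta, sigma0, sigma1, sigma_x, effect):
    """Return (sqrt(n) bound, smallest integer n) for the slope test.

    Uses sqrt(n) >= (|z_alpha| * s0 + |z_beta| * s1) / |effect|,
    where s0 = sigma0 / sigma_x and s1 = sigma1 / sigma_x.
    """
    z_alpha = abs(NormalDist().inv_cdf(alpha))
    z_beta = abs(NormalDist().inv_cdf(beta))
    s0 = sigma0 / sigma_x
    s1 = sigma1 / sigma_x
    root_n = (z_alpha * s0 + z_beta * s1) / abs(effect)
    return root_n, ceil(root_n ** 2)

root_n, n = required_n(alpha=0.01, beta=0.01, sigma0=400.0, sigma1=400.0,
                       sigma_x=sqrt(2000.0), effect=-4.0 - 0.0)
print(round(root_n, 2), n)
```

The unrounded inputs give a square root slightly above the slide's 10.39, which was computed from rounded values; the resulting sample size is the same.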

Chapter 21: Residual Analysis • If the assumptions in regression are violated: • Residuals are one way of checking the model: Ri = Yi - Fitted value at xi
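The residual definition can be illustrated with a small least-squares fit written in plain Python. This is a sketch only: the data are made up for illustration, and the helper names `ols_fit` and `residuals` are hypothetical.

```python
def ols_fit(x, y):
    """Return (b0, b1) for the least-squares line y = b0 + b1 * x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

def residuals(x, y):
    """R_i = Y_i - fitted value at x_i."""
    b0, b1 = ols_fit(x, y)
    return [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

# Made-up data for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
res = residuals(x, y)
print([round(r, 3) for r in res])
```

With an intercept in the model, least-squares residuals sum to zero and are uncorrelated with x, so any visible pattern in a residual plot points to a problem with the assumptions rather than with the fit itself.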

Checking the Assumptions • Check for normality (test of normality, histogram, q-q plots) • Check whether the variance is the same for all values of the independent variable (plot residuals against predicted values) • Check independence (plot residuals against sequence variable) • Check for linearity (plot dependent variable against independent variable)

Residual Plots • Plot residuals against the independent variable. • The plot should be flat, indicating the same variance. • There should be no fanning out pattern. • Check for influential observations. • Plot residuals against the predicted values. • For univariate regression this shows the same pattern as the plot above, because the predicted value is a linear function of the independent variable. There should be no pattern.

What to do if there is a problem? • Look for transformations of the independent variable, the dependent variable, or both. • Using a computer this is easy: use the compute option from the menu bar.

Influential Points • An easier way to look for points that have a large impact on the slope is to plot the change in slope against an arbitrary case sequence number.
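One way to obtain the change in slope for each case is to refit the line with that case left out. This is a sketch under that assumption (the slides do not say how the change is computed), with made-up data in which the last case is deliberately outlying. The function names are hypothetical.

```python
def slope(x, y):
    """Least-squares slope of y on x."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    return sxy / sxx

def slope_changes(x, y):
    """Full-data slope minus the slope with case i removed, for each i."""
    full = slope(x, y)
    changes = []
    for i in range(len(x)):
        x_rest = x[:i] + x[i + 1:]
        y_rest = y[:i] + y[i + 1:]
        changes.append(full - slope(x_rest, y_rest))
    return changes

# Made-up data: the sixth case has a large x and a low y.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0]
y = [1.1, 1.9, 3.2, 3.9, 5.1, 2.0]
changes = slope_changes(x, y)
for case, change in enumerate(changes, start=1):
    print(case, round(change, 3))
```

Plotting `changes` against the case number gives the display described on the slide; a case whose bar stands far from zero is a candidate influential point.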

Example • Data set is on the web page • Aim: predict final exam score from midterm score • Dependent variable: final exam score • Independent variable: midterm score • Steps: fit the model, check assumptions, predict

Output • Model: Y= b0 + b1 X + e • R² = 0.508 • F statistic=60.91, Significance=0.0 • b1=1.391117, t statistic=7.805, Significance=0.0 • b0=238.95, t statistic=8.329, Significance=0.0
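The reported coefficients can be used directly for the prediction step, and the output can be checked for internal consistency. This is a sketch using only the numbers listed above; the midterm score of 100 is a hypothetical input, since the slides do not state the score scale, and `predict_final` is an illustrative name.

```python
b0 = 238.95       # intercept from the reported output
b1 = 1.391117     # slope from the reported output
t_b1 = 7.805      # t statistic for the slope
f_stat = 60.91    # overall F statistic

def predict_final(midterm):
    """Predicted final exam score from the fitted line b0 + b1 * midterm."""
    return b0 + b1 * midterm

# In a one-predictor regression the overall F equals the square of the slope's t.
print(round(t_b1 ** 2, 2), f_stat)

# Prediction for a hypothetical midterm score of 100.
print(round(predict_final(100.0), 2))
```

The squared t statistic matches the reported F up to rounding, as expected when there is a single independent variable.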

Next Class • Multiple Regression! • Check web site for your data file
