Multiple Regression Analysis: Inference
Assumptions of the Classical Linear Model (CLM) Given the Gauss-Markov assumptions, OLS is BLUE. Beyond the Gauss-Markov assumptions, we need another assumption to conduct tests of hypotheses (inference). Assume that u is independent of x1, x2, …, xk and u is normally distributed with zero mean and variance σ²: u ~ N(0, σ²).
CLM Assumptions (continued) Under the CLM assumptions, OLS is not only BLUE but also the minimum variance unbiased estimator. y|x ~ N(β0 + β1x1 + … + βkxk, σ²)
Normal Sampling Distributions Under the CLM assumptions, conditional on the sample values of the explanatory variables, β̂j ~ Normal(βj, Var(β̂j)), so that (β̂j – βj)/sd(β̂j) ~ Normal(0, 1). β̂j is distributed normally because it is a linear combination of the errors.
The t Test Under the CLM assumptions, the expression (β̂j – βj)/se(β̂j) follows a t distribution (rather than a standard normal distribution), because we have to estimate σ² by σ̂². Note the degrees of freedom: n – k – 1.
The t Test • Knowing the sampling distribution allows us to carry out hypothesis tests. • Start with a null hypothesis. • Example: H0: βj = 0 • If we fail to reject this null hypothesis, we conclude that xj has no effect on y, controlling for the other x’s.
Steps of the t Test • 1. Form the relevant hypothesis: one-sided or two-sided. • 2. Calculate the t-statistic. • 3. Find the critical value, c: given a significance level, α, look up the corresponding percentile in a t distribution with n – k – 1 degrees of freedom and call it c, the critical value. • 4. Apply the rejection rule to determine whether or not to reject the null hypothesis (see the sketch below).
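A minimal Python sketch of these four steps, assuming the coefficient estimate, its standard error, and the sample dimensions are already known; the numbers and variable names (beta_hat, se_beta) are illustrative, not from an actual regression.

```python
# Sketch of the four t-test steps with illustrative values.
from scipy import stats

beta_hat = 0.036    # estimated coefficient (illustrative)
se_beta = 0.0078    # its standard error (illustrative)
n, k = 32, 3        # observations and right-hand-side variables
alpha = 0.05        # significance level

# Step 1: H0: beta_j = 0 vs. the one-sided alternative H1: beta_j > 0
# Step 2: calculate the t-statistic
t_stat = beta_hat / se_beta

# Step 3: critical value c = (1 - alpha)th percentile of t with n - k - 1 df
df = n - k - 1
c = stats.t.ppf(1 - alpha, df)

# Step 4: rejection rule for the one-sided (>) alternative
reject = t_stat > c
print(f"t = {t_stat:.2f}, c = {c:.2f}, reject H0: {reject}")
```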
Types of Hypotheses and Significance Levels Hypothesis: null vs. alternative - one-sided: H0: βj = 0 and H1: βj < 0, or H1: βj > 0 - two-sided: H0: βj = 0 and H1: βj ≠ 0 Significance level (α) - If we want only a 5% probability of rejecting H0 when it is actually true, then we say our significance level is 5%. - α values are generally 0.01, 0.05, or 0.10. - The choice of α is also guided by sample size.
Critical Value c What do you need to find c? 1. A t-distribution table (Appendix Table B.3, p. 723, Hirschey) 2. The significance level α 3. The degrees of freedom: n – k – 1, where n is the number of observations, k is the number of right-hand-side (RHS) variables, and 1 is for the constant (intercept).
One-Sided Alternatives yi = β0 + β1x1i + … + βkxki + ui. H0: βj = 0, H1: βj > 0. [Figure: t distribution with area (1 – α) in the fail-to-reject region and a rejection region of area α to the right of c.] Critical value c: the (1 – α)th percentile of a t distribution with n – k – 1 df. t-statistic: t = β̂j / se(β̂j). Decision: reject H0 if the t-statistic > c; fail to reject H0 if the t-statistic ≤ c.
One-Sided Alternatives yi = β0 + β1x1i + … + βkxki + ui. H0: βj = 0, H1: βj < 0. [Figure: t distribution with area (1 – α) in the fail-to-reject region and a rejection region of area α to the left of –c.] Critical value c: the (1 – α)th percentile of a t distribution with n – k – 1 df. t-statistic: t = β̂j / se(β̂j). Decision: reject H0 if the t-statistic < –c; fail to reject H0 if the t-statistic ≥ –c.
Two-Sided Alternative yi = β0 + β1x1i + … + βkxki + ui. H0: βj = 0, H1: βj ≠ 0. [Figure: t distribution with area (1 – α) in the fail-to-reject region and rejection regions of area α/2 in each tail, beyond –c and c.] Critical value c: the (1 – α/2)th percentile of a t distribution with n – k – 1 df. t-statistic: t = β̂j / se(β̂j). Decision: reject H0 if |t-statistic| > c; fail to reject H0 if |t-statistic| ≤ c.
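A short sketch of the two-sided rule under the same illustrative setup as above: reject H0: βj = 0 when |t| exceeds the (1 – α/2) percentile of the t distribution.

```python
# Two-sided rejection rule with illustrative t-statistic, df, and alpha.
from scipy import stats

t_stat, df, alpha = -2.91, 28, 0.05   # illustrative values
c = stats.t.ppf(1 - alpha / 2, df)    # two-sided critical value
print(f"|t| = {abs(t_stat):.2f}, c = {c:.2f}, reject H0: {abs(t_stat) > c}")
```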
Summary for H0: βj = 0 • Unless otherwise stated, the alternative is assumed to be two-sided. • If we reject the null hypothesis, we typically say “xj is statistically significant at the chosen level α” (e.g., the 5% level). • If we fail to reject the null hypothesis, we typically say “xj is statistically insignificant at the chosen level α.”
Testing Other Hypotheses • A more general form of the t-statistic recognizes that we may want to test H0: βj = aj • In this case, the appropriate t-statistic is t = (β̂j – aj) / se(β̂j), where aj = 0 for the conventional t-test.
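A hedged sketch of this more general test, here for H0: βj = 1; the estimate, standard error, and degrees of freedom are illustrative placeholders.

```python
# Testing H0: beta_j = a_j with a_j != 0 (illustrative values).
from scipy import stats

beta_hat, se_beta, df = 0.93, 0.04, 28     # illustrative estimate, se, and df
a_j = 1.0                                  # hypothesized value under H0
t_stat = (beta_hat - a_j) / se_beta        # general form of the t-statistic
p_value = 2 * stats.t.sf(abs(t_stat), df)  # two-sided p-value
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```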
t-Test: Example • Tile Example • Q = 17.513 – 0.296P + 0.0661 + 0.036A • (-0.35) (-2.91) (2.56) (4.61) • t-statistics are in parentheses • Questions: • (a) How do we calculate the standard errors? • (b) Which coefficients are statistically different from zero?
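One way to think about question (a): since each reported t-statistic tests H0: β = 0, we have t = β̂ / se(β̂), so se(β̂) = β̂ / t. A small sketch using the advertising coefficient from the slide (0.036 with t = 4.61), assuming the t-statistics line up in order with the reported coefficients.

```python
# Recovering a standard error from a reported coefficient and t-statistic:
# for H0: beta = 0, t = beta_hat / se, so se = beta_hat / t.
beta_hat, t_stat = 0.036, 4.61    # advertising coefficient and its t-statistic
se = beta_hat / t_stat
print(f"se(beta_hat) = {se:.4f}")  # roughly 0.0078
```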
Confidence Intervals Another way to use classical statistical testing is to construct a confidence interval using the same critical value as was used for a two-sided test. A (1 – α) confidence interval is defined as β̂j ± c · se(β̂j), where c is the (1 – α/2)th percentile in a t distribution with n – k – 1 degrees of freedom.
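A minimal sketch of a 95% confidence interval for a single coefficient, using illustrative values for the estimate, standard error, and degrees of freedom.

```python
# 95% confidence interval for a coefficient (illustrative values).
from scipy import stats

beta_hat, se_beta, df, alpha = 0.036, 0.0078, 28, 0.05
c = stats.t.ppf(1 - alpha / 2, df)                   # two-sided critical value
lower, upper = beta_hat - c * se_beta, beta_hat + c * se_beta
print(f"95% CI: [{lower:.4f}, {upper:.4f}]")
```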
Computing p-values for t Tests An alternative to the classical approach is to ask, “What is the smallest significance level at which the null hypothesis would be rejected?” Compute the t-statistic, then find the probability of obtaining a value at least as extreme as the one calculated (in absolute value, for a two-sided test) if the null hypothesis were true. That probability is the p-value.
Example: Regression Relation Between Units Sold and Personal Selling Expenditures for Electronic Data Processing (EDP), Inc. • Units sold = –1292.3 + 0.09289 PSE • (396.5) (0.01097) • Standard errors are in parentheses. • (a) What are the associated t-statistics for the intercept and slope parameter estimates? • t-stat for the intercept estimate = –3.26, p-value = 0.009 • t-stat for the slope estimate = 8.47, p-value ≈ 0.000 • (b) Decision rule: if p-value < α, reject H0: βi = 0; if p-value > α, fail to reject H0: βi = 0. • (c) What conclusion about the statistical significance of the estimated parameters do you reach, given these p-values?
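A sketch reproducing the reported t-statistics and p-values from the slide’s coefficients and standard errors. The degrees of freedom are not reported on the slide; df = 10 is assumed here purely for illustration, since it is consistent with the reported p-value of 0.009.

```python
# Verify t-statistics and two-sided p-values for the EDP example.
from scipy import stats

estimates = {"intercept": (-1292.3, 396.5), "PSE slope": (0.09289, 0.01097)}
df = 10  # assumed degrees of freedom (not reported on the slide)

for name, (b, se) in estimates.items():
    t_stat = b / se
    p_value = 2 * stats.t.sf(abs(t_stat), df)   # two-sided p-value
    print(f"{name}: t = {t_stat:.2f}, p = {p_value:.3f}")
```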
Testing a Linear Combination of Parameter Estimates Let’s suppose that, instead of testing whether β1 is equal to a constant, you want to test whether it is equal to another parameter, that is, H0: β1 = β2. Use the same basic procedure for forming a t-statistic: t = (β̂1 – β̂2) / se(β̂1 – β̂2), where the standard error of the difference requires the covariance between the two estimates.
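A hedged sketch of this test. The variances and covariance of the two estimates are illustrative numbers; in practice they come from the estimated variance-covariance matrix of the coefficients.

```python
# Testing H0: beta_1 = beta_2 using the variance of the difference.
import math
from scipy import stats

b1, b2 = 0.036, 0.006                 # illustrative estimates
var_b1, var_b2 = 0.00006, 0.000002    # illustrative variances
cov_b1_b2 = 0.000001                  # illustrative covariance
df = 28

se_diff = math.sqrt(var_b1 + var_b2 - 2 * cov_b1_b2)
t_stat = (b1 - b2) / se_diff
p_value = 2 * stats.t.sf(abs(t_stat), df)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
```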
Overall Significance H0: β1 = β2 = … = βk = 0 Use the F-statistic: F = (R²/k) / [(1 – R²)/(n – k – 1)], which has an F distribution with (k, n – k – 1) degrees of freedom under H0.
F Distribution with 4 and 30 degrees of freedom (for a regression model with four X variables based on 35 observations).
The F Statistic [Figure: F distribution with area (1 – α) in the fail-to-reject region and a rejection region of area α to the right of the critical value c.] Reject H0 at significance level α if F > c; otherwise fail to reject. Critical values: Appendix Tables B.2, pp. 720–722, Hirschey.
Example: UNITSt = –117.513 – 0.296Pt + 0.036ADt + 0.006PSEt (–0.35) (–2.91) (2.56) (4.61) t-statistics are in parentheses. Pt = price, ADt = advertising, PSEt = selling expenses, UNITSt = number of units sold. The standard error of the regression is 123.9, R² = 0.97, adjusted R² = 0.958, n = 32. • Calculate the F-statistic. • What are the degrees of freedom associated with the F-statistic? • What is the cutoff value of this F-statistic when α = 0.05? When α = 0.01?
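A sketch of how the F-statistic and cutoff values in these questions could be computed, using R² = 0.97, k = 3 right-hand-side variables, and n = 32 from the slide.

```python
# Overall-significance F-statistic from R-squared, plus critical values.
from scipy import stats

r2, k, n = 0.97, 3, 32
df1, df2 = k, n - k - 1

F = (r2 / df1) / ((1 - r2) / df2)
print(f"F({df1}, {df2}) = {F:.1f}")

for alpha in (0.05, 0.01):
    c = stats.f.ppf(1 - alpha, df1, df2)   # critical (cutoff) value
    print(f"alpha = {alpha}: critical value = {c:.2f}")
```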
General Linear Restrictions The basic form of the F-statistic will work for any set of linear restrictions. First estimate the unrestricted (UR) model and then estimate the restricted (R) model. In each case, make note of the SSE.
Test of General Linear Restrictions F = [(SSE_R – SSE_UR)/q] / [SSE_UR/(n – k – 1)] - This F-statistic measures the relative increase in SSE when moving from the unrestricted (UR) model to the restricted (R) model. - q = number of restrictions (numerator degrees of freedom); n – k – 1 = degrees of freedom of the unrestricted model (denominator degrees of freedom).
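A minimal sketch of the restricted-vs-unrestricted F test, assuming both models have already been estimated and their SSEs recorded; the SSE values here are illustrative.

```python
# F test of q linear restrictions from restricted and unrestricted SSEs.
from scipy import stats

sse_r, sse_ur = 480000.0, 430000.0   # illustrative restricted and unrestricted SSEs
q = 1                                # number of restrictions
n, k = 32, 3                         # observations and RHS variables (unrestricted)
df_ur = n - k - 1

F = ((sse_r - sse_ur) / q) / (sse_ur / df_ur)
p_value = stats.f.sf(F, q, df_ur)
print(f"F({q}, {df_ur}) = {F:.2f}, p = {p_value:.3f}")
```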
Example: compare an unrestricted model to a restricted model (under H0); note that q = 1 in this case.
F-Statistic Summary • Just as with t-statistics, p-values can be calculated by looking up the percentile in the appropriate F distribution. • If q = 1, then F = t², and the p-value equals that of the two-sided t-test (see the check below).
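A quick numerical check, with illustrative numbers, that when q = 1 the F-statistic equals t² and the two p-values coincide.

```python
# Check that F = t^2 and p-values match when q = 1 (illustrative values).
from scipy import stats

t_stat, df = 2.56, 28
p_t = 2 * stats.t.sf(abs(t_stat), df)   # two-sided t-test p-value
p_f = stats.f.sf(t_stat ** 2, 1, df)    # F-test p-value with q = 1
print(f"F = t^2 = {t_stat**2:.2f}, p_t = {p_t:.4f}, p_f = {p_f:.4f}")
```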
Summary: Inferences • t-Test • (a) one-sided vs. two-sided hypotheses • (b) tests associated with a constant value • (c) tests associated with linear combinations of parameters • (d) p-values of t-tests • Confidence intervals for estimated coefficients • F-test • p-values of F-tests