Basics of Regression Analysis

Determination of three performance measures • Estimation of the effect of each factor • Explanation of the variability • Forecasting Error

Two Predictor Variables • Population Regression Model: Y = b0 + b1X1 + b2X2 + e, with e following N(0, s) • Unknown parameters: b0, b1, b2; s

From Data to Estimates of Coefficients • Principle: Least Squares • Mathematics: Normal Equation System • Computing Algorithm: Estimates of Coefficients

Least Squares Method • Simple Regression • Multiple Regression

Matrix Computation for b • Normal Equation System: $(X^T X)\, b = X^T Y$ • See Text Appendix D.3 • Solution for b: $b = (X^T X)^{-1} (X^T Y)$
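
The normal equation system can be solved without forming the inverse explicitly. The following is a minimal sketch in plain Python, not the method of the text's Appendix D.3: the helper names `solve_linear_system` and `least_squares` and the data are made up for illustration, and Gauss-Jordan elimination is just one reasonable choice of computing algorithm.

```python
def solve_linear_system(a, rhs):
    """Solve a z = rhs by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [list(a[i]) + [rhs[i]] for i in range(n)]
    for col in range(n):
        # Move the row with the largest entry in this column into pivot position.
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def least_squares(x_rows, y):
    """Return [b0, b1, ..., bK] solving (X^T X) b = X^T Y."""
    X = [[1.0, *row] for row in x_rows]  # leading 1 for the intercept
    p = len(X[0])
    xtx = [[sum(r[j] * r[k] for r in X) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * yi for r, yi in zip(X, y)) for j in range(p)]
    return solve_linear_system(xtx, xty)


# Made-up data: X1 and X2 take values -1 / +1 in a balanced layout.
x_rows = [(-1, -1), (-1, 1), (1, -1), (1, 1)] * 2
y = [5.5, 8.5, 10.5, 15.5] * 2
b = least_squares(x_rows, y)
print([round(v, 6) for v in b])
```

For this data set the estimates work out to b0 = 10, b1 = 3, b2 = 2. The sketch does no check for a singular X^T X, which is what perfectly collinear predictors would produce.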

Standardized Regression Coefficients • Definition, for k = 1, 2 • b0 = 0 • Also called the beta coefficient • Used to show the relative weights of predictors
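
The definition on this slide did not survive in the transcript. A common definition, assumed here, rescales each raw slope by the ratio of sample standard deviations, beta_k = b_k * s_xk / s_y. The sketch below uses that assumed definition; the function name and the data are made up.

```python
from statistics import stdev


def standardized_coefficients(slopes, x_columns, y):
    """Rescale raw slopes b1..bK (intercept excluded) by s_xk / s_y."""
    s_y = stdev(y)
    return [bk * stdev(col) / s_y for bk, col in zip(slopes, x_columns)]


# Made-up data generated exactly by Y = 5 + 2*X1 + 0.5*X2,
# so the least-squares slopes are known to be 2 and 0.5.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [10.0, 30.0, 20.0, 50.0, 40.0]
y = [5 + 2 * a + 0.5 * c for a, c in zip(x1, x2)]

betas = standardized_coefficients([2.0, 0.5], [x1, x2], y)
print([round(v, 4) for v in betas])
```

Although X1 has the larger raw slope, X2 gets the larger standardized coefficient here because it varies over a much wider range.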

Estimation of s: se, the Standard Deviation of the Disturbance e • Forecasting Equation: gives the fitted value $\hat{Y}_i$ for each observation • SS of Residuals: $SSE = \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$ • Mean SS: $MSE = s_e^2 = SSE / (n-3)$
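
As a small worked illustration of the residual sum of squares and its mean square, here is a sketch with made-up numbers. The list `fitted` holds the least-squares fitted values for this data set (two predictors taking values -1 / +1 in a balanced layout); the function name is hypothetical.

```python
import math


def disturbance_estimate(y, fitted, n_coefficients=3):
    """Return (SSE, MSE, se) with DF = n - n_coefficients (n - 3 for two predictors)."""
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    mse = sse / (len(y) - n_coefficients)
    return sse, mse, math.sqrt(mse)


y = [5.5, 8.5, 10.5, 15.5] * 2          # made-up observations, n = 8
fitted = [5.0, 9.0, 11.0, 15.0] * 2     # LS fitted values for this data

sse, mse, se = disturbance_estimate(y, fitted)
print(sse, mse, round(se, 4))
```

Every residual is +0.5 or -0.5, so SSE = 8 x 0.25 = 2, MSE = 2 / 5 = 0.4, and se is the square root of 0.4.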

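Standard Error of Coefficients • The variance matrix of b, a (K+1) x 1 vector, is $s^2 (X^T X)^{-1}$, estimated by putting $s_e^2$ in place of $s^2$ • The standard error of each coefficient is the square root of the corresponding diagonal element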

The Variability Explained • First, determine the base variability to be explained by the regression • Unconditional mean model: Y = m_y + e, with e following N(0, s_y) • LS fit of the model: Pred_Y = $\bar{Y}$, the sample mean • SS of Residuals: $SST = \sum_{i=1}^{n} (Y_i - \bar{Y})^2$ • MSS (DF = n-1): $SST / (n-1)$

The Variability Explained – cont. • Second, subtract the variability still left unexplained by the regression. • In SS: • In Variance:
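
The two formulas on this slide are missing from the transcript. The sketch below assumes they are the usual ones: in SS terms, R2 = (SST - SSE) / SST, and in variance terms, Adj. R2 = (MSS - MSE) / MSS. The data are made up and `fitted` holds the least-squares fitted values for them.

```python
def variability_explained(y, fitted, n_coefficients=3):
    """Return (R2, adjusted R2), computed by subtraction from the base variability."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)               # base variability
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # variability still left
    mss = sst / (n - 1)
    mse = sse / (n - n_coefficients)
    r2 = (sst - sse) / sst        # in SS
    adj_r2 = (mss - mse) / mss    # in variance
    return r2, adj_r2


y = [5.5, 8.5, 10.5, 15.5] * 2          # made-up observations, n = 8
fitted = [5.0, 9.0, 11.0, 15.0] * 2     # LS fitted values for this data

r2, adj_r2 = variability_explained(y, fitted)
print(round(r2, 4), round(adj_r2, 4))
```

Here SST = 106 and SSE = 2, so R2 = 104 / 106, while the variance version compares MSE = 0.4 against MSS = 106 / 7 and comes out slightly lower.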

Creating ANOVA Table

Test of Significance • F-test of significance • t-test of significance • Two-sided alternative • One-sided alternative

F-Test of Significance of the variability explained by the regression • H0: b1 = b2 = 0 • Ha: At least one coefficient is not 0 • P-Value of F-stat = P{ F(2, n-3) > F-stat }
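
The ANOVA table and the F-stat can be assembled from the sums of squares alone. This is a sketch with made-up data, where `fitted` holds the least-squares fitted values; the dictionary layout is an arbitrary choice. The P-value is left out because the Python standard library has no F distribution; a statistics package would be needed for P{ F(2, n-3) > F-stat }.

```python
def anova_table(y, fitted, k=2):
    """Rows are (SS, DF, MS); assumes an LS fit with intercept so SSR = SST - SSE."""
    n = len(y)
    mean_y = sum(y) / n
    sst = sum((yi - mean_y) ** 2 for yi in y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sst - sse
    df_reg, df_err = k, n - k - 1
    msr, mse = ssr / df_reg, sse / df_err
    return {
        "Regression": (ssr, df_reg, msr),
        "Residual": (sse, df_err, mse),
        "Total": (sst, n - 1, sst / (n - 1)),
        "F-stat": msr / mse,
    }


y = [5.5, 8.5, 10.5, 15.5] * 2          # made-up observations, n = 8
fitted = [5.0, 9.0, 11.0, 15.0] * 2     # LS fitted values for this data

table = anova_table(y, fitted)
for name, row in table.items():
    print(name, row)
```

With SSR = 104 on 2 DF and SSE = 2 on 5 DF, the F-stat is 52 / 0.4 = 130.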

t-Test of Significance of a variable, X1 (two-sided) • H0: b1 = 0 • Ha: b1 ≠ 0 • P-Value of t-stat = P{ |t(n-3)| > |t-stat| }

One-Sided Test of Significance of a variable, X1 • H0: b1 = 0 • Ha: b1 > 0 (using prior knowledge) • P-Value of t-stat = P{ t(n-3) > t-stat }

Forecasting • Point forecasting • Sources of forecasting error • Interval forecasting

Forecasting at xm • Data of X for regression • Value of X for prediction (xm)

Sources of Forecasting Error • Data: Y|xm = b0+ b1 x1m + b2 x2m + em • Forecast: • Forecast Error:

Computing Standard Errors

Forecasting Performance Analysis • R2_pred = 1 - PRESS / SST • PRESS = SS of the deleted residuals {yi - Pred_yi(i)}, where Pred_yi(i) is the forecast of yi from the regression fitted without observation i • Sample splitting • Analysis sample (n1) • Validation sample (n2)
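
The deleted residuals can be obtained by brute force: refit the regression n times, each time leaving one observation out. The sketch below does this for a simple regression with one predictor to keep it short; that simplification, the function names, and the data are all assumptions for illustration.

```python
def fit_simple(x, y):
    """Least-squares intercept and slope for one predictor."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def press_statistic(x, y):
    """Sum of squared deleted residuals y_i - Pred_y_i(i)."""
    press = 0.0
    for i in range(len(x)):
        b0, b1 = fit_simple(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        press += (y[i] - (b0 + b1 * x[i])) ** 2
    return press


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]      # made-up data
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]

press = press_statistic(x, y)
mean_y = sum(y) / len(y)
sst = sum((yi - mean_y) ** 2 for yi in y)
r2_pred = 1 - press / sst
print(round(press, 4), round(r2_pred, 4))
```

Because each deleted residual comes from a fit that never saw the observation being forecast, PRESS is never smaller than SSE, and R2_pred is never larger than the ordinary R2.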

Generalization to K Independent Variables • Use n - K - 1 in place of n - 3 as the DF for t. • Use K for the numerator DF and n - K - 1 for the denominator DF for F.

Diagnostics • Assumptions for Disturbance • Multi-collinearity • Outliers and Influential Observations

Problematic Data Conditions • Regression Coefficients Are Sensitive to: • Highly Collinear Independent Variables • Contamination By Outliers and Influential Observations

Detecting Outliers and Influential Data • Outliers • Leverage (X-space) distance from the mean • Tresid (Y-space) forecasting error • Influential Data • Idea: with / without comparison • Cook's D • DFBETAS • DFFITS

Modeling Techniques • Transformation of Variables • Log • Others • Using Dummy Variables • Symbolic representation • Dummy variables for qualitative variables • Using Scores for Ordinal Variables • Selection of Independent Variables • Forecasting • Computer intensive • Analysis of correlation structure of independent variables

Dummy Variables • DK = “If (X=k, 1, 0)” • Can be used for nominal and also ordinal variables • # of DK = c-1, where c is the number of categories.
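
A sketch of the c-1 coding in Python. The slide does not say which category is left out, so treating the first listed category as the baseline is an assumption here, as are the function name and the example variable.

```python
def make_dummies(values, categories):
    """Return one row of c-1 dummies per value; categories[0] is the baseline."""
    levels = categories[1:]  # the baseline category gets no dummy of its own
    return [[1 if v == k else 0 for k in levels] for v in values]


# Hypothetical qualitative variable with c = 3 categories.
region = ["north", "south", "west", "south", "north"]
rows = make_dummies(region, ["north", "south", "west"])
print(rows)
```

Each row has two columns, D_south and D_west; a baseline observation ("north") is coded 0, 0, so its effect is absorbed by the intercept b0.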

Using Scores for Ordinal Variables • Scoring Systems • 1, 2, 3, …, c • -2, -1, 0, 1, 2 (centered scores, for odd c; here c = 5)

Implications of Variable Selection

Selection of Variables - 1 • Backward elimination: All X's -> drop variables by t-test -> Final Regression • Stepwise (forward) inclusion: Best simple -> Best two variables -> Best … variables, adding at each step the variable with the max increase in R2

Selection of Variables - 2 • All Possible Regressions: from K independent variables, fit K simple regressions, K(K-1)/2 two-variable regressions, …, and 1 K-variable regression, then choose the Final Regression
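
To see how quickly the number of candidate regressions grows, the subsets can be enumerated directly. This sketch only lists the candidate predictor sets (fitting each one is left out), and the predictor names are hypothetical.

```python
from itertools import combinations


def all_subsets(predictors):
    """Yield every non-empty subset of the predictor names, smallest first."""
    for size in range(1, len(predictors) + 1):
        yield from combinations(predictors, size)


predictors = ["X1", "X2", "X3", "X4"]   # hypothetical names, K = 4
subsets = list(all_subsets(predictors))

counts = {}
for subset in subsets:
    counts[len(subset)] = counts.get(len(subset), 0) + 1
print(counts, len(subsets))
```

For K = 4 there are 4 simple regressions, 4 x 3 / 2 = 6 two-variable regressions, 4 three-variable regressions, and 1 four-variable regression: 2^K - 1 = 15 in total. The count doubles with every added variable, which is why this approach is computer intensive.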

Selection Criteria • R2 • Adj. R2 • R2_pred • se • Cp

Cp • p = # of coefficients • Select a combination with Cp close to p

What to Look for in a Good Regression? • Remember the three functions of regression • Estimation of the effect of each X • Explaining the variability of Y • Forecasting • Population regressions are assumptions • Needs testing • Data might be contaminated

Extensions for Other Variable Types of Y

Types of Variable • Quantitative: Continuous, Discrete (counting) • Qualitative: Ordinal, Nominal

Generalized Linear Models (GLM) • Regression model: Y = b0 + b1X1 + b2X2 + e, with e following N(0, s) • GLM Formulation: • Model for Y: Y is N(m, s) • Model for predictors (Link Function): m = b0 + b1X1 + b2X2

Forecasting Counting Data • Model for Y: Poisson Distribution (m) • Link Function:
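
The link function on this slide is missing from the transcript. The sketch below assumes the log link, log(m) = b0 + b1X1 + b2X2, which is the canonical choice for the Poisson distribution; the coefficient values and function names are hypothetical.

```python
import math


def poisson_mean(b, x):
    """Assumed log link: log(m) = b0 + b1*x1 + ..., so m = exp(linear predictor)."""
    eta = b[0] + sum(bk * xk for bk, xk in zip(b[1:], x))
    return math.exp(eta)


def poisson_pmf(k, m):
    """P{Y = k} for a Poisson distribution with mean m."""
    return math.exp(-m) * m ** k / math.factorial(k)


b = [0.0, math.log(2.0), 0.0]       # hypothetical coefficients b0, b1, b2
m = poisson_mean(b, (1.0, 3.0))     # forecast mean count at x1 = 1, x2 = 3
probabilities = [poisson_pmf(k, m) for k in range(5)]
print(round(m, 6), [round(p, 4) for p in probabilities])
```

Under the log link the forecast mean is always positive, and a one-unit increase in X1 multiplies the mean by exp(b1) instead of adding b1 to it.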
