More on regression Petter Mostad 2005.10.24
More on indicator variables • If an independent variable is an indicator variable, cases where it equals 1 simply get an addition to the constant term • To use different slopes for these cases as well, additional variables must be added (products of predictors and indicators) • By viewing the constant term as a data column, we can express the models more symmetrically
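As a concrete illustration, here is a minimal numpy sketch of a model where an indicator both shifts the intercept and, through a product term, changes the slope; all data and coefficient values are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.normal(size=n)                  # continuous predictor
d = rng.integers(0, 2, size=n)          # indicator variable (0 or 1)
y = 1.0 + 2.0 * x + 3.0 * d + 1.5 * d * x + rng.normal(scale=0.5, size=n)

# Design matrix with the constant term as a data column: the d column
# shifts the intercept, and the product column d*x changes the slope.
X = np.column_stack([np.ones(n), x, d, d * x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)                             # roughly [1.0, 2.0, 3.0, 1.5]
```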
Several indicator variables • A model with two indicator variables assumes that the effect of one indicator adds to the effect of the other • If this is unsuitable, use an additional interaction variable (the product of the indicators) • For categorical variables with m possible values, use m-1 indicators
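In the same hypothetical style, a categorical variable with m = 3 levels can be coded with m-1 indicator columns, the remaining level being absorbed into the constant term:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200
category = rng.integers(0, 3, size=n)   # categorical variable, m = 3 levels

# m - 1 = 2 indicator columns; level 0 is the baseline in the constant term.
d1 = (category == 1).astype(float)
d2 = (category == 2).astype(float)

y = 1.0 + 2.0 * d1 - 1.0 * d2 + rng.normal(scale=0.5, size=n)
X = np.column_stack([np.ones(n), d1, d2])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)                             # roughly [1.0, 2.0, -1.0]
# (For two separate indicator variables a and b, an interaction
# term would be coded as the product a * b.)
```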
Logistic regression • What if the dependent variable is an indicator variable? • The model then has two stages: first, we predict a value z_i from the predictors as before; then the probability of indicator value 1 is given by the logistic function p_i = exp(z_i) / (1 + exp(z_i)) • Given data, we can estimate the coefficients in a similar way as before, by maximum likelihood
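One standard way to carry out that estimation is maximum likelihood via Newton's method; the sketch below assumes simulated data and a model with a single predictor.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 2.0]))))
y = rng.binomial(1, p_true)             # indicator-valued dependent variable

# Newton-Raphson on the log-likelihood (equivalent to iteratively
# reweighted least squares, the usual fitting method for this model).
beta = np.zeros(2)
for _ in range(25):
    z = X @ beta                        # stage 1: linear predictor
    p = 1.0 / (1.0 + np.exp(-z))        # stage 2: probability of value 1
    grad = X.T @ (y - p)
    hess = X.T @ (X * (p * (1.0 - p))[:, None])
    beta += np.linalg.solve(hess, grad)
print(beta)                             # close to [-0.5, 2.0]
```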
Experimental design • So far, we have considered data as given; to the extent that we can control what data we have, how should we choose to set the independent variables? • Choice of variables • Choice of values for these variables
Choice of variables • Include variables which you believe have a clear influence on the dependent variable, even if the variable is ”uninteresting”: this helps find the true relationship between the ”interesting” variables and the dependent variable • Avoid including a pair (or a set) of variables whose values are clearly linearly related
Multicollinearity • To discover it, make plots and compute correlations (or regress one predictor on the others) • To deal with it: • Remove unnecessary variables • Define and compute an ”index” combining the collinear variables • If the variables are kept, the model can still be used for prediction
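Both detection strategies are easy to sketch in numpy; the near-collinear variables below are simulated for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
x1 = rng.normal(size=n)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1
x3 = rng.normal(size=n)

# Pairwise correlations flag the obvious cases.
print(np.corrcoef(np.column_stack([x1, x2, x3]), rowvar=False))

# Regressing one predictor on the others also catches collinearity that
# involves several variables: a high R^2 (equivalently a high variance
# inflation factor) is the warning sign.
others = np.column_stack([np.ones(n), x2, x3])
_, ssr, *_ = np.linalg.lstsq(others, x1, rcond=None)
r2 = 1.0 - ssr[0] / np.sum((x1 - x1.mean()) ** 2)
print("R^2 of x1 on the others:", r2, "VIF:", 1.0 / (1.0 - r2))
```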
Specification bias • Unless two independent variables are uncorrelated, the estimate of one will influence the estimate of the other • Leaving out a relevant variable can therefore bias the estimates for the variables that remain • Thus, one should be humble when interpreting regression results: there are probably always variables one could have added
Choice of values • Should have a good spread: Again, avoid collinearity • Should cover the range for which the model will be used • For categorical variables, one may choose to combine levels in a systematic way.
Generating experimental designs • For n binary variables, there are 2^n ways to set them in different combinations • If 2^n is too big, there are systematic ways to choose a subset of these 2^n experiments • If 2^n is too small, we can run several experiments at each setting
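Enumerating the full 2^n design is a one-liner with Python's standard library:

```python
from itertools import product

n = 3
design = list(product([0, 1], repeat=n))   # all 2**n = 8 settings
for run in design:
    print(run)
# If 2**n is too big, take a systematic subset of these runs
# (a fractional design); if too small, replicate each run several times.
```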
Heteroscedasticity – what is it? • In the standard regression model it is assumed that all the error terms ε_i have the same variance • If the variance instead varies with the independent variables or with the dependent variable, the model is heteroscedastic • Sometimes, it is clear that data exhibit such properties
Heteroscedasticity – why does it matter? • Our standard methods for estimation, confidence intervals, and hypothesis testing assume equal variances. • If we go on and use these methods anyway, our answers might be quite wrong!
Heteroscedasticity – how to detect it? • Fit a regression model, and study the residuals • make a plot of them against independent variables • make a plot of them against the predicted values for the dependent variable • Possibility: Test for heteroscedasticity by doing a regression of the squared residuals on the predicted values.
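The regression-based test at the end of that list might look like this on simulated data where the error spread grows with the predictor:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 10, size=n)
y = 2.0 + 3.0 * x + rng.normal(scale=0.5 * x, size=n)   # spread grows with x

X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ beta
resid = y - fitted

# Regress the squared residuals on the predicted values: a clearly
# nonzero slope suggests the error variance changes with the level.
Z = np.column_stack([np.ones(n), fitted])
gamma, *_ = np.linalg.lstsq(Z, resid ** 2, rcond=None)
print("slope of squared residuals on fitted values:", gamma[1])
```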
Heteroscedasticity – what to do about it? • Use a transformation of the dependent variable, for example its logarithm (log-linear models) • If the standard deviation of the errors appears to be proportional to the predicted values, a two-stage regression analysis is a possibility
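One way to read the two-stage idea (a sketch under the assumption that sd(error) is proportional to the predicted value, not the only possible weighting): fit ordinary least squares first, then reweight each observation by its stage-1 prediction.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(1, 10, size=n)
mu = 2.0 + 3.0 * x
y = mu + rng.normal(scale=0.1 * mu, size=n)   # sd proportional to the mean

X = np.column_stack([np.ones(n), x])
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)   # stage 1: plain OLS
fitted = X @ beta_ols

# Stage 2: dividing each observation by its stage-1 prediction roughly
# equalizes the error variances (weighted least squares).
w = 1.0 / fitted
beta_wls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
print(beta_ols, beta_wls)
```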
Dependence over time • Sometimes, y_1, y_2, …, y_n are not completely independent observations (given the independent variables) • Lagged values: y_i may depend on y_{i-1} in addition to its independent variables • Autocorrelated errors: successive observations y_i, y_{i+1}, … depend similarly on unobserved variables
Lagged values • In this case, we may run a multiple regression just as before, but including the previous value y_{i-1} of the dependent variable as a predictor variable for y_i.
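A sketch with simulated data where each observation depends on the previous one:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 300
x = rng.normal(size=n)
y = np.zeros(n)
for i in range(1, n):
    y[i] = 1.0 + 0.6 * y[i - 1] + 2.0 * x[i] + rng.normal(scale=0.5)

# Regress y_i on its own lag y_{i-1} together with the other predictors.
X = np.column_stack([np.ones(n - 1), y[:-1], x[1:]])
beta, *_ = np.linalg.lstsq(X, y[1:], rcond=None)
print(beta)                             # roughly [1.0, 0.6, 2.0]
```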
Autocorrelated errors • In the standard regression model, the errors are independent. • Using standard regression formulas anyway can lead to errors: Typically, the uncertainty in the result is underestimated. • Example: Taking observations closer and closer together in time will not increase your knowledge about regression parameters beyond a certain point
Autocorrelation – how to detect? • Plot the residuals against time! • The Durbin-Watson test (an option in SPSS) compares the possibility of independent errors with a first-order autoregressive model. Test statistic: d = Σ_{t=2..n} (e_t − e_{t−1})² / Σ_{t=1..n} e_t², where the e_t are the residuals; values of d near 2 indicate no first-order autocorrelation
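The statistic itself is straightforward to compute from the residuals (the first-order autoregressive errors below are simulated with parameter 0.7 for illustration):

```python
import numpy as np

def durbin_watson(resid):
    """d = sum_t (e_t - e_{t-1})^2 / sum_t e_t^2. Values near 2 indicate
    no first-order autocorrelation; d well below 2 suggests positive
    autocorrelation."""
    return np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)

rng = np.random.default_rng(7)
e = np.zeros(300)
for t in range(1, 300):                 # first-order autoregressive errors
    e[t] = 0.7 * e[t - 1] + rng.normal(scale=0.5)
print(durbin_watson(e))                 # well below 2, since d ~ 2(1 - rho)
```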
Autocorrelation – what to do? • It is possible to use a two-stage regression procedure: • If a first-order autoregressive model with parameter ρ is appropriate for the errors, the transformed model y_t − ρ·y_{t−1} = β_0(1 − ρ) + β_1(x_t − ρ·x_{t−1}) + u_t will have uncorrelated errors u_t • Estimate ρ from the Durbin-Watson statistic as ρ̂ ≈ 1 − d/2, and estimate the regression coefficients from the transformed model above
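Put together, the two stages (often called the Cochrane-Orcutt procedure) might be sketched like this on simulated data:

```python
import numpy as np

rng = np.random.default_rng(8)
n = 300
x = rng.normal(size=n)
e = np.zeros(n)
for t in range(1, n):                   # AR(1) errors with rho = 0.7
    e[t] = 0.7 * e[t - 1] + rng.normal(scale=0.5)
y = 1.0 + 2.0 * x + e

# Stage 1: ordinary regression, then estimate rho from Durbin-Watson.
X = np.column_stack([np.ones(n), x])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta
d = np.sum(np.diff(resid) ** 2) / np.sum(resid ** 2)
rho_hat = 1.0 - d / 2.0

# Stage 2: regress the quasi-differences; their errors are uncorrelated.
y_star = y[1:] - rho_hat * y[:-1]
X_star = np.column_stack([np.full(n - 1, 1.0 - rho_hat),
                          x[1:] - rho_hat * x[:-1]])
beta_star, *_ = np.linalg.lstsq(X_star, y_star, rcond=None)
print(rho_hat, beta_star)               # rho near 0.7, beta near [1.0, 2.0]
```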