Regression analysis and multiple regression: Here’s the beef*



  1. Regression analysis and multiple regression: Here’s the beef* *Graphic kindly provided by Microsoft.

  2. Generally, regression analysis is used with interval and ratio data
  • Regression analysis is a method of determining the specific function relating Y to X: Y = f(X)
  • Not really cause and effect(s), but how the independent variables combine to help predict the dependent variable
  • Widely used in the social sciences
  • Provides a value called R2 (R-squared), which tells how well a set of variables explains a dependent variable

  3. To explain means to reduce errors when predicting the dependent variable’s scores on the basis of information about the independent variables
  • The regression results measure the direction and size of the effect of each variable on the dependent variable
  • The form of the regression line is Y = a + bX, where Y is the dependent variable, a is the intercept, b is the slope, and X is the independent variable

  4. Regression analysis: examples
  • If we know Tara’s IQ, what can we say about her prospects of satisfactorily completing a university degree?
  • Knowing Nancy’s prior voting record, can we make any informed guesses concerning her vote in the coming provincial election?
  • Kendra is enrolled in a statistics course. If we know her score on the midterm exam, can we make a reasonably good estimate of her grade on the final?

  5. There are several forms of regression analysis, depending on the complexity of the relationships being studied.
  • The simplest is known as linear regression, which assumes a linear association between two variables. The straight line connecting the points is called the regression line.

  6. The regression line rarely cuts through all points in a distribution (e.g., picture a scatterplot). As such, we draw an approximate line showing the best possible linear representation of the several points.
  • Recall geometry: a straight line on a graph can be represented by the equation Y = a + bX

  7. To simplify our discussion, let’s start with an example of two variables that are usually perfectly related: monthly salary and yearly income: Y = 12X
  • Let’s add one more factor to this linear relationship. Suppose that we received a Christmas bonus of $500: Y = 500 + 12X
  • In the above income example, the slope of the line is 12, which means that Y changes by 12 for each change of one unit in X.
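  As a minimal sketch in Python (the function name is ours, purely for illustration):

     def yearly_income(monthly_salary, bonus=500):
         # Y = a + bX: the intercept a is the bonus, the slope b is 12
         return bonus + 12 * monthly_salary

     print(yearly_income(3000))  # 500 + 12 * 3000 = 36500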

  8. Example of linear regression
  • If we are interested in exploring the relationship between SEI and EDUC using linear regression, we would do the following
  • First, assign SEI as our dependent variable and EDUC as our independent variable
  • Run SPSS using Analyze --> Regression --> Linear (a Python equivalent is sketched below)
  • Interpret the output: look only at R2 and the unstandardized coefficients and their associated levels of significance
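  For readers working outside SPSS, here is a rough equivalent using the statsmodels library; the file name gss.csv and its columns are assumptions for illustration, not part of the original slides:

     import pandas as pd
     import statsmodels.formula.api as smf

     gss = pd.read_csv("gss.csv")  # hypothetical GSS extract with SEI and EDUC columns

     # OLS regression of SEI on EDUC, mirroring Analyze --> Regression --> Linear
     model = smf.ols("SEI ~ EDUC", data=gss).fit()

     print(model.rsquared)  # R-squared
     print(model.params)    # unstandardized coefficients (constant and EDUC)
     print(model.pvalues)   # associated significance levels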

  9. Taking the unstandardized B coefficients for the constant and the variable EDUC gives us the following regression equation: SEI = -4.321 + (EDUC*3.917)
  • For example, the predicted SEI for someone with 18 years of education is: SEI = -4.321 + (18*3.917) = 66.2

  10. Let’s look at Pearson’s r for SEI and EDUC
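  (The SPSS correlation output shown on this slide is not reproduced in the transcript. The same statistic can be computed with scipy, again assuming the hypothetical gss.csv extract:)

     import pandas as pd
     from scipy.stats import pearsonr

     gss = pd.read_csv("gss.csv")  # hypothetical extract, as above
     sub = gss[["EDUC", "SEI"]].dropna()
     r, p = pearsonr(sub["EDUC"], sub["SEI"])
     print(f"Pearson's r = {r:.3f} (p = {p:.4f})")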

  11. The scatterplot shows the following relationship
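  (The scatterplot image is not reproduced in the transcript. A minimal matplotlib sketch of the same plot, overlaying the regression line from slide 9:)

     import matplotlib.pyplot as plt
     import numpy as np
     import pandas as pd

     gss = pd.read_csv("gss.csv")  # hypothetical extract, as above
     plt.scatter(gss["EDUC"], gss["SEI"], alpha=0.3)
     x = np.linspace(0, 20, 100)
     plt.plot(x, -4.321 + 3.917 * x)  # regression line from slide 9
     plt.xlabel("EDUC (years of education)")
     plt.ylabel("SEI")
     plt.show()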

  12. Multiple regression example
  • If we believe that variables other than EDUC influence SEI, we can bring them into the model using stepwise multiple regression.
  • Let’s consider the influence of EDUC, AGE, and SEX.
  • Now remember: we can only use interval/ratio variables in regression, and SEX is nominal.
  • To get around this we need to use dummy-variable recoding for SEX.

  13. Since SEX is coded 1=male and 2=female in the GSS, and we believe a priori that being male confers status advantages, we will code for “maleness.”
  • We want to recode so that male=1 and female=0. This allows us to treat male as 100% male and female as 0% male.
  • Use Transform --> Recode --> Into Different Variables to create a new variable called SEX2 (a pandas equivalent is sketched below)
  • Run the regression by Analyze --> Regression --> Linear (make sure that method=stepwise)
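  Outside SPSS, the same recode is a single mapping. A pandas sketch, assuming the hypothetical extract as before:

     import pandas as pd

     gss = pd.read_csv("gss.csv")  # hypothetical extract, as above
     # SEX: 1 = male, 2 = female  -->  SEX2: 1 = male, 0 = female
     gss["SEX2"] = gss["SEX"].map({1: 1, 2: 0})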

  14. Model 1 (EDUC): SEI = -4.313 + (EDUC*3.919)
  • Model 2 (EDUC and AGE): SEI = -11.140 + (EDUC*4.017) + (AGE*.123)
  • Model 3 (EDUC, AGE, and SEX2): SEI = -11.826 + (EDUC*4.000) + (AGE*.124) + (SEX2*1.819)
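  A sketch of the same sequence in Python. statsmodels has no built-in stepwise procedure, so this simply enters the predictors one block at a time to reproduce the three models; the file and column names are assumptions as above:

     import pandas as pd
     import statsmodels.formula.api as smf

     gss = pd.read_csv("gss.csv")
     gss["SEX2"] = gss["SEX"].map({1: 1, 2: 0})  # dummy recode from slide 13

     for formula in ("SEI ~ EDUC",
                     "SEI ~ EDUC + AGE",
                     "SEI ~ EDUC + AGE + SEX2"):
         fit = smf.ols(formula, data=gss).fit()
         print(formula, "R2 =", round(fit.rsquared, 3))
         print(fit.params)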

  15. Example from Model 3
  • What is the predicted SEI score for a 40-year-old woman with 13 years of education? SEI = -11.826 + (13*4.000) + (40*.124) + (0*1.819) = 45.13
  • What is the predicted SEI score for a 25-year-old man with 18 years of education? SEI = -11.826 + (18*4.000) + (25*.124) + (1*1.819) = 65.09
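  Checking the arithmetic with a small helper function (hypothetical, built directly from the Model 3 coefficients):

     def predict_sei(educ, age, sex2):
         # Model 3: SEI = -11.826 + 4.000*EDUC + 0.124*AGE + 1.819*SEX2
         return -11.826 + 4.000 * educ + 0.124 * age + 1.819 * sex2

     print(predict_sei(13, 40, 0))  # 40-year-old woman: 45.134
     print(predict_sei(18, 25, 1))  # 25-year-old man: 65.093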

  16. Multiple regression
  • With two or more independent variables, the fitted relationship is viewed as a plane rather than a line.

  17. There are several assumptions associated with using a multiple regression model:
  • linearity
  • equal variance: variation around the regression line is constant (known as homoscedasticity)
  • normality: errors are normally distributed
  • independence: different errors are sampled independently
  A related problem is multicollinearity, which occurs when independent variables are highly correlated (usually over .80). The equal-variance and normality assumptions can be checked directly, as sketched below.
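  A sketch of these checks using statsmodels and scipy (Breusch-Pagan for equal variance, Shapiro-Wilk for normality of the errors), with the same hypothetical extract:

     import pandas as pd
     import statsmodels.formula.api as smf
     from statsmodels.stats.diagnostic import het_breuschpagan
     from scipy.stats import shapiro

     gss = pd.read_csv("gss.csv")  # hypothetical extract, as above
     fit = smf.ols("SEI ~ EDUC + AGE", data=gss).fit()

     # Equal variance: Breusch-Pagan tests whether residual variance depends on the regressors
     bp_stat, bp_p, _, _ = het_breuschpagan(fit.resid, fit.model.exog)
     print("Breusch-Pagan p =", bp_p)

     # Normality: Shapiro-Wilk test on the residuals
     w_stat, w_p = shapiro(fit.resid)
     print("Shapiro-Wilk p =", w_p)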

  18. Dummy regression analysis
  • Multiple regression accommodates several quantitative independent variables, but frequently independent variables of interest are qualitative. Dummy-variable regressors permit the effects of qualitative independent variables to be incorporated into a regression equation.
  • Suppose that, along with a quantitative independent variable X, there is a two-category (dichotomous) independent variable thought to influence the dependent variable Y.
  • For example, if Y is income, X may be years of education and the qualitative independent variable may be gender.

  19. Dummy-variable coding of a polytomous independent variable
  • When a qualitative independent variable has several categories (polytomous), its effects can be captured by coding a set of dummy regressors.
  • A variable with m categories gives rise to m-1 dummy variables.
  • For example, to add region effects to a regression in which income is the dependent variable and education and labour-force experience are quantitative independent variables:

  20. Dummy regressors

      Region    D1  D2  D3  D4
      East       1   0   0   0
      Quebec     0   1   0   0
      Ontario    0   0   1   0
      Prairie    0   0   0   1
      B.C.*      0   0   0   0

  * arbitrary reference or baseline category
  Thus the model represents 5 parallel regression planes, one for each region.
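  In pandas, the same coding can be produced with get_dummies, dropping the B.C. column to make it the baseline, exactly as in the table above (the DataFrame here is a toy illustration):

     import pandas as pd

     df = pd.DataFrame({"region": ["East", "Quebec", "Ontario", "Prairie", "B.C."]})
     dummies = pd.get_dummies(df["region"])    # one 0/1 column per category (m = 5)
     dummies = dummies.drop(columns=["B.C."])  # drop the baseline: m - 1 = 4 dummies
     print(dummies)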

  21. Diagnosing and correcting problems in regression
  • Collinearity: when there is a perfect linear relationship among the independent variables in a regression, the least-squares regression coefficients are not uniquely defined.
  • Strong, but less than perfect, collinearity (sometimes called multicollinearity) doesn’t prevent the least-squares coefficients from being calculated, but it makes them unstable: coefficient standard errors are big, and small changes in the data (due even to rounding errors) can cause large changes in the regression coefficients.

  22. The variance inflation factor (VIF) measures the extent to which collinearity affects sampling variance. For a given independent variable, VIF = 1/(1 - R2), where R2 comes from regressing that variable on the other independent variables.
  • The VIF is at a minimum (1) when R2=0 and at a maximum (infinity) when R2=1
  • Caveat: the VIF is not very useful when an effect is spread over several degrees of freedom.
  • Collinearity is a data problem; it does not imply that the model is wrong, only that the data are incapable of providing good estimates of the model parameters.
  • If, for example, X1 and X2 are perfectly correlated in a set of data, it’s impossible to separate their effects.
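  A sketch of computing VIFs with statsmodels, assuming the hypothetical extract and the SEX2 recode from slide 13:

     import pandas as pd
     import statsmodels.api as sm
     from statsmodels.stats.outliers_influence import variance_inflation_factor

     gss = pd.read_csv("gss.csv")  # hypothetical extract, as above
     gss["SEX2"] = gss["SEX"].map({1: 1, 2: 0})

     X = sm.add_constant(gss[["EDUC", "AGE", "SEX2"]].dropna())
     for i, name in enumerate(X.columns):
         if name != "const":  # the constant's VIF is not meaningful
             print(name, round(variance_inflation_factor(X.values, i), 2))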

  23. There are, however, several strategies for coping with collinear data:
  • Give up: this is an honest, if unsatisfying, answer.
  • Collect new data.
  • Reconsider the model: perhaps X1 and X2 are better conceived as alternative measures of the same construct, in which case their high correlation is indicative of high reliability. Get rid of one of them or combine them in some manner (e.g., an index).
