1. Multiple Regression
2. What is Multiple Regression? y = b0 + b1x1 + b2x2 + ... + bkxk
One dependent variable
Several independent variables x1, x2, ..., xk
3. Assumptions Quantitative data
Independent observation
For each value of the IV, the distribution of the DV must be normal
The variance of the distribution of the DV should be constant for all values of the IV (Homoscedasticity)
The relationship between the DV and each IV should be linear (Linearity)
Limited linear correlation among the independent variables (i.e. no serious multicollinearity)
Residuals of the predicted DV value, should be random
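The residual assumption above can be checked directly after fitting. A minimal sketch with simulated data (all variable names and coefficients here are hypothetical, not from the SWI data set), using plain NumPy least squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(size=n)

# Design matrix with an intercept column; solve by least squares
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ b
# Residuals should be random with mean ~0; normality and
# homoscedasticity are usually judged from residual plots
print(round(float(resid.mean()), 6))
```

Because the model includes an intercept, the least-squares residuals always have mean zero; the plots, not this number, are what reveal violations.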
4. Using SWI data set
DV: Trust towards social workers (Q2)
6 IVs:
Social workers make people rely on welfare (Q1c)
Social workers bring hopes to those in averse situation (Q1n)
Social workers help the disadvantaged (Q1m)
Age (Q13), Family income (Q17), Sex (Q12)
5. A model (or equation) involves the DV and a particular combination of the IVs.
With 5 independent variables, a model can contain 0, 1, 2, 3, 4, or 5 IVs. Since each IV is either in or out, there can be as many as 2^5 = 32 models
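The 32-model count can be verified by enumerating every subset of the IVs. A small sketch (the IV names are placeholders for the five predictors):

```python
from itertools import combinations

ivs = ["Q1c", "Q1n", "Q1m", "Age", "Income"]  # 5 hypothetical IVs

# Every model is a subset of the IVs,
# including the empty (intercept-only) model
models = [combo for r in range(len(ivs) + 1)
          for combo in combinations(ivs, r)]
print(len(models))  # → 32
```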
9. Method Enter
Include all the independent variables in the model.
Some variables may contribute little to the model, yet they still affect the coefficients of the equation (or the model).
Other methods of inclusion (Forward, Stepwise) include only those IVs with greater contributions (changes in R2) and (usually) a b (coefficient) that differs significantly from 0.
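The Enter method is simply one fit containing every IV. A sketch with simulated data (the three IVs and their true coefficients are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 150
X_iv = rng.normal(size=(n, 3))  # three hypothetical IVs
# The middle IV truly contributes nothing (coefficient 0)
y = 1 + X_iv @ np.array([0.8, 0.0, -0.4]) + rng.normal(size=n)

# "Enter": put every IV into the model at once
X = np.column_stack([np.ones(n), X_iv])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 2))  # intercept and three coefficients
```

The second coefficient comes out near 0, yet it still enters the equation and perturbs the other estimates slightly, which is the drawback Enter has relative to Forward/Stepwise selection.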
11. Null hypotheses
The ANOVA table above is used to test several equivalent null hypotheses:
there is no linear relationship in the population between the dependent variable and the independent variables,
all of the population partial regression coefficients are 0, and the population value of multiple R2 is 0.
So the alternative hypothesis says only that at least one partial coefficient is not 0.
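The F statistic behind that ANOVA table can be computed from R2. A sketch on simulated data (sample size, IV count, and coefficients are all hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 120, 3
X_iv = rng.normal(size=(n, k))
y = 2 + X_iv @ np.array([0.6, -0.3, 0.5]) + rng.normal(size=n)

X = np.column_stack([np.ones(n), X_iv])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b
ss_res = resid @ resid
ss_tot = ((y - y.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot

# ANOVA F statistic: tests H0 that all k partial coefficients are 0
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(round(float(f_stat), 1))
```

A large F (compared against the F distribution with k and n − k − 1 df) leads us to reject the null that every coefficient is 0.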
12. Null hypothesis:
The population partial regression coefficient for a variable is 0; this is tested using the t statistic and its observed significance level.
13. The equation:
Trust of social workers = 3.3 + 0.759 x Q1n - 0.233 x Q1c + 0.375 x Q1m - 0.145 x Q17
14. Understanding partial correlation Removes from both the given IV and the DV all variance accounted for by the control IVs, then correlates the unique component of the IV with the unique component of the DV.
It is the correlation between each IV and the DV after the variance explained by other IVs (called the controlling variables) was removed.
We will say: the partial correlation of an IV is its correlation with the DV after the influence of the other IVs in the model is controlled.
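The definition above translates directly into code: residualize both the IV and the DV on the controls, then correlate the residuals. A sketch with simulated variables (x, y, z are invented; z is the control):

```python
import numpy as np

def residualize(v, controls):
    """Remove from v all variance accounted for by the controls."""
    X = np.column_stack([np.ones(len(v))] + list(controls))
    b, *_ = np.linalg.lstsq(X, v, rcond=None)
    return v - X @ b

rng = np.random.default_rng(1)
n = 500
z = rng.normal(size=n)       # controlling variable
x = z + rng.normal(size=n)   # IV of interest, correlated with z
y = z + rng.normal(size=n)   # DV driven only by z

rx = residualize(x, [z])
ry = residualize(y, [z])
partial_r = float(np.corrcoef(rx, ry)[0, 1])
zero_order_r = float(np.corrcoef(x, y)[0, 1])
print(round(zero_order_r, 2), round(partial_r, 2))
```

Here x and y correlate substantially, but only because both depend on z; once z is controlled, the partial correlation drops to about 0.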
16. Method Stepwise: This method enters IVs one at a time, selecting the one that is significant at the 0.05 level (you can change it) and produces the greatest change in the model's R2.
It does one more thing: when an IV enters the model, it checks whether any variable already in the model now has a reduced partial correlation, with its significance level rising above 0.1 (the default; you can change it). If so, that variable is excluded from the model
20. Examining the standardized residual
ZPRED (the standardized predicted values of the dependent variable based on the model). These values are standardized forms of the values predicted by the model.
ZRESID (the standardized residuals, or errors). These values are the standardized differences between the observed data and the values that the model predicts
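These quantities are easy to compute by hand. A minimal NumPy sketch on simulated data (SPSS's exact ZRESID divides the raw residual by the standard error of the estimate; since residuals have mean zero, plain standardization shown here gives nearly the same values):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100
x = rng.normal(size=n)
y = 3 + 2 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ b          # values predicted by the model
resid = y - pred      # observed minus predicted

# Standardize: subtract the mean, divide by the standard deviation
zpred = (pred - pred.mean()) / pred.std(ddof=1)
zresid = (resid - resid.mean()) / resid.std(ddof=1)
```

Plotting ZRESID against ZPRED is the usual way to spot non-linearity or non-constant variance: a healthy model shows a patternless band around zero.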
22. Multicollinearity The existence of a high degree of linear correlation amongst two or more independent variables in a multiple regression model.
In the presence of multicollinearity, it will be difficult to assess the effect of the independent variables on the dependent variable.
A tolerance of less than 0.20 and/or a VIF of 5 and above indicates a multicollinearity problem (Wiki)
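Tolerance and VIF come from regressing each IV on the others: tolerance = 1 − R², VIF = 1/tolerance. A sketch with simulated IVs (x2 is deliberately built to be nearly collinear with x1):

```python
import numpy as np

def vif(X, j):
    """VIF of column j: regress it on the other columns; VIF = 1/(1 - R^2)."""
    y = X[:, j]
    others = np.delete(X, j, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    b, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ b
    r2 = 1 - (resid @ resid) / ((y - y.mean()) @ (y - y.mean()))
    return 1.0 / (1.0 - r2)

rng = np.random.default_rng(3)
n = 300
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)  # nearly collinear with x1
x3 = rng.normal(size=n)             # independent of the others
X = np.column_stack([x1, x2, x3])

for j in range(3):
    v = vif(X, j)
    print(j, round(v, 1), "problem" if v >= 5 else "ok")
```

The first two columns show very large VIFs (flagging the collinear pair), while the independent third column stays near 1.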
26. Data Transformation Use the dataset (Transformation.sav)
Plot the scatter-plot for Life-satisfaction (Y-axis) against Income (X-axis)
(Graphs/Legacy dialogs/Scatter-dot)
28. After transforming monthly income into LnIncome, we obtain a new scatter-plot.
30. For a curve like this, the best approach is to transform the independent variable (dosage of drug) into its inverse (1/x). With the new variable on the X-axis, the plot now looks like this.
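Both transformations are one-line operations. A sketch with simulated income data (the logarithmic income–satisfaction relationship is invented to mimic the Transformation.sav example, not taken from it):

```python
import numpy as np

rng = np.random.default_rng(4)
income = rng.uniform(1000, 80000, size=300)
# Hypothetical diminishing-returns relationship
satisfaction = 2 + 0.8 * np.log(income) + rng.normal(scale=0.3, size=300)

ln_income = np.log(income)   # LnIncome, as in the slides
inv_income = 1.0 / income    # inverse transform for 1/x-shaped curves

# After the log transform the relationship is linear,
# so the linear correlation improves
r_raw = float(np.corrcoef(income, satisfaction)[0, 1])
r_log = float(np.corrcoef(ln_income, satisfaction)[0, 1])
print(round(r_raw, 2), round(r_log, 2))
```

In SPSS the same transforms are made with Transform / Compute Variable; the point of the scatter-plots is to choose which transform straightens the curve.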
32. Dummy variable If you have a categorical (i.e. nominal) variable that you want to include in your model as an independent variable, you need to recode it into a number of dummy variables.
For a dichotomous variable such as gender (male = 1, female = 0), gender = 1 means that the case is a male.
You can rename it as Male (1 = male, 0 = not male). Male is called a dummy variable. In a multiple regression equation, we can have something like this:
33. Marital satisfaction (MS) =
a + b1 Year married + b2 Male + b3 Income
Male
MS = a + b1 Year married + b2 (1) + b3 Income
Female
MS = a + b1 Year married + b2 (0) + b3 Income
34. For place of residence (HK, Kowloon and NT)
Suppose you choose HK as a reference group
Two dummy variables: Kowloon and NT
For a person living in Kowloon, enter the value of Kowloon as 1, and NT, 0.
If a person lives in NT, then 0 for Kowloon and 1 for NT.
For a person living in HK, both Kowloon and NT will be 0.
35. MS = a + b1 Year married + b2 Male + b3 Income + b4 Kowloon + b5 NT.
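The recoding rule for the three-category residence variable can be sketched as a small function (plain Python; in SPSS you would do the same with Recode into Different Variables):

```python
# Dummy-code a three-category variable with HK as the reference group
def dummies(place):
    return {"Kowloon": 1 if place == "Kowloon" else 0,
            "NT": 1 if place == "NT" else 0}

print(dummies("Kowloon"))  # → {'Kowloon': 1, 'NT': 0}
print(dummies("NT"))       # → {'Kowloon': 0, 'NT': 1}
print(dummies("HK"))       # → {'Kowloon': 0, 'NT': 0}  (reference group)
```

With HK as the reference, b4 and b5 in the equation above are each group's difference from HK, holding the other IVs constant.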
37. http://www.stattucino.com/berrie/dsl/regression/regression.html
38. Null hypothesis for the ANOVA test in regression:
There is no linear relationship in the population between the dependent variable and the independent variables
Alternative: at least one population partial regression coefficient is not 0
39. Null hypothesis for the t-test of regression coefficient (b)
The slope of the regression line fitting two variables in the population is equal to zero
OR
For the regression equation: y = a + bx
H0: b = 0
Ha: b ≠ 0
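The t statistic for this test is the estimated slope divided by its standard error. A sketch for the one-IV case with simulated data (sample size and true slope are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ b

# t = b / SE(b), with SE from the residual variance (df = n - 2)
s2 = resid @ resid / (n - 2)
se_b = np.sqrt(s2 / ((x - x.mean()) ** 2).sum())
t_stat = float(b[1] / se_b)
print(round(t_stat, 2))
```

A |t| well beyond about 2 (for moderate n) leads us to reject H0: b = 0, which is the same decision SPSS reports via the coefficient's observed significance level.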