QUIZ 3 ON TUESDAY, Dec 5
COVERS EVERYTHING FROM QUIZ 2 TO TODAY'S CLASS
MULTIPLE CHOICE + SHORT ANSWER
Elements of Regression
Total Sum of Squares (SST) = deviation of each score from the DV mean, squared, then summed.
Residual Sum of Squares (SSR) = deviation of each score from the regression line (the residual), squared, then summed.
Model Sum of Squares (SSM) = SST – SSR = the amount the regression slope explains above and beyond the simple mean.
R2 = SSM / SST = proportion of variance explained by the predictor(s); measures how much of the DV is predicted by the IV (or IVs).
Equivalently, R2 = (SST – SSR) / SST.
NOTE: What happens to R2 when SSR is smaller? It gets bigger.
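To make the arithmetic concrete, here is a minimal Python sketch using hypothetical reprimand/aggression numbers (not the class data) to show how SST, SSR, SSM, and R2 fall out of a fitted line:

```python
# Hypothetical data, for illustration only: how SST, SSR, SSM, and R^2 are computed.
import numpy as np

reprimands = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)   # IV (hypothetical)
aggression = np.array([2, 1, 4, 3, 5, 7, 6, 8], dtype=float)   # DV (hypothetical)

# Fit the simple regression line: aggression = b0 + b1 * reprimands
b1, b0 = np.polyfit(reprimands, aggression, deg=1)
predicted = b0 + b1 * reprimands

ss_t = np.sum((aggression - aggression.mean()) ** 2)  # deviations from the DV mean, squared, summed
ss_r = np.sum((aggression - predicted) ** 2)          # deviations from the line (residuals), squared, summed
ss_m = ss_t - ss_r                                    # what the slope explains beyond the mean
r_squared = ss_m / ss_t                               # proportion of variance explained

print(f"SST={ss_t:.2f}  SSR={ss_r:.2f}  SSM={ss_m:.2f}  R^2={r_squared:.3f}")
```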
Assessing the Overall Model: The Simple Regression F Test
In ANOVA, F = Treatment / Error = MSB / MSW.
In regression, F = Model / Residuals = MSM / MSR, i.e., the slope line relative to the random error around the slope line.
MSM = SSM / df(model); MSR = SSR / df(residual).
df(model) = number of predictors (bs, not counting the intercept).
df(residual) = number of observations (i.e., subjects) – number of estimates (all bs plus the intercept). If N = 20 with one predictor, then df = 20 – 2 = 18.
F in regression measures whether the overall model does better than chance at predicting the outcome.
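A sketch of that F computation, continuing the same hypothetical numbers; scipy is assumed only for the p-value:

```python
# F = MSM / MSR, with df_model = number of predictors and df_residual = n - 2 (one predictor).
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)   # hypothetical predictor
y = np.array([2, 1, 4, 3, 5, 7, 6, 8], dtype=float)   # hypothetical outcome
n, k = len(y), 1                                       # n observations, k = 1 predictor

b1, b0 = np.polyfit(x, y, deg=1)
pred = b0 + b1 * x
ss_t = np.sum((y - y.mean()) ** 2)
ss_r = np.sum((y - pred) ** 2)
ss_m = ss_t - ss_r

df_model = k                            # predictors (bs, not counting the intercept)
df_resid = n - k - 1                    # observations minus all estimates (slope + intercept)
ms_m = ss_m / df_model
ms_r = ss_r / df_resid
f = ms_m / ms_r
p = stats.f.sf(f, df_model, df_resid)   # does the model beat chance?
print(f"F({df_model},{df_resid}) = {f:.2f}, p = {p:.4f}")
```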
F Statistic in Regression (annotated SPSS ANOVA table)
SSM df = number of predictors (reprimands) = 1
SSR df = subjects – coefficients = 20 – (intercept, reprimands) = 18
In the output, the "Regression" row is the model: it reports SSM and MSM; the "Residual" row reports SSR and MSR; F = MSM / MSR.
Assessing Individual Predictors
Is the predictor slope significant, i.e., does the IV predict the outcome?
b1 = slope of the sole predictor in simple regression. If b1 = 0, then a change in the predictor has zero influence on the outcome. If b1 > 0, it has some influence. How much greater than 0 must b1 be in order to have a significant influence?
The t statistic tests the significance of the b1 slope:
t = (b1 observed – b1 expected) / SEb1
Under the null hypothesis of no effect, b1 expected = 0, so t = b1 observed / SEb1.
df for t = n – 1 – number of predictors (bs) = n – 2 in simple regression.
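A sketch of the slope t test under the usual null hypothesis b1 = 0, again with hypothetical data; scipy is assumed for the p-value:

```python
# t = (b1 observed - b1 expected under the null) / SE of b1, with df = n - 2.
import numpy as np
from scipy import stats

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2, 1, 4, 3, 5, 7, 6, 8], dtype=float)
n = len(y)

b1, b0 = np.polyfit(x, y, deg=1)
resid = y - (b0 + b1 * x)
ms_resid = np.sum(resid ** 2) / (n - 2)                     # residual variance
se_b1 = np.sqrt(ms_resid / np.sum((x - x.mean()) ** 2))     # standard error of the slope

t = (b1 - 0) / se_b1                                        # (observed - expected) / SE
df = n - 2                                                  # n - 1 - number of predictors
p = 2 * stats.t.sf(abs(t), df)
print(f"b1 = {b1:.3f}, SE = {se_b1:.3f}, t({df}) = {t:.2f}, p = {p:.4f}")
```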
t Statistic in Regression (SPSS coefficients table)
B = slope; Std. Error = standard error of the slope; t = B / Std. Error.
Beta = standardized B: how many SDs the outcome changes per SD change in the predictor. Beta allows comparison of predictor strength across predictors.
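A small sketch of the relationship between B and Beta (hypothetical x and y as above): Beta simply rescales the raw slope into standard-deviation units.

```python
# Beta = B * (SD of predictor / SD of outcome); in simple regression Beta equals Pearson's r.
import numpy as np

x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2, 1, 4, 3, 5, 7, 6, 8], dtype=float)

b1, b0 = np.polyfit(x, y, deg=1)
beta = b1 * x.std(ddof=1) / y.std(ddof=1)   # SDs of outcome change per SD change in predictor
print(f"B = {b1:.3f}, Beta = {beta:.3f}")
```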
Interpreting Simple Regression
Overall F test: Our model of reprimands having an effect on aggression is confirmed.
t test: Reprimands lead to more aggression. In fact, for every 1 reprimand there are .61 additional aggressive acts, or roughly 1 aggressive act for every 2 reprimands.
Key Indices of Regression
R = degree to which the entire model correlates with the outcome
R2 = proportion of variance the model explains
F = how well the model exceeds the mean in predicting the outcome
b = the influence of an individual predictor on the outcome
beta = b transformed into standardized units
t of b = significance of b (b / std. error of b)
Multiple Regression Class 22
How Much Do Teacher Reprimands Lead to Bullying, Controlling for Family Stress?
Multiple Regression (MR)
Y = b0 + b1X1 + b2X2 + b3X3 + … + bkXk + ε
Multiple regression (MR) can incorporate any number of predictors in the model.
With 2 predictors the result is a "regression plane"; beyond that it becomes increasingly difficult to visualize.
MR operates on the same principles as simple regression.
MR yields the correlation between observed Y and Y as predicted by the total model (i.e., all predictors at once).
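A minimal multiple-regression sketch with two hypothetical predictors (reprimands, family stress) and one outcome (aggression); the statsmodels library and the simulated data are assumptions for illustration, not the class dataset.

```python
# Fit Y = b0 + b1*X1 + b2*X2 + error with two predictors and inspect the full model.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
reprimands = rng.normal(5, 2, n)                    # hypothetical predictor 1
fam_stress = rng.normal(3, 1, n)                    # hypothetical predictor 2
aggression = 0.5 + 0.6 * reprimands + 0.8 * fam_stress + rng.normal(0, 1, n)

X = sm.add_constant(np.column_stack([reprimands, fam_stress]))  # intercept + 2 predictors
model = sm.OLS(aggression, X).fit()
print(model.summary())   # total-model F, R^2, and a b (with its t test) for each predictor
```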
Two Variables Produce a "Regression Plane"
[Figure: 3-D plot with Aggression as the outcome axis and Reprimands and Family Stress as the predictor axes.]
Elements of Multiple Regression
Total Sum of Squares (SST) = deviation of each score from the DV mean, squared, then summed.
Residual Sum of Squares (SSR) = each residual from the total model (not a single line), squared, then summed.
Model Sum of Squares (SSM) = SST – SSR = the amount the total model explains above and beyond the simple mean.
R2 = SSM / SST = proportion of variance explained by the total model.
Adjusted R2 = R2, adjusted for having multiple predictors.
NOTE: The main difference between these values in multiple regression and simple regression is the use of the total model rather than a single slope. The math is much more complicated, but conceptually the same.
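The slide does not give the adjustment formula; the standard one is Adjusted R2 = 1 − (1 − R2)(n − 1)/(n − k − 1), which shrinks R2 as predictors (k) are added relative to sample size (n). A sketch, using the R2 values from Table 1 later in this deck and assuming n = 20 as in the earlier df example:

```python
# Standard adjusted R^2; n = 20 is an assumed sample size for illustration.
def adjusted_r_squared(r_squared: float, n: int, k: int) -> float:
    return 1 - (1 - r_squared) * (n - 1) / (n - k - 1)

print(adjusted_r_squared(r_squared=0.72, n=20, k=1))   # Step 1: one predictor
print(adjusted_r_squared(r_squared=0.83, n=20, k=2))   # Step 2: the penalty grows with k
```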
Multiple Regression Example
Is aggression by bullies predicted by teacher reprimands, controlling for family stress?
• This is a model with 2 predictors (reprimands, family stress).
• This multiple regression model shows:
  • the effect of the total model (reprimands, family stress)
  • the effect of family stress
  • the effect of reprimands after accounting for family stress.
• NOTE: Could also test:
  • the effect of family stress controlling for reprimands (switch the IV order above)
  • the combined effect of family stress and reprimands.
• DEPENDS ON METHOD SELECTED
Y = b0 + b1X1 + b2X2 + ε
Y = aggression; b0 = intercept; b1 = family stress; b2 = reprimands; ε = error
Methods of Regression
Forced Entry: All predictors are forced into the model simultaneously.
Hierarchical:
1. Predictors are selected based on theory or past work.
2. Predictors are entered into the analysis in order of importance, or by established influence.
3. New predictors are entered last, so that their unique contribution can be determined.
Stepwise: The program automatically searches for the strongest predictor, then the second strongest, etc. Predictor 1 is best at explaining the outcome, accounting for, say, 40% of the variance; Predictor 2 is best at explaining the remaining 60%, etc. A controversial method.
In general, Hierarchical is the most common and most accepted method.
Avoid the "kitchen sink": limit the number of predictors to as few as possible, and to those that make theoretical sense.
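A sketch of hierarchical entry (not SPSS output): fit Step 1 with the established predictor, then Step 2 adding the new predictor, and ask whether the added variance is significant. The statsmodels library and the simulated variables are assumptions for illustration.

```python
# Hierarchical entry: established predictor first, new predictor last; test the R^2 change.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
fam_stress = rng.normal(3, 1, n)                       # entered first (established influence)
reprimands = rng.normal(5, 2, n)                       # entered last (unique contribution)
aggression = 0.7 * fam_stress + 0.3 * reprimands + rng.normal(0, 1, n)

step1 = sm.OLS(aggression, sm.add_constant(fam_stress)).fit()
step2 = sm.OLS(aggression, sm.add_constant(np.column_stack([fam_stress, reprimands]))).fit()

r2_change = step2.rsquared - step1.rsquared            # variance added by the new predictor
f_change, p_change, _ = step2.compare_f_test(step1)    # is the added variance significant?
print(f"R^2 change = {r2_change:.3f}, F change = {f_change:.2f}, p = {p_change:.4f}")
```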
Sample Size in Regression
Simple rule: the more the better!
Field's rule of thumb: 15 cases per predictor.
Green's rule of thumb: Overall model: 50 + 8k (k = number of predictors). Specific IV: 104 + k.
Unsure which? Use the one requiring the larger n.
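A tiny sketch of Green's rules as stated above; following the slide's advice, it returns the larger of the two requirements.

```python
# Green's rules of thumb for minimum sample size, given k predictors.
def green_minimum_n(k: int) -> int:
    overall_model = 50 + 8 * k      # testing the overall model
    specific_iv = 104 + k           # testing individual predictors
    return max(overall_model, specific_iv)

print(green_minimum_n(k=2))   # 2 predictors -> 106 cases
```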
Multiple Regression in SPSS
REGRESSION /DESCRIPTIVES MEAN STDDEV CORR SIG N /MISSING LISTWISE /STATISTICS COEFF OUTS R ANOVA CHANGE /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT aggression /METHOD=ENTER family.stress /METHOD=ENTER reprimands.
"OUTS" refers to variables excluded from a given model (e.g., Model 1).
"CRITERIA" relates to Stepwise regression only; it refers to which IVs are kept in at Step 1, Step 2, etc.
"NOORIGIN" means "do show the constant in the output report."
SPSS Multiple Regression Output: Descriptives
What are the IVs? Family stress, reprimands.
What is the DV? Aggression.
SPSS Regression Output: Model Effects
R = correlation of the model with the outcome; the power of the regression model, i.e., how much the total model correlates with the DV.
R2 = amount of variance explained by the model.
Adj. R2 = adjusts for the number of predictors; always ≤ R2.
R2 change = amount explained by each new model.
Sig. F Change = does the new model explain a significant amount of added variance?
ANOVA sig. = significance of the TOTAL model.
SPSS Regression Output: Predictor Effects
Constant refers to what? The intercept; the value of the DV when the model = 0.
B refers to what? The slope; the influence of the specific IV on the DV.
Std. Error refers to what? The variance around the specific IV's slope.
Beta refers to what? The standardization of B.
t refers to what? B / Std. Error.
Sig. refers to what? The significance of the effect of the IV on the DV, i.e., the significance of the slope.
Reporting Hierarchical Multiple Regression
Table 1: Effects of Family Stress and Teacher Reprimands on Bullying

                 B       SE B     β
Step 1
  Constant      -0.54    0.42
  Fam. Stress    0.74    0.11    .85**
Step 2
  Constant       0.71    0.34
  Fam. Stress    0.57    0.10    .67**
  Reprimands     0.33    0.10    .38**

Note: R2 = .72 for Step 1, ΔR2 = .11 for Step 2 (p = .004); **p < .01
Requirements and Assumptions (These apply to both simple and multiple regression)
Variable types: Predictors must be quantitative or categorical (2 values only, i.e., dichotomous); outcomes must be interval.
Non-zero variance: Predictors have variation in value.
No perfect multicollinearity: No perfect 1:1 (linear) relationship between 2 or more predictors (more on this in a minute).
Predictors uncorrelated with external variables: No hidden "third variable" confounds.
[Example diagram, MODEL 1: daydreaming predicting test scores, with anxiety and attention skills as external variables outside the model.]
Requirements and Assumptions (These apply to Both Simple and Multiple Regression) Linearity: The changes in outcome due to each predictor are described best by a straight line.
Requirements and Assumptions (Continued)
Independent errors: The residual for Sub. 1 should be unrelated to the residual for Sub. 2. For example, Sub. 2 sees Sub. 1 screaming as Sub. 1 leaves the experiment; Sub. 1 might influence Sub. 2. If each new subject is affected by the preceding subject, this influence will reduce the independence of errors, i.e., create autocorrelation. Autocorrelation is bias due to temporal adjacency.
Assess with the Durbin-Watson test. Values range from 0 to 4; 2 is ideal. Values closer to 0 indicate positive autocorrelation; values closer to 4 indicate negative autocorrelation.
[Example figure: six subjects run in sequence (funny, funny, sad, sad, funny, funny movie), with correlations marked between each adjacent pair of subjects' residuals.]
Durbin-Watson Test of Autocorrelation DATASET ACTIVATE DataSet1. REGRESSION /DESCRIPTIVES MEAN STDDEV CORR SIG N /MISSING LISTWISE /STATISTICS COEFF OUTS R ANOVA CHANGE /CRITERIA=PIN(.05) POUT(.10) /NOORIGIN /DEPENDENT crytotl /METHOD=ENTER age upset /RESIDUALS DURBIN.
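For comparison with the SPSS /RESIDUALS DURBIN subcommand above, a minimal Python sketch of the Durbin-Watson statistic computed directly from a model's residuals; the data here are simulated purely to show the two extremes.

```python
# Durbin-Watson: sum of squared successive residual differences over the sum of squared residuals.
import numpy as np

def durbin_watson(residuals: np.ndarray) -> float:
    diff = np.diff(residuals)                      # residual(t) - residual(t-1)
    return np.sum(diff ** 2) / np.sum(residuals ** 2)

rng = np.random.default_rng(2)
independent = rng.normal(size=200)                 # no autocorrelation -> DW near 2
autocorrelated = np.cumsum(rng.normal(size=200))   # strong positive autocorrelation -> DW near 0
print(durbin_watson(independent), durbin_watson(autocorrelated))
```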
Regression Assumes Errors Are Normally, Independently, and Identically Distributed at Every Level of the Predictor (X)
[Figure: identical error distributions drawn at predictor values X1, X2, X3.]
Independence of the DV: All outcome values are independent of one another, i.e., each response comes from a separate subject who is uninfluenced by other subjects. E.g., Joe and Joelle are a competitive dyad; Joe loses every time Joelle succeeds. The DV is enjoyment of playing games. Joe's enjoyment is not independent of Joelle's.
Multicollinearity
In multiple regression, the statistic assumes that each new predictor is in fact a unique measure. If two predictors, A and B, are very highly correlated, then a model testing the added effect of Predictors A and B might, in effect, be testing Predictor A twice. If so, the slopes of the two variables are not orthogonal (they do not go in different directions) but instead run parallel to each other (i.e., they are co-linear).
[Figure: non-orthogonal (parallel) slopes vs. orthogonal slopes.]
Mac Collinearity: A Multicollinearity Saga
• Suffering negative publicity regarding the health risks of fast food, the fast food industry hires the research firm of Fryes, Berger, and Shayque (FBS) to show that there is no intrinsic harm in fast food.
• FBS surveys a random sample, and asks:
  • To what degree are you a meat eater? (carnivore)
  • How often do you purchase fast food? (fast.food)
  • What is your health status? (health)
• FBS conducts a multiple regression, entering fast.food in Step 1 and carnivore in Step 2.
FBS Fast Food and Carnivore Analysis “See! See!” the FBS researchers rejoice, “Fast Food negatively predicts health in Model 1, BUT the effect of fast food on health goes away in Model 2, when being a carnivore is considered.”
Not So Fast, Fast Food Flacks
• Collinearity diagnostics:
  • Correlation table
  • Collinearity statistics: VIF (should be < 10) and/or Tolerance (should be more than .20)
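A sketch of the two collinearity diagnostics named above: tolerance for a predictor is 1 − R2 from regressing it on the other predictor(s), and VIF is 1 / tolerance. The statsmodels library and the simulated carnivore/fast.food variables are assumptions for illustration.

```python
# Tolerance and VIF for one predictor, given a second, nearly redundant predictor.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
carnivore = rng.normal(size=n)
fast_food = 0.95 * carnivore + rng.normal(scale=0.3, size=n)   # nearly redundant predictor

r2 = sm.OLS(fast_food, sm.add_constant(carnivore)).fit().rsquared
tolerance = 1 - r2            # should be more than .20
vif = 1 / tolerance           # should be less than 10
print(f"Tolerance = {tolerance:.3f}, VIF = {vif:.2f}")
```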
Assessing Homoscedasticity Select: Plots Enter: ZRESID for Y and ZPRED for X Ideal Outcome: Equal distribution across chart
Extreme Cases
Cases that deviate greatly from the expected outcome (> ± 2.5 SD) can warp the regression.
First, identify outliers using the Casewise Diagnostics option. Then correct them using the outlier-correction options, which are:
1. Check for data entry error
2. Transform the data
3. Recode as the next highest/lowest value plus/minus 1
4. Delete the outlier
Casewise Diagnostics Print-out in SPSS
[Output screenshot; an arrow marks a possible problem case.]
Casewise Diagnostics for Problem Cases Only
In the "Statistics" option, select Casewise Diagnostics.
Select "outliers outside" and type in how many standard deviations you regard as critical. The default is 3, but it can be changed to another value (e.g., 2.5).
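A sketch of what Casewise Diagnostics reports: standardize the residuals and flag any case beyond the chosen cutoff (2.5 SD here, matching the slide's example). The function and the planted outlier are illustrative assumptions, not SPSS output.

```python
# Flag cases whose standardized residual exceeds the cutoff.
import numpy as np

def flag_extreme_cases(y: np.ndarray, predicted: np.ndarray, cutoff: float = 2.5) -> np.ndarray:
    residuals = y - predicted
    std_resid = residuals / residuals.std(ddof=1)
    return np.where(np.abs(std_resid) > cutoff)[0]   # indices of possible problem cases

rng = np.random.default_rng(4)
y = rng.normal(size=50)
y[10] += 6                                            # plant one extreme case
print(flag_extreme_cases(y, np.zeros(50)))            # case 10 should be flagged
```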
What If Assumption(s) Are Violated?
What is the problem with violating assumptions? You can't generalize from the test sample to the wider population.
Overall, not much can be done if assumptions are substantially violated (i.e., extreme heteroscedasticity, extreme autocorrelation, severe non-linearity). Some options:
1. Heteroscedasticity: Transform the raw data (square root, etc.)
2. Non-linearity: Attempt logistic regression
A Word About Regression Assumptions and Diagnostics
Are these conditions complicated to understand? Somewhat.
Are they laborious to check and correct? Somewhat.
Do most researchers understand, monitor, and address these conditions? No.
Even journal reviewers are often unschooled in diagnostics, or don't take the time to check them. Journal space discourages authors from discussing diagnostics. Some have called for more attention to this inattention, but not much action.
Should we do diagnostics? GIGO (Garbage In, Garbage Out), and fundamental ethics.