Here, pal! Regress this! Part 2 presented by Miles Hamby, PhD Principal, Ariel Training Consultants MilesFlight.20megsfree.com drhamby@cox.net
The Equation
MODEL 3 – IV and B (slope):
(Constant) 35.577
Age -.117
Gender -.110
Married -4.05E-02
Black .439
Native Am .719
Asian -.553
Hispanic -.830
Unknown .531
Alien -.618
GPA -.277
Transfer Cr 4.285E-02
Undergrad -3.259
Tutoring -4.71E-07
Accounting 2.638
Business 2.651

Y = a + bAge + bGen + bMar + bBlk + bNA + bAsn + bHis + bUnk + bAln + bGPA + bXfer + bUndergrad + bTutor + bAcc + bBus

Y = 35.57 + (-.11)Age + (-.11)Gen + (-.04)Mar + (.43)Black + (.71)NatAm + (-.55)Asian + (-.83)Hisp + (-.53)Unk + (-.61)Alien + (-.27)GPA + (.04)Xfer + (-3.25)Under + (-.04)Tutor + (2.63)Acc + (2.65)Bus
Let's Predict!
What is the predicted Quarters to Completion for: Age 36, Male, Single, Black, US citizen, 3.5 GPA, 35 Transfer credits, Undergraduate, no Tutoring, Business major?
Y = 35.57 - (.11)Age - (.11)Gen - (.04)Mar + (.43)Black + (.71)NatAm - (.55)Asian - (.83)Hisp - (.53)Unk - (.61)Alien - (.27)GPA + (.04)Xfer - (3.25)Under - (.04)Tutor + (2.63)Acc + (2.65)Bus
Y = 35.57 - (.11)(36) - (.11)(0) - (.04)(0) + (.43)(1) + (.71)(0) - (.55)(0) - (.83)(0) - (.53)(0) - (.61)(0) - (.27)(3.5) + (.04)(35) - (3.25)(1) - (.04)(0) + (2.63)(0) + (2.65)(1)
Y = 35.57 - 3.96 - 0 - 0 + .43 + 0 - 0 - 0 - 0 - 0 - .94 + 1.40 - 3.25 - 0 + 0 + 2.65 = 31.90
What is the predicted Quarters to Completion for: Age 45, Female, Married, White, Alien, 3.0 GPA, no Transfer credits, Undergraduate, Tutored, Computer major?
Y = 35.57 - (.11)Age - (.11)Gen - (.04)Mar + (.43)Black + (.71)NatAm - (.55)Asian - (.83)Hisp - (.53)Unk - (.61)Alien - (.27)GPA + (.04)Xfer - (3.25)Under - (.04)Tutor + (2.63)Acc + (2.65)Bus
Y = 35.57 - (.11)(45) - (.11)(1) - (.04)(1) + (.43)(0) + (.71)(0) - (.55)(0) - (.83)(0) - (.53)(0) - (.61)(1) - (.27)(3.0) + (.04)(0) - (3.25)(1) - (.04)(1) + (2.63)(0) + (2.65)(0)
Y = 35.57 - 4.95 - .11 - .04 + 0 + 0 - 0 - 0 - 0 - .61 - .81 + 0 - 3.25 - .04 + 0 + 0 = 25.8
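For readers who want to check these by hand, here is a minimal Python sketch (not the author's SPSS workflow) that rebuilds the two predictions from the rounded Model 3 slopes used above; the dictionary keys are illustrative variable names, not the original data set's field names.

```python
# Reproduce the two worked predictions from the rounded Model 3 slopes on the slides.
SLOPES = {
    "Age": -0.11, "Gender": -0.11, "Married": -0.04, "Black": 0.43,
    "NativeAm": 0.71, "Asian": -0.55, "Hispanic": -0.83, "Unknown": -0.53,
    "Alien": -0.61, "GPA": -0.27, "TransferCr": 0.04, "Undergrad": -3.25,
    "Tutoring": -0.04, "Accounting": 2.63, "Business": 2.65,
}
INTERCEPT = 35.57  # (Constant)

def predict_quarters(profile):
    """Predicted quarters to completion: intercept plus slope * value for each IV."""
    return INTERCEPT + sum(SLOPES[iv] * profile.get(iv, 0) for iv in SLOPES)

# Profile 1: age 36, male, single, Black, US citizen, 3.5 GPA, 35 transfer credits,
# undergraduate, no tutoring, Business major
student_1 = {"Age": 36, "Black": 1, "GPA": 3.5, "TransferCr": 35,
             "Undergrad": 1, "Business": 1}
# Profile 2: age 45, female, married, White, alien, 3.0 GPA, no transfer credits,
# undergraduate, tutored, Computer major
student_2 = {"Age": 45, "Gender": 1, "Married": 1, "Alien": 1, "GPA": 3.0,
             "Undergrad": 1, "Tutoring": 1}

print(round(predict_quarters(student_1), 1))  # ~31.9 quarters
print(round(predict_quarters(student_2), 1))  # ~25.8 quarters
```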
Example Profiles Excel
Variation in the DV
• All three Models are significant (Sig. F < .05)
• Each successive Model explains more of the variation (R²) in the DV (Time to Completion)
• But 84.6% or more of the variation is still unexplained
Possible factors? Worklife, children, personal goals, financial aid, company sponsorship. The point is – with R² at only .154, there is some other factor out there contributing more to Time to Completion, and we need to find it!
Variation in the Slopes
Is the slope of Age (-.117) larger or smaller than the slope of GPA (-.277)? You cannot tell from the slopes alone – that is comparing apples to oranges. Compare apples to apples – i.e., use the standardized 'Beta'. The Beta for Age (-.162) is larger in magnitude than the Beta for GPA (.016); i.e., a one-unit change in Age produces a greater change in Time to Completion than a one-unit change in GPA.
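As a quick illustration of what standardizing does (a sketch with simulated data, not the study's data set): the standardized Beta is the unstandardized slope B rescaled by the ratio of the IV's standard deviation to the DV's standard deviation, which is what makes slopes measured in different units comparable.

```python
# Minimal sketch, simulated data: Beta = B * (SD of IV / SD of DV)
import numpy as np

rng = np.random.default_rng(0)
age = rng.normal(35, 10, 200)                          # hypothetical IV (years)
quarters = 30 - 0.117 * age + rng.normal(0, 5, 200)    # hypothetical DV (quarters)

b, a = np.polyfit(age, quarters, 1)                    # unstandardized slope and intercept
beta = b * age.std(ddof=1) / quarters.std(ddof=1)      # standardized Beta, unit-free
print(round(b, 3), round(beta, 3))
```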
Drawing Conclusions
Summarize the correlations (Pearson's R): "There is a statistically significant association between all the variables and Time to Completion."
Summarize the effects (coefficient B): "Academic major, transfer credits, and Undergraduate status seem to have the greatest effects."
Summarize the variation (R²): "However, 84.6% of the variation in Time to Completion is still unexplained."
Suggest what's next: "Data on worklife, income, finances, and company sponsorship should be collected and analyzed."
In Summary • Regression measures the strength of association (correlation) for all variables considered at the same time • Regression measures the amount of effect (slope) of each variable on the dependent variable, adjusted for all other variables • Regression can predict the outcome of any given profile
Regress it, Pal! It’s where it’s at!
Tests of Significance Purpose – determine if there is a significant difference between the means of the categories of a nominal variable • t-test for a dichotomous variable (two categories) • e.g. – Is there a difference in GPA between men and women? • F-test – one-way ANOVA for a polychotomous variable (more than two categories) • e.g. – Is there a difference in GPA between African-American, Hispanic, Anglo, and Native American students?
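For reference, here is a minimal sketch of both tests using SciPy and made-up GPA values (the original presentation uses SPSS; the groups below are purely illustrative).

```python
from scipy import stats

# t-test: dichotomous grouping variable (two categories)
gpa_men   = [2.4, 2.9, 3.1, 2.7, 3.3]
gpa_women = [3.2, 3.6, 3.8, 3.4, 3.9]
t_stat, p_t = stats.ttest_ind(gpa_men, gpa_women)
print(p_t < 0.05)        # is there a significant difference in mean GPA by gender?

# one-way ANOVA (F-test): polychotomous grouping variable (more than two categories)
gpa_afr_am = [2.8, 3.0, 3.2]
gpa_hisp   = [3.1, 3.4, 3.3]
gpa_anglo  = [2.9, 3.5, 3.6]
f_stat, p_f = stats.f_oneway(gpa_afr_am, gpa_hisp, gpa_anglo)
print(p_f < 0.05)        # is there a significant difference in mean GPA by ethnicity?
```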
Shortcoming of the t-test and F-test ~ They do not predict. Regression predicts! e.g. – Can we predict the GPA of a student based on gender? Can we predict the level of satisfaction with a course based on gender? Can we predict the likelihood of graduation of a student based on gender?
Examples – Comparing means with the t-test and F-test: Dichotomous – find the mean GPA of males and that of females and compare them with a t-test. Polychotomous – find the mean GPAs for African-Americans, Hispanics, and Anglos and compare them with a one-way ANOVA.
Example 1 Data – Arbitrarily code 'gender' (a nominal variable): Female = 1, Male = 0
Example 1 (a) Correlation r (SPSS 'R') = .837 Interpretation – GPA is strongly associated with gender
Example 1 (b) Significance of the difference in mean GPA by gender – ANOVA, Sig. F < 0.05. Interpretation – reject H0; i.e., there is a statistically significant difference in GPA according to gender.
Example 1 (c) Regression model (y = a + bx)

SPSS Coefficients (Dependent Variable: GPA)
Model 1        B       Std. Error   Beta (standardized)   t        Sig.
(Constant)     2.620   .112                               23.377   .000
Gender         1.030   .158         .837                  6.498    .000

GPA = 2.62 + 1.03 (gender code)
Interpretation – Male GPA is 2.62, female GPA is 2.62 + 1.03 = 3.65; i.e., mean female GPA is higher than mean male GPA
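A small numerical sketch of the same idea (made-up GPAs chosen so the group means land near the Example 1 values; not the original data): with 0/1 coding, the fitted intercept is the mean of the group coded 0, and intercept + slope is the mean of the group coded 1.

```python
import numpy as np

gender = np.array([0, 0, 0, 0, 1, 1, 1, 1])            # 0 = male, 1 = female
gpa    = np.array([2.52, 2.70, 2.60, 2.66,             # four hypothetical male GPAs (mean 2.62)
                   3.50, 3.70, 3.60, 3.80])            # four hypothetical female GPAs (mean 3.65)

b, a = np.polyfit(gender, gpa, 1)                      # slope and intercept
print(round(a, 2), round(a + b, 2))                    # 2.62 (male mean), 3.65 (female mean)
print(round(gpa[gender == 0].mean(), 2), round(gpa[gender == 1].mean(), 2))
```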
For a dichotomous variable, the coding numbers themselves are not important. Nonsense coding – assigning arbitrary numbers to a nominal variable, e.g., Male = 1, Female = 2; Male = 13, Female = 43; Male = 0, Female = 1; Hispanic = 35, African-American = 72, Anglo = 87. For a dichotomous variable, whatever two numbers are assigned, the strength of association is unaffected – r (correlation) and r² (coefficient of determination) stay the same, and so does the significance of the slope B, although the value of B itself rescales with the codes.
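A quick check of this point using the same made-up GPAs as the sketch above (illustrative only): the correlation r is identical under either pair of codes, while the unstandardized slope B rescales with the distance between the codes, which is why 0/1 coding is easiest to interpret.

```python
import numpy as np

gpa      = np.array([2.52, 2.70, 2.60, 2.66, 3.50, 3.70, 3.60, 3.80])
code_01  = np.array([0, 0, 0, 0, 1, 1, 1, 1])          # Male = 0, Female = 1
code_odd = np.array([13, 13, 13, 13, 43, 43, 43, 43])  # nonsense coding: Male = 13, Female = 43

print(np.corrcoef(code_01, gpa)[0, 1], np.corrcoef(code_odd, gpa)[0, 1])  # identical r
print(np.polyfit(code_01, gpa, 1)[0], np.polyfit(code_odd, gpa, 1)[0])    # B: ~1.03 vs ~1.03/30
```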
BUT – slopes and intercepts from nonsense coding are difficult to interpret unless the codes are '0' and '1'.
e.g. – Male = 0, Female = 1; mean GPA for Males (Ym) = 2.62, mean GPA for Females (Yf) = 3.65
Result – the mean GPA of the category coded 0 is the Y-intercept, and the slope is
B = (Ym – Yf) / (Xm – Xf) = (2.62 – 3.65) / (0 – 1) = –1.03 / –1 = 1.03
[Plot: mean GPA (Y) against gender code (X): 2.62 at X = 0 (Male), 3.65 at X = 1 (Female)]
Slope B = (Ym – Yf) / (Xm – Xf) = (2.62 – 3.65) / (0 – 1) = –1.03 / –1 = 1.03
Interpretation – Female GPAs tend to be predictably higher than Male GPAs
Recode Male = 1, Female = 0:
[Plot: mean GPA (Y) against gender code (X): 3.65 at X = 0 (Female), 2.62 at X = 1 (Male)]
Slope B = (Yf – Ym) / (Xf – Xm) = (3.65 – 2.62) / (0 – 1) = –1.03
Interpretation – same result: Female GPAs tend to be predictably higher than Male GPAs
[Plot: mean GPA (Y) against gender code (X): 3.65 at X = 0 (Female), 2.62 at X = 1 (Male)]
With one category coded 0 (e.g., Female), the Y-intercept is the mean of that category and the slope predicts the other category. Thus, the regression equation is Y = 3.65 – 1.03X.
Interpretation – we can predict GPA based on whether the student is male or female
Fine – but what about polychotomous variables? You cannot use a single dummy variable for more than two categories. Why? It would assume the nominal categories were actually interval, i.e., that one is 'more' of the other. • e.g., if the ethnic variable were coded Hispanic = 1, African-Am = 2, Anglo = 3, • the regression would assume that Anglo is 2 units greater than Hispanic, etc.
Regression also interprets a dichotomous variable (e.g., male = 0, female = 1) as female being 1 unit more than male. But with a dichotomous variable, the mean score of code '0' is the intercept and the mean score of code '1' is the intercept + the slope. With more than two categories, this is no longer true.
Therefore, we must treat each category as a unique variable – a 'dummy' variable. • Code each category/variable as: • 1 = 'presence of the characteristic' or • 0 = 'absence of the characteristic'. e.g. – Ethnic Category [table depicts 3 students – one in each ethnic category]
Coding Polychotomous Nominal Variables as Dummy Variables
For each case/subject, code each category as either 'having it' (1) or 'not' (0)
• e.g. – • Student 1 is African-American • Student 2 is Hispanic • Student 3 is Anglo
[Table depicts the 3 students – one in each ethnic category]
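A sketch of this coding step in pandas (a hypothetical three-student data frame standing in for the slide's hand-coded table):

```python
import pandas as pd

students = pd.DataFrame({
    "student": [1, 2, 3],
    "ethnic":  ["African-American", "Hispanic", "Anglo"],
})

# one 0/1 column per category: each student has a 1 in exactly one ethnic column
dummies = pd.get_dummies(students["ethnic"], dtype=int)
print(pd.concat([students, dummies], axis=1))
```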
The regression equation would look like: Y = a + bH + bAA + bAn
Problem – 'perfect multicollinearity', i.e., the sum of the three dummies for each case always equals 1:
Student 1 (Afr-Am) ~ 0 + 1 + 0 = 1
Student 2 (Hispanic) ~ 1 + 0 + 0 = 1
Student 3 (Anglo) ~ 0 + 0 + 1 = 1
The resulting regression equation would return a confusing Y-intercept (a): Y = a + bH + bAA + bAn i.e., what is the reference point from which to determine the actual means of the other variables?
What to do – drop one category from the regression, i.e., use only g – 1 dummies. e.g. – Y = a + bH + bAA (bAn dropped for all cases). Reference group – the category/group chosen to be dropped. Choosing the reference group – choose the group that has the most normative support.
By leaving out a group, not all cases sum to '1', and therefore the regression equation predicts the mean Y for the group to which the case/student belongs, in reference to the Y-intercept:
Student 1 (African-Am): Y(AA) = a + bH(0) + bAA(1) = a + bAA
Student 2 (Hispanic): Y(H) = a + bH(1) + bAA(0) = a + bH
Student 3 (Anglo): Y(An) = a + bH(0) + bAA(0) = a
i.e., the mean Y of the reference group 'Anglo' is the intercept 'a'; all other groups are then compared to 'Anglo'
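Here is a small worked sketch with made-up satisfaction scores (chosen so the fitted numbers line up with the worked example on the next slides): with Anglo as the reference group, the fitted intercept is the Anglo mean and each dummy's slope is that group's mean minus the Anglo mean.

```python
import numpy as np

# design matrix columns: constant, Hispanic dummy, African-American dummy
# (Anglo is the reference group: both dummies 0)
X = np.array([
    [1, 0, 1],   # African-American student
    [1, 1, 0],   # Hispanic student
    [1, 0, 0],   # Anglo student
    [1, 0, 1],
    [1, 1, 0],
    [1, 0, 0],
])
y = np.array([3.2, 3.4, 3.0, 3.2, 3.4, 3.0])        # hypothetical satisfaction scores

a, b_h, b_aa = np.linalg.lstsq(X, y, rcond=None)[0]
print(round(a, 1), round(b_h, 1), round(b_aa, 1))    # 3.0 (Anglo mean), 0.4, 0.2
```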
Example – Satisfaction with a course
[Table: satisfaction coding scale]
Assume that a multiple regression of the effect of ethnicity on satisfaction returned a Y-intercept of 3.0 with slopes H = .4 and AA = .2 (An held out as the reference group), i.e., Y = 3.0 + .4H + .2AA
Thus, the predicted satisfaction level for any other Hispanic student would be ~ Y = 3.0 + 0.4(1) + 0.2(0) = 3.4
Likewise, the predicted satisfaction level for any other African-American student would be ~ Y = 3.0 + 0.4(0) + 0.2(1) = 3.2
The predicted satisfaction level for any Anglo student is the intercept 'a' = 3.0
Slopes – indicate the difference between a specific category/group and the reference group, i.e., in Y = 3.0 + 0.4H + 0.2AA the 0.4 slope for Hispanic indicates Hispanic satisfaction is 0.4 more than Anglo. Likewise, African-American satisfaction is 0.2 more than Anglo. Relative to each other, African-American satisfaction is 0.2 less than Hispanic.
Note – this does not predict the satisfaction level (or GPA, etc.) of a unique individual student – only that of a student of a particular ethnic background, because there is no 'degree' of the characteristic 'ethnicity'; i.e., you are either Anglo or you are not.
Example (re Example Data) Satisfaction and Ethnic Group – Anglo as reference group. Regression Model ~ Y = 3.0 – .333AA + .375H. Interpretation – mean Anglo satisfaction level is 3.0, mean Afr-Am level is 2.667, mean Hispanic level is 3.375
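Statistical packages with a formula interface can build the dummies and hold out the reference group for you, much as SPSS handles the dropped category (see the multicollinearity note that follows). A hedged sketch using statsmodels and made-up scores whose group means match the model above (Anglo 3.0, Afr-Am 2.667, Hispanic 3.375):

```python
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "ethnic": ["Anglo", "Anglo", "Anglo",
               "African-American", "African-American", "African-American",
               "Hispanic", "Hispanic", "Hispanic"],
    "satisfaction": [2.9, 3.0, 3.1,          # Anglo mean 3.0
                     2.567, 2.667, 2.767,    # Afr-Am mean 2.667
                     3.275, 3.375, 3.475],   # Hispanic mean 3.375
})

# Treatment coding with Anglo as the reference group; the fitted model should come out
# approximately Y = 3.0 - .333(AA) + .375(H), as on the slide
model = smf.ols('satisfaction ~ C(ethnic, Treatment(reference="Anglo"))', data=df).fit()
print(model.params)
```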
Effect of multicollinearity in SPSS – if SPSS detects perfect multicollinearity within the selected IVs, it drops one IV.
Adding Other Variables
To make a prediction more 'individually unique', add other variables, e.g., gender (nominal), age (ratio), time spent on homework (ratio):
Y = a + [b*H + b*AA] + b*Age + b*Homework
Example – given the above data, the regression prediction model would be: Y(GPA) = a + [b*H + b*AA] + b*Age + b*Homework
Y(GPA) = a + [b*H + b*AA] + b*Age + b*Homework
However – things now change.
Intercept – the new intercept 'a' is no longer the mean score for Anglo; it is now the predicted score for a reference-group (Anglo) student with '0' age and '0' hours of homework.
Slopes – now indicate the difference between an ethnic group and the reference group for individuals who do not differ in 'age' or 'homework'.
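A sketch of the expanded model with invented numbers (ethnic dummies plus Age and Homework; none of these values come from the original study), mainly to show how a full profile is pushed through the fitted equation:

```python
import numpy as np

# design matrix columns: constant, Hispanic, African-American, Age, Homework hours
X = np.array([
    [1, 0, 1, 28, 5],
    [1, 1, 0, 34, 7],
    [1, 0, 0, 40, 6],
    [1, 0, 1, 22, 9],
    [1, 1, 0, 31, 4],
    [1, 0, 0, 45, 8],
], dtype=float)
y = np.array([3.1, 3.4, 3.6, 3.3, 3.0, 3.9])        # hypothetical GPAs

coefs = np.linalg.lstsq(X, y, rcond=None)[0]         # a, bH, bAA, bAge, bHomework

# predicted GPA for an African-American student, age 30, 6 homework hours per week
profile = np.array([1, 0, 1, 30, 6], dtype=float)
print(round(float(profile @ coefs), 2))
```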
Applications – Student Opinion Polls at Strayer University
Research Question – Do the gender, culture, or age of a student have an effect on the student's perception of his/her learning? That is, can we predict a student's perception of his/her learning based on his/her gender, culture, and age? And if so, which variable has the greatest effect?
Applications – Methodology
Collect data from a survey asking students to indicate their satisfaction, their perception of instructor effectiveness, and how they perceived their instructor. The survey must be designed for a regression, i.e., it must have a DV and IVs.
Dependent Variables:
Satisfaction – scale data, 4 through 1: How satisfying was this course? VERY SATISFYING / SATISFYING / NOT SATISFYING / DISAPPOINTING
Instructor Effectiveness – scale data, 4 through 1: How effective do you feel your instructor was? VERY EFFECTIVE / EFFECTIVE / SOMEWHAT EFFECTIVE / NOT EFFECTIVE
Independent Variables – nominal, four descriptors:
Which one of the following describes your instructor's teaching technique?
Which one of the following describes your instructor's involvement with students?
Which one of the following describes your instructor's method of teaching?
Which one of the following best describes your instructor?
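As a hypothetical sketch of the data preparation this implies (column names and descriptor labels below are invented, not the actual Strayer survey items): the DV is the 4-through-1 scale, and each nominal descriptor IV is expanded into 0/1 dummies before running the regression.

```python
import pandas as pd

responses = pd.DataFrame({
    "satisfaction": ["VERY SATISFYING", "SATISFYING", "NOT SATISFYING"],
    "technique":    ["lecture", "discussion", "lecture"],   # invented descriptor labels
})

# map the DV wording onto the 4-1 scale, and dummy-code the nominal descriptor
scale = {"VERY SATISFYING": 4, "SATISFYING": 3, "NOT SATISFYING": 2, "DISAPPOINTING": 1}
data = pd.concat(
    [responses["satisfaction"].map(scale).rename("satisfaction_score"),
     pd.get_dummies(responses["technique"], prefix="technique", dtype=int)],
    axis=1,
)
print(data)   # one numeric DV column plus 0/1 dummy columns, ready for regression
```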
SPSS Regression Output – Correlation of the Instructor Descriptors with Satisfaction (all descriptors included). Interpretation – because all descriptors were included, the correlation (multiple R) is difficult to interpret.