Here, pal! Regress this!

Learn how to use regression analysis to predict graduation time based on various demographic and academic variables. Understand the shortcomings of descriptive statistics and how regression analysis can provide predictive insights. Follow a step-by-step procedure to identify independent variables that affect graduation rates and run linear regression models for accurate predictions.


Presentation Transcript


  1. Here, pal! Regress this! presented by Miles Hamby, PhD Research & Training Consultants MilesFlight.com

  2. Here, pal! Regress this! presented by Miles Hamby, PhD Director of Institutional Research & Assessment Strayer University 202-419-0402 mile.hamby@strayer.edu

  3. Typical – Descriptive Statistics • Frequencies – numbers of things • eg – 70 out of 340 (21%) of female students have graduated over the last 6 years • Mean – measure of central tendency • eg – The average time to complete an academic program for students with 12 hours transfer credit is 36 terms. • Standard Deviation – measure of dispersion • eg – 68% of completing students graduate between 25 and 42 terms
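As a minimal sketch of the three descriptive statistics on this slide, the snippet below uses Python's standard `statistics` module. The 70-of-340 frequency comes from the slide; the list of completion times is hypothetical and only illustrates the calculation.

```python
import statistics

# Frequency example from the slide: 70 of 340 female students graduated.
graduated, enrolled = 70, 340
graduation_pct = 100 * graduated / enrolled

# Hypothetical completion times (in terms) for six graduates; not real data.
terms_to_complete = [30, 33, 36, 36, 39, 42]

mean_terms = statistics.mean(terms_to_complete)   # central tendency
sd_terms = statistics.stdev(terms_to_complete)    # sample standard deviation (dispersion)

print(f"Graduated: {graduation_pct:.0f}%")
print(f"Mean: {mean_terms:.1f} terms, SD: {sd_terms:.1f} terms")
```

These numbers describe the students already observed; none of them says anything about a student who has not yet graduated.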

  4. Shortcoming of Descriptive Statistics They can tell you what it is – but they can’t tell you what it will be They do not predict.

  5. Regression predicts! eg - Can we predict how many female students will graduate and when? Can we predict when a student with no transfer credit will graduate? Can we predict the likelihood of graduation of a student based on gender?

  6. How to Use Regression to Predict Question – What kind of student takes the longest time to graduate? What kind of student never graduates?

  7. Typical way – • Start with specific cohort (eg, Fall 1993) • Select a single group (eg, 1-12 transfer credits) • Count number who graduate each term • Compute percentage ~ 25 graduated ÷ 100 started = 25% Conclusion – For Fall 93 cohort, graduation rate = 25% after 12 terms for those with 1-12 transfer credits

  8. Exiguousness of Typical Method – • DV implied, not specified (and therefore not tested) • Does not measure strength of association (correlation) to graduation time or amount of effect (slope) on graduation time • eg – compare age’s effect to transfer credits’ effect • Graduation Rate does not predict time-in-program or time-to-completion, or even whether or not one will graduate • Must repeat procedure for each time block

  9. Typical Method, e.g. Variable ~ Time to Graduation: Females ~ X̄ = 16 terms, S = 5 terms • 1-12 Xfer Cr ~ X̄ = 13 terms, S = 4 terms • Married ~ X̄ = 18 terms, S = 9 terms. Time to graduation for each variable is not discrete: each figure includes the effects of all other variables

  10. But how about a single, black man with 17 transfer credits? Must repeat procedure for single students, then repeat for black students, then repeat for males, then repeat for 13 – 20 transfer credits, then ‘eyeball’ how they correlate. Is there a way to determine how much the 16-term time for females (previous ex.) would change for a single, black male with 17 transfer credit hours?

  11. There is a way! Regress it! Effects of gender, age, transfer credits, marital status, citizenship, ethnicity, and more, directly on time to complete are measurable and comparable Pick a profile and I’ll tell you how long it will take for that student to graduate!

  12. Procedure – 1. Identify dependent variable (DV) – i.e., the question you are asking – eg, Time to Graduate (Time) 2. Identify independent variables (IV) that possibly affect graduation rates – gender, ethnicity, marital status, age, transfer credits, income 3. Collect data 4. Run linear regression to determine: (a) correlations between Time and IVs (b) significance of difference in means of IVs (c) regression model (y = a + b1X1 + … + bnXn) to predict Time by IVs
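Step 4 of the procedure can be sketched for the simplest case, a single IV, using ordinary least squares in plain Python. The function name `fit_line` and the data are hypothetical; the deck itself runs the multi-variable version in SPSS.

```python
def fit_line(x, y):
    """Ordinary least squares with one IV.

    Returns (a, b, r): intercept, slope, and correlation between x and y.
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    b = sxy / sxx                   # slope: change in y per unit of x
    a = mean_y - b * mean_x         # intercept: predicted y when x = 0
    r = sxy / (sxx * syy) ** 0.5    # strength of association
    return a, b, r


# Hypothetical data: transfer credits (IV) and terms to graduate (DV).
transfer_credits = [0, 6, 12, 18, 24]
terms = [36, 35, 33, 32, 30]

a, b, r = fit_line(transfer_credits, terms)
print(f"Time = {a:.2f} + ({b:.2f}) * transfer credits, r = {r:.3f}")
```

With these made-up numbers the slope is negative, meaning each additional transfer credit shortens the predicted time to graduate.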

  13. Regression can tell you everything! EG – For a single male, age 32, with 18 transfer credits, we can expect a graduation time of 32 terms
     # Terms = a + .4*Marital + .2*Gender + .06*Age - .18*xfer
     # Terms = 33 terms + .4*0 + .2*0 + .06*32 - .18*18
     32 terms ≈ 33 terms + 0 + 0 + 2 - 3
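The arithmetic on this slide can be checked with a short sketch. It uses the coefficients from the slide's general equation (.4, .2, .06, -.18) and the intercept of 33 terms; the dictionary names are illustrative only.

```python
# Intercept and slopes as given in the slide's general equation.
intercept = 33
coef = {"marital": 0.4, "gender": 0.2, "age": 0.06, "xfer": -0.18}

# Profile from the slide: single (0), male (0), age 32, 18 transfer credits.
student = {"marital": 0, "gender": 0, "age": 32, "xfer": 18}

# Prediction = intercept + sum of (slope * value) over all IVs.
predicted = intercept + sum(coef[name] * student[name] for name in coef)

print(f"Predicted time: {predicted:.2f} terms (about {round(predicted)} terms)")
```

The age term contributes about +2 terms and the transfer-credit term about -3 terms, which is how the slide arrives at roughly 32 terms.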

  14. Adding Variables DV ~ Time to Graduation (# terms - ratio) • IV ~ Gender (F or M - nominal) • Ethnic (B, H, W, NA, API, Alien, Unk - nominal) • Alien (Alien or US - nominal) • Marital status (si, ma, di – nominal) • Age (# years - ratio) • Transfer credits (# hours - ratio) • Tutoring done (# sessions – ratio; Y/N – nominal)

  15. Coding Your Variables Scale (ratio) variables (time to completion, age, etc) – use number directly • eg, Age = 32 years, use ’32’ • Time to Comp (terms) = 12 terms, use ’12’

  16. Coding Your Variables Nominal Variables – use ‘dummies’ What are Dummy Variables? Variables used to quantify nominal variables i.e., Nominal (qualitative) variables assigned a quantitative number and treated as a quantitative variable.

  17. Dummy Variables Dichotomous variable – two categories • eg - Male or Female • Married or Single • Has had tutoring or hasn’t • US Citizen or Alien • Graduate student or Undergrad Polychotomous variable – several categories of the variable • eg – Ethnic - African-American, Hispanic, White • Major – Bus, Account, Computers, English, LA • Religion – Christian, Jew, Muslim, Hindu

  18. Dummy Variables • eg, ‘Gender’ • Code Male = 0, Female = 1 (or vice-versa) • 1 = ‘presence of characteristic’ (femaleness) • 0 = ‘absence of characteristic’ ‘Ethnic’ Make B, NA/AN, W, API, H, Unk unique variables Code as 1 = ‘presence of characteristic’ (‘Black’-ness) or 0 = ‘absence of characteristic’

  19. Dummy Variables Alien: 1 = yes, 0 = no Marital: 1 = MA/DI 0 = SI Gender: 1 = F, 0 = M Age: number years Transfer credits: number B: 1 = yes, 0 = no AN: 1 = yes, 0 = no W: 1 = yes, 0 = no API: 1 = yes, 0 = no H: 1 = yes, 0 = no Unk: 1 = yes, 0 = no # Terms = 3 terms + .2*1 + .3*32 + 1.2*10 + .4*3

  20. As Used in the Regression e.g. ~ Black, US Citizen, single, female, married, 32 years old, 10 transfer credits: • # Terms = 32 terms + [.2*1+.2*0+.2*0 +.2*0] (ethnic) • + .5*0 (Alien) + .4*1 (marital) • + .2*1 (gender) • + .06*32 (age) • - 1.7*10 (xfer credits)

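  21. Nominal Variables – Dichotomous - 2 values

     | SEX | GENDR | TUTSES | TUTRD | LEVEL | U/G | MARITL | MARIT | VISA | ALIEN |
     |-----|-------|--------|-------|-------|-----|--------|-------|-------|-------|
     | F   | 1     | 3      | 1     | U     | 1   | SI     | 0     | F-1   | 1     |
     | F   | 1     | 2      | 1     | U     | 1   | SI     | 0     | US    | 0     |
     | M   | 0     | 1      | 1     | U     | 1   | DI     | 1     | US    | 0     |
     | F   | 1     | 0      | 0     | G     | 0   | MA     | 1     | P-R   | 1     |
     | M   | 0     | 1      | 1     | U     | 1   | MA     | 1     | GREEN | 1     |
     | M   | 0     | 0      | 0     | U     | 1   | SI     | 0     | US    | 0     |
     | F   | 1     | 0      | 0     | G     | 0   |        | 9     | F-4   | 1     |

     • Create new column for dummy variable or recode original • 1 = presence of characteristic of interest • 0 = not the characteristic of interest (absence of characteristic)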
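The recoding on this slide can be sketched as follows. Column names and the three sample rows come from the slide's table; the helper `dummy` is hypothetical, and leaving missing values as `None` is an assumption rather than the deck's own convention.

```python
def dummy(value, present):
    """Return 1 if value is one of the 'present' codes, 0 otherwise.

    Missing values (None) are passed through unchanged (an assumption).
    """
    if value is None:
        return None
    return 1 if value in present else 0


# Three rows taken from the slide's table (original columns only).
records = [
    {"SEX": "F", "MARITL": "SI", "VISA": "F-1"},
    {"SEX": "M", "MARITL": "DI", "VISA": "US"},
    {"SEX": "F", "MARITL": "MA", "VISA": "P-R"},
]

for rec in records:
    rec["GENDR"] = dummy(rec["SEX"], {"F"})            # 1 = female
    rec["MARIT"] = dummy(rec["MARITL"], {"MA", "DI"})  # 1 = married/divorced
    rec["ALIEN"] = 0 if rec["VISA"] == "US" else 1     # 1 = any non-US visa status

for rec in records:
    print(rec)
```

Adding new columns instead of overwriting the originals keeps the raw codes available for checking the recode.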

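  22. Nominal Variables – more than 2 values

     | MAJOR | ACC | BUS | CIS | ETHNIC | 1BLACK | 2NATAM | 3WHITE | 4ASIAN | 5HISP | 0UNKN |
     |-------|-----|-----|-----|--------|--------|--------|--------|--------|-------|-------|
     | ACC   | 1   | 0   | 0   | 1      | 1      | 0      | 0      | 0      | 0     | 0     |
     | ACC   | 1   | 0   | 0   | 5      | 0      | 0      | 0      | 0      | 1     | 0     |
     | BUS   | 0   | 1   | 0   | 3      | 0      | 0      | 1      | 0      | 0     | 0     |
     | CIS   | 0   | 0   | 1   | 0      | 0      | 0      | 1      | 0      | 0     | 1     |
     | BUS   | 0   | 1   | 0   | 1      | 1      | 0      | 0      | 0      | 0     | 0     |
     | CIS   | 0   | 0   | 1   | 2      | 0      | 1      | 0      | 0      | 0     | 0     |
     | ACC   | 1   | 0   | 0   | 4      | 0      | 0      | 0      | 1      | 0     | 0     |

     • Create new columns for dummy variables – one for each value • 1 = presence of characteristic (value) • 0 = absence of characteristic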
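A minimal sketch of the "one column per value" coding for variables with more than two categories. The category codes follow the slide's column headers; the helper `one_hot` and the sample row are illustrative.

```python
# Ethnic codes as labelled in the slide's column headers (1BLACK, 2NATAM, ...).
ETHNIC_CODES = {1: "BLACK", 2: "NATAM", 3: "WHITE", 4: "ASIAN", 5: "HISP", 0: "UNKN"}
MAJORS = ["ACC", "BUS", "CIS"]


def one_hot(value, categories):
    """Build one dummy per category: 1 for the matching value, 0 for the rest."""
    return {category: int(category == value) for category in categories}


# Sample row: a Business major with ethnic code 3 (White).
row = {"MAJOR": "BUS", "ETHNIC": 3}

dummies = one_hot(row["MAJOR"], MAJORS)
dummies.update(one_hot(ETHNIC_CODES[row["ETHNIC"]], ETHNIC_CODES.values()))

print(dummies)
```

When the dummies go into the regression, one category per variable is normally left out as the reference, which is why the later slope slides show White and Computers with B = 0.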

  23. Run the Regression SPSS ANALYZE/REGRESSION/LINEAR/DV to Dependent, first model IVs to Independent/NEXT/2nd model IVs to Independent/NEXT or STATISTICS/check Model Fit, Descriptives, R Squared Change/Continue/OK

  24. The Results!

  25. Regression Models

  26. Variable Correlations .005 .338 Note – although all variables show correlation to each other, the correlation (R) may not be significant

  27. The Regression ANOVA Test of significance of the F statistic indicates all three regression models are statistically significant (Sig. < .05), i.e., the variation was not by chance – another set of data would probably show the same results.

  28. The Regression ANOVA F = 893.215 / 38.960 = 22.926 The larger the F (ratio of the mean square of the Regression to the mean square of the Error/Residual), the more robust the regression equation. I.e., a smaller mean square residual indicates smaller error, or departure from the regression line.
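The F ratio on this slide is simply one mean square divided by the other, as this short check shows. The two mean squares are the slide's figures; the variable names are illustrative.

```python
# Mean squares as reported on the slide's ANOVA table.
ms_regression = 893.215   # variation explained by the model
ms_residual = 38.960      # variation left over (error)

f_stat = ms_regression / ms_residual
print(round(f_stat, 3))   # 22.926
```

Holding the regression mean square fixed, a smaller residual mean square gives a larger F.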

  29. Variation about the Regression Line [Graph: regression lines for Model 1 and Model 2, showing error as the distance between y and ŷ; y-axis: QTRS to Completion] Interpretation – Mean Square Error/Residual of Model 1 is > Mean Square Error of Model 2

  30. The Regression Correlation (R) Model 3 returns the highest correlation (R = .392) with 15.4% (R2 = .154) of the variation in Time to Completion (in Qtrs) being explained by the variables Alien, Ethnicity, Marital status, Gender, Age, Tutoring, Transfer credits, U/G status, and Major.
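The link between R and the "15.4% of variation explained" figure is that R² is the square of R, as a quick check confirms (variable names are illustrative):

```python
r = 0.392                      # multiple correlation reported for Model 3
r_squared = r ** 2             # proportion of variation in the DV explained
print(round(r_squared, 3))     # 0.154
print(f"{100 * r_squared:.1f}% of the variation in Time to Completion is explained")
```

The remaining share, roughly 85%, is variation the model's variables do not account for.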

  31. The Slopes Model 3 Interpretation • The older the student, the shorter the time to completion (B = -.117)

  32. Model 3 Slopes Graph – AGE Y Interpretation – Age slope shallow, slight effect on Qtrs to Completion AGE B = - .117 35.577 QTRS to Completion 0 yrs 70 yrs

  33. The Slopes Model 3 Interpretation • The older the student, the shorter the time to completion (B = -.117) • Married/Divorced tends to shorten completion time • (B= -.0405), but is not significant (Sig. = .309, >.05)

  34. Model 3 Slopes Graph – Married/Divorced Y Interpretation – Married/Divorced very shallow, and not significant (Sig. = .309 > .05) Married B = - .0405 35.577 QTRS to Completion 0 (Single) 1 (Married/Divorced)

  35. The Slopes Model 3 Interpretation • The older the student, the shorter the time to completion (B = -.117) • Married/Divorced tends to shorten completion time • (B= -.0405), but is not significant (Sig. = .309, >.05) • Undergraduates tend to take considerably less time to complete than graduates • (B = -3.259)

  36. Model 3 Slopes Graph – Undergraduate vs Graduate Y Interpretation – Undergraduates steep, tend to shorten Qtrs to Completion considerably over Graduates 35.577 Under B = - 3.259 QTRS to Completion 0 (Graduate) 1 (Undergraduate)

  37. The Slopes Model 3 Interpretation • The older the student, the shorter the time to completion (B = -.117) • Married/Divorced tends to shorten completion time • (B= -.0405), but is not significant (Sig. = .309, >.05) • Undergraduates tend to take considerably less time to complete than graduates • (B = -3.259) • Tutoring shortens time very slightly (B = -.0471), but is not significant (Sig. =.571)

  38. Model 3 Slopes Graph – Tutoring Y Interpretation – Tutoring very shallow, shortens Qtrs to Completion very slightly, but not significant (Sig. = .571 > .05) QTRS to Completion 35.577 Tutored B = - .000000471 0 (No Tutoring) 1 (Tutored)

  39. The Slopes Model 3 Interpretation • Xfer lengthens time very slightly (B = .04285); GPA shortens time but is not significant (Sig. > .05)

  40. Model 3 Slopes Graph – GPA & Transfer Credits Y Interpretation – Xfer & GPA very shallow, but GPA not significant (Sig. > .05) GPA B = - .277 35.577 QTRS to Completion Xfer B = .04285 Xfer axis: 0 to 150; GPA axis: 0 to 4.00

  41. The Slopes Model 3 Interpretation • Xfer lengthens slightly; GPA shortens, but not significant • Female tends to shorten time (B = -.110) over Male

  42. Model 3 Slopes Graph - Gender Y Interpretation – Female Qtrs to Completion tend to be predictably shorter than Male Qtrs 35.577 QTRS to Completion Gender B = - .110 1 (Female) 0 (Male) X

  43. The Slopes Model 3 Interpretation • Xfer lengthens slightly; GPA shortens, but not significant • Female tends to shorten time (B = -.110) over Male • Black, Nat Am & Unkn take longer than Whites (+ B) (NA not significant); Hisp & Asians tend to take shorter than Whites (-B)

  44. Model 3 Slopes Graph - Ethnicity Native Am B = .719 Black B = .439 Y Unknown B = .531 White B = 0 QTRS to Completion Asian B = -.553 Hispanic B = - .830 X Interpretation – Black, Native American & Unknown tend to take longer than Whites (+ B); Hispanic & Asian tend to take shorter than Whites (-B)

  45. The Slopes Model 3 Interpretation • Xfer lengthens slightly; GPA shortens, but not significant • Female tends to shorten time (B = -.110) over Male • Black, Nat Am & Unkn take longer than Whites (+ B); Hisp & Asians tend to take shorter than Whites (-B) • Alien tends to take less time than US citizen (B = -.618)

  46. Model 3 Slopes Graph - Alien Y Interpretation – Alien tends to take less time than US citizen (B = -.618) QTRS to Completion Alien B = - .618 1 (Alien) 0 (US) X

  47. The Slopes Model 3 Interpretation • Xfer lengthens slightly; GPA shortens, but not significant • Female tends to shorten time (B = -.110) over Male • Black, Nat Am & Unkn take longer than Whites (+ B); Hisp & Asians tend to take shorter than Whites (-B) • Alien tends to take less time than US citizens (B = -.618) • Acc & Bus considerable effect (B= 2.638, 2.651); pos. relative to CIS slope ‘0’

  48. Model 3 Slopes Graph - Major Business B = 2.651 Y Accounting B = 2.638 QTRS to Completion Computers B = 0 X Interpretation – Accounting & Business steepest slopes (2.638, 2.651); positive relative to CIS slope ‘0’

  49. Coffee-break!

  50. The Equation – MODEL 3

     | IV          | B (Slope) |
     |-------------|-----------|
     | (Constant)  | 35.577    |
     | Age         | -.117     |
     | Gender      | -.110     |
     | Married     | -4.05E-02 |
     | Black       | .439      |
     | Native Am   | .719      |
     | Asian       | -.553     |
     | Hispanic    | -.830     |
     | Unknown     | .531      |
     | Alien       | -.618     |
     | GPA         | -.277     |
     | Transfer Cr | 4.285E-02 |
     | Undergrad   | -3.259    |
     | Tutoring    | -4.71E-07 |
     | Accounting  | 2.638     |
     | Business    | 2.651     |

     Y = a + bAge + bGen + bMar + bBlk + bNA + bAsn + bHis + bUnk + bAln + bGPA + bXfer + bUndergrad + bTutor + bAcc + bBus
     Y = 35.57 + (-.11)Age + (-.11)Gen + (-.04)Mar + (.43)Black + (.71)NatAm + (-.55)Asian + (-.83)Hisp + (.53)Unk + (-.61)Alien + (-.27)GPA + (.04)Xfer + (-3.25)Under + (-.04)Tutor + (2.63)Acc + (2.65)Bus
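As a rough sketch of how the Model 3 equation could be applied to a student profile, the snippet below stores the B values exactly as listed in the slide's table and sums slope times value. The function `predict_quarters`, the key names, and the example profile are hypothetical; variables left out of a profile are treated as 0, which for dummies means the reference category.

```python
# B values copied from the slide's Model 3 table.
MODEL_3 = {
    "constant": 35.577, "age": -0.117, "gender": -0.110, "married": -4.05e-02,
    "black": 0.439, "native_am": 0.719, "asian": -0.553, "hispanic": -0.830,
    "unknown": 0.531, "alien": -0.618, "gpa": -0.277, "transfer_cr": 4.285e-02,
    "undergrad": -3.259, "tutoring": -4.71e-07, "accounting": 2.638, "business": 2.651,
}


def predict_quarters(profile):
    """Predicted quarters to completion: constant + sum of (B * value)."""
    return MODEL_3["constant"] + sum(
        MODEL_3[name] * value for name, value in profile.items()
    )


# Hypothetical profile: 30-year-old female graduate student in Business,
# GPA 3.0, 12 transfer credits; every other dummy is 0.
grad = {"age": 30, "gender": 1, "gpa": 3.0, "transfer_cr": 12, "business": 1}
undergrad = dict(grad, undergrad=1)   # same student, coded as undergraduate

print(f"Graduate:      {predict_quarters(grad):.2f} quarters")
print(f"Undergraduate: {predict_quarters(undergrad):.2f} quarters")
```

Switching a single dummy from 0 to 1 shifts the prediction by exactly that variable's B, which is what the slope graphs on the preceding slides depict.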
