Staying and Succeeding: Using Institutional Data to Predict Undergraduate Retention and Persistence to Graduation
Thomas Geaghan, Office of Institutional Research, Case Western Reserve University
Study Question 3: Predictive modeling: Which variables are most related to persistence to graduation?
Study Question 2: How do those who graduate in six years differ from those who do not?

ABSTRACT
The purpose of this study is to examine predictors of undergraduate retention and persistence to graduation among students at Case Western Reserve University. It is our hope that this research will provide the campus community with a better understanding of the differing needs and challenges of our students at various points throughout their college careers. Using variables from institutional databases, a series of t-tests was conducted to identify possible predictors of four outcomes: retention to the second year, retention to the third year, retention to the fourth year, and persistence to graduation. Next, we split the sample into two groups and entered these predictors into a logistic regression analysis to determine which, while simultaneously considering the effects of all other variables, were most strongly related to six-year graduation rates. Results indicate that pre-college characteristics (SAT scores, AP tests taken), first semester GPA, engagement (Greek Life participation, varsity athletics), and Ohio residency are predictive of the six-year graduation rate. The equation derived from the predictive model we built on part of our sample was then applied to the rest of our sample to develop a risk profile and assess its accuracy.

INTRODUCTION
Background
• Past work has established a number of factors related to student retention and persistence to graduation.
For example, high school GPA and SAT scores have been linked to retention to the sophomore year (Anonymous, 1997), and persistence to graduation has been shown to be associated with high school GPA, first semester college GPA, ethnicity, and distance from home (Murtaugh, Burns & Schuster, 1999). Still other studies have linked attrition to parent education (Eaton & Bean, 1995) and gender (Goenner & Snaith, 2004).
• Importantly, more recent research has used hierarchical linear modeling (HLM) to show that predictors of retention often differ by institution. For example, a factor such as high school rank may predict student success at Case Western Reserve University while being wholly unrelated to success at a peer institution (Mayhew, Vanderlinden, & Kim, 2008).
• Though complex analyses such as hierarchical linear modeling, survival analysis, and structural equation modeling can help us understand the factors that contribute to retention and persistence to graduation, they are not necessary. In fact, researchers have long relied on logistic and linear regression modeling to help them build risk profiles. Based on its intuitive nature, widespread use, and validation in the literature (Dey & Astin, 1993), we have chosen to use logistic regression.

The Current Study
• The purpose of this poster is threefold. Our first study question examines the differences between students who are retained year-to-year and those who are not, and feeds that information back to the campus community. Our second study question focuses on differences between those who persist to graduation within six years and those who do not. For our final study question, we build and test a statistical model predicting persistence to graduation, in hopes of helping administrators identify students who may be “at risk” for leaving the institution.
METHOD
Sample
• In order to focus on the industry standard of the six-year graduation rate, we limited our analysis to the population of students who matriculated as first-time first-year students at Case Western Reserve University from the 2000-2001 academic year through the 2003-2004 academic year. Our original sample consisted of 3289 students, though seven students were removed from the analysis due to death, leaving us with a final sample of 3282. Of these 3282 students, 61.3% were male and 38.7% were female; 90.7% were majority (Caucasian, Asian) and 9.3% minority; 3.5% were international and 96.5% native.

Outcome Variables
• Retention – Retention to the second, third, and fourth year was calculated by recoding students who were registered for courses in their third, fifth, and seventh semesters, respectively, into dummy variables (i.e., registered = 1, not registered = 0). Information was downloaded from the student information system.
• Persistence to Graduation – Students who graduated within six years of matriculation were coded as graduated (1), and those who had left Case or had not graduated within six years were coded as not graduated (0).

Predictor Variables
• Demographic Variables – We gathered demographic information from various database systems (ISIS, SIS, CAMS). Variables included gender, ethnicity, international/native status, and Ohio residency.
• Pre-College Performance – We collected information from students’ high school transcripts on test performance (SAT, ACT, AP tests) and class rank. We also calculated an SAT Converted variable by using a conversion table from the College Board to convert ACT scores to comparable SAT scores. Finally, students who took a math aptitude test (AP calculus, SAT II calculus, etc.) were coded as 1 and those who did not were coded as 0.
• Transcript Data – First semester grade point average: This variable was downloaded from our student information system.
The variable was not examined in our descriptive analysis but was included in the logistic regression model.
• Engagement – We measured engagement by identifying students who pledged a fraternity or sorority in their first year and those who were members of a varsity athletic team at some point in their college career.

Study Question 1: How do those retained to the second, third, and fourth year differ from those who are not retained?

Analysis plan (Study Question 3)
We first split our sample into two separate databases. The first database (modeling data) consisted of students who matriculated between 2000 and 2003. The second database (testing data) consisted of students who matriculated in 2004. We then conducted a stepwise logistic regression analysis on the first set of students and applied that model to our second set of students to test the model’s accuracy.

QUESTION 3: RESULTS
Logistic Regression Model
In order to include as many students as possible, we chose to include in our model only those variables for which we had complete data for the majority of students. Specifically, we entered gender, ethnicity, Ohio residency, international student status, math aptitude test status, number of AP tests taken, athlete status, Greek pledge status, SAT 1600 converted score, and first semester GPA as predictors of whether a student graduated in six or fewer years. Variables were entered using a forward conditional method. Of the ten variables considered, six were predictive of persistence to graduation: Ohio residency, athlete status, Greek pledge status, SAT 1600 converted score, number of AP tests taken, and first semester GPA.

Model Efficacy
After using our modeling data to build our retention model, we applied the regression equation to the testing data and calculated a probability for each test student. In our model, the probability we calculate is the probability that a student will persist to graduation.
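Applying a logistic regression equation to a student record can be sketched as follows. This is a minimal illustration, not the fitted model: the poster does not report its coefficients, so the intercept, the coefficient values, the variable names, and the example student are all hypothetical.

```python
import math


def graduation_probability(intercept, coefficients, student):
    """Return the logistic-model probability that a student graduates."""
    # Linear predictor: intercept plus the weighted sum of predictor values.
    logit = intercept + sum(
        weight * student[name] for name, weight in coefficients.items()
    )
    # The logistic function maps the linear predictor onto the 0-1 range.
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical coefficients for the six retained predictors (illustration only).
INTERCEPT = -4.0
COEFFICIENTS = {
    "ohio_resident": 0.3,
    "athlete": 0.5,
    "greek_pledge": 0.4,
    "sat_1600_converted": 0.002,
    "ap_tests_taken": 0.1,
    "first_semester_gpa": 0.8,
}

# Hypothetical student record from the testing data.
example_student = {
    "ohio_resident": 1,
    "athlete": 0,
    "greek_pledge": 1,
    "sat_1600_converted": 1300,
    "ap_tests_taken": 3,
    "first_semester_gpa": 3.2,
}

probability = graduation_probability(INTERCEPT, COEFFICIENTS, example_student)
print(f"Predicted probability of graduating: {probability:.3f}")
```

With real coefficients from the modeling data, the same function would be called once per student in the testing data.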
Our next step was to determine a cutoff point at which we felt a student should be considered “at risk” for leaving the institution before graduation. Using the modeling data, we determined that non-completers had, on average, about a 75% probability of persisting to graduation. Therefore, we set our cutoff point at a probability level of 75%. Any student who had a lower than 75% chance of graduating was considered “at risk.” Our model, then, identified 191 students in our testing data who should be considered “at risk.” The chart compares the graduation rate of at-risk students with that of their low-risk peers:
[Chart: graduation rate for at-risk students versus low-risk students]
Results of this analysis suggest that the model and risk cutoff, while not perfect, do reasonably well at identifying students who may be appropriate targets for some sort of intervention. Indeed, if an intervention were effective in helping half of the at-risk non-graduates persist to graduation, the overall graduation rate of the university would rise by approximately 5%. Furthermore, our model predicts with this level of success despite the fact that our six predictor variables account for only 14.5% of the variance in our outcome measure. By improving our model, perhaps by adding financial aid or additional admissions data, we feel that we could identify students in need of aid even better.

SUMMARY AND CONCLUSION
We hope that our results help to inform the campus community about factors related to student retention and persistence to graduation. Specifically, we have shown that gender, ethnicity, pre-college performance, and participation in the campus community are all related to persistence at Case Western Reserve University. We were also able to demonstrate how a logistic regression model can be used, with some degree of effectiveness, to identify students at risk for leaving the institution. We hope that, with a slightly improved model, we might be able to identify even better those students who may be in need of a retention intervention.
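The 75% cutoff rule and the graduation-rate arithmetic from the Model Efficacy section can be sketched as below. This is a hedged illustration: the function names are invented for this example, and the probabilities and student counts are hypothetical, since the poster does not report the size of the testing data or the number of at-risk non-graduates.

```python
def flag_at_risk(probabilities, cutoff=0.75):
    """Mark each student whose predicted probability is below the cutoff."""
    # Strictly "lower than" the cutoff, matching the rule stated in the text.
    return [p < cutoff for p in probabilities]


def graduation_rate_gain(n_students, n_at_risk_nongraduates, share_helped=0.5):
    """Percentage-point gain in the overall graduation rate if the given
    share of at-risk non-graduates were helped to graduate."""
    return 100.0 * share_helped * n_at_risk_nongraduates / n_students


# Hypothetical predicted probabilities for five students.
predicted = [0.92, 0.75, 0.61, 0.88, 0.40]
flags = flag_at_risk(predicted)
print("At-risk flags:", flags)

# Hypothetical counts: 1000 students, 100 of them at-risk non-graduates.
gain = graduation_rate_gain(n_students=1000, n_at_risk_nongraduates=100)
print(f"Gain in graduation rate: {gain:.1f} percentage points")
```

Keeping the cutoff as a parameter makes it easy to see how the number of flagged students changes under a stricter or looser threshold.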
REFERENCES
Anonymous (1997). Freshman-to-sophomore persistence rates, 1983-1997. Postsecondary Education OPPORTUNITY, 60, 1-7.
Dey, E.L., & Astin, A.W. (1993). Statistical alternatives for studying college student retention: A comparative analysis of logit, probit, and linear regression. Research in Higher Education, 34, 569-581.
Eaton, S.B., & Bean, J.P. (1995). An approach/avoidance behavioral model of college student attrition. Research in Higher Education, 36, 617-645.
Goenner, C.F., & Snaith, S.M. (2004). Accounting for model uncertainty in the prediction of university graduation rates. Research in Higher Education, 45, 25-41.
Mayhew, M.J., Vanderlinden, K., & Kim, E.K. (2009). Research in Higher Education, Published Online.
Murtaugh, P.A., Burns, L.D., & Schuster, J. (1999). Predicting the retention of university students. Research in Higher Education, 40, 355-371.

Analysis plan (Study Question 1)
For continuous predictor variables, t-tests were conducted to determine differences between retained and non-retained students. For categorical predictor variables, chi-square analyses were conducted to determine differences between retained and non-retained students. For each column below, those who were retained are compared to all non-retained students, regardless of which semester the non-retained student left the university. In our sample, 91.3% of students were retained to the second year, 84.4% were retained to the third year, and 80.5% of students were retained to the fourth year.

QUESTION 1: RESULTS
Table 1: Percentage of students retained by demographic characteristics (categorical variables)

Analysis plan (Study Question 2)
As with question 1, for continuous predictor variables, t-tests were conducted to determine differences between graduates and non-graduates. For categorical predictor variables, chi-square analyses were conducted to determine differences between graduates and non-graduates. In our sample, 79.5% of students who matriculated at the university graduated within six years.
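The two kinds of comparison in these analysis plans can be sketched with the standard library alone. The poster does not say which t-test variant or chi-square options were used, so this sketch assumes Welch's t statistic and a Pearson chi-square test without continuity correction on a 2x2 table. The function names, counts, and scores are hypothetical.

```python
import math
from statistics import mean, variance


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test for the table [[a, b], [c, d]].

    Returns (statistic, p_value) using 1 degree of freedom and no
    continuity correction.
    """
    n = a + b + c + d
    statistic = n * (a * d - b * c) ** 2 / (
        (a + b) * (c + d) * (a + c) * (b + d)
    )
    # With 1 degree of freedom the upper tail equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


def welch_t(sample_1, sample_2):
    """Welch's t statistic for the difference between two sample means."""
    standard_error = math.sqrt(
        variance(sample_1) / len(sample_1) + variance(sample_2) / len(sample_2)
    )
    return (mean(sample_1) - mean(sample_2)) / standard_error


# Hypothetical counts: rows are Ohio / non-Ohio, columns graduated / did not.
statistic, p_value = chi_square_2x2(20, 10, 10, 20)
print(f"chi-square = {statistic:.3f}, p = {p_value:.4f}")

# Hypothetical SAT Math scores for graduates and non-graduates.
graduates = [700, 680, 720, 650, 690]
non_graduates = [640, 660, 610, 670, 630]
print(f"Welch t = {welch_t(graduates, non_graduates):.3f}")
```

A p-value for the t statistic needs the t distribution, which the standard library does not provide, so only the statistic is computed here.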
QUESTION 2: RESULTS
Table 3: Percentage of students who graduate in 6 years by demographic characteristics (categorical variables)
Table 2: Pre-College performance by retention status (continuous variables)
Table 4: Pre-College performance by graduation status (continuous variables)

The categorical variables related to second year retention were Ohio residency, math aptitude test, and the two measures of engagement. Specifically, those from Ohio, those who took math aptitude tests, athletes, and those who pledged a fraternity or sorority in their first semester were retained at a significantly higher rate than were their peers. Nearly all of the variables we examined were related to third and fourth year retention, the exception being whether a student was involved in the Greek system in their first year.

All but one of the continuous variables we examined were related to second year retention. That one unrelated variable, SAT Verbal score, was related to third and fourth year retention. In fact, all of the continuous variables we examined were related to third and fourth year retention. Those retained had significantly higher ACT Composite scores, SAT Math scores, SAT Verbal scores, and SAT 1600 Converted scores. Those retained also had higher class ranks and had taken more AP tests during high school.

As with retention, nearly all variables examined were associated with graduation status. Specifically, women, majority students, Ohio residents, non-international students, those who took a math aptitude test, and student athletes were significantly more likely to graduate in six years than were their peers. Pledging a fraternity or sorority was not related to the six-year graduation rate.

Similarly, compared to non-graduates, those who graduated in six years had higher ACT Composite, SAT Math, SAT Verbal, and SAT 1600 Converted scores. Those who graduated also had higher class ranks and had taken more AP tests during high school.