Analysing the Retention in University Park by Logistic Regression
Benaglia, T.A., Hummel, R.M., Pietras, J., Altman, N.
Department of Statistics – Penn State University
Acknowledgment:
• This project was developed in the Stat 511 course – Fall 2004.
• Thanks to the College of Science for providing the data.

Objective:
For students who enroll as freshmen in the College of Science, what factors most significantly influence whether they stay at University Park or transfer to another Penn State campus? The dependent variable is categorical: each student's choice of campus is coded as (0) staying at University Park or (1) leaving University Park. The initial potential predictors are Ethnicity, High School GPA, SAT Math Score, SAT Verbal Score, FTCAP Math 140, FTCAP Math 110, FTCAP Math 40, and FTCAP BMath (FTCAP scores come from University-wide freshmen testing).

Data Description:
The data were compiled by the College of Science Dean's Office from all incoming freshmen enrolling in the College of Science during the Fall 2003 semester. Students with missing information were excluded from the study. The results apply to incoming freshmen who first report a major in the College of Science and subsequently transfer their major to another Penn State college. The variables were defined as follows:
• CAMPUS is recorded as (0) if the student chooses to stay at University Park (UP) or as (1) if the student chooses to transfer to another Penn State campus.
• Ethnicity is categorized as:
  2 - African/Black American
  3 - Asian American
  4 - Latino American
  5 - White American
  The categories Native American (1) and Other (6) were excluded: there were too few Native American students for an appropriate analysis, and the category Other would have no meaning when reporting results by ethnicity.
• High School GPA (HSGPA) is the GPA reported by the incoming student's graduating high school, measured on a 4.0 scale.
• SAT Math Score (SATMATH) is the SAT Math Score reported by ETS to the University, measured on an 800-point scale.
• SAT Verbal Score (SATVERB) is the SAT Verbal Score reported by ETS to the University, measured on an 800-point scale.
• FTCAP Math 140 (FTCAPMATH140) is the freshmen testing score for Math 140.
• FTCAP Math 110 (FTCAPMATH110) is the freshmen testing score for Math 110.
• FTCAP Math 40 (FTCAPMATH40) is the freshmen testing score for Math 40.
• FTCAP BMath (FTCAPBMATH) is the freshmen testing score for B Math.

Data Analysis

Methods.
Since the dependent variable (CAMPUS) is categorical (dichotomous), logistic regression is appropriate: it models the probability that a student will stay at UP as a function of the candidate explanatory variables. In line with the research question, the first step was to apply variable selection methods to determine which predictors were significant in explaining whether students who transfer their majors from the College of Science stay at the UP campus.

Results.
After backward stepwise selection, the final model includes the explanatory variables AFRICAN, ASIAN, HISPANIC, and FTCAPMATH110.

Figure 3: Estimated Probabilities versus FTCAPMATH110 with interaction
In Figure 3, notice that as FTCAPMATH110 scores increase:
• The probability for African American students to stay at UP rises quickly and then levels off near 1.
• The probability for Asian students to stay at UP decreases.
• The probability for Hispanic students to stay at UP decreases.
• The probability for Caucasian students to stay at UP increases fairly steadily. This probability does not change with the interaction except through the intercept.
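As a notational sketch of the model behind curves like those in Figure 3, the logistic regression of CAMPUS on the ethnicity indicators, FTCAPMATH110, and their interactions can be written as below. The symbols \pi, \eta, and \beta_0, ..., \beta_7 are notation assumed here for illustration; the poster does not report the fitted coefficients.

```latex
\begin{align*}
\pi &= P(\text{student stays at UP}) = \frac{1}{1 + e^{-\eta}}, \\
\eta &= \beta_0 + \beta_1\,\text{AFRICAN} + \beta_2\,\text{ASIAN} + \beta_3\,\text{HISPANIC}
        + \beta_4\,\text{FTCAPMATH110} \\
     &\quad + \beta_5\,(\text{AFRICAN}\times\text{FTCAPMATH110})
        + \beta_6\,(\text{ASIAN}\times\text{FTCAPMATH110})
        + \beta_7\,(\text{HISPANIC}\times\text{FTCAPMATH110}).
\end{align*}
```

With Caucasian students as the reference group (all three indicators equal to 0), the slope in FTCAPMATH110 on the logit scale is \beta_4 for Caucasian students and \beta_4 + \beta_5, \beta_4 + \beta_6, and \beta_4 + \beta_7 for African American, Asian, and Hispanic students respectively. This is what allows the fitted curves to rise for some groups and fall for others.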
The unusual pattern in Figure 3 and the significance of only the ASIAN*FTCAPMATH110 interaction suggest that the relationship may differ between Asian and non-Asian students. Re-categorizing the observations as Asian or non-Asian (rather than African American, Hispanic, Asian, and Caucasian) and fitting a logistic regression on this new variable yields the plot in Figure 4. (Only FTCAPMATH110 is used because it was previously found to be the only significant non-ethnicity predictor.)

Figure 1: Estimated Probabilities by African, Asian and Hispanic
In the boxplots of Figure 1, note that African American students have a higher probability, on average, of staying at UP than non-African American students. Asian students, on the other hand, have a lower probability, on average, of staying at UP than non-Asian students. Hispanic students, like African American students, have a higher probability, on average, of staying at UP than non-Hispanic students.

Figure 4: Estimated Probabilities versus FTCAPMATH110 by Asian
In Figure 4, the relationship between FTCAPMATH110 score and the probability of staying at University Park differs significantly between Asian and non-Asian students. For Asian students, as FTCAPMATH110 scores increase, the probability of staying at UP decreases; for non-Asian students, as FTCAPMATH110 scores increase, the probability of staying at UP increases.

Conclusions:
The only significant predictors of whether a student will stay at UP are the student's FTCAPMATH110 score and whether the student is Asian or non-Asian. If the student is Asian, then as FTCAPMATH110 scores increase, the student's probability of staying at UP decreases dramatically. If the student is non-Asian, then as FTCAPMATH110 scores increase, the student's probability of staying at UP increases almost as dramatically. The probability of an Asian student staying at UP given a very poor score (0) on the FTCAPMATH110 is nearly 1.
This probability sinks to about .64 as the FTCAPMATH110 score rises to 26 (a perfect score). For a non-Asian student, a low FTCAPMATH110 score (0) yields a probability of approximately .66 that the student will stay at UP. This increases to about .91 for a perfect score. The two groups have the same predicted probability at an FTCAPMATH110 score of approximately 18.5.

Figure 2: Estimated Probabilities versus FTCAPMATH110 by Ethnicity
Figure 2 shows a very distinct separation, by ethnicity, of the probability trends for whether a student stays at UP based on their FTCAPMATH110 score. The effect of one variable may depend on the value of another; such relationships are described using interaction terms. For example, Asian students with high FTCAPMATH110 scores may be under significant pressure to stay at the UP campus. In this model, however, only FTCAPMATH110 is significant, so the next step is to fit the regression of CAMPUS on AFRICAN, ASIAN, HISPANIC, FTCAPMATH110, and the interaction terms AFRICAN*FTCAPMATH110, ASIAN*FTCAPMATH110, and HISPANIC*FTCAPMATH110. The only interaction term that is significant is ASIAN*FTCAPMATH110. (The interaction terms with AFRICAN and HISPANIC are not significant.)
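The endpoint probabilities quoted in the Conclusions can be used to check the stated crossing point. The following is a minimal sketch, not the authors' fitted model: it assumes each group's curve is a straight line on the logit scale in FTCAPMATH110, pins that line to the two quoted endpoints (scores 0 and 26), and reads "nearly 1" as 0.99 or 0.995, which is an assumption. All function names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit(x):
    """Inverse of logit: map a log-odds value back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def line_through(p_at_0, p_at_max, max_score=26):
    """Return (intercept, slope) on the logit scale for a curve that passes
    through p_at_0 at score 0 and p_at_max at score max_score."""
    intercept = logit(p_at_0)
    slope = (logit(p_at_max) - intercept) / max_score
    return intercept, slope


def crossing(a, b):
    """Score at which two logit-scale lines (intercept, slope) intersect."""
    return (a[0] - b[0]) / (b[1] - a[1])


# Non-Asian endpoints as quoted in the Conclusions: .66 at 0, .91 at 26.
NON_ASIAN = line_through(0.66, 0.91)


def crossing_for(asian_p_at_0):
    """Crossing score when the Asian curve starts at asian_p_at_0 (an ASSUMED
    numeric reading of "nearly 1") and ends at the quoted .64 at score 26."""
    asian = line_through(asian_p_at_0, 0.64)
    return crossing(asian, NON_ASIAN)


x_star = crossing_for(0.99)
p_star = inv_logit(NON_ASIAN[0] + NON_ASIAN[1] * x_star)

if __name__ == "__main__":
    for p0 in (0.99, 0.995):
        print(f"assumed P(stay | score 0, Asian) = {p0}: "
              f"curves cross near score {crossing_for(p0):.1f}")
```

Under these assumptions the two curves cross at a score of roughly 18 to 19, which is in line with the value of approximately 18.5 reported above. The exact figure depends on how "nearly 1" is interpreted, so this is a plausibility check rather than a reproduction of the fit.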