A study on predicting the percent increase in student enrollment in real time, using historical applicant data and a logistic regression model. It covers the data analysis, programming, and variable selection behind the predictions, presents a case study that predicts FTIC enrollment at weekly intervals, and identifies the historical relationships in enrollment trends that make the forecast possible.
Predicting Real-Time Percent Enrollment Increase • Mark Hamner, Texas Woman’s University, Department of Mathematics and Computer Science • Preet Ahluwalia, Credit Risk Analyst, AmeriCredit
Texas Woman’s University: Denton . Dallas . Houston • Year 2005 Facts • Total Enrollment – 11,344: Undergrad – 6,266; Graduate (Masters) – 4,369; Doctoral – 709 • Campus Enrollment: Denton – 9,157; Dallas – 921; Houston – 1,266 • Female – 10,368; Male – 976 • 59 academic programs (19 doctoral)
Outline • Problem Definition: Predicting Student Enrollment at Time ‘t’ Using Historical Data • Enrollment Process for Newly Enrolled Students • The Predictive Problem • Logistic Prediction Model (a. Data Issues and Programming Solutions) • Quadratic Prediction Model (a. Exploratory Analysis to Identify Patterns) • Combine for Overall Prediction: Results
Enrollment • Enrollment predictions can be broken into two fundamental pieces: Newly Enrolled Students and Re-Enrolling/Continuing Students • The focus of this paper is the prediction of Newly Enrolled students.
New Students: Enrollment Process • [Diagram: All Prospective Students → Applicants (FTIC, Transfer, Graduate, Others) → Admitted to TWU → New 12th Day Enrolled]
Enrollment Prediction at Time ‘t’ • [Timeline: Begin Prediction → Time t → Fall 12th Day] • Let Time = t denote the prediction date • For applicants before t, we will have data • For applicants after time t (denoted by t’), we will not have data • Total Enrollment = Enroll_t + Enroll_t’
Weekly Partition of Prediction Interval • The prediction interval will be broken up into weekly intervals • The diagram illustrates prediction at Week = 5 • [Timeline: Week 0 → Week 5 → Week 17] • At Week = 5 we have 35 more days of applicant data than at Week = 0 • Total Enroll = Enroll_t + Enroll_t’
Enroll_t • $P_t = \{1, 2, \dots, N_t\}$: finite set of applicants at week = t • For each $k \in P_t$, enrollment is a dichotomous response variable $y_k$ • $y_k = 1$ (student enrolled), $y_k = 0$ (student did not enroll) • Enrollment of all applicants at week = t: $\text{Enroll}_t = \sum_{k \in P_t} y_k$
Model Dichotomous Variable • For each $y_k$, $k \in P_t$, let $\theta_k$ represent the probability that $y_k = 1$ • There exists applicant information for each individual: $x_k = (x_{1k}, x_{2k}, \dots, x_{pk}) = (\text{Distance}_k, \text{SAT}_k, \dots, \text{Major\_Ratio}_k)$ • Use Logistic Regression to model $\theta_k$
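Logistic Regression Model • The probability of student k enrolling is $\theta_k = \dfrac{e^{L_k}}{1 + e^{L_k}}$ • $L_k = \beta_0 + \beta_1 \text{Distance}_k + \beta_2 \text{SAT}_k + \dots + \beta_p \text{Major\_Ratio}_k$, where Distance, SAT, …, Major_Ratio are the predictor variables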
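The mapping from the linear predictor $L_k$ to an enrollment probability can be sketched in a few lines of Python. This is only an illustration of the standard logistic form: the function name `enroll_probability`, the coefficient values, and the applicant record are all made up and are not taken from the study.

```python
import math

def enroll_probability(beta0, betas, x):
    """Return theta_k = 1 / (1 + exp(-L_k)) for one applicant."""
    # Linear predictor L_k = beta0 + sum_j beta_j * x_jk
    linear = beta0 + sum(b * v for b, v in zip(betas, x))
    return 1.0 / (1.0 + math.exp(-linear))

# Hypothetical coefficients for (Distance, SAT, Major_Ratio); not from the study.
beta0 = -1.0
betas = [-0.01, 0.001, 2.0]

# Made-up applicant record: 50 miles away, SAT 1000, Major_Ratio 0.25.
applicant = [50.0, 1000.0, 0.25]

theta = enroll_probability(beta0, betas, applicant)
print(f"Estimated enrollment probability: {theta:.3f}")
```

With these made-up numbers the linear predictor is zero, so the probability sits at the midpoint of the logistic curve; a larger Distance would push it below one half.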
Predict Enroll_t • Let Y be the random vector of responses: $Y = (y_1, y_2, \dots, y_{N_t})'$ • Thus, $\text{Enroll}_t = \mathbf{1}'Y$ • Note: $\mathbf{1}$ is an $N_t \times 1$ vector of ones • Estimated Enroll_t is …
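One natural way to estimate Enroll_t, assuming the estimate is the expected value of the sum of the responses, is to add up the fitted probabilities of all applicants on file at week t. The sketch below shows that calculation; the fitted probabilities are invented, and the slide's own estimator is not shown in the source, so treat this as an assumption.

```python
def expected_enrollment(fitted_probs):
    """Sum the fitted probabilities, i.e. the product of a vector of ones with theta-hat."""
    return sum(fitted_probs)

# Hypothetical fitted probabilities for five applicants on file at week t.
theta_hat = [0.9, 0.6, 0.25, 0.5, 0.75]

enroll_t_hat = expected_enrollment(theta_hat)
print(f"Estimated Enroll_t: {enroll_t_hat:.2f} of {len(theta_hat)} applicants")
```

Summing probabilities rather than counting applicants above a cutoff keeps the estimate unbiased when the fitted probabilities are well calibrated.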
Logistic Model • [Diagram labels: Year Prior, Applicant Data, Current Year, Prediction] • Predictor variables: Distance, DOB, Major_Ratio, SAT_M, SAT_V, Gender, Personal, etc. • What variables will get picked for model building?
Programming and Variable Selection • SAS Programming: Exploratory and Variable Creation • Use SAS to create possibly significant variables and dummy code categorical variables (example: Major_Ratio, Ethnic, etc.) • Backward Selection • [Flowchart: Start → Saturated Model → Drop Predictor? (Yes: drop and repeat; No: Stop) → Fitted Model] • Slightly different variables are selected for FTIC, Transfer, and Graduate.
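The backward-selection loop in the flowchart can be sketched in Python as follows. The study used SAS, so this is a stand-in, not the authors' code: `fit_pvalues` is a hypothetical callable that would refit the logistic model and return a p-value per predictor, the 0.05 threshold is an assumption, and the p-values below are invented so the example runs on its own.

```python
def backward_select(predictors, fit_pvalues, alpha=0.05):
    """Start from the saturated model and drop the least significant
    predictor until every remaining p-value is at most alpha."""
    kept = list(predictors)
    while kept:
        pvalues = fit_pvalues(kept)                 # refit on the current set
        worst = max(kept, key=lambda name: pvalues[name])
        if pvalues[worst] <= alpha:
            break                                   # nothing left to drop: stop
        kept.remove(worst)                          # drop predictor, then refit
    return kept

# Invented p-values standing in for a real logistic fit (assumption).
MADE_UP_PVALUES = {"Distance": 0.01, "SAT_M": 0.30, "SAT_V": 0.04, "Gender": 0.62}

def fake_fit(names):
    """Hypothetical stand-in: a real fit would re-estimate p-values each time."""
    return {name: MADE_UP_PVALUES[name] for name in names}

selected = backward_select(MADE_UP_PVALUES, fake_fit)
print("Fitted model keeps:", selected)
```

In a real fit the p-values change every time a predictor is removed, which is why the loop refits before each decision instead of ranking the variables once.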
Case Study: Logistic Model Prediction • Applicant data for 2003 is used to predict 2004 FTIC enrollment by weekly time intervals • The Logistic Model does not predict enrollment from applications received after week = t
Enrollment after Week = t • Total Enrollment = Enroll_t + Enroll_t’ • At any week = t, we need to predict Enroll_t’ • Identify historical relationships that may be helpful
Applicant Versus Enrolled by Year • Both applications and enrollment have been increasing • Notice enrollment yield is decreasing • Is the % increase in enrollment matching the % increase in applications?
Applicant Yield By Strata • Enrollment yield from applicant data is decreasing for each stratum • How does this affect the yearly increase in enrollment?
Percent Increase Applicant Vs. Enrolled • Applicant increase is not a viable indicator of enrollment increase • What patterns are reliable to model?
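The gap between applicant growth and enrollment growth is easy to see with a small calculation. The counts below are hypothetical and chosen only to show the pattern; they are not TWU data, and the helper names are illustrative.

```python
def percent_increase(previous, current):
    """Year-over-year change, expressed in percent."""
    return 100.0 * (current - previous) / previous

def yield_rate(enrolled, applicants):
    """Share of applicants who end up enrolled."""
    return enrolled / applicants

# Hypothetical counts for two consecutive years (not from the study).
applicants = {"year1": 2000, "year2": 2500}
enrolled = {"year1": 600, "year2": 660}

app_growth = percent_increase(applicants["year1"], applicants["year2"])
enr_growth = percent_increase(enrolled["year1"], enrolled["year2"])
yields = {year: yield_rate(enrolled[year], applicants[year]) for year in applicants}

print(f"Applicant increase:  {app_growth:.1f}%")
print(f"Enrollment increase: {enr_growth:.1f}%")
print("Yield by year:", {year: round(rate, 3) for year, rate in yields.items()})
```

When yield falls from one year to the next, applications can grow much faster than enrollment, which is why the applicant increase alone is a poor predictor.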
Cumulative FTIC Enrollment by Week • Notice the parallel lines, which imply equal slopes! • At any week = t, we can relate Enroll_t to Total Enrollment (Week = 17) • Thus, (Total Enroll – Enroll_t) should be very similar from year to year
Relationship Between Enrollment & Total Enrollment • By definition, (Total Enroll – Enroll_t) = Enroll_t’ • Model Enroll_t’ and smooth out the consistent patterns by week
Enroll_t’ Model • Use the 2003 Enroll_t’ model to predict Enroll_t’ for 2004 • Estimate of Enroll_t’: ($R^2 = 0.9857$)
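The outline calls this a quadratic prediction model, so one plausible reading is a least-squares fit of the prior year's Enroll_t’ against week number. The sketch below does that with the standard library only. The weekly series is invented, the fitted coefficients are not the study's, and the helper names (`fit_quadratic`, `predict`, `r_squared`) are illustrative assumptions.

```python
def solve_linear(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x

def fit_quadratic(weeks, values):
    """Least-squares fit of values ~ c0 + c1*week + c2*week**2 (normal equations)."""
    powers = [sum(w ** p for w in weeks) for p in range(5)]
    a = [[powers[i + j] for j in range(3)] for i in range(3)]
    b = [sum(v * w ** i for w, v in zip(weeks, values)) for i in range(3)]
    return solve_linear(a, b)

def predict(coefs, week):
    return coefs[0] + coefs[1] * week + coefs[2] * week ** 2

def r_squared(coefs, weeks, values):
    mean = sum(values) / len(values)
    ss_res = sum((v - predict(coefs, w)) ** 2 for w, v in zip(weeks, values))
    ss_tot = sum((v - mean) ** 2 for v in values)
    return 1.0 - ss_res / ss_tot

# Invented prior-year series: Enroll_t' shrinks to 0 by week 17 (not study data).
weeks = list(range(18))
prior_year = [max(0, (17 - w) ** 2 + (3 if w % 2 == 0 else -3)) for w in weeks]

coefs = fit_quadratic(weeks, prior_year)
r2 = r_squared(coefs, weeks, prior_year)
enroll_t_prime_hat = predict(coefs, 5)   # predicted remaining enrollment at week 5

print(f"R^2 on the prior year: {r2:.4f}")
print(f"Predicted Enroll_t' at week 5: {enroll_t_prime_hat:.1f}")
```

The fitted curve from the prior year is then evaluated at the current week to supply the Enroll_t’ term that the logistic model cannot provide.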
Predict 2004 FTIC Total Enroll Total Enrollment = Enroll_t + Enroll_t’ Note: 2004 FTIC Actual Total is 687
Predict 2005 FTIC Total Enroll Total Enrollment = Enroll_t + Enroll_t’ Note: 2005 FTIC Actual Total is 765
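Combining the two pieces and turning the result into a percent increase takes only a few lines. The actual FTIC totals (687 for 2004 and 765 for 2005) come from the slides; the week-t components used for the prediction are hypothetical placeholders, not the study's results.

```python
ACTUAL_2004 = 687   # FTIC actual total reported in the slides
ACTUAL_2005 = 765   # FTIC actual total reported in the slides

def total_enrollment(enroll_t, enroll_t_prime):
    """Total Enrollment = Enroll_t + Enroll_t'."""
    return enroll_t + enroll_t_prime

def percent_increase(previous, current):
    return 100.0 * (current - previous) / previous

# Actual year-over-year increase implied by the reported totals.
actual_increase = percent_increase(ACTUAL_2004, ACTUAL_2005)

# Hypothetical week-t components for 2005 (placeholders, not study output).
predicted_2005 = total_enrollment(enroll_t=410.0, enroll_t_prime=350.0)
predicted_increase = percent_increase(ACTUAL_2004, predicted_2005)

print(f"Actual increase 2004 to 2005: {actual_increase:.2f}%")
print(f"Predicted increase at week t: {predicted_increase:.2f}%")
```

Repeating this calculation each week gives a running estimate of the percent increase that can be compared against the final 12th-day figure.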
- END - Thank you! Any Questions?