A Predictive Model of Inquiry to Enrollment. Cullen F. Goenner, PhD Department of Economics University of North Dakota cullen.goenner@und.nodak.edu www.business.und.edu/goenner Kenton Pauls Director of Enrollment Services University of North Dakota kenton.pauls@mail.und.nodak.edu.
Issues Facing Enrollment Managers • Finding new “markets” • Increasing Tuition • Declining population (ND) • Increasing competition • Need to attract a particular type of student • Diversity/Quality • Data driven analysis • Accountability
Questions we will answer today • What is predictive modeling? • How does one build a predictive model? • How can predictive modeling be used by institutions of higher education to improve enrollment?
What is Predictive Modeling? • Predictive modeling uses statistical/econometric methods to quantitatively predict the future behavior of individuals. • Steps include • Data collection on the subject of interest • Build the model based on data analysis • Predictions made out of sample • Model validation/testing
College Choice 3 stage process - Hossler and Gallagher (1987) • Predisposition/aspiration for higher education Encouragement, coursework, and interest. • Search of potential schools Counselors, campus contacts, program availability • Selection SES, Ability, Fit, Geography
Factors Influencing Choice Economic perspective: • Education an investment in human capital • Cost vs Benefit calculus Psychological perspective: • Need of self to find sense of belonging and fulfillment of needs. Sociological perspective: • Social interaction dictated by societal/family norms.
Existing Empirical Work Search Choice • Applications: • DesJardins, Dundar, and Hendel (1999) • Weiler (1994) • Interest: SAT scores sent • Toutkoushian (2001)
Existing Models of Enrollment Choice • Model a student’s binary choice to enroll at a particular college while controlling for the student’s characteristics. • Logistic models used • Conditional on students having: • Applied • Bruggink and Gambhir (1996) • Thomas, Dawes, and Reznik (2001) • Been admitted • DesJardins (2002) • Leppel (1993)
Our Predictive Model • Builds on the models of DesJardins (2002) and Thomas, Dawes, Reznik (2001) • Focus here is on prediction of enrollment of students that inquired of our institution. • “Inquiry model” is relevant because: • Time of information exchange, opinion formation • Allows for early intervention in a student’s decision making process (Target Marketing)
Inquiry Model Challenges • Data collection • Data are already collected on those who apply or are admitted. Typically not collected for inquiries. • Quality of data • Applicants provide detailed data describing themselves (demographic data, test scores, HSGPA, etc.), which are not available for most student inquiries.
Types of Inquiries We Recorded • Return of information card • Attendance of college fair • Campus visit • Contact via e-mail • Contact via phone • Referral from faculty, coach, or alumni • ACT automatically submitted
How these data were captured • Enrollment Services Prospective Student Network relational database (ESPSN) • Customized system • SQL 2000/Visual Basic
Information Collected From Information Request Card • Name • High School attended • Interested Major (if any) • Address Lacks the demographic data typical of application records and used in most predictive models.
Geodemography • Process of attaching demographic characteristics to geographic characteristics. • Notion is that “Birds of a Feather Flock Together”, i.e. individuals living in the same neighborhood will tend to have similar behavior patterns. • Ex: Neighborhoods homogeneous in terms of household income, occupations, family size, and purchases.
Implementation • US Census data aggregated to zip code level • “Geodemographic” variables considered for our model specification: • College age demographic • Population • Average Income • White demographic • Median age
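The zip-code lookup described above can be sketched as follows. This is a minimal illustration, not the authors' ESPSN implementation: the `ZIP_PROFILE` table, its values, the field names, and the `attach_geodemographics` helper are all hypothetical.

```python
# Hypothetical zip-level census aggregates; the values are made up for illustration.
ZIP_PROFILE = {
    "58201": {"population": 35000, "avg_income": 41000.0, "median_age": 29.5},
    "55401": {"population": 12000, "avg_income": 58000.0, "median_age": 33.1},
}

GEO_VARIABLES = ("population", "avg_income", "median_age")


def attach_geodemographics(inquiry, zip_profile=ZIP_PROFILE):
    """Return a copy of the inquiry record with zip-level variables attached.

    Inquiries whose zip code has no profile get None for each variable.
    """
    profile = zip_profile.get(inquiry.get("zip"), {})
    enriched = dict(inquiry)
    for name in GEO_VARIABLES:
        enriched[name] = profile.get(name)
    return enriched


# An information-request card carries only name, high school, major, and address.
inquiry = {"name": "A. Student", "high_school": "Example HS", "zip": "58201"}
print(attach_geodemographics(inquiry))
```

The address on the request card is the only key needed, which is why this approach works for inquiries that supply no demographic data of their own.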
Building the model • Binary choice model: model whether students who inquire of UND enroll or do not enroll. • 15,827 students made inquiries for Fall 2003 enrollment. Of these students, 2,067 actually enrolled. • Logistic regression model used.
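As a rough sketch of what a logistic model computes, the function below maps a linear index to an enrollment probability. The function name and arguments are illustrative assumptions; only the two counts come from the slide.

```python
import math


def enrollment_probability(intercept, coefficients, features):
    """Logistic model: P(enroll) = 1 / (1 + exp(-(b0 + sum_j b_j * x_j)))."""
    linear = intercept + sum(b * x for b, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-linear))


# Base rate from the slide: 2,067 enrolled out of 15,827 inquiries.
base_rate = 2067 / 15827

# An intercept-only model reproduces the base rate when the intercept
# equals the log-odds of enrollment.
base_log_odds = math.log(base_rate / (1.0 - base_rate))

print(round(base_rate, 4))
print(round(enrollment_probability(base_log_odds, [], []), 4))
```

Roughly 13% of inquiries enroll, so any useful model has to separate a fairly rare outcome from a large pool of non-enrollees.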
Candidate Control Variables • Type and Frequency of Contact • Geographic • Academic • Geodemographic • Interaction Effects
Model Specification • Researchers typically assume their model specification is the true model which generates the data. • It is difficult to justify a priori the choice of variables to include in the model, given that each is, by design, theoretically relevant. • With k candidate variables there are 2^k different linear models one could consider.
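To make the size of the model space concrete, the sketch below enumerates every subset of a small candidate set. The variable names are hypothetical placeholders.

```python
from itertools import combinations


def candidate_models(variables):
    """Yield every subset of the candidate variables, including the empty model."""
    for size in range(len(variables) + 1):
        for subset in combinations(variables, size):
            yield subset


variables = ["distance", "campus_visit", "avg_income"]  # hypothetical names
models = list(candidate_models(variables))
print(len(models))  # 2**3 = 8
print(2 ** 30)      # 1073741824 models for 30 candidate regressors
```

Exhaustive enumeration is only feasible for small k, which is why the number of models must be pruned when many candidate variables are considered.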
Consider the case in which several models {M1, … MK} are theoretically possible. • Basing inference on the results of a single model is risky. • Bayesian model averaging (BMA) allows us to account for this type of uncertainty.
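BMA The posterior distribution of the parameters $\theta$ given the data $D$, in the presence of model uncertainty, is a weighted average of the posterior distribution under each of the K models, with weights equal to the posterior model probabilities $P(M_k \mid D)$: $P(\theta \mid D) = \sum_{k=1}^{K} P(\theta \mid D, M_k)\, P(M_k \mid D)$ (1)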
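Posterior Model Probability is $P(M_k \mid D) = \dfrac{P(D \mid M_k)\, P(M_k)}{\sum_{l=1}^{K} P(D \mid M_l)\, P(M_l)}$ (2) where $P(D \mid M_k)$ is the likelihood and $P(M_k)$ is the prior probability that model $M_k$ is the true model, given one of the K models is the true model.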
Posterior Model Probability Assuming a non-informative prior ($P(M_1) = \dots = P(M_K) = 1/K$), $P(M_k \mid D) = \dfrac{P(D \mid M_k)}{\sum_{l=1}^{K} P(D \mid M_l)}$ (3)
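Under equal priors, the posterior model probabilities are just the normalized likelihoods of the models. The sketch below assumes, as an illustration, that each model's likelihood is approximated by exp(-BIC_k / 2), the approximation discussed in Raftery (1995); the BIC values are made up.

```python
import math


def posterior_model_probs(bics):
    """Posterior model probabilities under equal priors from BIC values.

    Assumes P(D | M_k) is approximated by exp(-BIC_k / 2). Subtracting the
    smallest BIC before exponentiating avoids numerical underflow.
    """
    best = min(bics)
    weights = [math.exp(-(b - best) / 2.0) for b in bics]
    total = sum(weights)
    return [w / total for w in weights]


bics = [100.0, 102.0, 110.0]  # made-up BIC values for three models
probs = posterior_model_probs(bics)
print([round(p, 3) for p in probs])
```

A BIC difference of 2 between two models corresponds to a ratio of e (about 2.7) in their posterior probabilities.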
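The posterior mean and variance summarize the effects of the parameters on the dependent variable. Raftery (1995) reports $E[\beta \mid D, \beta \neq 0] \approx \sum_{k} \hat{\beta}(k)\, P'(M_k \mid D)$ and $\mathrm{Var}[\beta \mid D, \beta \neq 0] \approx \sum_{k} \left[ \mathrm{Var}(k) + \hat{\beta}(k)^2 \right] P'(M_k \mid D) - E[\beta \mid D, \beta \neq 0]^2$ (9) where $\hat{\beta}(k)$ and $\mathrm{Var}(k)$ are the MLE and its variance under model k, the summation is over models that include $\beta$, and $P'(M_k \mid D)$ is the posterior model probability renormalized over those models.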
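A minimal sketch of the model-averaged mean and variance of one coefficient, conditional on the variable being in the model: the mean is a probability-weighted average of the per-model estimates, and the variance is the weighted average of (variance + squared estimate) minus the squared mean. The function name and inputs are illustrative, and the weights are renormalized over the models that include the variable.

```python
def bma_mean_variance(estimates, variances, probs):
    """Model-averaged mean and variance of a coefficient, given inclusion.

    All three lists cover only the models that include the variable;
    probs are posterior model probabilities, renormalized here to sum to 1.
    """
    total = sum(probs)
    weights = [p / total for p in probs]
    mean = sum(w * b for w, b in zip(weights, estimates))
    second_moment = sum(
        w * (v + b * b) for w, v, b in zip(weights, variances, estimates)
    )
    return mean, second_moment - mean * mean


# Two hypothetical models with equal posterior support.
mean, variance = bma_mean_variance([1.0, 3.0], [0.0, 0.0], [0.25, 0.25])
print(mean, variance)
```

Even when each model estimates the coefficient with no sampling error, disagreement between models produces a positive averaged variance, which is how BMA carries model uncertainty into inference.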
BMA Implementation • S-Plus function bic.logit: performs BMA on logistic regression models. • 30 regressors implies the summation in equation 1 runs over more than 1 billion models (2^30). • To manage the summation we use Occam’s window.
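Occam’s Window Exclude models that predict the data sufficiently less well than the best model, where predictive support is measured by each model’s posterior model probability (PMP). For a chosen cutoff C, $A' = \left\{ M_k : \dfrac{\max_l P(M_l \mid D)}{P(M_k \mid D)} \le C \right\}$. Models in A’ are included.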
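The selection rule can be sketched as a filter on posterior model probabilities. The cutoff `c=20.0` is an assumed illustrative value, not one stated in the slides, and the probabilities are made up.

```python
def occams_window(probs, c=20.0):
    """Indices of models whose PMP is within a factor c of the best model's PMP."""
    best = max(probs)
    # Written as p * c >= best so that a model with p == 0 is excluded
    # without dividing by zero.
    return [i for i, p in enumerate(probs) if p * c >= best]


def renormalize(probs, kept):
    """Rescale the retained models' probabilities so they sum to 1."""
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}


probs = [0.5, 0.3, 0.01]  # made-up posterior model probabilities
kept = occams_window(probs)
print(kept)
print(renormalize(probs, kept))
```

After filtering, the averaging is carried out over the retained models only, with their probabilities rescaled.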
Results • 26 models supported by the data • The model with the highest PMP receives 21% of the total posterior probability. • Variables that receive strong support for inclusion include: • Geographic: Distance, HY State, HY School, Competitor distance • Geodemographic: College Age, Average Income • Contacts: Number, Campus visit, Referral
Out of Sample Predictive Performance • Split the data into two equal parts: • First part used to build/estimate the model • Second part used to test the model’s predictions. • Outcome (enrollment) is binary, while our model generates a probability estimate.
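A minimal sketch of the two-way split. The slides do not say how records were assigned to each half, so the seeded random shuffle here is an assumption, and the integer record ids stand in for inquiry records.

```python
import random


def split_in_half(records, seed=0):
    """Shuffle the records reproducibly and split them into two halves."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    midpoint = len(shuffled) // 2
    return shuffled[:midpoint], shuffled[midpoint:]


# Stand-in ids for the 15,827 inquiries.
estimation, holdout = split_in_half(range(15827))
print(len(estimation), len(holdout))
```

The model is estimated on the first half only, so its scores on the second half are genuine out-of-sample predictions.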
What is a successful prediction? • Greene (2001) - No “correct” choice for probability cutoff. Typical value is 0.5 • Tradeoff in cutoff choice: • A lower cutoff raises the share of actual enrollees who are predicted to enroll (sensitivity), at the expense of more inquiries who are predicted to enroll but do not (a higher false positive rate)
Predictive performance • 89% of observations correctly classified • Specificity: 97% • Sensitivity: 36% • ROC curve describes relation between sensitivity and 1 - specificity (false positive rate) • Area under ROC curve = 0.87
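The measures above can be computed from scores and outcomes as in this sketch. The toy data and function names are illustrative only; the area under the ROC curve is computed here as the probability that a randomly chosen enrollee outscores a randomly chosen non-enrollee, which is a standard equivalent definition.

```python
def classification_metrics(scores, outcomes, cutoff=0.5):
    """Accuracy, sensitivity, and specificity when score >= cutoff predicts enrollment."""
    tp = fp = tn = fn = 0
    for score, enrolled in zip(scores, outcomes):
        predicted = score >= cutoff
        if predicted and enrolled:
            tp += 1
        elif predicted:
            fp += 1
        elif enrolled:
            fn += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / len(scores),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def roc_auc(scores, outcomes):
    """Share of (enrollee, non-enrollee) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, outcomes) if y]
    neg = [s for s, y in zip(scores, outcomes) if not y]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


scores = [0.9, 0.6, 0.4, 0.3, 0.1]   # toy model scores
outcomes = [1, 0, 1, 0, 0]           # 1 = enrolled
print(classification_metrics(scores, outcomes))
print(classification_metrics(scores, outcomes, cutoff=0.35))
print(roc_auc(scores, outcomes))
```

The reported figures are mutually consistent: assuming the test half has the same enrollment rate as the full sample (2,067 of 15,827, about 13%), accuracy is roughly 0.13 × 0.36 + 0.87 × 0.97 ≈ 0.89.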
79% of enrolled found within 22% of entire population (scores >= 0.2) • Focused efforts without compromising enrollment numbers • Efficiency implications
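A sketch of how such a capture figure is computed: count the share of all inquiries at or above a score threshold, and the share of enrollees found among them. The toy data are made up; only the 0.2 threshold comes from the slide.

```python
def capture_summary(scores, outcomes, threshold=0.2):
    """Return (share of population selected, share of enrollees captured)."""
    selected = [y for s, y in zip(scores, outcomes) if s >= threshold]
    share_of_population = len(selected) / len(scores)
    share_of_enrolled = sum(selected) / sum(outcomes)
    return share_of_population, share_of_enrolled


scores = [0.9, 0.6, 0.4, 0.3, 0.1, 0.05, 0.02, 0.01, 0.15, 0.25]
outcomes = [1, 0, 1, 0, 0, 0, 0, 0, 1, 0]  # 1 = enrolled
print(capture_summary(scores, outcomes))
```

The gap between the two shares is the efficiency gain: outreach limited to the selected group reaches most eventual enrollees at a fraction of the contact volume.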
Practical Applications • Effective regional market segmentation • Targeted tele-counseling efforts • Special projects
Regional Market Segmenting • Target Marketing and Segmentation • Prospect names purchased based on zip code. • Establish a predictive “score” for all zip codes in US based on census-level data
83% of enrolled WA students fell within top scoring zips over three years • Direct Mail Name Purchases • Prior years: very open search criteria • MN, CO, SD, MT • This year: much more restrictive, to get deeper into broader markets • Only key zips • CO, WA, OR, AZ, IL, MN, etc.
Targeted Tele-Counseling Efforts • Student calling program • Top 20% of all model scores identified • Fluid number excluding applicants • Prompt student to take action
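One way to read the selection rule above is sketched here: rank all prospects by model score, take the top 20%, then drop anyone who has already applied, so the list shrinks as students apply. This reading, the record fields, and the `calling_list` helper are assumptions for illustration.

```python
def calling_list(prospects, top_percent=20):
    """Non-applicant prospects whose score falls in the top share of all scores."""
    ranked = sorted(prospects, key=lambda p: p["score"], reverse=True)
    n_top = max(1, len(ranked) * top_percent // 100)
    return [p for p in ranked[:n_top] if not p["applied"]]


# Ten hypothetical prospects with scores 0.1 through 1.0; the top scorer has applied.
prospects = [
    {"id": i, "score": i / 10, "applied": i == 10} for i in range(1, 11)
]
print([p["id"] for p in calling_list(prospects)])
```

Rerunning the selection whenever application status changes keeps callers focused on high-scoring students who have not yet acted.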
Special Projects • Limited funds but targeted initiatives • Focus on as many of the top-scoring students as possible • Postcards, brochures, etc.
Possible Future Research • Cluster analysis for better market segmentation • Study of marginal effects
Thank You! Questions?