Learn how the University of Alabama used data mining to enhance institutional effectiveness by identifying at-risk students and implementing intervention strategies. Explore the joint effort between the Department of Statistics and the Enrollment Office, and discover how this approach can be applied to your institution's planning unit.
A way to integrate IR and academic activities to enhance institutional effectiveness

Introduction
The University of Alabama (State of Alabama, USA) was one of the first universities to use data mining for retention, and to extend it through the full cycle to include intervention as well as recruitment. For example, high school students most likely to attend the university and freshmen at risk of dropping out are identified. Of particular interest is the fact that this was a joint effort between data mining students in the Department of Statistics and the Enrollment Office. This led to the idea of establishing cooperation between the Planning Unit and the Department of Mathematical Statistics and Actuarial Science at the UFS. In the Statistics department, a postgraduate course in Data Mining is presented using SAS Enterprise Miner. Because this course has a practical component that constitutes at least 40% of the final mark, it creates the opportunity to involve the students in Institutional Research activities. A total of twenty projects were identified, and one was assigned to each of the thirty students enrolled for this course. The data used for these projects are from the student database of the UFS.
The Postgraduate Course
Currently only an introductory data mining course is presented. For optimal results it will be necessary to introduce a second, more advanced data mining course.
Course 1: Introduction to Data Mining. This course introduces SAS Enterprise Miner and the use of predictive models. A broad overview of the modeling techniques of Logistic Regression, Decision Trees, and Neural Networks is provided. The concepts of data partitioning, model assessment using lift charts and ROC curves, and model implementation are presented. The project is part of this course.
Course 2: Advanced Data Mining. This course should provide more in-depth coverage of the technical aspects of each of the modeling tools discussed in the first course. Topics in Statistical Decision Theory and unsupervised learning can also be included.
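The assessment concepts named for Course 1 (data partitioning, lift, and ROC curves) can be illustrated outside SAS Enterprise Miner. The following is a minimal Python sketch under stated assumptions, not the course's own implementation: the function names `partition`, `roc_auc`, and `lift_at`, the 60/40 split, and the toy scores are all hypothetical.

```python
import random


def partition(records, train_frac=0.6, seed=0):
    """Randomly split records into a training and a validation set.

    The 60/40 default is an assumption for illustration only.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_frac)
    return shuffled[:cut], shuffled[cut:]


def roc_auc(y_true, scores):
    """Area under the ROC curve, computed as the share of
    (positive, negative) pairs that the score ranks correctly.
    Tied scores count as half a correct ranking."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def lift_at(y_true, scores, fraction=0.1):
    """Lift in the top `fraction` of cases ranked by score:
    the success rate among the top-ranked cases divided by
    the overall success rate."""
    ranked = sorted(zip(scores, y_true), reverse=True)
    k = max(1, int(len(ranked) * fraction))
    top_rate = sum(y for _, y in ranked[:k]) / k
    base_rate = sum(y_true) / len(y_true)
    return top_rate / base_rate


# Toy example: 1 = success, 0 = failure, with hypothetical model scores.
y = [1, 1, 0, 0]
s = [0.9, 0.8, 0.3, 0.1]
train, valid = partition(range(10), train_frac=0.5)
```

A lift above 1 in the top-ranked group means the model concentrates successes there better than random selection would.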
The dataset and the variables
ID                     Identification number of student
RACE                   Race of student
GENDER                 Gender of student
CAMPUSY1               Campus of registration for year one
FACULTYY1              Faculty for year one
MINYEARSTOGRAD         Minimum years to obtain qualification registered for
EXTENDEDPY1            Is the qualification registered for an extended program? (Y, N, NOTAV)
AGEY1                  Age of the student when registering for year one
H_LANGUAGE             The home language of the student
M_COUNT                The M-count obtained by the student
NUMCREDITS1Y1          The total number of credits registered for in the first semester of year one
PROPCREDITSPASSED1Y1   The proportion of credits passed in the first semester of year one
NUMCREDITSY1           The total number of credits registered for in year one
PROPCREDITSPASSEDY1    The proportion of credits passed in year one
Y TARGET               The binary dependent variable: 1 indicates success and 0 indicates failure
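The variable list above can be written down as a typed record, which makes the expected value ranges explicit. This is only a sketch: the field names come from the list, but the Python types, the class name `StudentRecord`, the `problems` helper, and the sample values are assumptions, since the source does not state how the student database stores these fields.

```python
from dataclasses import dataclass


@dataclass
class StudentRecord:
    # Field names follow the variable list; the types are assumed.
    ID: str
    RACE: str
    GENDER: str
    CAMPUSY1: str
    FACULTYY1: str
    MINYEARSTOGRAD: int
    EXTENDEDPY1: str            # "Y", "N", or "NOTAV"
    AGEY1: int
    H_LANGUAGE: str
    M_COUNT: float
    NUMCREDITS1Y1: float
    PROPCREDITSPASSED1Y1: float
    NUMCREDITSY1: float
    PROPCREDITSPASSEDY1: float
    TARGET: int                 # 1 = success, 0 = failure

    def problems(self):
        """Return a list of basic consistency problems (empty if none)."""
        issues = []
        if self.EXTENDEDPY1 not in ("Y", "N", "NOTAV"):
            issues.append("EXTENDEDPY1 must be Y, N, or NOTAV")
        for name in ("PROPCREDITSPASSED1Y1", "PROPCREDITSPASSEDY1"):
            if not 0.0 <= getattr(self, name) <= 1.0:
                issues.append(f"{name} must lie in [0, 1]")
        if self.TARGET not in (0, 1):
            issues.append("TARGET must be 0 or 1")
        return issues


# Entirely made-up values, used only to exercise the checks.
example = StudentRecord(
    ID="S001", RACE="NA", GENDER="F", CAMPUSY1="Main", FACULTYY1="Humanities",
    MINYEARSTOGRAD=3, EXTENDEDPY1="N", AGEY1=19, H_LANGUAGE="English",
    M_COUNT=36.0, NUMCREDITS1Y1=64.0, PROPCREDITSPASSED1Y1=0.75,
    NUMCREDITSY1=128.0, PROPCREDITSPASSEDY1=0.8, TARGET=1,
)
```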
The projects
The following projects were identified:
1. Build predictive models using decision trees, regression, and neural networks to identify successful students in Faculty A at the end of the first year of study.
Definitions:
Success is defined as the event that a student in Faculty A completes the qualification registered for in year one in the minimum time.
Failure is defined as the event that a student in Faculty A fails to complete the qualification registered for in year one in the minimum time.
Variables included in dataset: ID, RACE, GENDER, CAMPUSY1, MINYEARSTOGRAD, EXTENDEDPRY1, AGEY1, H_LANGUAGE, M_COUNT, NUMCREDITSY1, PROPCREDITSPASSEDY1, TARGET
Preliminary results for Project 1: Faculty of Natural and Agricultural Sciences
Faculties can now be compared. Consider the rule that leads to the highest probability of success in each of the three faculties:
Rule for the Faculty of Natural and Agricultural Sciences: If PROPCREDITSPASSEDY1 > 0.89 and NUMCREDITSY1 > 142, then P(Success) = 0.41
Rule for the Faculty of Economic and Management Sciences: If PROPCREDITSPASSEDY1 > 0.81, NUMCREDITSY1 > 116.5, and M_COUNT > 40.5, then P(Success) = 0.59
Rule for the Faculty of the Humanities: If M_COUNT > 34.5 and PROPCREDITSPASSEDY1 > 0.74, then P(Success) = 0.58
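The three rules can be expressed as simple threshold checks, which makes the comparison between faculties easy to reproduce. The sketch below uses the thresholds and probabilities quoted in the rules; the dictionary layout, the function name `top_rule_probability`, and the sample student are assumptions. It covers only the single best rule per faculty, not the full decision trees.

```python
# Each entry: (condition on a student's variables, P(Success) if it holds).
# Thresholds are taken from the rules quoted in the text; NUMCREDITSY1 is
# the variable name as given in the dataset description.
RULES = {
    "Natural and Agricultural Sciences": (
        lambda s: s["PROPCREDITSPASSEDY1"] > 0.89 and s["NUMCREDITSY1"] > 142,
        0.41,
    ),
    "Economic and Management Sciences": (
        lambda s: s["PROPCREDITSPASSEDY1"] > 0.81
        and s["NUMCREDITSY1"] > 116.5
        and s["M_COUNT"] > 40.5,
        0.59,
    ),
    "Humanities": (
        lambda s: s["M_COUNT"] > 34.5 and s["PROPCREDITSPASSEDY1"] > 0.74,
        0.58,
    ),
}


def top_rule_probability(faculty, student):
    """Return P(Success) if the student satisfies the faculty's best rule,
    or None if the rule does not apply (the trees' other leaves are not
    reproduced here)."""
    condition, probability = RULES[faculty]
    return probability if condition(student) else None


# Hypothetical student, for illustration only.
student = {"PROPCREDITSPASSEDY1": 0.85, "NUMCREDITSY1": 120, "M_COUNT": 42}
```

The same hypothetical student satisfies the rules for two faculties but not the third, which shows how differently the faculties weight credits passed and M-count.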
2. Build predictive models using decision trees, regression, and neural networks to identify students likely to drop out from Faculty A at the end of year one. Only data available at the end of the first semester should be used.
Definitions:
Dropout is defined as the event that a student who did not graduate at the end of year one is not registered at the beginning of year two for any qualification in Faculty A.
No-Dropout is defined as the event that a student is still registered for a qualification in Faculty A (not necessarily the same qualification as in year one).
Variables included in dataset: ID, RACE, GENDER, CAMPUSY1, MINYEARSTOGRAD, AGEY1, H_LANGUAGE, M_COUNT, NUMCREDITS1Y1, PROPCREDITSPASSED1Y1, TARGET
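The dropout definition and the restriction to first-semester data can be sketched as follows. The variable list is the one given for Project 2; everything else is an assumption made for illustration: the function names, the inputs `graduated_y1` and `y2_faculties`, the coding of TARGET as 1 for Dropout, and the choice to exclude students who graduated at the end of year one.

```python
# Variables listed for Project 2: only first-semester information appears.
PROJECT2_VARIABLES = [
    "ID", "RACE", "GENDER", "CAMPUSY1", "MINYEARSTOGRAD", "AGEY1",
    "H_LANGUAGE", "M_COUNT", "NUMCREDITS1Y1", "PROPCREDITSPASSED1Y1", "TARGET",
]


def dropout_target(graduated_y1, y2_faculties, faculty):
    """Derive the Project 2 target for one student.

    graduated_y1 -- True if the student graduated at the end of year one
    y2_faculties -- set of faculties in which the student is registered
                    at the beginning of year two
    faculty      -- the faculty under study ("Faculty A")

    Assumed coding: 1 = Dropout, 0 = No-Dropout. Graduates fall under
    neither definition, so None is returned for them.
    """
    if graduated_y1:
        return None
    return 0 if faculty in y2_faculties else 1


def restrict_to_project2(record):
    """Keep only the variables listed for Project 2, so that full-year
    variables such as PROPCREDITSPASSEDY1 cannot leak into the model."""
    return {name: record[name] for name in PROJECT2_VARIABLES if name in record}
```

For example, a student who moved from one Humanities qualification to another still counts as No-Dropout, while a student who moved to a different faculty counts as a Dropout for the faculty under study.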
3. Build predictive models using decision trees, regression, and neural networks to identify students in Faculty A likely to pass more than 90% of the courses registered for in year one. Only data available at the time of registration should be used.
4. Build predictive models using decision trees, regression, and neural networks to identify students in Faculty A likely to pass fewer than 20% of the courses registered for in year one. Only data available at the time of registration should be used.
Variables included in dataset: ID, RACE, GENDER, CAMPUSY1, MINYEARSTOGRAD, AGEY1, H_LANGUAGE, M_COUNT, TARGET
"Faculty A" can be:
A. Humanities
B. Education
C. Natural and Agricultural Sciences
D. Business and Management Sciences
E. All faculties
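The targets for Projects 3 and 4 and the combination of project types with faculty choices can be sketched as below. The thresholds and the faculty options come from the text. The function names, the coding of TARGET as 1 for the event of interest, and the input `prop_passed` (the proportion of year-one courses passed) are assumptions. Crossing the four project types with the five options gives twenty combinations, which may be how the twenty projects mentioned in the introduction arise, although the source does not say so explicitly.

```python
from itertools import product

PROJECT_TYPES = {
    1: "successful students",
    2: "students likely to drop out",
    3: "students passing more than 90% of courses",
    4: "students passing fewer than 20% of courses",
}

FACULTY_CHOICES = [
    "Humanities",
    "Education",
    "Natural and Agricultural Sciences",
    "Business and Management Sciences",
    "All faculties",
]


def project_grid():
    """All (project type, faculty choice) combinations."""
    return list(product(PROJECT_TYPES, FACULTY_CHOICES))


def high_pass_target(prop_passed):
    """Project 3: 1 if more than 90% of year-one courses were passed."""
    return int(prop_passed > 0.90)


def low_pass_target(prop_passed):
    """Project 4: 1 if fewer than 20% of year-one courses were passed."""
    return int(prop_passed < 0.20)
```

Note that the pass proportion is used only to build the target; the model inputs for these two projects are limited to what is known at registration.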
The advantages
1. Students are exposed to real-life data sets.
2. Students are introduced to an aspect of Institutional Research and will gain insight into the challenges universities are faced with.
3. Recent computing advances have created an increased demand for Business Intelligence (BI) professionals. The courses are designed to educate students to meet the marketplace demand.
4. Cooperation and understanding between support services and academics are promoted.
5. The university is provided with BI to facilitate the making of strategic decisions on a large scale, since several projects will run simultaneously.
6. Projects can be updated every year to accommodate new enrollments.
7. Possible changes over time in predictive models can be investigated.
8. The projects will enable comparison between faculties. It will, for example, be possible to compare the indicators for a dropout in Faculty A with those of Faculty B.
The challenges
1. Well-equipped computer laboratories should be available for student use.
2. Students with an insufficient level of computer literacy should be prevented from entering the course.
3. A data warehouse should be in place and properly maintained for reliable results.
4. The student projects should be closely monitored and supervised.