A web-based prediction tool that implements a machine learning model to provide students with personalized insights and recommendations on effective educational behaviors. The tool allows students to explore which behaviors lead to higher success rates and ask "what if" questions.
Building a predictive model to enhance students' self-driven engagement Moletsane Moletsane T: +27(0)51 401 9111 | info@ufs.ac.za | www.ufs.ac.za
Overview • Introduction • Introduction and Motivation for a sensitivity tool • Data • Criteria for inclusion of variables • Variables used • Modelling • Random Forest modelling process • Evaluation of the model • The “what-if” tool.
What is student engagement? Student engagement measures provide information about: • What students do: the time and energy devoted to educationally purposeful activities • What institutions do: using effective educational practices to induce students to do the right things • With the aim of channelling student energy towards activities that matter.
What do we learn from SE surveys? • In the absence of reliable indicators of actual student learning, SE surveys are “process indicators or proxies for student learning outcomes” (Banta, Pike, Hansen, 2009; Kuh, 2009)
Is SE data shared with students? • Little use of student engagement data by students. • Similar for technology committees/groups in the institutions (NSSE, 2014)
How can we best share SE data with students? • In a manner that: • Guides students’ effective educational behaviours and encourages students to make more informed decisions regarding their learning • Reflects the students’ interests • Does not violate students’ privacy • Is user-friendly
How can we best share SE data? • Possible methods include: • Creating an annual report for students • Releasing snippets of data at certain time intervals (social media, posters, email, SMSs) • Publishing SE articles in varsity magazines • Using SE data during the advising process, or • Providing students with aggregated data • Through a web-based prediction tool that implements a model based on SE data.
What is the prediction tool? • A prediction model (we use a machine learning technique for the prediction modelling) • That is implemented in a web interface (built in the R environment) • To make predictions that react to students’ inputs on the tool • That allows students to: • Explore which educational behaviours lead to a higher chance of success, thus encouraging students to make more informed decisions regarding their learning. • Ask “what if” questions, and then find answers
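A "what if" query can be sketched as re-running the prediction with one or more answers changed and comparing the two results. The sketch below is only an illustration in Python (the tool itself is built in R, per the slide). The names `what_if` and `toy_predict`, the item names `asked_questions` and `prepared_drafts`, and the 1 to 4 answer scale are all hypothetical; `toy_predict` is a placeholder scoring rule, not the fitted model.

```python
def what_if(predict, baseline, **changes):
    """Compare a prediction before and after changing some answers."""
    unknown = set(changes) - set(baseline)
    if unknown:
        raise KeyError(f"not an input of the tool: {sorted(unknown)}")
    scenario = {**baseline, **changes}  # copy, so the baseline is untouched
    before, after = predict(baseline), predict(scenario)
    return {"before": before, "after": after, "change": after - before}


def toy_predict(answers):
    """Hypothetical stand-in for a fitted model (NOT the model in this deck).

    Returns the share of the maximum total score, assuming 1-4 scale items.
    """
    return sum(answers.values()) / (4 * len(answers))


# Hypothetical answers of one student, then the same student asking
# "what if I asked questions in class more often?"
baseline = {"asked_questions": 1, "prepared_drafts": 2}
result = what_if(toy_predict, baseline, asked_questions=3)
print(result)
```

Passing the predictor in as an argument keeps the comparison logic independent of whichever model sits behind the interface.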
What data do we have? • Student Engagement data • UFS data from 2013 to 2016 • Biographical data • Institutional data • Student outcomes, e.g. the proportion of modules passed • Students’ credit and module load • Biographical data
Should we include all the data? • Biographical data • Since we intend to share the tool with students, we believe that biographical data (e.g. race, disability, or gender) may be interpreted in a prejudiced manner. • Non-actionable data • For the purpose of the tool, some non-actionable data (e.g. faculty, residence status) was not included in the prediction model despite being a modest predictor.
SASSE data • UFS data from 2013 to 2016 has 6213 respondents. • Only 4602 of the observations are matched to the institutional data. • 190 variables
How do we choose which variables to use? • Variable importance • The machine learning technique we use has a built-in variable selection method. • The method is based on cross-validation principles: it ranks the variables by the loss of accuracy the model suffers when it is fitted without that variable. • From the top-ranking variables, we select the 8 most predictive variables for our method.
How do we choose which variables to use? • Variable importance • The machine learning technique we use has a built-in variable selection method. • The method is based on cross-validation principles: it ranks the variables by the loss of accuracy the model suffers when it is fitted without that variable. • From the top-ranking variables, we select the best 5 variables for our interface.
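The ranking idea described above (accuracy lost when a variable is left out) can be sketched as follows. This is a minimal illustration, not the deck's implementation: it refits a simple nearest-centroid classifier as a stand-in learner instead of a random forest, and the variable names `study_hours` and `noise_item` and all data values are made up. Random forest packages often estimate the same quantity by permuting a variable's values rather than refitting; the sketch follows the slide's wording instead.

```python
def fit_centroids(rows, labels):
    """Stand-in learner: the mean vector of each class (nearest centroid)."""
    sums, counts = {}, {}
    for row, label in zip(rows, labels):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}


def predict(centroids, row):
    """Predict the class whose centroid is closest (ties go to the smaller label)."""
    def sq_dist(centre):
        return sum((a - b) ** 2 for a, b in zip(row, centre))
    return min(sorted(centroids), key=lambda lab: sq_dist(centroids[lab]))


def accuracy(centroids, rows, labels):
    hits = sum(predict(centroids, r) == y for r, y in zip(rows, labels))
    return hits / len(rows)


def drop_column(rows, j):
    return [r[:j] + r[j + 1:] for r in rows]


def importance_ranking(train_x, train_y, test_x, test_y, names):
    """Rank variables by held-out accuracy lost when the variable is removed."""
    base = accuracy(fit_centroids(train_x, train_y), test_x, test_y)
    losses = {}
    for j, name in enumerate(names):
        model = fit_centroids(drop_column(train_x, j), train_y)
        losses[name] = base - accuracy(model, drop_column(test_x, j), test_y)
    return sorted(losses.items(), key=lambda kv: kv[1], reverse=True)


# Made-up data: the first column separates the classes, the second does not.
names = ["study_hours", "noise_item"]
train_x = [[1, 5], [2, 3], [1, 4], [2, 6], [8, 5], [9, 3], [8, 4], [9, 6]]
train_y = [0, 0, 0, 0, 1, 1, 1, 1]
test_x = [[1, 6], [2, 3], [9, 4], [8, 5]]
test_y = [0, 0, 1, 1]

ranking = importance_ranking(train_x, train_y, test_x, test_y, names)
top_k = [name for name, _ in ranking[:1]]  # keep the k top-ranked variables
print(ranking, top_k)
```

Selecting the 8 (or 5) most predictive variables then amounts to taking the first entries of such a ranking.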
Algorithm • For k = 1 to K: • Draw a bootstrap sample of size n from the data • Grow a random forest tree on the bootstrapped data by repeating, at each node: • Select m variables at random from the p variables • Pick the best variable and split point among the m variables • Split the node into two child nodes • Output the ensemble of trees • Make a final prediction based on the majority vote of the ensemble
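The steps of the algorithm can be sketched in plain Python as below. This is a small teaching sketch under stated assumptions, not the model used in the deck: splits are scored with Gini impurity, trees stop at a hypothetical `max_depth`, class labels must not be tuples (tuples mark internal nodes), and the data values are made up.

```python
import random
from collections import Counter


def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, labels, features):
    """Best (variable, threshold) among the candidate variables, or None."""
    best, best_score = None, gini(labels)
    for j in features:
        for t in sorted({r[j] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[j] <= t]
            right = [y for r, y in zip(rows, labels) if r[j] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (j, t), score
    return best


def grow_tree(rows, labels, m, rng, depth=0, max_depth=5):
    majority = Counter(labels).most_common(1)[0][0]
    if depth == max_depth or len(set(labels)) == 1:
        return majority  # leaf: predict the majority class
    features = rng.sample(range(len(rows[0])), m)  # m of the p variables
    split = best_split(rows, labels, features)
    if split is None:
        return majority
    j, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[j] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[j] > t]
    return (j, t,
            grow_tree([r for r, _ in left], [y for _, y in left],
                      m, rng, depth + 1, max_depth),
            grow_tree([r for r, _ in right], [y for _, y in right],
                      m, rng, depth + 1, max_depth))


def predict_tree(tree, row):
    while isinstance(tree, tuple):  # internal node: (variable, threshold, left, right)
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def random_forest(rows, labels, n_trees, m, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # bootstrap sample of size n
        forest.append(grow_tree([rows[i] for i in idx],
                                [labels[i] for i in idx], m, rng))
    return forest


def predict_forest(forest, row):
    votes = Counter(predict_tree(tree, row) for tree in forest)
    return votes.most_common(1)[0][0]  # majority vote of the ensemble


# Made-up data with p = 2 variables and two classes.
rows = [[1, 2], [2, 1], [3, 3], [2, 2], [7, 8], [8, 7], [9, 9], [8, 8]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = random_forest(rows, labels, n_trees=25, m=1)
print(predict_forest(forest, [0, 0]), predict_forest(forest, [10, 10]))
```

Here `n_trees` plays the role of K and `m` is the number of variables tried at each node.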
Overview of the random forest model • [Diagram: the training data is drawn into samples 1 to k; a learning algorithm fits one classifier per sample (classifier 1 to classifier k); the combined classifiers produce the prediction for new data]
Model Results • Prediction with all the variables (177), sample (20.97%) • False positive rate = 20.8% • False negative rate = 21.08% • Prediction with the selected variables (8), sample (23.64%) • False positive rate = 24.3% • False negative rate = 23.5% • [Tables: predicted vs. actual outcomes for each model]
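For reference, false positive and false negative rates are conventionally computed from predicted and actual outcomes as sketched below. The slides do not state which definitions were used, so the standard ones are assumed here; the function name `error_rates` and the example labels are made up and are not the deck's results.

```python
def error_rates(actual, predicted, positive=1):
    """False positive rate, false negative rate and overall error rate."""
    tp = fp = tn = fn = 0
    for a, p in zip(actual, predicted):
        if p == positive:
            if a == positive:
                tp += 1
            else:
                fp += 1
        else:
            if a == positive:
                fn += 1
            else:
                tn += 1
    if fp + tn == 0 or fn + tp == 0:
        raise ValueError("both classes must appear among the actual outcomes")
    return {
        "false_positive_rate": fp / (fp + tn),  # actual negatives predicted positive
        "false_negative_rate": fn / (fn + tp),  # actual positives predicted negative
        "error_rate": (fp + fn) / (tp + fp + tn + fn),
    }


# Made-up outcomes for eight students (1 = success, 0 = no success).
actual = [1, 1, 1, 1, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 1]
rates = error_rates(actual, predicted)
print(rates)
```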
Thank you