Students Tackle Graduation Rates with Data Mining and JMP® Pro

JMP Discovery Conference 2012. Jim Grayson & Mary Filpus-Luyckx, Augusta State University.

Presentation Transcript


  1. Students Tackle Graduation Rates with Data Mining and JMP® Pro • JMP Discovery Conference 2012 • Jim Grayson & Mary Filpus-Luyckx, Augusta State University

  2. Agenda • Project: Origin and Context • Objective • Data Characteristics & Exploration • Analysis: Partitioning & Logistic Regression • Insights and Recommendations Discovery Summit 2012

  3. Augusta State University • Public state access university located in Augusta, Georgia • Enrollment: 6,741 students • Retention Rate from first to second year: 67% • Graduation Rate (6 years): 22%. Georgia Health Sciences University • Public state health research university also located in Augusta, Georgia • Enrollment: 2,400 students (400 undergrads, upper-level only) • Graduation Rate (6 years): 96%. The New University • Research University with access? We need to find out why ASU’s retention and graduation rates are so low, and we need to do it NOW!

  4. Hull College of Business • Business Analytics Class • Elective: Marketing, MIS • Project Orientation • Campus Client: Institutional Research

  5. Student Preparation • Use JMP as Analytical Engine • Text: Data Mining for Business Intelligence, 2nd ed., Shmueli et al. • Primary Methods: Multiple Regression, Partitioning, Logistic Regression, Clustering

  6. Business Analytics Project • Purpose: This study is being undertaken for two reasons: • To better understand the characteristics common to students who do not graduate within six years, and • To develop and validate a model to predict whether a student will graduate within six years.

  7. Deliverables • Your preliminary deliverable is a “technical review” to show (a) the results of your model development, and (b) your assessment of the model’s validity and usefulness. • The final deliverables of this project are (a) a project report describing your data mining process, including your recommendations, supporting data, and analysis, and (b) a project presentation that effectively communicates the background of the project and its insights and recommendations. • The project report should be organized into the following sections: Executive Summary, Insights and Recommendations, Model Development Process and Results. • Supporting data and charts should be included within the body of the report if they are referenced in the narrative; otherwise, they should be organized in appendices.

  8. Project Steps • Translate the business problem into a data mining problem • Describe the problem opportunity and business benefit • Briefly describe other research that will leverage your efforts • Select appropriate data • Explain the data identified • Explain the process of selecting data • Get to know the data • Describe the data • Describe insights gained from exploring the data set

  9. Steps (cont’d) • Create a model set [this is being done for you by Institutional Research] • Fix problems with the data • Transform data • Apply data transformations as necessary, such as normalizing the data • Convert variables into subsets to facilitate analysis
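The "normalizing the data" step above can be sketched as follows. This is a minimal illustration, assuming z-score standardization is what is meant (the slides do not say which normalization the class used); the function name `z_scores` and the sample values are hypothetical and are not project data.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Rescale values to mean 0 and (population) standard deviation 1."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical SAT Math scores, for illustration only (not project data).
sat_m = [400, 500, 600]
standardized = z_scores(sat_m)
print(standardized)
```

Standardizing puts predictors such as HS GPA and SAT scores on a comparable scale, which matters more for methods like clustering than for tree-based partitioning.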

  10. Steps (cont’d) • Build models • Choice of techniques and rationale • Model results • Assess models • Usefulness for predictability • Performance measures • Interpret the results • Implications of the results • Limitations (what you would have done if you could and what you want to do next) • Recommendations to the project sponsor

  11. Student Reports

  12. Student Results

  13. Data Snapshot

  14. Data Dictionary

  15. Response Variable

  16. Graphical Exploration of Relationships

  17. Exploring 1-Way Relationships • Categorical Variables • Race • Type of High School • Continuous Variables • HS GPA • SAT V • SAT M

  18. Exploring 1-Way Relationships • Categorical Variables • Race • Type of High School • Continuous Variables • HS GPA • SAT V • SAT M

  19. Many Relationships: Scatterplots (legend: Not Graduated, Graduated)

  20. Partitioning

  21. Partitioning Methods • Decision Tree • Bootstrap Forest • Boosted Tree (* Includes the First Term GPA)

  22. Decision Tree • First Term GPA • Race • SAT M

  23. Decision Tree Including First Term GPA

  24. Decision Tree Including First Term GPA

  25. Decision Tree Including First Term GPA

  26. Decision Tree • HS GPA • Race • SAT M • FT/PT

  27. Decision Tree Without First Term GPA

  28. Decision Tree Without First Term GPA

  29. Decision Tree Without First Term GPA

  30. Bootstrap Forest Including First Term GPA • First Term GPA • HS GPA • Age • SAT M • Race

  31. Bootstrap Forest Without First Term GPA • HS GPA • SAT V • SAT M • Race • Age

  32. Boosted Tree Including First Term GPA • First Term GPA • SAT M • Race • Age

  33. Boosted Tree Without First Term GPA • HS GPA • HS Type • SAT V • SAT M • Race

  34. Conclusions: Partition Models • After a student is enrolled, the best factor to track for intervention is First Term GPA • When accepting students, the following factors could indicate that a support system will be necessary to facilitate success: • High School GPA • SAT Math and Verbal Scores • Type of High School • Race, Age, and Gender of student

  35. Logistic Regression

  36. Logistic Regression Model

  37. Model Parameters

  38. ROC Curve

  39. Misclassification • Misclassification Rate = (46 + 182) / (29 + 182 + 46 + 603) = 228 / 860 = 26.5%
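The arithmetic on this slide can be checked with a short sketch. It assumes, as the formula implies, that 46 and 182 are the misclassified counts and that 29 and 603 are the correctly classified counts; the slide text does not say which count belongs to which class, so the code does not assign them. The function name is hypothetical.

```python
def misclassification_rate(misclassified, correct):
    """Share of all cases that the model classified incorrectly."""
    errors = sum(misclassified)
    total = errors + sum(correct)
    return errors / total


# Counts taken from the slide's formula; class labels are not given there.
rate = misclassification_rate(misclassified=(46, 182), correct=(29, 603))
print(f"{rate:.1%}")  # 26.5%
```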

  40. Logit Model

  41. Model Implications

  42. Model Scoring • Probability of graduating is: [formula not captured in this transcript] • where Lin[Grad] is: [formula not captured in this transcript]
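The slide's formulas did not survive in this transcript, so the following is only a sketch of how a logistic model is usually scored: a linear predictor (the slide's Lin[Grad]) is passed through the logistic function to obtain a probability. The intercept, coefficients, and predictor names below are hypothetical placeholders, not the fitted values from the presentation, and the sign convention assumes the linear predictor is the log odds of graduating.

```python
import math


def linear_predictor(intercept, coefficients, predictors):
    """Intercept plus the sum of coefficient * predictor value."""
    return intercept + sum(
        coefficients[name] * predictors[name] for name in coefficients
    )


def probability_from_linear_predictor(lin):
    """Logistic function, written to avoid overflow for large |lin|."""
    if lin >= 0:
        return 1.0 / (1.0 + math.exp(-lin))
    e = math.exp(lin)
    return e / (1.0 + e)


# Hypothetical values for illustration only (not the presentation's model).
coefficients = {"first_term_gpa": 1.0, "sat_m": 0.002}
student = {"first_term_gpa": 3.0, "sat_m": 500}
lin_grad = linear_predictor(-4.0, coefficients, student)
p_grad = probability_from_linear_predictor(lin_grad)
print(round(p_grad, 3))
```

A linear predictor of 0 corresponds to a probability of 0.5; positive values push the predicted probability of graduating above one half.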

  43. Logit (Before Entering)

  44. Model Scoring (Before Entering) • Probability of graduating is: [formula not captured in this transcript] • where Lin[Grad] 2 is: [formula not captured in this transcript]

  45. Model Implications (Before Entering)

  46. Conclusions: Logistic Regression • Before Entering: Biggest “odds enhancer” is HS GPA • After Entering: Biggest “odds enhancer” is First Term GPA • In either case, pay special attention to students with “odds” classifiers below 1 and low First Term GPAs
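To make the "odds" language concrete, here is a small sketch assuming the usual definitions: odds are p / (1 - p), so odds below 1 mean a predicted probability of graduating below 0.5, and a logistic regression coefficient converts to an odds ratio by exponentiation. The function names and sample numbers are hypothetical and are not taken from the fitted models.

```python
import math


def odds_from_probability(p):
    """Odds of an event with probability p (requires 0 <= p < 1)."""
    return p / (1.0 - p)


def odds_ratio(coefficient, delta=1.0):
    """Multiplicative change in odds when a predictor rises by delta."""
    return math.exp(coefficient * delta)


# Hypothetical numbers for illustration only.
print(odds_from_probability(0.75))  # 3.0: three times as likely to graduate as not
print(odds_from_probability(0.2) < 1)  # True: flagged as odds below 1
print(odds_ratio(0.0))  # 1.0: a zero coefficient leaves the odds unchanged
```

Under these definitions, an "odds enhancer" is a predictor whose odds ratio is greater than 1.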

  47. Study Conclusions • At admission, students who match the qualifiers identified in our models should be flagged as “high risk” • In the first semester, students with low GPAs should be identified and provided with help and mentoring

  48. Acknowledgements and References • Data Mining for Business Intelligence, 2nd ed., by Galit Shmueli, Nitin R. Patel, and Peter C. Bruce • We acknowledge the help of Kerrie Scott, Institutional Research Office
