Inferring Conceptual Knowledge from Unstructured Student Writing
Workshop: Personalizing Education with Machine Learning
Neural Information Processing Systems (NIPS) Conference, Lake Tahoe, CA, 8 December 2012
Norma C. Ming • Vivienne L. Ming
The role of assessment in instruction • Reveals what students already know and what they need to learn • Provides feedback to students and teachers on success of learning and instruction • Timely and specific feedback can guide continued instruction (formative assessment) Graphic from http://www.cmu.edu/teaching/assessment/basics/alignment.html
Challenges with assessment • Large-scale assessment: • Heavy on summative assessment • Standardized tests, academic analytics systems • Emphasize performance, not conceptual understanding • Delayed, coarse-grained feedback • Intrusive • Interrupt class to administer test • Modify instruction to adopt others’ materials • Alternatives: • Teachers may lack training in designing and interpreting other kinds of assessment • Difficult to aggregate, calibrate Printable sign available at http://www.pickens.k12.ga.us/assessment.html
Our goals • Use continuous, passive assessment to elucidate conceptual knowledge. • Wealth of unstructured data • Informal • Build on teachers’ existing instruction • Align with formal assessment, e.g.: • course grades • standardized tests • instructor qualitative assessment
Research questions • Can topic models of unstructured student writing predict course outcomes? • How does the accuracy of these predictions change over time as more student work is analyzed? • What does learning the topic hierarchy add beyond conventional topic modeling in improving these predictions?
Dataset & Methods • Online discussion forums • 5- or 6-week courses • ≥2 mandatory discussion questions per week • Introductory courses at large, for-profit university
Analytical approach • Outcome of interest: Student conceptual understanding • Proxy Outcome: Student course grade • Compare possible data “features”: • Baseline: • Mean course grade • Individual student posting characteristics: • Word count • Conventional Semantic Modeling: • Probabilistic Latent Semantic Analysis (pLSA) • Feature of Interest: • Hierarchical Latent Dirichlet Allocation (hLDA)
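A minimal sketch of how the compared feature sets could be built from per-student forum text. Assumptions: `student_posts` concatenates each student's posts into one string; scikit-learn has no pLSA, so its flat LDA model stands in here for the conventional topic-modeling features, and hLDA would require a separate implementation. This is an illustration, not the authors' code.

```python
# Sketch of building the compared feature sets from per-student forum text.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical toy data: one concatenated string per student.
student_posts = [
    "supply and demand curves shift when input prices change",
    "the market reaches equilibrium where supply equals demand",
    "i liked this week's reading about prices at the grocery store",
    "elasticity measures how quantity demanded responds to price",
]

# Baseline feature: posting length (word count) per student.
word_counts = np.array([[len(text.split())] for text in student_posts])

# Flat topic-model features: per-student topic proportions
# (the course analysis used a 100-topic space; 5 topics keeps the toy small).
vectorizer = CountVectorizer(stop_words="english")
counts = vectorizer.fit_transform(student_posts)
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topic_features = lda.fit_transform(counts)  # shape: (n_students, 5)
print(word_counts.ravel(), topic_features.shape)
```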
Algorithms • Proof of concept: • Logistic regression on the accumulated topic coefficients from each week • Other supervised algorithms (e.g., SVM) would likely predict better • LR chosen to isolate the contribution from hLDA (see sketch below) • Current work utilizes: • HCRF (hidden-state conditional random fields) • Improved weekly predictions • Allows forward prediction in course time
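A minimal sketch of the proof-of-concept setup: logistic regression fit on topic coefficients accumulated through each week, yielding one prediction per week. The arrays below are random stand-ins for the course data, and implementing "accumulated" as a running mean is an assumption about the aggregation step.

```python
# Sketch: per-week logistic regression on accumulated topic features.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n_students, n_weeks, n_topics = 200, 5, 20
weekly_topics = rng.random((n_students, n_weeks, n_topics))  # per-week topic features
passed = rng.integers(0, 2, size=n_students)                 # binary course outcome

for week in range(1, n_weeks + 1):
    X = weekly_topics[:, :week, :].mean(axis=1)  # features from weeks 1..week
    clf = LogisticRegression(max_iter=1000)
    accuracy = cross_val_score(clf, X, passed, cv=5).mean()
    print(f"Week {week}: cross-validated accuracy = {accuracy:.2f}")
```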
Results: Biology course • Prediction accuracy: • Word count > mean (for 3+ wks) • pLSA >> word count • hLDA > pLSA • With more data collected over time: • All predictions improve.
Results: Economics course • Prediction accuracy: • Word count > mean (for 2+ wks) • pLSA > word count • hLDA >> pLSA • With more data collected over time: • All predictions improve.
Topic modeling can distinguish the topics students discuss by final grade. • Each point represents posts by one student • Posts projected into 100-D pLSA concept space • Local linear embedding (LLE) used to reduce to 2-D (see sketch below) • Figure annotations: “C’s & D’s neglect these topics”; arrow indicating increasing final grades
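A sketch of the visualization step: embed each student's 100-D pLSA topic vector into 2-D with locally linear embedding and color points by final grade. The `topic_features` and `grades` arrays are random placeholders for the actual course data.

```python
# Sketch: 100-D topic space -> 2-D LLE projection, colored by grade.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import LocallyLinearEmbedding

rng = np.random.default_rng(0)
topic_features = rng.random((200, 100))   # one 100-D point per student (placeholder)
grades = rng.integers(0, 5, size=200)     # 0 = F ... 4 = A (placeholder)

lle = LocallyLinearEmbedding(n_neighbors=10, n_components=2)
embedded = lle.fit_transform(topic_features)

plt.scatter(embedded[:, 0], embedded[:, 1], c=grades, cmap="viridis")
plt.colorbar(label="final grade")
plt.title("Students in 2-D LLE projection of topic space")
plt.show()
```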
Comments by higher grade-earners reveal more structure. • Each point represents one post, color-coded by grade • D’s and below cluster in the center • Higher grades move in specific directions toward periphery • Directions may correspond to course structure or instructor’s guidance • Not just depth or specificity, but particular concepts
Structure corresponds to course topics. • Same points, color-coded by week • Different weeks on different branches • Low grades stay in center even when discussion topics invite more specific comments.
What does hierarchical modeling add? • Not all language is equal. • Conventional topic modeling treats all topics as equal (and independent). • Hierarchy implies ranking: • Shallower = more frequent and generic language • Deeper = more infrequent and technical language
Examining hLDA results (Econ) • Posts from students earning higher grades correlated with: • Higher mean topic depth in the hLDA tree (see sketch below) • C grades: most language at shallowest level • A, B grades: more language at deeper levels • More technically proficient language use • General language: more anecdotal comments • Specific language: greater conceptual depth
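A small sketch of the depth summary. It assumes an hLDA implementation that assigns each token in a post to a level of the topic tree (0 = root, i.e. the most generic language); the level lists and grades below are made-up examples, not results from the study.

```python
# Sketch: summarizing per-student mean hLDA topic depth by grade.
from collections import defaultdict
from statistics import mean

# (student_id, levels assigned to the tokens of one post) -- hypothetical data.
post_token_levels = [
    ("s1", [0, 1, 2, 2, 2]),   # mostly deep, technical topics
    ("s1", [0, 0, 1, 2, 2]),
    ("s2", [0, 0, 0, 1, 0]),   # mostly shallow, generic topics
]
grades = {"s1": "A", "s2": "C"}

levels_by_student = defaultdict(list)
for student, levels in post_token_levels:
    levels_by_student[student].extend(levels)

for student, levels in levels_by_student.items():
    print(f"{student} (grade {grades[student]}): mean depth = {mean(levels):.2f}")
```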
Summary of results • Can topic models of unstructured student writing predict course outcomes? • YES – pLSA and hLDA both predict better than chance (and better than post length). • How does the accuracy of these predictions change over time as more student work is analyzed? • Additional weeks of data improve predictions. • By the end of the course, pLSA predictions are within one letter grade. • What does learning the topic hierarchy add beyond conventional topic modeling in improving these predictions? • hLDA > pLSA • Higher grades are associated with discussion of deeper topics in hLDA.
Conclusions and Future Work • Some collection of topics is associated with higher grades (and another collection with lower grades). • Deeper topics associated with low vs. high grades could differ; this analysis remains to be done. • e.g., deep misconceptions such as inheriting acquired traits (Lamarckian evolution) • Next steps: • Create topic map: • Hierarchical relationships • Normative sources (e.g., textbook, exemplary student work) • Labeled, non-normative sources (common misconceptions)
Implications • Extensions to other text data: • Essays, short-answer test questions • Online tutoring • Informal learning environments (e.g., Quora, Evernote) • Annotations on e-texts • Wiki contributions • Language mediates learning; text is everywhere. Learn from it, improve it.