LightSIDE Tutorial. Carolyn Penstein Rosé, Language Technologies Institute / Human-Computer Interaction Institute.
LightSIDE Tutorial Carolyn Penstein Rosé Language Technologies Institute / Human-Computer Interaction Institute
What is machine learning? • Automatically or semi-automatically • Inducing rules from data • Making predictions [Diagram labels: Data, Learning Algorithm, Model, New Data, Classification Engine, Prediction]
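The flow named on this slide (data goes into a learning algorithm, which produces a model; a classification engine applies the model to new data to make a prediction) can be sketched in a few lines. This is only a toy illustration, not how LightSIDE works internally: the "learning algorithm" here is a hypothetical word-voting rule learner, and the training sentences are made up for the example.

```python
from collections import Counter, defaultdict

def learn(examples):
    """Learning algorithm: induce one rule per word (word -> its most frequent label)."""
    counts = defaultdict(Counter)
    label_totals = Counter()
    for text, label in examples:
        label_totals[label] += 1
        for word in set(text.lower().split()):
            counts[word][label] += 1
    rules = {word: c.most_common(1)[0][0] for word, c in counts.items()}
    default = label_totals.most_common(1)[0][0]  # fallback for unseen words
    return {"rules": rules, "default": default}

def classify(model, text):
    """Classification engine: let each known word vote using its induced rule."""
    votes = Counter(
        model["rules"][w] for w in text.lower().split() if w in model["rules"]
    )
    return votes.most_common(1)[0][0] if votes else model["default"]

# Hypothetical training data (not from the tutorial).
data = [
    ("what time is it", "question"),
    ("where do you live", "question"),
    ("it is late", "statement"),
    ("i live here", "statement"),
    ("cows eat grass", "statement"),
]

model = learn(data)                              # Data -> Learning Algorithm -> Model
prediction = classify(model, "what do you eat")  # New Data -> Prediction
```

The induced rules are only as good as the representation they are built on, which is the point the following slides make.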
Automatic Analysis of Conversation • Positive Learning Outcomes • Conversational Interventions
Effective data representations make problems learnable… • Machine learning isn’t magic • But it can be useful for identifying meaningful patterns in your data when used properly • Proper use requires insight into your data
SouFLé Framework (Howley et al., 2013) What properties of discourse are important for learning discussions? [Diagram labels: Person, Person; Transactive Knowledge Integration; Engagement; Authority]
Definition of Transactivity • building on an idea expressed earlier in a conversation • using a reasoning statement
Example: “I think the tube will get heavier because water is going in.” Reply: “That’s true, but the important point is that water can flow in, but starch can’t flow out.”
Transactivity (Berkowitz & Gibbs, 1983)
• Findings
  • Moderating effect on learning (Joshi & Rosé, 2007; Russell, 2005; Kruger & Tomasello, 1986; Teasley, 1995)
  • Moderating effect on knowledge sharing in working groups (Gweon et al., 2011)
• Computational Work
  • Can be automatically detected in:
    • Threaded group discussions (Kappa .69) (Rosé et al., 2008)
    • Transcribed classroom discussions (Kappa .69) (Ai et al., 2010)
    • Speech from dyadic discussions (R = .37) (Gweon et al., 2012)
  • Predictable from a measure of speech style accommodation computed by an unsupervised Dynamic Bayesian Network (Jain et al., 2012)
Identifying Transactivity in Threaded Discussions • Social modes of co-construction (Weinberger & Fischer, 2006) • To what degree or in what ways learners refer to the contributions of their learning partners • TagHelper tools achieves reliability of .69 Kappa (Rosé et al., 2008)
Example thread:
AUTHOR: Gerry
>Michael blames his poor achievements on a lack of giftedness in mathematics. From…
-------------------------
Wow, that was a really good work. Right on!
-------------------------
From the case I could not however directly conclude that Michael thinks the task is too difficult for him. Instead I thought Michael thinks that he is too dumb for mathematics.
-------------------------
Therefore, I did not include something about that in my contribution.
AUTHOR: Hans
Michael blames his poor achievements on a lack of giftedness in mathematics.
-------------------------
From this one can conclude that his attribution is internal and stable. Internal because it comes from within himself. And stable because it is something that can't be changed.
Thread Structure Features • depth (numeric): the depth in the thread where a message appears • parent_child_similarity (numeric): semantic similarity (cosine similarity) between the current message segment and each of its parent message segments; the highest value is chosen
Remember! Effective data representations make problems learnable… Know your data!!
Essential Reading • Witten, I. H., Frank, E., Hall, M. (2011). Data Mining: Practical Machine Learning Tools and Techniques, third edition, Elsevier: San Francisco
Automated Discourse Analysis
• Howley, I., Mayfield, E. & Rosé, C. P. (2013). Linguistic Analysis Methods for Studying Small Groups, in Cindy Hmelo-Silver, Angela O’Donnell, Carol Chan, & Clark Chinn (Eds.) International Handbook of Collaborative Learning, Taylor and Francis, Inc.
• Rosé, C. P., Wang, Y.C., Cui, Y., Arguello, J., Stegmann, K., Weinberger, A., Fischer, F. (2008). Analyzing Collaborative Learning Processes Automatically: Exploiting the Advances of Computational Linguistics in Computer-Supported Collaborative Learning, International Journal of Computer Supported Collaborative Learning 3(3), pp 237-271.
• Mu, J., Stegmann, K., Mayfield, E., Rosé, C. P., Fischer, F. (2012). The ACODEA Framework: Developing Segmentation and Classification Schemes for Fully Automatic Analysis of Online Discussions, International Journal of Computer Supported Collaborative Learning 7(2), pp 285-305.
• Gweon, G., Jain, M., McDonough, J., Raj, B., Rosé, C. P. (2013). Measuring Prevalence of Other-Oriented Transactive Contributions Using an Automated Measure of Speech Style Accommodation, International Journal of Computer Supported Collaborative Learning 8(2), pp 245-265.
Applications to Learning Sciences Research • Howley, I., Kumar, R., Mayfield, E., Dyke, G., & Rosé, C. P. (2013). Gaining Insights from Sociolinguistic Style Analysis for Redesign of Conversational Agent Based Support for Collaborative Learning, in Suthers, D., Lund, K., Rosé, C. P., Teplovs, C., Law, N. (Eds.). Productive Multivocality in the Analysis of Group Interactions, edited volume, Springer. • Howley, I., Mayfield, E., Rosé, C. P., & Strijbos, J. W. (2013). A Multivocal Process Analysis of Social Positioning in Study Group Interactions, in Suthers, D., Lund, K., Rosé, C. P., Teplovs, C., Law, N. (Eds.). Productive Multivocality in the Analysis of Group Interactions, edited volume, Springer. • Adamson, D., Dyke, G., Jang, H. J., Rosé, C. P. (2014). Towards an Agile Approach to Adapting Dynamic Collaboration Support to Student Needs, International Journal of AI in Education 24(1), pp91-121.
Consider this simple example… Look for what distinguishes Questions and Statements in this dataset. What clues do you see?
Not all questions end in a question mark. What are good features for text categorization? What distinguishes Questions and Statements?
I versus you is not a reliable predictor What are good features for text categorization? What distinguishes Questions and Statements?
Not all WH words occur in questions What are good features for text categorization? What distinguishes Questions and Statements?
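The three candidate clues discussed on these slides (a final question mark, a WH word, "you" versus "I") can be written as simple feature functions to see how each one misfires. The dataset shown on the original slides is not preserved here, so the sentences below are hypothetical examples chosen only to illustrate each failure case, and the `WH_WORDS` list is an assumed, incomplete set.

```python
import re

# Assumed list of WH words for illustration; not taken from the tutorial.
WH_WORDS = {"who", "what", "when", "where", "why", "which", "how"}

def clues(sentence):
    """Extract three surface clues that might separate questions from statements."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return {
        "ends_with_question_mark": sentence.strip().endswith("?"),
        "contains_wh_word": any(t in WH_WORDS for t in tokens),
        "contains_you": "you" in tokens,
    }

# Hypothetical sentences showing where each clue breaks down.
question_without_mark = clues("What time is it")        # question, but no "?"
statement_with_wh = clues("I know what you mean.")      # statement with a WH word
statement_with_you = clues("You are late.")             # statement containing "you"
```

No single clue is reliable on its own, which is why the representation usually combines many weak features and lets the learning algorithm weigh them.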
Basic Idea: Represent text as a vector where each position corresponds to a term. This is called the “bag of words” approach.
Vector positions: Cheese, Cows, Eat, Hamsters, Make, Seeds
“Cows make cheese.” → 110010
“Hamsters eat seeds.” → 001101
But “Cheese makes cows.” gets the same representation as “Cows make cheese.”!
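A minimal sketch of the bag-of-words idea, reproducing the two vectors from the slide. One assumption is needed to match the slide's claim about “Cheese makes cows.”: word forms must be normalized so that "makes" and "make" map to the same term. The crude rule used here (strip a trailing "s" from words longer than three letters) is a stand-in for real stemming and is not what any particular toolkit does.

```python
import re

def tokenize(text):
    """Lowercase, keep alphabetic tokens, and crudely strip a plural/verb 's'."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t[:-1] if len(t) > 3 and t.endswith("s") else t for t in tokens]

def build_vocabulary(documents):
    """One vector position per distinct term, in alphabetical order."""
    return sorted({tok for doc in documents for tok in tokenize(doc)})

def to_vector(text, vocabulary):
    """Binary vector: 1 if the term occurs in the text, 0 otherwise."""
    present = set(tokenize(text))
    return [1 if term in present else 0 for term in vocabulary]

docs = ["Cows make cheese.", "Hamsters eat seeds."]
vocab = build_vocabulary(docs)  # cheese, cow, eat, hamster, make, seed

cows_vector = to_vector("Cows make cheese.", vocab)
hamsters_vector = to_vector("Hamsters eat seeds.", vocab)
reordered_vector = to_vector("Cheese makes cows.", vocab)
```

Because the vector records only which terms occur and not their order, `reordered_vector` equals `cows_vector`, which is exactly the limitation the slide points out.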
Examples from Gallup Poll Data • Male from Virginia, age 30, negative: “I think it’ll increase costs for everyone.” • Female from Illinois, unknown age, positive: “Because the cost of healthcare is just outta sight crazy” • Male from Michigan, age 70, positive: “the cost”
Basic Types of Features “Because the cost of healthcare is just outta sight crazy”
Basic Types of Features “the cost of healthcare” → the/DT cost/NN of/IN healthcare/NN
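The slide images that named the individual feature types are not preserved, so the sketch below assumes the usual basic set for text classification: word unigrams, word bigrams, and part-of-speech bigrams. The tags are assigned by hand using the Penn Treebank tagset (with "of" tagged IN, the preposition tag) rather than produced by a tagger, and the underscore-joined feature names are just a convention chosen for this example.

```python
def ngrams(items, n):
    """All contiguous runs of n items, joined with underscores."""
    return ["_".join(items[i:i + n]) for i in range(len(items) - n + 1)]

# Phrase from the slide with hand-assigned Penn Treebank tags.
words = ["the", "cost", "of", "healthcare"]
tags = ["DT", "NN", "IN", "NN"]

unigrams = ngrams(words, 1)     # single words
bigrams = ngrams(words, 2)      # adjacent word pairs
pos_bigrams = ngrams(tags, 2)   # adjacent tag pairs
```

Bigrams recover some of the word order that a plain bag of words discards, and POS bigrams generalize over specific words by describing only the grammatical pattern.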
Part of Speech Tagging http://www.comp.leeds.ac.uk/ccalas/tagsets/upenn.html
1. CC Coordinating conjunction
2. CD Cardinal number
3. DT Determiner
4. EX Existential there
5. FW Foreign word
6. IN Preposition/subordinating conjunction
7. JJ Adjective
8. JJR Adjective, comparative
9. JJS Adjective, superlative
10. LS List item marker
11. MD Modal
12. NN Noun, singular or mass
13. NNS Noun, plural
14. NNP Proper noun, singular
15. NNPS Proper noun, plural
16. PDT Predeterminer
17. POS Possessive ending
18. PRP Personal pronoun
19. PRP$ Possessive pronoun
20. RB Adverb
21. RBR Adverb, comparative
22. RBS Adverb, superlative
23. RP Particle
24. SYM Symbol
25. TO to
26. UH Interjection
27. VB Verb, base form
28. VBD Verb, past tense
29. VBG Verb, gerund/present participle
30. VBN Verb, past participle
31. VBP Verb, non-3rd ps. sing. present
32. VBZ Verb, 3rd ps. sing. present
33. WDT wh-determiner
34. WP wh-pronoun
35. WP$ Possessive wh-pronoun
36. WRB wh-adverb