Feature Engineering Studio Special Session September 11, 2013
Assignment One • Problem Proposal • Due next Monday • Be ready to talk for 3 minutes on: • A data set • Where it came from and how big it is • You need to already have this data set, or be able to acquire it within the next two weeks • A prediction model you will build on this data set • What variable will you predict? • What kind of variables will you use to predict it? • Why is this worth doing?
Example (Pardos et al., 2013) • Data set • ASSISTments, a formative assessment and learning system for math used by 60,000 students a year (Razzaq et al., 2007) • 810,000 data points from the 229 students studied • Student actions in the software have been overlaid with synchronized field codes of student affect (boredom, frustration, etc.) • 3,075 field codes • Each field code is linked to 20 seconds of log file actions
Example (Pardos et al., 2013) • We will predict whether a student is bored at a specific time • So that we can replicate the human judgments without needing a field observer • We will predict this from what was going on in the log files at the time the field observation was made • We know every student action’s correctness, timing, and relevant skill, and the probability that the student knew that skill
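To make "what was going on in the log files" concrete, here is a minimal sketch of distilling features from the clip of actions tied to one field observation. It assumes each field code is matched to the 20 seconds of actions ending at the observation timestamp, and the `Action` record, `clip_features` function, and the particular summary features (percent correct, mean duration, mean knowledge estimate, number of distinct skills) are hypothetical illustrations, not the features used by Pardos et al.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Action:
    # Hypothetical log record; fields mirror what the slide says we know.
    time: float       # seconds since session start
    correct: bool     # was the action correct?
    duration: float   # seconds the student spent on the action
    skill: str        # skill the action exercises
    p_known: float    # estimated probability the student knew the skill


def clip_features(actions, obs_time, window=20.0):
    """Summarize the actions in the `window` seconds ending at a field observation.

    Assumption: the clip is [obs_time - window, obs_time]; other alignments
    (e.g. a window starting at the observation) are equally plausible.
    """
    clip = [a for a in actions if obs_time - window <= a.time <= obs_time]
    if not clip:
        # No activity in the window: return neutral values.
        return {"n_actions": 0, "pct_correct": 0.0, "mean_duration": 0.0,
                "mean_p_known": 0.0, "n_skills": 0}
    return {
        "n_actions": len(clip),
        "pct_correct": sum(a.correct for a in clip) / len(clip),
        "mean_duration": mean(a.duration for a in clip),
        "mean_p_known": mean(a.p_known for a in clip),
        "n_skills": len({a.skill for a in clip}),
    }
```

Each field code then becomes one row: the feature dictionary plus the observer's label (bored or not), which is the training data for the detector.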
Example (Pardos et al., 2013) • This is worth doing because boredom is known to predict student learning (Craig et al., 2004; Rodrigo et al., 2009; Pekrun et al., 2010) • And building a detector will help us study boredom more thoroughly • As well as enabling us to intervene on boredom in real time
Important Considerations • Is the problem genuinely important? (usable or publishable) • Is there a good measure of ground truth? (the variable you want to predict) • Do we have rich enough data to distill meaningful features? • Is there enough data to be able to take advantage of data mining?
What concerns you? • Data set • What variable will you predict? • What kind of variables will you use to predict it?
Data Set • Who here has a data set, but has concerns about it? • Who here doesn’t have a data set?
Potential Data Sources • ASSISTments (Neil Heffernan) • Genetics Tutor (Albert Corbett) • Inq-ITS (Janice Gobert) • Impulse (Elizabeth Rowe) • Refraction (Taylor Martin) • Grade data (Alex Bowers) • Mathemantics (Herb Ginsburg) • Vialogues (Gary Natriello, Hui Soo Chae) • STEPS (Chuck Kinzer, JoAnne Kleifgen) • Project LISTEN (Jack Mostow) • Chemistry Virtual Laboratory (David Yaron)
Potential Data Sources • Community College Course Data (CCRC) • International use of Scatterplot Tutor (me) • Zombie Division (Jake Habgood) • Virtual Performance Assessments (Jody Clarke-Midura) • EcoMUVE (Shari Metcalfe) • Reasoning Mind (George Khachatryan) • Newton’s Playground (Valerie Shute) • TC3-Sim (Robert Sottilare) • Course-taking and dropout data (Cristobal Romero)
Potential Data Sources • SQL-Tutor (Tanja Mitrovic) • Project ARIES (Art Graesser) • Ecolab (Genaro Rebolledo-Mendez) • Fractions Tutor (Vincent Aleven) • Help Tutor (Ido Roll) • InventionLab (Ido Roll) • BlueJ (Matt Jadud) • Aplusix (Jean-Francois Nicaud) • Second Life (Bruce Homer)
Procedure • Pick a data set • If I have it on hand, we talk right away • If not, I broker a conversation
What variable will you predict? • Something already directly labeled • Student was bored at 2:10:13 pm • Something indirectly labeled • Student had 15% overall learning gain • Something you can label with text replays • Student gamed the system while using the learning system
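For the third case, a text replay shows a human coder a short stretch of a student's log as readable text, and the coder assigns a label such as "gaming the system." A minimal sketch of preparing such clips follows. The `LogAction` record, the `make_clips` and `render_clip` helpers, the 20-second clip length, and the output format are all hypothetical choices for illustration, not a specific text replay tool.

```python
from dataclasses import dataclass


@dataclass
class LogAction:
    # Hypothetical log record for one student action.
    time: float    # seconds since session start
    problem: str   # problem identifier
    answer: str    # what the student entered
    correct: bool  # whether the answer was correct


def make_clips(actions, clip_length=20.0):
    """Split one student's actions into consecutive clips of at most `clip_length` seconds.

    A clip starts at its first action; any action at or beyond
    start + clip_length begins a new clip.
    """
    clips = []  # list of (start_time, [actions])
    for a in sorted(actions, key=lambda x: x.time):
        if not clips or a.time >= clips[-1][0] + clip_length:
            clips.append((a.time, []))
        clips[-1][1].append(a)
    return [acts for _, acts in clips]


def render_clip(clip):
    """Format a clip as plain text, one action per line, for a human coder."""
    start = clip[0].time
    lines = []
    for a in clip:
        mark = "RIGHT" if a.correct else "WRONG"
        # Time is shown relative to the start of the clip.
        lines.append(f"+{a.time - start:5.1f}s  {a.problem}: '{a.answer}' {mark}")
    return "\n".join(lines)
```

A coder reading several rapid wrong answers on the same problem, as in the first clip of a log like the one in the test below, might label that clip as gaming; those labels then become the predicted variable.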
What kind of variables will you use as predictors? • You don’t need to have specific ideas at this stage • The main question is, do you have the right kind of data to be able to do this at all?