Data Annotation for Classification

Prediction • Develop a model which can infer a single aspect of the data (predicted variable) from some combination of other aspects of the data (predictor variables) • Which students are off-task? • Which students will fail the class?

Classification • Develop a model which can infer a categorical predicted variable from some combination of other aspects of the data • Which students will fail the class? • Is the student currently gaming the system? • Which type of gaming the system is occurring?

We will… • We will go into detail on classification methods tomorrow

In order to use prediction methods • We need to know what we’re trying to predict • And we need to have some labels of it in real data

For example… • If we want to predict whether a student using educational software is off-task, or gaming the system, or bored, or frustrated, or going to fail the class… • We need to first collect some data • And within that data, we need to be able to identify which students are off-task (or the construct of interest), and ideally when

So we need to label some data • We need to obtain outside knowledge to determine what the value is for the construct of interest

In some cases • We can get a gold-standard label • For instance, if we want to know if a student passed a class, we just go ask their instructor

But for behavioral constructs… • There’s no one to ask • We can’t ask the student (self-presentation) • There’s no gold-standard metric • So we use data labeling methods or observation methods • (e.g. quantitative field observations, video coding) • To collect bronze-standard labels • Not perfect, but good enough

One such labeling method • Text replay coding

Text replays • Pretty-prints of student interaction behavior from the logs

Examples

Sampling • You can set up any sampling schema you want, if you have enough log data • 5 action sequences • 20 second sequences • Every behavior on a specific skill, but other skills omitted
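
The first two schemes above (5-action sequences and 20-second sequences) can be sketched in a few lines of Python. This is only an illustration: the log format (a time-ordered list of `(timestamp_seconds, action)` tuples for one student) and the function names are assumptions, not the format used by the text replay software.

```python
from typing import Dict, List, Tuple

# Hypothetical log entry: (timestamp in seconds, action description)
Action = Tuple[float, str]


def clips_by_action_count(log: List[Action], n: int = 5) -> List[List[Action]]:
    """Split a log into consecutive clips of n actions each.

    The final clip may be shorter if the log length is not a multiple of n.
    """
    return [log[i:i + n] for i in range(0, len(log), n)]


def clips_by_duration(log: List[Action], seconds: float = 20.0) -> List[List[Action]]:
    """Split a time-ordered log into clips covering fixed windows of `seconds`.

    Windows are measured from the first action's timestamp; empty windows are skipped.
    """
    if not log:
        return []
    start = log[0][0]
    windows: Dict[int, List[Action]] = {}
    for timestamp, action in log:
        index = int((timestamp - start) // seconds)
        windows.setdefault(index, []).append((timestamp, action))
    return [windows[i] for i in sorted(windows)]


# Toy log: 7 actions over 45 seconds
log = [(0, "attempt"), (4, "hint"), (9, "attempt"), (21, "attempt"),
       (25, "hint"), (41, "attempt"), (45, "attempt")]
print([len(c) for c in clips_by_action_count(log)])  # [5, 2]
print([len(c) for c in clips_by_duration(log)])      # [3, 2, 2]
```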

Sampling • Equal number of observations per lesson • Equal number of observations per student • Observations that machine learning software needs help to categorize (“biased sampling”)
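
As a rough sketch of "equal number of observations per student", the snippet below draws the same number of clips from each student at random. The clip representation (a `(student_id, clip_index)` tuple) and the function name are hypothetical; the same idea applies per lesson by grouping on a lesson identifier instead.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical clip identifier: (student id, clip index)
Clip = Tuple[str, int]


def sample_equal_per_student(clips: List[Clip], k: int, seed: int = 0) -> List[Clip]:
    """Draw up to k clips at random from each student, without replacement."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    by_student: Dict[str, List[Clip]] = defaultdict(list)
    for clip in clips:
        by_student[clip[0]].append(clip)
    sample: List[Clip] = []
    for student in sorted(by_student):
        pool = by_student[student]
        # A student with fewer than k clips contributes all of them
        sample.extend(rng.sample(pool, min(k, len(pool))))
    return sample


# Toy data: student "s1" has 10 clips, "s2" has only 3
clips = [("s1", i) for i in range(10)] + [("s2", i) for i in range(3)]
sample = sample_equal_per_student(clips, k=3)
print(sample)
```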

Major Advantages • Both video and field observations hold some risk of observer effects • Text replays are based on logs that were collected completely unobtrusively

Major Advantages • Blazing fast to conduct • 8 to 40 seconds per observation

Notes • Decent inter-rater reliability is possible (Baker, Corbett, & Wagner, 2006) (Baker, Mitrovic, & Mathews, 2010) (Sao Pedro et al., 2010) (Montalvo et al., 2010) • Agree with other measures of constructs (Baker, Corbett, & Wagner, 2006) • Can be used to train machine-learned detectors (Baker & de Carvalho, 2008) (Baker, Mitrovic, & Mathews, 2010) (Sao Pedro et al., 2010)

Major Limitations • Limited range of constructs you can code • Gaming the System – yes • Collaboration in online chat – yes (Prata et al., 2008) • Frustration, Boredom – sometimes • Off-Task Behavior outside of software – no • Collaborative Behavior outside of software – no

Major Limitations • Lower precision (because lower bandwidth of observation)

Hands-on exercise

Find a partner • Could be your project team-mate, but doesn’t have to be • You will do this exercise with them

Get a copy of the text replay software • On your flash drive • Or at http://www.joazeirodebaker.net/algebra-obspackage-LSRM.zip

Skim the instructions • At Instructions-LSRM.docx

Log into text replay software • Using exploratory login • Try to figure out what the student’s behavior means, with your partner • Do this for ~5 minutes

Now pick a category you want to code • With your partner

Now code data • According to your coding scheme • (is-category versus is-not-category) • Separate from your partner • For 20 minutes

Now put your data together • Using the observations-NAME files you obtained • Make a table (in Excel?) showing

Now… • We can compute your inter-rater reliability… (also called agreement)

Agreement/Accuracy • The easiest measure of inter-rater reliability is agreement, also called accuracy • $\text{Agreement} = \dfrac{\text{\# of agreements}}{\text{total number of codes}}$
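
A minimal sketch of this calculation in Python, assuming each coder's labels are stored as a list in the same clip order. The code values ("G" for gaming, "N" for not gaming) are made up for illustration.

```python
from typing import Sequence


def percent_agreement(coder1: Sequence[str], coder2: Sequence[str]) -> float:
    """Fraction of clips on which the two coders gave the same code."""
    if len(coder1) != len(coder2):
        raise ValueError("Both coders must label the same clips")
    if not coder1:
        raise ValueError("No codes to compare")
    agreements = sum(a == b for a, b in zip(coder1, coder2))
    return agreements / len(coder1)


# Hypothetical codes for five clips ("G" = gaming, "N" = not gaming)
coder1 = ["G", "N", "N", "G", "N"]
coder2 = ["G", "N", "G", "G", "N"]
print(percent_agreement(coder1, coder2))  # 0.8 (4 agreements out of 5 codes)
```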

Agreement/Accuracy • There is general agreement across fields that agreement/accuracy is not a good metric • What are some drawbacks of agreement/accuracy?

Agreement/Accuracy • Let’s say that Tasha and Uniqua agreed on the classification of 9200 time sequences, out of 10000 time sequences • For a coding scheme with two codes • 92% accuracy • Good, right?

Non-even assignment to categories • Percent Agreement does poorly when there is non-even assignment to categories • Which is almost always the case • Imagine an extreme case • Uniqua (correctly) picks category A 92% of the time • Tasha always picks category A • Agreement/accuracy of 92% • But essentially no information
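
The extreme case above is easy to reproduce. The sketch below uses 100 made-up clips (the slide gives only the percentages) and shows that agreement is 92% even though Tasha never identifies a single category B clip.

```python
# 100 hypothetical clips: Uniqua codes 92 as "A" and 8 as "B";
# Tasha codes every clip as "A".
uniqua = ["A"] * 92 + ["B"] * 8
tasha = ["A"] * 100

agreement = sum(u == t for u, t in zip(uniqua, tasha)) / len(uniqua)
print(agreement)  # 0.92

# Tasha's codes never vary, so they cannot distinguish one clip from another.
print(set(tasha))  # {'A'}

# Number of "B" clips that both coders identified: none.
b_found = sum(u == "B" and t == "B" for u, t in zip(uniqua, tasha))
print(b_found)  # 0
```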

An alternate metric • Kappa • $\kappa = \dfrac{\text{Agreement} - \text{Expected Agreement}}{1 - \text{Expected Agreement}}$

Kappa • Expected agreement computed from a table of the form • Note that Kappa can be calculated for any number of categories (but only 2 raters)
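
One way to build such a table from two coders' labels is to count every (coder 1 code, coder 2 code) pair. This is a sketch under assumptions: the labels are plain lists in the same clip order, and the function name and the "on"/"off" codes are hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def agreement_table(coder1: Sequence[str],
                    coder2: Sequence[str]) -> Dict[Tuple[str, str], int]:
    """Count clips for every (coder1 code, coder2 code) pair, including zeros."""
    if len(coder1) != len(coder2):
        raise ValueError("Both coders must label the same clips")
    counts = Counter(zip(coder1, coder2))
    categories = sorted(set(coder1) | set(coder2))
    return {(a, b): counts[(a, b)] for a in categories for b in categories}


# Hypothetical codes for six clips
coder1 = ["on", "on", "off", "on", "off", "on"]
coder2 = ["on", "off", "off", "on", "off", "on"]
table = agreement_table(coder1, coder2)
for (code1, code2), n in table.items():
    print(f"coder1={code1:<3} coder2={code2:<3} count={n}")
```

The diagonal cells (same code from both coders) are the agreements; the row and column totals give each coder's proportion of labels per category.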

Cohen’s (1960) Kappa • The formula for 2 categories • Fleiss’s (1971) Kappa, which is more complex, can be used for 3+ categories • I have an Excel spreadsheet which calculates multi-category Kappa, which I would be happy to share with you

Expected agreement • Look at the proportion of labels each coder gave to each category • To find the proportion of agreement on category A that could be expected by chance, multiply pct(coder1/categoryA)*pct(coder2/categoryA) • Do the same thing for categoryB • Add these two values together • This is your expected agreement
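
The procedure above, combined with the Kappa formula, can be sketched as follows. The code works with proportions, so the sum of the per-category products is already the expected agreement. Function names are illustrative, and the example data is the extreme Tasha/Uniqua case scaled to 100 made-up clips.

```python
from collections import Counter
from typing import Sequence


def expected_agreement(coder1: Sequence[str], coder2: Sequence[str]) -> float:
    """Chance agreement: sum over categories of pct(coder1) * pct(coder2)."""
    n = len(coder1)
    counts1 = Counter(coder1)
    counts2 = Counter(coder2)
    categories = set(counts1) | set(counts2)
    return sum((counts1[c] / n) * (counts2[c] / n) for c in categories)


def cohens_kappa(coder1: Sequence[str], coder2: Sequence[str]) -> float:
    """Kappa = (agreement - expected agreement) / (1 - expected agreement)."""
    if len(coder1) != len(coder2) or not coder1:
        raise ValueError("Both coders must label the same non-empty set of clips")
    observed = sum(a == b for a, b in zip(coder1, coder2)) / len(coder1)
    expected = expected_agreement(coder1, coder2)
    if expected == 1.0:
        raise ValueError("Kappa is undefined when expected agreement is 1")
    return (observed - expected) / (1 - expected)


# Extreme case: Uniqua picks "A" 92% of the time, Tasha always picks "A"
uniqua = ["A"] * 92 + ["B"] * 8
tasha = ["A"] * 100
print(expected_agreement(uniqua, tasha))  # 0.92, the same as the raw agreement
print(cohens_kappa(uniqua, tasha))        # 0.0, i.e. no better than chance
```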

Example

Example • What is the percent agreement? • 80%

Example • What is Tyrone’s expected frequency for on-task? • 75%

Example • What is Pablo’s expected frequency for on-task? • 65%
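
The three numbers above are enough to finish the Kappa calculation, as the sketch below shows. It assumes the coding scheme has exactly two categories (on-task and one other, called off-task here), so each coder's off-task proportion is one minus their on-task proportion.

```python
# Values given in the example
agreement = 0.80   # percent agreement
tyrone_on = 0.75   # Tyrone's proportion of on-task labels
pablo_on = 0.65    # Pablo's proportion of on-task labels

# Assumption: only two categories, so the remainder is off-task
tyrone_off = 1 - tyrone_on
pablo_off = 1 - pablo_on

expected_on = tyrone_on * pablo_on      # chance agreement on on-task
expected_off = tyrone_off * pablo_off   # chance agreement on off-task
expected = expected_on + expected_off   # total expected agreement

kappa = (agreement - expected) / (1 - expected)

print(round(expected_on, 4))   # 0.4875
print(round(expected_off, 4))  # 0.0875
print(round(expected, 4))      # 0.575
print(round(kappa, 3))         # 0.529
```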
