Active Learning to Classify Email 4/22/05
What’s the problem? • How will I ever sort all these new emails?
What’s the problem? • To get an idea of what mail I have gotten, I will need to sort these new messages. • Ideally, I could sort just a few and my computer would sort the rest for me. • To make it as accurate as possible, the assistant could even pick which messages I should manually sort, so that it learns to do the best job possible. (Active Learning)
What’s the solution? • To solve this problem, we need a way to choose the most informative training examples. • This requires some way of sorting emails by how informative they are for classification.
Email Classification • So, what do we know about email classification? • SVM and Naïve Bayes significantly outperform many other methods (Brutlag 2000, Kiritchenko 2001) • Both SVM and Naïve Bayes are suitable for “online” learning required for solving this problem effectively. (Cauwenberghs 2000) • Classifier accuracy varies more between users than between algorithms. (Kiritchenko 2001) • SVM performs better for users with more email in each folder. (Brutlag 2000) • Users with more email, such as in our example problem, tend to have more email in each folder than other users. (Klimt 2004) • Thus, we have chosen SVM as the basis for this research.
“Bag-of-Words” Model • [Diagram: email data → “bag of words” → SVM → classification decision]
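The bag-of-words step can be sketched in a few lines of Python; `bag_of_words` is an illustrative helper of ours, not code from the talk, and real systems would also strip punctuation and stop words:

```python
from collections import Counter

def bag_of_words(email_text):
    """Map raw email text to word-count features; word order is discarded."""
    return Counter(email_text.lower().split())

features = bag_of_words("meeting moved to friday friday works")
```

Each email becomes a sparse vector of word counts, which is the input representation the SVM trains on.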
Multiple SVMs • Using separate SVMs for each section • [Diagram labels: email data, SVMs, LLSF, classification decision]
Active Learning with SVM • In general, examples closer to the decision boundary hyperplane will cause larger displacement of that boundary. (Schohn and Cohn 2000, Tong 2001)
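This selection rule can be sketched for a linear SVM with weights `w` and bias `b` (the function name and signature are our own, for illustration); dividing by the norm of `w` is omitted since it does not change the ranking:

```python
def most_informative(unlabeled, w, b):
    """Pick the unlabeled example closest to the hyperplane w.x + b = 0,
    i.e. the one the current SVM is least certain about."""
    def distance(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    return min(unlabeled, key=distance)

picked = most_informative([[3.0, 0.0], [0.2, 5.0], [-1.0, 1.0]], [1.0, 0.0], 0.0)
```

With `w = [1, 0]` the decision values are 3.0, 0.2, and 1.0, so the example at distance 0.2 is chosen for labeling.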
What if our prediction is right? • [Figures: labeling the closer example vs. labeling the farther example]
And if our prediction is wrong? • [Figures: picking the closer example vs. picking the farther example]
Incorporating Diversity • In this example, the instance near the top is intuitively more likely to be informative. • This is known as “diversity” (Brinker 2003).
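Brinker's angle-diversity idea can be sketched as follows: prefer candidates near the boundary, but penalize those that point in the same direction as examples already selected for labeling. The trade-off weight `lam` and the function names are our assumptions:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def diverse_pick(candidates, selected, w, b, lam=0.5):
    """Score = lam * distance-to-hyperplane + (1 - lam) * max similarity
    to already-selected examples; pick the candidate with the lowest score
    (close to the boundary, far in angle from what we already chose)."""
    def distance(x):
        return abs(sum(wi * xi for wi, xi in zip(w, x)) + b)
    def score(x):
        sim = max((cosine(x, s) for s in selected), default=0.0)
        return lam * distance(x) + (1 - lam) * sim
    return min(candidates, key=score)

picked = diverse_pick([[0.1, 0.0], [0.1, 10.0]], [[1.0, 0.0]], [1.0, 0.0], 0.0)
```

Both candidates here are equally close to the boundary, so the diversity term breaks the tie in favor of the one least similar to the already-selected example.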
Active Learning with SVM • But what about when you have multiple SVMs (like one-vs-rest)? (Yan 2003)
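For the one-vs-rest case, a common heuristic (our sketch, not necessarily the exact criterion of Yan 2003) scores an email by the smallest absolute decision value across all per-folder SVMs, on the reasoning that the example is most informative where some binary boundary is least certain:

```python
def multiclass_uncertainty(x, svms):
    """svms: list of (w, b) pairs, one linear SVM per folder.
    Uncertainty = smallest absolute decision value over all SVMs."""
    return min(abs(sum(wi * xi for wi, xi in zip(w, x)) + b) for w, b in svms)

def pick_for_labeling(unlabeled, svms):
    """Pick the email whose closest one-vs-rest boundary is nearest."""
    return min(unlabeled, key=lambda x: multiclass_uncertainty(x, svms))

svms = [([1.0, 0.0], 0.0), ([0.0, 1.0], 0.0)]
picked = pick_for_labeling([[2.0, 3.0], [0.5, 4.0]], svms)
```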
The Enron Corpus • 150+ users • 200,000 emails
Initial Results • Trained on 10%, Tested on 90%
Chrono-Diverse Algorithm • The way a user sorts email changes over time. • Pick training data that are maximally different from previous data with respect to time.
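One way to read "maximally different with respect to time" is a maximin rule: favor the candidate whose timestamp is farthest from every already-labeled email. This is our interpretation, sketched below:

```python
def chrono_diverse_pick(candidate_times, labeled_times):
    """Return the index of the candidate farthest in time from all
    previously labeled emails (maximize the minimum time gap)."""
    def gap(t):
        return min(abs(t - lt) for lt in labeled_times)
    return max(range(len(candidate_times)), key=lambda i: gap(candidate_times[i]))

best = chrono_diverse_pick([2, 5, 9], [0, 10])
```

With labeled emails at times 0 and 10, the candidate at time 5 has the largest minimum gap (5), so it is selected.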
Combination Algorithm • Combine strengths of Standard and Chrono-Diverse. • Take a weighted combination of their results. • Adjust weighting with parameter lambda.
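A weighted combination of the two criteria might look like this sketch; the min-max normalization and the default `lam` are our assumptions, not details from the talk:

```python
def normalize(scores):
    """Rescale scores to [0, 1] so the two criteria are comparable."""
    lo, hi = min(scores), max(scores)
    span = (hi - lo) or 1.0
    return [(s - lo) / span for s in scores]

def combined_pick(candidates, margin_scores, chrono_scores, lam=0.5):
    """Lower margin = more uncertain (Standard); larger time gap = more
    diverse (Chrono-Diverse). Combine as
    lam * (1 - norm_margin) + (1 - lam) * norm_chrono and pick the max."""
    m = normalize(margin_scores)
    c = normalize(chrono_scores)
    combined = [lam * (1 - mi) + (1 - lam) * ci for mi, ci in zip(m, c)]
    best = max(range(len(candidates)), key=lambda i: combined[i])
    return candidates[best]

choice = combined_pick(["a", "b", "c"], [0.1, 2.0, 1.0], [1.0, 5.0, 9.0], lam=0.5)
```

Setting `lam = 1` recovers the Standard (margin-only) selection and `lam = 0` recovers the purely chronological one, so the parameter interpolates between the two algorithms.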
Results • Trained on 10%, Tested on 90%
Conclusions • State-of-the-art algorithm for active learning with text classification performs horribly on email data! • Choosing emails for time diversity works very well. • Combining the two works best.
Future Work • Improve the efficiency of SVM or find a better alternative • Determine when using chronological diversity performs best and worst • Adapt the algorithm to online classification