
Text classification: In Search of a Representation



Presentation Transcript


  1. Text classification: In Search of a Representation Stan Matwin School of Information Technology and Engineering University of Ottawa stan@site.uottawa.ca

  2. Outline • Supervised learning = classification • ML/DM at U of O • Classical approach • Attempt at a linguistic representation • N-grams – how to get them? • Labelling and co-learning • Next steps?…

  3. Supervised learning (classification) Given: • a set of training instances T = {<e, t>}, where each e is an example and each t is its class label: one of the classes C1, …, Ck • a concept with k classes C1, …, Ck (but the definition of the concept is NOT known) Find: • a description of each class which will perform well in determining (predicting) class membership for unseen instances
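
As an editorial gloss (the notation below is assumed, not from the slide), the task can be written as:

```latex
% Supervised learning as stated on the slide, in symbols
\text{Given } T=\{\langle e_i, t_i\rangle\}_{i=1}^{n},\quad t_i\in\{C_1,\dots,C_k\},
\qquad \text{find } h:\mathcal{E}\to\{C_1,\dots,C_k\}
\ \text{ that minimizes } \ \Pr_{\langle e,t\rangle}\bigl[h(e)\neq t\bigr]
\ \text{ on unseen instances.}
```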

  4. Classification • Prevalent practice: examples are represented as vectors of values of attributes • Theoretical wisdom, confirmed empirically: the more examples, the better predictive accuracy

  5. ML/DM at U of O • Learning from imbalanced classes: applications in remote sensing • A relational, rather than propositional, representation: learning the maintainability concept • Learning in the presence of background knowledge: Bayesian belief networks and how to get them; application to distributed databases

  6. Why text classification? • Automatic file saving • Internet filters • Recommenders • Information extraction • …

  7. Text classification: standard approach ("bag of words") • Remove stop words and markings • Remaining words all become attributes • A document becomes a vector of <word, frequency> pairs • Train a boolean classifier for each class • Evaluate the results on an unseen sample
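
A minimal sketch of the pipeline on this slide (strip markup, drop stop words, keep <word, frequency> pairs). The tiny stop-word list and example documents are placeholders, not from the talk:

```python
import re
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}  # illustrative subset only

def bag_of_words(document: str) -> Counter:
    """Strip simple markings, drop stop words, return <word, frequency> pairs."""
    text = re.sub(r"<[^>]+>", " ", document.lower())   # remove markup-style markings
    words = re.findall(r"[a-z]+", text)
    return Counter(w for w in words if w not in STOP_WORDS)

docs = ["<p>The price of crude oil rose.</p>", "Wheat and grain exports fell."]
vectors = [bag_of_words(d) for d in docs]
vocabulary = sorted(set().union(*vectors))              # attribute set = all remaining words
print(vocabulary)
print([[v[w] for w in vocabulary] for v in vectors])    # dense <word, frequency> vectors
```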

  8. Text classification: tools • RIPPER: a “covering” learner; works well with large sets of binary features • Naïve Bayes: efficient (no search), simple to program, gives a “degree of belief”
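
Of the two tools, Naïve Bayes is the easier one to illustrate. A sketch using scikit-learn (an assumption; it is not the software used in the talk) that also shows the "degree of belief" output; the toy documents and labels are invented:

```python
# Stands in for the Naive Bayes learner named on the slide, not for RIPPER.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

train_docs = ["oil prices rose sharply", "crude oil exports",
              "wheat harvest fell", "grain and wheat prices"]
train_labels = ["oil", "oil", "grain", "grain"]   # in practice: one binary task per class

vectorizer = CountVectorizer(stop_words="english")
X = vectorizer.fit_transform(train_docs)
clf = MultinomialNB().fit(X, train_labels)

test = vectorizer.transform(["oil and grain prices"])
print(clf.predict(test))        # predicted class
print(clf.predict_proba(test))  # "degree of belief" per class
```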

  9. “Prior art” • Yang: best results using k-NN: 82.3% micro-averaged accuracy • Joachims’ results using Support Vector Machines + unlabelled data • SVMs are insensitive to high dimensionality and to sparseness of examples

  10. SVM in text classification • SVM: training with 17 examples in the 10 most frequent categories gives test performance of 60% on 3000+ test cases • Transductive SVM: maximizes the separation margin for the test set, which is available during training

  11. Problem 1: aggressive feature selection

  12. Problem 2: semantic relationships are missed

  13. Proposed solution (Sam Scott) • Get noun phrases and/or key phrases (Extractor) and add to the feature list • Add hypernyms
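
A rough sketch of how hypernyms could be pulled from WordNet to extend the feature list, here via NLTK (NLTK, the depth limit, and the naive choice of the first noun sense are all assumptions, not the procedure from the talk):

```python
# Requires: pip install nltk, then nltk.download("wordnet") once.
from nltk.corpus import wordnet as wn

def hypernym_features(word: str, depth: int = 2) -> set:
    """Collect hypernym lemmas up to `depth` levels for the word's first noun sense."""
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return set()
    features = set()
    frontier = [synsets[0]]          # naive: take the first sense, no disambiguation
    for _ in range(depth):
        frontier = [h for s in frontier for h in s.hypernyms()]
        features.update(l.name() for s in frontier for l in s.lemmas())
    return features

print(hypernym_features("dog"))   # e.g. {'canine', 'domestic_animal', ...}
```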

  14. Hypernyms - WordNet

  15. Evaluation (Lewis) • Vary the “loss ratio” parameter • For each parameter value: learn a hypothesis for each class (binary classification), micro-average the confusion matrices (add component-wise), compute precision and recall • Interpolate (or extrapolate) to find the point where micro-averaged precision and recall are equal
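
A sketch of the two computational steps on this slide: pool the per-class confusion matrices component-wise, then interpolate between adjacent loss-ratio settings to the point where precision equals recall. The data layout (tp/fp/fn triples, precision/recall pairs) is an assumption for illustration:

```python
def micro_average(confusions):
    """confusions: one (tp, fp, fn) triple per binary classifier; add component-wise."""
    tp = sum(c[0] for c in confusions)
    fp = sum(c[1] for c in confusions)
    fn = sum(c[2] for c in confusions)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def breakeven(points):
    """points: (precision, recall) pairs at increasing loss-ratio values.
    Linearly interpolate to the value where precision == recall."""
    for (p1, r1), (p2, r2) in zip(points, points[1:]):
        if (p1 - r1) * (p2 - r2) <= 0:                         # crossing lies in this segment
            denom = (p2 - p1) - (r2 - r1)
            t = (r1 - p1) / denom if denom else 0.0
            return p1 + t * (p2 - p1)
    return None

print(micro_average([(50, 10, 5), (20, 4, 12), (8, 1, 3)]))
print(breakeven([(0.9, 0.6), (0.8, 0.7), (0.6, 0.85)]))
```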

  16. Results • No gain over the bag of words (BW) in the alternative representations • But… comprehensibility…

  17. Combining classifiers • Comparable to the best known results (Yang)

  18. Other possibilities • Using hypernyms with a small training set (avoids ambiguous words) • Use Bayes+Ripper in a cascade scheme (Gama) • Other representations:

  19. Collocations • Do not need to be noun phrases, just pairs of words possibly separated by stop words • Only the well-discriminating ones are chosen • These are added to the bag of words, and… • RIPPER is run on the result
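
One possible reading of these collocation features, sketched below: take ordered word pairs within a small window (so intervening stop words are skipped) and keep only pairs whose class-conditional frequencies differ enough to discriminate. The window size, the frequency-ratio test, and the threshold are placeholders, not the criteria used in the talk:

```python
from collections import Counter

STOP_WORDS = {"the", "a", "of", "and", "to", "in", "is"}

def collocations(words, window=3):
    """Ordered word pairs at most `window` content-word positions apart."""
    content = [w for w in words if w not in STOP_WORDS]
    pairs = set()
    for i, w in enumerate(content):
        for v in content[i + 1 : i + 1 + window]:
            pairs.add((w, v))
    return pairs

def discriminating(pos_docs, neg_docs, ratio=3.0):
    """Keep collocations much more frequent in one class than in the other."""
    pos = Counter(p for d in pos_docs for p in collocations(d))
    neg = Counter(p for d in neg_docs for p in collocations(d))
    keep = set()
    for p in set(pos) | set(neg):
        a, b = pos[p] + 1, neg[p] + 1            # add-one smoothing
        if max(a / b, b / a) >= ratio:
            keep.add(p)
    return keep
```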

  20. N-grams • N-grams are substrings of a given length • Good results on Reuters [Mladenic, Grobelnik] with Bayes; we try RIPPER • A different task: classifying text files (attachments, audio/video, coded) • From n-grams to relational features

  21. How to get good n-grams? • We use Ziv-Lempel for frequent substring detection (.gz!) • [Slide figure: Ziv-Lempel parse of the string “abababa”]
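
A rough illustration of using a Ziv-Lempel-style parse to surface repeated substrings, shown here as a plain LZ78 dictionary build on the slide's example string; this is a sketch of the idea, not the exact procedure from the talk:

```python
def lz78_phrases(text: str) -> list:
    """Return the LZ78 dictionary phrases; frequent substrings re-appear as growing prefixes."""
    phrases, seen, current = [], set(), ""
    for ch in text:
        current += ch
        if current not in seen:       # new phrase: record it and restart
            seen.add(current)
            phrases.append(current)
            current = ""
    if current:                        # trailing, already-seen fragment
        phrases.append(current)
    return phrases

print(lz78_phrases("abababa"))   # ['a', 'b', 'ab', 'aba']
```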

  22. N-grams • Counting • Pruning: substring occurrence ratio < acceptance threshold • Building relations: string A almost always precedes string B • Feeding into relational learner (FOIL)
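
A sketch of the pruning and relation-building steps listed above: drop n-grams whose occurrence ratio falls below an acceptance threshold, and record that string A "almost always precedes" string B so the fact can be fed to a relational learner such as FOIL. The data layout (document frequencies, per-file position lists) and the thresholds are assumptions:

```python
def prune(doc_freq, n_files, threshold=0.1):
    """Keep n-grams whose occurrence ratio (files containing them / all files) meets the threshold."""
    return {g for g, df in doc_freq.items() if df / n_files >= threshold}

def precedes(a, b, positions_per_file, min_support=0.9):
    """True if, in nearly all files containing both, the first occurrence of a comes before b.
    positions_per_file: one dict per file mapping n-gram -> list of positions."""
    both = [p for p in positions_per_file if a in p and b in p]
    if not both:
        return False
    hits = sum(1 for p in both if min(p[a]) < min(p[b]))
    return hits / len(both) >= min_support

# Relations that hold can then be emitted as background facts, e.g. "precedes(a,b)." for FOIL.
```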

  23. Using grammar induction (text files) • Idea: detect patterns of substrings • Patterns are regular languages • Methods of automata induction: a recognizer for each class of files • We use a modified version of RPNI2 [Dupont, Miclet]

  24. What’s new… • Work with marked up text (Word, Web) • XML with semantic tags: mixed blessing for DM/TM • Co-learning • Text mining

  25. Co-learning • How to use unlabelled data? Or: how to limit the number of examples that need to be labelled? • Two classifiers and two redundantly sufficient representations • Train both, run both on the unlabelled (test) set, and add the best predictions to the training set
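
A compact sketch of the loop this slide describes: two learners, two redundantly sufficient views of the same documents; each round, the most confident predictions on the unlabelled pool are promoted to training data. Scikit-learn-style estimators (fit / predict / predict_proba), the view layout, and the selection count k are assumptions:

```python
def co_train(clf1, clf2, view1, view2, labels, u_view1, u_view2, rounds=10, k=5):
    """view1/view2: labelled feature vectors in the two views; u_view1/u_view2: unlabelled ones."""
    L1, L2, y = list(view1), list(view2), list(labels)
    U = list(range(len(u_view1)))                       # indices of still-unlabelled examples
    for _ in range(rounds):
        if not U:
            break
        clf1.fit(L1, y)
        clf2.fit(L2, y)
        for clf, u_view in ((clf1, u_view1), (clf2, u_view2)):
            probs = clf.predict_proba([u_view[i] for i in U])
            best = sorted(range(len(U)), key=lambda j: probs[j].max(), reverse=True)[:k]
            for j in sorted(best, reverse=True):        # delete from U back-to-front
                i = U[j]
                L1.append(u_view1[i])
                L2.append(u_view2[i])
                y.append(clf.predict([u_view[i]])[0])   # self-labelled by the confident learner
                del U[j]
    return clf1, clf2
```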

  26. Co-learning • The training set grows as… • …each learner predicts independently, thanks to redundant sufficiency (different representations) • Would it also work with our learners if we used Bayes? • Would work for classifying emails

  27. Co-learning • Mitchell experimented with the task of classifying web pages (profs, students, courses, projects) – a supervised learning task • Used two views: anchor text and page contents • Error rate halved (from 11% to 5%)

  28. Cog-sci? • Co-learning seems to be cognitively justified • Model: students learning in groups (pairs) • What other social learning mechanisms could provide models for supervised learning?

  29. Conclusion • A practical task, needs a solution • No satisfactory solution so far • Fruitful ground for research
