Constructing Informative Prior Distributions from Domain Knowledge in Text Classification

Graduate: Chen, Shao-Pei
Authors: Aynur Dayanik, David D. Lewis, David Madigan, Vladimir Menkov, Alexander Genkin
SIGIR

Outline • Motivation • Objective • Methodology • Experimental Results • Conclusion

Motivation • In operational text classification settings, small training sets are the rule, because labeling is expensive and inconvenient, or because practitioners doubt that the effort will be adequately repaid.

Objective • Use domain knowledge texts to greatly improve classifier effectiveness when few training examples are available, without hurting effectiveness when training sets are large.

Methodology • Bayesian Logistic Regression • Gaussian Priors • Laplace Priors
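The slide only names the model and the two prior families, so the following is a minimal sketch of how they might fit together, not the authors' implementation. It assumes labels in {-1, +1}, a per-feature prior mode (`prior_mean`) and per-feature prior strength (`prior_strength`, taken here as the inverse variance for the Gaussian prior and the rate parameter for the Laplace prior), and MAP estimation by plain (proximal) gradient descent. The function name `fit_map` and all parameter names are hypothetical; how the modes and strengths would be derived from domain knowledge texts is not stated in the slides and is left to the caller.

```python
import numpy as np


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def fit_map(X, y, prior_mean, prior_strength, prior="gaussian", n_iter=5000):
    """MAP estimate for logistic regression with per-feature priors.

    Assumptions (not from the slides): y is in {-1, +1}; prior_strength is
    1/variance for the Gaussian prior and the rate for the Laplace prior.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    prior_mean = np.asarray(prior_mean, dtype=float)
    prior_strength = np.asarray(prior_strength, dtype=float)

    # Step size from a Lipschitz bound on the smooth part of the objective.
    lipschitz = 0.25 * np.linalg.norm(X, 2) ** 2
    if prior == "gaussian":
        lipschitz += np.max(prior_strength)
    step = 1.0 / max(lipschitz, 1.0)

    beta = prior_mean.copy()  # start at the prior mode
    for _ in range(n_iter):
        margins = y * (X @ beta)
        # Gradient of the negative log-likelihood sum(log(1 + exp(-margin))).
        grad = -(X.T @ (y * sigmoid(-margins)))
        if prior == "gaussian":
            # Quadratic penalty pulls each weight toward its prior mode.
            grad = grad + prior_strength * (beta - prior_mean)
            beta = beta - step * grad
        else:
            # Laplace prior: gradient step, then soft-threshold around the mode.
            z = beta - step * grad - prior_mean
            shrunk = np.maximum(np.abs(z) - step * prior_strength, 0.0)
            beta = prior_mean + np.sign(z) * shrunk
    return beta


# Toy usage: feature 0 appears only in positive documents, feature 1 in both.
X = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
y = np.array([1.0, 1.0, -1.0, 1.0])
beta_gauss = fit_map(X, y, prior_mean=[0.0, 0.0], prior_strength=[1.0, 1.0])
beta_laplace = fit_map(X, y, prior_mean=[0.0, 0.0],
                       prior_strength=[10.0, 10.0], prior="laplace")
```

In this sketch, a Gaussian prior shrinks each weight smoothly toward its mode, while a strong Laplace prior keeps a weight exactly at its mode unless the data pull hard enough against it. A nonzero `prior_mean` for a feature is one way to encode a belief that the feature is indicative of the class before any training data are seen.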

Experimental Results • 500 Random Examples • 5 Positive and 5 Random Examples • 5 Positive and 5 Closest Negative Examples

Conclusion • Using domain knowledge texts to construct informative priors gave large improvements in effectiveness, particularly when only small training sets are available.
