The Power of Word Clusters for Text Classification

Presentation Transcript


  1. The Power of Word Clusters for Text Classification. Noam Slonim and Naftali Tishby. Presented by: Yangzhe Xiao

  2. Word clusters vs. words • Reduced feature dimensionality. • More robust. • 18% increase in accuracy. • Challenge: group similar words into word clusters that preserve the information about the document categories; this is done with the Information Bottleneck (IB) method.
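
Not from the slides: a minimal Python sketch of what this feature change amounts to, turning a bag-of-words into a bag-of-word-clusters once a word-to-cluster assignment is available. The word_to_cluster mapping here is made up for illustration; in the paper it would come from the IB clustering.

```python
from collections import Counter

# Hypothetical word -> cluster assignment (in the paper this comes from the IB clustering).
word_to_cluster = {
    "ball": "c_sports", "team": "c_sports", "score": "c_sports",
    "stock": "c_finance", "market": "c_finance", "price": "c_finance",
}

def cluster_features(tokens):
    """Collapse a bag-of-words into a bag-of-word-clusters: fewer, more robust features."""
    return Counter(word_to_cluster[t] for t in tokens if t in word_to_cluster)

doc = ["the", "team", "beat", "the", "market", "on", "price", "and", "score"]
print(cluster_features(doc))   # Counter({'c_sports': 2, 'c_finance': 2})
```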

  3. The IB method is based on the following idea: given the empirical joint distribution of two variables, compress one variable so that the mutual information it carries about the other variable is preserved as much as possible. • Find clusters of the members of the set X, denoted here by X̃, such that the mutual information I(X̃;Y) is maximized, under a constraint on the information extracted from X, I(X̃;X).
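
For reference, a small sketch (not from the slides) of the quantity being traded off: the mutual information between two discrete variables, computed from an empirical joint distribution. The toy p_xy table is invented for illustration.

```python
import numpy as np

def mutual_information(p_xy):
    """I(X;Y) in bits for a joint distribution given as a 2-D array summing to 1."""
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y)
    mask = p_xy > 0
    return float((p_xy[mask] * np.log2(p_xy[mask] / (p_x @ p_y)[mask])).sum())

# Toy empirical joint distribution p(word, category).
p_xy = np.array([[0.25, 0.05],
                 [0.05, 0.25],
                 [0.20, 0.20]])
print(mutual_information(p_xy))
```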

  4. The problem has an optimal formal solution, without any assumptions about the origin of the joint distribution p(x,y).

  5. The optimal assignment is p(x̃|x) = [p(x̃) / Z(β,x)] · exp(−β · D_KL[p(y|x) || p(y|x̃)]), where D_KL is the Kullback-Leibler divergence between the conditional distributions p(y|x) and p(y|x̃), Z(β,x) is a normalization factor, and the single positive parameter β determines the softness of the classification.
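
A minimal sketch, under the assumption that the slide showed the standard self-consistent IB assignment p(x̃|x) = p(x̃)/Z(β,x) · exp(−β · D_KL[p(y|x) || p(y|x̃)]); the function and variable names below are illustrative, not from the presentation.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    """D_KL(p || q) in nats for discrete distributions given as 1-D arrays."""
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float((p * np.log(p / q)).sum())

def ib_membership(p_y_given_x, p_y_given_t, p_t, beta):
    """One update of p(t|x) = p(t)/Z(beta,x) * exp(-beta * KL[p(y|x) || p(y|t)])."""
    n_x, n_t = p_y_given_x.shape[0], p_t.shape[0]
    p_t_given_x = np.zeros((n_x, n_t))
    for x in range(n_x):
        logits = np.array([np.log(p_t[t]) - beta * kl(p_y_given_x[x], p_y_given_t[t])
                           for t in range(n_t)])
        w = np.exp(logits - logits.max())   # shift by the max for numerical stability
        p_t_given_x[x] = w / w.sum()        # the division is exactly Z(beta, x)
    return p_t_given_x

# Toy example: 4 words, 2 categories, 2 clusters (numbers are illustrative only).
p_y_given_x = np.array([[0.9, 0.1], [0.8, 0.2], [0.2, 0.8], [0.1, 0.9]])
p_y_given_t = np.array([[0.85, 0.15], [0.15, 0.85]])
print(ib_membership(p_y_given_x, p_y_given_t, np.array([0.5, 0.5]), beta=5.0))
```

A larger β makes the assignments harder (closer to 0/1); a smaller β makes them softer, matching the role of β described above.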

  6. Agglomerative IB Algorithm

  7. Agglomerative IB Algorithm
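
The algorithm details on these two slides did not survive as text. As a hedged sketch of the agglomerative (hard-clustering) variant described in the IB literature: start with every word in its own cluster and greedily merge the pair whose merge loses the least mutual information about the categories, where the loss of merging clusters i and j is (p_i + p_j) times the Jensen-Shannon divergence of p(y|i) and p(y|j) with weights proportional to p_i and p_j. The code below is an illustrative implementation of that rule, not the authors' exact pseudocode.

```python
import numpy as np

def kl(p, q, eps=1e-12):
    p = np.clip(p, eps, None)
    q = np.clip(q, eps, None)
    return float((p * np.log(p / q)).sum())

def merge_cost(p_i, p_j, py_i, py_j):
    """Mutual information lost by merging clusters i and j:
    (p_i + p_j) * JS[p(y|i), p(y|j)] with weights p_i/(p_i+p_j) and p_j/(p_i+p_j)."""
    w_i, w_j = p_i / (p_i + p_j), p_j / (p_i + p_j)
    py_merged = w_i * py_i + w_j * py_j
    js = w_i * kl(py_i, py_merged) + w_j * kl(py_j, py_merged)
    return (p_i + p_j) * js

def agglomerative_ib(p_x, p_y_given_x, n_clusters):
    """Greedy bottom-up clustering: each word starts alone; repeatedly merge
    the pair of clusters with the smallest information loss."""
    clusters = [([i], p_x[i], p_y_given_x[i].copy()) for i in range(len(p_x))]
    while len(clusters) > n_clusters:
        best = min(((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
                   key=lambda ab: merge_cost(clusters[ab[0]][1], clusters[ab[1]][1],
                                             clusters[ab[0]][2], clusters[ab[1]][2]))
        (m_a, p_a, py_a), (m_b, p_b, py_b) = clusters[best[0]], clusters[best[1]]
        merged = (m_a + m_b, p_a + p_b, (p_a * py_a + p_b * py_b) / (p_a + p_b))
        clusters = [c for k, c in enumerate(clusters) if k not in best] + [merged]
    return [members for members, _, _ in clusters]
```

Each returned list holds the indices of the words assigned to one cluster; the corresponding p(y|cluster) distributions are the reduced features used for classification.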

  8. Normalized information curves for all 10 iterations, for large and small sample sizes
