
Maximum Likelihood and the Information Bottleneck


Presentation Transcript


  1. Maximum Likelihood and the Information Bottleneck By Noam Slonim & Yair Weiss 2/19/2003

  2. Overview • Main contribution • Defines a mapping from ML estimation of mixture models to the iterative IB algorithm • Under some initial conditions, an algorithm for one gives a solution for the other • Theoretical and practical concerns • ML “ideal” vs. IB “real” • Using the opposite algorithm could improve performance

  3. IB Intuition & Review • Given r.v. X and Y w/ joint p(x,y) • Rerepresent X with clusters T that preserve information about Y • Find compressed representation T of X with mapping q(t|x) • choice of q(t\x) must minimize the IB-Functional: • |T| and fixed • minimizing I(T;X) maximizes compression • maximizing I(T;Y) minimizes distortion

  4. IB Review • Additionally, given q(t|x): q(t) = Σ_x p(x) q(t|x), q(y|t) = (1/q(t)) Σ_x p(x,y) q(t|x) • so q(t) and q(y|t) are determined by q(t|x) • From the previous paper, to minimize L_IB: q(t|x) ∝ q(t)·exp(-β·D_KL[p(y|x) || q(y|t)]) • Use an initial q(t|x) to get q(t), q(y|t) and iterate (a sketch follows below)
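
To make the iterative procedure concrete, here is a minimal numpy sketch of the self-consistent IB updates above, assuming a finite joint table p(x,y); the function names (iterative_ib, kl_rows), the random initialization, and the fixed iteration count are illustrative choices, not taken from the paper.

```python
import numpy as np

def kl_rows(p, q, eps=1e-12):
    """D[i, j] = KL( p[i, :] || q[j, :] ) for row-stochastic matrices p, q."""
    p, q = p + eps, q + eps
    return (p[:, None, :] * (np.log(p[:, None, :]) - np.log(q[None, :, :]))).sum(-1)

def iterative_ib(pxy, n_clusters, beta, n_iter=200, seed=0):
    """Sketch of the iterative IB updates from slides 3-4.
    pxy: |X| x |Y| joint distribution p(x, y). Returns q(t|x), q(t), q(y|t)."""
    rng = np.random.default_rng(seed)
    px = pxy.sum(axis=1)                           # p(x)
    py_x = pxy / px[:, None]                       # p(y|x)
    qt_x = rng.random((pxy.shape[0], n_clusters))  # random initial q(t|x)
    qt_x /= qt_x.sum(axis=1, keepdims=True)
    for _ in range(n_iter):
        qt = px @ qt_x                             # q(t) = sum_x p(x) q(t|x)
        qy_t = (qt_x * px[:, None]).T @ py_x       # sum_x p(x,y) q(t|x)
        qy_t /= qy_t.sum(axis=1, keepdims=True)    # q(y|t)
        # q(t|x) ∝ q(t) * exp(-beta * KL[ p(y|x) || q(y|t) ])
        logits = np.log(qt + 1e-12)[None, :] - beta * kl_rows(py_x, qy_t)
        qt_x = np.exp(logits - logits.max(axis=1, keepdims=True))
        qt_x /= qt_x.sum(axis=1, keepdims=True)
    return qt_x, qt, qy_t
```

The sketch keeps β fixed throughout, matching the fixed-β setting stated on slide 3.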

  5. ML for mixture models • Generative process: • for each x a hidden class t is chosen with prior probability π(t), and Y is then generated by the class multinomial distribution β(y|t) • choose t to maximize this probability • but we don’t know the parameters π(t), β(y|t) • We don’t have p(x,y) either, just sample counts n(x,y) • Use EM to find π(t), β(y|t) that maximize the likelihood of seeing n(x,y) with hidden t’s

  6. EM • Iterative algorithm to compute the ML estimate • E step • denote the posterior p(t|x) under the current parameters as q(t|x) • set q(t|x) = π(t)·Π_y β(y|t)^n(x,y) / k(x) • k(x) is a normalization factor, k(x) = Σ_t π(t)·Π_y β(y|t)^n(x,y)

  7. EM con’t • M step • set π(t) = (1/|X|) Σ_x q(t|x) • set β(y|t) = Σ_x q(t|x)·n(x,y) / Σ_x q(t|x)·n(x) • Alternative free energy version: EM alternately minimizes F[q, θ] = Σ_{x,t} q(t|x)·[log q(t|x) - log π(t) - Σ_y n(x,y) log β(y|t)], the E-step over q and the M-step over the parameters θ = {π(t), β(y|t)} (a code sketch of both steps follows below)
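
A compact numpy sketch of the E- and M-steps on slides 6-7 for the mixture of multinomials, assuming the notation π(t), β(y|t), n(x,y) used above; variable and function names are my own, and the log-space E-step is simply a numerically safer way to compute π(t)·Π_y β(y|t)^n(x,y) / k(x).

```python
import numpy as np

def em_multinomial_mixture(nxy, n_clusters, n_iter=200, seed=0, eps=1e-12):
    """EM for the mixture-of-multinomials model on slides 5-7.
    nxy: |X| x |Y| count matrix n(x, y). Returns q(t|x), pi(t), beta(y|t)."""
    rng = np.random.default_rng(seed)
    n_x, n_y = nxy.shape
    pi = np.full(n_clusters, 1.0 / n_clusters)                # prior pi(t)
    beta_yt = rng.dirichlet(np.ones(n_y), size=n_clusters)    # beta(y|t), rows sum to 1
    for _ in range(n_iter):
        # E-step: q(t|x) = pi(t) * prod_y beta(y|t)^n(x,y) / k(x), in log space
        log_post = np.log(pi + eps)[None, :] + nxy @ np.log(beta_yt + eps).T
        log_post -= log_post.max(axis=1, keepdims=True)
        qt_x = np.exp(log_post)
        qt_x /= qt_x.sum(axis=1, keepdims=True)                # k(x) normalization
        # M-step
        pi = qt_x.mean(axis=0)                                 # (1/|X|) sum_x q(t|x)
        beta_yt = qt_x.T @ nxy                                 # sum_x q(t|x) n(x,y)
        beta_yt /= beta_yt.sum(axis=1, keepdims=True) + eps    # / sum_x q(t|x) n(x)
    return qt_x, pi, beta_yt
```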

  8. ML ↔ IB mapping • Fairly straightforward mapping • q(t|x) ↔ p(t|x), the E-step posterior • Since we can’t map the corresponding parameter distributions, π(t), β(y|t) vs. q(t), q(y|t), directly, we do this mapping and then an M-step or IB-step (sketched below).
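
The slide’s mapping formulas did not survive the transcript, so the following is only a sketch of the recipe as described: carry the membership/posterior q(t|x) across, then recover the remaining distributions with one IB-step (ML → IB) or one M-step (IB → ML). Function names are illustrative, and treating the normalized counts n(x,y)/N as p(x,y) in the ML → IB direction is my assumption.

```python
import numpy as np

def ml_to_ib(qt_x, nxy):
    """ML -> IB: keep the EM posterior as the IB membership q(t|x),
    then recover q(t) and q(y|t) with one IB-step on the empirical joint."""
    pxy = nxy / nxy.sum()                          # empirical p(x, y) from counts
    px = pxy.sum(axis=1)
    qt = px @ qt_x                                 # q(t)
    qy_t = (qt_x * px[:, None]).T @ (pxy / px[:, None])
    qy_t /= qy_t.sum(axis=1, keepdims=True)        # q(y|t)
    return qt_x, qt, qy_t

def ib_to_ml(qt_x, nxy, eps=1e-12):
    """IB -> ML: keep q(t|x) as the EM posterior, then recover
    pi(t) and beta(y|t) with one M-step."""
    pi = qt_x.mean(axis=0)
    beta_yt = qt_x.T @ nxy
    beta_yt /= beta_yt.sum(axis=1, keepdims=True) + eps
    return pi, beta_yt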

  9. Observations • When X is uniformly distributed, the mapping is equivalent to a direct mapping of the parameter distributions. • The M-step and the IB-step are mathematically equivalent. • When X is uniform, EM is equivalent to the iterative IB algorithm with r = |X|. • Equivalence of the E-step to the IB step setting q(t|x): the multinomial factor Π_y β(y|t)^n(x,y) is, up to a factor independent of t, an exponentiated KL divergence (worked out below).
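
The dropped “since” step can be spelled out; the following algebra (mine, in the slides’ notation) uses the empirical conditional p̂(y|x) = n(x,y)/n(x):

```latex
% Writing \hat{p}(y|x) = n(x,y)/n(x) for the empirical conditional:
\begin{align*}
\prod_y \beta(y|t)^{\,n(x,y)}
  &= \exp\Big( n(x) \sum_y \hat{p}(y|x) \log \beta(y|t) \Big) \\
  &= \underbrace{e^{-n(x)\, H[\hat{p}(y|x)]}}_{\text{independent of } t}
     \; e^{-n(x)\, D_{KL}\!\left[ \hat{p}(y|x) \,\|\, \beta(y|t) \right]} .
\end{align*}
```

Dropping the t-independent factor, the E-step q(t|x) ∝ π(t)·Π_y β(y|t)^n(x,y) has the same form as the IB update q(t|x) ∝ q(t)·exp(-β·D_KL[p(y|x) || q(y|t)]) once π(t), β(y|t) are identified with q(t), q(y|t), with the trade-off parameter playing the role of the per-item sample size n(x).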

  10. Main Equivalence Claims • When X is uniform and r = |X|, all fixed points of the likelihood L are fixed points of the IB-functional L_IB with β = r • at the fixed points, maximizing L corresponds to minimizing L_IB • Any algorithm that finds a fixed point of L induces a fixed point of L_IB. If there is more than one, the one that maximizes L is the one that minimizes L_IB.

  11. Claims (2) • For large enough N (or β), all the fixed points of L are mapped to the fixed points of L_IB • again, at the fixed points, maximizing L corresponds to minimizing L_IB • Again, any algorithm that finds a fixed point in one domain induces a fixed point in the other.

  12. Simulations • How do we know when N (or β) is large enough to use the mapping? • Empirical validation: • Newsgroup clustering experiment • |X| = 500 documents, |Y| = 2000 words, |T| = 10 clusters • N = 43,433 word occurrences in one set, N = 2,171 in the pruned set

  13. Simulation results • At small values of N, the differences between the two solutions are more prominent

  14. Discussion • At higher values of N, EM can converge to a smaller value of the IB-functional L_IB after the mapping, and vice versa. • Mentions an alternative formulation of IB in which we minimize the KL divergence between p(x,y) and the family of distributions for which the mixture model assumption holds. • For smaller sample sizes, the freedom of choosing β in IB seems beneficial

  15. Conclusion • Interesting reformulation of IB in the standard mixture model setting for clustering. • Interesting theoretical results with possible practical advantages for mapping from one to the other.
