Multivariate Information Bottleneck

Nir Friedman, Ori Mosenzon, Noam Slonim, Naftali Tishby. Hebrew University.

Presentation Transcript


  1. Multivariate Information Bottleneck. Nir Friedman, Ori Mosenzon, Noam Slonim, Naftali Tishby. Hebrew University.

  2. [Diagram labels: Statistics, Data Analysis, Population.]

  3. Information Bottleneck. Cluster “age” into clusters that are predictive of education level? [Figure: education level (None, High school, Some college, Bachelor’s degree, PhD) against age, with age ticks from 17 to 74.]

  4. Information Bottleneck. Cluster “age” into clusters that are predictive of education level? Also cluster education attained to be predictive of age? [Figure: education level (None, High school, Some college, Bachelor’s degree, PhD) against age, with age ticks from 17 to 74.]

  5. Our contribution. Generalize the Information Bottleneck: • Generic principle for specifying systems of interacting clusters • Characterization of the solution for these specifications • General-purpose methods for constructing solutions

  6. Information Bottleneck [Tishby, Pereira & Bialek 99]. Soft clustering: given P(A,B), choose P(T|A), which induces P(T,B). Tradeoff: minimize I(T;A) - β I(T;B), where I(T;A) is the compression term (information lost about A) and I(T;B) is the preserved information about B.
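To make the tradeoff on this slide concrete, here is a minimal sketch that evaluates I(T;A) - β I(T;B) for a given soft clustering. The function names, the toy P(A,B), and the two candidate clusterings are assumptions made for illustration; β is the usual tradeoff weight from the information bottleneck literature, and none of the numbers come from the talk.

```python
import math

def mutual_information(joint):
    """I(X;Y) in bits for a joint distribution given as a nested list joint[x][y]."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    total = 0.0
    for x, row in enumerate(joint):
        for y, p in enumerate(row):
            if p > 0:
                total += p * math.log2(p / (px[x] * py[y]))
    return total

def ib_objective(p_ab, p_t_given_a, beta):
    """Return I(T;A) - beta * I(T;B) for a soft clustering p_t_given_a[a][t]."""
    n_a, n_b, n_t = len(p_ab), len(p_ab[0]), len(p_t_given_a[0])
    p_a = [sum(row) for row in p_ab]
    # P(t,a) = P(a) P(t|a)
    p_ta = [[p_a[a] * p_t_given_a[a][t] for a in range(n_a)] for t in range(n_t)]
    # P(t,b) = sum_a P(a,b) P(t|a), because T depends on B only through A
    p_tb = [[sum(p_ab[a][b] * p_t_given_a[a][t] for a in range(n_a))
             for b in range(n_b)] for t in range(n_t)]
    return mutual_information(p_ta) - beta * mutual_information(p_tb)

# Hypothetical toy input: values 0 and 1 of A predict B alike, as do values 2 and 3.
p_ab = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
merge_similar = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
single_cluster = [[1.0, 0.0]] * 4

for beta in (1.0, 5.0):
    print(beta, ib_objective(p_ab, merge_similar, beta),
          ib_objective(p_ab, single_cluster, beta))
```

In this toy case the single-cluster solution scores 0 for any β, while merging similar values only becomes preferable once β is large enough for the preserved information about B to outweigh the compression cost.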

  7. Information Bottleneck Reexamined. [Diagrams over A, B and T: Gin, the actual distribution (input and parameters); Gout, the desired independencies.]

  8. Example: Symmetric Bottleneck. Simultaneous clustering of both A and B: • P(TA|A) • P(TB|B) So that: • TA captures the information A contains about B • TB captures the information B contains about A [Diagrams: Gin and Gout over A, B, TA, TB.]

  9. General Principle. Variables: X1, X2, …, Xn and T1, …, Tk. Input: • P(X1,…,Xn) • Gin - compression: Tj clusters the values of paj • Gout - desired (conditional) independencies. Goal: • Find P(Tj|paj) in Gin to “match” Gout

  10. Multi-information • The information that random variables jointly contain about each other • Generalizes mutual information
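The slide's formula is not in the transcript. A standard definition of multi-information (also called total correlation) is the sum of the marginal entropies minus the joint entropy, and the sketch below assumes that definition. The helper names and the toy distributions are illustrative only.

```python
import math
from collections import defaultdict

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, indices):
    """Marginal distribution over the variables at the given tuple positions."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in indices)] += p
    return out

def multi_information(joint):
    """Sum of single-variable entropies minus the joint entropy, in bits."""
    n = len(next(iter(joint)))
    return sum(entropy(marginal(joint, [i])) for i in range(n)) - entropy(joint)

# Two perfectly correlated bits: multi-information equals mutual information.
two_copies = {(0, 0): 0.5, (1, 1): 0.5}
# Three perfectly correlated bits: more shared information than any single pair.
three_copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}
# Two independent fair bits: nothing shared.
independent = {(x, y): 0.25 for x in (0, 1) for y in (0, 1)}

print(multi_information(two_copies))
print(multi_information(three_copies))
print(multi_information(independent))
```

With two variables this quantity reduces to ordinary mutual information, which is the sense in which it generalizes it.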

  11. Graph Projection. Let G be a DAG. Define: [formula]. [Diagram labels: P; distributions consistent with G; all possible distributions.]

  12. Graph Projection. Let G be a DAG. Define: [formula]. Proposition: [formula, with one term labeled “real multi-info” and one labeled “multi-info as though P is consistent with G”].

  13. Multi-information & Bayesian Networks. Proposition: If P is consistent with G, then [formula]. Define: [formula, labeled “sum of local interactions”].
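The formulas on this slide are missing from the transcript. As a hedged illustration, the sketch below assumes the "sum of local interactions" is the sum over variables of I(X_i; Pa_i), the mutual information between each variable and its parents in G, and checks numerically that it matches the multi-information for a distribution built from a chain X0 -> X1 -> X2. All names and probabilities are made up for the example.

```python
import math
from collections import defaultdict
from itertools import product

def entropy(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, indices):
    """Marginal distribution over the variables at the given tuple positions."""
    out = defaultdict(float)
    for outcome, p in joint.items():
        out[tuple(outcome[i] for i in indices)] += p
    return out

def multi_information(joint):
    n = len(next(iter(joint)))
    return sum(entropy(marginal(joint, [i])) for i in range(n)) - entropy(joint)

def local_sum(joint, parents):
    """Sum over i of I(X_i; Pa_i); parents[i] lists the parent indices of X_i in G."""
    total = 0.0
    for i, pa in parents.items():
        if pa:  # a variable without parents contributes nothing
            total += (entropy(marginal(joint, [i])) + entropy(marginal(joint, pa))
                      - entropy(marginal(joint, [i] + pa)))
    return total

chain = {0: [], 1: [0], 2: [1]}  # the DAG X0 -> X1 -> X2

# A distribution that factorizes along the chain (hypothetical numbers).
p0 = {0: 0.3, 1: 0.7}
p1_given_0 = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.2, 1: 0.8}}
p2_given_1 = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.1, 1: 0.9}}
chain_joint = {(a, b, c): p0[a] * p1_given_0[a][b] * p2_given_1[b][c]
               for a, b, c in product((0, 1), repeat=3)}

# A distribution that does NOT factorize along the chain: X2 = X0 xor X1.
xor_joint = {(a, b, a ^ b): 0.25 for a, b in product((0, 1), repeat=2)}

print(multi_information(chain_joint), local_sum(chain_joint, chain))
print(multi_information(xor_joint), local_sum(xor_joint, chain))
```

For the chain-consistent distribution the two quantities agree. For the xor distribution the local sum is 0 while the real multi-information is 1 bit, which illustrates the gap between "real multi-info" and "multi-info as though P is consistent with G" mentioned on the previous slide.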

  14. Optimizing Criteria. Two goals: • Lose information with respect to Gin • Attain the conditional independencies in Gout. Optimization objective: [formula, with one term labeled “force clusters to compress” and one labeled “minimize violations of conditional independencies in Gout”].

  15. Additional Interpretation. Using properties of [symbol] we can rewrite [formula]. Thus, we can instead minimize [formula, with one term labeled “minimize information in Gin” and one labeled “maximize information in Gout”].

  16. Minimization Objective - Example: Symmetric Bottleneck. Recall: [formula, with the input labeled “fixed” and two terms labeled “parameters we can control”]. [Diagrams: Gin and Gout over A, B, TA, TB.]

  17. Characterization of Solutions. Theorem: minimal point if and only if [formula], where d(tj,paj) is a measure of “distortion” between tj and paj. For example, in the symmetric bottleneck: [formula].

  18. Finding Solutions. How can we find solutions? Asynchronous update: • Pick an index j • Update P(Tj|paj). Theorem: • Asynchronous updates converge to (local) minima
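The multivariate update rule itself is not preserved in the transcript. As a stand-in, the sketch below uses the well-known self-consistent iteration for the original single-variable bottleneck, where the only distribution to update is P(T|A) and the distortion is the KL divergence between P(B|a) and P(B|t). This is a swapped-in special case, not the authors' multivariate procedure; the function names, toy input, β, and iteration count are all assumptions, and the input is assumed to be strictly positive.

```python
import math
import random

def kl(p, q):
    """KL divergence in nats; infinite if q lacks support where p has mass."""
    if any(pi > 0 and qi == 0 for pi, qi in zip(p, q)):
        return math.inf
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def iterative_ib(p_ab, n_clusters, beta, n_iters=200, seed=0):
    """Repeatedly update P(t|a) proportional to P(t) * exp(-beta * KL(P(B|a) || P(B|t)))."""
    rng = random.Random(seed)
    n_a, n_b = len(p_ab), len(p_ab[0])
    p_a = [sum(row) for row in p_ab]
    p_b_given_a = [[p / p_a[a] for p in p_ab[a]] for a in range(n_a)]
    # Random soft initialization of P(t|a).
    q = []
    for _ in range(n_a):
        w = [rng.random() + 0.1 for _ in range(n_clusters)]
        q.append([x / sum(w) for x in w])
    for _ in range(n_iters):
        # Cluster marginals and cluster-conditional distributions over B.
        p_t = [sum(p_a[a] * q[a][t] for a in range(n_a)) for t in range(n_clusters)]
        p_b_given_t = [[sum(p_ab[a][b] * q[a][t] for a in range(n_a)) / p_t[t]
                        if p_t[t] > 0 else 0.0
                        for b in range(n_b)] for t in range(n_clusters)]
        # Re-assign every value of A given the current clusters.
        for a in range(n_a):
            w = [p_t[t] * math.exp(-beta * kl(p_b_given_a[a], p_b_given_t[t]))
                 for t in range(n_clusters)]
            q[a] = [x / sum(w) for x in w]
    return q

# Hypothetical toy input: values 0 and 1 of A predict B alike, as do values 2 and 3.
p_ab = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
q = iterative_ib(p_ab, n_clusters=2, beta=20.0)
for a, row in enumerate(q):
    print(a, [round(x, 3) for x in row])
```

In the multivariate setting one would cycle over the indices j and apply an analogous update to each P(Tj|paj) in turn; like the slide's theorem, this kind of iteration is only expected to reach a local minimum, so the result can depend on the initialization.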

  19. Example - 20 Newsgroups • 20,000 messages from 20 newsgroups [Lang 1995] • A - newsgroup of the message • B - word in the message • P(a,b) - probability that choosing a random position in the corpus would select word b in a message in newsgroup a • We applied the symmetric bottleneck to both attributes
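The definition of P(a,b) above amounts to counting word positions per newsgroup and normalizing by the total number of positions. A minimal sketch follows, assuming messages arrive as already-tokenized (newsgroup, words) pairs; the function name and the three toy messages are hypothetical, and the tokenization used in the talk is not stated.

```python
from collections import Counter

def joint_from_messages(messages):
    """Estimate P(a, b) from (newsgroup, list_of_words) pairs by counting positions."""
    counts = Counter()
    for newsgroup, words in messages:
        for word in words:
            # Every word position in the corpus counts once.
            counts[(newsgroup, word)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

# Hypothetical toy corpus with six word positions in total.
messages = [
    ("rec.autos", ["car", "engine", "car"]),
    ("sci.crypt", ["encryption", "key"]),
    ("rec.autos", ["car"]),
]
p = joint_from_messages(messages)
print(p[("rec.autos", "car")])
```

Because each position is counted once, long messages and frequent words carry proportionally more weight in the estimated joint distribution.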

  20. 20 Newsgroups: Symmetric Bottleneck. [Figure axes: word, newsgroup.]

  21. 20 Newsgroups: Symmetric Bottleneck. [Figure: P(TD,TW); axes: word, newsgroup. Words: x, file, image, encryption, window, dos, mac, …, car, turkish, game, team, jesus, gun, hockey, … Newsgroups: comp.*, misc.forsale, sci.crypt, sci.electronics, alt.atheism, rec.autos, rec.motorcycles, rec.sport.*, sci.med, sci.space, soc.religion.christian, talk.politics.*]

  22. 20 Newsgroups: Symmetric Bottleneck. [Figure: P(TD,TW); axes: word, newsgroup.]

  23. 20 Newsgroups: Symmetric Bottleneck. [Figure: P(TD,TW); axes: word, newsgroup.]

  24. 20 Newsgroups: Symmetric Bottleneck. [Figure: P(TD,TW); axes: word, newsgroup.]

  25. 20 Newsgroups: Symmetric Bottleneck. [Figure: P(TD,TW); axes: word, newsgroup. Words: atheists, christianity, jesus, bible, sin, faith, … Newsgroups: alt.atheism, soc.religion.christian, talk.religion.misc]

  26. Discussion. General framework: • Defines a new family of optimization problems … and solutions. Future directions: • Additional algorithms - agglomerative solutions • Relation to generative models • Parametric constraints in Gout. Note: ask about relation to EM [Charniak 2001].

  27. Example: Parallel Bottleneck. [Diagrams: Gin and Gout over A, B, T1, T2.]