Multivariate Information Bottleneck
Nir Friedman, Ori Mosenzon, Noam Slonim, Naftali Tishby
Hebrew University
Statistics Data Analysis Population
Information Bottleneck
Cluster “age” into clusters that are predictive of education level?
[Figure: education level (None, High school, Some college, Bachelor’s degree, PhD) against age (17 to 74)]
Information Bottleneck
Cluster “age” into clusters that are predictive of education level?
Also cluster education attained to be predictive of age?
[Figure: education level (None, High school, Some college, Bachelor’s degree, PhD) against age (17 to 74)]
Our contribution
Generalize Information Bottleneck:
• Generic principle for specifying systems of interacting clusters
• Characterization of the solutions for these specifications
• General-purpose methods for constructing solutions
Information Bottleneck [Tishby, Pereira & Bialek 99]
Soft clustering: given P(A,B), find P(T|A), which induces P(T,B)
Minimize: I(T;A) - β I(T;B)
• I(T;A): compression, information lost about A
• β: tradeoff
• I(T;B): preserved information about B
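A minimal sketch of how the two terms of this objective can be evaluated for a given soft clustering. The function names, the list-of-rows representation and the toy joint distribution are assumptions made for illustration, and `beta` stands for the tradeoff weight on the preserved-information term.

```python
from math import log

def mutual_information(joint):
    """I(X;Y) in nats; joint[i][j] = P(x_i, y_j)."""
    px = [sum(row) for row in joint]
    py = [sum(col) for col in zip(*joint)]
    return sum(p * log(p / (px[i] * py[j]))
               for i, row in enumerate(joint)
               for j, p in enumerate(row) if p > 0)

def ib_objective(p_ab, p_t_given_a, beta):
    """Return (I(T;A), I(T;B), I(T;A) - beta * I(T;B))."""
    n_a, n_b, n_t = len(p_ab), len(p_ab[0]), len(p_t_given_a[0])
    p_a = [sum(row) for row in p_ab]
    # P(t,a) = P(a) P(t|a)
    p_ta = [[p_a[a] * p_t_given_a[a][t] for a in range(n_a)]
            for t in range(n_t)]
    # P(t,b) = sum_a P(a,b) P(t|a), since T depends on B only through A
    p_tb = [[sum(p_ab[a][b] * p_t_given_a[a][t] for a in range(n_a))
             for b in range(n_b)]
            for t in range(n_t)]
    i_ta = mutual_information(p_ta)
    i_tb = mutual_information(p_tb)
    return i_ta, i_tb, i_ta - beta * i_tb

# Toy joint distribution (made up): rows are values of A, columns values of B.
p_ab = [[0.20, 0.05],
        [0.20, 0.05],
        [0.05, 0.20],
        [0.05, 0.20]]
# Hard clustering that merges the rows of A with identical P(B|a).
hard = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
i_ta, i_tb, objective = ib_objective(p_ab, hard, beta=2.0)
```

In this toy case the clustering halves the number of values of A without losing any information about B, because the merged rows have the same conditional distribution over B.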
Information Bottleneck Reexamined
[Figure: two graphs over A, B and T. Gin: input and parameters, the actual distribution. Gout: desired independencies.]
Example: Symmetric Bottleneck
Simultaneous clustering of both A and B:
• P(TA|A)
• P(TB|B)
So that
• TA captures the information A contains about B
• TB captures the information B contains about A
[Figure: Gin and Gout over A, B, TA and TB]
General Principle
Input:
• P(X1,…,Xn)
• Gin - Compression
• Tj clusters values of paj
• Gout - Desired (conditional) independencies
Goal:
• Find P(Tj|paj) in Gin to “match” Gout
[Figure: X1, X2, …, Xn and T1, …, Tk]
Multi-information
• Information random variables jointly contain about each other
• Generalizes mutual information
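A small sketch of the quantity, assuming the standard definition of multi-information as the KL divergence between the joint distribution and the product of its marginals. The function name and the dictionary representation of the joint are illustrative assumptions.

```python
from itertools import product
from math import log

def multi_information(joint, n_vars):
    """Multi-information in nats.

    joint maps each assignment tuple (x1, ..., xn) to its probability.
    """
    # Accumulate the single-variable marginals P(x_i).
    marginals = [dict() for _ in range(n_vars)]
    for assignment, p in joint.items():
        for i, value in enumerate(assignment):
            marginals[i][value] = marginals[i].get(value, 0.0) + p
    # KL divergence between the joint and the product of marginals.
    total = 0.0
    for assignment, p in joint.items():
        if p > 0:
            q = 1.0
            for i, value in enumerate(assignment):
                q *= marginals[i][value]
            total += p * log(p / q)
    return total

# Three independent fair bits: they contain no information about each other.
independent = {bits: 1.0 / 8 for bits in product((0, 1), repeat=3)}
# Three copies of one fair bit.
copies = {(0, 0, 0): 0.5, (1, 1, 1): 0.5}

mi_independent = multi_information(independent, 3)
mi_copies = multi_information(copies, 3)
```

With two variables the same function reduces to ordinary mutual information, which is the sense in which the quantity generalizes it.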
Graph Projection
Let G be a DAG
Define: [equation]
[Figure: P within the set of all possible distributions, and the set of distributions consistent with G]
Graph Projection
Let G be a DAG
Proposition: [equation: real multi-info, and multi-info as though P is consistent with G]
Multi-information & Bayesian Networks
Proposition: If P is consistent with G, then [equation]
Define: [equation: sum of local interactions]
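A reconstruction of the quantities named on the multi-information and graph projection slides, written from the standard definitions rather than copied from the slides. The symbols $\mathcal{I}$, $\mathcal{I}^{G}$, $\mathbf{Pa}_i^{G}$ (the parents of $X_i$ in G) and $Q$ are notation assumed for this sketch.

```latex
% Multi-information: KL divergence between the joint and the product of marginals
\mathcal{I}(X_1,\dots,X_n)
  = D_{KL}\big[\, P(X_1,\dots,X_n) \,\big\|\, P(X_1)\cdots P(X_n) \,\big]

% Sum of local interactions with respect to a DAG G, computed under P
\mathcal{I}^{G} = \sum_{i=1}^{n} I\big(X_i ; \mathbf{Pa}_i^{G}\big)

% Graph projection: distance from P to the closest distribution consistent with G
D_{KL}[\, P \,\|\, G \,]
  = \min_{Q \text{ consistent with } G} D_{KL}[\, P \,\|\, Q \,]
  = \mathcal{I} - \mathcal{I}^{G}

% If P is itself consistent with G, the distance is zero
P \text{ consistent with } G \;\Longrightarrow\; \mathcal{I} = \mathcal{I}^{G}
```

The last line follows because a distribution that factorizes according to G satisfies $H(X_1,\dots,X_n) = \sum_i H(X_i \mid \mathbf{Pa}_i^{G})$, so the multi-information $\sum_i H(X_i) - H(X_1,\dots,X_n)$ splits into one mutual-information term per variable.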
Optimizing Criteria
Two goals:
• Lose info wrt Gin
• Attain conditional independencies in Gout
Optimization objective: [equation with two terms]
• Force clusters to compress
• Minimize violations of conditional indep. in Gout
Additional Interpretation
Using properties of [symbol] we can rewrite [equation]
Thus, we can instead minimize [equation with two terms]
• Minimize information in Gin
• Maximize information in Gout
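One way to see the rewriting, sketched under the assumption that the first objective adds the information in Gin to a weighted distance from Gout. The names $\mathcal{L}_1$, $\mathcal{L}_2$, $\gamma$ and $\beta$, and the notation $\mathcal{I}^{G}$ for the sum of mutual informations between each variable and its parents in G, are assumptions of this sketch.

```latex
% Assumed form of the first objective: compress, and penalize violations of G_out
\mathcal{L}_1 = \mathcal{I}^{G_{in}} + \gamma \, D_{KL}[\, P \,\|\, G_{out} \,]

% P is consistent with G_in, so its multi-information equals I^{G_in}, and
% D_KL[P || G_out] = I - I^{G_out} = I^{G_in} - I^{G_out}
\mathcal{L}_1 = (1+\gamma)\,\mathcal{I}^{G_{in}} - \gamma\,\mathcal{I}^{G_{out}}

% Dividing by (1 + gamma) leaves the minimizer unchanged
\mathcal{L}_2 = \mathcal{I}^{G_{in}} - \beta\,\mathcal{I}^{G_{out}},
\qquad \beta = \frac{\gamma}{1+\gamma}
```

Read this way, minimizing $\mathcal{L}_2$ trades off less information along the edges of Gin against more information along the edges of Gout, which matches the two labels on the slide.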
Minimization Objective - Example: Symmetric Bottleneck
Recall:
• Input (fixed): P(A,B)
• Parameters we can control: P(TA|A), P(TB|B)
[Figure: Gin and Gout over A, B, TA and TB]
Characterization of Solutions
Thm: Minimal point if and only if [equation]
• d(tj,paj) - measure of “distortion” between tj and paj
• For example in symmetric bottleneck: [equation]
Finding Solutions
How can we find solutions?
Asynchronous update:
• Pick an index j
• Update P(Tj|paj)
Theorem:
• Asynchronous updates converge to (local) minima
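The update step can be illustrated on the simplest case, the original two-variable information bottleneck with a single cluster variable T. The sketch below uses the well-known self-consistent update P(t|a) ∝ P(t) exp(-β D_KL[P(B|a) ‖ P(B|t)]). It is not the multivariate asynchronous procedure itself: with one cluster variable there is only one index to pick. The function names, the random initialization, the fixed iteration count and the toy data are all assumptions.

```python
from math import exp, log
import random

def kl(p, q):
    """KL divergence in nats; q is floored to avoid division by zero."""
    return sum(pi * log(pi / max(qi, 1e-300))
               for pi, qi in zip(p, q) if pi > 0)

def iterative_ib(p_ab, n_clusters, beta, n_iters=200, seed=0):
    """Return P(t|a) as a list of rows, one row per value of A.

    Assumes every value of A has non-zero probability.
    """
    rng = random.Random(seed)
    n_a, n_b = len(p_ab), len(p_ab[0])
    p_a = [sum(row) for row in p_ab]
    p_b_given_a = [[p / p_a[a] for p in p_ab[a]] for a in range(n_a)]
    # Random soft initialization of P(t|a).
    q = []
    for _ in range(n_a):
        w = [rng.random() + 1e-3 for _ in range(n_clusters)]
        s = sum(w)
        q.append([x / s for x in w])
    for _ in range(n_iters):
        # P(t) = sum_a P(a) P(t|a)
        p_t = [sum(p_a[a] * q[a][t] for a in range(n_a))
               for t in range(n_clusters)]
        # P(b|t) = sum_a P(a,b) P(t|a) / P(t)
        p_b_given_t = [[sum(p_ab[a][b] * q[a][t] for a in range(n_a))
                        / max(p_t[t], 1e-300)
                        for b in range(n_b)]
                       for t in range(n_clusters)]
        # P(t|a) is proportional to P(t) exp(-beta * KL[P(B|a) || P(B|t)])
        for a in range(n_a):
            w = [p_t[t] * exp(-beta * kl(p_b_given_a[a], p_b_given_t[t]))
                 for t in range(n_clusters)]
            s = sum(w)
            q[a] = [x / s for x in w]
    return q

# Toy joint distribution (made up): rows are values of A, columns values of B.
p_ab = [[0.20, 0.05],
        [0.20, 0.05],
        [0.05, 0.20],
        [0.05, 0.20]]
q = iterative_ib(p_ab, n_clusters=2, beta=5.0)
```

Values of A with the same conditional distribution over B receive the same cluster assignment, because the update depends on a only through P(B|a). As on the slide, convergence is only to a local minimum, so different seeds can give different solutions.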
Example - 20 newsgroups
• 20,000 messages from 20 newsgroups [Lang 1995]
• A - newsgroup of the message
• B - word in the message
• P(a,b) - probability that choosing a random position in the corpus would select word b in a message in newsgroup a
• We applied symmetric bottleneck on both attributes
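The definition of P(a,b) amounts to counting (newsgroup, word) pairs over all word positions and normalizing by the corpus size. A minimal sketch follows; the function name, the lowercase whitespace tokenization and the two toy messages are assumptions for illustration and are not taken from the actual corpus or its preprocessing.

```python
from collections import Counter

def joint_from_messages(messages):
    """Estimate P(a,b) from (newsgroup, text) pairs.

    Each word position in the corpus contributes one count to the pair
    (newsgroup of the message, word at that position).
    """
    counts = Counter()
    for newsgroup, text in messages:
        for word in text.lower().split():
            counts[(newsgroup, word)] += 1
    total = sum(counts.values())
    return {pair: c / total for pair, c in counts.items()}

# Toy corpus (made up) with five word positions in total.
messages = [
    ("rec.autos", "car engine car"),
    ("sci.space", "orbit launch"),
]
p_ab = joint_from_messages(messages)
```

Because every position counts equally, longer messages and larger newsgroups carry proportionally more weight in P(a,b).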
20 Newsgroup: Symmetric Bottleneck
[Figure: axes word and Newsgroup]
20 Newsgroup: Symmetric Bottleneck
[Figure: P(TD,TW), axes word and Newsgroup]
Words: x file image encryption window dos mac … car turkish game team jesus gun hockey …
Newsgroups: comp.* misc.forsale sci.crypt sci.electronics alt.atheism rec.autos rec.motorcycles rec.sport.* sci.med sci.space soc.religion.christian talk.politics.*
20 Newsgroup: Symmetric Bottleneck
[Figure: P(TD,TW), axes word and Newsgroup]
Words: atheists christianity jesus bible sin faith …
Newsgroups: alt.atheism soc.religion.christian talk.religion.misc
Discussion
General framework:
• Defines a new family of optimization problems … and solutions
Future directions:
• Additional algorithms - agglomerative solutions
• Relation to generative models
• Parametric constraints in Gout
Ask about relation to EM [Charniak 2001]
Example: Parallel Bottleneck
[Figure: Gin and Gout over A, B, T1 and T2]