Multivariate Information Bottleneck
Noam Slonim, Princeton University, Lewis-Sigler Institute for Integrative Genomics
Nir Friedman and Naftali Tishby, Hebrew University, School of Computer Science and Engineering
Multivariate Information Bottleneck - Preview
• A general framework for specifying a new family of clustering problems
• Almost all of these problems are not treated by standard clustering approaches
• Insights and demonstrations of why these problems are important
• A general optimal solution for all these problems, based on a single information-theoretic principle
• Applications to text analysis, gene expression data, and more...
Multivariate IB – introduction
• Second half starts here. Summary so far: the first half presented a well-defined method (formulated as a variational principle) and 3 different algorithmic approaches.
• However, it was limited to one specific optimization problem, and we could think of other problems (e.g. symmetric ones).
• What follows lifts the first half to deal with a much richer family of problems (still work in progress).
- Original IB: compressing one variable while preserving the information about some other single variable
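As a reminder of what the original IB trades off, here is a minimal sketch that evaluates the standard IB functional L = I(T;X) - β I(T;Y) for a given assignment p(t|x). The variable names X, Y, T, the toy joint distribution, and the function names are assumptions made for illustration; they do not come from the slides.

```python
from math import log

def mutual_information(joint):
    """I(A;B) in nats for a joint distribution given as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        p * log(p / (pa[a] * pb[b]))
        for a, row in enumerate(joint)
        for b, p in enumerate(row)
        if p > 0
    )

def ib_functional(p_xy, p_t_given_x, beta):
    """Return (I(T;X), I(T;Y), L) with L = I(T;X) - beta * I(T;Y)."""
    n_x, n_y, n_t = len(p_xy), len(p_xy[0]), len(p_t_given_x[0])
    p_x = [sum(row) for row in p_xy]
    # p(x,t) = p(x) p(t|x)
    p_xt = [[p_x[x] * p_t_given_x[x][t] for t in range(n_t)] for x in range(n_x)]
    # p(t,y) = sum_x p(x,y) p(t|x), since T depends on Y only through X
    p_ty = [
        [sum(p_xy[x][y] * p_t_given_x[x][t] for x in range(n_x)) for y in range(n_y)]
        for t in range(n_t)
    ]
    i_tx = mutual_information(p_xt)
    i_ty = mutual_information(p_ty)
    return i_tx, i_ty, i_tx - beta * i_ty

# Toy joint (hypothetical): x0 and x1 share p(y|x), and so do x2 and x3.
p_xy = [[0.20, 0.05], [0.20, 0.05], [0.05, 0.20], [0.05, 0.20]]
hard = [[1, 0], [1, 0], [0, 1], [0, 1]]  # p(t|x) for a 2-cluster hard partition
i_tx, i_ty, loss = ib_functional(p_xy, hard, beta=5.0)
```

In this toy case the two merged values of X have identical p(y|x), so the 2-cluster T keeps all of I(X;Y) while I(T;X) drops to log 2.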
Multivariate IB – introduction (cont.)
• However, we could think of other problems, e.g. symmetric compression.
• Question: How to formulate and solve all such problems under one unifying principle?
(A few words about…) Bayesian networks
• A Bayes net over (X1,…,Xn) is a DAG G in which vertices correspond to the random variables
• P(X1,…,Xn) is consistent with G iff each Xi is independent of all the other (non-descendant) variables, given its parents Pa_i
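The independence condition above is equivalent to saying that the joint distribution factorizes over the graph. This is the standard Bayesian-network factorization, written out here as a reminder; the lowercase notation is an assumption and is not taken from the slide.

```latex
\[
  P(x_1,\dots,x_n) \;=\; \prod_{i=1}^{n} P\bigl(x_i \mid \mathbf{pa}_i\bigr)
\]
```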
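Multi-information and Bayes nets
• The information that (X1,…,Xn) contain about each other is captured by the multi-information: I(X1,…,Xn) = D_KL[ P(X1,…,Xn) || P(X1)···P(Xn) ]
- If P(X1,…,Xn) is consistent with G, then: I(X1,…,Xn) = Σ_i I(Xi; Pa_i)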
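The decomposition of multi-information into local terms can be checked numerically. The sketch below assumes the usual definition of multi-information as the KL divergence between the joint and the product of its marginals, and uses a hypothetical three-variable chain X1 → X2 → X3 whose conditional tables are made up for illustration.

```python
from itertools import product
from math import log

# Hypothetical chain X1 -> X2 -> X3 over binary variables.
p_x1 = [0.6, 0.4]
p_x2_given_x1 = [[0.9, 0.1], [0.2, 0.8]]
p_x3_given_x2 = [[0.7, 0.3], [0.1, 0.9]]

# The joint is built from the factorization implied by the chain.
joint = {
    (a, b, c): p_x1[a] * p_x2_given_x1[a][b] * p_x3_given_x2[b][c]
    for a, b, c in product(range(2), repeat=3)
}

def marginal(joint, axes):
    """Marginalize a dict-based joint onto the given tuple of axes."""
    out = {}
    for state, p in joint.items():
        key = tuple(state[i] for i in axes)
        out[key] = out.get(key, 0.0) + p
    return out

def multi_information(joint):
    """D_KL[ P(x1..xn) || prod_i P(xi) ] in nats."""
    n = len(next(iter(joint)))
    singles = [marginal(joint, (i,)) for i in range(n)]
    total = 0.0
    for state, p in joint.items():
        if p > 0:
            q = 1.0
            for i in range(n):
                q *= singles[i][(state[i],)]
            total += p * log(p / q)
    return total

def pairwise_mi(joint, i, j):
    """I(Xi;Xj), computed as the multi-information of the pair marginal."""
    return multi_information(marginal(joint, (i, j)))

total = multi_information(joint)
# Sum of local terms I(Xi; Pa_i): X2 has parent X1, X3 has parent X2.
local_sum = pairwise_mi(joint, 0, 1) + pairwise_mi(joint, 1, 2)
```

For a distribution built from the chain, `total` and `local_sum` agree up to floating-point error.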
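Original IB through Bayes net formulation
- Gin specifies what compresses what; Gout specifies what predicts what
- New generalized formulation: minimize the information in the Gin dependencies while maximizing the information in the Gout dependencies
- Which in this case means: compress one variable while preserving the information about the other; the information between the two input variables is constant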
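The equations on this slide did not survive extraction. The block below is a reconstruction based on the published multivariate IB formulation, not a copy of the slide. The symbols X, Y, T, β and the exact edge sets of G_in and G_out are assumptions.

```latex
\begin{align*}
  \mathcal{L} &= \mathcal{I}^{G_{in}} - \beta\,\mathcal{I}^{G_{out}}
    && \text{(generalized formulation)} \\
  G_{in}:\; X \to T,\; X \to Y
    \quad&\Rightarrow\quad \mathcal{I}^{G_{in}} = I(T;X) + I(X;Y)
    && \text{(what compresses what)} \\
  G_{out}:\; T \to Y
    \quad&\Rightarrow\quad \mathcal{I}^{G_{out}} = I(T;Y)
    && \text{(what predicts what)} \\
  \mathcal{L} &= I(T;X) + \underbrace{I(X;Y)}_{\text{constant}} - \beta\, I(T;Y)
\end{align*}
```

Dropping the constant term gives back the original IB functional I(T;X) - β I(T;Y).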
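Alternative formulation: preliminaries
- For a given DAG G, define a measure of how far P is from being consistent with G.
- For P which is consistent with Gin, this measure with respect to Gout is the difference between two terms: the real multi-information in P(X,T), and the multi-information as though P(X,T) is consistent with Gout.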
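The formulas for these preliminaries are missing from the extracted text. A plausible reconstruction, assuming the definitions used in the published multivariate IB work (the notation Q ⊨ G for "Q is consistent with G" is an assumption), is:

```latex
\begin{align*}
  D_{KL}\bigl[P \,\|\, G\bigr] &\equiv \min_{Q \,\models\, G} D_{KL}\bigl[P \,\|\, Q\bigr] \\
  D_{KL}\bigl[P(\mathbf{X},\mathbf{T}) \,\|\, G_{out}\bigr]
    &= \underbrace{\mathcal{I}(\mathbf{X},\mathbf{T})}_{\text{real multi-info in } P(\mathbf{X},\mathbf{T})}
     \;-\; \underbrace{\mathcal{I}^{G_{out}}}_{\text{multi-info as though } P \,\models\, G_{out}}
\end{align*}
```

The second line follows because the minimizing Q is the product of the conditionals P(x_i | pa_i) taken with respect to G_out, so the divergence reduces to the multi-information that the G_out structure fails to account for.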
Beyond the original IB [Slonim, Friedman, Tishby]
[Diagram: input variables, compression (bottleneck) variables, and parameters; Gin dependencies are minimized, Gout dependencies are maximized]
A simple example: Symmetric IB
[Figure: Gin (what compresses what) and Gout (what predicts what) for symmetric IB]
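To make the symmetric case concrete, the sketch below scores hard partitions of both variables with the objective I(T_X;X) + I(T_Y;Y) - β I(T_X;T_Y). This is an assumed form of the symmetric IB functional for the simplest choice of Gout; the toy matrix, the labels, and the function names are hypothetical.

```python
from math import log

def mutual_information(joint):
    """I(A;B) in nats for a joint distribution given as joint[a][b]."""
    pa = [sum(row) for row in joint]
    pb = [sum(col) for col in zip(*joint)]
    return sum(
        p * log(p / (pa[a] * pb[b]))
        for a, row in enumerate(joint)
        for b, p in enumerate(row)
        if p > 0
    )

def entropy(dist):
    """Shannon entropy in nats."""
    return -sum(p * log(p) for p in dist if p > 0)

def symmetric_ib(p_xy, row_labels, col_labels, beta):
    """Return (loss, I(T_X;T_Y)) for hard cluster labels on rows and columns."""
    n_tx, n_ty = max(row_labels) + 1, max(col_labels) + 1
    # p(t_x, t_y): sum p(x, y) over the members of each pair of clusters.
    p_tt = [[0.0] * n_ty for _ in range(n_tx)]
    for x, row in enumerate(p_xy):
        for y, p in enumerate(row):
            p_tt[row_labels[x]][col_labels[y]] += p
    # For hard assignments, I(T;X) equals the entropy H(T).
    compression = entropy([sum(r) for r in p_tt]) + entropy([sum(c) for c in zip(*p_tt)])
    preserved = mutual_information(p_tt)
    return compression - beta * preserved, preserved

# Toy joint with two clear blocks (hypothetical data).
p_xy = [
    [0.100, 0.100, 0.025, 0.025],
    [0.100, 0.100, 0.025, 0.025],
    [0.025, 0.025, 0.100, 0.100],
    [0.025, 0.025, 0.100, 0.100],
]
loss_good, info_good = symmetric_ib(p_xy, [0, 0, 1, 1], [0, 0, 1, 1], beta=10.0)
loss_bad, info_bad = symmetric_ib(p_xy, [0, 1, 0, 1], [0, 1, 0, 1], beta=10.0)
```

Both partitions compress equally (two balanced clusters per side), but only the block-aligned one keeps the information between the two sets of clusters, so it gets the lower loss.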
A multivariate formal optimal solution
- Where now d(Pa_j, t_j) is a generalized (KL) distortion measure…
- For example, in symmetric IB:
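The solution equations themselves are not preserved in the extracted text. The block below sketches their likely form, assuming the same exponential structure as the original IB solution. The symmetric case assumes the objective I(T_X;X) + I(T_Y;Y) - β I(T_X;T_Y), and the symbols Z, T_X, T_Y are assumptions.

```latex
\begin{align*}
  p(t_j \mid \mathbf{pa}_j)
    &= \frac{p(t_j)}{Z(\mathbf{pa}_j,\beta)}\,
       \exp\bigl(-\beta\, d(\mathbf{pa}_j, t_j)\bigr)
    && \text{(general form)} \\
  p(t_x \mid x)
    &= \frac{p(t_x)}{Z(x,\beta)}\,
       \exp\Bigl(-\beta\, D_{KL}\bigl[\,p(t_y \mid x)\,\|\,p(t_y \mid t_x)\,\bigr]\Bigr)
    && \text{(symmetric IB)} \\
  p(t_y \mid y)
    &= \frac{p(t_y)}{Z(y,\beta)}\,
       \exp\Bigl(-\beta\, D_{KL}\bigl[\,p(t_x \mid y)\,\|\,p(t_x \mid t_y)\,\bigr]\Bigr)
\end{align*}
```

The symmetric form comes from setting the derivative of the objective with respect to p(t_x|x) to zero and absorbing every term that depends only on x into the normalization Z(x,β).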
Multivariate IB algorithms – example for aIB [Slonim, Friedman, Tishby, 2002]
[Figure: agglomerative merging, from singletons W1, W2, W3, W4, W5, …, WN, through W1, W2, {W3,W4}, W5, …, WN, to a single cluster {W1,W2,…,WN}]
- Which pair to merge?
- Where now the merge cost is a generalized (JS) distortion measure…
- For example, in symmetric aIB:
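The "which pair to merge?" step can be illustrated with the merge cost of the original, single-sided aIB: the combined cluster weight times a Jensen-Shannon divergence. The generalized multivariate cost is not shown in the extracted text, so this sketch deliberately uses the simpler original criterion; the toy priors and conditionals are hypothetical.

```python
from math import log

def kl(p, q):
    """Kullback-Leibler divergence D_KL[p || q] in nats."""
    return sum(a * log(a / b) for a, b in zip(p, q) if a > 0)

def merge_cost(p_ti, p_tj, py_ti, py_tj):
    """(p(ti) + p(tj)) * JS[p(y|ti), p(y|tj)], weighted by normalized cluster priors."""
    w = p_ti + p_tj
    pi_i, pi_j = p_ti / w, p_tj / w
    mix = [pi_i * a + pi_j * b for a, b in zip(py_ti, py_tj)]
    return w * (pi_i * kl(py_ti, mix) + pi_j * kl(py_tj, mix))

def best_pair(priors, conditionals):
    """Return (cost, i, j) for the cheapest merge among all cluster pairs."""
    n = len(priors)
    return min(
        (merge_cost(priors[i], priors[j], conditionals[i], conditionals[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )

def information(priors, conditionals):
    """I(T;Y) = sum_t p(t) D_KL[p(y|t) || p(y)]."""
    n_y = len(conditionals[0])
    p_y = [sum(p * c[y] for p, c in zip(priors, conditionals)) for y in range(n_y)]
    return sum(p * kl(c, p_y) for p, c in zip(priors, conditionals))

# Four clusters with priors p(t) and conditionals p(y|t) (toy values).
priors = [0.25, 0.25, 0.25, 0.25]
conditionals = [[0.8, 0.2], [0.75, 0.25], [0.2, 0.8], [0.5, 0.5]]
cost, i, j = best_pair(priors, conditionals)
```

The cost of a merge equals the drop in I(T;Y) it causes, which is why the greedy step picks the pair with the smallest cost.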
Symmetric aIB compression: documents, words
• Accuracy of symmetric aIB vs. original aIB over 3 small datasets
• Word clusters provide a more robust representation…
Symmetric IB through Deterministic Annealing
- Data: 20,000 messages from 20 different discussion groups [Lang, 95]
- W – a word in the corpus
- C – the class (newsgroup) of the message
- P(W=‘bible’, C=‘alt.atheism’): the probability that choosing a random position in the corpus would select the word ‘bible’ in a message of the newsgroup (class) ‘alt.atheism’…
[Figure: joint distribution matrix, words × classes]
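The definition of P(W,C) as the probability of a random corpus position amounts to normalizing word-by-class token counts. A minimal sketch, using a made-up three-message corpus and whitespace tokenization (both are assumptions; the actual preprocessing is not described in the slides):

```python
from collections import Counter

# Hypothetical toy corpus of (newsgroup, message text) pairs.
messages = [
    ("alt.atheism", "the bible says"),
    ("alt.atheism", "bible and faith"),
    ("rec.autos", "the car engine"),
]

# Every token position contributes one count to its (word, class) cell.
counts = Counter(
    (word, group) for group, text in messages for word in text.split()
)
total = sum(counts.values())

# P(W=w, C=c): chance that a uniformly random position holds word w in class c.
p_wc = {pair: n / total for pair, n in counts.items()}
```

Here the corpus has 9 positions, 2 of which hold ‘bible’ in an ‘alt.atheism’ message, so that cell gets probability 2/9.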
Symmetric IB through Deterministic Annealing
[Figure: P(TC,TW), word clusters vs. newsgroup clusters]
- Word clusters: {x, file, image, encryption, window, dos, mac, …}; {car, turkish, game, team, jesus, gun, hockey, …}
- Newsgroup clusters: {comp.*, misc.forsale, sci.crypt, sci.electronics}; {alt.atheism, rec.autos, rec.motorcycles, rec.sport.*, sci.med, sci.space, soc.religion.christian, talk.politics.*}
Symmetric IB through Deterministic Annealing
[Figure: P(TC,TW), word clusters vs. newsgroup clusters]
- Word cluster: {atheists, christianity, jesus, bible, sin, faith, …}
- Newsgroup cluster: {alt.atheism, soc.religion.christian, talk.religion.misc}
Symmetric aIB compression: genes, samples
- Data: gene expression of 500 “informative” genes vs. 72 leukemia samples (Golub et al., 1999)
[Figure: expression matrix, genes × samples]
Symmetric aIB compression: genes, samples
- Data after symmetric aIB compression: 10 gene clusters × 8 sample clusters
- Gene identifiers shown on the slide: X00437_s_at, M12886_at, X76223_s_at, M59807_at, U23852_s_at, D00749_s_at, U89922_s_at, X03934_at, U50743_at, M21624_at, M28826_at, M37271_s_at, X59871_at, X14975_at, M16336_s_at, L05148_at, M28825_at
- Sample-cluster labels shown on the slide: ALL B-cell hosp1 ALL B-cell hosp1 ALL T-cell hosp1 Male BM B-cell BM B-cell AML AML hosp2 AML hosp3
Another example: parallel IB
• Consider a document collection with different topics and different writing styles:
[Figure: documents labeled topic1 to topic4, drawn in different writing styles]
Another example: parallel IB (cont.)
• One possible “legitimate” partition is by the topic:
[Figure: documents grouped into Topic1, Topic2, Topic3, Topic4]
Another example: parallel IB (cont.)
• And another possible “legitimate” partition is by the writing style:
[Figure: documents grouped into Style1, Style2, Style3]
• There might be more than one “legitimate” partition…
Parallel IB: solution
[Figure: Gin (minimize dependencies) and Gout (maximize dependencies)]
- Effective distortion:
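The effective distortion itself is missing from the extracted text. A plausible reconstruction, assuming k parallel compression variables T_1,…,T_k that each compress X and jointly predict Y (these symbols and the form of the objective are assumptions), is:

```latex
\begin{align*}
  \mathcal{L} &= \sum_{j=1}^{k} I(T_j;X) \;-\; \beta\, I(T_1,\dots,T_k;\,Y) \\
  d(x, t_j) &= \mathbb{E}_{p(\mathbf{t}_{-j} \mid x)}
     \Bigl[\, D_{KL}\bigl[\,p(y \mid x)\,\|\,p(y \mid \mathbf{t}_{-j}, t_j)\,\bigr] \Bigr]
\end{align*}
```

Here t_{-j} denotes the values of all compression variables other than T_j, so each partition is scored by how well it predicts Y in combination with the others.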
Parallel sIB: Text analysis results
• Data: ~1,500 “documents” taken from
  E. R. Burroughs: The Beasts of Tarzan & The Gods of Mars
  R. Kipling: The Jungle Book & Rewards and Fairies
- X1 corresponds to “documents”, X2 corresponds to words

Author     Book                   T1,a  T1,b  T2,a  T2,b
Burroughs  The Beasts of Tarzan    315     2   315     2
Burroughs  The Gods of Mars        407     0     1   406
Kipling    The Jungle Book           0   255   254     1
Kipling    Rewards and Fairies       0   367    42   325
Parallel sIB: Gene expression data results
- Data: gene expression of 500 “informative” genes vs. 72 leukemia samples (Golub et al., 1999)
- X1 corresponds to samples, X2 corresponds to genes

        T1,a  T1,b  T2,a  T2,b  T3,a  T3,b  T4,a  T4,b
AML       23     2    14    11    12    13    13    12
ALL        0    47    37    10     9    38    22    25
B-cell     0    38    37     1     6    32    20    18
T-cell     0     9     0     9     3     6     2     7
<PS>     .64   .72   .71   .66   .53   .76   .70   .69
Another example: Triplet IB
• Consider the following sequence data: s(1) s(2) s(3) … s(t-1) s(t) s(t+1) …
• Can we extract features such that their combination is informative about a symbol between them?
[Figure: variables Xp, Xm, Xn with compression variables Tp, Tn]
Triplet IB: solution
[Figure: Gin (minimize dependencies) and Gout (maximize dependencies)]
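The slide's graphs and equations are not preserved. Assuming Tp compresses the previous symbol Xp, Tn compresses the next symbol Xn, and the two together predict the middle symbol Xm, the objective would take the following form (a reconstruction, not a copy of the slide):

```latex
\[
  \mathcal{L} \;=\; I(T_p;X_p) \;+\; I(T_n;X_n) \;-\; \beta\, I(T_p,T_n;\,X_m)
\]
```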
Triplet IB data
- Data: Tarzan and the Jewels of Opar, Tarzan of the Apes, Tarzan the Terrible, Tarzan the Untamed, The Beasts of Tarzan, The Jungle Tales of Tarzan, The Return of Tarzan
- Example: “… As Tarzan ascended the platform his eyes narrowed angrily at the sight which met them… ‘What means this?’ he cried angrily…” (E. R. Burroughs, “Tarzan the Terrible”)
- Xp: 1st word in triplet; Xm: 2nd word in triplet; Xn: 3rd word in triplet
- Xm = {apemans, apes, eyes, girl, great, jungle, tarzan, time, two, way}
- Joint distribution P(Xp,Xm,Xn) of dimension 90 x 10 x 233
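Building the triplet distribution amounts to sliding a three-word window over the text and keeping the windows whose middle word is in the chosen vocabulary. A minimal sketch, assuming lowercased whitespace tokens; how the slides restricted Xp and Xn to 90 and 233 values is not stated, so that filtering is not modeled here, and `triplet_counts` is a hypothetical helper name.

```python
from collections import Counter

def triplet_counts(tokens, middle_vocab):
    """Count (previous, middle, next) triplets whose middle word is in middle_vocab."""
    counts = Counter()
    for prev, mid, nxt in zip(tokens, tokens[1:], tokens[2:]):
        if mid in middle_vocab:
            counts[(prev, mid, nxt)] += 1
    return counts

# Fragment of the example sentence, lowercased and stripped of punctuation.
text = "as tarzan ascended the platform his eyes narrowed angrily"
counts = triplet_counts(text.split(), {"tarzan", "eyes"})

# Normalizing the counts gives an empirical P(Xp, Xm, Xn).
total = sum(counts.values())
p_triplet = {t: n / total for t, n in counts.items()}
```

On this fragment the only retained triplets are (as, tarzan, ascended) and (his, eyes, narrowed), each with probability 0.5.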
Triplet sIB: Text analysis results
- Given Xp and Xn, two schemes to predict the middle word:
  Xm = argmax P( xm’ | tp,tn )
  Xm = argmax P( xm’ | xp,xn )
- Test on a NEW sequence, “The Son of Tarzan”:

               Precision (%)      Recall (%)
Xm             Tp,Tn   Xp,Xn    Tp,Tn   Xp,Xn
Apes (78)       43%     26%      17%     14%
Eyes (177)      83%     81%      32%     28%
Girl (240)      43%     30%       5%      1%
Great (219)     92%     92%      50%     48%
Jungle (241)    49%     54%      27%     24%
Tarzan (48)     41%     67%      40%     25%
Time (145)      70%     82%      48%     26%
Two (148)       41%     92%      11%      8%
Way (101)       60%     81%      28%     21%
Average         53%     55%      28%     22%
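The prediction scheme and its evaluation can be sketched as follows: for each context, predict the middle word with the highest empirical conditional probability, then compute per-word precision and recall on held-out triplets. The training and test triplets are invented for illustration, and treating unseen contexts as "no prediction" is an assumption, since the slides do not say how they were handled.

```python
from collections import Counter, defaultdict

def fit_predictor(triplets):
    """Map each (xp, xn) context to argmax over xm of the empirical P(xm | xp, xn)."""
    by_context = defaultdict(Counter)
    for xp, xm, xn in triplets:
        by_context[(xp, xn)][xm] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in by_context.items()}

def precision_recall(predictor, test_triplets, word):
    """Precision and recall for one middle word; unseen contexts yield no prediction."""
    tp = fp = fn = 0
    for xp, xm, xn in test_triplets:
        guess = predictor.get((xp, xn))
        if guess == word and xm == word:
            tp += 1
        elif guess == word:
            fp += 1
        elif xm == word:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical training and test triplets (xp, xm, xn).
train = [
    ("the", "jungle", "was"),
    ("the", "jungle", "was"),
    ("the", "apes", "was"),
    ("his", "eyes", "were"),
]
test = [
    ("the", "jungle", "was"),
    ("the", "apes", "was"),
    ("a", "jungle", "is"),
    ("his", "eyes", "were"),
]
predictor = fit_predictor(train)
jungle_scores = precision_recall(predictor, test, "jungle")
eyes_scores = precision_recall(predictor, test, "eyes")
```

Replacing each outer word by its cluster label before calling `fit_predictor` and `precision_recall` gives the cluster-based scheme. Clusters pool sparse contexts, so fewer test contexts go unseen, which is consistent with the higher recall reported for Tp,Tn in the table.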
Summary
• The IB method is a principled framework for extracting “informative” structure out of a joint distribution P(X1,X2).
• The multivariate IB extends this framework to extract “informative” structure from more complex joint distributions, P(X1,…,Xn), in various ways.
• This enables us to define and solve a new family of optimization problems under a single unifying information-theoretic principle.
• “Clustering” conceals a family of distinct problems which deserve special consideration. The multivariate IB framework makes it possible to define these sub-problems, solve them, and demonstrate their importance.
- References: www.cs.huji.ac.il/~noamm