"Ideal Parent" Structure Learning
Gal Elidan, with Iftach Nachman and Nir Friedman
School of Engineering & Computer Science, The Hebrew University, Jerusalem, Israel
Learning Structure
Input: instances (data over the variables). Output: a network structure.
Init: start with an initial structure.
1. Consider local changes
2. Score each candidate
3. Apply the best modification
[Figure: candidate structures over S, C, E, D, annotated with their scores]
Problems:
• Need to score many candidates
• Each one requires costly parameter optimization
Structure learning is often impractical.
The "Ideal Parent" approach:
• Approximate the improvement of changes (fast)
• Optimize & score only the promising candidates (slow)
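The loop on this slide is standard greedy hill-climbing over structures. A minimal Python sketch, where `score`, `local_moves`, and `apply_move` are hypothetical callables (our names, not the talk's):

```python
def greedy_structure_search(G, data, score, local_moves, apply_move):
    """Greedy hill-climbing: repeatedly score all local changes and
    apply the best one until no change improves the score."""
    current = score(G, data)
    while True:
        # 1. consider local changes; 2. score each candidate
        scored = [(score(apply_move(G, m), data), m) for m in local_moves(G)]
        if not scored:
            return G
        best_score, best_move = max(scored, key=lambda t: t[0])
        if best_score <= current:
            return G  # 3. stop when no modification helps
        G, current = apply_move(G, best_move), best_score
```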
Linear Gaussian Networks
[Figure: example network over A, B, C, D, E with a linear Gaussian CPD P(E|C)]
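The slide's formula did not survive extraction; the standard linear Gaussian CPD it refers to is (coefficient and variance symbols are our notation):

```latex
P(X \mid u_1, \dots, u_k) =
  \mathcal{N}\Big( \sum_i \alpha_i u_i + \beta, \; \sigma^2 \Big),
\qquad\text{i.e.}\qquad
x = \sum_i \alpha_i u_i + \beta + \epsilon,
\quad \epsilon \sim \mathcal{N}(0, \sigma^2)
```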
The "Ideal Parent" Idea
Goal: score only promising candidates.
Step 1: Compute the optimal hypothetical ("ideal") parent: given the child profile X and the current prediction Pred(X|U), the ideal profile is the one a new parent Y would need for Pred(X|U,Y) to match the child exactly.
Step 2: Search among the potential parents (Z1, ..., Z4 in the figure) for the profile most "similar" to the ideal profile.
Step 3: Add the new parent Z and optimize the parameters, yielding Predicted(X|U,Z).
[Figure: parent profile U, child profile X, ideal profile Y, and candidate profiles Z1-Z4 across the instances]
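For the linear Gaussian case, Step 1 has a closed form: the ideal profile is the residual that a new parent with coefficient 1 would have to explain. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def ideal_profile(x, U, alpha, beta=0.0):
    """Step 1: the profile y such that adding a parent with values y
    and coefficient 1 would predict the child exactly.

    x: (M,) child values over M instances
    U: (M, k) values of the current parents
    alpha: (k,) current linear coefficients
    """
    return x - (U @ alpha + beta)
```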
Choosing the Best Parent
Our goal: choose the candidate Z that maximizes the likelihood improvement of adding Z as a parent of X.
We define a similarity measure between the ideal profile y and a candidate profile z.
Theorem: the similarity C1(y, z) = ⟨y, z⟩² / (2σ²⟨z, z⟩) equals the likelihood improvement when only the coefficient of z is optimized.
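The slide's equations were lost, but the theorem can be reconstructed by direct optimization: adding a parent with profile z and coefficient c leaves residual y - cz, so with the variance held fixed,

```latex
\Delta\ell(c) = \frac{1}{2\sigma^2}
  \Big( \langle y, y \rangle - \langle y - cz,\, y - cz \rangle \Big),
\qquad
c^* = \frac{\langle y, z \rangle}{\langle z, z \rangle},
\qquad
\Delta\ell(c^*) = \frac{\langle y, z \rangle^2}{2\sigma^2 \langle z, z \rangle}
  \;=\; C_1(y, z)
```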
Similarity vs. Score
[Figure: scatter plots of the C1 and C2 similarity measures against the true score]
• C2 is more accurate (in C1, the effect of the fixed variance is large)
• C1 will be useful later
We now have an efficient approximation for the score.
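A sketch of the two measures as we reconstruct them: C1 is the improvement derived above, while C2 additionally re-optimizes the variance, which makes the improvement depend only on the angle between the two profiles (function names are ours):

```python
import numpy as np

def c1(y, z, sigma2):
    # Improvement when only the coefficient of z is optimized
    # (the variance sigma^2 is held fixed).
    return np.dot(y, z) ** 2 / (2.0 * sigma2 * np.dot(z, z))

def c2(y, z):
    # Improvement when the variance is re-optimized as well:
    # depends only on the angle between ideal and candidate profiles.
    M = len(y)
    cos2 = np.dot(y, z) ** 2 / (np.dot(y, y) * np.dot(z, z))
    return -(M / 2.0) * np.log(1.0 - cos2)
```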
Ideal Parent in Search
• Structure search involves: O(N²) add parent, O(N·E) replace parent, O(E) delete parent, O(E) reverse edge
• The vast majority of evaluations are replaced by the ideal approximation
• Only the best K candidates per family are optimized and scored
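The filtering step is then just a ranking: evaluate the cheap measure for every candidate, and pay the full optimize-and-score cost only for the top K. A sketch using the `c2` measure from above:

```python
def top_k_candidates(y, candidates, K):
    """Rank candidate parent profiles by the cheap similarity and keep
    only the K most promising for full parameter optimization."""
    ranked = sorted(candidates, key=lambda z: c2(y, z), reverse=True)
    return ranked[:K]
```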
Gene Expression Experiment
4 gene expression datasets with 44 (Amino), 89 (Metabolism), and 173 (2x Conditions) variables.
[Figure: test log-likelihood and speedup as a function of K, relative to greedy search]
• 0.4%-3.6% of changes evaluated
• Speedup: 1.8-2.7
Scope
Conditional probability distributions (CPDs) of the form x = g(u1, ..., uk) + ε, where g is a link function and ε is white noise.
Examples: chemical reaction, linear Gaussian, sigmoid Gaussian.
General requirement: g can be any function that is invertible with respect to each parent ui.
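Two instances of this template (the slide's own formulas were lost; the sigmoid form below is the standard one):

```latex
\text{linear Gaussian:}\quad
  g(u_1,\dots,u_k) = \sum_i \alpha_i u_i
\qquad\qquad
\text{sigmoid Gaussian:}\quad
  g(u_1,\dots,u_k) = \frac{1}{1 + e^{-\sum_i \alpha_i u_i}}
```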
Sigmoid Gaussian CPD
Problem: no simple form for the similarity measures.
Solution: a linear approximation around Y = 0.
[Figure: the link g(z) and the exact vs. approximate likelihood as a function of Z, for instances X = 0.5 and X = 0.85]
The sensitivity to Z depends on the gradient of g at each specific instance.
Sigmoid Gaussian CPD (cont.)
[Figure: equi-likelihood potentials for Z at X = 0.5 and X = 0.85, before and after the per-instance gradient correction]
After the gradient correction we can use the same similarity measure as in the linear case.
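One plausible reading of the construction, as a hedged sketch: invert the link to get the ideal profile, then weight each instance by the gradient of g at its operating point, so that instances on the flat tails of the sigmoid count less. All names and the exact weighting here are our assumptions, not the paper's formulas:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_ideal_profile(x, U, alpha):
    # Invert the link: the profile a new unit-coefficient parent would
    # need for the mean prediction to match the child exactly.
    p = np.clip(x, 1e-6, 1.0 - 1e-6)   # keep the logit finite
    return np.log(p / (1.0 - p)) - U @ alpha

def gradient_weights(U, alpha, y):
    # Per-instance sensitivity: g'(t) = g(t)(1 - g(t)) at the operating
    # point; this plays the role of the slide's gradient correction.
    g = sigmoid(U @ alpha + y)
    return g * (1.0 - g)
```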
Sigmoid Gene Expression
4 gene expression datasets with 44 (Amino), 89 (Metabolism), and 173 (Conditions) variables.
[Figure: test log-likelihood and speedup as a function of K, relative to greedy search]
• 2.2%-6.1% of moves evaluated
• 18-30 times faster
Adding New Hidden Variables
Idea: introduce a hidden parent H for nodes with similar ideal profiles.
[Figure: children X1-X5 with ideal profiles Y1-Y5; a hidden parent H is added above a cluster of them]
For the linear Gaussian case, the summed similarity of the children's ideal profiles to a common profile h bounds the likelihood improvement.
Challenge: find the profile h that maximizes this bound.
Scoring a Parent
• The summed improvement over the cluster is (up to constants) the Rayleigh quotient hᵀ(YYᵀ)h / hᵀh, where Y is the matrix whose columns are the ideal profiles yi
• The optimal h* must lie in the span of the columns of Y
• h* is the eigenvector of YYᵀ with the largest eigenvalue
• Setting h = Yv and using the above (with the Gram matrix YᵀY invertible), finding h* amounts to solving an |A| x |A| eigenvector problem, where |A| = size of the cluster
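A sketch of the resulting computation (our own code; the matrix layout and normalization are assumptions):

```python
import numpy as np

def best_hidden_profile(Y):
    """Y: (M, n) matrix whose columns are the n ideal profiles of a
    cluster. The optimal hidden profile h* is the top eigenvector of
    Y Y^T; solving the n x n Gram-matrix problem instead is cheap when
    the cluster is much smaller than the number of instances M."""
    gram = Y.T @ Y                      # computed once per cluster
    vals, vecs = np.linalg.eigh(gram)   # eigenvalues in ascending order
    v = vecs[:, -1]                     # eigenvector of largest eigenvalue
    h = Y @ v                           # lift back to instance space
    return h / np.linalg.norm(h)
```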
Finding the Best Cluster
• Compute cluster scores using the Gram matrix of the ideal profiles, which is computed only once
• Greedily agglomerate children with similar ideal profiles into candidate clusters, scoring each candidate
[Figure: agglomeration of children X1-X4 into candidate clusters, each annotated with its score]
• Select the cluster with the highest score
• Add a hidden parent for it and continue with the search
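Since a cluster's score is the top eigenvalue of the corresponding principal submatrix of the Gram matrix, candidate clusters can be scored without touching the instances again. A simplified sketch (our names; the paper's exact agglomeration strategy may differ):

```python
import numpy as np

def cluster_score(gram, idx):
    # Score of a cluster = largest eigenvalue of the principal
    # submatrix of the (precomputed) Gram matrix of ideal profiles.
    sub = gram[np.ix_(idx, idx)]
    return np.linalg.eigvalsh(sub)[-1]

def best_pair(Y):
    # Seed the agglomeration with the best-scoring pair of children.
    gram = Y.T @ Y                      # computed only once
    n = gram.shape[0]
    return max(((cluster_score(gram, [i, j]), (i, j))
                for i in range(n) for j in range(i + 1, n)),
               key=lambda t: t[0])
```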
Bipartite Network
Instances sampled from a biological expert network with 7 (hidden) parents and 141 (observed) children.
[Figure: train and test log-likelihood vs. number of instances, for Greedy, Ideal K=2, Ideal K=5, and the gold-standard network]
• Speedup is roughly x10
• Greedy takes over 2.5 days!
Summary
• New method for significantly speeding up structure learning in continuous variable networks
• Offers a promising time vs. performance tradeoff
• Guided insertion of new hidden variables
Future work
• Improve cluster identification for the non-linear case
• Explore additional distributions and the relation to GLMs
• Combine the ideal parent approach as a plug-in with other search approaches