"Ideal Parent" Structure Learning
Gal Elidan, with Iftach Nachman and Nir Friedman
School of Engineering & Computer Science, The Hebrew University, Jerusalem, Israel
Learning Structure
Input: instances (data over the variables). Output: a network structure.
Init: start with an initial structure.
1. Consider local changes
2. Score each candidate
3. Apply the best modification
[Figure: candidate structures over S, C, E, D, annotated with their scores]
Problems:
• Need to score many candidates
• Each one requires costly parameter optimization
Structure learning is often impractical.
The "Ideal Parent" approach:
• Approximate the improvement of changes (fast)
• Optimize & score only the promising candidates (slow)
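The loop on this slide is standard greedy hill-climbing over structures. A minimal Python sketch, where `score`, `local_moves`, and `apply_move` are hypothetical callables (our names, not the talk's):

```python
def greedy_structure_search(G, data, score, local_moves, apply_move):
    """Greedy hill-climbing: repeatedly score all local changes and
    apply the best one until no change improves the score."""
    current = score(G, data)
    while True:
        # 1. consider local changes; 2. score each candidate
        scored = [(score(apply_move(G, m), data), m) for m in local_moves(G)]
        if not scored:
            return G
        best_score, best_move = max(scored, key=lambda t: t[0])
        if best_score <= current:
            return G  # 3. stop when no modification helps
        G, current = apply_move(G, best_move), best_score
```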
Linear Gaussian Networks
[Figure: example network over A, B, C, D, E with a linear Gaussian CPD P(E|C)]
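The slide's formula did not survive extraction; the standard linear Gaussian CPD it refers to is (coefficient and variance symbols are our notation):

```latex
P(X \mid u_1, \dots, u_k) =
  \mathcal{N}\Big( \sum_i \alpha_i u_i + \beta, \; \sigma^2 \Big),
\qquad\text{i.e.}\qquad
x = \sum_i \alpha_i u_i + \beta + \epsilon,
\quad \epsilon \sim \mathcal{N}(0, \sigma^2)
```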
The "Ideal Parent" Idea
Goal: score only promising candidates.
Step 1: Compute the optimal hypothetical ("ideal") parent: given the child profile X and the current prediction Pred(X|U), the ideal profile is the one a new parent Y would need for Pred(X|U,Y) to match the child exactly.
Step 2: Search among the potential parents (Z1, ..., Z4 in the figure) for the profile most "similar" to the ideal profile.
Step 3: Add the new parent Z and optimize the parameters, yielding Predicted(X|U,Z).
[Figure: parent profile U, child profile X, ideal profile Y, and candidate profiles Z1-Z4 across the instances]
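For the linear Gaussian case, Step 1 has a closed form: the ideal profile is the residual that a new parent with coefficient 1 would have to explain. A minimal sketch (function and variable names are ours):

```python
import numpy as np

def ideal_profile(x, U, alpha, beta=0.0):
    """Step 1: the profile y such that adding a parent with values y
    and coefficient 1 would predict the child exactly.

    x: (M,) child values over M instances
    U: (M, k) values of the current parents
    alpha: (k,) current linear coefficients
    """
    return x - (U @ alpha + beta)
```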
Choosing the Best Parent
Our goal: choose the candidate Z that maximizes the likelihood improvement of adding Z as a parent of X.
We define a similarity measure between the ideal profile y and a candidate profile z.
Theorem: the similarity C1(y, z) = ⟨y, z⟩² / (2σ²⟨z, z⟩) equals the likelihood improvement when only the coefficient of z is optimized.
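The slide's equations were lost, but the theorem can be reconstructed by direct optimization: adding a parent with profile z and coefficient c leaves residual y - cz, so with the variance held fixed,

```latex
\Delta\ell(c) = \frac{1}{2\sigma^2}
  \Big( \langle y, y \rangle - \langle y - cz,\, y - cz \rangle \Big),
\qquad
c^* = \frac{\langle y, z \rangle}{\langle z, z \rangle},
\qquad
\Delta\ell(c^*) = \frac{\langle y, z \rangle^2}{2\sigma^2 \langle z, z \rangle}
  \;=\; C_1(y, z)
```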
Similarity vs. Score
[Figure: scatter plots of the C1 and C2 similarity measures against the true score]
• C2 is more accurate (in C1, the effect of the fixed variance is large)
• C1 will be useful later
We now have an efficient approximation for the score.
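A sketch of the two measures as we reconstruct them: C1 is the improvement derived above, while C2 additionally re-optimizes the variance, which makes the improvement depend only on the angle between the two profiles (function names are ours):

```python
import numpy as np

def c1(y, z, sigma2):
    # Improvement when only the coefficient of z is optimized
    # (the variance sigma^2 is held fixed).
    return np.dot(y, z) ** 2 / (2.0 * sigma2 * np.dot(z, z))

def c2(y, z):
    # Improvement when the variance is re-optimized as well:
    # depends only on the angle between ideal and candidate profiles.
    M = len(y)
    cos2 = np.dot(y, z) ** 2 / (np.dot(y, y) * np.dot(z, z))
    return -(M / 2.0) * np.log(1.0 - cos2)
```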
Ideal Parent in Search
• Structure search involves: O(N²) add parent, O(N·E) replace parent, O(E) delete parent, O(E) reverse edge
• The vast majority of evaluations are replaced by the ideal approximation
• Only the best K candidates per family are optimized and scored
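The filtering step is then just a ranking: evaluate the cheap measure for every candidate, and pay the full optimize-and-score cost only for the top K. A sketch using the `c2` measure from above:

```python
def top_k_candidates(y, candidates, K):
    """Rank candidate parent profiles by the cheap similarity and keep
    only the K most promising for full parameter optimization."""
    ranked = sorted(candidates, key=lambda z: c2(y, z), reverse=True)
    return ranked[:K]
```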
Gene Expression Experiment
4 gene expression datasets with 44 (Amino), 89 (Metabolism), and 173 (2x Conditions) variables.
[Figure: test log-likelihood and speedup as a function of K, relative to greedy search]
• 0.4%-3.6% of changes evaluated
• Speedup: 1.8-2.7
Scope
Conditional probability distributions (CPDs) of the form x = g(u1, ..., uk) + ε, where g is a link function and ε is white noise.
Examples: chemical reaction, linear Gaussian, sigmoid Gaussian.
General requirement: g can be any function that is invertible with respect to each parent ui.
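Two instances of this template (the slide's own formulas were lost; the sigmoid form below is the standard one):

```latex
\text{linear Gaussian:}\quad
  g(u_1,\dots,u_k) = \sum_i \alpha_i u_i
\qquad\qquad
\text{sigmoid Gaussian:}\quad
  g(u_1,\dots,u_k) = \frac{1}{1 + e^{-\sum_i \alpha_i u_i}}
```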
Sigmoid Gaussian CPD
Problem: no simple form for the similarity measures.
Solution: a linear approximation around Y = 0.
[Figure: the link g(z) and the exact vs. approximate likelihood as a function of Z, for instances X = 0.5 and X = 0.85]
The sensitivity to Z depends on the gradient of g at each specific instance.
Sigmoid Gaussian CPD (cont.)
[Figure: equi-likelihood potentials for Z at X = 0.5 and X = 0.85, before and after the per-instance gradient correction]
After the gradient correction we can use the same similarity measure as in the linear case.
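One plausible reading of the construction, as a hedged sketch: invert the link to get the ideal profile, then weight each instance by the gradient of g at its operating point, so that instances on the flat tails of the sigmoid count less. All names and the exact weighting here are our assumptions, not the paper's formulas:

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def sigmoid_ideal_profile(x, U, alpha):
    # Invert the link: the profile a new unit-coefficient parent would
    # need for the mean prediction to match the child exactly.
    p = np.clip(x, 1e-6, 1.0 - 1e-6)   # keep the logit finite
    return np.log(p / (1.0 - p)) - U @ alpha

def gradient_weights(U, alpha, y):
    # Per-instance sensitivity: g'(t) = g(t)(1 - g(t)) at the operating
    # point; this plays the role of the slide's gradient correction.
    g = sigmoid(U @ alpha + y)
    return g * (1.0 - g)
```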
Sigmoid Gene Expression
4 gene expression datasets with 44 (Amino), 89 (Metabolism), and 173 (Conditions) variables.
[Figure: test log-likelihood and speedup as a function of K, relative to greedy search]
• 2.2%-6.1% of moves evaluated
• 18-30 times faster
Adding New Hidden Variables
Idea: introduce a hidden parent H for nodes with similar ideal profiles.
[Figure: children X1-X5 with ideal profiles Y1-Y5; a hidden parent H is added above a cluster of them]
For the linear Gaussian case, the summed similarity of the children's ideal profiles to a common profile h bounds the likelihood improvement.
Challenge: find the profile h that maximizes this bound.
Scoring a Parent
• The summed improvement over the cluster is (up to constants) the Rayleigh quotient hᵀ(YYᵀ)h / hᵀh, where Y is the matrix whose columns are the ideal profiles yi
• The optimal h* must lie in the span of the columns of Y
• h* is the eigenvector of YYᵀ with the largest eigenvalue
• Setting h = Yv and using the above (with the Gram matrix YᵀY invertible), finding h* amounts to solving an |A| x |A| eigenvector problem, where |A| = size of the cluster
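A sketch of the resulting computation (our own code; the matrix layout and normalization are assumptions):

```python
import numpy as np

def best_hidden_profile(Y):
    """Y: (M, n) matrix whose columns are the n ideal profiles of a
    cluster. The optimal hidden profile h* is the top eigenvector of
    Y Y^T; solving the n x n Gram-matrix problem instead is cheap when
    the cluster is much smaller than the number of instances M."""
    gram = Y.T @ Y                      # computed once per cluster
    vals, vecs = np.linalg.eigh(gram)   # eigenvalues in ascending order
    v = vecs[:, -1]                     # eigenvector of largest eigenvalue
    h = Y @ v                           # lift back to instance space
    return h / np.linalg.norm(h)
```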
Finding the Best Cluster
• Compute cluster scores using the Gram matrix of the ideal profiles, which is computed only once
• Greedily agglomerate children with similar ideal profiles into candidate clusters, scoring each candidate
[Figure: agglomeration of children X1-X4 into candidate clusters, each annotated with its score]
• Select the cluster with the highest score
• Add a hidden parent for it and continue with the search
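Since a cluster's score is the top eigenvalue of the corresponding principal submatrix of the Gram matrix, candidate clusters can be scored without touching the instances again. A simplified sketch (our names; the paper's exact agglomeration strategy may differ):

```python
import numpy as np

def cluster_score(gram, idx):
    # Score of a cluster = largest eigenvalue of the principal
    # submatrix of the (precomputed) Gram matrix of ideal profiles.
    sub = gram[np.ix_(idx, idx)]
    return np.linalg.eigvalsh(sub)[-1]

def best_pair(Y):
    # Seed the agglomeration with the best-scoring pair of children.
    gram = Y.T @ Y                      # computed only once
    n = gram.shape[0]
    return max(((cluster_score(gram, [i, j]), (i, j))
                for i in range(n) for j in range(i + 1, n)),
               key=lambda t: t[0])
```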
Bipartite Network
Instances sampled from a biological expert network with 7 (hidden) parents and 141 (observed) children.
[Figure: train and test log-likelihood vs. number of instances, for Greedy, Ideal K=2, Ideal K=5, and the gold-standard network]
• Speedup is roughly x10
• Greedy takes over 2.5 days!
Summary
• New method for significantly speeding up structure learning in continuous variable networks
• Offers a promising time vs. performance tradeoff
• Guided insertion of new hidden variables
Future work
• Improve cluster identification for the non-linear case
• Explore additional distributions and the relation to GLMs
• Combine the ideal parent approach as a plug-in with other search approaches