Focused Belief Propagation for Query-Specific Inference

Focused Belief Propagation for Query-Specific Inference Anton ChechetkaCarlos Guestrin 14 May 2010

Query-Specific inference problem query known not interesting Using information about the queryto speed up convergence of belief propagation for the query marginals

Graphical models • Talk focus: pairwise Markov random fields • Paper:arbitrary factor graphs (more general) • Probability is a product of local potentials • Graphical structure •  Compact representation •  Intractable inference •  Approximate inference often works well in practice potential k r h i variable j s u

(loopy) Belief Propagation • Passing messages along edges • Variable belief: • Update rule: • Result: all single-variable beliefs k r h i j s u

(loopy) Belief Propagation • Update rule: • Message dependencies are local: • Round–robin schedule • Fix message order • Apply updates in that order until convergence dependence k r h i dep. j s u dependence

Dynamic update prioritization large change small change • Fixed update sequence is not the best option • Dynamic update scheduling can speed up convergence • Tree-Reweighted BP [Wainwright et. al., AISTATS 2003] • Residual BP [Elidan et. al. UAI 2006] • Residual BP  apply the largest change first large change small change 1 2 large change small change informative update wasted computation

Residual BP [Elidan et. al., UAI 2006] • Update rule: • Pick edge with largest residual • Update new old More effort on the difficult parts of the model  But no query 

Our contributions • Using weighted residuals to prioritize updates • Define message weights reflecting the importance of the message to the query • Computing importance weights efficiently • Experiments: faster convergence on large relational models

Why edge importance weights? query residual < residualwhich to update?? • Residual BP updates • no influence on the query • wasted computation • want to update • influence on the queryin the future Our work  max approx. eventual effect on P(query) Residual BP  max immediate residual reduction

Query-Specific BP • Update rule: • Pick edge with • Update new old edgeimportance the only change! Rest of the talk: defining and computing edge importance

Edge importance base case • Pick edge with approximate eventual update effect on P(Q) query Base case: edge directly connected to the queryAji=?? k r h i j s u change in query belief change in message ||P(NEW)(Q)  P(OLD)(Q)||  ||m(NEW)  m(OLD)||  1 1 ji ji ||m|| ||P(Q)|| tight bound ji

Edge importance one step away query mji mji Edge one step away from the query: Arj=?? sup sup mrj mrj ||P(Q)||  ||m || k r over values ofall other messages h ji i change in query belief  j s u ||m||  rj change in message message importance can compute in closed formlooking at only fji [Mooij, Kappen; 2007]

Edge importance general case  Base case: Aji=1 k r h  query  i One step away: Arj= j s u mji sup mrj Generalization? • expensive to compute • bound may be infinite ||P(Q)||  ||msh|| P(Q) sup msh P(Q) mhr mrj mji sup sup sup sup msh    msh mrj mhr  sensitivity(): max impact along the path 

Edge importance general case 2  1 query    k r h sensitivity(): max impact along the path  i P(Q) mhr mrj mji j s u sup sup sup sup msh • Ash= max all paths from to query sensitivity()  msh mhr mrj h There are a lot of paths in a graph,trying out every one is intractable 

Efficient edge importance computation There are a lot of paths in a graph,trying out every one is intractable  always decreases as the path grows Dijkstra’s(shortest paths) alg. will efficiently find max-sensitivity pathsfor every edge  always 1 always 1 always 1   sensitivity( hrji) = decomposes into individual edge contributions mhr mrj mji sup sup sup mhr msh mrj • A= max all paths from to query sensitivity()

Query-Specific BP • Run Dijkstra’salgstarting at query to get edge weights • Pick edge with largest weighted residual • Update Aji= max all paths from itoquery sensitivity() More effort on the difficult parts of the model and relevant Takes into account not only graphical structure, but also strength of dependencies

Outline • Using weighted residuals to prioritize updates • Define message weights reflecting the importance of the message to the query • Computing importance weights efficiently • As an initialization step before residual BP • Restore anytime behavior of residual BP • Experiments: faster convergence on large relational models

Big picture Before After ? ? long initialization all marginals all marginals anytimeproperty Dijkstra BP query marginals all marginals This is broken!

Interleaving BP and Dijkstra’s • Dijkstra’sexpands the highest weight edges first • Can pause it at any time and get the most relevant submodel query full model Dijkstra Dijkstra When to stop BP and resume Dijkstra’s? BP BP BP Dijkstra …

Interleaving • Dijkstra’s expands the highest weight edges first query expanded onprevious iteration not yet expanded just expanded minexpanded edgesA  A suppose upper bound on priority actual priority of  M  minexpanded A no need to expand further at this point

Any-Time query-specific BP query Query-specific BP: Dijkstra’s alg. BP updates k r same BP update sequence! i j s Anytime QSBP:

Experiments – the setup • Relational model: semi-supervised people recognition in collection of images [with Denver Dash and MatthaiPhilipose @ Intel] • One variable per person • Agreement potentials for people who look alike • Disagreement potentials for people in the same image • 2K variables, 900K factors • Error measure: KL from the fixed point of BP agree disagree

Experiments – convergence speed AnytimeQuery-Specific BP(with Dijkstra’s interleaving) Residual BP Query-Specific BP better

Conclusions • Prioritize updates by weighted residuals • Takes query info into account • Importance weights depend on both the graphical structure and strength of local dependencies • Efficient computation of importance weights • Much faster convergence on large relational models Thank you!

Focused Belief Propagation for Query-Specific Inference