240 likes | 419 Views
Focused Belief Propagation for Query-Specific Inference. Anton Chechetka Carlos Guestrin. 14 May 2010. Query-Specific inference problem. q uery . known. n ot interesting. Using information about the query to speed up convergence of belief propagation for the query marginals.
E N D
Focused Belief Propagation for Query-Specific Inference Anton ChechetkaCarlos Guestrin 14 May 2010
Query-Specific inference problem query known not interesting Using information about the queryto speed up convergence of belief propagation for the query marginals
Graphical models • Talk focus: pairwise Markov random fields • Paper:arbitrary factor graphs (more general) • Probability is a product of local potentials • Graphical structure • Compact representation • Intractable inference • Approximate inference often works well in practice potential k r h i variable j s u
(loopy) Belief Propagation • Passing messages along edges • Variable belief: • Update rule: • Result: all single-variable beliefs k r h i j s u
(loopy) Belief Propagation • Update rule: • Message dependencies are local: • Round–robin schedule • Fix message order • Apply updates in that order until convergence dependence k r h i dep. j s u dependence
Dynamic update prioritization large change small change • Fixed update sequence is not the best option • Dynamic update scheduling can speed up convergence • Tree-Reweighted BP [Wainwright et. al., AISTATS 2003] • Residual BP [Elidan et. al. UAI 2006] • Residual BP apply the largest change first large change small change 1 2 large change small change informative update wasted computation
Residual BP [Elidan et. al., UAI 2006] • Update rule: • Pick edge with largest residual • Update new old More effort on the difficult parts of the model But no query
Our contributions • Using weighted residuals to prioritize updates • Define message weights reflecting the importance of the message to the query • Computing importance weights efficiently • Experiments: faster convergence on large relational models
Why edge importance weights? query residual < residualwhich to update?? • Residual BP updates • no influence on the query • wasted computation • want to update • influence on the queryin the future Our work max approx. eventual effect on P(query) Residual BP max immediate residual reduction
Query-Specific BP • Update rule: • Pick edge with • Update new old edgeimportance the only change! Rest of the talk: defining and computing edge importance
Edge importance base case • Pick edge with approximate eventual update effect on P(Q) query Base case: edge directly connected to the queryAji=?? k r h i j s u change in query belief change in message ||P(NEW)(Q) P(OLD)(Q)|| ||m(NEW) m(OLD)|| 1 1 ji ji ||m|| ||P(Q)|| tight bound ji
Edge importance one step away query mji mji Edge one step away from the query: Arj=?? sup sup mrj mrj ||P(Q)|| ||m || k r over values ofall other messages h ji i change in query belief j s u ||m|| rj change in message message importance can compute in closed formlooking at only fji [Mooij, Kappen; 2007]
Edge importance general case Base case: Aji=1 k r h query i One step away: Arj= j s u mji sup mrj Generalization? • expensive to compute • bound may be infinite ||P(Q)|| ||msh|| P(Q) sup msh P(Q) mhr mrj mji sup sup sup sup msh msh mrj mhr sensitivity(): max impact along the path
Edge importance general case 2 1 query k r h sensitivity(): max impact along the path i P(Q) mhr mrj mji j s u sup sup sup sup msh • Ash= max all paths from to query sensitivity() msh mhr mrj h There are a lot of paths in a graph,trying out every one is intractable
Efficient edge importance computation There are a lot of paths in a graph,trying out every one is intractable always decreases as the path grows Dijkstra’s(shortest paths) alg. will efficiently find max-sensitivity pathsfor every edge always 1 always 1 always 1 sensitivity( hrji) = decomposes into individual edge contributions mhr mrj mji sup sup sup mhr msh mrj • A= max all paths from to query sensitivity()
Query-Specific BP • Run Dijkstra’salgstarting at query to get edge weights • Pick edge with largest weighted residual • Update Aji= max all paths from itoquery sensitivity() More effort on the difficult parts of the model and relevant Takes into account not only graphical structure, but also strength of dependencies
Outline • Using weighted residuals to prioritize updates • Define message weights reflecting the importance of the message to the query • Computing importance weights efficiently • As an initialization step before residual BP • Restore anytime behavior of residual BP • Experiments: faster convergence on large relational models
Big picture Before After ? ? long initialization all marginals all marginals anytimeproperty Dijkstra BP query marginals all marginals This is broken!
Interleaving BP and Dijkstra’s • Dijkstra’sexpands the highest weight edges first • Can pause it at any time and get the most relevant submodel query full model Dijkstra Dijkstra When to stop BP and resume Dijkstra’s? BP BP BP Dijkstra …
Interleaving • Dijkstra’s expands the highest weight edges first query expanded onprevious iteration not yet expanded just expanded minexpanded edgesA A suppose upper bound on priority actual priority of M minexpanded A no need to expand further at this point
Any-Time query-specific BP query Query-specific BP: Dijkstra’s alg. BP updates k r same BP update sequence! i j s Anytime QSBP:
Experiments – the setup • Relational model: semi-supervised people recognition in collection of images [with Denver Dash and MatthaiPhilipose @ Intel] • One variable per person • Agreement potentials for people who look alike • Disagreement potentials for people in the same image • 2K variables, 900K factors • Error measure: KL from the fixed point of BP agree disagree
Experiments – convergence speed AnytimeQuery-Specific BP(with Dijkstra’s interleaving) Residual BP Query-Specific BP better
Conclusions • Prioritize updates by weighted residuals • Takes query info into account • Importance weights depend on both the graphical structure and strength of local dependencies • Efficient computation of importance weights • Much faster convergence on large relational models Thank you!