70 likes | 375 Views
Bounding the Partition Function Using Hölder’s Inequality. Qiang Liu Alexander Ihler Department of Computer Science, University of California, Irvine. Duality results. Graphical models. H ö lder’s inequality. Markov random fields Factorized form
E N D
Bounding the Partition Function Using Hölder’s Inequality QiangLiu Alexander Ihler Department of Computer Science, University of California, Irvine Duality results Graphical models Hölder’s inequality • Markov random fields • Factorized form • Factors are associated with cliques of a graph G=(V,E) • Task: calculate the partition function Z, or • Important: probability of evidence, parameter estimation • #P-complete in general graphs • Approximations and bounds are needed • Define the weighted (or power) sum: • has “zero-temperature” limits • Hölder’s inequality: • if some weights are negative, the bound reverses: • Weighted mini-bucket (WMBE): • Same procedure as naïve MBE • sum/max bounds replaced with weighted sums • reduces to MBE if w! 0+ or 0- • Dual form of the weighted log-partition function: • µ-optimal bound is • Comments: • µ-optimal bound is equivalent to TRBP (or more generally CED) • More compact representation: • Fewer parameters (others held at optimal values) • Simple & efficient weight optimization or Variational methods = • Dual representation • Loopy belief propagation • Tree-reweighted belief propagation • Generalizes to hypertrees, GTRBP • Conditional entropy decomposition • Generalizes to weighted combinations of orders • Comments • Relatively good bounds at convergence • Bound not guaranteed until convergence • Hard to choose weights & cliques; esp. for GTRBP, CED (Wainwright & Jordan 08) Spanning trees Covering tree 3X3 grid (Yedidia et al. 04) Experiments (Wainwright et al. 05) • 10x10 Ising grids • random • mixed interactions • A few iterations are usually good enough • iboundis the most dominant factor • Optimizing w can be better than optimizing θ • Linkage analysis • from UAI2008 competition • 300-1000 nodes, treewidth 20-30 (Globerson & Jaakkola 07) Weighted log-partition function • Covering graph: • Weighted log-partition • has derivatives • where q(.) is defined by a chain rule: • Tightening the bound: • related to TRBP and reparameterization • Important, but largely unexplored Mini-bucket θ-optimized, one pass How to choose the weights and split the parameters? original graph Mini-bucket elimination • Bucket elimination (variable elimination) • Directly sum over the variables in sequence • Cost is exponential in the tree-width • Mini-bucket elimination (MBE) approximates BE • Gives upper or lower bound • Comments • Low accuracy for • small clique sizes (ibound) • Single pass, non-iterative • Easy to implement with high ibound Timing comparisons w-optimized, one pass Both fully optimized (A natural extension of the log-partition function) (Distributive law) ibound=5 (Dechter & Rish 03) Splitting ibound=15 = = ibound: controls clique size, & how much splitting is required pedigree13