Evidence-Specific Structures for Rich Tractable CRFs
Anton Chechetka, Carlos Guestrin
Acknowledgements: this work has been supported by NSF Career award IIS-0644225 and by ARO MURI W911NF0710287 and W911NF0810242.

Motivation
We want P(structured Query | Evidence), for example:
• Collaborative filtering: P(unknown preferences | observed ratings)
• Webpage classification: P(page type: student, professor, project, … | webpage text + links)
• Face recognition in image collections: P(face labels | collection of images + face similarities)

Model Structure Tradeoffs
Tree-structured models:
• Efficient exact inference
• Efficient learning of optimal parameters
• But: simple dependencies only, and relational settings are not tree-structured
Dense models:
• Capture complex dependencies
• Natural extensions to relational settings
• But: arbitrarily bad inference quality and arbitrarily bad parameter quality
This work: keep the efficient exact inference and optimal parameters of tree models, while enabling the rich dependencies and relational extensions of dense models.

Conditional Random Fields
P(Q | E=e) = (1/Z(e,w)) · exp( Σ_α w_α · f_α(q_α, e) ),
with normalization Z, weights w, and features f.
• Features can be arbitrarily correlated
• Convex objective, so a unique global optimum
• Intuitive gradient: ∂LLH/∂w_α = f_α(q, e) − E_{P(Q|e,w)}[f_α], i.e., how much the observed and expected feature values disagree
But computing the gradient requires inference in the model. Exact inference is #P-complete and approximate inference is NP-complete: hopeless in large dense models, easy for tree-structured models.

Our Approach: Evidence-Specific Structures
Intuition: edge importance depends on the evidence. In general, "engine starts" depends on "battery is good"; for the specific evidence E = {gas tank is empty}, there is no dependence.
Select tree structures, based on the evidence, to capture the most important dependencies:
    fixed dense model × evidence-specific tree "mask" = evidence-specific model
Different evidence values e1, e2, e3, … induce different tree masks, and therefore different tree-structured models over the query Q.

CRF with Evidence-Specific Structure: Formalism
P(Q | E, w, u) = (1/Z(E,w,u)) · exp( Σ_(i,j) 1[(i,j) ∈ T(E,u)] · w_ij · f_ij(Qi, Qj, E) ),
i.e., query Q, an evidence-specific structure mask 1[·], and standard weighted features w · f.
• T(E,u) encodes the output of a structure selection algorithm with parameters u
• Global perspective on structure selection: looking at one edge at a time is not enough to guarantee a tree structure, because being a tree is a global property
• Easy to guarantee tree structure (by selecting an appropriate algorithm)

Fewer Sources of Errors
The original high-dimensional problem is replaced by low-dimensional pairwise problems over (Qi, Qj), so the dimensionality of the parameters is independent of model size, which reduces overfitting.

Learning Good Evidence-Specific Trees
Directly generalize existing algorithms for the no-evidence case:
• No evidence: pairwise marginals P(Qi, Qj) + Chow-Liu algorithm = optimal tree for P(Q)
• With evidence: pairwise conditionals P(Qi, Qj | E=e) + Chow-Liu algorithm = good tree for E=e
Train stage: learn pairwise conditional estimators (the parameters u).
Test stage (evidence-specific Chow-Liu algorithm, sketched in code below):
1. Instantiate the evidence E=e in the pairwise estimators
2. Compute mutual information values to serve as edge weights
3. Return the maximum spanning tree
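The test stage maps directly onto a maximum spanning tree computation. Below is a minimal sketch in Python, assuming a hypothetical `pairwise_estimators` dict whose entries are callables returning an estimated joint table P(Qi, Qj | E=e); the names and the networkx-based spanning tree call are illustrative, not the authors' implementation.

```python
# Minimal sketch of the test-stage evidence-specific Chow-Liu step.
# Assumes pairwise_estimators[(i, j)] is a callable returning an estimated
# joint table P(Qi, Qj | E=e) as a 2-D numpy array; names are illustrative.
import numpy as np
import networkx as nx

def mutual_information(p_ij):
    """Mutual information of a joint distribution given as a k x k table."""
    p_ij = p_ij / p_ij.sum()                  # normalize defensively
    p_i = p_ij.sum(axis=1, keepdims=True)     # marginal of Qi (column)
    p_j = p_ij.sum(axis=0, keepdims=True)     # marginal of Qj (row)
    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(p_ij > 0, p_ij / (p_i * p_j), 1.0)
    return float((p_ij * np.log(ratio)).sum())  # zero terms where p_ij = 0

def evidence_specific_tree(pairwise_estimators, evidence):
    """Instantiate the evidence, weight each candidate edge by conditional
    mutual information, and return the maximum spanning tree's edge set."""
    g = nx.Graph()
    for (i, j), estimator in pairwise_estimators.items():
        p_ij = estimator(evidence)            # table for P(Qi, Qj | E=e)
        g.add_edge(i, j, weight=mutual_information(p_ij))
    tree = nx.maximum_spanning_tree(g)        # tree property holds globally
    return {frozenset(edge) for edge in tree.edges()}
```

In the formalism above, the returned edge set plays the role of T(e,u), with the fitted estimator parameters as u.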
Learning Optimal Feature Weights
Learning an ESS-CRF model:
1. Choose the features f and the tree learning algorithm T(E, ·); learn u
2. Select evidence-specific trees T(ei, u) for every datapoint (E=ei, Q=qi) [u is fixed at this stage]
3. Given u and the trees T(ei, u), learn w [L-BFGS, etc.]
The gradient is similar to standard CRFs (observed minus expected features) and needs inference in the induced model, but its sparsity conforms to the evidence-specific structure, and the computation is exact and efficient because every T(e,u) is a tree. Individual datapoints contribute tree-sparse gradients (with different evidence-dependent sparsity patterns); over the whole dataset the gradient is dense, but still tractable. Because the structure-related parameters u are fixed from the tree-learning step, the objective is still convex in w (but not in u): we can exactly compute the convex objective and its gradient, and use L-BFGS or conjugate gradient to find the unique global optimum w.r.t. w exactly. A sketch of this stage follows the Results section below.

Relational Extensions
General approach:
• Ground the model / features
• Use standard ESS-CRFs + parameter sharing
Parameter sharing applies to both w and u: one weight per relation, not per grounding, so for instance the groundings E → (Q1, Q2), E → (Q1, Q3), and E → (Q3, Q4) of one relation all share the same weights.
Structure selection happens only after grounding, so the model captures all potential dependencies, and there is no worry about the structure being a tree on the relational level.

Benefits of the approach:
• Efficient exact inference
• Efficient learning of the optimal parameters w
• A much richer class of models than fixed trees (potential for capturing complex correlations)
• Structure selection decoupled from feature design and weights (an arbitrarily dense model can be used as the basis)

Results
Face recognition [w/ Denver Dash, Matthai Philipose]:
• Exploit face similarities to propagate labels in collections of images
• Semi-supervised relational model
• 250…1700 images, 4…24 unique people
• Compared against dense discriminative models: equal or better accuracy, about 100 times faster
WebKB [data + features thanks to Ben Taskar]:
• Predict the page type (student, project, …) from webpage text + links
• Same accuracy as dense models, about 10 times faster
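To make the weight-learning stage concrete, here is a hedged sketch under stated assumptions: `tree_inference` is a hypothetical helper performing exact belief propagation on the tree T(ei, u) and returning the log-partition function and the expected feature vector, and `features` is a hypothetical helper returning the observed feature vector with off-tree entries zeroed out. This illustrates the convex optimization over w described above, not the authors' exact implementation.

```python
# Hedged sketch of the weight-learning stage. With u and the per-datapoint
# trees fixed, the objective is a sum of tree-CRF log-likelihoods, convex
# in w, so L-BFGS converges to the unique global optimum.
# `tree_inference` and `features` are assumed (hypothetical) helpers.
import numpy as np
from scipy.optimize import minimize

def neg_log_likelihood_and_grad(w, data, trees, features, tree_inference):
    """Negative log-likelihood and gradient over the whole dataset.

    Each datapoint's gradient is tree-sparse (nonzero only on edges of its
    tree T(e_i, u)); summed over datapoints the gradient is dense, yet every
    term needs only exact inference on a tree."""
    nll, grad = 0.0, np.zeros_like(w)
    for (e_i, q_i), tree in zip(data, trees):
        # Exact log-partition and expected features via tree belief propagation.
        log_z, expected_f = tree_inference(w, e_i, tree)
        observed_f = features(q_i, e_i, tree)   # off-tree entries are zero
        nll += log_z - w @ observed_f           # -log P(q_i | e_i)
        grad += expected_f - observed_f         # standard CRF gradient, masked
    return nll, grad

def learn_weights(data, trees, features, tree_inference, dim):
    """Optimize w exactly with L-BFGS; u and the trees stay fixed."""
    result = minimize(neg_log_likelihood_and_grad, np.zeros(dim),
                      args=(data, trees, features, tree_inference),
                      jac=True, method="L-BFGS-B")
    return result.x
```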