
Margin-based Decomposed Amortized Inference Gourab Kundu, Vivek Srikumar, Dan Roth


Presentation Transcript


Margin-based Decomposed Amortized Inference. Gourab Kundu, Vivek Srikumar, Dan Roth

Amortized Inference: General Recipe

Key observations: (1) In NLP, we solve a lot of inference problems, at least one per sentence. (2) Redundancy of structures: the number of observed structures (blue solid line) is much smaller than the number of inputs (red dotted line). Moreover, the distribution of observed structures is highly skewed (inset); e.g., for POS tagging, a small number of tag sequences are much more frequent than the rest. The pigeonhole principle applies.

Research question: Can we solve the k-th inference instance much faster than the 1st? Amortized inference (Srikumar et al., 2012) shows how computation from earlier inference instances can be used to speed up inference for new, previously unseen instances:

    If CONDITION(problem cache, new problem) then
        SOLUTION(new problem) = old solution   (no need to call the solver)
    Else
        call the base solver and update the cache
    End

together with a theorem guaranteeing correctness. (A minimal code sketch of this recipe appears below.)

ILP Formulations

• ILP (Integer Linear Programming) is a general formulation for inference in structured prediction tasks [Roth & Yih, 04, 07].
• Inference using ILP has been successful in NLP tasks, e.g., SRL, dependency parsing, information extraction, and more.
• An ILP can be expressed as max c^T x s.t. Ax <= b, x integer; for example, max 2x1 + 3x2 s.t. x1 + x2 <= 1, x integer.
• Amortized inference depends on having an ILP formulation, but multiple solvers might be used.

1. Margin-based Amortization

• Let B be the solution y_p for inference problem p with objective c_p, denoted by the red hyperplane, and let A be the second-best assignment.
• For a new objective function c_q (blue), if the margin δ is greater than the sum of the decrease in the objective value of y_p and the maximum increase in the objective of another solution (Δ), then the solution to the new inference problem is still y_p.
• Formally, the condition is given by: -(c_q - c_p)^T y_p + Δ <= δ.
• At the caching stage, δ is computed for each ILP and stored in the cache.
• At test time, computing Δ exactly requires solving an ILP. Instead, we compute an approximate Δ by solving a relaxed problem, which can be done efficiently. (A sketch of this test appears below.)

2. Decomposed Amortized Inference using Lagrangian Relaxation

• Given a problem max c_q^T y s.t. M^T y <= b:
• Partition the constraints into two sets, say C1 and C2 (C2 containing the constraints M2^T y <= b2).
• Define L(λ) = max_{y ∈ C1} c_q^T y - λ^T (M2^T y - b2), solved using any amortized algorithm.
• The dual problem, min_{λ >= 0} L(λ), is solved using gradient descent over the dual variables λ. (A subgradient sketch appears below.)
• The dual can be decomposed into multiple smaller problems if no constraint in C1 has active variables from two different smaller problems; even otherwise, it has fewer constraints than the original problem. Considering the dual can therefore have two advantages from the perspective of amortization: (1) smaller problems and (2) a higher chance of cache hits.
• More redundancy among smaller structures. Redundancy in components of structures: we extend amortization to cases where the full structured output is not repeated, by storing partial computation for future inference problems.

Experimental Setup

Tasks: Semantic Role Labeling; Entities and Relations [Roth & Yih, 2004]; see also [Punyakanok et al., 2008]. We simulate a long-running NLP process by caching problems and solutions from the Gigaword corpus. We used a database engine to cache ILPs and their solutions, along with their structured margins. We compare our approaches to a state-of-the-art ILP solver (Gurobi) and also to Theorem 1 from (Srikumar et al., 2012).

Results

• Solve only one in four problems.
• Solve only one in six problems.
• Wall-clock improvements too.
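Below is a minimal Python sketch of the general recipe above; the names (amortized_solve, base_solver, condition) are illustrative, not from the authors' code.

    def amortized_solve(problem, cache, base_solver, condition):
        # Scan the cache for an earlier problem whose solution provably transfers.
        for cached_problem, cached_solution in cache:
            if condition(cached_problem, cached_solution, problem):
                return cached_solution  # cache hit: no call to the base solver
        # Cache miss: call the base ILP solver and store the result for later.
        solution = base_solver(problem)
        cache.append((problem, solution))
        return solution

The CONDITION plugged in here is whatever certificate the chosen theorem provides (e.g., the margin test of Section 1), so correctness of the reused solution is guaranteed.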
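Assuming the usual implicit nonnegativity of the variables (which, with x1 + x2 <= 1, makes them effectively binary), the toy ILP from the formulations section can be solved by brute-force enumeration; solvers like Gurobi are needed only at realistic scale.

    from itertools import product

    # max 2*x1 + 3*x2  s.t.  x1 + x2 <= 1,  x integer (here: x1, x2 in {0, 1})
    feasible = [x for x in product((0, 1), repeat=2) if x[0] + x[1] <= 1]
    best = max(feasible, key=lambda x: 2 * x[0] + 3 * x[1])
    print(best)  # (0, 1), objective value 3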
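A sketch of the margin test, assuming numpy vectors: delta is the structured margin stored with the cached ILP, and Delta_bound is the (approximate) bound on a rival solution's gain obtained from the relaxed problem; both names are illustrative.

    import numpy as np

    def margin_condition_holds(c_q, c_p, y_p, delta, Delta_bound):
        # Decrease in y_p's objective value when moving from c_p to c_q:
        # -(c_q - c_p)^T y_p.
        decrease = -(c_q - c_p) @ y_p
        # Reuse y_p iff the stored margin absorbs the decrease plus the
        # maximum possible increase of any competing solution.
        return decrease + Delta_bound <= delta

Since Delta_bound over-estimates the true Δ, a True answer here is safe: the cached y_p is still the optimum under c_q.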
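A projected-subgradient sketch of the dual min_{λ >= 0} L(λ), assuming a hypothetical solve_relaxed(objective) that maximizes over the C1-feasible set (e.g., via the amortized cache); M2 has one column per relaxed (C2) constraint.

    import numpy as np

    def lagrangian_dual(c_q, M2, b2, solve_relaxed, steps=200, eta=0.05):
        lam = np.zeros(len(b2))  # one multiplier per relaxed (C2) constraint
        y = None
        for _ in range(steps):
            # Inner problem: max_{y in C1} c_q^T y - lam^T (M2^T y - b2),
            # i.e., maximize (c_q - M2 @ lam)^T y up to the constant lam^T b2.
            y = solve_relaxed(c_q - M2 @ lam)
            residual = M2.T @ y - b2  # subgradient of L at lam is -residual
            if np.all(residual <= 1e-9) and abs(lam @ residual) <= 1e-9:
                return y  # C2 feasible with complementary slackness: optimal
            # Projected subgradient step: raise multipliers on violated constraints.
            lam = np.maximum(0.0, lam + eta * residual)
        return y  # may still violate C2 if the step budget runs out

When the C1 constraints separate into independent blocks, solve_relaxed can dispatch each block to the cache on its own, which is exactly where the smaller problems and extra cache hits come from.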
This work continues the original work on amortization: Srikumar, Kundu and Roth. On Amortizing Inference Cost for Structured Prediction. EMNLP, 2012.

Acknowledgments: This research is sponsored by the Army Research Laboratory (ARL) under agreement W911NF-09-2-0053, the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0181, DARPA under agreement number FA8750-13-2-0008, and the Intelligence Advanced Research Projects Activity (IARPA) via Department of Interior National Business Center contract number D11PC20155.
