A Generalization of Forward-backward Algorithm Ai Azuma Yuji Matsumoto Nara Institute of Science and Technology
Forward-backward algorithm • Allows efficient calculation of sums (e.g., expectations) over all paths in a trellis • Plays an important role in sequence modeling: • HMMs (Hidden Markov Models) • CRFs (Conditional Random Fields) [Lafferty et al., 2001] • ...
A sequential labeling example: part-of-speech tagging of “Time flies like an arrow” [Figure: a trellis from SOURCE to SINK with one node for each word of “Time flies like an arrow” paired with each candidate tag (noun, verb, prep., indef. art.).] In CRFs and HMMs, we need to compute the "sum" of the probabilities (or scores) of all paths.
The forward-backward algorithm efficiently computes sums over all paths in the trellis with dynamic programming. Enumerating all paths is intractable because their number grows exponentially with the sequence length. Instead, the algorithm recursively computes the sum from the source toward the sink (forward) and from the sink toward the source (backward), keeping intermediate results on each node and arc.
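A minimal Python sketch of the ordinary forward-backward recursion on a linear-chain trellis, assuming positive node potentials `node_pot[i][k]`, arc potentials `arc_pot[i][j][k]`, and arcs from SOURCE and into SINK with potential 1 (all of these names and conventions are chosen here for illustration). It keeps one forward and one backward value per node and checks the path sum against brute-force enumeration, which is only feasible for tiny inputs.

```python
import itertools
import math
import random

def forward_backward(node_pot, arc_pot):
    """Ordinary forward-backward on a linear-chain trellis (illustrative sketch).

    Assumptions: arcs from SOURCE and into SINK have potential 1.
    Returns (Z, marginals) with marginals[i][k] = P(path passes through node (i, k)).
    """
    n, K = len(node_pot), len(node_pot[0])
    # alpha[i][k]: sum of scores of all partial paths SOURCE -> (i, k), node (i, k) included
    alpha = [[0.0] * K for _ in range(n)]
    alpha[0] = list(node_pot[0])
    for i in range(1, n):
        for k in range(K):
            alpha[i][k] = node_pot[i][k] * sum(
                alpha[i - 1][j] * arc_pot[i - 1][j][k] for j in range(K))
    # beta[i][j]: sum of scores of all partial paths (i, j) -> SINK, node (i, j) excluded
    beta = [[0.0] * K for _ in range(n)]
    beta[n - 1] = [1.0] * K
    for i in range(n - 2, -1, -1):
        for j in range(K):
            beta[i][j] = sum(arc_pot[i][j][k] * node_pot[i + 1][k] * beta[i + 1][k]
                             for k in range(K))
    Z = sum(alpha[n - 1])
    marginals = [[alpha[i][k] * beta[i][k] / Z for k in range(K)] for i in range(n)]
    return Z, marginals

def brute_force_Z(node_pot, arc_pot):
    """Sums the scores of all K**n complete paths explicitly."""
    n, K = len(node_pot), len(node_pot[0])
    total = 0.0
    for path in itertools.product(range(K), repeat=n):
        score = node_pot[0][path[0]]
        for i in range(1, n):
            score *= arc_pot[i - 1][path[i - 1]][path[i]] * node_pot[i][path[i]]
        total += score
    return total

# Usage: 5 words x 4 tags, random positive potentials
rng = random.Random(0)
n, K = 5, 4
node_pot = [[rng.uniform(0.5, 2.0) for _ in range(K)] for _ in range(n)]
arc_pot = [[[rng.uniform(0.5, 2.0) for _ in range(K)] for _ in range(K)] for _ in range(n - 1)]
Z, marg = forward_backward(node_pot, arc_pot)
print(math.isclose(Z, brute_force_Z(node_pot, arc_pot)))
```

The forward pass touches each arc once, so its cost is linear in the trellis size, while the brute-force check loops over all 4**5 = 1024 paths.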
The forward-backward algorithm is applicable to sums over paths built from the following ingredients: the type of each node or node pair, the set of paths, the set of nodes and arcs (cliques) in a path, and the k-th feature.
Types of sums computable with the forward-backward algorithm: sums over the set of paths, where each path is treated as the set of nodes and arcs (cliques) it contains.
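For concreteness, here are the two classic sums written in notation chosen here for illustration (an assumption, not necessarily the slides' own symbols): $\mathcal{Y}$ is the set of paths, each path $y$ is viewed as its set of cliques $c$, $\phi_c > 0$ is the potential of clique $c$, and $f_k(c)$ is the value of the k-th feature on $c$. The second identity shows why first-order sums stay tractable: swapping the order of summation leaves, for each clique, a sum over the paths through it, which is the product of the forward and backward quantities at that clique.

```latex
Z = \sum_{y \in \mathcal{Y}} \prod_{c \in y} \phi_c ,
\qquad
\sum_{y \in \mathcal{Y}} \Bigl(\prod_{c \in y} \phi_c\Bigr) \Bigl(\sum_{c \in y} f_k(c)\Bigr)
  = \sum_{c} f_k(c) \sum_{y \in \mathcal{Y},\, y \ni c} \ \prod_{c' \in y} \phi_{c'}
```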
But sometimes we need higher-order multivariate moments... • To name a few examples: • Correlation between features • Objectives more complex than the log-likelihood • Derivatives of these with respect to the model parameters • ...
Our goal: to generalize the forward-backward algorithm to higher-order multivariate moments!
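One way to write such a target, in notation assumed here for illustration ($\phi_c$ a clique potential, $f_k(c)$ the value of the k-th of $d$ features on clique $c$, $n_k \ge 0$ the order requested for feature $k$), is the unnormalized raw moment below. Dividing by $Z$ gives $\mathbb{E}\bigl[\prod_k F_k^{n_k}\bigr]$ with $F_k(y) = \sum_{c \in y} f_k(c)$. For example, the covariance of two features, $\mathbb{E}[F_1 F_2] - \mathbb{E}[F_1]\,\mathbb{E}[F_2]$, needs the orders $(n_1, n_2) \in \{(0,0), (1,0), (0,1), (1,1)\}$.

```latex
\sum_{y \in \mathcal{Y}} \Bigl(\prod_{c \in y} \phi_c\Bigr) \prod_{k=1}^{d} \Bigl(\sum_{c \in y} f_k(c)\Bigr)^{n_k}
```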
Can we derive a dynamic program for this formula? Answer: record multiple forward/backward variables for each clique, and combine all the previously calculated values by the binomial theorem.
[Figure: the set of paths from SOURCE to a node u.] The ordinary forward-backward algorithm records only one variable per node: the sum of the scores over this set of paths.
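To make "multiple variables per node" concrete, here is one possible definition for the single-feature case, in notation assumed here ($\mathcal{Y}(\mathrm{SOURCE} \to u)$ is the set of partial paths from SOURCE to $u$, $\phi_c$ a clique potential, $f(c)$ the feature value on clique $c$, $n$ the highest order needed). The order-0 variable $\alpha_0(u)$ is exactly the ordinary forward variable; the generalization keeps $\alpha_1(u), \dots, \alpha_n(u)$ as well.

```latex
\alpha_m(u) = \sum_{y \in \mathcal{Y}(\mathrm{SOURCE} \to u)} \Bigl(\prod_{c \in y} \phi_c\Bigr) \Bigl(\sum_{c \in y} f(c)\Bigr)^{m},
\qquad m = 0, 1, \dots, n
```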
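[Figure: a node v, its direct ancestors u, and the paths reaching them from SOURCE.] The variables of v are computed from those of its direct ancestors; the rules for combining them are derived from the binomial theorem.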
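A sketch of how the binomial theorem yields the recurrence, for a single feature and in notation assumed here: $\alpha_m(u)$ is the sum over partial paths SOURCE → $u$ of the path score times (feature sum)$^m$; $\mathrm{pa}(v)$ are the direct ancestors of $v$; $\phi_{uv}$, $\phi_v$ are the arc and node potentials; and $\delta_{uv} = f(u,v) + f(v)$ is the feature value added by stepping from $u$ onto $v$. Extending a path $y'$ to $u$ by that step turns its feature sum $F(y')$ into $F(y') + \delta_{uv}$; expanding the power and swapping the sums gives the second line.

```latex
\bigl(F(y') + \delta_{uv}\bigr)^m = \sum_{j=0}^{m} \binom{m}{j}\, \delta_{uv}^{\,m-j}\, F(y')^{j}
\\[4pt]
\alpha_m(v) = \sum_{u \in \mathrm{pa}(v)} \phi_{uv}\, \phi_v \sum_{j=0}^{m} \binom{m}{j}\, \delta_{uv}^{\,m-j}\, \alpha_j(u),
\qquad \alpha_m(\mathrm{SOURCE}) = [\, m = 0 \,]
```

The base case encodes the empty path, whose score is 1 and whose feature sum is 0 (with the convention $0^0 = 1$).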
[Figure: SINK, its direct ancestors, and the paths reaching them from SOURCE.] The variables obtained at SINK from its direct ancestors are the desired values.
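A minimal Python sketch of a univariate generalized forward pass, under assumptions chosen here for illustration: a linear-chain trellis, a single feature that fires on nodes only (`node_feat`), and arcs from SOURCE and into SINK with potential 1 and no feature. The names `generalized_forward` and `brute_force_moments` are hypothetical. Each node keeps `max_order + 1` forward variables, each step combines its ancestors' variables with the binomial theorem, and the values summed at SINK are the desired sums, checked here against brute-force enumeration.

```python
import itertools
import math
import random

def generalized_forward(node_pot, arc_pot, node_feat, max_order):
    """Returns S with S[m] = sum over paths y of score(y) * F(y)**m, m = 0..max_order,
    where F(y) sums node_feat over the nodes on y.
    Assumptions (illustrative): features on nodes only; SOURCE/SINK arcs have potential 1."""
    n, K = len(node_pot), len(node_pot[0])
    orders = range(max_order + 1)
    # Position 0: the only ancestor is SOURCE, whose variables are [1, 0, 0, ...]
    alpha = [[node_pot[0][k] * node_feat[0][k] ** m for m in orders] for k in range(K)]
    for i in range(1, n):
        new_alpha = []
        for k in range(K):
            d = node_feat[i][k]  # feature value added by stepping onto node (i, k)
            row = []
            for m in orders:
                total = 0.0
                for j in range(K):  # direct ancestors of node (i, k)
                    # Binomial theorem: (F + d)**m = sum_r C(m, r) * d**(m - r) * F**r
                    total += arc_pot[i - 1][j][k] * sum(
                        math.comb(m, r) * d ** (m - r) * alpha[j][r] for r in range(m + 1))
                row.append(node_pot[i][k] * total)
            new_alpha.append(row)
        alpha = new_alpha
    # SINK: all last-position nodes are its direct ancestors (arc potential 1, no feature)
    return [sum(alpha[k][m] for k in range(K)) for m in orders]

def brute_force_moments(node_pot, arc_pot, node_feat, max_order):
    """Same sums by explicit enumeration of all K**n paths."""
    n, K = len(node_pot), len(node_pot[0])
    sums = [0.0] * (max_order + 1)
    for path in itertools.product(range(K), repeat=n):
        score = node_pot[0][path[0]]
        for i in range(1, n):
            score *= arc_pot[i - 1][path[i - 1]][path[i]] * node_pot[i][path[i]]
        F = sum(node_feat[i][path[i]] for i in range(n))
        for m in range(max_order + 1):
            sums[m] += score * F ** m
    return sums

# Usage: moments of a random node feature on a 5 x 4 trellis
rng = random.Random(1)
n, K = 5, 4
node_pot = [[rng.uniform(0.5, 2.0) for _ in range(K)] for _ in range(n)]
arc_pot = [[[rng.uniform(0.5, 2.0) for _ in range(K)] for _ in range(K)] for _ in range(n - 1)]
node_feat = [[float(rng.randint(0, 2)) for _ in range(K)] for _ in range(n)]
fast = generalized_forward(node_pot, arc_pot, node_feat, max_order=3)
slow = brute_force_moments(node_pot, arc_pot, node_feat, max_order=3)
mean = fast[1] / fast[0]                   # E[F]
variance = fast[2] / fast[0] - mean ** 2   # E[F^2] - E[F]^2
print(mean, variance)
```

Order 0 reproduces the ordinary partition function, so dividing the higher orders by `fast[0]` turns them into expectations.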
Summary of Our Ideas • Multiple variables for each clique • Dependency between the variables in each step, derived from the binomial theorem
For multivariate cases, the forward/backward variables have multiple indices, one order per feature.
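A minimal Python sketch of the multivariate case with two features, under the same illustrative assumptions as a plain linear-chain setup (node-only features `feat1`, `feat2`; SOURCE/SINK arcs with potential 1; hypothetical function names). Each node's variables are indexed by a pair of orders `(a, b)`, and one step applies the binomial theorem to each feature separately. The usage computes a feature covariance, one of the motivating quantities, and checks everything against enumeration.

```python
import itertools
import math
import random

def multivariate_forward(node_pot, arc_pot, feat1, feat2, n1, n2):
    """S[(a, b)] = sum over paths of score(y) * F1(y)**a * F2(y)**b, 0<=a<=n1, 0<=b<=n2.
    Assumptions (illustrative): node-only features; SOURCE/SINK arcs have potential 1."""
    n, K = len(node_pot), len(node_pot[0])
    idx = list(itertools.product(range(n1 + 1), range(n2 + 1)))
    # Position 0: SOURCE contributes only its (0, 0) variable, equal to 1
    alpha = [{(a, b): node_pot[0][k] * feat1[0][k] ** a * feat2[0][k] ** b for a, b in idx}
             for k in range(K)]
    for i in range(1, n):
        new_alpha = []
        for k in range(K):
            d1, d2 = feat1[i][k], feat2[i][k]
            row = {}
            for a, b in idx:
                total = 0.0
                for j in range(K):  # direct ancestors of node (i, k)
                    # Binomial theorem applied to each feature independently
                    inner = sum(math.comb(a, r1) * math.comb(b, r2)
                                * d1 ** (a - r1) * d2 ** (b - r2) * alpha[j][(r1, r2)]
                                for r1 in range(a + 1) for r2 in range(b + 1))
                    total += arc_pot[i - 1][j][k] * inner
                row[(a, b)] = node_pot[i][k] * total
            new_alpha.append(row)
        alpha = new_alpha
    return {key: sum(alpha[k][key] for k in range(K)) for key in idx}

def brute_force(node_pot, arc_pot, feat1, feat2, n1, n2):
    """Same sums by explicit enumeration of all paths."""
    n, K = len(node_pot), len(node_pot[0])
    S = {key: 0.0 for key in itertools.product(range(n1 + 1), range(n2 + 1))}
    for path in itertools.product(range(K), repeat=n):
        score = node_pot[0][path[0]]
        for i in range(1, n):
            score *= arc_pot[i - 1][path[i - 1]][path[i]] * node_pot[i][path[i]]
        F1 = sum(feat1[i][path[i]] for i in range(n))
        F2 = sum(feat2[i][path[i]] for i in range(n))
        for a, b in S:
            S[(a, b)] += score * F1 ** a * F2 ** b
    return S

def covariance(S):
    Z = S[(0, 0)]
    return S[(1, 1)] / Z - (S[(1, 0)] / Z) * (S[(0, 1)] / Z)

# Usage: covariance of two random binary node features on a 4 x 3 trellis
rng = random.Random(2)
n, K = 4, 3
node_pot = [[rng.uniform(0.5, 2.0) for _ in range(K)] for _ in range(n)]
arc_pot = [[[rng.uniform(0.5, 2.0) for _ in range(K)] for _ in range(K)] for _ in range(n - 1)]
feat1 = [[float(rng.randint(0, 1)) for _ in range(K)] for _ in range(n)]
feat2 = [[float(rng.randint(0, 1)) for _ in range(K)] for _ in range(n)]
S = multivariate_forward(node_pot, arc_pot, feat1, feat2, 2, 2)
S_slow = brute_force(node_pot, arc_pot, feat1, feat2, 2, 2)
print(covariance(S))
```

Keeping all index pairs up to `(n1, n2)` is what makes each step self-contained: every lower-order pair the binomial expansion needs is already stored at the ancestors.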
The computational cost is only linear in the number of nodes and arcs of the trellis, i.e., linear in |V| and |E|. To calculate the general form, the computational cost of the generalized forward-backward algorithm is proportional to …
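To see why linearity in |V| and |E| matters, this small Python sketch counts the nodes, arcs, and complete paths of a fully connected linear-chain trellis (one node per word/tag pair plus SOURCE and SINK, every tag pair allowed; `trellis_size` is a hypothetical helper). It measures trellis size only and does not model the extra per-node work of the generalized variables.

```python
def trellis_size(n_words, n_tags):
    """Nodes, arcs, and complete SOURCE->SINK paths of a fully connected linear-chain trellis."""
    nodes = n_words * n_tags + 2                             # word/tag nodes + SOURCE + SINK
    arcs = 2 * n_tags + (n_words - 1) * n_tags ** 2          # SOURCE/SINK arcs + inner arcs
    paths = n_tags ** n_words                                # one tag choice per word
    return nodes, arcs, paths

print(trellis_size(5, 4))   # (22, 72, 1024): "Time flies like an arrow" with 4 tags
print(trellis_size(50, 4))  # nodes and arcs stay small; paths = 4**50 (about 1.27e30)
```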
Merits of the generalized forward-backward algorithm • The generalized forward-backward subsumes many existing task-specific algorithms • For some tasks, it leads to a solution more efficient than the existing ones
Merit 1. The generalized forward-backward subsumes many existing task-specific algorithms: All these formulas have a form computable with our proposed method.
The previously proposed algorithms for these tasks are task-specific • The generalized forward-backward algorithm is a task-independent algorithm applicable to formulae of the form … • If a problem involves this form, the algorithm immediately offers an efficient solution
Merit 2. An efficient optimization procedure for Generalized Expectation Criteria for CRFs [Mann et al., 2008] • Algorithm proposed in [Mann et al., 2008]: computational cost is proportional to … • By a specialization of the generalization: computational cost is proportional to … (L = number of nodes labeled as answers)
Future tasks • Explore other tasks to which our generalized forward-backward algorithm is applicable • Extend the generalized forward-backward to trees and general graphs containing cycles
Summary • We have generalized the forward-backward algorithm to allow for higher-order multivariate moments • The generalization offers an efficient way to handle complex sequence models that involve higher-order multivariate moments • Many existing task-specific algorithms are instances of this generalization • It leads to a faster algorithm for computing Generalized Expectation Criteria for CRFs
Thank you for your attention!