Learning structured outputs: a reinforcement learning approach P. Gallinari, F. Maes, L. Denoyer Patrick.gallinari@lip6.fr www-connex.lip6.fr University Pierre et Marie Curie – Paris – France Computer Science Lab.
Outline • Motivation and examples • Approaches for structured learning • Generative models • Discriminant models • Reinforcement learning for structured prediction • Experiments ANNPR 2008 - P. Gallinari
Machine learning and structured data • Different types of problems • Model, classify, cluster structured data • Predict structured outputs • Learn to associate structured representations • Structured data and applications in many domains • chemistry, biology, natural language, web, social networks, databases, etc.
Sequence labeling: POS tagging (figure: example sentence annotated with tags such as coordinating conjunction, noun, verb 3rd person, adverb, plural noun, determiner, gerund verb, plural verb, adjective)
Segmentation + labeling: syntactic chunking (Washington Univ. tagger) (figure: example sentence segmented into adverbial, noun, and verb phrases)
Syntactic parsing (Stanford Parser)
Tree mapping • Problem • Querying heterogeneous XML collections • Web information extraction • Requires knowing the correspondence between the structured representations, usually built by hand • Goal: learn the correspondence between the different sources • Labeled tree mapping problem
Others • Taxonomies • Social networks • Adversarial computing: Web spam, blog spam, … • Translation • Biology • …
Is structure really useful? Can we make use of structure? • Yes • Evidence from many domains and applications • Mandatory for many problems • e.g. a 10,000-class classification problem • Yes, but • Complex or long-term dependencies often correspond to rare events • Practical evidence for large-scale problems • Simple models sometimes offer competitive results • Information retrieval • Speech recognition, etc.
Why is structure prediction difficult? • Size of the output space • # of potential labelings for a sequence • # of potential joint labelings and segmentations • # of potential labeled trees for an input sequence • … • Exponential output space • Inference often amounts to a combinatorial search problem • Scaling issues in most models
Structured learning • X, Y: input and output spaces • Structured output • y ∈ Y decomposes into parts of variable size • y = (y1, y2,…, yT) • Dependencies • Relations between the parts of y • Local, long-term, global • Cost function • 0/1 loss: Δ(y, ŷ) = 1 if y ≠ ŷ, 0 otherwise • Hamming loss: Δ(y, ŷ) = Σt 1[yt ≠ ŷt] • F-score, BLEU, etc.
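The two decomposable losses above can be sketched in a few lines; the tag names in the example are illustrative only:

```python
def zero_one_loss(y_true, y_pred):
    """0/1 loss: 1 if any part of the output differs, else 0 (all-or-nothing)."""
    return 0 if y_true == y_pred else 1

def hamming_loss(y_true, y_pred):
    """Hamming loss: fraction of parts that differ (gives partial credit)."""
    return sum(a != b for a, b in zip(y_true, y_pred)) / len(y_true)

y_true = ["N", "V", "D", "N"]
y_pred = ["N", "V", "D", "A"]  # one of four tags is wrong
print(zero_one_loss(y_true, y_pred))  # 1
print(hamming_loss(y_true, y_pred))   # 0.25
```

The 0/1 loss treats a single wrong tag the same as an entirely wrong sequence, which is why structured models prefer decomposable losses such as Hamming.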
General approach • Predictive approach: ŷ = argmax y∈Y F(x, y) • where F : X × Y → R is a score function used to rank potential outputs • F trained to optimize some loss function • Inference problem • |Y| sometimes exponential • The argmax is often intractable; hypotheses: • decomposability of the score function over the parts of y • restricted set of outputs
Structured algorithms differ by: • Feature encoding • Hypotheses on the output structure • Hypotheses on the cost function • Type of structure prediction problem
Three main families of SP models Generative models Discriminant models Search models
Generative models • 30 years of generative models • Hidden Markov Models • Many extensions • Hierarchical HMMs, Factorial HMMs, etc. • Probabilistic Context-Free Grammars • Tree models, graph models
Usual hypotheses • Features: “natural” encoding of the input • Local dependency hypothesis • On the outputs (Markov) • On the inputs • Cost function • Usually joint likelihood • Decomposes, e.g. as a sum of local costs on each subpart • Inference and learning usually use dynamic programming • HMMs • Decoding complexity O(n|Q|^2) for a sequence of length n and |Q| states • PCFGs • Decoding complexity O(m^3 n^3), with n the sentence length and m the number of non-terminals in the grammar • Prohibitive for many problems
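The O(n|Q|^2) HMM decoding mentioned above is the Viterbi dynamic program; a minimal sketch follows, with toy model parameters invented purely for illustration:

```python
import math

def viterbi(obs, states, log_start, log_trans, log_emit):
    """Best state sequence under an HMM, in O(n * |Q|^2) time."""
    # V[t][s]: best log-probability of a state sequence ending in s at step t
    V = [{s: log_start[s] + log_emit[s][obs[0]] for s in states}]
    back = []  # back[t-1][s]: best predecessor of state s at step t
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:  # |Q| current states ...
            prev = max(states, key=lambda p: V[t - 1][p] + log_trans[p][s])  # ... times |Q| predecessors
            back[-1][s] = prev
            V[t][s] = V[t - 1][prev] + log_trans[prev][s] + log_emit[s][obs[t]]
    last = max(states, key=lambda s: V[-1][s])
    path = [last]
    for bp in reversed(back):  # follow backpointers to recover the sequence
        path.append(bp[path[-1]])
    return list(reversed(path))

# Toy 2-state POS model (all probabilities made up for the example)
lg = math.log
states = ["Noun", "Verb"]
start = {"Noun": lg(0.6), "Verb": lg(0.4)}
trans = {"Noun": {"Noun": lg(0.3), "Verb": lg(0.7)},
         "Verb": {"Noun": lg(0.6), "Verb": lg(0.4)}}
emit = {"Noun": {"fish": lg(0.8), "sleep": lg(0.2)},
        "Verb": {"fish": lg(0.3), "sleep": lg(0.7)}}
print(viterbi(["fish", "sleep"], states, start, trans, emit))  # ['Noun', 'Verb']
```

The nested loop over current states and predecessors is exactly where the |Q|^2 factor comes from, repeated n times along the sequence.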
Summary: generative SP models • Well-known approaches • Large number of existing models • Local hypotheses • Often imply using dynamic programming for both inference and learning
Discriminant models • Structured Perceptron (Collins 2002) • Breakthrough • Large margin methods (Tsochantaridis et al. 2004, Taskar 2004) • Many others now • Considered an extension of multi-class classification
Usual hypotheses • Joint representation of input and output Φ(x, y) • Encodes potential dependencies among and between inputs and outputs • e.g. histogram of state transitions observed in the training set, frequency of (xi, yj), POS tags, etc. • Large feature sets (10^2 to 10^4) • Linear score function: F(x, y) = ⟨θ, Φ(x, y)⟩ • Decomposability of the feature set (over the outputs) and of the loss function
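A minimal sketch of such a joint feature map and linear score for sequence labeling; the feature names ("emit", "trans") and the toy weights are illustrative, not a specific system's encoding:

```python
from collections import Counter

def phi(x, y):
    """Hypothetical joint features Phi(x, y): counts of (word, tag) pairs
    and of (tag, next tag) transitions, as on the slide."""
    feats = Counter()
    for xi, yi in zip(x, y):
        feats[("emit", xi, yi)] += 1
    for a, b in zip(y, y[1:]):
        feats[("trans", a, b)] += 1
    return feats

def score(theta, x, y):
    """Linear score F(x, y) = <theta, Phi(x, y)>, with theta a sparse dict."""
    return sum(theta.get(f, 0.0) * v for f, v in phi(x, y).items())

theta = {("emit", "the", "D"): 1.0, ("trans", "D", "N"): 0.5}
print(score(theta, ["the", "dog"], ["D", "N"]))  # 1.5
```

Because the features decompose over adjacent output parts, the argmax over y can still be computed by dynamic programming, which is what the large-margin methods below rely on.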
Extension of large margin methods • 2 problems • The classical 0/1 loss is meaningless with structured outputs • Structured output prediction is not a simple extension of multiclass classification • Generalize the max-margin principle to loss functions other than the 0/1 loss • The number of constraints of the QP problem is proportional to |Y|, i.e. potentially exponential • Different solutions • Consider only a polynomial number of “active” constraints -> bounds on the optimality of the solution • Learning requires solving an argmax problem at each iteration • i.e. dynamic programming
Summary: discriminant approaches • Hypotheses • Local dependencies for the output • Decomposability of the loss function • Long-term dependencies in the input • Nice convergence properties + bounds • Complexity • Learning often does not scale
Precursors • Incremental Parsing, Collins 2004 • SEARN, Daume et al. 2006 • Reinforcement Learning, Maes et al. 2007
General idea • The structured output is built incrementally • ŷ = (ŷ1, ŷ2,…, ŷT) • Different from solving the prediction problem directly • It is built via machine learning • Goal • Construct ŷ incrementally so as to minimize the loss • Training: learn how to search the solution space • Inference: build ŷ as a sequence of successive decisions
Example: sequence labeling • 2 labels, R and B • Search space: (input sequence, {sequences of labels}) • For a sequence of size 3, x = x1x2x3 • A node represents a state in the search space
Example: actions and decisions • Sequence of size 3 with a given target labeling • Actions extend the partial labeling one element at a time • Π(s, a): decision function
Example: expected loss • (figure: search tree for the size-3 sequence, with a local cost C on each action and a total cost CT on each complete labeling, ranging here from CT = 0 to CT = 3) • Loss does not always separate!
Inference • Suppose we have a policy function F which decides at each step which action to take • Inference is performed by computing • ŷ1 = F(x, ·), ŷ2 = F(x, ŷ1), …, ŷT = F(x, ŷ1,…, ŷT-1) • ŷ = (ŷ1,…, ŷT) • No dynamic programming needed
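The inference loop above can be sketched in a few lines; `policy_score` stands in for the learned policy F, and the copy-the-input toy policy in the demo is purely illustrative:

```python
def greedy_inference(x, labels, policy_score):
    """Build y_hat incrementally: at each step, take the action (label)
    the policy scores highest given the input and the partial output."""
    partial = []
    for _ in range(len(x)):
        best = max(labels, key=lambda a: policy_score(x, tuple(partial), a))
        partial.append(best)
    return partial

# Toy policy (illustrative): score 1 for the action matching the current input symbol
toy_policy = lambda x, partial, a: 1.0 if a == x[len(partial)] else 0.0
print(greedy_inference(["R", "B", "B"], ["R", "B"], toy_policy))  # ['R', 'B', 'B']
```

Each step is a single argmax over the action set, so inference costs O(T × #labels) policy evaluations instead of a dynamic program over all labelings.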
Example: state-space exploration guided by local costs • (figure: the same search tree, with exploration following the low-cost actions) • Goal: generalize to unseen situations
Training • Learn a policy F • From a set of (input, output) pairs • Which takes the optimal action at each state • Which is able to generalize to unknown situations
Reinforcement learning for SP • Formalizes search-based ideas, using Markov Decision Processes as the model and Reinforcement Learning for training • Provides a general framework for this approach • Many RL algorithms could be used for training • Differences with the classical use of RL • Specific hypotheses • Deterministic environment • Problem size • State: up to 10^6 variables • Generalization properties
Markov Decision Process • An MDP is a tuple (S, A, P, R) • S is the state space • A is the action space • P is a transition function describing the dynamics of the environment • P(s, a, s’) = P(st+1 = s’ | st = s, at = a) • R is a reward function • R(s, a, s’) = E[rt | st+1 = s’, st = s, at = a]
Example: prediction problem
Example: state space • Deterministic MDP • States: input + partial output • Actions: elementary modifications of the partial output • Rewards: quality of prediction
Example: partially observable reward • The reward function is only partially observable: limited set of training (input, output) pairs
Learning the MDP • The reward function cannot be computed on the whole MDP • Approximated RL • Learn the policy on a subspace • Generalize to the whole space • The policy is learned as a linear function: Q(s, a) = ⟨θ, Φ(s, a)⟩ • Φ(s, a) is a joint description of (s, a) • θ is a parameter vector • Learning algorithms • Sarsa, Q-learning, etc.
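A minimal, self-contained sketch of Sarsa with a linear Q-function on the deterministic labeling MDP; the feature encoding, the per-step reward (1 for a correct label, 0 otherwise), and the toy data are assumptions for illustration, not the authors' exact setup:

```python
import random
from collections import defaultdict

def features(x, partial, a):
    """Hypothetical Phi(s, a): current input symbol and previous label, paired with action a."""
    t = len(partial)
    prev = partial[-1] if partial else "<s>"
    return [("emit", x[t], a), ("trans", prev, a)]

def q_value(theta, x, partial, a):
    """Linear Q(s, a) = <theta, Phi(s, a)>."""
    return sum(theta[f] for f in features(x, partial, a))

def sarsa_train(data, labels, epochs=20, alpha=0.1, eps=0.1, seed=0):
    """Sarsa on the deterministic labeling MDP, learning theta on training pairs."""
    rng = random.Random(seed)
    theta = defaultdict(float)

    def pick(x, partial):  # epsilon-greedy action selection
        if rng.random() < eps:
            return rng.choice(labels)
        return max(labels, key=lambda a: q_value(theta, x, partial, a))

    for _ in range(epochs):
        for x, y in data:
            partial = []
            a = pick(x, partial)
            for t in range(len(x)):
                r = 1.0 if a == y[t] else 0.0      # reward observable on training pairs
                nxt = partial + [a]                # deterministic transition
                if t + 1 < len(x):
                    a2 = pick(x, nxt)
                    target = r + q_value(theta, x, nxt, a2)  # undiscounted Sarsa target
                else:
                    a2, target = None, r                      # terminal step
                td = target - q_value(theta, x, partial, a)
                for f in features(x, partial, a):
                    theta[f] += alpha * td
                partial, a = nxt, a2
    return theta

# Train on one toy pair, then decode greedily with the learned Q-function
theta = sarsa_train([(["a", "b"], ["X", "Y"])], labels=["X", "Y"])
pred = []
for t in range(2):
    pred.append(max(["X", "Y"], key=lambda a: q_value(theta, ["a", "b"], pred, a)))
print(pred)
```

Because Q is linear in shared features rather than a table over the exponential state space, the learned policy can generalize to states never visited during training, which is the point of the approximated-RL approach.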
Inference • State at step t • Initial input • Partial output • Once the MDP is learned • Using the state at step t, compute the state at step t+1 • Until a solution is reached • Greedy approach • Only the best action is chosen at each step
Structured outputs and MDP • State st • Input x + partial output ŷt • Initial state: (x, ∅) • Actions • Task dependent • POS: new tag for the current word • Tree mapping: insert a new path into a partial tree • Inference • Greedy approach • Only the best action is chosen at each step • Reward • Final reward, or heuristic intermediate rewards
Example: sequence labeling • Left-right model • Actions: label • Order-free model • Actions: label + position • Loss: Hamming cost or F-score • Tasks • Named entity recognition (shared task at CoNLL 2002: 8,000 train, 1,500 test) • Chunking, noun phrases (CoNLL 2002) • Handwritten word recognition (5,000 train, 1,000 test) • Complexity of inference • O(sequence size × number of labels)
Tree mapping: document mapping • Problem: labeled tree mapping • Different instances • Flat text to XML • HTML to XML • XML to XML • …
Document mapping problem • Central issue: complexity • Large collections • Large feature space: 10^3 to 10^6 • Large search space (exponential) • Other SP methods fail
XML structuring • Action • Attach the path ending with the current leaf to a position in the current partial tree • Φ(s, a) encodes a series of potential (state, action) pairs • Loss: F-score for trees
Example: HTML-to-XML conversion (animated figure) • Input document: an HTML tree (HEAD > TITLE "Example"; BODY containing IT "Francis MAES", H1 "Title of the section", P "Welcome to INEX", FONT "This is a footnote") • Target document: an XML tree (DOCUMENT > TITLE "Example", AUTHOR "Francis MAES", SECTION > TITLE "Title of the section", TEXT "Welcome to INEX") • The target tree is built action by action: first TITLE "Example" is attached to DOCUMENT, then AUTHOR "Francis MAES", and so on