CRFs 米海涛 2007.03.22
Outline: • 1. Directed Graph • HMM • MEMM • 2. Undirected Graph • Conditional Random Fields • 3. Comparison: HMM, MEMM, CRF
1. Directed Graph • A directed graph G = (V, E), where V is the set of nodes in G and E is the set of directed edges between them; each node Vi represents a random variable. (Figure: an example directed graph over nodes V1–V5.)
1.1 Hidden Markov Model • Generative model: Hidden Markov Model (L. R. Rabiner, 1989) • Disadvantages: strict independence assumptions on the observations; cannot handle interacting features or long-distance dependencies. (Figure: HMM chain with states y1–y5 emitting observations x1–x5.)
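For reference, the standard HMM factorization behind the figure (the slide's equation image did not survive extraction; this is the usual form from Rabiner, 1989):

```latex
% HMM: joint probability of observations x_1..x_T and states y_1..y_T,
% with y_0 a designated start state
P(\mathbf{x}, \mathbf{y}) \;=\; \prod_{t=1}^{T} P(y_t \mid y_{t-1})\, P(x_t \mid y_t)
```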
1.2 Maximum Entropy Markov Model • Discriminative model: MEMM (Andrew McCallum et al., 2000) • Given a training set X with label sequences Y: train a model θ that maximizes P(Y | X, θ); for a new data sequence x, predict the label sequence y that maximizes P(y | x, θ) • Notice the per-state normalization: each factor is the probability of moving from state s to s' on observation x. (Figure: MEMM chain with states y1–y5 conditioned on the whole observation X.)
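The per-state model referred to above, in its standard form (reconstructed; McCallum et al., 2000):

```latex
% MEMM: one exponential model per source state s; normalization Z is local
P(s' \mid s, x) \;=\; \frac{1}{Z(x, s)} \exp\Big(\sum_a \lambda_a\, f_a(x, s')\Big),
\qquad
Z(x, s) \;=\; \sum_{s''} \exp\Big(\sum_a \lambda_a\, f_a(x, s'')\Big)
```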
1.2 Maximum Entropy Markov Model • Biased in favor of paths that pass through states of low entropy • Most transitions out of a high-entropy state have small probabilities, so such states are penalized regardless of the observations
2.2 CRF • X is a random variable over data sequences to be labeled • Y is a random variable over the corresponding label sequences • (X, Y) is a conditional random field when, conditioned on X, the variables Yv obey the Markov property with respect to the graph (Lafferty et al., 2001)
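The linear-chain CRF conditional that the following slides build on (standard form from Lafferty et al., 2001; the slide's equation image did not survive extraction):

```latex
% CRF: a single globally normalized exponential model over whole label sequences
P(\mathbf{y} \mid \mathbf{x}) \;=\; \frac{1}{Z(\mathbf{x})}
\exp\Big(\sum_{t}\sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t)\Big),
\qquad
Z(\mathbf{x}) \;=\; \sum_{\mathbf{y}'} \exp\Big(\sum_{t}\sum_{k} \lambda_k\, f_k(y'_{t-1}, y'_t, \mathbf{x}, t)\Big)
```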
2.2 Example • First-order CRFs. (Figure: linear-chain CRF with labels y1–y5 conditioned on the whole observation X.)
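In the first-order case the features split into edge features f over adjacent label pairs and state features g over single labels (standard notation; a reconstruction of what the slide's equation likely showed):

```latex
P(\mathbf{y} \mid \mathbf{x}) \;\propto\;
\exp\Big(\sum_{t}\sum_{k} \lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t)
       + \sum_{t}\sum_{l} \mu_l\, g_l(y_t, \mathbf{x}, t)\Big)
```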
2.3 Parameter Estimation for CRFs • Maximum likelihood parameter estimation, as shown below:
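Concretely, maximum likelihood here means maximizing the conditional log-likelihood of the training pairs (standard form; the slide's own equation was an image):

```latex
% Conditional log-likelihood over training pairs (x^{(i)}, y^{(i)})
\mathcal{L}(\theta) \;=\; \sum_{i} \log P_\theta\big(\mathbf{y}^{(i)} \mid \mathbf{x}^{(i)}\big)
\;=\; \sum_{i} \Big(\sum_{t,k} \lambda_k\, f_k\big(y^{(i)}_{t-1}, y^{(i)}_t, \mathbf{x}^{(i)}, t\big) - \log Z\big(\mathbf{x}^{(i)}\big)\Big)
```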
2.4 Parameter Estimation for CRFs • Learning algorithms: • GIS • IIS • Quasi-Newton methods, e.g. L-BFGS
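All of these algorithms work from the same gradient of the log-likelihood, which has the familiar "empirical counts minus expected counts" form (standard result, added here for completeness):

```latex
\frac{\partial \mathcal{L}}{\partial \lambda_k}
\;=\; \underbrace{\sum_{i}\sum_{t} f_k\big(y^{(i)}_{t-1}, y^{(i)}_t, \mathbf{x}^{(i)}, t\big)}_{\text{empirical feature count}}
\;-\; \underbrace{\sum_{i}\sum_{t} \mathbb{E}_{\mathbf{y} \sim P_\theta(\cdot \mid \mathbf{x}^{(i)})}\big[f_k(y_{t-1}, y_t, \mathbf{x}^{(i)}, t)\big]}_{\text{expected count under the model}}
```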
2.5 Example: Base Noun Phrase Recognition. (Figure: a sample input sentence and its output NP-chunk tags.)
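The slide's input/output figure did not survive extraction; a typical instance of this task, with illustrative BIO-style tags (not from the original slide), looks like: • Input: He reckons the current account deficit will narrow • Output: B-NP O B-NP I-NP I-NP I-NP O O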
3 Comparison • Graphical comparison among HMM, MEMM, and CRF. (Figure: the three graphical structures side by side: HMM with states y1–y3 emitting x1–x3; MEMM and CRF with labels y1–y3 conditioned on the whole observation X.)
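In formulas, the three structures in the figure correspond to the following (standard forms, reconstructed here since the slide carried them only as a diagram):

```latex
% HMM: generative; models the joint distribution
P(\mathbf{x}, \mathbf{y}) = \prod_t P(y_t \mid y_{t-1})\, P(x_t \mid y_t)
% MEMM: discriminative; locally normalized conditional per position
P(\mathbf{y} \mid \mathbf{x}) = \prod_t P(y_t \mid y_{t-1}, \mathbf{x})
% CRF: discriminative; one globally normalized conditional
P(\mathbf{y} \mid \mathbf{x}) = \frac{1}{Z(\mathbf{x})}
  \exp\Big(\sum_t \sum_k \lambda_k\, f_k(y_{t-1}, y_t, \mathbf{x}, t)\Big)
```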
HMM • Disadvantages: • Not practical to represent multiple interacting features or long-range dependencies among the observations • Very strict independence assumptions on the observations
MEMM • Models the conditional probability P(label sequence S | observation sequence O) rather than the joint probability P(O, S) • Allows arbitrary, non-independent features of the observation sequence X • The probability of a transition between labels may depend on past and future observations • Relaxes the strong independence assumptions of generative models
MEMM • Per-state normalization • Subject to the label bias problem: biased toward states with fewer outgoing transitions
MEMM • Label bias example: • P(1 and 2 | ro) = P(2 | 1 and ro) P(1 | ro) = P(2 | 1 and o) P(1 | r) • P(1 and 2 | ri) = P(2 | 1 and ri) P(1 | ri) = P(2 | 1 and i) P(1 | r) • Since P(2 | 1 and x) = 1 for all x, P(1 and 2 | ro) = P(1 and 2 | ri); however, we expect P(1 and 2 | ri) to be greater than P(1 and 2 | ro) • Per-state normalization cannot express this expectation
Label Bias Example • Given: "rib" 3 times, "rob" once • Training: p(1|0, "r") = 0.75, p(4|0, "r") = 0.25; every other state has a single outgoing transition, so per-state normalization forces its probability to 1 regardless of the observation • Inference on "rob": P(path 0-1-2-3) = 0.75 · 1 · 1 = 0.75 beats P(path 0-4-5-3) = 0.25 · 1 · 1 = 0.25, so the MEMM decodes "rob" along the "rib" path
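A minimal Python sketch of the computation above, assuming the four-state rib/rob lattice from Lafferty et al. (0-1-2-3 spells "rib", 0-4-5-3 spells "rob"); names are illustrative:

```python
# Label bias in a MEMM on the rib/rob lattice. Trained on "rib" x3 and
# "rob" x1, so p(1|0,'r') = 0.75 and p(4|0,'r') = 0.25. States 1, 2, 4, 5
# each have a single outgoing arc, so per-state normalization forces that
# arc to probability 1 for ANY observation symbol.
memm = {
    (0, 'r', 1): 0.75, (0, 'r', 4): 0.25,
    (1, 'i', 2): 1.0,  (1, 'o', 2): 1.0,   # only arc out of state 1
    (4, 'i', 5): 1.0,  (4, 'o', 5): 1.0,   # only arc out of state 4
    (2, 'b', 3): 1.0,  (5, 'b', 3): 1.0,   # only arcs into the final state
}

def path_prob(path, obs):
    """Product of per-state transition probabilities along a path."""
    p = 1.0
    for s, s_next, x in zip(path, path[1:], obs):
        p *= memm.get((s, x, s_next), 0.0)
    return p

rib_path, rob_path = (0, 1, 2, 3), (0, 4, 5, 3)
for word in ('rib', 'rob'):
    print(word, 'rib-path:', path_prob(rib_path, word),
          'rob-path:', path_prob(rob_path, word))
# For "rob": rib-path 0.75 vs rob-path 0.25, so the MEMM decodes "rob"
# along the "rib" path. That is the label bias problem.
```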
CRF • Given: "rib" 3 times, "rob" once • Training (constraints): • Init(): • w(1|0, "r")=3, w(2|1, "i")=3, w(3|2, "b")=3 • w(4|0, "r")=1, w(5|4, "o")=1, w(3|5, "b")=1 • Iterate(): • L(0453|rib) = w(4|0, "r") + w(5|4, "i") + w(3|5, "b") = 1 + 0 + 1 = 2 (w(5|4, "i") is not a feature, so it contributes 0) • L(0123|rib) = 3+3+3 = 9 > L(0453|rib) = 2: correct • L(0453|rob) = 1+1+1 = 3 < L(0123|rob) = 3+0+3 = 6: incorrect • … • Final: • w(1|0, "r")=3, w(2|1, "i")=3, w(3|2, "b")=2 • w(4|0, "r")=2, w(5|4, "o")=2, w(3|5, "b")=2 • L(0123|rib) = 3+3+2 = 8 > L(0453|rib) = 2+0+2 = 4: correct • L(0453|rob) = 2+2+2 = 6 > L(0123|rob) = 3+0+2 = 5: correct
CRF (continued) • Final weights: w(1|0, "r")=3, w(2|1, "i")=3, w(3|2, "b")=2; w(4|0, "r")=2, w(5|4, "o")=2, w(3|5, "b")=2 • L(0123|rib) = 8 > L(0453|rib) = 4: correct • L(0453|rob) = 6 > L(0123|rob) = 5: correct • Normalizing the scores for "rob": P(0123|rob) = 5/(6+5), P(0453|rob) = 6/(6+5) • Inference: L(0453|rob) = 6 > L(0123|rob) = 5, so the CRF selects path 0453 for "rob", with P(0453|rob) = 6/(6+5)
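A small Python sketch of the score-and-normalize computation above, using the final weights from the slide (names are illustrative):

```python
# CRF version of the same lattice, with the slide's final weights,
# keyed (from_state, observation, to_state).
w = {
    (0, 'r', 1): 3, (1, 'i', 2): 3, (2, 'b', 3): 2,   # "rib" path features
    (0, 'r', 4): 2, (4, 'o', 5): 2, (5, 'b', 3): 2,   # "rob" path features
}

def score(path, obs):
    """Unnormalized additive score L(path | obs); absent features add 0."""
    return sum(w.get((s, x, s2), 0) for s, s2, x in zip(path, path[1:], obs))

rib_path, rob_path = (0, 1, 2, 3), (0, 4, 5, 3)
for word in ('rib', 'rob'):
    l_rib, l_rob = score(rib_path, word), score(rob_path, word)
    # The slide normalizes the raw scores (6/(6+5) for "rob"); a real CRF
    # would normalize exp(score) over all paths instead.
    print(word, 'L(rib-path) =', l_rib, 'L(rob-path) =', l_rob,
          'P(rob-path) =', l_rob / (l_rib + l_rob))
# For "rob": L = 6 vs 5, so the CRF picks the correct 0-4-5-3 path.
```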
Example • Second-order CRFs. (Figure: second-order CRF with labels y1–y6 conditioned on the whole observation X.)
Example • Second-order CRFs • Features: • Edge feature type 1: e1 • State feature type 1: s1 • Edge feature type 2: e2 • State feature type 2: s2
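The slide's feature definitions were in an equation image; under one common convention (an assumption, not necessarily the author's exact definitions), the four types can be written as:

```latex
% Assumed mapping (illustrative):
%   e1: first-order edge    e_1(y_{t-1}, y_t)
%   s1: first-order state   s_1(y_t, x, t)
%   e2: second-order edge   e_2(y_{t-2}, y_{t-1}, y_t)
%   s2: second-order state  s_2(y_{t-1}, y_t, x, t)
P(\mathbf{y} \mid \mathbf{x}) \;\propto\; \exp\Big(\sum_t \big[
  \lambda_1 \cdot e_1(y_{t-1}, y_t) + \mu_1 \cdot s_1(y_t, \mathbf{x}, t)
+ \lambda_2 \cdot e_2(y_{t-2}, y_{t-1}, y_t) + \mu_2 \cdot s_2(y_{t-1}, y_t, \mathbf{x}, t)
\big]\Big)
```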
CRF • CRFs have all the advantages of MEMMs without the label bias problem • A MEMM uses a per-state exponential model for the conditional probabilities of next states given the current state • A CRF has a single exponential model for the probability of the entire label sequence given the observation sequence