Kernels for Relation Extraction & Semi-Markov Models William Cohen 3-27-2007
…and announcements • Projects and such: • Last class is (officially) Thurs May 3. • Projects due Fri May 11. • But I was soft about allowing small projects… so we'll meet May 8 and May 10 also • Send me mail if you can't attend • Project presentations start April 17 – two per session, 30 min each, plus questions. • Preliminary reports are OK in your presentation
Announcements • Lectures thru mid-April: • SRL overview this Thurs • Bootstrapping next week • Critique for the Etzioni paper due Tues. • Please keep your talks to 20 min of talking
Kernels vs Structured Output Spaces • Two kinds of structured learning: • HMMs, CRFs, VP-trained HMMs, structured SVMs, stacked learning, …: the output of the learner is structured. • E.g., for a linear-chain CRF, the output is a sequence of labels: a string in Yⁿ • Bunescu & Mooney (EMNLP, NIPS): the input to the learner is structured. • EMNLP: structure derived from a dependency graph. New!
Kernel on dependency paths • Each position xi on a path x = x1 × … × xn is a set of features, so a path with set sizes 4, 1, 3, 1, 4 generates 4·1·3·1·4 = 48 features • K(x1 × … × xn, y1 × … × yn) = |(x1 × … × xn) ∩ (y1 × … × yn)| = ∏i |xi ∩ yi| (and 0 if the paths have different lengths)
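A minimal sketch of this product-of-overlaps kernel in Python; the feature sets below are made up for illustration, and the only thing the code commits to is the formula above.

```python
# Dependency-path kernel: each path position is a *set* of features, and the
# kernel multiplies the per-position overlap sizes (0 if lengths differ).
def path_kernel(x, y):
    """x, y: lists of feature sets, one set per path position."""
    if len(x) != len(y):
        return 0
    k = 1
    for xi, yi in zip(x, y):
        k *= len(xi & yi)  # number of common features at this position
    return k

# Hypothetical 5-position paths with set sizes 4, 1, 3, 1, 4 as on the slide.
x = [{"protesters", "NNS", "Noun", "PERSON"}, {"->"},
     {"stations", "NNS", "Noun"}, {"<-"}, {"workers", "NNS", "Noun", "PERSON"}]
y = [{"troops", "NNS", "Noun", "PERSON"}, {"->"},
     {"churches", "NNS", "Noun"}, {"<-"}, {"ministers", "NNS", "Noun", "PERSON"}]
print(path_kernel(x, y))  # 3 * 1 * 2 * 1 * 3 = 18
```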
and the NIPS paper… • Similar representation for relation instances: x1 × … × xn where each xi is a set…. • …but instead of informative dependency path elements, the x’s just represent adjacent tokens. • To compensate: use a richer kernel
Subsequence kernels [Lodhi et al., JMLR 2002] • Example strings: • "Elvis Presley was born on Jan 8" s1) PERSON was born on DATE. • "William Cohen was born in New York City on April 6" s2) PERSON was born in LOCATION on DATE. • Plausible pattern: • PERSON was born … on DATE. • What we'll actually learn: • u = PERSON … was … born … on … DATE. • u matches s if there exist increasing indices i = i1,…,in so that s[i] = s[i1]…s[in] = u • For string s1, i = 1,2,3,4,5. For string s2, i = 1,2,3,6,7
Subsequence kernels s1) PERSON was born on DATE. s2) PERSON was born in LOCATION on DATE. • Pattern: • u = PERSON … was … born … on … DATE. • u matches s if there exist increasing indices i = i1,…,in so that s[i] = s[i1]…s[in] = u • For string s1, i = 1,2,3,4,5. For string s2, i = 1,2,3,6,7 • How do we say that s1 matches better than s2? • Weight a match of s to u by λ^length(i), where length(i) = in − i1 + 1 • Now define K(s,t) = the sum over all u that match both s and t of matchWeight(u,s) · matchWeight(u,t)
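Written out (following Lodhi et al.'s formulation, and summing over all the ways each pattern can match), this is:

$$K(s,t)\;=\;\sum_{u}\;\sum_{\mathbf{i}:\,s[\mathbf{i}]=u}\;\sum_{\mathbf{j}:\,t[\mathbf{j}]=u}\lambda^{\mathrm{length}(\mathbf{i})+\mathrm{length}(\mathbf{j})}$$

so a pattern that matches compactly (small span) contributes more than one that matches with long gaps.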
K′i(s,t) = the weighted count of patterns u of length i that match both s and t, counting each match as if it extended to the very ends of s and t. These recursions allow dynamic programming; a sketch follows.
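A minimal sketch of the dynamic program from Lodhi et al. (JMLR 2002), computed over prefixes of s and t; Kprime plays the role of K′i above, and memoization stands in for the explicit DP tables.

```python
from functools import lru_cache

# Subsequence kernel of Lodhi et al. (JMLR 2002): K_n(s, t) sums, over every
# common subsequence u of length n, lambda^length(i) * lambda^length(j) for
# each pair of matches i (in s) and j (in t).
def subseq_kernel(s, t, n, lam=0.75):
    s, t = tuple(s), tuple(t)  # token sequences, e.g. lists of words

    @lru_cache(maxsize=None)
    def Kprime(i, p, q):
        # K'_i over prefixes s[:p], t[:q]: matches scored as if they
        # extended to the very ends of both prefixes.
        if i == 0:
            return 1.0
        if p < i or q < i:
            return 0.0
        x = s[p - 1]
        return lam * Kprime(i, p - 1, q) + sum(
            Kprime(i - 1, p - 1, j) * lam ** (q - j + 1)
            for j in range(q) if t[j] == x
        )

    @lru_cache(maxsize=None)
    def K(i, p, q):
        if p < i or q < i:
            return 0.0
        x = s[p - 1]
        return K(i, p - 1, q) + lam ** 2 * sum(
            Kprime(i - 1, p - 1, j) for j in range(q) if t[j] == x
        )

    return K(n, len(s), len(t))

s1 = "PERSON was born on DATE".split()
s2 = "PERSON was born in LOCATION on DATE".split()
print(subseq_kernel(s1, s2, n=3))  # weighted count of common 3-subsequences
```

As written the inner sum over j makes this O(n·|s|·|t|²); Lodhi et al. give an O(n·|s|·|t|) variant with one more auxiliary table.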
Subsequence kernel with features • Feature space: the set of all sparse subsequences u of x1 × … × xn, with each u downweighted according to its sparsity • Relaxation of the old kernel: we don't have to match everywhere, just at selected locations • For every position we decide to match at, we pay a penalty of λ • To pick a "feature" inside (x1 … xn)′: pick a subset of locations i = i1,…,ik, and then pick a feature value in each location • In the preprocessed vector x′, weight every feature for i by λ^length(i) = λ^(ik − i1 + 1)
Subsequence kernel, generalized • Replace the exact-match test on positions with c(x, y) = the number of ways x and y match (i.e. the number of common features) • The recursions become, summing over all j:
K′i(sx, t) = λ K′i(s, t) + Σj K′i−1(s, t[1..j−1]) · c(x, t[j]) · λ^(|t|−j+2)
Kn(sx, t) = Kn(s, t) + Σj λ² K′n−1(s, t[1..j−1]) · c(x, t[j])
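In code, the change relative to the string-kernel sketch above is small; a sketch, assuming each position is a Python set of features:

```python
# Generalized subsequence kernel: positions are feature sets, and the 0/1
# equality test becomes the overlap count c(x, y).
def c(x, y):
    return len(x & y)  # number of common features between two positions

# In the recursions above, each occurrence of
#     sum(... for j in range(q) if t[j] == x)
# becomes a soft-match sum over *all* positions j:
#     sum(... * c(x, t[j]) for j in range(q))
```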
Additional details • Special domain-specific tricks for combining subsequence matches from the fore, between, and aft sections of a relation-instance pair • Fore–between, between–aft, and between subsequences are counted separately • Subsequences have length less than 4 – is DP even needed at that length?
Results • Protein–protein interaction extraction (results figure not reproduced in this transcript)
State-of-the-art NER: sequential word classification • Example: t = "I met Prof. F. Douglas at the zoo", with token sequence x and label sequence y • Question: how can we guide this using a dictionary D? • Simple answer: make membership in D a feature f_D • …but what about a dictionary entry like "Fred Douglis", which spans two tokens and only nearly matches the text?
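A sketch of why the simple answer falls short; the dictionary contents and feature name here are made up for illustration.

```python
# Token-level dictionary feature: fires only when a single token is itself a
# dictionary entry, so multi-token entries like "Fred Douglis" never fire,
# and near-matches ("F. Douglas") are missed entirely.
D = {"Fred Douglis", "William Cohen"}

def f_D(tokens, i):
    return 1 if tokens[i] in D else 0

tokens = "I met Prof. F. Douglas at the zoo".split()
print([f_D(tokens, i) for i in range(len(tokens))])  # all zeros
```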
Semi-Markov models • Instead of labeling each token of x separately, label segments: the output is a segmentation of x into segments sj = (lj, uj, yj), where lj and uj are the start and end positions of the j-th segment and yj is its single label • Features can now look at a whole segment at once, e.g. segment-level dictionary membership
Prediction • Find the highest-scoring output: ŷ = argmax over y of W · F(x, y) • Viterbi finds this efficiently • Many models fit this framework (HMMs, CRFs, VP-trained HMMs, …)
Proposed semi-Markov model • Each feature looks at: the previous label y(j−1), the j-th label yj, and the start lj and end uj of the j-th segment Sj
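In symbols (a sketch using Sarawagi & Cohen's notation, where s is the segmentation and sj = (lj, uj, yj) its j-th segment):

$$g^k(j, \mathbf{x}, s) \;=\; g^k\big(y_{j-1},\, y_j,\, \mathbf{x},\, l_j,\, u_j\big), \qquad \hat{s} \;=\; \arg\max_{s}\; W \cdot \sum_{j} g(j, \mathbf{x}, s)$$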
Modified Viterbi • V(i, y) = score of the best segmentation of x1…xi whose last segment ends at position i and is assigned label y • Search over the length d of the last segment, up to a maximum segment length L (here L = 4): V(i, y) = max over d ≤ min(L, i) and y′ of V(i − d, y′) + W · g(y′, y, x, i − d + 1, i) • (The slide fills the table V(i, y) column by column for a 5-token example with labels P, L, O)
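A minimal sketch of this modified Viterbi in Python. The segment scorer `score(y_prev, y, x, l, u)` is a hypothetical stand-in for W · g(y′, y, x, l, u); everything else follows the recurrence above (with 0-indexed, half-open segments x[l:u] in place of the slide's 1-indexed inclusive ones).

```python
# Semi-Markov Viterbi: V[i][y] = best score of a segmentation of x[0:i]
# whose last segment ends at position i with label y.
def semimarkov_viterbi(x, labels, score, L=4):
    n = len(x)
    V = [dict() for _ in range(n + 1)]
    back = [dict() for _ in range(n + 1)]
    V[0] = {None: 0.0}  # empty prefix; None is a dummy "start" label
    for i in range(1, n + 1):
        for y in labels:
            best, arg = float("-inf"), None
            for d in range(1, min(L, i) + 1):      # last segment is x[i-d:i]
                for y_prev, v in V[i - d].items():
                    s = v + score(y_prev, y, x, i - d, i)
                    if s > best:
                        best, arg = s, (i - d, y_prev)
            V[i][y] = best
            back[i][y] = arg
    # trace back the best labeled segmentation
    i = n
    y = max(V[n], key=V[n].get)
    segments = []
    while i > 0:
        j, y_prev = back[i][y]
        segments.append((j, i, y))  # segment x[j:i] carries label y
        i, y = j, y_prev
    return segments[::-1]
```

Runtime is O(n · L · |Y|²), versus O(n · |Y|²) for token-level Viterbi, so a small maximum segment length like L = 4 keeps it cheap.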
[Sarawagi & Cohen, NIPS 2004] • Internal dictionary: formed from the training examples • External dictionary: from an external source