A Self Learning Universal Concept Spotter. By Tomek Strzalkowski and Jin Wang. Original slides by Iman Sen. Edited by Ralph Grishman.
Introduction • When this paper appeared (1996), most named entity taggers were hand-coded; work on supervised learning for NE was just beginning. • The Universal Spotter was one of the first procedures proposed for unsupervised learning of semantic categories of names and noun phrases.
Basic Idea • Start with some examples and/or contexts for things to spot (the ‘seed’) & a large corpus • Exploit redundancy of evidence • we may be able to classify a name both because we know the name and we know its context • Use seed examples to learn indicative contexts and use these contexts to learn “new” items. • Initially precision is high, recall very low. • Iterations should increase recall, while (hopefully) maintaining/improving precision.
Seeds: What we are looking for • The seed is the initial information provided by the user, in the form of either examples or contextual information. • Examples are taken from the text (“Microsoft”, “toothbrushes”). • Context information can also be specified, both internal and external. For example, “name ends with Co.” (internal) or “appears after ‘produced’” (external).
The Cyclic Process • Build context rules from the seed examples. • Use these rules to find further examples of this concept in the corpus. • As we find more examples of the concepts, we can find more contextual information. • Selectively expand context rules using these new contexts. • Repeat.
Simple Example • Suppose we initially have the seeds “Co” and “Inc” and the following text: “Henry Kaufman is president of Henry Kaufman & Co., … president of Gabelli Funds Inc.; Claude N. Rosenberg is named president of Thomson S.A. …” • Use “Co” and “Inc” to pick out “Henry Kaufman & Co.” and “Gabelli Funds Inc.” • Use these new examples to get contextual information, such as “president of” appearing before each of the entities. • Use “president of” to find “Thomson S.A.”
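The three steps of this example can be traced in a few lines of Python. This is only a toy sketch under stated assumptions: the text is a simplified version of the slide's sentence (elisions dropped), candidates are taken to be maximal runs of capitalized tokens, and the external context is fixed at the two preceding words. The helper names `capitalized_runs` and `name` are hypothetical and do not come from the paper.

```python
SEED_SUFFIXES = {"Co", "Inc"}  # the seeds from the slide

# Simplified version of the slide's text (elided parts left out).
TEXT = ("Henry Kaufman is president of Henry Kaufman & Co., "
        "president of Gabelli Funds Inc.; "
        "Claude N. Rosenberg is named president of Thomson S.A.")


def capitalized_runs(tokens):
    """Return (start, end) spans of maximal capitalized runs.

    Assumption: '&' may continue a run, and a token ending in ',' or ';'
    closes the run it belongs to.
    """
    spans, start = [], None
    for i, tok in enumerate(tokens):
        in_run = tok[0].isupper() or (tok == "&" and start is not None)
        if in_run and start is None:
            start = i
        if start is not None and not in_run:
            spans.append((start, i))
            start = None
        elif start is not None and tok[-1] in ",;":
            spans.append((start, i + 1))
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


def name(tokens, span):
    """Render a span as a string, dropping trailing clause punctuation."""
    start, end = span
    return " ".join(tokens[start:end]).rstrip(",;")


tokens = TEXT.split()
spans = capitalized_runs(tokens)

# Step 1: accept candidates whose last word is a seed ("Co" or "Inc").
by_seed = [sp for sp in spans
           if tokens[sp[1] - 1].strip(".,;") in SEED_SUFFIXES]

# Step 2: learn the two words preceding each accepted candidate.
contexts = {tuple(tokens[max(0, start - 2):start]) for start, _ in by_seed}

# Step 3: accept further candidates that follow a learned context.
by_context = [sp for sp in spans
              if sp not in by_seed
              and tuple(tokens[max(0, sp[0] - 2):sp[0]]) in contexts]

print([name(tokens, sp) for sp in by_seed])
print(contexts)
print([name(tokens, sp) for sp in by_context])
```

Note that “Henry Kaufman” and “Claude N. Rosenberg” are also candidates here but are never accepted, since they neither end in a seed nor follow the learned context.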
The Classification Task • So our goal is to decide whether a sequence of words represents a desired entity/concept. • This is done by calculating significance weights (SW) of evidence items [features], and then combining them.
The Process: In Detail • Initially some preprocessing is done, including tokenization, POS tagging, and lexical normalization or stemming. • POS tagging helps to delineate which sequences of words might contain the desired entities. • These sequences become the ‘candidate items’.
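How POS tags might delineate candidate items can be sketched as follows. The slides do not say which tag patterns were actually used, so the rule here is purely an assumption: the input is already tagged with Penn Treebank style tags, and a candidate is any maximal run of determiner, adjective, or noun tags. `CANDIDATE_TAGS` and `candidate_items` are hypothetical names.

```python
# Assumed tag set: determiners, adjectives, and nouns (Penn Treebank style).
CANDIDATE_TAGS = {"DT", "JJ", "NN", "NNS", "NNP", "NNPS"}


def candidate_items(tagged):
    """Group consecutive words with candidate tags into candidate items.

    `tagged` is a list of (word, tag) pairs produced by some POS tagger.
    """
    items, current = [], []
    for word, tag in tagged:
        if tag in CANDIDATE_TAGS:
            current.append(word)
        elif current:
            # A non-candidate tag closes the current run.
            items.append(tuple(current))
            current = []
    if current:
        items.append(tuple(current))
    return items


tagged = [("boys", "NNS"), ("kicked", "VBD"), ("the", "DT"),
          ("door", "NN"), ("with", "IN"), ("rage", "NN")]
print(candidate_items(tagged))
```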
“Evidence Items” [features] • Consider a sequence of words W1, W2, …, Wm in the text which is of interest (the central unit). There is a window of size n on either side of the central unit where one looks for contextual information. • Then do the following: • Make up pairs of (word, position) for all words within the window of size n, where position is one of preceding context (p), central unit (s), or following context (f). • Similarly, make up pairs of (bigram, position). • Make up triples of (word, position, distance) for the same sequence of words, where distance is the distance from W1 (for the preceding context) or from Wm (for the following context); for units within W1 through Wm, take the distance from Wm.
An Example of Evidence Items • Example: “... boys kicked the door with rage ...” with window n=2 and central unit “the door”. • The generated tuples (called evidence items) are: (boys, p), (kicked, p), (the, s), (door, s), (with, f), (rage, f), ((boys, kicked), p), ((the, door), s), ((with, rage), f), (boys, p, 2), (kicked, p, 1), (the, s, 2), (door, s, 1), (with, f, 1), (rage, f, 2), ((boys, kicked), p, 1), ((the, door), s, 1), ((with, rage), f, 1)
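The generation of evidence items can be sketched in Python and checked against the slide's example. Two details are assumptions read off the example rather than stated rules: a bigram's distance is taken as the smaller of its two words' distances, and words in the central unit are numbered so that Wm has distance 1. The function name `evidence_items` is hypothetical.

```python
def evidence_items(tokens, start, end, n):
    """Build evidence items for the central unit tokens[start:end].

    Returns word pairs, bigram pairs, word triples, and bigram triples,
    in that order, using a context window of n words on either side.
    """
    pre = tokens[max(0, start - n):start]
    cen = tokens[start:end]
    fol = tokens[end:end + n]

    # (position label, words, distance of each word)
    regions = [
        ("p", pre, [len(pre) - i for i in range(len(pre))]),  # back from W1
        ("s", cen, [len(cen) - i for i in range(len(cen))]),  # back from Wm
        ("f", fol, [i + 1 for i in range(len(fol))]),         # forward from Wm
    ]

    word_pairs, bigram_pairs, word_triples, bigram_triples = [], [], [], []
    for pos, words, dists in regions:
        for word, dist in zip(words, dists):
            word_pairs.append((word, pos))
            word_triples.append((word, pos, dist))
        for i in range(len(words) - 1):
            bigram = (words[i], words[i + 1])
            bigram_pairs.append((bigram, pos))
            # Assumption: a bigram is as close as its nearest word.
            bigram_triples.append((bigram, pos, min(dists[i], dists[i + 1])))
    return word_pairs + bigram_pairs + word_triples + bigram_triples


tokens = ["boys", "kicked", "the", "door", "with", "rage"]
items = evidence_items(tokens, start=2, end=4, n=2)  # central unit: "the door"
for item in items:
    print(item)
```

For this input the function yields the same 18 evidence items as listed on the slide.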
Calculating Significance Weights for Evidence Items • Candidate items may be classified into two groups, accepted (A) and rejected (R). • Use these groups to calculate SW for each evidence item t:

$$
SW(t) = \begin{cases} \dfrac{f(t,A) - f(t,R)}{f(t,A) + f(t,R)} & \text{if } f(t,A) + f(t,R) > s \\ 0 & \text{otherwise} \end{cases}
$$

where s is a constant to filter noise and f(x,X) is the frequency of x in X. • SW takes values between -1.0 and 1.0. • For some threshold e > 0, SW(t) > e is taken as positive evidence and SW(t) < -e is taken as negative evidence.
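The SW formula translates directly into code. The sketch below assumes that each candidate item is represented by the list of its evidence items and that f(t, X) counts occurrences of t across all candidates in group X; the sample evidence items and the value s=1 are made up for illustration.

```python
from collections import Counter


def significance_weights(accepted, rejected, s=1):
    """Compute SW(t) for every evidence item t seen in either group.

    `accepted` and `rejected` are lists of candidates, each candidate
    being a list of evidence items. `s` is the noise-filter constant.
    """
    f_a = Counter(t for items in accepted for t in items)
    f_r = Counter(t for items in rejected for t in items)
    sw = {}
    for t in f_a.keys() | f_r.keys():
        total = f_a[t] + f_r[t]
        # Items seen too rarely (total <= s) carry no weight.
        sw[t] = (f_a[t] - f_r[t]) / total if total > s else 0.0
    return sw


# Illustrative data only.
accepted = [
    [("president", "p"), ("of", "p"), ("Co", "s")],
    [("president", "p"), ("of", "p"), ("Inc", "s")],
    [("of", "p"), ("Ltd", "s")],
]
rejected = [
    [("of", "p"), ("the", "s")],
    [("of", "p"), ("in", "p")],
]

sw = significance_weights(accepted, rejected, s=1)
print(sw[("president", "p")])  # seen twice, only in A
print(sw[("of", "p")])         # seen 3 times in A, 2 times in R
print(sw[("Co", "s")])         # seen once, filtered out by s
```

Here “president” gets the maximum weight 1.0, “of” gets (3 - 2) / 5 = 0.2, and “Co” is filtered to 0 because it occurs only once.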
Combining SW weights • The SW weights of the evidence items for a given candidate item are then combined; if the result exceeds a threshold, the candidate item becomes available during the next tagging stage. • The primary scheme used by the authors for combining is:

$$
x \oplus y = \begin{cases} x + y - xy & \text{if } x > 0 \text{ and } y > 0 \\ x + y + xy & \text{if } x < 0 \text{ and } y < 0 \\ x + y & \text{otherwise} \end{cases}
$$

Note: values still remain within [-1.0, 1.0].
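A small sketch of the combination operator follows. Folding the weights left to right from a neutral 0.0 is an assumption: the slides do not give an order of combination, and the operator is not associative when signs are mixed, so the order can change the result. The names `combine` and `combined_score` are hypothetical.

```python
from functools import reduce


def combine(x, y):
    """Combine two significance weights, staying within [-1.0, 1.0]."""
    if x > 0 and y > 0:
        return x + y - x * y  # two positive weights reinforce each other
    if x < 0 and y < 0:
        return x + y + x * y  # two negative weights reinforce each other
    return x + y              # mixed signs (or a zero) simply add


def combined_score(weights):
    """Fold the weights left to right, starting from a neutral 0.0."""
    return reduce(combine, weights, 0.0)


print(combine(0.5, 0.5))                  # 0.75
print(combine(-0.5, -0.5))                # -0.75
print(combine(0.5, -0.25))                # 0.25
print(combined_score([0.5, 0.5, -0.5]))   # 0.25
print(combined_score([0.5, -0.5, 0.5]))   # 0.5 (order matters)
```

A candidate would then be accepted when `combined_score(...)` exceeds the chosen threshold.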
Bootstrapping • The basic bootstrapping process then looks like this:
Procedure Bootstrapping
    Collect seeds
    loop
        Training phase (calculate SW for each evidence item)
        Tagging phase (combine SW for each candidate item)
    until Satisfied.
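The procedure can be turned into a small runnable loop. This is a sketch under several assumptions that the slides do not confirm: seeds are given as example names, every candidate not yet accepted is treated as rejected during training, “until Satisfied” is read as “no new candidates accepted”, and the values of `s`, `threshold`, and `max_cycles` as well as the toy data are made up for illustration.

```python
from collections import Counter
from functools import reduce


def combine(x, y):
    """Combine two significance weights (scheme from the slides)."""
    if x > 0 and y > 0:
        return x + y - x * y
    if x < 0 and y < 0:
        return x + y + x * y
    return x + y


def train(candidates, accepted, s):
    """Training phase: compute SW for each evidence item."""
    f_a, f_r = Counter(), Counter()
    for cand, items in candidates.items():
        # Assumption: anything not accepted so far counts as rejected.
        (f_a if cand in accepted else f_r).update(items)
    return {t: (f_a[t] - f_r[t]) / (f_a[t] + f_r[t])
            if f_a[t] + f_r[t] > s else 0.0
            for t in f_a.keys() | f_r.keys()}


def tag(candidates, sw, threshold):
    """Tagging phase: combine SW for each candidate, keep high scorers."""
    return {cand for cand, items in candidates.items()
            if reduce(combine, (sw.get(t, 0.0) for t in items), 0.0) > threshold}


def bootstrap(candidates, seeds, s=1, threshold=0.3, max_cycles=10):
    accepted = set(seeds)
    for _ in range(max_cycles):
        sw = train(candidates, accepted, s)
        new_accepted = accepted | tag(candidates, sw, threshold)
        if new_accepted == accepted:  # "satisfied": nothing new was found
            break
        accepted = new_accepted
    return accepted


# Toy data: candidate name -> its evidence items.
candidates = {
    "Henry Kaufman & Co.": [("president of", "p"), ("Co", "s")],
    "Gabelli Funds Inc.": [("president of", "p"), ("Inc", "s")],
    "Thomson S.A.": [("president of", "p"), ("S.A", "s")],
    "Paris": [("lives in", "p")],
    "London": [("lives in", "p")],
}
seeds = {"Henry Kaufman & Co.", "Gabelli Funds Inc."}

print(sorted(bootstrap(candidates, seeds)))
```

In the first cycle “president of” gets SW = (2 - 1) / 3, which is enough to accept “Thomson S.A.”; in the second cycle nothing new is found and the loop stops.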
Experiments and Results • Organization names: • Training on a 7 MB WSJ corpus, testing on 10 selected articles. • With seed context features: precision 97%, recall 49%. • Reached P = 95% and R = 90% after the 4th cycle. • A similar experiment for identifying products gave lower performance (about 70% R, 70% P).