
Unsupervised Models for Named Entity Classification

Unsupervised Models for Named Entity Classification. Michael Collins, Yoram Singer, AT&T Labs, 1999. The Task: tag phrases with “person”, “organization” or “location”. For example, Ralph Grishman, of NYU, sure is swell.


Presentation Transcript


  1. Unsupervised Models for Named Entity Classification • Michael Collins, Yoram Singer • AT&T Labs, 1999

  2. The Task • Tag phrases with “person”, “organization” or “location”. For example, Ralph Grishman, of NYU, sure is swell.

  3. WHY? • Labeled data • Unlabeled data

  4. Spelling Rules • The approach uses two kinds of rules • Spelling • Simple look up to see “Honduras” is a location! • Look for words in string, like “Mr.”

  5. Contextual Rules • Contextual • Words surrounding the string • A rule that any proper name modified by an appositive whose head is “president” is a person.

  6. Two Categories of Rules • The key to the method is redundancy in the two kinds of rules. • …says Mr. Cooper, a vice president of… • Spelling cue: “Mr.” • Contextual cue: “president” • Unlabeled data gives us these hints!

  7. The Experiment • 970,000 New York Times sentences were parsed. • Sequences of NNP and NNPS tags were then extracted as named entity examples if they met one of two criteria.

  8. Kinds of Noun Phrases • The NP has an appositive modifier whose head is a singular noun (tagged NN). • …says Maury Cooper, a vice president… • The NP is a complement to a preposition which is the head of a PP. This PP modifies another NP whose head is a singular noun. • … fraud related to work on a federally funded sewage plant in Georgia.

  9. (spelling, context) pairs created • …says Maury Cooper, a vice president… • (Maury Cooper, president) • … fraud related to work on a federally funded sewage plant in Georgia. • (Georgia, plant_in)

  10. Rules • Set of rules • Full-string=x (full-string=Maury Cooper) • Contains(x) (contains(Maury)) • Allcap1 (IBM) • Allcap2 (N.Y.) • Nonalpha=x (A.T.&T. gives nonalpha=..&.) • Context=x (context=president) • Context-type=x (appos or prep)
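
The rule features on this slide can be sketched as a small extractor. This is a minimal sketch, not the authors' code: the function names `spelling_features` and `context_features` are made up for illustration, and two details are assumptions (that contains(x) only fires for multi-word strings, and that Allcap2 means capitals and periods with at least one period).

```python
import re

def spelling_features(name):
    """Return the set of spelling features that fire for one entity string."""
    feats = {f"full-string={name}"}
    words = name.split()
    # contains(x): one feature per word (assumed: only for multi-word strings)
    if len(words) > 1:
        feats.update(f"contains({w})" for w in words)
    # allcap1: a single token made only of capital letters, e.g. IBM
    if re.fullmatch(r"[A-Z]+", name):
        feats.add("allcap1")
    # allcap2 (assumed definition): capitals and periods, at least one period, e.g. N.Y.
    elif re.fullmatch(r"[A-Z.]+", name) and "." in name:
        feats.add("allcap2")
    # nonalpha=x: the non-letter, non-space characters of the string, in order
    nonalpha = re.sub(r"[A-Za-z\s]", "", name)
    if nonalpha:
        feats.add(f"nonalpha={nonalpha}")
    return feats

def context_features(context_word, context_type):
    """Return the contextual features; context_type is 'appos' or 'prep'."""
    return {f"context={context_word}", f"context-type={context_type}"}
```

For example, `spelling_features("A.T.&T.")` includes `nonalpha=..&.`, matching the slide.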

  11. SEED RULES • Full-string = New York • Full-string = California • Full-string = U.S. • Contains(Mr.) • Contains(Incorporated) • Full-string=Microsoft • Full-string=I.B.M.
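
A sketch of how the seed rules might be applied as a decision list. The slide lists only the features; the label attached to each seed below is an assumption based on the obvious reading (New York is a location, Mr. signals a person, and so on), and `label_with_rules` is a hypothetical helper name.

```python
# Seed rules as (feature, label) pairs; the labels are assumed, not on the slide.
SEED_RULES = [
    ("full-string=New York", "location"),
    ("full-string=California", "location"),
    ("full-string=U.S.", "location"),
    ("contains(Mr.)", "person"),
    ("contains(Incorporated)", "organization"),
    ("full-string=Microsoft", "organization"),
    ("full-string=I.B.M.", "organization"),
]

def label_with_rules(features, rules):
    """Decision-list lookup: the first rule whose feature is present wins."""
    for feature, label in rules:
        if feature in features:
            return label
    return None  # no rule fires, so the example stays unlabeled

# Features for the string "Mr. Cooper"
cooper = {"full-string=Mr. Cooper", "contains(Mr.)", "contains(Cooper)"}
print(label_with_rules(cooper, SEED_RULES))
```

Most examples match no seed rule at first, which is why the algorithm has to grow the rule set from unlabeled data.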

  12. The Algorithm • Step 1, Initialize: Set the spelling decision list equal to the set of seed rules. • Step 2: Label the training set using the current spelling rules. • Step 3: Use the labeled examples to get contextual rules. (x = feature, y = label) • Step 4: Label the set using the contextual rules, and use the result to get spelling rules. • Step 5: Set the spelling rules to the seed rules plus the new rules. • Step 6: If fewer than the threshold number of rules, go to step 2 and add 15 more. • Step 7: When finished, label the training data with the combined spelling/contextual decision list, then induce a final decision list from the labeled examples where all rules are added to the decision list.
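
The loop above can be sketched roughly as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: rules are ranked by a smoothed precision, `alpha`, `min_strength`, `n`, `step` and `max_n` are assumed parameters, "15 more" is read as 5 more rules for each of the 3 labels, and the final combined decision list is left out.

```python
from collections import Counter

LABELS = ("person", "organization", "location")

def label(features, rules):
    """Decision-list lookup: the first matching rule wins; None if none fires."""
    for feat, lab in rules:
        if feat in features:
            return lab
    return None

def induce(labeled, n, min_strength, alpha=0.1):
    """Keep up to n rules per label, ranked by smoothed precision (assumed score)."""
    feat_count, pair_count = Counter(), Counter()
    for feats, lab in labeled:
        for f in feats:
            feat_count[f] += 1
            pair_count[f, lab] += 1
    rules = []
    for lab in LABELS:
        scored = []
        for (f, l), c in pair_count.items():
            if l != lab:
                continue
            strength = (c + alpha) / (feat_count[f] + len(LABELS) * alpha)
            if strength >= min_strength:
                scored.append((strength, f))
        scored.sort(reverse=True)
        rules += [(s, f, lab) for s, f in scored[:n]]
    rules.sort(reverse=True)  # strongest rules first in the decision list
    return [(f, lab) for _, f, lab in rules]

def cotrain(pairs, seeds, n=5, step=5, max_n=20, min_strength=0.95):
    """pairs: list of (spelling_feature_set, context_feature_set)."""
    spelling_rules = list(seeds)
    while True:
        # Label with spelling rules, then induce contextual rules.
        labeled = [(ctx, label(sp, spelling_rules)) for sp, ctx in pairs]
        context_rules = induce([(c, l) for c, l in labeled if l], n, min_strength)
        # Label with contextual rules, then induce spelling rules.
        labeled = [(sp, label(ctx, context_rules)) for sp, ctx in pairs]
        new_rules = induce([(s, l) for s, l in labeled if l], n, min_strength)
        # Spelling rules are always the seeds plus the newly induced rules.
        spelling_rules = list(seeds) + [r for r in new_rules if r not in seeds]
        if n >= max_n:
            return spelling_rules, context_rules
        n += step  # allow more rules per label and repeat

# Toy data: one organization seed and four (spelling, context) pairs.
toy_pairs = [
    ({"full-string=IBM"}, {"context=company"}),
    ({"full-string=General Electric"}, {"context=company"}),
    ({"full-string=General Electric"}, {"context=employer"}),
    ({"full-string=NYU"}, {"context=employer"}),
]
toy_seeds = [("full-string=IBM", "organization")]
sp_rules, ctx_rules = cotrain(toy_pairs, toy_seeds, max_n=10, min_strength=0.8)
```

On the toy data the label spreads from IBM to the context "company", then to General Electric, then to the context "employer", and finally to NYU. The threshold is lowered to 0.8 only because the toy counts are tiny.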

  13. Example • (IBM, company) • …IBM, the company that makes… • (General Electric, company) • …General Electric, a leading company in the area,… • (General Electric, employer) • … joined General Electric, the biggest employer… • (NYU, employer) • NYU, the employer of the famous Ralph Grishman,…

  14. The Power • Mr. • I.B.M. • The two classifiers both give labels on 49.2% of unlabeled examples. • They agree on 99.25% of them!

  15. Evaluation • 971,746 sentences • 88,962 (spelling, context) pairs. • 1,000 randomly extracted to be the test set. • Location, person, organization, noise: 186, 289, 402, 123 • Took out 38 temporal noise examples, leaving 962 (85 of them noise). • Nc = number of correctly labeled examples • Noise Accuracy: Nc/962 • Clean Accuracy: Nc/(962-85)
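
The arithmetic behind the two accuracy measures, as a small sketch. It assumes that the measure over all 962 remaining examples is the one that counts noise, and that the "clean" measure drops the 85 noise examples left after the 38 temporal ones are removed. The name `accuracies` is made up for illustration.

```python
# Test-set composition from the slide.
counts = {"location": 186, "person": 289, "organization": 402, "noise": 123}
temporal = 38                                  # temporal noise taken out

total = sum(counts.values())                   # 1,000 test examples
evaluated = total - temporal                   # examples that are scored
remaining_noise = counts["noise"] - temporal   # noise still in the scored set

def accuracies(n_correct):
    """Return (noise accuracy, clean accuracy) for n_correct correct labels."""
    noise_acc = n_correct / evaluated
    clean_acc = n_correct / (evaluated - remaining_noise)
    return noise_acc, clean_acc

print(evaluated, remaining_noise)
```

Since a noise example can never be labeled correctly with one of the three categories, the accuracy over all 962 examples is capped at 877/962, about 91.2%.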

  16. Results

  17. QUESTIONS

  18. Thank you! • www.lightrail.com/ • www.cnnfn.com/ • pbskids.org/ • www.szilagyi.us • www.dflt.org
