
Information Extraction for New Event Detection



Presentation Transcript


  1. Information Extraction for New Event Detection Xiaoqiang Luo

  2. Acknowledgments • Martin Franz • Abe Ittycheriah • Scott McCarley • Salim Roukos • Todd Ward

  3. Outline • NED systems • tf-idf baseline • MaxEnt model: tf-idf + ACE annotation • Errors and Observations • Conclusions

  4. tf-idf Baseline • Similarity score: • Confidence: • d0: current doc • d-: previous doc • W: docs in the past window • Decision: New Event if • DET Curve: Varying Threshold Value
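The slide's formulas did not survive in this transcript, so the following is only a minimal sketch of a common tf-idf new-event-detection baseline, not necessarily the one used in the talk. It assumes cosine similarity over tf-idf vectors, a confidence of one minus the best similarity to any document in the past window W, and a "new event" decision when that confidence exceeds a threshold. All function names and the smoothed idf weighting are hypothetical.

```python
import math
from collections import Counter


def tfidf_vector(tokens, doc_freq, n_docs):
    """Map a token list to a sparse tf-idf vector (term -> weight).

    The smoothed idf used here is an assumption; the talk's exact
    weighting is not given in the transcript.
    """
    counts = Counter(tokens)
    vec = {}
    for term, tf in counts.items():
        idf = math.log((n_docs + 1) / (doc_freq.get(term, 0) + 1)) + 1.0
        vec[term] = tf * idf
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def novelty_confidence(d0, window):
    """Assumed confidence: 1 - max similarity of d0 to any doc in the window."""
    if not window:
        return 1.0  # nothing to compare against, so treat as new
    return 1.0 - max(cosine(d0, d_prev) for d_prev in window)


def is_new_event(d0, window, threshold):
    """Declare a new event when the confidence exceeds the threshold."""
    return novelty_confidence(d0, window) > threshold
```

Sweeping `threshold` from 0 to 1 and recording miss and false-alarm rates at each value would trace out the DET curve the slide refers to.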

  5. Baseline Performance Window size: *Newswire stories only

  6. Why Just Words? • The tf-idf score: structure is ignored • Example “structures”: • PERSON, LOCATION, ORGANIZATION etc • Coreference information • Relation between entities

  7. ACE: mention, entity, relation • Diagram labels: AMA, Management Role, Reardon • Example text: The American Medical Association voted yesterday to install Thomas R. Reardon as its president-elect, rejecting a strong, upstart challenge by a District doctor who argued that the nation’s largest physicians’ group needs stronger ethics and new leadership. In electing Thomas R. Reardon, an Oregon general practitioner who had been the chairman of its board, members signified they did not hold him responsible for a costly gaffe last year, when the group agreed to endorse a line of Sunbeam Corp. health care products. Reardon had become chairman of …

  8. ACE Entity and Relation Types
     • Relation types (RelType: Subtype)
       AT: based-In, located, residence
       NEAR: relative-location
       PART: other, part-Of, subsidiary
       ROLE: affiliate-partner, citizen-Of, client, founder, general-staff, management, member, other, owner
       SOCIAL: associate, grandparent, other-personal, other-professional, other-relative, parent, sibling, spouse
     • Entity Type: PERSON, ORGANIZATION, FACILITY, LOCATION, GPE
     • Mention Level: NAME, NOMINAL, PRONOUN

  9. ME Model for NED • Probability of “new”: • MaxEnt Model • Used to rescore a top-N set of candidate documents {d-}
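The probability formula on this slide is missing from the transcript. As a hedged illustration only, a two-outcome MaxEnt model can be sketched as a normalized exponential over weighted binary features, applied to each candidate past document. The label names, the weight layout, and especially the rule for combining candidates (taking the minimum P(new), so that a story counts as new only if it looks new against every candidate) are assumptions, not the talk's stated method.

```python
import math

LABELS = ("new", "old")


def maxent_prob(features, weights):
    """Return P(label | features) for a two-outcome MaxEnt model.

    `features` is a list of active binary feature names and `weights`
    maps (feature, label) pairs to real-valued weights. Both are
    hypothetical structures chosen for this sketch.
    """
    scores = {y: sum(weights.get((f, y), 0.0) for f in features) for y in LABELS}
    top = max(scores.values())  # subtract the max for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {y: math.exp(s - top) / z for y, s in scores.items()}


def rescore(candidates, weights):
    """Rescore a top-N candidate set, one feature list per past document.

    Assumed aggregation: the smallest P(new) over all candidates.
    """
    if not candidates:
        return 1.0  # no candidate matches, so treat the story as new
    return min(maxent_prob(feats, weights)["new"] for feats in candidates)
```

For example, with `weights = {("tfidf_high", "old"): 2.0}`, a candidate that fires `tfidf_high` pulls P(new) below 0.5, and `rescore` reports that lowest value for the current story.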

  10. Counting Common Entities • Two stories compared (Past Story, Current Story) • First excerpt, N1 = 4: In electing Thomas R. Reardon, an Oregon general practitioner who had been the chairman of its board, members signified they did not hold him responsible for a costly gaffe last year, when the group agreed to endorse a line of Sunbeam Corp. health care products. Reardon had become chairman of … • Second excerpt, N2 = 5: The American Medical Association voted yesterday to install Thomas R. Reardon as its president-elect, rejecting a strong, upstart challenge by a District doctor who argued that the nation’s largest physicians’ group needs stronger ethics and new leadership. • Common entities: N = 2 • Ratios: R1 = N/N1, R2 = N/N2, Rc = N/(N1+N2)
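The ratios on this slide can be worked through with the slide's own counts (N1 = 4, N2 = 5, N = 2). The sketch below assumes entities are compared by identifier after coreference resolution; the identifiers `e1` to `e7` and the function name are placeholders, since the transcript does not say which entities were counted.

```python
def entity_overlap_ratios(entities_1, entities_2):
    """Compute N, N1, N2 and the ratios R1, R2, Rc for two entity sets."""
    set_1, set_2 = set(entities_1), set(entities_2)
    n1, n2 = len(set_1), len(set_2)
    n = len(set_1 & set_2)  # entities found in both stories
    return {
        "N": n,
        "N1": n1,
        "N2": n2,
        "R1": n / n1 if n1 else 0.0,
        "R2": n / n2 if n2 else 0.0,
        "Rc": n / (n1 + n2) if (n1 + n2) else 0.0,
    }


# Placeholder entity ids chosen only to reproduce the slide's counts.
story_1 = {"e1", "e2", "e3", "e4"}        # N1 = 4
story_2 = {"e1", "e2", "e5", "e6", "e7"}  # N2 = 5
ratios = entity_overlap_ratios(story_1, story_2)
```

With these counts, R1 = 2/4 = 0.5, R2 = 2/5 = 0.4, and Rc = 2/9, which is about 0.22.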

  11. Features in MaxEnt Model • Example Features: • tfidf: if • R1: if • R1&Rc: if • Relation: similar
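The conditions attached to each feature are missing from the transcript. One common way to feed continuous scores into a MaxEnt model is to bin them into binary indicator features and to form conjunctions of bins, which is roughly what entries such as "R1" and "R1&Rc" suggest. The sketch below assumes that approach; the bin edges, feature-name format, and function names are all hypothetical, and the relation-similarity feature is left out because the transcript gives no detail on it.

```python
def bin_index(value, edges):
    """Return how many bin edges the value meets or exceeds."""
    return sum(value >= edge for edge in edges)


def extract_features(tfidf_sim, r1, rc, edges=(0.25, 0.5, 0.75)):
    """Turn continuous scores into binary indicator feature names.

    The edges are illustrative assumptions, not values from the talk.
    """
    tfidf_bin = bin_index(tfidf_sim, edges)
    r1_bin = bin_index(r1, edges)
    rc_bin = bin_index(rc, edges)
    return [
        f"tfidf_bin={tfidf_bin}",
        f"R1_bin={r1_bin}",
        f"R1_bin={r1_bin}&Rc_bin={rc_bin}",  # conjunction feature
    ]


features = extract_features(tfidf_sim=0.6, r1=0.5, rc=2 / 9)
```

Each returned name is a feature that fires (value 1) for this document pair; every other bin's feature is implicitly 0.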

  12. ME Learning Curve • Training Data: TDT3 • #events: 2963 (180+) • #Features: 294

  13. ME Results Summary * Training on TDT3 Test=TDT3*,TDT2,TDT4

  14. ME Model: Easier to Pick A Good Operating Point • Panels: tfidf system, ME system

  15. Similar on TDT4 • Panels: tfidf system, ME system

  16. ME Model on TDT5 • submission

  17. Analysis: Extra Information from ACE Entity? From nl312::ws2/h/hh

  18. Some Other Findings • Entity not covered by ACE • “Hurricane George” • First Story or First Event? • Feature or metrics computed at story-level • Example follows

  19. Analysis: Want Event, Not Story TDT3 30033: Introduction of euro 1st Story: 19981001_0635_0719_APW_ENG.tkn_RECID=4226.sent.htm (FA) Doc: 19981001_0931_1012_APW_ENG.tkn_RECID=5863.sent.htm

  20. Events Are On the Way …

  21. Conclusions • ACE used in NED • ME model useful for picking a good operating point • Benefit of ACE features should be enhanced by: • Training set with consistent annotation rules! • More entity/relation types and Events • Need for sub-document level analysis: • Document-level features not good for detecting events!

  22. THE END
