Ontobank Coreference
Lance Ramshaw (with Ralph Weischedel), BBN
Ontobank Coreference
• Part of the multi-site Ontobank effort
• Intended to combine with word-sense and propositional-structure annotation to capture lexical semantics
• Target coreference types:
  • Names, nominals, and pronouns
    • Attributes are not linked by coreference (e.g., “the president” in “Bush is the president.”)
    • Generic or underspecified nominals are not linked by coreference, though pronouns may still refer to them
  • Appositives
    • These can then be treated like copulas
  • Temporal expressions
  • Definite references to events
• Intended for broad coverage
• Initial testing is being done on Penn Treebank data
Examples
• Names/Nominals/Pronouns
  • Elco Industries Inc. said it expects net income in the year ending June 30, 1990, to fall below a recent analyst's estimate of $1.65 a share. The Rockford, Ill. maker of fasteners also said it expects to post sales in the current fiscal year that are “slightly above” fiscal 1989 sales of $155 million.
• Appositive constructions: heads and attributes are marked, as in
  • the PhacoFlex intraocular lens <HEAD>, the first foldable silicone lens available for cataract surgery <ATTRIB>
• Events/Verbs
  • Sales of passenger cars grew 22%. The strong growth followed year-to-year increases.
• Temporal expressions
  • John spent three years in jail. In that time …
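To make the appositive treatment concrete, here is a minimal Python sketch of one possible annotation data model. Identity chains hold coreferent names, nominals, and pronouns, while appositive head/attribute pairs are stored separately, so an attribute is never linked by coreference. All names here (`Mention`, `MentionType`, `Document`, `validate`) and the half-open token offsets are hypothetical, chosen for illustration; they are not the format used by the Ontobank annotators.

```python
from dataclasses import dataclass, field
from enum import Enum


class MentionType(Enum):
    # Hypothetical coarse labels matching the target types on the slide.
    NAME = "name"
    NOMINAL = "nominal"
    PRONOUN = "pronoun"


@dataclass(frozen=True)
class Mention:
    """A half-open token span [start, end) with a coarse mention type (assumed encoding)."""
    start: int
    end: int
    mtype: MentionType


@dataclass
class Document:
    tokens: list[str]
    # Identity chains: all mentions in one set refer to the same entity.
    chains: list[set[Mention]] = field(default_factory=list)
    # Appositives as (head, attribute) pairs, kept outside the identity chains.
    appositives: list[tuple[Mention, Mention]] = field(default_factory=list)

    def text(self, m: Mention) -> str:
        return " ".join(self.tokens[m.start:m.end])

    def validate(self) -> list[str]:
        """Return scheme violations; an empty list means the annotation is consistent."""
        errors = []
        chained = set().union(*self.chains)
        for _head, attrib in self.appositives:
            if attrib in chained:
                errors.append(f"attribute linked by coreference: {self.text(attrib)!r}")
        for i, chain in enumerate(self.chains):
            if len(chain) < 2:
                errors.append(f"chain {i} has fewer than two mentions")
        return errors


tokens = ("the PhacoFlex intraocular lens , the first foldable silicone lens "
          "available for cataract surgery").split()
head = Mention(0, 4, MentionType.NOMINAL)
attrib = Mention(5, 14, MentionType.NOMINAL)

doc = Document(tokens, appositives=[(head, attrib)])
print(doc.text(head))   # the PhacoFlex intraocular lens
print(doc.validate())   # []

# Wrongly putting the attribute into an identity chain is flagged.
bad = Document(tokens, chains=[{head, attrib}], appositives=[(head, attrib)])
print(bad.validate())
```

Keeping attributes out of the chains means a scorer that reads only identity chains never counts an appositive as a coreference link, while the copula-like head/attribute relation can still be recovered from `appositives`.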
Results
• Corpus: WSJ
• Mention extents extracted automatically from the Treebank trees
• Annotated so far: ~300K words (4 annotators)
• Annotation speed: ~4,500 words/hour
• Double annotation and adjudication of 100K words: ~60 hours
• Interannotator agreement (MUC coreference measure):
  • Coreference:
    • ~84% between annotators
    • ~90% between an annotator and the adjudicated version
  • Apposition:
    • ~90% between annotators
    • ~94% between an annotator and the adjudicated version
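The slides do not say which rule selects mention extents from the Treebank trees. A plausible minimal rule, assumed here, is to take every node whose base label is NP (after stripping function tags such as -SBJ) and to skip -NONE- empty elements, which contribute no words; pronouns are covered because the Treebank wraps them as (NP (PRP it)). The sketch uses a small hand-written bracket parser so it needs no external library. The functions `parse_tree` and `np_mentions` are hypothetical, and the tree is a simplified, hand-built parse of the start of the Elco example, not the actual Treebank annotation.

```python
import re


def parse_tree(s):
    """Parse a Penn Treebank bracketed string into (label, children) tuples.

    Leaves are plain strings; the unlabeled outer bracket gets label "".
    """
    toks = re.findall(r"\(|\)|[^\s()]+", s)
    pos = 0

    def parse():
        nonlocal pos
        pos += 1  # consume "("
        label = ""
        if toks[pos] not in ("(", ")"):
            label = toks[pos]
            pos += 1
        children = []
        while toks[pos] != ")":
            if toks[pos] == "(":
                children.append(parse())
            else:
                children.append(toks[pos])
                pos += 1
        pos += 1  # consume ")"
        return (label, children)

    return parse()


def np_mentions(tree):
    """Return the surface words and the [start, end) spans of all NP nodes.

    Assumed rule: a node is a mention if its base label (before any -TAG or =n
    suffix) is NP; -NONE- empty elements are skipped and contribute no words.
    """
    words, spans = [], []

    def walk(node):
        label, children = node
        if label == "-NONE-":
            return
        start = len(words)
        for child in children:
            if isinstance(child, str):
                words.append(child)
            else:
                walk(child)
        if re.split(r"[-=]", label)[0] == "NP" and len(words) > start:
            spans.append((start, len(words)))

    walk(tree)
    # Order by start position, outer (longer) spans first.
    spans.sort(key=lambda span: (span[0], -span[1]))
    return words, spans


# Simplified, hand-built parse of the start of the Elco example (not the real Treebank tree).
TREE = ("( (S (NP-SBJ (NNP Elco) (NNPS Industries) (NNP Inc.)) "
        "(VP (VBD said) (SBAR (-NONE- 0) (S (NP-SBJ (PRP it)) "
        "(VP (VBZ expects) (NP (NN net) (NN income)))))) (. .)) )")

words, spans = np_mentions(parse_tree(TREE))
for start, end in spans:
    print(" ".join(words[start:end]))
# Elco Industries Inc.
# it
# net income
```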
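The agreement figures use the MUC coreference measure, the link-based scorer of Vilain et al. (1995). Recall counts, for each key chain K, how many of its links survive when K is split by the response chains: R = Σ(|K| − |p(K)|) / Σ(|K| − 1), where p(K) is the partition of K induced by the response (mentions missing from the response count as singletons). Precision is the same computation with key and response swapped. Below is a minimal Python sketch; treating one annotator as the key and reporting F1 is an assumption about how the percentages above were computed, and the mention ids in the usage example are hypothetical.

```python
def muc_recall(key, response):
    """MUC recall of `response` against `key`.

    Both arguments are lists of chains; each chain is a set of mention ids.
    """
    # Index of the response chain that contains each mention.
    owner = {m: i for i, chain in enumerate(response) for m in chain}
    num = den = 0
    for chain in key:
        # Split the key chain by response chains; mentions absent from the
        # response each form their own singleton part.
        parts = {owner.get(m, ("unmatched", m)) for m in chain}
        num += len(chain) - len(parts)
        den += len(chain) - 1
    return num / den if den else 0.0


def muc_f1(key, response):
    """Harmonic mean of MUC recall and precision (precision = recall with roles swapped)."""
    r = muc_recall(key, response)
    p = muc_recall(response, key)
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical mention ids for the Elco passage: one annotator links all four
# mentions, the other splits them into two chains.
annotator_a = [{"Elco", "it_1", "maker", "it_2"}]
annotator_b = [{"Elco", "it_1"}, {"maker", "it_2"}]

print(round(muc_recall(annotator_a, annotator_b), 3))  # 0.667
print(round(muc_recall(annotator_b, annotator_a), 3))  # 1.0
print(round(muc_f1(annotator_a, annotator_b), 3))      # 0.8
```

F1 is symmetric in its two arguments, so for interannotator agreement it does not matter which annotator is treated as the key.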