Towards a semantic extraction of named entities

Diana Maynard, Kalina Bontcheva, Hamish Cunningham
University of Sheffield, UK

Introduction
• Challenges posed by the progression from traditional IE to a more semantic representation of NEs
• What techniques are best for the deeper level of analysis necessary?
• Can traditional rule-based methods cope with such a transition, or does the future lie solely with machine learning?

The ACE program
“A program to develop technology to extract and characterise meaning from human language”
Aims:
• produce structured information about entities, events and the relations that hold between them
• promote design of more generic systems rather than those tuned to a very specific domain and text type (as with MUC)

The ACE tasks
• Identification of entities and classification into semantic types (Person, Organisation, Location, GPE, Facility)
• Identification and coreference of all mentions of each entity in the text (name, pronominal, nominal)
• Identification of relations holding between such entities

<entity ID="ft-airlines-27-jul-2001-2" GENERIC="FALSE" entity_type="ORGANIZATION">
  <entity_mention ID="M003" TYPE="NAME" string="National Air Traffic Services"> </entity_mention>
  <entity_mention ID="M004" TYPE="NAME" string="NATS"> </entity_mention>
  <entity_mention ID="M005" TYPE="PRO" string="its"> </entity_mention>
  <entity_mention ID="M006" TYPE="NAME" string="Nats"> </entity_mention>
</entity>
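As a minimal sketch of how an annotation like the one above could be read programmatically, the following Python uses the standard-library `xml.etree.ElementTree` parser to group the entity's mentions by mention type. The function name `mentions_by_type` is illustrative only and is not part of any ACE or GATE tooling.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# The ACE-style entity from the slide, with its attribute values unchanged.
ACE_ENTITY = """
<entity ID="ft-airlines-27-jul-2001-2" GENERIC="FALSE" entity_type="ORGANIZATION">
  <entity_mention ID="M003" TYPE="NAME" string="National Air Traffic Services"> </entity_mention>
  <entity_mention ID="M004" TYPE="NAME" string="NATS"> </entity_mention>
  <entity_mention ID="M005" TYPE="PRO" string="its"> </entity_mention>
  <entity_mention ID="M006" TYPE="NAME" string="Nats"> </entity_mention>
</entity>
"""


def mentions_by_type(xml_text):
    """Return (entity_type, {mention TYPE: [mention strings in document order]})."""
    root = ET.fromstring(xml_text.strip())
    grouped = defaultdict(list)
    for mention in root.iter("entity_mention"):
        grouped[mention.get("TYPE")].append(mention.get("string"))
    return root.get("entity_type"), dict(grouped)


entity_type, grouped = mentions_by_type(ACE_ENTITY)
print(entity_type)
print(grouped)
```

All four mentions, including the pronoun "its", belong to one entity, which is what the coreference part of the ACE task requires.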

The MACE System
• Rule-based NE system developed within GATE, adapted from ANNIE
• PRs (processing resources): tokeniser, sentence splitter, POS tagger, gazetteer, semantic tagger, orthomatcher, pronominal and nominal coreferencer
• Also: genre ID, switching controller to select different PRs automatically
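The idea of a switching controller can be sketched in a few lines of Python. This is a toy illustration, not GATE's API: the stage functions, the `PIPELINES` table and the lowercase-text genre heuristic are all assumptions made up for the example.

```python
def tokenise(doc):
    # Stand-in for a real tokeniser: split on whitespace.
    doc["tokens"] = doc["text"].split()
    return doc


def news_tagger(doc):
    # Placeholder for a semantic tagger tuned to written news.
    doc["tagger"] = "news grammar"
    return doc


def broadcast_tagger(doc):
    # Placeholder for a semantic tagger tuned to broadcast transcripts.
    doc["tagger"] = "broadcast grammar"
    return doc


def identify_genre(doc):
    # Assumed heuristic: treat text with no capital letters as a broadcast transcript.
    text = doc["text"]
    doc["genre"] = "broadcast" if text == text.lower() else "news"
    return doc


# Hypothetical mapping from genre to the processing stages to run.
PIPELINES = {
    "news": [tokenise, news_tagger],
    "broadcast": [tokenise, broadcast_tagger],
}


def run(text):
    """Identify the genre, then run only the stages registered for it."""
    doc = identify_genre({"text": text})
    for stage in PIPELINES[doc["genre"]]:
        doc = stage(doc)
    return doc


print(run("NATS said it would cut costs")["tagger"])
print(run("nats said it would cut costs")["tagger"])
```

The point of the design is that genre-specific resources can be swapped without touching the shared stages.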

Differences between ANNIE and MACE
• Locations → Location / GPE
• GPEs have roles (GPE, Per, Org, Loc)
• New type Facility (subsumes some Orgs)
• Metonymy means context is necessary for disambiguation (e.g. England the cricket team vs England the country)
• No Date, Time, Money, Percent, Address, Identifier
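To illustrate why context is needed for GPE roles, here is a minimal Python sketch of a context rule. The cue-word sets and the function `gpe_role` are invented for this example and are far simpler than the JAPE grammars MACE actually uses.

```python
# Hypothetical cue words; a real system would draw on gazetteers and richer context.
ORG_CUES_AFTER = {"team", "squad", "government"}
LOC_CUES_BEFORE = {"in", "across", "throughout"}


def gpe_role(tokens, index, window=2):
    """Guess the role of the GPE name at tokens[index] from nearby words."""
    following = [t.lower() for t in tokens[index + 1 : index + 1 + window]]
    if any(t in ORG_CUES_AFTER for t in following):
        return "Org"  # e.g. "England cricket team"
    if index > 0 and tokens[index - 1].lower() in LOC_CUES_BEFORE:
        return "Loc"  # e.g. "in England"
    return "GPE"  # default: the geo-political entity itself


print(gpe_role("The England cricket team won".split(), 1))
print(gpe_role("It rained in England yesterday".split(), 3))
print(gpe_role("England signed the treaty".split(), 0))
```

The same string "England" receives three different roles depending only on its neighbours, which a gazetteer lookup alone cannot decide.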

What does this mean in practical terms?
• Separation of specific from general information makes adaptation easier
• Reclassification of gazetteers is unnecessary
• Changes are mainly to the semantic grammars, to:
  - use different gazetteer lookups
  - use more contextual information
  - group rules together differently

Semantic Grammars
• ANNIE uses 21 phases, 187 rules, 9 entity types (av. 20.8 rules per entity type)
• MACE uses 15 phases, 180 rules, 5 entity types (av. 36 rules per entity type)
• The important factor is the increased complexity of new rules, rather than the number
• Rules may be hand-crafted, but an experienced JAPE user can write several rules per minute
• 6 weeks for adaptation
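The per-type averages quoted above follow directly from the rule and entity-type counts. A short Python check, using only the figures on the slide (the dictionary layout and the helper `avg_rules` are just for illustration):

```python
# Counts taken from the slide.
SYSTEMS = {
    "ANNIE": {"phases": 21, "rules": 187, "entity_types": 9},
    "MACE": {"phases": 15, "rules": 180, "entity_types": 5},
}


def avg_rules(system):
    """Average number of rules per entity type."""
    return system["rules"] / system["entity_types"]


for name, system in SYSTEMS.items():
    print(f"{name}: {avg_rules(system):.1f} rules per entity type")
# ANNIE: 20.8 rules per entity type
# MACE: 36.0 rules per entity type
```

MACE has slightly fewer rules overall but nearly twice as many per entity type.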

Evaluation (1)

Evaluation (2)
• NEWS – 92 articles (business news)
• ACE – 86 broadcast news texts from the September 2002 evaluation
• Difference on ACE task
• MACE on MUC-style annotations:
  - GPEs are left as GPE (so count as errors)
  - GPEs are mapped to Locations

Comparison of ANNIE vs MACE
72% Precision, 84% Recall if GPEs are mapped to Locations
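The effect of mapping GPEs to Locations before scoring can be shown with a small Python sketch. The gold and predicted annotations below are invented toy data, and exact-match scoring on (start, end, type) triples is an assumption; the slides do not say which scorer was used. The last line derives a balanced F1 from the reported precision and recall; that figure is not stated in the original.

```python
def precision_recall(gold, predicted, type_map=None):
    """Exact-match precision and recall over (start, end, type) triples.

    type_map optionally renames predicted types before comparison,
    e.g. {"GPE": "Location"} when scoring against MUC-style annotations.
    """
    type_map = type_map or {}
    mapped = {(s, e, type_map.get(t, t)) for (s, e, t) in predicted}
    correct = len(gold & mapped)
    precision = correct / len(mapped) if mapped else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


def f1(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Toy data: MUC-style gold has no GPE type, so a predicted GPE can never match as-is.
gold = {(0, 6, "Location"), (10, 15, "Person"), (20, 28, "Organization")}
predicted = {(0, 6, "GPE"), (10, 15, "Person"), (30, 34, "Person")}

strict = precision_recall(gold, predicted)
lenient = precision_recall(gold, predicted, {"GPE": "Location"})
print(strict, lenient)
print(round(f1(0.72, 0.84), 3))
```

In the toy data, leaving GPEs unmapped turns a correctly located entity into both a false positive and a miss, so precision and recall drop together.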

Conclusions
• MACE is a rule-based NE system, in contrast with most systems, which use ML
• Its advantages are that it doesn’t require much training data and is fast to adapt because of its robust design
• If large amounts of training data are available, HMM-based systems tend to perform slightly better
• Rule-based systems tend to be good at recall but are sometimes low on precision unless additionally supported by ML methods
