Named-Entity Recognition with Character-Level Models
Dan Klein, Joseph Smarr, Huy Nguyen, and Christopher D. Manning
Stanford University
CoNLL-2003: Seventh Conference on Natural Language Learning
Unknown Words are a Central Challenge for NER
• Recognizing known named entities (NEs) is relatively simple and accurate
• Recognizing novel NEs requires recognizing context and/or word-internal features
• External context and frequent internal words (e.g. "Inc.") are the most commonly used features
• The internal composition of NEs alone provides surprisingly strong evidence for classification (Smarr & Manning, 2002):
  • Staffordshire
  • Abdul-Karim al-Kabariti
  • CentrInvest
Are Names Self-Describing?
• NO: names can be opaque/ambiguous
  • Word-level: "Washington" occurs as LOC, PER, and ORG
  • Char-level: "-ville" suggests LOC, but there are exceptions like "Neville"
• YES: names can be highly distinctive/descriptive
  • Word-level: "National Bank" is a bank (i.e. ORG)
  • Char-level: "Cotramoxazole" is clearly a drug name
• Question: overall, how informative are names alone?
How Internally Descriptive are Isolated Named Entities?
• Classification accuracy of pre-segmented CoNLL NEs without context is ~90%
• Using character n-grams as features instead of words yields a 25% error reduction
• On single-word unknown NEs, the word model is at chance; the char n-gram model fixes 38% of its errors
[Chart: NE classification accuracy (%); not the CoNLL task]
Exploiting Word-Internal Features
• Many existing systems use some word-internal features (suffix, capitalization, punctuation, etc.)
  • e.g. Mikheev 97, Wacholder et al. 97, Bikel et al. 97
  • Features are usually language-dependent (e.g. morphology)
• Our approach: use char n-grams as the primary representation
• Use all substrings as classification features, e.g. for #Tom#:
  • #Tom#, #Tom, Tom#, #To, Tom, om#, #T, To, om, m#, T, o, m
• Char n-grams subsume word features
• Features are language-independent (assuming the language is alphabetic)
• Similar in spirit to Cucerzan and Yarowsky (99), but uses ALL char n-grams vs. just prefix/suffix
(see the extraction sketch below)
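The slides contain no code; the following minimal Python sketch (an illustration, not the authors' implementation) extracts all substrings of a boundary-marked word as features, reproducing the #Tom# example above. The function name char_ngram_features and the choice to skip the bare "#" marker are assumptions.

```python
def char_ngram_features(word, marker="#"):
    """All contiguous substrings of the boundary-marked word,
    skipping the bare marker itself (an assumption, chosen to match
    the #Tom# example, which lists neither lone '#')."""
    s = marker + word + marker
    feats = set()
    for i in range(len(s)):
        for j in range(i + 1, len(s) + 1):
            if s[i:j] != marker:
                feats.add(s[i:j])
    return feats

# Reproduces the slide's example feature set for "Tom"
assert char_ngram_features("Tom") == {
    "#Tom#", "#Tom", "Tom#", "#To", "Tom", "om#",
    "#T", "To", "om", "m#", "T", "o", "m",
}
```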
Character-Feature Based Classifier
• Model I: independent classification at each word
  • Maxent classifiers, trained using conjugate gradient
  • Equal-scale Gaussian priors for smoothing
  • Trained models with >800K features in ~2 hrs
  • POS tags and contextual features complement the n-grams
(a training sketch follows below)
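As a rough illustration of Model I (the slides give no code), here is a sketch using scikit-learn's DictVectorizer and L2-regularized LogisticRegression as a stand-in for the paper's conjugate-gradient-trained maxent: an L2 penalty plays the role of the equal-scale Gaussian prior. The toy training examples and feature names are invented for illustration.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(word, prev_word, next_word):
    # All substrings of the boundary-marked word, as in the sketch above
    s = "#" + word + "#"
    feats = {"sub=" + s[i:j]: 1.0
             for i in range(len(s))
             for j in range(i + 1, len(s) + 1)
             if s[i:j] != "#"}
    # Contextual features complement the character n-grams
    feats["prev=" + prev_word.lower()] = 1.0
    feats["next=" + next_word.lower()] = 1.0
    return feats

# Toy training set (invented): (word, previous word, next word, label)
train = [
    ("Staffordshire", "in", ",", "LOC"),
    ("Inc.", "Acme", "said", "ORG"),
    ("Jones", "Doug", "said", "PER"),
    ("said", "Jones", "that", "O"),
]
X = [features(w, p, n) for w, p, n, _ in train]
y = [label for *_, label in train]

vec = DictVectorizer()
# C is the inverse regularization strength; the L2 penalty corresponds
# to a Gaussian prior on the feature weights.
clf = LogisticRegression(C=1.0, max_iter=1000)
clf.fit(vec.fit_transform(X), y)

# Classify a previously unseen single-word NE from its internal features
print(clf.predict(vec.transform([features("CentrInvest", "with", ",")])))
```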
Character-Based CMM
• Model II: joint classification along the sequence
• Previous classification decisions are clearly relevant:
  • "Grace Road" is a single location, not a person + location
• Include neighboring classification decisions as features
• Perform joint inference across the chain of classifiers
• Conditional Markov Model (CMM, a.k.a. maximum-entropy Markov model)
  • Borthwick 1999, McCallum et al. 2000
(a decoding sketch follows below)
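The slides do not spell out the inference procedure; below is a hypothetical Python sketch of left-to-right CMM decoding with a small beam, where each position's score conditions on the previous label. local_score is a toy stand-in with invented features, not the trained maxent model.

```python
LABELS = ["O", "PER", "LOC", "ORG"]

def local_score(words, i, prev_label, label):
    """Toy scorer (hypothetical): a real CMM would score char n-gram,
    context, and previous-label features under the trained maxent."""
    w, s = words[i], 0.0
    if w[0].isupper():
        s += 1.0 if label != "O" else 0.0
    else:
        s += 1.0 if label == "O" else 0.0
    if w in {"Road", "Street", "Avenue"}:
        s += 1.0 if label == "LOC" else 0.0
    if w in {"John", "Doug"}:
        s += 1.0 if label == "PER" else 0.0
    if label == prev_label and label != "O":
        s += 0.5  # continuity bonus keeps "Grace Road" a single LOC span
    return s      # unnormalized; fine for argmax decoding

def beam_decode(words, beam_size=5):
    beam = [(0.0, [])]  # (cumulative score, label sequence)
    for i in range(len(words)):
        expanded = [
            (score + local_score(words, i, seq[-1] if seq else "<START>", lab),
             seq + [lab])
            for score, seq in beam
            for lab in LABELS
        ]
        expanded.sort(key=lambda c: c[0], reverse=True)
        beam = expanded[:beam_size]
    return beam[0][1]

print(beam_decode("John visited Grace Road today".split()))
# -> ['PER', 'O', 'LOC', 'LOC', 'O'] under this toy scorer
```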
Character-Based CMM
• Final extra features:
  • Letter-type patterns for each word
    • United → Xx, 12-month → d-x, etc.
  • Conjunction features
    • e.g. previous state and current signature
  • Repeated last words of multi-word names
    • e.g. Jones after having seen Doug Jones
  • … and a few more
(a signature sketch follows below)
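As an illustrative sketch of the letter-type pattern feature (the exact mapping is only implied by the slide's examples), the following assumed implementation maps characters to classes and collapses adjacent repeats, so that "United" becomes "Xx" and "12-month" becomes "d-x".

```python
def signature(word):
    """Letter-type signature (assumed mapping): uppercase -> X,
    lowercase -> x, digit -> d, other characters kept as-is,
    with runs of the same class collapsed to one symbol."""
    out = []
    for ch in word:
        if ch.isdigit():
            c = "d"
        elif ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        else:
            c = ch
        if not out or out[-1] != c:
            out.append(c)
    return "".join(out)

# Matches the slide's examples
assert signature("United") == "Xx"
assert signature("12-month") == "d-x"
```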
Final Results
• The drop from English dev to test is largely due to inconsistent labeling
• The lack of capitalization cues in German hurts recall more, because the maxent classifier is precision-biased when faced with weak evidence
Conclusions
• Character substrings are valuable and underexploited model features
  • Named entities are internally quite descriptive
  • 25-30% error reduction vs. word-level models
• Discriminative maxent models allow productive feature engineering
  • 30% error reduction vs. the basic model
• What distinguishes our approach?
  • More and better features
  • Regularization is crucial for preventing overfitting