This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 479, section 1: Natural Language Processing. Lecture #25: MEMMs II, Named Entity Recognition. Thanks to Dan Klein of UC Berkeley for some of the materials used in this lecture.
Announcements • Reading Report #10 due now • Optional Bayes Nets Exercises • In place of Mid-term exam question #4 • Due: Friday • Reading Report #11: M&S 11.1-11.3.1, 11.3.3 • Due: Monday • Project #3 • Early: Monday • Due: Wednesday
Objectives • Gain further insight into MEMMs • Understand another application: Named Entity Tagging • Reinforce the distinction between joint and conditional models • Understand the label bias problem in local conditional models • Think about other sequence labeling problems
Feature Templates
Your templates do not extract the current tag. The current tag is provided behind the scenes.
• Emission:
• Feature: < w0=future, t0=JJ >
• Feature template: < w0, t0 >
public class CurrentWordExtractor implements
    FeatureTemplate<LocalContext<String, String>> {
  […]
  public String getName() {
    return "CURWORD";
  }
  […]
  public void extract(LocalContext<String, String> inputDatum,
                      Counter<String> outputVector) {
    outputVector.incrementCount(getName() + ":" +
        inputDatum.getCurrentWord());
  }
}
Feature Templates
• Edges or Transitions:
• Feature: < t-1=DT, t0=JJ >
• Feature template: < t-1, t0 >
• Higher order: < t-2, t-1, t0 >
public class PreviousTagExtractor implements
    FeatureTemplate<LocalContext<String, String>> {
  […]
  public String getName() {
    return "PREVIOUS_TAG";
  }
  […]
  public void extract(LocalContext<String, String> inputDatum,
                      Counter<String> outputVector) {
    outputVector.incrementCount(getName() + ":" +
        inputDatum.getPreviousTag());
  }
}
Feature Templates • We discussed others: • Feature: < w0 ends in “ly”, t0 =RB > • Feature template: < 2-letter suffix(w0), t0 > • Also possible: • Mixed feature templates: < t-1, w0, t0 >
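Following the pattern of the extractors above, a mixed template like < t-1, w0 > could be sketched as below. The Counter and LocalContext classes here are minimal stand-ins for the course's versions, and the class/feature names (MixedExtractor, PREVTAG_CURWORD) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-ins for the course's Counter and LocalContext classes
// (hypothetical; the real assignment code supplies richer versions).
class Counter<K> {
    private final Map<K, Double> counts = new HashMap<>();
    public void incrementCount(K key) { counts.merge(key, 1.0, Double::sum); }
    public double getCount(K key) { return counts.getOrDefault(key, 0.0); }
}

class LocalContext<W, T> {
    private final W currentWord;
    private final T previousTag;
    public LocalContext(W currentWord, T previousTag) {
        this.currentWord = currentWord;
        this.previousTag = previousTag;
    }
    public W getCurrentWord() { return currentWord; }
    public T getPreviousTag() { return previousTag; }
}

// A mixed template < t-1, w0 >: fires one feature per (previous tag,
// current word) pair. As with the other templates, the current tag t0
// is supplied behind the scenes.
public class MixedExtractor {
    public String getName() { return "PREVTAG_CURWORD"; }

    public void extract(LocalContext<String, String> inputDatum,
                        Counter<String> outputVector) {
        outputVector.incrementCount(getName() + ":" +
            inputDatum.getPreviousTag() + "_" + inputDatum.getCurrentWord());
    }

    public static void main(String[] args) {
        Counter<String> features = new Counter<>();
        new MixedExtractor().extract(new LocalContext<>("future", "DT"), features);
        System.out.println(features.getCount("PREVTAG_CURWORD:DT_future")); // 1.0
    }
}
```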
MEMMs
• Local Model: a maxent classifier over the current tag, given its local context
• Mathematically: P(t_i | t_{i-1}, w_1…w_n) ∝ exp( λ · f(t_i, t_{i-1}, w_1…w_n, i) ), normalized over the candidate tags t_i
• Example local context (at training time): previous tag t-1 = DT, current word w0 = future
• Example feature vector: { CURWORD:future, PREVIOUS_TAG:DT }
Decoding
• Decoding with MaxEnt taggers:
• Like decoding in HMMs
• Same techniques: Viterbi, beam search, posterior decoding
• Viterbi algorithm (HMMs): δ_i(t) = max_{t'} δ_{i-1}(t') · P(t | t') · P(w_i | t)
• Viterbi algorithm (MEMMs): δ_i(t) = max_{t'} δ_{i-1}(t') · P(t | t', w, i)
• Beam search is effective! Why?
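The MEMM variant of Viterbi can be sketched as below: the only change from the HMM version is that the local score comes from a conditional model P(t | t', w, i) rather than from separate transition and emission tables. The LocalModel interface and the toy model in main are assumptions for illustration, not the course's API.

```java
import java.util.LinkedList;
import java.util.List;

// Viterbi decoding for an MEMM-style tagger (a sketch). In a real system
// the local model would be a trained maxent classifier; here it is any
// function returning log P(tag | prevTag, position).
public class MemmViterbi {

    interface LocalModel {
        double logProb(int position, String prevTag, String tag);
    }

    static List<String> decode(int length, List<String> tags, LocalModel model) {
        double[][] score = new double[length][tags.size()];
        int[][] back = new int[length][tags.size()];
        // start: previous tag is a boundary symbol
        for (int j = 0; j < tags.size(); j++)
            score[0][j] = model.logProb(0, "<S>", tags.get(j));
        // recurrence: best previous tag for each (position, tag) pair
        for (int i = 1; i < length; i++) {
            for (int j = 0; j < tags.size(); j++) {
                score[i][j] = Double.NEGATIVE_INFINITY;
                for (int k = 0; k < tags.size(); k++) {
                    double s = score[i - 1][k]
                             + model.logProb(i, tags.get(k), tags.get(j));
                    if (s > score[i][j]) { score[i][j] = s; back[i][j] = k; }
                }
            }
        }
        // follow back-pointers from the best final state
        int best = 0;
        for (int j = 1; j < tags.size(); j++)
            if (score[length - 1][j] > score[length - 1][best]) best = j;
        LinkedList<String> out = new LinkedList<>();
        for (int i = length - 1; i >= 0; i--) {
            out.addFirst(tags.get(best));
            best = back[i][best];
        }
        return out;
    }

    public static void main(String[] args) {
        // toy local model that simply prefers the tag sequence A, B, A
        List<String> desired = List.of("A", "B", "A");
        LocalModel toy = (i, prev, tag) ->
            tag.equals(desired.get(i)) ? 0.0 : -1.0;
        System.out.println(decode(3, List.of("A", "B"), toy)); // [A, B, A]
    }
}
```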
New Problem • Named Entity Recognition
Named Entity Recognition
• Which features are dynamic?
• [Figure: feature weights at a decision point, with the local context "State for Grace"]
Other Applications • Word breaking • Sentence breaking • Phrasal “chunking” / shallow parsing • Information extraction • Dialog act mark-up • …
Label Bias Problem
• An issue with local conditional models worth knowing about: "label bias"
• A MaxEnt tagger's local scores can be near one even without strong evidence from both the "transitions" and the "emissions"
• This means that evidence often doesn't flow properly through the MEMM (as a sequence model)
Label Bias Problem
• Search space: a state with few outgoing transitions passes nearly all of its probability mass forward, regardless of the observation
• Think: probabilistic garden path
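A tiny numeric sketch of the effect (toy scores, not from any real tagger): with local per-state normalization, a state with a single outgoing transition assigns it probability 1.0 no matter how badly its score matches the observation, so the observation cannot penalize that path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrates label bias numerically: local (per-state) normalization
// forces the outgoing probabilities of every state to sum to 1, even
// when all of that state's options score poorly.
public class LabelBias {

    // softmax over the scores of the allowed next tags
    static Map<String, Double> localProbs(Map<String, Double> scores) {
        double z = 0.0;
        for (double s : scores.values()) z += Math.exp(s);
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet())
            probs.put(e.getKey(), Math.exp(e.getValue()) / z);
        return probs;
    }

    public static void main(String[] args) {
        // state with two competing next tags: the scores matter
        Map<String, Double> twoWay = localProbs(Map.of("A", 2.0, "B", -1.0));
        System.out.println(twoWay.get("A")); // ~0.95

        // state with one outgoing option: probability is exactly 1.0,
        // even though the score (-5.0, terrible observation fit) is low
        Map<String, Double> oneWay = localProbs(Map.of("C", -5.0));
        System.out.println(oneWay.get("C")); // 1.0
    }
}
```

Globally normalized models (next slide) avoid this because the normalizer is computed over whole sequences, not per state.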
CRF Taggers • Newer, higher-powered discriminatively trained sequence models: • Conditional Random Fields (CRFs): normalize globally! • Voted perceptrons • Maximum Margin Markov Networks (M3Ns) • Do not decompose training into independent local regions • Con: Can be deathly slow to train – require repeated inference on training set • Pro: Avoid label bias • Differences tend not to be too important for POS tagging • Why?
CRFs are Slow to Train! • POS: 45 labels, 1,000,000 words = over a week • NER: 11 labels, 200,000 words = 2 hours • Using Mallet in the NLP Lab (2008) on new hardware
Reminder: Conditional vs. Joint
• Joint model, conditional query: argmax_t P(t | w) = argmax_t P(t, w) / P(w) = argmax_t P(t, w)
• In practice, we restrict the search over possible tags for w
• Conditional model (with discriminative training): model P(t | w) directly
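The point about conditional queries on a joint model can be checked with a toy table (the numbers below are made up for illustration): dividing by P(w) rescales every entry by the same constant, so the argmax tag is unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Joint model, conditional query: since P(w) is constant for a fixed w,
// argmax_t P(t|w) = argmax_t P(t,w)/P(w) = argmax_t P(t,w).
public class JointQuery {

    static String argmaxTag(Map<String, Double> probsForWord) {
        return probsForWord.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .get().getKey();
    }

    public static void main(String[] args) {
        // toy joint probabilities P(t, w = "future") for a few tags
        Map<String, Double> joint = Map.of("NN", 0.010, "JJ", 0.015, "RB", 0.001);
        System.out.println(argmaxTag(joint)); // JJ

        // normalizing by P(w) = sum over tags gives P(t | w);
        // the argmax is unchanged
        double pw = joint.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Double> cond = new HashMap<>();
        joint.forEach((t, p) -> cond.put(t, p / pw));
        System.out.println(argmaxTag(cond)); // JJ
    }
}
```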
Joint vs. Conditional Pairs
• Joint model ↔ conditional counterpart (e.g., Naive Bayes ↔ Maximum Entropy Classifier; HMM ↔ MEMM)
• Will cover general "factor graphs" next fall in CS 401R
Next Time • Parsing!