This work is licensed under a Creative Commons Attribution-Share Alike 3.0 Unported License. CS 479, section 1: Natural Language Processing. Lecture #25: MEMMs II, Named Entity Recognition. Thanks to Dan Klein of UC Berkeley for some of the materials used in this lecture.
Announcements • Reading Report #10 due now • Optional Bayes Nets Exercises • In place of Mid-term exam question #4 • Due: Friday • Reading Report #11: M&S 11.1-11.3.1, 11.3.3 • Due: Monday • Project #3 • Early: Monday • Due: Wednesday
Objectives • Gain further insight into MEMMs • Understand another application: Named Entity Tagging • Reinforce the distinction between joint and conditional models • Understand the label bias problem in local conditional models • Think about other sequence labeling problems
Feature Templates
Your templates do not extract the current tag. The current tag is provided behind the scenes.
• Emission:
• Feature: < w0=future, t0=JJ >
• Feature template: < w0, t0 >
public class CurrentWordExtractor implements
    FeatureTemplate<LocalContext<String, String>> {
  […]
  public String getName() {
    return "CURWORD";
  }
  […]
  public void extract(LocalContext<String, String> inputDatum,
                      Counter<String> outputVector) {
    outputVector.incrementCount(getName() + ":" +
        inputDatum.getCurrentWord());
  }
}
Feature Templates
• Edges or Transitions:
• Feature: < t-1=DT, t0=JJ >
• Feature template: < t-1, t0 >
• Higher order: < t-2, t-1, t0 >
public class PreviousTagExtractor implements
    FeatureTemplate<LocalContext<String, String>> {
  […]
  public String getName() {
    return "PREVIOUS_TAG";
  }
  […]
  public void extract(LocalContext<String, String> inputDatum,
                      Counter<String> outputVector) {
    outputVector.incrementCount(getName() + ":" +
        inputDatum.getPreviousTag());
  }
}
Feature Templates • We discussed others: • Feature: < w0 ends in “ly”, t0 =RB > • Feature template: < 2-letter suffix(w0), t0 > • Also possible: • Mixed feature templates: < t-1, w0, t0 >
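Following the pattern of the extractors above, a mixed template like < t-1, w0 > could be sketched as below. The Counter and LocalContext classes here are minimal stand-ins for the course's versions, and the class/feature names (MixedExtractor, PREVTAG_CURWORD) are made up for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Minimal stand-ins for the course's Counter and LocalContext classes
// (hypothetical; the real assignment code supplies richer versions).
class Counter<K> {
    private final Map<K, Double> counts = new HashMap<>();
    public void incrementCount(K key) { counts.merge(key, 1.0, Double::sum); }
    public double getCount(K key) { return counts.getOrDefault(key, 0.0); }
}

class LocalContext<W, T> {
    private final W currentWord;
    private final T previousTag;
    public LocalContext(W currentWord, T previousTag) {
        this.currentWord = currentWord;
        this.previousTag = previousTag;
    }
    public W getCurrentWord() { return currentWord; }
    public T getPreviousTag() { return previousTag; }
}

// A mixed template < t-1, w0 >: fires one feature per (previous tag,
// current word) pair. As with the other templates, the current tag t0
// is supplied behind the scenes.
public class MixedExtractor {
    public String getName() { return "PREVTAG_CURWORD"; }

    public void extract(LocalContext<String, String> inputDatum,
                        Counter<String> outputVector) {
        outputVector.incrementCount(getName() + ":" +
            inputDatum.getPreviousTag() + "_" + inputDatum.getCurrentWord());
    }

    public static void main(String[] args) {
        Counter<String> features = new Counter<>();
        new MixedExtractor().extract(new LocalContext<>("future", "DT"), features);
        System.out.println(features.getCount("PREVTAG_CURWORD:DT_future")); // 1.0
    }
}
```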
MEMMs
• Local Model: a maxent classifier over the current tag, given its local context
• Mathematically: P(t_i | t_{i-1}, w_1…w_n) ∝ exp( λ · f(t_i, t_{i-1}, w_1…w_n, i) ), normalized over the candidate tags t_i
• Example local context (at training time): previous tag t-1 = DT, current word w0 = future
• Example feature vector: { CURWORD:future, PREVIOUS_TAG:DT }
Decoding
• Decoding with MaxEnt taggers:
• Like decoding in HMMs
• Same techniques: Viterbi, beam search, posterior decoding
• Viterbi algorithm (HMMs): δ_i(t) = max_{t'} δ_{i-1}(t') · P(t | t') · P(w_i | t)
• Viterbi algorithm (MEMMs): δ_i(t) = max_{t'} δ_{i-1}(t') · P(t | t', w, i)
• Beam search is effective! Why?
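The MEMM variant of Viterbi can be sketched as below: the only change from the HMM version is that the local score comes from a conditional model P(t | t', w, i) rather than from separate transition and emission tables. The LocalModel interface and the toy model in main are assumptions for illustration, not the course's API.

```java
import java.util.LinkedList;
import java.util.List;

// Viterbi decoding for an MEMM-style tagger (a sketch). In a real system
// the local model would be a trained maxent classifier; here it is any
// function returning log P(tag | prevTag, position).
public class MemmViterbi {

    interface LocalModel {
        double logProb(int position, String prevTag, String tag);
    }

    static List<String> decode(int length, List<String> tags, LocalModel model) {
        double[][] score = new double[length][tags.size()];
        int[][] back = new int[length][tags.size()];
        // start: previous tag is a boundary symbol
        for (int j = 0; j < tags.size(); j++)
            score[0][j] = model.logProb(0, "<S>", tags.get(j));
        // recurrence: best previous tag for each (position, tag) pair
        for (int i = 1; i < length; i++) {
            for (int j = 0; j < tags.size(); j++) {
                score[i][j] = Double.NEGATIVE_INFINITY;
                for (int k = 0; k < tags.size(); k++) {
                    double s = score[i - 1][k]
                             + model.logProb(i, tags.get(k), tags.get(j));
                    if (s > score[i][j]) { score[i][j] = s; back[i][j] = k; }
                }
            }
        }
        // follow back-pointers from the best final state
        int best = 0;
        for (int j = 1; j < tags.size(); j++)
            if (score[length - 1][j] > score[length - 1][best]) best = j;
        LinkedList<String> out = new LinkedList<>();
        for (int i = length - 1; i >= 0; i--) {
            out.addFirst(tags.get(best));
            best = back[i][best];
        }
        return out;
    }

    public static void main(String[] args) {
        // toy local model that simply prefers the tag sequence A, B, A
        List<String> desired = List.of("A", "B", "A");
        LocalModel toy = (i, prev, tag) ->
            tag.equals(desired.get(i)) ? 0.0 : -1.0;
        System.out.println(decode(3, List.of("A", "B"), toy)); // [A, B, A]
    }
}
```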
New Problem • Named Entity Recognition
Named Entity Recognition
• Which features are dynamic?
• [Figure: feature weights at a decision point, with the local context "State for Grace"]
Other Applications • Word breaking • Sentence breaking • Phrasal “chunking” / shallow parsing • Information extraction • Dialog act mark-up • …
Label Bias Problem
• An issue with local conditional models worth knowing about: "label bias"
• A MaxEnt tagger's local scores can be near one even without strong evidence from both the "transitions" and the "emissions"
• This means that evidence often doesn't flow properly through the MEMM (as a sequence model)
Label Bias Problem
• Search space: a state with few outgoing transitions passes nearly all of its probability mass forward, regardless of the observation
• Think: probabilistic garden path
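A tiny numeric sketch of the effect (toy scores, not from any real tagger): with local per-state normalization, a state with a single outgoing transition assigns it probability 1.0 no matter how badly its score matches the observation, so the observation cannot penalize that path.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Illustrates label bias numerically: local (per-state) normalization
// forces the outgoing probabilities of every state to sum to 1, even
// when all of that state's options score poorly.
public class LabelBias {

    // softmax over the scores of the allowed next tags
    static Map<String, Double> localProbs(Map<String, Double> scores) {
        double z = 0.0;
        for (double s : scores.values()) z += Math.exp(s);
        Map<String, Double> probs = new LinkedHashMap<>();
        for (Map.Entry<String, Double> e : scores.entrySet())
            probs.put(e.getKey(), Math.exp(e.getValue()) / z);
        return probs;
    }

    public static void main(String[] args) {
        // state with two competing next tags: the scores matter
        Map<String, Double> twoWay = localProbs(Map.of("A", 2.0, "B", -1.0));
        System.out.println(twoWay.get("A")); // ~0.95

        // state with one outgoing option: probability is exactly 1.0,
        // even though the score (-5.0, terrible observation fit) is low
        Map<String, Double> oneWay = localProbs(Map.of("C", -5.0));
        System.out.println(oneWay.get("C")); // 1.0
    }
}
```

Globally normalized models (next slide) avoid this because the normalizer is computed over whole sequences, not per state.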
CRF Taggers • Newer, higher-powered discriminatively trained sequence models: • Conditional Random Fields (CRFs): normalize globally! • Voted perceptrons • Maximum Margin Markov Networks (M3Ns) • Do not decompose training into independent local regions • Con: Can be deathly slow to train – require repeated inference on training set • Pro: Avoid label bias • Differences tend not to be too important for POS tagging • Why?
CRFs are Slow to Train! • POS: 45 labels, 1,000,000 words = over a week • NER: 11 labels, 200,000 words = 2 hours • Using Mallet in the NLP Lab (2008) on new hardware
Reminder: Conditional vs. Joint
• Joint model, conditional query: argmax_t P(t | w) = argmax_t P(t, w) / P(w) = argmax_t P(t, w)
• In practice, we restrict the search over possible tags for w
• Conditional model (with discriminative training): model P(t | w) directly
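The point about conditional queries on a joint model can be checked with a toy table (the numbers below are made up for illustration): dividing by P(w) rescales every entry by the same constant, so the argmax tag is unchanged.

```java
import java.util.HashMap;
import java.util.Map;

// Joint model, conditional query: since P(w) is constant for a fixed w,
// argmax_t P(t|w) = argmax_t P(t,w)/P(w) = argmax_t P(t,w).
public class JointQuery {

    static String argmaxTag(Map<String, Double> probsForWord) {
        return probsForWord.entrySet().stream()
            .max(Map.Entry.comparingByValue())
            .get().getKey();
    }

    public static void main(String[] args) {
        // toy joint probabilities P(t, w = "future") for a few tags
        Map<String, Double> joint = Map.of("NN", 0.010, "JJ", 0.015, "RB", 0.001);
        System.out.println(argmaxTag(joint)); // JJ

        // normalizing by P(w) = sum over tags gives P(t | w);
        // the argmax is unchanged
        double pw = joint.values().stream().mapToDouble(Double::doubleValue).sum();
        Map<String, Double> cond = new HashMap<>();
        joint.forEach((t, p) -> cond.put(t, p / pw));
        System.out.println(argmaxTag(cond)); // JJ
    }
}
```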
Joint vs. Conditional Pairs
• Joint model ↔ conditional counterpart (e.g., Naive Bayes ↔ Maximum Entropy Classifier; HMM ↔ MEMM)
• Will cover general "factor graphs" next fall in CS 401R
Next Time • Parsing!