
SIMS 290-2: Applied Natural Language Processing

Presentation Transcript


  1. SIMS 290-2: Applied Natural Language Processing Marti Hearst Sept 20, 2004

  2. Today • Handout: basic English grammar • Determine time for a one-time lab • Begin chunking/shallow parsing

  3. Shallow (Chunk) Parsing Goal: divide a sentence into a sequence of chunks. • Chunks are non-overlapping regions of a text: [I] saw [a tall man] in [the park]. • Chunks are non-recursive • A chunk cannot contain other chunks • Chunks are non-exhaustive • Not all words are included in chunks Slide modified from Steven Bird's

  4. Chunk Parsing Examples • Noun-phrase chunking: [I] saw [a tall man] in [the park]. • Verb-phrase chunking: The man who [was in the park] [saw me]. • Prosodic chunking: [I saw] [a tall man] [in the park]. • Question answering: • What [Spanish explorer] discovered [the Mississippi River]? Slide modified from Steven Bird's

  5. Shallow Parsing: Motivation • Locating information • e.g., text retrieval • Index a document collection on its noun phrases • Ignoring information • Generalize in order to study higher-level patterns • e.g., phrases involving “gave” in the Penn Treebank: • gave NP; gave up NP in NP; gave NP up; gave NP help; gave NP to NP • Sometimes a full parse has too much structure • Too nested • Chunks usually are not recursive Slide modified from Steven Bird's

  6. Representation • IOB (or BIO) tags • Trees Slide modified from Steven Bird's
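
The IOB and tree representations are interconvertible. A minimal sketch using helpers from the current NLTK's nltk.chunk.util module (the slide predates this API):

    from nltk.chunk.util import conlltags2tree, tree2conlltags

    # The example sentence as IOB tags: one (word, POS, IOB-label) triple per
    # token, where B-NP begins an NP chunk, I-NP continues it, and O is outside.
    iob = [("the", "DT", "B-NP"), ("little", "JJ", "I-NP"), ("cat", "NN", "I-NP"),
           ("sat", "VBD", "O"), ("on", "IN", "O"),
           ("the", "DT", "B-NP"), ("mat", "NN", "I-NP")]

    # The same chunking as a shallow tree: NP subtrees under a flat S node.
    tree = conlltags2tree(iob)
    print(tree)
    assert tree2conlltags(tree) == iob  # round-trips back to the IOB triples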

  7. Comparison with Full Syntactic Parsing • Parsing is usually an intermediate stage • Builds structures that are used by later stages of processing • Full parsing is a sufficient but not necessary intermediate stage for many NLP tasks • Parsing often provides more information than we need • Shallow parsing is an easier problem • Less word-order flexibility within chunks than between chunks • More locality: • Fewer long-range dependencies • Less context-dependence • Less ambiguity Slide modified from Steven Bird's

  8. Chunks and Constituency Constituents: [[a tall man] [in [the park]]]. Chunks: [a tall man] in [the park]. • A constituent is part of some higher unit in the hierarchical syntactic parse • Chunks are not constituents • Constituents are recursive • But, chunks are typically subsequences of constituents • Chunks do not cross major constituent boundaries Slide modified from Steven Bird's

  9. Chunk Parsing in NLTK • Chunk parsers usually ignore lexical content • Only need to look at part-of-speech tags • Possible steps in chunk parsing • Chunking, unchunking • Chinking • Merging, splitting • Evaluation • Compare to a Baseline • Evaluate in terms of • Precision, Recall, F-Measure • Missed (False Negative), Incorrect (False Positive) Slide modified from Steven Bird's
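
The evaluation measures on this slide can be computed directly from chunk spans. A minimal sketch with hypothetical toy data (gold vs. chunker output), not tied to any particular NLTK scoring class:

    # Chunks as (start, end, label) spans over token positions (toy data).
    gold      = {(0, 3, "NP"), (5, 7, "NP")}   # correct chunks
    predicted = {(0, 3, "NP"), (4, 7, "NP")}   # chunker output

    true_pos  = len(gold & predicted)          # correctly found chunks
    missed    = len(gold - predicted)          # false negatives
    incorrect = len(predicted - gold)          # false positives

    precision = true_pos / len(predicted)      # fraction of guesses that are right
    recall    = true_pos / len(gold)           # fraction of gold chunks found
    f_measure = 2 * precision * recall / (precision + recall)
    print(precision, recall, f_measure)        # 0.5 0.5 0.5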

  10. Chunking • Define a regular expression that matches the sequences of tags in a chunk • A simple noun phrase chunk regexp (note that <NN.*> matches any tag starting with NN): <DT>? <JJ>* <NN.*> • Chunk all matching subsequences: the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN becomes [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN] • If matching subsequences overlap, the first one gets priority Slide modified from Steven Bird's
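
The same NP rule can be run with the current NLTK, where nltk.RegexpParser takes the tag pattern as a grammar string; a minimal sketch (the 2004 ChunkRule API on these slides differs):

    import nltk

    # The NP tag pattern from the slide as a RegexpParser grammar.
    grammar = r"NP: {<DT>?<JJ>*<NN.*>}"   # <NN.*> matches NN, NNS, NNP, NNPS
    chunker = nltk.RegexpParser(grammar)

    sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
                ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]

    # Prints a flat S tree with two NP chunks:
    # [the/DT little/JJ cat/NN] and [the/DT mat/NN]
    print(chunker.parse(sentence))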

  11. Unchunking • Remove any chunk with a given pattern • e.g., UnChunkRule('<NN|DT>+', 'Unchunk NN/DT-only chunks') • Combine with the chunk rule <NN|DT|JJ>+ • Chunk all matching subsequences: • Input: the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN • Apply chunk rule: [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN] • Apply unchunk rule: [the/DT little/JJ cat/NN] sat/VBD on/IN the/DT mat/NN
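
A sketch of the chunk-then-unchunk sequence above, using the ChunkRule and UnChunkRule classes in the current nltk.chunk.regexp module; the class names match the slide, but constructor details such as chunk_label are assumptions about the current NLTK rather than the 2004 version shown here:

    from nltk.chunk.regexp import RegexpChunkParser, ChunkRule, UnChunkRule

    rules = [
        ChunkRule("<NN|DT|JJ>+", "Chunk sequences of DT, JJ and NN"),
        UnChunkRule("<NN|DT>+", "Unchunk chunks made only of DT/NN"),
    ]
    parser = RegexpChunkParser(rules, chunk_label="NP")

    sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
                ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]

    # [the little cat] survives (it contains a JJ); [the mat] is unchunked.
    print(parser.parse(sentence))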

  12. Chinking • A chink is a subsequence of the text that is not a chunk. • Define a regular expression that matches the sequences of tags in a chink • A simple chink regexp for finding NP chunks: (<VB.?>|<IN>)+ • First apply a chunk rule to chunk everything • Input: the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN • ChunkRule('<.*>+', 'Chunk everything') gives [the/DT little/JJ cat/NN sat/VBD on/IN the/DT mat/NN] • Apply the chink rule above: [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN] (chunk, chink, chunk) Slide modified from Steven Bird's
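
In the current NLTK the same two-step recipe can be written as a RegexpParser grammar, where {...} chunks and }...{ chinks; a minimal sketch, with the chink pattern adapted from the slide's regexp:

    import nltk

    # Chunk everything, then chink out verbs and prepositions.
    grammar = r"""
    NP:
      {<.*>+}          # Chunk everything
      }<VB.*|IN>+{     # Chink: remove verbs and prepositions from chunks
    """
    chunker = nltk.RegexpParser(grammar)

    sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
                ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]

    # Result: [the/DT little/JJ cat/NN] sat/VBD on/IN [the/DT mat/NN]
    print(chunker.parse(sentence))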

  13. Merging • Combine adjacent chunks into a single chunk • Define a regular expression that matches the sequences of tags on both sides of the point to be merged • Example: merge a chunk ending in JJ with a chunk starting with NN: MergeRule('<JJ>', '<NN>', 'Merge adjs and nouns') turns [the/DT little/JJ] [cat/NN] sat/VBD on/IN the/DT mat/NN into [the/DT little/JJ cat/NN] sat/VBD on/IN the/DT mat/NN • Splitting is the opposite of merging Slide modified from Steven Bird's
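
A sketch of MergeRule in the current nltk.chunk.regexp module. The two chunk rules are hypothetical helpers that first build roughly the [the little] [cat] state from the slide (plus a [mat] chunk) so the merge has something to act on; constructor details are assumptions:

    from nltk.chunk.regexp import RegexpChunkParser, ChunkRule, MergeRule

    rules = [
        ChunkRule("<DT>?<JJ>+", "Chunk determiner + adjectives"),  # -> [the little]
        ChunkRule("<NN.*>+", "Chunk noun sequences"),              # -> [cat], [mat]
        MergeRule("<JJ>", "<NN>", "Merge a chunk ending in JJ with one starting in NN"),
    ]
    parser = RegexpChunkParser(rules, chunk_label="NP")

    sentence = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
                ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]

    # Before the merge: [the little] [cat] ... [mat]
    # After the merge:  [the little cat] ... [mat]
    print(parser.parse(sentence))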

  14. Tokens and Labels in NLTK • Tokens are at many levels of description • Document • Sentence • Word • Can have multiple representations at the same level • A sentence can be marked up with TREE and WORDS simultaneously • A word can have both TEXT and POS (or TAG)
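
The TEXT, POS/TAG, and TREE properties above belong to the 2004 NLTK token architecture; in the current NLTK the same layers are ordinary Python objects: strings for words, (word, tag) tuples after tagging, and Tree objects after chunking. A rough modern counterpart (assumes the punkt and default POS tagger models have been downloaded):

    import nltk  # assumes nltk.download('punkt') and the default POS tagger model

    sentence = "I saw a tall man in the park."

    words  = nltk.word_tokenize(sentence)   # word-level tokens (TEXT)
    tagged = nltk.pos_tag(words)            # (word, tag) pairs (TEXT + POS/TAG)

    # A chunker adds the TREE level on top of the tagged words.
    tree = nltk.RegexpParser(r"NP: {<DT>?<JJ>*<NN.*>+}").parse(tagged)
    print(tree)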

  15. Applying Chunking to Treebank Data
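
As a hedged sketch of applying chunking to treebank-style data with the current NLTK: the CoNLL-2000 corpus provides gold NP-chunked sentences that the regexp chunker can be compared against (requires nltk.download('conll2000')):

    import nltk
    from nltk.corpus import conll2000
    from nltk.chunk.util import tree2conlltags

    # Gold-standard NP-chunked sentences (shallow trees) from the corpus.
    gold = conll2000.chunked_sents("test.txt", chunk_types=["NP"])
    print(gold[0])

    # Run the simple NP regexp chunker over the same tagged words and compare.
    chunker = nltk.RegexpParser(r"NP: {<DT>?<JJ>*<NN.*>+}")
    tagged  = [(word, tag) for word, tag, _ in tree2conlltags(gold[0])]
    print(chunker.parse(tagged))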

  16. Usually you can resolve this kind of problem by checking the API documentation: http://nltk.sourceforge.net/api-1.4/index.html It is not all that helpful in this case; the tutorial has the answer.

  17. Cascaded Chunking Slide modified from Steven Bird's
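
A sketch of a cascaded chunker with the current nltk.RegexpParser, in the style of the NLTK book rather than this 2004 slide: a multi-stage grammar in which later stages can use chunks built by earlier ones, and loop=2 runs the cascade twice so CLAUSE chunks can feed back into VP chunks:

    import nltk

    grammar = r"""
      NP:     {<DT|JJ|NN.*>+}          # noun phrases
      PP:     {<IN><NP>}               # prepositional phrases
      VP:     {<VB.*><NP|PP|CLAUSE>+$} # verb phrases
      CLAUSE: {<NP><VP>}               # clauses built from the stages above
    """
    chunker = nltk.RegexpParser(grammar, loop=2)

    sentence = [("Mary", "NN"), ("saw", "VBD"), ("the", "DT"), ("cat", "NN"),
                ("sit", "VB"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
    print(chunker.parse(sentence))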

  18. Next Time and Upcoming • Finish Shallow Parsing • Evaluating Shallow Parsing Results • More examples of chunk/chink/unchunk rules • Revisit topics from previous week • Shallow Parsing Assignment • Sent out Tues or Wed • Due on Wed Sept 29 • Next week: • Read paper on end-of-sentence disambiguation • Presley and Barbara lecturing on categorization
