
Chunking: Shallow Parsing

Presentation Transcript


  1. School of Computing, FACULTY OF ENGINEERING. Chunking: Shallow Parsing. Eric Atwell, Language Research Group

  2. Shallow Parsing • Break text up into non-overlapping contiguous subsets of tokens. • Also called chunking, partial parsing, light parsing. • What is it useful for? – semantic patterns • Finding key “meaning-elements”: Named Entity Recognition • people, locations, organizations • Studying linguistic patterns, e.g. semantic patterns of verbs • gave NP • gave up NP in NP • gave NP NP • gave NP to NP • Can ignore complex structure when not relevant

  3. A Relationship between Segmenting and Labeling • Tokenization segments the text • Tagging labels the text • Shallow parsing does both simultaneously.
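As an illustration of that relationship, here is a minimal sketch using current NLTK functions (word_tokenize, pos_tag, RegexpParser); the sentence and the NP pattern are illustrative assumptions, and the exact chunks depend on the tagger's output:

    import nltk
    # may need nltk.download('punkt') and nltk.download('averaged_perceptron_tagger') first

    sentence = "The little dog barked at the cat"
    tokens = nltk.word_tokenize(sentence)      # segmenting: text -> tokens
    tagged = nltk.pos_tag(tokens)              # labeling: tokens -> (token, POS) pairs
    chunker = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN>}")
    print(chunker.parse(tagged))               # chunking: segments AND labels phrases in one pass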

  4. Chunking vs. Full Syntactic Parsing • “G.K. Chesterton, author of The Man Who Was Thursday”

  5. Representations for Chunks • IOB tags: I = inside a chunk, O = outside any chunk, B = begins a chunk • In English, the start of a phrase is often marked by a function word

  6. Representations for Chunks • Trees • Chunk structure is a two-level tree that spans the entire text, containing both chunks and non-chunks
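A small sketch of how the two representations (IOB tags from slide 5, the two-level tree here) relate in current NLTK; the IOB-formatted sentence is an invented example:

    import nltk

    # one word per line: word, POS tag, IOB chunk tag
    iob_text = """
    he PRP B-NP
    saw VBD O
    the DT B-NP
    big JJ I-NP
    dog NN I-NP
    """
    tree = nltk.chunk.conllstr2tree(iob_text, chunk_types=['NP'])
    print(tree)                               # two-level tree rooted at S, with NP chunks
    print(nltk.chunk.tree2conlltags(tree))    # back to (word, POS, IOB) triples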

  7. CoNLL Corpus: training data for Machine Learning of chunking • From the CoNLL-2000 shared task (Conference on Computational Natural Language Learning, 2000) • Goal: create machine learning methods to improve on the chunking task

  8. CoNLL Corpus • Data in IOB format from the WSJ (Wall Street Journal): • one line per word: Word POS-tag IOB-tag • Training set: 8936 sentences • Test set: 2012 sentences • POS tags from the Brill tagger, using the Penn Treebank tagset • Evaluation measure: F-score = 2*precision*recall / (precision+recall) • Baseline: select the chunk tag most frequently associated with each POS tag, F = 77.07 • Best score in the contest was F = 94.13
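Current NLTK ships the CoNLL-2000 chunking data, so the corpus and the F-score formula from the slide can be explored directly; this is only a sketch of the loading step, and the precision/recall values passed to f_score below are made up for illustration:

    import nltk
    from nltk.corpus import conll2000   # may need nltk.download('conll2000') first

    train_sents = conll2000.chunked_sents('train.txt', chunk_types=['NP'])
    test_sents = conll2000.chunked_sents('test.txt', chunk_types=['NP'])
    print(len(train_sents), "training sentences,", len(test_sents), "test sentences")

    def f_score(precision, recall):
        # F = 2 * precision * recall / (precision + recall), as on the slide
        return 2 * precision * recall / (precision + recall)

    print(f_score(0.85, 0.80))   # illustrative values, not the contest scores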

  9. Chunking with Regular Expressions • This time we write regexes over TAGS rather than characters • <DT><JJ>?<NN> • <NN.*> • <JJ|NN>+ • Compile them with parse.ChunkRule() • rule = parse.ChunkRule('<DT|NN>+') • chunkparser = parse.RegexpChunk([rule], chunk_node = 'NP') • Resulting object is a (sort-of) parse tree • Top-level node called S • Chunks are labelled NP
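parse.ChunkRule and parse.RegexpChunk belong to an older NLTK interface; in current NLTK the same rule is written as a grammar string passed to nltk.RegexpParser. A minimal sketch with an invented tagged sentence:

    import nltk

    grammar = "NP: {<DT|NN>+}"           # the slide's rule: chunk runs of DT/NN as NP
    chunkparser = nltk.RegexpParser(grammar)

    tagged = [("the", "DT"), ("little", "JJ"), ("cat", "NN"),
              ("sat", "VBD"), ("on", "IN"), ("the", "DT"), ("mat", "NN")]
    tree = chunkparser.parse(tagged)     # two-level tree: top node S, chunks labelled NP
    print(tree)
    # roughly: (S (NP the/DT) little/JJ (NP cat/NN) sat/VBD on/IN (NP the/DT mat/NN))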

  10. Chunking with Regular Expressions

  11. Chunking with Regular Expressions • Rule application is sensitive to order
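A sketch of that order sensitivity with nltk.RegexpParser and an invented tagged phrase: text already inside a chunk is not re-chunked by later rules, so swapping the two rules changes the result:

    import nltk

    tagged = [("the", "DT"), ("little", "JJ"), ("dog", "NN"), ("house", "NN")]

    grammar_a = r"""
      NP: {<DT><JJ>*<NN>}    # determiner + adjectives + noun first
          {<NN><NN>}         # then noun-noun compounds
      """
    grammar_b = r"""
      NP: {<NN><NN>}         # noun-noun compounds first
          {<DT><JJ>*<NN>}    # then determiner + adjectives + noun
      """
    print(nltk.RegexpParser(grammar_a).parse(tagged))   # (NP the little dog) house
    print(nltk.RegexpParser(grammar_b).parse(tagged))   # the little (NP dog house)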

  12. Chinking • Specify what does not go into a chunk. • Analogous to defining punctuation as whatever is neither alphanumeric nor whitespace. • Can be more difficult to think about.
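A sketch of chinking in the current nltk.RegexpParser grammar syntax, where {...} chunks and }...{ chinks; the sentence and tags are invented for illustration:

    import nltk

    grammar = r"""
      NP:
        {<.*>+}          # first chunk everything
        }<VBD|IN>+{      # then chink (remove) verbs and prepositions
      """
    cp = nltk.RegexpParser(grammar)

    tagged = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
              ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]
    print(cp.parse(tagged))
    # roughly: (S (NP the/DT little/JJ yellow/JJ dog/NN) barked/VBD at/IN (NP the/DT cat/NN))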

  13. Simple chink-chunk approach: function v content word-class • Regular expressions for chunks and chinks CAN get complex • BUT the whole point is to be simpler than full parsing! • SO: use a simple model which works “reasonably well” • (then tidy up afterwards…) • Chunk = nominal content-word (noun) • Chink = others (verb, pronoun, determiner, preposition, conjunction) (+adjective, adverb as a borderline category)

  14. Example • Fruit flies like a banana • fruit\N flies\N like\V a\A banana\N • [fruit flies] like a [banana] • [S [NP fruit\N flies\N NP] • [VP like\V • [NP a\A banana\N NP] • VP] • S]
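The chink-chunk model from slide 13 can reproduce this bracketing with nltk.RegexpParser; the Penn-style tags below are an assumed mapping of the slide's N/V/A tags:

    import nltk

    # chunk everything, then chink function words (verbs, prepositions,
    # determiners, conjunctions, pronouns), as in the simple model of slide 13
    grammar = r"""
      NP:
        {<.*>+}
        }<V.*|IN|DT|CC|PRP.*>+{
      """
    cp = nltk.RegexpParser(grammar)

    tagged = [("fruit", "NN"), ("flies", "NNS"), ("like", "VBP"),
              ("a", "DT"), ("banana", "NN")]
    print(cp.parse(tagged))
    # roughly: (S (NP fruit/NN flies/NNS) like/VBP a/DT (NP banana/NN))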

  15. An alternative parse • This sentence is grammatically ambiguous: • Fruit flies like a banana • fruit\N flies\N like\V a\A banana\N [fruit flies] like a [banana] • fruit\N flies\V like\I a\A banana\N [fruit] flies like a [banana] • cf: “bank robbers like a chase” v “bread bakes in an oven” • [S [NP fruit\N NP] • [VP flies\V • [PP like\I [NP a\A banana\N NP] PP] • VP] • S]

  16. Ambiguity leads to more rules • fruit\N flies\N like\V a\A banana\N [fruit flies] like a [banana] • fruit\N flies\V like\I a\A banana\N [fruit] flies like a [banana] • BUT what about: Time flies like an arrow - time\N, time\V • time\N flies\N like\V an\A arrow\N [time flies] like an [arrow] • time\N flies\V like\I an\A arrow\N [time] flies like an [arrow] • time\V flies\N like\I an\A arrow\N time [flies] like an [arrow] • A 3rd PoS-tagging gives yet another possible parse
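A sketch of how the tagging ambiguity carries through to chunking: the same noun-phrase chunker applied to the three tag assignments above gives three different bracketings (the Penn-style tags are an assumed mapping of the slide's N/V/I/A tags):

    import nltk

    cp = nltk.RegexpParser("NP: {<NN.*>+}")   # chunk maximal runs of nouns

    readings = [
        [("time", "NN"), ("flies", "NNS"), ("like", "VBP"), ("an", "DT"), ("arrow", "NN")],
        [("time", "NN"), ("flies", "VBZ"), ("like", "IN"),  ("an", "DT"), ("arrow", "NN")],
        [("time", "VB"), ("flies", "NNS"), ("like", "IN"),  ("an", "DT"), ("arrow", "NN")],
    ]
    for tagged in readings:
        print(cp.parse(tagged))
    # [time flies] like an [arrow] / [time] flies like an [arrow] / time [flies] like an [arrow]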

  17. Chunking can predict prosodic breaks • http://www.acm.org/crossroads/ • An Approach for Detecting Prosodic Phrase Boundaries in Spoken English by Claire Brierley and Eric Atwell

  18. Summary • Shallow parsing is useful for: • Entity recognition • people, locations, organizations • Studying linguistic patterns • gave NP • gave up NP in NP • gave NP NP • gave NP to NP • Prosodic phrase breaks (pauses in speech) • Can ignore complex structure when not relevant • Chink-chunk approach: “quick-and-dirty” chunking, content v function PoS • Chink-chunk parsing is simpler than context-free grammar parsing!
