
Fast Full Parsing by Linear-Chain Conditional Random Fields

Explore fast parsing techniques for large document collections and real-time processing using linear-chain Conditional Random Fields (CRFs) in Natural Language Processing (NLP). Learn about chunking, searching for the best parse, and experiments on the Penn Treebank Corpus.





Presentation Transcript


  1. Fast Full Parsing by Linear-Chain Conditional Random Fields • Yoshimasa Tsuruoka, Jun’ichi Tsujii, and Sophia Ananiadou • The University of Manchester

  2. Outline • Motivation • Parsing algorithm • Chunking with conditional random fields • Searching for the best parse • Experiments • Penn Treebank • Conclusions

  3. Motivation • Parsers are useful in many NLP applications • Information extraction, Summarization, MT, etc. • But parsing is often the most computationally expensive component in the NLP pipeline • Fast parsing is useful when • The document collection is large • e.g. MEDLINE corpus: 70 million sentences • Real-time processing is required • e.g. web applications

  4. Parsing algorithms • History-based approaches • Bottom-up & left-to-right (Ratnaparkhi, 1997) • Shift-reduce (Sagae & Lavie, 2006) • Global modeling • Tree CRFs (Finkel et al., 2008; Petrov & Klein, 2008) • Reranking (Collins, 2000; Charniak & Johnson, 2005) • Forest (Huang, 2008)

  5. Chunk parsing • Parsing algorithm • Step 1: Identify phrases in the sequence. • Step 2: Convert the recognized phrases into new non-terminal symbols. • Step 3: Go back to step 1. • Previous work • Memory-based learning (Tjong Kim Sang, 2001) • F-score: 80.49 • Maximum entropy (Tsuruoka and Tsujii, 2005) • F-score: 85.9
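The three-step loop above can be sketched in Python as follows. This is only an illustration: the `RULES` table is a hypothetical stand-in for the trained chunker and covers just the example sentence used in the following slides, and the names `chunk`, `parse`, and `bracket` are invented for this sketch.

```python
# Hypothetical rule table standing in for the chunker; it covers only
# the example sentence and is not part of the original presentation.
RULES = {
    ("VBN", "NN"): "NP",
    ("CD", "CD"): "QP",
    ("DT", "JJ", "QP", "NNS"): "NP",
    ("VBD", "NP"): "VP",
    ("NP", "VP", "."): "S",
}


def chunk(symbols):
    """Step 1: return non-overlapping (start, end, label) chunks, left to right."""
    chunks, i = [], 0
    while i < len(symbols):
        for pattern, label in RULES.items():
            if tuple(symbols[i:i + len(pattern)]) == pattern:
                chunks.append((i, i + len(pattern), label))
                i += len(pattern)
                break
        else:
            i += 1  # no rule starts here; move on
    return chunks


def parse(tags, words):
    """Repeat chunking until one symbol remains or no chunk is found."""
    nodes = list(zip(tags, words))  # leaves are (tag, word) pairs
    while len(nodes) > 1:
        found = chunk([label for label, _ in nodes])
        if not found:
            break
        new_nodes, pos = [], 0
        for start, end, label in found:
            new_nodes.extend(nodes[pos:start])
            # Step 2: the chunk becomes a new non-terminal node.
            new_nodes.append((label, nodes[start:end]))
            pos = end
        new_nodes.extend(nodes[pos:])
        nodes = new_nodes  # Step 3: go back to step 1
    return nodes


def bracket(node):
    """Render a node in bracketed tree notation."""
    label, body = node
    if isinstance(body, str):
        return f"({label} {body})"
    return f"({label} " + " ".join(bracket(child) for child in body) + ")"


tags = "VBN NN VBD DT JJ CD CD NNS .".split()
words = "Estimated volume was a light 2.4 million ounces .".split()
tree = parse(tags, words)
print(bracket(tree[0]))
```

With this rule table the loop needs four chunking passes to reach a single S node, one pass per level of the tree.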

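  6. Parsing a sentence • (S (NP (VBN Estimated) (NN volume)) (VP (VBD was) (NP (DT a) (JJ light) (QP (CD 2.4) (CD million)) (NNS ounces))) (. .))

  7. 1st iteration • Chunks: NP (VBN NN), QP (CD CD) • Symbols: VBN NN VBD DT JJ CD CD NNS . • Words: Estimated volume was a light 2.4 million ounces .

  8. 2nd iteration • Chunk: NP (DT JJ QP NNS) • Symbols: NP VBD DT JJ QP NNS . • Head words: volume was a light million ounces .

  9. 3rd iteration • Chunk: VP (VBD NP) • Symbols: NP VBD NP . • Head words: volume was ounces .

  10. 4th iteration • Chunk: S (NP VP .) • Symbols: NP VP . • Head words: volume was .

  11. 5th iteration • Symbols: S • Head word: was

  12. Complete parse tree • (S (NP (VBN Estimated) (NN volume)) (VP (VBD was) (NP (DT a) (JJ light) (QP (CD 2.4) (CD million)) (NNS ounces))) (. .))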

  13. Chunking with CRFs • Conditional random fields (CRFs) • Features are defined on states and state transitions • Formula annotations: feature function, feature weight • Example chunks: [NP Estimated/VBN volume/NN] was/VBD a/DT light/JJ [QP 2.4/CD million/CD] ounces/NNS ./.
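The formula on this slide is not preserved in the transcript. For reference, the standard linear-chain CRF form is shown below; the notation is an assumption, not taken from the slide: x is the input sequence, y the label sequence of length T, f_i a feature function on a state or state transition, λ_i its weight, and Z(x) the normalizer.

```latex
p(y \mid x) = \frac{1}{Z(x)} \exp\!\Big( \sum_{t=1}^{T} \sum_{i} \lambda_i \, f_i(y_{t-1}, y_t, x, t) \Big),
\qquad
Z(x) = \sum_{y'} \exp\!\Big( \sum_{t=1}^{T} \sum_{i} \lambda_i \, f_i(y'_{t-1}, y'_t, x, t) \Big)
```

A state feature simply ignores y_{t-1}; a transition feature depends on both y_{t-1} and y_t.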

  14. Chunking with “IOB” tagging • B: Beginning of a chunk • I: Inside (continuation) of the chunk • O: Outside of chunks • Example (word/POS/chunk tag): Estimated/VBN/B-NP volume/NN/I-NP was/VBD/O a/DT/O light/JJ/O 2.4/CD/B-QP million/CD/I-QP ounces/NNS/O ././O
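The IOB encoding turns chunking into a per-token labeling task, which is what lets a sequence labeler do the work. A minimal sketch of the conversion in both directions follows; the function names and the (start, end, label) span convention are assumptions made for this example.

```python
def chunks_to_iob(n_tokens, chunks):
    """Encode (start, end, label) spans, end exclusive, as IOB tags."""
    tags = ["O"] * n_tokens
    for start, end, label in chunks:
        tags[start] = f"B-{label}"
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"
    return tags


def iob_to_chunks(tags):
    """Decode IOB tags back into (start, end, label) spans.

    An I- tag that does not continue an open chunk of the same label
    is treated as outside (a choice made for this sketch).
    """
    chunks, start, label = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel closes a trailing chunk
        continues = tag.startswith("I-") and label == tag[2:]
        if start is not None and not continues:
            chunks.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
    return chunks


# The example from the slide: NP over tokens 0-1, QP over tokens 5-6.
example_chunks = [(0, 2, "NP"), (5, 7, "QP")]
example_tags = chunks_to_iob(9, example_chunks)
print(example_tags)
```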

  15. Features for base chunking • “?” marks the chunk tag to be predicted • Symbols: VBN NN VBD DT JJ CD CD NNS . • Words: Estimated volume was a light 2.4 million ounces .

  16. Features for non-base chunking • “?” marks the chunk tag to be predicted • Symbols: NP VBD DT JJ QP NNS . • Head words: volume was a light million ounces . • Children of the first NP: Estimated/VBN volume/NN

  17. Finding the best parse • Scoring the entire parse tree • The best derivation can be found by depth-first search.

  18. Depth-first search • Search tree (figure): POS tagging at the root, base chunking on the next level, then further chunking on each deeper level
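A minimal sketch of such a search, under stated assumptions: the score of a derivation is taken to be the product of the probabilities of the decisions along it (the scoring formula itself is not preserved in this transcript), and the `HYPOTHESES` table with its probabilities is invented toy data standing in for the tagger and chunkers.

```python
# Hypothetical candidate table: symbol sequence -> [(probability, next sequence)].
# The numbers are made up so that the locally best first step is not globally best.
HYPOTHESES = {
    ("NN", "VBD", "NN"): [(0.6, ("NP", "VBD", "NN")), (0.4, ("NP", "VBD", "NP"))],
    ("NP", "VBD", "NN"): [(0.5, ("NP", "VP"))],
    ("NP", "VBD", "NP"): [(0.9, ("NP", "VP"))],
    ("NP", "VP"): [(1.0, ("S",))],
}


def candidates(state):
    return HYPOTHESES.get(state, [])


def is_goal(state):
    return state == ("S",)


def search(state, score=1.0, path=(), best=None):
    """Depth-first search for the highest-scoring derivation."""
    if best is None:
        best = {"score": 0.0, "path": None}
    path = path + (state,)
    # Bound: every further step multiplies by a probability <= 1,
    # so a branch that is already no better than the best can be cut.
    if score <= best["score"]:
        return best
    if is_goal(state):
        best["score"], best["path"] = score, path
        return best
    # Try the most probable hypothesis first to get a good bound early.
    for prob, nxt in sorted(candidates(state), reverse=True):
        search(nxt, score * prob, path, best)
    return best


result = search(("NN", "VBD", "NN"))
print(result["score"], result["path"])
```

In this toy table the first branch completes with score 0.3, and the second branch is still explored because its partial score of 0.4 exceeds that bound; it ends up as the best derivation.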

  19. Finding the best parse

  20. Extracting multiple hypotheses from CRF • A* search • Uses a priority queue • Suitable when top n hypotheses are needed • Branch-and-bound • Depth-first • Suitable when a probability threshold is given • Figure: the CRF outputs tag sequences BIOOOB, BIIOOB, BIOOOO; probabilities shown: 0.2, 0.3, 0.18
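The threshold variant can be sketched as a depth-first branch-and-bound over a small linear-chain model. Everything concrete here is an assumption made for illustration: the potentials `emit` and `trans` are invented numbers, labels are plain integers, and the bound (prefix potential times the best possible suffix potential, divided by the normalizer) is one straightforward choice, not necessarily the one used by the authors.

```python
def enumerate_above(emit, trans, threshold):
    """Return all (probability, labels) whose probability is >= threshold.

    emit[t][y] and trans[a][b] are non-negative potentials, i.e. the
    exponentiated weighted feature sums of a linear-chain model.
    """
    T, K = len(emit), len(emit[0])

    # Forward pass: the normalizer Z sums the potentials of all sequences.
    alpha = emit[0][:]
    for t in range(1, T):
        alpha = [emit[t][b] * sum(alpha[a] * trans[a][b] for a in range(K))
                 for b in range(K)]
    Z = sum(alpha)

    # best_suffix[t][y]: largest potential of any completion after position t.
    best_suffix = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for a in range(K):
            best_suffix[t][a] = max(
                trans[a][b] * emit[t + 1][b] * best_suffix[t + 1][b]
                for b in range(K))

    results = []

    def dfs(prefix, potential):
        t = len(prefix) - 1
        # Bound: even the best completion cannot reach the threshold.
        if potential * best_suffix[t][prefix[-1]] / Z < threshold:
            return
        if len(prefix) == T:
            results.append((potential / Z, tuple(prefix)))
            return
        for b in range(K):
            dfs(prefix + [b], potential * trans[prefix[-1]][b] * emit[t + 1][b])

    for y in range(K):
        dfs([y], emit[0][y])
    return sorted(results, reverse=True)


# Toy potentials for 3 positions and 2 labels (invented for this example).
emit = [[2.0, 1.0], [1.0, 3.0], [1.0, 1.0]]
trans = [[1.0, 0.5], [0.5, 2.0]]
hypotheses = enumerate_above(emit, trans, threshold=0.1)
for prob, labels in hypotheses:
    print(round(prob, 4), labels)
```

When the top n hypotheses are wanted instead of everything above a threshold, the same bound could serve as the priority in a best-first search with a priority queue.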

  21. Experiments • Penn Treebank Corpus • Training: sections 2-21 • Development: section 22 • Evaluation: section 23 • Training • Three CRF models • Part-of-speech tagger • Base chunker • Non-base chunker • Training took 2 days on a 2.2 GHz AMD Opteron

  22. Training the CRF chunkers • Maximum likelihood + L1 regularization • L1 regularization helps avoid overfitting and produces compact models • OWL-QN algorithm (Andrew and Gao, 2007)
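Written out, the training objective has the usual L1-penalized form. The notation is an assumption for illustration: (x^{(j)}, y^{(j)}) are the N training sequences, λ the feature weights, and C a regularization strength whose value is not given in the slides.

```latex
\mathcal{L}(\lambda) = \sum_{j=1}^{N} \log p\big(y^{(j)} \mid x^{(j)}; \lambda\big) \;-\; C \sum_{i} \lvert \lambda_i \rvert
```

The penalty term is not differentiable at λ_i = 0, which is why a method designed for L1-regularized objectives such as OWL-QN is used in place of a plain quasi-Newton optimizer. The same penalty drives many weights to exactly zero, and features with zero weight can be dropped, which is where the compact models come from.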

  23. Chunking performance Section 22, all sentences

  24. Beam width and parsing performance Section 22, all sentences (1,700 sentences)

  25. Comparison with other parsers Section 23, all sentences (2,416 sentences)

  26. Discussions • Improving chunking accuracy • Semi-Markov CRFs (Sarawagi and Cohen, 2004) • Higher order CRFs • Increasing the size of training data • Create a treebank by parsing a large number of sentences with an accurate parser • Train the fast parser using the treebank

  27. Conclusion • Full parsing by cascaded chunking • Chunking with CRFs • Depth-first search • Performance • F-score = 86.9 (12 ms/sentence) • F-score = 88.4 (42 ms/sentence) • Available soon
