Learn about LSTM-CRF model for identifying and classifying name mentions in unstructured text. Explore input features, sentence encoding, label prediction, and various tagging models.
Outline
• Name tagging
• LSTM-CRF model
  • Overview
  • Input features
  • Sentence encoder (feature extraction)
  • Label prediction

Ying Lin | yinglin8@illinois.edu | Room 1115, Siebel
Name Tagging
• Goal: identify and classify name mentions in unstructured text into pre-defined categories, e.g., person, organization, location, geo-political entity (GPE).
• Example (mention types annotated): "He was born Edgar Poe [PERSON] in Boston [GPE] on January 19, 1809, the second child of English-born actress Elizabeth Arnold Hopkins Poe [PERSON] and actor David Poe Jr. [PERSON]"
• Tag schemes:
  • BIO:
    actress/O Elizabeth/B-PER Arnold/I-PER Hopkins/I-PER Poe/I-PER and/O
    in/O Boston/B-GPE on/O
  • BIOES:
    actress/O Elizabeth/B-PER Arnold/I-PER Hopkins/I-PER Poe/E-PER and/O
    in/O Boston/S-GPE on/O
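To make the two schemes concrete, here is a minimal Python sketch that converts a BIO tag sequence into BIOES and reproduces the example above (the helper name bio_to_bioes is illustrative, not from the slides):

```python
def bio_to_bioes(tags):
    """Convert a BIO tag sequence to BIOES (illustrative helper)."""
    bioes = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bioes.append(tag)
            continue
        prefix, label = tag.split("-", 1)
        # A span ends here if this is the last token or the next tag
        # does not continue the same entity with an I- tag.
        ends = i + 1 == len(tags) or tags[i + 1] != "I-" + label
        if prefix == "B":
            bioes.append(("S-" if ends else "B-") + label)
        else:  # prefix == "I"
            bioes.append(("E-" if ends else "I-") + label)
    return bioes

tags = ["O", "B-PER", "I-PER", "I-PER", "I-PER", "O", "O", "B-GPE", "O"]
print(bio_to_bioes(tags))
# ['O', 'B-PER', 'I-PER', 'I-PER', 'E-PER', 'O', 'O', 'S-GPE', 'O']
```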
Name Tagging Models
• Hidden Markov Models (HMM)
• Support Vector Machines (SVM)
• Conditional Random Fields (CRF)
• Decision trees
• Bidirectional LSTM-CRF
  • LSTM: https://colah.github.io/posts/2015-08-Understanding-LSTMs/
  • CRF: Lafferty, John, Andrew McCallum, and Fernando C. N. Pereira. "Conditional random fields: Probabilistic models for segmenting and labeling sequence data." (2001).
LSTM-CRF Model: Overview
[Architecture diagram: Input Sentence → Features → Bi-LSTM → Linear → CRF → predicted tags (e.g., E-PER)] (Chiu and Nichols, 2016)
LSTM-CRF Model: Input Features
• Each token in the given sentence is represented as the combination of its word embedding and character-level features.
• Word embeddings: Word2vec, GloVe, FastText, ELMo, BERT, ... — typically pre-trained on corpora such as Wikipedia or WMT.
• Character-level representation: a character-level convolutional network (Chiu and Nichols, 2016).
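The composition can be sketched in a few lines of PyTorch. This is a minimal illustration in the spirit of Chiu and Nichols (2016), not their exact architecture; the class name and all dimensions below are assumptions for the example:

```python
import torch
import torch.nn as nn

class TokenEncoder(nn.Module):
    """Word embedding + max-pooled character-CNN features per token
    (illustrative dimensions, not the paper's hyperparameters)."""
    def __init__(self, vocab_size, char_vocab_size,
                 word_dim=100, char_dim=30, char_filters=50):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)
        self.char_emb = nn.Embedding(char_vocab_size, char_dim)
        self.char_cnn = nn.Conv1d(char_dim, char_filters,
                                  kernel_size=3, padding=1)

    def forward(self, word_ids, char_ids):
        # word_ids: (batch, seq_len); char_ids: (batch, seq_len, max_chars)
        b, s, c = char_ids.shape
        chars = self.char_emb(char_ids).view(b * s, c, -1).transpose(1, 2)
        # Convolve over characters, then max-pool to one vector per token.
        char_feat = self.char_cnn(chars).max(dim=2).values.view(b, s, -1)
        return torch.cat([self.word_emb(word_ids), char_feat], dim=-1)

enc = TokenEncoder(vocab_size=10000, char_vocab_size=100)
out = enc(torch.zeros(2, 9, dtype=torch.long),
          torch.zeros(2, 9, 12, dtype=torch.long))
print(out.shape)  # torch.Size([2, 9, 150]) = word_dim + char_filters
```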
Ma and Hovy (2016) likewise obtain the character features with a character-level convolutional network.
Lample et al. (2016) instead build the character-level representation with a bidirectional character-level LSTM.
Liu et al. (2018) learn the character-level representation jointly with a task-aware neural language model.
LSTM-CRF Model: Sentence Encoder
The bidirectional LSTM (Long Short-Term Memory, an RNN variant) processes the input sentence in both directions, encoding each token and its context into a vector (its hidden state). (Chiu and Nichols, 2016)
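In code this is a single torch.nn.LSTM with bidirectional=True; a minimal sketch, assuming the 150-dimensional token features from the sketch above (the hidden size is illustrative):

```python
import torch
import torch.nn as nn

feat_dim, hidden_dim = 150, 100  # illustrative sizes
bilstm = nn.LSTM(feat_dim, hidden_dim, batch_first=True, bidirectional=True)

features = torch.randn(2, 9, feat_dim)  # (batch, seq_len, feat_dim)
hidden, _ = bilstm(features)            # forward and backward states, concatenated
print(hidden.shape)                     # torch.Size([2, 9, 200])
```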
LSTM-CRF Model: Linear Layer
The linear layer projects each hidden state to the label space, producing a score per label, e.g., B-ORG 0.1, I-ORG 0.2, E-ORG 0.1, B-PER 2.5, O 0.1, ... (Chiu and Nichols, 2016)
LSTM-CRF Model: Softmax
A softmax over the label scores predicts each token's label independently of its neighbors; e.g., "Edgar Poe" may be tagged B-PER S-PER, a sequence that is invalid under BIOES because B-PER cannot be followed by S-PER. (Chiu and Nichols, 2016)
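A sketch of the linear projection followed by an independent softmax decode (the label set and dimensions are illustrative). Because each token's argmax is taken separately, nothing prevents an invalid pair such as B-PER followed by S-PER:

```python
import torch
import torch.nn as nn

labels = ["O", "B-PER", "I-PER", "E-PER", "S-PER", "B-ORG", "I-ORG", "E-ORG"]
linear = nn.Linear(200, len(labels))  # 200 = 2 * hidden_dim from the BiLSTM

hidden = torch.randn(1, 2, 200)       # e.g., states for the two tokens "Edgar Poe"
probs = linear(hidden).softmax(dim=-1)
pred = [labels[i] for i in probs.argmax(dim=-1)[0].tolist()]
print(pred)  # each token decoded independently; ['B-PER', 'S-PER'] is possible
```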
LSTM-CRF Model: CRF Layer
The CRF (Conditional Random Field) layer models the dependencies between adjacent labels: B-PER → I-PER is a valid transition (✓), whereas B-ORG → B-ORG is not (✗). (Chiu and Nichols, 2016)
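The idea can be sketched with an explicit transition-score matrix and Viterbi decoding; the scores below are hand-set for illustration, whereas a real CRF layer learns them during training:

```python
import torch

labels = ["O", "B-PER", "I-PER", "E-PER", "B-ORG", "I-ORG", "E-ORG"]
n = len(labels)
trans = torch.zeros(n, n)  # trans[i, j]: score of moving from label i to label j
trans[labels.index("B-PER"), labels.index("I-PER")] = 2.0   # valid transition
trans[labels.index("B-ORG"), labels.index("B-ORG")] = -1e4  # effectively forbidden

def viterbi(emissions, trans):
    """Best label sequence under per-token emission scores plus transitions."""
    score = emissions[0]  # (n,) scores for the first token
    back = []
    for emit in emissions[1:]:
        total = score.unsqueeze(1) + trans + emit.unsqueeze(0)  # (n, n)
        score, idx = total.max(dim=0)  # best previous label for each current label
        back.append(idx)
    path = [int(score.argmax())]
    for idx in reversed(back):
        path.append(int(idx[path[-1]]))
    return [labels[i] for i in reversed(path)]

print(viterbi(torch.randn(2, n), trans))  # decoded tag sequence for two tokens
```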
LSTM-CRF Model: Variants
• Label prediction: partial CRF (Yang et al., 2018); no linear layer (Ma and Hovy, 2016); multiple linear layers
• Sentence encoder: self-attention; Transformer; gated recurrent unit (GRU) (Yang et al., 2017)
• Input features: character-level LSTM; contextualized embeddings (e.g., BERT, ELMo); hand-crafted features; other feature composition methods
References
• Chiu, Jason P. C., and Eric Nichols. "Named Entity Recognition with Bidirectional LSTM-CNNs." Transactions of the Association for Computational Linguistics 4 (2016): 357-370.
• Lample, Guillaume, et al. "Neural Architectures for Named Entity Recognition." Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. 2016.
• Ma, Xuezhe, and Eduard Hovy. "End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF." Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2016.
• Yang, Zhilin, Ruslan Salakhutdinov, and William W. Cohen. "Transfer Learning for Sequence Tagging with Hierarchical Recurrent Networks." arXiv preprint arXiv:1703.06345 (2017).
• Liu, Liyuan, et al. "Empower Sequence Labeling with Task-Aware Neural Language Model." Thirty-Second AAAI Conference on Artificial Intelligence. 2018.
• Yang, Yaosheng, et al. "Distantly Supervised NER with Partial Annotation Learning and Reinforcement Learning." Proceedings of the 27th International Conference on Computational Linguistics. 2018.
• Character features: Kim, Yoon, et al. "Character-Aware Neural Language Models." Thirtieth AAAI Conference on Artificial Intelligence. 2016.