
Information Extraction

Entity Extraction: Statistical Methods. Sunita Sarawagi.


Presentation Transcript


  1. Information Extraction • Entity Extraction: Statistical Methods • Sunita Sarawagi

  2. What Are Statistical Methods? • “Statistical methods of entity extraction convert the extraction task to a problem of designing a decomposition of the unstructured text and then labeling various parts of the decomposition, either jointly or independently.” • Models • Token-level • Segment-level • Grammar-based • Training • Likelihood • Max-margin

  3. Token-level Models • Sequence of tokens (characters, words, or n-grams) • Entity labels assigned to each token • Generalization of classification problem • Feature selection important

  4. Features • Word features • The surface word itself is a strong indicator of which label to use • Orthographic features • Capitalization patterns (cap-words) • Presence of special characters • Alphanumeric generalization of the characters in the token • Dictionary lookup features • Each feature is a function f : (x, y, i) → R
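The feature types on this slide can be sketched as small Python functions of the form f(x, y, i), where x is the token sequence, y the label sequence, and i a position. This is only an illustrative sketch: the label names, the `FIRST_NAMES` dictionary, and the function names are assumptions, not taken from the slides.

```python
import re

# Hypothetical toy dictionary of known first names (assumption for illustration).
FIRST_NAMES = {"sunita", "john", "mary"}

def word_feature(x, y, i):
    """Word feature: fires when token i is 'Dr' and its label is 'Title'."""
    return 1.0 if x[i] == "Dr" and y[i] == "Title" else 0.0

def cap_word_feature(x, y, i):
    """Orthographic feature: capitalized token labeled 'Person'."""
    return 1.0 if x[i][:1].isupper() and y[i] == "Person" else 0.0

def shape(token):
    """Alphanumeric generalization: uppercase -> A, lowercase -> a, digit -> 9."""
    s = re.sub(r"[A-Z]", "A", token)
    s = re.sub(r"[a-z]", "a", s)
    return re.sub(r"[0-9]", "9", s)

def dictionary_feature(x, y, i):
    """Dictionary lookup feature: token found in the name list and labeled 'Person'."""
    return 1.0 if x[i].lower() in FIRST_NAMES and y[i] == "Person" else 0.0

x = ["Dr", "Sunita", "Sarawagi"]
y = ["Title", "Person", "Person"]
values = [f(x, y, 1) for f in (word_feature, cap_word_feature, dictionary_feature)]
```

Each function returns a real number, so a model can combine them through a weight vector; `shape` would normally feed another indicator feature rather than be used on its own.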

  5. Models for Labeling Tokens • Logistic classifier • Support Vector Machine (SVM) • Hidden Markov Models (HMMs) • Maximum entropy Markov Model (MEMM) • Conditional Markov Model (CMM) • Conditional Random Fields (CRFs) • Single joint distribution Pr(y|x) • Scoring function

  6. Segment-level Models • Sequence of segments • Entity labels assigned to each segment • Features span multiple tokens

  7. Entity-level Features • Exact segment match • Similarity function such as TF/IDF • Segment length
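A minimal sketch of segment-level features follows. It assumes a toy `DICTIONARY` of known entity strings and, for brevity, swaps in Jaccard word overlap where the slide names TF/IDF similarity; all names here are hypothetical.

```python
# Hypothetical dictionary of known entities (assumption for illustration).
DICTIONARY = ["Sunita Sarawagi", "Indian Institute of Technology"]

def jaccard(a, b):
    """Word-overlap similarity; a simple stand-in for TF/IDF cosine similarity."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def segment_features(x, start, end):
    """Features of the segment covering tokens x[start:end]."""
    text = " ".join(x[start:end])
    return {
        "exact_match": 1.0 if text in DICTIONARY else 0.0,
        "max_similarity": max(jaccard(text, entry) for entry in DICTIONARY),
        "length": float(end - start),
    }

x = ["met", "Sunita", "Sarawagi", "today"]
full = segment_features(x, 1, 3)      # the segment "Sunita Sarawagi"
partial = segment_features(x, 1, 2)   # the segment "Sunita"
```

Because the features see the whole candidate segment, a match against a multi-word dictionary entry is available directly, which token-level features cannot express.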

  8. Global Segmentation Models • Probability distribution • Goal is to find the segmentation s such that w·f(x,s) is maximized

  9. Grammar-based Models • Production rule oriented • Produces parse trees • Scoring function for each production

  10. Training Algorithms • Outputs some y • Sequence of labels for sequence models • Segmentation of x for segment-level models • Parse tree for grammar-based models • Argmax of s(y) = w·f(x,y) where f(x,y) is a feature vector • Two types of training methods • Likelihood-based training • Max-margin training
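To make the scoring function s(y) = w·f(x,y) concrete, here is a brute-force sketch that enumerates every label sequence and returns the argmax. The label set, feature templates, and weights are assumptions chosen for illustration; enumeration is exponential in the sequence length and is only meant to show what the dynamic-programming inference algorithms compute efficiently.

```python
from itertools import product

LABELS = ["Person", "Other"]  # hypothetical label set

def features(x, y):
    """Global feature vector f(x, y) as a sparse dict, summed over positions."""
    f = {}
    prev = "START"
    for token, label in zip(x, y):
        keys = (("cap", token[:1].isupper(), label), ("trans", prev, label))
        for key in keys:
            f[key] = f.get(key, 0.0) + 1.0
        prev = label
    return f

def score(w, x, y):
    """s(y) = w . f(x, y) with a sparse weight vector w."""
    return sum(w.get(k, 0.0) * v for k, v in features(x, y).items())

def argmax_labeling(w, x):
    """Exhaustive argmax over all len(LABELS) ** len(x) label sequences."""
    return max(product(LABELS, repeat=len(x)), key=lambda y: score(w, x, y))

w = {("cap", True, "Person"): 2.0, ("cap", False, "Other"): 1.0}
x = ["met", "Sunita", "today"]
best = argmax_labeling(w, x)
```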

  11. Likelihood Trainer • Probability distribution • Log probability distribution • Find the weight vector w that maximizes the log-likelihood of the training data

  12. Likelihood Trainer
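The formulas for this slide did not survive in the transcript. As a hedged reconstruction, the standard log-linear form of likelihood training is given below, assuming a training set of N pairs (x_l, y_l) and omitting any regularization term; the notation may differ from the original slides.

```latex
\Pr(y \mid x, w) = \frac{\exp\big(w \cdot f(x, y)\big)}{Z_w(x)},
\qquad
Z_w(x) = \sum_{y'} \exp\big(w \cdot f(x, y')\big)

L(w) = \sum_{l=1}^{N} \log \Pr(y_l \mid x_l, w)
     = \sum_{l=1}^{N} \Big( w \cdot f(x_l, y_l) - \log Z_w(x_l) \Big)

\nabla L(w) = \sum_{l=1}^{N} \Big( f(x_l, y_l)
     - \mathbb{E}_{\Pr(y' \mid x_l, w)}\big[ f(x_l, y') \big] \Big)
```

The gradient is the difference between observed and expected feature values, which is why expected feature values appear as one of the two inference queries.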

  13. Max-margin Training • “an extension of support vector machines for training structured models” • Find weight vector w

  14. Max-margin Training
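The formulas for this slide are also missing from the transcript. A commonly used formulation of max-margin training for structured models (margin rescaling with slack variables) is sketched below; the trade-off constant C, the slack variables, and the error function err are assumptions of this sketch and may not match the original slides exactly.

```latex
\min_{w,\ \xi}\ \ \frac{1}{2}\lVert w \rVert^2 + C \sum_{l=1}^{N} \xi_l

\text{subject to}\quad
w \cdot f(x_l, y_l) - w \cdot f(x_l, y) \ \ge\ \mathrm{err}(y, y_l) - \xi_l
\quad \forall\, y \ne y_l,\ \ l = 1, \dots, N,
\qquad \xi_l \ge 0
```

Each constraint asks the correct output y_l to outscore every other output y by a margin that grows with how wrong y is.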

  15. Inference Algorithms • Two kinds of inference queries • MAP labeling • Expected feature values • Both can be solved using dynamic programming

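  16. MAP for Sequential Labeling • Also known as the Viterbi algorithm • With $V(i, y)$ the best score of any labeling of positions $1..i$ that ends in label $y$, the best labeling of x is traced back from $\max_y V(n, y)$, where $n$ is the length of x • Runs in $O(nm^2)$ time, where $m$ is the number of labels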
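A minimal Viterbi sketch in Python is shown below. The `local_score(i, prev, cur)` callback stands in for the position-wise score w·f; the toy scores and label names are assumptions for illustration. The two nested loops over labels inside the loop over positions give the quadratic dependence on the number of labels.

```python
def viterbi(n, labels, local_score):
    """Return (best_score, best_labels) for a sequence of length n >= 1.

    local_score(i, prev, cur) scores label cur at position i given the
    previous label prev (prev is None at position 0).
    """
    # V[y]: best score of any labeling of positions 0..i that ends in label y
    V = {y: local_score(0, None, y) for y in labels}
    back = []
    for i in range(1, n):
        new_V, ptr = {}, {}
        for y in labels:
            best_prev = max(labels, key=lambda yp: V[yp] + local_score(i, yp, y))
            new_V[y] = V[best_prev] + local_score(i, best_prev, y)
            ptr[y] = best_prev
        V = new_V
        back.append(ptr)
    # Trace the best path back from the best final label.
    last = max(labels, key=lambda y: V[y])
    path = [last]
    for ptr in reversed(back):
        path.append(ptr[path[-1]])
    path.reverse()
    return V[last], path

x = ["met", "Sunita", "Sarawagi", "today"]

def toy_score(i, prev, cur):
    """Hypothetical scores: capitalized words look like Person tokens."""
    is_cap = x[i][:1].isupper()
    s = 2.0 if (is_cap and cur == "Person") else 0.0
    s += 1.0 if (not is_cap and cur == "Other") else 0.0
    s += 0.5 if (prev == "Person" and cur == "Person") else 0.0
    return s

best_score, best_path = viterbi(len(x), ["Person", "Other"], toy_score)
```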

  17. MAP for Segmentations • Runs in $O(nLm^2)$ time, where $L$ is the size of the largest segment
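The segment-level variant of Viterbi can be sketched as follows. Compared with the token-level algorithm, each step also tries every segment length up to a maximum, which is where the extra factor for the largest segment size comes from. The `seg_score` callback, the `NAMES` set, and the toy scores are assumptions for illustration.

```python
def segment_viterbi(n, labels, max_len, seg_score):
    """Best segmentation of tokens 0..n-1.

    seg_score(start, end, label, prev_label) scores the segment covering
    tokens start..end-1; prev_label is None for the first segment.
    Returns (best_score, [(start, end, label), ...]).
    """
    # V[i][y]: best score of a segmentation of the first i tokens
    # whose last segment carries label y
    V = [dict() for _ in range(n + 1)]
    back = [dict() for _ in range(n + 1)]
    for i in range(1, n + 1):
        for y in labels:
            best, arg = float("-inf"), None
            for start in range(max(0, i - max_len), i):
                prevs = [None] if start == 0 else labels
                for yp in prevs:
                    base = 0.0 if start == 0 else V[start][yp]
                    s = base + seg_score(start, i, y, yp)
                    if s > best:
                        best, arg = s, (start, yp)
            V[i][y], back[i][y] = best, arg
    y = max(labels, key=lambda lab: V[n][lab])
    best_score, i, segments = V[n][y], n, []
    while i > 0:
        start, yp = back[i][y]
        segments.append((start, i, y))
        i, y = start, yp
    segments.reverse()
    return best_score, segments

x = ["met", "Sunita", "Sarawagi", "today"]
NAMES = {"Sunita Sarawagi"}  # hypothetical dictionary of full names

def toy_seg_score(start, end, label, prev_label):
    """Hypothetical scores: reward dictionary matches and single-token Other."""
    if label == "Person":
        return 3.0 if " ".join(x[start:end]) in NAMES else -1.0
    return 0.5 if end - start == 1 else -1.0

seg_best, segments = segment_viterbi(len(x), ["Person", "Other"], 2, toy_seg_score)
```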

  18. MAP for Parse Trees • Best tree has score $\max_y V(1, n, y)$, where $y$ goes over all possible nonterminals • Runs in $O(n^3 m')$ time, where $m'$ is the total number of terminals and nonterminals
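A CYK-style sketch of finding the best-scoring parse is given below. It assumes a grammar in binary form with one score per production, which is a simplification; the toy grammar, nonterminal names, and scores are hypothetical. The three nested position loops (span width, span start, split point) give the cubic dependence on sequence length.

```python
def cyk_best(n, nonterminals, terminal_score, binary_rules):
    """Fill the chart V[(i, j, A)] = best score of a tree rooted at A over tokens i..j-1.

    terminal_score(i, A): score for A covering the single token i, or None if disallowed.
    binary_rules: dict mapping (A, B, C) to the score of production A -> B C.
    """
    V = {}
    for i in range(n):
        for a in nonterminals:
            s = terminal_score(i, a)
            if s is not None:
                V[(i, i + 1, a)] = s
    for width in range(2, n + 1):
        for i in range(0, n - width + 1):
            j = i + width
            for (a, b, c), rule_score in binary_rules.items():
                for k in range(i + 1, j):  # split point between B and C
                    if (i, k, b) in V and (k, j, c) in V:
                        s = rule_score + V[(i, k, b)] + V[(k, j, c)]
                        if s > V.get((i, j, a), float("-inf")):
                            V[(i, j, a)] = s
    return V

# Hypothetical toy grammar: a record is a two-word person followed by a two-word organization.
NONTERMINALS = ["W", "Person", "Org", "Record"]
RULES = {
    ("Person", "W", "W"): 1.0,
    ("Org", "W", "W"): 0.5,
    ("Record", "Person", "Org"): 2.0,
    ("Record", "Org", "Person"): 1.0,
}

def toy_terminal_score(i, a):
    return 0.0 if a == "W" else None

n = 4  # e.g. tokens: Sunita Sarawagi IIT Bombay
chart = cyk_best(n, NONTERMINALS, toy_terminal_score, RULES)
# Best tree over the whole sequence: maximize over the root nonterminal.
root_scores = {a: chart[(0, n, a)] for a in NONTERMINALS if (0, n, a) in chart}
best_root = max(root_scores, key=root_scores.get)
```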

  19. Expected Feature Values for Sequential Labelings • Value at each node (dynamic programming) • Recursive algorithm • Backward recursion • Expected value of a feature
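The forward and backward recursions can be sketched as follows for a chain model. The sketch works with plain exponentiated scores, which is fine for a toy input but would need log-space arithmetic on long sequences; the `toy_score` function and label names are assumptions for illustration.

```python
import math

def forward_backward(n, labels, local_score):
    """Return (Z, marginals) with marginals[i][y] = Pr(y_i = y | x)."""
    # alpha[i][y]: total weight of all labelings of positions 0..i ending in y
    alpha = [{y: math.exp(local_score(0, None, y)) for y in labels}]
    for i in range(1, n):
        alpha.append({
            y: sum(alpha[i - 1][yp] * math.exp(local_score(i, yp, y)) for yp in labels)
            for y in labels
        })
    # beta[i][y]: total weight of all labelings of positions i+1..n-1 given y at i
    beta = [dict() for _ in range(n)]
    beta[n - 1] = {y: 1.0 for y in labels}
    for i in range(n - 2, -1, -1):
        beta[i] = {
            y: sum(math.exp(local_score(i + 1, y, yn)) * beta[i + 1][yn] for yn in labels)
            for y in labels
        }
    Z = sum(alpha[n - 1].values())
    marginals = [{y: alpha[i][y] * beta[i][y] / Z for y in labels} for i in range(n)]
    return Z, marginals

LABELS = ["Person", "Other"]
x = ["met", "Sunita", "today"]

def toy_score(i, prev, cur):
    """Hypothetical position-wise score standing in for w . f."""
    is_cap = x[i][:1].isupper()
    s = 2.0 if (is_cap and cur == "Person") else 0.0
    s += 1.0 if (not is_cap and cur == "Other") else 0.0
    s += 0.5 if prev == cur else 0.0
    return s

Z, marginals = forward_backward(len(x), LABELS, toy_score)
# Expected value of the feature "number of tokens labeled Person".
expected_person_count = sum(m["Person"] for m in marginals)
```

The expected value of any feature that depends on a single position is a marginal-weighted sum like the last line; features on adjacent label pairs would use pairwise marginals built from the same alpha and beta tables.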

  20. Summary • Most prominent models used • Maximum entropy taggers (MaxEnt) • Hidden Markov Models (HMMs) • Conditional Random Fields (CRFs) • CRFs are now established as state-of-the-art • Segment-level and grammar-based CRFs not as popular

  21. Further Reading • Active learning • Bootstrapping from structured data • Transfer learning and domain adaptation • Collective inference
