Information Extraction
Entity Extraction: Statistical Methods
Sunita Sarawagi
What Are Statistical Methods?
• "Statistical methods of entity extraction convert the extraction task to a problem of designing a decomposition of the unstructured text and then labeling various parts of the decomposition, either jointly or independently."
• Models
  • Token-level
  • Segment-level
  • Grammar-based
• Training
  • Likelihood
  • Max-margin
Token-level Models
• The text is decomposed into a sequence of tokens (characters, words, or n-grams)
• An entity label is assigned to each token (see the example below)
• A generalization of the classification problem, since the labels of neighboring tokens are interdependent
• Careful feature design is important
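As a concrete illustration (a hypothetical example, not taken from the slides), an address string can be split into word tokens, each carrying an entity label in the common BIO encoding:

    # Hypothetical example: word tokens of an address string, each assigned
    # an entity label (B- begins an entity, I- continues it).
    tokens = ["4089", "Whispering", "Pines", "Rd", "Sunnyvale", "CA", "94087"]
    labels = ["B-HouseNo", "B-Street", "I-Street", "I-Street",
              "B-City", "B-State", "B-Zip"]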
Features
• Each feature is a real-valued function f : (x, y, i) → R of the input x, a candidate label y, and the token position i
• Word features
  • The surface word itself is a strong indicator of which label to use
• Orthographic features
  • Capitalization patterns (cap-words)
  • Presence of special characters
  • Alphanumeric generalization of the characters in the token
• Dictionary lookup features
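A minimal sketch of such feature functions in Python (the specific feature instances and the dictionary below are illustrative assumptions, not prescribed by the slides):

    import re

    # Token-level feature functions f(x, y, i) -> R; each fires (returns 1.0)
    # when its pattern holds for token i under candidate label y.

    def word_feature(x, y, i):
        # Surface word paired with a label.
        return 1.0 if (x[i].lower(), y) == ("rd", "I-Street") else 0.0

    def cap_word_feature(x, y, i):
        # Orthographic: token begins with a capital letter.
        return 1.0 if x[i][:1].isupper() and y.startswith("B-") else 0.0

    def all_digits_feature(x, y, i):
        # Alphanumeric generalization: token consists only of digits.
        return 1.0 if re.fullmatch(r"\d+", x[i]) and y == "B-Zip" else 0.0

    CITY_DICT = {"sunnyvale", "san diego", "mumbai"}  # illustrative dictionary

    def dictionary_feature(x, y, i):
        # Dictionary lookup: token matches a known city name.
        return 1.0 if x[i].lower() in CITY_DICT and y == "B-City" else 0.0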
Models for Labeling Tokens
• Logistic classifier
• Support Vector Machine (SVM)
• Hidden Markov Models (HMMs)
• Maximum entropy Markov Model (MEMM)
• Conditional Markov Model (CMM)
• Conditional Random Fields (CRFs)
  • Define a single joint distribution Pr(y|x) over the entire label sequence
  • Built on a global scoring function w·f(x, y)
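Spelled out, a CRF defines the label distribution in the standard log-linear form (this equation is standard for CRFs but is not written out on the slide):

    \Pr(y \mid x) \;=\; \frac{\exp\bigl(w \cdot f(x, y)\bigr)}{\sum_{y'} \exp\bigl(w \cdot f(x, y')\bigr)}

where f(x, y) accumulates the token-level features over all positions of the sequence.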
Segment-level Models
• The text is decomposed into a sequence of segments
• An entity label is assigned to each segment
• Features can span multiple tokens
Entity-level Features
• Exact match of the whole segment (e.g., against a dictionary of known entities)
• Similarity functions such as TF-IDF
• Segment length
Global Segmentation Models
• Define a probability distribution over entire segmentations, Pr(s|x) = exp(w·f(x, s)) / Z(x), analogous to a token-level CRF
• The goal is to find the segmentation s such that w·f(x, s) is maximized
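A sketch of how a segmentation is scored under such a model (the particular segment features and dictionary here are illustrative assumptions):

    # Score a segmentation s of tokens x under weights w.
    # s is a list of (start, end, label) triples covering x; end is exclusive.

    def segment_features(x, start, end, label):
        # Illustrative segment-level features: segment length and dictionary match.
        segment = " ".join(x[start:end]).lower()
        return {
            ("len", end - start, label): 1.0,
            ("in_city_dict", label): 1.0 if segment in {"san diego", "sunnyvale"} else 0.0,
        }

    def score_segmentation(w, x, s):
        # Computes w . f(x, s): weighted feature sum over all segments.
        total = 0.0
        for start, end, label in s:
            for name, value in segment_features(x, start, end, label).items():
                total += w.get(name, 0.0) * value
        return total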
Grammar-based Models
• The decomposition is driven by the production rules of a grammar
• Labeling produces parse trees over the text
• A scoring function is attached to each production
Training Algorithms
• Given an input x, the trained model outputs some y
  • A sequence of labels for sequence models
  • A segmentation of x for segment-level models
  • A parse tree for grammar-based models
• The output is the argmax of s(y) = w·f(x, y), where f(x, y) is a feature vector
• Two types of training methods
  • Likelihood-based training
  • Max-margin training
Likelihood Trainer
• The model defines a probability distribution Pr(y|x, w) = exp(w·f(x, y)) / Z_w(x), where Z_w(x) = Σ_y' exp(w·f(x, y'))
• Training maximizes the log-likelihood of the labeled examples: L(w) = Σ_ℓ [w·f(x_ℓ, y_ℓ) − log Z_w(x_ℓ)]
• The weight vector w that maximizes L(w) is found by gradient-based optimization
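The gradient has the familiar "observed minus expected" form for log-linear models (standard, though not spelled out on the slide):

    \nabla L(w) \;=\; \sum_{\ell} \Bigl( f(x_\ell, y_\ell) \;-\; \mathbb{E}_{y' \sim \Pr(\cdot \mid x_\ell, w)}\bigl[ f(x_\ell, y') \bigr] \Bigr)

This is why the expected-feature-value inference query described below is needed during training.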
Max-margin Training
• "an extension of support vector machines for training structured models"
• Finds the weight vector w that separates the score of the correct output y_ℓ from the score of every alternative y by a margin
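One standard way to write the constraints is the margin-rescaled structured SVM (the exact variant is an assumption here):

    w \cdot f(x_\ell, y_\ell) \;\ge\; w \cdot f(x_\ell, y) + \mathrm{Err}(y_\ell, y) - \xi_\ell \qquad \text{for all } y \ne y_\ell

where Err(y_ℓ, y) measures how wrong the alternative y is and ξ_ℓ ≥ 0 is a slack variable; the objective minimizes ||w||² plus the total slack.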
Inference Algorithms
• Two kinds of inference queries
  • MAP labeling: find the highest-scoring output
  • Expected feature values under the model distribution
• Both can be solved using dynamic programming
MAP for Sequential Labeling
• Also known as the Viterbi algorithm
• The best labeling of x is found by the recursion V(i, y) = max_y' [V(i−1, y') + w·f(x, i, y, y')], where n is the length of x and the answer is max_y V(n, y)
• Runs in O(nm²) time, where m is the number of labels
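A compact sketch of the recursion (the scoring interface `score(x, i, y, y_prev)`, standing in for w·f, is an assumption):

    def viterbi(x, labels, score):
        # score(x, i, y, y_prev) -> w . f(x, i, y, y_prev); y_prev is None at i = 0.
        n = len(x)
        # V[i][y] = (best score of a labeling of x[:i+1] ending in y, its path)
        V = [{y: (score(x, 0, y, None), [y]) for y in labels}]
        for i in range(1, n):
            V.append({})
            for y in labels:
                # Best previous label for each current label: O(m^2) work per position.
                best_prev = max(labels, key=lambda yp: V[i - 1][yp][0] + score(x, i, y, yp))
                s, path = V[i - 1][best_prev]
                V[i][y] = (s + score(x, i, y, best_prev), path + [y])
        best = max(labels, key=lambda y: V[n - 1][y][0])
        return V[n - 1][best][1]  # MAP label sequence

On a length-n input this evaluates the score for every (position, label, previous-label) triple, matching the O(nm²) bound above.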
MAP for Segmentations
• The Viterbi recursion is extended so that each step chooses both the label and the length of the segment ending at the current position
• Runs in O(nLm²) time, where L is the size of the largest segment
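The segment-level variant of the sketch above (again assuming a user-supplied scoring function):

    def viterbi_segments(x, labels, L, seg_score):
        # seg_score(x, start, end, y, y_prev) scores the segment x[start:end]
        # labeled y, following a segment labeled y_prev (None at the start).
        n = len(x)
        # V[j][y] = (best score of a segmentation of x[:j] ending in y, its segments)
        V = [dict() for _ in range(n + 1)]
        V[0] = {None: (0.0, [])}
        for j in range(1, n + 1):
            for y in labels:
                best = None
                for d in range(1, min(L, j) + 1):   # candidate segment length
                    for yp, (s, segs) in V[j - d].items():
                        cand = s + seg_score(x, j - d, j, y, yp)
                        if best is None or cand > best[0]:
                            best = (cand, segs + [(j - d, j, y)])
                V[j][y] = best
        return max(V[n].values(), key=lambda t: t[0])[1]  # MAP segmentation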
MAP for Parse Trees
• The best tree is found by a CKY-style dynamic program over spans, where the maximization at each span goes over all possible nonterminals
• Runs in O(n³M³) time for a binarized grammar, where M is the total number of terminals and nonterminals
Expected Feature Values for Sequential Labelings
• Computed by dynamic programming over values at each node
• A forward recursion computes α(i, y), the total unnormalized weight of all labelings of the first i tokens ending in label y
• A backward recursion computes β(i, y), the total weight of all labelings of the remaining tokens given label y at position i
• The expected value of a feature is its value at each position, weighted by the marginal α·exp(score)·β / Z(x) and summed
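A minimal unnormalized version of the α/β recursions (numerical-underflow handling is omitted, and the scoring and feature interfaces are the same assumptions as above):

    import math

    def expected_feature(x, labels, score, f):
        # E[f] under Pr(y|x) proportional to exp(sum_i score(x, i, y_i, y_{i-1})).
        n = len(x)
        # alpha[i][y]: total weight of prefixes ending at position i with label y.
        alpha = [{y: math.exp(score(x, 0, y, None)) for y in labels}]
        for i in range(1, n):
            alpha.append({y: sum(alpha[i - 1][yp] * math.exp(score(x, i, y, yp))
                                 for yp in labels) for y in labels})
        # beta[i][y]: total weight of suffixes after position i, given label y at i.
        beta = [dict() for _ in range(n)]
        beta[n - 1] = {y: 1.0 for y in labels}
        for i in range(n - 2, -1, -1):
            beta[i] = {yp: sum(math.exp(score(x, i + 1, y, yp)) * beta[i + 1][y]
                               for y in labels) for yp in labels}
        Z = sum(alpha[n - 1][y] for y in labels)
        # Sum the feature over positions, weighted by the pairwise marginals.
        total = sum(alpha[0][y] * beta[0][y] / Z * f(x, 0, y, None) for y in labels)
        for i in range(1, n):
            for yp in labels:
                for y in labels:
                    p = alpha[i - 1][yp] * math.exp(score(x, i, y, yp)) * beta[i][y] / Z
                    total += p * f(x, i, y, yp)
        return total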
Summary
• The most prominent models in use
  • Maximum entropy taggers (MaxEnt)
  • Hidden Markov Models (HMMs)
  • Conditional Random Fields (CRFs)
• CRFs are now established as the state of the art
• Segment-level and grammar-based CRFs are not yet as popular
Further Readings
• Active learning
• Bootstrapping from structured data
• Transfer learning and domain adaptation
• Collective inference