An overview of learning HMM transition structures for information extraction tasks: the structure-learning algorithm, a comparison with existing algorithms, experimental results, and conclusions.
School of Computer Science Information Extraction with HMM Structures Learned by Stochastic Optimization Dayne Freitag and Andrew McCallum Presented by Tal Blum for the course: Machine Learning Approaches to Information Extraction and Information Integration
Outline • Background on HMM transition structure selection • The algorithm for the sparse IE task • Comparison between this algorithm and the algorithm of Borkar et al. • Discussion • Results
HMMs for IE • HMMs have been used successfully in many tasks: • Speech Recognition • Information Extraction (Bikel et al., Borkar et al.) • IE in Bioinformatics (Leek) • POS Tagging (Ratnaparkhi)
Sparse Extraction Task • Fields are extracted from a long document • Most of the document is irrelevant • Examples: • Named entities • Conference time & location
HMM as a Dynamic BN • (Figure: an HMM unrolled as a BN, with hidden states S1, S2, S3 emitting observations Obs1, Obs2, Obs3 over time t, shown alongside a general BN over variables X, Y, Z, W) • Learning HMM structure?
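To make the dynamic-BN view concrete, here is a minimal sketch (toy probabilities, not from the paper) of how an HMM's joint probability factorizes into per-step transition terms P(s_t | s_{t-1}) and emission terms P(o_t | s_t):

```python
# A minimal sketch (toy numbers, not from the paper) of the dynamic-BN factorization:
# P(s_1..s_T, o_1..o_T) = P(s_1) P(o_1|s_1) * prod_{t>1} P(s_t|s_{t-1}) P(o_t|s_t)
import numpy as np

pi = np.array([0.7, 0.3])            # P(s_1)
A = np.array([[0.8, 0.2],            # A[i, j] = P(s_t = j | s_{t-1} = i)
              [0.4, 0.6]])
B = np.array([[0.9, 0.1],            # B[i, k] = P(o_t = k | s_t = i)
              [0.2, 0.8]])

def joint_prob(states, obs):
    """Probability of a full state/observation sequence under the HMM."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

print(joint_prob([0, 0, 1], [0, 0, 1]))   # -> 0.7*0.9 * 0.8*0.9 * 0.2*0.8 ≈ 0.0726
```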
Constrained Transition • (Figure: example transition structures over states X1–X4, contrasting unconstrained with constrained transition graphs)
HMM Structure Learning • (Figure: two candidate structures, C1 and C2, for an address-parsing HMM with states such as St. #, street, Zip code, country) • Unlike BN structure learning • Learn the structure of the transition matrix A • Learn structures with different numbers of states
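One simple way to picture what "learning the transition structure" means: treat the structure as a binary mask over the transition matrix A. The sketch below (an assumed representation, not the authors' code) forbids some transitions by zeroing them out and renormalizing:

```python
# Sketch of one possible representation (assumed, not the authors' code): a transition
# structure is a binary mask over the transition matrix A; structure learning then means
# choosing which entries may be non-zero and how many states the model has.
import numpy as np

np.random.seed(0)
structure = np.array([[1, 1, 0, 0],   # state 0 may transition to states 0 and 1
                      [0, 1, 1, 0],   # state 1 may transition to states 1 and 2
                      [0, 0, 1, 1],
                      [1, 0, 0, 1]])

A = np.random.rand(4, 4) * structure          # forbidden transitions stay at zero
A = A / A.sum(axis=1, keepdims=True)          # renormalize each row
print(A)
```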
Why learn HMM structure? • HMMs are not specifically suited to IE tasks • Adding structural bias reduces the number of parameters to learn, and therefore requires less data • The parameters will be more accurate • Constrain the number of times a class can appear in a document • Represent class lengths more accurately • The emission probability of a class may be multi-modal • Model the left and right context of a class for the sparse IE task
Fully Observed vs. Partially Observed • Structure learning is only required when the data is partially observed • Partially observed – a field is represented by several states, but only the field label is given • With fully observed data we can let the probabilities "learn" the structure • Edges that are never observed get zero probability • Learning the transition structure involves introducing new states • Naively allowing arbitrary transitions will not generalize well
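A small illustration of the fully observed case (hypothetical labeled state sequences): maximum-likelihood transition estimates are plain ratios of counts, so edges that are never observed get probability zero and the structure falls out of the data for free:

```python
# Toy illustration (hypothetical labeled paths): with fully observed state sequences,
# ML transition estimates are ratios of counts, so unobserved edges get probability 0.
import numpy as np

state_seqs = [[0, 1, 1, 2], [0, 1, 2, 2]]     # two labeled paths over 3 states
counts = np.zeros((3, 3))
for seq in state_seqs:
    for a, b in zip(seq, seq[1:]):
        counts[a, b] += 1

row_sums = np.maximum(counts.sum(axis=1, keepdims=True), 1)   # avoid divide-by-zero
A = counts / row_sums
print(A)      # e.g. the edge 0 -> 2 was never observed, so A[0, 2] == 0
```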
The Problem • How to select the additional states and the state-transition structure • Manual selection doesn't scale well • Human intuition does not always correspond to the best structures
The Solution • A system that automatically selects an HMM transition structure • The system starts from an initial simple model and extends it sequentially with a set of operations, searching for a better model • Model quality is measured by its discrimination on a validation dataset • The best model found is returned • The learned structures are comparable with human-constructed HMM structures and on average outperform them
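The sketch below shows the general shape of such a greedy structure search; `fit_and_score` (train on the training set, score on the validation set) and the expansion operations are hypothetical stand-ins for the paper's actual operations and scoring:

```python
# Shape of the greedy structure search described above. `fit_and_score` (train on the
# training set, return a validation score) and the expansion ops are hypothetical stand-ins.
def learn_structure(initial_model, expand_ops, fit_and_score, max_steps=20):
    best, best_score = initial_model, fit_and_score(initial_model)
    for _ in range(max_steps):
        candidates = [op(best) for op in expand_ops]            # apply every expansion op
        top_score, top_model = max(((fit_and_score(m), m) for m in candidates),
                                   key=lambda pair: pair[0])
        if top_score <= best_score:                             # no candidate improves: stop
            break
        best, best_score = top_model, top_score
    return best

# Toy demo: a "model" is just a state count, and we pretend 7 states scores best.
ops = [lambda m: m + 1, lambda m: m + 2]
print(learn_structure(3, ops, fit_and_score=lambda m: -abs(m - 7)))   # -> 7
```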
IE with HMMs • Each extracted field has its own HMM • Each HMM contains two kinds of states: • Target states • Non-target states • All of the field HMMs are concatenated into a single consistent HMM • The entire document is used to train the models, with no pre-processing needed
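As a toy illustration (not the paper's system), extraction with a target/background HMM amounts to running Viterbi over the document's tokens and keeping the tokens aligned to target states:

```python
# Toy illustration (not the paper's system): extraction = run Viterbi over the document
# and keep the tokens that were aligned to the target state.
import numpy as np

def viterbi(obs, pi, A, B):
    """Most likely hidden-state path for an observation (token-id) sequence."""
    T, N = len(obs), len(pi)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A       # scores[i, j]: best path ending i -> j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) * B[:, obs[t]]
    path = [int(delta[-1].argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(psi[t, path[-1]]))
    return path[::-1]

# Two-state toy model: state 0 = background (non-target), state 1 = target field.
pi = np.array([1.0, 0.0])
A = np.array([[0.8, 0.2], [0.3, 0.7]])
B = np.array([[0.45, 0.45, 0.05, 0.05],    # background prefers "filler" tokens
              [0.05, 0.05, 0.45, 0.45]])   # target prefers "field" tokens
tokens = ["the", "talk", "3pm", "wean5409"]
path = viterbi([0, 1, 2, 3], pi, A, B)
print([tok for tok, s in zip(tokens, path) if s == 1])   # -> ['3pm', 'wean5409']
```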
Parameter Estimation • Transition probabilities are estimated by maximum likelihood • Unique paths – ratio of counts • Non-unique paths – use EM • Emission probabilities require smoothing with priors • Shrinkage, with weights estimated by EM
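Below is a rough sketch of shrinkage-style smoothing for one state's emission distribution; the mixture weights are fixed by hand here for illustration, whereas in the approach described above they are themselves estimated with EM:

```python
# Rough sketch of shrinkage smoothing for one state's emission distribution. The mixture
# weights are fixed here for illustration; in the approach above they are learned with EM.
import numpy as np

state_counts = np.array([3.0, 1.0, 0.0, 0.0, 0.0])     # sparse counts for this state
global_counts = np.array([10.0, 8.0, 6.0, 4.0, 2.0])   # counts pooled over all states

p_state = state_counts / state_counts.sum()
p_global = global_counts / global_counts.sum()
p_uniform = np.full(5, 1.0 / 5)

lam = [0.6, 0.3, 0.1]                                   # assumed fixed shrinkage weights
p_smoothed = lam[0] * p_state + lam[1] * p_global + lam[2] * p_uniform
print(p_smoothed)        # no word ends up with zero probability
```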
Learning State-Transition Structure • States: • Target • Prefix • Suffix • Background
Model Expansion Choices • Lengthen a prefix • Split a prefix • Lengthen a suffix • Split a suffix • Lengthen a target string • Split a target string • Add a background state
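A minimal sketch of how such expansion operations might be applied programmatically; the structure representation and helper names here are hypothetical, and only two of the listed operations are shown:

```python
# Hypothetical representation of a candidate structure and two of the expansion operations;
# the real system applies all of the operations listed above and keeps the best candidate.
import copy

def lengthen_prefix(model):
    """Add one more state to the prefix chain leading into the target."""
    m = copy.deepcopy(model)
    m["prefix"].append("pre%d" % len(m["prefix"]))
    return m

def add_background_state(model):
    """Add another background state (e.g. for a second 'mode' of filler text)."""
    m = copy.deepcopy(model)
    m["background"].append("bg%d" % len(m["background"]))
    return m

initial = {"background": ["bg0"], "prefix": [], "target": ["tgt0"], "suffix": []}
candidates = [op(initial) for op in (lengthen_prefix, add_background_state)]
print(candidates)
```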
Discussion • Structure learning here is similar to rule learning for word or boundary classification • The search for the best structure is not exhaustive • There is no attempt to generalize better by sharing emission probabilities across different states
Comparison with the Borkar et al. Algorithm • Differences: • Segmentation vs. sparse extraction • Modeling of background and boundaries • Unique paths – no need for EM • Backward search vs. forward search • Similarity: both assume boundaries, and that position is the most relevant feature for distinguishing states
Experimental Results • Tested on 8 extraction tasks over 4 datasets: • Seminar announcements (485) • Reuters corporate acquisition articles (600) • Job announcements (298) • Calls for papers (363) • Training and testing sets were of equal size • Performance averaged over 10 splits
Experimental Results • Grown HMM – the learned structure – was compared to 4 other approaches: • SRV – rule learning (Freitag 1998) • Rapier – rule learning (Califf 1998) • Simple hand-built HMM • Complex hand-built HMM
Conclusions • HMMs have proved to be a state-of-the-art method for IE • Constraining the transition structure has a crucial effect on performance • Automatic transition-structure learning matches and on average outperforms manually crafted HMM structures, which require laborious manual construction