Chunking with Support Vector Machines Zheng Luo
Background • Chunking • Identify proper chunks from a sequence of tokens. • Classify these chunks into some grammatical classes. • Support Vector Machine • Soft-margin hyperplane. • Polynomial kernel K(x_i, x_j) = (x_i · x_j + 1)^d, which implicitly considers all combinations of features up to degree d.
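As a minimal sketch of why the polynomial kernel captures feature combinations (the vectors and the functions `poly_kernel`/`phi` are my own illustration, not from the slides): for d = 2 in two dimensions, (x · z + 1)^2 equals a dot product in an explicit feature space containing constant, linear, squared, and pairwise-product features.

```python
# Illustration: the degree-2 polynomial kernel equals a dot product
# in an expanded feature space of all monomials up to degree 2.
import numpy as np

def poly_kernel(x, z, d=2):
    """Polynomial kernel K(x, z) = (x . z + 1)^d."""
    return (np.dot(x, z) + 1.0) ** d

def phi(x):
    """Explicit degree-2 feature map for a 2-dimensional input:
    constant, linear, squared, and pairwise-product features."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([0.5, -1.0])
z = np.array([2.0, 3.0])

# The kernel value equals the dot product in the expanded space, so the
# SVM uses all feature conjunctions up to degree d without building them.
assert np.isclose(poly_kernel(x, z), np.dot(phi(x), phi(z)))
print(poly_kernel(x, z))  # 1.0
```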
Chunk Representations • IOB1 • I - Current token is inside of a chunk. • O - Current token is outside of any chunk. • B - Current token is the beginning of a chunk which immediately follows another chunk. • IOB2 • B tag is given for every token which exists at the beginning of a chunk. • IOE1 • E tag is used to mark the last token of a chunk immediately preceding another chunk. • IOE2 • E tag is given for every token which exists at the end of a chunk.
Chunk Representations • IOBES • B - Current token is the start of a chunk consisting of more than one token. • E - Current token is the end of a chunk consisting of more than one token. • I - Current token is in the middle of a chunk consisting of more than two tokens. • S - Current token is a chunk consisting of only one token. • O - Current token is outside of any chunk. • A tagged example under all five schemes is sketched below.
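The following sketch (my own illustration; the sentence, the chunk spans, and the `tag` function are not from the slides) tags one NP-chunked sentence under each of the five representations from the two slides above.

```python
# Tag a sentence under IOB1 / IOB2 / IOE1 / IOE2 / IOBES.
def tag(tokens, chunks, scheme):
    """chunks: list of (start, end, type) spans with exclusive end."""
    tags = ["O"] * len(tokens)
    starts = {s for s, e, t in chunks}
    ends = {e for s, e, t in chunks}
    for s, e, t in chunks:
        for i in range(s, e):
            tags[i] = "I-" + t
        if scheme == "IOB2":
            tags[s] = "B-" + t                    # B for every chunk start
        elif scheme == "IOE2":
            tags[e - 1] = "E-" + t                # E for every chunk end
        elif scheme == "IOB1" and s in ends:
            tags[s] = "B-" + t                    # B only after another chunk
        elif scheme == "IOE1" and e in starts:
            tags[e - 1] = "E-" + t                # E only before another chunk
        elif scheme == "IOBES":
            if e - s == 1:
                tags[s] = "S-" + t                # single-token chunk
            else:
                tags[s] = "B-" + t
                tags[e - 1] = "E-" + t

    return tags

tokens = ["He", "reckons", "the", "current", "account", "deficit"]
chunks = [(0, 1, "NP"), (2, 6, "NP")]  # "He" / "the current account deficit"
for scheme in ["IOB1", "IOB2", "IOE1", "IOE2", "IOBES"]:
    print(scheme, tag(tokens, chunks, scheme))
```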
Chunking with SVMs • One vs. All Others • Requires K SVMs for K different classes. • Pairwise Classification • Requires K × (K − 1)/2 SVMs for K different classes. • Pairwise classification is used in this paper: • Better performance. • Each individual SVM is trained on a smaller subset of the data, which keeps training tractable.
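A minimal sketch of the pairwise strategy (the toy data and parameter choices are my assumptions, not the paper's setup): scikit-learn's SVC happens to train K × (K − 1)/2 binary classifiers internally (one-vs-one), which matches pairwise classification, and coef0=1 with gamma=1 reproduces the kernel (x · z + 1)^d.

```python
# Pairwise (one-vs-one) multi-class SVM with a polynomial kernel.
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Toy data standing in for the chunker's feature vectors; in the real
# task, X would encode word/POS context features and y the chunk tags.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=10, n_classes=4, random_state=0)

# decision_function_shape="ovo" exposes the raw pairwise scores; the
# default prediction already aggregates the K*(K-1)/2 classifiers.
clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0,
          decision_function_shape="ovo")
clf.fit(X, y)
print(clf.predict(X[:5]))
print(clf.decision_function(X[:2]).shape)  # (2, 6) = K*(K-1)/2 scores
```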
Feature Design • Surrounding context (features of the tokens around the current position). • Two parsing directions: • Forward parsing. • Backward parsing.
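A minimal sketch of such context features (the window size, feature names, and example sentence are illustrative assumptions): when parsing forward, the classifier for position i can use surrounding words and POS tags plus the chunk tags already decided for the preceding tokens; backward parsing mirrors this with the tags decided to the right.

```python
# Context features for classifying the token at position i (forward parse).
def features(words, pos, prev_tags, i, window=2):
    feats = []
    for k in range(-window, window + 1):
        j = i + k
        if 0 <= j < len(words):
            feats.append(f"w[{k}]={words[j]}")
            feats.append(f"pos[{k}]={pos[j]}")
    # Dynamic features: tags of already-classified tokens to the left.
    for k in range(1, window + 1):
        if i - k >= 0:
            feats.append(f"tag[-{k}]={prev_tags[i - k]}")
    return feats

words = ["He", "reckons", "the", "current", "account"]
pos = ["PRP", "VBZ", "DT", "JJ", "NN"]
prev_tags = ["B-NP", "O"]  # decisions already made for positions 0 and 1
print(features(words, pos, prev_tags, i=2))
```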
Weighted Voting • There are multiple systems of SVMs due to the different combinations of chunk representations and parsing directions; each system contains K × (K − 1)/2 SVMs for pairwise classification. • (1) Uniform weights: every system votes with the same weight. • (2) Cross validation: final voting weights are given by the average of the N accuracies from N-fold cross validation.
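A minimal sketch of cross-validation weighting (my own illustration; the three classifiers with different C values merely stand in for distinct systems, which in the paper differ by chunk representation and parsing direction):

```python
# Voting weight per system = mean accuracy over N-fold cross validation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Stand-ins for the paper's systems (there, 4 representations x 2
# parsing directions = 8 systems, each seeing its own data encoding).
systems = [SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0, C=c)
           for c in (0.1, 1.0, 10.0)]

weights = [np.mean(cross_val_score(clf, X, y, cv=5)) for clf in systems]
print(weights)  # used as voting weights at test time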
Weighted Voting • (3) VC bound: w = 1 − (VC bound), where the bound is computed from an estimate of the VC dimension, using D ≈ the maximum distance from the origin to a training sample.
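For reference, this is a hedged reconstruction of the standard VC generalization bound (Vapnik) that w = 1 − (VC bound) presumably refers to, with l training samples and an estimated VC dimension h (itself obtained from the data using D and the margin):

```latex
% With probability at least 1 - \eta:
\[
  R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}
\]
```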
Weighted Voting • (4) Leave-one-out bound: w = 1 − E_loo, where E_loo is the leave-one-out error estimate, bounded above by (number of support vectors) / (number of training samples).
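A minimal sketch of this weighting (my own illustration, again on toy data): an SVM's leave-one-out error is at most the fraction of training samples that are support vectors, so the weight needs nothing beyond the trained model.

```python
# Leave-one-out bound weighting: w = 1 - E_loo, with
# E_loo <= (#support vectors) / (#training samples).
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0).fit(X, y)

e_loo_bound = len(clf.support_) / len(X)  # Vapnik's LOO bound
w = 1.0 - e_loo_bound
print(f"support vectors: {len(clf.support_)}, weight: {w:.3f}")
```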
Experiment Setting • 2 parsing directions for each of the 4 chunk representations (IOB1, IOB2, IOE1, IOE2) give 8 systems of SVMs in total. • Each system's output is converted to each of the 4 chunk representations, making the results comparable in a uniform representation (for weighted voting). • 4 uniform representations × 4 types of weights = 16 results for a given dataset using the 8 systems of SVMs.
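A small sketch of these combinatorics (my own illustration, just enumerating the labels used above):

```python
# 4 representations x 2 directions = 8 systems;
# 4 uniform representations x 4 weight types = 16 voted results.
from itertools import product

representations = ["IOB1", "IOB2", "IOE1", "IOE2"]
directions = ["forward", "backward"]
weightings = ["uniform", "cross-validation", "VC bound", "leave-one-out"]

systems = list(product(representations, directions))
print(len(systems))  # 8 systems of SVMs

voted_results = list(product(representations, weightings))
print(len(voted_results))  # 16 results per dataset
```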
Experiment Results • Accuracy measure: F_{β=1} = 2PR / (P + R), where P is precision and R is recall. • baseNP-L: some experiments are omitted because the dataset is too large. • Accuracy vs. chunk representation: SVMs perform well regardless of the chunk representation, owing to their high generalization performance and their ability (via the polynomial kernel) to select optimal feature combinations for the given task.
Experiment Results • Effects of weighted voting: weighted voting achieves higher accuracy than any single-representation system, regardless of the choice of voting weights. • The VC bound can accurately predict the error rate on the "true" test data. • Performance: VC-bound weighting performs nearly as well as cross-validation weighting; the leave-one-out bound performs worse.