Chunking with Support Vector Machines Zheng Luo
Background • Chunking • Identify proper chunks from a sequence of tokens. • Classify these chunks into some grammatical classes. • Support Vector Machine • Soft-margin hyperplane. • Polynomial kernel K(x_i, x_j) = (x_i · x_j + 1)^d, which implicitly considers all combinations of features up to degree d.
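As a minimal sketch of why the polynomial kernel captures feature combinations (the vectors and the functions `poly_kernel`/`phi` are my own illustration, not from the slides): for d = 2 in two dimensions, (x · z + 1)^2 equals a dot product in an explicit feature space containing constant, linear, squared, and pairwise-product features.

```python
# Illustration: the degree-2 polynomial kernel equals a dot product
# in an expanded feature space of all monomials up to degree 2.
import numpy as np

def poly_kernel(x, z, d=2):
    """Polynomial kernel K(x, z) = (x . z + 1)^d."""
    return (np.dot(x, z) + 1.0) ** d

def phi(x):
    """Explicit degree-2 feature map for a 2-dimensional input:
    constant, linear, squared, and pairwise-product features."""
    x1, x2 = x
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([0.5, -1.0])
z = np.array([2.0, 3.0])

# The kernel value equals the dot product in the expanded space, so the
# SVM uses all feature conjunctions up to degree d without building them.
assert np.isclose(poly_kernel(x, z), np.dot(phi(x), phi(z)))
print(poly_kernel(x, z))  # 1.0
```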
Chunk Representations • IOB1 • I - Current token is inside of a chunk. • O - Current token is outside of any chunk. • B - Current token is the beginning of a chunk which immediately follows another chunk. • IOB2 • B tag is given for every token which exists at the beginning of a chunk. • IOE1 • E tag is used to mark the last token of a chunk immediately preceding another chunk. • IOE2 • E tag is given for every token which exists at the end of a chunk.
Chunk Representations • IOBES • B - Current token is the start of a chunk consisting of more than one token. • E - Current token is the end of a chunk consisting of more than one token. • I - Current token is in the middle of a chunk consisting of more than two tokens. • S - Current token is a chunk consisting of only one token. • O - Current token is outside of any chunk. • A tagged example under all five schemes is sketched below.
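The following sketch (my own illustration; the sentence, the chunk spans, and the `tag` function are not from the slides) tags one NP-chunked sentence under each of the five representations from the two slides above.

```python
# Tag a sentence under IOB1 / IOB2 / IOE1 / IOE2 / IOBES.
def tag(tokens, chunks, scheme):
    """chunks: list of (start, end, type) spans with exclusive end."""
    tags = ["O"] * len(tokens)
    starts = {s for s, e, t in chunks}
    ends = {e for s, e, t in chunks}
    for s, e, t in chunks:
        for i in range(s, e):
            tags[i] = "I-" + t
        if scheme == "IOB2":
            tags[s] = "B-" + t                    # B for every chunk start
        elif scheme == "IOE2":
            tags[e - 1] = "E-" + t                # E for every chunk end
        elif scheme == "IOB1" and s in ends:
            tags[s] = "B-" + t                    # B only after another chunk
        elif scheme == "IOE1" and e in starts:
            tags[e - 1] = "E-" + t                # E only before another chunk
        elif scheme == "IOBES":
            if e - s == 1:
                tags[s] = "S-" + t                # single-token chunk
            else:
                tags[s] = "B-" + t
                tags[e - 1] = "E-" + t

    return tags

tokens = ["He", "reckons", "the", "current", "account", "deficit"]
chunks = [(0, 1, "NP"), (2, 6, "NP")]  # "He" / "the current account deficit"
for scheme in ["IOB1", "IOB2", "IOE1", "IOE2", "IOBES"]:
    print(scheme, tag(tokens, chunks, scheme))
```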
Chunking with SVMs • One vs. All Others • Requires K SVMs for K different classes. • Pairwise Classification • Requires K × (K − 1)/2 SVMs for K different classes. • Pairwise classification is used in this paper: • Better performance. • Each individual SVM is trained on a smaller subset of the data, which keeps training tractable.
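A minimal sketch of the pairwise strategy (the toy data and parameter choices are my assumptions, not the paper's setup): scikit-learn's SVC happens to train K × (K − 1)/2 binary classifiers internally (one-vs-one), which matches pairwise classification, and coef0=1 with gamma=1 reproduces the kernel (x · z + 1)^d.

```python
# Pairwise (one-vs-one) multi-class SVM with a polynomial kernel.
from sklearn.svm import SVC
from sklearn.datasets import make_classification

# Toy data standing in for the chunker's feature vectors; in the real
# task, X would encode word/POS context features and y the chunk tags.
X, y = make_classification(n_samples=200, n_features=20,
                           n_informative=10, n_classes=4, random_state=0)

# decision_function_shape="ovo" exposes the raw pairwise scores; the
# default prediction already aggregates the K*(K-1)/2 classifiers.
clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0,
          decision_function_shape="ovo")
clf.fit(X, y)
print(clf.predict(X[:5]))
print(clf.decision_function(X[:2]).shape)  # (2, 6) = K*(K-1)/2 scores
```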
Feature Design • Surrounding context (features of the tokens around the current position). • Two parsing directions: • Forward parsing. • Backward parsing.
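A minimal sketch of such context features (the window size, feature names, and example sentence are illustrative assumptions): when parsing forward, the classifier for position i can use surrounding words and POS tags plus the chunk tags already decided for the preceding tokens; backward parsing mirrors this with the tags decided to the right.

```python
# Context features for classifying the token at position i (forward parse).
def features(words, pos, prev_tags, i, window=2):
    feats = []
    for k in range(-window, window + 1):
        j = i + k
        if 0 <= j < len(words):
            feats.append(f"w[{k}]={words[j]}")
            feats.append(f"pos[{k}]={pos[j]}")
    # Dynamic features: tags of already-classified tokens to the left.
    for k in range(1, window + 1):
        if i - k >= 0:
            feats.append(f"tag[-{k}]={prev_tags[i - k]}")
    return feats

words = ["He", "reckons", "the", "current", "account"]
pos = ["PRP", "VBZ", "DT", "JJ", "NN"]
prev_tags = ["B-NP", "O"]  # decisions already made for positions 0 and 1
print(features(words, pos, prev_tags, i=2))
```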
Weighted Voting • There are multiple systems of SVMs due to the different combinations of chunk representations and parsing directions; each system contains K × (K − 1)/2 SVMs for pairwise classification. • (1) Uniform weights: every system votes with the same weight. • (2) Cross validation: final voting weights are given by the average of the N accuracies from N-fold cross validation.
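A minimal sketch of cross-validation weighting (my own illustration; the three classifiers with different C values merely stand in for distinct systems, which in the paper differ by chunk representation and parsing direction):

```python
# Voting weight per system = mean accuracy over N-fold cross validation.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Stand-ins for the paper's systems (there, 4 representations x 2
# parsing directions = 8 systems, each seeing its own data encoding).
systems = [SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0, C=c)
           for c in (0.1, 1.0, 10.0)]

weights = [np.mean(cross_val_score(clf, X, y, cv=5)) for clf in systems]
print(weights)  # used as voting weights at test time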
Weighted Voting • (3) VC bound: w = 1 − (VC bound), where the bound is computed from an estimate of the VC dimension, using D ≈ the maximum distance from the origin to a training sample.
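For reference, this is a hedged reconstruction of the standard VC generalization bound (Vapnik) that w = 1 − (VC bound) presumably refers to, with l training samples and an estimated VC dimension h (itself obtained from the data using D and the margin):

```latex
% With probability at least 1 - \eta:
\[
  R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
  + \sqrt{\frac{h\left(\ln\frac{2l}{h} + 1\right) - \ln\frac{\eta}{4}}{l}}
\]
```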
Weighted Voting • (4) Leave-one-out bound: w = 1 − E_loo, where E_loo is the leave-one-out error estimate, bounded above by (number of support vectors) / (number of training samples).
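A minimal sketch of this weighting (my own illustration, again on toy data): an SVM's leave-one-out error is at most the fraction of training samples that are support vectors, so the weight needs nothing beyond the trained model.

```python
# Leave-one-out bound weighting: w = 1 - E_loo, with
# E_loo <= (#support vectors) / (#training samples).
from sklearn.svm import SVC
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=20, random_state=0)
clf = SVC(kernel="poly", degree=2, coef0=1.0, gamma=1.0).fit(X, y)

e_loo_bound = len(clf.support_) / len(X)  # Vapnik's LOO bound
w = 1.0 - e_loo_bound
print(f"support vectors: {len(clf.support_)}, weight: {w:.3f}")
```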
Experiment Setting • 2 parsing directions for each of the 4 chunk representations (IOB1, IOB2, IOE1, IOE2) give 8 systems of SVMs in total. • Each system's output is converted to each of the 4 chunk representations, making the results comparable in a uniform representation (for weighted voting). • 4 uniform representations × 4 types of weights = 16 results for a given dataset using the 8 systems of SVMs.
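A small sketch of these combinatorics (my own illustration, just enumerating the labels used above):

```python
# 4 representations x 2 directions = 8 systems;
# 4 uniform representations x 4 weight types = 16 voted results.
from itertools import product

representations = ["IOB1", "IOB2", "IOE1", "IOE2"]
directions = ["forward", "backward"]
weightings = ["uniform", "cross-validation", "VC bound", "leave-one-out"]

systems = list(product(representations, directions))
print(len(systems))  # 8 systems of SVMs

voted_results = list(product(representations, weightings))
print(len(voted_results))  # 16 results per dataset
```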
Experiment Results • Accuracy measure: F_{β=1} = 2PR / (P + R), where P is precision and R is recall. • baseNP-L: some experiments are omitted because the dataset is too large. • Accuracy vs. chunk representation: SVMs perform well regardless of the chunk representation, owing to their high generalization performance and their ability (via the polynomial kernel) to select optimal feature combinations for the given task.
Experiment Results • Effects of weighted voting: weighted voting achieves higher accuracy than any single-representation system, regardless of the choice of voting weights. • The VC bound can accurately predict the error rate on the "true" test data. • Performance: VC-bound weighting performs nearly as well as cross-validation weighting; the leave-one-out bound performs worse.