Japanese Dependency Structure Analysis Based on Maximum Entropy Models Kiyotaka Uchimoto † Satoshi Sekine ‡ Hitoshi Isahara † † Kansai Advanced Research Center, Communications Research Laboratory ‡ New York University
Outline • Background • Probability model for estimating dependency likelihood • Experiments and discussion • Conclusion
Background • Japanese dependency structure analysis • Example: 太郎は赤いバラを買いました。(Taro_wa aka_i bara_wo kai_mashita; "Taro bought a red rose."), segmented into bunsetsus: 太郎は / 赤い / バラを / 買いました。 • Preparing a dependency matrix • Finding an optimal set of dependencies for the entire sentence
Background (2) • Approaches to preparing a dependency matrix • Rule-based approach • Several problems with handcrafted rules • Coverage and consistency • The rules have to be changed according to the target domain. • Corpus-based approach
Background (3) • Corpus-based approach • Learning the likelihoods of dependencies from a tagged corpus (Collins, 1996; Fujio and Matsumoto, 1998; Haruno et al., 1998) • Probability estimation based on the maximum entropy models (Ratnaparkhi, 1997) • Maximum Entropy model • learns the weights of given features from a training corpus
Probability model • Assigning one of two tags to each pair of bunsetsus: whether or not there is a dependency between them • Probabilities of dependencies are estimated by the M. E. model • Overall dependency structure of a sentence • Product of the probabilities of all its dependencies • Assumption: dependencies are independent of each other
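The model above can be sketched in a few lines: the probability of a whole dependency structure is the product of pairwise dependency probabilities, each of which the slides estimate with a maximum entropy classifier. Here the M. E. model is replaced by a toy stub (`me_dependency_prob`), so all names and the distance heuristic are illustrative, not the authors' implementation.

```python
# Minimal sketch of the probability model, assuming a stubbed-out M.E. classifier.
from math import prod

def me_dependency_prob(anterior: int, posterior: int) -> float:
    """Stub for the M.E. model: P(dependency | features of the bunsetsu pair).
    A real model would extract features and evaluate exp(sum of weights) / Z.
    Toy heuristic for illustration only: closer pairs are more likely to depend."""
    return 1.0 / (posterior - anterior)

def sentence_prob(heads: list) -> float:
    """Probability of a whole dependency structure, under the slides' assumption
    that the individual dependencies are independent of each other."""
    # The rightmost bunsetsu has no head, so it contributes no factor.
    return prod(me_dependency_prob(i, h) for i, h in enumerate(heads[:-1]))

# 太郎は / 赤い / バラを / 買いました。: bunsetsus 0 and 2 depend on 3, 1 on 2.
print(sentence_prob([3, 2, 3, -1]))  # ≈ 0.333
```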
Feature sets • Basic features (expanded from Haruno’s list (Haruno et al., 1998)) • Attributes of a bunsetsu itself • Character strings, parts of speech, and inflection types of the bunsetsu • Attributes between bunsetsus • Existence of punctuation, and the distance between the bunsetsus • Combined features
Feature sets • [Figure: the example sentence 太郎は / 赤い / バラを / 買いました。 with a dependency between an anterior and a posterior bunsetsu; features a, b label the "Head" and "Type" of the anterior bunsetsu, c, d the "Head" and "Type" of the posterior bunsetsu, and e the gap between them] • Basic features: a, b, c, d, e • Combined features • Twin: (b,c), Triplet: (b,c,e), Quadruplet: (a,b,c,d), Quintuplet: (a,b,c,d,e)
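The basic/combined feature scheme can be sketched as follows: a, b are attributes of the anterior bunsetsu, c, d of the posterior one, and e describes the gap; combined features are conjunctions of the basic ones. The attribute names (`head_pos`, `type`) and the exact contents of a–e are assumptions for illustration, not the paper's exact feature inventory.

```python
# Hypothetical sketch of basic features a-e and their combinations.
def basic_features(anterior: dict, posterior: dict, distance: int) -> dict:
    return {
        "a": anterior["head_pos"],    # POS of the anterior head word
        "b": anterior["type"],        # e.g. the particle ending the anterior bunsetsu
        "c": posterior["head_pos"],   # POS of the posterior head word
        "d": posterior["type"],
        "e": f"dist:{distance}",      # feature of the gap between the bunsetsus
    }

def combined_features(f: dict) -> list:
    """Conjunctions of basic features, mirroring the twin/triplet/... scheme."""
    return [
        ("twin",       (f["b"], f["c"])),
        ("triplet",    (f["b"], f["c"], f["e"])),
        ("quadruplet", (f["a"], f["b"], f["c"], f["d"])),
        ("quintuplet", (f["a"], f["b"], f["c"], f["d"], f["e"])),
    ]

ant = {"head_pos": "Noun", "type": "wa"}
post = {"head_pos": "Verb", "type": "."}
print(combined_features(basic_features(ant, post, 3)))
```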
Algorithm • Detect the dependencies in a sentence by analyzing it backwards (from right to left). • Characteristics of Japanese dependencies • Dependencies are directed from left to right • Dependencies do not cross • A bunsetsu, except for the rightmost one, depends on only one bunsetsu • In many cases, the left context is not necessary to determine a dependency • Beam search
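The backward analysis with the constraints listed above (rightward, non-crossing, exactly one head per non-final bunsetsu) can be sketched as a right-to-left beam search. This is a minimal reconstruction under those stated constraints; the beam width, scoring stub, and pruning details are assumptions, not the authors' exact algorithm.

```python
# Sketch of backward (right-to-left) parsing with a beam search, assuming
# the three constraints from the slides: arcs point left-to-right, arcs do
# not cross, and every bunsetsu but the last depends on exactly one later one.
import heapq

def crosses(heads: list, dep: int, head: int) -> bool:
    """A new arc dep->head crosses an existing arc d->h iff exactly one
    endpoint of one arc lies strictly inside the span of the other."""
    for d, h in enumerate(heads):
        if h is None:
            continue
        if (d < dep < h < head) or (dep < d < head < h):
            return True
    return False

def parse(n: int, dep_prob, beam_width: int = 5) -> list:
    """Return the highest-scoring head assignment for n bunsetsus."""
    beam = [(1.0, [None] * n)]            # (score, partial head assignment)
    for dep in range(n - 2, -1, -1):      # analyze the sentence backwards
        candidates = []
        for score, heads in beam:
            for head in range(dep + 1, n):
                if crosses(heads, dep, head):
                    continue              # keep the structure non-crossing
                new = heads.copy()
                new[dep] = head
                candidates.append((score * dep_prob(dep, head), new))
        beam = heapq.nlargest(beam_width, candidates, key=lambda x: x[0])
    return beam[0][1]

# With a toy "closer is likelier" scorer, each bunsetsu attaches to its neighbor.
print(parse(4, lambda d, h: 1.0 / (h - d)))
```

Analyzing backwards means that when bunsetsu `dep` is processed, the heads of everything to its right are already fixed, which is what makes the crossing check against the partial structure sufficient.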
Experiments • Using the Kyoto University text corpus (Kurohashi and Nagao, 1997) • a tagged corpus of the Mainichi newspaper • Training: 7,958 sentences (Jan. 1st to 8th) • Testing: 1,246 sentences (Jan. 9th) • The input sentences were morphologically analyzed and their bunsetsus were identified correctly.
Results of dependency analysis • When analyzing a sentence backwards, the previous context has almost no effect on the accuracy.
Relationship between the number of bunsetsus and accuracy • [Figure: dependency accuracy (around 0.8714) plotted against the number of bunsetsus per sentence] • The accuracy does not significantly degrade with increasing sentence length.
Features and accuracy • Experiments without each feature set • Useful basic features • Type of the anterior bunsetsu (−17.41%) and the part-of-speech tag of the head word of the posterior bunsetsu (−10.99%) • Distance between bunsetsus (−2.50%), the existence of punctuation in the bunsetsu (−2.52%), and the existence of brackets (−1.06%) • Preferential rules with respect to the features
Features and accuracy • Experiments without the feature sets • Combined features are useful (-18.31%). • Basic features are related to each other.
Lexical features and accuracy • Experiment with the lexical features of the head word • Better accuracy than without them (removing them costs 0.84%) • Many idiomatic expressions • They had high dependency probabilities • “応じて (oujite, according to) --- 決める (kimeru, decide)” • “形で (katachi_de, in the form of) --- 行われる (okonawareru, be held)” • More training data • We expect to collect more such expressions
Amount of training data and accuracy • Accuracy of 81.84% even with only 250 training sentences • The M. E. framework has suitable characteristics for overcoming the data-sparseness problem.
Comparison with related works (2) • Combining a parser based on a handmade CFG with a probabilistic dependency model (Shirai, 1998) • Uses several corpora: the EDR corpus, the RWC corpus, and the Kyoto University corpus • The accuracy achieved by our model was about 3% higher than that of Shirai’s model • even though ours used a much smaller set of training data
Comparison with related works (3) • M. E. model (Ehara, 1998) • A set of features similar to ours • But only combinations of two features • Uses TV news articles for training and testing • Average sentence length = 17.8 bunsetsus • cf. 10 in the Kyoto University corpus • Difference in the combined features • We also use triplet, quadruplet, and quintuplet features (+5.86%) • Accuracy of our system was about 10% higher than that of Ehara’s system.
Comparison with related works (4) • Maximum Likelihood model (Fujio, 1998) • Decision tree models and a boosting method (Haruno, 1998) • A set of features similar to ours • Using the EDR corpus for training and testing • The EDR corpus is ten times as large as ours • Accuracy was around 85%, slightly worse than ours.
Comparison with related works (5) • Experiments with Fujio’s and Haruno’s feature sets • The important factor in the statistical approaches is feature selection.
Future work • Feature selection • Automatic feature selection (Berger, 1996, 1998; Shirai, 1998) • Considering new features • How to deal with coordinate structures • Taking into account a wide range of information
Conclusion • Japanese dependency structure analysis based on the M. E. model • Dependency accuracy of our system • 87.2% on the Kyoto University corpus • Experiments without feature sets • Some basic and combined features contribute strongly to improving the accuracy • Amount of training data and accuracy • Good accuracy even with a small training set • The M. E. framework has suitable characteristics for overcoming the data-sparseness problem