Japanese Dependency Structure Analysis Based on Maximum Entropy Models Kiyotaka Uchimoto † Satoshi Sekine ‡ Hitoshi Isahara † † Kansai Advanced Research Center, Communications Research Laboratory ‡ New York University
Outline • Background • Probability model for estimating dependency likelihood • Experiments and discussion • Conclusion
Background • Japanese dependency structure analysis • Example: 太郎は赤いバラを買いました。(Taro_wa aka_i bara_wo kai_mashita; "Taro bought a red rose."), segmented into bunsetsus: 太郎は / 赤い / バラを / 買いました。 • Preparing a dependency matrix • Finding an optimal set of dependencies for the entire sentence
Background (2) • Approaches to preparing a dependency matrix • Rule-based approach • Several problems with handcrafted rules • Coverage and consistency • The rules have to be changed according to the target domain. • Corpus-based approach
Background (3) • Corpus-based approach • Learning the likelihoods of dependencies from a tagged corpus (Collins, 1996; Fujio and Matsumoto, 1998; Haruno et al., 1998) • Probability estimation based on the maximum entropy models (Ratnaparkhi, 1997) • Maximum Entropy model • learns the weights of given features from a training corpus
Probability model • Assigning one of two tags to each pair of bunsetsus: whether or not there is a dependency between them • Probabilities of dependencies are estimated by the M. E. model • Overall dependency structure of a sentence • Product of the probabilities of all its dependencies • Assumption: dependencies are independent of each other
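The model above can be sketched in a few lines: the probability of a whole dependency structure is the product of pairwise dependency probabilities, each of which the slides estimate with a maximum entropy classifier. Here the M. E. model is replaced by a toy stub (`me_dependency_prob`), so all names and the distance heuristic are illustrative, not the authors' implementation.

```python
# Minimal sketch of the probability model, assuming a stubbed-out M.E. classifier.
from math import prod

def me_dependency_prob(anterior: int, posterior: int) -> float:
    """Stub for the M.E. model: P(dependency | features of the bunsetsu pair).
    A real model would extract features and evaluate exp(sum of weights) / Z.
    Toy heuristic for illustration only: closer pairs are more likely to depend."""
    return 1.0 / (posterior - anterior)

def sentence_prob(heads: list) -> float:
    """Probability of a whole dependency structure, under the slides' assumption
    that the individual dependencies are independent of each other."""
    # The rightmost bunsetsu has no head, so it contributes no factor.
    return prod(me_dependency_prob(i, h) for i, h in enumerate(heads[:-1]))

# 太郎は / 赤い / バラを / 買いました。: bunsetsus 0 and 2 depend on 3, 1 on 2.
print(sentence_prob([3, 2, 3, -1]))  # ≈ 0.333
```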
Feature sets • Basic features (expanded from Haruno’s list (Haruno et al., 1998)) • Attributes of a bunsetsu itself • Character strings, parts of speech, and inflection types of the bunsetsu • Attributes between bunsetsus • Existence of punctuation, and the distance between the bunsetsus • Combined features
Feature sets • [Figure: the example sentence 太郎は / 赤い / バラを / 買いました。 with a dependency between an anterior and a posterior bunsetsu; features a, b label the "Head" and "Type" of the anterior bunsetsu, c, d the "Head" and "Type" of the posterior bunsetsu, and e the gap between them] • Basic features: a, b, c, d, e • Combined features • Twin: (b,c), Triplet: (b,c,e), Quadruplet: (a,b,c,d), Quintuplet: (a,b,c,d,e)
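The basic/combined feature scheme can be sketched as follows: a, b are attributes of the anterior bunsetsu, c, d of the posterior one, and e describes the gap; combined features are conjunctions of the basic ones. The attribute names (`head_pos`, `type`) and the exact contents of a–e are assumptions for illustration, not the paper's exact feature inventory.

```python
# Hypothetical sketch of basic features a-e and their combinations.
def basic_features(anterior: dict, posterior: dict, distance: int) -> dict:
    return {
        "a": anterior["head_pos"],    # POS of the anterior head word
        "b": anterior["type"],        # e.g. the particle ending the anterior bunsetsu
        "c": posterior["head_pos"],   # POS of the posterior head word
        "d": posterior["type"],
        "e": f"dist:{distance}",      # feature of the gap between the bunsetsus
    }

def combined_features(f: dict) -> list:
    """Conjunctions of basic features, mirroring the twin/triplet/... scheme."""
    return [
        ("twin",       (f["b"], f["c"])),
        ("triplet",    (f["b"], f["c"], f["e"])),
        ("quadruplet", (f["a"], f["b"], f["c"], f["d"])),
        ("quintuplet", (f["a"], f["b"], f["c"], f["d"], f["e"])),
    ]

ant = {"head_pos": "Noun", "type": "wa"}
post = {"head_pos": "Verb", "type": "."}
print(combined_features(basic_features(ant, post, 3)))
```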
Algorithm • Detect the dependencies in a sentence by analyzing it backwards (from right to left). • Characteristics of Japanese dependencies • Dependencies are directed from left to right • Dependencies do not cross • A bunsetsu, except for the rightmost one, depends on only one bunsetsu • In many cases, the left context is not necessary to determine a dependency • Beam search
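The backward analysis with the constraints listed above (rightward, non-crossing, exactly one head per non-final bunsetsu) can be sketched as a right-to-left beam search. This is a minimal reconstruction under those stated constraints; the beam width, scoring stub, and pruning details are assumptions, not the authors' exact algorithm.

```python
# Sketch of backward (right-to-left) parsing with a beam search, assuming
# the three constraints from the slides: arcs point left-to-right, arcs do
# not cross, and every bunsetsu but the last depends on exactly one later one.
import heapq

def crosses(heads: list, dep: int, head: int) -> bool:
    """A new arc dep->head crosses an existing arc d->h iff exactly one
    endpoint of one arc lies strictly inside the span of the other."""
    for d, h in enumerate(heads):
        if h is None:
            continue
        if (d < dep < h < head) or (dep < d < head < h):
            return True
    return False

def parse(n: int, dep_prob, beam_width: int = 5) -> list:
    """Return the highest-scoring head assignment for n bunsetsus."""
    beam = [(1.0, [None] * n)]            # (score, partial head assignment)
    for dep in range(n - 2, -1, -1):      # analyze the sentence backwards
        candidates = []
        for score, heads in beam:
            for head in range(dep + 1, n):
                if crosses(heads, dep, head):
                    continue              # keep the structure non-crossing
                new = heads.copy()
                new[dep] = head
                candidates.append((score * dep_prob(dep, head), new))
        beam = heapq.nlargest(beam_width, candidates, key=lambda x: x[0])
    return beam[0][1]

# With a toy "closer is likelier" scorer, each bunsetsu attaches to its neighbor.
print(parse(4, lambda d, h: 1.0 / (h - d)))
```

Analyzing backwards means that when bunsetsu `dep` is processed, the heads of everything to its right are already fixed, which is what makes the crossing check against the partial structure sufficient.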
Experiments • Using the Kyoto University text corpus (Kurohashi and Nagao, 1997) • a tagged corpus of the Mainichi newspaper • Training: 7,958 sentences (Jan. 1st to 8th) • Testing: 1,246 sentences (Jan. 9th) • The input sentences were morphologically analyzed and their bunsetsus were identified correctly.
Results of dependency analysis • When analyzing a sentence backwards, the previous context has almost no effect on the accuracy.
Relationship between the number of bunsetsus and accuracy • [Figure: dependency accuracy (around 0.8714) plotted against the number of bunsetsus per sentence] • The accuracy does not significantly degrade with increasing sentence length.
Features and accuracy • Experiments without each feature set • Useful basic features • Type of the anterior bunsetsu (−17.41%) and the part-of-speech tag of the head word of the posterior bunsetsu (−10.99%) • Distance between bunsetsus (−2.50%), the existence of punctuation in the bunsetsu (−2.52%), and the existence of brackets (−1.06%) • Preferential rules with respect to the features
Features and accuracy • Experiments without the feature sets • Combined features are useful (-18.31%). • Basic features are related to each other.
Lexical features and accuracy • Experiment with the lexical features of the head word • Better accuracy than without them (removing them costs 0.84%) • Many idiomatic expressions • They had high dependency probabilities • “応じて (oujite, according to) --- 決める (kimeru, decide)” • “形で (katachi_de, in the form of) --- 行われる (okonawareru, be held)” • More training data • We expect to collect more such expressions
Amount of training data and accuracy • Accuracy of 81.84% even with only 250 training sentences • The M. E. framework has suitable characteristics for overcoming the data-sparseness problem.
Comparison with related works (2) • Combining a parser based on a handmade CFG with a probabilistic dependency model (Shirai, 1998) • Uses several corpora: the EDR corpus, the RWC corpus, and the Kyoto University corpus • The accuracy achieved by our model was about 3% higher than that of Shirai’s model • even though ours used a much smaller set of training data
Comparison with related works (3) • M. E. model (Ehara, 1998) • A set of features similar to ours • But only combinations of two features • Uses TV news articles for training and testing • Average sentence length = 17.8 bunsetsus • cf. 10 in the Kyoto University corpus • Difference in the combined features • We also use triplet, quadruplet, and quintuplet features (+5.86%) • Accuracy of our system was about 10% higher than that of Ehara’s system.
Comparison with related works (4) • Maximum Likelihood model (Fujio, 1998) • Decision tree models and a boosting method (Haruno, 1998) • A set of features similar to ours • Using the EDR corpus for training and testing • The EDR corpus is ten times as large as ours • Accuracy was around 85%, slightly worse than ours.
Comparison with related works (5) • Experiments with Fujio’s and Haruno’s feature sets • The important factor in the statistical approaches is feature selection.
Future work • Feature selection • Automatic feature selection (Berger, 1996, 1998; Shirai, 1998) • Considering new features • How to deal with coordinate structures • Taking into account a wide range of information
Conclusion • Japanese dependency structure analysis based on the M. E. model • Dependency accuracy of our system • 87.2% on the Kyoto University corpus • Experiments without feature sets • Some basic and combined features contribute strongly to improving the accuracy • Amount of training data and accuracy • Good accuracy even with a small training set • The M. E. framework has suitable characteristics for overcoming the data-sparseness problem