Backward Machine Transliteration by Learning Phonetic Similarity • Advisor: Dr. Hsu • Presenter: Chien Shing Chen • Authors: Wei-Hao Lin and Hsin-Hsi Chen • Presented at the Sixth Conference on Natural Language Learning, Taipei, Taiwan, 2002
Outline • Motivation • Objective • Introduction • Grapheme-to-Phoneme Transformation • Similarity Measurement • Learning Phonetic Similarity • Experimental Results • Conclusions • Personal Opinion
Motivation • A similarity-based framework to model the task of backward transliteration • A learning algorithm to automatically acquire phonetic similarities from a corpus • Backward transliteration: recovering the original-language word from its transliteration, e.g. “本拉登” => Bin Laden
Objective • Backward machine transliteration by learning phonetic similarity • 雨果(Yu-guo) => Hugo
Introduction • IPA: International Phonetic Alphabet • Hugo => h j u g oU • Yu-guo => v k uo • Similarity Measurement
Introduction • CMU pronunciation dictionary, version 0.6 • ftp://ftp.cs.cmu.edu/project/fgdata/dict
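Similarity Measurement-alignment • Let $\Sigma$ be the alphabet of two strings S1 and S2. • $\Sigma' = \Sigma \cup \{\_\}$, where ‘_’ stands for space. • Spaces can be inserted into S1 and S2 to give S1’ and S2’ of equal length • S1’ and S2’ are aligned when each symbol of one string maps one-to-one to the symbol or space at the same position in the other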
Similarity Measurement-score • <English, Chinese> = <Hugo, Yu3-guo3> • the phoneme pair is (h j u g oU, v k uo) • $\Sigma'$ = {h, j, u, v, g, k, oU, uo, _}
Similarity Measurement-score • With $\Sigma'$ = {h, j, u, v, g, k, oU, uo, _}, the similarity scoring matrix M is 9 by 9
Similarity Measurement-Dynamic • Dynamic programming finds the optimal alignment, given: • the possible alignments of S1 and S2 • the similarity scoring matrix M • S1 = (h j u g oU) • S2 = (v k uo)
Dynamic programming-Dynamic • T is an (n+1) by (m+1) table, where n is the length of S1 and m is the length of S2.
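The table-filling step can be sketched as follows. The recurrence on the original slide did not survive, so this uses the textbook global-alignment recurrence, which may differ in detail from the authors' version. `align_score`, `toy_score`, and the score values are hypothetical; `toy_score` only stands in for the scoring matrix M.

```python
GAP = "_"

# Hypothetical "similar" phoneme pairs, for illustration only.
SIMILAR = {("g", "k"), ("oU", "uo")}


def toy_score(a, b):
    """Stand-in for M: illustrative values, not the learned matrix."""
    if a == b and a != GAP:
        return 2.0
    if (a, b) in SIMILAR or (b, a) in SIMILAR:
        return 1.0
    return -1.0  # mismatches and gaps


def align_score(s1, s2, score):
    """Best alignment score of phoneme sequences s1 and s2."""
    n, m = len(s1), len(s2)
    # T[i][j] = best score for aligning s1[:i] with s2[:j].
    T = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):  # s1 prefix aligned against spaces only
        T[i][0] = T[i - 1][0] + score(s1[i - 1], GAP)
    for j in range(1, m + 1):  # s2 prefix aligned against spaces only
        T[0][j] = T[0][j - 1] + score(GAP, s2[j - 1])
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            T[i][j] = max(
                T[i - 1][j - 1] + score(s1[i - 1], s2[j - 1]),  # pair two phonemes
                T[i - 1][j] + score(s1[i - 1], GAP),            # space in s2
                T[i][j - 1] + score(GAP, s2[j - 1]),            # space in s1
            )
    return T[n][m]


hugo = ["h", "j", "u", "g", "oU"]
yuguo = ["v", "k", "uo"]
best = align_score(hugo, yuguo, toy_score)
```

The bottom-right cell T[n][m] is the similarity score of the two words under the given scoring function.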
Learning Phonetic Similarity • Develop a learning algorithm to • remove the effort of assigning scores in the matrix by hand • capture subtle differences between phonemes • First prepare a training corpus, then apply the learning algorithm.
Learning Phonetic Similarity • Positive pairs: an original word matched with its own transliteration • Negative pairs: an original word mismatched with the transliteration of a different word • Ei: original English word • Ci: transliterated Chinese word • Example corpus: <Clinton, 克林頓>, <Bin Laden, 本拉登>, <Robinson, 魯賓遜> • A corpus with n pairs yields n positive pairs and n(n-1) negative pairs
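A minimal sketch of how the positive and negative pairs could be enumerated from a corpus of matched pairs. The function name `build_training_pairs` and the tuple layout are assumptions made for illustration.

```python
def build_training_pairs(corpus):
    """corpus: list of matched (english, chinese) pairs.

    Returns (positives, negatives) as lists of (english, chinese, is_match).
    """
    # Each word with its own transliteration: n positive pairs.
    positives = [(e, c, True) for e, c in corpus]
    # Each word with every other word's transliteration: n(n-1) negative pairs.
    negatives = [
        (e_i, c_j, False)
        for i, (e_i, _) in enumerate(corpus)
        for j, (_, c_j) in enumerate(corpus)
        if i != j
    ]
    return positives, negatives


corpus = [("Clinton", "克林頓"), ("Bin Laden", "本拉登"), ("Robinson", "魯賓遜")]
positives, negatives = build_training_pairs(corpus)
```

For the three-pair corpus on the slide this gives 3 positive and 6 negative pairs.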
Learning Algorithm • Treat each training sample as a linear equation: $y = \sum_{i=1}^{m} \sum_{j=1}^{m} w_{i,j} x_{i,j}$ • m is the size of the phoneme set (m = 9 in the Hugo example) • $w_{i,j}$ is the entry in row i and column j of the scoring matrix • $x_{i,j}$ is a binary value indicating the presence of $w_{i,j}$ in the alignment • y is the similarity score.
Learning Algorithm • The linear equations for the whole corpus can be conveniently written in matrix form, $Xw = y$, where X has R rows and R is the number of pairs in the corpus • row i of X corresponds to the ith sample pair in the corpus • w holds the entries $w_{i,j}$ of the scoring matrix • each $x_{i,j}$ is a binary value • y holds the similarity scores
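To make the encoding concrete, here is a sketch of how one alignment could be turned into a row of X, assuming the scoring matrix is flattened row by row into a vector of length m*m. The helper names and the example alignment (h j u g oU against v _ _ k uo) are illustrative assumptions, not taken from the paper.

```python
def encode_alignment(aligned1, aligned2, alphabet):
    """Return the binary vector x for one alignment.

    x[i * m + j] is 1 when symbol i of the alphabet is aligned with
    symbol j somewhere in the alignment, else 0.
    """
    index = {symbol: k for k, symbol in enumerate(alphabet)}
    m = len(alphabet)
    x = [0] * (m * m)
    for a, b in zip(aligned1, aligned2):
        x[index[a] * m + index[b]] = 1
    return x


def similarity(w, x):
    """y = sum of w[i][j] * x[i][j], with w flattened like x."""
    return sum(w_k * x_k for w_k, x_k in zip(w, x))


alphabet = ["h", "j", "u", "v", "g", "k", "oU", "uo", "_"]  # m = 9
s1_aligned = ["h", "j", "u", "g", "oU"]
s2_aligned = ["v", "_", "_", "k", "uo"]
x = encode_alignment(s1_aligned, s2_aligned, alphabet)
```

Stacking one such x per sample pair gives the matrix X, and the matching similarity targets give y.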
Learning Algorithm • The criterion to minimize is the sum-of-squared error (SSE), $\|Xw - y\|^2$. • The classical solution takes the pseudo-inverse of X, i.e. $X^{\dagger} = (X^T X)^{-1} X^T$, to obtain the w that minimizes the SSE, i.e. $w = X^{\dagger} y$ • Instead, adopt the Widrow-Hoff rule to solve for w iteratively
Learning Algorithm • k stands for the kth row in the matrix X • i is the number of iterations • $\eta$ is the learning rate • $\alpha$ is the momentum coefficient • one parameter is set empirically [symbol and formula not preserved in this transcript]
Learning Algorithm • w(i) is updated iteratively, and training stops when the learned w appears to overfit. • The iterations ensure that w converges to a vector satisfying $X^T(Xw - y) = 0$ • w(i) is updated immediately after each new training sample is encountered, instead of accumulating the errors of all training samples • The other speed-up technique is momentum, which damps oscillations.
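The per-sample update with momentum can be sketched as below. The exact update formula and parameter schedule from the slides were lost, so this is the textbook Widrow-Hoff (LMS) rule with a standard momentum term, a constant learning rate, and a fixed number of passes instead of an overfitting check. All names and default values are assumptions.

```python
def widrow_hoff(X, y, eta=0.05, alpha=0.5, epochs=500):
    """Fit w so that X w is close to y in the least-squares sense.

    eta is the learning rate, alpha the momentum coefficient.
    """
    dim = len(X[0])
    w = [0.0] * dim
    prev_step = [0.0] * dim
    for _ in range(epochs):
        for x_k, y_k in zip(X, y):
            # Error of the current w on the kth sample.
            error = y_k - sum(w_j * x_j for w_j, x_j in zip(w, x_k))
            # Correction for this sample plus a fraction of the previous step.
            step = [
                eta * error * x_j + alpha * p_j
                for x_j, p_j in zip(x_k, prev_step)
            ]
            # Update immediately, without accumulating errors over all samples.
            w = [w_j + s_j for w_j, s_j in zip(w, step)]
            prev_step = step
    return w


# Toy system with an exact solution w = [1, -1] (illustrative data only).
X = [[1, 0], [0, 1], [1, 1]]
y = [1.0, -1.0, 0.0]
w = widrow_hoff(X, y)
```

On larger problems the iterative rule avoids forming and inverting $X^T X$, which is the usual reason to prefer it over the pseudo-inverse.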
Experiments • The corpus consists of 1574 pairs of <English, Chinese> names • 313 of them have no entries in the pronouncing dictionary. • 97 phonemes are used to represent these names, of which 59 are used for Chinese names and 51 for English names. • Rank is the position of the correct original word in the sorted list of candidate words.
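The rank measure can be sketched as follows, assuming candidates are sorted by similarity score from highest to lowest. The candidate words and scores are made-up illustrative values, not results from the paper.

```python
def rank_of_correct(candidates, scores, correct):
    """1-based position of `correct` when candidates are sorted by
    similarity score, highest first."""
    ordered = sorted(candidates, key=lambda word: scores[word], reverse=True)
    return ordered.index(correct) + 1


# Hypothetical similarity scores against the transliteration Yu-guo.
scores = {"Hugo": 3.2, "Hugh": 2.5, "Yugo": 1.0}
rank = rank_of_correct(list(scores), scores, "Hugo")
```

A rank of 1 means the correct original word received the highest similarity score among all candidates.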
Conclusions • The learning algorithm acquires phonetic similarities automatically, without phonological analysis or human intervention.
Personal Opinion • Drawback • Obtaining the score matrix depends on a few empirical rules • Do the experimental results depend on the choice of testing samples? • Application • A different method for computing the similarity between words. • Future Work • The Widrow-Hoff rule could be used to estimate parameters in place of blind manual tuning. • Combine speech recognition with this method to produce a new, more objective method