Using Pivot/Bridge Languages

Using Pivot/Bridge Languages Matthias Eck

General Problem • Resources are available between languages A and B, and between languages B and C, but not between C and A • How to train translation models between C and A? • [Figure: A and C are connected only through B]

1st paper Multipath Translation Lexicon Induction via Bridge Languages • Gideon S. Mann and David Yarowsky • NAACL 2001 • Method for inducing translation lexicons based on transduction models of cognate pairs via bridge languages

Lexicon via Cognate pairs Lexicon: • Mapping of word in source language to words in target language Here: • Lexicon is built between arbitrary languages using models of cognate pairs and cognate distance

General idea • [Figure: source (English) → dictionary → bridge → cognate model → target; bridge and target are drawn from the Romance family: Spanish, Portuguese, Italian, French, Romanian]

Translation pairs • Cognate pairs can make up a significant portion of the lexicon if the languages belong to the same family and are closely related

Cognate string edit distance • Obvious condition for a good distance D • So we choose …as the translation for s

Used distance measures • L: Levenshtein distance • Minimum sum of the costs of edit operations required to transform one string into another • Deletion, Substitution, Insertion – traditional cost 1 • S: Stochastic transducers • Probabilistic costs for each possible edit operation • H: Hidden Markov Model • Each character has separate edit operation parameters
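As a minimal sketch of measure L, the unit-cost Levenshtein distance can be computed with the usual dynamic program. The helper `closest` and the example words are illustrative assumptions, not taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between strings a and b."""
    prev = list(range(len(b) + 1))          # distance from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,                # delete ca
                curr[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),   # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def closest(word: str, candidates: list[str]) -> str:
    """Hypothetical helper: candidate with the smallest distance (first wins ties)."""
    return min(candidates, key=lambda t: levenshtein(word, t))


# Illustrative only: Spanish "libro" against three Portuguese words.
print(closest("libro", ["livro", "livre", "lado"]))
```

Here "livro" is one substitution away from "libro", while the other two candidates need two and three edits.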

Distance Measures • Variants of Levenshtein distance: • L-V: vowel substitutions cost only 0.5 • L-S/L-A: probabilities obtained by S are binned into 3 cost classes: 0.5, 0.75, 1 • L-S: each language pair trained separately • L-A: trained collectively for all Romance languages • Limitation: the method cannot discover translation pairs that have no surface-form relationship • Assumed cognate pairs: Levenshtein edit distance < 3 • Few false positives
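A sketch of the L-V variant, assuming that "vowel substitution" means replacing one vowel by another and that all other operations keep cost 1. The vowel inventory and the function name `lv_distance` are assumptions made for illustration.

```python
VOWELS = set("aeiou")  # assumed vowel inventory; accented vowels are not handled


def lv_distance(a: str, b: str, vowel_sub_cost: float = 0.5) -> float:
    """Levenshtein distance where vowel-for-vowel substitution is cheaper."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, start=1):
        curr = [float(i)]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                sub = 0.0                    # identical characters
            elif ca in VOWELS and cb in VOWELS:
                sub = vowel_sub_cost         # discounted vowel-for-vowel substitution
            else:
                sub = 1.0                    # any other substitution
            curr.append(min(prev[j] + 1.0,      # deletion
                            curr[j - 1] + 1.0,  # insertion
                            prev[j - 1] + sub))
        prev = curr
    return prev[-1]


print(lv_distance("porta", "porte"))  # a -> e is a vowel substitution
print(lv_distance("libro", "livro"))  # b -> v is a consonant substitution
```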

Intra Family Translation Lexicon Induction • Family: Romance languages • Available: dictionary (English/Bridge language) • General evaluation algorithm: • Select 100 word pairs from dictionary for testing • For adaptive metrics: Select hypothesized word pairs (Edit distance < 3) as cognate pairs and train on them • For each word in source language select closest word from the 100 target words

Results Source Languages: • Spanish, French, Italian, Romanian Target Language: • Portuguese • 1000 word pairs in dictionary for Spanish/Portuguese • 900 for other language pairs

Results • Pure Levenshtein distance works surprisingly well • S gives a boost on French-Portuguese • A possible reason: Spanish and Portuguese are closer to each other than French and Portuguese • L-S is usually best

Consonant-to-consonant • Consonant-to-consonant edit operations • Most probable for French – Portuguese

Analysis

Analysis - Example

Multiple bridge languages • [Figure: source (English) → dictionary → bridge → cognate model → target; bridge and target are drawn from the Slavic family: Czech, Russian, Ukrainian, Polish, Serbian]

Translation Lexicon Induction Algorithm (one or more bridge languages)
For each word s ∈ S:
    For each bridge language B:
        Translate s → b ∈ B
        ∀ t ∈ T: calculate D(b, t)
        Rank t by D(b, t)
    Score t using information from all bridges
    Select highest-scored t
    Map s → t
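The algorithm can be sketched as follows. The slide does not say how information from several bridges is combined, so summing the per-bridge ranks is an assumption here, as are the function name `induce`, the data layout, and the toy vocabulary.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance, standing in for the distance D."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def induce(source_word, bridge_dicts, target_vocab, distance=levenshtein):
    """bridge_dicts: {bridge_language: {source_word: bridge_word}} (assumed layout).

    Returns the target word with the lowest summed rank over all bridges,
    or None if no bridge dictionary contains the source word.
    """
    total_rank = {t: 0 for t in target_vocab}
    bridges_used = 0
    for dictionary in bridge_dicts.values():
        b = dictionary.get(source_word)
        if b is None:
            continue                          # this bridge has no entry for s
        bridges_used += 1
        # Rank targets by D(b, t); ties keep vocabulary order (sort is stable).
        ranked = sorted(target_vocab, key=lambda t: distance(b, t))
        for rank, t in enumerate(ranked):
            total_rank[t] += rank             # assumed combination: sum of ranks
    if bridges_used == 0:
        return None
    return min(target_vocab, key=lambda t: total_rank[t])


# Illustrative toy data: English -> Portuguese via three Romance bridges.
bridges = {
    "Spanish": {"book": "libro"},
    "Italian": {"book": "libro"},
    "French": {"book": "livre"},
}
print(induce("book", bridges, ["livro", "livre", "lado"]))
```

With only the French bridge the false cognate "livre" would win; the two other bridges outvote it, which is the motivation for scoring over all bridges.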

Results • One bridge language, but multiple paths

Examples

Different Pathways • English to Portuguese (via Romance languages) • English to Norwegian (via Germanic languages) • English to Ukrainian (via Slavic languages) • Portuguese to English (via Germanic languages, French)

Results

2nd Paper Inducing Translation Lexicons via Diverse Similarity Measures and Bridge Languages • Charles Schafer and David Yarowsky • COLING 2002 • Improves results of first paper by introducing additional similarity scores between candidate translations

Covered Languages • English • Slavic: Serbian, Ukrainian, Slovene, Czech, Slovak, Bulgarian, Polish • Indo-Aryan: Punjabi, Nepali, Hindi, Gujarati, Bengali, Marathi

Resources • Serbian – Czech – English: Czech – English dictionary: 171k word pairs; corpora: English: 192M words, Serbian: 12M (news data from web) • Gujarati – Hindi – English: Hindi – English dictionary: 74k word pairs; corpora: Gujarati: 2M

Problem with Cognate Pairs • Serbian: prazan • Czech candidates prizen, pazen → English favor, grace, patronage: not correct • Czech candidate prazdny → English blank, empty: correct

Idea Introduce additional similarity models • Weighted Levenshtein Similarity • Context Similarity • Date distributional Similarity • Relative frequency Similarity • Burstiness Similarity and Inverse Document Frequency • Use of Additional Bridge Languages • Combine them with weighted string distance

Weighted Levenshtein Similarity • 1st iteration: vowel cluster operations have half the cost of single consonant substitutions, insertions and deletions • dist(vowel+, vowel+) • Use highest weighted of the top 2000 to re-estimate edit weights • Some high-probability substitutions:

Context Similarity Compare narrow and wide contexts for candidates Context: bag of words (Narrow: radius 1/ Wide: radius 10) • Calculate Context on Source Language (Serbian) • Translate to English using current estimations • Compare with English Contexts via Cosine Similarity
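The three steps can be sketched as below. Splitting a source count evenly across a word's candidate translations is an assumption, and the function names, placeholder words, and counts are made up for illustration.

```python
import math
from collections import Counter


def project(context: Counter, lexicon: dict[str, list[str]]) -> Counter:
    """Map a source-language context vector into English.

    Assumption: each count is split evenly over the candidate translations.
    Words missing from the lexicon are dropped.
    """
    out = Counter()
    for word, count in context.items():
        translations = lexicon.get(word, [])
        for t in translations:
            out[t] += count / len(translations)
    return out


def cosine(u, v) -> float:
    """Cosine similarity of two sparse vectors stored as mappings."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Placeholder data: x1, x2 are source context words; a, b, c, d are English words.
source_context = Counter({"x1": 2, "x2": 4})
lexicon = {"x1": ["a", "b"], "x2": ["c"]}
projected = project(source_context, lexicon)

candidates = {
    "cand1": {"a": 10, "c": 40},   # English context counts of candidate 1
    "cand2": {"d": 50, "a": 5},    # English context counts of candidate 2
}
best = max(candidates, key=lambda c: cosine(projected, candidates[c]))
print(best)
```

The same functions serve the narrow and the wide context; only the radius used when collecting the bag of words changes.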

Context Similarity - Example • Context of Nezavisnost in the Serbian corpus, with frequencies: pravo: 2, suvereniteti: 3, deklaracije: 3, pokrajina: 4 • Translate with initial lexicon: majesty, declaration, justice, sovereignty, country, ornamental (weights shown: 2, 1.5, 1.5, 1.5, 4, 1.5 and 0, 0) • Context of candidates in English corpus (further context words: expression, religion): Independence: 3, 1, 10, 0, 479, 836, 191, 0; Freedom: 681, 184, 104, 0, 21, 4, 141, 0 • Cosine similarity finds the correct candidate (Independence)

Date distributional Similarity • News Data: • Events are reported in parallel in multiple languages (+/- 2 days) • Construct term frequency vectors over time and compare candidates
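A minimal sketch of the comparison, assuming dates are integer day indices, that the +/- 2 day tolerance is handled by spreading each source count over neighbouring days, and that vectors are compared by cosine similarity. All names and counts are illustrative.

```python
import math


def smooth(counts: dict[int, float], window: int = 2) -> dict[int, float]:
    """Spread each day's count over the days [day - window, day + window]."""
    out: dict[int, float] = {}
    for day, c in counts.items():
        for d in range(day - window, day + window + 1):
            out[d] = out.get(d, 0.0) + c
    return out


def cosine(u: dict[int, float], v: dict[int, float]) -> float:
    """Cosine similarity of two sparse day -> frequency vectors."""
    dot = sum(u[d] * v[d] for d in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def date_similarity(src_days, tgt_days, window: int = 2) -> float:
    """Compare a smoothed source-term time series with a candidate's series."""
    return cosine(smooth(src_days, window), tgt_days)


# Toy series: the source term peaks on day 10 and again on day 40.
source = {10: 3, 40: 1}
candidate_a = {11: 2, 41: 1}   # peaks one day later: plausible translation
candidate_b = {25: 3}          # peaks at an unrelated time
print(date_similarity(source, candidate_a) > date_similarity(source, candidate_b))
```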

Date distributional Similarity

Relative Frequencies • Word and translation are likely to have similar relative frequencies • Modest frequency variations are expected • Useful to rule out pairings with several orders of magnitude difference in relative frequency • Ratio of logs of frequencies correlates well with translational compatibility
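One way to turn the "ratio of logs" idea into a score in [0, 1] is sketched below. The exact formula used in the paper is not given on the slide, so the min-over-both-directions form and the name `rf_similarity` are assumptions.

```python
import math


def rf_similarity(count_a: int, total_a: int, count_b: int, total_b: int) -> float:
    """Ratio of log relative frequencies, arranged so that 1.0 means identical.

    Assumes positive counts. Relative frequencies are at most 1, so both
    logs are non-positive and their ratio is non-negative.
    """
    log_a = math.log(count_a / total_a)
    log_b = math.log(count_b / total_b)
    if log_a == 0.0 or log_b == 0.0:          # a relative frequency of exactly 1
        return 1.0 if log_a == log_b else 0.0
    return min(log_a / log_b, log_b / log_a)


# Same relative frequency (1 in 100) in corpora of different sizes.
print(rf_similarity(10, 1_000, 100, 10_000))
# Four orders of magnitude apart (1e-2 vs 1e-6): a much lower score.
print(rf_similarity(10, 1_000, 1, 1_000_000))
```

Working with relative rather than absolute frequencies matters here because the corpora differ greatly in size.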

Relative Frequency Similarity • Correct translation “laud” has higher RF Score than higher ranked incorrect candidates “calibre”, “quarter” and “class”

Burstiness Similarity • Define Burstiness to measure differences

Burstiness Similarity • Burstiness matches better for correct translations “laud” and “praise”

Combine the different measures • Weighted Levenshtein distance to get initial candidate pairs • Calculate 8 similarity measures • Weighted Levenshtein • Wide bag-of-words context similarity • Narrow bag-of-words context similarity • Local news date distribution similarity • All news date distribution similarity • Relative frequency similarity • IDF similarity • Burstiness similarity

Combine the different measures • Integrate similarity measures into a single similarity function: • POS similarity: bias in favor of compatible parts of speech (N, V, ADJ); penalty for non-matching candidates • Sort candidates for each score in decreasing order; assign ranks 0, 1, … and normalize by count • Scoring: similarity models have associated weights
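The rank-based combination can be sketched as follows. The slide's combining formula is not recoverable, so a weighted sum of normalized ranks is an assumption, and the model names, weights, and scores are invented for illustration.

```python
def combine(scores: dict[str, dict[str, float]],
            weights: dict[str, float]) -> list[str]:
    """Merge several similarity models into one candidate ordering.

    scores:  {model_name: {candidate: similarity}}, higher similarity is better;
             every model is assumed to score the same set of candidates.
    weights: {model_name: weight}.
    Returns the candidates, best first.
    """
    candidates = list(next(iter(scores.values())))
    combined = {c: 0.0 for c in candidates}
    for model, sims in scores.items():
        # Sort in decreasing order of similarity and assign ranks 0, 1, ...
        ranked = sorted(candidates, key=lambda c: sims[c], reverse=True)
        for rank, c in enumerate(ranked):
            # Normalize the rank by the candidate count, then apply the weight.
            combined[c] += weights[model] * rank / len(candidates)
    return sorted(candidates, key=lambda c: combined[c])   # lowest total wins


toy_scores = {
    "string":  {"a": 0.9, "b": 0.8, "c": 0.1},
    "context": {"a": 0.2, "b": 0.9, "c": 0.1},
}
toy_weights = {"string": 1.0, "context": 2.0}
print(combine(toy_scores, toy_weights))
```

Using ranks instead of raw scores sidesteps the fact that the individual similarity measures live on incomparable scales.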

Weight Allocation

Evaluation • 3 evaluation criteria: • Exact match accuracy • Percentage of words with a correct English translation in the top k ranks • Median position of the highest-ranked correct translation per word
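The three criteria can be computed as in this sketch. Ignoring words without any correct candidate when taking the median is an assumption, as are the function name `evaluate` and the toy data.

```python
from statistics import median


def evaluate(ranked: dict[str, list[str]], gold: dict[str, set[str]], k: int = 10):
    """ranked: {source_word: candidates, best first}; gold: {source_word: correct set}."""
    exact = top_k = 0
    positions = []
    for word, candidates in ranked.items():
        # 1-based positions of all correct translations in the ranked list.
        hits = [i for i, c in enumerate(candidates, start=1) if c in gold[word]]
        if hits:
            positions.append(hits[0])        # highest-ranked correct translation
            exact += hits[0] == 1
            top_k += hits[0] <= k
    n = len(ranked)
    return {
        "exact_match": exact / n,
        "top_k": top_k / n,
        # Assumption: words with no correct candidate are left out of the median.
        "median_position": median(positions) if positions else None,
    }


toy_ranked = {"w1": ["x", "y"], "w2": ["p", "q", "r"], "w3": ["m"]}
toy_gold = {"w1": {"x"}, "w2": {"r"}, "w3": {"z"}}
print(evaluate(toy_ranked, toy_gold, k=2))
```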

Results

Results • Improvements with second bridge language

Additional Bridge Language Work • Interlingua based Statistical Machine Translation • Manuel Kauers, Stephan Vogel, Christian Fügen, Alex Waibel • ICSLP 2002 • Paper covers SMT from text to a structured Interlingua format (IF) • An English/IF corpus is available… but we also want to translate other languages into IF • [Figure: English → IF]

Generalized problem • Assume we have translation models F to E and G to F… but we want G to E? • Decompose: • Because: • [Figure: languages E, G, F]
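The formulas on this slide did not survive. One standard way to write such a decomposition, given here as an assumption and not as the slide's own equations, marginalizes over the bridge sentence f and then assumes that e depends on g only through f:

```latex
\begin{align}
p(e \mid g) &= \sum_{f} p(e, f \mid g) \\
            &= \sum_{f} p(e \mid f, g)\, p(f \mid g) \\
            &\approx \sum_{f} p(e \mid f)\, p(f \mid g)
\end{align}
```

The first two lines are exact (marginalization and the chain rule); only the last step uses the independence assumption, which is what lets the two separately trained models F to E and G to F be chained.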

And just translating… • Experiments done during PF-STAR project 2003/2004 • Training data: 48k lines of BTEC data • Test data: 506 lines, Test set for CSTAR 2003 • Translating Chinese → Italian • Also via a bridge language Chinese → English → Italian
