Bilingual Chunk Alignment in Statistical Machine Translation
An introduction of the Multi-Layer Filtering (MLF) algorithm
Dawei Hou, LING 575 MT WIN07
What is the “chunk” here? In this paper, the “chunk” doesn’t rely on information from tagging, parsing, syntax analysis or segmentation. A “chunk” is simply a contiguous sequence of words.
Why do we use “chunks” in translation? It can lead to more fluent translations, since chunk-based translation captures local reordering phenomena. It makes long sentences shorter, which benefits the SMT algorithm’s performance. It obtains an accurate one-to-one alignment of each pair of bilingual chunks. It greatly decreases the search space and time complexity during translation.
What about other approaches? What about word-based translations?
Some background: SMT systems employ word-based alignment models based on the five word-based statistical models proposed by IBM. Problem: they still suffer from poor performance on language pairs with great structural differences, since these models fundamentally rely on word-level translation.
Some background: alignment algorithms based on phrases, chunks or structures, most of which rely on complex syntactic information. Problem: they have proven to yield poor performance when dealing with long sentences, and they heavily depend on the performance of associated tools such as parsers, POS taggers, etc.
How can chunk-based translation improve on these problems?
Multi-Layer Filtering algorithm: the goal is to discover one-to-one pairs of bilingual chunks in untagged, well-formed bilingual sentence pairs. Multiple layers are used to extract bilingual chunks according to different features of chunks in the bilingual corpus.
Summary of the procedure: filtering the most frequent chunks; clustering similar words and filtering the most frequent structures; dealing with the remnant fragments; keeping one-to-one alignment.
Filtering the most frequent chunks -- Step 1. Assumption: the most frequently co-occurring word sequences are potential chunks. Applying formula-1 and formula-2 listed below, we filter those word sequences as initial monolingual chunks. formula-1 formula-2
The result of Filtering Step 1. An example (the numbers are the cohesion degrees between adjacent words):
What || kind || of || room || do || you || want || to || reserve
1.36    1.31    0.046    0.063    10.07    0.61    2.11    0.077
你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间
0.69    0.17    1.39    0.076    7.80    0.87    0.30    1.27    4.52
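Formula-1 and formula-2 are shown only as images in the original slides, so the exact cohesion definition is not reproduced here. As a rough, hypothetical stand-in, the sketch below scores every adjacent word pair with a pointwise-mutual-information style cohesion degree estimated from corpus counts; the toy corpus and all names are illustrative, not the paper's actual formulation.

```python
import math
from collections import Counter

# Toy corpus; in the paper the counts would come from the full bilingual corpus.
corpus = [
    "what kind of room do you want to reserve".split(),
    "what kind of car do you want to rent".split(),
]
unigrams = Counter(w for sent in corpus for w in sent)
bigrams = Counter(pair for sent in corpus for pair in zip(sent, sent[1:]))
total = sum(unigrams.values())

def cohesion_scores(sentence):
    """PMI-style cohesion degree for every adjacent word pair; a hypothetical
    stand-in for formula-1/2, which appear only as images in the slides."""
    scores = []
    for w1, w2 in zip(sentence, sentence[1:]):
        p1, p2 = unigrams[w1] / total, unigrams[w2] / total
        p12 = bigrams[(w1, w2)] / total
        scores.append(math.log(p12 / (p1 * p2)) if p12 > 0 else float("-inf"))
    return scores

print(cohesion_scores(corpus[0]))
```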
Filtering the most frequent chunks -- Step 2. Now we have all the cohesion degrees between any two adjacent words in the source and target sentences. Applying formula-3 listed below, we find the entire set of initial monolingual chunks. formula-3
The result of Filtering Step 2-1. In this case: n = int{ 10/4 } = 2;
What || kind || of || room || do || you || want || to || reserve
1.36    1.31    0.046    0.063    10.07    0.61    2.11    0.077
你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间
0.69    0.17    1.39    0.076    7.80    0.87    0.30    1.27    4.52
The result of Filtering Step 2-(1)-EN Now we get a table of the initial monolingual chunks; formula-4
The result of Filtering Step 2-(2)-EN. Setting the threshold Dk* > 1.0, we get the initial chunk candidates. We still need further steps for maximum matching and overlap discarding.
The result of Filtering Step 2-(3)-EN. According to the maximum-matching principle, and to prevent the overlapping problem, we apply: formula-4: formula-5:
The result of Filtering Step 2-(4)-EN. Dealing with the remnant fragments: we simply combine such individual or sequential words into a chunk. So we get the much shorter sentence list below: What&kind&of || room || do&you || want&to || reserve
The result of Filtering Step 2-(1)-CN. In this case: n = int{ 10/4 } = 2;
What || kind || of || room || do || you || want || to || reserve
1.36    1.31    0.046    0.063    10.07    0.61    2.11    0.077
你 || 想 || 预 || 定 || 什 || 么 || 样 || 的 || 房 || 间
0.69    0.17    1.39    0.076    7.80    0.87    0.30    1.27    4.52
The result of Filtering Step 2-(2)-CN Now we get a table of the initial monolingual chunks; formula-4
The result of Filtering Step 2-(3)-CN. Setting the threshold Dk* > 1.0, we get the initial chunk candidates. We still need further steps for maximum matching and overlap discarding.
The result of Filtering Step 2-(4)-CN. According to the maximum-matching principle, the leftover character 的 can attach to either neighbouring chunk. By applying formula-4: max( D什么样的/D什么样 , D的房间/D房间 ) = max( 2.44, 1.30 ) = 2.44, so 的 is attached to 什么样, giving 什么样的.
The result of Filtering Step 2-(5)-CN. Dealing with the remnant fragments: we simply combine such individual or sequential words into a chunk. So we get the much shorter sentence list below: 你 || 想 || 预&定 || 什&么&样&的 || 房&间
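Formula-3 through formula-5 (seed selection with n = int(length/4), maximum matching, and overlap prevention) also appear only as figures. The sketch below is a deliberately simplified reading that reproduces the English chunking above with a single cohesion threshold and a remnant-combination pass; treat it as an illustration of the idea, not the paper's multi-step procedure.

```python
def chunk_sentence(words, scores, threshold=1.0):
    """Greedy chunking sketch: adjacent pairs whose cohesion degree exceeds
    `threshold` are linked, consecutive links are merged into one chunk, and
    leftover single words become chunks of their own (the "remnant fragments"
    of the slides). A simplified reading of formula-3/4/5."""
    chunks, current = [], [words[0]]
    for i, link_score in enumerate(scores):
        if link_score > threshold:
            current.append(words[i + 1])      # extend the current chunk
        else:
            chunks.append("&".join(current))  # close it and start a new one
            current = [words[i + 1]]
    chunks.append("&".join(current))
    return " || ".join(chunks)

words = "What kind of room do you want to reserve".split()
scores = [1.36, 1.31, 0.046, 0.063, 10.07, 0.61, 2.11, 0.077]
print(chunk_sentence(words, scores))
# -> What&kind&of || room || do&you || want&to || reserve
```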
Some problems. After the first filtering process, suppose we found an aligned chunk pair: || at&five&o'clock || and || 在&五&点 ||. But some potentially good chunks like || at&six&o'clock || might have been broken into several fragments like || at || six || o'clock ||, since this structure includes word sequences with a low frequency of occurrence (we suppose "six" is less frequent than "five" here).
Clustering the similar words and filtering the most frequent structures. Many frequent chunks have similar structures but differ in detail. We can cluster similar words according to the position vectors of their behavior relative to anchor words. For all of the words in the same class, we suppose they form good chunks, then filter the most frequent structures according to the method introduced before.
Clustering the similar words and filtering the most frequent structures – Step 1. In the corpus resulting from the first filtering process, find the most frequent words as anchor words, for example: Why do we use the most frequent words? As the anchor words are the most common words, a great deal of information can be obtained. Words with similar position vectors relative to the anchor words can be assumed to belong to similar word classes.
Clustering the similar words and filtering the most frequent structures – Step 2. Build word vectors and define the size of the observation window (in this case, window size = 5). For instance, we build a word vector whose anchor word is "in" and observe that a candidate word "the" to be clustered falls within the window: Formulas 7, 8:
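Formulas 7 and 8 are not reproduced in the slides. One plausible way to build such a position vector, sketched below, is to count how often the candidate word occurs at each offset from the anchor word inside the observation window; the function name and toy sentences are assumptions.

```python
from collections import defaultdict

def position_vector(corpus, anchor, candidate, window=5):
    """Count how often `candidate` occurs at each relative offset from
    `anchor` inside the observation window (an illustrative stand-in for
    the word vectors built via formulas 7 and 8)."""
    half = window // 2
    vec = defaultdict(int)  # offset -> count, offsets in [-half, +half] excluding 0
    for sent in corpus:
        for i, w in enumerate(sent):
            if w != anchor:
                continue
            for offset in range(-half, half + 1):
                if offset == 0:
                    continue
                j = i + offset
                if 0 <= j < len(sent) and sent[j] == candidate:
                    vec[offset] += 1
    return [vec[o] for o in range(-half, half + 1) if o != 0]

corpus = [
    "the book is in the bag".split(),
    "she lives in the city".split(),
]
print(position_vector(corpus, anchor="in", candidate="the"))  # -> [0, 0, 2, 0]
```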
Clustering the similar words and filtering the most frequent structures – Step 3. In order to compare vectors fairly, these vectors must be normalized by formula-9 as follows: Example: "in/that" and "in/this"
Clustering the similar words and filtering the most frequent structures – Step 4. Measure the similarities of the various vectors and cluster the words which have similar distributions relative to the anchor words, using Euclidean distance: Example result:
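Formula-9 and the exact clustering criterion are likewise only shown as figures. In the hedged sketch below, each position vector is scaled to unit length (one plausible reading of the normalization) and words are grouped whenever the Euclidean distance between their normalized vectors falls below an assumed threshold.

```python
import math

def normalize(vec):
    """Scale a position vector to unit length so vectors for words of very
    different frequency can be compared fairly (one reading of formula-9)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cluster_words(word_vectors, threshold=0.5):
    """Greedy single-pass clustering sketch: put a word into the first
    existing class whose representative vector is closer than `threshold`."""
    classes = []  # list of (representative_vector, [member words])
    for word, vec in word_vectors.items():
        nvec = normalize(vec)
        for rep, members in classes:
            if euclidean(rep, nvec) < threshold:
                members.append(word)
                break
        else:
            classes.append((nvec, [word]))
    return [members for _, members in classes]

word_vectors = {"this": [0, 3, 1, 0], "that": [0, 4, 1, 0], "city": [2, 0, 0, 1]}
print(cluster_words(word_vectors))  # -> [['this', 'that'], ['city']]
```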
Clustering the similar words and filtering the most frequent structures – Step 5. For all of the words in the same class, replace them with a particular symbol, and then consider this symbol as an ordinary word. Then filter the most frequent structures by the Multi-Layer Filtering algorithm again. For instance, if we have || at&five&o'clock || & || 在&五&点 || and the parallel word classes { one, two, ..., five, ..., twelve } and { 一, 二, ..., 五, ..., 十二 }, we will get: || at&one&o'clock || & || 在&一&点 ||, || at&two&o'clock || & || 在&两&点 ||, ...
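A minimal sketch of the class-replacement idea, assuming a hypothetical class symbol such as <NUM>; the slides do not specify the actual symbol format.

```python
def replace_with_class_symbols(sentence, word_classes):
    """Replace every member of a word class with a shared symbol (e.g. <NUM>)
    so that structurally identical chunks such as "at five o'clock" and
    "at six o'clock" collapse onto one frequent pattern before the filtering
    layer is run again. Class names here are illustrative."""
    word_to_symbol = {w: sym for sym, words in word_classes.items() for w in words}
    return [word_to_symbol.get(w, w) for w in sentence]

classes = {"<NUM>": {"one", "two", "five", "six", "twelve"}}
print(replace_with_class_symbols("at six o'clock".split(), classes))
# -> ['at', '<NUM>', "o'clock"]
```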
Keeping one-to-one alignment Next step:
Keeping one-to-one alignment. Now we have a pair of new parallel sentences with chunks:
What&kind&of || room || do&you || want&to || reserve
你 || 想 || 预&定 || 什&么&样&的 || 房&间
Our purpose is to find a one-to-one chunk alignment, on the assumption that the chunks to be aligned occur almost equally often in the corresponding parallel texts.
Keeping one-to-one alignment. By applying formula-11, we can get an alignment table: formula-11:
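Formula-11 itself is not reproduced in the slides. The sketch below acts on the stated assumption (aligned chunks co-occur across the corresponding parallel texts) by counting chunk co-occurrences over the chunked parallel corpus and greedily keeping the best-scoring one-to-one pairs; it is only an approximation of the paper's alignment-table computation. In practice the counts would be accumulated over the full 55,000-sentence training corpus.

```python
from collections import Counter

def align_chunks(parallel_chunked_sentences):
    """Greedy one-to-one chunk alignment sketch: count how often each
    source/target chunk pair co-occurs in the parallel corpus, then keep the
    highest-scoring pairs while respecting the one-to-one constraint.
    A rough stand-in for formula-11, not the paper's exact scoring."""
    cooc = Counter()
    for src_chunks, tgt_chunks in parallel_chunked_sentences:
        for s in src_chunks:
            for t in tgt_chunks:
                cooc[(s, t)] += 1
    alignments, used_src, used_tgt = [], set(), set()
    for (s, t), count in cooc.most_common():
        if s not in used_src and t not in used_tgt:
            alignments.append((s, t, count))
            used_src.add(s)
            used_tgt.add(t)
    return alignments
```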
Experiments. Training data: 55,000 pairs of Chinese-English spoken parallel sentences. Test data: 400 pairs of Chinese-English spoken parallel sentences chosen randomly from the same corpus. These 400 sentence pairs were manually partitioned into monolingual chunks, and the corresponding bilingual chunks were then manually aligned, in order to compute the chunking and alignment accuracy.
Experiments. Evaluation: comparing the automatically obtained monolingual chunks and aligned bilingual chunks to the chunks discovered manually, we compute precision, recall and F-measure with the following formulas:
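The formula slide is an image in the original; the standard definitions, which the evaluation presumably follows, are:

```latex
\text{Precision} = \frac{\#\,\text{correct chunks (or alignments) produced}}{\#\,\text{chunks (or alignments) produced by the system}},\qquad
\text{Recall} = \frac{\#\,\text{correct chunks (or alignments) produced}}{\#\,\text{chunks (or alignments) in the manual reference}},\qquad
F = \frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
```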
Experiments Results:
Experiments. Comparison of chunk-based translation to word-based translation: the improvement is about 10%.
Conclusions. This chunking and alignment algorithm doesn't rely on information from tagging, parsing or syntax analysis, and doesn't even require sentence segmentation. It obtains accurate one-to-one alignment of chunks. It greatly decreases the search space and time complexity during translation. Its performance is better than the baseline word-alignment system (on some tasks).
Problem / Weakness. The authors didn't discuss weaknesses themselves. Possible improvements could target: the maximum-matching step, and the step of building position vectors.