“Poetic” Statistical Machine Translation: Rhyme and Meter Genzel, Uszkoreit, Och; Google, 2010
Challenge • Automatic translation of poetry is possibly the most difficult problem in Computational Linguistics, MT, and AI. • Few humans are capable of poetry translation. • There have been no previous attempts to apply MT to poetry.
Defining Poetry Translation • A poem’s form and meter must be preserved in translation, if at all possible. • Treat poetic form as a constraint on the possible translation outputs. • Naïve approach: run MT, then apply a poem detector to pick a poem from among the candidate translations. • Better approach: recast “poem-likeness” as a feature function whose cost is 0 when the output fits the form and grows with each violation, and make it a local feature so it can guide the decoder’s search.
Reduce Hypothesis Space • H = {h | h is an acceptable translation} • Goal: reduce the size of H • H_poem = {h | h is an acceptable translation that maintains the syllable structure, rhythm, and rhyme}
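Types of Poetry • Line length: haiku (three lines of 5-7-5 syllables) • Rhythmic poetry (e.g., blank verse): write 0 for an unstressed syllable and 1 for a stressed one • Iambic foot: (01)* • Dactylic foot: (100)* • Rhythmic and rhyming: sonnet (rhyme scheme abbaabbacdecde) • Lines sharing a rhyme letter have the same meter, e.g. {abab, a:010101, b:10101010}: rhyme scheme abab, with the “a” lines in meter 010101 and the “b” lines in 10101010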
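The stress notation above maps naturally onto regular expressions. A minimal Python sketch, assuming each line’s stress is already available as a 0/1 string (for instance from a TTS lexicon); `fits_meter` and `fits_form` are hypothetical helper names, not part of the described system:

```python
import re
from typing import Dict, List

# Foot patterns from the slide as regular expressions over stress strings
# (0 = unstressed syllable, 1 = stressed syllable).
IAMBIC = re.compile(r"(01)*")
DACTYLIC = re.compile(r"(100)*")


def fits_meter(stress: str, meter: re.Pattern) -> bool:
    """True if the whole stress string matches the repeating foot."""
    return meter.fullmatch(stress) is not None


def fits_form(line_stresses: List[str], scheme: str, meters: Dict[str, str]) -> bool:
    """Check each line against the exact meter assigned to its rhyme letter.

    scheme: one rhyme letter per line, e.g. "abab".
    meters: rhyme letter -> stress pattern, e.g. {"a": "010101", "b": "10101010"}.
    """
    if len(line_stresses) != len(scheme):
        return False
    return all(stress == meters[letter]
               for stress, letter in zip(line_stresses, scheme))
```

For example, `fits_meter("0101", IAMBIC)` is true while `fits_meter("0110", IAMBIC)` is false. Note that `(01)*` also accepts the empty string, so a practical checker would pair it with a line-length constraint.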
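Stress Pattern Feature Function • Use a text-to-speech (TTS) system to obtain the stress of each word. • Phrase-based decoding: the state is the current hypothesis length (in syllables) modulo the foot length: 0 or 1 for a 2-syllable foot; 0, 1, or 2 for a 3-syllable foot. The cost is the number of syllables whose stress does not match the pattern. • Hierarchical decoding: the state records how well a partial hypothesis fits the pattern (its length and cost); states of sub-hypotheses can be combined. • “Whatever fits”: allow trivial modifications of the translation so that it fits the pattern, taking care to combine the correct pattern scores.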
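To make the phrase-based state concrete, here is a Python sketch. It assumes the target side of a phrase is given as one concatenated 0/1 stress string; `extend_phrase_based`, `span_state`, and `combine` are hypothetical names. For the hierarchical case it assumes one plausible reading of the slide: since a sub-span’s final position in the output is not yet known, its state keeps the span length plus a cost for every possible starting offset within the foot, so two adjacent states can be combined without rescoring their words.

```python
from typing import Tuple

HierState = Tuple[int, Tuple[int, ...]]  # (length in syllables, cost per start offset)


def extend_phrase_based(state: int, phrase_stress: str, foot: str) -> Tuple[int, int]:
    """Score one phrase under a repeating foot such as "01" or "100".

    state: hypothesis length modulo len(foot), i.e. where in the foot
           the phrase starts. Returns (new_state, number of mismatches).
    """
    cost, pos = 0, state
    for s in phrase_stress:
        if s != foot[pos]:
            cost += 1
        pos = (pos + 1) % len(foot)
    return pos, cost


def span_state(stress: str, foot: str) -> HierState:
    """Hierarchical state: span length and its cost for every start offset."""
    costs = tuple(extend_phrase_based(o, stress, foot)[1] for o in range(len(foot)))
    return len(stress), costs


def combine(left: HierState, right: HierState, foot_len: int) -> HierState:
    """State of the concatenated span, computed from the two sub-states only."""
    left_len, left_costs = left
    right_len, right_costs = right
    # The right span starts left_len syllables after the left span's offset.
    costs = tuple(left_costs[o] + right_costs[(o + left_len) % foot_len]
                  for o in range(foot_len))
    return left_len + right_len, costs
```

Storing one cost per offset keeps the hierarchical state at `len(foot)` numbers regardless of span length; for instance, combining the states of `"01"` and `"0"` under foot `"01"` yields the same state as scoring `"010"` directly.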
Framework for General Poetic Form Feature Function 1/3 • Track the target length: • Dynamic programming over the phrase lattice. If the maximum source phrase size is k, the source sentence length is n, and the maximum target length is l, the sweep requires …; this can be reduced to … by precomputing the size range.
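As a rough illustration (a generic sketch, not necessarily the authors’ formulation), the following dynamic program computes, for every source span, the minimum and maximum target length achievable by covering it with consecutive phrases. It assumes a hypothetical phrase table `phrase_lengths` mapping each source span (i, j), half-open, to the target lengths of its candidate translations; precomputing each phrase’s size range once lets the sweep run in O(n²k).

```python
import math
from typing import Dict, List, Tuple

Span = Tuple[int, int]


def length_bounds(n: int, k: int, phrase_lengths: Dict[Span, List[int]]):
    """Min/max achievable target length for every source span [i, j).

    n: source sentence length; k: maximum source phrase length.
    Spans that no segmentation can cover are left out of the result.
    """
    # Precompute each phrase's size range so the DP never rescans candidates.
    p_lo = {s: min(ls) for s, ls in phrase_lengths.items() if ls}
    p_hi = {s: max(ls) for s, ls in phrase_lengths.items() if ls}

    lo: Dict[Span, int] = {}
    hi: Dict[Span, int] = {}
    for i in range(n, -1, -1):          # later starts first, so [m, j) is ready
        lo[(i, i)] = hi[(i, i)] = 0     # empty span costs nothing
        for j in range(i + 1, n + 1):
            best_lo, best_hi = math.inf, -math.inf
            # First phrase covers [i, m); the rest of the span covers [m, j).
            for m in range(i + 1, min(i + k, j) + 1):
                if (i, m) in p_lo and (m, j) in lo:
                    best_lo = min(best_lo, p_lo[(i, m)] + lo[(m, j)])
                    best_hi = max(best_hi, p_hi[(i, m)] + hi[(m, j)])
            if best_lo != math.inf:
                lo[(i, j)], hi[(i, j)] = best_lo, best_hi
    return lo, hi
```

Because the total target length of a set of phrases does not depend on their order, segmenting each span left to right is enough. The result records only bounds, so a length between `lo` and `hi` is not guaranteed to be exactly reachable.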
Framework for General Poetic Form Feature Function 2/3 • State space for the feature function: • Current sentence length in syllables • Set of uncovered source ranges • Rhyme-scheme letters still awaiting a rhyme, each with its associated word
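A small Python sketch of how the uncovered ranges feed the length check, assuming per-span bounds `lo` and `hi` have been precomputed (for example by a dynamic program like the one on the previous slide) and stored in dictionaries keyed by half-open span (i, j). `uncovered_ranges` and `achievable_length` are hypothetical helpers.

```python
from typing import Dict, FrozenSet, List, Tuple

Span = Tuple[int, int]


def uncovered_ranges(covered: FrozenSet[int], n: int) -> List[Span]:
    """Maximal half-open source ranges [i, j) not yet translated."""
    ranges: List[Span] = []
    start = None
    for pos in range(n + 1):
        free = pos < n and pos not in covered
        if free and start is None:
            start = pos                   # a gap begins
        elif not free and start is not None:
            ranges.append((start, pos))   # the gap ends before pos
            start = None
    return ranges


def achievable_length(length: int, covered: FrozenSet[int], n: int,
                      lo: Dict[Span, int], hi: Dict[Span, int]) -> Tuple[int, int]:
    """Range of final sentence lengths still reachable from this state."""
    rest = uncovered_ranges(covered, n)
    return (length + sum(lo[r] for r in rest),
            length + sum(hi[r] for r in rest))
```

In a typical beam-search decoder, keeping this state compact also matters because hypotheses with identical states can be recombined.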
Framework for General Poetic Form Feature Function 3/3
• Algorithm (input: hypothesis state S extended with phrase pair p):
• 1. Initialize the cost c to 0 and the new state S′ to S.
• 2. Update S′: increment the sentence length by the target phrase length and update the covered range.
• 3. Compute the minimum and maximum achievable sentence length; if the desired length is not in that range, c++.
• 4. For each word in the target phrase:
  • (a) If the syllable pattern does not match, c++.
  • (b) If at the end of a line:
    • i. If the line ends mid-word, c++.
    • ii. Let x be the rhyme-scheme letter of the line.
    • iii. If x is in S′, check whether the word associated with x rhymes with the current word; if not, c++.
    • iv. Remove x and its associated word from S′.
    • v. If x occurs later in the rhyme scheme, add x with the current word to S′.
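The algorithm can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors’ code: each target word comes with its stress string, the desired form is given as a stress pattern per line plus one rhyme letter per line, the remaining-length bounds of step 3 come from a caller-supplied function, and `rhymes` is a hypothetical placeholder that merely compares the last two letters (a real system would compare pronunciations). `PoemState` and `score_phrase` are hypothetical names.

```python
from dataclasses import dataclass
from itertools import accumulate
from typing import Callable, FrozenSet, List, Tuple

Word = Tuple[str, str]  # (surface form, stress string such as "01")


@dataclass(frozen=True)
class PoemState:
    length: int = 0                                 # syllables emitted so far
    covered: FrozenSet[int] = frozenset()           # covered source positions
    open_rhymes: Tuple[Tuple[str, str], ...] = ()   # (letter, word) awaiting a rhyme


def rhymes(a: str, b: str) -> bool:
    # Hypothetical placeholder: compares spelling, not pronunciation.
    return a != b and a[-2:] == b[-2:]


def score_phrase(state: PoemState, src_span: Tuple[int, int], target: List[Word],
                 meter_lines: List[str], scheme: str,
                 remaining_bounds: Callable[[FrozenSet[int]], Tuple[int, int]]
                 ) -> Tuple[PoemState, int]:
    """Return (S', c) for extending state S with one phrase pair."""
    meter = "".join(meter_lines)
    line_ends = list(accumulate(len(m) for m in meter_lines))
    cost = 0                                                   # step 1
    start = state.length
    phrase_len = sum(len(stress) for _, stress in target)
    covered = state.covered | frozenset(range(*src_span))     # step 2
    lo, hi = remaining_bounds(covered)                        # step 3
    if not start + phrase_len + lo <= len(meter) <= start + phrase_len + hi:
        cost += 1
    open_rhymes = dict(state.open_rhymes)
    pos = start
    for word, stress in target:                               # step 4
        end = pos + len(stress)
        if meter[pos:end] != stress:                          # (a)
            cost += 1
        for line, boundary in enumerate(line_ends):           # (b)
            if pos < boundary <= end:                         # a line ends here
                if boundary < end:                            # i. mid-word
                    cost += 1
                x = scheme[line]                              # ii.
                if x in open_rhymes and not rhymes(open_rhymes[x], word):
                    cost += 1                                 # iii.
                open_rhymes.pop(x, None)                      # iv.
                if x in scheme[line + 1:]:                    # v.
                    open_rhymes[x] = word
        pos = end
    new_state = PoemState(start + phrase_len, covered,
                          tuple(sorted(open_rhymes.items())))
    return new_state, cost
```

For a couplet with scheme "aa" and meter "01" on each line, emitting “the cat” and then “a hat” costs 0, whereas ending the second line with “dog” adds 1 for the failed rhyme.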
Results • There is no objective measure of “poetic” quality. • Test: the percentage of sentences that can be translated while maintaining the stress pattern, and the impact of this constraint on the BLEU score. • The baseline BLEU score is 35.33; the stress-pattern-constrained system scores 18.93. • If no stress errors are allowed, 85% of the sentences were matched. • If one stress error is allowed, 92% were matched.