This paper explores the automation of subtitling through human language technologies, integrating speech and language technologies in system architectures that simulate the underlying cognitive processes. It discusses challenges such as agreement between audio, text, and image, contextual constraints, and the fact that subtitle text is oral rather than written in character. The experiments cover monolingual and multilingual subtitle generation for English, French, and Greek. The resources and components required for subtitling, such as speech recognition and translation, are detailed, and the processes of compression, alignment, and translation memory are explained along with subtitle editing requirements. Evaluation results show room for improvement in automated speech recognition for subtitling.
Multimodal multilingual information processing for automatic subtitle generation: Resources, Methods and System Architecture (MUSA) • S. Piperidis, I. Demiros, P. Prokopidis • {spip, iason, prokopis}@ilsp.gr • Languages & The Media, 4 Nov 2004, Berlin
Objectives • explore the degree to which subtitling can be automated by using the appropriate technologies • focus on human language technologies • explore the degree to which speech and language technologies can be integrated • try out system architectures simulating the underlying cognitive processes
Challenges of Subtitling • the challenge in automated generation is that there must be agreement between subtitles, the spoken source language and the corresponding image • generated subtitles must meet a set of constraints imposed by the visual context of the text and spatio-temporal factors • subtitle text is no longer normal written text but rather oral text
Experiments in MUSA • experiments on monolingual and multilingual subtitle generation • Languages: English (source & target), French & Greek (target) • Technologies used • English ASR component for the transcription of audio streams into text • Subtitling component producing English subtitles from English audio transcriptions • Translation component integrating machine translation and translation memory, for EN-FR & EN-EL
Architecture
Resources for subtitling • in order to train and evaluate system components, an array of application-specific resources is needed • primary audiovisual data from BBC World Service, documentaries and “newsy” current affairs • for each programme, the following parallel data are sourced • the actual video of the programme • its script or hand-made transcript • English, Greek and French subtitles • topically relevant newspaper and web-sourced texts
Resources overview
Speech recognition component • Use of a parallel corpus of BBC programmes, audio and hand-made transcripts, as well as topically relevant newspaper texts • Tuning of the acoustic and language models of the KUL/ESAT recogniser • Background noise & non-native speech hinder the process • Aligning audio with hand-made transcripts proved a workable solution, helping overcome the noise and non-native speaker problems
Speech recognition component (2)
Constraints & Requirements • subtitling conventions in various EU countries • constraints entail that compression of transcripts’ segments is required • compression rate expressed in # of words and # of chars to delete
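The word/character deletion budget mentioned above can be illustrated with a small calculation (the 0.7 retention ratio is an assumed example value, not a figure from the slides):

```python
# Illustrative: express a target compression rate as the number of
# characters and words to delete from a transcript segment.
# The 0.7 retention ratio is an assumed example value.

def deletion_budget(segment: str, retain_ratio: float = 0.7):
    """Return (# of words to delete, # of chars to delete) for a segment."""
    words = segment.split()
    chars_to_delete = max(0, len(segment) - int(len(segment) * retain_ratio))
    words_to_delete = max(0, len(words) - int(len(words) * retain_ratio))
    return words_to_delete, chars_to_delete

print(deletion_budget("the minister said that the talks would continue tomorrow"))
```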
Subtitling engine & resources • Use of a parallel corpus of BBC programs featuring hand-validated program transcripts and their hand-made subtitles • Align sentences and words in the parallel corpus • Extract a table of paraphrases to aid compression • Examples • Within the next few years -> Soon • During the years when -> While • It was clear that -> Clearly
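A paraphrase table of this kind lends itself to greedy substitution; the sketch below is a hypothetical minimal illustration (the table entries mirror the slide's examples, but MUSA's actual table was extracted from the aligned corpus and its application logic is not described here):

```python
# Illustrative sketch of paraphrase-based subtitle compression.
# The paraphrase table is hypothetical; in MUSA it was extracted from a
# word/sentence-aligned corpus of transcripts and subtitles.

PARAPHRASES = {
    "within the next few years": "soon",
    "during the years when": "while",
    "it was clear that": "clearly",
}

def compress(segment: str, target_chars: int) -> str:
    """Apply paraphrase substitutions until the segment fits target_chars."""
    text = segment
    # Try the most space-saving paraphrases first.
    for long_form, short_form in sorted(
        PARAPHRASES.items(), key=lambda kv: len(kv[0]) - len(kv[1]), reverse=True
    ):
        if len(text) <= target_chars:
            break  # target compression rate already reached
        text = text.replace(long_form, short_form)
    return text

print(compress("it was clear that the plan would fail", 25))
```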
Subtitling engine & resources (2) • If the compression rate is not reached by paraphrasing, apply syntactic rules to delete low-importance units (e.g. adverbs, adjectives, etc.) • Hand-crafted deletion rules making use of • a shallow parse of the segments • surprise values for each word, computed on the basis of a large text corpus • If more deletable segments than necessary exist, delete the least important segments first
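One common way to operationalise "surprise values" is corpus-based surprisal, dropping the most predictable deletable words first; the sketch below assumes that reading (the toy corpus and the deletable-word set are placeholders, and MUSA's hand-crafted rules over a shallow parse are not reproduced):

```python
import math
from collections import Counter

# Hypothetical sketch: rank words by surprisal (-log2 relative frequency)
# estimated from a reference corpus, then drop the least surprising
# (most predictable) deletable words first until the word budget fits.
# The corpus and the deletable-word set are placeholders; MUSA used
# hand-crafted deletion rules over a shallow parse.

CORPUS = "the the the a a very very really quite plan plan fail".split()
COUNTS = Counter(CORPUS)
TOTAL = sum(COUNTS.values())

def surprisal(word: str) -> float:
    # Unseen words get a small pseudo-count so they rank as highly surprising.
    return -math.log2(COUNTS.get(word, 0.5) / TOTAL)

def shorten(words: list, deletable: set, max_words: int) -> list:
    """Delete the least surprising deletable words until max_words is met."""
    kept = list(words)
    for w in sorted((w for w in words if w in deletable), key=surprisal):
        if len(kept) <= max_words:
            break
        kept.remove(w)
    return kept

print(shorten("the plan will very really fail".split(), {"very", "really"}, 4))
```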
Translation component • integrate TM (Tr•AID) and MT (Systran) • align EN hand-made subtitles with FR and EL hand-made subtitles • build a translation memory database (high % of unique translation units, not unexpected) • perform term extraction on the parallel corpus • hand-validate automatically extracted terms and use them for translation customisation purposes
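The TM-plus-MT combination can be sketched roughly as a memory-first lookup with a machine-translation fallback (`mt_translate` is a stand-in stub, not the Systran API, and the memory entries are invented):

```python
# Minimal sketch of translation-memory lookup with MT fallback.
# The memory entries are invented, and mt_translate is a stand-in stub;
# MUSA combined the Tr•AID TM with Systran MT.

def mt_translate(segment: str) -> str:
    """Stand-in for a call to a machine translation engine."""
    return f"<MT:{segment}>"

def translate(segment: str, memory: dict) -> str:
    # An exact match in the translation memory takes priority;
    # otherwise fall back to machine translation.
    return memory.get(segment, mt_translate(segment))

tm = {"Good evening.": "Bonsoir."}
print(translate("Good evening.", tm))  # TM hit
print(translate("The weather.", tm))   # MT fallback
```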
Subtitle editing • responsible for textual operations, tokenisation and subtitle text splitting, calculation of cue-in/cue-out timecodes • requirement: subtitled text should be segmented at the highest syntactic nodes possible • hand-crafted rules, e.g. “cut after punctuation”, “cut after personal pronouns following a verb phrase” • for EN, use of available shallow parse information • for FR and EL, use of part-of-speech information did not produce worse results
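The “cut after punctuation” rule and the cue-in/cue-out timecode calculation might be sketched as follows (the 37-character line limit and the 12 characters-per-second reading speed are illustrative assumptions, not MUSA's actual settings):

```python
# Hedged sketch of two editing steps: split subtitle text, preferring a
# cut after punctuation, and derive cue-in/cue-out times from a
# reading-speed assumption. The line limit and reading speed below are
# illustrative values, not MUSA's settings.

MAX_CHARS = 37
READING_SPEED_CPS = 12.0

def split_subtitle(text: str) -> list:
    """Greedy split that prefers cutting after punctuation marks."""
    chunks, rest = [], text.strip()
    while len(rest) > MAX_CHARS:
        window = rest[:MAX_CHARS]
        # Prefer the last punctuation cut point, else the last space.
        cut = max(window.rfind(p) for p in ",.;:!?")
        if cut <= 0:
            cut = window.rfind(" ")
        chunks.append(rest[: cut + 1].strip())
        rest = rest[cut + 1:].strip()
    chunks.append(rest)
    return chunks

def cue_times(chunks: list, start: float) -> list:
    """Assign cue-in/cue-out times from chunk length and reading speed."""
    cues, t = [], start
    for chunk in chunks:
        duration = len(chunk) / READING_SPEED_CPS
        cues.append((round(t, 2), round(t + duration, 2)))
        t += duration
    return cues

lines = split_subtitle("After the vote, ministers met in Brussels to discuss the budget.")
print(lines)
print(cue_times(lines, 10.0))
```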
Evaluation • so far, relatively poor ASR results for subtitling • alignment mode of ASR yielded >97% accuracy • grammaticality and semantic acceptability of subtitles with targeted compression reached >70% • acceptability of translated subtitles in the range of 45%-55% • evaluation of integrated prototype very encouraging, entailing considerable productivity gains
The MUSA prototype Musa_EN_Demo.asx Musa_FR_Demo.asx Musa_EL_Demo.asx
Conclusions • human subtitling is an extremely complex process • a simplified computational model is feasible • an architecture for a multilingual subtitling system is implementable • useful arrays of resources can be sourced and processed at different levels, yielding useful derivative resources
What’s next for today • the session eTools and Translation II, after the break, is dedicated to MUSA • the MUSA team will be around, available for demonstrations of the system and further discussions • MUSA on the web: http://sifnos.ilsp.gr/musa