IKTA5-146/2002: Development of an Intelligent Translation Memory • MorphoLogic, http://www.morphologic.hu • SZAK Publishers, http://www.szak.hu • Balázs Kis (kis@morphologic.hu) • Rome, 21 May 2003
Project Details • Duration: 3 March 2003 – 25 February 2005 • Budget: total 96.8 M HUF [387,200 €]; funding 57.1 M HUF [228,400 €] • Consortium: MorphoLogic Ltd. (84%), SZAK Publishers Ltd. (16%) • Project leader: Dr. Gábor Prószéky
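As a quick consistency check on the budget figures (a sketch added for illustration, not part of the slides), both euro amounts correspond to a conversion rate of 250 HUF per euro, and the funding covers about 59% of the total budget:

```python
# Budget figures as given on the Project Details slide.
total_huf = 96_800_000
funding_huf = 57_100_000
total_eur = 387_200
funding_eur = 228_400

# Implied HUF/EUR conversion rates (derived here, not stated on the slide).
rate_total = total_huf / total_eur
rate_funding = funding_huf / funding_eur

# Share of the total budget covered by the funding.
funding_share = funding_huf / total_huf

print(f"{rate_total:.0f} HUF/EUR, {rate_funding:.0f} HUF/EUR, funding {funding_share:.0%}")
# prints: 250 HUF/EUR, 250 HUF/EUR, funding 59%
```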
The Problem and Its Impact (1) • Current state-of-the-art translation memories • store previously translated source segments together with their translations • look up similar source segments using character-based fuzzy indexes • Advantage: • this approach is language-independent and inexpensive to develop and support
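The character-based retrieval scheme described above can be sketched as follows. This is a minimal, hypothetical illustration, not the internals of any commercial TM: a character-trigram index (the `char_trigrams` helper and the `FuzzyTM` class are invented for this example) narrows down the candidate segments, Python's `difflib.SequenceMatcher.ratio()` stands in for the fuzzy score, and the threshold is set to the 70% default cited in the sample match. The query segments are illustrative.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def char_trigrams(text):
    """Set of character trigrams; padding lets very short words yield trigrams too."""
    padded = f"  {text.lower()} "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


class FuzzyTM:
    """Toy character-based translation memory (hypothetical, for illustration)."""

    def __init__(self, threshold=0.70):
        self.threshold = threshold
        self.segments = []              # list of (source, target) pairs
        self.index = defaultdict(set)   # trigram -> ids of segments containing it

    def add(self, source, target):
        seg_id = len(self.segments)
        self.segments.append((source, target))
        for gram in char_trigrams(source):
            self.index[gram].add(seg_id)

    def lookup(self, query):
        """Return (score, source, target) of the best match at or above the threshold, else None."""
        candidates = set()
        for gram in char_trigrams(query):
            candidates |= self.index.get(gram, set())
        best = None
        for seg_id in sorted(candidates):
            source, target = self.segments[seg_id]
            score = SequenceMatcher(None, query, source).ratio()
            if score >= self.threshold and (best is None or score > best[0]):
                best = (score, source, target)
        return best


tm = FuzzyTM()
tm.add("FrontPage opens the current page in Page view.",
       "A FrontPage az aktuális oldalt a Page nézetben nyitja meg.")

# One changed word: most characters still match, so the stored pair is returned.
near_hit = tm.lookup("FrontPage opens the current page in Normal view.")
# Same structure, different words: the character score stays below 70%.
structural_miss = tm.lookup("Word opens the second file in Print Layout view.")
print(near_hit)
print(structural_miss)
```

Because the score is computed on characters alone, a segment that differs in a single word scores high, while a segment with the same structure but different words is never offered to the translator.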
The Problem and Its Impact (2) • Disadvantages of current TM technologies • they ignore relationships between syntactic structures; therefore • long segments, and segments with similar meaning or syntactic structure, often go unretrieved; as a result • many segments stored in the translation memory are never reused
Before the Project Started • MorphoLogic had at hand: • Human Language Technology (HLT) modules covering morphology and every level of syntactic parsing • a localisation department with very specific technological needs (not yet met) • SZAK Publishers had at hand: • many years of experience with translation and terminology • a parallel corpus of technical texts of approx. 1.5 million words (currently being processed for the project)
Main Objective • Development of a Translation Memory equipped with Linguistic Intelligence: • finding source segments based on their grammatical similarity • adapting stored translations to the current source segment • Long-term objective: • better translation quality and less translation effort (time)
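The second objective, adapting a stored translation to the current source segment, can be illustrated with a deliberately tiny sketch. It assumes the stored pair "FrontPage opens the current page in Page view." / "A FrontPage az aktuális oldalt a Page nézetben nyitja meg." has already been turned into a source pattern with slots and a matching Hungarian target template. That step is done by hand here, and the pattern, the `GLOSSARY` entries and the `adapt` function are all hypothetical. The sketch also shows one reason adaptation must be language-aware: the Hungarian definite article is "az" before a vowel and "a" before a consonant, so replacing a phrase can change the article in front of it.

```python
import re

# Hypothetical source pattern derived by hand from the stored segment;
# the named groups are the slots that may differ in a new segment.
SOURCE_PATTERN = re.compile(r"^(?P<app>\S+) opens the (?P<obj>.+?) in (?P<view>.+) view\.$")

# Hypothetical mini-glossary: English noun phrases mapped to Hungarian
# noun phrases already inflected for the accusative case.
GLOSSARY = {
    "current page": "aktuális oldalt",
    "second file": "második fájlt",
}

HU_VOWELS = set("aáeéiíoóöőuúüű")


def hu_article(word):
    """Hungarian definite article chosen from spelling alone (a simplification)."""
    return "az" if word[:1].lower() in HU_VOWELS else "a"


def adapt(current_source):
    """Fill the hypothetical target template for a segment that fits the stored structure."""
    m = SOURCE_PATTERN.match(current_source)
    if m is None:
        return None  # the segment does not share the stored structure
    app, view = m["app"], m["view"]
    obj = GLOSSARY.get(m["obj"])
    if obj is None:
        return None  # no translation known for the changed phrase
    # Product and view names stay untranslated, as in the stored translation.
    return (f"{hu_article(app).capitalize()} {app} {hu_article(obj)} {obj} "
            f"{hu_article(view)} {view} nézetben nyitja meg.")


adapted = adapt("Word opens the second file in Print Layout view.")
print(adapted)
```

Applied to the stored source segment itself, the template reproduces the stored translation, including "az" before "aktuális"; for the new segment the article switches to "a" before "második". A real system would derive the structure and the inflected forms from morphological and syntactic analysis rather than from a hard-coded template.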
Project Constraints • An important remark: • This will be a language-dependent translation memory (linguistic intelligence requires language-specific HLT modules) • First phase: English and Hungarian HLT modules
Project Contents • The result is an integrated CAT (Computer-Assisted Translation) tool • The tool consists of: • a terminology management module (already available) • a text alignment program • a translation memory
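For the text alignment program, a common baseline is length-based sentence alignment. The sketch below is a much-simplified dynamic program in the spirit of Gale and Church's length-based method, not the project's aligner: the `align` function, its penalties and its cost formula are assumptions chosen for illustration, and the second sentence pair in the example is invented.

```python
def align(src, tgt, skip_penalty=1.0, merge_penalty=0.3):
    """Align two lists of sentences with a simplified length-based dynamic program.

    Allowed beads: 1-1, 1-0, 0-1, 2-1 and 1-2 sentences.
    Returns a list of (source_sentences, target_sentences) pairs.
    """
    n, m = len(src), len(tgt)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    beads = [(1, 1, 0.0), (1, 0, skip_penalty), (0, 1, skip_penalty),
             (2, 1, merge_penalty), (1, 2, merge_penalty)]
    # Cells are visited in row-major order, so each cell's cost is final
    # before it is used to extend a path.
    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for di, dj, penalty in beads:
                ni, nj = i + di, j + dj
                if ni > n or nj > m:
                    continue
                step = penalty
                if di and dj:
                    ls = sum(len(s) for s in src[i:ni])
                    lt = sum(len(t) for t in tgt[j:nj])
                    # Relative length mismatch: 0 for equal lengths, close to 1 for very different ones.
                    step += abs(ls - lt) / max(ls + lt, 1)
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)
    # Follow the back-pointers from the end to recover the beads.
    pairs = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        pairs.append((src[pi:i], tgt[pj:j]))
        i, j = pi, pj
    pairs.reverse()
    return pairs


src = ["FrontPage opens the current page in Page view.", "Click OK."]
tgt = ["A FrontPage az aktuális oldalt a Page nézetben nyitja meg.", "Kattintson az OK gombra."]
pairs = align(src, tgt)
for s, t in pairs:
    print(s, "=>", t)
```

Length alone is a weak signal; in a CAT tool that already has a terminology module, glossary hits between candidate sentences could be added to the cost as a further clue.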
Project Phases • Planning and Specification (completed) • Corpus Building • Core Research Phase: Development of the Grammatical Proximity Search and Translation Correction Modules • Implementation of the Database Engine • Integration and Test Translation
Grammatical Proximity Search • Research on Non-Exact Matching of Phrases and Sentences (this is not fuzzy matching!) • A procedure for matching grammatical structures normalized by means of syntactic and semantic features • Critical evaluation of some "traditional" procedures • Research on Adapting Stored Translations to the Current Source Segment
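The idea of matching normalized structures rather than raw characters can be sketched as follows. Real normalization would come from morphological and syntactic analysis; here a hypothetical hand-written `LEXICON` maps content words to coarse classes, and the resulting skeletons are compared token by token with `difflib.SequenceMatcher`. This illustrates the principle only; it is not the procedure developed in the project.

```python
import re
from difflib import SequenceMatcher

# Hypothetical feature lexicon standing in for real morphological and
# syntactic analysis: each content word is replaced by a coarse class.
LEXICON = {
    "FrontPage": "PRODUCT", "Word": "PRODUCT",
    "current": "ADJ", "second": "ADJ",
    "page": "NOUN", "file": "NOUN",
    "Page": "NAME", "Print": "NAME", "Layout": "NAME",
}

STORED = "FrontPage opens the current page in Page view."
CURRENT = "Word opens the second file in Print Layout view."


def skeleton(segment):
    """Normalize a segment to a list of word classes and lower-cased function words."""
    tokens = re.findall(r"\w+|[^\w\s]", segment)
    result = []
    for tok in tokens:
        label = LEXICON.get(tok, tok.lower())
        # A multi-word name such as "Print Layout" counts as a single NAME.
        if label == "NAME" and result and result[-1] == "NAME":
            continue
        result.append(label)
    return result


def structural_similarity(a, b):
    """Token-level similarity of the two skeletons, between 0.0 and 1.0."""
    return SequenceMatcher(None, skeleton(a), skeleton(b)).ratio()


print(skeleton(STORED))
print(skeleton(CURRENT))
print(structural_similarity(STORED, CURRENT))
```

Both segments reduce to the same skeleton (PRODUCT opens the ADJ NOUN in NAME view .), so their structural similarity is 1.0 even though they share few characters.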
A Sample Match • Stored source segment: FrontPage opens the current page in Page view. • Stored translation: A FrontPage az aktuális oldalt a Page nézetben nyitja meg. • Current source segment (recognized as a match): Word opens the second file in Print Layout view. • Adapted translation: A Word a második fájlt a Print Layout nézetben nyitja meg. • Traditional TMs do not find a match at the default 70% threshold!
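The 70% claim can be checked against a simple character-based score. This sketch uses Python's `difflib.SequenceMatcher.ratio()` as a stand-in for a traditional fuzzy-match score; that is an assumption, since commercial TMs compute their own scores, which may differ.

```python
from difflib import SequenceMatcher

stored = "FrontPage opens the current page in Page view."
current = "Word opens the second file in Print Layout view."

# Character-level similarity as a rough stand-in for a traditional fuzzy score.
score = SequenceMatcher(None, current, stored).ratio()
print(f"similarity: {score:.0%}")
print("match" if score >= 0.70 else "no match at the 70% threshold")
```

For this pair the score stays below 0.70: only a few stretches, such as " opens the " and " view.", are shared character for character, although the two sentences have the same grammatical structure.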
Expected Results • Experiments start in autumn 2003 • First test version: end of 2003
Further Steps • Promoting the tool in Hungary and abroad • Improving services based on user feedback • Adding further language pairs