METIS II: Statistical Machine Translation using Monolingual Corpora (FP6-IST-003768)

Profile
METIS II, the continuation of the successful assessment project METIS I, is an IST Programme project with a 3-year duration (01/10/2004 – 30/09/2007). The METIS II consortium comprises the following partners:
• Institute for Language & Speech Processing [ILSP] (co-ordinator)
• Katholieke Universiteit Leuven [KUL]
• Gesellschaft zur Förderung der Angewandten Informationsforschung [GFAI]
• Universitat Pompeu Fabra [UPF]

The METIS Approach
METIS II is a hybrid system, combining various approaches to machine translation (rule-based, statistical and pattern-matching techniques). It makes use of readily available resources, such as bilingual dictionaries or basic NLP tools, and it can be easily customised to handle different source language (SL) and target language (TL) tags. Most importantly, however, METIS II is innovative because it does not need bilingual corpora for the translation process, but relies exclusively on monolingual TL corpora.
METIS II handles sequences at both sentence and sub-sentential level, thereby exploiting the recursive property of natural language.
METIS II employs a series of weights, i.e. system parameters, in various phases of the translation process. Weights are associated with system resources and employed by the pattern-matching algorithm; they can be automatically adjusted to customise system performance.
Four language pairs have been developed so far, namely Greek, Dutch, German and Spanish to English.

Evaluation

Evaluation Setup
For the system evaluation, an experimental corpus extracted from real texts, mainly from newspapers, was used. It consisted of 200 sentences, 50 per language pair. The test sentences were relatively complex, containing one to two clauses each, and covered various syntactic phenomena such as word-order variation, NP structure, negation and modification. The reference translations were restricted to three and were produced by humans; the BLEU and NIST metrics were used for the evaluation.

Evaluation Results

Greek Results
Fig. 1: Comparative analysis of the score ranges obtained for METIS II and SYSTRAN using the BLEU metric
Fig. 2: Comparative analysis of the score ranges obtained for METIS II and SYSTRAN using the NIST metric

Dutch Results
Fig. 3: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the BLEU metric
Fig. 4: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the NIST metric

German Results
Fig. 5: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the BLEU metric
Fig. 6: Comparative analysis of the scores obtained for METIS II and SYSTRAN using the NIST metric

Spanish Results
Fig. 7: Comparative analysis of the scores obtained for different settings of METIS II and SYSTRAN using the BLEU metric
Fig. 8: Comparative analysis of the scores obtained for different settings of METIS II and SYSTRAN using the NIST metric

Future Work
Future work involves further investigation of the METIS II system architecture. More specifically, work towards system optimisation includes the following:
• Further system testing with a large number of test suites that have more elaborate structures and deal with a wider range of phenomena
• Algorithm optimisation in terms of accuracy
• Automatic fine-tuning of weights
• Implementation of a post-editor module

METIS II Architecture
Web Interface. The end user selects the preferred SL and enters the text to be translated.
NLP. NLP tools handle the SL input, yielding an SL sequence annotated with grammatical and syntactic information.
Lexicon Lookup. The SL sequence is enriched with translation equivalents and PoS information, so that it resembles a TL pattern.
Core Engine. The core engine of the METIS II system is fed a sequence of TL-like patterns, which are handled by the pattern-matching algorithm. It proceeds in two stages, involving wider and narrower contexts, and generates a TL sequence.
Token Generation. The token generation module receives as input a sequence of translated lemmas and their respective tags; it is responsible for producing tokens out of lemmas.
[Architecture diagram; labels: Database Server, Lexicon, Weights, BNC Clauses, BNC Chunks, Token Generation Rules, Final Translation]
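
The architecture stages named on this poster (NLP, lexicon lookup, core engine, token generation) can be illustrated with a toy Python sketch. This is not the METIS II implementation: every table, tag name and function below is a hypothetical stand-in, the three-sentence "corpus" merely plays the role of the monolingual TL corpus, and the core engine is reduced to counting lemma bigrams shared with that corpus rather than the two-stage, weighted pattern matching the poster describes.

```python
from itertools import product

# Hypothetical SL (Spanish) analysis table: surface form -> (lemma, PoS tag).
SL_ANALYSIS = {
    "los": ("el", "DET"),
    "perros": ("perro", "NOUN-PL"),
    "ladran": ("ladrar", "VERB-3PL"),
}

# Hypothetical bilingual lexicon: SL lemma -> candidate TL lemmas.
LEXICON = {
    "el": ["the"],
    "perro": ["dog", "hound"],
    "ladrar": ["bark", "yap"],
}

# Toy monolingual TL corpus of lemma sequences (assumption: stands in for the BNC).
TL_CORPUS = [
    ["the", "dog", "bark"],
    ["the", "dog", "sleep"],
    ["the", "hound", "run"],
]


def nlp(text):
    """NLP stage: annotate each SL word with its lemma and PoS tag."""
    return [SL_ANALYSIS[word] for word in text.lower().split()]


def lexicon_lookup(annotated):
    """Lexicon lookup stage: attach TL lemma candidates, keeping the tag."""
    return [(LEXICON[lemma], tag) for lemma, tag in annotated]


def core_engine(pattern):
    """Simplified core engine: choose the candidate sequence that shares
    the most lemma bigrams with the monolingual TL corpus."""
    seen = {(s[i], s[i + 1]) for s in TL_CORPUS for i in range(len(s) - 1)}

    def score(seq):
        return sum((a, b) in seen for a, b in zip(seq, seq[1:]))

    tags = [tag for _, tag in pattern]
    best = max(product(*(cands for cands, _ in pattern)), key=score)
    return list(zip(best, tags))


def generate_tokens(lemmas_with_tags):
    """Token generation stage: turn (lemma, tag) pairs into tokens (toy rule:
    plural nouns take "s"; everything else keeps its lemma form)."""
    tokens = [lemma + "s" if tag == "NOUN-PL" else lemma
              for lemma, tag in lemmas_with_tags]
    return " ".join(tokens)


def translate(text):
    """Chain the four stages in the order the poster lists them."""
    return generate_tokens(core_engine(lexicon_lookup(nlp(text))))


print(translate("Los perros ladran"))  # the dogs bark
```

In this sketch "dog" wins over "hound" and "bark" over "yap" only because the bigrams "the dog" and "dog bark" occur in the toy TL corpus; no bilingual corpus is consulted, which is the point the poster makes about relying on monolingual TL data.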