ICT 2008, 26 Nov 08 • Roberto Cencioni, Kimmo Rossi • Multilingual Web: Theme 5 of the ICT-PSP Workprogramme • DG Information Society and Media, Unit INFSO.E1, Language Technologies & Machine Translation • infso-e1@ec.europa.eu
Baseline • Why? • new online paradigms centred around communication, collaboration & co-creation … but significant language barriers remain • the EU comprises 27 countries & 23 official languages • a single European Information Space is one of the i2010 objectives • the EC Communication on Multilingualism (Sept ’08) calls for a broader policy framework & joint action • Purpose: support & enhance • interpersonal & business communication • information access & publishing across languages
A few facts • EU official languages: 23 x 22 = 506 language pairs (each translation direction counted separately) • EC MT (Systran core engine) has 18 pairs in operation & 10 more pairs at prototype stage • 60+ national, regional & minority languages within the EU • English accounts for 30% of today’s Web content • 50% in 2000, 35% in 2004 • Arabic, Chinese, Portuguese … growing very fast • nearly 1.5 billion internet users worldwide (2008) • c. 320 million native EN speakers in the world • basic requirements for the “digital translation market”: • volume • access • personalisation • real quick, real cheap
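The 506 figure counts directed pairs: each of the 23 languages can be translated into each of the other 22, and EN→FR is a different system from FR→EN. A minimal sketch of that count, assuming the 23 official languages of 2008 written as ISO 639-1 codes (the list and the helper name `directed_pairs` are illustrative, not taken from the slides), also shows how small a share the 18 operational EC MT pairs represent:

```python
from itertools import permutations

# The 23 EU official languages as of 2008, as ISO 639-1 codes (illustrative list).
EU_LANGUAGES = ["bg", "cs", "da", "de", "el", "en", "es", "et", "fi", "fr",
                "ga", "hu", "it", "lt", "lv", "mt", "nl", "pl", "pt", "ro",
                "sk", "sl", "sv"]

def directed_pairs(languages):
    """Return every (source, target) pair with source != target."""
    return list(permutations(languages, 2))

pairs = directed_pairs(EU_LANGUAGES)
operational = 18  # EC MT pairs in operation, per the slide
coverage = 100 * operational / len(pairs)

print(len(EU_LANGUAGES), len(pairs))  # 23 506
print(f"{coverage:.1f}%")             # share of directed pairs covered
```

The count n × (n − 1) grows quadratically, which is why pair-by-pair system building does not scale as languages are added.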
Here we are • a new unit established in July 2008 • Language Technologies & Machine Translation (INFSO.E1) • high expectations vs. a low level of EC S&T activity in the last few years • language is everywhere • written & spoken; documents, messages, databases, webpages, multimedia objects etc.; information as well as meta-information • but our resources are limited, so the initial focus is on • multilingual technologies, services, applications • two instruments in 2009: • Research: FP7 ICT, call 4 • Objective 2.2 – Language based Interaction • Innovation: CIP ICT-PSP, call 3 • Theme 5 – Multilingual Web • total budget of 40 Meuro
Research vs. Innovation: division of labour • from • long-term foundational research (FP7) • through • applied research & technology development (FP7) • to • integration & demonstration (FP7 + PSP) • infrastructure & resources (FP7 + PSP) • different scale of ambition (€) • different degree of maturity (from technology to service) • different timescales & partnerships
LT Days • 14-15 January 2009, Luxembourg, JMO conference complex • EC presentations, sessions with external speakers, proposal clinics, self-presentations & posters • Agenda & registration: cordis.europa.eu/fp7/ict/language-technologies/fp7-call4_en.html
Web sources INFSO.E1 website: cordis.europa.eu/fp7/ict/language-technologies/.. • FP7-ICT: ../fp7-call4_en.html • ICT-PSP: ../cip-psp_en.html • Events & Presentations • Call guidance notes • Background material & useful Links …
Pre-proposals & Clinics • 3 pages max, mail to: infso-e1@ec.europa.eu • describe the problem your proposal addresses, in particular • specify the intended user profile and related tasks • describe actual or prospective applications • detail data sets: source(s), typology, volume • how will the proposed project contribute to the outcomes and impacts set out in the work programme? • what are the key innovations? • what will be the main concrete results? • what public outputs are foreseen? • what impact do you expect? • describe the consortium • give partners' names or profiles and the intended skills mix • indicate the intended instrument (if known) • indicate the scale of your ambition • what is the estimated effort (man-months)? • how long will the proposed project last? • what amount of EU funding are you looking for?
ICT-PSP Call Overview
ICT-PSP Call 3, ~Feb 09 • ICT Policy Support Programme (PSP) within the Competitiveness & Innovation Framework Programme (CIP) (adopted in October 2006) • geared towards innovation & ICT uptake: • developing the Single European Information Space • strengthening the internal market for ICT products and services and ICT-based products and services • stimulating innovation through the wider adoption of and investment in ICT • ensuring seamless access to ICT-based services • improving the conditions for the development of digital content, taking into account multilingualism & cultural diversity • takes over eContentplus activities from Jan 2009
“Europe’s language is Translation” • translation & interpretation market (excl. in-house): • c. $15 billion; €1.1 billion for EU institutions alone (2006) • est. 300,000 full-time salaried translators worldwide (37% in Europe) • market fragmentation • even the big players have < 1,000 employees • the top EU-based translation company posted revenue of $175 million in 2006 • a good European base • SDL, Star, RWS, XRX, Euroscript, Logos, Moravia, VistaTEC, Semantix … • ESTeam, Lucy Software … • a largely untapped potential • 4x according to some companies
Business world • new models: Most companies follow the age-old translate-edit-proofread model of translation. Collaborative, web-based technologies allow translation to become more agile, faster, and better with fewer steps (CSA Inc.) • new markets: Language Weaver is entering the three new strategic markets – Web Content, Business Intelligence and Customer Care – to provide high-volume, high-speed, and accurate automated translation solutions at a price that would have been unfathomable just a few years ago • new approaches: If you don't see your native language here, you can help Google create it by becoming a volunteer translator. Check out our Google in Your Language program • and then of course: Unfortunately for Google as a person with 7 years of translation experience myself I can tell that you will hardly ever find a translator who will agree that machine translation can be useful for anything. (a Russian translator)
ICT-PSP Call 3, Theme 5: Multilingual Web • 3 objectives: • machine translation for the multilingual Web (pilot projects) • multilingual Web content management (pilot projects) • best practices & standards for the multilingual Web (thematic network) • 14 Meuro in total, around 6 projects • “The duration of the pilot is expected to be 24 to 36 months within which there should be a 12-month operational phase.”
ICT-PSP Call 3, Theme 5: Multilingual Web • research: no, at least not ICT research … • development/engineering: • configuration, optimisation, customisation, integration … of existing (state-of-the-art) methods, tools & services with a view to defining new approaches, offerings & practices • demonstration: • innovative combination is key; new business models, processes & services, organisational setups, usability … • evaluation along user, technical & (socio-)economic dimensions • problem orientation: • useful & usable although possibly not perfect; think ROI
Scope & defs • MT as defined in the ICT-PSP workprogramme encompasses: 1. fully automatic machine translation, whatever the technology 2. interactive computer-aided translation (e.g. TM) 3. a suitable combination of 1. and/or 2. with web-based • human translation, proof-reading & post-editing, incl. where relevant methods inspired by social networks • workflow & content management systems, … • an innovative & effective combination of people, processes & technology; the end result is not science, but rather • more and/or better output • save time • cut cost • emphasis on language transfer, from source language to target language(s) • language input-output (e.g. speech-to-text) is not the focus • cross-platform, multi-format content access/delivery is key
Language coverage • some of the work is expected to be language independent • flexibility & ease of adaptation to other languages are key factors • content authoring & management, collaboration & workflow … are language independent anyway • project outcomes must be validated in 3+ languages • preferably belonging to different linguistic families • target languages are chosen & justified by the proposers bearing in mind the following priorities (from high to low): • EU official languages • nationally recognised languages • regional languages • minority languages • non-EU world languages linked to global markets & exports can be considered as well • on a proposal-by-proposal basis
Cont’d • the project’s language coverage is driven by the need to: • address gaps & overcome barriers, e.g. cross-border communication for less-developed languages, or • exploit opportunities, e.g. address emerging markets & sizeable language communities • impact is key, so: viability, sustainability, exploitation channels, deployment prospects … • main findings must be pro-actively disseminated • some form of public showcase is mandatory • participants should include • private or public sector content owners & aggregators • providers of language services, technology suppliers • (online) communities of interest where relevant • 6-7 partners/project, up to €2.5 million funding, up to 36 months
ICT-PSP Call 3, exp. Feb 09 • 3 intertwined objectives: 5.1 machine translation for the multilingual Web (projects) • information access: MT and other multilingual solutions for information access & use, esp. cross-lingual search & retrieval • information publishing: MT to create, distribute and (re-)use online content more widely & effectively in a multilingual environment 5.3 multilingual Web content management (projects) • communication: multilingual Web content development & management; design, authoring, versioning & maintenance of multilingual Web sites, portals or repositories 5.2 standards & best practices for the multilingual Web (network) • conventions & best practices for multilingual Web content
ICT-PSP, 5.3 multilingual Web content management • methods, techniques, metrics … for developing & managing multilingual web content & services • much more than translation; significant cultural elements • think of • one big website in many languages, or • several interrelated websites, one country/language each • now think of how to maintain the integrity & consistency of such resources, effectively & over a long period of time • and how to detect & repair gaps or inconsistencies • so, beyond the “translation” step (obj 5.1): • design, authoring, versioning & maintenance of (multiple, parallel, interconnected …) websites, portals or repositories • in a distributed collaborative environment, possibly across organisational boundaries • so as to turn a multi-million endeavour into a viable proposition for a much broader range of companies & administrations
ICT-PSP, 5.1 machine translation for the multilingual Web • 5.1 can be seen as a subset & central component of obj 5.3 (its “translation box”) • different usages: • web at large, enterprise, public information repositories … • different users: • teams as well as individuals, engineers as well as analysts, sales & marketing, language professionals, … you & me • different content-rich, information-bound sectors, private & public • quality depends on task & user • from raw translation & “gisting” up to error-free translation • two important conditions: • a widely recognised, well-argued problem; a clearly identified target community • thorough validation in a given domain / for a given task • volume • metrics
ICT-PSP, 5.2 standards & best practices • Thematic network • covers the same broad issues as 5.3 • “the web as THE vehicle for multilingual content & services” • provides a forum for multilateral exchange of experience & consensus building • structure & tasks to be defined by the proposers, indicative list: • bring together a meaningful subset of the main stakeholders, possibly through their own groups & associations • ICT & language industries, content aggregators/distributors, e-services, multinational agencies, industry & de jure standards bodies … • analyse the current situation, identify gaps & bottlenecks; assess market failures if any, specify the technical & non-technical conditions to be met and the respective actors • establish a roadmap (trends, requirements, dependencies …) for further developments in the coming years • stimulate consensus & active involvement/coordination; take part in leading conferences, liaise with primary associations etc. • explore means to promote best practice (conferences, portals, publications, training …) beyond current channels • identify & describe suitable follow-on actions
ICT-PSP Instruments & Funding • pilot B projects: • min. 4 partners from 4 different countries • 50% of eligible direct costs • overhead rate of 30% of personnel costs • thematic networks: • min. 7 partners from 7 different countries • lump sum; for 3 years and 1 coordinator + 10 other participants: • coordinator: 95 Keuro • other participants: 24 Keuro each • ec.europa.eu/information_society/activities/ict_psp/participating/index_en.htm
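For a thematic network the lump sums simply add up: coordinator + 10 × 24 Keuro = 95 + 240 = 335 Keuro for the 1+10 configuration on the slide. A minimal sketch, assuming each figure covers the full 3-year duration and that the total scales linearly with the number of other participants (both assumptions; the helper name `network_lump_sum` is hypothetical):

```python
# Lump-sum figures from the slide, in thousands of euro (Keuro).
# Assumption: each amount covers the whole 3-year network duration.
COORDINATOR_LUMP_SUM_KEUR = 95
PARTICIPANT_LUMP_SUM_KEUR = 24

def network_lump_sum(other_participants: int) -> int:
    """Hypothetical helper: total lump sum (Keuro) for one coordinator
    plus `other_participants` partners, assuming linear scaling."""
    return COORDINATOR_LUMP_SUM_KEUR + other_participants * PARTICIPANT_LUMP_SUM_KEUR

print(network_lump_sum(10))  # 335  (the 1+10 case on the slide)
print(network_lump_sum(6))   # 239  (minimum network: 7 partners in total)
```

Check the call documents before relying on the linear-scaling assumption for consortia of other sizes.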
Practical info • ICT-PSP Theme 5 – Multilingual Web • budget: 14 Meuro under Call 3 • managed by: Unit E1 • Email: infso-e1@ec.europa.eu • EC contact: Mr Kimmo Rossi • inquiries: from the call publication date (~Feb) • pre-proposals: from publication until 3 weeks before the call closing date
Events • Language Technology Days: 14-15 Jan 2009, Luxembourg • ICT-PSP Info Day: 26 Jan 2009, Brussels (tbc) • Email: INFSO-E1@ec.europa.eu • URL: cordis.europa.eu/fp7/ict/language-technologies/.. • FP7-ICT: ../fp7-call4_en.html • ICT-PSP: ../cip-psp_en.html