Ministère de l’Education Nationale, de l’Enseignement Supérieur et de la Recherche
Language Technologies for a Multilingual Europe
Joseph Mariani
Director « Information & Communication Technologies » Department
French Ministry of Research
Support to LT: Techno-langue
• Report to the Prime Minister (November 2000)
• Techno-langue Action
  • Language technology survey and evaluation
  • Coordinate with related existing programs
    • ICT Research & Innovation Technological Networks (RRIT)
      • Telecommunications, Software Engineering, Audiovisual & Multimedia
    • Ministry of Research action on Business Intelligence Tools (VSE)
Cocosda / WRITE Workshop
Techno-langue structure
• Infrastructure program to support core LT progress, while innovative application projects remain within the RRIT networks (110 M€ / year)
[Diagram: Techno-langue alongside TELECOM, SOFT, AMM and VSE]
Techno-langue Call
• Language resources (data, tools)
• Evaluation (technology, application)
• Standards
• Technological survey
Techno-langue Call
• Launched in 2002, 3-year duration
• Funded by 3 ministries (Research, Industry, Culture)
• Similar call on Vision Technology (Techno-vision) in 2005 (MoD)
• International cooperation
  • Foreign entities may participate in the projects, with their own funding
• All funded projects completed in 2006
  • Joint Techno-X workshop (ASTI conference, October 2005)
  • Paper at LREC’2006 (S. Chaudiron, J. Mariani) plus 16 papers
  • Book in preparation
  • Public presentation of results (Fall 2006)
  • Feedback to research and industry (RRIT, VSE/Business Intelligence)
  • Presentation to government agencies (DoD, MAE…)
• LT in the 2006 « Data Masses and Ambient Intelligence » CfP
  • Managed by ANR, with 3 M€ funding for LT
Results of the Call
• 52 proposals submitted
• 21 projects funded
• 94 participants
  • 33 industry
  • 39 public research
  • 11 other categories (Associations, CEA, French DoD…)
  • 11 foreign (Bell Labs (USA), NII (Japan), EPFL, LATL…)
• Budget: 20 M€ total effort, 7.5 M€ public funding (3 years)
• Special attention to the distribution of Language Resources and Evaluation packages
21 funded projects
• 10 on Language Resources (data and tools)
• 2 on Standards (Spoken / Written): support to ISO TC37-SC4
• 1 on Technological survey (Portal): http://www.technolangue.net
• 8 on Technology Evaluation
  • Written language processing (5)
    • EASY: Syntactic parsing
    • ARCADE 2: Text alignment
    • CESART: Terminology extraction
    • EQUER: Information query
    • CESTA: Machine translation
  • Spoken language processing (3)
    • EVASY: Speech synthesis
    • MEDIA: Spoken dialog
    • ESTER: Speech transcription / automatic indexing
ESTER
• Task: «Rich» speech transcription and indexing evaluation
• Broadcast news data in French (radio/TV)
  • 100 h manually transcribed (1 M words, 350 speakers) + 1600 h untranscribed
  • Second largest such corpus worldwide
• 13 participants (3 companies)
• Orthographic transcription (real-time / non-real-time)
• Segmentation (sound, speaker recognition / diarization)
• Named Entity recognition (from speech / transcribed text)
• Topic detection and tracking for indexing: postponed
• Final internal Workshop in March 2005
• Distribution of the Evaluation Package
  • Development and Test data, scoring, results. Data used in EASY.
• Workshop for linguists in May 2005
  • Data and tools available, Results
• Open issues requiring basic scientific research
LT for a Multilingual Europe
• Language as a specific issue for Europe
• Economic, cultural and political challenge with 2 dimensions:
  • A) Preserve the cultures of the EU Member States
    • Preference for native language (Web sites in German (75%)...)
    • 50% of European citizens speak only one language
    • (3% of Japanese people speak a foreign language)
  • B) Allow for communication across Member States
    • 1170 translators at the EC; 1.3 M pages translated in 2001
    • 30% of the European Parliament budget (300 M€); 500 translators
    • EU: 25 countries, 20 languages / 380 language pairs
• Enormous cost for the EU, but unavoidable
• Need for the assistance of Language Technologies
  • Huge effort (# LT × # languages), too large for the EC alone
  • Should be shared with EU Member States (subsidiarity)
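The figure of 380 language pairs follows from counting ordered (source, target) translation directions among 20 languages: n(n-1) = 20 × 19 = 380. A minimal sketch, assuming each translation direction is counted separately (the function name `directed_pairs` is illustrative, not from the original):

```python
from itertools import permutations

def directed_pairs(n_languages: int) -> int:
    # Hypothetical helper: each ordered (source, target) pair with
    # source != target is a distinct translation direction.
    return n_languages * (n_languages - 1)

# Cross-check by enumerating all ordered pairs of 20 languages explicitly
assert directed_pairs(20) == len(list(permutations(range(20), 2)))
print(directed_pairs(20))  # 380
```

Counting unordered pairs instead (treating French→German and German→French as one) would give 190.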
Language Technologies EU Program
• European Research Area (ERA)
  • Coordinate EC (< 15%) and MS (> 85%) research efforts
  • ERA-Net initiative in FP6 to coordinate MS national programs
  • LT well suited to ERA
• EC prime responsibility:
  • coordination: management, standards, technology evaluation, communication...
  • the development cost of generic Language Technologies:
    • Speech recognition, synthesis, understanding, spoken dialog, language tagging, parsing, analysis, generation, text retrieval, document understanding, machine translation...
• Each Member State would primarily be responsible for ensuring proper coverage of its language(s):
  • Language Resources (essential): (annotated) corpora (spoken / written), lexicons (including pronunciations), dictionaries…
  • Language-specific technology development/adaptation
Lang-Net proposal
• Build an ERA-Net proposal of an infrastructural nature
  • Language Resources, LT evaluation, Standards, Survey
  • Information sharing
  • Strategic activities and Best Practice
  • Implementation of joint activities
  • Transnational research activities
• Identify EU countries or regions with similar programs
  • 11 countries / regions in partnership: Germany, France, Italy, Trento region, Czech Republic, Denmark, Norway, The Netherlands / Belgium-Flanders (Dutch Language Union), Spain, Basque region, Sweden
  • Austria, Catalonia, Finland, Greece, Iceland, Portugal, Switzerland, UK (contacts)
• Extendable to other partners
  • NMS (Slovenia, Cyprus, Poland, Hungary, Malta, Baltic countries…)
  • AS (Romania, Bulgaria…)
  • USA, Japan, South Africa, Israel, Canada… (contacts)
Joint LT program proposal
• DG Research (ERA-Net program)
  • Lang-Net proposal submitted in March 2005, not selected
  • Looking forward to Thematic ERA-Net+ in FP7
• DG INFSO + Media
  • «Science & Technology Forum on Multilingualism»
  • June 2005 and February 2006 in Luxembourg
• DG Education, Culture and Multilingualism
  • « A new framework strategy for multilingualism » (Nov. 2005)
    • http://europa.eu.int/languages/ Web site in the 20 EU languages
    • EC will set up a High Level Group on Multilingualism
    • An EU ministerial conference will be held
    • Further communication will be presented by the EC to Parliament and Council
• Committee of the Regions (use of regional Spanish languages)
• TC-Star report: Introduction signed by V. Reding & J. Figel
French support to LT in FP7
• Visit of a French delegation to EC E Directorate
  • H. Forster & B. Smith (September 2005)
• French Memorandum for a Digital Europe (i2010)
  • European Digital Library
• EU ICT Directors meeting (Vienna, March 2006)
• FP7 ICT program (2007-2013)
  • Technology pillar: Simulation, Visualization, Interaction, mixed realities
    • « Multilingual and automatic machine translation systems »
    • Replace / add LT
    • « Language technology, including multilingual and automatic MT systems »
  • FP7 budget reduction (12 B€ to 9 B€ for ICT)
    • «language-enabled … interaction & communication»
LT in FP7
• Article 169: a large EC + MS + industry program (several hundred M€) on LT in FP7?
  • Present topics: SMEs, Metrology, Research in the Baltic Sea…
• Joint support to LT in FP7 from MS