The European Resources Landscape
Steven Krauwer
ELSNET / Utrecht University, The Netherlands
steven.krauwer@elsnet.org
Overview
• About ELSNET
• Main characteristics of the European scene
• Impact of EU funding policies
• Bottom-up resources infrastructure actions
• Concluding remarks
What is ELSNET
• European Network in Human Language Technologies (ca. 145 academic and industrial member organisations)
• Funded by the European Commission
• Created in 1991 as one network out of (eventually) ca. 25, covering all subfields of ICT
• Objectives:
  • bringing together the language and speech communities
  • bringing together academia and industry
  • facilitating R&D in language and speech technology
• Info: elsnet@elsnet.org, http://www.elsnet.org
What we do
• Spreading knowledge, e.g.:
  • Training (e.g. annual summer schools, curriculum development)
  • Information dissemination (newsletter, website, etc.)
  • Knowledge transfer (directories, workshops)
• Creating common foundations:
  • language resources
  • common standards and evaluation methods
• Roadmapping:
  • Establishing a broadly supported common vision of where the language and speech field is going
Main characteristics of the European Landscape
• Multilinguality: coping with many languages and crossing language boundaries
• Fragmentation of all R&D efforts over national funding schemes and policies
• Unbalanced efforts over languages, even though all languages are equally hard
Languages in Europe
• The European Union has:
  • 15 member states, with 11 official languages (plus quite a few ‘unofficial languages’)
  • 10 new member states, with (at least) 10 new official languages, joining on May 1st 2004
  • 3 applicant countries in the waiting room, with at least 3 extra languages
• Europe has:
  • 17 other countries, with quite a few additional languages (think of Russia!)
Languages in the world
The Ethnologue (http://www.ethnologue.org):
• Europe: 230 languages
• The Americas: 1013 languages
• The Pacific: 1311 languages
• Africa: 2058 languages
• Asia: 2197 languages
Languages in Japan
• Just one language: Japanese …
• But even in Japan multilinguality is a factor, e.g.:
  • The export market requires localized products (e.g. user interfaces)
  • Users require documentation in their own language
  • Business-to-business communication crosses language boundaries
  • Immigrants
Resources in Europe
• Language resources collection started in most countries as a cultural or political activity
• Most activities in larger countries with bigger funding programmes
• Adoption or creation of resources for industrial application started much later
• Most of them addressing commercially interesting languages
• Result: very uneven coverage
Impact of the EU
• During the 70s and 80s the EU became a major funder of technology programmes
• For smaller languages the EU became the main funding source
• The political requirement of multinational consortia and balanced participation over member states gave a strong boost to resources development for smaller languages
Recent EU policies
• EU focus shifting to activities with a more direct commercial impact
• EU focus shifting from spreading excellence to boosting excellence: only invest in sectors where Europe can maintain or strengthen world leadership (over e.g. the US and Japan)
• EU moves from many small projects (up to 5 million euro) to a few big projects (up to 50 million)
• Language and speech technology have disappeared from the agenda, and Interfaces and Knowledge Systems have taken their place
Result of new policies
• Strong emphasis on the commercially interesting languages
• Language and speech will only appear as embedded technologies
• Creation of language resources in EU projects only if needed for the main objectives of the project, i.e. never as a goal per se
• Fragmentation of language and speech technology activities over many projects
Impact on infrastructures
• Creation and distribution of resources, standards, and evaluation are infrastructural in nature (as opposed to research and development)
• They require continuity and active industrial involvement
• Very hard to accomplish in the EU funding context because of the short duration of projects and the requirement that industries contribute 50% of their costs themselves
• Resources actions now mostly at national level
Overall picture …
• … not very good: very little to expect from the EU as far as improvement of the language resources situation is concerned for the duration of the present Framework Programme (2003-2007)
• But there are some signs that the situation will improve in the next Framework Programme
• And there are still a number of bottom-up activities (emerging from the community, with or without EU support)
Ongoing resources infrastructure actions
• ELSNET: still running (since 1991, hopefully secured until summer 2005; funded by the EU as a series of independent 2-3 year projects), still supporting resources and evaluation, now focusing on the roadmap for language and speech technology and for language and speech resources
• ELRA/ELDA: Resources Association and Agency; European counterpart (although not twin sister) of LDC
Ongoing actions, continued
• ENABLER:
  • Network aiming at coordination of national resources activities; EU funding has ended, but it remains active
  • Surveys and other useful material on website (www.enabler-network.org)
  • Involved in resources roadmap and landscape (see later)
  • Asian and US participation
Cocosda
• International committee for the coordination and standardisation of speech databases and assessment techniques
• International, not just European; also active Asian involvement
• Not funded, but alive
ICCWLRE
• International coordination committee for written language resources and evaluation
• Written language counterpart of Cocosda
• Goal is to join forces with Cocosda
• To be launched at LREC 2004 in Lisbon
• International, active Asian participation
LREC
• Biennial international conference on resources and evaluation
• Initiated in 1998, very successful, and truly international
• Only conference on this topic and only conference bringing together language and speech communities
Ongoing actions, continued
• The Language Resources Roadmap:
  • Joint activity of ELSNET/ENABLER/ELRA
  • Aimed at creating a broadly supported common vision of where the field is going, and what the implications are for language resources
  • Workshops (www.elsnet.org/roadmap.html)
  • Graphical representation at elsnet.dfki.de
Ongoing actions, continued
• The Resources Landscape:
  • Joint project by ELSNET/ENABLER
  • Aimed at creation and continued maintenance of a full landscape of the world of language resources (actors, actions, projects, events, resources, etc.)
  • Still under construction
  • See www.enabler-network.org
EAGLES/ISLE/WordNet
• EAGLES (and its successor ISLE) were EU-funded projects aimed at standards in language and speech processing
• Projects have ended, but there are still some ongoing activities, such as MILE (the Multilingual ISLE Lexical Entry)
• WordNet has had a number of European spin-offs, such as EuroWordNet, BalkaNet and local instantiations for other languages
Ongoing actions: BLARK
• Define (in a language-independent way) the minimal set of language resources that is necessary to do any precompetitive R&D and education at all for a language (the Basic Language Resource Kit or BLARK)
• Determine for each language which components are already available (survey)
• Make for each language a priority plan to complete the BLARK (and to get funding)
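The three BLARK steps (define the kit, survey what exists, plan what is missing) can be pictured as a small gap analysis. The sketch below is only an illustration under stated assumptions: the component names, the placeholder languages and the survey results are all hypothetical and do not come from the talk or from any actual BLARK definition, and ranking languages by number of missing components is just one possible way to prioritise.

```python
# Step 1 (hypothetical): a language-independent kit definition.
# These component names are illustrative placeholders, not the real BLARK.
BLARK_COMPONENTS = ["written_corpus", "spoken_corpus", "lexicon", "tokenizer"]

# Step 2 (hypothetical): survey results, i.e. components already available.
# The languages are placeholders; no real survey data is implied.
SURVEY = {
    "language_A": {"written_corpus", "lexicon", "tokenizer"},
    "language_B": {"lexicon"},
}


def missing_components(available):
    """Return the kit components that are not yet available, in kit order."""
    return [c for c in BLARK_COMPONENTS if c not in available]


def priority_plan(survey):
    """Step 3: map each language to its gaps, least complete language first."""
    gaps = {lang: missing_components(avail) for lang, avail in survey.items()}
    ranked = sorted(gaps.items(), key=lambda item: len(item[1]), reverse=True)
    return dict(ranked)


if __name__ == "__main__":
    for lang, gaps in priority_plan(SURVEY).items():
        print(lang, "->", ", ".join(gaps) or "complete")
```

A real priority plan would also have to weigh cost, funding opportunities and dependencies between components, which this sketch leaves out.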
New initiatives
• Proposal to create BLARKnet: rejected by the EU because language and speech are not core objectives
• In France the successful launch of the new national programme TechnoLangue, explicitly addressing resources and evaluation
• In Europe the initiative towards LangNet, a network aimed at coordination of national language and speech technology programmes (including resources and evaluation)
• Some of the new EU projects will address resources problems, but project info has not been released yet
Concluding remarks
• We have seen some problems that are inherent to the situation in Europe and that will not go away: linguistic fragmentation and an uneven distribution of R&D efforts over languages
• We have seen self-imposed problems (EU funding schemes and policies); they may go away if and when the funders change their minds
• But we have also seen that there is still room for a variety of resources-related initiatives in Europe, many of which could benefit from collaboration with e.g. Japan