European Language Resources Association
ELRA’s Services 15 Years on... Sharing and Anticipating the Community
Victoria Arranz & Khalid Choukri
ELRA/ELDA, 55 Rue Brillat-Savarin, F-75013 Paris, France
Tel. +33 1 43 13 33 33 -- Fax. +33 1 43 13 33 30
Email: {arranz, choukri}@elda.org
http://www.elra.info/ or http://www.elda.org/
Overview
• Before ELRA was established … once upon a time
• Rationale behind its foundation and its mission(s)
• ELRA activities:
  • Identification and Distribution
  • Production of LRs
  • Evaluation of Human Language Technologies
  • Dissemination
• New visions:
  • Large international cooperation
  • Advocating for a backbone of Language Resources and HLT evaluation, open and shared
ELRA’s Foundation & Mission
• Created in February 1995
• Funding from the European Commission: 3 years
• Main rationale: bring into focus the need for a mutual exchange and use of LRs
• A Repository Center:
  • Technical & logistic issues
  • Commercial issues (prices, fees, royalties)
  • Legal issues (licensing, IPR)
  • Information dissemination
• Infrastructure for the evaluation of Human Language Technologies: providing resources, tools, methodologies, logistics
• Exit strategies / capitalization on evaluation packages
• Operational body: ELDA
Identification and Distribution
• LR licensing: priority given to simplifying the relationship between providers and users -> drafted generic contracts
• Contracts:
  • establish usage: research / technology development
  • protect data owners and their LRs
  • available on www.elda.org/article1.html
  • designed before CC licenses: future merging or joint design?
  • 500 have been signed
Contract model LREC-2010 Workshop on Legal Issues
Identification and Distribution
• More than 1,000 LRs catalogued and available in the ELRA Catalogue of Language Resources: http://catalog.elra.org
[Figure: Number of LRs within the ELRA Catalogue over the years]
Distribution of Resources vs Usage
• ELRA has distributed over 3,500 LRs:
  • 48% research in academia
  • 37% research and technology development in industry
  • 16% evaluation
• Further 1,500 copies distributed within evaluation campaigns
The Universal Catalogue & the ELRA Catalogue
• Over 1,700 LRs compiled in the Universal Catalogue: http://universal.elra.org
• Antechamber of the ELRA Catalogue
• Window-shopping nature: lets users discover that LRs exist, possibly for future availability. The ELDA team helps to clarify the legal situation
• New feature: simplified collaboration form (following users’ feedback)
• Also related: the LREC Map initiative, an LR identification tool used at LREC submission time (ELRA & FlaReNet). See: Calzolari, N., Soria, C., Del Gratta, R., Goggi, S., Quochi, V., Russo, I., Choukri, K., Mariani, J. and Piperidis, S.: The LREC 2010 Resource Map. LREC 2010.
Identification and Distribution: The ELRA Catalogue
• Two interesting novelties:
  • ELRA’s involvement in evaluation:
    • Distribution of evaluation packages (with the definition of a new type of use/agreement, the « Evaluation Packages End-User Agreement », plus a new pricing policy)
    • Technology evaluation: products, systems and applications
  • ELRA’s Catalogue of LRs for R&D:
    • Easy and fast access to LRs dedicated to academic research at an affordable price: http://catalogue.elra.info/retd
Production of LRs
• Production or commissioning production
• Production:
  • Within the framework of European and international projects: NEMLAR, Neologos, OrienTel, Speecon, C-ORAL-ROM, CHIL, TC-STAR, ESTER, MEDIA, MEDAR, PASSAGE, etc.
  • In support of companies or institutions: sometimes confidential
• ELRA’s advisory role:
  • PCom
  • VCom
Production of LRs
• Current technological development demands more ambitious resources: size, type of linguistic information, quality of the end result
• These are main objectives for ELRA and have led to:
  • LRs compiled in more than 25 languages
  • High-quality LRs + strict validation
  • Involvement in every stage of production
Production of LRs
• Production through ELDA, covering different types of LRs for different technologies:
  • (i) speech data for a variety of languages (e.g., Hindi, Korean, Colloquial Arabic(s), Canadian French, US Spanish, etc.),
  • (ii) Broadcast News Speech Corpus for Arabic, French, Spanish, etc.,
  • (iii) corpora for languages such as Catalan, Kazakh, Romanian, Turkish, etc.,
  • (iv) aligned textual corpora for Machine Translation in languages such as Arabic, Chinese, English, French, German, Spanish, etc.,
  • (v) video annotations with audio transcriptions,
  • (vi) collections of SMS data,
  • (vii) recordings of Wizard-of-Oz based data for dialogue systems, etc.
HLT Evaluation
• ELRA has ensured infrastructure for technology evaluation, also with web-based service platforms
• Through participation in European projects (CHIL, TC-STAR, CLEF) and in French national programmes (Technolangue)
• Collaborative and customized services for HLT Evaluation:
  • on-demand evaluation services
  • customized LRs for laboratories and/or companies
• Important end result: evaluation packages compiled and made available:
  • they contain the required DBs, tools, methodologies and protocols to conduct comparable experiments
HLT Evaluation
• More than 20 technologies and over 40 evaluation packages are available in the ELRA Catalogue.
• Some covered technologies:
  • Text processing: Information Retrieval, Question Answering, Machine Translation, Automatic Summarization, Parsing, Multilingual Text Alignment, Terminology Extraction
  • Speech processing: Automatic Speech Recognition, Speech Synthesis, Speech Translation, Broadcast News Transcription, Acoustic Person Tracking, Acoustic Speaker Identification, Speech Activity Detection
  • Multi-modal interfaces: Multimodal Person Tracking, Audiovisual Speech Recognition, Multimodal Person Identification
• Reference Portal for HLT Evaluation: http://hlt-evaluation.org
Dissemination
• ELRA has increased its activities for the dissemination of information on LRs:
  • “Speaker’s Corner” for the researchers and developers of the area
  • Events:
    • Language Resources and Evaluation Conference (7th edition): http://www.lrec-conf.org
    • LangTech
    • European LR and Technologies Forum (within FlaReNet)
    • MEDAR Conferences
    • Workshops on less-resourced languages (within LTC’09)
  • Language Resources and Evaluation Journal (Springer): http://www.springerlink.com/
  • Newsletter
  • Members’ News
  • BLARK web site: http://www.blark.org/ + other web sites
Today's Context... a New Landscape? New Visions...
• Part of ELRA’s success and evolution has involved facing and anticipating the realities/needs of the community
• History:
  • Why the community established agencies like LDC, ELRA and some others
  • How things evolved and saw new players (BAS, CSLU, LDC India, GSK, etc.) emerging
  • What their missions were and what they are today
  • Is there a role for such organizations?
• Impact of new instruments on the field:
  • How did the web and Internet shake up the whole structure (internet .... web ............. web 2.0)
  • New facilities (personal pages, free/easy web hosting)
  • Ease of use of new media, new means to store and share resources
  • But did our LR consumption/“behaviour” change that much?
New Layered Approach … with Distributed Services … Functionalities
[Diagram labels: LR Distribution user-centered platform; Search Engine; Access (samples); Access (free LRs); Licensing; Registration; Guidelines/best practices; LRs @ELRA; LRs @BAS; LRs @Univ A; LRs @Repository B]
New Visions...
• This new vision is being implemented in collaboration with a number of experts
• International initiatives working to design this new vision:
  • PANACEA (www.panacea-lr.eu)
    • Defining new legal frameworks...
    • Cost-effectiveness in LR production and automation
    • Web-based factories
  • META-NET (www.meta-net.eu/)
    • NoE towards an open, integrated, secured and interoperable exchange
• Towards securing the largest Global Catalogue of LRs: LDC, NICT, OLAC, etc. harmonise their catalogues with the Universal Catalogue
• The LREC Map and other future Maps (from other conferences)
Concluding Remarks
• Overview of the latest developments in ELRA’s services around:
  • Identification & Distribution
  • Evaluation
  • Production
  • Dissemination
• From early archiving and distribution to an LR identification, collection, validation and distribution platform...
• ...with clear and well-established legal frameworks...
• ...enhancing work on evaluation (with new techniques, covering more technologies and languages, providing more evaluation packages, setting up the HLT Eval portal...)
• ...increasing work on LR production and its coverage
• ...big push to dissemination...
Concluding Remarks
• ...encouraging international cooperation...
• ...after years of consolidation... ELRA is looking forward to the new challenges emerging from new trends...
• ...a new interoperable exchange is certainly in sight.