European Language Resources Association
ELRA/ELDA: The Importance of Sharing Linguistic Resources
Victoria Arranz, ELRA/ELDA
55-57 Rue Brillat-Savarin, 75013 Paris, France
Tel. +33 1 43 13 33 33 -- Fax. +33 1 43 13 33 30
Email: arranz@elda.org
http://www.elda.org/ or http://www.elra.info/
Presentation Outline
• ELRA’s Foundation and Mission
• ELRA’s Strategic Technical Activities:
  • Issues Concerning Archiving and Providing of Language Resources for the HLT Community
  • Administrative and Legal Issues Regarding Language Resources
  • Involvement in the Evaluation of Human Language Technologies
  • Promotion of the Language Technology Field
• Concluding Remarks
European Language Resources Association: An Improved Infrastructure for Data Sharing and HLT Evaluation
• Centralized non-profit organisation for:
  • Collection, distribution and validation of LRs and tools
  • Production, or commissioning of the production, of LRs
  • Evaluation of Human Language Technologies
• Operational agency: ELDA (Evaluation & Language Resources Distribution Agency)
Europe and the Multilingual Issue
More languages... even within the same country
European Language Resources Association: An Improved Infrastructure for Data Sharing
• An association of users of Language Resources
• A Repository Center:
  • Technical & logistic issues
  • Commercial issues (prices, fees, royalties)
  • Legal issues (licensing, IPR)
  • Information dissemination
• Infrastructure for the evaluation of Human Language Technologies, providing resources, tools, methodologies, logistics
• Exit strategies / capitalization on evaluation packages
Contrastive Analysis
Public “role”: Streamlining public funds
• BEFORE ELRA:
  • No resources were made available
  • Duplication of efforts and funding
  • LRs NEVER DISTRIBUTED: ACQUILEX ARS GENELEX ONOMASTICA PLUS POLYGLOT REWARD SUNDIAL SUNSTAR
• AFTER CE established ELRA:
  • LRs CURRENTLY DISTRIBUTED (see ELRA Catalogue): ACCOR COST-232 CRATER MULTEXT PAROLE SPEECHDAT Family TSNLP (Euro)WordNet
• Return on Investment
• Rational Capitalization
• "Public" Assets
Language Resources Activities
• Issues Concerning Archiving and Providing of Language Resources for the HLT Community:
  • Maintenance of ELRA’s Catalogue
  • Development of the Universal Catalogue and Identification of Useful Resources
  • Development of the Basic LAnguage Resource Kit (BLARK)
  • Development of the Extended LAnguage Resource Kit (ELARK)
  • Validation and Quality Assessment
  • Production or Commissioning of the Production of LRs
• Administrative and Legal Issues Regarding Language Resources:
  • Handling of Legal Issues related to the Availability of LRs
  • Distribution of LRs and Pricing Policy
• Involvement in the Evaluation of Human Language Technologies:
  • Evaluation projects within ELDA
• Promotion of the Language Technology Field:
  • Information Dissemination, Promotion and Awareness, Market Watch and Analysis
Issues Concerning Archiving and Providing of Language Resources for the HLT Community
Basic LAnguage Resource Kit (BLARK)
• Compilation of the necessary LRs that should be available for all languages, covering the different Language Technologies and Modules
• Defined by Steven Krauwer and first launched through ELSNET with the Dutch initiative “Dutch Human Language Technologies Platform”
• In the framework of the ENABLER thematic network (European National Activities for Basic Language Resources): ELDA produced a report
• Sample: work done for Arabic resources within the NEMLAR project: http://www.nemlar.org
Validation and Quality Assessment
• Users need info on LRs: technical specifications, assurance of quality, ...
• Validation is used by ELRA in reference to: checking the adherence to standards, quality control of LRs, suitability for market
• Definition of validation methodology required:
  • Projects for the production of guidelines, standards and specifications: EAGLES, PAROLE, SPEECHDAT, INTERVAL, ...
• Establishment of a Validation Committee (VCom)
• Cooperation with key players with relevant expertise to act as validation units:
  • SPEX (for speech resources): http://www.spex.nl/validationcentre/
  • CST (for written resources): http://cst.dk/validation/index.html
ELRA’s Production of LRs
• Spoken Language Resources: CHIL, C-ORAL-ROM, OrienTel, Neologos, Speecon, TC-STAR, ESTER, MEDIA
• Written LRs:
  • Technolangue projects:
    • Lexitec
    • Euradic
  • Technolangue/Evalda platform projects:
    • Equer
    • EASy
Administrative and Legal Issues Regarding Language Resources
Handling of Legal Issues Regarding LRs
• Support of lawyers for basic principles on LRs licensing
• One of ELRA’s priorities: simplify relations between providers & users
• Drafting of generic contracts (with responsibilities & obligations) to encourage producers/providers to distribute:
  • Establish what usage is allowed for LRs: R&D or research only?
  • Protect providers and their LRs
  • Available on ELDA’s web site: http://www.elda.org/article69.html
• ELRA:
  • Is not the owner of the LRs & sets a fair price with the owner, based on:
    • Production costs
    • Expected revenues
• ELRA’s distribution policy: try to offer a discounted price for members
• ELRA’s effort to obtain low prices for R&D
Involvement in the Evaluation of Human Language Technologies
Levels of Evaluation (ELSE)
• Basic Research Evaluation (validate research direction)
• Technology Evaluation (assessment of solution for well-defined problem)
• Usage Evaluation (end-users in the field)
• Impact Evaluation (socio-economic consequences)
• Programme Evaluation (funding agencies)
Reusability of Resources
• ELSE: Reusability of resources from one campaign to another
ELRA’s Activity on Evaluation
• Carrying out evaluation campaigns, with further involvement at French (Technolangue/Evalda) and European (CLEF, TC-STAR, CHIL, etc.) levels
• Distributing evaluation packages
• Producing or commissioning the production of needed LRs for evaluation
• Conducted with the support of an Evaluation Committee: ECom
Evaluation Projects within ELDA
• CLEF 2004 campaign: http://clef.isti.cnr.it/
• TC-STAR: http://www.tc-star.org/
• CHIL: http://chil.server.de/servlet/is/101/
• Technolangue/Evalda: ARCADE II, CESART, CESTA, EASY, ESTER, EQUER, EVASY, MEDIA
Information Dissemination, Promotion and Awareness, Market Watch and Analysis
• Considerable effort on the promotion and dissemination of the LT field & its activities
• Partner/coordinator in several conferences, meetings, workshops:
  • LREC Conference series (~900 participants)
  • SCALLA 2004
  • International Conference on Arabic Language Resources and Tools 2004
  • Speech and Language System for Human Communication Conference 2004
  • LangTech Conferences
• Production and distribution of the ELRA Newsletter, a quarterly publication that informs readers about HLT events, projects, new LRs, etc.
• Maintenance and/or setting up of a number of web sites:
  • ELRA: www.elra.info
  • ELDA: www.elda.org
  • LREC: www.lrec-conf.org
  • COCOSDA: www.cocosda.org
  • NEMLAR conference: www.nemlar-conf.org
  • LangTech conferences: http://www.lang-tech.org/
  • Technolangue: www.technolangue.net
• Public announcements posted to mailing lists
• Sponsorship activities
• Carrying out of market analyses and surveys
Concluding Remarks
• Overview of major activities at ELRA and ELDA
• Aim of ELRA/ELDA: to continue playing an active role in support of the HLT community:
  • Supporting researchers and developers in their needs for LRs
  • Evaluation guidelines
  • Validation guidelines
  • Development of standards
  • Working on legal, distribution and pricing issues
• We believe we are moving towards the future needs of LT in an organised and productive manner
LREC-2006… www.lrec-conf.org