ENABLER, BLARK, what’s next? Steven Krauwer Utrecht University / ELSNET
Overview • ENABLER • BLARK • BLARK Results • Recent developments • CLARIN • Some reflections • MyBLARK • Concluding remarks
ENABLER • EU project, FP5, under Information Society Technologies (see www.enabler-network.org) • bringing together national language resources projects in many EU countries • aimed at providing a framework to foster cooperation and interoperability • with a strong industrial drive • led by Pisa; ended as an EU project in 2004 • … but still exists as a community, in close collaboration with ELSNET (www.elsnet.org)
BLARK (1) • Basic Language Resource Kit • Idea (first launched in 1998): define the minimal set of language resources needed to do any (precompetitive) R&D and education at all • The definition should in principle be language-independent (although specific languages may require specific adaptations)
BLARK (2) • Definition should include both data collections (corpora, lexicons) and modules (taggers, parsers, synthesizers, annotation tools) • It should include both qualitative aspects (e.g. standards) and quantitative aspects (e.g. size)
BLARK (3) • Once the definition is available, it can serve as a common reference point that makes it possible to: • assess the resource situation of a language (how much of the BLARK is available, and what is still missing) • make priority plans for bringing the resource situation up to date
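The assessment described on this slide amounts to comparing what exists for a language against a reference list. A minimal sketch in Python, assuming a purely hypothetical four-item BLARK built from the component examples mentioned earlier (corpora, lexicons, taggers, parsers); the names `Component`, `BLARK` and `assess` are illustrative and not part of any agreed definition:

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    kind: str  # "data" (corpora, lexicons) or "module" (taggers, parsers)


# Hypothetical BLARK definition, for illustration only.
BLARK = [
    Component("written corpus", "data"),
    Component("lexicon", "data"),
    Component("tagger", "module"),
    Component("parser", "module"),
]


def assess(available: set[str]) -> tuple[float, list[str]]:
    """Return the fraction of the BLARK that is available and the missing items."""
    missing = [c.name for c in BLARK if c.name not in available]
    coverage = (len(BLARK) - len(missing)) / len(BLARK)
    return coverage, missing


# A language for which only a lexicon and a tagger exist.
coverage, missing = assess({"lexicon", "tagger"})
print(f"coverage: {coverage:.0%}; missing: {', '.join(missing)}")
```

The list of missing items is the natural starting point for the priority plans; a real assessment would also have to weigh the qualitative and quantitative requirements (standards, size), which this sketch ignores.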
BLARK (4) • Note that the BLARK is necessarily dynamic, as new technological developments will come with new requirements • Note that the BLARK for a language will only work if there is a body that takes responsibility for its implementation and for the maintenance and distribution of the resources created
BLARK Results • First adopted by the Dutch Language Union, resulting in a first 12 million euro implementation programme launched at the end of 2004 • Explored and developed for Arabic in the NEMLAR project (CST, ELDA, ELSNET, and others; see www.nemlar.org and the presentation at this conference, O27-G, on Thursday) • The BLARK concept has been included in a number of proposals, but without tangible results • Suggestions for a more advanced variant (ELARK) have been put forward by ELDA and others
Recent developments • CLARIN: Common Language Resources and Technology Infrastructure (see LREC 2006 workshop on May 22, or otherwise www.mpi.nl/clarin) • NOT a project proposal, but rather a proposal for a Research Infrastructure to be included in the European Roadmap for Research Infrastructures
CLARIN (1) • Creation of an open European Language Resources Network with strong service centers and repositories, providing the humanities community at large (i.e. not just the language and speech technology community) with: • knowledge about which language resources and tools exist and how to use them • access to existing language resources • coordinated creation of new resources • access to advanced services for access and adaptation • bundling of expertise in specific problem areas • training centers
CLARIN (2) Three important observations: • CLARIN has no industrial drive • CLARIN aims at addressing all languages in the EU (and associated countries) • One of CLARIN’s objectives is the definition and the coordinated creation of BLARKs for all languages of the EU
Some reflections • Whatever progress has been made (DLU, NEMLAR, ELARK) was mostly inspired by industrial needs • Industrial considerations do not favour smaller languages • Progress of the BLARK since 1998 has been slow • No new funding opportunities in FP6 to get anything done • CLARIN may offer exciting opportunities (if successful), but this will take a lot of time
More reflections • The present (embryonic) BLARK definition may be one or more steps too far for under-resourced languages • So why not add to the concept the BLARKette, a very basic entry-level variant of the BLARK, targeting exclusively the research and (especially) education community? • Small and simple; it should fit on a CD-ROM
And yet more reflections • Nothing funded will happen before well into 2007 • Why wait until then, e.g. until CLARIN is in place and some formal process has been put into motion to define the BLARK (and the BLARKette)? • Why not start an action to consult the language communities and to arrive at a first proposal for a BLARK and BLARKette definition?
MyBLARK, the proposal • We initiate MyBLARK, aiming at collecting (for each language in the EU) • a description of the essential components of the BLARK • and of the BLARKette • We try to distill from this a broadly supported proposal for the definition of both concepts • We offer this as an input to the CLARIN project if it ever happens, or otherwise use it to launch other initiatives
MyBLARK, the process • ELSNET (possibly in collaboration with COCOSDA/WRITE) will send out a simple questionnaire to all known language resources centers, asking for descriptions of BLARK and BLARKette components • ELSNET (maybe with COCOSDA/WRITE) will set up a committee to synthesize the results in the form of recommendations
MyBLARK participants • Language resources centers for languages of the EU and associated countries that are known to us • Language resources centers in the EU (and associated countries) that send me a message saying they are willing to participate (steven.krauwer@elsnet.org)
MyBLARK Questionnaire • Language • Type of resource • Usage • Size • Annotation required • Brief description • Available for your language? • If so: pointer to it • If not: pointer to similar resource for another language • References • Comments
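One way to hold a questionnaire response in machine-readable form is a simple record with one field per question. The sketch below is an assumption about how such a record could look, not the actual questionnaire format: the class name, field names, field types, the consistency check and the example values are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass
class QuestionnaireEntry:
    """One described BLARK or BLARKette component (hypothetical layout)."""

    language: str
    resource_type: str
    usage: str
    size: str
    annotation_required: str
    description: str
    available: bool                          # "Available for your language?"
    pointer: Optional[str] = None            # "If so: pointer to it"
    similar_resource: Optional[str] = None   # "If not: pointer to similar resource"
    references: str = ""
    comments: str = ""

    def problems(self) -> list[str]:
        """List follow-up questions left unanswered (assumed here to be expected)."""
        issues = []
        if self.available and not self.pointer:
            issues.append("marked available, but no pointer given")
        if not self.available and not self.similar_resource:
            issues.append("not available, and no similar resource named")
        return issues


# Illustrative values only; not taken from any real response.
entry = QuestionnaireEntry(
    language="Dutch",
    resource_type="lexicon",
    usage="education",
    size="not specified",
    annotation_required="part-of-speech",
    description="basic word list with part-of-speech labels",
    available=True,
)
print(entry.problems())
```

Keeping the two conditional questions as optional fields makes incomplete responses easy to spot before the results are synthesized.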
MyBLARK Schedule • June – August 2006: collection of contacts • September 2006: questionnaires sent out • October 2006: questionnaires in, first analysis and draft definition proposals • November 2006: proposals sent out for feedback • December 2006 – January 2007: collecting feedback • February 2007: final report
Concluding remarks • I have proposed the introduction of a slightly weaker variant of the BLARK, the BLARKette, for under-resourced languages • I have proposed an action entitled MyBLARK to arrive at an initial definition of both the BLARK and the BLARKette • I hope that this will (a) speed up the process, and (b) provide an intermediate coverage level for under-resourced languages