Results of R&D: BLaRK for Dutch
Helmer Strik
Dept. of Linguistics, Centre for Language and Speech Technology (CLST)
Radboud University Nijmegen, the Netherlands

Introduction
• Terminology:
  • BLaRK: Basic Language Resources Kit
  • BaTaVo: Basis Taal-Voorzieningen (Dutch: basic language resources)
  • Platform-BC: see this presentation
• Period
  • 2000 – : plans
  • – 2002 : results, future
Cape Town, 24-11-2008

NTU & Dutch HLT Platform
• NTU: Nederlandse Taalunie (Dutch Language Union)
  • Mission: strengthening the position of the Dutch language
• Dutch HLT Platform
  • Aim: to contribute to the further development of an adequate language and speech technology infrastructure for Dutch

HLT Platform participants
• Flanders:
  • Ministry of the Flemish Community
  • IWT (Flemish Institute for the Promotion of Scientific-technological Research in Industry)
  • FWO (Fund for Scientific Research - Flanders)
• Netherlands:
  • Dutch Ministry of Education, Culture and Sciences
  • Dutch Ministry of Economic Affairs
  • Senter (agency of the Dutch Ministry of Economic Affairs)
  • NWO (Netherlands Organisation for Scientific Research)

Objectives
• Strengthening the position of Dutch in HLT
• Establishing the proper conditions for the successful management and maintenance of basic HLT resources developed through governmental funding
• Stimulating co-operation between academia and industry in the field of HLT
• Contributing to the realisation of European co-operation in HLT-relevant areas
• Establishing a network that brings together supply and demand for knowledge, products, and services

Action plan
• The ‘Action plan for Dutch in language and speech technology’ was defined to achieve these objectives
• Activities are organised in four action lines (A, B, C, and D)

Dutch HLT Platform: four action lines
• A: Performing a market place function
• B: Strengthening the HLT infrastructure
• C: Working out standards and evaluation criteria
• D: Developing a management, maintenance, and distribution plan

Action line A: “Performing a market place function”
• Encourage co-operation between industry, academia, and policy institutions
• Raise awareness of and give publicity to the results of HLT research

Action line B: “Strengthening the digital language infrastructure”
• Defining the BLaRK (Basic Language Resources Kit) for Dutch
• Carrying out a field survey to determine what is needed to complete the BLaRK
• Drawing up a priority list with cost estimates, to serve as policy guidelines

Action line C: “Working out standards and evaluation criteria”
• Drawing up standards and criteria for the evaluation of basic materials in the BLaRK and for the assessment of project results

Action line D: “Developing a management, maintenance, and distribution plan”
• Defining a Blueprint for the management (including intellectual property rights), maintenance, and distribution of HLT resources

Actions carried out
• Conducted mailings to contacts (about 1000)
• Contacted and visited companies with HLT-related needs, to:
  • Demonstrate the benefits of HLT
  • Get a clear picture of each company’s knowledge status and future plans
  • Provide information on cross-linking services
• Organised seminars and workshops

Platform BC
• A: Performing a market place function
• B: Strengthening the HLT infrastructure
• C: Working out standards and evaluation criteria
• D: Developing a management, maintenance, and distribution plan
• Action lines B + C together: Platform BC

Platform BC: Who?
• Steering committee:
  • 8 HLT experts
  • NTU
  • NWO (funding body)
• Field survey, 4 researchers:
  • 2 language technology
  • 2 speech technology

Platform BC: How?
• Three stages:
  1. Defining the BLaRK for Dutch
  2. Making an inventory of HLT resources
  3. Establishing a priority list

BLaRK: Basic Language Resources Kit
• Components:
  • Data: sets of language data and descriptions in machine-readable form
  • Modules (or semi-products): the basic software components of HLT applications
  • Applications: classes of applications rather than specific applications or products
• 2 matrices:
  • Modules x Data
  • Applications x Modules

[Figure: the BLaRK matrices relating Data, Modules, and Applications, each split into Language Technology and Speech Technology. Cells are quantified with 0, 1, or 2 plusses, based on the field survey and expert opinions.]
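
The two BLaRK matrices (Modules x Data, Applications x Modules) with their 0, 1, or 2 plusses could be held in a structure like the one below. This is only a minimal sketch: the module and data names come from the slides, but the application classes, every score, and the idea of summing plusses per column are assumptions made for illustration, not the method used in the report.

```python
# Sketch only: module and data names are taken from the slides, but the
# application classes and every score below are invented for illustration.

# Applications x Modules: importance of a module for an application class,
# expressed as 0, 1, or 2 plusses.
APPLICATIONS_X_MODULES = {
    "dictation (hypothetical)": {
        "automatic speech recognition": 2,
        "speech synthesis": 0,
    },
    "spoken dialogue (hypothetical)": {
        "automatic speech recognition": 2,
        "speech synthesis": 2,
    },
}

# Modules x Data: importance of a data set for a module, same 0-2 scale.
MODULES_X_DATA = {
    "automatic speech recognition": {
        "monolingual lexicon": 2,
        "benchmarks for evaluation": 1,
    },
    "speech synthesis": {
        "monolingual lexicon": 2,
        "benchmarks for evaluation": 1,
    },
}


def column_totals(matrix):
    """Sum the plusses per column (summing is an assumption of this sketch)."""
    totals = {}
    for row_name, row in matrix.items():
        for column, plusses in row.items():
            if plusses not in (0, 1, 2):
                raise ValueError(
                    f"{row_name}/{column}: expected 0, 1, or 2, got {plusses}"
                )
            totals[column] = totals.get(column, 0) + plusses
    return totals


module_importance = column_totals(APPLICATIONS_X_MODULES)
data_importance = column_totals(MODULES_X_DATA)
print(module_importance)
print(data_importance)
```

A higher column total here simply means that more rows depend on that module or data set; any real weighting would have to follow the report itself.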

BLaRK: language technology
• Modules
  • Robust modular text preprocessing
  • Morphological analysis and morphosyntactic disambiguation
  • Robust syntactic analysis
  • Aspects of semantic analysis (word meaning and reference)
• Data
  • Monolingual lexicon
  • Annotated corpus of written Dutch
  • Benchmarks for evaluation

BLaRK: speech technology
• Modules
  • Automatic speech recognition
  • Speech synthesis system
  • Tools for annotation of speech corpora
  • Confidence measures and utterance verification
  • Identification (speaker, language, dialect)
• Data
  • Monolingual speech corpora for specific applications
  • Multilingual speech corpora
  • Multimodal/medial speech corpora
  • Benchmarks for evaluation

From BLaRK to priority lists
1. BLaRK: Basic Language Resources Kit
2. Inventory & Evaluation
3. Priority lists

2. Inventory & Evaluation
• Inventory:
  • Which components in the BLaRK are available?
    • Bought
    • Freely obtainable
    • Reusable
    • Of sufficient quality
• Evaluation:
  • Are the available components of sufficient quality?
  • Checklist approach (vs. formal evaluation)

[Figure: availability matrix for Modules and Data. Availability is quantified on a scale of 1-10, based on the field survey and expert opinions.]

3. Priority lists
• The prioritisation was based on the following requirements:
  • The components should currently be unavailable, inaccessible, or of insufficient quality.
  • The components should be relevant for a large number of applications.
  • Developing the components should be possible in the short term.
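
The three requirements above can be read as a filter followed by a ranking. The sketch below shows one possible reading, and everything in it is an assumption: the `Component` fields, the direction of the 1-10 availability scale (10 taken to mean fully available), the cut-off of 5, the sort order, and all example values. It does not reproduce how the report actually derived its lists.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    availability: int   # 1-10; assumed here that 10 means fully available
    applications: int   # number of application classes that need the component
    short_term: bool    # can it be developed in the short term?


def priority_list(components, max_availability=5):
    """Keep components that are poorly available and feasible in the short
    term, then rank them by how many applications they serve."""
    candidates = [
        c for c in components
        if c.availability <= max_availability and c.short_term
    ]
    ranked = sorted(candidates, key=lambda c: (-c.applications, c.availability))
    return [c.name for c in ranked]


# Hypothetical example values; only the first three names come from the slides.
components = [
    Component("annotated corpus of written Dutch", 3, 8, True),
    Component("monolingual lexicon", 8, 9, True),
    Component("robust syntactic analysis", 4, 6, True),
    Component("hypothetical long-term component", 2, 7, False),
]

ranking = priority_list(components)
print(ranking)
```

With these invented values the lexicon drops out because it is already well available, and the long-term component drops out because it fails the short-term requirement.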

Consensus, broad support
• Report version 1
• Feedback from academia & industry
  • Sent to the Dutch-Flemish HLT field (1000 sites)
  • Workshop 15/11/2001
• Report version 2, final version

From BLaRK to priority lists
[Diagram: Report 1 (BLaRK, Inventory & Eval., Priority lists), then feedback from the HLT field and the workshop, then Report 2 (BLaRK, Inventory & Eval., Priority lists)]

Report
• Version 1:
• Version 2, final version:
  • W. Daelemans & H. Strik (eds.) (2002)
  • Het Nederlands in taal- en spraaktechnologie: prioriteiten voor basisvoorzieningen

Recommendations (1)
• With regard to the BaTaVo:
  • Collect existing components
  • Complete the kit (incentives, funding)
  • Management & maintenance (action line D)
  • Make it available under an ‘open’ licence
  • Evaluation: test corpora & methodology

Recommendations (2)
• General:
  • More language and speech technologists (education, training, projects)
  • More co-operation
  • In addition to funding for application-oriented research, also funding for fundamental research

Priority list: language technology
1. Annotated corpus of written Dutch
2. Syntactic analysis
3. Robust text pre-processing
4. Semantic annotations for the treebank in 1
5. Translation equivalents
6. Benchmarks for evaluation

Priority list: speech technology
1. Automatic speech recognition
2. Speech corpora
3. Multi-media speech corpora
4. Tools for (semi-)automatic transcription of speech data
5. Speech synthesis
6. Benchmarks for evaluation

Future prospects [2002]
• Action line A:
  • Stimulate HLT in the Netherlands and Flanders
  • Co-operation: industry, academia, etc.
• Action lines B & C:
  • Collect existing resources
  • Ensure priorities are realised
• Action line D:
  • Implementation of the recommendations in the Blueprint

When BLaRK is established...
• Intellectual rights held by the NTU
• Actual management and maintenance of resources by an HLT agency, to be founded
• Maintenance of expertise by
  • Dutch-Flemish steering committees and
  • an HLT management committee,
  • both to be founded

General conclusions [2002]
• The goals have been achieved, so the proper conditions for developing the materials in the BLaRK have been created
• This work, carried out in the Dutch-speaking area, can be useful to others starting similar activities
  • Part of the report has been translated into English
  • Presentations & publications
  • Other domains
  • Other countries
• http://lands.let.kun.nl/~strik/BLaRK.html

Questions?