Dutch HLT Resources: from BLARK to Priority Lists • Helmer Strik, Diana Binnenpoorte, Janienke Sturm, Folkert de Vriend, and Catia Cucchiarini* • A2RT, Dept. of Language and Speech, Nijmegen • * NTU, Dutch Language Union, The Hague • Walter Daelemans • Dept. of CNTS Language Technology, Antwerp
Dutch HLT Platform: NTU • NTU - Nederlandse Taalunie (Dutch Language Union) • Mission: Strengthening the position of the Dutch language • Dutch HLT Platform • Aim: To contribute to the further development of an adequate language and speech technology infrastructure for Dutch
Dutch HLT Platform: Other participants • Ministry of the Flemish Community • Flemish Institute for the Promotion of Scientific-technological Research in Industry • Fund for Scientific Research - Flanders • Dutch Ministry of Education, Culture and Sciences • Dutch Ministry of Economic Affairs • Netherlands Organisation for Scientific Research • Senter (an agency of the Dutch Ministry of Economic Affairs)
Dutch HLT Platform: Four action lines • Performing a market place function • Strengthening the HLT infrastructure • Working out standards and evaluation criteria • Developing a management, maintenance, and distribution plan
This presentation: Platform BC • (A: not covered here) • B: Strengthening the HLT infrastructure • C: Working out standards and evaluation criteria • (D: not covered here) • B + C => Platform BC • Focus on method (many details skipped) • More details: see publications, web sites
Platform BC: What? • BLARK: Basic LAnguage Resources Kit • Inventory & Evaluation • Priority lists
Platform BC: Who? • Steering committee: • 8 HLT experts • NTU • NWO (funding body) • 4 field researchers
Platform BC: How? • Report 1: BLARK, Inventory & Evaluation, Priority lists • Feedback: Dutch HLT field, Workshop 15/11/2001 • Report 2: BLARK, Inventory & Evaluation, Priority lists
1. BLARK: Basic LAnguage Resources Kit • Components: • Applications: classes of applications rather than specific applications or products. • Modules (or semi-products): the basic software components of HLT applications. • Data: sets of language data and descriptions in machine-readable form.
BLARK: Basic LAnguage Resources Kit • 2 matrices: • Modules x Data • Modules x Applications • => BLARK
[Figure: the two BLARK matrices, Modules x Data and Modules x Applications]
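The two matrices on the BLARK slide can be pictured as simple mappings from each module to the applications it serves and the data it needs. The sketch below is only an illustration: the application class names, the matrix entries, the `derive_kit` helper, and the `min_apps` threshold are all hypothetical, and the slides do not state the exact rule by which the kit is derived from the matrices.

```python
from typing import Dict, Set, Tuple

# Modules x Applications: which application classes each module serves.
# Entries are placeholders, NOT the values from the Platform BC report.
modules_x_apps: Dict[str, Set[str]] = {
    "Robust syntactic analysis": {"application class A", "application class B"},
    "Speech synthesis system": {"application class B"},
}

# Modules x Data: which data sets each module depends on (also placeholders).
modules_x_data: Dict[str, Set[str]] = {
    "Robust syntactic analysis": {
        "Annotated corpus of written Dutch",
        "Monolingual lexicon",
    },
    "Speech synthesis system": {
        "Monolingual speech corpora for specific applications",
    },
}


def derive_kit(
    apps_matrix: Dict[str, Set[str]],
    data_matrix: Dict[str, Set[str]],
    min_apps: int = 2,
) -> Tuple[Set[str], Set[str]]:
    """Assumed rule: keep modules that serve at least `min_apps`
    application classes, plus all data those modules depend on."""
    modules = {m for m, apps in apps_matrix.items() if len(apps) >= min_apps}
    data: Set[str] = set()
    for module in modules:
        data |= data_matrix.get(module, set())
    return modules, data


kit_modules, kit_data = derive_kit(modules_x_apps, modules_x_data)
print(sorted(kit_modules))
print(sorted(kit_data))
```

With these placeholder entries, only the module that serves two application classes ends up in the kit, together with the two data sets it needs.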
BLARK: Language technology • Modules • Robust modular text preprocessing • Morphological analysis and morphosyntactic disambiguation / unknown words • Robust syntactic analysis • Aspects of semantic analysis (word meaning and reference) • Data • Monolingual lexicon • Annotated corpus of written Dutch • Benchmarks for evaluation
BLARK: Speech technology • Modules • Automatic speech recognition • Speech synthesis system • Tools for annotation of speech corpora • Confidence measures and utterance verification • Identification (speaker, language, dialect) • Data • Monolingual speech corpora for specific applications • Multilingual speech corpora • Multimodal/medial speech corpora • Benchmarks for evaluation
2. Inventory & Evaluation • B. Inventory: Which components in BLARK are available? • C. Evaluation: Are they of sufficient quality? • Checklist approach • => B & C together: Platform BC • See matrix 3 - Availability
[Figure: matrix 3, availability of the modules]
3. Priority lists • BLARK + Inventory => Priority lists
Priority lists • The prioritisation was based on the following requirements: • The components should currently be unavailable, inaccessible, or of insufficient quality. • The components should be relevant for a large number of applications. • Developing the components should be possible in the short term.
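The three requirements above amount to a filter over the inventoried components. The following is a minimal sketch under stated assumptions: the `Component` fields, the `min_applications` threshold, the ranking by number of applications, and the example components are all hypothetical and are not taken from the Platform BC report.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """One inventoried BLARK component (field names are illustrative)."""
    name: str
    available: bool
    accessible: bool
    quality_ok: bool
    n_applications: int        # number of application classes it is relevant to
    short_term_feasible: bool  # can it be developed in the short term?


def is_candidate(c: Component, min_applications: int = 3) -> bool:
    # Requirement 1: currently unavailable, inaccessible, or of insufficient quality.
    lacking = not (c.available and c.accessible and c.quality_ok)
    # Requirement 2: relevant for a large number of applications (threshold assumed).
    widely_relevant = c.n_applications >= min_applications
    # Requirement 3: development should be possible in the short term.
    return lacking and widely_relevant and c.short_term_feasible


def priority_list(components: List[Component], min_applications: int = 3) -> List[str]:
    # Assumed ranking: candidates relevant to more applications come first.
    candidates = [c for c in components if is_candidate(c, min_applications)]
    candidates.sort(key=lambda c: c.n_applications, reverse=True)
    return [c.name for c in candidates]


# Placeholder inventory, not real evaluation results.
inventory = [
    Component("component X", False, False, False, 5, True),
    Component("component Y", True, True, True, 6, True),     # already adequate
    Component("component Z", True, True, False, 4, True),    # quality insufficient
    Component("component W", False, False, False, 7, False), # not feasible short term
]
print(priority_list(inventory))
```

A component that is already available, accessible, and of sufficient quality never enters the list, however widely it is used; that is what the first requirement says.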
Priority list: Language technology • 1. Annotated corpus of written Dutch • 2. Syntactic analysis • 3. Robust text pre-processing • 4. Semantic annotations for treebank in 1 • 5. Translation equivalents • 6. Benchmarks for evaluation
Priority list: Speech technology • 1. Automatic speech recognition • 2. Speech corpora • 3. Multi-media speech corpora • 4. Tools for (semi-) automatic transcription of speech data • 5. Speech synthesis • 6. Benchmarks for evaluation
Feedback • Report 1 • Feedback • Sent to the Dutch-Flemish HLT field (2000) • Workshop 15/11/2001 • => Report 2
When BLARK is established... • Intellectual property rights held by the NTU • Actual management and maintenance of resources by an HLT agency, to be founded • Maintenance of expertise by • Dutch-Flemish steering committees and • an HLT management committee, • both to be founded
General conclusions • The goals have been achieved, so the preconditions for developing the materials in BLARK are now in place • This work, carried out in the Dutch-speaking area, can benefit other countries that start similar activities: • Presentations & publications • Part of the report has been translated into English
Web sites • http://www.taaluniversum.org/tst/ • http://www.hltcentral.org/htmlengine.shtml?id=996 • http://lands.let.kun.nl/TSpublic/strik/platform-BC.html
Objectives • strengthening the position of Dutch in HLT • establishing the proper conditions for a successful management and maintenance of basic HLT resources developed through governmental funding • stimulating co-operation between academia and industry in the field of HLT • contributing to the realisation of European co-operation in HLT-relevant areas • establishing a network that brings together supply and demand for knowledge, products, and services