Discover the reasons why investing in the Digital World Foundation (DWF) is crucial for advancing language research. From understanding language roots to accessing required resources and developing automatic language processing, DWF offers a wide range of tools and services. However, challenges remain in accessing and harmonizing data across disciplines. Join the global forum to design a common data infrastructure and unlock the potential of data-driven research.
Why should we invest in DWF?
Peter Wittenburg
CLARIN Research Infrastructure (www.clarin.eu)
EUDAT Data Infrastructure (www.eudat.eu)
Things that keep us busy I
• understanding language roots
  • feature matrix extracted from many cross-disciplinary & cross-country resources
  • phylogenetic algorithms to compute dependency trees
  • can’t easily access required resources
• understanding language machine
  • so many institutes creating brain image data
  • do we know about them and their recording contexts?
  • can we access them easily?
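As a rough illustration of the first step behind the "feature matrix" and "phylogenetic algorithms" bullets above, the sketch below computes pairwise distances between languages from a binary feature matrix. Distance-based tree building typically starts from such a matrix. The language labels and feature values are invented placeholders, and normalized Hamming distance is an assumed choice, not necessarily the measure used in the work referred to on the slide.

```python
from itertools import combinations

# Hypothetical toy data: each row is one language, each column one
# binary structural feature (1 = present, 0 = absent).
FEATURES = {
    "lang_A": [1, 0, 1, 1],
    "lang_B": [1, 0, 1, 0],
    "lang_C": [0, 1, 0, 0],
}


def hamming_distance(a, b):
    """Fraction of feature positions at which two languages differ."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def distance_matrix(features):
    """Pairwise distances keyed by (language, language) in sorted order."""
    return {
        (p, q): hamming_distance(features[p], features[q])
        for p, q in combinations(sorted(features), 2)
    }


def closest_pair(features):
    """The pair a distance-based tree builder would join first."""
    distances = distance_matrix(features)
    return min(distances, key=distances.get)


if __name__ == "__main__":
    for pair, dist in distance_matrix(FEATURES).items():
        print(pair, dist)
    print("closest:", closest_pair(FEATURES))
```

A real study would need feature rows drawn from many distributed resources, which is exactly the access problem the slide raises.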
Things that keep us busy II
• automatic language processing
  • recognition of speech and body movement (gesture, signing, facial expressions, etc.) is hard
  • no single stochastic recognizer will do
  • there is so much technology out there worldwide, with components from different disciplines
  • do we know about them?
  • can we easily access them?
In CLARIN we are so good
• developed a flexible component model that allows users to create metadata profiles
• have established an open Data Category Registry (ISOcat) system based on ISO 12620 (compliant with ISO 11179)
• have a professional tool set allowing users
  • to create, register and share components and profiles
  • to create metadata (MD) descriptions efficiently
In CLARIN we are so good
• Virtual Language Observatory
In CLARIN we are so good
• have a distributed SOA domain with many language & speech tools integrated or being integrated
• use metadata profile matching to find appropriate tools when chaining services
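The idea of using metadata profile matching to chain services can be sketched as a search over tool descriptions: a tool is a candidate for the next step when its declared input matches the previous tool's output. This is a minimal sketch only. The `ToolProfile` fields, the tool names, and the format labels are all hypothetical and do not reflect the actual CLARIN component metadata schema or any CLARIN API.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical, much-simplified metadata profile of a web service."""
    name: str
    input_format: str
    output_format: str
    languages: frozenset


def find_chain(tools, source_format, target_format, language):
    """Breadth-first search for the shortest chain of matching tools.

    Returns a list of tool names, or None if no chain exists.
    """
    queue = deque([(source_format, [])])
    seen = {source_format}
    while queue:
        fmt, chain = queue.popleft()
        if fmt == target_format:
            return [tool.name for tool in chain]
        for tool in tools:
            matches = tool.input_format == fmt and language in tool.languages
            if matches and tool.output_format not in seen:
                seen.add(tool.output_format)
                queue.append((tool.output_format, chain + [tool]))
    return None


# Hypothetical registry entries, for illustration only.
TOOLS = [
    ToolProfile("tokenizer", "text", "tokens", frozenset({"en", "nl"})),
    ToolProfile("tagger_en", "tokens", "pos", frozenset({"en"})),
    ToolProfile("tagger_nl", "tokens", "pos", frozenset({"nl"})),
    ToolProfile("parser_en", "pos", "tree", frozenset({"en"})),
]

if __name__ == "__main__":
    print(find_chain(TOOLS, "text", "tree", "en"))
    print(find_chain(TOOLS, "text", "tree", "nl"))
```

With these toy entries a chain from text to a parse tree exists for English but not for Dutch, because no Dutch parser is registered. A gap like this only becomes visible to a machine when tools carry metadata that can be matched.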
but ...
• there is so much data (& software) out there that no one knows of yet, or that no one is able to access
• of about 200 linguistic departments creating data, fewer than a handful of centers in the EU have a proper repository, do archiving and curation, give access, allow computation and enrichments, are audited, etc.
• currently no way to let machines access most of the resources blindly; the common way is to download & squeeze each individual resource/collection
• proper metadata at high granularity is still unpopular
• only some harmonization at international level
• cross-disciplinary exchanges happen only incidentally
cross-disciplinary aspect
• network of discipline hubs
  • large number of discipline-specific centers with access services
  • all disciplines similar
  • should we all do LTA, offer capacity computing, run PID, etc.?
• network of large data hubs
  • a network of strong data & compute hubs
  • let them give COMMON services such as LTP, data staging, PID, AAI, etc.
but ...
• do we know what the common services are, and do we accept them?
• do we understand the data organizations of communities well enough to design services?
• do we have agreed mechanisms for working on large and complex data sets securely in a federation?
• do we agree on the same essential building blocks for a common data infrastructure?
• AND - many communities are organized worldwide
• Thus - we need a GLOBAL forum to agree on some essentials that will make data-driven research more efficient and foster new insights