Workshop Goals: DELAMAN and DAM-LR
Peter Wittenburg, MPI for Psycholinguistics
Access Management, Nijmegen, November 2004

When did we start?
• it is only 5 years since our discipline started talking about
  • large digital online collections
  • standardizing the formats
    • XML was new and users were very skeptical
    • MPEG was, and still is, not well understood
  • open metadata to arrive at browsable and searchable domains
  • using metadata to create well-organized archives
  • interoperability
• LREC Athens 2000
  • first workshop on these issues
  • start of the ISLE project (linguistic concepts, lexicon, metadata, …)
  • start of the IMDI work
• in 2000 also the first LDC workshop, with OLAC as its focus
• a little later DOBES was granted and E-Meld started
• this is a very short time when you want to convince a community

What did we achieve?
• we have “large” online digital archives/collections/Digital Libraries
  • MPI: ~40,000 session bundles / ~10 TB
  • DOBES: ~1,500 session bundles / 1,500 h
  • AILLA
  • PARADISEC
  • Lund corpora
• also in the HLT domain
  • LDC
  • ELRA
  • BAS
• also “traditional” archives (Phonogramm Archiv, NAA, …)
• etc.
• some of us became “archivists” by practice
• the idea of web visibility and online accessibility is spreading
• despite archiving attempts: according to D. Schüller, ~80% of the digitized material is endangered

What did we achieve?
• much evangelization and agreement about standards
  • DOBES workshops and documents
  • LDC workshops and documents
  • E-Meld workshops and excellent web-site
  • ISLE workshops with IMDI as a result
  • PARADISEC workshop with DELAMAN as a result
  • HRELP workshops
  • LREC workshops and contributions
  • ACL workshops and contributions
  • IASA/IAML conference
  • etc.
• “everyone” agrees on XML, UNICODE and linear PCM
• “everyone” understands the relevance of schemas for making linguistic structure and encoding explicit
• with respect to JPEG and MPEG we are shooting at a moving target, but we don’t yet have real alternatives

What did we achieve?
• created awareness of the need for metadata for visibility
• created operational metadata infrastructures within 4 years
  • structured IMDI for discovery and management
  • OLAC for overall discovery
  • gateways between the two domains
• however, the situation is still not satisfactory
  • > 50 institutions are using IMDI (as far as we know)
  • ?? institutions are providing OLAC records
  • still only a small fraction of the language resources is visible
• metadata (MD) creation is hard
  • it is seen as work for others, although this is increasingly often wrong
  • it means cleaning up your own holdings and figuring out what is available
  • it means writing “correct” scripts and learning new software
  • it means being disciplined
• we have done our development job; we have to continue dissemination
• despite the limitations, we hope that people stick to what is out there

What did we achieve?
• interoperability is still a dream, however …
  • we have metadata gateways in our discipline (OLAC-IMDI)
  • increasingly often, tools are producing correct XML, UNICODE, …
  • we have filters for character encodings and formats, although we lack well-designed and comprehensive services
• we have started ontological work to tackle the linguistic aspects
  • GOLD ontology from E-Meld
  • ISO TC37/SC4 Data Category Registry
  • TDS (Dutch Typology Project) meta-language
  • EAGLES/ISLE/TEI specifications
• we are at the beginning
  • we cannot yet speak of fully operational infrastructures
  • but there are islands like FIELD, LEXUS, ONTO-ELAN, …

Changing role of Language Archives
[Diagram: different groups of people contribute → The Archive (specialists maintain, unify, check quality, etc.) → different groups of people use the content]
• at the MPI it is understood that the archive is the capital to build on
• in the DOBES programme the point is to make results explicit and accessible
• this only works if we don’t have an “inert, dusty” archive
  • that is not an attractive perspective; hear more about this from D. Schüller

Vision for a single archive
[Diagram: The Archive and its Archive Utility Layer, with a legend marking components as done, in progress, or to start]
• labels in the diagram: Ontological Knowledge, User Authentication, Access Rights, Metadata Tools, Lexicon Exploration, Text Exploration, Data Ingestion & Management, Lexical Encoding, Web Commentary, Web-based Archive Exploration, Annotation Exploration, (Web-based) Archive Enrichment, Media Annotation, User
• Domain of Registered Primary and Secondary Resources
• Domain of Descriptive Metadata
• Primary Resources: Texts, Images, Sound, Movies

Everything OK, so let’s go home …
• what about the following scenario?
[Diagram: archive A and archive B, each holding Raw Data and Metadata, with data exchange between them for data survival reasons]

Everything OK, so let’s go home …
• what about the following scenario?
[Diagram: the DOBES Archive (Raw Data: DOBES Trumai, Metadata) and the AILLA Archive (Raw Data: AILLA Trumai, Metadata) both feed “my personal Trumai archive”]
• not just copies but the result of one’s own creative process

DELAMAN
• Digital Endangered Languages and Music Archive Network
• loose network of “archives” sharing a set of visions, such as
  • want to exchange data automatically (list driven)
  • want to allow people to create integrated virtual working spaces
  • want to have an integrated access management domain
• first talks in Nijmegen and at HRELP workshops in 2003
• foundation at the PARADISEC meeting in Sydney in 2003
• no deep discussions yet about detailed wishes and implementation
  • therefore this workshop in Nijmegen
• it’s about future usage scenarios with distributed archives
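
The slide does not say what "list driven" exchange would look like in practice. One possible reading, sketched here purely as an assumption, is that each archive publishes a list of its holdings with checksums, and a partner compares the two lists to decide what to copy. The `Entry` type, the paths, and `plan_exchange` are hypothetical names for illustration and not part of any DELAMAN or DAM-LR specification.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """One line of a (hypothetical) holdings list published by an archive."""
    path: str    # archive-relative path, assumed to be comparable across archives
    sha256: str  # checksum of the file content


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(data).hexdigest()


def plan_exchange(local: list[Entry], remote: list[Entry]) -> dict[str, list[str]]:
    """Compare two holdings lists and decide what has to move where.

    fetch:    paths only the remote archive has
    send:     paths only the local archive has
    conflict: paths both have, but with different content
    """
    local_map = {e.path: e.sha256 for e in local}
    remote_map = {e.path: e.sha256 for e in remote}
    shared = local_map.keys() & remote_map.keys()
    return {
        "fetch": sorted(p for p in remote_map if p not in local_map),
        "send": sorted(p for p in local_map if p not in remote_map),
        "conflict": sorted(p for p in shared if local_map[p] != remote_map[p]),
    }


if __name__ == "__main__":
    archive_a = [Entry("trumai/s1.wav", checksum(b"audio")),
                 Entry("trumai/s1.imdi", checksum(b"metadata v1"))]
    archive_b = [Entry("trumai/s1.wav", checksum(b"audio")),
                 Entry("trumai/s1.imdi", checksum(b"metadata v2")),
                 Entry("trumai/s2.wav", checksum(b"more audio"))]
    print(plan_exchange(archive_a, archive_b))
```

The "conflict" bucket corresponds to the case raised on the previous slide, where two archives hold versions that are not mere copies; a real system would need a policy for it, which this sketch deliberately leaves open.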

DELAMAN / DAM-LR
[Map with labels: EMELD, ELAR, INL, MPI, Lund, ANLC, AILLA, AMPM, LACITO, AIATSIS, PARADISEC]
• DELAMAN is an international network
• DAM-LR
  • Distributed Access Management for Language Resources
  • 3-year EU project starting on 1 January 2005; yes, we have money to start
  • centered around the DELAMAN intentions

Workshop
• want to get a deeper understanding of what “we” want
  • need good requirements specifications
• want to get a deeper understanding of what others are doing
  • our ideas are not new; we share them with others
  • Digital Library initiatives (FEDORA, …)
  • GRID initiative(s) (SRB, GTK, …)
  • compute/function/data GRID
• therefore we invited
  • linguists who know about potential and real user wishes
  • “archivists” who know about maintaining large repositories
  • technologists who know about current and future developments
• some of us looked into the legal and ethical aspects
• at the end we should be ready to start

Programme: Day 1

Programme: Day 2
• times are not too strict; it’s a workshop

Let’s go …
• The MPI team wishes us two interesting and highly interactive days in Nijmegen
  • Daan, Andreas: Technology
  • Paul, Roman: Archive
  • Peter: ??