Introducing the ELAR information system architecture. Robert Munro & David Nathan. Endangered Languages Archive (ELAR), School of Oriental and African Studies, London. Outline: Introduction; The ELAR architecture; User requirements; Ingestion; Archive & dissemination; Conclusions.
Introducing the ELAR information system architecture Robert Munro & David Nathan Endangered Languages Archive (ELAR), School of Oriental and African Studies, London
Outline • Introduction • The ELAR architecture • User Requirements • Ingestion • Archive & dissemination • Conclusions
Introduction – who we are • Part of the Hans Rausing Endangered Languages Project (HRELP), based at the School of Oriental and African Studies (SOAS), University of London. • Funded by the Lisbet Rausing Charitable Fund • The other two parts are: • Academic Programme (ELAP), which runs postgraduate courses, seminars and workshops • Documentation Programme (ELDP), which funds endangered language documentation projects
ELAR – current state • In the process of designing and implementing key systems: • accession system (ingestion system) • archive information system • catalogue serving system • archive access system • data storage • long-term backup system
ELAR – current state • Sources of material supporting the systems analysis and design: • literature review • review of exemplar materials • interaction with associated archives • interaction with ELDP grantees • interaction with members of ELAP • departmental seminars on language documentation • seminars focused on archiving
ELAR – architecture • Strongly informed by the Open Archival Information System (OAIS) Reference Model (CCSDS, 2002)
The OAIS model • Producers → Ingestion → Archive → Dissemination → Designated communities
The OAIS model • Identify the nature of the materials (content, format and structures) that data producers will create
The OAIS model • Identify the intended users of the archive, and their user requirements
The OAIS model • Define dissemination formats, data structures and procedures that support the user requirements of the designated communities
The OAIS model • Design an archive information system able to store all the information and produce the required dissemination packages
The OAIS model • Define ingestion (accession) formats and structures that minimise the conversion cost
The OAIS model • The archive needs to define three types of ‘packages’: ingestion, archive and dissemination
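One way to picture the three package types is as three small record types with a conversion step between each pair. The sketch below is only illustrative: the class names, fields and the `ingest` / `disseminate` helpers are assumptions made for this example, not ELAR's actual design.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class IngestionPackage:
    """What a depositor hands over (hypothetical fields)."""
    depositor: str
    files: Dict[str, bytes]  # original path -> raw content


@dataclass
class ArchivePackage:
    """What the archive stores: materials and metadata held separately."""
    object_id: str
    materials: Dict[str, bytes]
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class DisseminationPackage:
    """What a user receives: one item with its metadata attached."""
    object_id: str
    content: bytes
    embedded_metadata: Dict[str, str]


def ingest(pkg: IngestionPackage, object_id: str) -> ArchivePackage:
    # A real system would convert formats here; this sketch copies as-is.
    return ArchivePackage(object_id, dict(pkg.files), {"depositor": pkg.depositor})


def disseminate(pkg: ArchivePackage, path: str) -> DisseminationPackage:
    # Copy the metadata so the delivered package is self-describing.
    return DisseminationPackage(pkg.object_id, pkg.materials[path], dict(pkg.metadata))
```

Keeping the conversions as explicit functions makes the cost of each step visible, which is the quantity the ingestion formats are meant to minimise.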
Ingestion • A set of formats & structures that can be converted to archive formats with minimal effort: • file formats conforming to the 7 + 1 dimensions of portability (Simons and Bird, 2003; Johnson, 2004) • support for incremental assembly of the deposit • well-documented structures: XML with a schema is ideal • ELAR preferences: • uncompressed, non-proprietary formats • well-documented structures (OLAC, IMDI, custom)
Ingestion • Filenames and structure of deposit: • we convert deposits to formats / structures appropriate for the archive information system • …but, we record the filenames and directory structures of the deposit, allowing depositors to navigate the materials via them
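Recording the original layout can be as simple as keeping an index from each deposited path to the identifier the archive assigns. The following is a minimal sketch under that assumption; the `obj-NNNN` identifier scheme and both function names are hypothetical.

```python
from pathlib import PurePosixPath


def build_path_index(deposit_paths):
    """Map each original deposit path to a hypothetical archive identifier."""
    index = {}
    for n, original in enumerate(sorted(deposit_paths), start=1):
        index[original] = f"obj-{n:04d}"
    return index


def list_directory(index, directory):
    """Return (original filename, archive id) pairs for files directly in `directory`."""
    target = PurePosixPath(directory)
    return sorted(
        (PurePosixPath(path).name, object_id)
        for path, object_id in index.items()
        if PurePosixPath(path).parent == target
    )
```

A depositor browsing `session1` would then see their own filenames, while the archive remains free to store the converted materials under whatever structure suits it.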
Ingestion • Access protocols • … tomorrow
Archive and dissemination • Granularity: • archive objects can be bundles • archive objects can be a subsection of a file • the types of related materials and their relationships should play a part in the search options
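As a rough illustration of these granularities, a single record type can stand for a whole file, a time span within a file, or a bundle of other objects, with typed relationships stored alongside. All names, fields and relation labels here are assumptions for the sketch.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ArchiveObject:
    object_id: str
    path: Optional[str] = None                  # None when the object is a bundle
    span: Optional[Tuple[float, float]] = None  # (start, end) in seconds for a subsection
    members: Tuple[str, ...] = ()               # ids of bundled objects


def related(relations, object_id, relation_type):
    """Return target ids linked from `object_id` by `relation_type`.

    `relations` is a list of (source id, relation type, target id) triples.
    """
    return [
        target
        for source, kind, target in relations
        if source == object_id and kind == relation_type
    ]


# Example: a 30-second segment of a recording, its transcript, and a bundle of both.
segment = ArchiveObject("seg-1", path="rec-1.wav", span=(12.0, 42.0))
transcript = ArchiveObject("txt-1", path="rec-1.txt")
bundle = ArchiveObject("bun-1", members=("seg-1", "txt-1"))
relations = [("seg-1", "transcribed_by", "txt-1"), ("seg-1", "part_of", "rec-1")]
```

A search option such as "recordings that have a transcription" then becomes a filter on the relationship type rather than on file names.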
Archive and dissemination • Version control: • modelling versions of materials is required • multiple types of versioning might be required (migration / dissemination / content update) • versions will be ‘invisible’ to most dissemination packages
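A minimal sketch of how typed versions might be modelled, assuming each version records why it was created and that dissemination normally exposes only the most recent one. The `Version` fields and the `current_version` helper are illustrative, not part of the ELAR design.

```python
from dataclasses import dataclass
from enum import Enum


class VersionKind(Enum):
    MIGRATION = "migration"            # same content, new file format
    DISSEMINATION = "dissemination"    # derived copy made for delivery
    CONTENT_UPDATE = "content update"  # depositor changed the material


@dataclass(frozen=True)
class Version:
    number: int
    kind: VersionKind
    location: str


def current_version(history):
    """Return the highest-numbered version; earlier ones stay hidden from users."""
    return max(history, key=lambda version: version.number)


history = [
    Version(1, VersionKind.CONTENT_UPDATE, "store/obj-1/v1.doc"),
    Version(2, VersionKind.MIGRATION, "store/obj-1/v2.xml"),
]
```

The full history stays available to the archive for audit and rollback, while a dissemination package would be built from `current_version(history)` alone.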
Archive and dissemination • Adding materials and metadata: • users can add comments to data • users can add metadata values not provided by a depositor • users can make relationships between items, including mapping • users can supplement the kinds of metadata and relationships in the archive. • note: all the above require moderation and supporting architecture
Archive and dissemination • Language support: • users should be able to add comments / metadata in any language • users should be able to navigate the archive access system via the language preference(s) of their choice • the archive architecture needs to support translations of metadata and comments
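Supporting translations implies that each metadata value or comment is stored once per language, and that display picks among them by the user's ordered preferences. The sketch below assumes language codes as dictionary keys and a fallback default of `"en"`; both choices are assumptions for illustration.

```python
def pick_translation(translations, preferences, default="en"):
    """Choose the best available translation for a user.

    `translations` maps a language code to text; `preferences` is the user's
    ordered list of language codes. Returns a (language, text) pair.
    """
    if not translations:
        raise ValueError("no translations available")
    for lang in preferences:
        if lang in translations:
            return lang, translations[lang]
    if default in translations:
        return default, translations[default]
    # Last resort: whichever translation was stored first.
    lang = next(iter(translations))
    return lang, translations[lang]


title = {"en": "Creation story", "es": "Historia de la creación"}
```

Returning the language along with the text lets the access system tell the user when they are seeing a fallback rather than their preferred language.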
Archive and dissemination • Archive services • advice and conversion services to depositors • response to requests for information • supporting communications between individuals associated with the archive
Archive and dissemination • Archive information system: • separate metadata from materials • avoid redundancy • Dissemination packages: • favour embedding metadata • redundancy is acceptable if it aids interpretation • Technical solutions: • we use MySQL to support the archive • for dissemination, we favour XML and formats allowing metadata to be embedded (PDF, BWF)
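The contrast between the two sides can be sketched in a few lines: metadata sits in its own relational table inside the archive, and is folded into a single XML document at dissemination time. This example uses Python's built-in `sqlite3` purely as a stand-in for MySQL so that it runs without a server; the table layout and element names are assumptions, not ELAR's schema.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_archive():
    """Create an in-memory store with materials and metadata in separate tables."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE material (id TEXT PRIMARY KEY, location TEXT)")
    db.execute("CREATE TABLE metadata (material_id TEXT, name TEXT, value TEXT)")
    return db


def dissemination_xml(db, material_id):
    """Build one XML document that embeds the item's metadata with its content link."""
    # Assumes the material exists; a missing id raises TypeError on unpacking.
    (location,) = db.execute(
        "SELECT location FROM material WHERE id = ?", (material_id,)
    ).fetchone()
    root = ET.Element("item", id=material_id)
    meta = ET.SubElement(root, "metadata")
    rows = db.execute(
        "SELECT name, value FROM metadata WHERE material_id = ? ORDER BY name",
        (material_id,),
    )
    for name, value in rows:
        ET.SubElement(meta, "field", name=name).text = value
    ET.SubElement(root, "content", href=location)
    return ET.tostring(root, encoding="unicode")
```

Inside the archive each fact is stored once; the redundancy is introduced only in the outgoing package, where it helps a recipient interpret the file without access to the database.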
Conclusions • ELAR is newly opened for deposits • Key systems are in the process of development • Significant features include: • modelling archive objects at different granularities • modelling relationships between objects • users can enter/define their own metadata • users can translate information into the language of their choice • users can navigate via the language(s) of choice