This presentation highlights the importance of digital archives in preserving and providing access to endangered languages in the Asia-Pacific region. It discusses the benefits of digital formats for transcription, analysis, and distribution, as well as the role of quality-controlled primary data resources in supporting research results. The presentation also mentions other regional digital language and music archives, as well as the establishment and progress of PARADISEC, a collaborative project focused on preserving field recordings of endangered languages and musics.
Large-scale digital archives of endangered Asia-Pacific languages • Pacific And Regional Archive for Digital Sources in Endangered Cultures (PARADISEC) • Linda Barwick, University of Sydney • Presentation to APAN E-science workshop, Honolulu, 28 Jan 2004
Endangered regional languages • Approx. 2500 of the world’s 6000 languages in Australia’s region (Oceania, E and SE Asia) • Majority of these 2500 are endangered - number of languages likely to fall to a few hundred by 2100 (UNESCO) • Loss of language -> loss of cultural knowledge (e.g. ecological knowledge) and expressions (e.g. songs) -> loss of human diversity
Why digital archives? • Salvaging materials recorded in endangered analogue formats • Only means of ensuring long-term preservation and access to audio • Optimal format for transcription and analysis • Distributed management & access (including authentication) via broadband R&E networks • Participation in international consortia for resource discovery and advice • Quality-controlled citeable primary data resource to support research results
The coming revolution … • Quality-controlled citeable primary data resource to support research results requires: • Authenticated resource creation path • Fine-grained description of resource: metadata, transcript, timecoding (translation…) • Sustainability, security, discoverability and accessibility of resource (i.e. needs to be online) • Instantiation of links between research results and primary data (e.g. via electronic publication)
Other regional digital language and music archives • Archive of Maori and Pacific Music, U. Auckland • Tjibaou Cultural Centre, New Caledonia • Vanuatu Cultural Centre • Institute of Papua New Guinea Studies Music Archive, Port Moresby • Australian Institute of Aboriginal and Torres Strait Islander Studies audiovisual archive • Alaska Native Language Center • Archive of Indigenous Languages of Latin America • Formosan Language Archive • Others … e.g. Malaysia ….
Some European archives hosting Asia-Pacific region material • DoBeS (Documentation of Endangered Languages) Archive, Max Planck Institute, Nijmegen, Netherlands • Endangered Languages Programme Archive, SOAS, UK • Vienna Phonogrammarchiv • Berlin Phonogrammarchiv • LACITO, France • Musée de l’Homme, France • British National Sound Archive …
About PARADISEC • Established 2003 to preserve and make accessible Australian researchers’ field recordings of endangered languages and musics from the Asia-Pacific region • Collaborative project funded by the Australian Research Council; participants are the University of Sydney, the University of Melbourne and the Australian National University (ANU) • Does not include Australian languages - these are managed via AIATSIS • Present focus on audio recordings - plan to include and integrate other digital resources
Collection status Jan 2004 • 1324 assessed records, covering approx. 150 regional languages from 14 countries (Australia, Burma, Fiji, Indonesia, Japan, Laos, Malaysia, Micronesia, New Zealand, Papua New Guinea, Singapore, Taiwan, Vanuatu, Vietnam) • 392 hours ingested and online (password-protected, via APAC store account) - on target for 500 hours (1 terabyte) in first year • Metadata quality control via registration with Open Language Archives Community (OLAC, 6/03) and OAI • First collections digitised and returned to depositors
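Registration with OLAC and OAI means the catalogue can be harvested over the OAI-PMH protocol. The following is a minimal sketch of how a harvester might build such a request. The base URL is a placeholder, not PARADISEC's actual endpoint, and the function name is illustrative; `ListRecords` is a standard OAI-PMH verb and `olac` is the metadata prefix conventionally used by OLAC data providers.

```python
from urllib.parse import urlencode

# Hypothetical endpoint, for illustration only (not a real PARADISEC URL).
EXAMPLE_BASE_URL = "https://example.org/oai"


def list_records_url(base_url, metadata_prefix="olac", set_spec=None):
    """Build an OAI-PMH ListRecords request URL.

    metadata_prefix and set_spec map to the standard OAI-PMH
    'metadataPrefix' and 'set' arguments.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(list_records_url(EXAMPLE_BASE_URL))
```

A harvester would fetch this URL and parse the returned XML; that step is omitted here because it depends on the repository being harvested.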
Metadata - shared online database • For description, assessment, rights, access • Filemaker Pro while in development • Currently moving to MySQL/PHP • Created & managed online in shared server space • Public access to catalogue planned for 2004 • Will link to collection (for authorised users) Nick Thieberger, Melbourne unit, PARADISEC project manager
PARADISEC audio standards • 24-bit 96 kHz Broadcast Wave Format (uncompressed PCM audio with encapsulated metadata), approx. 2 GB/h • Ingestion managed via Quadriga system (also used by National Library of Australia, ScreenSound, etc.) • CD-audio and MP3 browser copies via batch processing Frank Davey, audio engineer, Sydney unit
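The 2 GB/h figure, and the earlier target of 500 hours in about 1 terabyte, can be checked with simple arithmetic. The sketch below assumes two-channel (stereo) audio, which the slides do not state, and ignores the small overhead of the Broadcast Wave header and metadata chunks.

```python
def pcm_bytes_per_hour(sample_rate_hz=96_000, bit_depth=24, channels=2):
    """Size in bytes of one hour of uncompressed PCM audio.

    channels=2 is an assumption (stereo); header/metadata overhead is ignored.
    """
    bytes_per_sample = bit_depth // 8
    return sample_rate_hz * bytes_per_sample * channels * 3600


if __name__ == "__main__":
    per_hour = pcm_bytes_per_hour()
    print(f"{per_hour / 1e9:.2f} GB per hour")           # about 2 GB/h
    print(f"{500 * per_hour / 1e12:.2f} TB for 500 h")   # about 1 TB
```

Under these assumptions one hour comes to 2,073,600,000 bytes, consistent with the quoted 2 GB/h and with 500 hours occupying roughly 1 terabyte.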
Depositor and user liaison • PARADISEC is a digital archive only - provides temporary storage while objects are digitised • Originals returned to originating institution/depositor with CD-audio copy • Depositors have online password-protected access to full-resolution digital files • We provide advice on archiving of originals if requested • Born-digital originals will revolutionise work practices Amanda Harris, project administration, Sydney unit
PARADISEC structure (diagram; labels: depositor, owner, cultural centre, authorised general user; password authentication; APAC national facility (Canberra); “Azoulay” archive space; working space; data entry/administration; digitisation (Sydney); Usyd MSS; metadata/database design (Melbourne))
Rights • Depositor and user agreement forms online • Rights information embedded in the processing system for eventual automated access or restriction of access • Trial password access currently implemented on APAC store and shared database
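Embedding rights information so that access can later be granted or restricted automatically might look like the sketch below. The access levels, field names and usernames are hypothetical, chosen only to illustrate the idea; the slides do not describe PARADISEC's actual rights schema.

```python
from dataclasses import dataclass, field
from enum import Enum


class AccessLevel(Enum):
    # Hypothetical categories, not PARADISEC's actual rights vocabulary.
    OPEN = "open"
    AUTHORISED_ONLY = "authorised_only"
    CLOSED = "closed"


@dataclass
class ItemRights:
    depositor: str
    level: AccessLevel
    authorised_users: set = field(default_factory=set)


def may_access(rights, username):
    """Decide access from the rights record attached to an item."""
    if username == rights.depositor:
        return True  # depositors always see their own material
    if rights.level is AccessLevel.OPEN:
        return True
    if rights.level is AccessLevel.AUTHORISED_ONLY:
        return username in rights.authorised_users
    return False  # CLOSED: depositor only


if __name__ == "__main__":
    rights = ItemRights("depositor1", AccessLevel.AUTHORISED_ONLY, {"cultural_centre"})
    print(may_access(rights, "cultural_centre"))  # True
    print(may_access(rights, "general_user"))     # False
```

Keeping the decision in one function driven by the stored rights record is what would allow access to be automated later without rechecking the paper agreement forms.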
Access (audio online) • Download whole files from data store (e.g. for authorised community use) • Streaming MP3 (browsing) • Audition section of file (in development 2004) • Transcript, dictionaries, maps, images etc as point of entry to collection (in development 2004) • Effective access depends on transcripts with translations and timecoding • Need ‘timecoding for dummies’ tools • Encouragement for users to add value to repository by lodging transcripts, indexes etc.
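To illustrate why access depends on timecoded transcripts: once each stretch of a transcript carries start and end times, a text search can lead directly to the matching section of audio. The following is a minimal sketch with a hypothetical data structure and placeholder text; it is not a description of any existing PARADISEC tool.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float          # start time in seconds from beginning of file
    end_s: float            # end time in seconds
    transcript: str         # text in the recorded language
    translation: str = ""   # optional translation


def find_segments(segments, query):
    """Return segments whose transcript or translation contains the query."""
    q = query.lower()
    return [s for s in segments
            if q in s.transcript.lower() or q in s.translation.lower()]


def frame_range(segment, sample_rate_hz=96_000):
    """Convert a segment's timecodes to sample-frame offsets in the audio file."""
    return (round(segment.start_s * sample_rate_hz),
            round(segment.end_s * sample_rate_hz))


if __name__ == "__main__":
    # Placeholder content, for illustration only.
    segments = [
        Segment(0.0, 12.5, "(vernacular text 1)", "introduction by the speaker"),
        Segment(12.5, 20.0, "(vernacular text 2)", "story about building a canoe"),
    ]
    for seg in find_segments(segments, "canoe"):
        print(seg.start_s, seg.end_s, frame_range(seg))
```

The frame offsets are what a player or extraction tool would need in order to audition just that section rather than downloading the whole file.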
Training & Resources • Demand for practical workshops for researchers and communities • Researcher training to archive in everyday practice not just as end point • Website as gateway for online resources • Potential for online collaboration with users and stakeholder communities in adding value to collection through timecoding and metadata
PARADISEC’s communities • International discipline-related digital entities • National media archives • Regional stakeholders and cultural centres • Australian Higher Education Sector
Regional stakeholders: regional community • Speakers/performers and their inheritors • Local and national cultural centres (Vanuatu Kaljoral Senta, Institute of PNG Studies, etc.) • Must be involved for ethical and rights reasons • Significant user community
Regional stakeholders: issues • Differentials in infrastructure • Differentials in funding • Training and career structures • Technical support • Local language access interface
Regional stakeholders: PARADISEC wishlist • Effective international networking links to stakeholder communities • User-friendly, cost-effective and open-source database, indexation and annotation software • More opportunities for user workshops and skillsharing within the Asia-Pacific region • Greater awareness of potential for cultural heritage applications in the planning/feasibility study stages of regional infrastructure projects
International discipline-related digital entities: OLAC (http://www.language-archives.org) • Sub-community of Open Archives Initiative • Worldwide virtual library of language resources • PARADISEC one of 27 participating archives • Aims: develop consensus on best current practice for digital archiving of language resources; develop network of interoperating repositories & services for housing & accessing such resources
International discipline-related digital entities: DELAMAN (http://www.delaman.org) • Other participants include: • Alaska Native Language Center Archives (University of Alaska Fairbanks, USA) • Archive of Indigenous Languages of Latin America (University of Texas, USA) • Archive of Maori and Pacific Music (University of Auckland, New Zealand) • DoBeS archive (Max Planck Institut für Psycholinguistik, Netherlands) • ELAR archive (School of Oriental and African Studies, UK)
International discipline-related digital entities: issues • Differentials in scope and mission of participants • Differential IP and rights protocols across international boundaries • Differentials in data structures, standards and system architectures
International discipline-related digital entities: wishlist • Networking, ethical agreements and standards to allow mirroring of data between participating archives to provide secure backup and efficiencies in data provision to global user communities
Linkages • Support and advice from ... • ANU Internet Futures, APAC, GrangeNet • ScreenSound • National Library • AIATSIS • Collaborations ... • EMELD (Electronic Metastructure for Endangered Languages Data) • DELAMAN and OLAC • Regional cultural organisations • Strategic partnerships with other digital archives
Contacts • Please visit our website: http://www.paradisec.org.au • Director (Sydney unit): lb@paradisec.org.au • Project manager (Melbourne): nickt@paradisec.org.au