PARADISEC: background, current structures, and thoughts on international collaborations. Linda Barwick, University of Sydney. DELAMAN workshop, MPI Nijmegen, 29 November 2004
PARADISEC structure • CIs: Cliff Goddard, Hugh de Ferranti • CIs: William Foley, Allan Marett, Jane Simpson • Audio Archiving Unit • Director: Linda Barwick • Audio: Frank Davey • Project Liaison: Amanda Harris • CIs: Andrew Pawley, John Bowden, Malcolm Ross, Alan Rumsey • CIs: Steve Bird, Nick Evans, Cathy Falk, Janet Fletcher, John Hajek • Store account and web interface: Stuart Hungerford • Project Manager (metadata guru): Nick Thieberger
PARADISEC rationale • prioritises Asia-Pacific region materials not otherwise catered for; • provides a rational framework for prioritising and managing University research recordings using international archival formats and standards; • implements IP arrangements tailored to University needs and practices; • involves researchers in specialist description of resources; • streamlines consortium processes to salvage important recordings and make them available for research in a timely and cost-effective way
Research applications • Making Australian research available internationally • Fieldwork - use for elicitation and documentation, and for language learning in preparation for fieldwork • Return of materials to communities • Digital tools for optimal transcription and analysis • Comparative studies - historical recordings give time depth for area language and music studies • Better understanding of diversity - data from some languages only in older recordings • Incorporation of primary data in presentations and, ultimately, publications
Staged approach • Metadata: 1623 records, making resources discoverable even if not yet digitised • PIs and content metadata need to be assigned before digitisation (with some refinement during the process) • Repository: 807 items digitised to date, some complex, e.g. fieldnotes (page images) or transcripts accompanying tapes
Metadata, November 2004 • 1623 records in the metadata repository, with data from 24 countries in the Asia-Pacific (Australia, Chile, Cook Islands, Fiji, French Polynesia, Hong Kong, Indonesia, India, Japan, Korea, Lao, Malaysia, Federated States of Micronesia, Myanmar (Burma), New Zealand, Palau, Papua New Guinea, Reunion, Singapore, Solomon Islands, Taiwan, Tonga, Vanuatu, Vietnam)
Repository contents • Repository totals, 26 November 2004 • total files: 2582 • total items: 807 • total size: 1.0 TB • total audio: 627.3 hours • file types: .wav, .mp3 (1040); .tif (179); .jpg (46); .pdf (34); .txt (3); .rtf (8); .xml (32)
Repository collections • Bradley (5hr) • Capell (9hr)* • Corris (6hr) • Crowther (2hr) • Donohue (3hr) • Dutton (266hr) • Evans (Hons thesis) • Fedden (7hr) • Foley (23hr) • Gardner (56hr) • Kartomi (2hr)* • Laycock (29hr) • Lawton (3hr) • McElhanon (41hr) • McIntyre (10hr) • Margetts (17hr) • Rumsey (17hr)* • San Roque (1hr) • Sam (4hr)* • Tepano (19hr) • Thieberger (39hr) • Thieberger (PhD thesis) • Toulmin (35hr) • Voorhoeve (33hr)* • Wurm (2)* • * Ingestion ongoing November 2004
PARADISEC Repository Languages, November 2004 • PALAU: Palauan • PAPUA N. GUINEA: Abau, Ambonese Pidgin, Angoram (Kanduanuin), Angoram (Moim dialect), Aomie, Arapesh, Arifama, Aunalei, Auwim, Awomo, Ba, Balawaia, Barai, Baruga, Barupu (Warapu), Be'anivia, Biage, Bibo, Binandere, Bodinumu, Boera, Boine, Boku, Boridi, Bouxula, BratMomire, Buin, Burum, Chimba, Chirima, Daga, Darava, Dawawa, Dedua, Dima, Dimadima, Dina, Doga, Domu, Doromu, Doura, Efogi, Efogi Dialects, Emo, Enivilogo, Fore, Fuyugey, Gabadi, Ginuman, Gwedena, Herei, Hiae Motu, Hiri Motu, Hube, Hula, I'ai, Ikega, Ioma, Isaka (Krisa), Kaipi, Kairi, Kambot, Kanga, Karama, Karawari Lg (Ambinwari), Karukaru, Kâte, Kinalaknga, Kimi, Kiriwina, Koiari, Koita, Koitabu, Kokila, Kokoro, Komba, Kopar, Koriki, Koriko, Kosorong, Kovai, Kovio, Kubuirubu, Kuman, Kumukio, Kuni, Kunimaipa, Kwale, Laimodo, Mada'a, Magi, Mâgobineng, Magore, Maisin, Maiwa, Managalas, Manam, Manubara, Manumu, Mapei, Mapena, Mari, Maria, Mekeo, Melpa, Mian, Mid-Wahgi, Migabac, Mindik, Miniafa, Mogoni, Mom, Mor, Motu, Muhiang Arapesh, Nabak, Naga, Namanadza, Naoro, Nara, New Ireland Pidgin, Ngala, Nomu, Notu, Ondoro, One (Onne), Onjab, Ono, Opao, Orokaiva, Orokolo, Ouma, Paiwa, Police Motu, Porome, Qld Pidgin, Rabuka, Raepa Tati, Saliba, Samo, Sene, Sepik Tok Pisin, Sialum, Sinaugoro, Sona, Suau, Suku, Surai, Taboro, Tairuma, Tauade, Tobo, Tok Pisin, Tolai, Uberi, Ubir, Ubir Gonjoe, Vesilogo, Vioribaiwa, Wamora, Wangun, Wiga, Wosera, Yele, Yewudu, Yimas, Yoba • SOLOMONS: Babatana, Ririo, Ruviana, Varese, Lau, Santa Cruz • INDONESIA: Asmat, Brat, Hatam, Inanwatan, Manikion, Moi, Ningrum, Sahu, Sebyar, Tinam, Todahe, Tok Pisin, Yahadian • VANUATU: South Efate, Bislama, Lelepa • FIJI: Lauan • TONGA: Tongan • COOK ISLANDS: Rarotongan, Pukapuka • FRENCH POLYNESIA: Tahitian • CHILE: Rapa Nui • INDIA: Rajbangsi • NEW CALEDONIA: Dehu
Regional links • Institute of Papua New Guinea Studies • Vanuatu Kaljoral Senta • Archive of Maori and Pacific Music, U. Auckland • University of Hawai’i • New Caledonia - Tjibaou Cultural Centre • Indonesia - UIN, Jakarta • Malaysia - Universiti Malaya • Rapa Nui - Museo antropologico P. Sebastian Englert • Micronesia - Historical Preservation Office, Yap
Audio Ingest • Initially ingested as raw WAV on AudioCube 5 Dell 670 workstations running WaveLab (2005 will add remote Pyramix workstations) • Masters: 24-bit, 96 kHz Broadcast Wave Format (BWF; uncompressed audio with encapsulated metadata) • Some at a lower rate if the original is digital (e.g. 16-bit, 48 kHz from DAT) • WAV converted to BWF with Quadriga software • Derivatives produced by batch processing: CD-audio quality (16-bit, 44.1 kHz) and MP3 (128 kbps)
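The storage cost of each of these formats follows from simple arithmetic: sample rate × bytes per sample × channels × 3600 seconds per hour. The sketch below assumes stereo recordings (the slide does not give a channel count), ignores the small BWF header and metadata chunks, and treats 1 GB as 10^9 bytes; the function names are illustrative only.

```python
"""Back-of-envelope storage per hour of audio for the ingest formats above.

Assumption (not stated on the slide): recordings are stereo (2 channels).
PCM sizes ignore the WAV/BWF header and any embedded metadata chunks.
"""

SECONDS_PER_HOUR = 3600


def pcm_bytes_per_hour(sample_rate_hz: int, bit_depth: int, channels: int = 2) -> int:
    """Uncompressed PCM bytes for one hour of audio."""
    bytes_per_second = sample_rate_hz * (bit_depth // 8) * channels
    return bytes_per_second * SECONDS_PER_HOUR


def cbr_bytes_per_hour(bitrate_kbps: int) -> int:
    """Bytes for one hour of a constant-bitrate stream such as 128 kbps MP3."""
    return bitrate_kbps * 1000 // 8 * SECONDS_PER_HOUR


if __name__ == "__main__":
    formats = {
        "BWF master, 24-bit/96 kHz": pcm_bytes_per_hour(96_000, 24),
        "DAT-sourced master, 16-bit/48 kHz": pcm_bytes_per_hour(48_000, 16),
        "CD-audio derivative, 16-bit/44.1 kHz": pcm_bytes_per_hour(44_100, 16),
        "MP3 derivative, 128 kbps": cbr_bytes_per_hour(128),
    }
    for name, size in formats.items():
        print(f"{name}: {size / 1e9:.2f} GB per hour")
```

Under those assumptions an hour of 24-bit, 96 kHz master audio comes to about 2.07 GB, 36 times the size of the 128 kbps MP3 derivative.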
Digital preservation • “Azoulay” server, with a partition for working files and an archive partition for sealed masters; current capacity 750 GB (>3 TB in 2005) • Sealed masters archived to 100 GB data tapes on the University of Sydney LTO Mass Data Storage System (high-low watermark script); duplicate data tapes kept at 2 locations on campus • Sealed masters mirrored nightly to the APAC national Store facility (Canberra) for nearline storage • Password-protected online access to the Store facility
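The slide names a high-low watermark script but not its logic. A common form of this policy, sketched here as an assumption rather than a description of the University of Sydney script, does nothing until disk usage passes a high threshold, then migrates the oldest sealed masters to tape until usage falls to a low threshold. The `SealedMaster` type, the 90%/70% defaults and the oldest-first ordering are all hypothetical.

```python
"""Generic high/low watermark selection of files to migrate to tape.

Hypothetical: the oldest-first policy, the default thresholds and the
SealedMaster type are illustrative, not PARADISEC's actual script.
"""
from dataclasses import dataclass


@dataclass
class SealedMaster:
    path: str
    size_bytes: int
    sealed_at: float  # e.g. a Unix timestamp recorded when the master was sealed


def select_for_migration(files, capacity_bytes, high=0.90, low=0.70):
    """Return the files to migrate so that usage drops to the low watermark.

    Nothing is selected unless usage exceeds the high watermark; once it
    does, the oldest files are chosen until usage is at or below the low one.
    """
    if not 0 < low < high <= 1:
        raise ValueError("watermarks must satisfy 0 < low < high <= 1")
    used = sum(f.size_bytes for f in files)
    if used <= high * capacity_bytes:
        return []
    selected = []
    for f in sorted(files, key=lambda m: m.sealed_at):
        if used <= low * capacity_bytes:
            break
        selected.append(f)
        used -= f.size_bytes
    return selected
```

The gap between the two thresholds keeps the script from triggering a migration every time usage creeps just past a single limit.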
Networking • Main campuses (University of Sydney, University of Melbourne, Australian National University) connected by GrangeNet (next-generation research network, 10 Gbps connections) • GrangeNet is paid by subscription, not by traffic • Satellite campus UNE connected by AARNet (Australian research and education network; currently billed by traffic, 155 Mbps connection) • Both connect to the APAN community (Asia Pacific Advanced Network), with potential for linking to regional and international R&E networks; potential traffic costs are an issue
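To show why link speed and traffic billing matter when moving archive data, the sketch below computes ideal transfer times at the two connection speeds quoted above. It assumes the full line rate is available with no protocol overhead or competing traffic, and treats 1 TB as 10^12 bytes; the `efficiency` factor is an illustrative knob, not a measured value.

```python
"""Ideal transfer times for moving archive data over the links quoted above."""

LINKS_BPS = {
    "AARNet (UNE), 155 Mbps": 155e6,
    "GrangeNet, 10 Gbps": 10e9,
}


def transfer_seconds(size_bytes, link_bps, efficiency=1.0):
    """Seconds to move size_bytes over a link of link_bps bits per second.

    efficiency is the fraction of the line rate actually achieved
    (1.0 = ideal; an illustrative parameter, not a measured value).
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return size_bytes * 8 / (link_bps * efficiency)


if __name__ == "__main__":
    repository_bytes = 1.0e12  # 1.0 TB repository total, 26 November 2004
    for name, bps in LINKS_BPS.items():
        hours = transfer_seconds(repository_bytes, bps) / 3600
        print(f"{name}: {hours:.2f} hours")
```

At full line rate the 1.0 TB repository would take roughly 14 hours over a 155 Mbps link but about 13 minutes over a 10 Gbps one; real throughput would be lower.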
Storage • Australian Partnership for Advanced Computing (APAC) National Facility Mass Data Storage System, a Hierarchical Storage Manager system • Funded by a consortium of Australian higher education bodies • Tape robot system with capacity of up to 1.2 PB • PARADISEC will add 2-3 TB per year once satellite ingest is commissioned • Current planning horizon of the facility is 2008; PARADISEC collection projected to reach up to 9 TB by then • Hosting material from, or sharing data with, other DELAMAN collections will require an application to the facility
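The 9 TB figure can be roughly checked with a linear projection. Starting from the 1.0 TB held in the repository in November 2004 and adding a constant amount each year to 2008, 2 TB per year gives 9 TB and 3 TB per year gives 13 TB. The slide does not state which start date or rate its projection assumes, so treat this only as a rough check; `yearly_projection` is a hypothetical helper.

```python
"""Linear projection of collection size, as a rough check on the 9 TB figure."""


def yearly_projection(start_tb, tb_per_year, start_year, end_year):
    """Return [(year, size_tb), ...] from start_year to end_year inclusive,
    assuming the same amount is added each year (linear growth)."""
    if end_year < start_year:
        raise ValueError("end_year must not precede start_year")
    return [
        (year, start_tb + tb_per_year * (year - start_year))
        for year in range(start_year, end_year + 1)
    ]


if __name__ == "__main__":
    # Assumed starting point: 1.0 TB in the repository, November 2004.
    for rate in (2.0, 3.0):
        final_year, final_tb = yearly_projection(1.0, rate, 2004, 2008)[-1]
        print(f"{rate:.0f} TB/year -> {final_tb:.0f} TB by {final_year}")
```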
Streaming • GrangeNet streaming server currently in trial mode, available only within the network • Automatic copying of the main collection to the streaming server coming soon • We foresee higher demand for access once streaming of excerpts is available at scale, but also greater resources needed to mount and manage it • Will depend on researchers providing timecoded transcripts/glosses • Access and authentication protocols yet to be developed • Testbed for citation in, and integration into, e-publications
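Excerpt streaming depends on timecoded transcripts because the time codes are what map a search hit onto a playable range of a recording. A minimal sketch, assuming a hypothetical `Segment` record (start and end in seconds, transcript, gloss): find segments whose transcript or gloss contains a search term, pad each by a second, and merge overlapping ranges so adjacent hits play as one excerpt. How the ranges would be handed to the streaming server is not covered here.

```python
"""Map a search term to excerpt time ranges using a timecoded transcript.

Hypothetical data model: Segment and its fields are illustrative only.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds from the start of the recording
    end: float
    transcript: str
    gloss: str = ""


def excerpt_ranges(segments, term, pad=1.0):
    """Return merged (start, end) ranges of segments matching term.

    Matching is case-insensitive on transcript and gloss. Each hit is padded
    by pad seconds on both sides (clamped at 0) and overlaps are merged.
    """
    term = term.casefold()
    hits = sorted(
        (max(0.0, s.start - pad), s.end + pad)
        for s in segments
        if term in s.transcript.casefold() or term in s.gloss.casefold()
    )
    merged = []
    for start, end in hits:
        if merged and start <= merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```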
Software • Initial metadata database in FileMaker Pro 6, with periodic XML dumps for OLAC static harvesting • Currently being ported to MySQL/PHP to allow dynamic harvesting and other functionality • Python software for managing the repository and website (Stuart Hungerford, ANU) • Developing a Java-based geographic search interface (TimeMap) • All based on Open Source tools
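As an illustration of the XML dump step, the sketch below serialises metadata records to XML using Dublin Core element names, with Python's standard `xml.etree.ElementTree`. It is deliberately simplified: a real OLAC static repository wraps records in its own container, with repository identification and OLAC-specific extensions, none of which is shown. The `records` wrapper element and the dictionary field names are hypothetical.

```python
"""Dump metadata records to a simple Dublin Core XML document (sketch only)."""
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical field list: each exported field is a Dublin Core element.
DC_FIELDS = ("identifier", "title", "creator", "language", "date")


def records_to_xml(records):
    """Serialise a list of metadata dicts as one XML document string.

    Missing or empty fields are skipped rather than written as empty elements.
    """
    root = ET.Element("records")  # hypothetical wrapper, not the OLAC container
    for rec in records:
        item = ET.SubElement(root, "record")
        for field in DC_FIELDS:
            value = rec.get(field)
            if value:
                ET.SubElement(item, f"{{{DC_NS}}}{field}").text = str(value)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    sample = [{"identifier": "EXAMPLE-001", "title": "Example field recording", "language": "Lelepa"}]
    print(records_to_xml(sample))
```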
Implications • Implementations will change over time - foundation for cooperation must be agreements and alignment of strategic objectives • Minimal shared standards needed on formats, ethics, description, rights - what else? • Possibility of staged modular approach • federated discovery platform • proof-of-concept pilot studies/trials • targeted data sets for exchange • dark hosting/mirroring • tools development and testing
Issues • Transnational projects - how to identify and coordinate international funding opportunities? • Projections of international traffic & storage charges - funding implications • Sustainability of our collections - how to cost overheads and source long-term funding commitments • DELAMAN governance and administration structures? How to resource and support without duplication/reinventing the wheel, adding to administrative burden? • How to involve all stakeholders (including local/national bodies of originating communities)?
APAN Bangkok 2005 • E-science workshop: Toward a semantic web for digital data archives (convenor V. Balaji, Princeton) • Immense quantities of digital data and images are now archived and publicly available through the web. These include domain-specific data archives, covering such domains as weather and climate, seismology and geophysics, astronomy and particle physics, as well as images and digital copies of non-textual human cultural production. Describing, cataloguing, searching and locating information within digital data and image archives is one of the grand technological challenges of the semantic web era. This session will draw together participants from diverse fields of science and the humanities to share their experience on metadata, standards and techniques for access to large digital archives. • Tentative Titles of presentations: • 1) The Hierarchical Data Format for EOS (HDF-EOS), Richard Ullman, NASA Goddard Space Flight Center (Invited) • 2) Metadata Requirements for Global Climate Models, V. Balaji, NOAA Geophysical Fluid Dynamics Laboratory • 3) DELAMAN?? Remote presentation…
PARADISEC gratefully acknowledges support from: • Partner Universities (Sydney, Melbourne, ANU, UNE) • Australian Research Council LIEF scheme • Australian Partnership for Sustainable Repositories (SORRT testbed) • Australian Partnership for Advanced Computing • GrangeNet • ANU Internet Futures
Contact us • http://www.paradisec.org.au • Linda.Barwick@paradisec.org.au (Director) • Nicholas.Thieberger@paradisec.org.au (Project Manager)
Relevant URLs • PARADISEC website http://paradisec.org.au/ • PARADISEC repository login http://store.apac.edu.au/cgi-bin/pdsc-v3.0.cgi/login • PARADISEC streaming trial http://paradisec.org.au/streamingtrial.html • Transcript page image trial http://www.austehc.unimelb.edu.au/~gavan/lana/hdms.htm • TimeMap digitiser tool proof of concept http://acl.art.usyd.edu.au/TMDigitiser/