Automating the Extraction of Metadata From Archaeological Data Using iRods Rules

This paper discusses the challenges of managing and preserving digital archaeological data and presents a solution for automating the extraction of metadata using iRODS rules. It highlights the collaboration between ICA and TACC, the development of file and directory naming conventions, and the mapping of metadata schemas. The use of iRODS and client interfaces is explored, and conclusions regarding the benefits of this approach are drawn.

Presentation Transcript


  1. Automating the Extraction of Metadata From Archaeological Data Using iRods Rules • David Walling, Maria Esteva • 6th International Digital Curation Conference

  2. Outline • Challenges of digital archaeological data at ICA • Collaboration with TACC • File and directory naming convention • Metadata mapping • iRODS and Jython metadata extractor script • Client interfaces • Conclusions

  3. Archaeological Data • Site is destroyed

  4. Data Creation and Management at ICA • Increasing use of digital photography, GIS, databases, 3D models, digital text, spreadsheets, etc. • Archaeological Recording Kit (ARK) – doesn’t support all data types, raw vs. lower quality images • Multiple data pipelines • Raw data stored in multiple and scattered storage devices • Bottlenecks that complicate the research process

  5. ICA’s Needs • Inventory and organize the raw data • Ensure there is enough metadata for the raw data so we know what it is • Preserve each raw data object’s relationship to ARK entries and other data objects (context) • Ensure raw data remains usable in the long term (the site is destroyed) • Extract metadata from ARK so it can be both preserved and shared

  6. TACC • Mission: to enable discoveries that advance science and society through the application of advanced computing technologies • Provide HPC, visualization, and data management resources and consulting, training, R&D • Traditionally physical sciences and engineering • Traditionally HPC and visualization • Humanities, Art, Herbarium

  7. TACC/ICA Collaboration • ICA as data managers, TACC providing resources and consulting • Inventory of ICA data collection • Document the archaeological research workflow and identify points where data is both created and moved to storage • Development of file and directory naming convention • Development of metadata extractor script to generate metadata on iRODS ingestion

  8. Directory Hierarchy • Folder name = subject category • Familiar concept • Flexible, developed by users

  9. File Naming Convention • Example: sfi_CH01PA_8_a2_m.JPG • sfi_CH01PA_8 • The object code as it exists in ARK • Connect to metadata in ARK • a2 • Research Stage (before/after/during conservation, lifting, microscope, studio) • m • ‘m’ = master/raw • ‘p’ = publication/re-purposing
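The naming convention on this slide can be illustrated with a minimal Python sketch. It assumes the layout is always `<object code>_<stage>_<version>.<extension>`, with the last two underscore-separated fields being the research stage and the version letter. The names `ParsedName` and `parse_file_name` are hypothetical and are not taken from the authors' Jython script.

```python
import re
from typing import NamedTuple


class ParsedName(NamedTuple):
    """Fields recovered from a file name (hypothetical structure)."""
    object_code: str   # object code as it exists in ARK, e.g. sfi_CH01PA_8
    stage: str         # research stage code, e.g. a2
    version: str       # 'm' = master/raw, 'p' = publication/re-purposing
    extension: str     # file extension without the dot


# Assumed layout: <object code>_<stage>_<version>.<extension>.
# The object code may itself contain underscores, so it is matched greedily
# and the last two underscore-separated fields are taken as stage and version.
_PATTERN = re.compile(
    r"^(?P<object_code>.+)_(?P<stage>[^_]+)_(?P<version>[mp])\.(?P<extension>[^.]+)$"
)

VERSION_LABELS = {"m": "master/raw", "p": "publication/re-purposing"}


def parse_file_name(name: str) -> ParsedName:
    """Split a file name into its convention fields, or raise ValueError."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"file name does not follow the convention: {name!r}")
    return ParsedName(**match.groupdict())
```

Applied to the slide's example, `parse_file_name("sfi_CH01PA_8_a2_m.JPG")` yields the object code `sfi_CH01PA_8`, stage `a2`, and version `m`. A name that does not fit the pattern raises an error, so non-conforming files can be flagged rather than ingested with wrong metadata.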

  10. Metadata Schemas • Descriptive metadata -> Dublin Core Qualified (DC-Q) • Technical metadata -> Preservation Metadata: Implementation Strategies (PREMIS) • Encapsulate both in Metadata Encoding and Transmission Standard (METS)
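As a rough illustration of the encapsulation described on this slide, the sketch below wraps descriptive terms and a technical metadata element in a METS document using Python's standard `xml.etree.ElementTree`. The function name `build_mets`, the section IDs, and the shape of the inputs are assumptions made for illustration. The slides do not show the authors' actual METS profile, and the sketch does not validate against the METS schema.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DCTERMS_NS = "http://purl.org/dc/terms/"


def build_mets(dc_pairs, technical_element):
    """Wrap descriptive and technical metadata in one METS element.

    dc_pairs: iterable of (term, value), e.g. ("subject", "ceramics").
    technical_element: an already-built XML element holding the technical
        (for example PREMIS) metadata; its content is not inspected here.
    """
    ET.register_namespace("mets", METS_NS)
    ET.register_namespace("dcterms", DCTERMS_NS)

    mets = ET.Element(f"{{{METS_NS}}}mets")

    # Descriptive metadata section: Dublin Core terms inside mdWrap/xmlData.
    dmd_sec = ET.SubElement(mets, f"{{{METS_NS}}}dmdSec", {"ID": "dmd-1"})
    dmd_wrap = ET.SubElement(dmd_sec, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "DC"})
    dmd_data = ET.SubElement(dmd_wrap, f"{{{METS_NS}}}xmlData")
    for term, value in dc_pairs:
        child = ET.SubElement(dmd_data, f"{{{DCTERMS_NS}}}{term}")
        child.text = value

    # Administrative section: technical metadata inside techMD.
    amd_sec = ET.SubElement(mets, f"{{{METS_NS}}}amdSec")
    tech_md = ET.SubElement(amd_sec, f"{{{METS_NS}}}techMD", {"ID": "tech-1"})
    tech_wrap = ET.SubElement(tech_md, f"{{{METS_NS}}}mdWrap", {"MDTYPE": "PREMIS"})
    tech_data = ET.SubElement(tech_wrap, f"{{{METS_NS}}}xmlData")
    tech_data.append(technical_element)

    return mets
```

Calling `ET.tostring(build_mets(pairs, element), encoding="unicode")` serializes the result. Keeping the descriptive and technical parts in separate METS sections mirrors the split between DC-Q and PREMIS on the slide.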

  11. Mapping to DC-Q • Directory names -> dcterms:subject • Object code in file name (sfi_CH01PA_8) -> dcterms:isPartOf • Stage of research -> dcterms:isPartOf • Context code from ARK -> dcterms:isPartOf • Description in ARK -> dcterms:description
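The mapping on this slide could be sketched as follows. This is a hypothetical Python illustration, not the authors' extractor: the function name `map_to_dcq` and the `ark_record` dictionary keys (`context_code`, `description`) are assumptions, since the slides do not show how ARK data is retrieved.

```python
from pathlib import PurePosixPath


def map_to_dcq(relative_path, object_code, stage, ark_record):
    """Return (term, value) pairs following the mapping on the slide.

    relative_path: path of the file inside the collection, with '/' separators.
    object_code, stage: fields taken from the file name.
    ark_record: hypothetical dict with optional 'context_code' and
        'description' entries pulled from ARK.
    """
    path = PurePosixPath(relative_path)

    # Every folder above the file is a subject category.
    pairs = [("dcterms:subject", folder) for folder in path.parent.parts]

    # The object code and research stage both map to isPartOf.
    pairs.append(("dcterms:isPartOf", object_code))
    pairs.append(("dcterms:isPartOf", stage))

    # ARK-derived fields are added only when present.
    context_code = ark_record.get("context_code")
    if context_code:
        pairs.append(("dcterms:isPartOf", context_code))
    description = ark_record.get("description")
    if description:
        pairs.append(("dcterms:description", description))

    return pairs
```

A list of pairs is used instead of a dictionary because `dcterms:isPartOf` and `dcterms:subject` can each occur several times for one file.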

  12. Mapping to PREMIS • File Information Tool Set (FITS) aggregates several open source preservation metadata generators and outputs to XML. • JHOVE, ExifTool, NLNZ Metadata Extractor, DROID, FFIdent, File Utility • Straightforward mapping to PREMIS schema

  13. iRODS and Corral • Integrated Rule-Oriented Data System (iRODS) provides a virtual file system and rule engine for long-term data preservation. • Hosted on Corral, TACC’s 1.2+ PB multi-purpose storage application resource. • This basic infrastructure provides a flexible backbone for a wide variety of data collection projects.

  14. Task Workflow

  15. Client Interfaces

  16. Client Interfaces

  17. Conclusion • Organizational method allowing for the automatic extraction of metadata as data is ingested to long-term storage soon after creation • Template for running external tools with iRODS • Template for extracting PREMIS data from files • Technical details and code/configuration available in paper
