Jim Tuttle, North Carolina State University Libraries. Tools Development and Demonstration: North Carolina Geospatial Data Archiving Project
Process Overview
• Data transfer
• Threat and format analysis, validation
• Archive package organization
• Selective format migration
• Metadata normalization and supplementation
• Source metadata translation
• Statistics collection
• Extra-repository AIP management
Data Transfer
• Python md5sum comparison
• 'Transfer set' metadata captured in a 'seed file'
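A minimal sketch of the md5sum comparison, using Python's standard `hashlib`: each side of the transfer is hashed into a manifest (relative path mapped to hex digest) and the two manifests are compared. The function names and manifest layout are illustrative assumptions, not the project's actual scripts.

```python
import hashlib
from pathlib import Path


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root):
    """Map each file's POSIX-style path relative to `root` to its MD5 digest."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): md5_of_file(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def compare_manifests(source, destination):
    """Return (missing, extra, mismatched) relative paths between two manifests."""
    missing = sorted(set(source) - set(destination))
    extra = sorted(set(destination) - set(source))
    mismatched = sorted(
        path for path in set(source) & set(destination)
        if source[path] != destination[path]
    )
    return missing, extra, mismatched
```

Three empty lists suggest the transfer set arrived intact; the source manifest is the kind of record that could also be captured in the seed file.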
Threat and Format Analysis, Validation
Python wrappers for the following:
• Viruses: ClamAV
• Compressed files (tar, zip, gzip, bzip)
• Geodatabases (by extension and size)
• Executable files (by magic number)
• JHOVE validation
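The executable and compressed-file checks can be sketched as magic-number sniffing in plain Python, with ClamAV reached through its `clamscan` command-line scanner. The labels, the `needs_review` helper, and the choice of signatures are illustrative assumptions; the geodatabase and JHOVE checks are not shown.

```python
import shutil
import subprocess


def sniff_format(path):
    """Classify a file by its leading bytes; return a short label or None."""
    with open(path, "rb") as fh:
        header = fh.read(512)
    # tar is checked first: its "ustar" magic sits at offset 257, after the
    # member name, so a name starting with e.g. "MZ" must not shadow it.
    # Pre-POSIX (v7) tar archives lack this magic and would be missed.
    if header[257:262] == b"ustar":
        return "tar"
    if header.startswith((b"PK\x03\x04", b"PK\x05\x06")):
        return "zip"
    if header.startswith(b"\x1f\x8b"):
        return "gzip"
    if header.startswith(b"BZh"):
        return "bzip2"
    if header.startswith(b"\x7fELF"):
        return "executable-elf"
    if header.startswith(b"MZ"):
        return "executable-mz"  # DOS/Windows executables
    return None


def needs_review(path):
    """Hypothetical policy: any recognised archive or executable is flagged."""
    return sniff_format(path) is not None


def clamav_scan(path):
    """Run clamscan on one path; True if clean, False if a virus is reported."""
    exe = shutil.which("clamscan")
    if exe is None:
        raise RuntimeError("clamscan not found on PATH")
    result = subprocess.run([exe, "--no-summary", str(path)],
                            capture_output=True, text=True)
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"clamscan error: {result.stderr.strip()}")
```

clamscan exits 0 when no virus is found and 1 when one is, which is what the wrapper relies on; any other exit code is treated as an error.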
Archive Package Organization
• ESRI ArcGIS toolbar for selected formats
Archive Package Organization
• Rule-based Python logic
  • filestem
  • extension relationships (multi-file format validation)
  • directory structure
• Manual intervention
  • metadata.doc
  • NOID assignment
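The filestem and extension rules can be sketched as grouping files by directory and filestem, then checking each group against a table of required components. Only the shapefile rule is filled in (the ESRI shapefile specification makes .shp, .shx and .dbf mandatory); the `REQUIRED_COMPONENTS` table and the function names are hypothetical.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical rule table: anchor extension -> extensions that must share its
# filestem. Only the shapefile rule is given; other formats would be added here.
REQUIRED_COMPONENTS = {
    ".shp": {".shp", ".shx", ".dbf"},
}


def group_by_filestem(paths):
    """Group paths by (directory, filestem), collecting lowercase extensions."""
    groups = defaultdict(set)
    for raw in paths:
        p = PurePath(raw)
        groups[(p.parent.as_posix(), p.stem)].add(p.suffix.lower())
    return dict(groups)


def validate_multifile_sets(paths):
    """Return {(directory, stem): sorted missing extensions} for incomplete sets."""
    problems = {}
    for key, extensions in group_by_filestem(paths).items():
        for anchor, required in REQUIRED_COMPONENTS.items():
            if anchor in extensions:
                missing = required - extensions
                if missing:
                    problems[key] = sorted(missing)
    return problems
```

Sidecar files such as roads.shp.xml have the stem roads.shp and so form their own group; a fuller rule set would need to fold them back in.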
Selective Format Migration
• Conversions using the ArcGIS toolbar:
  • E00 interchange to coverage to shapefile
  • geodatabase to raster, shapefile, etc.
• Original files retained
Metadata Normalization & Supplementation
• Agency-specific XML templates in ArcCatalog with synchronization flags
• Provenance and curation metadata scripted
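Scripted provenance can be sketched with Python's `xml.etree.ElementTree`, appending a lineage process step to a metadata record. This assumes the records use FGDC CSDGM element names (dataqual, lineage, procstep, procdesc, procdate); the helper names are hypothetical, and the sketch does not enforce the standard's element order, so a record lacking dataqual would need a schema-aware insert.

```python
import xml.etree.ElementTree as ET
from datetime import date


def _child(parent, tag):
    """Return the first child named `tag`, creating it (appended) if absent."""
    node = parent.find(tag)
    if node is None:
        node = ET.SubElement(parent, tag)
    return node


def add_process_step(xml_text, description, when=None):
    """Append an FGDC-style lineage process step; return the updated XML string.

    Assumes FGDC CSDGM tag names; the date is written as YYYYMMDD.
    """
    root = ET.fromstring(xml_text)
    lineage = _child(_child(root, "dataqual"), "lineage")
    step = ET.SubElement(lineage, "procstep")
    ET.SubElement(step, "procdesc").text = description
    ET.SubElement(step, "procdate").text = (when or date.today()).strftime("%Y%m%d")
    return ET.tostring(root, encoding="unicode")
```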
Source Metadata Translation
• Hub-and-spoke model à la Echo Depository
  • repository agnostic
  • modular conversion hub
  • facilitates repository software migration & inter-archive exchange
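The hub-and-spoke idea can be sketched as two registries: each spoke supplies a reader into a neutral hub record and/or a writer out of it, so supporting a new format means writing one or two converters rather than one per format pair. The spoke names, field mappings, and hub fields here are hypothetical simplifications; real spokes would parse and emit XML.

```python
from typing import Callable, Dict

Record = Dict[str, str]

# Hypothetical registries: source spoke -> hub readers, hub -> target writers.
READERS: Dict[str, Callable[[Record], Record]] = {}
WRITERS: Dict[str, Callable[[Record], Record]] = {}


def reader(name):
    def register(fn):
        READERS[name] = fn
        return fn
    return register


def writer(name):
    def register(fn):
        WRITERS[name] = fn
        return fn
    return register


@reader("fgdc")
def fgdc_to_hub(rec):
    # Simplified FGDC-like keys mapped to neutral hub fields.
    return {"title": rec["title"], "creator": rec["origin"], "date": rec["pubdate"]}


@writer("fgdc")
def hub_to_fgdc(hub):
    return {"title": hub["title"], "origin": hub["creator"], "pubdate": hub["date"]}


@reader("dc")
def dc_to_hub(rec):
    return {"title": rec["dc:title"], "creator": rec["dc:creator"], "date": rec["dc:date"]}


@writer("dc")
def hub_to_dc(hub):
    return {"dc:title": hub["title"], "dc:creator": hub["creator"], "dc:date": hub["date"]}


def translate(record, source, target):
    """Convert via the hub: source spoke -> hub record -> target spoke."""
    return WRITERS[target](READERS[source](record))
```

Because every conversion passes through the hub, replacing the repository software would mean adding one writer for its format while the existing readers stay untouched.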
Statistics Collection
Python-scripted statistics generation:
• number of files by format
• cumulative size by format
• mean file size
• collection size
• agency contribution
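The statistics listed above can be sketched with `pathlib` and `collections.Counter`. Two assumptions are labeled in the code: the format is approximated by the lowercase file extension, and each top-level directory under the collection root is taken to be one contributing agency.

```python
from collections import Counter
from pathlib import Path


def collection_stats(root):
    """Summarise the files under `root` by format and by agency."""
    root = Path(root)
    files_by_format, bytes_by_format = Counter(), Counter()
    files_by_agency, bytes_by_agency = Counter(), Counter()
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        size = path.stat().st_size
        # Assumption: file extension stands in for format.
        fmt = path.suffix.lower() or "(none)"
        # Assumption: the first directory below root names the agency.
        parts = path.relative_to(root).parts
        agency = parts[0] if len(parts) > 1 else "(unassigned)"
        files_by_format[fmt] += 1
        bytes_by_format[fmt] += size
        files_by_agency[agency] += 1
        bytes_by_agency[agency] += size
    total_files = sum(files_by_format.values())
    total_bytes = sum(bytes_by_format.values())
    return {
        "files_by_format": dict(files_by_format),
        "bytes_by_format": dict(bytes_by_format),
        "mean_file_size": total_bytes / total_files if total_files else 0.0,
        "collection_bytes": total_bytes,
        "files_by_agency": dict(files_by_agency),
        "bytes_by_agency": dict(bytes_by_agency),
    }
```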
Extra-repository AIP Management
• Workflow management database populated as a spoke on the metadata/ingest hub
• External tracking of NOID, Handle, ISO keywords, and other metadata for interaction with other systems
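A minimal sketch of such a tracking database using the standard `sqlite3` module. The table layout and function names are hypothetical. Before registering an AIP, the sketch verifies the NOID check character with the NOID check-character algorithm: each character's position (counting from 1) times its value in the 29-character alphabet `0123456789bcdfghjkmnpqrstvwxz`, summed modulo 29; characters outside the alphabet, such as "/", count as 0.

```python
import sqlite3

# NOID "extended digit" alphabet: digits plus 19 consonants (29 characters).
XDIGITS = "0123456789bcdfghjkmnpqrstvwxz"

# Hypothetical schema: one row per AIP, plus a many-to-many keyword table.
SCHEMA = """
CREATE TABLE IF NOT EXISTS aip (
    noid   TEXT PRIMARY KEY,
    handle TEXT UNIQUE,
    title  TEXT
);
CREATE TABLE IF NOT EXISTS aip_keyword (
    noid    TEXT NOT NULL REFERENCES aip(noid),
    keyword TEXT NOT NULL,
    PRIMARY KEY (noid, keyword)
);
"""


def noid_check_char(base):
    """Position-weighted sum of alphabet values, mod 29, mapped back to XDIGITS."""
    total = sum(pos * (XDIGITS.index(ch) if ch in XDIGITS else 0)
                for pos, ch in enumerate(base, start=1))
    return XDIGITS[total % 29]


def noid_is_valid(noid):
    return len(noid) > 1 and noid_check_char(noid[:-1]) == noid[-1]


def open_tracking_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def register_aip(conn, noid, handle, keywords=(), title=None):
    """Insert one AIP and its keywords in a single transaction."""
    if not noid_is_valid(noid):
        raise ValueError(f"bad NOID check character: {noid}")
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO aip (noid, handle, title) VALUES (?, ?, ?)",
                     (noid, handle, title))
        conn.executemany("INSERT INTO aip_keyword (noid, keyword) VALUES (?, ?)",
                         [(noid, k) for k in keywords])


def find_by_keyword(conn, keyword):
    rows = conn.execute(
        "SELECT a.noid, a.handle FROM aip a JOIN aip_keyword k ON a.noid = k.noid "
        "WHERE k.keyword = ? ORDER BY a.noid", (keyword,))
    return rows.fetchall()
```

For example, `noid_check_char("13030/xf93gt2")` returns `"q"`, so `13030/xf93gt2q` passes the check.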
Questions?
Jim Tuttle
Geospatial Data Librarian & Project Coordinator, NCGDAP
NCSU Libraries
jim_tuttle at ncsu dot edu
http://www.lib.ncsu.edu/ncgdap/