Metadata Tools for JISC Digitisation Projects of still images and text
Ed Fay
BOPCRIS, Hartley Library
University of Southampton
Overview: BOPCRIS today
• Move to work natively with standards
  • Interoperability
  • Preservation
• Design project procedures from ground up with metadata in mind
  • File-naming and directory structuring
  • Metadata capture processes
  • Production workflow that automates where possible
  • Minimize possibility for human error / subjectivity
• “Final package” of digital object that records preservation information on the “digital shelf” and aims for maximum interoperability between systems, all in one place
Overview: technical details
• File-naming / directory structure
  • Incorporating project-specific “unique ids”
• Final package (digital object)
  • Internally consistent “tarball” [*.TAR]
  • Relative path-naming conventions
  • METS wrapper
  • Extension formats for metadata: descriptive (MODS); technical (MIX); process (PREMIS)
• Production workflow
  • Automated production of final package
• Metadata recording
  • Dynamic input by scanner operators
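As an illustration of how a file-naming convention can carry a project-specific unique id, the sketch below parses page file names. The pattern (`<object id>_<4-digit page sequence>.<extension>`) and the names `FILENAME_PATTERN`, `PageFile` and `parse_page_filename` are hypothetical; the slides do not state the actual BOPCRIS convention.

```python
import re
from typing import NamedTuple, Optional

# Assumed convention (not from the slides): <object id>_<4-digit sequence>.<ext>
FILENAME_PATTERN = re.compile(
    r"^(?P<object_id>[A-Za-z0-9-]+)_(?P<sequence>\d{4})\.(?P<ext>tif|jpg|txt|idx)$"
)


class PageFile(NamedTuple):
    object_id: str  # project-specific unique id of the digital object
    sequence: int   # page order within the object
    ext: str        # file extension, which implies the directory it belongs in


def parse_page_filename(name: str) -> Optional[PageFile]:
    """Return the parts of a conforming file name, or None if it does not conform."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    return PageFile(match.group("object_id"), int(match.group("sequence")), match.group("ext"))
```

Rejecting non-conforming names at capture time is one way to reduce the scope for human error before any metadata is generated from the file system.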
History
• Eighteenth Century Parliamentary Papers
  • Project under Phase 1 of JISC Digitisation Programme
• Proprietary system and data formats (Agora)
• Manual input of metadata
  • Descriptive and Structural
• Advantages and Disadvantages
History: Advantages
• Proprietary system with advanced functionality:
  • OCR workflow
  • Web presentation
• Highly customizable
  • Metadata fields specified and modified at will
History: Disadvantages
• Non-standard metadata fields
  • No mapping to standard formats
  • Difficulties: interoperability; metadata harvesting
• Translation
  • Between systems, or between “use” and “archive” formats
  • Introduces possibility of versioning issues
• No scope for preservation metadata
  • Separation between workflow / presentation system and preservation strategy
  • Resulted in disparate collection of scripts and tools to manage data
Present: Metadata Standards
• Bibliographic database export
• File-system level
  • Directory structure
  • File-naming conventions
• Scanning level
  • TIFF headers
  • Additional descriptive metadata
• METS profile
  • Tailored to project needs
  • Extension formats (MODS, MIX, PREMIS)
  • Checksums (MD5)
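The MD5 checksums mentioned above can be computed and re-checked with Python's standard `hashlib` module. This is a minimal sketch; the function names `md5_of_file` and `verify_md5` are illustrative, not part of any BOPCRIS tool.

```python
import hashlib
from pathlib import Path


def md5_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large TIFF masters are never loaded whole."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_md5(path: Path, expected_hex: str) -> bool:
    """Compare a stored checksum (any letter case) against the file as it is now."""
    return md5_of_file(path) == expected_hex.strip().lower()
```

Recording the digest at scan time and re-running `verify_md5` after each transfer gives a simple fixity check for the preservation copy.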
Present: Metadata Origins
[Diagram: precursors on one side, generated metadata on the other, brought together in METS]
PRECURSORS
• File-naming
• Directory structure
• Bibliographic metadata: MARC21 / MODS / etc.
GENERATED
• Scanned images
  • TIFF headers
  • MIX (Z39.87)
• Other metadata
  • Process: PREMIS
  • Additional descriptive: custom dmdSec
• OCR (Agora / ABBYY)
METS
• File formats
  • TIFF master / Derived JPEG
  • Flat text (TXT) & Word-co-ordinated OCR (TAR)
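To show where header-derived technical metadata comes from, the sketch below reads a few basic values straight from a TIFF file's header and first image file directory, using only the standard library. The function name `read_tiff_basics` and the output keys are illustrative assumptions; mapping the values into MIX elements is left out, and only single-valued SHORT and LONG fields are handled.

```python
import struct
from typing import BinaryIO, Dict

# Baseline TIFF tag numbers mapped to illustrative output keys.
TAG_NAMES = {256: "image_width", 257: "image_height", 258: "bits_per_sample", 259: "compression"}


def read_tiff_basics(stream: BinaryIO) -> Dict[str, object]:
    """Read byte order and a few single-valued tags from the first IFD."""
    header = stream.read(8)
    if len(header) < 8:
        raise ValueError("file too short to be a TIFF")
    if header[:2] == b"II":
        endian, order = "<", "little endian"
    elif header[:2] == b"MM":
        endian, order = ">", "big endian"
    else:
        raise ValueError("missing TIFF byte-order mark")
    magic, ifd_offset = struct.unpack(endian + "HI", header[2:8])
    if magic != 42:  # classic TIFF only; BigTIFF uses 43
        raise ValueError("not a classic TIFF file")

    info: Dict[str, object] = {"byte_order": order}
    stream.seek(ifd_offset)
    (entry_count,) = struct.unpack(endian + "H", stream.read(2))
    for _ in range(entry_count):
        entry = stream.read(12)  # tag, type, count, value-or-offset
        tag, field_type, count = struct.unpack(endian + "HHI", entry[:8])
        if tag in TAG_NAMES and count == 1 and field_type in (3, 4):
            fmt = "H" if field_type == 3 else "I"  # 3 = SHORT, 4 = LONG
            (value,) = struct.unpack(endian + fmt, entry[8:8 + struct.calcsize(fmt)])
            info[TAG_NAMES[tag]] = value
    return info
```

Multi-valued fields (for example `BitsPerSample` on RGB images) are skipped here; a production tool would more likely rely on an established image library.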
Present: Digital Object (“final package”)
(1) ID.TAR
  • METS XML: ./ID.XML
    • dmdSec: MODS XML
    • amdSec: MIX, PREMIS XML
    • fileSec: ./master (TIFF); ./derived (JPEG); ./txt (plain text); ./idx (word-co-ordinated)
    • structMap: physical; logical
  • Master images (TIFF): ./master/
  • Derived images (JPEG): ./derived/
  • Text OCR (TXT): ./txt/
  • Word-co-ordinated OCR (IDX): ./idx/
(2) ID.CHECKSUM (MD5)
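The two-part package above (a tarball with internally relative paths, plus an MD5 checksum beside it) could be assembled roughly as follows. This is a sketch under assumptions: the function name `build_package`, the lowercase file extensions and the `md5sum`-style layout of the checksum file are not taken from the slides.

```python
import hashlib
import tarfile
from pathlib import Path
from typing import Tuple


def build_package(object_dir: Path, out_dir: Path) -> Tuple[Path, Path]:
    """Write <ID>.tar and <ID>.checksum for the object directory named <ID>."""
    object_id = object_dir.name
    tar_path = out_dir / f"{object_id}.tar"
    with tarfile.open(tar_path, "w") as tar:
        for path in sorted(object_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the object root so the METS file's
                # relative references still resolve wherever the tar is unpacked.
                tar.add(path, arcname=path.relative_to(object_dir).as_posix())

    digest = hashlib.md5()
    with open(tar_path, "rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)

    checksum_path = out_dir / f"{object_id}.checksum"
    # Assumed layout: "<hex digest>  <file name>", as written by md5sum.
    checksum_path.write_text(f"{digest.hexdigest()}  {tar_path.name}\n")
    return tar_path, checksum_path
```

`out_dir` should sit outside `object_dir`, otherwise the tar file would be swept into itself on a second run.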
Future
• One tool for entire process, from scanned images to METS
• Tool would:
  • Extract technical metadata
  • Include descriptive metadata
  • Build flat-structure METS
• Tool would require:
  • File-naming, directory-structuring conventions
  • Image file sources
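One way such a tool might derive a METS skeleton purely from the directory conventions is sketched below with the standard `xml.etree.ElementTree` module. "Flat-structure" is interpreted here as a single physical structMap with one div per page, which is an assumption, as are the function name `build_flat_mets`, the `GROUPS` mapping and the div `TYPE` values. Descriptive (MODS) and administrative (MIX, PREMIS) sections are omitted, and the `./idx` files are skipped because their MIME type is not stated in the slides.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

METS_NS = "http://www.loc.gov/METS/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("xlink", XLINK_NS)

# Sub-directory name -> MIME type; directory names follow the package layout.
GROUPS = {"master": "image/tiff", "derived": "image/jpeg", "txt": "text/plain"}


def _q(tag: str) -> str:
    """Qualify a tag name with the METS namespace."""
    return f"{{{METS_NS}}}{tag}"


def build_flat_mets(object_dir: Path) -> ET.Element:
    """Build fileSec and a one-level physical structMap from the directory tree."""
    mets = ET.Element(_q("mets"), {"OBJID": object_dir.name})
    file_sec = ET.SubElement(mets, _q("fileSec"))
    struct_map = ET.SubElement(mets, _q("structMap"), {"TYPE": "physical"})
    root_div = ET.SubElement(struct_map, _q("div"), {"TYPE": "volume"})
    pages = {}  # file stem -> page div; files sharing a stem belong to one page

    for use, mimetype in GROUPS.items():
        group_dir = object_dir / use
        if not group_dir.is_dir():
            continue
        group = ET.SubElement(file_sec, _q("fileGrp"), {"USE": use})
        for path in sorted(p for p in group_dir.iterdir() if p.is_file()):
            file_id = f"{use}_{path.stem}"
            file_el = ET.SubElement(group, _q("file"), {"ID": file_id, "MIMETYPE": mimetype})
            ET.SubElement(
                file_el,
                _q("FLocat"),
                {"LOCTYPE": "URL", f"{{{XLINK_NS}}}href": f"./{use}/{path.name}"},
            )
            if path.stem not in pages:
                pages[path.stem] = ET.SubElement(
                    root_div, _q("div"), {"TYPE": "page", "ORDER": str(len(pages) + 1)}
                )
            ET.SubElement(pages[path.stem], _q("fptr"), {"FILEID": file_id})
    return mets
```

The result can be serialised with `ET.tostring(build_flat_mets(path), encoding="unicode")`. Because everything is inferred from file names and directories, the sketch only works when the naming and directory conventions are followed consistently, which is the dependency the slide lists under "Tool would require".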
Future: Advantages
• Abstraction = standardization
  • All digitization projects will produce metadata in similar formats → interoperability
  • Certain technical base-standards will be present → preservation
• Any centrally developed preservation or presentation systems would be able to ingest output from any project
• Saves wasted effort developing similar solutions many times, when one solution can be developed once and adapted
Future: Questions…
• Usefulness of such a tool?
• Relevance to your project?
• Problems / obstacles?
• How much flexibility is necessary?
  • Manual input / editing?
• Main points:
  • Abstraction, functionality, flexibility
Further information
• Ed Fay, Software Developer
• BOPCRIS, Hartley Library
• University of Southampton
• ef1@soton.ac.uk
• 023 8059 3575