Learn about the North Carolina State University Libraries' projects in building a distributed institutional repository, including collections on electronic theses and dissertations, technical reports, faculty publications, and more. Explore their repository planning, governance structure, and partnerships with departments and institutes.
Digital Repository Projects at the North Carolina State University Libraries • James Jackson Sanborn • Jim Tuttle • Open Repositories/DSpace User Group ’07
Early Repository Planning • Digital Repository Planning Committee • What it wouldn’t be (at least to start) • Distributed community structure • Open submission • ‘Institutional’ Repository • What it would be (at least to start) • Library-managed collections • Building block for campus partnership • Learning opportunity
Repository Building Blocks • NCSU Electronic Theses and Dissertations • Started 1997 • Mandatory since 2002 • Virginia Tech’s ETD-db • ~3,000 ETDs • NCSU Authors Database • Started 1995 • Access database/ColdFusion front end • ~22,000 citations
Repository Building Blocks (cont’d) • Technical Reports Print Collection • Campus Institutes and Departments • Massive fall-off in print distribution • Special Collections Resource Center • Digitized texts and photographs • Campus Newsletters • GIS Data • Library managed/acquired data collection • Homegrown data layer database/discovery tools
Repository Plan • Target ‘Research’ collections first • Technical Reports • ETDs • Faculty Publications/Citations • Treat each collection as its own project • Actively pursue common technological solutions
Technical Reports • DSpace Application • Lightly Customized • Library Harvested • Local Cataloging/Metadata database • Scripted Ingest Object Creation • Batch Ingest • Mix of ongoing submission by institute/departmental personnel and Library capture.
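The scripted ingest object creation step can be sketched as follows. This is a minimal illustration, not the Libraries' actual script: it assumes the ingest objects follow DSpace's Simple Archive Format (one directory per item holding `dublin_core.xml`, a `contents` manifest, and the files), which the slides do not confirm. The function name `build_ingest_item` and the sample metadata are hypothetical.

```python
import os
import shutil
import xml.etree.ElementTree as ET


def build_ingest_item(item_dir, metadata, files):
    """Create one item directory: dublin_core.xml, a contents manifest, and the files.

    metadata is a list of (element, qualifier, value) tuples; qualifier may be None.
    """
    os.makedirs(item_dir, exist_ok=True)

    # Simple Archive Format stores each metadata value as a <dcvalue> element.
    root = ET.Element("dublin_core")
    for element, qualifier, value in metadata:
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    ET.ElementTree(root).write(
        os.path.join(item_dir, "dublin_core.xml"),
        encoding="utf-8",
        xml_declaration=True,
    )

    # Copy each file into the item directory and list it in the manifest.
    names = []
    for path in files:
        shutil.copy(path, item_dir)
        names.append(os.path.basename(path))
    with open(os.path.join(item_dir, "contents"), "w", encoding="utf-8") as fh:
        fh.write("\n".join(names) + "\n")
    return item_dir
```

A batch would then be a parent directory of such item directories, built by looping over rows from the local cataloging/metadata database.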
Electronic Theses & Dissertations • Partnership with Graduate School • Hybrid System: DSpace and ETD-db • ETD-db submission/approval/management • Direct database extract for DSpace Ingest Object creation • Scheduled Batch Ingest process • DSpace Considerations/Alterations • Metadata Mapping • Author Browse (exclude contributor.advisor) • Various interface changes
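The metadata mapping and the author browse exclusion can be illustrated with a short sketch. The ETD-db column names in `FIELD_MAP` are hypothetical stand-ins for the real database extract, and the browse logic below only mimics the idea of leaving `contributor.advisor` out of the author index; it is not DSpace's own browse code.

```python
# Hypothetical ETD-db extract columns mapped to (element, qualifier) pairs.
FIELD_MAP = {
    "title": ("title", None),
    "author_name": ("contributor", "author"),
    "advisor_name": ("contributor", "advisor"),
    "abstract": ("description", "abstract"),
    "approval_date": ("date", "issued"),
}


def map_record(row):
    """Translate one extract row (a dict) into (element, qualifier, value) tuples."""
    fields = []
    for column, (element, qualifier) in FIELD_MAP.items():
        value = row.get(column)
        if value:
            fields.append((element, qualifier, value))
    return fields


def author_browse_entries(items):
    """Collect contributor names for an author browse, skipping advisors."""
    names = set()
    for fields in items:
        for element, qualifier, value in fields:
            if element == "contributor" and qualifier != "advisor":
                names.add(value)
    return sorted(names)
```

With this rule an advisor still appears on the item record but is not listed as an author in the browse.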
Faculty Publications • Built on Existing Author Database • Rebuilt Authors DB from Access/ColdFusion to Oracle/PHP • Re-modeled data • Added Functionality • OpenURL • ‘Vita-like’ citation display • Full-text or submission links • Full-text stored in DSpace • Citation metadata and file exported by script • DSpace Identifier currently manually entered
Faculty Publications Schematic • [Diagram; labels: Scholar; Submit Citations and/or Text; View full-text; S+R Citations; Web Submission Form; Web interface (PHP); DSpace Item Display; DSpace Java/JSP (full-text only); PostgreSQL (metadata); Oracle Faculty Publications DB (citations); Handle IDs; File System (files); Access, ISI, Ann. Reps, etc.; Add/Edit data; Cataloging and Coll. Mgt.]
Repository Governance • Internal • Digital Repository Planning Committee • Data Repository Architect • External • Faculty Repository Advisory Committee • Partnerships with departments and institutes
NCGDAP: Overview • NDIIPP: National Digital Information Infrastructure and Preservation Program • Collaboration with Library of Congress • One of eight three-year projects to study long-term (50+ years) digital preservation • Objective: engage existing state/federal geospatial data infrastructures in preservation • Project approaches: Technical and Social
Repository Requirements • Dim archive with possible future access • minimal IR/access component • Minimal repository imprint on data • repository agnostic ingest and export • Simple digital curation functions • Periodic MD5 checksum validation • Structured metadata index • Expected archived-data exchange • Leverage existing investments • Free Software with active community
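Periodic MD5 checksum validation can be sketched with the Python standard library. The manifest structure (a dict of path to expected digest) and the function names are assumptions made for illustration; the slides do not describe how checksums are stored.

```python
import hashlib


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def validate(manifest):
    """Check each path in {path: expected_md5}; return (path, reason) failures."""
    failures = []
    for path, expected in manifest.items():
        try:
            actual = md5_of(path)
        except OSError:
            # File is unreadable or no longer present.
            failures.append((path, "missing"))
            continue
        if actual != expected:
            failures.append((path, "mismatch"))
    return failures
```

Run on a schedule, an empty result means every archived file still matches its recorded digest.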
Automation: Threat and format analysis, validation • Python wrappers for the following: • Anti-virus – ClamAV • Compressed files (tar, zip, gzip, bzip) • At-risk formats • Executable files (magic numbers) • JHOVE validation
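The magic-number check for executables and compressed files can be sketched as below. The signature table is a small illustrative subset chosen for this example, not the project's actual rule set, and it does not invoke ClamAV or JHOVE. Tar archives are omitted because their `ustar` marker sits at byte offset 257 rather than at the start of the file.

```python
# Leading byte signatures and the label reported for each (illustrative subset).
MAGIC = [
    (b"\x7fELF", "executable (ELF)"),
    (b"MZ", "executable (DOS/Windows)"),
    (b"PK\x03\x04", "compressed (zip)"),
    (b"\x1f\x8b", "compressed (gzip)"),
    (b"BZh", "compressed (bzip2)"),
]


def sniff(path):
    """Return a label if the file starts with a known signature, else None."""
    with open(path, "rb") as fh:
        head = fh.read(8)
    for magic, label in MAGIC:
        if head.startswith(magic):
            return label
    return None
```

Checking content rather than the file extension catches executables or archives that arrive under a misleading name.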
Automation: Archive package organization • ESRI ArcGIS toolbar for selected formats
Automation: Archive package organization • Rule-based Python logic • filestem • extension relationships (multi-file format validation) • directory structure • Manual intervention • NOID assignment
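The filestem and extension rules for multi-file formats can be sketched as follows. The example uses the ESRI shapefile, whose `.shp` file must be accompanied by `.shx` and `.dbf` files sharing the same filestem. The `REQUIRED` table and `check_package` function are hypothetical names for illustration; the project's real rules are not shown in the slides.

```python
import os
from collections import defaultdict

# Primary extension -> companion extensions that must share its filestem.
REQUIRED = {".shp": {".shx", ".dbf"}}


def check_package(filenames):
    """Return {filestem: [missing extensions]} for incomplete multi-file datasets."""
    by_stem = defaultdict(set)
    for name in filenames:
        stem, ext = os.path.splitext(os.path.basename(name))
        by_stem[stem].add(ext.lower())

    problems = {}
    for stem, exts in by_stem.items():
        for primary, companions in REQUIRED.items():
            if primary in exts:
                missing = companions - exts
                if missing:
                    problems[stem] = sorted(missing)
    return problems
```

Packages that come back with problems are natural candidates for the manual intervention step.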
Metadata: Seed file form • 'Transfer set' metadata capture in 'Seed file' • Communicates with DSpace backend, generates XML used to inform later scripts
Metadata: Communities and Collections • Search by type for 100+ communities • Facilitates creation and reduces errors
Curation Processing • At-risk format migration, original retained • Agency-specific XML templates in ArcCatalog with synchronization flags • Provenance and curation metadata scripted
Source Metadata Translation • Repository-agnostic approach • Spokes for each transformation • Facilitates export from DSpace into other repositories • Generate DSpace QDC, METS; populate Workflow database
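The hub-and-spoke idea can be sketched as one neutral record (the hub) with a registered function per output format (the spokes). Everything here is an assumption for illustration: the record layout, the spoke names, and the `run_spokes` helper are hypothetical, and a METS spoke is left out for brevity.

```python
import xml.etree.ElementTree as ET

SPOKES = {}


def spoke(name):
    """Decorator that registers a transformation under a spoke name."""
    def register(func):
        SPOKES[name] = func
        return func
    return register


@spoke("dspace_qdc")
def to_dspace_qdc(record):
    """Render the hub record as DSpace-style qualified Dublin Core XML."""
    root = ET.Element("dublin_core")
    for (element, qualifier), value in record["fields"].items():
        dcvalue = ET.SubElement(
            root, "dcvalue", element=element, qualifier=qualifier or "none"
        )
        dcvalue.text = value
    return ET.tostring(root, encoding="unicode")


@spoke("workflow_db")
def to_workflow_row(record):
    """Reduce the hub record to a row for an external tracking database."""
    return {"noid": record["noid"], "title": record["fields"].get(("title", None))}


def run_spokes(record, names):
    """Apply the named spokes to one hub record."""
    return {name: SPOKES[name](record) for name in names}
```

Because each spoke reads only the hub record, supporting another repository means adding one function rather than changing the ingest pipeline.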
Extra-repository AIP management • Workflow Management Database (WMD) populated as a spoke on the metadata/ingest hub • External tracking of NOID, Handle, ISO keywords, other metadata for interaction with other systems • Integrates with existing GIS Lookup tool
Repository Architecture Overview • [Diagram; labels: PostgreSQL (one shared username, separate database for each app); Tomcat (repository Tomcat instance); DSpace apps: Internal; Faculty Publications (PHP/DSpace hybrid); Repository (DSpace): Technical Reports, ETDs; Collections (DSpace): SCRC (Course Catalogs, Green ‘N’ Growing); NDIIPP (DSpace); SCRC (DSpace); Asset Store/ATABeast (sub-directory for each DSpace app)]
Upcoming Repository Related Projects • Enhancements to current system • XTF search interface • Inter-archive exchange • Digital Collections Repository • Special Collections Research Center • Other non-faculty collections • Data Repository • Scientific data • Statistical resources
For More Information: • James Jackson Sanborn • james_sanborn@ncsu.edu • Jim Tuttle • jim_tuttle@ncsu.edu