Beyond code: Versioning data with Git and Mercurial
Stephanie Collett and Martin Haye
California Digital Library, University of California
Agenda
• Background
• Case Study #1: eScholarship Backup
• Case Study #2: Zephir Metadata
• Summary
Version Control Repository Code
Version Control Repository Data/Metadata
Case #1: eScholarship Data/Metadata Backup
XML Metadata: 10 files per work
XML Metadata: ~500,000 files total
XML Metadata Single Mercurial Repository
Working Repository -> Backup Repository: Nightly Sync (hg push)
XML Metadata Single Mercurial Repository .hgignore
Working Storage -> Backup Storage: Nightly Sync (rsync)
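The two nightly sync steps named on these slides (hg push between repositories, rsync between storage areas) could be scripted roughly as follows. This is only a sketch: all four paths are hypothetical, the slides do not show the actual batch job, and running both steps from one script is an assumption.

```python
import subprocess


def build_sync_commands(working_repo, backup_repo, working_storage, backup_storage):
    """Return the two nightly commands as argument lists (nothing is executed)."""
    # hg push sends new changesets from the working repository to the backup.
    hg_push = ["hg", "--repository", working_repo, "push", backup_repo]
    # rsync -a mirrors the tree; the trailing slash on the source copies the
    # directory's contents rather than the directory itself.
    rsync = ["rsync", "-a", working_storage.rstrip("/") + "/", backup_storage]
    return [hg_push, rsync]


def run_nightly_sync(commands, dry_run=True):
    """Print the commands (dry run) or execute them in order."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
            continue
        result = subprocess.run(command)
        # hg push exits with 1 when there is nothing to push, so that exit
        # code is not treated as a failure here.
        ok_codes = {0, 1} if command[0] == "hg" else {0}
        if result.returncode not in ok_codes:
            raise RuntimeError(f"{command[0]} failed with exit code {result.returncode}")


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    run_nightly_sync(
        build_sync_commands("/data/work", "/backup/repo", "/data/storage", "/backup/storage")
    )
```

The dry-run default prints the commands without touching either location, which makes it easy to check the arguments before scheduling the job.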
30-60 minutes for the batch job
Logs: Commit History (Date, Annotation, Change)
Case #2: Zephir Metadata Management System
File system
record/
  .git/
  marc.xml
  attributes.xml
  summary.xml
  transform.xsl
...
/pairtree/ab/cd/e/record/.git
/pairtree/ab/cd/ea/record/.git
/pairtree/ab/cd/ez/record/.git
/pairtree/ab/cd/f2/record/.git
/pairtree/ab/cd/f9/record/.git
/pairtree/ab/cd/ff/record/.git
/pairtree/ab/cd/fm/record/.git
/pairtree/ab/cd/fq/record/.git
/pairtree/ab/cd/gi/record/.git
/pairtree/ab/cd/gw/record/.git
/pairtree/ab/cd/gz/record/.git
/pairtree/ab/cd/hs/record/.git
/pairtree/ab/cd/ht/record/.git
/pairtree/ab/cd/i/record/.git
...
10 million in total
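The paths above follow the pairtree convention of splitting an identifier into two-character directory segments. A minimal sketch of that mapping is given below. The function name and the sample identifiers are hypothetical, and the character-escaping step of the full Pairtree specification is left out.

```python
from pathlib import PurePosixPath


def pairtree_record_path(identifier, root="/pairtree", leaf="record"):
    """Map an identifier to the path of its per-record .git directory.

    Simplified: the identifier is assumed to need no character escaping.
    """
    if not identifier:
        raise ValueError("identifier must not be empty")
    # Split into two-character segments; an odd-length identifier ends with
    # a one-character segment.
    segments = [identifier[i:i + 2] for i in range(0, len(identifier), 2)]
    return str(PurePosixPath(root, *segments, leaf, ".git"))


print(pairtree_record_path("abcde"))   # /pairtree/ab/cd/e/record/.git
print(pairtree_record_path("abcdea"))  # /pairtree/ab/cd/ea/record/.git
```

Because each record directory carries its own .git, the path computed here is also the location of that record's entire version history.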
Versioning + Audit Trail + Diffing + Debugging
record/
  marc.xml
  attributes.xml
  summary.xml
  transform.xsl
.git/
  branches/
  config
  description
  HEAD
  hooks/
  index
  info/
  objects/
  refs/
record/ + record/.git: 43 files, ~132k
record/ + record/.git: ~132k x 10 million
record/ + record/.git: 43 files x 10 million
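The two multiplications above can be worked out directly. The sketch below assumes that "~132k" means roughly 132 kilobytes per record (1 kilobyte taken as 1,000 bytes); the slides do not state the unit.

```python
RECORDS = 10_000_000
FILES_PER_RECORD = 43           # record/ plus record/.git
BYTES_PER_RECORD = 132 * 1000   # assumption: "~132k" is about 132 kilobytes

total_files = RECORDS * FILES_PER_RECORD
total_bytes = RECORDS * BYTES_PER_RECORD

print(f"{total_files:,} files")        # 430,000,000 files
print(f"{total_bytes / 1e12:.2f} TB")  # 1.32 TB
```

Under that assumption, one repository per record means on the order of 430 million files and about 1.3 TB across the whole collection.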