JHOVE2: A Next-Generation Architecture for Format-Aware Preservation Processing
Digital Library Federation Fall Forum, Philadelphia, November 5-7, 2007
Stephen Abrams, Harvard University
Evan Owens, Portico
Tom Cramer, Stanford University
JHOVE2 project
• Two-year NDIIPP-funded collaborative project to develop a “next generation” architecture for format-aware preservation processing
• Harvard University
  • Stephen Abrams, Gary McGath, Robin Wendler
• Portico
  • Evan Owens, John Meyer, Sheila Morrissey
• Stanford University
  • Tom Cramer, Richard Anderson, Hannah Frost, Rachel Gollub, Nancy Hoebelheinrich, Keith Johnson
• Open source
  • Educational Community License (ECL)
  • SourceForge
JHOVE2 project goals
• Refactor the existing architecture
  • Rectify known inefficiencies and idiosyncrasies
  • Simplify the process of integration
  • Encourage third-party extensions
• Provide enhancements
  • Separate identification from validation
  • Standardized error handling
  • Standardized handling of validation profiles
  • Standardized reporting using METS, with XSL transform
  • More sophisticated data model
  • Arbitrary processing modules
JHOVE2 project goals
• Develop modules
  • Signature-based identification using DROID
  • Validation and characterization
  • Symbolic display of selected binary formats
  • API-level editing capability
  • Policy-based assessment
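As a rough illustration of what signature-based identification means, the sketch below matches the leading bytes of a stream against a small table of well-known magic numbers. It does not use DROID or its signature definitions, which the slides do not describe; the class name `SignatureIdentifier` and the format labels are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: identify a format from its leading "magic" bytes.
// This is NOT DROID's API; the class name and labels are assumptions.
final class SignatureIdentifier {
    private static final Map<String, byte[]> SIGNATURES = new LinkedHashMap<>();
    static {
        SIGNATURES.put("PDF", ascii("%PDF-"));
        SIGNATURES.put("GIF", ascii("GIF8"));  // covers GIF87a and GIF89a
        SIGNATURES.put("TIFF (little-endian)", new byte[] {'I', 'I', 42, 0});
        SIGNATURES.put("TIFF (big-endian)", new byte[] {'M', 'M', 0, 42});
    }

    private static byte[] ascii(String s) {
        return s.getBytes(StandardCharsets.US_ASCII);
    }

    // True when data begins with every byte of prefix, in order.
    private static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) return false;
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) return false;
        }
        return true;
    }

    // Returns the label of the first matching signature, or "unknown".
    static String identify(byte[] head) {
        for (Map.Entry<String, byte[]> e : SIGNATURES.entrySet()) {
            if (startsWith(head, e.getValue())) return e.getKey();
        }
        return "unknown";
    }

    public static void main(String[] args) {
        System.out.println(identify(ascii("%PDF-1.4\n")));
        System.out.println(identify(new byte[] {1, 2, 3}));
    }
}
```

Identification of this kind only says what a stream claims to be. Deciding whether it actually conforms to the format is the separate validation step.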
Data model
• Implicit assumption in JHOVE
  • 1 object = 1 file = 1 format
• But what about…
  • TIFF with embedded ICC profile and XMP metadata
    • 1 object = 1 file = 3 formats
  • JPEG 2000 JPX fragmentation
    • 1 object = n files = 1 format
  • ESRI Shapefile
    • 1 object = 3 files = 3 formats
• JHOVE2 will support processing of complex aggregate objects and nested formatted bit streams
  • 1 object = n files = m formats
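One way to picture "1 object = n files = m formats" is a tree whose nodes are either files or embedded bit streams, each optionally carrying a format. The sketch below is only an illustration of that idea; the `Source` class, its fields, and the example file names are assumptions, not the JHOVE2 data model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a nested data model; not the JHOVE2 API.
final class Source {
    final String name;      // file name, or a label for an embedded bit stream
    final String format;    // format label, or null for a pure aggregate
    final boolean isFile;   // true for a file, false for an aggregate or embedded stream
    final List<Source> children = new ArrayList<>();

    Source(String name, String format, boolean isFile) {
        this.name = name;
        this.format = format;
        this.isFile = isFile;
    }

    Source add(Source child) {
        children.add(child);
        return this;
    }

    // Number of files in this subtree.
    int fileCount() {
        int n = isFile ? 1 : 0;
        for (Source c : children) n += c.fileCount();
        return n;
    }

    // Number of formatted bit streams in this subtree.
    int formatCount() {
        int n = (format != null) ? 1 : 0;
        for (Source c : children) n += c.formatCount();
        return n;
    }
}

final class DataModelDemo {
    // 1 object = 1 file = 3 formats
    static Source tiffExample() {
        return new Source("image.tif", "TIFF", true)
            .add(new Source("embedded ICC profile", "ICC", false))
            .add(new Source("embedded XMP packet", "XMP", false));
    }

    // 1 object = 3 files = 3 formats (illustrative file names)
    static Source shapefileExample() {
        return new Source("parcels", null, false)
            .add(new Source("parcels.shp", "SHP", true))
            .add(new Source("parcels.shx", "SHX", true))
            .add(new Source("parcels.dbf", "dBASE", true));
    }

    public static void main(String[] args) {
        Source tiff = tiffExample();
        System.out.println(tiff.fileCount() + " file(s), " + tiff.formatCount() + " format(s)");
        Source shp = shapefileExample();
        System.out.println(shp.fileCount() + " file(s), " + shp.formatCount() + " format(s)");
    }
}
```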
Common “backplane”
• Outer loop is an iteration over digital objects
• Inner loop of processes applied against each object, passing a common memory structure

    while (has-another-object) {
        while (has-another-process) {
            process (object, state);
        }
    }
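The nested loops above could be sketched in Java roughly as follows. The interface `ObjectProcess`, the class `Backplane`, and the use of a string map as the shared state are all assumptions made for illustration. Whether the state is reset for each object is also an assumption, and the extension check stands in for real identification.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical process interface: each step reads and updates the shared state.
interface ObjectProcess {
    void process(String object, Map<String, String> state);
}

final class Backplane {
    // Outer loop over objects, inner loop over processes.
    // Assumption: a fresh state map is created for every object.
    static List<Map<String, String>> run(List<String> objects, List<ObjectProcess> processes) {
        List<Map<String, String>> results = new ArrayList<>();
        for (String object : objects) {
            Map<String, String> state = new LinkedHashMap<>();
            for (ObjectProcess p : processes) {
                p.process(object, state);
            }
            results.add(state);
        }
        return results;
    }

    public static void main(String[] args) {
        // Toy identification by file extension, a placeholder only.
        ObjectProcess identify = (obj, state) ->
            state.put("format", obj.endsWith(".tif") ? "TIFF" : "unknown");
        // A later process can rely on what an earlier one recorded.
        ObjectProcess validate = (obj, state) ->
            state.put("validated", String.valueOf(!"unknown".equals(state.get("format"))));
        System.out.println(run(List.of("a.tif", "b.bin"), List.of(identify, validate)));
    }
}
```

Passing one state structure through every process is what lets identification, validation, and assessment stay separate modules while still building on each other's results.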
Validation
• There is a useful distinction between well-formedness, validity, renderability, and usability
• Well-formedness and validity are “bright line” determinations relative to a specification
• Renderability is a “bright line” determination relative to a specific rendering tool
• Usability is a “fuzzy” determination relative to local policies and heuristics
Policy-based assessment
• Evaluate objects based on prior characterization and locally-defined policy rules and heuristics, for example:
  • Risk of technological obsolescence
  • Risk of transformative loss
• Codify assessment methodologies and best practice recommendations
• Develop a formal language in which to express policy rules
• Implement a rules engine
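A minimal sketch of rules-based assessment, assuming characterization has already produced a flat map of properties. The slides do not define the rule language or the engine, so `PolicyRule`, `PolicyEngine`, the property names, and the "preferred format" list below are all hypothetical local policy choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

// Hypothetical rule: a message plus a test that is true when the policy is violated.
final class PolicyRule {
    final String message;
    final Predicate<Map<String, String>> violated;

    PolicyRule(String message, Predicate<Map<String, String>> violated) {
        this.message = message;
        this.violated = violated;
    }
}

final class PolicyEngine {
    // Applies every rule to the characterized properties and collects findings.
    static List<String> assess(Map<String, String> properties, List<PolicyRule> rules) {
        List<String> findings = new ArrayList<>();
        for (PolicyRule rule : rules) {
            if (rule.violated.test(properties)) findings.add(rule.message);
        }
        return findings;
    }

    // Example local policy; the preferred-format list is an assumption.
    static List<PolicyRule> examplePolicy() {
        List<String> preferred = List.of("TIFF", "PDF", "XML");
        return List.of(
            new PolicyRule("obsolescence risk: format is not on the local preferred list",
                p -> !preferred.contains(p.getOrDefault("format", ""))),
            new PolicyRule("transformative-loss risk: lossy compression",
                p -> "lossy".equals(p.get("compression"))));
    }

    public static void main(String[] args) {
        Map<String, String> characterized = Map.of("format", "GIF", "compression", "lossy");
        for (String finding : assess(characterized, examplePolicy())) {
            System.out.println(finding);
        }
    }
}
```

Keeping rules as data rather than hard-coding them in modules is what would allow each institution to express its own policy without changing the engine.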
Format support
• Audio: AIFF, WAVE
• Color: ICC
• Document: PDF
• GIS: Shapefile
• Image: GIF, JPEG, JPEG 2000, TIFF
• Text: ASCII, HTML, SGML, UTF-8, XML
Schedule
• 6 months of community outreach, requirements gathering, and design
• 6 months of implementation of core APIs and the engine
• 1 year of implementation of modules
• Continual prototyping and refactoring
Questions (for you)?
• Do you care about the open source license (ECL)?
• Do you care about the distribution platform (SourceForge)?
• Do you have functional requirements or use cases?
  • How do you use JHOVE today?
  • What needs doesn’t it meet?
• What types of policy assessments do you perform?
  • How do you quantify risk?
  • What is your underlying assessment model?
• Are you aware of existing expression languages and engines for rules-based assessment?
Questions (for you)?
• What can we do to facilitate integration into existing (or planned) systems and workflows?
• What can we do to facilitate third-party development and extension?
  • What help would you need to implement your own modules?
• Would you be interested in a co-development arrangement with the JHOVE2 project?
• Do you have interesting test files that you are willing to share?