Explore the difficulties of preserving digital information, from unraveling complex levels of data to understanding the meaning of bits. Discover the importance of representation information and the need for a designated community to ensure the long-term accessibility of digital objects.
Why is digital preservation so difficult? September 15/16 2009 Rome
What’s so special about things digital? • 1’s and 0’s difficult to see
Pits on CD: close-up (actual photomicrograph) of a stamped CD-ROM, data pits clearly visible. Picture from: http://www.flickr.com/photos/eaj836/2559025266/
CD
• Need to unravel the various levels of:
• Bit stuffing
• Error correction codes
• Logical addressing
• Fragmented files
• File systems
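To give a feel for what "unravelling" one of these levels involves, here is a minimal sketch of bit stuffing and its reversal. It assumes a simple HDLC-style rule (insert a 0 after five consecutive 1s), chosen only for illustration; it is not the channel coding actually used on a CD, and the function names are hypothetical.

```python
def stuff(bits: str, run: int = 5) -> str:
    """Insert a '0' after every run of `run` consecutive '1's (illustrative rule)."""
    out = []
    ones = 0
    for b in bits:
        out.append(b)
        ones = ones + 1 if b == "1" else 0
        if ones == run:
            out.append("0")  # the stuffed bit
            ones = 0
    return "".join(out)


def unstuff(bits: str, run: int = 5) -> str:
    """Reverse `stuff`: drop the bit that follows every run of `run` '1's."""
    out = []
    ones = 0
    skip_next = False
    for b in bits:
        if skip_next:
            # This is the stuffed '0'; it carries no data, so discard it.
            skip_next = False
            ones = 0
            continue
        out.append(b)
        ones = ones + 1 if b == "1" else 0
        if ones == run:
            skip_next = True
    return "".join(out)


payload = "0111111110"
on_disc = stuff(payload)
print(on_disc)            # 01111101110
print(unstuff(on_disc))   # 0111111110
```

Without knowing the stuffing rule, a reader of the raw bits recovers the wrong data, and this is only one of the levels listed above.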
Alternatives • Carve 1’s and 0’s in stone • Write very small characters into titanium sheets But • What do the bits mean?
Components of an Interactive Multimedia Performance (IMP)
• People: directors, performers, technicians, programmers, etc.
• Documents: e.g. performance plan and procedure, music scores, documentation about performance context, programmes
• Musical instruments: traditional (e.g. violin, cello, piano), augmented, …
• Particular focus on 3D motion of the performer
• Mapping and content generation application (e.g. Max/MSP patches)
• Output multimedia contents (e.g. video, graphics, sound)
• Supporting applications (e.g. multimedia applications for processing and rendering)
• Operating system
• Hardware used in the performance: computer, camera, mixer, speakers, …
[Diagram labels: Input 3D motion data; Analysis & Processing; Mapping; Mapping Parameters; Mapping GUI; Multimedia Generation; Multimedia output]
Just Format?
representation information rules
sfqsftfoubujpo jogpsnbujpo svmft
• You have a file
• JHOVE tells you it is WORD version 7
• Format Registries – useful but not enough: formats can be used for multiple purposes, e.g. audio files used to store configuration parameters
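The second string on this slide appears to be the first one with every letter moved one place forward in the alphabet, which makes the point nicely: the characters are perfectly readable, but without the rule they mean nothing. A minimal sketch of applying that rule, assuming a plain one-letter shift (the function name is hypothetical):

```python
def shift_letters(text: str, offset: int) -> str:
    """Shift each lowercase letter by `offset` places, wrapping around the alphabet."""
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            out.append(chr((ord(ch) - ord("a") + offset) % 26 + ord("a")))
        else:
            out.append(ch)  # leave spaces and other characters untouched
    return "".join(out)


scrambled = "sfqsftfoubujpo jogpsnbujpo svmft"
print(shift_letters(scrambled, -1))  # representation information rules
```

Here the "shift back by one" rule plays the role of Representation Information: knowing the format of the file (plain text) is not enough to recover the message.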
Data… Level 2 GOME Satellite instrument data
Example: Identification of an Attribution Right
[Diagram: ontology graph combining FRBRoo, the Rights Ontology and CIDOC-CRM. Legislation side: LF1. Written_Norm ("Art. X of Law Y"), CR. Activity_Type ("To claim authorship"), CR51. Attribution_Right, CR20. Perform. Work's Provenance side: E39. Actor ("Kia Ng"), F28. Expression_Creation ("Activity of Improvisation on the Violin"), F22. Self_contained_Expression ("Expression of the Improvisation on the Violin"), E7. Activity ("Kia claiming authorship"), E30. Right / E72. Legal Object ("Kia's right to claim authorship"), CR. Ownership Right, Derived Property Rights. Relations shown: is_documented_in, generates, allows, has_type, performed_by, carried_out, created, has_right_type, is_on, became_owner_of. Annotations: "100% precision"; "100% recall, <100% precision".]
Thanks to MetaWare
What is needed? MONEY
Disincentives for curation: cost
• Future generations do NOT:
• - Vote
• - Pay taxes
• If cost of preserving old information increases…
• Need to show that costs will be contained
[Chart: Money against Time, with the budget available marked]
Digital Preservation…
• Easy to do…
• …as long as you can provide money forever
• Easy to test claims about repositories…
• …as long as you live a long time
• The Reference Model for an Open Archival Information System (OAIS) provides an approach
• ISO 14721, also available free from http://public.ccsds.org/publications/archive/650x0b1.pdf
Data – OAIS view
A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing
• In 2006, the amount of digital information created, captured, and replicated worldwide was 161 exabytes (161 billion gigabytes), roughly 3 million times the information in all the books ever written!
• Between 2006 and 2010, the information added annually to the digital universe will increase more than sixfold, from 161 exabytes to 988 exabytes
Source: IDC (2007) The Expanding Digital Universe
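The figures quoted from IDC can be sanity-checked with a few lines of arithmetic. This sketch assumes decimal (SI) units, i.e. 1 exabyte = 10^9 gigabytes; the variable names are illustrative.

```python
# Assumption: decimal units (1 EB = 10**18 bytes, 1 GB = 10**9 bytes).
GIGABYTES_PER_EXABYTE = 10**9

created_2006_eb = 161    # exabytes created in 2006, as quoted on the slide
projected_2010_eb = 988  # exabytes projected for 2010, as quoted on the slide

gigabytes_2006 = created_2006_eb * GIGABYTES_PER_EXABYTE
growth_factor = projected_2010_eb / created_2006_eb

print(gigabytes_2006)            # 161000000000, i.e. 161 billion gigabytes
print(round(growth_factor, 2))   # 6.14, i.e. "more than sixfold"
```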
Key OAIS Concepts
• Claiming "This is being preserved" is untestable
• Essentially meaningless
• Except "BIT PRESERVATION"
• How can we make it testable?
• Claim to be able to continue to "do something" with it
• Understand/use
• Need Representation Information
• Still meaningless…
• Things are too interrelated
• Representation Information potentially unlimited
• Designated Community
• Many other concepts identified
• Finer grained taxonomy than simply saying "metadata"
• Allows one to ask if one has all the required types
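Bit preservation is the exception because it can be tested mechanically: record a digest when the object is deposited and recompute it later. A minimal sketch using SHA-256 from Python's standard library; the function names and the sample bytes are illustrative, and a real repository would also need to protect the recorded digest itself.

```python
import hashlib


def fixity(data: bytes) -> str:
    """Return the SHA-256 digest of the bit sequence as a hex string."""
    return hashlib.sha256(data).hexdigest()


def bits_preserved(data: bytes, recorded_digest: str) -> bool:
    """True if the bits held today still match the digest recorded at deposit."""
    return fixity(data) == recorded_digest


deposited = b"\x01\x00\x01\x01"
recorded = fixity(deposited)

print(bits_preserved(deposited, recorded))            # True
print(bits_preserved(b"\x01\x00\x01\x00", recorded))  # False: one bit changed
```

A passing check says only that the bits are unchanged. It says nothing about whether anyone can still understand or use them, which is why the slide goes on to Representation Information.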
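The Information Model is key
[Diagram: OAIS Information Model. An Information Object is a Data Object interpreted using Representation Information (1+). A Data Object is either a Physical Object or a Digital Object; a Digital Object consists of one or more (1+) Bit Sequences. Representation Information is itself interpreted using further Representation Information.]
Recursion ends at the KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)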
[Diagram: Representation Information network for a FITS file. Nodes: FITS FILE; FITS STANDARD; FITS DICTIONARY; FITS JAVA s/w; DICTIONARY SPECIFICATION; PDF STANDARD; PDF s/w; XML SPECIFICATION; UNICODE SPECIFICATION; JAVA VM]
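The recursion in the Information Model, and the way it stops at the Designated Community's knowledge base, can be sketched as a walk over a dependency network like this one. The node names come from the slide, but the edges between them are assumptions made for illustration (the arrows of the original diagram are not recoverable here), as are the function name and the example knowledge bases.

```python
# Hypothetical edges: "X needs Y in order to be understood or used".
REP_INFO_NEEDS = {
    "FITS FILE": ["FITS STANDARD", "FITS DICTIONARY", "FITS JAVA s/w"],
    "FITS STANDARD": ["PDF STANDARD"],
    "PDF STANDARD": ["PDF s/w"],
    "FITS DICTIONARY": ["DICTIONARY SPECIFICATION"],
    "DICTIONARY SPECIFICATION": ["XML SPECIFICATION"],
    "XML SPECIFICATION": ["UNICODE SPECIFICATION"],
    "FITS JAVA s/w": ["JAVA VM"],
}


def rep_info_to_preserve(item, needs, knowledge_base):
    """List the Representation Information an archive must hold for `item`.

    The walk follows the dependency edges and stops at anything the
    Designated Community already knows (its knowledge base).
    """
    collected = []
    seen = set()

    def visit(node):
        for dep in needs.get(node, []):
            if dep in seen:
                continue
            seen.add(dep)
            if dep in knowledge_base:
                continue  # recursion ends here: the community understands this
            collected.append(dep)
            visit(dep)

    visit(item)
    return collected


# A community that already understands PDF, XML and the Java VM (assumed).
today = {"PDF STANDARD", "XML SPECIFICATION", "JAVA VM"}
print(rep_info_to_preserve("FITS FILE", REP_INFO_NEEDS, today))

# A future community with none of that knowledge needs the whole network.
print(len(rep_info_to_preserve("FITS FILE", REP_INFO_NEEDS, set())))
```

As the knowledge base shrinks or shifts over time, the same object needs more Representation Information, which is why the recursion's end point is not fixed.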
[Diagram labels: Rep Info; Virtualisation; DISCIPLINE]
Things change/disappear
How can we ensure that the information trapped in the "bits" remains understandable despite all these changes?
• Software
• Hardware
• Environment
• E.g. network links to related information
• People
• What is "common knowledge"
How can a digital curator even be aware of these changes?
Digital Preservation
• Need to preserve information & knowledge – not just "the bits"
• Documents, videos are rendered – simple?
• Data must be processed, in new ways – harder
• Need to manage knowledge to keep archives alive through time
• Preservation is a process, not a one-time event
• Preservation is expensive – costs need to be shared
• The alternative is money – endless supplies of money
• Open Archival Information Systems Reference Model (ISO 14721) provides a general conceptual framework and terminology
• (http://public.ccsds.org/publications/archive/650x0b1.pdf)
• OPEN process – not just "Open Archives"
Need more than just formats
Survey and preliminary results
• PARSE.Insight plus Alliance for Permanent Access
• Plus customised surveys with CASPAR, DCC etc.
• Targets
• Researchers
• Plus case studies in HEP, (Earth Observation and Social Sciences)
• Publishers
• Funders
• Data managers
• Almost 3000 responses so far
Survey sections: 1) Creation and use of digital research data 2) Data Re-use 3) Data Preservation 4) Publishing Your Work 5) Final questions
http://www.parse-insight.eu/