1 / 63

Infrastructure for preservation

Infrastructure for preservation. David Giaretta Director of CASPAR Project and Associate Director UK Digital Curation Centre. Outline. OAIS concepts Preservation Infrastructure – CASPAR and DCC Registry issues. OAIS (ISO14721). Open Archival Information System Reference Model

wilmet
Download Presentation

Infrastructure for preservation

An Image/Link below is provided (as is) to download presentation Download Policy: Content on the Website is provided to you AS IS for your information and personal use and may not be sold / licensed / shared on other websites without getting consent from its author. Content is provided to you AS IS for your information and personal use only. Download presentation by click this link. While downloading, if for some reason you are not able to download a presentation, the publisher may have deleted the file from their server. During download, if you can't get a presentation, the file might be deleted by the publisher.

E N D

Presentation Transcript


  1. Infrastructure for preservation David Giaretta Director of CASPAR Project and Associate Director UK Digital Curation Centre

  2. Outline • OAIS concepts • Preservation Infrastructure – CASPAR and DCC • Registry issues

  3. OAIS (ISO14721) • Open Archival Information System Reference Model • referenced in just about any serious work on digital preservation • Development hosted by CCSDS Panel 2 • 5 year ISO review underway • minor corrections and updates • No major changes • Revised version due early 2008

  4. Key OAIS (ISO 14721) Concepts • Claiming “This is being preserved” is untestable • Essentially meaningless • How can we make it testable? • Claim to be able to continue to“do something” with it • Understand/use • Need Representation Information • Still meaningless… • Things are too interrelated • Representation Information potentially unlimited • Designated Community • Plus many other concepts

  5. Preservation Planning P R O D U C E R C O N S U M E R Descriptive Info. Data Management Descriptive Info. queries result sets Ingest Access orders SIP DIP AIP AIP Archival Storage Administration MANAGEMENT SIP = Submission Information Package AIP = Archival Information Package DIP = Dissemination Information Package OAIS Functional Entities

  6. Information Object 1+ interpreted interpreted using Data Representation 1+ using Object Information Physical Digital Object Object 1+ Bit Sequence Information Model & Representation Information The Information Model is key Recursion ends at KNOWLEDGEBASE of the DESIGNATED COMMUNITY (this knowledge will change over time and region)

  7. FITS FILE FITS DICTIONARY FITS STANDARD DICTIONARY SPECIFICATION PDF STANDARD FITS JAVA s/w XML SPECIFICATION PDF s/w UNICODE SPECIFICATION JAVA VM

  8. interpretedUsing User Profile RImodule InfoObject DataObject m1 u1 p1 o1 m2 m4 o2 u2 p2 m3 RepInfo Gap Management

  9. Information is the important thing Information: Any type of knowledge that can be exchanged. In an exchange, it is represented by data. • What information? • Documents…… • Data……. • Original bits? • Look and feel? • Behaviour? • Performance? • Explicit/ Implicit/ Tacit Long Term is long enough to be concerned with the impacts of changing technologies, including support for new media and data formats, or with a changing user community. Long Term may extend indefinitely. Ensure that the information to be preserved is Independently Understandable to (and usable by) the Designated Community.

  10. Documents vs Data • Need to preserve information & knowledge – not just “the bits” • Documents, videos are rendered – simple? • Data – must be processed – in new ways - harder • more Representation Information Publication of data as well as documents

  11. Data

  12. Data – OAIS view A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing • In 2006, the amount of digital information created, captured, and replicated worldwide was 161 exabytes(161 billion gigabytes) -roughly 3 million times the information in all the books ever written! • Between 2006 and 2010, the information added annually to the digital universe will increase more than six fold from 161 exabytes to 988 exabytes IDC (2007) The Expanding Digital Universe

  13. people Rendered Object Documents • Documents, articles, journals… • Images • Audio • Video A reinterpretable representation of information in a formalized manner suitable for communication, interpretation, or processing Semantics tends to be ignored

  14. Data

  15. Great Wall of China

  16. Taj Mahal

  17. UNESCO examples World Heritage List • Mandatory Documentation: • Identification of property • Description of property • Justification of inscription • State of conservation and factors affecting the property • Protection and Management • Monitoring • Documentation • Contact information of responsible authorities • Signature on behalf of the State Party(ies) • DATA: • Scanned documents and maps • Aerial and close range photography (Digital photogrammetry) • Monument measurements (Laser scanning) • Satellite images (Remote sensing and image processing) • Multi-scale digital cartography (Geographic information systems (GIS) and CAD) • 3D models, virtual tours (Computer visualization)

  18. Convention for the safeguardingof intangible cultural heritage (entry into force: 2006)

  19. Interactive Multimedia Performance (IMP):Music via Motion (MvM)

  20. Components of an Interactive Multimedia Performance (IMP) • People: directors, performers, technicians, programmers, etc • Documents: e.g. performance plan and procedure, music scores, documentation about performance context, programmes • Musical instruments: traditional (e.g. violin, cello, piano), augmented, … • Particular focus on 3D motion of the performer • Mapping and content generation application (e.g. Max/MSP patches) • Output multimedia contents (e.g. video, graphics, sound) • Supporting applications (e.g. multimedia applications for processing and rendering) • Operating system • Hardware used in the performance: computer, camera, mixer, speakers, … Input 3D motion data Analysis & Processing Mapping Parameters Mapping GUI Multimedia Generation Multimedia output

  21. Data… Level 2 GOME Satellite instrument data

  22. Format “vs” Representation Information • Format • IS Structural Representation Information • IS adequate for rendering a digital object • IS NOT adequate for understanding – especially data

  23. Just Format? representation information rules sfqsftfoubujpo jogpsnbujpo svmft You have a file JHOVE tells you it is WORD version 7 Format Registries – useful but not enough: formats can be used for multiple purposes e.g. audio files used to store configuration parameters

  24. SEAWIFS IMAGE

  25. XML enough? <VOTABLE version="1.1" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.ivoa.net/xml/VOTable/v1.1 http://www.ivoa.net/xml/VOTable/v1.1" xmlns="http://www.ivoa.net/xml/VOTable/v1.1"> <RESOURCE> <TABLE name="6dfgs_E7_subset" nrows="875"> <PARAM arraysize="*" datatype="char" name="Original Source" value="http://www-wfau.roe.ac.uk/6dFGS/6dfgs_E7.fld.gz"> <DESCRIPTION>URL of data file used to create this table.</DESCRIPTION> </PARAM> <PARAM arraysize="*" datatype="char" name="Comment" value="Cut down 6dfGS dataset for TOPCAT demo usage."/> <FIELD arraysize="15" datatype="char" name="TARGET"> <DESCRIPTION>Target name</DESCRIPTION> </FIELD> <FIELD arraysize="11" datatype="char" name="DEC" unit="DMS"> <DATA> <FITS> <STREAM encoding='base64'> U0lNUExFICA9ICAgICAgICAgICAgICAgICAgICBUIC8gU3RhbmRhcmQgRklUUyBm b3JtYXQgICAgICAgICAgICAgICAgICAgICAgICAgICBCSVRQSVggID0gICAgICAg ICAgICAgICAgICAgIDggLyBDaGFyYWN0ZXIgZGF0YSAgICAgICAgICAgICAgICAg ICAgICAgICAgICAgICAgIE5BWElTICAgPSAgICAgICAgICAgICAgICAgICAgMCAv IE5vIGltYWdlLCBqdXN0IGV4dGVuc2lvbnMgICAgICAgICAgICAgICAgICAgICAg <family> <father>John</father> <mother>Mary</mother> <son>Paul</son> </family>

  26. OAIS Representation Information

  27. OAIS Package Description

  28. Levels of applicability of OAIS • Checklist – not just “metadata” e.g. • Representation Information (RepInfo) • Preservation Description Information • Packaging Information • Exhaustive text documentation • RepInfo supporting: • Interoperability • Automation Double payback • Applicable in “GRID” context • usability now as well as later

  29. Outline • OAIS concepts • Preservation Infrastructure – CASPAR and DCC • Registry issues

  30. Sharing • What can be shared? • Care of the bits ….. • Representation Information about…. • How to share it? • Need a way that is independent of domain and type

  31. CASPAR Project EU FP6 Integrated Project Total spend approx. 16MEuro (8.8 MEuro from EU) Started April 2006, for 42 months David Giaretta is Co-ordinator http://www.casparpreserves.eu

  32. CASPAR Aims • Produce tools and techniques to support digital preservation and make it easier to share the cost • must be relatively easy to use • must have a low “buy-in” in terms of effort required for adoption • must avoid requiring wholesale change of everyone else’s systems • must be decentralised and reproducible so that it can live on after the formal end of the CASPAR project • must be “preservable” • must be open: open source, open standards • Cannot do everything • Working closely with other projects

  33. Sharing the burden of preservation • Must be able to pass on custody of “the bits” • May involve transformations over time • Can involve many duplicate copies • Need to harness the “wisdom of crowds” • Wikipedia • Wiki-radio – annotation • Google page-ranking • “given enough eyeballs, all bugs are shallow” BUT – if we are serious about the long term • Need to be proactive • To avoid death by neglect • Need more than a Wiki!

  34. Infrastructure • Somewhere to collect the contributed “wisdom” • Registry/Repositories • Something to remind people to take action • Gap Manager • Orchestration Manager • More “local” tools • Creation of Representation Information, PDI etc • Persistent Storage • …

  35. Things change/disappear How can we ensure that the information trapped in the “bits” remains understandable despite all these changes? • Software • Hardware • Environment • E.g. Network links to related information • People • What is “common knowledge” How can a digital curator even be aware of these changes?

  36. Validation • How can we judge any proposed solution? • CASPAR validation metrics: • Theoretic underpinning • Testbed scenarios addressing real issues • No “hand-waving” – use what is there now • Accelerated lifetime tests • Hardware and Software • Environment • People • Improved “trustability”/”certifiability” Live a long time Evidence - not proof

  37. Rep • Info CASPAR information flow architecture Virtualisation How do we capture the Representation Information?

  38. Preservability Infrastructure • Persistent Identifiers • Distributed and Persistent Storage • Identifying relevant information • Registries of Representation Information • Messaging infrastructure • Workflow • …….

  39. Registry/Repository Scenario The Digital Object could have some RepInfo packed with it. User gets data from archive. User may be unfamiliar with the data so needs Representation Information. Some may be packaged with the data CPID CPID CPID CPID CPID Digital Digital CPID CPID CPID Object Object CPID CPID CPID CPID CPID CPID Archive Archive User

  40. Registry/Repository of Representation Information CPID CPID CPID Representation Information May need MORE Representation Information. Obtain this from one or more Registry/Repository of Rep.Info. CPID CPID CPID CPID CPID Do not want to guess which RepInfo is needed – use an identifier (CPID) associated with the data Digital Digital CPID CPID CPID Object Object CPID CPID CPID CPID CPID CPID Archive Archive

  41. Registry/Repository of Representation Information At some point in the future there is a need for further Representation Information. CPID Someone creates the necessary RepInfo using whatever tools are appropriate. RepInfo Toolkit CPID CPID Representation Information CPID CPID CPID CPID CPID Digital Digital CPID CPID CPID Object Object CPID CPID CPID CPID CPID CPID Archive Archive

  42. CPID CPID Structure = CPID Semantics = CPID Rendering s/w = CPID Structure = CPID Semantics = CPID Rendering s/w = CPID RepInfoLabel – points to other RepInfo Use of RepInfo CPID Each “bag of bits” has an associated pointer (CPID) to a Label • copy Registry External

  43. Create RepInfo RegRep Data Curator RepInfo toolkit Data Source Repository Registry

  44. Use Data RegRep User Data Curator RepInfo toolkit Data Source Application Repository Registry

  45. Create RepInfo Orchestration Gap Manager Data Curator User Data Curator RepInfo toolkit Data Source Application Repository RegRep Registry

  46. Orchestration Gap Manager RegRep Data Curator User RepInfo toolkit Data Source Application Repository Registry

  47. Content independent infrastructure - 2 • Important to note that: • All the above has been rather independent of specifics of the information. • The domain specifics – the hard parts are hidden in e.g. the exact type of RepInfo • Domain independence allows wide applicability

  48. Content dependent components • Representation Information tools • Structure • EAST • DRB • DFDL • Virtualisation assistant • Semantics • RDF editors • RDFSuite • Terminology capture • Software • UVC • Hardware emulators • Trust, Authenticity & Provenance tools • Certification assistant • PREMIS • Packaging tools • XFDU toolkit Use existing tools where applicable Develop new tools as needed and resources allow

  49. Unfamiliar information • Preservation • Digitally encoded information which must be usable and understandable • Unfamiliar because of separation in time • E-Science/GRID/CyberInfrastructure for data • Digitally encoded information which must be usable and understandable • Unfamiliar because of separation in discipline or location – even if created yesterday

More Related