Preservation is a Stage in the Data Life Cycle
Project Collection - Private - Local Policy
Data Grid - Shared - Distribution Policy
Digital Library - Published - Description Policy
Data Processing Pipeline - Analyzed - Service Policy
Reference Collection - Preserved - Representation Policy
Federa
1. Policy-Based Data Management: integrated Rule Oriented Data System
Reagan Moore
rwmoore@renci.org
Arcot Rajasekar
sekar@diceresearch.org
Mike Wan
mwan@diceresearch.org
2. Preservation is a Stage in the Data Life Cycle
3. Policy-based Preservation Environment
Purpose - reason a preservation environment is assembled
Properties - attributes needed to ensure the purpose
Policies - control for ensuring maintenance of properties
Procedures - functions that implement the policies
State information - results of applying the procedures
Assessment criteria - validation that state information conforms to the desired purpose
Federation - controlled sharing of logical name spaces
These are the necessary elements for a preservation environment
4. iRODS - Policy-based Data Management
Turn policies into computer actionable rules
Compose rules by chaining standard operations
Standard operations (micro-services) executed at the remote storage location
Manage state information as attributes on namespaces:
Files / collections / users / resources / rules
Validate assessment criteria
Queries on state information, parsing of audit trails
Automate administrative functions
Minimize labor costs
5. Policy-based Preservation - Authenticity
Purpose - Maintain authenticity of records
Properties - Define template for required representation information
Policies - Extract and register representation information for each file on ingestion
Procedures - Parse record / XML file to extract metadata
State information - Register representation information into metadata catalog
Assessment criteria - Compare registered metadata with template defining required values
A preservation environment should automate each of these steps
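The authenticity steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the required fields, the flat XML layout, and the in-memory dictionary standing in for the metadata catalog are all hypothetical, and none of the function names are iRODS APIs.

```python
import xml.etree.ElementTree as ET

# Hypothetical template: representation information every record must carry.
REQUIRED_FIELDS = {"title", "creator", "date", "format"}

def extract_metadata(xml_text):
    """Procedure: parse an XML record and return its child elements as a dict."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}

def register(catalog, name, metadata):
    """State information: store the extracted metadata in the catalog."""
    catalog[name] = metadata

def missing_fields(catalog, name, required=REQUIRED_FIELDS):
    """Assessment: list required fields that are absent or empty."""
    metadata = catalog.get(name, {})
    return sorted(field for field in required if not metadata.get(field))

# Illustrative record (made-up values) that lacks the "format" field.
catalog = {}
record = (
    "<record><title>Survey</title><creator>Example Archive</creator>"
    "<date>2009-01-01</date></record>"
)
register(catalog, "survey.xml", extract_metadata(record))
print(missing_fields(catalog, "survey.xml"))  # ['format']
```

An empty list from `missing_fields` would indicate that the registered metadata conforms to the template.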
6. Assessment Criteria
NARA Electronic Records Archive capabilities list
853 defined capabilities
Mapped to 174 computer actionable rules
Mapped to 212 state information attributes
RLG/NARA Trusted Repository Audit Checklist
Mapped to 105 computer actionable rules
Included 66 rules specific to preservation
ISO Mission Operations Information Management System repository audit checklist
106 policies for operation and control
Mapped to 52 computer actionable rules
7. Examples of Assessment Criteria
Specify
a template that governs the representation information required for a specific record series
content of a Submission Information Package (SIP)
content of an Archival Information Package (AIP)
number of replicas
Verify
compliance of SIP with specification
compliance of AIP with specification
compliance with required replica number
integrity of the replicas
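The two replica checks listed under "Verify" could look roughly like the following. This is a minimal sketch, assuming replicas are available as byte strings keyed by a hypothetical site name and that SHA-256 is the chosen checksum; the slides do not specify an algorithm.

```python
import hashlib

def checksum(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def verify_replicas(replicas, expected_checksum, required_count):
    """Return a list of problems; an empty list means both criteria hold."""
    problems = []
    if len(replicas) < required_count:
        problems.append(
            f"only {len(replicas)} of {required_count} replicas present"
        )
    for location, data in replicas.items():
        if checksum(data) != expected_checksum:
            problems.append(f"checksum mismatch at {location}")
    return problems

# Illustrative data: two replicas where three are required, one corrupted.
original = b"archival content"
replicas = {"site-a": original, "site-b": b"archival c0ntent"}
problems = verify_replicas(replicas, checksum(original), required_count=3)
print(problems)
```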
8. Preservation Communities
NARA Transcontinental Persistent Archive Prototype
Develop policies to automate preservation of selected digital holdings
National Optical Astronomy Observatory
Accession images from a telescope in Chile
Carolina Digital Repository
Preserve institutional collections
9. National Archives and Records Administration Transcontinental Persistent Archive Prototype
10. NOAO Zone Architecture
12. Preservation Concepts
Preservation environments are inherently distributed and federated
Mitigate risk of data loss
Mitigate dependence on a single vendor
Mitigate dependence on a single institution
Technology evolution can be managed through the same mechanisms that support interoperability across heterogeneous storage systems
At the point in time when new technology is added, both the old and new technologies are present
Migrate from old protocols to new protocols using data grids
13. Preservation Concepts (Cont.)
Preservation requires management of communication with the future
Need to migrate records to future technology
Need procedures that are independent of the infrastructure, to ensure that data formats can still be parsed in the future
Preservation requires management of communication from the past
Need to know what policies and procedures were applied by prior archivists
Need to validate that policies were enforced
Federation minimizes risk of data loss
Deep archive implemented through rules that:
turn on data staging, data versioning, replication
turn off deletion, external write, external data grid access
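One way to picture a deep archive is as a set of policy switches that every request is checked against. The sketch below is hypothetical: the field names, the `is_allowed` helper, and the mapping of "external data grid access" to external reads are assumptions made for illustration, not the iRODS rule base.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ArchivePolicy:
    """Hypothetical policy switches; defaults describe an open data grid."""
    staging: bool = False
    versioning: bool = False
    replication: bool = False
    deletion: bool = True
    external_write: bool = True
    external_grid_access: bool = True

# Deep archive: turn on staging, versioning, replication;
# turn off deletion, external write, external data grid access.
DEEP_ARCHIVE = ArchivePolicy(
    staging=True, versioning=True, replication=True,
    deletion=False, external_write=False, external_grid_access=False,
)

def is_allowed(policy, operation, external=False):
    """Decide whether a requested operation passes the policy."""
    if operation == "delete":
        return policy.deletion
    if operation == "write" and external:
        return policy.external_write
    if operation == "read" and external:
        return policy.external_grid_access
    return True  # internal reads and writes remain permitted

print(is_allowed(DEEP_ARCHIVE, "delete"))                # False
print(is_allowed(DEEP_ARCHIVE, "write", external=True))  # False
print(is_allowed(DEEP_ARCHIVE, "write"))                 # True
```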
14. Preservation Concepts (Cont.)
Periodic verification of assessment criteria
Check that required properties still hold
These rules are in addition to the rules that enforce policies
Compare values in metadata catalog with expected values
Number of replicas, checksums of files, required metadata
Verify relationships between files in storage and entries in metadata catalog
Metadata record <----> files in storage
Parse audit trails to track compliance over time
Evaluate impact of changing preservation policy
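The catalog-versus-storage comparison and the audit-trail analysis above can be sketched as follows. This is a simplified illustration: the file names, the `(period, passed)` audit-trail layout, and both function names are assumptions, not the iRODS catalog schema.

```python
from collections import defaultdict

def compare_catalog_with_storage(catalog_names, stored_names):
    """Report entries present on only one side of the catalog/storage pair."""
    catalog_names, stored_names = set(catalog_names), set(stored_names)
    return {
        "missing_from_storage": sorted(catalog_names - stored_names),
        "missing_from_catalog": sorted(stored_names - catalog_names),
    }

def compliance_by_period(audit_trail):
    """Fraction of passed checks per period, from (period, passed) entries."""
    totals = defaultdict(lambda: [0, 0])  # period -> [passed, total]
    for period, passed in audit_trail:
        totals[period][0] += int(passed)
        totals[period][1] += 1
    return {period: p / t for period, (p, t) in totals.items()}

# Illustrative inputs only.
report = compare_catalog_with_storage({"a.dat", "b.dat"}, {"b.dat", "c.dat"})
trail = [("2009-01", True), ("2009-01", False), ("2009-02", True)]
rates = compliance_by_period(trail)
print(report)
print(rates)
```

Comparing the per-period rates before and after a policy change gives a rough view of that change's impact.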
16. Managing Properties of Records
Namespaces
Record (file name)
Users
Storage resources
Rules
State information
State information
User-defined metadata (provenance)
System attributes
Procedures
Basic operations performed on data
Store, retrieve, move, copy, replicate, parse, aggregate
Extract metadata, checksum, synchronize, version
17. Migration of Procedures
18. Format of an iRODS Rule
Action | Condition | MS1, …, MSn | RMS1, …, RMSn
Action
Name of action to be performed
Name known to the server and invoked by server
Condition - Condition under which the rule applies
Micro-services - Chain of micro-services to be executed
Recovery micro-services - If any micro-service fails, the recovery micro-service(s) are executed to maintain transactional consistency
Example of MS/RMS
Micro-service createFile(*F), recovery micro-service removeFile(*F)
Micro-service ingestMetadata(*F,*M), recovery micro-service rollback
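The pairing of micro-services with recovery micro-services can be illustrated with a small Python sketch. It is not the iRODS rule engine: `run_rule` and the four helper functions are hypothetical stand-ins for the MS/RMS pairs above, and the recovery order (the failed step first, then earlier steps in reverse) is an assumption.

```python
def run_rule(condition, steps, context):
    """Run a rule; steps is a list of (micro_service, recovery) pairs."""
    if not condition(context):
        return False  # the rule does not apply
    attempted = []
    for micro_service, recovery in steps:
        attempted.append(recovery)
        try:
            micro_service(context)
        except Exception:
            # Assumption: recover the failed step, then earlier steps in reverse.
            for undo in reversed(attempted):
                undo(context)
            return False
    return True

def create_file(ctx):
    ctx["files"].add(ctx["name"])

def remove_file(ctx):
    ctx["files"].discard(ctx["name"])

def ingest_metadata(ctx):
    if not ctx.get("metadata"):
        raise ValueError("no metadata supplied")
    ctx["catalog"][ctx["name"]] = ctx["metadata"]

def rollback(ctx):
    ctx["catalog"].pop(ctx["name"], None)

# Metadata is missing, so ingestion fails and the created file is removed.
ctx = {"name": "record.xml", "metadata": None, "files": set(), "catalog": {}}
steps = [(create_file, remove_file), (ingest_metadata, rollback)]
ok = run_rule(lambda c: True, steps, ctx)
print(ok, ctx["files"], ctx["catalog"])
```

After the failed run, neither the file nor a catalog entry remains, which is the transactional consistency the recovery chain is meant to provide.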
19. iRODS - Distributed Operating System
20. iRODS is a "coordinated NSF/OCI-Nat'l Archives research activity" under the auspices of the President's NITRD Program and is identified as among the priorities underlying the President's 2009 Budget Supplement in the area of Human-Computer Interaction and Information Management technology research.
Reagan W. Moore
rwmoore@renci.org
http://irods.diceresearch.org