This paper discusses the risk factors and solutions for long-term digital preservation, focusing on media decay, hardware and software obsolescence, human errors, and external events. The study proposes a strategy of carrier migration, replication, and control to ensure data preservation. Various case studies, including mirror disks and backup tapes, illustrate the effectiveness of different preservation strategies. The CIDOC CRM model is used to analyze the risk factors and evaluate the reliability goal.
Reliability Modelling for Long Term Digital Preservation
Panos Constantopoulos, Martin Doerr, Meropi Petraki
Information Systems Laboratory, Institute of Computer Science, Foundation for Research and Technology - Hellas
Heraklion, Greece
May 12, 2005
The CIDOC CRM
Outline
• Problem statement
• Approach
• Case studies
• Conclusion
Problem Statement
• All digital material is vulnerable to loss.
• Cultural and scientific memory needs long-term preservation:
  • We would like to have the library of Alexandria back...
  • A large museum may keep and describe a million objects.
  • It may not want to lose more than 10 objects per year: that is already a 1% loss in 1000 years!
Problem Statement
• Risk factors:
  • Media decay and failure
  • Access component obsolescence (format, H/W)
  • Human and software errors
  • External events
• Format obsolescence:
  • Best studied. Measures are standards, technology preservation, migration.
  • For knowledge in text form, textual databases, vector graphics and bitmap images, reasonably solved with XML and extensive documentation.
Problem Statement
• Hardware obsolescence:
  • Systematic, foreseeable.
  • Reasonable solution: carrier migration.
• Human errors:
  • Stochastic failure. Can be reduced but not avoided.
  • Solution: replication and control.
• Software errors:
  • Difficult to model and to foresee.
  • Solution: replication, multiple S/W platforms and control.
Problem Statement
• External events:
  • Stochastic failure.
  • Solution: replication and control.
• Media decay:
  • Stochastic and systematic failure.
  • Solution: preventive carrier migration, replication and control.
Problem Statement
• Summary:
  • In the long term, the basic strategy is carrier migration, replication and control.
  • The expected lifetime of information exceeds that of any platform and technology.
  • The respective risk management has hardly been addressed.
• “The Gitksan strategy”: the longest human memories known
  • People of the Haida and Gitksan tribes in British Columbia, resident there since the Ice Age, keep historical oral memories of land ownership reaching back more than 10,000 years by:
  • Distribution to multiple, selected human carriers, annual quality control, and totem poles as mnemonic aids.
Approach
• Statistical modelling of the long-term risk of data loss due to media decay and failure and external events.
• Analyze the risk factors of different configurations.
• In models for long times, complex aging effects average out, e.g. preventive replacement results in a constant average failure rate. Long-term studies are simpler than short-term ones!
• Extrapolation of current technology:
  • Optimal strategy: maintain a constant failure rate at any time. This goal is independent of technology, so it has to be reevaluated at each technology change and maintained for each technology period. Random processes have no memory.
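The "no memory" property of a constant failure rate can be illustrated with a short sketch. It assumes an exponential lifetime with rate 1/MTTF; the function names and the numeric values in the usage lines are illustrative assumptions, not taken from the slides.

```python
import math

def survival_probability(mttf_years: float, horizon_years: float) -> float:
    """P(no failure within the horizon) for a constant failure rate 1/MTTF."""
    rate = 1.0 / mttf_years
    return math.exp(-rate * horizon_years)

def conditional_survival(mttf_years: float, already_survived: float, additional: float) -> float:
    """P(surviving `additional` more years | survived `already_survived` years)."""
    return (survival_probability(mttf_years, already_survived + additional)
            / survival_probability(mttf_years, already_survived))

# Illustrative values only: a system with a 100-year MTTF.
fresh = survival_probability(100.0, 10.0)
aged = conditional_survival(100.0, 50.0, 10.0)
print(fresh, aged)  # both values agree: the past does not matter
```

Because the conditional and unconditional probabilities coincide, the risk over the next technology period depends only on the failure rate maintained during that period, which is why the strategy can be reevaluated period by period.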
Approach
• Analytical models that allow for dominant factor analysis and cost/benefit analysis (future work) to achieve the politically set reliability goal.
• “Memoryless” Markov chains and fault trees.
• Evaluation with the program “SHARPE”.
Case 1: Mirror Disks
• Assumptions:
  • Two identical disks, constant failure rate λ, system failure if both are destroyed
  • MTTF = 1/λ: mean time to failure
  • MTTR = 1/μ: mean time to repair
  • MTTFD = 1/θ: mean time to failure detection
[Figure: Markov chain with states 2, 1, 1D and F and transition rates 2λ, λ, θ and μ]
Case 1: Mirror Disks
[Figure: plot for the mirror disk case; surviving labels: 120d = 4m, 360d = 12m, 740d = 2yrs]
Case 1: Mirror Disks
• Results:
  • MTTF = 3 yrs, MTTR = 50 hrs, MTTFD = 14 days: MTTF total = 106.46 yrs
  • MTTFD = MTTR = 0 => MTTF = ∞!
  • The only dominant factor is the time to detect and to repair a failure! Any disk quality can be compensated for by faster detection and repair, within realistic limits.
  • Any uncontrolled medium will lose its data in the long term.
  • => cost/benefit analysis to be done!
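The total MTTF quoted for the mirror disks can be approximately reproduced with a small sketch. It assumes one particular reading of the Markov chain: state 2 (both disks working) moves to state 1 (one disk failed, undetected) at rate 2λ, state 1 moves to 1D (failure detected) at rate θ, 1D returns to 2 at rate μ, and both 1 and 1D move to F (data lost) at rate λ. This topology, the function name `mirror_mttf` and the 365-day year are assumptions, not taken from the slides.

```python
import math

def mirror_mttf(mttf: float, mttr: float, mttfd: float) -> float:
    """Mean time until both disks are lost, in the same time unit as the inputs.

    Solves the mean-absorption-time equations of the assumed chain:
      T2  = 1/(2*lam) + T1
      T1  = mttfd*a + a*T1D    (a = chance of detection before the 2nd failure)
      T1D = mttr*b  + b*T2     (b = chance of repair before the 2nd failure)
    """
    lam = 1.0 / mttf
    a = 1.0 / (1.0 + lam * mttfd)
    b = 1.0 / (1.0 + lam * mttr)
    if a * b == 1.0:
        # Instant detection and repair: the failure state is never reached.
        return math.inf
    return (1.0 / (2.0 * lam) + mttfd * a + a * mttr * b) / (1.0 - a * b)

# Parameters from the results slide, converted to years.
total = mirror_mttf(mttf=3.0, mttr=50.0 / 8760.0, mttfd=14.0 / 365.0)
print(f"MTTF total: {total:.2f} years")
```

Shortening `mttfd` or `mttr` in this sketch raises the total MTTF sharply, while improving the single-disk MTTF has a comparatively modest effect, which is the dominant-factor observation made on the slide.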
Case 2: Mirror Disks + Backup Tape
• MTTF = 1/λ: mean time to failure
• MTTR = 1/μ: mean time to repair
• MTTFD = 1/θ: mean time to failure detection
• 1, 2 = disk, 3 = tape
[Figure: Markov chain over the states (2,1), (1,1), (1D,1), (0,1), (0D,1), (2,0), (1,0), (1D,0), (2,0D), (1,0D), (1D,0D) and F, with transition rates λ1, 2λ1, λ2, θ1, θ2, μ1, μ2 and μ3]
Case 2: Mirror Disks + Backup Tape
Coming closer!
Case 2: Mirror Disks + Backup Tape
• Adding fire!
• At least one more backup is needed in a third room.
Case 3: Distributed carriers
• Assumptions: Data are distributed to N independent systems, each with mirror disks and tape.
• Question: What percentage of my data will exist after 1000 years? (Binomial model)
Case 3: Distributed carriers
• If all data are on one system:
  • High probability to preserve all data
  • High probability to lose all data
• If the data are distributed over many individual systems:
  • Some data will be lost for sure
  • Some data will survive for sure
• Conclusion:
  • Optimal strategy may combine both modes!
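The trade-off between one system and many can be sketched with a plain binomial model. The sketch assumes that each of the N systems holds an equal share of the data and survives the whole period independently with the same probability; the survival probability of 0.9 and the function name are illustrative assumptions, not values from the slides.

```python
from math import comb

def surviving_systems_distribution(n_systems: int, p_survive: float) -> list[float]:
    """Binomial probabilities P(exactly k of n systems survive), for k = 0..n."""
    return [
        comb(n_systems, k) * p_survive**k * (1.0 - p_survive) ** (n_systems - k)
        for k in range(n_systems + 1)
    ]

p = 0.9  # hypothetical per-system survival probability over the period
for n in (1, 20):
    dist = surviving_systems_distribution(n, p)
    expected_fraction = sum(k * prob for k, prob in enumerate(dist)) / n
    print(f"N={n}: P(all lost)={dist[0]:.3g}, "
          f"P(all kept)={dist[n]:.3g}, expected fraction kept={expected_fraction:.3f}")
```

In this sketch the expected surviving fraction is the same for any N; only the spread changes. A single system either keeps or loses everything, while many systems make both total loss and total preservation unlikely.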
Conclusions
• Some results seem not to be very intuitive:
  • The influence of failure detection and repair time
  • The effect of data distribution
  • The effect of external events
• Long-term risk modelling allows for simplifications that make analytical models possible.
• Analytical models can effectively be turned into decision support tools and combined with cost/benefit models.
• Future work: a practical decision support tool