Archiving Movies in a Digital World • Dave Cavena, Sun Microsystems • January 2007
Agenda • Overview • Archiving • Archived content integrity • Proposed model • Costs • Alternatives? • Summary • Conclusion
Overview • Has the time come to begin archiving movies digitally? • Only archiving remains reliant on film • Digital image archive technology is mature • A viable, scalable, cost-effective commercial off-the-shelf (COTS) model • What are the alternatives?
Archiving • The stories of an Age • Fiduciary responsibility • A Digital Content Archive can store these assets • without degradation • forever
Archiving • Any movie archived in 1907 is playable in 2007 • Will a celluloid movie archived in 2007 be playable in 2107? • Is it time to start digital archiving of this irreplaceable content?
Archiving • Will the Archive be the only time the story exists on film? • What are celluloid archive and repurposing costs? • A Digital Content Archive provides image and cost advantages over celluloid • Can be accomplished with COTS Technology
Archived Content Integrity • Irreplaceable content • Multiple copies • Multiple libraries • Automated audit, copy • Algorithmic assurance of bit integrity • Error Correction Codes (ECC) • Bit Error Detection • Bit Error Correction
Archived Content Integrity • ECC • Standard on tape drives • COTS technology • Bit Error Rates (BER)* differ by manufacturer • ECC undetected BER = $10^{-33}$ • Four copies = $10^{-128}$ • ECC uncorrected BER = $10^{-19}$ • Four copies = $10^{-76}$ • 10TB Digital Intermediate ≈ $10^{14}$ bits • One uncorrectable bit error in $10^{62}$ movies ($10^{-76} \times 10^{14} = 10^{-62}$) * Sun T10000 drive
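The multi-copy arithmetic on this slide is easiest to follow in log10 space, where multiplying probabilities becomes adding exponents. A minimal sketch using the uncorrected BER of $10^{-19}$ quoted above; it assumes the four copies fail independently (an assumption the slide implies but does not state), and the function names are hypothetical.

```python
import math

def copies_log10_ber(per_copy_log10_ber: float, copies: int) -> float:
    """log10 of the probability that the same bit is bad on every copy.

    Assumes independent copies, so probabilities multiply
    (equivalently, their log10 values add).
    """
    return per_copy_log10_ber * copies

def movies_per_error_log10(log10_ber: float, bits_per_movie: float) -> float:
    """log10 of how many movies can be stored per expected bad bit."""
    log10_errors_per_movie = log10_ber + math.log10(bits_per_movie)
    return -log10_errors_per_movie

# Figures from the slide (Sun T10000 ECC, uncorrected BER).
UNCORRECTED_LOG10_BER = -19
BITS_PER_10TB_DI = 1e14  # 10 TB is 8e13 bits; the slide rounds to 1e14

four_copy = copies_log10_ber(UNCORRECTED_LOG10_BER, 4)
print(four_copy)  # -76
# About 62, i.e. one uncorrectable bit per ~10^62 movies.
print(movies_per_error_log10(four_copy, BITS_PER_10TB_DI))
```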
Archived Content Integrity • Generational data integrity over a 100-year archive life • 20 generations of compute/disk front-end • 5 generations of libraries • Unknown generations of application file formats • At least 12 rewrites of the content onto new media • What is the generational impact on the algorithmic BER?
Archived Content Integrity • For this application, it makes little difference how many times the data is accessed, i.e., how many generations of rewrite it goes through • The probability that the ECC will fail to correct damage during any given access is $10^{-19}$ • The probability it will fail one or more times during $N$ accesses is 1 minus the probability that it succeeds $N$ times in a row: $1-(1-10^{-19})^N$ • For $N \ll 10^{19}$, this is well approximated by $N \cdot 10^{-19}$
Archived Content Integrity • Example • Assume a movie is accessed one million times • The chance of an uncorrectable bit error per read is $10^{-19}$ • The chance of an uncorrectable bit error on any one of $10^6$ reads is $10^6 \times 10^{-19} = 10^{-13}$ • For a single copy • It can reasonably be assumed, for the purposes of this application, that the ability to detect and correct errors in transcription is perfect.
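The expression $1-(1-p)^N$ cannot be evaluated naively in double precision: $1-10^{-19}$ rounds to exactly 1.0, so the result collapses to 0. A minimal sketch that evaluates it stably with the standard `math.log1p` and `math.expm1`, and compares it with the $N \cdot p$ approximation for the single-copy, one-million-read example; the function name is hypothetical.

```python
import math

def prob_any_failure(p: float, n: int) -> float:
    """Probability of at least one failure in n independent accesses,
    each failing with probability p, i.e. 1 - (1 - p)**n.

    Computed as -expm1(n * log1p(-p)) because 1 - 1e-19 == 1.0 in
    double precision, which makes the naive form return 0.
    """
    return -math.expm1(n * math.log1p(-p))

p = 1e-19   # uncorrectable error per read (slide figure)
n = 10**6   # one million accesses

naive = 1 - (1 - p) ** n
exact = prob_any_failure(p, n)
approx = n * p

print(naive)          # 0.0: precision is lost before exponentiation
print(exact, approx)  # both about 1e-13
```

The approximation holds because $N \cdot p$ is far below 1 here; the stable form remains correct even when it is not.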
Archived Content Integrity • Other Strategies • Secure Hash Algorithm, SHA-256* • Checksum failure probability of $2^{-256}$, or approximately $10^{-77}$ • Four-copy BER = $10^{-308}$ • One undetected bit loss in $10^{294}$ movies • Birthday-bound collisions don't apply: each copy is checked only against its own stored digest, and the hash is used as a strong checksum, not as a defense against a deliberate attacker • Voting bit-by-bit • Can make a 10TB Digital Cinema Distribution Master (DCDM) into 40 1TB files, 31 of which would have to be damaged to preclude rebuilding the original * Developed by the NSA, publicly available, peer-reviewed, easy to implement
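A SHA-256 copy audit can be sketched with Python's standard `hashlib`: stream each copy through the hash in fixed-size chunks (a 1TB file will not fit in memory) and compare the result with the digest recorded at ingest. This is a minimal sketch; the file layout, function names and chunk size are hypothetical, and it covers only the checksum comparison, not the 40-file rebuild scheme.

```python
import hashlib
from pathlib import Path

CHUNK_SIZE = 64 * 1024 * 1024  # 64 MiB per read; an illustrative value

def sha256_file(path: Path) -> str:
    """Stream a (possibly very large) file through SHA-256."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            h.update(chunk)
    return h.hexdigest()

def audit_copies(reference_digest: str, copies: list[Path]) -> dict[Path, bool]:
    """Map each copy to True if it still matches the ingest-time digest."""
    return {c: sha256_file(c) == reference_digest for c in copies}

def good_copies(results: dict[Path, bool]) -> list[Path]:
    """Copies that can serve as the pristine source for a rewrite."""
    return [c for c, ok in results.items() if ok]

if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as d:
        data = b"frame data " * 1000
        copies = [Path(d) / f"copy{i}.bin" for i in range(4)]
        for c in copies:
            c.write_bytes(data)
        copies[2].write_bytes(b"X" + data[1:])  # simulate one damaged copy
        ref = hashlib.sha256(data).hexdigest()
        results = audit_copies(ref, copies)
        print([c.name for c in good_copies(results)])
        # ['copy0.bin', 'copy1.bin', 'copy3.bin']
```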
Archive Model • Enterprise class tape library • Front-end server and disk • Ingest and prepare Archive Object for writing to tape library • Hierarchical Storage Manager, HSM • Two complete and identical systems, geographically separate • Two copies of each movie on each library
Archive Model • Computers and disk front-ends reach end of service life (EOSL) • 5-yr replacement • Tape drives reach EOSL • 10-yr replacement • Libraries reach EOSL • 20-yr replacement • Tape media has a finite lifetime* • Replace tapes every 10 years • Audit every tape every six months • Rewrite from pristine copies as necessary *The National Media Lab, IBM, Sun and others publish 30 years as a viable tape media lifetime
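The replacement intervals above determine how many hardware and media generations an archive passes through. A minimal sketch, assuming a 100-year archive life (the horizon used in the cost model) and on-schedule replacement; the names are hypothetical. It counts scheduled replacements only, so it does not attempt to reproduce the "at least 12 rewrites" figure from the generational data integrity slide.

```python
ARCHIVE_YEARS = 100  # assumed horizon, matching the 100-year cost model

# Replacement intervals in years, from the EOSL figures on the slide.
REPLACEMENT_YEARS = {
    "compute/disk front-end": 5,
    "tape drives": 10,
    "tape media": 10,
    "libraries": 20,
}
AUDITS_PER_YEAR = 2  # every tape audited every six months

def generations(interval_years: int, horizon: int = ARCHIVE_YEARS) -> int:
    """Number of generations needed to cover the horizon."""
    # Ceiling division: a partial final period still needs a generation.
    return -(-horizon // interval_years)

for component, years in REPLACEMENT_YEARS.items():
    print(f"{component}: {generations(years)} generations")

print("audits per tape copy:", ARCHIVE_YEARS * AUDITS_PER_YEAR)
```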
Archive Model • Application software and file formats • The HSM in the proposed archive model uses an open tarball (tar) format, readable even without the application • When a tape is audited, rewritten or copied, the new copy can be created in the new file format • This is feasible because the underlying data remain digitally fixed; only the file format and/or storage medium change
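The value of an open tarball format can be illustrated with Python's standard `tarfile` module: an archive object written this way can be listed and extracted by ordinary `tar` tools, with no HSM software present. A minimal sketch; the member names and the SHA-256 manifest (in the two-space layout that `sha256sum -c` reads) are hypothetical and do not describe the on-tape layout of any particular HSM.

```python
import hashlib
import io
import tarfile
from pathlib import Path

def write_archive_object(tar_path: Path, files: dict[str, bytes]) -> None:
    """Write files plus a plain-text SHA-256 manifest into a POSIX tar."""
    manifest = "".join(
        f"{hashlib.sha256(data).hexdigest()}  {name}\n"
        for name, data in sorted(files.items())
    ).encode()
    entries = {**files, "MANIFEST.sha256": manifest}
    # PAX_FORMAT is POSIX.1-2001 tar, readable by standard tar tools.
    with tarfile.open(tar_path, "w", format=tarfile.PAX_FORMAT) as tar:
        for name, data in entries.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))

def read_member(tar_path: Path, name: str) -> bytes:
    """Read one member back without any archive-manager software."""
    with tarfile.open(tar_path, "r") as tar:
        member = tar.extractfile(name)
        if member is None:  # not a regular file
            raise KeyError(name)
        return member.read()
```

Because the manifest travels inside the tarball, a future reader can verify every member with nothing more than `tar` and a SHA-256 tool, which is the property the slide relies on when the file format changes during a rewrite.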
Archive Model • Institutional memory must be created • Two or more sites are required, geographically separate • No network connectivity • Archive content stored in the clear (unencrypted) • Same as current model • A lost key or algorithm would render an encrypted archive useless • Can be encrypted for transport (tape drive hardware encryption is becoming the norm) • When copying tapes, send the old ones to another location
Archive Model • Oil & Gas has been archiving digital images for decades • Medical imaging is doing this with far higher transaction rates • The Library of Congress is doing it now • "Storing National Treasures" http://www.enterprisestorageforum.com/sans/features/article.php/3586066 • "Sun Rises at the Library of Congress" http://www.enterprisestorageforum.com/sans/features/article.php/3619646
Costs • Can digital compete with celluloid? • Film archiving cost • $100K per feature over 100 years • 2,000 movies = $200M • Digital: 10TB archive object, 20 objects/year, 100 years • $45,000/movie (list) • $16,000/movie (Archive pricing) • 2,000 movies = $32M • 100TB archive object • $67,000/movie (Archive pricing) • 2,000 movies = $79M
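The 2,000-movie totals above are simply per-movie cost times archive size. A minimal sketch of the film versus 10TB digital comparison, using the slide's rounded per-movie figures; the variable names are hypothetical.

```python
MOVIES = 2_000  # 20 archive objects/year for 100 years

# Rounded per-movie costs over 100 years, from the slide.
FILM_PER_MOVIE = 100_000
DIGITAL_10TB_ARCHIVE_PRICE = 16_000

film_total = FILM_PER_MOVIE * MOVIES
digital_total = DIGITAL_10TB_ARCHIVE_PRICE * MOVIES

print(f"film:    ${film_total / 1e6:.0f}M")     # film:    $200M
print(f"digital: ${digital_total / 1e6:.0f}M")  # digital: $32M
print(f"digital as share of film: {digital_total / film_total:.0%}")  # 16%
```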
Costs: 10TB Archive Object – List price (dollars per movie vs. movies in archive) • 10 movies: $2,601,160 • 100: $408,631 • 500: $114,218 • 1,000: $73,094 • 1,500: $57,330 • 2,000: $45,493
Costs: 10TB Archive Object – List price, cost breakdown (both libraries, two copies/movie/library) • Compute/Disk + maintenance: $3,200,000 (4%) • Library and Drives + maintenance: $16,112,000 (18%) • Media: $2,699,000 (3%) • SAM License + maintenance: $68,974,400 (75%) • Total: $90,985,400
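Each share in the breakdown is that line item divided by the total, and dividing the total by the archive size reproduces the roughly $45,000/movie list figure from the cost summary. A minimal sketch, assuming the total covers 2,000 movies (20 archive objects/year for 100 years); the dictionary keys are just labels.

```python
# 10TB archive object, list price, both libraries,
# two copies/movie/library (figures from the slide).
COSTS = {
    "Compute/Disk + maintenance": 3_200_000,
    "Library and Drives + maintenance": 16_112_000,
    "Media": 2_699_000,
    "SAM License + maintenance": 68_974_400,
}
MOVIES = 2_000  # assumed: 20 archive objects/year for 100 years

total = sum(COSTS.values())
for item, cost in COSTS.items():
    print(f"{item}: ${cost:,} ({cost / total:.1%})")
print(f"Total: ${total:,}")                  # Total: $90,985,400
print(f"Per movie: ${total / MOVIES:,.0f}")  # Per movie: $45,493
```

The output makes plain that the HSM license and its maintenance dominate the list-price cost, far more than media or hardware.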
Costs: 10TB Archive Object – Archive price (dollars per movie vs. movies in archive) • 10 movies: $2,005,850 • 100: $236,852 • 500: $54,072 • 1,000: $29,706 • 1,500: $21,259 • 2,000: $16,265
Costs: 100TB Archive Object – List price (chart: dollars vs. movies in archive, at 20, 100, 500, 1,000, 1,500 and 2,000 movies; plotted values range from $2,121,252 to $6,488,535)
Costs: 100TB Archive Object – Archive price (dollars per movie vs. movies in archive) • 20 movies: $1,462,466 • 100: $505,009 • 500: $170,432 • 1,000: $112,159 • 1,500: $85,527 • 2,000: $67,171
Alternatives • An unaddressed question: does celluloid have a future at all? • Film has been replaced by digital among commercial photographers globally • Precipitous drop in market share and manufacturer jobs • Environmentally unfriendly to manufacture and process • Celluloid may not be an option • Film may not even exist in 100 years • Film infrastructure (labs, chemicals, workers, etc.) may not exist
Summary • The technology required to store and maintain irreplaceable digital image content for archive durations is mature, proven and in use today • A Digital Content Archive will extend the quick responsiveness of a studio's Library to the Archive • The return on these increasingly expensive assets can easily be extended, forever • … all using COTS technology
Conclusion • The pivotal and immutable point is that this can be done beginning today. The experience Sun brings to the project has already been recognized, and is being broadened, by the Library of Congress and other institutions around the world that are digitizing their media assets using solutions from Sun Microsystems. The time is now to begin serious efforts to test and implement studio Digital Content Archives.
Thank you • Dave Cavena • david.cavena@sun.com