The Data Immortality Foundation
Memory Is Free, Eternally.
John Walker
http://www.fourmilab.ch/
JD 2452025.7
Modern Problems
• Millions of people never back up their data; the capital cost and anguish when the inevitable occurs is enormous.
• Media and file system formats are ephemeral. Who doesn’t have an 8” floppy or DECtape in their garage they wish they could read?
• Many easy-to-use and scalable backup solutions are proprietary; if the vendor goes belly-up, you’re out of luck.
• URLs on the Web break all the time; many citations in peer-reviewed publications will be useless a year or so after the paper appears. This compromises the integrity of the literature.
These are all the same problem! The solution: Data Immortality.
Data Immortality: Phase One
• It comes by night and sucks the vital essence from your computers.
• A high-performance, arbitrarily scalable, multi-platform, extensible backup and restore solution.
• An Open Source project distributed under the GPL.
Backup/Restore Clients
• Unix (all flavours), Windows, MacOS, PalmOS, cell phones: anything which can run ANSI C code, or be connected to something which can.
• Installable modules for platform-specific file and file system types (e.g. Mac resources, Unix log and mail files); see the sketch after this list.
• Roll-out / roll-back on platforms with installable file systems.
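A minimal sketch of what such an installable module’s interface might look like in C; the structure and function names here are illustrative assumptions, not the project’s actual API.

    #include <stddef.h>

    /* Hypothetical module interface for a platform-specific file type
       (e.g. Mac resources, Unix log files). All names are illustrative. */
    typedef struct file_module {
        const char *name;                    /* e.g. "mac-resource"                 */
        int  (*probe)(const char *path);     /* non-zero if this module handles it  */
        int  (*open) (const char *path, void **handle);
        long (*read) (void *handle, void *buf, size_t len);   /* stream data out    */
        int  (*close)(void *handle);
    } file_module;

    /* The client would walk a registry of such modules and back up each file
       with the first module whose probe() accepts it. */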
Storage Servers
• Any ANSI C platform with TCP/IP capability and one or more backup storage devices.
• Changers, jukeboxes, and silos supported.
• High-speed intermediate buffering and background spill to permanent media.
• Multiple storage servers with configurable redundancy.
• Hierarchical storage servers.
Administration
• Web-based, with command line alternatives.
• Hierarchical delegation of administrative authority.
• Client-level backups and restores (if configured).
• Comprehensive file backup history and volume database kept in an SQL engine; multiple SQL back-ends supported.
Performance
• Multiple simultaneous backups to a single drive.
• Optimal media packing; optional compression.
• Network traffic monitoring and throttling.
• Load balancing on multiple storage servers and devices.
Security
• Configurable encryption.
• Un-trusted storage servers are supported.
• Everything is checksummed and verified (see the sketch after this list).
• Robust recovery from media errors.
• Monitoring of performance statistics and notification of potential problems (e.g. a high soft error rate on a storage device).
• Complete recovery of databases by linearly scanning media.
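As one illustration of the “checksummed and verified” point, a storage server could attach a CRC-32 to every block it writes and recompute it on read-back. The sketch below is a generic CRC-32, not necessarily the checksum the project actually uses.

    #include <stddef.h>
    #include <stdint.h>

    /* Generic CRC-32 (reflected, polynomial 0xEDB88320) over one data block,
       shown only as an example of per-block checksumming and verification. */
    static uint32_t crc32_block(const void *data, size_t len)
    {
        const uint8_t *p = data;
        uint32_t crc = 0xFFFFFFFFu;
        size_t i;
        int bit;

        for (i = 0; i < len; i++) {
            crc ^= p[i];
            for (bit = 0; bit < 8; bit++)
                crc = (crc >> 1) ^ (0xEDB88320u & (uint32_t)-(int32_t)(crc & 1u));
        }
        return ~crc;
    }

    /* On write: store crc32_block(block, n) alongside the block.
       On read-back or restore: recompute and compare; a mismatch indicates a media error. */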
Scalability
• No limits on the number of clients, storage servers, or administrative servers.
• Full IPv6 support throughout.
• All technology-related numbers (file size, media capacity, etc.) are 128-bit quantities (see the sketch after this list).
• Textual values (such as file names) are arbitrarily long ISO 10646 strings, with provisions for other formats.
• Media pool management: recycling, migration, consolidation.
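C has no native 128-bit integer type, so one plausible representation for those quantities is a pair of 64-bit halves with explicit carry propagation. This is purely a sketch, not the project’s actual definition; on a strict C89 compiler the 64-bit halves would themselves need a compiler-specific type or a pair of 32-bit words.

    #include <stdint.h>

    /* Illustrative 128-bit quantity for file sizes, media capacities, etc. */
    typedef struct {
        uint64_t hi;   /* most significant 64 bits  */
        uint64_t lo;   /* least significant 64 bits */
    } u128;

    /* Add two 128-bit quantities, carrying from the low half into the high half. */
    static u128 u128_add(u128 a, u128 b)
    {
        u128 r;
        r.lo = a.lo + b.lo;
        r.hi = a.hi + b.hi + (r.lo < a.lo ? 1u : 0u);   /* overflow of lo => carry */
        return r;
    }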
Status
• December 1999: Project launch.
• January - February 2000: Design definition, utility module coding.
• March 2000: Initial implementation (“plan to throw one away”) of “Baby Bacula”.
• April 2000: Baby Bacula makes first backups on a Linux machine (to DLT tape) and a Solaris 2.7 SPARC machine (to 8mm tape).
Status
• May 2001: Single-machine production backup; 2 sites backing up daily.
• May 2001: Begin co-operative Open Source development with the Technical University of Vienna; 2 additional programmers on the project.
• Summer/fall 2001: Windows backup client program (CYGWIN port).
• May 2002: Projected initial release of the multi-client remote backup version.
Now, once those pieces are in place, look at what you can do with them….
Phase 2: The Data Immortality Foundation
• Permanent, multi-site, redundant backup, restore, and archiving over the Internet.
• Any Bacula client can use the service in addition to, or instead of, local backups.
• Non-profit foundation incorporated in a stable offshore jurisdiction; safe from commercial take-over or government interference.
• Business model: a cemetery with “perpetual care”.
The Data Immortality Foundation
• Multiple, redundant, physically dispersed facilities; to start, for example, Switzerland, Canada, and India.
• Data are mirrored in all centres; no more “bad asteroid days”.
• The fee for storing a file goes to the Foundation’s endowment to preserve it forever (maintenance re-copying, migration to new media).
• Bacula can be configured to choose which files to immortalise; the Foundation is simply another storage server.
• Restore is a normal Bacula client task.
• Encryption protects stored data and transfer to and from the client.
• All tools are Open Source and free of patent constraints.
The Data Immortality Foundation: Eternal URLs
• On request, the Foundation will issue a URL for any immortalised file.
• This URL will retrieve a copy of the file from the Foundation (not necessarily quickly; the Foundation is not an ISP).
• A URL something like http://www.dataimmortality.com/2005-04-22-6526517 will retrieve the file for as long as the Foundation continues to operate.
• Yes, I’ve already registered “dataimmortality.*”.
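The example identifier above looks like a date plus a serial number; the sketch below simply composes a URL in that shape. The serial-number scheme is an assumption for illustration, not something the Foundation has specified.

    #include <stdio.h>
    #include <time.h>

    /* Illustrative only: build an "eternal URL" in the date-plus-serial style
       shown above. The numbering scheme is assumed, not specified by the source. */
    static void make_eternal_url(char *buf, size_t buflen, unsigned long serial)
    {
        char date[16];
        time_t now = time(NULL);

        strftime(date, sizeof date, "%Y-%m-%d", gmtime(&now));
        snprintf(buf, buflen, "http://www.dataimmortality.com/%s-%lu", date, serial);
    }

A call such as make_eternal_url(url, sizeof url, 6526517UL) yields a string in the same shape as the example URL above.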
Trying This Myself, at Home
• Assumptions: library shelf density, all three floors of the building used, 2 metre shelf height, DLT tape with 50 GB average capacity, 5 mm shelving overhead.
• And the answer is… 15 petabytes (1.5×10¹⁶ bytes) per 3-story building.
• 15 million clients at 1 GB of unique data each.
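A quick cross-check of those figures, using only the numbers stated on the slide (the shelf-density geometry itself is not re-derived here):

    #include <stdio.h>

    /* Cross-check of the tape-building figures from the slide. */
    int main(void)
    {
        double total_bytes   = 1.5e16;   /* 15 petabytes per building        */
        double tape_capacity = 50e9;     /* 50 GB average per DLT cartridge  */
        double per_client    = 1e9;      /* 1 GB of unique data per client   */

        printf("cartridges needed: %.0f\n", total_bytes / tape_capacity);  /* 300000   */
        printf("clients served:    %.0f\n", total_bytes / per_client);     /* 15000000 */
        return 0;
    }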
Doing It with Disk Drives
• Hard drive densities surpassed tape media in 2000! So how about it?
• Assumptions: IBM 75 GB drive (DTLA-307075), packing density same as DLT tapes (optimistic considering controllers, cabling, and cooling, but drive density will increase).
• And the answer is… 23 petabytes (2.3×10¹⁶ bytes) per 3-story building.
• 312,120 disk drives per building. Memo to file: Don’t spin them all at once!
• Suppose you do…
  • Power consumption: 2.5 MW, plus cooling.
  • Electric bill: US$ 375,000 / month.
  • Sound emission: 112,000 dB, plus cooling.
• But drives are rated for only 40,000 spin-up/down cycles!
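The same kind of cross-check for the disk building. The per-drive power draw and the electricity price below are assumptions, chosen to be consistent with the slide’s 2.5 MW and US$ 375,000/month figures; they do not appear in the source.

    #include <stdio.h>

    /* Cross-check of the disk-building figures. watts_each and price_per_kwh
       are assumed values, not taken from the slide. */
    int main(void)
    {
        double drives        = 312120.0;
        double drive_cap     = 75e9;     /* IBM DTLA-307075: 75 GB per drive     */
        double watts_each    = 8.0;      /* assumed draw per spinning drive (W)  */
        double price_per_kwh = 0.21;     /* assumed electricity price (USD/kWh)  */

        double bytes = drives * drive_cap;                         /* ~2.3e16        */
        double mw    = drives * watts_each / 1e6;                  /* ~2.5 MW        */
        double bill  = mw * 1000.0 * 24.0 * 30.0 * price_per_kwh;  /* USD per month  */

        printf("capacity: %.2g bytes\n", bytes);
        printf("power:    %.2f MW\n", mw);
        printf("bill:     US$%.0f / month\n", bill);
        return 0;
    }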