Archival Storage Venti : A new approach to archival storage Sean Quinlan and Sean Dorward

Archival Storage Venti : A new approach to archival storage Sean Quinlan and Sean Dorward CS 599 Special Topics in OS and Distributed Storage Systems Rohit Kulkarni 4th Feb 2004

Outline • Archival Storage • Venti : key ideas • Applications • Implementation • Performance • Reliability and Recovery • Conclusion & Questions?

Archival Storage • Storing data for long periods of time, perhaps forever • Tape backup • Central server for a number of clients • Restoring data is painful • Full backup vs. incremental backup • Snapshots • Consistent read-only view of the file system at some point in the past • Maintains file system permissions • Can be accessed by standard tools – ls, cat, cp, grep, diff • Avoids the tradeoff between full and incremental backup • Looks like a full backup • Implementation resembles an incremental backup – unchanged blocks are shared

Venti • GOAL: “To provide a write-once archival repository that can be shared by multiple client machines and applications” • Block level network storage system • actually a backend storage for client apps • Blocks addressed by hash of their contents • uses SHA-1 algorithm • SHA-1 output is 160 bit (20 byte) fingerprint of data block • Write once policy • once written cannot be deleted • Multiple writes of same data coalesced • data sharing – saves storage capacity • makes write operation idempotent

Venti (2) • Multiple clients can share a Venti server • Hash function gives a universal namespace • Inherent integrity checking of data • Fingerprint recomputed on retrieval • Caching is simplified • Uses magnetic disks as storage technology • access time comparable to non-archival data
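The properties listed on the two Venti slides (content addressing, coalesced writes, integrity checking on retrieval) can be illustrated with a minimal in-memory sketch. The `BlockStore` class and its method names are hypothetical and are not Venti's actual interface; only the use of SHA-1 as the fingerprint comes from the slides.

```python
import hashlib


class BlockStore:
    """Toy in-memory stand-in for a write-once, content-addressed block store."""

    def __init__(self):
        self._blocks = {}  # fingerprint (20 bytes) -> block contents

    def write(self, data: bytes) -> bytes:
        """Store a block and return its SHA-1 fingerprint."""
        fingerprint = hashlib.sha1(data).digest()
        # Writing identical contents again changes nothing: the key already
        # exists, so duplicate writes are coalesced and writes are idempotent.
        self._blocks.setdefault(fingerprint, data)
        return fingerprint

    def read(self, fingerprint: bytes) -> bytes:
        """Fetch a block and verify it against the requested fingerprint."""
        data = self._blocks[fingerprint]
        if hashlib.sha1(data).digest() != fingerprint:
            raise ValueError("block contents do not match fingerprint")
        return data

    def __len__(self):
        return len(self._blocks)


store = BlockStore()
fp1 = store.write(b"hello venti")
fp2 = store.write(b"hello venti")  # same contents, same fingerprint
```

Because the address is derived from the contents, there is deliberately no update or delete method in this sketch: changing a block's contents would change its address.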

Data organization • Data is divided into blocks • An application needs a block's fingerprint to retrieve it • Fingerprints are packed together into pointer blocks • This is repeated recursively until a single fingerprint remains: the root of the tree

Data organization (2) • Only new or modified data blocks are stored • Unchanged sections of the tree are reused
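The tree construction and block reuse described on these slides can be sketched as follows. This is a simplified illustration and not Venti's on-disk format: `BLOCK_SIZE`, `FANOUT`, `put`, and `write_tree` are hypothetical names, the sizes are unrealistically small so the block counts are easy to follow, and the sketch does not record which blocks are pointer blocks.

```python
import hashlib

BLOCK_SIZE = 4  # bytes per data block (tiny, for illustration only)
FANOUT = 2      # fingerprints per pointer block (assumption)


def put(store: dict, data: bytes) -> bytes:
    """Store a block under its SHA-1 fingerprint; duplicates are coalesced."""
    fp = hashlib.sha1(data).digest()
    store.setdefault(fp, data)
    return fp


def write_tree(store: dict, data: bytes) -> bytes:
    """Store data as a tree of blocks and return the root fingerprint."""
    # Leaf level: split the data into fixed-size blocks.
    level = [put(store, data[i:i + BLOCK_SIZE])
             for i in range(0, len(data), BLOCK_SIZE)] or [put(store, b"")]
    # Pack fingerprints into pointer blocks until a single fingerprint is left.
    while len(level) > 1:
        level = [put(store, b"".join(level[i:i + FANOUT]))
                 for i in range(0, len(level), FANOUT)]
    return level[0]


store = {}
root_v1 = write_tree(store, b"aaaabbbbccccdddd")
blocks_after_v1 = len(store)   # 4 data blocks + 2 pointer blocks + 1 root
root_v2 = write_tree(store, b"aaaabbbbccccXXXX")  # only the last block differs
new_blocks = len(store) - blocks_after_v1
```

Storing the second version adds only the changed data block plus the pointer blocks on the path from it to the root; the untouched subtree is shared by both roots.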

Data organization (3) • More complex data structures • Mixing data and fingerprints in a block • e.g. structure for storing file system • 3 types of blocks • Directory – has file meta info + root fingerprint • Pointer • Data

Venti application 1 : Vac • Similar to zip & tar – stores a collection of files and directories as a single object • A tree of blocks is formed for the selected files • vac archive file -> 45 bytes long • 20 bytes for root fingerprint • 25 bytes fixed header string • Any amount of data is reduced to a 45-byte archive file • Unvac – restores files from an archive • Duplicate copies of a file are coalesced on the server • Multiple users vac the same data – only 1 copy stored • Running vac on changed contents stores only the new or modified blocks

Venti application 2 : Physical Level Backup • Copy the raw disk blocks to Venti • No need to walk the file hierarchy • Gives higher throughput • Duplicate blocks are coalesced • User sees a full backup of the device • Storage space advantages of incremental backup are retained • Random access possible • Directly mounting a backup file system image • Lazy restore – restore on demand

Venti application 3 : Plan 9 File System • Plan 9 FS on top of Venti • Primary location for data • Small amount of read/write storage • Stores daily changes to file system • Smaller than active file system • Venti stores permanent changes

Implementation • Append-only log of data blocks • Stored on a RAID array • Separate index maps fingerprints to log locations • Fingerprint location in the index is random • Index striped across multiple disks • Write buffering • Block cache • Hit -> index lookup & data log bypassed • Index cache • Hit -> index lookup bypassed
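The read path through the two caches can be sketched as below. This is an in-memory illustration under stated assumptions, not the Venti server's code: `ToyServer` and its attributes are hypothetical, the "index" is a plain dictionary standing in for the on-disk index, and both caches are unbounded for brevity.

```python
import hashlib


class ToyServer:
    """Toy model of an append-only data log with a separate index and caches."""

    def __init__(self):
        self.log = bytearray()  # append-only data log
        self.index = {}         # fingerprint -> (offset, length) in the log
        self.block_cache = {}   # fingerprint -> block contents
        self.index_cache = {}   # fingerprint -> (offset, length)
        self.index_lookups = 0  # simulated index accesses on reads
        self.log_reads = 0      # simulated data log accesses on reads

    def write(self, data: bytes) -> bytes:
        fp = hashlib.sha1(data).digest()
        if fp not in self.index:  # duplicate writes are coalesced
            self.index[fp] = (len(self.log), len(data))
            self.log += data      # blocks are only ever appended
        return fp

    def read(self, fp: bytes) -> bytes:
        if fp in self.block_cache:  # hit: index and data log are bypassed
            return self.block_cache[fp]
        loc = self.index_cache.get(fp)
        if loc is None:             # index cache miss: consult the index
            self.index_lookups += 1
            loc = self.index[fp]
            self.index_cache[fp] = loc
        offset, length = loc
        self.log_reads += 1
        data = bytes(self.log[offset:offset + length])
        self.block_cache[fp] = data
        return data


server = ToyServer()
fp = server.write(b"block one")
server.write(b"block one")  # coalesced: the log does not grow
first = server.read(fp)     # index lookup plus data log read
second = server.read(fp)    # served from the block cache
```

A block cache hit avoids both the index and the log, while an index cache hit still needs one log read, which is why the slide lists the two caches separately.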

Implementation (2)

Implementation (3) • No. of possible fingerprints >> no. of blocks on a server • Index implemented as a disk-resident hash table • Hashing function maps fingerprints to index buckets
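A bucketed index of this kind might look like the sketch below. The slide does not say which hashing function Venti uses, so `bucket_for` (which takes a prefix of the fingerprint modulo the bucket count) is an assumption, as are `NUM_BUCKETS` and the `BucketIndex` class; the buckets here live in memory rather than on disk.

```python
import hashlib

NUM_BUCKETS = 1024  # hypothetical bucket count


def bucket_for(fingerprint: bytes, num_buckets: int = NUM_BUCKETS) -> int:
    """Map a fingerprint to a bucket number (assumed scheme, not Venti's)."""
    # SHA-1 output is already well mixed, so a prefix of it serves as the hash.
    return int.from_bytes(fingerprint[:8], "big") % num_buckets


class BucketIndex:
    """Hash table of buckets; each bucket holds (fingerprint, location) pairs."""

    def __init__(self, num_buckets: int = NUM_BUCKETS):
        self.buckets = [[] for _ in range(num_buckets)]

    def insert(self, fingerprint: bytes, location: int) -> None:
        bucket = self.buckets[bucket_for(fingerprint, len(self.buckets))]
        bucket.append((fingerprint, location))

    def lookup(self, fingerprint: bytes):
        """Return the log location for a fingerprint, or None if absent."""
        bucket = self.buckets[bucket_for(fingerprint, len(self.buckets))]
        for entry_fp, location in bucket:
            if entry_fp == fingerprint:
                return location
        return None


index = BucketIndex()
fp_a = hashlib.sha1(b"block a").digest()
fp_b = hashlib.sha1(b"block b").digest()
index.insert(fp_a, 0)
index.insert(fp_b, 7)
```

Since fingerprints land in effectively random buckets, a lookup touches exactly one bucket, which on disk would correspond to one seek; this is the cost the index cache is meant to avoid.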

Performance: computing environments • 2 Plan 9 file servers, bootes and emelie • Spanning 1990 to 2001 • 522 user accounts, 50-100 active at any time • Numerous development projects hosted • Several large data sets • Astronomical data, satellite imagery, multimedia files

Performance (2)

Performance (3) • When stored on Venti, three factors reduce the size of the jukebox data • Elimination of duplicate blocks • Elimination of block fragmentation • Compression of block contents

Reliability and Recovery • Tools for integrity checking & error recovery • Verifying the structure of an arena • Checking that there is an index entry for every block in the data log, and vice versa • Rebuilding the index from the data log • Copying an arena to removable media • Data log on RAID 5 disk array • Protection against single drive failures • Off-site mirrors for the server • Storing to write-once read-many optical jukeboxes
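Because every fingerprint can be recomputed from block contents, the index is redundant with the data log and can be regenerated from it. The sketch below illustrates that idea with a hypothetical record layout (a 4-byte big-endian length followed by the block); it is not Venti's arena format, and `append_block`, `rebuild_index`, and `index_matches_log` are illustrative names.

```python
import hashlib
import struct


def append_block(log: bytearray, data: bytes) -> int:
    """Append a length-prefixed block to the log and return its offset."""
    offset = len(log)
    log += struct.pack(">I", len(data)) + data
    return offset


def rebuild_index(log: bytearray) -> dict:
    """Scan the whole log and recompute fingerprint -> offset for each block."""
    index = {}
    offset = 0
    while offset < len(log):
        (length,) = struct.unpack_from(">I", log, offset)
        data = bytes(log[offset + 4:offset + 4 + length])
        index[hashlib.sha1(data).digest()] = offset
        offset += 4 + length
    return index


def index_matches_log(log: bytearray, index: dict) -> bool:
    """True if the index has exactly one correct entry per block in the log."""
    return rebuild_index(log) == index


log = bytearray()
index = {}
for block in (b"alpha", b"beta", b"gamma"):
    index[hashlib.sha1(block).digest()] = append_block(log, block)

recovered = rebuild_index(log)           # as if the index had been lost
damaged = dict(index)
damaged.pop(hashlib.sha1(b"beta").digest())  # simulate a missing index entry
```

The same scan serves both purposes on the slide: comparing its result with the existing index is a consistency check, and using its result directly is a rebuild.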

Future Work • Load balancing • Distribute Venti across multiple machines • Replicate server • Use of proxy server to hide it from client • Better access control • Currently just authentication to server • Single root fingerprint gives access to entire file tree • Use of variable sized blocks as in LBFS

Conclusion • Addressing block by SHA-1 hash of contents • Write once model • Reduces accidental or malicious data loss • Simplifies administration • Simplifies caching • Allows sharing of data • Magnetic disks as storage technology • Large capacity at low price • Random access • Performance comparable to non-archival data

Questions ? Thank You
