Deciding When to Forget in the Elephant File System. University of British Columbia: Douglas S. Santry, Michael J. Feeley, Norman C. Hutchinson, Ross W. Carton, and Jacob Ofir. Hewlett-Packard Laboratories: Alistair C. Veitch. December 1999. Presented by: David Allen, May 31st, 2005.
Elephant File System: Overview • Undo and Long-Term History • File system that helps to protect data by keeping histories of file and directory changes. • User Control • Gives control over retention policies to the user. • Can be applied at the file level. • Storage Reclamation • Separates storage reclamation from file operations such as write and delete. • Cleaner runs in background to reclaim storage and support the retention policy.
Elephant file system: Why • User Failures • There is already good protection from network, system and media failures. • Now we need protection from user mistakes: rm *.o is not the same as rm * o (with the stray space, the shell expands * to every non-hidden file in the directory).
Elephant file system: Why • Cheap Disk Space • Single inexpensive disks were approaching 50GB at the time of the paper in 1999. • Now in 2005 they are approaching 500GB. • They will be 2TB by 2010.
Elephant file system: Why • Cheap Disk Space • In addition to high-end disk capacity increasing 10x in 6 years, the price is more than 10 times lower.
Elephant file system: Why • Cheap Disk Space • Other types of media are growing as well: 8GB CompactFlash cards and 6GB microdrives. (Useful for that 16.7MP Canon camera with its 42MB images.)
Elephant file system: Why • Capacity • Large disk capacities. • Constant human productivity. • Only a relatively small set of files needs protection. It makes sense to support revision histories on files and directories.
Elephant file system: Change • Change in pattern of use. • Does this paper stand up to changes in disk usage? • Explosion of large files from still and video digital cameras, MP3 CD rips, and DivX DVD rips. • I have 17.8GB of pictures and video from one trip, which I need to prune and edit to a final form. • How would people in the class use this system?
Elephant file system: Policies • Keep One (no versioning) • Just like FFS. File changes can overwrite existing data, and are permanent.
Elephant file system: Policies • Keep All (complete versioning) • Like revision control systems. Entire history is maintained.
Elephant file system: Policies • Keep Safe (undo protection) • Keeps recent changes for a specified undo period.
Elephant file system: Policies • Keep Landmarks (long-term history) • In addition to Keep Safe protection, retain important file versions.
Elephant file system: Policies • Application Defined (user specified) • Custom policy implemented at the user level.
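The retention policies on the preceding slides can be read as rules a cleaner applies to a file's version list. The following is only a minimal sketch of that idea, not the Elephant implementation: the `Version` class, the `landmark` flag, the policy strings, and `versions_to_keep` are all hypothetical names. It assumes that Keep Safe must also retain the version that was current when the undo window opened, so that any state inside the window can be restored. Application Defined policies are left out.

```python
from dataclasses import dataclass

POLICIES = {"keep_one", "keep_all", "keep_safe", "keep_landmarks"}


@dataclass(frozen=True)
class Version:
    created: float          # time this version was written (hypothetical field)
    landmark: bool = False  # assumed flag: marked important by the user or a heuristic


def versions_to_keep(versions, policy, now, undo_period=0.0):
    """Return the versions a cleaner would retain, oldest first."""
    if policy not in POLICIES:
        raise ValueError(f"unknown policy: {policy}")
    ordered = sorted(versions, key=lambda v: v.created)
    if not ordered or policy == "keep_all":
        return ordered
    current = ordered[-1]
    if policy == "keep_one":
        return [current]

    # Keep Safe: everything written inside the undo window survives.
    cutoff = now - undo_period
    keep = {v for v in ordered if v.created > cutoff}
    keep.add(current)
    # Assumption: also keep the version that was current at the cutoff,
    # otherwise the state at the start of the window could not be restored.
    older = [v for v in ordered if v.created <= cutoff]
    if older:
        keep.add(older[-1])

    # Keep Landmarks: Keep Safe plus every version flagged as a landmark.
    if policy == "keep_landmarks":
        keep.update(v for v in ordered if v.landmark)
    return [v for v in ordered if v in keep]
```

Anything not returned by `versions_to_keep` is what the background cleaner would be free to reclaim, which is how the policy stays separate from the write and delete paths.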
Elephant file system: Features for Comparison • User Control • Only retains history on user-selected files, with user-selected policies. • Custom policies can be created. • Landmarks can be user specified. • Automation • Implemented within the file system. • Revisions are maintained automatically as the files are used. • Landmarks can be determined automatically. • Cleaning is done in the background.
Elephant file system: Features for Comparison • Granularity • Every file and directory change can be kept. • Full or partial long-term histories can be maintained. • Files can be grouped to maintain consistency for landmarking. • Versioning on files is done at the block level. • Access • A specific version can be selected with a file name and date pair. • Only the current version can be written to. • The most recent revision is fastest, but all versions can be accessed relatively quickly. • Only a single current version exists at a time.
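Selecting a version by a file and date pair amounts to finding the newest version written at or before the requested time. A minimal sketch of that lookup, assuming each file's version timestamps are available as a sorted list (the function name `version_at` and this representation are illustrative, not Elephant's on-disk format):

```python
from bisect import bisect_right


def version_at(timestamps, when):
    """Return the index of the version that was current at time `when`.

    `timestamps` must be sorted in ascending order, one entry per version.
    Returns None if the file did not exist yet at `when`.
    """
    i = bisect_right(timestamps, when)
    return i - 1 if i > 0 else None
```

A date that falls between two versions resolves to the earlier one, since that was the file's content at that moment.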
Elephant file system: Features for Comparison • Storage • Files with no version history are stored as efficiently as files in a system without versioning. • Revisions to inodes are stored in an inode log, which uses full blocks and is much larger than a single inode. • Directories are stored as name histories.
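One way to picture a directory stored as a name history: each entry records when a name was created and, if it has been removed, when it was deleted, so the directory can be listed as of any past time. This is a sketch under that assumption; `NameEntry`, its fields, and `list_directory` are hypothetical names rather than the prototype's actual structures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NameEntry:
    name: str
    inode: int
    created: float                   # when the name was added to the directory
    deleted: Optional[float] = None  # None while the name still exists


def list_directory(entries, when):
    """Return the names visible in the directory at time `when`, sorted."""
    return sorted(
        e.name
        for e in entries
        if e.created <= when and (e.deleted is None or when < e.deleted)
    )
```

Deleting a name only sets its deletion time, so an accidental `rm` remains undoable until the cleaner reclaims the entry.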
Elephant file system vs. the Trash Can • User Control • Users manually empty the trash can. This causes files to have different levels of protection based on when they were deleted and when the trash can was emptied. • Automation • Files are automatically moved to the trash can on delete. • Granularity • Very coarse-grained. • Only protects files against accidental deletion. • Only until the trash can is emptied. • No directory protection. • Access • Files can be retrieved from the trash can, but the user needs to determine where to put them. • Storage • Copy of entire file is kept in the trash can.
Elephant file system vs. Backups • User Control • Typically no control over system backups. • Users can manually copy files. • Automation • System backups are usually automatic. • Granularity • Very coarse over time. • No fine-grained revisioning. • No protection between backups. • Typically limited by backup retention policy (number of tapes). • Access • System backups are usually very expensive to retrieve. • User manual backups are usually closer, but not always convenient. • Storage • Usually full or differential copies of the data.
Elephant file system vs. Checkpoints • User Control • Typically no user control over checkpoints. • Automation • Checkpoints are usually automatic. • Granularity • Very coarse over time. • No fine-grained revisioning. • No protection between checkpoints. • Typically limited by checkpoint retention policy (space). • Access • Typically on-line, easy to get to. • Storage • Efficient. Copy-on-write policy maintains changes to file system after the checkpoint.
Elephant file system vs. Revision Control System • User Control • Only retains history on user selected files, but usually best to use revision control on all files in a directory. • No policies to select, entire history is retained. • File groups can be "tagged" to establish a consistent version. (Like landmarks and grouping.) • Automation • No automation. • Usually a set of command line tools that are initiated by the user. Checkout, commit... • Granularity • Medium granularity. • Only committed changes are kept. • All versions are retained. Often it is difficult or impossible to remove old versions. • Typically revision control does not include directories. (CVS) • Often renaming or moving files will break file histories. (CVS, SourceSafe)
Elephant file system vs. Revision Control System • Access • Files can be accessed by name and version. • Only most recent files can be modified. • Older versions can be branched. • Branches can be merged. • Multiple branches (versions) can exist at a time. • Storage • Text files are usually stored efficiently as differentials. • Access is fast for recent versions and slow for old versions. • Binary file storage is usually inefficient, full copies.
Elephant file system: Summary • Most files don't need versioning so impact is low. • Performance is very close to a system with no versioning. • Storage cost of metadata is high in the prototype implementation. • Disk capacity has increased as predicted in this paper, but so has the need for capacity due to digital music and imaging. • Usage patterns have also changed for the same reasons. • Does this system still make as much sense in the face of these changes? Definitely!
References • D. S. Santry, M. J. Feeley, N. C. Hutchinson, A. C. Veitch, R. W. Carton, and J. Ofir, "Deciding When to Forget in the Elephant File System." In Proceedings of the Seventeenth ACM Symposium on Operating Systems Principles, December 12-15, 1999, Charleston, SC, pp. 110-123. • Historic disk capacity and price data: http://www.littletechshoppe.com/ns1625/winchest.html • Current media capacities and prices: http://froogle.google.com