Understanding the Benefits and Costs of Deduplication

Mahmoud Abaza and Joel Gibson, School of Computing and Information Systems, Athabasca University, mahmouda@athabascau.ca. Questions to ask: What is deduplication? Why is it important to understand?


Presentation Transcript


  1. Understanding the Benefits and Costs of Deduplication. Mahmoud Abaza and Joel Gibson, School of Computing and Information Systems, Athabasca University, mahmouda@athabascau.ca

  2. Questions to ask… • What is deduplication? • Why is it important to understand? • Do all vendors implement deduplication the same way? • How much reduction in physical disk storage can be expected, if any?

  3. Questions to ask (continued) • What are the advantages and disadvantages, including risk? • Is it worth the cost to my IT budget? • Is deduplication strictly a business tool, or could it benefit home users?

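  4. Types of deduplication • File-based (example: Microsoft's SIS system), which stores identical files only once • Block-based, which computes a digital signature for each block and stores each distinct block only once • Delta encoding, which stores one file in full plus only the differences between it and a similar file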
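As a rough illustration of the block-based approach, the sketch below splits data into fixed-size blocks, uses a SHA-256 digest as each block's signature, and stores every distinct block once. The `BlockStore` class, the 4096-byte block size, and the file names are assumptions made for this example; they are not taken from the presentation or from any vendor's product.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products vary


class BlockStore:
    """Toy block-level deduplicating store keyed by SHA-256 digest."""

    def __init__(self):
        self.blocks = {}  # digest -> block bytes (each distinct block stored once)
        self.files = {}   # file name -> ordered list of block digests

    def put(self, name, data):
        digests = []
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            digest = hashlib.sha256(block).hexdigest()
            # Only the first copy of a block is kept; repeats just add a reference.
            self.blocks.setdefault(digest, block)
            digests.append(digest)
        self.files[name] = digests

    def get(self, name):
        # Rebuild the file by following its list of block references.
        return b"".join(self.blocks[d] for d in self.files[name])

    def physical_size(self):
        return sum(len(b) for b in self.blocks.values())


store = BlockStore()
original = b"A" * BLOCK_SIZE + b"B" * BLOCK_SIZE
edited = b"A" * BLOCK_SIZE + b"C" * BLOCK_SIZE  # only the second block differs
store.put("report_v1", original)
store.put("report_v2", edited)

logical_size = len(original) + len(edited)
print(logical_size, store.physical_size())
```

Here the two files share their first block, so four logical blocks occupy three physical ones. A file-based scheme would have stored both files in full, because they are not identical.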

  5. Deduplication side • Client-side deduplication (data is deduplicated before it is copied to the storage array) • Target-side deduplication (deduplication occurs on a backup set after it has been copied to the storage array)
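One way to picture client-side deduplication is a client that asks the target which chunk signatures it already holds and then sends only the missing chunks. The sketch below is a simplified assumption of that exchange: `Target`, `missing`, `store`, and `client_side_backup` are hypothetical names, and no real backup protocol is being reproduced.

```python
import hashlib


def digest(chunk):
    return hashlib.sha256(chunk).hexdigest()


class Target:
    """Stand-in for the storage array; holds chunks keyed by digest."""

    def __init__(self):
        self.chunks = {}

    def missing(self, digests):
        # Report which signatures the target has never seen.
        return [d for d in digests if d not in self.chunks]

    def store(self, chunks_by_digest):
        self.chunks.update(chunks_by_digest)


def client_side_backup(chunks, target):
    """Send only the chunks the target lacks; return the bytes sent."""
    by_digest = {digest(c): c for c in chunks}  # duplicates collapse on the client
    needed = target.missing(list(by_digest))
    target.store({d: by_digest[d] for d in needed})
    return sum(len(by_digest[d]) for d in needed)


target = Target()
first_sent = client_side_backup([b"alpha", b"beta", b"alpha"], target)
second_sent = client_side_backup([b"alpha", b"gamma"], target)
print(first_sent, second_sent)
```

With target-side deduplication, by contrast, every chunk would cross the network and the duplicates would be removed only after arrival.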

  6. Target-side Deduplication Process • In-line processing (deduplication happens while data is being ingested into the storage system) • Post-processing (the data is first written to disk and then checked for duplicate copies)

  7. In-line Processing • Advantage: Reduces overall disk I/O, because duplicate data is never written • Disadvantage: Slower ingestion, because deduplication runs while data is being written

  8. Post-processing • Advantage: Multiple hosts and CPUs can be involved to make the process fast • Disadvantage: Requires a large pool of storage to hold the data before it is deduplicated, plus heavy disk I/O
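The trade-off between the two target-side modes can be sketched by counting block writes and staging space. This is a toy model under stated assumptions: an in-memory dictionary stands in for the disk, and the function names `inline_ingest` and `post_process_ingest` are invented for the example.

```python
import hashlib


def key(block):
    return hashlib.sha256(block).hexdigest()


def inline_ingest(blocks):
    """Deduplicate before writing: each distinct block is written once."""
    disk, writes = {}, 0
    for block in blocks:
        k = key(block)  # hashing sits on the ingest path and slows it down
        if k not in disk:
            disk[k] = block
            writes += 1
    return disk, writes


def post_process_ingest(blocks):
    """Write everything first, then deduplicate in a second pass."""
    landing = list(blocks)       # the full, undeduplicated copy lands on disk
    writes = len(landing)
    peak_blocks = len(landing)   # staging space needed before deduplication
    disk = {}
    for block in landing:        # second pass re-reads the staged data
        disk.setdefault(key(block), block)
    return disk, writes, peak_blocks


blocks = [b"x", b"y", b"x", b"x", b"z"]
inline_disk, inline_writes = inline_ingest(blocks)
post_disk, post_writes, post_peak = post_process_ingest(blocks)
print(inline_writes, post_writes, post_peak)
```

Both modes end with the same three distinct blocks. The in-line path writes three blocks but hashes during ingestion, while the post-processing path writes and stages all five before reducing them.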

  9. How much reduction in physical disk storage can be expected, if any? • It depends on the type of data. Case studies: • Data Domain LLC: TiVo was able to achieve “data compression rates of 30 to 1 consistently.” • A study of SIS found that file-based deduplication, “for 4 weeks of full backups, achieves 87% of the savings of block-based.”
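Vendors tend to quote reduction as a ratio, while capacity planning usually needs a percentage. A minimal conversion, assuming the ratio means logical size divided by physical size, is shown below; the helper names are made up for this example.

```python
def savings_from_ratio(ratio):
    """Fraction of disk space saved for a logical:physical ratio of ratio:1."""
    return 1.0 - 1.0 / ratio


def ratio_from_savings(savings):
    """Inverse conversion: a savings fraction back to an N:1 ratio."""
    return 1.0 / (1.0 - savings)


print(f"{savings_from_ratio(30):.1%}")      # 30:1 as a percentage saved
print(f"{ratio_from_savings(0.5):.0f}:1")   # 50% saved as a ratio
```

Under that assumption, the quoted 30 to 1 rate corresponds to roughly 96.7% less physical disk. The 87% figure for SIS is different in kind: it is a share of the savings achieved by block-based deduplication, not an absolute reduction.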

  10. Experimental Results. A deduplication algorithm was run against real-world data on a personal workstation. We chose to back up a set of folders that contained mostly software downloads, music, photos, and videos. This is a real challenge, because such files are typically already compressed.

  11. Home-based Deduplication Results • Run #1: Initial backup • New files added to backup: 15,935 • Total size of files: 98.8 GB • Physical disk space used for backup: 85.5 GB • Time to process: 03:13:47 (hh:mm:ss) • Run #2: Second backup • New files added to backup: 57 • Total size of files: 105 MB • Physical disk space used for backup: 83.7 MB • Time to process: 00:01:49 (hh:mm:ss)
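The reported figures can be turned into savings, ratios, and throughput with a few lines of arithmetic. The sketch below uses only the numbers on the slide and assumes decimal units (1 GB = 1000 MB), which the presentation does not specify.

```python
def to_seconds(hms):
    """Convert an hh:mm:ss string to a number of seconds."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


# Sizes in MB; assumes 1 GB = 1000 MB, which the slide does not state.
runs = {
    "Run 1": {"logical_mb": 98.8 * 1000, "physical_mb": 85.5 * 1000, "time": "03:13:47"},
    "Run 2": {"logical_mb": 105.0, "physical_mb": 83.7, "time": "00:01:49"},
}

summary = {}
for name, r in runs.items():
    savings = 1 - r["physical_mb"] / r["logical_mb"]  # fraction of space saved
    ratio = r["logical_mb"] / r["physical_mb"]        # logical:physical ratio
    rate = r["logical_mb"] / to_seconds(r["time"])    # ingest rate in MB/s
    summary[name] = (savings, ratio, rate)
    print(f"{name}: {savings:.1%} saved, {ratio:.2f}:1, {rate:.1f} MB/s")
```

The savings work out to about 13.5% for the initial backup and about 20.3% for the second, far below the 30 to 1 case study, which is consistent with the data set consisting mostly of already-compressed files.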

  12. Conclusion: Deduplication can mean different things to different vendors, but the basic premise is the same: eliminate duplicate data.
