The Pros & Cons of Content Addressed Storage. Arun Taneja Founder & Consulting Analyst. Something Must Be Done!. Current Data Protection Environment. Data Tsunami No Backup Windows Cost of Downtime Increasing Regulations and Compliance Requirements
The Pros & Cons of Content Addressed Storage. Arun Taneja, Founder & Consulting Analyst
Something Must Be Done! Current Data Protection Environment • Data Tsunami • No Backup Windows • Cost of Downtime Increasing • Regulations and Compliance Requirements • Data Protection Technology at Break Point
Many New Technologies to the Rescue: FCIP, iSCSI, NDMP, DAFS, iFCP, RDMA, CAS, GRID, TOE, RAIN, SATA, SMI-S, SAS
What is CAS? Definition: Concept whereby the address of an object is computed from the content of that object.
Advantages: • Location Independence • Authenticity • Simplified Indexing • Scalability to Exabytes • Load Balancing • Elimination of Duplication
Disadvantages: • New and Unfamiliar • May Require Changes to Applications • May Require Procedural Changes • May Require Abandoning Existing Applications
CAS vs Networked Storage • SAN & NAS Use File Systems to Place and Locate Data (/abc/xyz/acme.doc) • Hierarchical • Difficult to Scale Beyond TBs • Application Determines if Duplication of Object Exists • Indexing can Become Complicated
How is CAS Done? Object (File, FS, Dir) → Algorithm Applied to the Object’s Content → 128-bit hash unique to that object (e.g. MD5)
The object can be: • File • Portion of a file • Directory or file system
Unique 128-bit Coding Results (160 bits for Avamar)
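The hashing step on this slide can be sketched in a few lines. This is only an illustration: `content_address` is a hypothetical helper name, MD5 is taken from the slide, and SHA-1 stands in as a familiar 160-bit hash because the slide does not name the algorithm Avamar uses.

```python
import hashlib

def content_address(data: bytes, algorithm: str = "md5") -> str:
    """Return the hex digest of `data`, used here as its content address."""
    return hashlib.new(algorithm, data).hexdigest()

doc = b"acme quarterly report"
addr_md5 = content_address(doc)            # 128-bit digest, 32 hex characters
addr_sha1 = content_address(doc, "sha1")   # 160-bit digest, 40 hex characters (assumed algorithm)

# Identical content always yields the identical address, wherever it is stored.
same_again = content_address(b"acme quarterly report")
# Changing a single byte of the content changes the address.
changed = content_address(doc + b"!")
```

The address depends only on the bytes of the object, which is what gives CAS its location independence: no path such as /abc/xyz/acme.doc is involved.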
What Can CAS Be Used For? • Archival Storage • Backup and Restore • Disaster Recovery • Content Management
Issues with Existing Architectures
Backup and Restore/DR: • Application Performance • Generates Tons of Data 10:1 • Backup Windows • No Guarantee if Data is Recoverable • DR Expensive • DR: Potential Consistency Issues
Archive/Content Mgmt: • Lack of Authenticity • Media/Technology Changes • Tape Environmental Issues • Poor Access Times • TCO Expensive • Slow Queries from Large Reps • Centralized Indexing
Methods for Keeping More Data Online • Bigger Primary Storage • Compression of Data • Hierarchical Storage Architectures • Data Normalization: Finding Subsets of Data That are Common and Storing Them Only Once • No Limit on the Effective Compression Ratio • Indexing Systems Super Critical
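The "storing them only once" idea can be sketched as a toy in-memory store. Everything here is an assumption made for illustration: the `ContentStore` class, its method names, and the choice of SHA-1 are not taken from any product described in these slides.

```python
import hashlib

class ContentStore:
    """Toy in-memory store: each distinct object is kept once, keyed by its hash."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha1(data).hexdigest()
        self._blobs.setdefault(address, data)   # a duplicate put stores nothing new
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # Authenticity check: the content must still hash to its address.
        if hashlib.sha1(data).hexdigest() != address:
            raise ValueError("stored content does not match its address")
        return data

    def stored_bytes(self) -> int:
        return sum(len(blob) for blob in self._blobs.values())

store = ContentStore()
attachment = b"x" * 1000
addresses = [store.put(attachment) for _ in range(50)]  # 50 users save the same file
logical_bytes = 50 * len(attachment)                    # what the users think they stored
```

In this sketch the ratio of `logical_bytes` to `store.stored_bytes()` grows with every further duplicate, which is one way to read "No Limit on the Effective Compression Ratio". It also shows why the index that maps addresses to content becomes critical: lose it and nothing can be found.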
Commonality Factoring Using CAS • Fixed Size Atomics for Database • Variable Size Atomics for File Systems • CAS Algorithms Used to Calculate CA for Each Subset • Data Structures Needed to Reconstruct from Atomics • Above Data Kept with Atomics Data
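A minimal sketch of commonality factoring with fixed-size atomics follows. The names `factor`, `reconstruct`, and `ATOMIC_SIZE` are hypothetical, SHA-1 is an assumed hash, and the 4-byte atomic size is unrealistically small so the result is easy to inspect. Variable-size atomics, which the slide recommends for file systems, are not shown.

```python
import hashlib

ATOMIC_SIZE = 4  # bytes; far smaller than any real system would use

def factor(data: bytes, atomics: dict) -> list:
    """Split data into fixed-size atomics, store new ones, return the recipe."""
    recipe = []
    for start in range(0, len(data), ATOMIC_SIZE):
        piece = data[start:start + ATOMIC_SIZE]
        ca = hashlib.sha1(piece).hexdigest()   # content address of this atomic
        atomics.setdefault(ca, piece)          # atomics already seen are not stored again
        recipe.append(ca)
    return recipe

def reconstruct(recipe: list, atomics: dict) -> bytes:
    """Rebuild the original data from its ordered list of content addresses."""
    return b"".join(atomics[ca] for ca in recipe)

atomics = {}
monday = b"AAAABBBBCCCCDDDD"
tuesday = b"AAAABBBBXXXXDDDD"   # one atomic changed since Monday
recipe_mon = factor(monday, atomics)
recipe_tue = factor(tuesday, atomics)
```

The recipe is the "data structure needed to reconstruct from atomics". Two 16-byte versions are held as five distinct atomics plus two recipes, and only the changed atomic is new on the second day.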
CAS Example: Avamar • CAS Applied to BU/Restore, Archive and DR (initial focus BU/R) • Focus on Data Reduction • Typical Secondary to Primary Ratio is 10:1 • Avamar Claims 1.2 to 1 • Never Do Full + Incremental Backups, Only SnapUps
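To put the two ratios on this slide side by side, here is a small worked calculation. The 10 TB primary data size is an arbitrary assumption chosen for illustration; only the 10:1 and 1.2:1 ratios come from the slide, and the second is a vendor claim.

```python
primary_tb = 10.0                      # assumed size of primary data, in TB

typical_ratio = 10.0                   # typical secondary-to-primary ratio (from the slide)
claimed_ratio = 1.2                    # ratio Avamar claims (from the slide)

typical_secondary_tb = primary_tb * typical_ratio   # secondary storage at the typical ratio
claimed_secondary_tb = primary_tb * claimed_ratio   # secondary storage at the claimed ratio

# How many times less secondary storage the claimed ratio implies.
reduction_factor = typical_secondary_tb / claimed_secondary_tb
```

Under these assumptions the claimed ratio implies roughly one eighth of the secondary storage, independent of the primary size chosen.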
CAS Example: Avamar Systems Architecture • Distributed Backup Repository • Peer-to-Peer RAIN Architecture • Each Node has Uniform and Consistent View of Repository • Clients can Request Services from any Node • Data Striped Across Nodes (similar to RAID) • No Single Point of Failure • Requires Agent on Each Client System
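One way a content address can spread data across nodes is sketched below. This is not Avamar's actual placement scheme, which the slide does not describe; the node names, `node_for`, and the modulo rule are all assumptions made to illustrate how a hash can drive load balancing.

```python
import hashlib
from collections import Counter

NODES = ["node-0", "node-1", "node-2", "node-3"]  # hypothetical node names

def node_for(address: str) -> str:
    """Pick a node from the content address alone (simple modulo placement)."""
    return NODES[int(address, 16) % len(NODES)]

# Content addresses for 1000 distinct sample objects.
addresses = [hashlib.sha1(f"object-{i}".encode()).hexdigest() for i in range(1000)]

# Count how many objects land on each node.
placement = Counter(node_for(address) for address in addresses)
```

Because any node can compute `node_for` from the address, a client can ask any node for an object without a central lookup table. Plain modulo placement moves most objects when a node is added, so a production system would need a more careful scheme.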
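CAS Archival Example: EMC Centera [Diagram, source: EMC. The application hands a file to the API, which calculates the CA and extracts metadata, then stores the file as a blob. The CA and metadata go into an XML C-Clip descriptor file (CDF). The CDF is stored, and the CA of the CDF is returned to the application.]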
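The store flow labeled in this diagram (calculate CA, store blob, build CDF, store CDF, return CA of CDF) can be sketched as follows. This is not the Centera API: `store_file`, `retrieve_file`, the dictionary used as storage, and the field names are hypothetical, and JSON stands in for the XML descriptor to keep the example short.

```python
import hashlib
import json

def ca(data: bytes) -> str:
    """Content address of a byte string (MD5 assumed for illustration)."""
    return hashlib.md5(data).hexdigest()

def store_file(storage: dict, content: bytes, metadata: dict) -> str:
    """Store the blob, then a descriptor pointing at it; return the descriptor's CA."""
    blob_ca = ca(content)
    storage[blob_ca] = content                      # store file as blob
    cdf = json.dumps({"blob_ca": blob_ca, "metadata": metadata},
                     sort_keys=True).encode()       # descriptor: CA plus metadata
    cdf_ca = ca(cdf)
    storage[cdf_ca] = cdf                           # store CDF
    return cdf_ca                                   # CA of CDF returned to the application

def retrieve_file(storage: dict, cdf_ca: str):
    """Follow the descriptor back to the blob and its metadata."""
    cdf = json.loads(storage[cdf_ca])
    return storage[cdf["blob_ca"]], cdf["metadata"]

storage = {}
clip_id = store_file(storage, b"scanned invoice", {"retention": "7 years"})
content, metadata = retrieve_file(storage, clip_id)
```

The application keeps only `clip_id`. The descriptor carries the metadata, so the same blob can be referenced by several descriptors while being stored once.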
CAS Advantages: EMC Centera (Due to Architecture and Due to CAS) • No LUNs to Create or Manage • No Volumes to Create or Manage • Flat Addressing, Simple Indexing • Content Authentication • One Copy of Blob Stored • RAIN = Non-disruptive Scalability • No Reconfigs Required • No Technology Obsolescence • Policy-based Storage of Blobs • Application Modification
CAS Players: Data Center Technologies, Persist Technologies
CAS Futures: What's Needed? • Flexible Scaling Capabilities • Integration with File Interfaces • Easy API-free Application Integration • Integrated Indexing
Summary
CAS +’s: • Location Independence • Authenticity • Eliminate Redundancy • Simplify Indexing • Simplify Management • Improve Scalability • Single System Image of Repository
CAS -’s: • Many Aspects are Untested • May Require New Procedures/Tools • Disruptive Technology • Not Good Enough for High Performance Primary Needs
Taneja Group Recommendations: No Wholesale Changes! • Absolutely Test Out CAS Systems but… • Apply to a Project at a Time (consider the disruptive factor) • Keep a Fallback Position (run systems in parallel) • Test Out Recoverability Regularly • Keep in Mind… More Solutions Coming