Intelligent Archiving Strategies: Toward ILM
Arun Taneja, Founder and Consulting Analyst, Taneja Group
Alex Gorbansky, Senior Analyst, Taneja Group
Agenda
• A Bit of Historical Perspective
• Why Archive?
• What to Archive?
• The ILM Panacea
• Developing an Operational Archival Strategy
• Key Considerations
• Representative Vendors and Solutions
• Conclusions
Archival ≠ Backup
• Backup: copying production data to an alternative medium so that it can be restored in the event of data loss, corruption, or unavailability.
• Archival: retention of historical data for future access for business reasons such as audits, customer issues, or litigation.
Some History on Archiving
• Ancient Egypt (3000 BCE onward): engravings; the Library of Alexandria
• Middle Ages, shift from feudalism to the nation state: records, property rights
• 1600s, American colonists: records of births, marriages, and businesses
• 1789, French Revolution: property records
• 1884, American Historical Association: archival standards
Archival Business Drivers Today
• Regulatory compliance requirements
• Explosive data growth
• Application performance degradation
• Rising costs
What to Archive?
• Structured Data:
  • ERP/CRM DB tiers
  • Business transactions
• Unstructured Data:
  • Documents
  • X-Rays
  • Check Images
  • Voice recordings
• Semi-structured Data:
  • Email
  • Instant Messaging
ILM…ShmILM
• “ILM” is an abstract framework for describing the processes and technology used to manage information throughout its life according to its business value.
• “ILM” is NOT the panacea for your storage management challenges.
Archival is a key component of what vendors are calling “ILM”
[Diagram: the ILM stack, layer by layer]
• Applications: ERP, CRM, Email, Call Recording, Image Access
• Application Data: Structured, Unstructured, Semi-Structured
• Policies and Rules: Business Context, Referential Integrity, Regulatory Compliance
• Data Movement Technologies: Snapshots, Replication, Backup, Archival, HSM
• Storage Infrastructure Tiers: Primary, Secondary, Tertiary
Developing an Archival Strategy
1. Plan
  • When/How
  • Data Classification
  • Requirements
2. Design
3. Implement
4. Report & Test
Why Plan and When to Start
• Up-front planning will yield significant benefits in later phases.
• Develop an archival strategy as part of your application design and development process.
• Engage key stakeholders:
  • Application owners
  • Business decision makers: compliance officers, legal
• Identify key archival business drivers:
  • Regulatory compliance
  • Other: data growth, increasing costs, poor performance
The Data Classification Puzzle
• Assess the application data in your shop according to the following categories:
  • Structured: databases
  • Unstructured: files, videos, images
  • Semi-structured: email
• Identify specific data sets impacted by regulatory compliance:
  • Examples: email, medical records, call recordings
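A first pass at this classification exercise can be automated by bucketing files by extension. The sketch below is a minimal Python illustration under stated assumptions, not a product feature: the `CATEGORY_BY_EXTENSION` table, the `COMPLIANCE_SENSITIVE` set, and the `classify` helper are hypothetical and would need tuning to the data actually found in your shop.

```python
from pathlib import Path

# Hypothetical extension-to-category table; extend it for your own environment.
CATEGORY_BY_EXTENSION = {
    ".db": "structured",
    ".sql": "structured",
    ".eml": "semi-structured",
    ".msg": "semi-structured",
    ".pst": "semi-structured",
    ".doc": "unstructured",
    ".pdf": "unstructured",
    ".tif": "unstructured",  # e.g. check images
    ".dcm": "unstructured",  # e.g. DICOM X-rays
    ".wav": "unstructured",  # e.g. call recordings
    ".mp4": "unstructured",
}

# Hypothetical set of extensions whose data sets are flagged for compliance
# review (email, medical records, call recordings).
COMPLIANCE_SENSITIVE = {".eml", ".msg", ".pst", ".dcm", ".wav"}


def classify(path: str) -> tuple[str, bool]:
    """Return (category, needs_compliance_review) for a single file path."""
    ext = Path(path).suffix.lower()
    category = CATEGORY_BY_EXTENSION.get(ext, "unclassified")
    return category, ext in COMPLIANCE_SENSITIVE
```

For example, `classify("mail/2004.pst")` returns `("semi-structured", True)`, while an unknown extension falls into `"unclassified"` so it surfaces for manual review instead of being silently ignored.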
Requirements Definition
• Engage application owners.
• Compliance is not the ONLY archival driver.
• Run separate requirements processes for applications impacted by compliance.
Compliance-specific requirements:
• Retention period
• Media characteristics
• Data restorability rates
• Access control policies
• Data availability/DR
General archival requirements:
• Data access patterns
• Restore time requirements
• Application performance
• Cost structure
• Access control policies
• Data availability/DR
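Requirements such as the retention period are easier to enforce when they are captured as data rather than prose. A minimal sketch, assuming one hypothetical `ArchivalRequirements` record per application; the field names are illustrative, and the six-year value used below is a placeholder, not a regulatory figure.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArchivalRequirements:
    """Hypothetical per-application requirements record."""
    application: str
    compliance_driven: bool
    retention_years: int
    max_restore_hours: float


def retention_expires(created: date, years: int) -> date:
    """Date on which a record created on `created` has been kept `years` years."""
    try:
        return created.replace(year=created.year + years)
    except ValueError:
        # Feb 29 with a non-leap target year: round up to March 1 so the
        # record is never released early.
        return date(created.year + years, 3, 1)


def eligible_for_disposal(req: ArchivalRequirements, created: date, today: date) -> bool:
    """True once the retention period for this application has fully elapsed."""
    return today >= retention_expires(created, req.retention_years)
```

With `ArchivalRequirements("email", True, 6, 24.0)`, a record created on 1 January 1999 becomes eligible for disposal on 1 January 2005. Rounding the leap-day case up rather than down errs on the side of keeping data longer.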
Taming the Compliance Monster
• Understand the regulations: there is significant variance by industry.
• Assess and communicate requirements to key business stakeholders.
• Judge products for yourself – just because a vendor says a solution is “compliant” doesn’t make it so.
• Stay abreast of changes in regulatory mandates.
Defining Key Archival Metrics
• Archive distribution percentages across:
  • Online: disk, object-based storage
  • Near-line: optical, tape (local)
  • Off-line: off-site vaults
• Number of data copies:
  • Local
  • Remote
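Both metrics reduce to simple arithmetic once capacity per tier is known. The sketch below assumes the capacity figures have already been collected (for example from SRM reports); the function names and tier labels are hypothetical.

```python
def distribution_percentages(capacity_by_tier: dict[str, float]) -> dict[str, float]:
    """Share of total archive capacity held in each tier, as percentages."""
    total = sum(capacity_by_tier.values())
    if total == 0:
        return {tier: 0.0 for tier in capacity_by_tier}
    return {tier: round(100.0 * cap / total, 1) for tier, cap in capacity_by_tier.items()}


def stored_capacity(logical: float, local_copies: int, remote_copies: int) -> float:
    """Raw capacity consumed when every archived object is kept in the given number of copies."""
    return logical * (local_copies + remote_copies)
```

For instance, 20 TB online, 50 TB near-line, and 30 TB off-line gives a 20/50/30 percent split, and 100 TB of archive data kept as two local copies plus one remote copy consumes 300 TB of raw capacity.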
Designing an Archival Solution
• Requires an application-specific assessment; look for commonality in application requirements.
• Wholly enterprise-wide strategies will be difficult to build and sustain.
• Evaluate alternative solutions based on application requirements and metrics.
Don’t Ignore the Organizational Dynamics
• Archival touches multiple organizations:
  • IT – Applications
  • IT – Infrastructure
  • Legal
  • Users
• Consequences of mistakes are enormous:
  • Fines
  • Litigation
• Consider organizing a cross-functional team led by an archival champion with a combination of technical and business expertise.
Comprehensive Application Assessment
• Data classification exercise
• Data set size, plus historical and predicted data growth rates based on business drivers
• Is regulatory compliance an issue?
• Data valuation over time:
  • Access patterns of data 90 days old and older
  • Cost of data loss
• Going it alone can be difficult. Available resources:
  • Services organizations: GlassHouse, Accenture, EDS, storage vendors
  • Application management tools: file-level SRM, Precise
• Budgetary requirements
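For a first look at how much data is 90 days old and older, a short script can total file sizes by age. This is a minimal sketch in the spirit of file-level SRM reporting, not a substitute for such a tool. It buckets on modification time (`st_mtime`) on the assumption that last-access times are often disabled or unreliable on production file systems, and the bucket boundaries are illustrative.

```python
import os
import time
from typing import Optional

DAY = 86400  # seconds per day


def bytes_by_age(root: str, now: Optional[float] = None) -> dict[str, int]:
    """Total file bytes under `root`, bucketed by days since last modification."""
    now = time.time() if now is None else now
    buckets = {"<90 days": 0, "90-365 days": 0, ">365 days": 0}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            try:
                st = os.stat(os.path.join(dirpath, name))
            except OSError:
                continue  # file vanished or is a broken symlink; skip it
            age_days = (now - st.st_mtime) / DAY
            if age_days < 90:
                buckets["<90 days"] += st.st_size
            elif age_days <= 365:
                buckets["90-365 days"] += st.st_size
            else:
                buckets[">365 days"] += st.st_size
    return buckets
```

Running `bytes_by_age("/data/projects")` (a hypothetical path) periodically and comparing results over time gives a rough growth trend to feed into the data valuation step.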
Components of the Archival Stack
[Diagram: application data flows through a management-and-control layer into a physical repository]
Application-specific module (management & control):
• Discovery and analysis of data assets
• Business rules and policy definitions
• Identification and movement of specific data to the appropriate storage medium
• Management and indexing of data and metadata
• Access control mechanism
Storage infrastructure (physical repository):
• Physical archive repository
• Data preservation and protection
• Indexing technologies for retrieval
Structured Data Archival Challenges to Investigate
• ERP deployments are still very nascent
• Preventing application downtime during archival
• Preserving referential data integrity:
  • Archival of core data together with associated data in other tables
  • Enforcing a single read-only state across related data
• Delivering transparent access to archived/combined data via the native application UI
• Maintaining performance of remote queries and union views
• Update process:
  • Restate vs. entire reload
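The referential-integrity challenge is easiest to see in a small example. The sketch below uses Python's built-in `sqlite3` module with a hypothetical `orders`/`order_lines` schema. Parent rows and their line items move to archive tables inside one transaction, triggers keep archived rows read-only, and `UNION ALL` views give queries one place to read live and archived data. A real ERP archive would work through the application's own data model and tools; this only illustrates the technique.

```python
import sqlite3

# Hypothetical schema: live tables, matching archive tables, and union views.
SCHEMA = """
CREATE TABLE orders (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
CREATE TABLE order_lines (id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders(id), sku TEXT);
CREATE TABLE orders_archive (id INTEGER PRIMARY KEY, order_date TEXT NOT NULL);
CREATE TABLE order_lines_archive (id INTEGER PRIMARY KEY,
    order_id INTEGER NOT NULL REFERENCES orders_archive(id), sku TEXT);
-- Archived rows are read-only: block updates on both archive tables.
CREATE TRIGGER orders_archive_ro BEFORE UPDATE ON orders_archive
BEGIN SELECT RAISE(ABORT, 'archived rows are read-only'); END;
CREATE TRIGGER order_lines_archive_ro BEFORE UPDATE ON order_lines_archive
BEGIN SELECT RAISE(ABORT, 'archived rows are read-only'); END;
-- Union views give reports transparent access to live plus archived rows.
CREATE VIEW all_orders AS SELECT * FROM orders UNION ALL SELECT * FROM orders_archive;
CREATE VIEW all_order_lines AS
    SELECT * FROM order_lines UNION ALL SELECT * FROM order_lines_archive;
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA foreign_keys = ON")  # enforce parent/child references
    conn.executescript(SCHEMA)
    return conn


def archive_orders_before(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move orders dated before `cutoff` (ISO date) and their line items to the
    archive tables in one transaction; return the number of orders moved."""
    old_ids = "SELECT id FROM orders WHERE order_date < ?"
    with conn:  # commits if every statement succeeds, rolls back otherwise
        # Parents first on insert, children first on delete, so every
        # foreign-key reference stays valid at each step.
        conn.execute(
            "INSERT INTO orders_archive SELECT * FROM orders WHERE order_date < ?",
            (cutoff,))
        conn.execute(
            f"INSERT INTO order_lines_archive SELECT * FROM order_lines "
            f"WHERE order_id IN ({old_ids})", (cutoff,))
        conn.execute(
            f"DELETE FROM order_lines WHERE order_id IN ({old_ids})", (cutoff,))
        cur = conn.execute("DELETE FROM orders WHERE order_date < ?", (cutoff,))
    return cur.rowcount
```

Because all four statements share one transaction, a failure partway through leaves neither orphaned line items in the live tables nor half-archived orders in the archive.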
Unstructured Data Considerations
• Scalability:
  • Sustained performance with data growth
  • Hierarchical file systems are limited at large scales
• Content access and visibility:
  • Using metadata to intelligently manage and maintain the archive addresses traditional file-system limitations
  • Scalability of the index (content addresses)
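Content addressing replaces a deep directory path with an address derived from the object's own bytes, and keeps descriptive metadata in a flat index. The in-memory sketch below assumes SHA-256 as the addressing hash. The `ContentAddressedStore` class is hypothetical and stands in for a real object-storage product, which would persist objects and distribute the index.

```python
import hashlib


class ContentAddressedStore:
    """In-memory sketch: objects are addressed by the SHA-256 of their bytes,
    and a flat metadata index replaces a deep directory hierarchy."""

    def __init__(self) -> None:
        self._objects: dict[str, bytes] = {}
        self._metadata: dict[str, dict[str, str]] = {}

    def put(self, data: bytes, **metadata: str) -> str:
        address = hashlib.sha256(data).hexdigest()
        self._objects.setdefault(address, data)  # identical content is stored once
        self._metadata.setdefault(address, {}).update(metadata)
        return address

    def get(self, address: str) -> bytes:
        data = self._objects[address]
        # The address doubles as a fixity check on every read.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("content does not match its address")
        return data

    def find(self, **criteria: str) -> list[str]:
        """Addresses whose metadata matches every key/value in `criteria`."""
        return [addr for addr, md in self._metadata.items()
                if all(md.get(k) == v for k, v in criteria.items())]
```

Storing the same X-ray twice yields one object and one address, and a lookup such as `store.find(kind="xray")` queries the metadata index without walking any directory tree.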
Email Archival Challenges
• Stringent regulations: SEC Rule 17a-4
  • Non-rewriteable, non-reusable media
  • Verification of writes
  • Serialization of units of media
• Solution requirements:
  • Server-based capture
  • Support for multiple distributed email servers
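Two of the mechanics listed, refusing to rewrite stored data and verifying each write, can be sketched at the file level. This minimal Python sketch assumes a POSIX file system. It creates each file with `O_EXCL` so an existing message is never overwritten, creates it with read-only permissions, flushes it with `fsync`, and re-reads it to compare SHA-256 digests. File permissions are not WORM media, so on its own this would not satisfy Rule 17a-4; the `write_once` helper and the file naming are illustrative only.

```python
import hashlib
import os


def write_once(path: str, data: bytes) -> str:
    """Write `data` to a new read-only file, refusing to overwrite an existing
    one, then read it back and verify it. Returns the SHA-256 hex digest."""
    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL  # O_EXCL: fail if the file exists
    fd = os.open(path, flags, 0o444)  # created read-only for later opens
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # push the data to stable storage before verifying
    finally:
        os.close(fd)
    expected = hashlib.sha256(data).hexdigest()
    with open(path, "rb") as f:
        if hashlib.sha256(f.read()).hexdigest() != expected:
            raise OSError(f"write verification failed for {path}")
    return expected
```

A second call with the same path raises `FileExistsError`, which is the behavior an archive ingest process should treat as an error rather than retry with overwrite.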
Meta Data Holds Real Value
Meta data is data about data. Examples:
• Object age and creation date
• Object change history
• Associated application/users
• Access control
• Priority/criticality
• Data access frequency
Traditional file systems:
• Digital asset tied to specific infrastructure
• No value outside of infrastructure context
Object-based systems:
• Self-describing attributes for the digital asset
• Enable powerful policy-based data movement applications
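Self-describing metadata is what makes policy-based movement possible: the policy reads attributes carried by the object instead of inferring value from where the object sits in a directory tree. A minimal sketch, assuming a hypothetical `ArchiveObject` record and thresholds chosen only for illustration; the tier names follow the online/near-line/off-line split used for archive metrics in this deck.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class ArchiveObject:
    """Object carrying self-describing metadata (attribute names are illustrative)."""
    object_id: str
    created: date
    last_accessed: date
    owner_app: str
    criticality: str  # e.g. "high", "normal", "low"


def target_tier(obj: ArchiveObject, today: date) -> str:
    """Hypothetical policy: choose placement from the object's own metadata."""
    idle_days = (today - obj.last_accessed).days
    if obj.criticality == "high" or idle_days < 90:
        return "online"
    if idle_days < 365:
        return "near-line"
    return "off-line"
```

Because the rule depends only on the object's attributes, the same policy works regardless of which server or file system currently holds the object.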
[Figure: Choosing the Right Storage Medium. Probability of reuse drives the choice of medium. Axes: amount of data and life expectancy (1 week, 1 month, 3 months, 1 year, 18–30 years). Media shown: disk systems, D2D systems, object storage, libraries. Recovery times shown: < seconds, minutes, hours to days.]
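The figure's point, that access and recovery needs should drive the choice of medium, can be expressed as picking the cheapest medium that still meets a recovery-time requirement. In this sketch the recovery times and relative costs in `MEDIA` are illustrative placeholders loosely following the figure's ranges (sub-second, minutes, hours to days), not vendor data; replace them with values measured in your own environment.

```python
from typing import NamedTuple


class Medium(NamedTuple):
    name: str
    worst_case_recovery_s: float  # assumed worst-case time to get data back
    relative_cost: float          # assumed cost per GB, arbitrary units


# Illustrative placeholders only; substitute measured values.
MEDIA = [
    Medium("disk system", 1, 10.0),
    Medium("object storage", 1, 6.0),
    Medium("D2D system", 15 * 60, 3.0),
    Medium("tape library", 2 * 24 * 3600, 1.0),
]


def cheapest_medium(max_recovery_s: float) -> Medium:
    """Lowest-cost medium whose worst-case recovery time meets the requirement."""
    candidates = [m for m in MEDIA if m.worst_case_recovery_s <= max_recovery_s]
    if not candidates:
        raise ValueError("no medium meets the recovery-time requirement")
    return min(candidates, key=lambda m: m.relative_cost)
```

Under these placeholder numbers, a one-hour recovery requirement selects the D2D system, while a one-week requirement allows the tape library.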
Key Considerations for Storage Media
• Cost
• Access time
• Application access method:
  • NFS/CIFS
  • Application-specific API
• Reliability/availability
• Data preservation capability
• Scalability
• Archival solution integration
Shifting Toward an Online Model
[Figure: storage shown includes primary storage, SATA, object storage, and tape.]
Representative Vendors
• Start with your application vendor.
Trust But Verify
• Develop processes to periodically access historical data to test:
  • Data integrity
  • Access time
• Manage capacity growth using vendor-supplied reporting tools.
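A periodic check can sample archived objects, re-read them, compare each digest with the one recorded at archive time, and time the read, covering both data integrity and access time. A minimal sketch, assuming a manifest of SHA-256 digests captured at ingest and a caller-supplied `read_object` function for whatever repository is in use; both are hypothetical.

```python
import hashlib
import random
import time
from typing import Callable, Optional


def pick_sample(manifest: dict[str, str], k: int, seed: Optional[int] = None) -> list[str]:
    """Choose up to k object IDs at random; a fixed seed makes runs repeatable."""
    ids = sorted(manifest)
    return random.Random(seed).sample(ids, min(k, len(ids)))


def verify_sample(read_object: Callable[[str], bytes],
                  manifest: dict[str, str],
                  sample_ids: list[str]) -> dict[str, tuple[bool, float]]:
    """Map each sampled ID to (digest_matches, read_seconds)."""
    results = {}
    for object_id in sample_ids:
        start = time.perf_counter()
        data = read_object(object_id)
        elapsed = time.perf_counter() - start
        matches = hashlib.sha256(data).hexdigest() == manifest[object_id]
        results[object_id] = (matches, elapsed)
    return results
```

Logging these results on every run builds a history of both integrity failures and retrieval times, which is the evidence this slide asks for.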
Summary
• Archival is not backup and is not just about compliance.
• A successful strategy requires an application-centric approach.
• Engage with key corporate stakeholders to define requirements and select solutions.
• Look for automated and interoperable software and hardware modules.
• Be paranoid!
Thank you!
• Arun Taneja: arunt@tanejagroup.com
• Alex Gorbansky: alex@tanejagroup.com