Cost-effective BC/DR with VMware Site Recovery Manager (SRM) and LeftHand Networks Presented by Stephan Stelter, LeftHand Networks
Agenda • Introduction • Definitions: Business Continuance, High Availability, Disaster Recovery, RPO, RTO • The impact of disasters and downtime; virtualization to the rescue! • Relevant LeftHand Networks Products and Features • Key features of LeftHand Networks SANs that provide BC/DR benefits to VMware environments • Customer Examples • How are LeftHand Networks customers using VMware for BC/DR?
Business Continuance, defined • “According to a recent Gartner Group document, a business continuance plan should include: • 1) a disaster recovery plan, which specifies an organization's planned strategies for post-failure procedures • 2) a business resumption plan, which specifies a means of maintaining essential services at the crisis location • 3) a business recovery plan, which specifies a means of recovering business functions at an alternate location • 4) and a contingency plan, which specifies a means of dealing with external events that can seriously impact the organization.” • – SearchStorage.com
High Availability, Disaster Recovery, RPO, RTO • High Availability - refers to a system or component that is continuously operational for a desirably long length of time • Disaster Recovery (plan) - describes how an organization is to deal with potential disasters; disaster recovery planning involves an analysis of business processes and continuity needs; it may also include a significant focus on disaster prevention • Recovery Point Objective - the age of files that must be recovered for normal operations to resume if a system goes down as a result of a failure • Recovery Time Objective - the maximum tolerable length of time that a computer, system, network, or application can be down after a failure or disaster occurs
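To make the RPO and RTO definitions concrete, here is a minimal Python sketch that measures what an incident actually achieved and compares it with a stated objective. The function names, the timeline, and the objective values are all hypothetical, chosen only for illustration; they do not come from the presentation.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_good_copy: datetime, failure: datetime) -> timedelta:
    """Age of the newest recoverable copy at the moment of failure."""
    return failure - last_good_copy


def achieved_rto(failure: datetime, service_restored: datetime) -> timedelta:
    """How long the service was down after the failure."""
    return service_restored - failure


def meets_objective(achieved: timedelta, objective: timedelta) -> bool:
    """An objective is met when the achieved value does not exceed it."""
    return achieved <= objective


# Hypothetical incident timeline (illustrative values only).
last_copy = datetime(2008, 1, 1, 2, 0)   # newest copy available for recovery
failure = datetime(2008, 1, 1, 2, 40)    # system goes down
restored = datetime(2008, 1, 1, 3, 10)   # service is back

rpo = achieved_rpo(last_copy, failure)   # 40 minutes of changes lost
rto = achieved_rto(failure, restored)    # 30 minutes of downtime

print(meets_objective(rpo, timedelta(hours=1)))     # RPO objective: 1 hour
print(meets_objective(rto, timedelta(minutes=15)))  # RTO objective: 15 minutes
```

In this made-up timeline the one-hour RPO objective is met, while the 15-minute RTO objective is missed.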
The impact of disasters - do you have a plan? • Every year, one out of 500 data centers will experience a severe disaster (McGladrey and Pullen) • 43% of companies experiencing disasters never re-open, and 29% close within two years (McGladrey and Pullen) • 93% of businesses that lost their data center for 10 days went bankrupt within one year (National Archives & Records Administration)
When we speak with customers, many of them ask – How can I reduce recovery times from hours to minutes? How can I simply automate my disaster recovery plan? How can I affordably eliminate application downtime and prevent data loss? How can I test my disaster recovery plan quickly and easily?
Virtualization to the rescue? • Application server consolidation onto fewer physical servers exposes users to more application downtime in the event of a hardware failure • Traditional servers: one server failure, one application goes down • Virtualized servers: one server failure, ALL applications go down • Delivering high availability requires application and storage HA
Virtualization to the rescue! • Distributed Resource Scheduler (DRS) • VMotion • Consolidated Backup (VCB) • High Availability (HA)
But wait, there’s more! • Distributed Resource Scheduler (DRS) • VMotion • Consolidated Backup (VCB) • High Availability (HA)
VMware Site Recovery Manager Site Recovery Manager leverages VMware Infrastructure to deliver advanced disaster recovery management and automation • Simplifies and automates disaster recovery workflows: • Setup, testing, failover • Turns manual recovery runbooks into automated recovery plans • Provides central management of recovery plans from VirtualCenter • Works with VMware Infrastructure to make disaster recovery rapid, reliable, manageable, affordable
What is VMware Site Recovery Manager? • Simplifies, coordinates, automates storage and application disaster recovery. • Simplifies set up and management of DR plans to lower DR cost. • Enables DR plan testing for storage and applications to ensure reliability. • Coordinates and automates storage and application failover for faster availability. [Timeline: Preparation (Set Up and Planning, Testing), then Disaster Happens (Failover & Failback)]
Disaster Recovery Solution [Diagram: a Production site and a Disaster Recovery site, each with Site Recovery Manager, VirtualCenter, Virtual Machines, VMware Infrastructure, Servers, and Storage; protected virtual machines are marked; the two sites are labeled with LeftHand Remote Copy]
How Site Recovery Manager works • Pre-program your DR plan • Test to ensure reliability • Disaster strikes! • Site failure is detected • Alert when heartbeat lost • Initiate failover • User confirmation of outage • Granular failover initiation • Manage replication failover • Break replication • Make replica visible to recovery hosts • Execute recovery process • Use pre-programmed plan • Provide visibility into progress Question: What RTO have we achieved? What RPO have we achieved?
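The failover sequence on this slide can be sketched as an ordered plan that runs only after a user confirms the outage. This is a toy model in Python, not the Site Recovery Manager API: the `RecoveryPlan` class, its `run` method, and the log format are hypothetical, and the step names simply paraphrase the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class RecoveryPlan:
    """A pre-programmed list of recovery steps (hypothetical model)."""
    steps: List[str]
    log: List[str] = field(default_factory=list)

    def run(self, confirm_outage: Callable[[], bool]) -> bool:
        # Failover is gated on explicit user confirmation of the outage.
        if not confirm_outage():
            self.log.append("failover aborted: outage not confirmed")
            return False
        # Execute the steps in order and record progress for visibility.
        for number, step in enumerate(self.steps, start=1):
            self.log.append(f"step {number}/{len(self.steps)}: {step}")
        return True


plan = RecoveryPlan(steps=[
    "break replication",
    "make replica visible to recovery hosts",
    "execute pre-programmed recovery process",
])

# A lost heartbeat would raise an alert; here the operator confirms the outage.
completed = plan.run(confirm_outage=lambda: True)
for entry in plan.log:
    print(entry)
```

Keeping the confirmation as a callable mirrors the slide's point that a lost heartbeat raises an alert, but a person decides whether to fail over.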
LeftHand Networks, Inc. • Leader in iSCSI SANs • Pioneer in the IP SAN market, founded in 1999 • Highly available, simple to manage, and “grow as needed” architecture • Rapid market acceptance and growth • More than 10,000 installations; over 3,000 customers • Strategic VARs and resellers in North America and Europe • Strategic industry partnerships: Microsoft, VMware, Citrix
Typical Storage Array Architecture: Scale-up Storage • Monolithic array • Not scalable • Controller head becomes a bottleneck • Scales capacity only • Single point of failure • Forklift upgrades • Provisioning capacity tends to involve manipulating individual disks and RAID levels for each LUN or volume
The LeftHand Networks Difference Scale-everything architecture pairs redundant hardware with enterprise-class features
SAN/iQ Storage Clustering: True clustering brings reliability, performance, and ease of management • Storage Cluster • Aggregates all components for performance • Data is load balanced across all nodes • Predictable scalability • Grow on Your Terms • Non-disruptive scalability • No forklift upgrades • Scale everything • Throttle Bandwidth • Create Tiers of Storage • Create a tiered environment for different performance requirements • Online Volume Migration • Simple Centralized Management • Provisioning • Monitoring • Security [Diagram labels: SAS, SATA, Centralized Management Console]
SAN/iQ Network RAID: Integrates Synchronous Replication with Automated Failover and Failback • Beyond Component Redundancy • Protects data from array failure • Synchronous Replication • Configure on a per-volume basis • Change RAID level on-the-fly • High Availability • Multiple disks, controllers, or arrays • Zero disruption of data access • Ensures “high availability” for data [Diagram: SAN/iQ Cluster whose nodes hold the block pairs A D, A B, C B, C D]
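The block letters in this slide's diagram suggest that each block is stored on two neighboring nodes, so that losing any one node leaves a copy of every block online. The Python sketch below illustrates that idea with a simple ring placement. It is an assumption inferred from the diagram, not a description of how SAN/iQ actually places data, and the function names are hypothetical.

```python
def place_blocks(blocks, node_count, copies=2):
    """Assign each block to `copies` consecutive nodes around a ring.

    Assumed placement rule for illustration only.
    """
    placement = {node: [] for node in range(node_count)}
    for index, block in enumerate(blocks):
        for offset in range(copies):
            placement[(index + offset) % node_count].append(block)
    return placement


def surviving_blocks(placement, failed_node):
    """Blocks still readable when one node is offline."""
    return {
        block
        for node, held in placement.items()
        if node != failed_node
        for block in held
    }


layout = place_blocks("ABCD", node_count=4)
for node, held in layout.items():
    print(node, held)

# Every block remains available no matter which single node fails.
for failed in layout:
    print(failed, sorted(surviving_blocks(layout, failed)))
```

With four blocks and four nodes, the nodes end up holding A and D, A and B, B and C, and C and D, the same pairs shown in the diagram.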
SAN/iQ Multi-site SAN: Real-time protection from site failure • Protect Storage By: Rack, Room, Floor, Building, Site • Keep Data Online During: Facility disruption, Natural disaster, Site maintenance • Volumes Remain Online [Diagram: a SAN/iQ Cluster configured as a SAN/iQ Multi-site SAN, with nodes holding the block pairs A D, A B, C B, C D]
SAN/iQ Remote Copy: Time and space-efficient asynchronous replication for disaster recovery and backups • Remote Copy • Asynchronous Replication • Per volume basis • Scheduled or manual • Thin provisioned • Simple to Manage • Bandwidth management • Failover / Failback Wizard [Diagram: Recovery Server; SAN 1 Vol_1 (Primary) replicating to SAN 2 Vol_1 (Remote): 1:00 Baseline Copy, 2:00 Incremental Copy, 3:00 Incremental Copy]
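Because Remote Copy is asynchronous and scheduled, the schedule bounds the recovery point that can be achieved. A rough Python sketch of that arithmetic follows, using the hourly schedule from the diagram (1:00, 2:00, 3:00). The 10-minute transfer time and the function name are assumptions for illustration, and the bound assumes each copy finishes before the next one starts.

```python
from datetime import timedelta


def worst_case_rpo(interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Upper bound on data loss for scheduled snapshot replication.

    If the primary site fails just before an incremental copy finishes
    arriving, the remote site only holds the previous completed copy,
    which is one full interval plus one transfer time old.
    """
    if transfer_time >= interval:
        raise ValueError("this sketch assumes a copy finishes before the next starts")
    return interval + transfer_time


hourly = timedelta(hours=1)             # schedule shown in the diagram
assumed_transfer = timedelta(minutes=10)  # hypothetical transfer duration

print(worst_case_rpo(hourly, assumed_transfer))
```

Under these assumptions the worst case is 70 minutes of lost changes, which is why an RPO of zero calls for synchronous replication instead.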
Virtual SAN Appliance for VMware ESX: High Availability for Server & Storage • Full Featured Virtual SAN • SAN/iQ within an ESX virtual machine • Virtualizes an ESX server’s internal disk resources • Significant storage footprint (up to 2TB) • Only SAN appliance on VMware SAN/Storage HCL • For Remote/Branch Offices • SAN/iQ cluster within ESX • Highly Available storage across multiple ESX systems • Shared storage for VMs • In the event of an ESX failure: • SAN/iQ keeps volume online • VMware HA will fail over VMs [Diagram: three VSAs]
VSA as Remote Office / Branch Office Replication Client • Cost effective DR solution • Provide HA for stranded sites • Replicate data with SAN/iQ Remote Copy to central data center [Diagram: VSA Clusters at remote offices and a SAN/iQ Cluster]
LeftHand SAN Integration with Site Recovery Manager • Site Recovery Manager • Manages and monitors recovery plans • Tightly integrated with VirtualCenter • VMware Infrastructure • Requires ESX Server 3.0.2 or later • Requires VirtualCenter 2.5 or later • LeftHand iSCSI SAN Storage • On VMware SAN/Storage Compatibility Guide • LeftHand Remote Copy • Storage Replication Adapter certified by VMware • One of First Vendors With Certified Adapter [Diagram labels: VirtualCenter, Site Recovery Manager, Virtual Machines, VMware Infrastructure, Servers, Storage, LeftHand Remote Copy]
University of Maryland School of Medicine: HA/DR Project • The fifth oldest medical school in the United States • Established in 1807 • On the University of Maryland, Baltimore campus, the School of Medicine serves as the foundation for a large academic health center that combines medical education, biomedical research, patient care and community service • Recognized technology leadership within the University of Maryland • Adoption of Server and Storage Virtualization • The Challenge: Provide high availability and effective disaster recovery across geographically separated data centers
SAN/iQ Multi-Site SAN and VMware ESX Cluster • SAN/iQ Cluster is configured with equal storage in each site • SAN/iQ Network RAID replicates data between sites synchronously • ESX cluster is configured with equal hosts in each site • In the event of a site failure • SAN/iQ keeps volumes available • ESX High Availability boots up virtual machines lost at the failed site • When the failed site comes back online • ESX rebalances virtual machines (DRS) [Diagram: VMware ESX HA Cluster on a SAN/iQ Cluster; a Virtual Volume / LUN of 6 Blocks (A to F), with each block shown twice in the cluster]
The Result: Reduced Unexpected Downtime From Hours To Seconds! • “Our solution combined the VMware HA feature with LeftHand’s Multi-Site SAN capability that synchronously replicates data between multiple sites,” says Jimmy Reid. “As a result, when we had a power outage affect one of our sites, the combined solution detected a failure within 15 seconds and restarted the virtual machines within a minute, as opposed to the several hours that would be needed for an administrator to physically go to the site and bring the servers online.”
Charlotte County: Server and Storage Project • Project goals • Cost effective server and storage solution • Reduce physical server sprawl • Reduce operational expense requirements • Scalable • Survivable
Charlotte County IT Environment • 900 Windows XP workstations (PCs and laptops) • 60 Microsoft Standard and Enterprise 2000 and 2003 servers • HP DL320 1U servers • IBM LS41 AMD Opteron blade servers • Applications • Exchange 2003, SQL 2000 SP4, SQL 2005 • File servers housing dept. shared data, user home directories and misc flat file print servers • VMware & LeftHand • VMware Infrastructure 3 with VMotion, DRS, HA • 54TB of LeftHand iSCSI SAN storage
Charlotte County Data Centers • Two tiers of storage needed in each site • Need both sites operational if link fails • Need RPO of zero if site disaster occurs [Diagram: Administration Building and Public Safety Building, 27 km apart, connected by Single Mode Fiber, 10Gb Ethernet]
[Diagram: Murdock Administration Building and Public Safety Building connected by a 10 Gb link; each building has a Failover Manager, an ESX Cluster, and two iSCSI SAN clusters (Cluster 1 and Cluster 2, with SAS and SATA tiers), for 4 iSCSI SAN Storage Clusters in total]
Current Results and Future Plans • Current Results • Migrated approximately 1300 Exchange mailboxes to new VMware based Exchange servers connected to LeftHand iSCSI SAN (SAS based) • 12 to 15 virtual machines: SQL 2005, Exchange, flat file servers • Tested fiber cut scenario, worked flawlessly • Future Plans • Continue migrating all physical servers to virtual servers attached to LeftHand iSCSI SAN • Next phase will include dept. shared data and users’ home directory data migrated to LeftHand iSCSI SAN (SATA based)
Benefits of VMware and LeftHand Networks for Business Continuance • High Availability • LeftHand’s Network RAID combined with VMware HA delivers superior high availability • Simple to deploy and manage • Disaster Recovery • Site Recovery Manager and LeftHand SANs • Certified solution • Simple Setup and Management • Fast, Automated Recovery • Easy Disaster Recovery Tests