1.28k likes | 1.54k Views
page 2. hp enterprise technical symposium. filename.ppt . Topics. TerminologyTechnologyReal-world examples. page 3. hp enterprise technical symposium. filename.ppt . High Availability (HA). Ability for application processing to continue with high probability in the face of common (mostly hardware) failuresTypical technologies:Redundant power supplies and fansRAID for disksClusters of serversMultiple NICs, redundant routersFacilities: Dual power feeds, n 1 air conditioning units33532
E N D
1. page 1 hp enterprise users week .ppt Disaster-Tolerant Cluster Technology & Implementation
Keith Parris
Systems/Software EngineerHP Services Multivendor Systems Engineering
Session P151
Wednesday, 20 May 2003
2. page 2 hp enterprise technical symposium filename.ppt Topics Terminology
Technology
Real-world examples Well first clarify some of the terminology used in the industry, and what it means. Then well look at the technology that allows one to implement a disaster-tolerant cluster solution. Finally, Ill tell you some stories from real experiences that illustrate some key concepts.Well first clarify some of the terminology used in the industry, and what it means. Then well look at the technology that allows one to implement a disaster-tolerant cluster solution. Finally, Ill tell you some stories from real experiences that illustrate some key concepts.
3. page 3 hp enterprise technical symposium filename.ppt High Availability (HA) Ability for application processing to continue with high probability in the face of common (mostly hardware) failures
Typical technologies:
Redundant power supplies and fans
RAID for disks
Clusters of servers
Multiple NICs, redundant routers
Facilities: Dual power feeds, n+1 air conditioning units, UPS, generator Here we define High AvailabilityHere we define High Availability
4. page 4 hp enterprise technical symposium filename.ppt Fault Tolerance (FT) The ability for a computer system to continue operating despite hardware and/or software failures
Typically requires:
Special hardware with full redundancy, error-checking, and hot-swap support
Special software
Provides the highest availability possible within a single datacenter Fault Tolerant products go beyond the usual High Availability products. HP NonStop products fall into the Fault Tolerant category.
No matter how good a High Availability or Fault Tolerant solution you have, if the datacenter it resides in is destroyed, it can no longer support your business.
Fault Tolerant products go beyond the usual High Availability products. HP NonStop products fall into the Fault Tolerant category.
No matter how good a High Availability or Fault Tolerant solution you have, if the datacenter it resides in is destroyed, it can no longer support your business.
5. page 5 hp enterprise technical symposium filename.ppt Disaster Recovery (DR) Disaster Recovery is the ability to resume operations after a disaster
Disaster could be destruction of the entire datacenter site and everything in it
Implies off-site data storage of some sort Disaster Recovery is a fairly well-understood concept in the industry, and most companies plan for DR needs. Key here is storing a copy of business data in a safe location, off-site.Disaster Recovery is a fairly well-understood concept in the industry, and most companies plan for DR needs. Key here is storing a copy of business data in a safe location, off-site.
6. page 6 hp enterprise technical symposium filename.ppt Disaster Recovery (DR) Typically,
There is some delay before operations can continue (many hours, possibly days), and
Some transaction data may have been lost from IT systems and must be re-entered In a typical DR scenario, significant time elapses before one can resume IT functions, and some amount of data typically needs to be re-entered to bring the system data back up to date.In a typical DR scenario, significant time elapses before one can resume IT functions, and some amount of data typically needs to be re-entered to bring the system data back up to date.
7. page 7 hp enterprise technical symposium filename.ppt Disaster Recovery (DR) Success hinges on ability to restore, replace, or re-create:
Data (and external data feeds)
Facilities
Systems
Networks
User access The key to a successful recovery is to be able to restore your data onto suitable computing hardware in a suitable facility, with access by the users.The key to a successful recovery is to be able to restore your data onto suitable computing hardware in a suitable facility, with access by the users.
8. page 8 hp enterprise technical symposium filename.ppt DR Methods:Tape Backup Data is copied to tape, with off-site storage at a remote site
Very-common method. Inexpensive.
Data lost in a disaster is: all the changes since the last tape backup that is safely located off-site
There may be significant delay before data can actually be used Tape backups are the most common method companies use for DR. Cost is low. Some care is needed to ensure the backup tapes actually get to an off-site storage location, and in a timely fashion.
Some amount of data will be lost in a disaster, so some data re-entry is required. This assumes that some other record (i.e. paper) of all transactions exists, from which the data can be re-entered.
It often takes multiple days to get back up and running after a disaster.Tape backups are the most common method companies use for DR. Cost is low. Some care is needed to ensure the backup tapes actually get to an off-site storage location, and in a timely fashion.
Some amount of data will be lost in a disaster, so some data re-entry is required. This assumes that some other record (i.e. paper) of all transactions exists, from which the data can be re-entered.
It often takes multiple days to get back up and running after a disaster.
9. page 9 hp enterprise technical symposium filename.ppt DR Methods:Vendor Recovery Site Vendor provides datacenter space, compatible hardware, networking, and sometimes user work areas as well
When a disaster is declared, systems are configured and data is restored to them
Typically there are hours to days of delay before data can actually be used Here, you contract with a vendor such as SunGard to have the right to use their facilities and equipment in the event that you suffer a disaster.
Their business is based on the assumption that not everyone will have a disaster at once, and they need only procure enough hardware to accommodate a few disasters at once. Out of that pool of hardware, they will start connecting together a suitable configuration immediately when you declare a disaster.
You must formally declare a disaster to invoke your right to use their facilities. At that point, they typically connect together the equipment you need, and give you access to use it.
These vendors also typically allow you to schedule a disaster recovery test at some interval, such as once a year.Here, you contract with a vendor such as SunGard to have the right to use their facilities and equipment in the event that you suffer a disaster.
Their business is based on the assumption that not everyone will have a disaster at once, and they need only procure enough hardware to accommodate a few disasters at once. Out of that pool of hardware, they will start connecting together a suitable configuration immediately when you declare a disaster.
You must formally declare a disaster to invoke your right to use their facilities. At that point, they typically connect together the equipment you need, and give you access to use it.
These vendors also typically allow you to schedule a disaster recovery test at some interval, such as once a year.
10. page 10 hp enterprise technical symposium filename.ppt DR Methods:Data Vaulting Copy of data is saved at a remote site
Periodically or continuously, via network
Remote site may be own site or at a vendor location
Minimal or no data may be lost in a disaster
There is typically some delay before data can actually be used You can copy data periodically or continuously to another site so that theres always a copy at another site. This might involve copying data to tape drives at a remote site, or to disks.
The remote site might be at a second site that you own, or might be provided by a vendor.
In order to actually use that data, it would need to be restored to a suitable hardware configuration with network and user access and so forth.
Some computer vendors have a program that allows you to get an emergency order for replacement hardware shipped out quickly in the event of a disaster. DEC and Compaq had the Recover-All program, and HP has the Quick Ship program. This can help reduce the time needed to replace hardware destroyed by a disaster.You can copy data periodically or continuously to another site so that theres always a copy at another site. This might involve copying data to tape drives at a remote site, or to disks.
The remote site might be at a second site that you own, or might be provided by a vendor.
In order to actually use that data, it would need to be restored to a suitable hardware configuration with network and user access and so forth.
Some computer vendors have a program that allows you to get an emergency order for replacement hardware shipped out quickly in the event of a disaster. DEC and Compaq had the Recover-All program, and HP has the Quick Ship program. This can help reduce the time needed to replace hardware destroyed by a disaster.
11. page 11 hp enterprise technical symposium filename.ppt DR Methods:Hot Site Company itself (or a vendor) provides pre-configured compatible hardware, networking, and datacenter space
Systems are pre-configured, ready to go
Data may already resident be at the Hot Site thanks to Data Vaulting
Typically there are minutes to hours of delay before data can be used Because the time required to configure (or even procure) suitable system hardware to access the data can be so great, some businesses choose to have a hardware configuration in place and ready to take over in the event of a disaster. This may be at another site the business owns, or at a vendors location, and the equipment may either belong to the business, or be provided by the vendor under the terms of its contract.Because the time required to configure (or even procure) suitable system hardware to access the data can be so great, some businesses choose to have a hardware configuration in place and ready to take over in the event of a disaster. This may be at another site the business owns, or at a vendors location, and the equipment may either belong to the business, or be provided by the vendor under the terms of its contract.
12. page 12 hp enterprise technical symposium filename.ppt Disaster Tolerance vs.Disaster Recovery Disaster Recovery is the ability to resume operations after a disaster.
Disaster Tolerance is the ability to continue operations uninterrupted despite a disaster The term Disaster Tolerance is one which is sometimes used incorrectly in the industry, particularly by vendors who cant really achieve it. Disaster Recovery is a well-understood term, but DisasterTolerance is a newer term and often mis-used.
Disaster Recovery is the art and science of preparing so that one is able to recover after a disaster and once again be able to operate the business.
Disaster Tolerance is much harder, as it involves designing things such that business can continue in the face of a disaster, without the end-user or customer noticing any adverse effect.
The effects of global business across timezones, 24x7x365 operations, e-Commerce, and the risks of the modern world are pressuring more and more businesses to achieve the level of Disaster Tolerance in their business operations. Well be focusing mostly on Disaster Tolerance in the IT field, but the same principles apply to the business in general.The term Disaster Tolerance is one which is sometimes used incorrectly in the industry, particularly by vendors who cant really achieve it. Disaster Recovery is a well-understood term, but DisasterTolerance is a newer term and often mis-used.
Disaster Recovery is the art and science of preparing so that one is able to recover after a disaster and once again be able to operate the business.
Disaster Tolerance is much harder, as it involves designing things such that business can continue in the face of a disaster, without the end-user or customer noticing any adverse effect.
The effects of global business across timezones, 24x7x365 operations, e-Commerce, and the risks of the modern world are pressuring more and more businesses to achieve the level of Disaster Tolerance in their business operations. Well be focusing mostly on Disaster Tolerance in the IT field, but the same principles apply to the business in general.
13. page 13 hp enterprise technical symposium filename.ppt Disaster Tolerance Ideally, Disaster Tolerance allows one to continue operations uninterrupted despite a disaster:
Without any appreciable delays
Without any lost transaction data The ideal DT solution would result in no downtime and no lost data, even in a disaster. Such solutions do exist, but they cost more than solutions which have some amount of downtime or data loss associated with a disaster.The ideal DT solution would result in no downtime and no lost data, even in a disaster. Such solutions do exist, but they cost more than solutions which have some amount of downtime or data loss associated with a disaster.
14. page 14 hp enterprise technical symposium filename.ppt Disaster Tolerance Businesses vary in their requirements with respect to:
Acceptable recovery time
Allowable data loss
Technologies also vary in their ability to achieve the ideals of no data loss and zero recovery time Different businesses have different levels of risk with regard to loss of data and to potential downtime.
Different technical solutions also provide different levels of capability with respect to these business needs.
The ideal solution would have no downtime and allow no data to be lost. Such solutions exist, but are more expensive, so their cost must be weighed against the potential impact to the business of a disaster and its after-effects.Different businesses have different levels of risk with regard to loss of data and to potential downtime.
Different technical solutions also provide different levels of capability with respect to these business needs.
The ideal solution would have no downtime and allow no data to be lost. Such solutions exist, but are more expensive, so their cost must be weighed against the potential impact to the business of a disaster and its after-effects.
15. page 15 hp enterprise technical symposium filename.ppt Measuring Disaster Tolerance and Disaster Recovery Needs Determine requirements based on business needs first
Then find acceptable technologies to meet the needs of the business It is important to understand the financial costs and risks to the business itself with regard to downtime and potential data loss. Once these are understood, the best available solution can be chosen.It is important to understand the financial costs and risks to the business itself with regard to downtime and potential data loss. Once these are understood, the best available solution can be chosen.
16. page 16 hp enterprise technical symposium filename.ppt Measuring Disaster Tolerance and Disaster Recovery Needs Commonly-used metrics:
Recovery Point Objective (RPO):
Amount of data loss that is acceptable, if any
Recovery Time Objective (RTO):
Amount of downtime that is acceptable, if any Recovery Point Objective and Recovery Time Objective are two metrics commonly used in the industry to measure the goals of a DR or DT solution.Recovery Point Objective and Recovery Time Objective are two metrics commonly used in the industry to measure the goals of a DR or DT solution.
17. page 17 hp enterprise technical symposium filename.ppt Disaster Tolerance vs.Disaster Recovery Definitions among vendors vary, but in general, Disaster Tolerance is the subset of possible solutions in the overall Disaster Recovery space that minimize RPO (data loss) and RTO (downtime).Definitions among vendors vary, but in general, Disaster Tolerance is the subset of possible solutions in the overall Disaster Recovery space that minimize RPO (data loss) and RTO (downtime).
18. page 18 hp enterprise technical symposium filename.ppt Recovery Point Objective (RPO) Recovery Point Objective is measured in terms of time
RPO indicates the point in time to which one is able to recover the data after a failure, relative to the time of the failure itself
RPO effectively quantifies the amount of data loss permissible before the business is adversely affected RPO is a metric used to quantify how much data loss is acceptable without adversely affecting the business.RPO is a metric used to quantify how much data loss is acceptable without adversely affecting the business.
19. page 19 hp enterprise technical symposium filename.ppt Recovery Time Objective (RTO) Recovery Time Objective is also measured in terms of time
Measures downtime:
from time of disaster until business can continue
Downtime costs vary with the nature of the business, and with outage length RTO is the metric used to quantify how much downtime can occur without adversely affecting the business.RTO is the metric used to quantify how much downtime can occur without adversely affecting the business.
20. page 20 hp enterprise technical symposium filename.ppt Examples of Business Requirements and RPO / RTO Greeting card manufacturer
RPO zero; RTO 3 days
Online stock brokerage
RPO zero; RTO seconds
Lottery
RPO zero; RTO minutes A greeting card manufacturer I once spoke with was very concerned that they never, ever lose an order from one of their stores. That would be too great a customer satisfaction problem. But since stores typically order only about once a week, it wouldnt really hurt, they told me, if a store couldnt place an order, even for several days, as long as the order was reliably delivered once it had been placed.
An online stock brokerage cannot ever lose an order in conjunction with an outage, because then they would be subject to massive risks of fraud, from customers who might claim, for example, that they had placed a large sell order just before the price on a stock dropped dramatically, or placed a large buy order just before the price jumped astronomically. Also, the brokerage has to cover the financial risk during downtime of orders placed before an outage, and that risk is related to the price movement of the stocks. Potential losses of millions of dollars per hour of downtime are not unusual in that industry.
A lottery absolutely cannot lose transactions for the same reason of potential fraud. Also, if tickets cannot be sold due to downtime, the potential revenue loss builds rapidly.A greeting card manufacturer I once spoke with was very concerned that they never, ever lose an order from one of their stores. That would be too great a customer satisfaction problem. But since stores typically order only about once a week, it wouldnt really hurt, they told me, if a store couldnt place an order, even for several days, as long as the order was reliably delivered once it had been placed.
An online stock brokerage cannot ever lose an order in conjunction with an outage, because then they would be subject to massive risks of fraud, from customers who might claim, for example, that they had placed a large sell order just before the price on a stock dropped dramatically, or placed a large buy order just before the price jumped astronomically. Also, the brokerage has to cover the financial risk during downtime of orders placed before an outage, and that risk is related to the price movement of the stocks. Potential losses of millions of dollars per hour of downtime are not unusual in that industry.
A lottery absolutely cannot lose transactions for the same reason of potential fraud. Also, if tickets cannot be sold due to downtime, the potential revenue loss builds rapidly.
21. page 21 hp enterprise technical symposium filename.ppt Downtime Cost Varies with Outage Length The potential cost of downtime actually varies with the length of the outage. A 2-hour outage will not necessarily cost twice what a 1-hour outage costs. This graph tries to show this non-linear relationship. For this particular hypothetical business, a 1-minute outage has no associated cost with it its customers basically dont even notice an outage of that short a duration. An outage of 1 hour results in some small amount of lost revenue, but the impact is still not severe. An outage of 1 day for this business, however, starts to result in significant losses, and for an outage of 1 week, the cost is so great (with many customers losing confidence and permanently moving their business to another vendor as a result) that the business itself would be at risk of totally failing.The potential cost of downtime actually varies with the length of the outage. A 2-hour outage will not necessarily cost twice what a 1-hour outage costs. This graph tries to show this non-linear relationship. For this particular hypothetical business, a 1-minute outage has no associated cost with it its customers basically dont even notice an outage of that short a duration. An outage of 1 hour results in some small amount of lost revenue, but the impact is still not severe. An outage of 1 day for this business, however, starts to result in significant losses, and for an outage of 1 week, the cost is so great (with many customers losing confidence and permanently moving their business to another vendor as a result) that the business itself would be at risk of totally failing.
22. page 22 hp enterprise technical symposium filename.ppt Examples of Business Requirements and RPO / RTO ATM machine
RPO minutes; RTO minutes
Semiconductor fabrication plant
RPO zero; RTO minutes; but data protection by geographical separation not needed Surprisingly, an ATM network can afford a small amount of data loss. This is because the financial transactions are relatively small (typically $500 tops), so the financial risk is relatively low, and the ATM machine itself keeps a log of transactions that can be used to resolve discrepancies. But if an ATM network is down for too long, revenue for transaction fees starts to be lost.
The computers that operate a semiconductor fabrication plant keep track of which processing steps have been applied to each silicon wafer. Without this knowledge, a wafer becomes worthless as sand. Excessive downtime keeps production from moving ahead. Interestingly, these businesses dont worry about keeping a copy of data at a safe distance from the fabrication plant, because if a disaster affects the fabrication plant along with the datacenter, production can be shifted to another fab. Some separation is desirable to protect against things like building fires, but for a fabrication plant along an earthquake fault line, it does not help to protect the data on in-progress work by putting a copy a long distance away when the fabrication plant itself is likely to be damaged by the same earthquake that takes out the datacenter.Surprisingly, an ATM network can afford a small amount of data loss. This is because the financial transactions are relatively small (typically $500 tops), so the financial risk is relatively low, and the ATM machine itself keeps a log of transactions that can be used to resolve discrepancies. But if an ATM network is down for too long, revenue for transaction fees starts to be lost.
The computers that operate a semiconductor fabrication plant keep track of which processing steps have been applied to each silicon wafer. Without this knowledge, a wafer becomes worthless as sand. Excessive downtime keeps production from moving ahead. Interestingly, these businesses dont worry about keeping a copy of data at a safe distance from the fabrication plant, because if a disaster affects the fabrication plant along with the datacenter, production can be shifted to another fab. Some separation is desirable to protect against things like building fires, but for a fabrication plant along an earthquake fault line, it does not help to protect the data on in-progress work by putting a copy a long distance away when the fabrication plant itself is likely to be damaged by the same earthquake that takes out the datacenter.
23. page 23 hp enterprise technical symposium filename.ppt Recovery Point Objective (RPO) RPO examples, and technologies to meet them:
RPO of 24 hours: Backups at midnight every night to off-site tape drive, and recovery is to restore data from set of last backup tapes
RPO of 1 hour: Ship database logs hourly to remote site; recover database to point of last log shipment
RPO of zero: Mirror data strictly synchronously to remote site Nightly tape backups can provide an RPO of a maximum of 24 hours, provided tapes are whisked off-site immediately after being written, or the tape drives are located in a different site from the data disks.
Database logs copied every hour to a remote site can provide an RPO of 1 hour at most.
To achieve zero data loss, data must be written to both sites simultaneously, and a transaction must not be allowed to complete until the data has safely landed at both sites (not awaiting transport between sites or still in transit).Nightly tape backups can provide an RPO of a maximum of 24 hours, provided tapes are whisked off-site immediately after being written, or the tape drives are located in a different site from the data disks.
Database logs copied every hour to a remote site can provide an RPO of 1 hour at most.
To achieve zero data loss, data must be written to both sites simultaneously, and a transaction must not be allowed to complete until the data has safely landed at both sites (not awaiting transport between sites or still in transit).
24. page 24 hp enterprise technical symposium filename.ppt Recovery Time Objective (RTO) RTO examples, and technologies to meet them:
RTO of 72 hours: Restore tapes to configure-to-order systems at vendor DR site
RTO of 12 hours: Restore tapes to system at hot site with systems already in place
RTO of 4 hours: Data vaulting to hot site with systems already in place
RTO of 1 hour: Disaster-tolerant cluster with controller-based cross-site disk mirroring
RTO of seconds: Disaster-tolerant cluster with bi-directional mirroring, CFS, and DLM allowing applications to run at both sites simultaneously Restoration of backup tapes to systems assembled and configured by a vendor at their DR site after declaration of a disaster could meet an RTO requirement of 3 days (72 hours).
To meet an RTO of 12 hours, systems would probably have to already be in place at another site, perhaps a vendor hot site.
To meet an RTO requirement of 4 hours, one would not really have time to restore data from tapes. The data would need to be on disk already at another site and connected to servers. Failover of users to these in-place systems could most likely be accomplished in 4 hours, with proper pre-planning and appropriate network equipment and fail-over procedures in place and pre-tested.
To meet an RTO of 1 hour would most likely require a disaster-tolerant cluster. One could use controller-based mirroring instead of host-based mirroring, because failover from one site to another can be accomplished in somewhere between 15 minutes and 1 hour, depending on the solution.
To meet an RTO of mere seconds would require a full-blown disaster-tolerant cluster with the ability to continually run the applications on all nodes at both sites, so that one is not so much doing failover as simply redirecting clients to an already-running copy of the application at the surviving site.Restoration of backup tapes to systems assembled and configured by a vendor at their DR site after declaration of a disaster could meet an RTO requirement of 3 days (72 hours).
To meet an RTO of 12 hours, systems would probably have to already be in place at another site, perhaps a vendor hot site.
To meet an RTO requirement of 4 hours, one would not really have time to restore data from tapes. The data would need to be on disk already at another site and connected to servers. Failover of users to these in-place systems could most likely be accomplished in 4 hours, with proper pre-planning and appropriate network equipment and fail-over procedures in place and pre-tested.
To meet an RTO of 1 hour would most likely require a disaster-tolerant cluster. One could use controller-based mirroring instead of host-based mirroring, because failover from one site to another can be accomplished in somewhere between 15 minutes and 1 hour, depending on the solution.
To meet an RTO of mere seconds would require a full-blown disaster-tolerant cluster with the ability to continually run the applications on all nodes at both sites, so that one is not so much doing failover as simply redirecting clients to an already-running copy of the application at the surviving site.
25. page 25 hp enterprise technical symposium filename.ppt Technologies Clustering
Inter-site links
Foundation and Core Requirements for Disaster Tolerance
Data replication schemes
Quorum schemes Next, well look at the underlying technologies used to implement a disaster-tolerant solution, including clustering, inter-site link(s), the fundamentals required for DT, data replication, and quorum schemes.Next, well look at the underlying technologies used to implement a disaster-tolerant solution, including clustering, inter-site link(s), the fundamentals required for DT, data replication, and quorum schemes.
26. page 26 hp enterprise technical symposium filename.ppt Clustering Allows a set of individual computer systems to be used together in some coordinated fashion
27. page 27 hp enterprise technical symposium filename.ppt Cluster types Different types of clusters meet different needs:
Scalability clusters allow multiple nodes to work on different portions of a sub-dividable problem
Workstation farms, compute clusters, Beowulf clusters
High Availability clusters allow one node to take over application processing if another node fails For DT, were concerned with High Availability clusters rather than Scalability clusters.For DT, were concerned with High Availability clusters rather than Scalability clusters.
28. page 28 hp enterprise technical symposium filename.ppt High Availability Clusters Transparency of failover and degrees of resource sharing differ:
Shared-Nothing clusters
Shared-Storage clusters
Shared-Everything clusters Cluster implementations vary in their degrees of resource sharing. Cluster solutions with greater ability to share (and coordinate access to) shared resources can typically provide higher availability and better disaster-tolerance.Cluster implementations vary in their degrees of resource sharing. Cluster solutions with greater ability to share (and coordinate access to) shared resources can typically provide higher availability and better disaster-tolerance.
29. page 29 hp enterprise technical symposium filename.ppt Shared-Nothing Clusters Data is partitioned among nodes
No coordination is needed between nodes In Shared-Nothing clusters, data is typically divided up across separate nodes. The nodes have little need to coordinate their activity, since each has responsibility for different subsets of the overall database. But in strict Shared-Nothing clusters, if one node is down, its fraction of the overall data is unavailable.In Shared-Nothing clusters, data is typically divided up across separate nodes. The nodes have little need to coordinate their activity, since each has responsibility for different subsets of the overall database. But in strict Shared-Nothing clusters, if one node is down, its fraction of the overall data is unavailable.
30. page 30 hp enterprise technical symposium filename.ppt Shared-Storage Clusters In simple Fail-over clusters, one node runs an application and updates the data; another node stands idly by until needed, then takes over completely
In more-sophisticated clusters, multiple nodes may access data, but typically one node at a time serves a file system to the rest of the nodes, and performs all coordination for that file system For high availability to be provided, some shared access to the data disks is needed. In Shared-Storage clusters, at the low end, one gets the ability for one node to take over the storage (and applications) in the event another node fails. In higher-level solutions, simultaneous access to data by applications running on more than one node at a time is possible, but typically a single node at a time is responsible for coordinating all access to a given data disk, and serving that storage to the rest of the nodes. That single node can become a bottleneck for access to the data in busier configurations.For high availability to be provided, some shared access to the data disks is needed. In Shared-Storage clusters, at the low end, one gets the ability for one node to take over the storage (and applications) in the event another node fails. In higher-level solutions, simultaneous access to data by applications running on more than one node at a time is possible, but typically a single node at a time is responsible for coordinating all access to a given data disk, and serving that storage to the rest of the nodes. That single node can become a bottleneck for access to the data in busier configurations.
31. page 31 hp enterprise technical symposium filename.ppt Shared-Everything Clusters Shared-Everything clusters allow any application to run on any node or nodes
Disks are accessible to all nodes under a Cluster File System
File sharing and data updates are coordinated by a Lock Manager In a Shared-Everything cluster, all nodes can access all the data disks simultaneously, without one node having to go through another node to get access to the data. Clusters which provide this provide a cluster-wide file system (CFS), so all the nodes view the file system(s) identically, and typically provide a Distributed Lock Manager (DLM) to allow the nodes to coordinate the sharing and updating of files, records, and databases.
Some application software, such as Oracle Parallel Server or Oracle 9i/RAC, provide the appearance of a shared-everything cluster for the users of their specific application (in this case, the Oracle database), even when running in a less-sophisticated Shared-Storage cluster, because they provide capabilities analogous to the CFS and DLM of a shared-everything cluster.In a Shared-Everything cluster, all nodes can access all the data disks simultaneously, without one node having to go through another node to get access to the data. Clusters which provide this provide a cluster-wide file system (CFS), so all the nodes view the file system(s) identically, and typically provide a Distributed Lock Manager (DLM) to allow the nodes to coordinate the sharing and updating of files, records, and databases.
Some application software, such as Oracle Parallel Server or Oracle 9i/RAC, provide the appearance of a shared-everything cluster for the users of their specific application (in this case, the Oracle database), even when running in a less-sophisticated Shared-Storage cluster, because they provide capabilities analogous to the CFS and DLM of a shared-everything cluster.
32. page 32 hp enterprise technical symposium filename.ppt Cluster File System Allows multiple nodes in a cluster to access data in a shared file system simultaneously
View of file system is the same from any node in the cluster A Clusterwide File System provides the same view of disk data from every node in the cluster. This means the environment on each node can appear identical to both the users and the application programs, so it doesnt matter on which node the application or user happens to be running at any given time.
HP TruClusters and OpenVMS Clusters provide a CFS, and HP-UX is slated to gain the CFS from TruClusters by 2004.
A Clusterwide File System provides the same view of disk data from every node in the cluster. This means the environment on each node can appear identical to both the users and the application programs, so it doesnt matter on which node the application or user happens to be running at any given time.
HP TruClusters and OpenVMS Clusters provide a CFS, and HP-UX is slated to gain the CFS from TruClusters by 2004.
33. page 33 hp enterprise technical symposium filename.ppt Distributed Lock Manager Allows systems in a cluster to coordinate their access to shared resources, such as:
Mass-storage devices (disks, tape drives)
File systems
Files
Database tables The Distributed Lock Manager is used to coordinate access to shared resources in a cluster.
HP TruClusters and OpenVMS Clusters provide a DLM, and HP-UX is slated to gain the DLM from TruClusters by 2004.The Distributed Lock Manager is used to coordinate access to shared resources in a cluster.
HP TruClusters and OpenVMS Clusters provide a DLM, and HP-UX is slated to gain the DLM from TruClusters by 2004.
34. page 34 hp enterprise technical symposium filename.ppt Multi-Site Clusters Consist of multiple sites with one or more systems, in different locations
Systems at each site are all part of the same cluster
Sites are typically connected by bridges (or bridge-routers; pure routers dont pass the special cluster protocol traffic required for many clusters) Clusters can consist of nodes in multiple sites.
MC/Service Guard requires that the heartbeat between cluster nodes be passed within the same subnet, and cannot cross a router. OpenVMS Clusters use the SCS protocol for intra-node communications, which also cannot be routed. So bridges (or bridge-routers with bridging enabled) would typically be used to connect multiple-site clusters with these products.Clusters can consist of nodes in multiple sites.
MC/Service Guard requires that the heartbeat between cluster nodes be passed within the same subnet, and cannot cross a router. OpenVMS Clusters use the SCS protocol for intra-node communications, which also cannot be routed. So bridges (or bridge-routers with bridging enabled) would typically be used to connect multiple-site clusters with these products.
35. page 35 hp enterprise technical symposium filename.ppt Multi-Site Clusters:Inter-site Link(s) Sites linked by:
E3 (DS-3/T3) or ATM circuits from a TelCo
Microwave link: E3 (DS-3/T3) or Ethernet
Free-Space Optics link (short distance, low cost)
Dark fiber where available:
Ethernet over fiber (10 mb, Fast, Gigabit)
FDDI
Fibre Channel
Wave Division Multiplexing (WDM) or Dense Wave Division Multiplexing (DWDM) Various technologies are available to link multi-site clusters.
Data circuits at various speeds and costs are available from telecommunications providers such as AT&T, Sprint, MCI, and the regional Bell Operating Companies.
If line-of-sight is available, a private microwave link may be possible. If the distance is short (say no more than a mile or so) then a free-space optical link, which uses lasers or LEDs, may be possible, and is very inexpensive and requires no FCC license like a microwave link would.
Dark fiber can be very cost-effective if it is available in your area. Vendors such as Metro Media Fiber Networks (mmfn.com) or Nortel can provide dark fiber. Some vendors also provide channels within WDM links as an alternative to entire dark fiber pairs.
WDM typically allows 16 channels over the same fibe, while DWDM can allow as many as 64 or 80 channels over a single fiber.Various technologies are available to link multi-site clusters.
Data circuits at various speeds and costs are available from telecommunications providers such as AT&T, Sprint, MCI, and the regional Bell Operating Companies.
If line-of-sight is available, a private microwave link may be possible. If the distance is short (say no more than a mile or so) then a free-space optical link, which uses lasers or LEDs, may be possible, and is very inexpensive and requires no FCC license like a microwave link would.
Dark fiber can be very cost-effective if it is available in your area. Vendors such as Metro Media Fiber Networks (mmfn.com) or Nortel can provide dark fiber. Some vendors also provide channels within WDM links as an alternative to entire dark fiber pairs.
WDM typically allows 16 channels over the same fibe, while DWDM can allow as many as 64 or 80 channels over a single fiber.
36. page 36 hp enterprise technical symposium filename.ppt Bandwidth of Inter-Site Link(s) Link bandwidth:
E3: 34 Mb/sec (or DS-3/T3 at 45 Mb/sec)
ATM: Typically 155 or 622 Mb/sec
Ethernet: Fast (100 Mb/sec) or Gigabit (1 Gb/sec)
Fibre Channel: 1 or 2 Gb/sec
[D]WDM: Multiples of ATM, GbE, FC, etc. Link technologies vary in cost and bandwidth.Link technologies vary in cost and bandwidth.
37. page 37 hp enterprise technical symposium filename.ppt Inter-Site Link Choices Service type choices
Telco-provided data circuit service, own microwave link, FSO link, dark fiber?
Dedicated bandwidth, or shared pipe?
Single or multiple (redundant) links? If multiple links, then:
Diverse paths?
Multiple vendors? Many questions must be answered in selecting an inter-site link.
Using a single vendor can cut costs, but a single failure in that vendors network could take out both links (there was a well-publicized failure of one vendors entire X.25 network not long ago after they upgraded the software on all their switches).
Even when multiple vendors are used, one must verify that redundant links actually follow separate paths. Both vendors might sub-contract at least a portion of the link (typically the links to the actual sites at both ends) to the same RBOC, potentially introducing a single point of failure.Many questions must be answered in selecting an inter-site link.
Using a single vendor can cut costs, but a single failure in that vendors network could take out both links (there was a well-publicized failure of one vendors entire X.25 network not long ago after they upgraded the software on all their switches).
Even when multiple vendors are used, one must verify that redundant links actually follow separate paths. Both vendors might sub-contract at least a portion of the link (typically the links to the actual sites at both ends) to the same RBOC, potentially introducing a single point of failure.
38. page 38 hp enterprise technical symposium filename.ppt Dark Fiber Availability Example This map is the fiber-optic coverage from MetroMedia Fiber Networks (http://mmfn.com/ for AmsterdamThis map is the fiber-optic coverage from MetroMedia Fiber Networks (http://mmfn.com/ for Amsterdam
39. page 39 hp enterprise technical symposium filename.ppt Disaster-Tolerant Clusters:Foundation Goal: Survive loss of up to one entire datacenter
Foundation:
Two or more datacenters a safe distance apart
Cluster software for coordination
Inter-site link for cluster interconnect
Data replication of some sort for 2 or more identical copies of data, one at each site
40. page 40 hp enterprise technical symposium filename.ppt Disaster-Tolerant Clusters Foundation:
Management and monitoring tools
Remote system console access or KVM system
Failure detection and alerting, for things like:
Network (especially inter-site link) monitoring
Mirrorset member loss
Node failure
It may be necessary to boot or reboot systems at a remote site without being physically there.
When redundancy is in place, it is important to be notified when failures occur, so that timely repairs can be made before a second failure can cause an outage.It may be necessary to boot or reboot systems at a remote site without being physically there.
When redundancy is in place, it is important to be notified when failures occur, so that timely repairs can be made before a second failure can cause an outage.
41. page 41 hp enterprise technical symposium filename.ppt Disaster-Tolerant Clusters Foundation:
Management and monitoring tools
Quorum recovery tool or mechanism (for 2-site clusters with balanced votes) Well talk in more depth about quorum schemes later.Well talk in more depth about quorum schemes later.
42. page 42 hp enterprise technical symposium filename.ppt Disaster-Tolerant Clusters Foundation:
Configuration planning and implementation assistance, and staff training It is wise to get some consulting help in planning and implementing a disaster-tolerant cluster, and in training your people how to operate it and handle various recovery scenarios correctly.
HP offers consulting services for DT cluster solutions using MC/Service Guard, OpenVMS Clusters, TruClusters, and for DR or DT solutions using StorageWorks DRM, StorageWorks XP, and EMC controller-based mirroring solutions.It is wise to get some consulting help in planning and implementing a disaster-tolerant cluster, and in training your people how to operate it and handle various recovery scenarios correctly.
HP offers consulting services for DT cluster solutions using MC/Service Guard, OpenVMS Clusters, TruClusters, and for DR or DT solutions using StorageWorks DRM, StorageWorks XP, and EMC controller-based mirroring solutions.
43. page 43 hp enterprise technical symposium filename.ppt Disaster-Tolerant Clusters Foundation:
Carefully-planned procedures for:
Normal operations
Scheduled downtime and outages
Detailed diagnostic and recovery action plans for various failure scenarios
44. page 44 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance Goal is to continue operating despite loss of an entire datacenter
All the pieces must be in place to allow that:
User access to both sites
Network connections to both sites
Operations staff at both sites
Business cant depend on anything that is only at either site
45. page 45 hp enterprise technical symposium filename.ppt Disaster Tolerance:Core Requirements Second site with its own storage, networking, computing hardware, and user access mechanisms is put in place
No dependencies on the 1st site are allowed
Data is constantly replicated to or copied to 2nd site, so data is preserved in a disaster To achieve Disaster Tolerance, these core pieces must be put in place.To achieve Disaster Tolerance, these core pieces must be put in place.
46. page 46 hp enterprise technical symposium filename.ppt Disaster Tolerance:Core Requirements Sufficient computing capacity is in place at the 2nd site to handle expected workloads by itself if the primary site is destroyed
Monitoring, management, and control mechanisms are in place to facilitate fail-over
If all these requirements are met, there may be as little as seconds or minutes of delay before data can actually be used
47. page 47 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance Sites must be carefully selected to avoid common hazards and loss of both datacenters at once
Make them a safe distance apart
This must be a compromise. Factors:
Risks
Performance (inter-site latency)
Interconnect costs
Ease of travel between sites
48. page 48 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance: What is a Safe Distance Analyze likely hazards of proposed sites:
Fire (building, forest, gas leak, explosive materials)
Storms (Tornado, Hurricane, Lightning, Hail)
Flooding (excess rainfall, dam breakage, storm surge, broken water pipe)
Earthquakes, Tsunamis
49. page 49 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance: What is a Safe Distance Analyze likely hazards of proposed sites:
Nearby transportation of hazardous materials (highway, rail)
Terrorist (or disgruntled customer) with a bomb or weapon
Enemy attack in war (nearby military or industrial targets)
Civil unrest (riots, vandalism)
50. page 50 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance: Site Separation Select separation direction:
Not along same earthquake fault-line
Not along likely storm tracks
Not in same floodplain or downstream of same dam
Not on the same coastline
Not in line with prevailing winds (that might carry hazardous materials)
51. page 51 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance: Site Separation Select site separation distance (in a safe direction):
1 kilometer: protects against most building fires, gas leak, terrorist bombs, armed intruder
10 kilometers: protects against most tornadoes, floods, hazardous material spills, release of poisonous gas, non-nuclear military bombs
100 kilometers: protects against most hurricanes, earthquakes, tsunamis, forest fires, dirty bombs, biological weapons, and possibly military nuclear attacks
52. page 52 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance: Providing Redundancy Redundancy must be provided for:
Datacenter and facilities (A/C, power, user workspace, etc.)
Data
And data feeds, if any
Systems
Network
User access
53. page 53 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance Also plan for operation after a disaster
Surviving site will likely have to operate alone for a long period before the other site can be repaired or replaced
54. page 54 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance Plan for operation after a disaster
Provide redundancy within each site
Facilities: Power feeds, A/C
Mirroring or RAID to protect disks
Clustering for servers
Network redundancy
55. page 55 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance Plan for operation after a disaster
Provide enough capacity within each site to run the business alone if the other site is lost
and handle normal workload growth rate
56. page 56 hp enterprise technical symposium filename.ppt Planning for Disaster Tolerance Plan for operation after a disaster
Having 3 sites is an option to seriously consider:
Leaves two redundant sites after a disaster
Leaves 2/3 capacity instead of ˝
57. page 57 hp enterprise technical symposium filename.ppt Cross-site Data Replication Methods Hardware
Storage controller
Software
Host software disk mirroring, duplexing, or volume shadowing
Database replication or log-shipping
Transaction-processing monitor or middleware with replication functionality
58. page 58 hp enterprise technical symposium filename.ppt Data Replication in Hardware HP StorageWorks Data Replication Manager (DRM)
HP XP Series with Continuous Access (CA) XP
HP Continuous Access Storage Appliance (CASA)
EMC Symmetrix Remote Data Facility (SRDF)
59. page 59 hp enterprise technical symposium filename.ppt Data Replication in Software Host software disk mirroring or shadowing:
Volume Shadowing Software for OpenVMS
MirrorDisk/UX for HP-UX
Veritas VxVM with Volume Replicator extensions for Unix and Windows
Fault Tolerant (FT) Disk on Windows
60. page 60 hp enterprise technical symposium filename.ppt Data Replication in Software Database replication or log-shipping
Replication
Remote Database Facility (RDF) on NonStop
e.g. Oracle DataGuard (Oracle Standby Database)
Database backups plus Log Shipping
61. page 61 hp enterprise technical symposium filename.ppt Data Replication in Software TP Monitor/Transaction Router
e.g. HP Reliable Transaction Router (RTR) Software on OpenVMS, Unix, and Windows
62. page 62 hp enterprise technical symposium filename.ppt Data Replication in Hardware Data mirroring schemes
Synchronous
Slower, but less chance of data loss
Beware: some solutions can still lose the last write operation before a disaster
Asynchronous
Faster, and works for longer distances
but can lose minutes worth of data (more under high loads) in a site disaster
63. page 63 hp enterprise technical symposium filename.ppt Data Replication in Hardware Mirroring is of sectors on disk
So operating system / applications must flush data from memory to disk for controller to be able to mirror it to the other site
64. page 64 hp enterprise technical symposium filename.ppt Data Replication in Hardware Resynchronization operations
May take significant time and bandwidth
May or may not preserve a consistent copy of data at the remote site until the copy operation has completed
May or may not preserve write ordering during the copy
65. page 65 hp enterprise technical symposium filename.ppt Data Replication:Write Ordering File systems and database software may make some assumptions on write ordering and disk behavior
For example, a database may write to a journal log, let that I/O complete, then write to the main database storage area
During database recovery operations, its logic may depend on these writes having completed in the expected order
66. page 66 hp enterprise technical symposium filename.ppt Data Replication:Write Ordering Some controller-based replication methods copy data on a track-by-track basis for efficiency instead of exactly duplicating individual write operations
This may change the effective ordering of write operations within the remote copy
67. page 67 hp enterprise technical symposium filename.ppt Data Replication:Write Ordering When data needs to be re-synchronized at a remote site, some replication methods (both controller-based and host-based) similarly copy data on a track-by-track basis for efficiency instead of exactly duplicating writes
This may change the effective ordering of write operations within the remote copy
The output volume may be inconsistent and unreadable until the resynchronization operation completes
68. page 68 hp enterprise technical symposium filename.ppt Data Replication:Write Ordering It may be advisable in this case to preserve an earlier (consistent) copy of the data, and perform the resynchronization to a different set of disks, so that if the source site is lost during the copy, at least one copy of the data (albeit out-of-date) is still present
69. page 69 hp enterprise technical symposium filename.ppt Data Replication in Hardware:Write Ordering Some products provide a guarantee of original write ordering on a disk (or even across a set of disks, at least if the disks are on a single controller)
Some products can even preserve write ordering during resynchronization operations, so the remote copy is always consistent (as of some point in time) during the entire resynchronization operation
70. page 70 hp enterprise technical symposium filename.ppt Data Replication:Performance Replication performance may be affected by latency due to the speed of light over the distance between sites
Greater (safer) distances between sites implies greater latency
71. page 71 hp enterprise technical symposium filename.ppt Data Replication: Performance Re-synchronization operations can generate a high data rate on inter-site links
Excessive re-synchronization time increases Mean Time To Repair (MTTR) after a site failure or outage
Acceptable re-synchronization times and link costs may be the major factors in selecting inter-site link(s)
72. page 72 hp enterprise technical symposium filename.ppt Data Replication:Performance With some solutions, it may be possible to synchronously replicate data to a nearby short-haul site, and asynchronously replicate from there to a more-distant site
This is sometimes called cascaded data replication
73. page 73 hp enterprise technical symposium filename.ppt Data Replication:Copy Direction Most hardware-based solutions can only replicate a given set of data in one direction or the other
Some can be configured replicate some disks on one direction, and other disks in the opposite direction
This way, different applications might be run at each of the two sites
74. page 74 hp enterprise technical symposium filename.ppt Data Replication in Hardware All access to a disk unit is typically from one controller at a time
So, for example, Oracle Parallel Server can only run on nodes at one site at a time
Read-only access may be possible at remote site with some products
Failover involves controller commands
Manual, or scripted
75. page 75 hp enterprise technical symposium filename.ppt Data Replication in Hardware Some products allow replication to:
A second unit at the same site
Multiple remote units or sites at a time (MxN configurations)
76. page 76 hp enterprise technical symposium filename.ppt Data Replication:Copy Direction A very few solutions can replicate data in both directions on the same mirrorset
Host software must coordinate any disk updates to the same set of blocks from both sites
e.g. Volume Shadowing in OpenVMS Clusters, or Oracle Parallel Server or Oracle 9i/RAC
This allows the same application to be run on cluster nodes at both sites at once
77. page 77 hp enterprise technical symposium filename.ppt Managing Replicated Data With copies of data at multiple sites, one must take care to ensure that:
Both copies are always equivalent, or, failing that,
Users always access the most up-to-date copy
78. page 78 hp enterprise technical symposium filename.ppt Managing Replicated Data If the inter-site link fails, both sites might conceivably continue to process transactions, and the copies of the data at each site would continue to diverge over time
This is called Split-Brain Syndrome, or a Partitioned Cluster
The most common solution to this potential problem is a Quorum-based scheme
79. page 79 hp enterprise technical symposium filename.ppt Quorum Schemes Idea comes from familiar parliamentary procedures
Systems are given votes
Quorum is defined to be a simple majority of the total votes
80. page 80 hp enterprise technical symposium filename.ppt Quorum Schemes In the event of a communications failure,
Systems in the minority voluntarily suspend or stop processing, while
Systems in the majority can continue to process transactions
81. page 81 hp enterprise technical symposium filename.ppt Quorum Schemes To handle cases where there are an even number of votes
For example, with only 2 systems,
Or half of the votes are at each of 2 sites
provision may be made for
a tie-breaking vote, or
human intervention With only two nodes, if communications fails between the nodes, neither node has the number of votes necessary to have a majority of votes, so both nodes consider themselves to be in the minority during a loss of communications, and neither one can process transactions.
Likewise, if there are two sites with equal numbers of votes at each site, neither site can process transactions after a failure of the inter-site link.
This problem is typically solved by allowing human intervention to decide which one (and it is important that only one) of the two sites will continue processing transactions, or else to make provision for a tie-breaking vote so that the determination of which site is to continue can be done automatically, without human intervention and the delay that would cause.With only two nodes, if communications fails between the nodes, neither node has the number of votes necessary to have a majority of votes, so both nodes consider themselves to be in the minority during a loss of communications, and neither one can process transactions.
Likewise, if there are two sites with equal numbers of votes at each site, neither site can process transactions after a failure of the inter-site link.
This problem is typically solved by allowing human intervention to decide which one (and it is important that only one) of the two sites will continue processing transactions, or else to make provision for a tie-breaking vote so that the determination of which site is to continue can be done automatically, without human intervention and the delay that would cause.
82. page 82 hp enterprise technical symposium filename.ppt Quorum Schemes:Tie-breaking vote This can be provided by a disk:
Cluster Lock Disk for MC/Service Guard
Quorum Disk for OpenVMS Clusters or TruClusters or MSCS
Or by a system with a vote, located at a 3rd site
Software running on a non-clustered node or a node in another cluster
e.g. Quorum Server for MC/Service Guard
Additional cluster member node for OpenVMS Clusters or TruClusters (called quorum node) or MC/Service Guard (called arbitrator node)
83. page 83 hp enterprise technical symposium filename.ppt Quorum configurations inMulti-Site Clusters 3 sites, equal votes in 2 sites
Intuitively ideal; easiest to manage & operate
3rd site serves as tie-breaker
3rd site might contain only a quorum node, arbitrator node, or quorum server In MC/Service Guard disaster-tolerant clusters, there must be an equal number of systems in the cluster at each of the two sites. Each system has one vote in the quorum scheme.
In OpenVMS Clusters, the number of nodes in each site can vary, because the number of votes each node is given may be anywhere from 0 to 127 votes, and the quorum disk can also be given from 0 to 127 votes. In a TruCluster, the quorum disk may only have 0 or 1 votes.In MC/Service Guard disaster-tolerant clusters, there must be an equal number of systems in the cluster at each of the two sites. Each system has one vote in the quorum scheme.
In OpenVMS Clusters, the number of nodes in each site can vary, because the number of votes each node is given may be anywhere from 0 to 127 votes, and the quorum disk can also be given from 0 to 127 votes. In a TruCluster, the quorum disk may only have 0 or 1 votes.
84. page 84 hp enterprise technical symposium filename.ppt Quorum configurations inMulti-Site Clusters 3 sites, equal votes in 2 sites
Hard to do in practice, due to cost of inter-site links beyond on-campus distances
Could use links to quorum site as backup for main inter-site link if links are high-bandwidth and connected together
Could use 2 less-expensive, lower-bandwidth links to quorum site, to lower cost
85. page 85 hp enterprise technical symposium filename.ppt Quorum configurations in3-Site Clusters
86. page 86 hp enterprise technical symposium filename.ppt Quorum configurations inMulti-Site Clusters 2 sites:
Most common & most problematic:
How do you arrange votes? Balanced? Unbalanced?
If votes are balanced, how do you recover from loss of quorum which will result when either site or the inter-site link fails? In an MC/Service Guard cluster, the votes must always be balanced.
With OpenVMS Clusters or TruClusters, one faces a choice. One site may be given more votes, and it can continue with or without the other site, and whether or not the inter-site link is functional. The site wth fewer votes would pause due to quorum loss if the other site failed or the inter-site link failed, but manual action could be taken to restore quorum and resume processing. In the OpenVMS Cluster case, the DECamds or Availability Manager tools can be used to restore quorum. One would need to be careful to only restore quorum at one of the two sites in the event of an inter-site link failure, to avoid a partitioned cluster, sometimes called split-brain syndrome.In an MC/Service Guard cluster, the votes must always be balanced.
With OpenVMS Clusters or TruClusters, one faces a choice. One site may be given more votes, and it can continue with or without the other site, and whether or not the inter-site link is functional. The site wth fewer votes would pause due to quorum loss if the other site failed or the inter-site link failed, but manual action could be taken to restore quorum and resume processing. In the OpenVMS Cluster case, the DECamds or Availability Manager tools can be used to restore quorum. One would need to be careful to only restore quorum at one of the two sites in the event of an inter-site link failure, to avoid a partitioned cluster, sometimes called split-brain syndrome.
87. page 87 hp enterprise technical symposium filename.ppt Quorum configurations inTwo-Site Clusters Unbalanced Votes
More votes at one site
Site with more votes can continue without human intervention in the event of loss of the other site or the inter-site link
Site with fewer votes pauses or stops on a failure and requires manual action to continue after loss of the other site
88. page 88 hp enterprise technical symposium filename.ppt Quorum configurations inTwo-Site Clusters Unbalanced Votes
Very common in remote-mirroring-only clusters (not fully disaster-tolerant)
0 votes is a common choice for the remote site in this case
89. page 89 hp enterprise technical symposium filename.ppt Quorum configurations inTwo-Site Clusters Unbalanced Votes
Common mistake: give more votes to Primary site; leave Standby site unmanned (cluster cant run without Primary or human intervention at unmanned Standby site)
90. page 90 hp enterprise technical symposium filename.ppt Quorum configurations inTwo-Site Clusters Balanced Votes
Equal votes at each site
Manual action required to restore quorum and continue processing in the event of either:
Site failure, or
Inter-site link failure
91. page 91 hp enterprise technical symposium filename.ppt Data Protection Scenarios Protection of the data is extremely important in a disaster-tolerant cluster
Well look at two obscure but dangerous scenarios that could result in data loss:
Creeping Doom
Rolling Disaster
92. page 92 hp enterprise technical symposium filename.ppt Creeping Doom Scenario Here we have two sites, with data mirrored between them.Here we have two sites, with data mirrored between them.
93. page 93 hp enterprise technical symposium filename.ppt Creeping Doom Scenario A lightning strike hits the network room, taking out (all of) the inter-site link(s).A lightning strike hits the network room, taking out (all of) the inter-site link(s).
94. page 94 hp enterprise technical symposium filename.ppt Creeping Doom Scenario First symptom is failure of link(s) between two sites
Forces choice of which datacenter of the two will continue
Transactions then continue to be processed at chosen datacenter, updating the data In this case, for the sake of illustration, well assume we happen to choose the left-hand site (the one where lightning struck) to continue processing transactions.In this case, for the sake of illustration, well assume we happen to choose the left-hand site (the one where lightning struck) to continue processing transactions.
95. page 95 hp enterprise technical symposium filename.ppt Creeping Doom Scenario
96. page 96 hp enterprise technical symposium filename.ppt Creeping Doom Scenario In this scenario, the same failure which caused the inter-site link(s) to go down expands to destroy the entire datacenter
97. page 97 hp enterprise technical symposium filename.ppt Creeping Doom Scenario Fire spreads from the network room and engulfs the entire site.
Fire spreads from the network room and engulfs the entire site.
98. page 98 hp enterprise technical symposium filename.ppt Creeping Doom Scenario Transactions processed after wrong datacenter choice are thus lost
Commitments implied to customers by those transactions are also lost
99. page 99 hp enterprise technical symposium filename.ppt Creeping Doom Scenario Techniques for avoiding data loss due to Creeping Doom:
Tie-breaker at 3rd site helps in many (but not all) cases
3rd copy of data at 3rd site
100. page 100 hp enterprise technical symposium filename.ppt Rolling Disaster Scenario Disaster or outage makes one sites data out-of-date
While re-synchronizing data to the formerly-down site, a disaster takes out the primary site
101. page 101 hp enterprise technical symposium filename.ppt Rolling Disaster Scenario The right-hand site has been down for scheduled maintenance or something, or has recovered from an earlier disaster, and we are in the process of copying data from the disks at the left-hand site to the disks at the right-hand site, one track after the other.The right-hand site has been down for scheduled maintenance or something, or has recovered from an earlier disaster, and we are in the process of copying data from the disks at the left-hand site to the disks at the right-hand site, one track after the other.
102. page 102 hp enterprise technical symposium filename.ppt Rolling Disaster Scenario Before the data re-synchronization can be completed, a disaster strikes the left-hand site. Now we have no valid copy of the data left. Before the data re-synchronization can be completed, a disaster strikes the left-hand site. Now we have no valid copy of the data left.
103. page 103 hp enterprise technical symposium filename.ppt Rolling Disaster Scenario Techniques for avoiding data loss due to Rolling Disaster:
Keep copy (backup, snapshot, clone) of out-of-date copy at target site instead of over-writing the only copy there
Surviving copy will be out-of-date, but at least youll have some copy of the data
3rd copy of data at 3rd site
104. page 104 hp enterprise technical symposium filename.ppt Long-distance Cluster Issues Latency due to speed of light becomes significant at higher distances. Rules of thumb:
About 1 ms per 100 km, one-way
About 1 ms per 50 km round-trip latency
Actual circuit path length can be longer than highway mileage between sites
Latency affects I/O and locking The Speed of Light is one speed limit you cant break.
Inter-site latency will affect the performance of any I/O operations which must go to the remote site (such as write I/Os which update the mirrorset members at that site) as well as any distributed lock manager operations that go across the link between sites, in a cluster which has a DLM.The Speed of Light is one speed limit you cant break.
Inter-site latency will affect the performance of any I/O operations which must go to the remote site (such as write I/Os which update the mirrorset members at that site) as well as any distributed lock manager operations that go across the link between sites, in a cluster which has a DLM.
105. page 105 hp enterprise technical symposium filename.ppt Differentiate between latency and bandwidth Cant get around the speed of light and its latency effects over long distances
Higher-bandwidth link doesnt mean lower latency
Multiple links may help latency somewhat under heavy loading due to shorter queue lengths, but cant outweigh speed-of-light issues At long inter-site distances, a higher-speed (really higher-bandwidth) link likely wont have lower latency, as the latency will be dominated by the effect of the speed of light over the extended distance.At long inter-site distances, a higher-speed (really higher-bandwidth) link likely wont have lower latency, as the latency will be dominated by the effect of the speed of light over the extended distance.
106. page 106 hp enterprise technical symposium filename.ppt Application Scheme 1:Hot Primary/Cold Standby All applications normally run at the primary site
Second site is idle, except for data replication, until primary site fails, then it takes over processing
Performance will be good (all-local locking)
Fail-over time will be poor, and risk high (standby systems not active and thus not being tested)
Wastes computing capacity at the remote site In a shared-everything cluster, there are 3 main ways folks run applications.
In a shared-storage cluster, only the first 2 may be possible.In a shared-everything cluster, there are 3 main ways folks run applications.
In a shared-storage cluster, only the first 2 may be possible.
107. page 107 hp enterprise technical symposium filename.ppt Application Scheme 2:Hot/Hot but Alternate Workloads All applications normally run at one site or the other, but not both; opposite site takes over upon a failure
Performance will be good (all-local locking)
Fail-over time will be poor, and risk moderate (standby systems in use, but specific applications not active and thus not being tested from that site)
Second sites computing capacity is actively used
108. page 108 hp enterprise technical symposium filename.ppt Application Scheme 3:Uniform Workload Across Sites All applications normally run at both sites simultaneously; surviving site takes all load upon failure
Performance may be impacted (some remote locking) if inter-site distance is large
Fail-over time will be excellent, and risk low (standby systems are already in use running the same applications, thus constantly being tested)
Both sites computing capacity is actively used Doing this requires a shared-everything cluster implementation like OpenVMS Clusters or TruClusters.Doing this requires a shared-everything cluster implementation like OpenVMS Clusters or TruClusters.
109. page 109 hp enterprise technical symposium filename.ppt Capacity Considerations When running workload at both sites, be careful to watch utilization.
Utilization over 35% will result in utilization over 70% if one site is lost
Utilization over 50% will mean there is no possible way one surviving site can handle all the workload When capacity at both sites can be used (Application Schemes 2 and 3), one must be careful to not over-subscribe the computing capacity.When capacity at both sites can be used (Application Schemes 2 and 3), one must be careful to not over-subscribe the computing capacity.
110. page 110 hp enterprise technical symposium filename.ppt Response time vs. Utilization Queuing theory allows us to predict the impact on response time as a resource is increasingly busy. In this graph, response time for an idle resource is set at 1. At 50% utilization, response time has doubled, and at 90% utilization, response time has gone to 10 times. To prevent response times from becoming unacceptable, it is advisable to keep resource utilization below the knee of the curve, which starts to occur at about 70%.Queuing theory allows us to predict the impact on response time as a resource is increasingly busy. In this graph, response time for an idle resource is set at 1. At 50% utilization, response time has doubled, and at 90% utilization, response time has gone to 10 times. To prevent response times from becoming unacceptable, it is advisable to keep resource utilization below the knee of the curve, which starts to occur at about 70%.
111. page 111 hp enterprise technical symposium filename.ppt Response time vs. Utilization: Impact of losing 1 site If you take advantage of the capacity of both sites to run production workload, be careful not to fall into the trap of not providing additional capacity to keep up with increases in workload, because if one of the two sites fails, you will have only one-half the capacity, utilization will double, and response times may become intolerable. To keep utilization under the knee of the curve, or about 70%, after a site loss, one needs to procure additional computing capacity whenever utilization gets over about ˝ that, or 35%, under normal conditions, when two sites are present.If you take advantage of the capacity of both sites to run production workload, be careful not to fall into the trap of not providing additional capacity to keep up with increases in workload, because if one of the two sites fails, you will have only one-half the capacity, utilization will double, and response times may become intolerable. To keep utilization under the knee of the curve, or about 70%, after a site loss, one needs to procure additional computing capacity whenever utilization gets over about ˝ that, or 35%, under normal conditions, when two sites are present.
112. page 112 hp enterprise technical symposium filename.ppt Testing Separate test environment is very helpful, and highly recommended
Good practices require periodic testing of a simulated disaster. Allows you to:
Validate your procedures
Train your people
113. page 113 hp enterprise technical symposium filename.ppt Business Continuity Ability for the entire business, not just IT, to continue operating despite a disaster
114. page 114 hp enterprise technical symposium filename.ppt Business Continuity:Not just IT Not just computers and data:
People
Facilities
Communications
Networks
Telecommunications
Transportation
115. page 115 hp enterprise technical symposium filename.ppt Real-Life Examples Credit Lyonnais fire in Paris, May 1996
Data replication to a remote site saved the data
Fire occurred over a weekend, and DR site plus quick procurement of replacement hardware allowed bank to reopen on Monday
116. page 116 hp enterprise technical symposium filename.ppt Real-Life Example:Credit Lyonnais, Paris
117. page 117 hp enterprise technical symposium filename.ppt Real-Life Example:Credit Lyonnais, Paris
118. page 118 hp enterprise technical symposium filename.ppt Real-Life Example:Credit Lyonnais, Paris
119. page 119 hp enterprise technical symposium filename.ppt Real-Life Examples:Online Stock Brokerage 2 a.m. on Dec. 29, 1999, an active stock market trading day
UPS Audio Alert alarmed security guard on his first day on the job, who pressed emergency power-off switch, taking down the entire datacenter
120. page 120 hp enterprise technical symposium filename.ppt Real-Life Examples:Online Stock Brokerage Disaster-tolerant cluster continued to run at opposite site; no disruption
Ran through that trading day on one site alone
Re-synchronized data in the evening after trading hours
Procured replacement security guard by the next day
121. page 121 hp enterprise technical symposium filename.ppt Real-Life Examples: Commerzbank on 9/11 Datacenter near WTC towers
Generators took over after power failure, but dust & debris eventually caused A/C units to fail
Data replicated to remote site 30 miles away
One server continued to run despite 40°C temperatures, running off the copy of the data at the opposite site after the local disk drives had succumbed to the heat
122. page 122 hp enterprise technical symposium filename.ppt Real-Life Examples: Online Brokerage Dual inter-site links
From completely different vendors
Both vendors sub-contracted to same local RBOC for local connections at both sites
Result: One simultaneous failure of both links within 4 years time
123. page 123 hp enterprise technical symposium filename.ppt Real-Life Examples: Online Brokerage Dual inter-site links from different vendors
Both used fiber optic cables across the same highway bridge
El Nińo caused flood which washed out bridge
Vendors SONET rings wrapped around the failure, but latency skyrocketed and cluster performance suffered
124. page 124 hp enterprise technical symposium filename.ppt Real-Life Examples: Online Brokerage Vendor provided redundant storage controller hardware
Despite redundancy, a controller pair failed, preventing access to the data behind the controllers
Host-based mirroring was in use, and the cluster continued to run using the copy of the data at the opposite site
125. page 125 hp enterprise technical symposium filename.ppt Real-Life Examples: Online Brokerage Dual inter-site links from different vendors
Both vendors links did fail sometimes
Redundancy and automatic failover masks failures
Monitoring is crucial
One outage lasted 6 days before discovery
126. page 126 hp enterprise technical symposium filename.ppt Speaker Contact Info Keith Parris
E-mail: keith.parris@hp.com or parris@encompasserve.org or keithparris@yahoo.com
Web: http://www.geocities.com/keithparris/ and http://encompasserve.org/~kparris/
127. page 127 hp enterprise technical symposium filename.ppt Speaker Contact Info:
Keith Parris
E-mail: parris@encompasserve.org
or keithparris@yahoo.com
or Keith.Parris@hp.com
Web: http://encompasserve.org/~parris/
and http://www.geocities.com/keithparris/