Exchange 2010 High Availability
Exchange Deployment Planning Services
Exchange 2010 High Availability
• Ideal audience for this workshop
  • Messaging SME
  • Network SME
  • Security SME
Exchange 2010 High Availability
• During this session, focus on the following:
  • How will we leverage this functionality in our organization?
  • What availability and service level requirements do we have around our messaging solution?
Agenda
• Review of Exchange Server 2007 Availability Solutions
• Overview of Exchange 2010 High Availability
• Exchange 2010 High Availability Fundamentals
• Exchange 2010 High Availability Deep Dive
• Exchange 2010 Site Resilience
Exchange Server 2007 Single Copy Clustering
• SCC out-of-box provides little high availability value
  • On Store failure, SCC restarts the Store on the same machine; no CMS failover
  • SCC does not automatically recover from storage failures
  • SCC does not protect your data, your most valuable asset
  • SCC does not protect against site failures
  • SCC's redundant network is not leveraged by CMS
• Conclusion
  • SCC only provides protection from server hardware failures and bluescreens, the relatively easy components to recover
  • Supports rolling upgrades without losing redundancy
Exchange Server 2007 Continuous Replication
[Diagram: 1. Copy logs → 2. Inspect logs → 3. Replay logs into the database copy]
• Log shipping to a local disk (Local)
• Log shipping within a cluster (Cluster)
• Log shipping to a standby server or cluster (Standby)
Exchange Server 2007 HA Solution (CCR + SCR)
[Diagram: two CCR clusters in AD site San Jose log shipping via SCR to a standby server in AD site Dallas; Client Access Servers in each site serve Outlook (MAPI), OWA, ActiveSync, and Outlook Anywhere clients]
• Manual “activation” of remote mailbox server
• SCR managed separately; no GUI
• Clustering knowledge required
• Mailbox server role can’t co-exist with other roles
• Database failure requires server failover
Exchange 2010 High Availability Goals
• Reduce complexity
• Reduce cost
• Native solution with no single point of failure
• Improve recovery times
• Support larger mailboxes
• Support large scale deployments
Make High Availability Exchange deployments mainstream!
Exchange 2010 High Availability Architecture
[Diagram: a DAG spanning AD sites San Jose and Dallas, with copies of each database distributed across six Mailbox servers]
• All clients connect via CAS servers
• Failover managed within Exchange
• Database (DB) centric failover
• Easy to extend across sites
Exchange 2010 High Availability Fundamentals
• Database Availability Group (DAG)
• Server
• Database
• Database Copy
• Active Manager
• RPC Client Access service
Exchange 2010 High Availability Fundamentals: Database Availability Group
• A group of up to 16 servers hosting a set of replicated databases
• Wraps a Windows Failover Cluster
  • Manages servers’ membership in the group
  • Heartbeats servers, quorum, cluster database
• Defines the boundary of database replication
• Defines the boundary of failover/switchover (*over)
• Defines the boundary for the DAG’s Active Manager
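A DAG and its membership can be inspected from the Exchange Management Shell. A minimal sketch, assuming a DAG named DAG1 (a placeholder):

```powershell
# Requires the Exchange Management Shell
# -Status adds live state (e.g., operational servers) to the static configuration
Get-DatabaseAvailabilityGroup -Identity DAG1 -Status |
    Format-List Name,Servers,WitnessServer,WitnessDirectory
```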
Exchange 2010 High Availability Fundamentals: Server
• Unit of membership for a DAG
• Hosts the active and passive copies of multiple mailbox databases
• Executes Information Store, CI, Assistants, etc., services on active mailbox database copies
• Executes replication services on passive mailbox database copies
Exchange 2010 High Availability Fundamentals: Server (Continued)
• Provides the connection point between Information Store and RPC Client Access
• Very few server-level properties relevant to HA
  • Server’s Database Availability Group
  • Server’s Activation Policy
Exchange 2010 High Availability Fundamentals: Mailbox Database
• Unit of *over
• A database has 1 active copy; the active copy can be mounted or dismounted
• Maximum # of passive copies = # of servers in DAG − 1
Exchange 2010 High Availability Fundamentals: Mailbox Database (Continued)
• Database *overs take ~30 seconds
• Server failover/switchover involves moving all active databases to one or more other servers
• Database names are unique across a forest
• Defines properties relevant at the database level
  • GUID: a database’s unique ID
  • EdbFilePath: path at which copies are located
  • Servers: list of servers hosting copies
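These database-level properties and operations map to shell cmdlets. A hedged sketch, assuming a database named DB1 and a DAG member EXMBX2 (placeholders):

```powershell
# View database-level properties (GUID, EDB file path, hosting servers)
Get-MailboxDatabase -Identity DB1 | Format-List Guid,EdbFilePath,Servers

# Switchover: move the active copy of DB1 to another DAG member
Move-ActiveMailboxDatabase -Identity DB1 -ActivateOnServer EXMBX2
```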
Exchange 2010 High Availability Fundamentals: Active/Passive vs. Source/Target
• Availability terms
  • Active: selected to provide email services to clients
  • Passive: available to provide email services to clients if the active fails
• Replication terms
  • Source: provides data for copying to a separate location
  • Target: receives data from the source
Exchange 2010 High Availability Fundamentals: Mailbox Database Copy
• Scope of replication
• A copy is either the source or the target of replication at any given time
• A copy is either active or passive at any given time
• Only 1 copy of each database in a DAG is active at a time
• A server may not host more than 1 copy of any database
Exchange 2010 High Availability Fundamentals: Mailbox Database Copy (Continued)
Defines properties applicable to an individual database copy:
• Copy status: Healthy, Initializing, Failed, Mounted, Dismounted, Disconnected, Suspended, FailedAndSuspended, Resynchronizing, Seeding
• CopyQueueLength
• ReplayQueueLength
• ActiveCopy
• ActivationSuspended
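These copy-level properties can be monitored with Get-MailboxDatabaseCopyStatus; a sketch assuming a database named DB1 (a placeholder):

```powershell
# Show status and queue lengths for every copy of DB1 across the DAG
Get-MailboxDatabaseCopyStatus -Identity DB1\* |
    Format-Table Name,Status,CopyQueueLength,ReplayQueueLength,ActiveCopy
```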
Exchange 2010 High Availability Fundamentals: Active Manager
• Exchange-aware resource manager (high availability’s brain)
• Runs on every server in the DAG
• Manages which copies should be active and which should be passive
• Definitive source of information on where a database is active or mounted
  • Provides this information to other Exchange components (e.g., RPC Client Access and Hub Transport)
  • Information stored in the cluster database
Exchange 2010 High Availability Fundamentals: Active Manager (Continued)
• Active Directory is still the primary source for configuration info
• Active Manager is the primary source for changeable state information (such as active and mounted)
• The Replication service monitors the health of all mounted databases, and monitors ESE for I/O errors or failures
Exchange 2010 High Availability Fundamentals: Continuous Replication
Continuous replication has the following basic steps:
1. Database copy seeding of target
2. Log copying from source to target
3. Log inspection at target
4. Log replay into database copy
Exchange 2010 High Availability Fundamentals: Database Seeding
There are several ways to seed the target instance:
• Automatic seeding
• Update-MailboxDatabaseCopy cmdlet
  • Can be performed from active or passive copies
• Manually copy the database
• Backup and restore (VSS)
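A typical reseed of a failed copy uses the Update-MailboxDatabaseCopy cmdlet; a hedged sketch with placeholder names (DB1, EXMBX2, EXMBX3):

```powershell
# Suspend the failed copy, then reseed it from scratch
Suspend-MailboxDatabaseCopy -Identity DB1\EXMBX2 -Confirm:$false
Update-MailboxDatabaseCopy -Identity DB1\EXMBX2 -DeleteExistingFiles

# Seeding can also be sourced from a passive copy to offload the active server
Update-MailboxDatabaseCopy -Identity DB1\EXMBX2 -SourceServer EXMBX3 -DeleteExistingFiles
```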
Exchange 2010 High Availability Fundamentals: Log Shipping
• Log shipping in Exchange 2010 leverages TCP sockets
  • Supports encryption and compression
  • Administrator can set the TCP port to be used
• Replication service on the target notifies the active instance of the next log file it expects
  • Based on the last log file it inspected
• Replication service on the source responds by sending the required log file(s)
• Copied log files are placed in the target’s Inspector directory
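The port and the compression/encryption behavior are DAG-level settings; a sketch assuming a DAG named DAG1 (64327 is an arbitrary example port):

```powershell
# Change the replication TCP port and restrict compression/encryption to cross-subnet traffic
Set-DatabaseAvailabilityGroup -Identity DAG1 -ReplicationPort 64327 `
    -NetworkCompression InterSubnetOnly -NetworkEncryption InterSubnetOnly
```

Note that firewall exceptions for a non-default replication port must be managed separately.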
Exchange 2010 High Availability Fundamentals: Log Inspection
The following actions are performed to verify the log file before replay:
• Physical integrity inspection
• Header inspection
• Any Exx.log files that exist on the target (if it was previously a source) are moved to the ExxOutofDate folder
• If inspection fails, the file is recopied and inspected (up to 3 times)
• If the log file passes inspection, it is moved into the database copy’s log directory
Exchange 2010 High Availability Fundamentals: Log Replay
Log replay has moved to the Information Store. The following validation tests are performed prior to log replay:
• Recalculate the required log generations by inspecting the database header
• Determine the highest generation present in the log directory to ensure that a log file exists
• Compare the highest log generation present in the directory to the highest log file that is required
• Make sure the logs form the correct sequence
• Query the checkpoint file, if one exists
• Replay the log file using a special recovery mode (the undo phase is skipped)
Exchange 2010 High Availability Fundamentals: Lossy Failure Process
In the event of a failure, the following steps occur for the failed database:
• Active Manager determines the best copy to activate
• The Replication service on the target server attempts to copy missing log files from the source (attempt copy last logs, ACLL)
  • If successful, the database mounts with zero data loss
  • If unsuccessful (lossy failure), the database mounts based on the AutoDatabaseMountDial setting
• The mounted database generates new log files (using the same log generation sequence)
• Transport Dumpster requests are initiated for the mounted database to recover lost messages
• When the original server or database recovers, it runs through divergence detection and performs an incremental reseed, or requires a full reseed
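The data loss acceptable on a lossy mount is controlled per server; a sketch assuming a member named EXMBX1 (a placeholder):

```powershell
# Dial values: Lossless (no loss tolerated), GoodAvailability, BestAvailability (most tolerant)
Set-MailboxServer -Identity EXMBX1 -AutoDatabaseMountDial GoodAvailability
```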
Exchange 2010 High Availability Fundamentals: Backups
• Streaming backup APIs for public use have been cut; you must use VSS for backups
• Backup from any copy of the database/logs
  • Always choose the passive (or active) copy
• Backup an entire server
• Designate a dedicated backup server for a given database
• Restore from any of these backup scenarios
[Diagram: a three-server DAG with a VSS requestor backing up database copies]
Multiple Database Copies Enable New Scenarios
• Exchange 2010 HA: site/server/disk failure
• E-mail archive: archiving/compliance
• Extended/protected dumpster retention: recover deleted items
[Diagram: a three-server DAG in which one copy of each database is a 7-14 day lag copy]
Mailbox Database Copies
• Create up to 16 copies of each mailbox database
• Each mailbox database must have a unique name within the organization
• Mailbox database objects are global configuration objects
  • All mailbox database copies use the same GUID
  • No longer connected to specific Mailbox servers
Mailbox Database Copies (Continued)
• Each DAG member can host only one copy of a given mailbox database
• Database path and log folder path for a copy must be identical on all members
• Copies have settable properties
  • Activation Preference
    • RTM: used as the second sort key during best copy selection
    • SP1: used for distributing active databases; used as the primary sort key when using the Lossless mount dial
  • Replay Lag and Truncation Lag
    • Using these features affects your storage design
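These properties are set per copy; a hedged sketch with placeholder names (DB1, EXMBX2, EXMBX3):

```powershell
# Prefer EXMBX2 second during best copy selection
Set-MailboxDatabaseCopy -Identity DB1\EXMBX2 -ActivationPreference 2

# Lag times use dd.hh:mm:ss format; here, a 7-day replay lag and 1-day truncation lag
Set-MailboxDatabaseCopy -Identity DB1\EXMBX3 -ReplayLagTime 7.0:0:0 -TruncationLagTime 1.0:0:0
```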
Lagged Database Copies
• A lagged copy is a passive database copy with a replay lag time greater than 0
  • Lagged copies are for point-in-time protection, but they are not a replacement for point-in-time backups
  • Logical corruption and/or mailbox deletion prevention scenarios
  • Provide a maximum of 14 days of protection
• When should you deploy a lagged copy?
  • Useful only to mitigate a risk
  • May not be needed if deploying a backup solution (e.g., DPM 2010)
• Lagged copies are not HA database copies
  • Lagged copies should never be automatically activated by the system
  • Steps for manual activation are documented at http://technet.microsoft.com/en-us/library/dd979786.aspx
• Lagged copies affect your storage design
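Creating a lagged copy and keeping it out of automatic activation might look like this (database and server names are placeholders):

```powershell
# Add a copy with the maximum 14-day replay lag
Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EXMBX3 -ReplayLagTime 14.0:0:0

# Block automatic activation while replication continues
Suspend-MailboxDatabaseCopy -Identity DB1\EXMBX3 -ActivationOnly
```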
DAG Design: Two Failure Models
• Design for all database copies activated
  • Design for the worst case: the server architecture handles 100 percent of all hosted database copies becoming active
• Design for targeted failure scenarios
  • Design the server architecture to handle the active mailbox load during the worst failure case you plan to handle
    • 1-member failure requires 2 or more HA copies and 2 or more servers
    • 2-member failure requires 3 or more HA copies and 4 or more servers
  • Requires Set-MailboxServer <Server> -MaximumActiveDatabases <Number>
DAG Design: It’s All in the Layout
• Consider this scenario: 8 servers, 40 databases with 2 copies
  • With a single server failure, life is good
  • With a double server failure, life could be good… or bad, depending on how the copies are laid out
• Now consider this scenario: 4 servers, 12 databases with 3 copies, under single and double server failures
[Layout diagrams not reproduced]
Deep Dive on Exchange 2010 High Availability Basics
• Quorum
• Witness
• DAG Lifecycle
• DAG Networks
Quorum
• Used to ensure that only one subset of members is functioning at one time
  • A majority of members must be active and have communications with each other
• Represents a shared view of members (voters and some resources)
• Dual usage
  • Data shared between the voters representing configuration, etc.
  • Number of voters required for the solution to stay running (majority); quorum is a consensus of voters
• When a majority of voters can communicate with each other, the cluster has quorum
• When a majority of voters cannot communicate with each other, the cluster does not have quorum
Quorum (Continued)
• Quorum is not only necessary for cluster functions; it is also necessary for DAG functions
  • In order for a DAG member to mount and activate databases, it must participate in quorum
• Exchange 2010 uses only two of the four available cluster quorum models
  • Node Majority (DAGs with an odd number of members)
  • Node and File Share Majority (DAGs with an even number of members)
• Quorum = (N/2) + 1 (fractions rounded down)
  • 6 members: (6/2) + 1 = 4 votes for quorum (can lose 3 voters)
  • 9 members: (9/2) + 1 = 5 votes for quorum (can lose 4 voters)
  • 13 members: (13/2) + 1 = 7 votes for quorum (can lose 6 voters)
  • 15 members: (15/2) + 1 = 8 votes for quorum (can lose 7 voters)
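The majority arithmetic above can be captured in a small helper (plain PowerShell, not an Exchange cmdlet):

```powershell
# Votes needed for quorum in an N-voter cluster: floor(N/2) + 1
function Get-QuorumMajority([int]$Voters) {
    [math]::Floor($Voters / 2) + 1
}

Get-QuorumMajority 6    # 4 (can lose 3 voters)
Get-QuorumMajority 13   # 7 (can lose 6 voters)
```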
Witness
• A witness is a share on a server external to the DAG that participates in quorum by providing a weighted vote for the DAG member that has a lock on the witness.log file
• Used only by DAGs that have an even number of members
• The witness server does not maintain a full copy of quorum data and is not a member of the DAG or cluster
Witness (Continued)
• Represented by the File Share Witness resource
  • The file share witness cluster resource, directory, and share are automatically created and removed as needed
• Uses the cluster IsAlive check for availability
  • If the witness is not available, cluster core resources are failed and moved to another DAG member
  • If the other DAG member does not bring the witness resource online, the resource remains in a Failed state, with restart attempts every 60 minutes
  • See http://support.microsoft.com/kb/978790 for details on this behavior
Witness (Continued)
• If in a Failed state and needed for quorum, the cluster will try to bring the File Share Witness resource online once
  • If the witness cannot be restarted, it is considered failed and quorum is lost
  • If the witness can be restarted, it is considered successful and quorum is maintained
    • An SMB lock is placed on witness.log
    • Node PAXOS information is incremented and the updated PAXOS tag is written to witness.log
• If in an Offline state and needed for quorum, the cluster will not try to restart it; quorum is lost
Witness (Continued)
• When the witness is no longer needed to maintain quorum, the lock on witness.log is released
• Any member that locks the witness retains the weighted vote (the “locking node”)
  • Members in contact with the locking node are in the majority and maintain quorum
  • Members not in contact with the locking node are in the minority and lose quorum
Witness Server
• No pre-configuration typically necessary
  • Exchange Trusted Subsystem must be a member of the local Administrators group on the witness server if the witness server is not running Exchange 2010
• Cannot be a member of the DAG (present or future)
• Must be in the same Active Directory forest as the DAG
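On a witness server that is not running Exchange, the required group membership can be granted like so (CONTOSO is a placeholder domain name):

```powershell
# Run on the witness server: grant the Exchange Trusted Subsystem group local admin rights
net localgroup Administrators "CONTOSO\Exchange Trusted Subsystem" /add
```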
Witness Server (Continued)
• Can be Windows Server 2003 or later
  • File and Printer Sharing for Microsoft Networks must be enabled
• Replicating the witness directory/share with DFS is not supported
• Not necessary to cluster the witness server
  • If you do cluster the witness server, you must use Windows Server 2008
• A single witness server can be used for multiple DAGs
  • Each DAG requires its own unique witness directory/share
Database Availability Group Lifecycle
• Create a DAG
  New-DatabaseAvailabilityGroup -Name DAG1 -WitnessServer EXHUB1 -WitnessDirectory C:\DAG1FSW -DatabaseAvailabilityGroupIpAddresses 10.0.0.8
  New-DatabaseAvailabilityGroup -Name DAG2 -DatabaseAvailabilityGroupIpAddresses 10.0.0.8,192.168.0.8
• Add Mailbox servers to the DAG
  Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX1
  Add-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX2
• Add a mailbox database copy
  Add-MailboxDatabaseCopy -Identity DB1 -MailboxServer EXMBX2
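The lifecycle also runs in reverse; a hedged teardown sketch using the same placeholder names:

```powershell
# Remove database copies, then evict each member, then delete the (empty) DAG
Remove-MailboxDatabaseCopy -Identity DB1\EXMBX2
Remove-DatabaseAvailabilityGroupServer -Identity DAG1 -MailboxServer EXMBX2
Remove-DatabaseAvailabilityGroup -Identity DAG1
```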