SESSION CODE: EXL407. Scott Schnoll, Principal Technical Writer, Microsoft Corporation. Exchange Server 2010 High Availability Deep Dive. (c) 2011 Microsoft. All rights reserved.
Agenda • Exchange Server 2010 High Availability Deep Dive • Database Availability Group Networks • Active Manager • Best Copy Selection • Datacenter Activation Coordination Mode
Exchange Server 2010 High Availability Deep Dive: Database Availability Group Networks
DAG Networks • A DAG network is a collection of one or more subnets • There are two types of DAG networks • MAPI Network - connects DAG members to network resources (Active Directory, other Exchange servers, DNS, etc.) • Registered in DNS / DNS configured • Uses default gateway • Client for Microsoft Networks/File and Print Sharing enabled • Replication Network - used for/by continuous replication (log shipping and seeding) • Not registered in DNS / DNS not configured • Typically no default gateway • Client for Microsoft Networks/File and Print Sharing disabled
DAG Networks • All DAGs must have: • Exactly one MAPI network • Zero or more Replication networks • Separate network(s) on separate subnet(s) • When there are multiple replication networks, a least recently used (LRU) algorithm determines which one is used • DAG networks are automatically created when a Mailbox server is added to the DAG • Based on the cluster's enumeration of networks • Cluster enumeration is based on subnet • One cluster network is created for each subnet
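To make the LRU rule concrete, here is a minimal sketch of least recently used selection across several replication networks. The class name and network names are hypothetical. The sketch models only the ordering idea, not how the Replication service tracks network usage internally.

```python
from collections import OrderedDict


class ReplicationNetworkSelector:
    """Hypothetical least recently used (LRU) picker for replication networks."""

    def __init__(self, networks):
        # Front of the OrderedDict = least recently used, back = most recently used.
        self._order = OrderedDict((name, None) for name in networks)

    def next_network(self):
        if not self._order:
            # No replication networks defined: this sketch refuses to pick one.
            raise LookupError("DAG has no replication networks")
        name = next(iter(self._order))  # least recently used network
        self._order.move_to_end(name)   # it is now the most recently used
        return name


selector = ReplicationNetworkSelector(["DAGNetwork02", "DAGNetwork03"])
print([selector.next_network() for _ in range(3)])
# ['DAGNetwork02', 'DAGNetwork03', 'DAGNetwork02']
```

Each pick rotates the chosen network to the back, so traffic alternates across the available replication networks.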
DAG Networks • Maximum round-trip latency between all DAG members must be 500 ms or less • Regardless of the latency of the solution, customers should validate that the network between all DAG members can meet the data protection and availability goals of the deployment • Meeting those goals may require increasing the number of databases or decreasing the number of mailboxes per database
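The 500 ms limit applies to every pair of DAG members, not just to neighbors. The sketch below checks measured round-trip times against that limit and also reports any member pairs that were never measured. The function name and sample figures are hypothetical, and how you collect the measurements is up to you. Passing this check does not by itself show that the network meets the deployment's data protection goals.

```python
from itertools import combinations

MAX_RTT_MS = 500  # "500 ms or less" between all DAG members


def check_dag_latency(members, rtt_ms):
    """Return (over_budget, unmeasured) lists of DAG member pairs.

    rtt_ms maps (server_a, server_b) to a measured round-trip time in ms;
    either orientation of a pair is accepted.
    """
    over_budget, unmeasured = [], []
    for a, b in combinations(sorted(members), 2):
        rtt = rtt_ms.get((a, b), rtt_ms.get((b, a)))
        if rtt is None:
            unmeasured.append((a, b))
        elif rtt > MAX_RTT_MS:
            over_budget.append((a, b))
    return over_budget, unmeasured


samples = {("Server1", "Server2"): 2.1, ("Server1", "Server3"): 180.0,
           ("Server3", "Server2"): 620.0}
print(check_dag_latency(["Server1", "Server2", "Server3"], samples))
# ([('Server2', 'Server3')], [])
```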
DAG Networks • Collapse subnets into two DAG networks and disable replication for the MAPI network:
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
DAG Networks • Automatic detection occurs only when members are added to the DAG • If networks are added after a member is added, you must perform discovery: Set-DatabaseAvailabilityGroup -Identity <DAGName> -DiscoverNetworks • DAG network configuration is persisted in the cluster registry • HKLM\Cluster\Exchange\DAG Network • DAG networks include built-in encryption and compression • Encryption: Kerberos SSP EncryptMessage/DecryptMessage APIs • Compression: Microsoft XPRESS, based on the LZ77 algorithm
DAG Networks • Block cross-network communication to minimize heartbeat traffic [Figure: Subnets 1 through 4, with some paths between them marked Allowed and others marked Blocked]
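One reading of the blocking rule is that traffic between subnets in the same DAG network is allowed, while traffic between different DAG networks is blocked. The sketch below encodes that interpretation. It reuses the subnets from the DAG2 collapse example, but the /24 prefix lengths are an assumption (the example omits masks), and the helper names are hypothetical.

```python
import ipaddress

# Hypothetical layout based on the DAG2 example; /24 prefixes are assumed.
DAG_NETWORKS = {
    "DAGNetwork01 (MAPI)": ["192.168.0.0/24", "192.168.1.0/24"],
    "DAGNetwork02 (Replication)": ["10.0.0.0/24", "10.0.1.0/24"],
}


def dag_network_of(address):
    """Return the DAG network whose subnets contain address, or None."""
    ip = ipaddress.ip_address(address)
    for name, subnets in DAG_NETWORKS.items():
        if any(ip in ipaddress.ip_network(s) for s in subnets):
            return name
    return None


def is_allowed(src, dst):
    """Allow traffic only when both endpoints are on the same DAG network."""
    src_net, dst_net = dag_network_of(src), dag_network_of(dst)
    return src_net is not None and src_net == dst_net


print(is_allowed("192.168.0.10", "192.168.1.20"))  # True: MAPI to MAPI
print(is_allowed("192.168.0.10", "10.0.1.20"))     # False: MAPI to Replication
```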
DAG Networks • If using iSCSI storage, configure DAG and cluster to ignore iSCSI networks • Set-DatabaseAvailabilityGroupNetwork -Identity <DAGNetworkName> -ReplicationEnabled:$false -IgnoreNetwork:$true • Cluster network <ClusterNetworkName> /prop Role=0
DAG Networks • When a DAG spans multiple subnets, the DAG needs an IP address on the MAPI network for each subnet • In site resilience configurations, use DHCP to assign IP addresses on the Replication network • DHCP can deliver the static routes that are typically required • If using static IP addresses, use netsh to configure static routes • Configure a DNS TTL on service access connection records that is consistent with your SLA, e.g. about 5 minutes for a one-hour RTO SLA
Exchange Server 2010 High Availability Deep Dive: Active Manager
Active Manager • What are the three Active Manager roles? • Standalone • PAM (Primary Active Manager) • SAM (Standby Active Manager) • Role state transitions are logged in the Microsoft-Exchange-HighAvailability/Operational event log (Crimson Channel)
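Exchange 2010 documentation describes which server holds each role:

* Standalone runs on a Mailbox server that is not in a DAG.
* PAM runs on the DAG member that owns the cluster group (the cluster quorum resource).
* SAM runs on every other DAG member.

The sketch below assigns roles from those facts. The function and server names are hypothetical, and it ignores role transitions such as the PAM moving when cluster group ownership changes.

```python
def assign_roles(mailbox_servers, dag_members, cluster_group_owner):
    """Map each Mailbox server to its Active Manager role (simplified)."""
    roles = {}
    for server in mailbox_servers:
        if server not in dag_members:
            roles[server] = "Standalone"  # not a member of any DAG
        elif server == cluster_group_owner:
            roles[server] = "PAM"         # owns the cluster group
        else:
            roles[server] = "SAM"         # any other DAG member
    return roles


print(assign_roles(["MBX1", "MBX2", "MBX3", "MBX4"], {"MBX1", "MBX2", "MBX3"}, "MBX2"))
# {'MBX1': 'SAM', 'MBX2': 'PAM', 'MBX3': 'SAM', 'MBX4': 'Standalone'}
```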
Active Manager Functionality • Mount and Dismount Databases • Provide Database Availability Information • Provide Interface for Administrative Tasks • Monitor for Failures • Maintain Database and Server State Information
AutoMount on DAG Members • In a DAG, all AutoMount operations are coordinated through the PAM • AutoMount operations occur: • When the first server in the DAG is initialized • When the ownership of the PAM role is changed
AutoMount on DAG Members • Checks msExchMasterServerOrAvailabilityGroup to determine all databases hosted on the DAG • Checks if database can be mounted on startup • If msExchEDBOffline is TRUE, stop processing • If msExchEDBOffline is FALSE, proceed with processing
AutoMount on DAG Members • Checks persistent database information stored in cluster registry • Determines if database is mounted on another DAG member • If the database is mounted on another server, take no action • If the database is not mounted on another server, proceed
AutoMount on DAG Members • Checks AdminDismount in cluster registry: • If AdminDismount is TRUE, take no action • If AdminDismount is FALSE, proceed • Checks persistent database state information in cluster registry for server on which database was last mounted • If server available, issue mount request to Information Store on that server • If server not available or property not set, issue mount request to next server in sorted list
AutoMount on DAG Members • If AutoMount operation succeeds: • Update persistent database state information stored in cluster database • Propagate information to all other DAG members
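The AutoMount checks above can be read as a short decision pipeline that the PAM runs for each database. The sketch below is a simplification under several assumptions:

* `Database` and `automount` are hypothetical names, and the AD attributes and cluster-registry values are modeled as plain fields.
* `mount` is a placeholder for the RPC to the Information Store.
* The "sorted list" of servers is passed in already sorted.
* Propagation to other DAG members is not modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Database:
    """Simplified stand-in for the AD attributes and cluster-registry state."""
    name: str
    master: str                            # msExchMasterServerOrAvailabilityGroup
    edb_offline: bool = False              # msExchEDBOffline
    admin_dismount: bool = False           # AdminDismount (cluster registry)
    mounted_on: Optional[str] = None       # server the database is mounted on now
    last_mounted_on: Optional[str] = None  # where it was last mounted
    sorted_servers: List[str] = field(default_factory=list)  # assumed pre-sorted


def automount(dag, databases, available, mount):
    """Hypothetical PAM AutoMount pass; returns {database: server} for new mounts.

    available: set of DAG members that are currently up
    mount:     callable(server, name) -> bool, a placeholder for the Store RPC
    """
    mounted = {}
    for db in databases:
        if db.master != dag or db.edb_offline:
            continue  # not hosted on this DAG, or must not mount at startup
        if db.mounted_on is not None or db.admin_dismount:
            continue  # already mounted elsewhere, or dismounted by an administrator
        if db.last_mounted_on in available:
            target = db.last_mounted_on
        else:
            # Last server unavailable or never recorded: next server in the sorted list.
            target = next((s for s in db.sorted_servers if s in available), None)
        if target is not None and mount(target, db.name):
            # Update the persistent state (propagation is not modeled here).
            db.mounted_on = db.last_mounted_on = target
            mounted[db.name] = target
    return mounted
```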
Mount / Dismount Database Copy • Mount Database • An administrator action invoked through a task • The last part of a move operation • Dismount Database • An administrator action invoked through a task • The first part of a move operation
Mount Database – DAG Member • Task initiates an RPC to a member of the DAG • If the server contacted is not the PAM, the task is referred to the PAM • If the server is the PAM, continue with no referral • Checks msExchMasterServerOrAvailabilityGroup to ensure the database is hosted in the DAG • If the database is hosted in the DAG, proceed • If the database is not hosted in the DAG, the task fails with an error
Mount Database – DAG Member • Checks if the database is already mounted • If already mounted, task fails • If not already mounted, task continues • PAM invokes callback • This invokes a pre-check for the database mount operation • Persistent database state updated to show mount Initiated
Mount Database – DAG Member • PAM invokes RPC call to Information Store to mount database • If mount fails, task fails • If mount succeeds, task completes successfully • Persistent database state updated to record results of operation and propagated to other members
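The PAM side of the mount task, after any referral, can be sketched as follows. The function name and the state labels ("MountInitiated", "Mounted", "MountFailed") are hypothetical. The callables are placeholders for the Information Store RPC and for propagation to other members. The pre-check callback is left out.

```python
def pam_mount(database, dag, master_of, state, store_mount, propagate):
    """Hypothetical PAM-side mount task; returns (succeeded, message).

    Assumes any referral has already happened, so this runs on the PAM.
    master_of:   database -> msExchMasterServerOrAvailabilityGroup value
    state:       dict standing in for persistent database state (cluster registry)
    store_mount: callable(database) -> bool, a placeholder for the Store RPC
    propagate:   callable(database, state) that pushes state to other DAG members
    """
    if master_of.get(database) != dag:
        return False, f"{database} is not hosted in {dag}"
    if state.get(database) == "Mounted":
        return False, f"{database} is already mounted"
    # The pre-check callback would run here; this sketch skips it.
    state[database] = "MountInitiated"
    succeeded = store_mount(database)
    # The result is recorded and propagated whether or not the mount worked.
    state[database] = "Mounted" if succeeded else "MountFailed"
    propagate(database, state[database])
    return succeeded, "mounted" if succeeded else f"mount of {database} failed"
```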
Dismount Database – DAG Member • Task initiates call to PAM or is referred to PAM • PAM checks that msExchMasterServerOrAvailabilityGroup value matches the DAG • PAM verifies that database is mounted in the DAG by checking persistent database state information stored in registry • If database is mounted, task proceeds • If database is dismounted, task fails
Dismount Database – DAG Member • PAM updates persistent state information in cluster database to show state Initiated • PAM makes RPC call to Information Store on DAG member and invokes dismount • If dismount operation succeeds, persistent database state information stored in cluster database is updated • If dismount operation fails, task fails
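The dismount task follows the same pattern in reverse. In this sketch the function name and the state labels are hypothetical, and `store_dismount` is a placeholder for the Information Store RPC. The slides do not say how the state is recorded when the dismount fails, so the sketch leaves it at the "initiated" value.

```python
def pam_dismount(database, dag, master_of, state, store_dismount):
    """Hypothetical PAM-side dismount task; returns (succeeded, message).

    master_of:      database -> msExchMasterServerOrAvailabilityGroup value
    state:          dict standing in for persistent database state (cluster registry)
    store_dismount: callable(database) -> bool, a placeholder for the Store RPC
    """
    if master_of.get(database) != dag:
        return False, f"{database} is not hosted in {dag}"
    if state.get(database) != "Mounted":
        return False, f"{database} is not mounted"
    state[database] = "DismountInitiated"
    if not store_dismount(database):
        # Task fails; the state is left as "DismountInitiated" in this sketch.
        return False, f"dismount of {database} failed"
    state[database] = "Dismounted"
    return True, "dismounted"
```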
Auto Dismount – DAG Member • Occurs when a DAG loses quorum • All DAG members are running (but may not be participating in the cluster) • Databases dismounted as quickly as possible to avoid split-brain • Information Store service is terminated
Auto Dismount – DAG Member • Dismount operation should attempt to update database state information in cluster database • This is the only case where a database operation occurs on a server other than the PAM
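Auto dismount is triggered by loss of quorum. Exchange 2010 DAGs use Windows failover clustering quorum in one of two models:

* Node Majority, for an odd number of members.
* Node and File Share Majority, for an even number of members, where the witness server contributes one extra vote.

The sketch below applies the strict-majority rule and dismounts local databases when it fails. The function names are hypothetical, and `dismount` is a placeholder for the local dismount.

```python
def total_votes(member_count, uses_witness):
    # One vote per DAG member, plus one for the witness when it is used.
    return member_count + (1 if uses_witness else 0)


def has_quorum(reachable_votes, member_count, uses_witness):
    # A strict majority of all votes is required; exactly half is not enough.
    return 2 * reachable_votes > total_votes(member_count, uses_witness)


def handle_partition(reachable_votes, member_count, uses_witness, mounted, dismount):
    """If quorum is lost, dismount every local database as quickly as possible."""
    if has_quorum(reachable_votes, member_count, uses_witness):
        return []
    for db in mounted:
        dismount(db)  # local operation; the PAM is not involved in this case
    return list(mounted)
```

For example, in a four-member DAG with a witness (five votes), a partition that sees two members plus the witness keeps quorum. A partition that sees only two members loses it.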
Active Manager – Move Database • Move Database • An administrator action invoked by a task • Automatic operation initiated by the PAM (failover) • Begins with a Dismount operation and ends with a Mount operation
Exchange Server 2010 High Availability Deep Dive: Best Copy Selection
Best Copy Selection • Process of finding the best copy of an individual database to activate, given a list of potential copies for activation and their status • Active Manager selects the "best" copy to become the new active copy when the existing active copy fails or when an administrator performs a targetless switchover
Best Copy Selection – RTM • Sorts copies by copy queue length to minimize data loss, using activation preference as a secondary sorting key if necessary • Selects from the sorted list based on which set of criteria each copy meets • Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy
Best Copy Selection – SP1 • Sorts copies by activation preference when AutoDatabaseMountDial is set to Lossless • Otherwise, sorts copies by copy queue length, using activation preference as a secondary sorting key if necessary • Selects from the sorted list based on which set of criteria each copy meets • Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy
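The RTM and SP1 sorting rules differ only in their sort key. The sketch below shows both. `CopyStatus`, `sort_candidates`, and the sample queue lengths and preferences are hypothetical values chosen to reproduce the orderings in the walkthrough that follows.

```python
from dataclasses import dataclass


@dataclass
class CopyStatus:
    server: str
    copy_queue_length: int      # log files waiting to be copied to this copy
    activation_preference: int  # 1 = most preferred


def sort_candidates(copies, release, dial):
    """Order candidate copies per the RTM and SP1 rules (simplified)."""
    if release == "SP1" and dial == "Lossless":
        return sorted(copies, key=lambda c: c.activation_preference)
    # RTM (any dial), or SP1 with another dial: copy queue length first,
    # activation preference as the tie-breaker.
    return sorted(copies, key=lambda c: (c.copy_queue_length, c.activation_preference))


copies = [CopyStatus("Server2", 5, 2), CopyStatus("Server3", 2, 3), CopyStatus("Server4", 40, 4)]
print([c.server for c in sort_candidates(copies, "RTM", "Lossless")])
# ['Server3', 'Server2', 'Server4']
print([c.server for c in sort_candidates(copies, "SP1", "Lossless")])
# ['Server2', 'Server3', 'Server4']
```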
Best Copy Selection • Is the database mountable? • Is the copy queue length <= AutoDatabaseMountDial? • If yes, the database is marked as the current active copy and a mount request is issued • If not, the next best copy is tried (if one is available) • During best copy selection, any servers that are unreachable or "activation blocked" are ignored
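The selection loop walks the sorted list, skips ignored servers, and picks the first copy that passes both checks. The log-count limits below are the documented meanings of the Exchange 2010 AutoDatabaseMountDial values: Lossless allows 0 missing logs, GoodAvailability 6, and BestAvailability 12. The function name and the dict-based copy records are hypothetical.

```python
DIAL_MAX_LOST_LOGS = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}


def choose_copy(sorted_copies, dial, reachable, activation_blocked):
    """Return the server of the first copy that can be activated, or None.

    sorted_copies: dicts with "server", "cql" (copy queue length), "mountable"
    """
    limit = DIAL_MAX_LOST_LOGS[dial]
    for copy in sorted_copies:
        if copy["server"] not in reachable or copy["server"] in activation_blocked:
            continue  # ignored during best copy selection
        if copy["mountable"] and copy["cql"] <= limit:
            return copy["server"]  # becomes the active copy; mount is requested
    return None  # no copy qualifies


candidates = [
    {"server": "Server3", "cql": 2, "mountable": True},
    {"server": "Server2", "cql": 5, "mountable": True},
    {"server": "Server4", "cql": 40, "mountable": True},
]
print(choose_copy(candidates, "BestAvailability", {"Server2", "Server3", "Server4"}, set()))
# Server3
```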
Best Copy Selection – RTM • Four copies of DB1 • DB1 currently active on Server1 [Figure: Server1, Server2, Server3, and Server4 each host a copy of DB1; the copy on Server1 is marked X (failed)]
Best Copy Selection – RTM • Sort the list of available copies by copy queue length (using activation preference as a secondary sort key if necessary): • Server3\DB1 • Server2\DB1 • Server4\DB1
Best Copy Selection – RTM • Only two copies meet the first set of criteria for activation (copy queue length (CQL) < 10; replay queue length (RQL) < 50; content index (CI) = Healthy): • Server3\DB1 (lowest copy queue length, tried first) • Server2\DB1 • Server4\DB1
Best Copy Selection – SP1 • Four copies of DB1 • DB1 currently active on Server1 • AutoDatabaseMountDial set to Lossless [Figure: Server1, Server2, Server3, and Server4 each host a copy of DB1; the copy on Server1 is marked X (failed)]
Best Copy Selection – SP1 • Sort the list of available copies by activation preference: • Server2\DB1 (lowest preference value, tried first) • Server3\DB1 • Server4\DB1
Best Copy Selection • After Active Manager determines the best copy to activate • The Replication service on the target server attempts to copy missing log files from the source (ACLL) • If successful, then the database will mount with zero data loss • If unsuccessful (lossy failure), then the database will mount based on the AutoDatabaseMountDial setting • If data loss is outside of dial setting, next copy will be tried
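The ACLL outcome decides how many logs are lost, and the dial then decides whether that loss is acceptable. The sketch makes two assumptions:

* ACLL is all-or-nothing. If the previous active copy's logs are still reachable, every missing log is copied.
* Otherwise, the loss is the gap between the last generation written by the old active copy and the last generation the candidate copied.

The dial limits are the documented Exchange 2010 values. The function name and generation numbers are hypothetical.

```python
DIAL_MAX_LOST_LOGS = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}


def try_activate(candidates, last_generated, source_reachable, dial):
    """Return (server, lost_logs) for the first candidate allowed to mount, or None.

    candidates:       (server, last_copied_generation) pairs in best-copy order
    last_generated:   last log generation written by the failed active copy
    source_reachable: whether ACLL can still read logs from the previous active copy
    """
    limit = DIAL_MAX_LOST_LOGS[dial]
    for server, last_copied in candidates:
        if source_reachable:
            lost = 0  # ACLL copied every missing log: zero data loss
        else:
            lost = last_generated - last_copied  # lossy failure
        if lost <= limit:
            return server, lost
        # Data loss is outside the dial setting: try the next copy.
    return None
```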
Best Copy Selection • If an activated database copy is mounted • It will generate new log files (using the same log generation sequence) • Transport Dumpster requests will be initiated for the mounted database to recover lost messages • When original server or database recovers, it will run through divergence detection and either perform an incremental resync or require a full reseed
Exchange Server 2010 High Availability Deep Dive: Datacenter Activation Coordination Mode
Datacenter Activation Coordination Mode • DAC mode is a property of a DAG • Acts as an application-level form of quorum • Controls whether or not a Mailbox server attempts to mount its active databases on startup • Designed to prevent multiple copies of same database mounting on different members due to loss of network (split brain) • Also enables use of Site Resilience tasks • Stop-DatabaseAvailabilityGroup • Restore-DatabaseAvailabilityGroup • Start-DatabaseAvailabilityGroup
Datacenter Activation Coordination Mode • RTM: DAC mode is intended for DAGs that have three or more members and are extended across two Active Directory sites • Don't enable it for two-member DAGs where each member is in a different AD site, or for DAGs where all members are in the same AD site • SP1: DAC mode can be enabled for all DAGs • If using Third Party Replication (TPR) mode, check with your vendor for guidance on DAC mode
Datacenter Activation Coordination Mode • Uses Datacenter Activation Coordination Protocol (DACP) • A bit in memory (in MSExchangeRepl.exe) set to either: • 0 = can’t mount • 1 = can mount
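Microsoft's DAC mode documentation describes how the DACP bit is set at startup:

* The bit starts at 0 each time Active Manager starts.
* The member sets it to 1 only if another DAG member it can reach already has its bit set to 1, or if it can reach every other DAG member.

The class below is a simplified, hypothetical model of that rule. It treats "reachable" as a given dict of peer bits instead of performing real queries.

```python
class DacpMember:
    """Hypothetical model of the DACP bit kept in memory by MSExchangeRepl.exe."""

    def __init__(self, name):
        self.name = name
        self.dacp_bit = 0  # 0 = can't mount, 1 = can mount

    def start(self, dag_members, reachable_bits):
        """Run the startup check; reachable_bits maps reachable peers to their bit."""
        self.dacp_bit = 0  # the bit is always 0 when the service starts
        others = set(dag_members) - {self.name}
        peer_can_mount = any(bit == 1 for bit in reachable_bits.values())
        all_reachable = others <= set(reachable_bits)
        if peer_can_mount or all_reachable:
            self.dacp_bit = 1
        return self.dacp_bit == 1  # True: this server may mount its databases


member = DacpMember("MBX1")
print(member.start(["MBX1", "MBX2", "MBX3"], {"MBX2": 0}))  # False: MBX3 unreachable
```

A member isolated from part of the DAG therefore stays at 0 and does not mount, which is how DAC mode prevents split brain.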