UCC402 Exchange Server 2010 High Availability Deep Dive Scott Schnoll Principal Technical Writer Microsoft Corporation
Agenda • Exchange Server 2010 High Availability Deep Dive • Database Availability Group Networks • Active Manager • Best Copy Selection • Datacenter Activation Coordination Mode
Exchange Server 2010 High Availability Deep Dive: Database Availability Group Networks
DAG Networks • A DAG network is a logical collection of one or more subnets • There are two types of DAG networks • MAPI Network - connects DAG members to network resources (Active Directory, other Exchange servers, DNS, etc.) • Registered in DNS / DNS configured • Uses default gateway • Client for Microsoft Networks/File and Print Sharing enabled • Replication Network - used for/by continuous replication (log shipping and seeding) • Not registered in DNS / DNS not configured • No default gateway • Client for Microsoft Networks/File and Print Sharing disabled
DAG Networks • All DAGs must have: • Exactly one MAPI network • Zero or more Replication networks • Separate network(s) on separate subnet(s) • When there are multiple Replication networks, least recently used (LRU) selection determines which one is used • DAG networks are automatically created when a Mailbox server is added to a DAG • Based on the cluster’s enumeration of networks, which uses subnets • One cluster network is created per subnet
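The "one network per subnet" rule above can be illustrated with a small Python sketch. This is not the cluster's actual enumeration code. The function name, the server names, the /24 masks, and the numbering of networks in first-seen order are all assumptions made for illustration.

```python
import ipaddress

def enumerate_dag_networks(nic_addresses):
    """Group (server, "ip/prefix") pairs by subnet: one network per subnet.

    Networks are numbered in the order their subnet is first seen
    (an assumption of this sketch, not documented behavior).
    """
    by_subnet = {}
    for server, cidr in nic_addresses:
        iface = ipaddress.ip_interface(cidr)
        # iface.network is the subnet the address belongs to, e.g. 192.168.0.0/24
        by_subnet.setdefault(str(iface.network), []).append((server, str(iface.ip)))
    return {
        f"DAGNetwork{index:02d}": {"subnet": subnet, "members": members}
        for index, (subnet, members) in enumerate(by_subnet.items(), start=1)
    }

# Two hypothetical servers, each with a MAPI NIC and a Replication NIC,
# located on four different subnets.
nics = [
    ("EX1", "192.168.0.10/24"), ("EX1", "10.0.0.10/24"),
    ("EX2", "192.168.1.10/24"), ("EX2", "10.0.1.10/24"),
]
for name, net in enumerate_dag_networks(nics).items():
    print(name, net["subnet"], net["members"])
```

With four subnets, four networks are produced. This is the situation in which an administrator would typically collapse them into one MAPI network and one Replication network.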
DAG Networks • Maximum round-trip network latency between all DAG members must be 500 ms or less • Regardless of network latency, validate that the network between all DAG members is capable of satisfying your data protection and availability goals • You may need to increase the number of databases or decrease the number of mailboxes per database to achieve those goals
DAG Networks • Collapse DAG networks and disable replication on MAPI network:
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
DAG Networks • All DAGs extended to multiple datacenters should have the hotfix from KB 2550886 installed • Automatic detection occurs when members are added to the DAG • If NICs are added after a server is a member of the DAG, you must perform discovery • Set-DatabaseAvailabilityGroup <DAGName> -DiscoverNetworks • DAG network configuration is persisted in the cluster database • HKLM\Cluster\Exchange\DAG Network • DAGs include built-in encryption and compression • Encryption: Kerberos SSP EncryptMessage/DecryptMessage APIs • Compression: Microsoft XPRESS, based on the LZ77 algorithm
DAG Networks • When using a single NIC • It is both the MAPI and the Replication network • ReplicationEnabled is $True • When using multiple NICs • One NIC is the MAPI network • ReplicationEnabled is $False • Other NIC(s) are Replication network(s) • Replication uses LRU to pick the Replication network to use • If Replication networks are unavailable, the MAPI network is used
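The selection behavior described above (LRU across Replication networks, with the MAPI network as a fallback) can be sketched in Python. The slides do not describe the internal mechanics, so the class, its method names, and the network names are hypothetical. The sketch shows only the general least-recently-used idea.

```python
class ReplicationNetworkPicker:
    """Pick the least recently used available Replication network.

    Falls back to the MAPI network when no Replication network is available.
    """

    def __init__(self, replication_networks, mapi_network):
        # Ordered from least recently used (front) to most recently used (back).
        self._order = list(replication_networks)
        self._mapi = mapi_network

    def pick(self, available):
        for name in self._order:
            if name in available:
                # Mark as most recently used by moving it to the back.
                self._order.remove(name)
                self._order.append(name)
                return name
        return self._mapi

picker = ReplicationNetworkPicker(["ReplA", "ReplB"], mapi_network="MAPI")
print(picker.pick({"ReplA", "ReplB"}))  # ReplA
print(picker.pick({"ReplA", "ReplB"}))  # ReplB
print(picker.pick(set()))               # MAPI
```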
DAG Networks • Use netsh, router ACLs or other means to block cross-network traffic • [Diagram: MAPI NICs (M) on Subnet 1 and Subnet 3, Replication NICs (R) on Subnet 2 and Subnet 4; traffic paths are labeled Allowed or Blocked]
DAG Networks • If using iSCSI storage, configure DAG and cluster to ignore iSCSI networks • Set-DatabaseAvailabilityGroupNetwork -Identity <DAGNetworkName> -ReplicationEnabled:$false -IgnoreNetwork:$true
DAG Networks • When a DAG spans multiple subnets you need an IP address on the MAPI network for each subnet • Use DHCP in site resilience configurations to assign IP addresses to Replication network • Enables delivery of the typically required static routes • If using static IP addresses, use netsh to configure static routes • Configure a DNS TTL on namespace records consistent with your SLA • For example, use a TTL of 5 minutes for a 60 minute RTO SLA
Exchange Server 2010 High Availability Deep Dive: Active Manager
Active Manager • What are the three Active Manager roles? • Standalone • PAM (Primary Active Manager) • SAM (Standby Active Manager) • Transition of role state logged into Microsoft-Exchange-HighAvailability/Operational event log (Crimson Channel)
Active Manager Functionality • Mount and Dismount Databases • Provide Database Availability Information • Provide Interface for Administrative Tasks • Monitor for and React to Failures • Maintains Database and Server State Information
Mount / Dismount Database Copy • Mount Database • An administrator action invoked through a task • The last part of a move operation • Dismount Database • An administrator action invoked through a task • The first part of a move operation
Auto Dismount – DAG Member • Occurs when a DAG loses quorum • All DAG members are running (but may not be participating in the cluster) • Databases dismounted as quickly as possible to avoid split-brain • Information Store service is terminated
Active Manager – Move Database • Move Database • An administrator action invoked by a task • Automatic operation initiated by the PAM (failover) • Begins with a Dismount operation and ends with a Mount operation
Exchange Server 2010 High Availability Deep Dive: Best Copy Selection
Best Copy Selection • Active Manager selects the “best” copy to become the new active copy when the existing active copy fails, or when an administrator performs a target-less switchover • BCS is the process of finding the best copy of an individual database to activate, given a list of potential copies for activation and their status • During BCS, any servers that are unreachable or activation blocked are ignored
Best Copy Selection – RTM • Sorts copies by copy queue length to minimize data loss, using activation preference as a secondary sorting key if necessary • Selects from the sorted list based on which set of criteria each copy meets • Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy
Best Copy Selection – SP1 • Sorts copies by activation preference when the auto database mount dial is set to Lossless • Otherwise, sorts copies by copy queue length, with activation preference used as a secondary sorting key if necessary • Selects from the sorted list based on which set of criteria each copy meets • Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from the previous active copy
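The difference between the RTM and SP1 sort orders can be expressed as a choice of sort key. The Python sketch below is illustrative only. The `DatabaseCopy` type, the `sort_copies` function, and the queue lengths and preference numbers are hypothetical, chosen so that the results line up with the orderings in the worked example slides.

```python
from dataclasses import dataclass

@dataclass
class DatabaseCopy:
    server: str
    copy_queue_length: int
    activation_preference: int

def sort_copies(copies, version, mount_dial):
    """Order candidate copies the way the RTM and SP1 slides describe."""
    if version == "SP1" and mount_dial == "Lossless":
        # SP1 with a Lossless dial: activation preference only.
        key = lambda c: c.activation_preference
    else:
        # RTM, or SP1 with any other dial: copy queue length first,
        # activation preference as the tie-breaker.
        key = lambda c: (c.copy_queue_length, c.activation_preference)
    return sorted(copies, key=key)

candidates = [
    DatabaseCopy("Server2", copy_queue_length=4, activation_preference=2),
    DatabaseCopy("Server3", copy_queue_length=2, activation_preference=3),
    DatabaseCopy("Server4", copy_queue_length=8, activation_preference=4),
]
print([c.server for c in sort_copies(candidates, "RTM", "GoodAvailability")])
print([c.server for c in sort_copies(candidates, "SP1", "Lossless")])
```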
Best Copy Selection • Is database mountable? • Is copy queue length <= AutoDatabaseMountDial? • If Yes, database is marked as current active and mount request is issued • If not, next best database tried (if one is available)
Best Copy Selection – RTM • Four copies of DB1 • DB1 currently active on Server1 • [Diagram: Server1, Server2, Server3 and Server4 each hold a copy of DB1; Server1 is marked with an X]
Best Copy Selection – RTM • Sort list of available copies by Copy Queue Length (using Activation Preference as secondary sort key if necessary): • Server3\DB1 • Server2\DB1 • Server4\DB1
Best Copy Selection – RTM • Only two copies meet the first set of criteria for activation (CQL < 10; RQL < 50; CI = Healthy): • Server3\DB1 (lowest copy queue length, tried first) • Server2\DB1 • Server4\DB1
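The first criteria set on this slide (copy queue length below 10, replay queue length below 50, content index healthy) can be written as a simple filter. The Python sketch below covers only that first set. The `CopyStatus` type and the sample values are hypothetical, because the slides do not give the actual queue lengths of each copy.

```python
from dataclasses import dataclass

@dataclass
class CopyStatus:
    name: str
    copy_queue_length: int      # CQL
    replay_queue_length: int    # RQL
    content_index_healthy: bool # CI

def meets_first_criteria(copy):
    """First criteria set from the slide: CQL < 10, RQL < 50, CI healthy."""
    return (
        copy.copy_queue_length < 10
        and copy.replay_queue_length < 50
        and copy.content_index_healthy
    )

# Already sorted by copy queue length; the numbers are made up for illustration.
sorted_copies = [
    CopyStatus("Server3\\DB1", 2, 10, True),
    CopyStatus("Server2\\DB1", 4, 20, True),
    CopyStatus("Server4\\DB1", 8, 120, True),  # replay queue too long
]
eligible = [c.name for c in sorted_copies if meets_first_criteria(c)]
print(eligible)  # the first entry is the copy that is tried first
```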
Best Copy Selection – SP1 • Four copies of DB1 • DB1 currently active on Server1 • Auto database mount dial set to Lossless • [Diagram: Server1, Server2, Server3 and Server4 each hold a copy of DB1; Server1 is marked with an X]
Best Copy Selection – SP1 • Sort list of available copies by Activation Preference: • Server2\DB1 (lowest preference value, tried first) • Server3\DB1 • Server4\DB1
Best Copy Selection • After Active Manager determines the best copy to activate • The Replication service on the target server tries to copy missing log files from source (ACLL) • If successful, database will mount with zero data loss • If unsuccessful (lossy failure), database will mount based on the AutoDatabaseMountDial setting • If data loss is outside of dial setting, next copy will be tried
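The ACLL and mount dial flow above can be sketched as a loop over the sorted candidates. This Python sketch is a simplification, not Active Manager's implementation. The `acll` callback is hypothetical and returns the number of log files still missing after the copy attempt. The dial thresholds (Lossless = 0, GoodAvailability = 6, BestAvailability = 12 logs) are the values documented for Exchange 2010 and do not appear in the slides.

```python
# Maximum number of lost log files each dial setting tolerates
# (documented Exchange 2010 values, not stated in this deck).
MOUNT_DIAL_LIMITS = {"Lossless": 0, "GoodAvailability": 6, "BestAvailability": 12}

def activate_best_copy(sorted_copies, mount_dial, acll):
    """Try each (name, copy_queue_length) candidate in order.

    `acll` is a hypothetical callback that attempts to copy the last logs
    and returns how many log files are still missing afterwards.
    """
    limit = MOUNT_DIAL_LIMITS[mount_dial]
    for name, copy_queue_length in sorted_copies:
        missing = acll(name, copy_queue_length)
        if missing == 0:
            return name, "mounted with zero data loss"
        if missing <= limit:
            return name, f"mounted with {missing} log file(s) lost"
        # Data loss is outside the dial setting: try the next copy.
    return None, "no copy could be mounted"

def acll_fails(name, copy_queue_length):
    # Simulates a lossy failure: the source is unreachable, nothing is copied.
    return copy_queue_length

copies = [("Server3\\DB1", 8), ("Server2\\DB1", 3)]
print(activate_best_copy(copies, "GoodAvailability", acll_fails))
```

In this run the first candidate is 8 logs behind, which exceeds the GoodAvailability limit of 6, so the second candidate is mounted instead.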
Best Copy Selection • If an activated database copy is mounted • It will generate new log files (using the same log generation sequence) • Transport Dumpster requests will be initiated for the mounted database to recover lost messages • When original server or database recovers, it will run through divergence detection and either perform an incremental resync or require a full reseed
Exchange Server 2010 High Availability Deep Dive: Datacenter Activation Coordination Mode
Datacenter Activation Coordination Mode • DAC mode is a property of a DAG • Acts as an application-level form of quorum • Controls whether or not a Mailbox server attempts to automatically mount its active databases on startup • Designed to prevent multiple copies of same database mounting on different members due to loss of network (split brain) • Also enables use of Site Resilience tasks • Stop-DatabaseAvailabilityGroup • Restore-DatabaseAvailabilityGroup • Start-DatabaseAvailabilityGroup
Datacenter Activation Coordination Mode • RTM: DAC mode is for DAGs with three or more members that are extended to two Active Directory sites • Don’t enable it for two-member DAGs where each member is in a different AD site, or for DAGs where all members are in the same AD site • SP1: DAC mode can be enabled for all DAGs • If using Third Party Replication (TPR) mode, check with your vendor for guidance on DAC mode
Datacenter Activation Coordination Mode • Uses Datacenter Activation Coordination Protocol (DACP) • A bit in memory (in MSExchangeRepl.exe) set to either: • 0 = can’t auto-mount at startup • 1 = can auto-mount at startup
Datacenter Activation Coordination Mode • Active Manager startup sequence • DACP bit is set to 0 • DAG member communicates with other DAG members it can reach to determine the current value of their DACP bits • If the starting DAG member can communicate with all other members on the StartedMailboxServers list, DACP bit switches to 1 • If the starting DAG member can communicate with another member, and that other member’s DACP bit is set to 1, starting DAG member DACP bit switches to 1 • If the starting DAG member can communicate with another member, and that other member’s DACP bit is set to 0, starting DAG member DACP bit remains at 0
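The startup rules above reduce to a short decision function. The Python sketch below models them under stated assumptions. The function name and data shapes are hypothetical: `other_started_members` stands for the other members on the StartedMailboxServers list, and `reachable` maps each member the starting server can contact to that member's current DACP bit.

```python
def dacp_bit_at_startup(other_started_members, reachable):
    """Return the starting member's DACP bit (0 or 1) after the startup checks.

    other_started_members: names of the other members on StartedMailboxServers.
    reachable: dict of member name -> that member's DACP bit, containing only
               the members the starting server can communicate with.
    """
    # The bit starts at 0: the member cannot auto-mount yet.
    # Rule 1: every other started member is reachable.
    if all(member in reachable for member in other_started_members):
        return 1
    # Rule 2: at least one reachable member already has its bit set to 1.
    if any(bit == 1 for bit in reachable.values()):
        return 1
    # Rule 3: only members with bit 0 (or nobody) are reachable.
    return 0

others = ["MBX2", "MBX3"]
print(dacp_bit_at_startup(others, {"MBX2": 0, "MBX3": 0}))  # all reachable
print(dacp_bit_at_startup(others, {"MBX2": 1}))             # a peer has bit 1
print(dacp_bit_at_startup(others, {"MBX2": 0}))             # stays at 0
```

The third case is the split-brain protection: two servers that restart in an isolated datacenter can see each other, but neither has its bit set, so neither mounts databases.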
Related Content • UCC305 - Exchange Server 2010 High Availability Design
Resources • Exchange Team Blog • http://aka.ms/EHLO • Exchange 2010 Documentation Library • http://aka.ms/Ex2010Docs
Feedback Your feedback is very important! Please complete an evaluation form! Thank you!
Questions? • UCC402 • Scott Schnoll • Principal Technical Writer • scott.schnoll@microsoft.com • http://blogs.technet.com/scottschnoll • Twitter: @schnoll • You can ask me questions at the “Ask the Expert” zone: • November 10, 2011 12:30 – 13:30