Exchange Server 2010 & 2013: Disaster Recovery – Troubleshooter v.1.0
How to use this tool? Instructions:
1. Switch to “Slide Show”.
2. Select the kind of issue occurring.
3. Follow the instructions for each scenario.
Scope of issue: Mailbox, Database, Exchange Server, Exit
Mailbox level issues
Symptoms and common causes:
Item count issues, or item sizes that do not add up to the mailbox size;
Items “disappearing” (e.g., meetings, contacts, e-mails);
Items being duplicated;
OOF (Out-Of-Office) showing unusual behavior and/or errors;
Outlook AND OWA showing errors during mailbox access or folder navigation;
Corruption of “old” items (e.g., previously read e-mails that can no longer be opened);
Pre-defined searches no longer working;
Display name corruption for items/folders.
Symptoms match? YES NO
Mailbox level issues
Troubleshooting:
Using Exchange Management Shell (EMS):
  New-MailboxRepairRequest -Mailbox <Affected_Mailbox> -CorruptionType <SearchFolder,AggregateCounts,FolderView,ProvisionedFolder>
Expected results: on the Mailbox Server holding the mailbox, the “Application” log in Event Viewer should show:
EventID: 10047, Source: MSExchangeIS Mailbox Store: the repair request has started.
EventID: 10048, Source: MSExchangeIS Mailbox Store: the repair request completed successfully.
EventID: 10049, Source: MSExchangeIS Mailbox Store: the repair request failed.
EventID: 10062, Source: MSExchangeIS Mailbox Store: corruption was detected (details of the affected objects).
Symptoms persist? YES NO
Mailbox level issues
Troubleshooting:
Get the mailbox statistics through EMS:
  Get-MailboxFolderStatistics <Affected_Mailbox> | ft Identity,ItemsInFolder,FolderSize -AutoSize
Through EMS, try to move the mailbox to force the removal of logical corruption:
  New-MoveRequest -Identity <Affected_Mailbox> -BadItemLimit <0 to 50> -TargetDatabase <different_database>
Expected results: check the “Move Request” report (go to Exchange Management Console (EMC) > Recipient Configuration > Move Request; or, via PowerShell: Get-MoveRequest).
Warning: if logical corruption happened, the affected data may be lost.
Note: it is possible to recover the “MoveHistory” even after the move request was removed:
  $MoveReport = (Get-MailboxStatistics -Identity 'mailbox' -IncludeMoveReport).MoveHistory
  $MoveReport > path\history_file_name.txt
Symptoms persist? YES NO
Database level issues
Symptoms and common causes:
The same types of issues already listed for the mailbox level, but affecting several (or all) mailboxes within a database (also called DB);
Database won’t mount after an Information Store crash (possible logical corruption);
Database dismounted, and won’t mount;
Database states “dirty shutdown” during the *.edb check via ESEUTIL /MH;
Database states “dirty shutdown” during the *.log check via ESEUTIL /ML;
Database states “dirty shutdown” and logs are “disappearing” (check the antivirus);
Database states “clean shutdown” AND “log required” via ESEUTIL /MH (and vice-versa).
Symptoms match? YES NO
Database level issues
Is the affected database protected by a DAG (Database Availability Group)? NO YES
Database level issues
Database protected by DAG:
The database copy may already be mounted on another DAG member, as long as that copy was in the “Healthy” state. However, the DAG itself can suffer a failure that prevents databases from mounting. In that case administrators must rebuild the DAG copy through a restore or, in the worst cases, force the copy on a server that is still healthy to mount, accepting the loss of data. There are, literally, dozens of factors that can cause this kind of scenario, so our approach is to discuss the most common scenarios and how to fix each one.
Possible action plans: Rebuild copy, Rebuild DB Index, Force mounting, Failback, Dial-Tone
Database level issues
Dial-Tone Database:
Check the database and log paths through EMS:
  Get-MailboxDatabase <Affected_DB> | fl *path*
Check “EdbFilePath” and “LogFolderPath” and make sure no files remain in those locations (it is better to move the files to a secure location than to delete them).
Force the database to mount (via EMC or EMS):
  Mount-Database <Affected_DB>
Accept the creation of new log and EDB files.
When the original DB is recovered, swap the EDBs: dismount the current (dial-tone) DB and move it to a safe location, then replace it with the recovered EDB (or simply overwrite it using the backup tool, once the dial-tone DB has been copied to a safe location).
Merging the data of the dial-tone and production EDBs:
  New-MailboxDatabase -Name “Recovery_DB” -Server <Recover_Server> -EdbFilePath <“path+name.edb”> -LogFolderPath <“path_logs”> -Recovery
(the name could be any other meaningful name; the server is the same server; -Recovery configures this new DB in “recovery mode”).
  Mount-Database <Recovery_DB>
(mounts the DB configured in the prior step).
Configure the production DB to allow a restore: check “This database can be overwritten by a restore” on the “Maintenance” tab of the production database, through EMC or using EMS.
Via EMS, execute:
  Get-MailboxStatistics -Database Recovery_DB | Restore-Mailbox -RecoveryDatabase Recovery_DB
After this cmdlet, check that Outlook is not showing “Maintenance” warnings and that it already presents all the data (recovered from backup, plus the dial-tone data). OWA does not show a warning message, so it is best to test through Outlook to check whether the operation succeeded.
The alternative approach, keeping the dial-tone DB mounted as the production database and merging data from the recovered database (e.g., restored from backup), causes permanent Outlook pop-ups about “Maintenance mode”. If this approach is adopted, the only fix is to recreate the Outlook profile on each machine that displays the message.
Symptoms persist? YES NO
Database level issues
Standalone Database (no DAG):
This type of database cannot fail over. There are at least three ways to recover a standalone database that fails, ranging from log sequence verification to a restore from backup. We are going to discuss the most common procedures to recover standalone databases.
Possible action plans: Check logs and EDB, Replay Logs, ESEutil /P
Database level issues
Check logs and EDB:
Check the Windows Event Viewer, in the “Application” section, for events with source “ESE”.
Check the disk space on the paths used for logs and EDB.
If the checks above show no abnormalities, it is time to check the EDB:
Open an elevated CMD, go to the folder that holds the EDB, and execute ESEUTIL /MH against the file.
Note down the values of the fields “State” and “Log Required”:
“State” can display “Clean Shutdown” or “Dirty Shutdown”.
“Log Required” can display any value from “0-0” (no log required) to a range of required logs.
Any database whose “State” is “Clean Shutdown” is technically ready to be mounted, even if all logs are lost. However, some serious kinds of physical corruption can leave a DB in “Clean Shutdown” state that still cannot be mounted and fails with several errors.
Next
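The “State” / “Log Required” reading above can be sketched as a small offline helper. This is a minimal sketch, not part of the original tool: it assumes the ESEUTIL /MH output was saved as text and that the two fields appear as `State: ...` and `Log Required: <low>-<high> (...)` lines; the function names and the returned labels are hypothetical.

```python
import re

# Sample text shaped like the two relevant lines of ESEUTIL /MH output
# (assumed layout; real output contains many more fields).
SAMPLE_HEADER = """
            State: Dirty Shutdown
     Log Required: 1-2 (0x1-0x2)
"""


def parse_mh_header(text):
    """Extract the 'State' and 'Log Required' fields from saved /MH output."""
    state = re.search(r"^\s*State:\s*(.+?)\s*$", text, re.MULTILINE)
    logs = re.search(r"^\s*Log Required:\s*(\d+)-(\d+)", text, re.MULTILINE)
    return {
        "state": state.group(1) if state else None,
        "log_required": (int(logs.group(1)), int(logs.group(2))) if logs else None,
    }


def next_step(header):
    """Map the two fields to the next action in this troubleshooter."""
    if header["state"] == "Clean Shutdown":
        return "mount"        # technically ready to be mounted
    if header["state"] == "Dirty Shutdown" and header["log_required"] not in (None, (0, 0)):
        return "check-logs"   # find and validate the required logs first
    return "investigate"      # anything else needs a closer look


print(next_step(parse_mh_header(SAMPLE_HEADER)))  # prints "check-logs"
```

The labels only mirror the decision described on this slide; they do not replace checking the Event Viewer and the disk space first.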
Database level issues
Check logs & EDB:
Open an elevated CMD, go to the log folder, and execute ESEutil /ML against the log generation prefix:
Example: e:\Db1\Logs\> ESEutil /ML E00 (“E00” is the standard prefix for new DBs, although this value can change).
A list of the log sequence and the state of each log is displayed. The state can be “OK”, “Missing”, or “Error:”. Example:
E0000000001.log – OK
E0000000002.log – OK
E0000000003.log – OK
E0000000004.log – OK
E0000000005.log – OK
E0000000006.log – OK
E0000000007.log – OK
E0000000008.log – OK
E0000000009.log – OK
E000000000A.log – OK
E000000000B.log – OK
(...)
Symptoms match? YES NO
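As a complement to ESEutil /ML, a gap in the log sequence can also be spotted from the file names alone. The sketch below assumes the naming pattern shown in the example above (prefix followed by an 8-digit hexadecimal generation number and `.log`); the function name is hypothetical, and it only checks that files exist, not that they are healthy.

```python
import re


def missing_generations(filenames, prefix="E00"):
    """Return the generation numbers missing between the lowest and highest log found."""
    pattern = re.compile(
        r"^" + re.escape(prefix) + r"([0-9A-Fa-f]{8})\.log$", re.IGNORECASE
    )
    generations = set()
    for name in filenames:
        match = pattern.match(name)
        if match:  # files such as the current "E00.log" carry no number and are skipped
            generations.add(int(match.group(1), 16))
    if not generations:
        return []
    return [g for g in range(min(generations), max(generations) + 1)
            if g not in generations]


logs = ["E0000000009.log", "E000000000B.log", "E00.log"]
print(missing_generations(logs))  # prints [10], i.e. E000000000A.log is absent
```

In practice the list of names could come from a directory listing of the log folder, for example `os.listdir(...)`.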
Database level issues
Check logs & EDB:
If “State” shows “Dirty Shutdown” and “Log Required” shows any value other than “0-0”, it is necessary to find out which logs are required and whether any of them is missing.
Example: DB1 State “Dirty Shutdown”, Log Required “0x1 – 0x2”.
To identify the generation of a given log file, open an elevated CMD and execute: ESEutil /ML e04.log (example).
The “lGeneration” field gives the sequence number of that particular log, which corresponds to the “Log Required” field shown by the database check (ESEUTIL /MH).
If every .log file listed in “Log Required” is present and healthy, we can follow the “Replay” process.
Symptoms persist? YES NO
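The mapping from a “Log Required” range to file names can be sketched as follows. It assumes the naming pattern seen in the earlier example listing (prefix plus an 8-digit hexadecimal generation number); the function names are hypothetical.

```python
def parse_hex_range(text):
    """Turn a 'Log Required' hex range such as '0x1 – 0x2' into two integers."""
    low, high = text.replace("–", "-").split("-")
    return int(low.strip(), 16), int(high.strip(), 16)


def required_log_files(low, high, prefix="E00"):
    """List the log file names expected for generations low..high (inclusive)."""
    return [f"{prefix}{generation:08X}.log" for generation in range(low, high + 1)]


low, high = parse_hex_range("0x1 – 0x2")
print(required_log_files(low, high))  # prints ['E0000000001.log', 'E0000000002.log']
```

One caveat, stated as an assumption: the highest required generation may still sit in the current, unnumbered log (for example E00.log), so checking that file's “lGeneration” with ESEutil /ML remains the reliable test.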
Database level issues
Replay Logs process:
If the log sequence and the EDB were successfully validated, it is time to replay the logs:
Through an elevated CMD, go to the log folder.
Execute “ESEutil /R E04” (as discussed before, this value can be different; check the prefix used in the name of every log file for a hint, or use ESEUTIL /MH to find out).
This command identifies the path to the EDB and applies the logs required by the DB, after checking the log integrity and sequence once more.
At the end, if no errors were detected, the EDB will show “State” = “Clean Shutdown” when ESEUTIL /MH is executed.
After this, we are ready to mount the database without any special parameter.
Symptoms persist? YES NO
Database level issues
ESEutil /P:
ALWAYS the last resort (recommended only after attempts to fix the issue with Microsoft Support representatives have failed). It implies loss of data.
Open an elevated CMD and go to the folder that holds the EDB.
Always make a safe copy of the EDB before executing /P.
Execute: ESEutil /P db1.edb (example).
After this process, we get an EDB in “Clean Shutdown” state. Yet, it is not logically consistent.
As the “ISInteg” tool is now deprecated, we have to use EMS cmdlets to fix this, once the repaired database is mounted (note that -Database takes the database name, not the path of the EDB file):
  New-MailboxRepairRequest -Database <DB_Name_after_ESEutil_/P> -CorruptionType <SearchFolder,AggregateCounts,FolderView,ProvisionedFolder>
Symptoms persist? YES NO
Database level issues
Recreate copy:
On the server where the DB first crashed (and which now holds the passive copy):
  Suspend-MailboxDatabaseCopy -Identity <DB_Name\Server_Name>
Executing a “full reseed” of that same copy (optionally naming the server that holds the healthy copy as the seeding source):
  Update-MailboxDatabaseCopy -Identity <DB_Name\Server_Name> -SourceServer <Healthy_Copy_Server_Name> -DeleteExistingFiles
This process can take a long time, depending on the database size.
Symptoms persist? YES NO
Database level issues
Recreating the Content Index for a DAG database:
On the server presenting the Content Index issue:
  Suspend-MailboxDatabaseCopy -Identity <DB_Name\Server_Name>
Regenerating the Content Index:
  Update-MailboxDatabaseCopy -Identity <DB_Name\Server_Name> -CatalogOnly
This process can take a long time, depending on the database size.
Symptoms persist? YES NO
Database level issues
Forced mounting:
Data loss is possible, though uncommon.
On the affected server, where the forced mounting will be attempted:
  Move-ActiveMailboxDatabase -Identity <DB_Name> -ActivateOnServer <Server_Name> -MountDialOverride "BestEffort" -SkipActiveCopyChecks -SkipLagChecks -SkipClientExperienceChecks -SkipHealthChecks
This cmdlet bypasses basically “all” the routines used to check the integrity and health of a DAG database and attempts to mount the DB, accepting the loss of data. Several mechanisms are in place to reduce this risk, but it is impossible to guarantee “no risk” with this method.
Symptoms persist? YES NO
Database level issues
Failback:
  Get-MailboxDatabaseCopyStatus DB_Name
Before executing the failback, check the columns “Status” and “ContentIndexState” in the output of the cmdlet above. Both should show “Healthy”; otherwise, the failback will fail. If any other status is present, try the “DB Copy Rebuild” and/or “DB Catalog Rebuild” operations.
Then, perform the failback using “Move-ActiveMailboxDatabase”:
  Move-ActiveMailboxDatabase -Identity <“DB_Name”> -ActivateOnServer <“Server_Name”>
Symptoms persist? YES NO
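The pre-check above can be written down as a tiny rule. This is only an illustration under stated assumptions: the copy status is represented as a plain dictionary with the two column names from the cmdlet output, the function name is hypothetical, and pairing each column with one rebuild operation is an interpretation of the slide, not something it states explicitly.

```python
def failback_blockers(copy_status):
    """Return the rebuild operations to try before a failback (empty list = ready)."""
    blockers = []
    if copy_status.get("Status") != "Healthy":
        blockers.append("DB Copy Rebuild")      # assumed pairing: copy status -> reseed
    if copy_status.get("ContentIndexState") != "Healthy":
        blockers.append("DB Catalog Rebuild")   # assumed pairing: index state -> catalog reseed
    return blockers


print(failback_blockers({"Status": "Healthy", "ContentIndexState": "Failed"}))
# prints ['DB Catalog Rebuild']
```

An empty list means both columns show “Healthy” and the Move-ActiveMailboxDatabase step can be attempted.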
Exchange Server level issues
Symptoms and common causes:
The common causes and symptoms listed under “Database level”, but affecting all the databases on a given server;
Exchange Server services won’t start, logging errors in Event Viewer;
Windows Server is corrupted and the O.S. is lost;
Hardware damaged beyond repair.
Symptoms match? YES NO
Exchange Server level issues
Exchange role presenting issues: Mailbox Server, Client Access Server / Hub Transport Server, Dial-Tone Database*, Return
Exchange Server level issues
Mailbox Server:
Reset the computer account of the affected server through ADUC (Active Directory Users and Computers) or any other supported method.
Reinstall the operating system exactly as it was configured before the crash, and give it the same FQDN (fully qualified domain name) as the lost server. It is not possible to recover a server using another server name or O.S. version.
Reconfigure all network adapters with the values used by the lost server. Do not join the domain yet.
Mailbox Server type: DAG, Standalone
Exchange Server level issues
Mailbox Server (DAG):
Install the O.S. and the Exchange Server prerequisites, hotfixes, and so on. Tip: using an elevated CMD or EMS, go to the Exchange installation folder and execute:
  ServerManagerCmd -ip Exchange-Typical.xml
(this script installs only the Exchange prerequisites, for all the roles; there are other scripts in this folder).
Using the Exchange Management Shell on another server:
  Remove-MailboxDatabaseCopy DB_Name\Server_Lost_Name
  Remove-DatabaseAvailabilityGroupServer -Identity <DAG_Name> -MailboxServer <Server_Lost_Name> -ConfigurationOnly
  cluster.exe /cluster:<DAG Name> Node <Server Name> /Evict
(forces the removal of the lost server from the cluster database).
Add the new server (with the same old name) to the Active Directory domain again.
On the new server, open an elevated command prompt. In the Exchange 2010 installation folder, execute:
  Setup /m:RecoverServer
If there are healthy database copies for this server on the other DAG members:
  Add-DatabaseAvailabilityGroupServer -Identity DAG_Name -MailboxServer <Server_Recovered_Name>
  Add-MailboxDatabaseCopy -Identity <DB_Name> -MailboxServer <Server_Recovered_Name>
If something fails during this process, it may be solved with the “reseed” process described under “Database level issues”.
If no copies for this server remain on the DAG members, repeat step “a.” above and then recover the DBs from backup.
Symptoms persist? YES NO
Exchange Server level issues
Mailbox Server (Standalone):
Install the O.S. and the Exchange Server prerequisites, hotfixes, and so on. Tip: using an elevated CMD or EMS, go to the Exchange installation folder and execute:
  ServerManagerCmd -ip Exchange-Typical.xml
(this script installs only the Exchange prerequisites, for all the roles; there are other scripts in this folder).
Add the server to the Active Directory domain again, with the same FQDN (fully qualified domain name) and IP configuration.
Open an elevated command prompt on the recovered server. In the Exchange 2010 installation folder, execute:
  Setup /m:RecoverServer
As this is a standalone Mailbox Server, there are no database copies on other servers, so a restore from backup is the only way to recover the database data.
Symptoms persist? YES NO
Exchange Server level issues
Client Access Server / Hub Transport Server:
Install the O.S. and the Exchange Server prerequisites, hotfixes, and so on. Tip: using an elevated CMD or EMS, go to the Exchange installation folder and execute:
  ServerManagerCmd -ip Exchange-Typical.xml
(this script installs only the Exchange prerequisites, for all the roles; there are other scripts in this folder).
Reset the computer account of the affected server in AD (for example, via AD Users and Computers).
Join the server to the Active Directory domain again.
On the server to be recovered, open an elevated command prompt. In the Exchange 2010 installation folder, execute:
  Setup /m:RecoverServer
Reconfigure NLB, the CAS Array, OWA customizations, certificates (SSL), and so on, as needed.
Symptoms persist? YES NO
Time to restore a backup OR contact Microsoft Support
If you reached this page, the issue your Exchange environment is facing is not a “regular” one, or the knowledge presented in this document is not enough to deal with it.
Next steps:
If the servers are OK and you just need the data, restore it from your backup;
Or call the Microsoft Support Team (PSS) to get help from a representative specialized in the affected product. See the link below for contact information:
Using Microsoft Product Support Services • http://technet.microsoft.com/en-us/library/dd346877.aspx
Return