Catastrophic Hardware Failure & Recovery with Exchange Server 2003 Eileen Brown (eileenb@microsoft.com) IT Evangelist Microsoft UK http://blogs.msdn.com/eileen_brown
Topics • What’s new in Exchange 2003 and Windows 2003 • Disaster Recovery Questionnaire • Active Directory Overview and Disaster Recovery • Exchange 2003 Overview and Disaster Recovery • Database Disaster Recovery
What’s New In Exchange 2003 • Database snapshot through the Volume Shadow Copy Service (VSS) • Recovery Storage Group • RPC over HTTP support for Outlook 2003 • IPSec support between front-ends and back-end clusters • IIS6 runs in Dedicated Mode • Clustering
Active Directory Database • Ntds.dit – the database • Edbxxxxx.log – transaction logs • Edb.chk – checkpoint file • Res1.log and Res2.log – reserved log files • Logs are of fixed size (10 MB for AD) • Three categories of directory data are replicated between domain controllers: • Domain data (accounts…) • Configuration data (list of domains…) • Schema data (definition of all objects…)
Active Directory Backup • System State Components: • System Start-up Files (boot files) • System registry • Class registration database of COM+ • SYSVOL • What Is A Good Backup? • System State, system disk contents, and the SYSVOL folder • Consider the tombstone lifetime set in Active Directory • Default is 60 days • If the backup is older than the tombstone lifetime, the restore is disallowed • Backup data from a DC can only be used to restore that DC
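The tombstone rule above reduces to a date comparison. The following is a minimal sketch, assuming the 60-day default quoted on the slide; the function name and the sample dates are hypothetical, and a real forest may be configured with a different lifetime.

```python
from datetime import date, timedelta

# Default tombstone lifetime quoted on the slide; a forest may configure another value.
DEFAULT_TOMBSTONE_LIFETIME_DAYS = 60


def backup_is_restorable(backup_date, today,
                         tombstone_days=DEFAULT_TOMBSTONE_LIFETIME_DAYS):
    """Return True while the backup is no older than the tombstone lifetime."""
    age = today - backup_date
    return age <= timedelta(days=tombstone_days)


if __name__ == "__main__":
    today = date(2004, 6, 1)  # hypothetical "now"
    for taken in (date(2004, 5, 1), date(2004, 3, 1)):
        verdict = "usable" if backup_is_restorable(taken, today) else "too old"
        print(taken.isoformat(), verdict)
```

A check like this could run as part of a backup report so that stale System State backups are flagged before they are needed.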
Types Of Disaster • Determine the type of disaster • Database corruption • Damaged disks • DC hardware failure • Software failure – server cannot boot • Data corruption • Object accidentally deleted from the directory • Methods to restore a Windows 2003 DC: • Re-installation • Backup
Restore Through Re-Installation • New DC receives the same name as the failed DC: • Remove the ntdsDSA object of the failed DC using ntdsutil • Use the ntdsutil “metadata cleanup” command • Connect to the remote DC • Remove the orphaned DC
Restore From Backup • Non-Authoritative Restore • Default method for the restoration of Active Directory • DC is then updated using normal replication techniques • Authoritative Restore • ntdsutil
Authoritative Restore • Perform a non-authoritative restore first • Version numbers of the restored object attributes are incremented, so the restored values replicate to the other DCs • Can be applied to: • Entire directory • Subtree • Individual object • Used when human error is involved • For example, a number of objects were accidentally deleted and cannot be recreated easily
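Why incrementing the version number matters can be shown with a simplified model of replication conflict resolution. This sketch assumes the commonly described ordering (higher version wins, then later timestamp, then originating DC GUID); the class, the function names, and the bump value are illustrative and are not taken from the slides.

```python
from typing import NamedTuple


class AttrStamp(NamedTuple):
    """Simplified replication metadata for one attribute (illustrative model)."""
    version: int      # compared first
    timestamp: int    # originating write time in seconds, compared second
    dsa_guid: str     # originating DC identifier, final tie-breaker


def replication_winner(a, b):
    """Tuples compare field by field, which matches the assumed ordering."""
    return max(a, b)


def mark_authoritative(stamp, bump, now):
    """Model an authoritative restore: raise the version and refresh the time."""
    return stamp._replace(version=stamp.version + bump, timestamp=now)


if __name__ == "__main__":
    restored = AttrStamp(version=3, timestamp=1000, dsa_guid="dc1")  # from backup
    partner = AttrStamp(version=5, timestamp=2000, dsa_guid="dc2")   # newer change
    print(replication_winner(restored, partner).dsa_guid)
    bumped = mark_authoritative(restored, bump=100_000, now=3000)
    print(replication_winner(bumped, partner).dsa_guid)
```

Without the bump, the partner's newer change overwrites the restored data during normal replication, which is exactly what a non-authoritative restore does.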
Recovering A Global Catalog Server • Restore from backup or: • Add an additional GC • Create a branch office replica from media - dcpromo /adv • Restore the GC onto different hardware - issues • Different HALs • Incompatible Boot.ini file • Different network or video cards
AD Forest Recovery - High Level Steps • Identify a single DC for restore • Shut down ALL DCs • Recover the first DC in the root domain • 1. Primary SYSVOL restore, disable GC flag • 2. Configure DNS • 3. Raise the value of the RID pool by 100,000 • cn=RID Manager$,cn=System,dc=<domain name> • 4. Seize all FSMO roles (ntdsutil) • 5. Clean metadata of ALL other DCs in the root domain (ntdsutil)
AD Forest Recovery - High Level Steps • Recover the FIRST DC in the root domain (cont.) • 6. Delete server and computer objects of all other DCs • 7. Reset the computer account of the DC twice (netdom) • 8. Reset the krbtgt password twice (ADUC) • 9. Reset the trust password twice (netdom) • Restore the FIRST DC in each remaining domain • Primary SYSVOL restore for domain • Same steps as previously (domain wide) • Enable GC flag • DO A FRESH BACKUP • Install other DCs using dcpromo
AD Forest Recovery - High Level Steps • White paper http://download.microsoft.com/download/win2000srv/Utility/1.001/NT5/EN-US/forestrecovery.exe • AD Fast recovery (VSS) – white paper available
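Step 3 of the forest recovery steps (raising the RID pool by 100,000) is plain arithmetic on a 64-bit value. The sketch below assumes the value lives in the rIDAvailablePool attribute of the RID Manager$ object, with the upper 32 bits holding the maximum number of RIDs and the lower 32 bits holding the next RID to allocate; that layout is my understanding rather than something the slides state, so verify it against the white paper before editing a real directory.

```python
RID_POOL_INCREMENT = 100_000  # value given in the recovery steps
LOW_MASK = 0xFFFFFFFF


def split_rid_available_pool(value):
    """Split the 64-bit value into (upper 32 bits, lower 32 bits)."""
    return value >> 32, value & LOW_MASK


def raise_rid_pool(value, increment=RID_POOL_INCREMENT):
    """Return the new 64-bit value with the lower part raised by `increment`."""
    high, low = split_rid_available_pool(value)
    new_low = low + increment
    if new_low > high:
        # Assumed layout: the upper part is the ceiling for allocatable RIDs.
        raise ValueError("increment would exceed the maximum RID count")
    return (high << 32) | new_low


if __name__ == "__main__":
    current = 4611686014132422708  # sample value, not read from a live directory
    print(split_rid_available_pool(current))
    print(raise_rid_pool(current))
```

Because the lower part sits in the least significant bits, the result is the same as adding 100,000 to the large integer directly; the split only makes the intent and the overflow check explicit.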
Where Is Exchange Information Stored? • Registry settings and metabase • System state backup • AD Directory Objects store “Recipient” information • Users, Groups, and Contacts • Replicated to GCs • Most Exchange information placed on existing objects is replicated between Global Catalogs • AD Configuration • Exchange System Objects • Public Folder Directory entries • Active Directory Connector (ADC) settings
Levels Of Disaster Recovery • Restoring mailboxes • Recovery Storage Group / Separate server / 3rd party backup utility • Restoring one or more Exchange databases • Backup software • Restoring multiple databases - single storage group • Backup software • Complete disaster - full server recoveries
Move Exchange To New Hardware (Exchange 2003 = GC) • If the server is a domain controller: • Deletion of computer account / NTDS Settings Object • DCPROMO /FORCEREMOVAL – “NEW” • Keeping the same server name • Take the existing Exchange 2003 computer offline • Reset the existing Exchange 2003 computer account • Bring the new computer online using the same name • Log on using an Exchange 2003 Full Administrator account • Run Exchange 2003 Setup /disasterrecovery • Mount stores, then check client connectivity and mail flow
Using Exchange 2003 Stand-By Recovery Server • What you need • System State backup • C:\Windows folder backup • Exchange 2003 database backups • Steps to recover • Start stand-by server • Restore %SystemRoot% folder and System State • Run Exchange 2003 setup in disaster recovery mode • Restore databases • Recovery Using Images • Drive signature issue prevents logon after recovery • Fix using Q249321 and Q223188
Recovery Storage Group • One RSG per server / Information Store • Restore mailbox DBs from the same SG • Restore SGs/DBs from the same Administrative Group (AG) • User mailboxes remain disconnected • Only the MAPI protocol is supported • Restores go into the RSG by default • Active/Passive: one recovery storage group per EVS • ONE recovery storage group per cluster supported
Recovery Of Other Exchange 2003 Services • Connectors • Lotus Notes • Novell GroupWise • Exchange Calendar Connector • Custom OWA • Clusters • Volume Mount Points • Majority Node Set (MNS) Clusters • Resource Kit clusdiag tool
Exchange 2003 Clustering • What to back up • Cluster Administrative software • Quorum • System State • Exchange 2003 Server Cluster Disaster Recovery types • Recover shared disk resource (Clusdb – Chkxxx.tmp Q224999) • Restore Quorum Resource • Replace a damaged node • Restore an entire Exchange 2003 cluster • Majority Node Set (MNS) Cluster, ASR for cluster • Windows 2000 to Windows Server 2003 rolling upgrades supported • Support for Mount Points
ASR For Clusters • Automated System Recovery (ASR) can completely restore a cluster in a variety of scenarios, including: • Damaged or missing system files • Complete OS reinstallation due to hardware failure • A damaged cluster database • Changed disk signatures (including shared disks)
Removing An Orphaned Exchange Server • Active Directory Sites and Services snap-in • Services: Microsoft Exchange: organisation_name:Administrative Groups: Servers • Delete the server object with the same name • If the cluster is gone you cannot delete Exchange Virtual Server resources from AD • Bind to a DC using LDP: • Configuration\Services\Microsoft Exchange\Organization\Administrative Group\Servers • Right click: Delete orphan EVS entries • No option of Disaster Recovery Setup for EVS
Logical Versus Physical Corruption • Three layers of corruption that can occur • Page level • ESE level • Store level • To remove corruption • Restore an uncorrupted backup of the database • Repair the database • Expunge the corrupted pages from the database • Salvage data and generate a new database
Errors 1018 and 1019 • Error 1018: JET_errReadVerifyFailure • Bad checksum / Wrong page number • Hardware / Firmware • File system corruption • How serious are 1018 Errors? • During normal operation (somewhat serious) • During startup (likely fatal) • During backup (may be minor) • Error 1019: JET_errPageNotInitialized • What causes Error 1019? • Special case of error 1018 (page is replaced with zeroes) • Bad page links
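The difference between the two errors can be illustrated with a toy page verifier. This is a sketch only: the 4 KB page size, the header layout (checksum followed by page number), and the XOR checksum are assumptions made for the illustration and are not the real ESE on-disk format or algorithm.

```python
import struct

PAGE_SIZE = 4096  # assumed page size for this illustration


def toy_checksum(data):
    """XOR of all 32-bit little-endian words (a stand-in, not the ESE checksum)."""
    acc = 0
    for (word,) in struct.iter_unpack("<I", data):
        acc ^= word
    return acc


def make_page(page_number, payload):
    """Build a toy page: [checksum][page number][payload padded with zeroes]."""
    if len(payload) > PAGE_SIZE - 8:
        raise ValueError("payload too large for one page")
    body = struct.pack("<I", page_number) + payload.ljust(PAGE_SIZE - 8, b"\0")
    return struct.pack("<I", toy_checksum(body)) + body


def verify_page(page, expected_page_number):
    """Return None for a good page, otherwise the matching error name."""
    if page == bytes(PAGE_SIZE):
        return "JET_errPageNotInitialized"   # error 1019: page is all zeroes
    stored_checksum, stored_number = struct.unpack_from("<II", page, 0)
    if stored_checksum != toy_checksum(page[4:]):
        return "JET_errReadVerifyFailure"    # error 1018: bad checksum
    if stored_number != expected_page_number:
        return "JET_errReadVerifyFailure"    # error 1018: wrong page number
    return None


if __name__ == "__main__":
    good = make_page(7, b"mailbox data")
    damaged = good[:100] + bytes([good[100] ^ 0xFF]) + good[101:]
    print(verify_page(good, 7), verify_page(damaged, 7),
          verify_page(bytes(PAGE_SIZE), 7))
```

The sketch shows why 1019 is described as a special case of 1018: a zeroed page is simply a page whose contents no longer match what was written, detected by a cheaper test.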
Errors 1022 and 1216 • Error 1022: JET_errDiskIO • Disk I/O failure • File damage or truncation • File locked by another process • Anti-virus software • Error 1216 (Q296843): files in the database's running set are missing or have been replaced • When a storage group starts, the system analyses header information • If logs are missing: • Restore the database from backup • Repair the database by using • ESEUTIL /P followed by • ESEUTIL /D and • ISINTEG -fix • Q296843 – more details
Conclusion • Review your disaster recovery plan when upgrading / deploying Exchange 2000/2003 • Back up all data needed for full recovery • Verify disaster recovery and restore plans through drills • Read Exchange 2003 mailbox and disaster recovery whitepapers regularly • Audit your Best Practices • Request Microsoft PSS Operations Assessment
© 2004 Microsoft Corporation. All rights reserved. This presentation is for informational purposes only. MICROSOFT MAKES NO WARRANTIES, EXPRESS OR IMPLIED, IN THIS SUMMARY.
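The repair order for missing logs (ESEUTIL /P, then ESEUTIL /D, then ISINTEG -fix) can be written down as an ordered plan. The sketch below only builds and prints the command lines and does not execute anything; the database path and server name are hypothetical, and the `-s` and `-test alltests` arguments for ISINTEG are added from its usual syntax rather than from the slide.

```python
def build_repair_plan(database_path, server_name):
    """Return the repair steps in the order given on the slide."""
    return [
        ["eseutil", "/p", database_path],   # hard repair of the database
        ["eseutil", "/d", database_path],   # offline defragmentation afterwards
        ["isinteg", "-s", server_name, "-fix", "-test", "alltests"],  # store-level fix
    ]


def format_plan(plan):
    """Render each step as a numbered, quoted command line for a runbook."""
    lines = []
    for number, argv in enumerate(plan, start=1):
        rendered = " ".join(f'"{arg}"' if " " in arg else arg for arg in argv)
        lines.append(f"{number}. {rendered}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical path and server name, for illustration only.
    plan = build_repair_plan(r"D:\Exchsrvr\mdbdata\priv1.edb", "EXCH01")
    print(format_plan(plan))
```

Keeping the plan as data rather than running it directly leaves room for a backup of the damaged files before step 1, since a hard repair can discard data.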