Elizabeth Chamberlain, Mike Dickinson. Buckinghamshire Chilterns University College
Disaster Planning Or “Don’t panic Captain Mainwaring!”
Disaster Planning • Unix (Sun Solaris) system + Oracle database • Live & Test/Backup servers • 200,000 items, 3 branches (½ hour apart) • Reasons for a Disaster Recovery plan • Disasters we have (nearly) had! • Thoughts on the backup process • Process of recovery/restore • Potential banana skins • Open forum
Reasons for Disaster Planning • Business continuity – the disaster always happens in the wrong place at the wrong time! • To avoid headless chicken condition • Risk assessment is being carried out throughout the organisation • Validation from external bodies • Previous ‘disasters’ or ‘near disasters’ • To improve communication to users
Disasters we have (nearly) had! • 3 July 2003 – Partial power failure in main machine room • 10 July 2003 – Complete air conditioning failure in main machine room • 26 August 2003 – Nachi virus struck BCUC • 6 January 2004 – Complete power failure
Implications of these events • BCUC cut off from the outside world (some for several days) • Key contact & address data not available (mainly during power failure events) • Need to run key business processes – e.g. payroll, BACS run • General inconvenience
Thoughts on the backup process • Do we need to have a system? • How long will the server be out of action? • Understand the time required (test) • Understand your backup regime • Plan the detail • Recovery Process
Recovery Process • Unscheduled • Put users on Standalone • Retrieve most recent full backup tape • Restore data to backup server • Modify server-specific settings (e.g. iLink URL, OPAC URLs, WorkFlows config, Self-issue) • Run missed reports + other actions • Test • Upload Standalone transactions • Return to “normal” operation
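The unscheduled recovery steps are an ordered checklist, and the earlier advice to "understand the time required (test)" suggests timing each step during a rehearsal. The following is a minimal sketch of that idea, not the presenters' actual procedure: the step names come from the slide, while `run_checklist` and its `confirm` callback are hypothetical names introduced here for illustration.

```python
import time

# Step names copied from the slide; how each step is carried out is site-specific.
RECOVERY_STEPS = [
    "Put users on Standalone",
    "Retrieve most recent full backup tape",
    "Restore data to backup server",
    "Modify server specific settings",
    "Run missed reports + other actions",
    "Test",
    "Upload Standalone transactions",
    "Return to normal operation",
]


def run_checklist(steps, confirm, clock=time.monotonic):
    """Walk the steps in order and stop at the first one that is not confirmed.

    `confirm` is a hypothetical callback: it receives the step name, blocks
    until the operator reports back, and returns True if the step succeeded.
    Returns a list of (step, elapsed_seconds) for the completed steps.
    """
    log = []
    for step in steps:
        started = clock()
        if not confirm(step):
            break  # do not continue past a failed step
        log.append((step, clock() - started))
    return log


# Dry run in which every step is confirmed immediately.
completed = run_checklist(RECOVERY_STEPS, confirm=lambda step: True)
print(f"{len(completed)} of {len(RECOVERY_STEPS)} steps completed")
```

In a real rehearsal `confirm` might prompt the operator at the console, and the logged durations could then be compared with the estimated timings.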
Approximate Timings • Standalone/retrieve backup tape: ½ hour • Restore data to backup server: 2-4 hrs • Modify settings & run reports: ½-2 hrs • Testing: ½ hour • Uploading standalone data: ¼ hour • Total: 3¾ - 7¼ hours
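The total on this slide is the sum of the per-step minimum and maximum estimates. As a worked check, here is a short sketch that adds them up; the figures are the ones on the slide, and the names `UNSCHEDULED_RECOVERY` and `total_range` are illustrative only.

```python
from fractions import Fraction as F

# Estimates in hours as (step, minimum, maximum), copied from the slide.
UNSCHEDULED_RECOVERY = [
    ("Standalone / retrieve backup tape", F(1, 2), F(1, 2)),
    ("Restore data to backup server", F(2), F(4)),
    ("Modify settings & run reports", F(1, 2), F(2)),
    ("Testing", F(1, 2), F(1, 2)),
    ("Uploading standalone data", F(1, 4), F(1, 4)),
]


def total_range(steps):
    """Return (best case, worst case) total hours for a list of step estimates."""
    low = sum(minimum for _, minimum, _ in steps)
    high = sum(maximum for _, _, maximum in steps)
    return low, high


low, high = total_range(UNSCHEDULED_RECOVERY)
print(f"Total: {float(low)} to {float(high)} hours")  # Total: 3.75 to 7.25 hours
```

Keeping the estimates as data makes it easy to swap in the durations actually measured during a test restore and see how the total shifts.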
Restore Process • Scheduled • Stop all activities – users on Standalone • Run full backup • Restore data to live server • Modify server specific settings back • Test • Upload Standalone transactions • Return to “normal” operation
Approximate Timings • Run full backup: 1 hour • Restore data to live server: 2-4 hrs • Modify settings: ½ hour • Testing: ½ hour • Uploading standalone data: ¼ hour • Total: 4¼ - 6¼ hours
Potential banana skins • WorkFlows configuration • Opacs, Self-issue & other equipment • Communicate with users (live/backup) • Test & document then test & document • Report suspension