ITCP / ITDR Audit Program / Test Recovery Checklist
ROBERT K. DUGGAN, CPA, CIA, CISA
Why do we need to test the ITDR / ITCP?
• The ITCP / DRP often doesn’t work.
• We discover it doesn’t work when we really need it to work.
• We pay a fortune to maintain it. (Tiers 4-6: $400K to $2M and up!)
• DR test recoveries are fun!
Tiers of Configuration
• IBM sets Tiers 1-6 for CICS operating on z/OS.
• Tiers are based on configuration. Tiers 1-3 have recovery times from 1 week down to more than 24 hours.
• Tiers 4-6 have recovery times under 24 hours, ranging from large manufacturers/distributors with continuous processing needs and low tolerance for business downtime, to instantaneous recovery (Tier 6: banks, zero downtime tolerance). See IBM.com for more information.
• Today’s example is a Tier 4 scenario for medium to large organizations with a 24-hour RTO requirement for critical applications. (If you have a mainframe, you most likely need Tier 3 or higher.)
• < 24-hour recovery of critical platforms and applications: key success factors and evaluation steps are similar across the tiers.
Tiers of Configuration
• Determined by Business Impact Analysis and Risk Assessment
• RTO / RPO
• Recovery of critical platforms and applications: regardless of tier or platform, key success factors and evaluation steps are similar. Only the configuration and the RTO change.
Today: 3 Levels of Assessment of the ITDR
• Walkthru (“Tabletop”): scenario with roles and responsibilities
• Functional Exercise: verify the effectiveness of the backup by platform
• Off-Site Test Restore: verify the effectiveness of the IT DR plan offsite at the test center
BCP/ITDR Key Concept
Two different things, but ITDR and BCP are severely impaired without each other.
Walkthru / Tabletop
• Should occur well before the offsite test
• Include the vendor team
• Follow up with platform owners, the DR team, and the vendor team to resolve issues noted before the actual test restore
• As part of planning, Audit interviews the platform support teams, IT Director, and DR Manager to understand the objectives and where the process sits on an evolutionary scale
Major Gaps - DRP Walkthru
• Call tree notification system is dysfunctional or not held at the vendor; call trees are incomplete or not defined
• Persons who can declare a disaster are not defined, are poorly separated, or are the wrong people, so the vendor cannot take action under the contractual terms
• Support teams, and backups for key members, are not defined
• No approval process for changes to DR documents
• DR documents are not current, or are not held at the vendor / on a secure website
• Vendor is in the same geographic area
Major Gaps - DRP Walkthru
• Step-by-step instructions for platform owners / vendor operators are not crystal clear
• No clear assignment of responsibilities or documented procedures for key platform owners
• No clear assignment of responsibility for vendor personnel, or no appropriate training on the platforms
• Backups for key personnel are not defined
• Business impact analysis and risk assessment are not current, or the tier of recovery is insufficient. Example: a distributor switches from a call center to a web application / proprietary remote order entry system
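Several of the walkthru gaps above (incomplete call trees, undefined declarers, missing backups for key personnel) can be checked mechanically before the tabletop. The following is a minimal sketch only: the dictionary layout, the role names, and the `call_tree_gaps` helper are all hypothetical and are not taken from any DR tool or from the audit program itself.

```python
def call_tree_gaps(call_tree, declarers):
    """Return a list of plain-language gaps found in a call tree.

    call_tree: dict mapping a role (hypothetical names) to a dict with
               "primary" and "backup" contact strings.
    declarers: list of people authorized to declare a disaster.
    """
    gaps = []
    if not declarers:
        gaps.append("no one is authorized to declare a disaster")
    for role, contacts in call_tree.items():
        # An empty or missing entry counts as a gap.
        if not contacts.get("primary"):
            gaps.append(f"{role}: no primary contact")
        if not contacts.get("backup"):
            gaps.append(f"{role}: no backup contact")
    return gaps


# Illustrative data only.
example_tree = {
    "mainframe": {"primary": "A. Owner", "backup": ""},
    "network": {"primary": "B. Owner", "backup": "C. Backup"},
}
example_gaps = call_tree_gaps(example_tree, declarers=["IT Director"])
```

A check like this only confirms that names are present. Whether the listed people are the right ones, and whether the vendor holds the same list, still has to be confirmed in the walkthru.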
Major Gaps - Functional Exercise - Test Recovery
• Vendor personnel or backup recovery personnel cannot restore the system
• Port mapping / system documentation is not complete or up to date
• Insufficient remote software / hardware support level
• Vendor hardware is insufficient
• Insufficient procedures / lack of clean, updated scripts
• Poorly trained recovery personnel
Major Gaps - Functional Exercise - Test Recovery
• Backup is not really effective. Verify successful recovery of each platform using a checklist, and document the verification method (system and volume information in header screens). PS: Don’t ask for screenshots in the middle of a DR test. Just capture platform, LPAR, times, and volume information, and observe/confirm effective validation.
• Application recovery is not verified during the 24-hour test, giving an inaccurate RTO
• Inaccurate system documentation leads to failure to meet the RTO
• Port mapping is inaccurate or not maintained properly by hardware support
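The checklist items named above (platform, LPAR, times, volume information) could be captured in a simple record so that incomplete entries stand out after the test. This is a sketch under assumptions: the `VerificationEntry` class, its field names, and the `missing_fields` helper are hypothetical, and a real checklist would follow the organization's own form.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class VerificationEntry:
    """One checklist row noted by the observer during the test restore."""
    platform: Optional[str] = None
    lpar: Optional[str] = None
    verified_at: Optional[str] = None   # time the recovery was observed
    volume_info: Optional[str] = None   # as read from the header screen
    verified_by: Optional[str] = None   # who confirmed the validation


def missing_fields(entry):
    """List the checklist fields that were left blank, in form order."""
    return [f.name for f in fields(entry) if not getattr(entry, f.name)]


# Illustrative entry only: volume information and verifier were not noted.
example_entry = VerificationEntry(platform="z/OS", lpar="LPAR1", verified_at="02:15")
example_missing = missing_fields(example_entry)
```

Reviewing the blank fields at the post-restore review is one way to see which platforms were recovered but never actually verified.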
Major Gaps - Functional Exercise - Test Recovery
• Restore personnel cannot follow the scripts without assistance from the company platform team
• Test results are not verified by the DR Test Manager / DR Manager, or the test leader is not independent or does not rotate between tests
• Teams do not complete the verification checklist or keep testing notes. It is an evolving process that needs to build
• Teams do not update the DR instructions with lessons learned after the test restore. It is an expensive process, so there should be a post-restore review with a follow-up task list
• Teams do not accurately capture the actual recovery time and recovery point (RT/RP) or evaluate them against the true RTO/RPO by platform and application
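The last gap above, comparing actual RT/RP against RTO/RPO by platform and application, comes down to simple timestamp arithmetic. The sketch below assumes recovery time is measured from declaration to verified restore, and recovery point from the restored backup's timestamp to declaration. Those definitions, the `RecoveryResult` class, and the sample platforms are illustrative assumptions, not part of the audit program.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryResult:
    """Observed timings for one platform or application (hypothetical record)."""
    name: str
    declared_at: datetime        # when the disaster was declared for the test
    restored_at: datetime        # when recovery was verified as usable
    last_good_backup: datetime   # timestamp of the data actually restored
    rto: timedelta               # recovery time objective from the BIA
    rpo: timedelta               # recovery point objective from the BIA

    @property
    def recovery_time(self) -> timedelta:
        return self.restored_at - self.declared_at

    @property
    def recovery_point(self) -> timedelta:
        return self.declared_at - self.last_good_backup

    def met_rto(self) -> bool:
        return self.recovery_time <= self.rto

    def met_rpo(self) -> bool:
        return self.recovery_point <= self.rpo


def exceptions(results):
    """Names of platforms/applications that missed either objective."""
    return [r.name for r in results if not (r.met_rto() and r.met_rpo())]


# Illustrative figures only: a 24-hour RTO/RPO, as in the Tier 4 example.
declared = datetime(2024, 1, 1, 8, 0)
backup = datetime(2023, 12, 31, 20, 0)
day = timedelta(hours=24)
example_results = [
    RecoveryResult("mainframe", declared, datetime(2024, 1, 2, 2, 0), backup, day, day),
    RecoveryResult("order-entry", declared, datetime(2024, 1, 2, 12, 0), backup, day, day),
]
example_exceptions = exceptions(example_results)
```

Recording results per platform and per application matters here: a platform can meet its RTO while the application on it is not usable until hours later.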
Some Resources
• www.searchdisasterrecovery.com
• www.IBM.com
Thank You
Be sure to find me on LinkedIn