Explore disaster recovery strategies & best practices for New York City's IT infrastructure, focusing on data protection, resiliency, and cost considerations. Learn the common pitfalls and steps to ensure successful disaster recovery planning.
THE CITY OF NEW YORK Disaster Recovery “Protecting City Data” Ron Bergman First Deputy Commissioner Gregory Neuhaus Assistant Commissioner
Disaster Recovery - Business Continuity Planning: Items for Review • Review of CITIServ data consolidation improving citywide Disaster Recovery - Business Continuity Planning • Disaster Recovery strategy for application and technical infrastructure hosted by the New York City Department of Information Technology and Telecommunications • Methodology for plan development • Integration with COOP planning
CITIServ • Review of CITIServ data consolidation improving citywide Disaster Recovery - Business Continuity Planning • Common data protection at DoITT data centers • Improved facilities management • Unifies the City’s information technology for the first time • Results in cost savings, faster deployment of applications, and improved uptime
NYCitiServ • Server consolidation through the expansion of virtualization technology • Expand the successful use of VM technologies • Expand the use of Sun virtualization • Leverage mainframe capacity for large-scale databases and Linux capability • A consolidated environment will improve the resiliency, security and recovery capability of the entire system, while helping to green the City’s IT infrastructure and make it more energy efficient
Five Top Reasons DR Plans Fail • 1. Business Services and IT are not linked • What does executive management think IT is capable of delivering versus what IT can actually deliver? • 2. No Disaster Recovery plan has been established • Do we have the redundancy of roles to cover the recovery of key systems? What is the chain of command during DR operations? How do we invoke DR operations? What level of disaster or downtime is acceptable? What steps are required to resume normal operations? • 3. Operational Issues with Backups • What happens when backups do not work? Has a restoration ever been tested? What is the process if a backup fails in an operational context? • 4. Recovery Goals are Unrealistic • When does the clock start in a disaster? What is the business-level tolerance for an outage? What are your real-world RTO and RPO? • 5. Disaster Recovery Cost Considerations • What is the cost of testing and maintaining disaster recovery plans? The cost of Disaster Recovery solutions is very high: is the City willing to build and maintain the needed solutions? Is the City able to accept the risk of not developing disaster recovery systems?
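The questions under reason 4 can be made concrete with a small calculation. The sketch below is illustrative only and is not part of the original presentation: it assumes the clock starts at the moment the outage begins, treats RTO as the maximum tolerable downtime and RPO as the maximum tolerable data-loss window, and uses hypothetical names (`RecoveryObjectives`, `measure_recovery`, `meets_objectives`) and hypothetical timestamps.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class RecoveryObjectives:
    """Targets agreed with the business (hypothetical structure)."""
    rto: timedelta  # maximum tolerable downtime
    rpo: timedelta  # maximum tolerable data-loss window


def measure_recovery(last_good_backup, outage_start, service_restored):
    """Return (achieved_rto, achieved_rpo) for one incident or DR test.

    Assumption: the clock starts when the outage begins, not when
    a disaster is formally declared.
    """
    achieved_rto = service_restored - outage_start
    achieved_rpo = outage_start - last_good_backup
    return achieved_rto, achieved_rpo


def meets_objectives(objectives, achieved_rto, achieved_rpo):
    """True only if both the downtime and data-loss targets were met."""
    return achieved_rto <= objectives.rto and achieved_rpo <= objectives.rpo


# Hypothetical DR test: nightly backup at 01:00, outage at 14:00,
# service back at 20:30 on the same day.
targets = RecoveryObjectives(rto=timedelta(hours=4), rpo=timedelta(hours=24))
rto, rpo = measure_recovery(
    last_good_backup=datetime(2009, 7, 1, 1, 0),
    outage_start=datetime(2009, 7, 1, 14, 0),
    service_restored=datetime(2009, 7, 1, 20, 30),
)
print(rto, rpo, meets_objectives(targets, rto, rpo))
```

In this made-up test the data-loss window is within target but the downtime is not, which is the kind of gap between expectation and capability the slide warns about.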
Seven Step DR Plan • These seven progressive steps are designed to be integrated into each stage of the system development life cycle and are the foundation of the plan • 1. Develop the Disaster Recovery planning policy statement. • 2. Conduct the business impact analysis (BIA) and Gap analysis for all DoITT systems. The BIA and Gap analysis help to identify and prioritize critical IT systems and components. • 3. Identify preventive controls. Measures taken to reduce the effects of system disruptions can increase system availability. • 4. Develop recovery strategies. Thorough recovery strategies ensure that the system may be recovered quickly and effectively following a disruption. • 5. Develop an IT disaster recovery plan. The disaster recovery plan should contain detailed guidance and procedures for restoring a damaged system. • 6. Plan testing, training, and exercises. Testing the plan identifies planning gaps, whereas training prepares recovery personnel for plan activation; both activities improve plan effectiveness and overall agency preparedness. • 7. Plan maintenance. The plan should be a living document that is updated regularly to remain current with system enhancements.
Primary Objectives • Disaster Recovery Operational and Capability Improvements • Develop a DR whitepaper and related DR plan for DoITT IT Operations to ensure the protection of City data and the ability to restore critical systems in a timely manner. • Service Delivery Expectations Analysis • Conduct the business impact analysis (BIA) and Gap analysis for all DoITT IT Operations systems. • Service Delivery Alignment • Link business goals with IT capabilities. The plan has to address capabilities versus expectations; this will be done through Business Impact Analysis and Service Level Classification. • DR Testing, Recovery Support and Business Resumption Coordination • Develop DoITT IT Operations disaster recovery testing methodologies. • Data Center Strengthening, Greening and Rationalization • Assist in making strategic improvements to IT Operations by improving our current data centers. • Data Retention Policy Improvements • Work with business, security and legal staffs to develop policies and operational options related to data retention. • DR Divisional and Unit Responsibility • DR is an agency-wide goal, with responsibility shared across many operational and business groups. This goal is tied to executive responsibility within DoITT.
Protect Your Data • Protection of the City’s electronic data is a major aspect of Disaster Recovery • If we lose data, it can never be recovered; if we lose systems, they can be rebuilt. • What happens when backups do not work? Has a restoration ever been tested? What is the process if a backup fails in an operational context? • Applications and data must be validated through the recovery of backups to the application level. The testing of tape backup recovery, including application validation, will be part of the DR testing approach. • Protect Your Data!
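One way to answer "has a restoration ever been tested?" at the file level is to compare checksums of the original data with the restored copy. The sketch below is an illustration under stated assumptions, not the method used by DoITT: the function names (`sha256_of`, `build_manifest`, `verify_restore`) are hypothetical, the data is assumed to be reachable as ordinary files, and application-level validation (starting the application against the restored data) would still be required on top of it.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB chunks so large files do not fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 digest.

    Intended to be recorded at backup time.
    """
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def verify_restore(manifest: dict, restored_root: Path) -> list:
    """Return relative paths that are missing or differ after a restore.

    An empty list means every file in the manifest was restored intact.
    """
    restored = build_manifest(restored_root)
    return sorted(
        name for name, digest in manifest.items()
        if restored.get(name) != digest
    )
```

A restore test would call `build_manifest` on the source before the backup runs, restore to a scratch location, and treat any non-empty result from `verify_restore` as a failed test to be investigated.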
Protect Applications The state of DR Preparedness July 2009 Update application listing and categorization with CIMS • DoITT currently hosts 456 applications. The listing has been cross-categorized with CIMS • DoITT hosts 121 technology services and infrastructure applications. • Overall we host 577 applications and services
Real-world RTO and RPO: review of overall recovery capability • Conducted business impact analysis (BIA) and Gap analysis for all DoITT systems with the systems groups • This analysis gives DoITT management detailed visibility into the state of Disaster Recovery solutions for current service offerings, along with the actual risk, impact and potential loss to the City of not having DR solutions where they are required. • Goal: Link business goals with IT capabilities. The plan has to address capabilities versus expectations. The key is the ranking of applications for recovery. • Goal: The establishment of Citywide application recovery objectives and service prioritization needs to be balanced with DR capability and resources
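The ranking and gap analysis described above can be sketched as a simple sort and filter. Everything in this example is hypothetical and for illustration only: the `Application` fields, the tier scale (1 = most critical), the application names and the hour values are assumptions, not data from the DoITT analysis.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Application:
    name: str
    tier: int                               # 1 = most critical (assumed scale)
    required_rto_hours: float               # what the business expects (from the BIA)
    achievable_rto_hours: Optional[float]   # what IT can deliver; None = no DR solution


def recovery_order(apps):
    """Rank applications: most critical tier first, then tightest required RTO."""
    return sorted(apps, key=lambda a: (a.tier, a.required_rto_hours, a.name))


def gaps(apps):
    """Applications whose DR capability falls short of the business expectation."""
    return [
        a for a in recovery_order(apps)
        if a.achievable_rto_hours is None
        or a.achievable_rto_hours > a.required_rto_hours
    ]


# Hypothetical inventory entries.
inventory = [
    Application("reporting", tier=3, required_rto_hours=72, achievable_rto_hours=48),
    Application("payroll", tier=2, required_rto_hours=24, achievable_rto_hours=None),
    Application("call-intake", tier=1, required_rto_hours=4, achievable_rto_hours=8),
    Application("portal", tier=1, required_rto_hours=2, achievable_rto_hours=2),
]

print([a.name for a in recovery_order(inventory)])
print([a.name for a in gaps(inventory)])
```

The first list is the order in which systems would be recovered; the second is the set of applications where expectations exceed capability, which is where the balancing against DR resources has to happen.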
DR Overview • The strategy outlines the direction related to Disaster Recovery for application and technical infrastructure in order to maintain DoITT’s ability to continue service for its clients and deliver IT services • The key aspect of the DoITT DR plan is to link business goals with IT capabilities. The plan has to address capabilities versus expectations; this will be done through Business Impact Analysis and Service Level Classification. • Protection of the City’s electronic data is a major aspect of Disaster Recovery. • We examined DoITT DR preparedness related to data protection, data redundancy, environment readiness and procedural readiness.
Integration with COOP planning • Supporting essential service integration with service desk processes • Service • 311 • NYC.GOV • Mainframe • Exchange - BlackBerry • Network - telecom
Integration with COOP planning • Remedy Disaster Recovery is essential due to the business functions conducted • Manual business processes have been developed as a workaround for the service desk • Agency applications and systems for essential services are being reviewed
Integration with COOP planning • Agencies have not been listing DoITT essential services, such as network, nyc.gov or 311, on COOP worksheets • Agencies will need these services to operate • Pandemic planning for COOP will leverage DoITT services; Exchange and remote access are examples • Testing of COOP and DR plans is needed to ensure operational readiness • The ranking of key business applications from COOP will be used in the prioritization of DR systems and solutions
Questions • Questions! • Protect your data!