Part 1 – Determine Recovery Scope and Objectives (a Case Study)
As a BCP/DR software solution provider, we are regularly called on to assist our enterprise customers with the preparation, management, and enhancement of their DR tests and exercises. This collaboration often provides opportunities to optimize and improve both organizations’ DR Program software implementation and usage.
Recently, we partnered with one of our utility (power distribution) customers. With more than three million consumer households at stake, they have designated disaster preparedness a ‘mission critical’ program. Periodic testing and validation of their documented Disaster Recovery Plans are the only way to certify their DR Program as credible and viable.
Their Board of Directors mandates that the IT DR Team regularly test and report on the IT Service Continuity and Disaster Preparedness of their business datacenter. This includes customer service and support, new connection provisioning, meter reading, and billing and collections functions. To meet that mandate, their Disaster Recovery program includes quarterly table-top drills and an Annual DR Exercise. The scale of each year’s test is driven by compliance and program requirements and is determined by the executive and DR teams involved.
The first step in preparing any exercise is to determine the recovery scope. For this year’s datacenter failover exercise, 57 Tier 1 and 2 service applications were declared to be “impacted”. From there it was determined that 350 infrastructure-dependent assets would be “affected”, including IBM Mainframes, Unix and Linux servers, and assorted Wintel systems. IBM and 3PAR storage sub-systems and Oracle, SQL, and SAP database components were also involved.
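One way to picture the step from “impacted” applications to “affected” assets is as a walk over a dependency map. The sketch below is illustrative only: the application and asset names, the DEPENDS_ON map, and the affected_assets function are hypothetical and are not taken from the customer’s environment or from any product.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration):
# each application or asset lists what it directly depends on.
DEPENDS_ON = {
    "billing": ["billing-db", "app-server-01"],
    "meter-reading": ["meter-db", "app-server-01"],
    "billing-db": ["storage-array-a"],
    "meter-db": ["storage-array-a"],
    "app-server-01": [],
    "storage-array-a": [],
}


def affected_assets(impacted_apps, depends_on):
    """Return every asset reachable from the impacted applications."""
    seen = set()
    queue = deque(impacted_apps)
    while queue:
        node = queue.popleft()
        for dep in depends_on.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # The applications themselves are "impacted", not "affected" assets.
    return seen - set(impacted_apps)


scope = affected_assets(["billing", "meter-reading"], DEPENDS_ON)
print(sorted(scope))
```

Shared assets such as the storage array appear only once in the result, which is why the asset count does not grow in step with the number of applications.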
The scale of these annual tests can vary. While the previous year’s test was more extensive, this year’s was limited to failover response testing and recovery of underlying information technology infrastructure only. The exercise would simulate “Production” failover to a backup datacenter. Limited application recovery would be performed with service restoration confirmation, but there would be no end-user application testing.
The exercise would involve more than 270 participants – including Incident Commanders, recovery teams, application owners, and executives, with continuous shift coverage over a 72-hour exercise period.
With the Recovery Scope identified, specific exercise Objectives can be set, giving all participants a clear understanding of their intended actions and goals.
The first Objective was to validate existing DR Plans including verification that Recovery Time Objectives (RTO) were both realistic and achievable. The second was to validate their automated CommandCentre incident management software platform and identify any gaps in recovery plans.
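As a rough illustration of what checking RTOs against exercise results can look like, the sketch below compares a target recovery time with the time actually measured for each service. The service names, the durations, and the rto_report function are assumptions made for this example; they do not describe the customer’s plans or the CommandCentre platform.

```python
from datetime import timedelta


def rto_report(rto_by_service, actual_by_service):
    """Classify each service as 'met', 'missed', or 'not recovered'."""
    report = {}
    for service, rto in rto_by_service.items():
        actual = actual_by_service.get(service)
        if actual is None:
            report[service] = "not recovered"
        elif actual <= rto:
            report[service] = "met"
        else:
            report[service] = "missed"
    return report


# Hypothetical targets and measured recovery times.
targets = {
    "billing": timedelta(hours=4),
    "meter-reading": timedelta(hours=8),
    "provisioning": timedelta(hours=24),
}
measured = {
    "billing": timedelta(hours=3, minutes=30),
    "meter-reading": timedelta(hours=9),
}

for service, status in rto_report(targets, measured).items():
    print(service, status)
```

A service that repeatedly lands in “missed” suggests either a plan that needs work or an RTO that was never realistic, which is the question the first Objective is meant to answer.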
From a high-level perspective, DR Program Goals included compliance, improved exercise efficiency, enhanced industry best practice conformance, and – where possible – reduced slack time in the overall recovery critical path.
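Slack on the recovery critical path can be estimated with a standard forward and backward pass over the task dependencies. The following is a minimal sketch under stated assumptions: the task names, the durations (in minutes), and the slack_per_task function are hypothetical, and it uses graphlib from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical recovery tasks: (duration in minutes, predecessor tasks).
TASKS = {
    "restore-storage": (60, []),
    "restore-database": (90, ["restore-storage"]),
    "restore-app-servers": (45, ["restore-storage"]),
    "confirm-service": (30, ["restore-database", "restore-app-servers"]),
}


def slack_per_task(tasks):
    """Return (slack per task, total recovery duration)."""
    graph = {name: preds for name, (_, preds) in tasks.items()}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: earliest finish of each task.
    earliest_finish = {}
    for name in order:
        duration, preds = tasks[name]
        start = max((earliest_finish[p] for p in preds), default=0)
        earliest_finish[name] = start + duration
    total = max(earliest_finish.values())

    # Backward pass: latest finish that does not delay the overall recovery.
    latest_finish = {name: total for name in tasks}
    for name in reversed(order):
        duration, preds = tasks[name]
        for p in preds:
            latest_finish[p] = min(latest_finish[p], latest_finish[name] - duration)

    slack = {name: latest_finish[name] - earliest_finish[name] for name in tasks}
    return slack, total


slack, total = slack_per_task(TASKS)
print(total, slack)
```

Tasks with zero slack form the critical path. In this made-up example only the application server restore has room to move (45 minutes), so shortening it would not shorten the overall recovery.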
The status of all business services would also be monitored, measured, and managed over the course of the exercise. These included affected applications (DR) and processes (BCP), activated plans, and people and skill resources.
Once the Scope and Objectives had been established, exercise preparation could begin, including identification of all impacted plans to ensure (as much as possible) that each is up to date and optimized for the actual exercise. Exercise Preparation will be discussed in detail in Part 2 of this blog series, with Exercise Execution in Part 3 and After Exercise Review in Part 4 to follow.