Insights from a DR Test

Recently, eBRP was invited to observe the annual DR test of a southwest energy company, which was conducting the first Disaster Recovery test of its new Backup Data Centre. Our role included assisting and advising client teams on using features of eBRP Suite (which holds all their planning and Plan information) to monitor, measure and manage the test.

DR History

Previously, the Company contracted with a “warm site” provider to practice “bare metal” recovery. Its plans were maintained in a legacy software system, and annual DR tests were limited to a select few applications.

After a 3-year backup data centre construction project, the Company conducted its first failover test of all 135 applications classified as “Tier 0”, meaning those with RTOs of up to 120 hours. The associated Plans were migrated to eBRP Suite so that its CommandCentre EOC tool could be used to monitor and manage recovery tasks.

Stakeholders

Stakeholders involved in this Annual DR test included:

·      Infrastructure Restoration Teams

·      Application Validation Teams

·      Client (Application) Testers

·      Incident Commanders & Senior Managers

More than 200 staff members were engaged in the exercise over the course of the test period, which was originally estimated to require 120 hours. This was the first time this DR strategy had been tested.

Test Results

Over 240 servers, including mainframe, AIX, Linux and Windows servers, along with DB2, Oracle, HANA and SQL databases, were in the test scope. After an 8 a.m. kickoff on Tuesday, October 18, Infrastructure recovery proceeded smoothly, and scripted Server restoration progressed according to planning assumptions.

Tier 0 apps with 24-hour RTOs were up and running (ready for end-user testing) within the first 6 hours. In all, more than 135 applications were restored, verified and presented for testing within 60 hours.

eBRP’s CommandCentre was used for its intended purposes by Teams and groups – Incident Managers, Recovery Teams, Client Testers – as well as Senior Managers monitoring the progress of the test.

Lessons Learned

A test of this scope – recovery of more than 100 critical applications, with 200+ recovery team members working round the clock in 8-hour shifts – provided insights that were very different from the assumptions made during the planning stages:

·      Incident Managers and Senior Managers prefer a ‘35,000 ft.’ view of progress. They seldom wish to drill down into greater detail.

·      Once the recovery is underway, RTO, RPO and other BIA parameters are largely irrelevant.

·      A mass notification tool, integrated with eBRP, was effective for Polling, Periodic Updates and providing relevant instructions to teams during recovery.

·      Understanding dependencies (on infrastructure, other apps, etc.) is absolutely essential for ensuring efficient workflow between dependent teams.

·      Plans are important, but to monitor & manage the incidents, Incident Commanders & Senior Managers depended on high-level Dashboards that were refreshed in real-time.
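The point about dependencies can be made concrete with a small sketch. It assumes a hypothetical dependency map (the component names below are illustrative and not taken from the test) and uses Python's standard `graphlib` module to group items into recovery "waves", where everything in a wave can be restored in parallel once the previous wave is done. It is not how eBRP Suite models dependencies; it only shows why knowing them up front lets dependent teams sequence their work.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each key depends on the items in its set.
# These names are illustrative only and are not taken from the test.
dependencies = {
    "network": set(),
    "storage": {"network"},
    "database": {"storage"},
    "app_server": {"network"},
    "billing_app": {"database", "app_server"},
}


def recovery_waves(deps):
    """Group items into waves; items in the same wave can be restored in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave complete to unlock its dependents
    return waves


waves = recovery_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"Wave {number}: {', '.join(wave)}")
```

In this toy example the application team for `billing_app` cannot start until both the database and the application server are back, so a delay in either one is immediately visible as a blocked wave.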

The entire exercise was managed and monitored using eBRP Suite. The test concluded with all Applications restored and Client Tests validated in less than half the original timeframe. Unlike earlier tests, there was no need for Conference Bridges or yellow Post-it notes lining the EOC walls.

Ramesh Warrier

eBRP Founder and Chief Designer of eBRP Suite, Ramesh is a proponent of constant change, a visionary who believes that the practice of Business Continuity can deliver improved operational efficiency. Ramesh holds a B.Tech in Electrical Engineering and has nearly 30 years' experience in Business & Technology roles. His thoughts are expressed in blogs, white papers, frequent webcasts and speaking engagements at industry conferences.
