Test the Emergency Procedures

Having company procedures for everything is nice. As a testing professional, you have unique skills to bring to the table in trying and testing the procedures - what's the worst that could happen?

Test the Emergency Procedures
Photo by Karsten Wรผrth / Unsplash

Having company procedures for everything is nice. Sure, you can have procedures for when there is a system outage, for developer access to pre-prod environments and perhaps even a plan for if key staff are suddenly off the grid. But do you test the emergency procedures? Or do you wait for the emergencies to happen for real?

๐Ÿ’ก
Yesterday evening there was a power outage in the area of our office. No big deal for us, until the next day we arrived at the office with no Internet to work from. It turned out the fuses had to be reset - but the biggest challenge was to find someone locally with access to the fuses in the utility room. We had a procedure to call remote support. But remote support had a major flaw: onsite access would be required for power outages.

If your work is for the EU in any of the following sectors: energy and utility, transportation, banking, financials, healthcare sector, supply and digital infrastructure one of the CEO's worries is towards the coming EU regulations on cyber security and system resilience. One of the compliance requirements is about reporting security incidents. There is potentially a huge fine if security incidents aren't reported to the authorities within 24 hours. Similar to what is known with GDPR.

The usual reaction is from many companies to to write yet another company procedure, where they envision how things work out bad things should happen. And it's not just about P1 functional incidents in Prod - it's also about having procedures in place for power outages, data centre breakdowns and environmental disasters. These things happen - even for cloud solutions.

As a testing professional, you have to be aware of these coming compliance regulations so that you can ask the right questions for the system under test:

  • Can it be re-established fully within a reasonable time?
  • Is access to the source code and other credentials protected?
  • How are company user profiles on SaaS solutions handled - (including testing tools in the cloud)?
  • Are there solutions in place to manage company laptops and phones?
  • In approval systems, is there a process for the delegation of powers

As a testing professional, you have unique skills to bring to the table in trying and testing the procedures - what's the worst that could happen?

Given the EU regulations, many of these items are no longer a matter of risk appetite. The considerations are required. But there is a recognizable challenge in how often we want to be compliant. How often would the (compliance) requirements need testing? You can help frame the approach by suggesting one of these forms of test:

  • with a reference to a procedure
  • with the performance of a desk-top exercise/dry-run
  • with a recurring effort
  • with an automated/continuous solution

I often think of this as being a scale from something unique, to a pet project and then over to an industrialized concept that eventually becomes trivial. Read more about the concept here - under Evolution Characteristics:

Landscape
A map of physical terrain is visual, specific to the battle at hand, and includes the position of troops and obstacles relative to an anchor (magnetic North). A map of a competitive business Landscape (a Wardley Map) is also visual and context-specific, but instead of magnetic North, the anchor is t

I have experienced a "business continuity test" being run by an experienced team that usually collaborates on emergencies - but I have also seen the C-suite fumble in testing alternate communication platforms if their primary MS Teams solution is down.

It's no longer optional for the C-suite to establish company emergency procedures - and it's up to us to help them be confident that their thing works. As always.