Security process improvement, or how I saved an organisation over £1M a year

The situation I’m referring to happened about seven years ago. I tell it not so much for ‘the thing itself’ but as a tale of the kind of waste commonly found in big organisations: waste that stays invisible until a value chain is fully understood and its constraints and bottlenecks are identified and addressed.

Because someone left the organisation, I found myself owning the Firewall Change Management process (a SOX control). During the handover, I was told that the process worked and that there were few escalations, and I was properly introduced to the local Ops team, who owned most of the operational aspects, and to the remote team in India, who did most of the actual non-service-impacting changes.

A few weeks in, some “strange” things started happening. I got an escalation for a firewall change that had been requested about 40 days earlier, which I could relate to one of the key transformation projects in the organisation, at a cost of over £2,000.

I remember thinking… 40 days? For a firewall change? What could possibly take that long? £2,000? How does that work? I felt something was off and that this wasn’t a one-off, as I had heard rumours that the process was slow but that people had learned to engage with it early to try to avoid delivery impact.

I first mentioned it to the CISO, who had other priorities, so I couldn’t really get the mandate to focus on fixing it. The first problem I had, then, was finding a way to dedicate time to it.

So instead of hiding behind my email and desk (as unfortunately still characterises too many people in the security industry), I offered coffee to the Project Manager affected by that instance, and also scrubbed through emails to find the escalations from the past three months and bought those people coffee too.

The perception was unanimous among the five people I talked to: slow, unclear, a few instances of not being right first time, escalations taking a long time to address, etc. Two of them were quite senior in the organisation, so I asked them to send me an email summarising their concerns, hoping to get management to focus on it. With five people providing feedback by email in very quick succession, management attention I got. Here’s what I then found.

Finding and fixing the constraint

We had a network design based on VRF (Virtual Routing and Forwarding) instances and quite a big number of them for segregation purposes.

The process assumed that all changes were driven by a capital project (CapEx), so it was mandatory for a CapEx code to be associated with every change request, even though significant spend went through OpEx projects. That assumption led to another one: that every firewall change required a design document. What ended up happening was that one of our key outsourced IT suppliers had a team of four people just writing small design docs to document simple firewall changes, and because they weren’t 100% dedicated to it, hand-offs delayed the process further. The bulk of the BAU change requests were pretty harmless stuff, like connecting old legacy systems to our latest monitoring platforms.

This part alone added, on average, a three-week delay between the request and actual implementation.

Over the course of the following month, I had a few other conversations with the India-based team, who were sending requests back because the designers were trying to route traffic through the wrong VRFs; the team was catching these mistakes early and returning the requests to the designers.

This made absolutely no sense to me. We were delaying the business for what didn’t seem like a good reason at all, and on top of that my outsourced team still had to pick up the mistakes and tell the designers which VRFs the traffic should be routed through.

So I had to find a way to improve this. Here’s what I did:

Defined guidelines for what could be approved as a BAU request
Dropped the requirement to have design documents for BAU requests
Defined specific heuristics about what cannot be approved as a BAU request
Changed the ticket flow for BAU requests to go directly to the India-based team, who would work directly with the internal stakeholders if clarifications were required

This was accompanied by monthly spot-checks by the local Ops team on the quality of changes made by the India-based team, to confirm that unauthorised or incoherent changes weren’t being introduced.
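To make the BAU triage a bit more concrete, here is a minimal Python sketch of how a “BAU or not” routing decision could be expressed. The post doesn’t list the actual guidelines or heuristics, so everything in it is an assumption for illustration: the `ChangeRequest` fields, the VRF names (`MONITORING`, `PCI`, `INTERNET_DMZ`), the ports and the queue names are all hypothetical.

```python
from dataclasses import dataclass


# Hypothetical ticket shape: the real change-request fields are not described in the post.
@dataclass(frozen=True)
class ChangeRequest:
    source_vrf: str
    dest_vrf: str
    port: int
    service_impacting: bool
    part_of_capex_project: bool


# Illustrative guidelines for what CAN be BAU (assumed values),
# e.g. connecting a legacy system to the monitoring platform.
BAU_DEST_VRFS = {"MONITORING"}
BAU_PORTS = {161, 514}  # SNMP and syslog, as example monitoring ports

# Illustrative heuristics for what CANNOT be BAU (assumed values).
RESTRICTED_VRFS = {"PCI", "INTERNET_DMZ"}


def is_bau(req: ChangeRequest) -> bool:
    """True only if the request matches the BAU guidelines and trips no exclusion."""
    # Exclusions win over everything else.
    if req.service_impacting or req.part_of_capex_project:
        return False
    if req.source_vrf in RESTRICTED_VRFS or req.dest_vrf in RESTRICTED_VRFS:
        return False
    # Otherwise the request must positively match the BAU guidelines.
    return req.dest_vrf in BAU_DEST_VRFS and req.port in BAU_PORTS


def route(req: ChangeRequest) -> str:
    """Pick the queue: BAU goes straight to the implementation team, no design doc."""
    return "offshore_implementation" if is_bau(req) else "design_review"


if __name__ == "__main__":
    legacy_to_monitoring = ChangeRequest("LEGACY", "MONITORING", 161, False, False)
    print(route(legacy_to_monitoring))  # offshore_implementation
```

The design choice in this sketch is to fail safe: anything that doesn’t positively match the BAU guidelines, or that trips one of the “cannot be BAU” heuristics, falls back to the slower design-review path, while the monthly spot-checks act as the safety net for the fast path.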

Addressing other bottlenecks

There were a few other bottlenecks as well. Part of the process for CapEx projects relied on the Firewall team emailing the security consultant assigned to the project to get approval before making the changes.

In theory, there was nothing too wrong with that, but in practice it often caused unnecessary delays, with the security teams not replying to emails in a timely manner and MY process getting a bad rep as a result. I wasn’t happy about that. I could address it in one of two ways, I thought.

I could either ask the India-based team to start copying me in, in which case I would be the one chasing my colleagues for answers, dragging down my productivity on everything else I was responsible for, or I could find a way to eliminate that part of the process entirely.

From an SDLC perspective, firewall changes could only come after the design had been approved, and there was a specific section in the LLD (Low Level Design) with tables listing the required connectivity. In our document management system, we could easily see when the security consultant had approved the design, as a record was kept.

So I changed the India-based team’s process: instead of sending out emails and waiting for responses, they would check that the design had been approved by the security team, validate that the requested flow was in the LLD, and proceed from there.
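The replacement check is simple enough to sketch. Below is a minimal Python illustration, assuming the design approval status and the LLD connectivity tables can be read from the document management system as structured data; the `Flow` and `DesignRecord` types and the `validate_capex_change` function are hypothetical and not any real DMS API.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Flow:
    """One connectivity row; frozen so it is hashable and can be stored in a set."""
    source: str
    destination: str
    port: int
    protocol: str = "tcp"


@dataclass
class DesignRecord:
    """Hypothetical view of a project's design as recorded in the DMS."""
    project_id: str
    security_approved: bool
    # Flows taken from the LLD connectivity tables.
    lld_flows: set[Flow] = field(default_factory=set)


def validate_capex_change(requested: Flow, design: DesignRecord) -> tuple[bool, str]:
    """Check the DMS record instead of emailing the security consultant and waiting."""
    if not design.security_approved:
        return False, "design not yet approved by security"
    if requested not in design.lld_flows:
        return False, "flow not listed in the LLD connectivity tables"
    return True, "ok to implement"


if __name__ == "__main__":
    design = DesignRecord(
        project_id="PRJ-001",
        security_approved=True,
        lld_flows={Flow("app-server", "db-server", 5432)},
    )
    print(validate_capex_change(Flow("app-server", "db-server", 5432), design))
    # (True, 'ok to implement')
```

A failed check returns a specific reason, so the ticket can go back with the same information the approval email was meant to gather, without waiting on someone’s inbox.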

Conclusion

These changes had a massive impact on the performance of this process. The outsourced IT supplier ended up reassigning those four people to other work, as they were no longer producing what I’d argue were useless design documents.

Financially, looking at the numbers, we identified expected savings of around £1.2M a year, and that’s not even counting productivity improvements: most requests started being implemented within a week or two of being raised, which was completely amazing and unheard of in the organisation. That earned me my first distinction from a C-level exec 🙂

Most organisations (especially those with Waterfall-based governance) are full of these inefficiencies, and security already has a bad enough rep without making it worse with badly designed processes.

It may not be as exciting as hacking a server, or engineering a really clever security check on our pipelines, but this is what reducing pain and friction for our business colleagues means.