Cost cutting and agile teams – finding a way to cope

Disclaimer

This is not a post about avoiding cost cutting.
Not about paying lip service to cost cutting.
Not about hostage situations (we’ve come this far, you have to give us more money)
This is also not aimed at extremely agile organisations that have active executive engagement with delivery teams, they’re the lucky ones.
It’s also not aimed at organisations that have a “courageous executive” supporting an agile approach to delivery. Again, they’re relatively lucky.

Basic Admin

Determine your team’s annual run rate – most likely it’s the base cost to employ your team for a year. Can be awkward if you’re a contractor and have permanent members of staff in your team, but there’s usually a way. Many organisations have either a median cost figure for an “average employee” or a mid-band figure for each employee grade. Make sure you factor in holidays.

It is vital that “team” in this context is the full cross functional team. It makes far less sense to reduce headcount of just one or two roles. By reducing the team skill capacity “relatively evenly”, you reduce the chances that you make a fatal cut. The corollary to this, is if your team is significantly unbalanced when compared to the work they deliver, then consider a rebalancing exercise first.

It’s pessimistic, but reliable to assume that all the other “overhead” costs associated with project execution will remain unaffected. Given the likely conditions, being able to reduce the non-value-add-overheads would be something outside your sphere of influence.

Convert your cost cutting target into an equivalent team burn rate. This gives you an indication of how much smaller your team has to be in order for you to fit within the requested cost envelope.

Your most expensive people are usually more experienced. Therefore, it’s reasonable to assume that your pared down team will not be as capable, and therefore will increase the rate that errors are produced. You have a trade-off decision to make – accept the lower quality output or sacrifice some of your (reduced) “volume output” to maintain your quality levels.

The most significant factor (for me) in making this decision, is how long that system / component needs to be changed/developed. For example a single use component that will only be used for 3 months (i.e. disposable) can be generally tolerated with a significantly lower quality than a system that needs to regularly evolve over a period of a decade. Lehman’s Laws of Software Evolution are worth revisiting for inference.

Assuming that your reduced team will have to maintain their system for a long time (the average life of an IT system is about a decade, if this article is to be believed – “Software Lifetime and its Evolution Process over Generations”) then you have no real choice when it comes to quality or output – you have to prioritise quality.

That brings you to your next challenge. How do you compensate for the fact that your team will simply be not as skilled once you lose some members? Simply continuing your team’s way of working and hoping for the best is unlikely to succeed – with the reduction in expertise, your team will produce more bugs.

Your basic strategy should consist of two strands – to cope with the increased bug density, and to reduce that skills deficit. To cope with the bugs, use a suitably balanced combination of tactics:

detect bugs earlier in the lifecycle,
reduce the complexity of the bugs that are found,
reduce the cognitive load required to fix the bugs, and
reduce the impact of the bugs that do make it through your delivery process unnoticed

A word of caution on the reduction of skills deficit strand – it’s a much slower solution than introducing coping mechanisms for the increased number of bugs and cannot be relied on as the primary solution strategy for short-medium term improvements.

When developing your mitigation tactics, it is sensible to avoid as many options as possible that rely on individuals in teams “working harder” or working with “elevated skills” as those are unrealistic. The lever you can exert the most significant change with is team behaviour, specifically team ways of working. It would also be sensible to incorporate working patterns that boost individual learning, as that is an approach to (eventually) reducing that expertise gap.

Detect bugs earlier

All bugs are detected because of an incongruity between what is observed and what is expected from a model (regardless of whether or not this model is tangibly identified or implicitly part of the knowledge that your product owner / subject matter expert / developer / architect etc. has. Each person / role would have a different model representation and therefore would be able to detect different bugs.

One strategy for detecting bugs earlier, is to compare observations to as many of these mental models as soon as a possible, for example by having product owners or SMEs be actively embedded within your delivery teams, working alongside the developers daily. The most established strategy for increasing visibility of the work being done as early as humanly possible is pairing, especially pairing with different roles (e.g. developer and analyst, or analyst and tester). Pairing is also one of the most effective mechanisms available for reducing that skills deficit.

Another strategy for detecting bugs earlier, is to perform downstream activities sooner (for example production deployments). There are associated costs – for example the delivery strategy will need to be based on incremental development where thin slices of “complete” functionality are built, integrated and deployed. Organisations that are more sequential in nature (for example, organisations that have a difficult / scary / time consuming / manual “route-to-live” process) tend to get the biggest benefit from attempting this, but they’re also the organisations that are the most afraid to try.

Reducing the overall duration between creating the bug, detecting it, and fixing it will reduce the cognitive load required to fix the bug, as the team’s working memory will already be loaded with the appropriate context. Discovering a bug “a long time” after it was created will require significantly more mental preparation to re-gain the mental models in play at the time of creating the bug.

Reduce the complexity of bugs

Given that all bugs are coded, the only fundamentally viable strategy to reduce the complexity of bugs that are found is to write simpler bugs. I find Einstein’s original quote can be helpful to communicate this message:

“It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience”
Albert Einstein

A scientific approach to analysis can be helpful as it can help reduce the number of “implementation patterns” that exist in your IT solution. That may introduce learning curve challenges, e.g. if your delivery teams are unaccustomed with hypothesis driven development.

Structured techniques such as test driven development can also help, as they help manage the “thinking complexity” of software development, as well as have the side effect of producing tests to be able to continuously monitor your code quality over time (even more helpful when your team’s expertise has been lowered). But watch out for common problems in tests – e.g. https://www.yegor256.com/2018/12/11/unit-testing-anti-patterns.html

But the biggest strategy to use to help reduce the complexity (and number) of bugs in your system, is to write less code.

Reduce the cognitive load

Employing strategies that reduce the complexity of bugs also have the side effect of reducing the cognitive load needed to understand and fix the bug. Additional strategies can include:

Working in smaller blocks would also serve to act as a limiter for the amount of complexity that can possibly be present in each block. Work partitioning techniques such as user story decomposition can help.

Techniques that use both visual and auditory inputs are easier to process by people, as the two modes are processed by different channels and use different working memories. Techniques such as Rubber Duck Debugging are a form of a think aloud protocol where the auditory channel can help an individual formulate a hypothesis along a fundamentally different line to what they can see, thereby increasing their effective cognitive ability.

Working in larger groups (for example, see mob programming) can be an extremely effective technique for always maintaining a high degree of cognitive capacity consistently as the overall effect is to smooth the natural peaks and troughs in the cognitive abilities of the individuals (e.g. some people are morning people, others are night owls etc., but there’ll always be someone “firing on all cylinders”). Linus’ Law is a concise articulation – given enough eyeballs, all bugs are shallow.

Reduce the impact of bugs that escape

It’s inevitable that bugs will escape into production. The final dimension to consider is about recovery. The easier and faster it is to recover from a production incident, the less severe the effects of the problem. Recovery time is spent:

locating the root cause of the fault
fixing the root cause
delivering the fix

Locating the fault: there are two basic (and complementary if required) steps to take. Your triage and fault finding processes could determine if the bug was introduced as part of the last release, as well as where specifically in your solution the fault lies. The first piece of insight could take considerably less time than the second. If the fault was introduced as part of the last release, one possible early response could be to roll-back the release, which act to stop further failures from occurring. A robust fix can then be developed and a new release planned. This naturally comes with an “opportunity cost” associated with not having access to the rest of the functionality also present in the pulled release. This opportunity cost can be virtually eliminated, but it will require the ability to release every single change independently. Development and deployment strategies (e.g. using trunk based development and feature toggles) can greatly reduce the complexity associated with this.

Fixing the root cause: strategies outlined earlier to reduce the cognitive load and identify bugs earlier also help here.

Delivering the fix: This is inherently limited by the speed, flexibility, resilience and degree of automation of your route-to-live processes. The single biggest change you can make to drastically improve your delivery processes is to reduce the size of each release (get as close to release each minor change independently) and repeat the release processes constantly (I’ve experienced multiple releases a day into production, even in a public sector context). Attempting to do this will identify all of the sticking points and problems associated with your route-to-live processes, and will give you areas to target improvements. It is almost always helpful to decouple software deployments (technical, automated, controlled by delivery teams) from business releases (business triggered, business features are toggled, aligned with wider organisational change programmes) as you are then able to decouple automation improvements from business change readiness.

Final Thoughts

These are useful strategies for coping when your delivery teams have to compensate for a reduction in their base capabilities. However, there is nothing fundamentally preventing delivery organisations from just implementing these strategies to increase the effectiveness of the teams that they currently have. Is anything stopping you?