Disclaimer
- This is not a post about avoiding cost cutting.
- Not about paying lip service to cost cutting.
- Not about hostage situations (we’ve come this far, you have to give us more money).
- This is also not aimed at extremely agile organisations that have active executive engagement with delivery teams; they’re the lucky ones.
- It’s also not aimed at organisations that have a “courageous executive” supporting an agile approach to delivery. Again, they’re relatively lucky.
Basic Admin
Determine your team’s annual run rate – most likely it’s the base cost to employ your team for a year. This can be awkward if you’re a contractor and have permanent members of staff in your team, but there’s usually a way. Many organisations have either a median cost figure for an “average employee” or a mid-band figure for each employee grade. Make sure you factor in holidays.
It is vital that “team” in this context means the full cross-functional team. It makes far less sense to reduce the headcount of just one or two roles. By reducing the team’s skill capacity “relatively evenly”, you reduce the chances of making a fatal cut. The corollary is that if your team is significantly unbalanced when compared to the work it delivers, you should consider a rebalancing exercise first.
It’s pessimistic, but reliable, to assume that all the other “overhead” costs associated with project execution will remain unaffected. Given the likely conditions, reducing the non-value-add overheads will be outside your sphere of influence.
Convert your cost cutting target into an equivalent team
burn rate. This gives you an indication of how much smaller your team has to be
in order for you to fit within the requested cost envelope.
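As a back-of-the-envelope illustration of that conversion (all figures, and the flat cost-per-head simplification, are hypothetical assumptions rather than anything from a real budget):

```python
# Rough sketch of the run-rate / burn-rate arithmetic described above.
# All figures are hypothetical; a real organisation would use its median
# or mid-band loaded cost per grade rather than one flat number.

TEAM_SIZE = 9
LOADED_COST_PER_HEAD = 90_000      # hypothetical annual cost per person
WORKING_WEEKS = 46                 # factor in holidays and other absence

annual_run_rate = TEAM_SIZE * LOADED_COST_PER_HEAD
weekly_burn_rate = annual_run_rate / WORKING_WEEKS

# A 20% cost-cutting target, expressed as an equivalent team size.
cost_cut_target = 0.20 * annual_run_rate
implied_team_size = (annual_run_rate - cost_cut_target) / LOADED_COST_PER_HEAD

print(f"Annual run rate:   {annual_run_rate:,.0f}")
print(f"Weekly burn rate:  {weekly_burn_rate:,.0f}")
print(f"Implied team size: ~{implied_team_size:.1f} people (down from {TEAM_SIZE})")
```

In practice you would substitute your organisation’s per-grade figures for the flat cost per head, but the shape of the calculation stays the same.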
Your most expensive people are usually more experienced. It’s therefore reasonable to assume that your pared-down team will not be as capable, and will therefore produce errors at a higher rate. You have a trade-off decision to make – accept the lower-quality output, or sacrifice some of your (reduced) “volume output” to maintain your quality levels.
The most significant factor (for me) in making this decision is how long that system or component needs to keep being changed and developed. For example, a single-use component that will only be used for 3 months (i.e. disposable) can generally be tolerated at a significantly lower quality than a system that needs to evolve regularly over a period of a decade. Lehman’s Laws of Software Evolution are worth revisiting to draw your own inferences.
Assuming that your reduced team will have to maintain their system for a long time (the average life of an IT system is about a decade, if this article is to be believed – “Software Lifetime and its Evolution Process over Generations”) then you have no real choice when it comes to quality or output – you have to prioritise quality.
That brings you to your next challenge. How do you compensate for the fact that your team will simply not be as skilled once you lose some members? Simply continuing your team’s way of working and hoping for the best is unlikely to succeed – with the reduction in expertise, your team will produce more bugs.
Your basic strategy should consist of two strands – to cope
with the increased bug density, and to reduce that skills deficit. To cope with
the bugs, use a suitably balanced combination of tactics:
- detect bugs earlier in the lifecycle,
- reduce the complexity of the bugs that are found,
- reduce the cognitive load required to fix the bugs, and
- reduce the impact of the bugs that do make it through your delivery process unnoticed.
A word of caution on the skills-deficit strand – it’s a much slower solution than introducing coping mechanisms for the increased number of bugs, and cannot be relied on as the primary strategy for short- to medium-term improvements.
When developing your mitigation tactics, it is sensible to avoid, as far as possible, options that rely on individuals “working harder” or working with “elevated skills”, as those are unrealistic. The lever with which you can exert the most significant change is team behaviour, specifically the team’s ways of working. It would also be sensible to incorporate working patterns that boost individual learning, as that is an approach to (eventually) reducing the expertise gap.
Detect bugs earlier
All bugs are detected because of an incongruity between what is observed and what is expected from a model (regardless of whether that model is tangibly identified or implicitly part of the knowledge that your product owner, subject matter expert, developer, architect etc. holds). Each person / role holds a different model and would therefore be able to detect different bugs.
One strategy for detecting bugs earlier is to compare observations against as many of these mental models as early as possible, for example by having product owners or SMEs actively embedded within your delivery teams, working alongside the developers daily. The most established strategy for increasing the visibility of work being done as early as humanly possible is pairing, especially pairing across roles (e.g. developer and analyst, or analyst and tester). Pairing is also one of the most effective mechanisms available for reducing that skills deficit.
Another strategy for detecting bugs earlier is to perform downstream activities sooner (for example, production deployments). There are associated costs – for example, the delivery strategy will need to be based on incremental development, where thin slices of “complete” functionality are built, integrated and deployed. Organisations that are more sequential in nature (for example, organisations that have a difficult / scary / time-consuming / manual “route-to-live” process) tend to get the biggest benefit from attempting this, but they’re also the organisations that are the most afraid to try.
Reducing the overall duration between creating the bug, detecting it, and fixing it will reduce the cognitive load required to fix the bug, as the team’s working memory will already be loaded with the appropriate context. Discovering a bug “a long time” after it was created requires significantly more mental preparation to regain the mental models that were in play at the time the bug was created.
Reduce the complexity of bugs
Given that all bugs are coded by someone, the only fundamentally viable strategy for reducing the complexity of the bugs that are found is to write simpler bugs. I find Einstein’s original quote helpful in communicating this message:
“It can scarcely be denied that the supreme goal of all theory is to make the irreducible basic elements as simple and as few as possible without having to surrender the adequate representation of a single datum of experience”
Albert Einstein
A scientific approach to analysis can be helpful, as it can reduce the number of “implementation patterns” that exist in your IT solution. That may introduce learning-curve challenges, e.g. if your delivery teams are unaccustomed to hypothesis-driven development.
Structured techniques such as test-driven development can also help, as they manage the “thinking complexity” of software development and have the side effect of producing tests that let you continuously monitor your code quality over time (even more helpful when your team’s expertise has been reduced). But watch out for common problems in tests – e.g. https://www.yegor256.com/2018/12/11/unit-testing-anti-patterns.html
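As a minimal sketch of that test-first rhythm (the `parse_quantity` function, its rules and the use of `unittest` are hypothetical choices for illustration, not a prescription): the tests are written first and fail, then just enough code is written to make them pass.

```python
# Minimal test-driven development sketch: the tests below are written
# before the production code, initially fail, and drive the minimum
# implementation needed to pass. The function and its rules are
# hypothetical, purely for illustration.
import unittest


def parse_quantity(raw: str) -> int:
    """Parse a user-supplied quantity, rejecting anything non-positive."""
    value = int(raw.strip())
    if value <= 0:
        raise ValueError("quantity must be a positive integer")
    return value


class ParseQuantityTest(unittest.TestCase):
    def test_accepts_a_plain_positive_number(self):
        self.assertEqual(parse_quantity(" 3 "), 3)

    def test_rejects_zero_and_negatives(self):
        with self.assertRaises(ValueError):
            parse_quantity("0")
        with self.assertRaises(ValueError):
            parse_quantity("-2")


if __name__ == "__main__":
    unittest.main()
```

The tests also remain behind as a regression check, which is exactly the “continuous monitoring” side effect mentioned above.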
But the biggest strategy for reducing the complexity (and number) of bugs in your system is to write less code.
Reduce the cognitive load
Employing strategies that reduce the complexity of bugs also has the side effect of reducing the cognitive load needed to understand and fix each bug. Additional strategies can include:
Working in smaller blocks also acts as a limiter on the amount of complexity that can possibly be present in each block. Work-partitioning techniques such as user story decomposition can help.
Techniques that use both visual and auditory inputs are easier for people to process, as the two modes are handled by different channels and use different working memories. Techniques such as Rubber Duck Debugging are a form of think-aloud protocol, where the auditory channel helps an individual formulate a hypothesis along a fundamentally different line to what they can see, thereby increasing their effective cognitive ability.
Working in larger groups (for example, see mob programming) can be an extremely effective technique for maintaining a consistently high degree of cognitive capacity, as the overall effect is to smooth out the natural peaks and troughs in the cognitive abilities of individuals (e.g. some people are morning people, others are night owls, but there’ll always be someone “firing on all cylinders”). Linus’ Law is a concise articulation: given enough eyeballs, all bugs are shallow.
Reduce the impact of bugs that escape
It’s inevitable that bugs will escape into production. The
final dimension to consider is about recovery. The easier and faster it is to
recover from a production incident, the less severe the effects of the problem.
Recovery time is spent:
- locating the root cause of the fault
- fixing the root cause
- delivering the fix
Locating the fault:
there are two basic (and, if required, complementary) steps to take. Your triage and fault-finding processes should determine whether the bug was introduced as part of the last release, as well as where specifically in your solution the fault lies. The first piece of insight could take considerably less time than the second. If the fault was introduced as part of the last release, one possible early response is to roll back the release, which stops further failures from occurring. A robust fix can then be developed and a new release planned. This naturally comes with an “opportunity cost”: you lose access to the rest of the functionality that was also present in the pulled release. This opportunity cost can be virtually eliminated, but that requires the ability to release every single change independently. Development and deployment strategies (e.g. trunk-based development and feature toggles) can greatly reduce the complexity associated with this.
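As a minimal sketch of how a feature toggle keeps risky or unfinished work dark on trunk (the toggle name, the environment-variable convention and the discount example are all hypothetical):

```python
# Minimal feature-toggle sketch: code for an unfinished or risky feature is
# merged to trunk but kept dark behind a toggle, so deployments can continue
# and the feature can be switched off without rolling back a release.
# The toggle name and the environment-variable convention are hypothetical.
import os


def is_enabled(toggle_name: str) -> bool:
    """Read a toggle from the environment, defaulting to off."""
    return os.environ.get(f"TOGGLE_{toggle_name.upper()}", "off") == "on"


def checkout_total(basket: list[float]) -> float:
    if is_enabled("new_discount_engine"):
        # New behaviour, releasable independently of the deployment.
        return sum(basket) * 0.9
    # Existing behaviour stays the default until the toggle is flipped.
    return sum(basket)


if __name__ == "__main__":
    # 15.0 unless TOGGLE_NEW_DISCOUNT_ENGINE=on is set in the environment
    print(checkout_total([10.0, 5.0]))
```

The deployment can then go out on its own schedule, and the new behaviour is switched on (or back off) without pulling the rest of the release.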
Fixing the root cause:
the strategies outlined earlier to reduce the cognitive load and detect bugs earlier also help here.
Delivering the fix:
this is inherently limited by the speed, flexibility, resilience and degree of automation of your route-to-live processes. The single biggest change you can make to drastically improve your delivery processes is to reduce the size of each release (getting as close as you can to releasing each minor change independently) and to repeat the release process constantly (I’ve experienced multiple releases a day into production, even in a public sector context). Attempting to do this will surface all of the sticking points and problems in your route-to-live processes, and will give you areas to target for improvement. It is almost always helpful to decouple software deployments (technical, automated, controlled by delivery teams) from business releases (business triggered, business features toggled, aligned with wider organisational change programmes), as you are then able to decouple automation improvements from business change readiness.
Final Thoughts
These are useful strategies for coping when your delivery teams have to compensate for a reduction in their base capabilities. However, there is nothing fundamentally preventing delivery organisations from just implementing these strategies to increase the effectiveness of the teams that they currently have. Is anything stopping you?