On Tech

Tag: Risk

The maintenance mode myth

“Over the years, I’ve worked with many organisations who transition live software services into an operations team for maintenance mode. There’s usually talk of being feature complete, of costs needing to come under control, and the operations team being the right people for BAU work. 

It’s all a myth. You’re never feature complete, you’re not measuring the cost of delay, and you’re expecting your operations team to preserve throughput, reliability, and quality on a shoestring budget.

You can ignore opportunity costs, but opportunity costs won’t ignore you.”

Steve Smith

Introduction

Maintenance mode is when a digital service is deemed to be feature complete, and transitioned into BAU maintenance work. Feature development is stopped, and only fixes and security patches are implemented. This usually involves a delivery team handing over their digital service to an operations team, and then the delivery team is disbanded.

Maintenance mode is everywhere that IT as a Cost Centre can be found. It is usually implemented by teams handing over their digital services to the operations team upon feature completion, and then the teams are disbanded. This happens with the Ops Run It operating model, and with You Build It You Run It as well. Its ubiquity can be traced to a myth: 

Maintenance mode by an operations team preserves the same protection for the same financial exposure

This is folklore. Maintenance mode by your operations team might produce lower run costs, but it increases the risk of revenue losses from stagnant features, operational costs from availability issues, and reputational damage from security incidents.

Imagine a retailer DIYers.com, with multiple digital services in multiple product domains. The product teams use You Build It You Run It, and have achieved their Continuous Delivery target measure of daily deployments. There is a high standard of quality and reliability, with incidents rapidly resolved by on-call product team engineers.

DIYers.com digital services are put into maintenance mode with the operations team after three months of live traffic. Product teams are disbanded, and engineers move into newer teams. There is an expected decrease in throughput, from daily to monthly deployments. However, there is also an unexpected decrease in quality and reliability. The operations team handles a higher number of incidents, and takes longer to resolve them than the product teams.

This produces some negative outcomes:

  • Higher operational costs. The reduced run costs from fewer product teams are overshadowed by the financial losses incurred during more frequent and longer periods of DIYers.com website unavailability. 
  • Lower customer revenues. DIYers.com customers are making fewer website orders than before, spending less on merchandise per order, and complaining more about stale website features. 

DIYers.com learned the hard way that maintenance mode by an operations team reduces protection, and increases financial exposure. 

Maintenance mode reduces protection

Maintenance mode by an operations team reduces protection, because it increases deployment lead times.

Transitioning a digital service into an operations team means fewer deployments. This can be visualised with deployment throughput levels. A You Build It You Run It transition reduces a throughput of weekly deployments or more to a likely target measure of monthly deployments.

An Ops Run It transition probably reduces monthly deployments to a target measure of quarterly deployments.

Maintenance mode also results in slower deployments. This happens silently, unless deployment lead time is measured. Reducing deployment frequency creates plenty of slack, and that additional time is consumed by the operations team building, testing, and deploying a digital service from a myriad of codebases, scripts, config files, deployment pipelines, functional tests, etc. 

Longer deployment lead times result in:

  • Lower quality. Less rigour is applied to technical checks, due to the slack available. Feedback loops become enlarged and polluted, as test suites become slower and non-determinism creeps in. Defects and config workarounds are commonplace. 
  • Lower reliability. Less time is available for proactive availability management, due to the BAU maintenance workload. More time is needed to identify and resolve incidents. Faulty alerts, inadequate infrastructure, and major financial losses upon failure become the norm.

This situation worsens at scale. Each digital service inflicted on an operations team adds to their BAU maintenance workload. There is a huge risk of burnout amongst operations analysts, and of deployment lead times subsequently rising until monthly deployments become unachievable.

At DIYers.com, the higher operational costs were caused by a loss of protection. The drop from daily to monthly deployments was accompanied by a silent increase in deployment lead time from 1 hour to 1 week. This created opportunities for quality and reliability problems to emerge, and operational costs to increase.

Maintenance mode increases financial exposure

Maintenance mode by an operations team increases financial exposure, because opportunity costs are constant, and unmanageable with long deployment lead times.

Opportunity costs are constant because user needs are unbounded. It is absurd to declare a digital service to be feature complete, because user demand does not magically stop when feature development is stopped. Opportunities to profit from satisfying user needs always exist in a market. 

Maintenance mode is wholly ignorant of opportunity costs. It is an artificial construct, driven by fixed capex budgets. It is true that developing a digital service indefinitely leads to diminishing returns, and expected return on investment could be higher elsewhere. However, a binary decision to end all investment in a digital service squanders any future opportunities to proactively increase revenues. 

Opportunity costs are unmanageable with long deployment lead times, because a market can move faster than an overworked operations team. The cost of delay can be enormous if days or weeks of effort are needed to build, test, and deploy. Critical opportunities can be missed, such as:

  • Increasing revenues by building a few new features to satisfy a sudden, unforeseeable surge in user demand. 
  • Protecting revenues when a live defect is found, particularly in a key trading period like Black Friday.
  • Protecting revenues, costs, and brand reputation when a zero day security vulnerability is discovered.  

The Log4Shell security flaw left hundreds of millions of devices vulnerable to arbitrary code execution. It is easy to imagine operations teams worldwide, frantically trying to patch tens of different digital services they did not build themselves, in the face of long deployment lead times and the threat of serious reputational damage. 

At DIYers.com, the lower customer revenues were caused by feature stagnation. The lack of funding for digital services meant customers became dissatisfied with the DIYers.com website, and many of them shopped on competitor websites instead.

Maintenance mode is best performed by product teams

Maintenance mode is best performed by product teams, because they are able to limit the financial exposure of digital services with minimal investment. 

Maintenance mode makes sense, in the abstract. IT as a Cost Centre dictates there are only so many fixed capex budgets per year. In addition, sometimes a digital service lacks the user demand to justify continuing with a dedicated product team. Problems with maintenance mode stem from implementation, not the idea. It can be successful with the following conditions:

  1. Be transparent. Communicate that maintenance mode is a consequence of fixed capex budgets, and that digital services do not have long-term funding without demonstrating product/market fit, e.g. with Net Promoter Score. 
  2. Transition from Ops Run It to You Build It You Run It. Identify any digital services owned by an operations team, and transition them to product teams for all build and run activities. 
  3. Target the prior deployment lead time. Ensure maintenance mode has a target measure of less frequent deployments and the pre-transition deployment lead time. 
  4. Make product managers accountable. Empower budget holders for product teams to transition digital services in and out of maintenance mode, based on business metrics and funding scenarios. 
  5. Block transition routes to operations teams. Update service management policies to state only self-hosted COTS and back office foundational systems can be run by an operations team. 
  6. Track financial exposure. Retain a sliver of funding for user research into fast moving opportunities, and monitor financial flows in a digital service during normal and abnormal operations. 
  7. Run maintenance mode as background tasks. Empower product teams to retain their live digital services, then transfer those services into sibling teams when funding dries up.  

Maintenance mode works best when product teams run their own digital services. If a team has a live digital service #1 and new funding to develop digital service #2 in the same product domain, they monitor digital service #1 on a daily basis and deploy fixes and patches as necessary. This gives product teams a clear understanding of the pitfalls and responsibilities of running a digital service, and how to do better in the future. 

If funding dictates a product team is disbanded or moved into a different product domain, any digital services owned by that team need to be transferred to a sibling team in the current product domain. This minimises the knowledge sharing burden and BAU maintenance workload for the new product team. It also protects deployment lead times for the existing digital services, and consequently their reliability and quality standards. 

Maintenance mode by product teams requires funding for one permanent product team in each product domain. This drives some positive behaviours in organisational design. It encourages teams working in the same product domain to be sited in the same geographic region, which encourages a stronger culture based on a shared sense of identity. It also makes it easier to reawaken a digital service, as the learning curve is much smaller when sufficient user demand is found to justify further development. 

Consider DIYers.com, if maintenance mode was performed by product teams. The organisation-wide target measures for maintenance mode would be expanded, from monthly deployments to monthly deployments with a deployment lead time of under a day.

In the stock domain, the listings team is disbanded when funding ends. Its live service is moved into the stock team, and runs in the background indefinitely while development efforts continue on the stock service. The same happens in the search domain, with the recommend service moving into the search team. 

In the journeys domain, the electricals and tools teams both run out of funding. Their live digital services are transferred into the furniture team, which is renamed the journeys team and made accountable for all live digital services there. 

Of course, there is another option for maintenance mode by product teams. If a live digital service is no longer competitive in the marketplace and funding has expired, it can be deleted. That is the true definition of done.

End-To-End Testing considered harmful

End-To-End Testing is used by many organisations, but relying on extensive end-to-end tests is fundamentally incompatible with Continuous Delivery. Why is End-To-End Testing so commonplace, and yet so ineffective? How is Continuous Testing a lower cost, higher value testing strategy?

NOTE: The latter half of this article was superseded by the talk “End-To-End Testing Considered Harmful” in September 2016

Introduction

“Good testing involves balancing the need to mitigate risk against the risk of trying to gather too much information” Jerry Weinberg

Continuous Delivery is a set of holistic principles and practices to reduce time to market, and it is predicated upon rapid and reliable test feedback. Continuous Delivery mandates that any change to code, configuration, data, or infrastructure must pass a series of automated and exploratory tests in a Deployment Pipeline to evaluate production readiness, so test execution times must be low and test results must be deterministic if an organisation is to achieve shorter lead times.

For example, consider a Company Accounts service in which year end payments are submitted to a downstream Payments service.

End-To-End Testing Considered Harmful - Company Accounts

The behaviour of the Company Accounts service could be checked at build time by the following types of automated test:

  • Unit tests check intent against implementation by verifying a discrete unit of code
  • Acceptance tests check implementation against requirements by verifying a functional slice of the system
  • End-to-end tests check implementation against requirements by verifying a functional slice of the system, including unowned dependent services

While unit tests and acceptance tests vary in terms of purpose and scope, acceptance tests and end-to-end tests vary solely in scope. Acceptance tests exclude unowned dependent services, so an acceptance test of a Company Accounts user journey would use a System Under Test comprised of the latest Company Accounts code and a Payments Stub.

End-To-End Testing Considered Harmful - A Company Accounts Acceptance Test

End-to-end tests include unowned dependent services, so an end-to-end test of a Company Accounts user journey would use a System Under Test comprised of the latest Company Accounts code and a running version of Payments.

End-To-End Testing Considered Harmful - A Company Accounts End-To-End Test

If a testing strategy is to be compatible with Continuous Delivery it must have an appropriate ratio of unit tests, acceptance tests, and end-to-end tests that balances the need for information discovery against the need for fast, deterministic feedback. If testing does not yield new information then defects will go undetected, but if testing takes too long delivery will be slow and opportunity costs will be incurred.

The folly of End-To-End Testing

“Any advantage you gain by talking to the real system is overwhelmed by the need to stamp out non-determinism” Martin Fowler

End-To-End Testing is a testing practice in which a large number of automated end-to-end tests and manual regression tests are used at build time with a small number of automated unit and acceptance tests. The End-To-End Testing test ratio can be visualised as a Test Ice Cream Cone.

End-To-End Testing Considered Harmful - The Test Ice Cream Cone

End-To-End Testing often seems attractive due to the perceived benefits of an end-to-end test:

  1. An end-to-end test maximises its System Under Test, suggesting a high degree of test coverage
  2. An end-to-end test uses the system itself as a test client, suggesting a low investment in test infrastructure

Given the above, it is perhaps understandable why so many organisations adopt End-To-End Testing – as observed by Don Reinertsen, “this combination of low investment and high validity creates the illusion that system tests are more economical”. However, the End-To-End Testing value proposition is fatally flawed as both assumptions are incorrect:

  1. The idea that testing a whole system will simultaneously test its constituent parts is a Decomposition Fallacy. Checking implementation against requirements is not the same as checking intent against implementation, which means an end-to-end test will check the interactions between code pathways but not the behaviours within those pathways
  2. The idea that testing a whole system will be cheaper than testing its constituent parts is a Cheap Investment Fallacy. Test execution time and non-determinism are directly proportional to System Under Test scope, which means an end-to-end test will be slow and prone to non-determinism

Martin Fowler has warned before that “non-deterministic tests can completely destroy the value of an automated regression suite”, and Stephen Covey’s Circles of Control, Influence, and Concern highlights how the multiple actors in an end-to-end test make non-determinism difficult to identify and resolve. If different teams in the same Companies R Us organisation owned the Company Accounts and Payments services, the Company Accounts team would control its own service in an end-to-end test, but would only be able to influence the second-party Payments service.

End-To-End Testing Considered Harmful - A Company Accounts End-To-End Test Single Organisation

The lead time to improve an end-to-end test depends on where the change is located in the System Under Test, so the Company Accounts team could analyse and implement a change in the Company Accounts service in a relatively short lead time. However, the lead time for a change to the Payments service would be constrained by the extent to which the Company Accounts team could persuade the Payments team to take action.

Alternatively, if a separate Payments R Us organisation owned the Payments service it would be a third-party service and merely a concern of the Company Accounts team.

End-To-End Testing Considered Harmful - A Company Accounts End-To-End Test Multiple Organisations

In this situation a change to the Payments service would take much longer as the Company Accounts team would have zero control or influence over Payments R Us. Furthermore, the Payments service could be arbitrarily updated with little or no warning, which would increase non-determinism in Company Accounts end-to-end tests and make it impossible to establish a predictable test baseline.

A reliance upon End-To-End Testing is often a symptom of long-term underinvestment producing a fragile system that is resistant to change, has long lead times, and is optimised for Mean Time Between Failures instead of Mean Time To Repair. Customer experience and operational performance cannot be accurately predicted in a fragile system due to variations caused by external circumstances, and focussing on failure probability instead of failure cost creates an exposure to extremely low probability, extremely high cost events known as Black Swans, such as Knight Capital losing $440 million in 45 minutes. For example, if the Payments data centre suffered a catastrophic outage then all customer payments made by the Company Accounts service would fail.

End-To-End Testing Considered Harmful - Company Accounts Payments Failure

An unavailable Payments service would leave customers of the Company Accounts service with their money locked up in in-flight payments, and a slow restoration of service would encourage dissatisfied customers to take their business elsewhere. If any in-flight payments were lost and it became public knowledge it could trigger an enormous loss of customer confidence.

End-To-End Testing is an uncomprehensive, high cost testing strategy. An end-to-end test will not check behaviours, will take time to execute, and will intermittently fail, so a test suite largely composed of end-to-end tests will result in poor test coverage, slow execution times, and non-deterministic results. Defects will go undetected, feedback will be slow and unreliable, maintenance costs will escalate, and as a result testers will be forced to rely on their own manual end-to-end regression tests. End-To-End Testing cannot produce short lead times, and it is utterly incompatible with Continuous Delivery.

The value of Continuous Testing

“Cease dependence on inspection to achieve quality. Eliminate the need for inspection on a mass basis by building quality into the product in the first place” Dr W Edwards Deming

Continuous Delivery advocates Continuous Testing – a testing strategy in which a large number of automated unit and acceptance tests are complemented by a small number of automated end-to-end tests and focussed exploratory testing. The Continuous Testing test ratio can be visualised as a Test Pyramid, which might be considered the antithesis of the Test Ice Cream Cone.

End-To-End Testing Considered Harmful - The Test Pyramid

Continuous Testing is aligned with Test-Driven Development and Acceptance Test Driven Development, and by advocating cross-functional testing as part of a shared commitment to quality it embodies the Continuous Delivery principle of Build Quality In. However, Continuous Testing can seem daunting due to the perceived drawbacks of unit tests and acceptance tests:

  1. A unit test or acceptance test minimises its System Under Test, suggesting a low degree of test coverage
  2. A unit test or acceptance test uses its own test client, suggesting a high investment in test infrastructure

While the End-To-End Testing value proposition is invalidated by incorrect assumptions of high test coverage and low maintenance costs, the inverse is true of Continuous Testing – its value proposition is strengthened because the assumed drawbacks of low test coverage and high maintenance costs are also incorrect:

  1. A unit test will check intent against implementation and an acceptance test will check implementation against requirements, which means both the behaviour of a code pathway and its interactions with other pathways can be checked
  2. A unit test will restrict its System Under Test scope to a single pathway and an acceptance test will restrict itself to a single service, which means both can have the shortest possible execution time and deterministic results

A non-deterministic acceptance test can be resolved in a much shorter period of time than an end-to-end test as the System Under Test has a single owner. If Companies R Us owned the Company Accounts service and Payments R Us owned the Payments service a Company Accounts acceptance test would only use services controlled by the Company Accounts team.

End-To-End Testing Considered Harmful - Acceptance Test Multiple Organisations

If the Company Accounts team attempted to identify and resolve non-determinism in an acceptance test they would be able to make the necessary changes in a short period of time. There would also be no danger of unexpected changes to the Payments service impeding an acceptance test of the latest Company Accounts code, which would allow a predictable test baseline to be established.

End-to-end tests are a part of Continuous Testing, not least because the idea that testing the constituent parts of a system will simultaneously test the whole system is a Composition Fallacy. A small number of automated end-to-end tests should be used to validate core user journeys, but not at build time when unowned dependent services are unreliable and unrepresentative. The end-to-end tests should be used for release time smoke testing and runtime production monitoring, with synthetic transactions used to simulate user activity. This approach will increase confidence in production releases and should be combined with real-time monitoring of business and operational metrics to accelerate feedback loops and understand user behaviours.
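
As an illustration, a release time smoke test might replay one core user journey as a synthetic transaction against a freshly deployed environment. The sketch below is a minimal Python example under assumed details: the endpoint URL, request payload, and response fields are hypothetical rather than taken from the article.

# Minimal sketch of a release time smoke test that submits a synthetic
# transaction. The URL, payload, and response fields are illustrative assumptions.
import json
import urllib.request

BASE_URL = "https://company-accounts.example.com"  # hypothetical environment URL

def submit_synthetic_payment():
    payload = json.dumps({"company_id": "SMOKE-TEST-001", "amount": 1}).encode("utf-8")
    request = urllib.request.Request(
        BASE_URL + "/payments",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        body = json.loads(response.read())
        # A smoke test only asserts the journey completed, not detailed behaviour.
        assert "confirmation_code" in body, "payment journey returned no confirmation"
        return body["confirmation_code"]

if __name__ == "__main__":
    print("smoke test passed:", submit_synthetic_payment())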

In Continuous Delivery there is a recognition that optimising for Mean Time To Repair is more valuable than optimising for Mean Time Between Failures as it enables an organisation to minimise the impact of production defects, and it is more easily achievable. Defect cost can be controlled as Little’s Law guarantees smaller production releases will shorten lead times to defect resolution, and Continuous Testing provides the necessary infrastructure to shrink feedback loops for smaller releases. The combination of Continuous Testing and Continuous Delivery practices such as Blue Green Releases and Canary Releases empowers an organisation to create a robust system capable of neutralising unanticipated events, and advanced practices such as Dark Launching and Chaos Engineering can lead to antifragile systems that seek to benefit from Black Swans. For example, if Chaos Engineering surfaced concerns about the Payments service the Company Accounts team might Dark Launch its Payments Stub into production and use it in the unlikely event of a Payments data centre outage.

End-To-End Testing Considered Harmful - Company Accounts Payments Stub Failure

While the Payments data centre was offline the Company Accounts service would gracefully degrade to collecting customer payments in the Payments Stub until the Payments service was operational again. Customers would be unaffected by the production incident, and if competitors to the Company Accounts service were also dependent on the same third-party Payments service that would constitute a strategic advantage in the marketplace. Redundant operational capabilities might seem wasteful, but Continuous Testing promotes operational excellence and as Nassim Nicholas Taleb has remarked “something unusual happens – usually”.
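
A rough sketch of that graceful degradation, assuming a hypothetical payments client interface: the real Payments connector is used by default, and the dark launched Payments Stub only collects payments when the provider is unavailable.

# Minimal sketch of degrading to a dark launched Payments Stub when the real
# Payments service is unavailable. The class and method names are illustrative
# assumptions, not the article's actual design.
class PaymentsUnavailableError(Exception):
    pass

class RealPaymentsConnector:
    def pay(self, company_id, amount):
        # In production this would call the third-party Payments service.
        raise PaymentsUnavailableError("Payments data centre is offline")

class PaymentsStub:
    """Collects payments locally so they can be replayed once Payments recovers."""
    def __init__(self):
        self.queued = []
    def pay(self, company_id, amount):
        self.queued.append((company_id, amount))
        return {"code": f"STUB-{len(self.queued):06d}"}

class CompanyAccountsService:
    def __init__(self, payments, fallback):
        self.payments = payments
        self.fallback = fallback
    def submit_year_end_payment(self, company_id, amount):
        try:
            return self.payments.pay(company_id, amount)
        except PaymentsUnavailableError:
            # Graceful degradation: customers are unaffected while Payments is down.
            return self.fallback.pay(company_id, amount)

service = CompanyAccountsService(RealPaymentsConnector(), PaymentsStub())
print(service.submit_year_end_payment("ACME-42", 1000))  # falls back to the stub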

Continuous Testing can be a comprehensive and low cost testing strategy. According to Dave Farley and Jez Humble “building quality in means writing automated tests at multiple levels”, and a test suite largely comprised of unit and acceptance tests will contain meticulously tested scenarios with a high degree of test coverage, low execution times, and predictable test results. This means end-to-end tests can be reserved for smoke testing and production monitoring, and testers can be freed up from manual regression testing for higher value activities such as exploratory testing. This will result in fewer production defects, fast and reliable feedback, shorter lead times to market, and opportunities for revenue growth.

From end-to-end testing to continuous testing

“Push tests as low as they can go for the highest return in investment and quickest feedback” Janet Gregory and Lisa Crispin

Moving from End-To-End Testing to Continuous Testing is a long-term investment, and should be based on the notion that an end-to-end test can be pushed down the Test Pyramid by decoupling its concerns as follows:

  • Connectivity – can services connect to one another
  • Conversation – can services talk with one another
  • Conduct – can services behave with one another

Assume the Company Accounts service depends on a Pay endpoint on the Payments service, which accepts a company id and payment amount before returning a confirmation code and days until payment. The Company Accounts service sends the id and amount request fields and silently depends on the code response field.

End-To-End Testing Considered Harmful - Company Accounts Pay
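
For the sketches that follow, the Pay request and response can be pictured as below; the field names and types are assumptions based on the description above.

# Illustrative shape of the Pay request and response described above. The field
# names and types are assumptions used by the later sketches.
from dataclasses import dataclass

@dataclass
class PayRequest:
    company_id: str   # sent by Company Accounts
    amount: int       # sent by Company Accounts

@dataclass
class PayResponse:
    code: str         # confirmation code, silently depended upon by Company Accounts
    days: int         # days until payment, unused by Company Accounts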

The connection between the services could be unit tested using Test Doubles, which would allow the Company Accounts service to test its reaction to different Payments behaviours. Company Accounts unit tests would replace the Payments connector with a Mock or Stub connector to ensure scenarios such as an unexpected Pay timeout were appropriately handled.
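
A minimal sketch of such a connectivity unit test in Python, using a Mock connector from the standard library to simulate an unexpected Pay timeout; the class and method names are assumptions.

# Minimal sketch of a connectivity unit test: the Payments connector is replaced
# by a Mock so a Pay timeout can be simulated. Names are illustrative assumptions.
import socket
import unittest
from unittest.mock import Mock

class CompanyAccounts:
    def __init__(self, payments_connector):
        self.payments = payments_connector
    def submit_payment(self, company_id, amount):
        try:
            return self.payments.pay(company_id, amount)["code"]
        except socket.timeout:
            return None  # handled: the payment will be retried later

class PayTimeoutTest(unittest.TestCase):
    def test_unexpected_pay_timeout_is_handled(self):
        connector = Mock()
        connector.pay.side_effect = socket.timeout()
        accounts = CompanyAccounts(connector)
        self.assertIsNone(accounts.submit_payment("ACME-42", 1000))
        connector.pay.assert_called_once_with("ACME-42", 1000)

if __name__ == "__main__":
    unittest.main()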

The conversation between the services could be unit tested using Consumer Driven Contracts, which would enable the Company Accounts service to have its interactions continually verified by the Payments service. The Payments service would issue a Provider Contract describing its Pay API at build time, the Company Accounts service would return a Consumer Contract describing its usage, and the Payments service would create a Consumer Driven Contract to be checked during every build.

End-To-End Testing Considered Harmful - Company Accounts Consumer Driven Contract

With the Company Accounts service not using the days response field it would be excluded from the Consumer Contract and Consumer Driven Contract, so a build of the Payments service that removed days or added a new comments response field would be successful. If the code response field was removed the Consumer Driven Contract would fail, and the Payments team would have to collaborate with the Company Accounts team on a different approach.
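
The sketch below illustrates the idea with a hand-rolled contract check rather than a real contract testing tool; the field names follow the Pay description above and the structure is an assumption.

# Hand-rolled sketch of a Consumer Driven Contract check, not a real contract
# testing tool. The consumer contract lists only the fields Company Accounts
# actually uses, so removing 'days' or adding 'comments' would still pass.
CONSUMER_CONTRACT = {
    "endpoint": "POST /pay",
    "request_fields": {"company_id", "amount"},
    "response_fields": {"code"},  # 'days' is unused, so it is excluded
}

# What the Payments provider currently offers (assumed to be generated at build time).
PROVIDER_SCHEMA = {
    "endpoint": "POST /pay",
    "request_fields": {"company_id", "amount"},
    "response_fields": {"code", "days"},
}

def verify_contract(contract, provider_schema):
    """Fail the provider build if a field the consumer relies on disappears."""
    missing_requests = contract["request_fields"] - provider_schema["request_fields"]
    missing_responses = contract["response_fields"] - provider_schema["response_fields"]
    if missing_requests or missing_responses:
        raise AssertionError(
            f"contract broken: missing request fields {missing_requests}, "
            f"missing response fields {missing_responses}"
        )

verify_contract(CONSUMER_CONTRACT, PROVIDER_SCHEMA)  # passes
# Removing 'code' from the provider schema would raise and break the Payments build.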

The conduct of the services could be unit tested using API Examples, which would permit the Company Accounts service to check for behavioural changes in new releases of the Payments service. Each release of the Payments service would be accompanied by a sibling artifact containing example API requests and responses for the Pay endpoint, which would be plugged into Company Accounts unit tests to act as representative test data and warn of behavioural changes.

End-To-End Testing Considered Harmful - Company Accounts API Examples

If a new version of the Payments service changed the format of the code response field from alphanumeric to numeric it would cause the Company Accounts service to fail at build time, indicating a behavioural change within the Payments service and prompting a conversation between the teams.
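
A sketch of how those API Examples might be consumed, assuming the sibling artifact is a list of canned Pay responses; a switch to numeric-only confirmation codes would fail the assumed format check and break the Company Accounts build.

# Sketch of checking conduct with API Examples: canned Pay responses shipped by
# the Payments team are used as test data. The example payloads are assumptions.
import re
import unittest

# Assumed contents of the sibling artifact published with each Payments release.
PAY_API_EXAMPLES = [
    {"request": {"company_id": "ACME-42", "amount": 1000},
     "response": {"code": "A1B2C3", "days": 3}},
    {"request": {"company_id": "ACME-43", "amount": 250},
     "response": {"code": "Z9Y8X7", "days": 5}},
]

# Company Accounts assumes confirmation codes are alphanumeric and contain letters.
def is_valid_confirmation_code(code):
    return bool(re.fullmatch(r"[A-Z0-9]*[A-Z][A-Z0-9]*", code))

class PayExamplesTest(unittest.TestCase):
    def test_example_responses_match_expected_code_format(self):
        for example in PAY_API_EXAMPLES:
            code = example["response"]["code"]
            self.assertTrue(is_valid_confirmation_code(code),
                            f"unexpected confirmation code format: {code!r}")

if __name__ == "__main__":
    unittest.main()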

Conclusion

“Not only won’t system testing catch all the bugs, but it will take longer and cost more – more than you save by skipping effective acceptance testing” – Jerry Weinberg

End-To-End Testing seems attractive to organisations due to its promise of high test coverage and low maintenance costs, but the extensive use of automated end-to-end tests and manual regression tests can only produce a fragile system with slow, unreliable test feedback that inflates lead times and is incompatible with Continuous Delivery. Continuous Testing requires an upfront and ongoing investment in test automation, but a comprehensive suite of automated unit tests and acceptance tests will ensure fast, deterministic test feedback that reduces production defects, shortens lead times, and encourages the Continuous Delivery of robust or antifragile systems.

Further Reading

  1. Continuous Delivery by Dave Farley and Jez Humble
  2. Principles Of Product Development Flow by Don Reinertsen
  3. 7 Habits of Highly Effective People by Stephen Covey
  4. Test Pyramid by Martin Fowler
  5. Test Ice Cream Cone by Alister Scott
  6. Integrated Tests Are A Scam by JB Rainsberger
  7. Agile Testing and More Agile Testing by Janet Gregory and Lisa Crispin
  8. Perfect Software and Other Illusions by Jerry Weinberg
  9. Release Testing Is Risk Management Theatre by Steve Smith
  10. The Art Of Agile Development by James Shore and Shane Warden
  11. Making End-To-End Tests Work by Adrian Sutton
  12. Just Say No To More End-To-End Tests by Mike Wacker
  13. Antifragile by Nassim Nicholas Taleb
  14. On Antifragility In Systems And Organisational Architecture by Jez Humble

Acknowledgements

Thanks to Amy Phillips, Beccy Stafford, Charles Kubicek, and Chris O’Dell for their early feedback on this article.

Release Testing Is Risk Management Theatre

Continuous Delivery often leads to the discovery of suboptimal practices within an organisation, and the Release Testing antipattern is a common example. What is Release Testing, and why is it an example of Risk Management Theatre?

Pre-Agile Testing

“I was a principal test analyst. I worked in a separate testing team to the developers. I spent most of my time talking to them to understand their changes, and had to work long hours to do my testing” – Suzy

The traditional testing strategy of many IT organisations was predicated upon a misguided belief described by Elisabeth Hendrickson as “testers test, programmers code, and the separation of the two disciplines is important”. Segregated development and testing teams worked in sequential phases of the value stream, with each product increment handed over to the testers for a prolonged period of testing prior to sign-off.

Release Testing Is Risk Management Theatre - Pre Agile Testing

This strategy was certainly capable of uncovering defects, but it also had a detrimental impact on lead times and quality. The handover period between development and testing inserted delays into the value stream, creating large feedback loops that increased rework. Furthermore, the segregation of development and testing implicitly assigned authority for changes to developers and responsibility for quality to testers. This disassociated developers from defect consequences and testers from business requirements, invariably resulting in higher defect counts and lower quality over time.

Agile Testing

“I was a product tester. I worked in an agile team with developers and a business analyst. I contributed to acceptance tests and did exploratory testing. I don’t miss the old ways” – Dwayne

The publication of the Agile Manifesto in 2001 led to a range of lightweight development processes that introduced a radically different testing approach. Agile methods advocate cross-functional teams of co-located developers and testers, in which testing is considered a continuous activity and there is a shared commitment to product quality.

Release Testing Is Risk Management Theatre - Agile Testing

In an agile team developers and testers collaborate on practices such as Test Driven Development and Acceptance Test Driven Development in accordance with the Test Pyramid strategy, which recommends a large number of automated unit and acceptance tests in proportion to a small number of automated end-to-end and manual tests.

Release Testing Is Risk Management Theatre - Test Pyramid

The Test Pyramid favours automated unit and acceptance tests as they offer a greater value at a lower cost. Test execution time and non-determinism are directly proportional to System Under Test size, and as automated unit and acceptance tests have minimal scope they provide fast, deterministic feedback. Automated end-to-end tests and exploratory testing are also valuable, but the larger System Under Test means feedback is slower and less reliable.

This testing strategy is a vast improvement upon its predecessor. Uniting developers and testers in a product team eliminates handover delays and recombines authority with responsibility, resulting in a continual emphasis upon product quality and lower lead times.

Release Testing Is Risk Management Theatre - Agile Testing Test Pyramid

Release Testing

“I was an operational acceptance tester. I worked in a separate testing team to the developers and functional testers. I never had time to find defects or understand requirements, and always got the blame” – Jamie

The transition from siloed development and testing teams to cross-functional product teams is a textbook example of how organisational change enables Continuous Delivery – faster feedback and improved quality will unlock substantial cycle time gains and decrease opportunity costs. However, all too often Continuous Delivery is impeded by Release Testing – an additional phase of automated and/or manual end-to-end regression testing, performed on the critical path independent of the product team.

Release Testing Is Risk Management Theatre - Release Testing

Release Testing is often justified as a guarantee of product quality, but in reality it is a disproportionately costly practice with little potential for defect discovery. The segregation of release testers from the product team reinserts handover delays into the value stream and dilutes responsibility for quality, increasing feedback loops and rework. Furthermore, as release testers must rely upon end-to-end tests their testing invariably becomes a Test Ice Cream Cone of slow, brittle tests with long execution times and high maintenance costs.

Release Testing Is Risk Management Theatre - Test Ice Cream Cone

The reliance of Release Testing upon end-to-end testing on the critical path means a low degree of test coverage is inevitable. Release testers will always be working to a pre-arranged business deadline outside their control, and consequently test coverage will often be curtailed to such an extent the blameless testers will find it difficult to uncover any significant defects.

Release Testing Is Risk Management Theatre - Release Testing Test Ice Cream Cone

When viewed through a Continuous Delivery frame the high cost and low value of Release Testing become evident, and attempting to redress that imbalance is a zero-sum game. Decreasing the cost of Release Testing means fewer end-to-end tests, which will decrease execution time but also decrease test coverage. Increasing the value of Release Testing means more end-to-end tests, which will increase test coverage but also increase execution time. Release Testing can therefore be considered an example of what Jez Humble describes as Risk Management Theatre – an overly-costly practice with an artificial sense of value.

Release Testing is high cost, low value Risk Management Theatre

Build Quality In

Continuous Delivery is founded upon the Lean Manufacturing principle of Build Quality In, and the advice of Dr. W. Edwards Deming that “we cannot rely on mass inspection to improve quality” is especially pertinent to Release Testing. An organisation should build quality into its product rather than expect testers to inspect quality in at a later date, and that means eliminating Release Testing by moving release testers back into the product team.

Release Testing Is Risk Management Theatre - No Release Testing

Folding release testers into product development removes the handover delays and responsibility barriers imposed by Release Testing. End-to-end regression tests can be audited by all stakeholders, with valuable tests retained and the remainder discarded. More importantly, ex-release testers will be freed up to work on higher-value activities off the critical path, such as exploratory testing and business analysis.

Batch Size Reduction

Given the limited value of Release Testing it is prudent to consider other risk reduction strategies, and a viable alternative supported by Continuous Delivery is Batch Size Reduction – releasing smaller changesets more frequently into production. Splitting a large experiment into smaller independent experiments reduces variation in outcomes, so by decomposing large changesets into smaller unrelated changesets we can reduce the probability of failure associated with any one changeset.

For example, assume an organisation has a median cycle time of 12 weeks – perhaps due to Release Testing – and a pending release of 4 features. The probability of failure for this release has been estimated as 1 in 2 (50%), and there is a desire to reduce that level of risk.

Release Testing Is Risk Management Theatre - Probability One Release

As the 50% estimate is aggregated from 4 features it can be improved by reducing delivery costs – perhaps by eliminating Release Testing – and releasing features independently every 3 weeks. While this theoretically produces 4 homogeneous releases with a 1 in 8 (12.5%) failure probability, the heterogeneity of product development creates variable feature complexity – and smaller changesets enable more accurate estimation of comparative failure probabilities. In this example the 4 changesets allow a more detailed risk assessment that assigns features 2 and 3 a higher failure probability, which means more exploratory testing time can be allocated to those specific features to reduce overall failure probability.

Release Testing Is Risk Management Theatre - Probability Multiple Heterogenous Releases

When a production defect does occur, batch size reduction has the ability to significantly reduce defect cost. The cost of a defect is comprised of the sunk cost incurred between activation and discovery, and the opportunity cost incurred between discovery and resolution. Those costs are a function of cost per unit time and duration, where cost per unit time represents economic impact and duration represents time.

For example, assume the organisation unwisely retained its 12 week lead time and a production defect D1 has been found 3 weeks after release. An assessment of external market conditions calculates a static cost per unit time of £10,000 a week, which means a sunk cost of £30,000 has already been incurred and a £120,000 opportunity cost is looming.

Release Testing Is Risk Management Theatre - Opportunity Cost Long Lead Time

As cost per unit time is governed by external market conditions it is difficult to influence, but duration is controlled by Little’s Law which states that lead time is directly proportional to work in progress. This means the opportunity cost duration of a defect can be decreased by releasing the defect fix in a smaller changeset, which will result in a shorter lead time and a reduced defect cost. If a fix for D1 is released in its own changeset in 1 week, that would decrease the opportunity cost by 92% to £10,000 and produce a 73% overall reduction in defect cost to £40,000.

Release Testing Is Risk Management Theatre - Opportunity Cost Short Lead Time
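
The percentages in that example can be reproduced with a few lines of Python; the figures are taken directly from the scenario above.

# Worked calculation for the D1 example above.
cost_per_week = 10_000                      # £ per week
sunk_cost = 3 * cost_per_week               # 3 weeks between activation and discovery
opportunity_cost_slow = 12 * cost_per_week  # fix shipped in the next 12 week release
opportunity_cost_fast = 1 * cost_per_week   # fix shipped in its own 1 week changeset

saving = 1 - opportunity_cost_fast / opportunity_cost_slow
total_slow = sunk_cost + opportunity_cost_slow
total_fast = sunk_cost + opportunity_cost_fast
overall_reduction = 1 - total_fast / total_slow

print(f"opportunity cost saving: {saving:.0%}")                    # ~92%
print(f"overall defect cost reduction: {overall_reduction:.0%}")   # ~73%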

Conclusion

Release Testing is the definitive example of Risk Management Theatre in the IT industry today and a significant barrier to Continuous Delivery. End-to-end regression testing on the critical path cannot provide any meaningful reduction in defect probability without incurring costs that harm product quality and inflate lead times. Continuous Delivery advocates a lower cost, higher value alternative in which the product team owns responsibility for product quality, with an emphasis upon exploratory testing and batch size reduction to decrease risk.

Tester names have been altered

Further Reading

  1. Leading Lean Software Development  by Mary and Tom Poppendieck
  2. Assign Responsibility And Authority by Shelley Doll
  3. Integrated Tests Are A Scam by JB Rainsberger
  4. Continuous Delivery by Dave Farley and Jez Humble
  5. Organisation Antipattern – Release Testing by Steve Smith
  6. The Essential Deming by W. Edwards Deming
  7. Explore It! by Elisabeth Hendrickson
  8. Principles Of Product Development Flow by Don Reinertsen

No Release Testing

This series of articles explains why Release Testing – end-to-end regression testing on the critical path – is a wasteful practice that impedes Continuous Delivery and is unlikely to uncover business critical defects.

  1. Organisation Antipattern: Release Testing – introduces the Release Testing antipattern and why it cannot discover defects
  2. Organisation Antipattern: Consumer Release Testing – introduces the consumer-side variant of the Release Testing antipattern
  3. More Releases With Less Risk – describes how releasing smaller changesets more frequently can reduce probability and cost of failure
  4. Release Testing Is Risk Management Theatre – explains why Release Testing is so ineffective, and offers batch size reduction as an alternative

Organisation antipattern: Passive Disaster Recovery

Passive Disaster Recovery is Risk Management Theatre

When an IT organisation is vulnerable to a negative Black Swan – an extremely low probability, extremely high cost event causing ruinous financial loss – a traditional countermeasure to minimise downtime and opportunity costs is Passive Disaster Recovery. This is where a secondary production environment is established in a separate geographic location to the primary production environment, with every product increment released into Production and Disaster Recovery retained in a cold standby state.

For example, consider an organisation hosting version v1040 of a customer-facing service in its Production environment. In the event of a catastrophic failure, customers should be immediately routed to the Disaster Recovery environment and receive the same quality of service.

Organisation Antipattern - Disaster Recovery Environment - Vision

Regardless of physical/virtual hosting and manual/automated infrastructure provisioning, Passive Disaster Recovery is predicated upon the fundamentally flawed assumption that active and passive environments will be identical at any given point in time. Over time the unused Disaster Recovery environment will suffer from hardware, infrastructure, configuration, and software drift until it consists of Snowflake Servers that will likely require significant manual intervention if and when Disaster Recovery is activated. With negative Black Swan opportunity costs incurred at a rapid pace the entire future of the organisation might be placed in jeopardy.

Organisation Antipattern - Disaster Recovery Environment - Failover Drift

Passive Disaster Recovery remains common due to an industry-wide underestimation of negative Black Swan events. It is easier for an individual or an organisation to appreciate the extremely low probability of a disastrous business event than the extremely high opportunity cost, and as a result a Disaster Recovery environment tends to be procured when a business project begins and left to decay into Risk Management Theatre when the capex funding ends.

Continuous Delivery advocates a radically different approach to Disaster Recovery as it is explicitly focussed upon reducing the time, risk, and opportunity cost of delivering high quality services. One of its principles is Bring The Pain Forward – increasing the cadence of high cost, low frequency events to drive down transaction costs – and applying it to Disaster Recovery means moving from passive to active standby via Blue Green Releases and rotating production responsibility between two near-identical environments.

Organisation Antipattern - Disaster Recovery Environment - Blue Green Releases

In the above diagram, the Blue production environment is currently hosting v1040 and the Green environment is being upgraded with v1041. Once v1041 passes its automated smoke tests and manual exploratory tests it is signed off and customers are seamlessly rerouted from Blue to Green. A short period of time afterwards Blue is upgraded in the background and awaits the next production release.

Organisation Antipattern - Disaster Recovery Environment - Green Blue Releases

As well as enabling zero downtime releases and a cheap rollback mechanism, Blue Green Releases provides an effective Disaster Recovery strategy as the standby production environment is always active and in a known good state. If the Green environment suffers a complete outage customers can be switched to the Blue environment with complete confidence, and vice versa.

Organisation Antipattern - Disaster Recovery Environment - Blue Green Failover
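
A minimal sketch of that rotation, assuming a hypothetical router and health check; in practice the switch would live in a load balancer, DNS, or service mesh configuration rather than application code.

# Minimal sketch of rotating live traffic between Blue and Green environments.
# The router and health check are illustrative assumptions.
class Router:
    def __init__(self, live="blue"):
        self.live = live
    def standby(self):
        return "green" if self.live == "blue" else "blue"
    def switch(self):
        self.live = self.standby()

def healthy(environment):
    # Placeholder for automated smoke tests against the environment.
    return True

def release(router, new_version):
    target = router.standby()
    print(f"deploying {new_version} to {target}")
    if healthy(target):
        router.switch()           # customers are seamlessly rerouted
        print(f"{target} is now live")

def failover(router):
    # Disaster Recovery is the same rehearsed operation as a release switch.
    if healthy(router.standby()):
        router.switch()
        print(f"failed over to {router.live}")

router = Router(live="blue")
release(router, "v1041")  # green goes live
failover(router)          # blue takes over if green suffers an outage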

By practising Blue Green Releases an organisation is effectively rehearsing its Disaster Recovery strategy on every production release, and this can lead to advanced practices such as Chaos Engineering, Fault Injection, and Game Days. It requires a continuous investment in hardware and infrastructure, but it will reduce exposure to negative Black Swans and may even offer a strategic advantage over competitors.

Organisation antipattern: Dual Value Streams

Dual Value Streams conceal transaction and opportunity costs

The goal of Continuous Delivery is to optimise cycle time in order to increase product revenues, and cycle time is measured as the average lead time of the value stream from code checkin to production release. This was memorably summarised by Mary and Tom Poppendieck as the Poppendieck Question:

“How long would it take your organization to deploy a change that involves just one single line of code? Do you do this on a repeatable, reliable basis?”

The Poppendieck Question is an excellent lead-in to the Continuous Delivery value proposition, but the problem with using it to assess the cycle time of an organisation yet to adopt Continuous Delivery is there will often be two very different answers – one for features, and one for fixes. For example, consider an organisation with a quarterly release cycle. The initial answer to the Poppendieck Question would be “90 days” or similar.

Dual Value Streams - Feature Value Stream

However, when the transaction cost of releasing software is disproportionately high a truncated value stream will often emerge for production defect fixes, in which value stream activities are deliberately omitted to slash cycle time. This results in Dual Value Streams – a Feature Value Stream with a cycle time of months, and a Fix Value Stream with a cycle time of days. If our example organisation can release a defect fix in a few days, the correct answer to the Poppendieck Question becomes “90 days or 3 days”.

Dual Value Streams - Fix Value Stream

Fix Value Streams exist because production defect fixes have a clear financial value that is easily communicated and outweighs the high transaction cost of Feature Value Streams. An organisation will be imbued with a sense of urgency, as a sunk cost has demonstrably been incurred and by releasing a fix faster an opportunity cost can be reduced. People in siloed teams will collaborate upon a fix, and by using a minimal changeset it becomes possible to reason about which value stream activities can be discarded e.g. omitting capacity testing for a UI fix.

Dual Value Streams is an organisational antipattern because it is a local optimisation with little overall benefit to the organisation. There has been an investment in a release mechanism with a smaller batch size and a lower transaction cost, but as it is reserved for defect fixes it cannot add new customer value to the product. The long-term alternative is for organisations to adopt Continuous Delivery and invest in a single value stream with a minimal overall transaction cost. If our example organisation folded its siloed teams into cross-functional teams and moved activities off the critical path a fortnightly release cycle would become a distinct possibility.

Dual Value Streams - Value Stream

Dual Value Streams is an indicator of organisational potential for Continuous Delivery. When people are aware of the opportunity costs associated with releasing software as well as the transaction costs they are more inclined to work together in a cross-functional manner. When changesets contain a small number of changes it becomes easier to collectively reason about which value stream activities are useful and which should be moved off the critical path or retired.

Furthermore, a Fix Value Stream implicitly validates the use of smaller batch sizes as a risk reduction strategy. Defect fixes are released in small changes to minimise both opportunity costs and the probability of any further errors. Given that strategy works for fixes, why not release features more frequently and measure an organisation against a value-centric Poppendieck Question?

“How long would it take your organization to release a single value-adding line of code? Do you do this on a repeatable, reliable basis?”

More releases with less risk

Continuous Delivery reduces defect probability and cost

Continuous Delivery often challenges conventional wisdom within the IT industry, and by advocating the rapid release of value-add to reduce risk it contradicts the traditional belief that a low release cadence is an effective risk reduction strategy. How can releasing software more frequently reduce both defect probability and defect cost?

The probability of a defect is the likelihood of a change within a changeset unexpectedly impeding value-add and imposing an opportunity cost. Given the defect probability of a changeset is proportional to its size we can calculate the defect probability of a change as follows:

Fix More With Less - Defect Probability

n = number of changesets
probability = (1 / 2^n) * 100 [percentage]

The above formula indicates that decreasing changeset size by increasing the number of changesets will reduce defect probability, and this is confirmed by Don Reinertsen’s assertion that “many smaller experiments produce less variation than one big one”. For example, if a change is released in 1 changeset there is a 1 in 2 chance or 50% probability of failure. If it was instead released in 3 changesets there would be a 1 in 8 chance or 12.5% probability of failure.
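
The formula can also be expressed directly as code, reproducing the figures above.

# The defect probability formula above, as executable Python.
def defect_probability(changesets):
    """Probability of failure when a change is split across n changesets."""
    return (1 / 2 ** changesets) * 100

print(defect_probability(1))  # 50.0
print(defect_probability(3))  # 12.5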

The cost of a defect is the product of cost per unit time and duration, where cost per unit time represents economic impact and duration represents lifetime.

cost = cost per unit time [currency] * duration [unit time]

A defect has an inception date at its outset, a discovery date when diagnosed, and a resolution date when fixed. The interactions between these dates and cost per unit time enable a division of defect cost into sunk cost and opportunity cost. The sunk cost of a defect represents the economic damage already incurred at the point of discovery, while opportunity cost represents the economic damage still to be incurred.

Fix More With Less - Defect Cost

sunk cost duration = discovery date – inception date [unit time]
sunk cost = cost per unit time * sunk cost duration [currency]

opportunity cost duration = resolution date – discovery date [unit time]
opportunity cost = cost per unit time * opportunity cost duration [currency]

cost = sunk cost + opportunity cost [currency]
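
Expressed as a small Python helper, with dates as day numbers; the D1 figures used later in the article are reproduced as a check.

# The defect cost definitions above, expressed as a small helper.
def defect_cost(cost_per_unit_time, inception, discovery, resolution):
    """Dates are expressed as day numbers; cost per unit time is per day."""
    sunk_cost = cost_per_unit_time * (discovery - inception)
    opportunity_cost = cost_per_unit_time * (resolution - discovery)
    return sunk_cost, opportunity_cost, sunk_cost + opportunity_cost

# D1 from the later example: £20,000 per day, found after 6 days, fixed 12 days later.
print(defect_cost(20_000, inception=0, discovery=6, resolution=18))
# (120000, 240000, 360000)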

As cost per unit time is controlled by market conditions it is far easier to reduce opportunity cost duration by shortening lead times. This can be accomplished via batch size reduction, as Mary and Tom Poppendieck have observed that “time through the system is directly proportional to the amount of work-in-process” due to Little’s Law:

lead time = work in progress [units] / completion rate [units per time period]

Little’s Law is universal for all stable systems in which these variables are consistent long-term averages, and it is mathematical proof that reducing batch size will reduce lead time. For example, if a jug contains 4 litres of water and pours 2 litres per second then it will empty in 2 seconds. If instead the jug contained 2 litres of water and still poured 2 litres per second it would empty in 1 second.
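
Little's Law and the jug example can be written as a one-line function:

# Little's Law and the jug example above, as code.
def lead_time(work_in_progress, completion_rate):
    return work_in_progress / completion_rate

print(lead_time(4, 2))  # 4 litres poured at 2 litres per second empties in 2.0 seconds
print(lead_time(2, 2))  # halving the batch size halves the lead time: 1.0 second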

Releasing smaller changesets more frequently into production can also reduce sunk cost duration, as small batches accelerate feedback. A smaller batch size will decrease the lead time and complexity associated with each changeset, creating faster feedback loops that will reduce the time required to discover a defect.

Consider an organisation with an average changeset size of 24 changes and an average lead time of 12 days. How can we reduce the defect probability of the next production release R1?

Fix More With Less - Defect Probability Smaller Changeset

n = 1
probability = (1 / 2^1) * 100 = 50%

Based on the binomial probabilities involved we recommend to the organisation that it reduce defect probability by applying batch size reduction to R1 and splitting its changeset into 2 smaller releases R1 and R2. This would decrease defect probability from 50% to 25%.

Fix More With Less - Defect Probability Larger Changeset

n = 2
probability = (1 / 2^2) * 100 = 25%

Unfortunately the organisation ignores our advice to release smaller changesets, and the release of R1 at a later date introduces a defect D1 that remains undiscovered for 6 days. D1 impedes a sufficient amount of value-add that a cost per unit time of £20,000 per day is estimated, which means a sunk cost of £120,000 has already been incurred and an opportunity cost of £240,000 is forecast. The organisation immediately triages D1 for a fix, but how can we reduce its opportunity cost?

Fix More With Less - Defect Cost Large

cost per unit time = £20,000
sunk cost = 6 days * £20,000 = £120,000
opportunity cost = 12 days * £20,000 = £240,000
overall cost = sunk cost + opportunity cost = £360,000

Given the organisation currently has an average batch size of 24 changes per changeset and a 12 day average lead time, Little’s Law computes an average completion rate of 2 changes per day and informs us that a reduced batch size of 12 changes per changeset would produce a 6 day lead time.

completion rate = work in process / lead time
completion rate = 24 changes per changeset / 12 days = 2 changes per day

lead time = work in process / completion rate
lead time = 12 changes per changeset / 2 changes per day = 6 days

Based on Little’s Law we again recommend to the organisation a halved batch size of 12 changes per changeset, and this time our advice is accepted. A fix for D1 is included in the next changeset released into production in 6 days, which produces an opportunity cost saving of £120,000.

Fix More With Less - Defect Cost Smaller Opportunity Cost

cost per unit time = £20,000
sunk cost = 6 days * £20,000 = £120,000
opportunity cost = 6 days * £20,000 = £120,000
overall cost = sunk cost + opportunity cost = £240,000

As well as decreasing the total cost of D1 by 33%, the new lead time of 6 days increases the rate of feedback for future production defects. When a subsequent release introduces defect D2 at a lower cost per unit time of £10,000 per day the reduced size and complexity of the offending changeset means D2 is discovered in only 3 days.

Fix More With Less - Defect Cost Smaller Sunk Cost

cost per unit time = £10,000
sunk cost = 3 days * £10,000 = £30,000
opportunity cost = 6 days * £10,000 = £60,000
overall cost = sunk cost + opportunity cost = £90,000

When we triage D2 we discover its cost per unit time has decreased to £1,000 per day, meaning its sunk cost is a poor indicator of opportunity cost and its Cost of Delay is lower than expected. Based upon the new 6 day lead time, we recommend the organisation defer a D2 fix for at least one release, a 12 day wait, in order to implement pending value-add of greater value than the resulting £12,000 opportunity cost of D2.

Fix More With Less - Defect Cost Even Smaller Opportunity Cost

cost per unit time = £10,000 per day for the 3 days undiscovered, £1,000 per day for the 12 days until the deferred fix
sunk cost = 3 days * £10,000 = £30,000
opportunity cost = 12 days * £1,000 = £12,000
overall cost = sunk cost + opportunity cost = £42,000
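
The deferral decision can be sketched as a simple comparison. The £20,000 figure for pending value-add is a hypothetical illustration, as the example above only states that the pending value-add must exceed the £12,000 opportunity cost of D2.

# Defer a defect fix by one release only if the pending value-add is worth more
# than the opportunity cost of leaving the defect unfixed for that period.
def should_defer_fix(defect_cost_per_day, deferral_days, pending_value_add):
    deferral_cost = defect_cost_per_day * deferral_days
    return pending_value_add > deferral_cost

# pending_value_add of £20,000 is a hypothetical figure for illustration
print(should_defer_fix(1_000, 12, 20_000))  # True - defer the D2 fix for one release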

The assumption within many IT organisations that risk is directly proportional to rate of change is flawed, as it assumes a constant large batch size. Risk is actually proportional to size of change, and a low release cadence of large changesets is a less effective risk reduction strategy than a high release cadence of small changesets. Continuous Delivery enables smaller changesets to be released more frequently, rapidly delivering value-add while reducing both the probability and cost of defects.

Organisation antipattern: Consumer Release Testing

Consumer Release Testing is high cost, low value risk management theatre

Despite the historical advice of Harold Dodge that “you cannot inspect quality into a product” and the contemporary advice of Don Reinertsen that “testing is probably the single most common critical-path queue”, the Release Testing antipattern remains prevalent in the IT industry, and is by no means limited to standalone applications.

Consider the development of a consumer application that requires data from a provider application in order to fulfil its business capabilities. The consumer team contains developers and testers collaborating upon the Testing Pyramid strategy, which recommends unit/acceptance tests over end-to-end tests on the basis that test execution time is proportional to System Under Test scope. This means the necessary provider interactions are test-driven by the consumer team using the Test Stub pattern, which creates a lightweight provider implementation to supply canned responses back to the consumer.

Consumer Release Testing - Product Team Stubbed Provider

By using a stub the consumer interactions with the provider can be tested in a minimal System Under Test, which ensures that changes made by the consumer team produce fast and deterministic feedback. Success and failure scenarios (e.g. socket failure, socket timeout, provider error code) can be rapidly developed without relying upon a running provider instance, and the consumer team should be capable of rapidly responding to changing requirements in the future.
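
As a hedged sketch of the Test Stub pattern, assuming a hypothetical provider client interface, the stub below supplies canned responses and simulated failures so consumer interactions can be tested without a running provider instance.

# A lightweight provider implementation supplying canned responses to the consumer.
# The fetch_account method is a hypothetical provider interaction for illustration.
class StubProviderClient:
    def __init__(self, canned_response, error=None):
        self.canned_response = canned_response
        self.error = error

    def fetch_account(self, account_id):
        if self.error is not None:
            raise self.error  # simulate socket failure, socket timeout, or provider error code
        return self.canned_response

# Success scenario: the consumer receives a canned provider response.
stub = StubProviderClient({"account_id": "123", "status": "active"})
assert stub.fetch_account("123")["status"] == "active"

# Failure scenario: the consumer can test its timeout handling deterministically.
failing_stub = StubProviderClient({}, error=TimeoutError("provider timed out"))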

However, in many IT organisations the consumer team will be hindered by Consumer Release Testing – a phase of post-development end-to-end regression testing of the full consumer and provider stack, performed by a segregated testing team on the critical path.

Consumer Release Testing - Consumer Release Testing

The desire for provider risk mitigation is understandable given that consumer revenues are to an extent dependent upon the provider, but Consumer Release Testing exacerbates the original flaws of Release Testing:

  1. Extensive end-to-end testing – including both consumer and provider in System Under Test scope increases test execution time and maintenance costs
  2. Independent testing phase – dividing authority and responsibility for the consumer results in quality issues and feedback delays
  3. Critical path constraints – working on the critical path means the release testers will always be pressured to reduce test coverage to meet pre-agreed deadlines

Since it extends the Release Testing strategy, Consumer Release Testing is itself risk management theatre – it is highly unlikely to uncover any substantial defects in consumer/provider interactions without a significant increase in test coverage, which would drive up product lead times and opportunity costs.

A far more effective risk reduction strategy is to accept the conventional wisdom that testing is an activity not a phase, and move the blameless release testers into the consumer product team. This ensures that all team members are equally invested in product quality and empowers testers to focus upon higher-value activities such as exploratory testing, which has been described by Elisabeth Hendrickson as “particularly good at revealing vulnerabilities that no one thought to look for before”. For example, some exploratory testing off the critical path of the consumer against a running provider instance might uncover some additional error scenarios that would then be fed into the automated unit/acceptance tests.

Consumer Release Testing - Product Team Real Provider

A high value, low cost alternative to Consumer Release Testing is for the consumer and provider to actively cooperate in risk reduction, which can substantially reduce provider risk. The probability of a provider failure can be decreased by independently testing the conflated concerns of end-to-end testing as follows:

  • Connectivity: the consumer can test provider expectations of consumer connections via release time smoke tests and run time monitoring
  • Compatibility: the provider can test consumer expectations of messaging via build time Consumer Driven Contracts issued by the consumer (see the sketch after this list)
  • Conduct: the consumer can test its expectations of provider behaviour via build time API Examples issued by the provider
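
A hedged sketch of a Consumer Driven Contract check, assuming a hypothetical contract format: the consumer publishes the response fields it depends upon, and the provider verifies its own responses against that contract at build time.

# Contract issued by the consumer, listing the fields and types it relies upon.
# The field names and contract shape are hypothetical illustrations.
CONSUMER_CONTRACT = {"account_id": str, "status": str}

def satisfies_contract(provider_response, contract):
    # The provider build fails if any field the consumer depends upon is missing or mistyped.
    return all(
        field in provider_response and isinstance(provider_response[field], expected_type)
        for field, expected_type in contract.items()
    )

# Run in the provider build against a real response generated by the provider codebase.
assert satisfies_contract({"account_id": "123", "status": "active", "extra": 1}, CONSUMER_CONTRACT)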

The cost of a provider failure can be reduced via incremental release strategies such as consumer-side Feature Toggles and provider-side Blue-Green Deployments. These practices encourage a provider release to be gradually phased into production usage, so that the consumer can switch back to the previous provider version if necessary.
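
A minimal sketch of a consumer-side Feature Toggle routing traffic between provider versions; the endpoint URLs and toggle name are hypothetical illustrations.

# Routing the consumer to the new provider version only while the toggle is on
# allows an immediate switch back to the previous version without a redeploy.
# Both URLs are hypothetical.
PROVIDER_ENDPOINTS = {
    "previous": "https://provider.example.com/v1",
    "current": "https://provider.example.com/v2",
}

def provider_endpoint(toggles):
    key = "current" if toggles.get("use_new_provider") else "previous"
    return PROVIDER_ENDPOINTS[key]

print(provider_endpoint({"use_new_provider": True}))   # https://provider.example.com/v2
print(provider_endpoint({"use_new_provider": False}))  # roll back to the previous provider version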

This approach is a viable alternative to Consumer Release Testing, but it is of limited value without provider cooperation. If the provider cannot or will not participate in risk reduction then the consumer must assess risk based upon historical provider lead times. As large batch sizes increase risk, an infrequent provider release schedule is indicative of heightened risk, and if the cost of failure is significant then a limited form of Consumer Release Testing may be deemed justifiable. In those circumstances the consumer development team should perform end-to-end tests off the critical path using a lightweight test client, so that the slow feedback loops and non-determinism of Consumer Release Testing are diminished.

Organisation antipattern: Release Testing

Release Testing is high cost, low value risk management theatre

Described by Elisabeth Hendrickson as originating with the misguided belief that “testers test, programmers code, and the separation of the two disciplines is important”, the traditional segregation of development and testing into separate phases has disastrous consequences for product quality and validates Jez Humble’s adage that “bad behavior arises when you abstract people away from the consequences of their actions”. When a development team has authority for changes and a testing team has responsibility for quality, there will be an inevitable increase in defects and feedback loops that will inflate lead times and increase organisational vulnerability to opportunity costs.

Release Testing - Develop and Test

Agile software development aims to solve this problem by establishing cross-functional product teams, in which testing is explicitly recognised as a continuous activity and there is a shared commitment to product quality. Developers and testers collaborate upon a testing strategy described by Lisa Crispin as the Testing Pyramid, in which Test Driven Development drives the codebase design and Acceptance Test Driven Development documents the product design. The Testing Pyramid values unit and acceptance tests over manual and end-to-end tests due to the execution times and well-publicised limitations of the latter, such as Martin Fowler stating that “end-to-end tests are more prone to non-determinism”.

Release Testing - Product Team

Given Continuous Delivery is predicated upon the optimisation of product integrity, lead times, and organisational structure in order to deliver business value faster, the creation of cross-functional product teams is a textbook example of how to optimise an organisation for Continuous Delivery. However, many organisations are prevented from fully realising the benefits of product teams due to Release Testing – a risk reduction strategy that aims to reduce defect probability via manual and/or automated end-to-end regression testing independent of the product team.

Release Testing - Release Testing

While Release Testing is traditionally seen as a guarantee of product quality, it is in reality a fundamentally flawed strategy of disproportionately costly testing due to the following characteristics:

  1. Extensive end-to-end testing – as end-to-end tests are slow and less deterministic, they have long execution times and incur substantial maintenance costs. This ensures end-to-end testing cannot conceivably cover all scenarios, and results in an implicit reduction of test coverage
  2. Independent testing phase – a regression testing phase brazenly re-segregates development and testing, creating a product team with authority for changes and a release testing team with responsibility for quality. This results in quality issues, longer feedback delays, and substantial wait times
  3. Critical path constraints – post-development testing must occur on the critical path, leaving release testers under constant pressure to complete their testing to a deadline. This will usually result in an explicit reduction of test coverage in order to meet expectations

As Release Testing is divorced from the development of value-add by the product team, the regression tests tend to either duplicate existing test scenarios or invent new test scenarios shorn of any business context. Furthermore, the implicit and explicit constraints of end-to-end testing on the critical path invariably prevent Release Testing from achieving any meaningful amount of test coverage or significant reduction in defect probability.

This means Release Testing has a considerable transaction cost and limited value, and attempts to reduce the costs or increase the value of Release Testing are a zero-sum game. Reducing transaction costs requires fewer end-to-end tests, which will decrease execution time but also decrease the potential for defect discovery. Increasing value requires more end-to-end tests, which will marginally increase the potential for defect discovery but will also increase execution time. We can therefore conclude that Release Testing is an example of what Jez Humble refers to as Risk Management Theatre – a process providing an artificial sense of value at a disproportionate cost:

Release Testing is high cost, low value Risk Management Theatre

To undo the detrimental impact of Release Testing upon product quality and lead times, we must heed the advice of W. Edwards Deming that “we cannot rely on mass inspection to improve quality”. Rather than try to inspect quality into each product increment, we must instead build quality in by replacing Release Testing with feedback-driven product development activities in which release testers become valuable members of the product team. By moving release testers into the product team everyone is able to collaborate in tight feedback loops, and the existing end-to-end tests can be assessed for removal, replacement, or retention. This will reduce both the wait waste and overprocessing waste in the value stream, empowering the team to focus upon valuable post-development activities such as automated smoke testing of environment configuration and the manual exploratory testing of product features.
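
A hedged sketch of the automated smoke testing mentioned above, assuming a hypothetical /healthcheck endpoint exposed by the deployed environment.

# A post-deployment smoke test verifying the environment configuration responds correctly.
import urllib.request

def smoke_test(base_url):
    try:
        with urllib.request.urlopen(base_url + "/healthcheck", timeout=5) as response:
            return response.status == 200
    except OSError:
        return False

# product.example.com is a hypothetical URL, so this call would return False here.
print(smoke_test("https://product.example.com"))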

Release Testing - Final Product Team

A far more effective risk reduction strategy than Release Testing is batch size reduction, which can attain a notable reduction in defect probability with a minimal transaction cost. Championed by Eric Ries asserting that “small batches reduce risk”, releasing smaller changesets into production more frequently decreases the complexity of each changeset, therefore reducing both the probability and cost of defect occurrence. In addition, batch size reduction reduces overheads and improves product increment flow, which will produce a further improvement in lead times.

Release Testing is not the fault of any developer or any tester. It is a systemic fault that causes blameless teams of individuals to be bedevilled by a sub-optimal organisational structure that actively harms lead times and product quality in the name of risk management theatre. Ultimately, we need to embrace the inherent lessons of Agile software development and Continuous Delivery – product quality is the responsibility of everyone, and testing is an activity not a phase.

No Projects

Projects kill flow and teams. Focus on products, not projects

Since the Dawn of Computer Time, enormous sums of money and embarrassing amounts of time have been squandered upon software projects that have delivered little or no return on investment, with projects floundering between segregated Business and IT divisions squabbling over overestimated value-add and underestimated delivery dates. Given Grant Rule’s assertion that “studies too numerous to mention show that software projects are challenged or fail”, why are software projects so prone to failure and why do they persist?

To answer these questions, we must understand what constitutes a software project and why its delivery model is incongruent with product development. If we start with the PRINCE 2 project definition of “a temporary organization that is needed to produce a unique and predefined outcome or result at a pre-specified time using predetermined resources”, we can offer a concise definition as follows:

A project is a fixed amount of time and money assigned to deliver value-add

The key characteristic of a software project appears to be its fixed end date, which as a delivery model has been repeatedly debunked by IT practitioners such as Allan Kelly denouncing “endless, pointless discussions about when it will be done… successful software doesn’t have a pre-specified end date” and Marc Lankhorst arguing that “over 80% of IT spending in large organisations is on maintenance”. However, the fixed end date of a software project is invariably a consequence of its requirement for a collection of value-adding features to be simultaneously delivered, suggesting an augmented definition of:

A project is a fixed amount of time and money assigned to deliver a large batch of value-add

Once we view software projects as large batches of value-add, we can apply The Principles Of Product Development Flow by Don Reinertsen and better understand why so many projects fail:

  1. Increased cycle time – a project might not be deliverable on a particular date unless either demand is throttled or capacity is increased, e.g. artificially reduce user demand or increase staffing levels
  2. Increased variability – a project might be delayed due to unpredictable blockages in the value stream, e.g. testing of features B and C blocked while testing of feature A takes longer than expected
  3. Increased feedback delays – a project might incur significant costs due to slow feedback on bad design decisions and/or defects increasing rework, e.g. failures in feature C not detected until features A and B have passed testing
  4. Increased risk – a project might have an increased probability and cost of failure due to increased requirements/technology change, increased variation, and increased feedback delays
  5. Increased overheads – a project might endure development inefficiencies due to increased requirements/technology change, e.g. feature C development time increased by need to understand complexity of features A and B
  6. Increased inefficiencies – a project might encounter increased transaction costs due to increased requirements/technology change, e.g. feature A slow to release as features B and C also required for release
  7. Increased irresponsibility – a project might suffer from diluted responsibilities, e.g. staff member has responsibility for delivery of feature A but is unincentivised to participate in delivery of features B or C

Don also provides a compelling explanation as to why the project delivery model remains prevalent, by explaining how large batches can become institutionalised as they “appear to have scale economies that increase efficiency [and] appear to reduce variability”. Software projects might indeed appear to be efficient due to perceived value stream inefficiencies and the counter-intuitiveness of batch size reduction, but from a product development standpoint it is an inefficient, ineffective delivery model that impedes value, quality, and flow.

There is a compelling alternative to the project delivery model – product development flow, in which we apply economic theory to Lean product development practices in order to flow product designs through our organisation. Product development flow emphasises the benefits of batch size reduction and encourages a one piece continuous flow delivery model, in order to reduce costs and improve return on investment.

Discarding the project delivery model in favour of product development flow requires an entirely different mindset, as epitomised by Grant urging us to “accommodate the ideas of flow production and lean systems thinking” and Allan affirming that “BAU isn’t a dirty word… enhancing products is Business As Usual, we should be proud of that”. On that basis the No Projects movement was conceived by Joshua Arnold to promote the valuation of products over projects, and anointed as:

Projects kill flow and teams. Focus on products, not projects
