On Tech


No Release Testing

This series of articles explains why Release Testing – end-to-end regression testing on the critical path – is a wasteful practice that impedes Continuous Delivery and is unlikely to uncover business critical defects.

  1. Organisation Antipattern: Release Testing – introduces the Release Testing antipattern and why it cannot discover defects
  2. Organisation Antipattern: Consumer Release Testing – introduces the consumer-side variant of the Release Testing antipattern
  3. More Releases With Less Risk – describes how releasing smaller changesets more frequently can reduce probability and cost of failure
  4. Release Testing Is Risk Management Theatre – explains why Release Testing is so ineffective, and offers batch size reduction as an alternative

More releases with less risk

Continuous Delivery reduces defect probability and cost

Continuous Delivery often challenges conventional wisdom within the IT industry, and by advocating the rapid release of value-add to reduce risk it contradicts the traditional belief that a low release cadence is an effective risk reduction strategy. How can releasing software more frequently reduce both defect probability and defect cost?

The probability of a defect is the likelihood of a change within a changeset unexpectedly impeding value-add and imposing an opportunity cost. Given the defect probability of a changeset is proportional to its size we can calculate the defect probability of a change as follows:

Fix More With Less - Defect Probability

n = number of changesets
probability = (1 / 2^n) * 100 [percentage]

The above formula indicates that decreasing changeset size by increasing the number of changesets will reduce defect probability, and this is consistent with Don Reinertsen’s assertion that “many smaller experiments produce less variation than one big one”. For example, if a change is released in 1 changeset there is a 1 in 2 chance or 50% probability of failure. If it were instead released in 3 changesets there would be a 1 in 8 chance or 12.5% probability of failure.
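As a minimal sketch, the formula can be expressed in Python. This reads the denominator as 2 raised to the power n, which is the reading that reproduces the 50% and 12.5% figures quoted above. The function name `defect_probability` is hypothetical and chosen only for illustration.

```python
def defect_probability(changesets: int) -> float:
    """Percentage defect probability for a change released across `changesets` releases.

    Assumes the article's model: probability = (1 / 2^n) * 100.
    """
    if changesets < 1:
        raise ValueError("changesets must be at least 1")
    return (1 / 2 ** changesets) * 100


for n in (1, 2, 3):
    print(f"{n} changeset(s): {defect_probability(n)}%")
```

Under this model each additional changeset halves the defect probability.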

The cost of a defect is the product of cost per unit time and duration, where cost per unit time represents economic impact and duration represents lifetime.

cost = cost per unit time [currency] * duration [unit time]

A defect has an inception date at its outset, a discovery date when diagnosed, and a resolution date when fixed. The interactions between these dates and cost per unit time enable a division of defect cost into sunk cost and opportunity cost. The sunk cost of a defect represents the economic damage already incurred at the point of discovery, while opportunity cost represents the economic damage still to be incurred.

Fix More With Less - Defect Cost

sunk cost duration = discovery date – inception date [unit time]
sunk cost = cost per unit time * sunk cost duration [currency]

opportunity cost duration = resolution date – discovery date [unit time]
opportunity cost = cost per unit time * opportunity cost duration [currency]

cost = sunk cost + opportunity cost [currency]
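The same split can be sketched in Python, assuming a unit time of one day and a constant cost per unit time over the defect's lifetime. The `Defect` class, its field names, and the dates and amounts in the usage lines are hypothetical illustrations rather than figures from this article.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Defect:
    inception: date      # when the defect was introduced
    discovery: date      # when the defect was diagnosed
    resolution: date     # when the defect was fixed
    cost_per_day: float  # cost per unit time, with one day as the unit

    @property
    def sunk_cost(self) -> float:
        # Damage already incurred at the point of discovery.
        return self.cost_per_day * (self.discovery - self.inception).days

    @property
    def opportunity_cost(self) -> float:
        # Damage still to be incurred between discovery and resolution.
        return self.cost_per_day * (self.resolution - self.discovery).days

    @property
    def cost(self) -> float:
        return self.sunk_cost + self.opportunity_cost


# Hypothetical defect: undiscovered for 4 days, fixed 10 days after discovery.
example = Defect(date(2024, 1, 1), date(2024, 1, 5), date(2024, 1, 15), 1_000)
print(example.sunk_cost, example.opportunity_cost, example.cost)
```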

As cost per unit time is controlled by market conditions it is far easier to reduce opportunity cost duration by shortening lead times. This can be accomplished via batch size reduction, as Mary and Tom Poppendieck have observed that “time through the system is directly proportional to the amount of work-in-process” due to Little’s Law:

lead time = work in process [units] / completion rate [units per time period]

Little’s Law is universal for all stable systems in which these variables are consistent long-term averages, and it is mathematical proof that reducing batch size will reduce lead time. For example, if a jug contains 4 litres of water and pours 2 litres per second then it will empty in 2 seconds. If instead the jug contained 2 litres of water and still poured 2 litres per second it would empty in 1 second.
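A minimal Python sketch of Little's Law, using the jug figures above. The function name `lead_time` is hypothetical, and the inputs are assumed to be long-term averages of a stable system.

```python
def lead_time(work_in_process: float, completion_rate: float) -> float:
    """Little's Law: lead time = work in process / completion rate."""
    if completion_rate <= 0:
        raise ValueError("completion rate must be positive")
    return work_in_process / completion_rate


# Jug example: litres of water divided by litres poured per second.
print(lead_time(4, 2))  # full jug
print(lead_time(2, 2))  # half-full jug
```

Halving the work in process at a constant completion rate halves the lead time.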

Releasing smaller changesets more frequently into production can also reduce sunk cost duration, as small batches accelerate feedback. A smaller batch size will decrease the lead time and complexity associated with each changeset, creating faster feedback loops that will reduce the time required to discover a defect.

Consider an organisation with an average changeset size of 24 changes and an average lead time of 12 days. How can we reduce the defect probability of the next production release R1?

Fix More With Less - Defect Probability Smaller Changeset

n = 1
probability = (1 / 2^1) * 100 = 50%

Based on the binomial probabilities involved we recommend to the organisation that it reduce defect probability by applying batch size reduction to R1 and splitting its changeset into 2 smaller releases R1 and R2. This would decrease defect probability from 50% to 25%.

Fix More With Less - Defect Probability Larger Changeset

n = 2
probability = (1 / 2^2) * 100 = 25%

Unfortunately the organisation ignores our advice to release smaller changesets, and the release of R1 at a later date introduces a defect D1 that remains undiscovered for 6 days. D1 impedes a sufficient amount of value-add that a cost per unit time of £20,000 per day is estimated, which means a sunk cost of £120,000 has already been incurred and an opportunity cost of £240,000 is forecast. The organisation immediately triages D1 for a fix, but how can we reduce its opportunity cost?

Fix More With Less - Defect Cost Large

cost per unit time = £20,000
sunk cost = 6 days * £20,000 = £120,000
opportunity cost = 12 days * £20,000 = £240,000
overall cost = sunk cost + opportunity cost = £360,000

Given the organisation currently has an average batch size of 24 changes per changeset and a 12 day average lead time, Little’s Law computes an average completion rate of 2 changes per day and informs us that a reduced batch size of 12 changes per changeset would produce a 6 day lead time.

completion rate = work in process / lead time
completion rate = 24 changes per changeset / 12 days = 2 changes per day

lead time = work in process / completion rate
lead time = 12 changes per changeset / 2 changes per day = 6 days

Based on Little’s Law we again recommend to the organisation a halved batch size of 12 changes per changeset, and this time our advice is accepted. A fix for D1 is included in the next changeset released into production in 6 days, which produces an opportunity cost saving of £120,000.

Fix More With Less - Defect Cost Smaller Opportunity Cost

cost per unit time = £20,000
sunk cost = 6 days * £20,000 = £120,000
opportunity cost = 6 days * £20,000 = £120,000
overall cost = sunk cost + opportunity cost = £240,000
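The D1 comparison can be reproduced with a short Python sketch that combines Little's Law with the defect cost formula. It assumes, as the example does, that the fix ships in the next changeset and so the opportunity cost duration equals the lead time. The function and constant names are hypothetical.

```python
def lead_time_days(batch_size: float, completion_rate: float) -> float:
    """Little's Law: lead time = work in process / completion rate."""
    return batch_size / completion_rate


def defect_cost(cost_per_day: float, sunk_days: float, opportunity_days: float) -> float:
    """Overall cost = sunk cost + opportunity cost."""
    return cost_per_day * sunk_days + cost_per_day * opportunity_days


COST_PER_DAY = 20_000      # pounds per day, from the D1 example
SUNK_DAYS = 6              # days D1 remained undiscovered
COMPLETION_RATE = 24 / 12  # 2 changes per day

cost_by_batch = {}
for batch_size in (24, 12):
    # The fix waits one lead time before reaching production.
    fix_lead_time = lead_time_days(batch_size, COMPLETION_RATE)
    cost_by_batch[batch_size] = defect_cost(COST_PER_DAY, SUNK_DAYS, fix_lead_time)

saving = cost_by_batch[24] - cost_by_batch[12]
print(cost_by_batch, saving)
```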

As well as decreasing the total cost of D1 by 33%, the new lead time of 6 days increases the rate of feedback for future production defects. When a subsequent release introduces defect D2 at a lower cost per unit time of £10,000 per day the reduced size and complexity of the offending changeset means D2 is discovered in only 3 days.

Fix More With Less - Defect Cost Smaller Sunk Cost

cost per unit time = £10,000
sunk cost = 3 days * £10,000 = £30,000
opportunity cost = 6 days * £10,000 = £60,000
overall cost = sunk cost + opportunity cost = £90,000

When we triage D2 we discover its cost per unit time has decreased to £1,000 per day, meaning its sunk cost is a poor indicator of opportunity cost and its Cost of Delay is lower than expected. Based upon the new 6 day lead time we recommend to the organisation that it defer a D2 fix for at least one release in order to implement pending value-add of greater value than the £12,000 opportunity cost of D2.

Fix More With Less - Defect Cost Even Smaller Opportunity Cost

cost per unit time = £10,000 for the first 3 days, then £1,000 for the following 12 days
sunk cost = 3 days * £10,000 = £30,000
opportunity cost = 12 days * £1,000 = £12,000
overall cost = sunk cost + opportunity cost = £42,000
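When cost per unit time changes during a defect's lifetime, the cost is the sum over each period rather than a single product. A minimal Python sketch using the D2 figures follows; the `defect_cost` helper and its list of (days, cost per day) pairs are a hypothetical representation.

```python
def defect_cost(periods):
    """Sum the cost over (days, cost_per_day) periods of a defect's lifetime."""
    return sum(days * cost_per_day for days, cost_per_day in periods)


sunk_cost = defect_cost([(3, 10_000)])         # 3 days undiscovered at £10,000 per day
opportunity_cost = defect_cost([(12, 1_000)])  # fix deferred by one release: 12 days at £1,000 per day
overall_cost = sunk_cost + opportunity_cost
print(sunk_cost, opportunity_cost, overall_cost)
```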

The assumption within many IT organisations that risk is directly proportional to rate of change is flawed, as it assumes a constant large batch size. Risk is actually proportional to size of change, and a low release cadence of large changesets is not as effective a risk reduction strategy as a high release cadence of small changesets. Continuous Delivery enables the release of smaller changesets to rapidly release value-add as well as reducing both the probability and cost of defects.

Organisation antipattern: Consumer Release Testing

Consumer Release Testing is high cost, low value risk management theatre

Despite the historical advice of Harold Dodge that “you cannot inspect quality into a product” and the contemporary advice of Don Reinertsen that “testing is probably the single most common critical-path queue” the Release Testing antipattern remains prevalent in the IT industry, and is by no means limited to standalone applications.

Consider the development of a consumer application that requires data from a provider application in order to fulfill its business capabilities. The consumer team contains developers and testers collaborating upon the Testing Pyramid strategy, which recommends unit/acceptance tests over end-to-end tests on the basis that test execution time is proportional to System Under Test scope. This means the necessary provider interactions are test-driven by the consumer team using the Test Stub pattern, which creates a lightweight provider implementation to supply canned responses back to the consumer.

Consumer Release Testing - Product Team Stubbed Provider

By using a stub the consumer interactions with the provider can be tested in a minimal System Under Test, which ensures that changes made by the consumer team produce fast and deterministic feedback. Success and failure scenarios (e.g. socket failure, socket timeout, provider error code) can be rapidly developed without relying upon a running provider instance, and the consumer team should be capable of rapidly responding to changing requirements in the future.
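The Test Stub pattern described above might look like the following Python sketch. The `Consumer`, `StubProvider`, and `ProviderError` names, the `fetch_price` method, and the fallback messages are all hypothetical, since the article does not describe a concrete consumer or provider API.

```python
class ProviderError(Exception):
    """Raised when the provider replies with an error code."""


class StubProvider:
    """Test Stub: supplies a canned response, or raises a canned failure."""

    def __init__(self, response=None, failure=None):
        self._response = response
        self._failure = failure

    def fetch_price(self, product_id):
        if self._failure is not None:
            raise self._failure
        return self._response


class Consumer:
    def __init__(self, provider):
        self._provider = provider

    def display_price(self, product_id):
        try:
            price = self._provider.fetch_price(product_id)
        except TimeoutError:
            return "price temporarily unavailable"
        except (ConnectionError, ProviderError):
            return "price unavailable"
        return f"£{price:.2f}"


# Success and failure scenarios run without a provider instance.
happy = Consumer(StubProvider(response=9.5)).display_price("p1")
timed_out = Consumer(StubProvider(failure=TimeoutError())).display_price("p1")
errored = Consumer(StubProvider(failure=ProviderError("500"))).display_price("p1")
```

Because the stub is injected through the constructor, each scenario is deterministic and runs in memory.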

However, in many IT organisations the consumer team will be hindered by Consumer Release Testing – a phase of post-development end-to-end regression testing of the full consumer and provider stack, performed by a segregated testing team on the critical path.

Consumer Release Testing - Consumer Release Testing

The desire for provider risk mitigation is understandable given that consumer revenues are to an extent dependent upon the provider, but Consumer Release Testing exacerbates the original flaws of Release Testing:

  1. Extensive end-to-end testing – including both consumer and provider in System Under Test scope increases test execution time and maintenance costs
  2. Independent testing phase – dividing authority and responsibility for the consumer results in quality issues and feedback delays
  3. Critical path constraints – working on the critical path means the release testers will always be pressured to reduce test coverage to meet pre-agreed deadlines

By extending the Release Testing strategy it is evident that Consumer Release Testing is itself risk management theatre – it is highly unlikely to uncover any substantial defects in consumer/provider interactions without a significant increase in test coverage, which will drive up product lead times and opportunity costs.

A far more effective risk reduction strategy is to accept the conventional wisdom that testing is an activity not a phase, and move the blameless release testers into the consumer product team. This ensures that all team members are equally invested in product quality and empowers testers to focus upon higher-value activities such as exploratory testing, which has been described by Elisabeth Hendrickson as “particularly good at revealing vulnerabilities that no one thought to look for before”. For example, some exploratory testing off the critical path of the consumer against a running provider instance might uncover some additional error scenarios that would then be fed into the automated unit/acceptance tests.

Consumer Release Testing - Product Team Real Provider

A high value, low cost alternative to Consumer Release Testing is for the consumer and provider to actively cooperate in risk reduction, which can result in a substantial reduction in provider risk. The probability of a provider failure can be decreased by independently testing the conflated concerns of end-to-end testing as follows:

  • Connectivity: the consumer can test provider expectations of consumer connections via release time smoke tests and run time monitoring
  • Compatibility: the provider can test consumer expectations of messaging via build time Consumer Driven Contracts issued by the consumer
  • Conduct: the consumer can test its expectations of provider behaviour via build time API Examples issued by the provider
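The compatibility check can be illustrated with a deliberately simplified Python sketch of a Consumer Driven Contract. It is not the API of any real contract testing tool: the contract format (field name mapped to expected type), the `contract_violations` helper, and the sample responses are all assumptions made for illustration.

```python
# Contract issued by the consumer: the fields it reads and the types it expects.
CONSUMER_CONTRACT = {"product_id": str, "price": float, "currency": str}


def contract_violations(contract, response):
    """Return the consumer expectations that a provider response fails to meet.

    Extra fields in the response are ignored, so the provider can still
    evolve its messages without breaking this consumer.
    """
    violations = []
    for field, expected_type in contract.items():
        if field not in response:
            violations.append(f"missing field: {field}")
        elif not isinstance(response[field], expected_type):
            violations.append(f"wrong type for {field}: expected {expected_type.__name__}")
    return violations


# Run in the provider build against a response the provider would produce.
compatible = {"product_id": "p1", "price": 9.5, "currency": "GBP", "stock": 3}
incompatible = {"product_id": "p1", "price": "9.50"}

print(contract_violations(CONSUMER_CONTRACT, compatible))
print(contract_violations(CONSUMER_CONTRACT, incompatible))
```

Running such a check at provider build time surfaces an incompatible message change before release, rather than in an end-to-end test.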

The cost of a provider failure can be reduced via incremental release strategies such as consumer-side Feature Toggles and provider-side Blue-Green Deployments. These practices encourage a provider release to be gradually phased into production usage, so that the consumer can switch back to the previous provider version if necessary.
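A consumer-side Feature Toggle could be sketched in Python as below. The toggle name `use_new_provider`, the in-memory toggle dictionary, and the two provider client classes are hypothetical; a real system would more likely read toggles from configuration that can be changed at run time.

```python
class ProviderClientV1:
    def quote(self):
        return "quote from provider v1"


class ProviderClientV2:
    def quote(self):
        return "quote from provider v2"


class Consumer:
    def __init__(self, toggles, previous_provider, new_provider):
        self._toggles = toggles
        self._previous = previous_provider
        self._new = new_provider

    def quote(self):
        # The toggle is read on every call, so switching back needs no redeploy.
        if self._toggles.get("use_new_provider", False):
            return self._new.quote()
        return self._previous.quote()


toggles = {"use_new_provider": True}
consumer = Consumer(toggles, ProviderClientV1(), ProviderClientV2())
with_new = consumer.quote()

toggles["use_new_provider"] = False  # switch back after a provider failure
with_previous = consumer.quote()
```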

This approach is a viable alternative to Consumer Release Testing, but it is of limited value without provider cooperation. If the provider cannot or will not participate in risk reduction then the consumer must assess risk based upon historical provider lead times. As large batch sizes increase risk an infrequent provider release schedule is indicative of heightened risk, and if the cost of failure is significant then a limited form of Consumer Release Testing may be deemed justifiable. In those circumstances the consumer development team should perform end-to-end tests off the critical path using a lightweight test client, so that the slow feedback loops and non-determinism of Consumer Release Testing are diminished.

Organisation antipattern: Release Testing

Release Testing is high cost, low value risk management theatre

Described by Elisabeth Hendrickson as originating with the misguided belief that “testers test, programmers code, and the separation of the two disciplines is important”, the traditional segregation of development and testing into separate phases has disastrous consequences for product quality and validates Jez Humble’s adage that “bad behavior arises when you abstract people away from the consequences of their actions”. When a development team has authority for changes and a testing team has responsibility for quality, there will be an inevitable increase in defects and feedback loops that will inflate lead times and increase organisational vulnerability to opportunity costs.

Release Testing - Develop and Test

Agile software development aims to solve this problem by establishing cross-functional product teams, in which testing is explicitly recognised as a continuous activity and there is a shared commitment to product quality. Developers and testers collaborate upon a testing strategy described by Lisa Crispin as the Testing Pyramid, in which Test Driven Development drives the codebase design and Acceptance Test Driven Development documents the product design. The Testing Pyramid values unit and acceptance tests over manual and end-to-end tests due to the execution times and well-publicised limitations of the latter, such as Martin Fowler stating that “end-to-end tests are more prone to non-determinism”.

Release Testing - Product Team

Given Continuous Delivery is predicated upon the optimisation of product integrity, lead times, and organisational structure in order to deliver business value faster, the creation of cross-functional product teams is a textbook example of how to optimise an organisation for Continuous Delivery. However, many organisations are prevented from fully realising the benefits of product teams due to Release Testing – a risk reduction strategy that aims to reduce defect probability via manual and/or automated end-to-end regression testing independent of the product team.

Release Testing - Release Testing

While Release Testing is traditionally seen as a guarantee of product quality, it is in reality a fundamentally flawed strategy of disproportionately costly testing due to the following characteristics:

  1. Extensive end-to-end testing – as end-to-end tests are slow and less deterministic they require long execution times and incur substantial maintenance costs. This ensures end-to-end testing cannot conceivably cover all scenarios and results in an implicit reduction of test coverage
  2. Independent testing phase – a regression testing phase brazenly re-segregates development and testing, creating a product team with authority for changes and a release testing team with responsibility for quality. This results in quality issues, longer feedback delays, and substantial wait times
  3. Critical path constraints – post-development testing must occur on the critical path, leaving release testers under constant pressure to complete their testing to a deadline. This will usually result in an explicit reduction of test coverage in order to meet expectations

As Release Testing is divorced from the development of value-add by the product team, the regression tests tend to either duplicate existing test scenarios or invent new test scenarios shorn of any business context. Furthermore, the implicit and explicit constraints of end-to-end testing on the critical path invariably prevent Release Testing from achieving any meaningful amount of test coverage or significant reduction in defect probability.

This means Release Testing has a considerable transaction cost and limited value, and attempts to reduce the costs or increase the value of Release Testing are a zero-sum game. Reducing transaction costs requires fewer end-to-end tests, which will decrease execution time but also decrease the potential for defect discovery. Increasing value requires more end-to-end tests, which will marginally increase the potential for defect discovery but will also increase execution time. We can therefore conclude that Release Testing is an example of what Jez Humble refers to as Risk Management Theatre – a process providing an artificial sense of value at a disproportionate cost:

Release Testing is high cost, low value Risk Management Theatre

To undo the detrimental impact of Release Testing upon product quality and lead times, we must heed the advice of W. Edwards Deming that “we cannot rely on mass inspection to improve quality”. Rather than try to inspect quality into each product increment, we must instead build quality in by replacing Release Testing with feedback-driven product development activities in which release testers become valuable members of the product team. By moving release testers into the product team everyone is able to collaborate in tight feedback loops, and the existing end-to-end tests can be assessed for removal, replacement, or retention. This will reduce both the wait waste and overprocessing waste in the value stream, empowering the team to focus upon valuable post-development activities such as automated smoke testing of environment configuration and the manual exploratory testing of product features.

Release Testing - Final Product Team

A far more effective risk reduction strategy than Release Testing is batch size reduction, which can attain a notable reduction in defect probability with a minimal transaction cost. Championed by Eric Ries asserting that “small batches reduce risk”, releasing smaller change sets into production more frequently decreases the complexity of each change set, therefore reducing both the probability and cost of defect occurrence. In addition, batch size reduction also improves overheads and product increment flow, which will produce a further improvement in lead times.

Release Testing is not the fault of any developer, or any tester. It is a systemic fault that causes blameless teams of individuals to be bedevilled by a sub-optimal organisational structure, that actively harms lead times and product quality in the name of risk management theatre. Ultimately, we need to embrace the inherent lessons of Agile software development and Continuous Delivery – product quality is the responsibility of everyone, and testing is an activity not a phase.

© 2024 Steve Smith
