End-To-End Testing considered harmful

End-To-End Testing is used by many organisations, but relying on extensive end-to-end tests is fundamentally incompatible with Continuous Delivery. Why is End-To-End Testing so commonplace, and yet so ineffective? How is Continuous Testing a lower cost, higher value testing strategy?

NOTE: The latter half of this article was superseded by the talk “End-To-End Testing Considered Harmful” in September 2016

Introduction

“Good testing involves balancing the need to mitigate risk against the risk of trying to gather too much information” Jerry Weinberg

Continuous Delivery is a set of holistic principles and practices to reduce time to market, and it is predicated upon rapid and reliable test feedback. Continuous Delivery mandates any change to code, configuration, data, or infrastructure must pass a series of automated and exploratory tests in a Deployment Pipeline to evaluate production readiness, so test execution times must be low and test results must be deterministic if an organisation is to achieve shorter lead times.

For example, consider a Company Accounts service in which year end payments are submitted to a downstream Payments service.

End-To-End Testing Considered Harmful - Company Accounts

The behaviour of the Company Accounts service could be checked at build time by the following types of automated test:

  • Unit tests check intent against implementation by verifying a discrete unit of code
  • Acceptance tests check implementation against requirements by verifying a functional slice of the system
  • End-to-end tests check implementation against requirements by verifying a functional slice of the system, including unowned dependent services

While unit tests and acceptance tests vary in terms of purpose and scope, acceptance tests and end-to-end tests vary solely in scope. Acceptance tests exclude unowned dependent services, so an acceptance test of a Company Accounts user journey would use a System Under Test comprised of the latest Company Accounts code and a Payments Stub.

End-To-End Testing Considered Harmful - A Company Accounts Acceptance Test

End-to-end tests include unowned dependent services, so an end-to-end test of a Company Accounts user journey would use a System Under Test comprised of the latest Company Accounts code and a running version of Payments.

End-To-End Testing Considered Harmful - A Company Accounts End-To-End Test

If a testing strategy is to be compatible with Continuous Delivery it must have an appropriate ratio of unit tests, acceptance tests, and end-to-end tests that balances the need for information discovery against the need for fast, deterministic feedback. If testing does not yield new information then defects will go undetected, but if testing takes too long delivery will be slow and opportunity costs will be incurred.

The folly of End-To-End Testing

“Any advantage you gain by talking to the real system is overwhelmed by the need to stamp out non-determinism” Martin Fowler

End-To-End Testing is a testing practice in which a large number of automated end-to-end tests and manual regression tests are used at build time with a small number of automated unit and acceptance tests. The End-To-End Testing test ratio can be visualised as a Test Ice Cream Cone.

End-To-End Testing Considered Harmful - The Test Ice Cream Cone

End-To-End Testing often seems attractive due to the perceived benefits of an end-to-end test:

  1. An end-to-end test maximises its System Under Test, suggesting a high degree of test coverage
  2. An end-to-end test uses the system itself as a test client, suggesting a low investment in test infrastructure

Given the above it is perhaps understandable why so many organisations adopt End-To-End Testing – as observed by Don Reinertsen, “this combination of low investment and high validity creates the illusion that system tests are more economical”. However, the End-To-End Testing value proposition is fatally flawed as both assumptions are incorrect:

  1. The idea that testing a whole system will simultaneously test its constituent parts is a Decomposition Fallacy. Checking implementation against requirements is not the same as checking intent against implementation, which means an end-to-end test will check the interactions between code pathways but not the behaviours within those pathways
  2. The idea that testing a whole system will be cheaper than testing its constituent parts is a Cheap Investment Fallacy. Test execution time and non-determinism are directly proportional to System Under Test scope, which means an end-to-end test will be slow and prone to non-determinism

Martin Fowler has warned before that “non-deterministic tests can completely destroy the value of an automated regression suite”, and Stephen Covey’s Circles of Control, Influence, and Concern highlights how the multiple actors in an end-to-end test make non-determinism difficult to identify and resolve. If different teams in the same Companies R Us organisation owned the Company Accounts and Payments services the Company Accounts team would control its own service in an end-to-end test, but would only be able to influence the second-party Payments service.

End-To-End Testing Considered Harmful - A Company Accounts End-To-End Test Single Organisation

The lead time to improve an end-to-end test depends on where the change is located in the System Under Test, so the Company Accounts team could analyse and implement a change in the Company Accounts service in a relatively short lead time. However, the lead time for a change to the Payments service would be constrained by the extent to which the Company Accounts team could persuade the Payments team to take action.

Alternatively, if a separate Payments R Us organisation owned the Payments service it would be a third-party service and merely a concern of the Company Accounts team.

End-To-End Testing Considered Harmful - A Company Accounts End-To-End Test Multiple Organisations

In this situation a change to the Payments service would take much longer as the Company Accounts team would have zero control or influence over Payments R Us. Furthermore, the Payments service could be arbitrarily updated with little or no warning, which would increase non-determinism in Company Accounts end-to-end tests and make it impossible to establish a predictable test baseline.

A reliance upon End-To-End Testing is often a symptom of long-term underinvestment producing a fragile system that is resistant to change, has long lead times, and is optimised for Mean Time Between Failures instead of Mean Time To Repair. Customer experience and operational performance cannot be accurately predicted in a fragile system due to variations caused by external circumstances, and focussing on failure probability instead of failure cost creates an exposure to extremely low probability, extremely high cost events known as Black Swans, such as Knight Capital losing $440 million in 45 minutes. For example, if the Payments data centre suffered a catastrophic outage then all customer payments made by the Company Accounts service would fail.

End-To-End Testing Considered Harmful - Company Accounts Payments Failure

An unavailable Payments service would leave customers of the Company Accounts service with their money locked up in in-flight payments, and a slow restoration of service would encourage dissatisfied customers to take their business elsewhere. If any in-flight payments were lost and it became public knowledge it could trigger an enormous loss of customer confidence.

End-To-End Testing is an uncomprehensive, high cost testing strategy. An end-to-end test will not check behaviours, will take time to execute, and will intermittently fail, so a test suite largely composed of end-to-end tests will result in poor test coverage, slow execution times, and non-deterministic results. Defects will go undetected, feedback will be slow and unreliable, maintenance costs will escalate, and as a result testers will be forced to rely on their own manual end-to-end regression tests. End-To-End Testing cannot produce short lead times, and it is utterly incompatible with Continuous Delivery.

The value of Continuous Testing

“Cease dependence on inspection to achieve quality. Eliminate the need for inspection on a mass basis by building quality into the product in the first place” Dr W Edwards Deming

Continuous Delivery advocates Continuous Testing – a testing strategy in which a large number of automated unit and acceptance tests are complemented by a small number of automated end-to-end tests and focussed exploratory testing. The Continuous Testing test ratio can be visualised as a Test Pyramid, which might be considered the antithesis of the Test Ice Cream Cone.

End-To-End Testing Considered Harmful - The Test Pyramid

Continuous Testing is aligned with Test-Driven Development and Acceptance Test Driven Development, and by advocating cross-functional testing as part of a shared commitment to quality it embodies the Continuous Delivery principle of Build Quality In. However, Continuous Testing can seem daunting due to the perceived drawbacks of unit tests and acceptance tests:

  1. A unit test or acceptance test minimises its System Under Test, suggesting a low degree of test coverage
  2. A unit test or acceptance test uses its own test client, suggesting a high investment in test infrastructure

While the End-To-End Testing value proposition is invalidated by its incorrect assumptions of high test coverage and low maintenance costs, the inverse is true of Continuous Testing – its perceived drawbacks of low test coverage and high maintenance costs rest on equally incorrect assumptions:

  1. A unit test will check intent against implementation and an acceptance test will check implementation against requirements, which means both the behaviour of a code pathway and its interactions with other pathways can be checked
  2. A unit test will restrict its System Under Test scope to a single pathway and an acceptance test will restrict itself to a single service, which means both can have the shortest possible execution time and deterministic results

A non-deterministic acceptance test can be resolved in a much shorter period of time than an end-to-end test as the System Under Test has a single owner. If Companies R Us owned the Company Accounts service and Payments R Us owned the Payments service a Company Accounts acceptance test would only use services controlled by the Company Accounts team.

End-To-End Testing Considered Harmful - Acceptance Test Multiple Organisations

If the Company Accounts team attempted to identify and resolve non-determinism in an acceptance test they would be able to make the necessary changes in a short period of time. There would also be no danger of unexpected changes to the Payments service impeding an acceptance test of the latest Company Accounts code, which would allow a predictable test baseline to be established.

End-to-end tests are a part of Continuous Testing, not least because the idea that testing the constituent parts of a system will simultaneously test the whole system is a Composition Fallacy. A small number of automated end-to-end tests should be used to validate core user journeys, but not at build time when unowned dependent services are unreliable and unrepresentative. The end-to-end tests should be used for release time smoke testing and runtime production monitoring, with synthetic transactions used to simulate user activity. This approach will increase confidence in production releases and should be combined with real-time monitoring of business and operational metrics to accelerate feedback loops and understand user behaviours.

In Continuous Delivery there is a recognition that optimising for Mean Time To Repair is more valuable than optimising for Mean Time Between Failures as it enables an organisation to minimise the impact of production defects, and it is more easily achievable. Defect cost can be controlled as Little’s Law guarantees smaller production releases will shorten lead times to defect resolution, and Continuous Testing provides the necessary infrastructure to shrink feedback loops for smaller releases. The combination of Continuous Testing and Continuous Delivery practices such as Blue Green Releases and Canary Releases empowers an organisation to create a robust system capable of neutralising unanticipated events, and advanced practices such as Dark Launching and Chaos Engineering can lead to antifragile systems that seek to benefit from Black Swans. For example, if Chaos Engineering surfaced concerns about the Payments service the Company Accounts team might Dark Launch its Payments Stub into production and use it in the unlikely event of a Payments data centre outage.

End-To-End Testing Considered Harmful - Company Accounts Payments Stub Failure

While the Payments data centre was offline the Company Accounts service would gracefully degrade to collecting customer payments in the Payments Stub until the Payments service was operational again. Customers would be unaffected by the production incident, and if competitors to the Company Accounts service were also dependent on the same third-party Payments service, that would constitute a strategic advantage in the marketplace. Redundant operational capabilities might seem wasteful, but Continuous Testing promotes operational excellence and as Nassim Nicholas Taleb has remarked “something unusual happens – usually”.

Continuous Testing can be a comprehensive and low cost testing strategy. According to Dave Farley and Jez Humble “building quality in means writing automated tests at multiple levels”, and a test suite largely comprised of unit and acceptance tests will contain meticulously tested scenarios with a high degree of test coverage, low execution times, and predictable test results. This means end-to-end tests can be reserved for smoke testing and production monitoring, and testers can be freed up from manual regression testing for higher value activities such as exploratory testing. This will result in fewer production defects, fast and reliable feedback, shorter lead times to market, and opportunities for revenue growth.

From end-to-end testing to continuous testing

“Push tests as low as they can go for the highest return in investment and quickest feedback” Janet Gregory and Lisa Crispin

Moving from End-To-End Testing to Continuous Testing is a long-term investment, and should be based on the notion that an end-to-end test can be pushed down the Test Pyramid by decoupling its concerns as follows:

  • Connectivity – can services connect to one another
  • Conversation – can services talk with one another
  • Conduct – can services behave with one another

Assume the Company Accounts service depends on a Pay endpoint on the Payments service, which accepts a company id and payment amount before returning a confirmation code and days until payment. The Company Accounts service sends the id and amount request fields and silently depends on the code response field.

End-To-End Testing Considered Harmful - Company Accounts Pay

The connection between the services could be unit tested using Test Doubles, which would allow the Company Accounts service to test its reaction to different Payments behaviours. Company Accounts unit tests would replace the Payments connector with a Mock or Stub connector to ensure scenarios such as an unexpected Pay timeout were appropriately handled.
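As a minimal sketch of this approach (the interface and class names below are illustrative assumptions, not from the article), a Company Accounts unit test could swap the real Payments connector for a Stub that simulates a Pay timeout:

```java
import org.junit.jupiter.api.Test;
import static org.junit.jupiter.api.Assertions.assertEquals;

// Illustrative types - names are assumptions, not from the article
interface PaymentsConnector {
    PaymentResult pay(String companyId, long amountInPence) throws PaymentsTimeoutException;
}

class PaymentsTimeoutException extends Exception {}

record PaymentResult(String confirmationCode) {}

// The unit under test: how Company Accounts reacts to different Payments behaviours
class YearEndPaymentService {
    private final PaymentsConnector payments;

    YearEndPaymentService(PaymentsConnector payments) {
        this.payments = payments;
    }

    String submitYearEndPayment(String companyId, long amountInPence) {
        try {
            return "SUBMITTED:" + payments.pay(companyId, amountInPence).confirmationCode();
        } catch (PaymentsTimeoutException e) {
            return "RETRY_LATER"; // graceful handling of an unexpected Pay timeout
        }
    }
}

class YearEndPaymentServiceTest {

    @Test
    void timeoutFromPaymentsIsHandledGracefully() {
        // Stub connector replaces the real Payments connector
        PaymentsConnector timingOutConnector = (companyId, amount) -> {
            throw new PaymentsTimeoutException();
        };
        var service = new YearEndPaymentService(timingOutConnector);

        assertEquals("RETRY_LATER", service.submitYearEndPayment("co-123", 10_000L));
    }

    @Test
    void successfulPaymentReturnsConfirmationCode() {
        PaymentsConnector happyConnector = (companyId, amount) -> new PaymentResult("ABC123");
        var service = new YearEndPaymentService(happyConnector);

        assertEquals("SUBMITTED:ABC123", service.submitYearEndPayment("co-123", 10_000L));
    }
}
```

Because the Stub runs in-process, these tests remain fast and deterministic regardless of the availability of the real Payments service.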

The conversation between the services could be unit tested using Consumer Driven Contracts, which would enable the Company Accounts service to have its interactions continually verified by the Payments service. The Payments service would issue a Provider Contract describing its Pay API at build time, the Company Accounts service would return a Consumer Contract describing its usage, and the Payments service would create a Consumer Driven Contract to be checked during every build.

End-To-End Testing Considered Harmful - Company Accounts Consumer Driven Contract

With the Company Accounts service not using the days response field it would be excluded from the Consumer Contract and Consumer Driven Contract, so a build of the Payments service that removed days or added a new comments response field would be successful. If the code response field was removed the Consumer Driven Contract would fail, and the Payments team would have to collaborate with the Company Accounts team on a different approach.
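A minimal sketch of that contract check, assuming hand-rolled contract classes rather than a specific tool such as Pact (all names here are illustrative):

```java
import java.util.Set;

// A sketch of a Consumer Driven Contract check run by the provider's build
record ProviderContract(Set<String> requestFields, Set<String> responseFields) {}

record ConsumerContract(Set<String> usedRequestFields, Set<String> usedResponseFields) {}

class ConsumerDrivenContractCheck {

    // Run by the Payments (provider) build: fail if a consumer's usage is no longer satisfied
    static void verify(ProviderContract provider, ConsumerContract consumer) {
        if (!provider.requestFields().containsAll(consumer.usedRequestFields())
                || !provider.responseFields().containsAll(consumer.usedResponseFields())) {
            throw new AssertionError("Pay API change breaks a consumer of the Payments service");
        }
    }

    public static void main(String[] args) {
        // Company Accounts uses id/amount and silently depends on code, but not days
        var companyAccounts = new ConsumerContract(Set.of("id", "amount"), Set.of("code"));

        // Removing days or adding comments still passes...
        verify(new ProviderContract(Set.of("id", "amount"), Set.of("code", "comments")), companyAccounts);

        // ...but removing code would fail the Payments build (uncomment to see the failure)
        // verify(new ProviderContract(Set.of("id", "amount"), Set.of("days")), companyAccounts);

        System.out.println("Consumer Driven Contract satisfied");
    }
}
```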

The conduct of the services could be unit tested using API Examples, which would permit the Company Accounts service to check for behavioural changes in new releases of the Payments service. Each release of the Payments service would be accompanied by a sibling artifact containing example API requests and responses for the Pay endpoint, which would be plugged into Company Accounts unit tests to act as representative test data and warn of behavioural changes.

End-To-End Testing Considered Harmful - Company Accounts API Examples

If a new version of the Payments service changed the format of the code response field from alphanumeric to numeric it would cause the Company Accounts service to fail at build time, indicating a behavioural change within the Payments service and prompting a conversation between the teams.
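A rough sketch of how such an API Example could be consumed, assuming the example response is surfaced as simple key/value data (the class and field values are illustrative):

```java
import java.util.Map;
import java.util.regex.Pattern;

// Sketch of a Company Accounts build-time check against a Payments API Examples artifact
class PayApiExamplesCheck {

    // In practice this example response would be loaded from the sibling artifact
    // published alongside each Payments release
    static final Map<String, String> EXAMPLE_PAY_RESPONSE =
            Map.of("code", "AB12CD", "days", "3");

    // Company Accounts relies on an alphanumeric confirmation code
    static final Pattern EXPECTED_CODE_FORMAT = Pattern.compile("[A-Z0-9]{6}");

    public static void main(String[] args) {
        String code = EXAMPLE_PAY_RESPONSE.get("code");
        if (code == null || !EXPECTED_CODE_FORMAT.matcher(code).matches()) {
            // A purely numeric code in a new Payments example would fail here at build time,
            // prompting a conversation between the two teams before release
            throw new AssertionError("Payments Pay example no longer matches expected code format");
        }
        System.out.println("Pay example data matches Company Accounts expectations");
    }
}
```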

Conclusion

“Not only won’t system testing catch all the bugs, but it will take longer and cost more – more than you save by skipping effective acceptance testing” – Jerry Weinberg

End-To-End Testing seems attractive to organisations due to its promise of high test coverage and low maintenance costs, but the extensive use of automated end-to-end tests and manual regression tests can only produce a fragile system with slow, unreliable test feedback that inflates lead times and is incompatible with Continuous Delivery. Continuous Testing requires an upfront and ongoing investment in test automation, but a comprehensive suite of automated unit tests and acceptance tests will ensure fast, deterministic test feedback that reduces production defects, shortens lead times, and encourages the Continuous Delivery of robust or antifragile systems.

Further Reading

  1. Continuous Delivery by Dave Farley and Jez Humble
  2. Principles Of Product Development Flow by Don Reinertsen
  3. 7 Habits of Highly Effective People by Stephen Covey
  4. Test Pyramid by Martin Fowler
  5. Test Ice Cream Cone by Alister Scott
  6. Integrated Tests Are A Scam by JB Rainsberger
  7. Agile Testing and More Agile Testing by Janet Gregory and Lisa Crispin
  8. Perfect Software and Other Illusions by Jerry Weinberg
  9. Release Testing Is Risk Management Theatre by Steve Smith
  10. The Art Of Agile Development by James Shore and Shane Warden
  11. Making End-To-End Tests Work by Adrian Sutton
  12. Just Say No To More End-To-End Tests by Mike Wacker
  13. Antifragile by Nassim Nicholas Taleb
  14. On Antifragility In Systems And Organisational Architecture by Jez Humble

Acknowledgements

Thanks to Amy Phillips, Beccy Stafford, Charles Kubicek, and Chris O’Dell for their early feedback on this article.

Organisation pattern: Trunk Based Development

The Version Control Strategies series

  1. Organisation antipattern – Release Feature Branching
  2. Organisation pattern – Trunk Based Development
  3. Organisation antipattern – Integration Feature Branching
  4. Organisation antipattern – Build Feature Branching

Trunk Based Development minimises development costs and risk

Trunk Based Development is a version control strategy in which developers commit their changes to the shared trunk of a source code repository with minimal branching. Trunk Based Development became well known in the mid 2000s as Continuous Integration became a mainstream development practice, and today it is equally applicable to centralised Version Control Systems (VCS) and Distributed Version Control Systems (DVCS).

In Trunk Based Development new features are developed concurrently on trunk as a series of small, incremental steps that preserve existing functionality and minimise merge complexity. Features are always released from trunk, and defect fixes are either released from trunk or a short-lived release branch.

When development of a feature spans multiple releases its entry point is concealed to ensure the ongoing changes do not impede release cadence. The addition of a new feature can be concealed with a Feature Toggle, which means a configuration parameter or business rule is used to turn a feature on or off at runtime. As shown below a Feature Toggle is turned off while its feature is in development (v1), turned on when its feature is in production (v2), and removed after a period of time (v3).

Organisation Pattern - Trunk Based Development - Feature Toggle Step By Step
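A minimal Feature Toggle sketch, assuming a simple configuration-backed lookup (the toggle name and classes are illustrative, not from the article):

```java
import java.util.Map;

// Sketch of a runtime Feature Toggle lookup
class FeatureToggles {
    private final Map<String, Boolean> toggles;

    FeatureToggles(Map<String, Boolean> toggles) {
        this.toggles = toggles;
    }

    boolean isOn(String feature) {
        return toggles.getOrDefault(feature, false);
    }
}

class AccountsReport {
    private final FeatureToggles toggles;

    AccountsReport(FeatureToggles toggles) {
        this.toggles = toggles;
    }

    String render() {
        // v1: toggle off while the feature is in development
        // v2: toggle on once the feature is in production
        // v3: toggle and conditional removed after a period of time
        if (toggles.isOn("new-computations")) {
            return "report with new computations";
        }
        return "report with existing behaviour";
    }
}
```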

Updates to an existing feature can be concealed with a Branch By Abstraction, which means an abstraction layer is temporarily introduced to encapsulate both the old behaviour in use and the new behaviour in development. As shown below a Branch By Abstraction routes requests to the old behaviour while the new behaviour is in development (v1-v2), reroutes requests to the new behaviour when it is in production (v3), and is removed after a period of time (v4).

Organisation Pattern - Trunk Based Development - Branch By Abstraction Step By Step
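A minimal Branch By Abstraction sketch, assuming an in-process abstraction layer over the behaviour being replaced (names are illustrative):

```java
// Temporary abstraction layer encapsulating both old and new behaviours
interface SubmissionsGateway {
    void submit(String accounts);
}

class LegacySubmissions implements SubmissionsGateway {
    public void submit(String accounts) { /* old behaviour in use (v1-v2) */ }
}

class NewSubmissions implements SubmissionsGateway {
    public void submit(String accounts) { /* new behaviour in development */ }
}

class SubmissionsRouter implements SubmissionsGateway {
    private volatile SubmissionsGateway current = new LegacySubmissions();

    public void submit(String accounts) {
        current.submit(accounts); // callers are unaware which implementation runs
    }

    void switchToNew() { current = new NewSubmissions(); }        // v3: reroute to new behaviour
    void revertToLegacy() { current = new LegacySubmissions(); }  // runtime revert if a defect is found
}
// v4: once the new behaviour is proven, the router and legacy class are deleted
```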

Trunk Based Development is synonymous with Continuous Integration, which has been described by Jez Humble et al as “the most important technical practice in the agile canon”. Continuous Integration is a development practice where all members of a team integrate and test their changes together on at least a daily basis, resulting in a shared mindset of collaboration and an always releasable codebase. This is verified by an automated build server continuously building the latest changes, and can include pre- and post-build actions such as code reviews and auto-revert on failure.

Consider an organisation that provides an online Company Accounts Service, with its codebase maintained by a team practising Trunk Based Development and Continuous Integration. In iteration 1 two features are requested – F1 Computations and F2 Write Offs – so the team discuss their concurrent development and decide on a Feature Toggle for F1 as it is a larger change. The developers commit their changes for F1 and F2 to trunk multiple times a day, with F1 tested in its on and off states to verify its progress alongside F2.

Organisation Pattern - Trunk Based Development - Trunk Based Development 1

In iteration 2 more features – F3 Bank Details and F4 Accounting Periods – begin development. F4 requires a different downstream submissions system, so the team design a Branch By Abstraction for submissions to ensure F1 and F3 can continue with the legacy submissions system until F4 is complete. F2 is signed off and released into production with F1 still toggled off at runtime. Some changes for F3 break the build, which triggers an automatic revert and a team discussion on a better design for F3.

Organisation Pattern - Trunk Based Development - Trunk Based Development 2

In iteration 3 a production defect is found in F2, and after the defect is fixed on trunk a release branch is agreed for risk mitigation. An F2.1 release branch is created from the last commit of the F2 release, the fix is merged to the branch, and F2.1 is released into production. F4 continues on trunk, with the submissions Branch By Abstraction tested in both modes. F3 is signed off and released into production using the legacy submissions system.

Organisation Pattern - Trunk Based Development - Trunk Based Development 3

In iteration 4 F1 is signed off and its Feature Toggle is turned on in production following a release. F4 is signed off and released into production, but when the Branch By Abstraction is switched to the new submissions system a defect is found. As a result the Branch By Abstraction is reverted at runtime to the legacy submissions system, and an F4.1 fix is released from trunk.

Organisation Pattern - Trunk Based Development - Trunk Based Development 4

In this example F1, F2, F3, and F4 clearly benefit from being developed by a team collaborating on a single shared code stream. For F1 the team agrees on the why and how of the Feature Toggle, with F1 tested in both its on and off states. For F2 the defect fix is made available from trunk and everyone is aware of the decision to use a release branch for risk mitigation. For F3 the prominence of a reverted build failure encourages people to contribute to a better design. For F4 there is a team decision to create a submissions Branch By Abstraction, with the new abstraction layer offering fresh insights into the legacy system and incremental commits enabling regular feedback on the new approach. Furthermore, when the new submissions system is switched on and a defect is found in F4 the ability to revert at runtime to the legacy submissions system means the Company Accounts Service can remain online with zero downtime.

This highlights the advantages of Trunk Based Development:

  • Continuous Integration – incremental commits to trunk ensure an always integrated, always tested codebase with minimal integration costs and a predictable flow of features
  • Adaptive scheduling – an always releasable codebase separates the release schedule from development efforts, meaning features can be released on demand according to customer needs
  • Collaborative design – everyone working on the same code encourages constant communication, with team members sharing responsibility for design changes and a cohesive Evolutionary Architecture
  • Operational and business empowerment – techniques such as Feature Toggle and Branch By Abstraction decouple release from launch, providing the operational benefit of graceful degradation on failure and the business benefit of Dark Launching features

Breaking down features and re-architecting an existing system in incremental steps requires discipline, planning, and ingenuity from an entire team on a daily basis, and Trunk Based Development can incur a development overhead for some time if multiple technologies are in play and/or the codebase is poorly structured. However, those additional efforts will substantially reduce integration costs and gradually push the codebase in the right direction – as shown by Dave Farley and Jez Humble praising Trunk Based Development for “the gentle, subtle pressure it applies to make the design of your software better”.

A common misconception of Trunk Based Development is that it is slow, as features take longer to complete and team velocity is often lower than expected. However, an organisation should optimise globally for cycle time not locally for velocity, and by mandating a single code stream Trunk Based Development ensures developers work at the maximum rate of the team not the individual, with reduced integration costs resulting in lower lead times.

Trunk Based Development is simple, but not easy. It has a steep learning curve but the continuous integration of small changesets into trunk will minimise integration costs, encourage collaborative design, empower runtime operational and business decisions, and ultimately drive the engine of Continuous Delivery. It is for this reason Dave Farley and Jez Humble declared “we can’t emphasise enough how important this practice is in enabling continuous delivery of valuable, working software”.

Application antipattern: Hardcoded Stub

A Hardcoded Stub constrains test determinism and execution times

When testing interactions between interdependent applications we always want to minimise the scope of the System Under Test to ensure deterministic and rapid feedback. This is often accomplished by creating a Stub of the provider application – a lightweight implementation of the provider that supplies canned API responses on demand.

For example, consider an ecommerce website with a microservice architecture. The estate includes a customer-facing Books frontend that relies upon a backend Authentication service for user access controls.

Hardcoded Stub - No Stub

As the Authentication service makes remote calls to a third party, an Authentication Stub is supplied to Books for its automated acceptance testing and manual exploratory testing.

Hardcoded Stub - Stub

A common Stub implementation is a Hardcoded Stub, in which provider behaviour is defined at build time and controlled at run time by magic inputs. For the Authentication Stub that would mean a static pool of pre-authenticated users [1], accessed by magic username via the standard Authentication API [2].

Hardcoded Stub - Hardcoded Stub Single Consumer

While the Authentication Stub has the advantage of not requiring any test setup, the implicit Books dependence upon pre-defined Authentication behaviours will impair Books test determinism and execution times:

  • Changes in the Authentication Stub can cause any number of Books tests to fail unexpectedly, increasing rework
  • Adding, removing, or updating Authentication behaviours requires a new Authentication Stub release, lengthening feedback loops
  • Concurrent test scenarios are constrained by the size of the Authentication Stub user pool, increasing test execution times

An inability to perform concurrent testing will have a significant impact upon lead times – parallel acceptance tests reduce build times, and parallel exploratory tests speed up tester feedback. This problem is exacerbated when multiple consumers rely on the same Hardcoded Stub, such as a Music frontend tested against the same Authentication Stub as the Books frontend. The same pool of pre-authenticated users [1] is offered to both consumers [2 and 3].

Hardcoded Stub - Hardcoded Stub Multiple Consumers

In this situation the simultaneous testing of Books and Music is bottlenecked by the pre-defined capacity of the Authentication Stub, despite their real-world independence. Test data management becomes a key issue, as testers will have to manually coordinate their use of the pre-authenticated users. A Books test could easily impact a Music test or vice versa – for example, a Books tester could accidentally lock out a user about to be used by Music. Such problems can easily lead to wait times within the value stream and inflated lead times.

The root cause of these problems is the overly-contextual nature of a Hardcoded Stub. Rather than predicting test scenarios upfront and providing tightly controlled pathways through provider behaviours, a better approach is to use a Configurable Test Stub – a Configurable Test Double primed by different automated tests and/or exploratory testers to compose provider behaviours. This would mean an Authentication Stub with a private, test-only API able to create users in a desired authentication state and return their generated credentials [1a and 2a] before the standard Authentication API is used [1b and 2b].

Hardcoded Stub - Configurable Stub Multiple Consumers
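A rough sketch of such a Configurable Test Stub, assuming an in-process stub with a test-only priming method alongside the standard API (class and method names are illustrative):

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of a Configurable Test Stub for the Authentication service
class AuthenticationStub {

    record StubUser(String username, String password, String state) {}

    private final Map<String, StubUser> users = new ConcurrentHashMap<>();

    // Test-only priming API [1a/2a]: each test creates the users it needs,
    // in the authentication state it needs, and receives generated credentials
    StubUser primeUser(String state) {
        var user = new StubUser("user-" + UUID.randomUUID(), UUID.randomUUID().toString(), state);
        users.put(user.username(), user);
        return user;
    }

    // Standard Authentication API [1b/2b]: behaves according to the primed state
    boolean authenticate(String username, String password) {
        var user = users.get(username);
        return user != null && user.password().equals(password) && "ACTIVE".equals(user.state());
    }
}
```

Each Books or Music test primes its own users on demand, so there is no shared, finite pool of pre-authenticated users to coordinate.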

By pushing responsibility for Authentication behaviours onto Books and Music, test data management is decentralised and tests become atomic. The Authentication Stub will have a much lower rate of change, Consumer Driven Contracts can be used to safeguard conversation integrity, and both Books and Music can parallelise their test suites to substantially reduce execution times.

A Hardcoded Stub may be an acceptable starting point for testing consumer/provider interactions, but it is unwieldy with a large test suite and unscalable with multiple consumers. A Configurable Test Stub will prevent non-deterministic test results from creeping into consumers and ensure fast feedback.

Pipeline pattern: Analysis Stage

Separate out analysis to preserve commit stage processing time

The entry point of a Continuous Delivery pipeline is its Commit Stage, which manages the compilation, unit testing, analysis, and packaging of source code whenever a change is committed to version control. As the commit stage is responsible for identifying defective code it represents a vital feedback loop for developers, and for that reason Dave Farley and Jez Humble recommend a commit stage that is “ideally less than five minutes and no more than ten” – if the build process is too slow or non-deterministic, the pace of development can soon grind to a halt.

Both compilation and unit testing tasks can be optimised for performance, particularly when the commit stage is hosted on a multi-processor Continuous Integration server. Modern compilers require only a few seconds for compilation, and a unit test suite that follows the Michael Feathers strategy of no database/filesystem/network/user interface access should run in parallel in seconds. However, it is more difficult to optimise analysis tasks as they tend to involve third-party tooling reliant upon byte code manipulation.

When a significant percentage of commit stage time is consumed by static analysis tooling, it may become necessary to trade-off unit test feedback against static analysis feedback and move the static analysis tooling into a separate Analysis Stage. The analysis stage is triggered by a successful run of the commit stage, and analyses the uploaded artifact(s) and source code in parallel to the acceptance testing stage. If a failure is detected the relevant pipeline metadata is updated and Stop The Line applies. That artifact cannot be used elsewhere in the pipeline and further development efforts should cease until the issue is resolved.

For example, consider an organisation that has implemented a standard Continuous Delivery pipeline. The commit stage has an average processing time of 5 minutes, of which 1 minute is spent upon static analysis.

Over time the codebase grows to the extent that commit stage time increases to 6 minutes, of which 1 minute 30 seconds is spent upon static analysis. With static analysis time growing from 20% to 25% the decision is made to create a separate Analysis stage, which reduces commit time to 4 minutes 30 seconds and improves the developer feedback loop.

Static analysis is the definitive example of an automated task that periodically needs human intervention. Regardless of tool choice there will always be a percentage of false positives and false negatives, and therefore a pipeline that implements an Analysis Stage must also offer a capability for an authenticated human user to override prior results for one or more application versions.

Organisation pattern: Trunk Based Development Branching

Trunk Based Development supports Optimistic and Pessimistic Release Branching

Trunk Based Development is a style of software development in which all developers commit their changes to a single shared trunk in source control, and every commit yields a production-ready build. It is a prerequisite for Continuous Delivery as it ensures that all code is continuously integrated into a single workstream, that developers always work against the latest code, and that merge/integration pain is minimised. Trunk Based Development is compatible with a Release Branching strategy of short-lived release branches that are used for post-development defect fixes. That strategy might be optimistic and defer branch creation until a defect occurs, or be pessimistic and always create a release branch.

For example, consider an application developed using Trunk Based Development. The most recent commits to trunk were source revisions a and b which yielded application versions 610 and 611 respectively, and version 610 is intended to be the next production release.

Trunk Based Development Branching - Optimistic Release Branching

With Optimistic Release Branching, the release of version 610 is immediate as there is no upfront branching. If a defect is subsequently found then a decision must be made where to commit the fix, as trunk has progressed since 610 from a to b. If the risk of pulling forward from a to b is acceptable then the simple solution is to commit the fix to trunk as c, and consequently release version 612.

Trunk Based Development Branching - Optimistic Release Branching Low Risk Defect

However, if the risk of pulling forward from a to b is unacceptable then a 610.x release branch is created from a, with the fix committed to the branch as c and released as version 610.1. That fix is then merged back into trunk as d to produce the next release candidate 612, and the 610.x branch is earmarked for termination.

Trunk Based Development Branching - Optimistic Release Branching High Risk Defect

With Pessimistic Release Branching, the release of version 610 is accompanied by the upfront creation of a 610.x release branch in anticipation of defect(s). If a defect is found in version 610 then as with Optimistic Branching a decision must be made as to where the defect fix should be committed. If the risk of pulling forward from a to b is deemed insignificant then trunk can be pulled forward from a to b and the fix committed to trunk as c for release as version 612. The 610.x branch is therefore terminated without ever being used.

Trunk Based Development Branching - Pessimistic Release Branching Low Risk Defect


If on the other hand the risk is deemed significant then the fix is committed to the 610.x branch as c and released as version 610.1. The fix is merged back into trunk as d and version 612, which will also receive its own branch upon release.

Trunk Based Development Branching - Pessimistic Release Branching High Risk Defect

The choice between Optimistic Branching and Pessimistic Branching for Trunk Based Development is dependent upon product quality and lead times. If product quality is poor and lead times are long, then the upfront cost of Pessimistic Branching may be justifiable. Alternatively, if post-development defects are rare and production releases are frequent then Optimistic Branching may be preferable.

Application pattern: Proxy Consumer Driven Contracts

Proxy Consumer Driven Contracts simulate provider-side testing

When Continuous Delivery is applied to an estate of interdependent applications we want to release as frequently as possible to minimise batch size and lead times, and Consumer Driven Contracts is an effective build-time method of verifying consumer-provider interactions. However, Consumer Driven Contracts assumes the provider will readily share responsibility for conversation integrity with the consumer, and that can be difficult to accomplish if the provider has different practices and/or is a third-party organisation. How can Consumer Driven Contracts be adapted to an environment in which the provider cannot or will not engage in joint risk mitigation?

For example, consider an organisation in which the applications Consumer A and Consumer B are dependent upon a customer details capability offered by a third-party Provider application. The Provider Contract is communicated by the third-party organisation as an API document, and it is used by Consumer A and Consumer B to consume name/address and name/address/email respectively. Consumer Contracts are made available to the provider for testing, but as the third-party organisation has more lucrative customers and a slower release cycle it is unwilling to engage in Consumer Driven Contracts.

Consumer Driven Contracts - Consumer Contracts

In this situation a new Provider release could cause a runtime communications failure with Consumer A and/or Consumer B with little or no warning. It is likely that some form of consumer-side integration testing will have to be adopted as a risk reduction strategy, unless the provider role within Consumer Driven Contracts can be simulated.

In Proxy Consumer Driven Contracts a provider proxy is created that assumes responsibility for testing the Consumer Driven Contracts as if it was the provider itself. Consumer A and Consumer B supply their Consumer Contracts to a Provider Proxy, which acts in a provider role.

Consumer Driven Contracts - Proxy Consumer Driven Contracts

When a new Provider release is announced by the third-party organisation the updated API document is stored within the Provider Proxy source tree. This triggers a Provider Proxy commit build that will transform the human-readable Provider Contract into a machine-readable equivalent, and use it to validate the Consumer Driven Contracts previously supplied by Consumer A and Consumer B. This means incompatibilities can be detected mere seconds after new Provider documentation is supplied by the third-party organisation.
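A minimal sketch of that validation step, assuming the API document has already been parsed into a set of field names (classes and names are illustrative):

```java
import java.util.Set;

// Sketch of the Provider Proxy commit build checking Consumer Driven Contracts
class ProviderProxyContractCheck {

    static void verify(Set<String> providerFields, Set<String> consumerFields, String consumer) {
        if (!providerFields.containsAll(consumerFields)) {
            throw new AssertionError("New Provider release breaks " + consumer);
        }
    }

    public static void main(String[] args) {
        // Machine-readable equivalent of the latest third-party API document
        Set<String> providerFields = Set.of("name", "address", "email");

        // Consumer Contracts previously supplied to the Provider Proxy
        verify(providerFields, Set.of("name", "address"), "Consumer A");
        verify(providerFields, Set.of("name", "address", "email"), "Consumer B");

        System.out.println("All Consumer Driven Contracts still satisfied by the new Provider release");
    }
}
```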

Proxy Consumer Driven Contracts incur a greater maintenance cost than Consumer Driven Contracts, but if the provider cannot or will not become involved in risk reduction then a simulation of provider-side unit testing should be preferable to a lengthy period of consumer-side integration testing.

Pipeline Pattern: Stage Strangler

The Strangler Pattern reduces the pipeline entry cost for multiple applications

When adding an application into a Continuous Delivery pipeline, we must assess its compatibility with the Repeatable Reliable Process already used by the pipeline to release application artifacts. If the new application produces artifacts that are deemed incompatible, then we can use an Artifact Interface to hide the implementation details. However, if the new application has an existing release mechanism that is radically different, then we must balance our desire for a uniform Repeatable Reliable Process with business expectations.

Assuming that the rationale for pipelining the new application is to de-risk its release process and improve its time-to-market, spending a significant amount of time re-engineering the pipeline and/or application would conflict with Bodart’s Law and harm our value proposition. In this situation we should be pragmatic and adopt a separate, application-specific Repeatable Reliable Process and manage the multiple release mechanisms within the pipeline via a Stage Interface and the Strangler Pattern.

The Strangler Pattern is a legacy code pattern named after Strangler Fig plants, which grow in rainforests where there is intense competition for sunlight. Strangler plants germinate in the rainforest canopy, growing down and around a host tree an inch at a time until the roots are reached and the host tree dies. The Strangler Pattern uses this as an analogy to describe how to replace legacy systems, with a Strangler application created to wrap around the legacy application and gradually replace it one feature at a time until decommissioning. The incremental progress of the Strangler Pattern facilitates a higher release cadence and de-risks system cutover, as well as allowing new features to be developed alongside the transfer of existing features.

To use the Strangler Pattern in Continuous Delivery, we first define a Stage Interface as follows:

Stage#run(Application, Version, Environment)

For each pipeline stage we can then create a default implementation to act as the Repeatable Reliable Process for as many applications as possible, and consider each incoming application on its merits. If the existing release mechanism of a new application is unwanted, then we can use our default stage implementation. If the legacy release mechanism retains some value or is too costly to replace at this point in time, then we can use our Stage Interface to conceal a fresh implementation that wraps around the legacy release mechanism until a strangulation time of our choosing.

In the below example, our pipeline supports three applications – Apples, Oranges, and Pears. Apples and Oranges delegate to their own specific implementations, whereas Pears uses our standard Repeatable Reliable Process. A deploy of Apples will delegate to the Apples-specific pipeline stage implementation, which wraps the Apples legacy release mechanism.

In a similar fashion, deploying Oranges to an environment will delegate to the Oranges-specific pipeline stage implementation and its legacy release mechanism.

Whereas deploying Pears to an environment uses the standard Repeatable Reliable Process.
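A minimal sketch of how that routing might look behind the Stage Interface, with illustrative class names (a real pipeline would resolve stages from configuration rather than a hardcoded map):

```java
import java.util.Map;

// Sketch of the Stage Interface with Strangler routing
interface Stage {
    void run(String application, String version, String environment);
}

class DefaultDeployStage implements Stage {           // the standard Repeatable Reliable Process
    public void run(String application, String version, String environment) {
        System.out.printf("Standard deploy of %s %s to %s%n", application, version, environment);
    }
}

class ApplesLegacyDeployStage implements Stage {       // wraps the Apples legacy release mechanism
    public void run(String application, String version, String environment) {
        System.out.printf("Delegating %s %s to the legacy Apples release mechanism for %s%n",
                application, version, environment);
    }
}

class OrangesLegacyDeployStage implements Stage {      // wraps the Oranges legacy release mechanism
    public void run(String application, String version, String environment) {
        System.out.printf("Delegating %s %s to the legacy Oranges release mechanism for %s%n",
                application, version, environment);
    }
}

class DeployStageRouter implements Stage {
    private final Map<String, Stage> applicationSpecific;
    private final Stage standard = new DefaultDeployStage();

    DeployStageRouter(Map<String, Stage> applicationSpecific) {
        this.applicationSpecific = applicationSpecific;
    }

    public void run(String application, String version, String environment) {
        // Strangling a legacy implementation later is a one-line change: remove its mapping
        applicationSpecific.getOrDefault(application, standard).run(application, version, environment);
    }

    public static void main(String[] args) {
        var router = new DeployStageRouter(Map.of(
                "Apples", new ApplesLegacyDeployStage(),
                "Oranges", new OrangesLegacyDeployStage()));
        router.run("Apples", "1.0", "production");   // legacy Apples release mechanism
        router.run("Oranges", "2.3", "production");  // legacy Oranges release mechanism
        router.run("Pears", "4.5", "production");    // standard Repeatable Reliable Process
    }
}
```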

If and when we consider it valuable, we can update the pipeline and/or Apples application to support the standard Repeatable Reliable Process and subsequently strangle the Apples-specific pipeline stage implementation. Both Oranges and Pears are unaffected by this change.

Finally, we can strangle the Oranges-specific pipeline stage implementation at a time of our choosing and attain a single Repeatable Reliable Process for all applications.

Even if the legacy pipeline stage implementations are never strangled, a significant return on investment has still been delivered. Our applications are managed by our Continuous Delivery pipeline with a minimum of integration effort and a minimum of impact upon both applications and pipeline.

Pipeline Pattern: Artifact Container

A pipeline should be decoupled from artifact content

Note – this pattern was previously known as Binary Interface

In a Continuous Delivery pipeline, a simple Commit stage implementation may equate an application artifact with the compiled artifact(s) e.g. a JAR or a WAR:

Binaries in Single Application Pipeline

This approach may suffice for a single application pipeline, but the coupling between start/stop behaviour and artifact file type means that details of java -jar, $CATALINA_HOME/bin/startup.sh, etc. seep into the pipeline start/stop stages and Operations documentation for manually starting/stopping artifacts. This becomes more of an issue when a pipeline manages multiple applications comprised of different web server technologies, different build tools, and/or different programming languages:

Each new artifact type introduced into the pipeline requires a notable increase in complexity, as conditional behaviour must be incorporated into different pipeline stages and Operations must retain knowledge of multiple start/stop methods. This threatens the Continuous Delivery principle of Repeatable Reliable Process and is a significant barrier to pipeline scalability.

The solution is to introduce an Artifact Container as the output of the Commit Stage, so that artifacts appear identical to the pipeline:

The advantage of this strategy is that it minimises the amount of application-specific knowledge that can leak into the pipeline, empowering development teams to use whatever tools they deem necessary regardless of release management. A change in web server, build tool, or programming language should not necessitate a new pipeline version.
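One way to picture this, as a sketch with illustrative names: each artifact type is wrapped in a container exposing a uniform start/stop interface, so the pipeline never needs to know how a JAR or WAR is actually launched.

```java
// Sketch of an Artifact Container hiding start/stop details from the pipeline
interface ArtifactContainer {
    void start();
    void stop();
}

class ExecutableJarContainer implements ArtifactContainer {
    public void start() { /* e.g. launch via java -jar, details hidden from the pipeline */ }
    public void stop()  { /* terminate the process */ }
}

class TomcatWarContainer implements ArtifactContainer {
    public void start() { /* e.g. invoke $CATALINA_HOME/bin/startup.sh */ }
    public void stop()  { /* e.g. invoke $CATALINA_HOME/bin/shutdown.sh */ }
}

class PipelineStartStage {
    // The pipeline only ever sees the uniform interface, regardless of artifact type
    void run(ArtifactContainer container) {
        container.start();
    }
}
```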

Pipeline Pattern: Aggregate Artifact

Aggregate Artifacts can incrementally deliver complex applications

When pipelining inter-dependent applications, the strength of the pipeline architecture directly correlates to the assembly cost and scalability of the packaging solution. If the Uber-Artifact approach is tacitly accepted as a poor implementation choice, is there an alternative?

The inherent value of any packaging solution is the version manifest mapping of package name/version to constituent artifacts, and there is no reason why that manifest cannot be managed as an artifact itself. In terms of Domain-Driven Design a version manifest is a naturally occurring Aggregate, with the package name/version equating to an Aggregate Root and the constituent artifacts represented as Entities, suggesting a name of Aggregate Artifact.

In an Aggregation Pipeline, the multiple pipelines of an Integration Pipeline are collapsed into a single pipeline with multiple commit stages. A successful commit of a constituent artifact triggers the commit of an Aggregate Artifact containing the new constituent version to the binary repository. At a later date the release stage fetches the aggregate artifact and examines the pipeline metadata for each constituent. Each constituent already known to the target environment is ignored, while the previously unknown constituents are released.

There are a number of advantages to this approach:

  • Consistent release mechanism. Whether an artifact is released independently or as part of an aggregate, the same process can be used
  • No duplication of artifact persistence. Committing an aggregate artifact to the binary repository does not necessitate the re-persistence of its constituents
  • High version visibility. An aggregate artifact is human and machine readable and can be published in multiple formats e.g. email, PDF/HTML release notes
  • Lightweight incremental release process. As an aggregate artifact is a manifest a version diff with earlier releases is easy to implement

As Aggregate Artifact persistence can be as low-tech as a properties file, the cost of the aggregate commit stage is extremely low. This means that a single Aggregate Artifact can scale to support many constituents (of which some may be Aggregate Artifacts themselves), and that failure scenarios can be easily handled.

For example, if a release of Fruit Basket 1.0 fails with the successful constituent Apples 23 and the unsuccessful constituent Oranges 49, then Stop The Line applies to Fruit Basket 1.0 and Oranges 49. Once a fix has been committed for Oranges 49, a new Fruit Basket 1.1 aggregate containing Oranges 50 and the previously successful Apples 23 can be quickly created and incrementally released to the environment.
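A minimal sketch of that incremental release calculation, using the Fruit Basket figures above (class and method names are illustrative):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Sketch of an Aggregate Artifact as a version manifest, with an incremental release diff
class AggregateArtifactRelease {

    // Releases only the constituents not already present in the target environment
    static Map<String, String> toRelease(Map<String, String> aggregate, Map<String, String> environment) {
        var pending = new LinkedHashMap<String, String>();
        aggregate.forEach((artifact, version) -> {
            if (!version.equals(environment.get(artifact))) {
                pending.put(artifact, version);
            }
        });
        return pending;
    }

    public static void main(String[] args) {
        // Fruit Basket 1.0 failed: Apples 23 succeeded, Oranges 49 did not
        Map<String, String> environment = Map.of("Apples", "23");

        // Fruit Basket 1.1 contains the fixed Oranges 50 and the unchanged Apples 23
        Map<String, String> fruitBasket11 = Map.of("Apples", "23", "Oranges", "50");

        // Only Oranges 50 needs to be released incrementally
        System.out.println(toRelease(fruitBasket11, environment)); // {Oranges=50}
    }
}
```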
