On Tech

Category: Continuous Delivery (Page 3 of 7)

Organisation antipattern: Integration Feature Branching

The Version Control Strategies series

  1. Organisation antipattern – Release Feature Branching
  2. Organisation pattern – Trunk Based Development
  3. Organisation antipattern – Integration Feature Branching
  4. Organisation antipattern – Build Feature Branching

Integration Feature Branching is overly-costly and unpredictable

Integration Feature Branching is a version control strategy where developers commit their changes to a shared remote branch of a source code repository prior to the shared trunk. Integration Feature Branching is applicable to both centralised Version Control Systems (VCS) and Distributed Version Control Systems (DVCS), with multiple variants of increasing complexity:

  • Type 1 – Integration branch and Trunk. This was originally used with VCSs such as Subversion and TFS
  • Type 2 – Feature branches, an Integration branch, and Trunk. This is used today with DVCSs such as Git and Mercurial
  • Type 3 – Feature release branches, feature branches, an Integration branch, and Trunk. This is advocated by Git Flow

In all Integration Feature Branching variants Trunk represents the latest production-ready state and Integration represents the latest completed changes ready for release. New features are developed on Integration (Type 1), or short-lived feature branches cut from Integration and merged back into Integration on completion (Types 2 and 3). When Integration contains a new feature it is merged into Trunk for release (Types 1 and 2), or a short-lived feature release branch cut from Integration and merged into Trunk and Integration on release (Type 3). When a production defect occurs it is fixed on a release branch cut from Trunk, then merged back to Integration (Types 1 and 2) or a feature release branch if one exists (Type 3).

Consider an organisation that provides an online Company Accounts Service, with its codebase maintained by a team practising Type 2 Integration Feature Branching. Initially two features are requested – F1 Computations and F2 Write Offs – so F1 and F2 feature branches are cut from Integration and developers commit their changes to F1 and F2.

Organisation Antipattern - Integration Feature Branching - Type 2 - 1

Two more features – F3 Bank Details and F4 Accounting Periods – then begin development, with F3 and F4 feature branches cut from Integration and developers committing to F3 and F4. F2 is completed and merged into Integration, and after testing it is merged into Trunk and regression tested before its production release. The F1 branch is briefly broken by a computations refactoring, with no impact on Integration.

Organisation Antipattern - Integration Feature Branching - Type 2 - 2

When F3 is completed it is merged into Integration + F2 and tested, but in the meantime a production defect is found in F2. A F2.1 fix is made on a F2.1 release branch cut from Trunk + F2, and after its release F2.1 is merged into and regression tested on both Integration + F2 + F3 and Trunk + F2. F3 is then merged into Trunk and regression tested, after which it is released into production. F1 continues development, and the F4 branch is temporarily broken by changes to the submissions system.

Organisation Antipattern - Integration Feature Branching - Type 2 - 3

When F1 is completed and merged into Integration + F2 + F3 + F2.1 it is ready for production release, but a business decision is made to release F4 first. F4 is completed and after being merged into and tested on both Integration + F2 + F3 + F2.1 + F1 and Trunk + F2 + F3 + F2.1 it is released into production. Soon afterwards F1 is merged into and regression tested on Trunk + F2 + F2.1 + F3, then released into production. A production defect is found in F4, and a F4.1 fix is made on a release branch cut from Trunk + F2 + F2.1 + F3 + F4 + F1. Once F4.1 is released it is merged into and regression tested on both Integration + F2 + F3 + F2.1 + F1 + F4 and Trunk + F2 + F2.1 + F3 + F4 + F1.

Organisation Antipattern - Integration Feature Branching - Type 2 - 4

In this example F1, F2, F3, and F4 all enjoy uninterrupted development on their own feature branches. The use of an Integration branch reduces the complexity of each merge into Trunk, and allows the business stakeholders to re-schedule the F1 and F4 releases when circumstances change. However, the isolated development of F1, F2, F3, and F4 causes complex, time-consuming merges into Integration, and Trunk requires regression testing as it can differ from Integration – such as F4 being merged into Integration + F2 + F3 + F2.1 + F1 and Trunk + F2 + F2.1 + F3. The Company Accounts Service team might have used Promiscuous Integration on feature release to reduce the complexity of merging into Integration, but there would still be a need for regression testing on Trunk.

Organisation Antipattern - Integration Feature Branching - Type 2 - 4 Promiscuous

If the Company Accounts Service team used Type 3 Integration Feature Branching the use of feature release branches between Integration and Trunk could reduce the complexity of merging into Trunk, but regression testing would still be required on Trunk to garner confidence in a production release. Type 3 Integration Feature Branching also makes the version control strategy more convoluted for developers, as highlighted by Adam Ruka criticising Git Flow’s ability to “create more useless merge commits that make your history even less readable, and add significant complexity to the workflow“.

Organisation Antipattern - Integration Feature Branching - Type 3 - 4 Promiscuous

The above example shows how Integration Feature Branching adds a costly, unpredictable phase into software development for little gain. The use of an Integration branch in Type 1 creates wasteful activities such as Integration merges and Trunk regression testing, which insert per-feature variability into delivery schedules. The use of feature branches in Type 2 discourages collaborative design and refactoring, leading to a gradual deterioration in codebase quality. The use of feature release branches in Type 3 lengthens feedback loops, increasing rework and lead times when defects occur.

Integration Feature Branching is entirely incompatible with Continuous Integration. Continuous Integration requires every team member to integrate and test their code on Trunk at least once a day in order to minimise feedback loops, and Integration Feature Branching is the polar opposite of this. While Integration Feature Branching can involve commits to Integration on a daily basis and a build server constantly verifying both Integration and Trunk integrity, it is vastly inferior to continuously integrating changes into Trunk. As observed by Dave Farley, “you must have a single shared picture of the state of the system… there is no point having a separate integration branch“.

Organisation pattern: Trunk Based Development

The Version Control Strategies series

  1. Organisation antipattern – Release Feature Branching
  2. Organisation pattern – Trunk Based Development
  3. Organisation antipattern – Integration Feature Branching
  4. Organisation antipattern – Build Feature Branching

Trunk Based Development minimises development costs and risk

Trunk Based Development is a version control strategy in which developers commit their changes to the shared trunk of a source code repository with minimal branching. Trunk Based Development became well known in the mid 2000s as Continuous Integration became a mainstream development practice, and today it is equally applicable to centralised Version Control Systems (VCS) and Distributed Version Control Systems (DVCS).

In Trunk Based Development new features are developed concurrently on trunk as a series of small, incremental steps that preserve existing functionality and minimise merge complexity. Features are always released from trunk, and defect fixes are either released from trunk or a short-lived release branch.

When development of a feature spans multiple releases its entry point is concealed to ensure the ongoing changes do not impede release cadence. The addition of a new feature can be concealed with a Feature Toggle, which means a configuration parameter or business rule is used to turn a feature on or off at runtime. As shown below a Feature Toggle is turned off while its feature is in development (v1), turned on when its feature is in production (v2), and removed after a period of time (v3).

Organisation Pattern - Trunk Based Development - Feature Toggle Step By Step

Updates to an existing feature can be concealed with a Branch By Abstraction, which means an abstraction layer is temporarily introduced to encapsulate both the old behaviour in use and the new behaviour in development. As shown below a Branch By Abstraction routes requests to the old behaviour while the new behaviour is in development (v1-v2), reroutes requests to the new behaviour when it is in production (v3), and is removed after a period of time (v4).

Organisation Pattern - Trunk Based Development - Branch By Abstraction Step By Step

Trunk Based Development is synonymous with Continuous Integration, which has been described by Jez Humble et al as “the most important technical practice in the agile canon“. Continuous Integration is a development practice where all members of a team integrate and test their changes together on at least a daily basis, resulting in a shared mindset of collaboration and an always releasable codebase. This is verified by an automated build server continuously building the latest changes, and can include pre- and post-build actions such as code reviews and auto-revert on failure.

Consider an organisation that provides an online Company Accounts Service, with its codebase maintained by a team practising Trunk Based Development and Continuous Integration. In iteration 1 two features are requested – F1 Computations and F2 Write Offs – so the team discuss their concurrent development and decide on a Feature Toggle for F1 as it is a larger change. The developers commit their changes for F1 and F2 to trunk multiple times a day, with F1 tested in its on and off states to verify its progress alongside F2.

Organisation Pattern - Trunk Based Development - Trunk Based Development 1

In iteration 2 more features – F3 Bank Details and F4 Accounting Periods – begin development. F4 requires a different downstream submissions system, so the team design a Branch By Abstraction for submissions to ensure F1 and F3 can continue with the legacy submissions system until F4 is complete. F2 is signed off and released into production with F1 still toggled off at runtime. Some changes for F3 break the build, which triggers an automatic revert and a team discussion on a better design for F3.

Organisation Pattern - Trunk Based Development - Trunk Based Development 2

In iteration 3 a production defect is found in F2, and after the defect is fixed on trunk a release branch is agreed for risk mitigation. An F2.1 release branch is created from the last commit of the F2 release, the fix is merged to the branch, and F2.1 is released into production. F4 continues on trunk, with the submissions Branch By Abstraction tested in both modes. F3 is signed off and released into production using the legacy submissions system.

Organisation Pattern - Trunk Based Development - Trunk Based Development 3

In iteration 4 F1 is signed off and its Feature Toggle is turned on in production following a release. F4 is signed off and released into production, but when the Branch By Abstraction is switched to the new submissions system a defect is found. As a result the Branch By Abstraction is reverted at runtime to the legacy submissions system, and a F4.1 fix is released from trunk.

Organisation Pattern - Trunk Based Development - Trunk Based Development 4

In this example F1, F2, F3, and F4 clearly benefit from being developed by a team collaborating on a single shared code stream. For F1 the team agrees on the why and how of the Feature Toggle, with F1 tested in both its on and off states. For F2 the defect fix is made available from trunk and everyone is aware of the decision to use a release branch for risk mitigation. For F3 the prominence of a reverted build failure encourages people to contribute to a better design. For F4 there is a team decision to create a submissions Branch By Abstraction, with the new abstraction layer offering fresh insights into the legacy system and incremental commits enabling regular feedback on the new approach. Furthermore, when the new submissions system is switched on and a defect is found in F4 the ability to revert at runtime to the legacy submissions means the Company Accounts Service can remain online with zero downtime.

This highlights the advantages of Trunk Based Development:

  • Continuous Integration – incremental commits to trunk ensure an always integrated, always tested codebase with minimal integration costs and a predictable flow of features
  • Adaptive scheduling – an always releasable codebase separates the release schedule from development efforts, meaning features can be released on demand according to customer needs
  • Collaborative design – everyone working on the same code encourages constant communication, with team members sharing responsibility for design changes and a cohesive Evolutionary Architecture
  • Operational and business empowerment – techniques such as Feature Toggle and Branch By Abstraction decouple release from launch, providing the operational benefit of graceful degradation on failure and the business benefit of Dark Launching features

Breaking down features and re-architecting an existing system in incremental steps requires discipline, planning, and ingenuity from an entire team on a daily basis, and Trunk Based Development can incur a development overhead for some time if multiple technologies are in play and/or the codebase is poorly structured. However, those additional efforts will substantially reduce integration costs and gradually push the codebase in the right direction – as shown by Dave Farley and Jez Humble praising Trunk Based Development for “the gentle, subtle pressure it applies to make the design of your software better“.

A common misconception of Trunk Based Development is that it is slow, as features take longer to complete and team velocity is often lower than expected. However, an organisation should optimise globally for cycle time not locally for velocity, and by mandating a single code stream Trunk Based Development ensures developers work at the maximum rate of the team not the individual, with reduced integration costs resulting in lower lead times.

Trunk Based Development is simple, but not easy. It has a steep learning curve but the continuous integration of small changesets into trunk will minimise integration costs, encourage collaborative design, empower runtime operational and business decisions, and ultimately drive the engine of Continuous Delivery. It is for this reason Dave Farley and Jez Humble declared “we can’t emphasise enough how important this practice is in enabling continuous delivery of valuable, working software“.

What Is Continuous Delivery

What is Continuous Delivery, and how could it help your organisation?

Continuous Delivery is often cited by organisations with a high performance IT capability, yet it is difficult to find a concise explanation. Executives and managers want to understand its business case, and practitioners want to understand its impact.

Introduction

“Software is eating the world” – Marc Andreessen

If there is one constant in the 21st Century, it is the ever-accelerating rate of technology change. British Gas manages 75,000 thermostats onlineTesla delivers over-the-air fixes to 29,000 cars, and with the Deloitte Shift Index reporting the last 50 years have seen the life expectancy of a Fortune 500 company decline from 75 years to 15 years it is evident the business world has been permanently disrupted by the ubiquity of software. Every business is an IT business now.

If an organisation wants a competitive advantage today it must strategically position IT at the core of its business to rapidly deliver new capabilities and respond to customer demands faster. If successful the rewards are great, as shown by the 2014 State Of DevOps Report declaring organisations with a high performance IT capability were twice as likely to exceed profitability, market share, and productivity goals. However, in many organisations IT has historically been a cost centre used solely for efficiency gains, and as a result years of underinvestment in software delivery must now be rectified.

The Last Mile

The Last Mile is a term used in IT to describe the value stream in an organisation from development to production, and was originally used in the telecoms industry to describe broadband provisioning from the telephone exchange to the customer premises. In IT the Last Mile is traditionally formed of discrete sequential phases, with work managed via a phase-gate project process.

What Is Continuous Delivery - Last Mile

In telecoms a long Last Mile to the customer means a slower, inferior service and IT is no different – many quality problems occur in the Last Mile, and a longer time to market means slower customer feedback and increased opportunity costs. Unfortunately many organisations have not invested in the Last Mile for years and consequently rely upon a manual release process with developers, testers, system administrators, and database administrators occupied by the following tasks:

  • Manual configuration – system administrators manually configure releases, causing runtime application errors
  • Manual infrastructure – system administrators manually provision servers and networks, introducing environmental errors
  • Manual testing – testers manually regression test releases without production reference data, leading to slow and inaccurate feedback
  • Manual database management – database administrators manually apply unversioned database scripts, resulting in faulty and/or nonperformant changes
  • Manual operations – systems administrators manually deploy, start, and stop releases, increasing potential for human error
  • Manual monitoring – system administrators manually configure operational checks, leading to unreliable alerts
  • Manual rollback – systems administrators manually unwind releases upon failure, leaving an inconsistent system state
  • Manual audit – testers, database administrators, and systems administrators manually log their actions, producing an inaccurate audit trail

Such a release process is likely to be controlled by a heavyweight change management process, with extensive documentation required and collaboration between teams restricted to a ticketing system. Production releases will take hours or even days, and will be conducted out of business hours due to fear of failure. This results in a stressful, infrequent release process that increases costs and prevents organisations from delivering features to customers on predictable timelines.

For example, consider an organisation with fortnightly iterations and quarterly production releases. If a product increment is estimated to generate £70,000 per week and is not released for 3 months of 28 days it will incur an opportunity cost of £840,000. In this scenario it is unlikely the Last Mile activities performed would outweigh the opportunity cost.

What Is Continuous Delivery - Last Mile Opportunity Cost

The IT capability of an organisation can be measured by its cycle time, which is the average lead time from development to production. This is sometimes framed as the Poppendieck question:

“How long would it take your organisation to deploy a change that involves just one single line of code?” Mary Poppendieck

A low cycle time can furnish an organisation with a short, repeatable, and reliable Last Mile that represents a strategic advantage over the competition – and Continuous Delivery is the means to accomplish that goal.

Continuous Delivery

Inspired by the first principle of the Agile Manifesto stating “our highest priority is to satisfy the customer through early and continuous delivery of valuable software“, Continuous Delivery is a set of holistic principles and practices that advocate automating a deployment pipeline to rapidly and reliably release software into production. Creating a Continuous Delivery pipeline enables smaller, more frequent production releases that show the business what customers want – and do not want – much faster, reducing opportunity costs and increasing product revenues.

The seminal book “Continuous Delivery” by Dave Farley and Jez Humble describes how Continuous Delivery is composed of the following principles:

Principle Description
Repeatable Reliable Process Use the same deterministic release mechanism in all environments
Automate Almost Everything Automate as much of the release workflow as possible
Keep Everything In Version Control Version control code, config, schemas, infrastructure, etc.
Build Quality In Develop and test work items in a single continuous activity
Bring The Pain Forward Increase cadence of infrequent, costly events to reduce errors
Done Means Released Do not consider a feature complete until it is in production
Everybody Is Responsible Align individuals and teams with the release process
Continuous Improvement Continuously improve the people and technology involved

It is important to note that only the first 3 of these principles are technology-focussed, which is an early indication of the impact of organisational structures and communication pathways upon Continuous Delivery.

The Deployment Pipeline

At the heart of Continuous Delivery is the Deployment Pipeline pattern, which extends the development practice of Continuous Integration to establish an automated workflow of build, test, and release activities from checkin to production. Any change to code, configuration, infrastructure, reference data, or database schema triggers a pipeline run, which packages a new artifact version and stores it in the artifact repository. That artifact is then subjected to a series of automated and exploratory tests to evaluate its production readiness, progressing to the next stage on success or halting upon failure. The result of this rigorous and repeatable testing process is a release candidate that meets well-defined quality standards and instils confidence prior to production release.

A Continuous Delivery pipeline

Automating a deployment pipeline offers the following advantages:

  • Automated configuration – application behaviour is modified consistently, reducing runtime errors
  • Automated infrastructure – operating systems, middleware, and networks are automatically provisioned, preventing environmental errors
  • Automated testing – acceptance and performance tests are automated with production reference data, providing fast feedback
  • Automated database management – database scripts are versioned and applied from development to production, uncovering errors sooner
  • Automated operations – releases are a push button process for authenticated users, reducing human error
  • Automated monitoring – operational checks are automated, increasing confidence in the production environment
  • Automated rollback – rollback uses the standard release mechanism, ensuring a reliable rollback on error
  • Automated statistics – lead time data is easily collected, enabling visualisation of metrics such as cycle time
  • Automated audit – all actions are recorded, forming a timely and accurate audit trail

A deployment pipeline tightens feedback loops, reduces error rates, and automates repetitive tasks so that humans are freed up to work on higher-value activities – testers can perform exploratory testing, database administrators can plan for capacity, and systems administrators can create business-facing monitoring checks to learn from customers. Automated configuration and automated infrastructure provisioning are particularly important as they allow operational changes to be developed, tested, and released using exactly the same process as functional changes.

There is a wide range of practices that can be applied throughout a deployment pipeline, including the following

Practice Description Advantage
Build Artifacts Once Build immutable artifacts on commit Prevents compiler errors post-development
Stop The Line Prioritise a releaseable codebase over new features Improves flow of features to production on predictable timelines
Trunk Based Development Commit changes directly to trunk Minimises merge costs and encourages good design practices
Feature Toggle Turn features on/off at runtime Reduces operational fragility and permits fine-grained feature launches
Atomic Tests Write automated tests that own all their data Enables tests to be parallelised and speed up developer feedback
Consumer Driven Contracts Test consumer/provider interactions as unit tests Shrinks integration costs and constantly tests conversational integrity
Separate Schema And Data Split schema and reference data origins Improves reliability of database changes and data quality
Expand/Contract Stagger constructive and destructive schema changes Enables database migrations to occur with zero downtime
Blue Green Releases Perform new release into standby production servers Allows production server upgrades to occur with zero downtime
Canary Releasing Perform new release one server at a time Reduces risk of production upgrade error affecting customers

Together these practices enable an organisation to rapidly develop, test, and release high quality software with a low error rate and zero downtime in production.

Batch Size Reduction

Once a deployment pipeline is established it can be used to reduce batch size and release smaller changesets into production more frequently, resulting in a lower cycle time that delivers value-add faster at a lower level of risk. In the earlier example, assume the product increment with an estimated value of £70,000 is comprised of 4 features and the estimated failure probability is 1 in 2 (50%).

What Is Continuous Delivery - Large Batch Size

Now assume the organisation has adopted Continuous Delivery to the extent it can release 1 feature at a time every 3 weeks via its deployment pipeline. A smaller batch size reduces changeset complexity, so more accurate estimates of value-add and risk per release become possible – and as product development is heterogeneous different features offer different amounts of value-add and risk, meaning releases can be ordered by relative value-add and risk.

What Is Continuous Delivery - Small Batch Size

This breakdown of value-add and risk shows that Feature 2 should be released prior to Feature 1, as it will realise 50% of the original value-add in 25% of the original timeline with only a 6.25% failure probability. Exploratory testing of Features 3 and 4 can be increased to mitigate their heightened risk, and if a production defect does occur the smaller changeset size and lower cycle time means defects can be identified and resolved at a much lower cost.

Combining small production releases with a well-defined authorisation model and automated audit trail means a deployment pipeline can be a definitive compliance tool entirely compatible with ITIL. The ITIL v3 Service Transition definition of a Standard Change as a pre-authorised, low risk, and common change matches the Continuous Delivery concept of frequently releasing small changesets into production. The change management approval process can be implemented as an automated button push for authorised users, and in the rare situation where a small production release is not possible a more involved Normal Change procedure can be followed.

Organisational Change

While a wide range of tools are available to build a deployment pipeline – such as Zookeeper for configuration management, Puppet/AWS/Docker for infrastructure provisioning, Capistrano for deployments, and Graphite/Logstash for traceability – tool selection will not have a significant impact upon the success or failure of a Continuous Delivery programme. The central challenge of Continuous Delivery is undoubtedly organisational change, as many organisations consist of siloed teams with their own local incentives and priorities. Business stakeholders handover requirements to Development, who handover release candidates to Testing, who handover releases to Operations, who handover features to customers – and those handovers introduce significant delays into the value stream that will comprise the majority of cycle time.

In the earlier example, the 12 week cycle time would likely contain delays caused by issues such as test hardware procurement, database administrator unavailability, and change advisory board scheduling. Such delays are entirely unrelated to technology choices and must be addressed via organisational change.

What Is Continuous Delivery - Organisational Change

If an organisation is to successfully adopt Continuous Delivery it must harmonise its communication pathways with Conway’s Law, which postulates a correlation between organisational and system architecture now accepted as canon within IT:

“Any organisation that designs a system (defined broadly) will produce a design whose structure is a copy of the organisation’s communication structure” Mel Conway

Conway’s Law explains why siloed teams within the same value stream inevitably use their own tooling and processes – such as developers using a different database migrator to database administrators – and strongly implies that the most effective method of software delivery is cross-functional product teams with complete authority and responsibility for their deployment pipeline.

What Is Continuous Delivery - Product Team

If an organisation wants to reap the rewards of Continuous Delivery it must change itself, and as Linda Rising and Mary Lynn Manns have observed “change is best introduced bottom-up with support at appropriate points from management“. Executives must communicate well-defined business outcomes, encourage innovators and early adopters, and help their staff with the transition. As Edgar Schein suggests in his book “Organisational Culture and Leadership” change often triggers learning anxiety and survival anxiety in individuals, so for people to commit to Continuous Delivery they must feel part of a culture of continuous improvement in a blame-free environment.

The DevOps philosophy conceived by Patrick Debois in 2009 is a popular method of fostering a culture of collaboration, and its emphasis upon Development and Operations working together on feature development and operability while sharing incentives can be a powerful force for change. However, as Dave Farley has observed “DevOps rarely says enough about the goal of delivering valuable software” and misinterpretations of its collaboration philosophy such as the DevOps Team antipattern are common.

While a deployment pipeline cannot directly reduce cycle time, it can indirectly contribute by acting as a central communication tool that facilitates organisational change. Radiating lead times and real-time customer monitoring from the deployment pipeline will show people the direct correlation between value stream bottlenecks and loss of customer revenue, and as a result people will better understand how change will benefit the entire organisation. This means organisational change for Continuous Delivery has an implicit dependence upon automation that can be summarised as follows:

Continuous Delivery is 10% automation and 90% organisational change – but don’t try it without that 10%

Dual Value Streams

While Continuous Delivery appears to be a daunting prospect many organisations already contain evidence of their potential for success, as they have two different value streams – a Feature Value Stream of siloed teams that will take weeks or months, and a Fix Value Stream of natural collaboration that will take days.

What Is Continuous Delivery - Dual Value Streams

The reason Fix Value Streams have a much lower cycle time than Feature Value Streams is defect fixes are more easily assigned an estimated Cost Of Delay that can be communicated throughout an organisation. When people know a defect has caused a sunk cost and an opportunity cost is pending there is a shared sense of urgency that encourages collaboration and a truncated value stream. This is a leading indicator of organisational potential for Continuous Delivery, and highlights how inappropriate the project paradigm is for product development. When an organisation works on features in smaller batches it can use Cost Of Delay to better prioritise work and improve flow through its Continuous Delivery pipeline.

Conclusion

It is time for businesses to recognise the strategic value of an IT capability that can rapidly innovate and respond to customer feedback in existing and emerging markets. Continuous Delivery enables an organisation to significantly reduce its time to market for new features, resulting in improved quality and increased product revenues.

Automating a deployment pipeline and accomplishing organisational change for Continuous Delivery is a long-term investment. Jez Humble at al say “the key is to find ways to make small, incremental changes that deliver improved customer outcomes“, and Dave Farley says “break down barriers, increase automation, increase collaboration and iterate“. Regardless of approach, a successful adoption of Continuous Delivery will provide an organisation with an enormous advantage over competitors. If an organisation does not adopt Continuous Delivery, it will eventually lose out to a competitor that can deliver faster, learn from customers faster, and make money faster. You can ignore the economics, but the economics won’t ignore you…

Further Reading

  1. Continuous Delivery by Dave Farley and Jez Humble
  2. Lean Enterprise by Jez Humble, Joanne Molesky, and Barry O’Reilly
  3. Implementing Lean Software Development by Mary and Tom Poppendieck
  4. Fearless Change by Linda Rising and Mary Lynn Mann
  5. Organisational Culture and Leadership by Edgar Schein
  6. Principles Of Product Development Flow by Don Reinertsen

Version Control Strategies

A taxonomy of version control strategies for and against Continuous Integration

This series of articles describes a taxonomy for different types of Feature Branching – developers working on branches in isolation from trunk – and how Continuous Integration is impacted by Feature Branching variants.

  1. Organisation antipattern: Release Feature Branching – the what, why, and how of long-lived feature branches
  2. Organisation pattern: Trunk Based Development – the what, why, and how of trunk development
  3. Organisation antipattern: Integration Feature Branching – the what, why, and how of long-lived integration branches
  4. Organisation antipattern: Build Feature Branching – the what, why, and how of short-lived feature branches

Organisation antipattern: Release Feature Branching

The Version Control Strategies series

  1. Organisation Antipattern – Release Feature Branching
  2. Organisation Pattern – Trunk Based Development
  3. Organisation Antipattern – Integration Feature Branching
  4. Organisation Antipattern – Build Feature Branching

Release Feature Branching dramatically increases development costs and risk

Feature Branching is a version control practice in which developers commit their changes to a branch of a source code repository before merging to trunk at a later date. Popularised in the 1990s and 2000s by centralised Version Control Systems (VCS) such as ClearCase, Feature Branching has evolved over the years and is currently enjoying a resurgence in popularity thanks to Distributed Version Control Systems (DVCS) such as Git.

The traditional form of Feature Branching originally promoted by ClearCase et al might be called Release Feature Branching. The central branch known as trunk is considered a flawless representation of all previously released work, and new features for a particular release are developed on a long-lived branch. Developers commit changes to their branch, automated tests are executed, and testers manually verify the new features. Those features are then released into production from the branch, merged into trunk by the developers, and regression tested on trunk by the testers. The branch can then be earmarked for deletion and should only be used for production defect fixes.

Consider an organisation that provides an online company accounts service, with its codebase maintained by a team practicing Release Feature Branching. Two epics – E1 Corporation Tax and E2 Trading Losses – begin development on concurrent feature branches. The E1 branch is broken early on, but E2 is unaffected and carries on regardless.

In month 2, two more epics – E3 Statutory Accounts and E4 Participator Loans – begin. E3 is estimated to have a low impact but its branch is broken by a refactoring and work is rushed to meet the E3 deadline. Meanwhile the E4 branch is broken by a required architecture change and gradually stabilised.

In month 3, E3 is tested and released into production before being merged into trunk and regression tested. The E2 branch becomes broken so progress halts until it is fixed. The E1 branch is tested and released into production before the merge and regression testing of trunk + E3 + E1.

In month 4, E2 is tested and released into production but the subsequent merge and regression testing of trunk + E3 + E1 + E2 unexpectedly fails. While the E2 developers fix trunk E4 is tested and released, and once trunk is fixed the merge and regression testing of trunk + E3 + E1 + E2 + E4 is performed. Soon afterwards a critical defect is found in E4, so a E4.1 fix is also released.

At this point all 4 feature branches could theoretically be deleted, but Corporation Tax changes are requested for E1 on short notice and a trunk release is refused by management due to the perceived risk. The dormant E1 branch is resurrected so E1.1 can be released into production and merged into trunk. While the E1 merge was trunk + E3 the E1.1 merge is trunk + E3 + E2 + E4.1, resulting in a more complex merge and extensive regression testing.

In this example E1, E2, E3, and E4 enjoyed between 1 and 3 months of uninterrupted development, and E4 was even released into production while trunk was broken. However, each period of isolated development created a feedback delay on trunk integration, and this was worsened by the localisation of design activities such as the E3 refactoring and E4 architectural change. This ensured merging and regression testing each branch would be a painful, time-consuming process that prevented new features from being worked on – except E1.1, which created an even more costly and risky integration into trunk.

This situation could have been alleviated by the E1, E2, E3, and/or E4 developers directly merging the changes on other branches into their own branch prior to their production release and merge into trunk. For instance, in month 4 the E4 developers might have merged the latest E1 changes, the latest E2 changes, and the final E3 changes into the E4 branch prior to release.

Martin Fowler refers to this process of directly merging between branches as Promiscuous Integration, and promiscuously integrating E1, E2, and E3 into E4 would certainly have reduced the complexity of the eventual trunk + E3 + E1 + E2 + E4 merge. However, newer E1 and E2 changes could still introduce complexity into that merge, and regression testing E4 on trunk would still be necessary.

The above example shows how Release Feature Branching inserts an enormously costly and risky integration phase into software delivery. Developer time must be spent managing and merging feature branches into trunk, and with each branch delaying feedback for prolonged periods a complex merge process per branch is inevitable. Tester time must be spent regression testing trunk, and although some merge tools can automatically handle syntactic merge conflicts there remains potential for Semantic Conflicts and subtle errors between features originating from different branches. Promiscuous Integration between branches can reduce merge complexity, but it requires even more developer time devoted to branch management and the need for regression testing on trunk is unchanged.

Since the mid 2000s Release Feature Branching has become increasingly rare due to a greater awareness of its costs. Branching, merging, and regression testing are all non-value adding activities that reduce available time for feature development, and as branches diverge over time there will be a gradual decline in collaboration and codebase quality. This is why it is important to heed the advice of Dave Farley and Jez Humble that “you should never use long-lived, infrequently merged branches as the preferred means of managing the complexity of a large project“.

No Release Testing

This series of articles explains why Release Testing – end-to-end regression testing on the critical path – is a wasteful practice that impedes Continuous Delivery and is unlikely to uncover business critical defects.

  1. Organisation Antipattern: Release Testing – introduces the Release Testing antipattern and why it cannot discover defects
  2. Organisation Antipattern: Consumer Release Testing – introduces the consumer-side variant of the Release Testing antipattern
  3. More Releases With Less Risk – describes how releasing smaller changesets more frequently can reduce probability and cost of failure
  4. Release Testing Is Risk Management Theatre – explains why Release Testing is so ineffective, and offers batch size reduction as an alternative

Application antipattern: Hardcoded Stub

A Hardcoded Stub constrains test determinism and execution times

When testing interactions between interdependent applications we always want to minimise the scope of the System Under Test to ensure deterministic and rapid feedback. This is often accomplished by creating a Stub of the provider application – a lightweight implementation of the provider that supplies canned API responses on demand.

For example, consider an ecommerce website with a microservice architecture. The estate includes a customer-facing Books frontend that relies upon a backend Authentication service for user access controls.

Hardcoded Stub - No Stub

As the Authentication service makes remote calls to a third party, an Authentication Stub is supplied to Books for its automated acceptance testing and manual exploratory testing.

Hardcoded Stub - Stub

A common Stub implementation is a Hardcoded Stub, in which provider behaviour is defined at build time and controlled at run time by magic inputs. For the Authentication Stub that would mean a static pool of pre-authenticated users [1], accessed by magic username via the standard Authentication API [2].

Hardcoded Stub - Hardcoded Stub Single Consumer

While the Authentication Stub has the advantage of not requiring any test setup, the implicit Books dependence upon pre-defined Authentication behaviours will impair Books test determinism and execution times:

  • Changes in the Authentication Stub can cause one to many Books tests to fail unexpectedly, increasing rework
  • Adding/removing/updating Authentication behaviours requires a new Authentication Stub release, increasing feedback loops
  • Concurrent test scenarios are constrained by the size of the Authentication Stub user pool, increasing test execution times

An inability to perform concurrent testing will have a significant impact upon lead times – parallel acceptance tests reduce build times, and parallel exploratory tests speed up tester feedback. This problem is exacerbated when multiple consumers rely on the same Hardcoded Stub, such as a Music frontend tested against the same Authentication Stub as the Books frontend. The same pool of pre-authenticated users [1] is offered to both consumers [2 and 3]

Hardcoded Stub - Hardcoded Stub Multiple Consumers

In this situation the simultaneous testing of Books and Music is bottlenecked by the pre-defined capacity of the Authentication Stub, despite their real-world independence. Test data management becomes a key issue, as testers will have to manually coordinate their use of the pre-authenticated users. A Books test could easily impact a Music test or vice versa – for example, a Books tester could accidentally lock out a user about to used by Music. Such problems can easily lead to wait times within the value stream and inflated lead times.

The root cause of these problems is the overly-contextual nature of a Hardcoded Stub. Rather than predicting test scenarios upfront and providing tightly controlled pathways through provider behaviours, a better approach is to use a Configurable Test Stub – a Configurable Test Double primed by different automated tests and/or exploratory testers to compose provider behaviours. This would mean an Authentication Stub with a private, test-only API able to create users in a desired authentication state and return their generated credentials [1a and 2a] before the standard Authenticatino API is used [1b and 2b].

Hardcoded Stub - Configurable Stub Multiple Consumers

By pushing responsibility for Authentication behaviours onto Books and Music, test data management is decentralised and tests become atomic. The Authentication Stub will have a much lower rate of change, Consumer Driven Contracts can be used to safeguard conversation integrity, and both Books and Music can parallelise their test suites to substantially reduce execution times.

A Hardcoded Stub may be an acceptable starting point for testing consumer/provider interactions, but it is unwieldy with a large test suite and unscalable with multiple consumers. A Configurable Test Stub will prevent nondeterministic test results from creeping into consumers and ensure fast feedback.

Organisation antipattern: Passive Disaster Recovery

Passive Disaster Recovery is Risk Management Theatre

When an IT organisation is vulnerable to a negative Black Swan – an extremely low probability, extremely high cost event causing ruinous financial loss – a traditional countermeasure to minimise downtime and opportunity costs is Passive Disaster Recovery. This is where a secondary production environment is established in a separate geographic location to the primary production environment, with every product increment released into Production and Disaster Recovery retained in a cold standby state.

For example, consider an organisation hosting version v1040 of a customer-facing service in its Production environment. In the event of a catastrophic failure, customers should be immediately routed to the Disaster Recovery environment and receive the same quality of service.

Organisation Antipattern - Disaster Recovery Environment - Vision

Regardless of physical/virtual hosting and manual/automated infrastructure provisioning, Passive Disaster Recovery is predicated upon the fundamentally flawed assumption that active and passive environments will be identical at any given point in time. Over time the unused Disaster Recovery environment will suffer from hardware, infrastructure, configuration, and software drift until it consists of Snowflake Servers that will likely require significant manual intervention if and when Disaster Recovery is activated. With negative Black Swan opportunity costs incurred at a rapid pace the entire future of the organisation might be placed in jeopardy.

Organisation Antipattern - Disaster Recovery Environment - Failover Drift

Passive Disaster Recovery remains common due to an industry-wide underestimation of negative Black Swan events. It is easier for an individual or an organisation to appreciate the extremely low probability of a disastrous business event rather than the extremely high opportunity cost, and as a result a Disaster Recovery environment tends to be procured when a business project begins and left to decay into Risk Management Theatre when the capex funding ends.

Continuous Delivery advocates a radically different approach to Disaster Recovery as it is explicitly focussed upon reducing the time, risk, and opportunity cost of delivering high quality services. One of its principles is Bring The Pain Forward – increasing the cadence of high cost, low frequency events to drive down transaction costs – and applying it to Disaster Recovery means moving from passive to active standby via Blue Green Releases and rotating production responsibility between two near-identical environments.

Organisation Antipattern - Disaster Recovery Environment - Blue Green Releases

In the above diagram, the Blue production environment is currently hosting v1040 and the Green environment is being upgraded with v1041. Once v1041 passes its automated smoke tests and manual exploratory tests it is signed off and customers are seamlessly rerouted from Blue to Green. A short period of time afterwards Blue is upgraded in the background and awaits the next production release.

Organisation Antipattern - Disaster Recovery Environment - Green Blue Releases

As well as enabling zero downtime releases and a cheap rollback mechanism, Blue Green Releases provides an effective Disaster Recovery strategy as the standby production environment is always active and in a known good state. If the Green environment suffers a complete outage customers can be switched to the Blue environment with complete confidence, and vice versa.

Organisation Antipattern - Disaster Recovery Environment - Blue Green Failover

By practicing Blue Green Releases an organisation is effectively rehearsing its Disaster Recovery strategy on every production release, and this can lead to advanced practices such as Chaos Engineering , Fault Injection , and Game Days. It requires a continuous investment in hardware and infrastructure, but it will reduce exposure to negative Black Swans and may even offer a strategic advantage over competitors.

Pipeline antipattern: Artifact Promotion

Promoting artifacts between repositories is a poor man’s metadata

Note: this antipattern used to be known as Mutable Binary Location

A Continuous Delivery pipeline is an automated representation of the value stream of an organisation, and rules are often codified in a pipeline to reflect the real-world journey of a product increment. This means artifact status as well as artifact content must be tracked as an artifact progresses towards production.

One way of implementing this requirement is to establish multiple artifact repositories, and promote artifacts through those repositories as they successfully pass different pipeline stages. As an artifact enters a new repository it becomes accessible to later stages of the pipeline and inaccessible to earlier stages.

For example, consider an organisation with a single QA environment and multiple repositories used to house in-progress artifacts. When an artifact is committed and undergoes automated testing it resides within the development repository.

Pipeline Antipattern Artifact Promotion - Development

When that artifact passes automated testing it is signed off for QA, which will trigger a move of that artifact from the development repository to the QA repository. It now becomes available for release into the QA environment.

Pipeline Antipattern Artifact Promotion - QA

When that artifact is pulled into the QA environment and successfully passes exploratory testing it is signed off for production by a tester. The artifact will be moved from the QA repository to the production repository, enabling a production release at a later date.

Pipeline Antipattern Artifact Promotion - Production

A variant of this strategy is for multiple artifact repositories to be managed by a single repository manager, such as Artifactory or Nexus.

Pipeline Antipattern Artifact Promotion - Repository Manager

This strategy fulfils the basic need of restricting which artifacts can be pulled into pre-production and production environments, but its reliance upon repository tooling to represent artifact status introduces a number of problems:

  • Reduced feedback – an unknown artifact can only be reported as not found, yet it could be an invalid version, an artifact in an earlier stage, or a failed artifact
  • Orchestrator complexity – the pipeline runner has to manage multiple repositories, knowing which repository to use for which environment
  • Inflexible architecture – if an environment is added to or removed from the value stream the toolchain will have to change
  • Lack of metrics – pipeline activity data is limited to vendor-specific repository data, making it difficult to track wait times and cycle times

A more flexible approach better aligned with Continuous Delivery is to establish artifact status as a first-class concept in the pipeline and introduce per-binary metadata support.

Pipeline Antipattern Artifact Promotion - Metadata

When a single repository is used, all artifacts reside in the same location alongside their versioned metadata, which provides a definitive record of artifact activity throughout the pipeline. This means unknown artifacts can easily be identified, the complexity of the pipeline orchestrator can be reduced, and any value stream design can be supported over time with no changes to the repository itself.

Furthermore, as the collection of artifact metadata stored in the repository indicates which artifact passed/failed which environment at any given point in time, it becomes trivial to pipeline dashboards that can display pending releases, application cycle times, and where delays are occurring in the value stream. This is a crucial enabler of organisational change for Continuous Delivery, as it indicates where bottlenecks are occurring in the value stream – likely between people working in separate teams in separate silos.

Organisation antipattern: Dual Value Streams

Dual Value Streams conceal transaction and opportunity costs

The goal of Continuous Delivery is to optimise cycle time in order to increase product revenues, and cycle time is measured as the average lead time of the value stream from code checkin to production release. This was memorably summarised by Mary and Tom Poppendieck as the Poppendieck Question:

“How long would it take your organization to deploy a change that involves just one single line of code? Do you do this on a repeatable, reliable basis?”

The Poppendieck Question is an excellent lead-in to the Continuous Delivery value proposition, but the problem with using it to assess the cycle time of an organisation yet to adopt Continuous Delivery is there will often be two very different answers – one for features, and one for fixes. For example, consider an organisation with a quarterly release cycle. The initial answer to the Poppendieck Question would be “90 days” or similar. Dual Value Streams -  Feature Value Stream However, when the transaction cost of releasing software is disproportionately high a truncated value stream will often emerge for production defect fixes, in which value stream activities are deliberately omitted to slash cycle time. This results in Dual Value Streams – a Feature Value Stream with a cycle time of months, and a Fix Value Stream with a cycle time of days. If our example organisation can release a defect fix in a few days, the correct answer to the Poppendieck Question becomes “90 days or 3 days”. Dual Value Streams - Fix Value Stream Fix Value Streams exist because production defect fixes have a clear financial value that is easily communicated and outweighs the high transaction cost of Feature Value Streams. An organisation will be imbued with a sense of urgency, as a sunk cost has demonstrably been incurred and by releasing a fix faster an opportunity cost can be reduced. People in siloed teams will collaborate upon a fix, and by using a minimal changeset it becomes possible to reason about which value stream activities can be discarded e.g. omitting capacity testing for a UI fix.

Dual Value Streams is an organisational antipattern because it is a local optimisation with little overall benefit to the organisation. There has been an investment in a release mechanism with a smaller batch size and a lower transaction cost, but as it is reserved for defect fixes it cannot add new customer value to the product. The long-term alternative is for organisations to adopt Continuous Delivery and invest in a single value stream with a minimal overall transaction cost. If our example organisation folded its siloed teams into cross-functional teams and moved activities off the critical path a fortnightly release cycle would become a distinct possibility. Dual Value Streams - Value Stream Dual Value Streams is an indicator of organisational potential for Continuous Delivery. When people are aware of the opportunity costs associated with releasing software as well as the transaction costs they are more inclined to work together in a cross-functional manner. When changesets contain a small number of changes it becomes easier to collectively reason about which value stream activities are useful and which should be moved off the critical path or retired.

Furthermore, a Fix Value Stream implicitly validates the use of smaller batch sizes as a risk reduction strategy. Defect fixes are released in small changes to minimise both opportunity costs and the probability of any further errors. Given that strategy works for fixes, why not release features more frequently and measure an organisation against a value-centric Poppendieck Question?

“How long would it take your organization to release a single value-adding line of code? Do you do this on a repeatable, reliable basis?”

« Older posts Newer posts »

© 2021 Steve Smith

Theme by Anders NorénUp ↑