On Tech


The Strangler Pipeline – Autonomation

The Strangler Pipeline is grounded in autonomation

Previous entries in the Strangler Pipeline series:

  1. The Strangler Pipeline – Introduction
  2. The Strangler Pipeline – Challenges
  3. The Strangler Pipeline – Scaling Up
  4. The Strangler Pipeline – Legacy and Greenfield

The introduction of Continuous Delivery to an organisation is an exciting opportunity for Development and Operations to Automate Almost Everything into a Repeatable Reliable Process, and at Sky Network Services we aspired to emulate organisations such as LMAX, Springer, and 7Digital by building a fully automated Continuous Delivery pipeline to manage our Landline Fulfilment and Network Management platforms. We began by identifying our Development and Operations stakeholders, and establishing a business-facing programme to automate our value stream. We emphasised to our stakeholders that automation was only a step towards our end goal of improving upon our cycle time of 26 days, and that the Theory Of Constraints warns that automating the wrong constraint will have little or no impact upon cycle time.

Our determination to value cycle time optimisation above automation in the Strangler Pipeline was soon justified by the influx of new business projects. The unprecedented growth in our application estate led to a new goal of retaining our existing cycle time while integrating our greenfield application platforms, and as our core business domain is telecommunications not Continuous Delivery we concluded that fully automating our pipeline would not be cost-effective. By following Jez Humble and Dave Farley’s advice to “optimise globally, not locally”, we focussed pipeline stakeholder meetings upon value stream constraints and successfully moved to an autonomation model aimed at stakeholder-driven optimisations.

Described by Taiichi Ohno as one of “the two pillars of the Toyota Production System”, autonomation is defined as automation with a human touch. It refers to the combination of human intelligence and automation where full automation is considered uneconomical. While the most prominent example of autonomation is problem detection at Toyota, we have applied autonomation within the Strangler Pipeline as follows:

  • Commit stage. While automating the creation of an aggregate artifact when a constituent application artifact is committed would reduce the processing time of platform creation, it would have zero impact upon cycle time and would replace Operations responsibility for release versioning with arbitrary build numbers. Instead the Development teams are empowered to track application compatibilities and create aggregate binaries via a user interface, with application versions selectable in picklists and aggregate version numbers auto-completed in order to reduce errors.
  • Failure detection and resolution. Although automated rollbacks or self-healing releases would harden the Strangler Pipeline, we agreed that such a solution was not a constraint upon cycle time and would be costly to implement. When a pipeline failure occurs it is recorded in the metadata of the application artifact, and we Stop The Line to prevent further use until a human has logged onto the relevant server(s) to diagnose and correct the problem, as sketched after this list.
  • Pipeline updates. Although the high frequency of Strangler Pipeline updates implies value in further automation of its own Production release process, a single pipeline update cannot improve cycle time and we wish to retain scheduling flexibility –  as pipeline updates increase the probability of release failure, it would be unwise to release a new pipeline version immediately prior to a Production platform release. Instead a Production request is submitted for each signed off pipeline artifact, and while the majority are immediately released the Operations team reserve the right to delay if their calendar warns of a pending Production platform release.
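
As a rough illustration of the failure detection approach described above, the sketch below records a stage failure in an artifact’s metadata and blocks further pipeline use of that artifact until a human intervenes. The class, method, and metadata key names are illustrative assumptions, not our actual implementation.

    import java.util.Map;

    // Sketch only: record a pipeline failure against an artifact and Stop The Line
    // until a human has diagnosed and corrected the problem.
    public final class StopTheLine {

        // Hypothetical metadata store keyed by artifact identifier, e.g. "messaging-186-13"
        private final Map<String, String> metadata;

        public StopTheLine(Map<String, String> metadata) {
            this.metadata = metadata;
        }

        public void recordFailure(String artifactId, String stage, String reason) {
            metadata.put(artifactId + ".failed.stage", stage);
            metadata.put(artifactId + ".failed.reason", reason);
        }

        // Every subsequent stage consults this guard before acting upon the artifact
        public void assertUsable(String artifactId) {
            String failedStage = metadata.get(artifactId + ".failed.stage");
            if (failedStage != null) {
                throw new IllegalStateException(
                    artifactId + " stopped the line at " + failedStage + "; manual diagnosis required");
            }
        }
    }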

Autonomation emphasises the role of root cause analysis, and after every major release failure we hold a session to identify the root cause of the problem, the lessons learned, and the necessary counter-measures to permanently solve the problem. At the time of writing our analysis shows that 13% of release failures were caused by pipeline defects, 10% by misconfiguration of TeamCity Deployment Builds, and the majority originated in our siloed organisational structure. This data provides an opportunity to measure our adoption of the principles of Continuous Delivery according to Shuhari:

  • shu – By scaling our automated release mechanism to manage greenfield and legacy application platforms, we have implemented Repeatable Reliable Process, Automate Almost Everything, and Keep Everything In Version Control
  • ha – By introducing combinational static analysis tests and a pipeline user interface to reduce our defect rate and TeamCity usability issues, we have matured to Bring The Pain Forward and Build Quality In
  • ri – Sky Network Services is a Waterscrumfall organisation where Business, Development, and Operations work concurrently on different projects with different priorities, which means we sometimes fall foul of Conway’s Law and compete over constrained resources to the detriment of cycle time. We have yet to achieve Done Means Released, Everybody Is Responsible, and Continuous Improvement

An example of our organisational structure impeding cycle time would be the first release of the new Messaging application 186-13, which resulted in the following value stream audit:

Messaging 186-13 value stream

While each pipeline operation was successful in less than 20 seconds, the disparity between Commit start time and Production finish time indicates significant delivery problems. Substantial wait times between environments contributed to a lead time of 63 days, far in excess of our average lead time of 6 days. Our analysis showed that Development started work on Messaging 186-13 before Operations ordered the necessary server hardware, and as a result hardware lead times restricted environment availability at every stage. No individual or team was at fault for this situation – the fault lay in the system, with Development and Operations working upon different business projects at the time with non-aligned goals.

With the majority of the Sky Network Services application estate now managed by the Strangler Pipeline it seems timely to reflect upon our goal of retaining our original cycle time of 26 days. Our data suggests that we have been successful, with the cycle time of our Landline Fulfilment and Network Management platforms now 25 days and our greenfield platforms between 18 and 21 days. However, examples such as Messaging 186-13 remind us that cycle time cannot be improved by automation alone, and we must now redouble our efforts to implement Done Means Released, Everybody Is Responsible, and Continuous Improvement. By building the Strangler Pipeline we have followed Donella Meadows’ change management advice to “reduce the probability of destructive behaviours and to encourage the possibility of beneficial ones” and given all we have achieved I am confident that we can Continuously Improve together.

My thanks to my colleagues at Sky Network Services

The Strangler Pipeline – Legacy and greenfield

The Strangler Pipeline uses the Stage Strangler pattern to manage legacy and greenfield applications

Previous entries in the Strangler Pipeline series:

  1. The Strangler Pipeline – Introduction
  2. The Strangler Pipeline – Challenges
  3. The Strangler Pipeline – Scaling Up

When our Continuous Delivery journey began at Sky Network Services, one of our goals was to introduce a Repeatable, Reliable Process for our Landline Fulfilment and Network Management platforms by creating a pipeline deployer to replace the disparate Ruby and Perl deployers used by Development and Operations. The combination of a consistent release mechanism and our newly-developed Artifact Container would have enabled us to Bring The Pain Forward from failed deployments, improve lead times, and easily integrate future greenfield platforms and applications into the pipeline. However, the simultaneous introduction of multiple business projects meant that events conspired against us.

While pipeline development was focussed upon improving slow platform build times, business deadlines for the Fibre Broadband project left our Fibre, Numbering, and Providers technical teams with greenfield Landline Fulfilment applications that were compatible with our Artifact Container and incompatible with the legacy Perl deployer. Out of necessity those teams dutifully followed Conway’s Law and created deployment buttons in TeamCity housing application-specific deployers as follows:

  • Fibre: A loathed Ant deployer
  • Numbering: A loved Ant deployer
  • Providers: A loved Maven/Java deployer

Over a period of months, it became apparent that this approach was far from ideal for Operations. Each Landline Fulfilment platform release became a slower, more arduous process as the Perl deployer had to be accompanied by a TeamCity button for each greenfield application. Not only did these extra steps increase processing times, the use of a Continuous Integration tool ill-suited to release management introduced symptoms of the Deployment Build antipattern and errors started to creep into deployments.

While Landline Fulfilment releases operated via this multi-step process, a pipeline deployer was developed for the greenfield application platforms. The Landline Assurance, Wifi Fulfilment, and Wifi Assurance technical teams had no time to spare for release tooling and immediately integrated into the pipeline. The pipeline deployer proved successful and consequently demand grew for the pipeline to manage Landline Fulfilment releases as a single aggregate artifact – although surprisingly Operations requested the pipelining of the greenfield Landline Fulfilment applications first, due to the proliferation of per-application, per-environment deployment buttons in TeamCity.

A migration method was therefore required for pipelining the entire Landline Fulfilment platform that would not increase the risk of release failure or incur further development costs, and with those constraints in mind we adapted the Strangler pattern for Continuous Delivery as the Stage Strangler pattern. First coined by Martin Fowler and Michael Feathers, the Strangler pattern describes how to gradually wrap a legacy application in a greenfield application in order to safely replace existing features, add new features, and ultimately replace the entire application. By creating a Stage Interface for the different Landline Fulfilment deployers already in use, we were able to kick off a series of conversations with the Landline Fulfilment technical teams about pipeline integration.

We began the Stage Strangler process with the Fibre application deployer, as the Fibre team were only too happy to discard it. We worked together on the necessary changes, deleting the Fibre deployer and introducing a set of version-toggled pipeline deployment buttons in TeamCity. The change in release mechanism was advertised to stakeholders well in advance, and a smooth cutover built up our credibility within Development and Operations.

Deploying Fibre

While immediate replacement of the Numbering application deployer was proposed due to the Deficient Deployer antipattern causing per-server deployment steps for Operations, the Numbering team successfully argued for its retention as it provided additional application monitoring capabilities. We updated the Numbering deployer to conform to our Stage Interface and eliminate the Deficient Deployer symptoms, and then wrote a Numbering-specific pipeline stage that delegated Numbering deployments to that deployer.

Deploy Numbering
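
As a rough illustration of that delegation, the sketch below shows a Numbering-specific stage handing the work to the retained deployer; the script name and arguments are placeholders rather than the real Numbering tooling.

    import java.io.IOException;

    // Sketch only: a Numbering-specific pipeline stage that conforms to the pipeline's
    // stage contract but delegates the actual deployment to the team's retained deployer.
    public final class NumberingDeployStage {

        public void run(String application, String version, String environment)
                throws IOException, InterruptedException {
            Process deployer = new ProcessBuilder("./numbering-deployer.sh", application, version, environment)
                .inheritIO()
                .start();
            if (deployer.waitFor() != 0) {
                throw new IllegalStateException(
                    "Numbering deployment of " + application + " " + version + " to " + environment + " failed");
            }
        }
    }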

The Providers team had invested a lot of time in their application deployer – a custom Maven/Java deployer with an application-specific signoff process embedded within the Artifactory binary repository. Despite Maven’s Continuous Delivery incompatibilities, build numbers being polluted by release numbers, and the sign-off process triggering the Artifact Promotion antipattern, the Providers team resolutely wished to retain their deployer due to their sunk costs. This resulted in a long-running debate over the relative merits of the different technical solutions, but the Stage Strangler helped us move the conversation forward by shaping it around pipeline compatibility rather than technical uniformity. We wrote a Providers-specific pipeline stage that delegated Providers deployments to that deployer, and the Providers team removed their signoff process in favour of a platform-wide sign-off process managed by Operations.

Deploy Providers

As all greenfield applications have now been successfully integrated into the pipeline and the remaining Landline Fulfilment legacy applications are in the process of being strangled, it would be accurate to say that the Stage Strangler pattern provided us with a minimal cost, minimal risk method of integrating applications and their existing release mechanisms into our Continuous Delivery pipeline. The use of the Strangler pattern has empowered technical teams to make their own decisions on release tooling, and a sign of our success is that development of new pipeline features continues unabated while the Numbering and Providers teams debate the value of strangling their own deployers in favour of a universal pipeline deployer.

Deploy Anything

The Strangler Pipeline – Scaling up

The Strangler Pipeline scales via an Artifact Container and Aggregate Artifacts

Previous entries in the Strangler Pipeline series:

  1. The Strangler Pipeline – Introduction
  2. The Strangler Pipeline – Challenges

While Continuous Delivery experience reports abound from organisations such as LMAX and Springer, the pipelines described tend to be focussed upon applying the Repeatable, Reliable Process and Automate Everything principles to the release of a single application. Our Continuous Delivery journey at Sky Network Services has been a contrasting experience, as our sprawling application estate has led to significant scalability demands in addition to more common challenges such as slow build times and unrepeatable release mechanisms.

When pipeline development began 18 months ago, the Sky Network Services application estate consisted of our Network Inventory and Landline Fulfilment platforms of ~25 applications, with a well-established cycle time of monthly Production releases.

However, in a short period of time the demand for pipeline scalability skyrocketed due to the introduction of Fibre Broadband, Landline Assurance, Wifi Fulfilment, Wifi Realtime, and Wifi Assurance.

This means that in under a year our application estate more than doubled in size to 6 platforms of ~65 applications with the following characteristics:

  • Different application technologies – applications are Scala or Java built by Ant/Maven/Ruby, with Spring/Yadic application containers and Tomcat/Jetty/Java web containers
  • Different platform owners – the Landline Fulfilment platform is owned by multiple teams
  • Different platforms for same applications – the Orders and Services applications are used by both Landline Fulfilment and Wifi Fulfilment
  • Different application lifecycles – applications may be updated every day, once a week, or less frequently

To attain our scalability goals without sacrificing cycle time we followed the advice of Jez Humble and Dave Farley that “the simplest approach, and one that scales up to a surprising degree, is to have a [single] pipeline”, and we built a single pipeline based upon the Artifact Container and Aggregate Artifact pipeline patterns.

For the commit stage of application artifacts, the pipeline provides an interface rather than an implementation. While a single application pipeline would be solely responsible for the assembly and unit testing of application artifacts, this strategy would not scale for multi-application pipelines. Rather than incur significant costs in imposing a common build process upon all applications, the commit interface asks that each application artifact be fully acceptance-tested, provide associated pipeline metadata, and conform to our Artifact Container. This ensures that application artifacts are readily accessible to the pipeline with minimal integration costs, and that the pipeline itself remains independent of different application technologies.
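
The exact contract is ours to define; as a rough sketch, and with illustrative method names rather than our real interface, the commit contract asked of each application might look something like this:

    import java.nio.file.Path;
    import java.util.Set;

    // Sketch only: the commit contract the pipeline asks of each application.
    // The pipeline owns no application build logic; it only requires metadata
    // and an artifact packaged to the Artifact Container convention.
    public interface ApplicationCommit {

        // Identity of the application artifact being committed
        String applicationName();
        String applicationVersion();

        // Pipeline metadata, e.g. declared compatibilities with other application versions
        Set<String> declaredCompatibilities();

        // A fully acceptance-tested artifact conforming to the Artifact Container
        Path artifactContainer();
    }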

For the creation of platform artifacts, the pipeline contains a commit stage implementation that creates and persists aggregate artifacts to the artifact repository. Whereas an application commit is automatically triggered by a version control modification, a platform commit is manually triggered by a platform owner specifying the platform version and a list of pre-built constituent application artifacts. The pipeline compares constituent metadata against its aggregate definitions to ensure a valid aggregate can be built, before creating an aggregate XML file to act as a version manifest for future releases of that platform version. The use of aggregate artifacts provides a tool for different teams to collaborate on the same platform, different platforms to share the same application artifacts, and for different application lifecycles to be encapsulated behind a communicable platform release version.
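
A minimal sketch of that platform commit is shown below; the aggregate definition structure, XML layout, and class names are assumptions for illustration rather than our actual implementation.

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;
    import java.util.Set;

    // Sketch of a platform commit: validate the chosen constituents against the
    // aggregate definition, then persist a version manifest for later releases.
    public final class AggregateCommitStage {

        // Hypothetical aggregate definitions: platform name -> allowed constituent applications
        private final Map<String, Set<String>> aggregateDefinitions;

        public AggregateCommitStage(Map<String, Set<String>> aggregateDefinitions) {
            this.aggregateDefinitions = aggregateDefinitions;
        }

        public Path commit(String platform, String platformVersion,
                           Map<String, String> constituentVersions, Path repository) throws IOException {
            Set<String> allowed = aggregateDefinitions.get(platform);
            if (allowed == null || !allowed.containsAll(constituentVersions.keySet())) {
                throw new IllegalArgumentException("Invalid constituents for " + platform + " " + platformVersion);
            }
            StringBuilder manifest = new StringBuilder(
                "<aggregate name=\"" + platform + "\" version=\"" + platformVersion + "\">\n");
            constituentVersions.forEach((application, version) ->
                manifest.append("  <constituent name=\"").append(application)
                        .append("\" version=\"").append(version).append("\"/>\n"));
            manifest.append("</aggregate>\n");
            Path manifestFile = repository.resolve(platform + "-" + platformVersion + ".xml");
            return Files.write(manifestFile, manifest.toString().getBytes(StandardCharsets.UTF_8));
        }
    }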

While the Strangler Pipeline manages the release of application artifacts via a Repeatable Reliable Process akin to a single application pipeline, the use of the Aggregate Artifact pattern means that an incremental release mechanism is readily available for platform artifacts. When the release of an aggregate artifact into an environment is triggered, the pipeline inspects the metadata of each aggregate constituent and only releases the application artifacts that have not previously entered the target environment. For example, if Wifi Fulfilment 1.0 was previously released containing Orders 317 and Services 192, a release of Wifi Fulfilment 2.0 containing Orders 317 and Services 202 would only release the updated Services artifact. This approach reduces lead times and by minimising change sets reduces the risk of release failure.
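
The sketch below illustrates the comparison step, assuming a simple record of what each target environment currently holds; the class names and data structures are hypothetical.

    import java.util.Collections;
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Sketch of an incremental platform release: only constituents whose versions
    // are not already present in the target environment are released.
    public final class IncrementalRelease {

        // Hypothetical record of what each environment currently holds, e.g. {"Orders" -> "317"}
        private final Map<String, Map<String, String>> environmentState;

        public IncrementalRelease(Map<String, Map<String, String>> environmentState) {
            this.environmentState = environmentState;
        }

        public List<String> releasablesFor(String environment, Map<String, String> aggregateConstituents) {
            Map<String, String> deployed = environmentState.getOrDefault(environment, Collections.emptyMap());
            return aggregateConstituents.entrySet().stream()
                .filter(entry -> !entry.getValue().equals(deployed.get(entry.getKey())))
                .map(entry -> entry.getKey() + " " + entry.getValue())
                .collect(Collectors.toList());
        }
    }

For the Wifi Fulfilment example above, comparing the 2.0 aggregate against an environment already holding Orders 317 and Services 192 would return only Services 202 for release.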

A good heuristic for pipeline scalability is that a state of Responsibility without Authority is a smell. For example, we initially implemented a per-application configuration whitelist as a hardcoded regex within the pipeline. That might have sufficed in a single application pipeline, but the maintenance cost in a multi-application pipeline became a painful burden as different application-specific configuration policies evolved. The problem was solved by making the whitelist itself configurable, which empowered teams to be responsible for their own configuration and allowed configuration to change independently of a pipeline version.
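
A minimal sketch of the configurable whitelist idea is shown below, assuming a team-owned properties file of per-application regular expressions; the property layout is an assumption for illustration.

    import java.io.IOException;
    import java.io.Reader;
    import java.util.Properties;
    import java.util.regex.Pattern;

    // Sketch only: the configuration whitelist is itself configuration, owned by
    // each team, rather than a regex hardcoded inside the pipeline.
    public final class ConfigurationWhitelist {

        private final Properties whitelists = new Properties();

        // e.g. a team-owned file containing: orders.whitelist=jdbc\\..*|smtp\\..*
        public void load(Reader teamOwnedWhitelist) throws IOException {
            whitelists.load(teamOwnedWhitelist);
        }

        public boolean permits(String application, String configurationKey) {
            String pattern = whitelists.getProperty(application + ".whitelist", ".*");
            return Pattern.matches(pattern, configurationKey);
        }
    }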

In hindsight, while the widespread adoption of our Artifact Container has protected the pipeline from application-specific behaviours impeding pipeline scalability, it is the use of the Aggregate Artifact pattern that has so successfully enabled scalable application platform releases. The Strangler Pipeline has the ability to release application platform versions containing a single updated application, multiple updated applications, or even other application platforms themselves.

The Strangler Pipeline – Challenges

The Strangler Pipeline introduced a Repeatable Reliable Process for start/stop, deployment, and database migration

Previous entries in the Strangler Pipeline series:

  1. The Strangler Pipeline – Introduction

To start our Continuous Delivery journey at Sky Network Services, we created a cross-team working group and identified the following challenges:

  • Slow platform build times. Developers used brittle, slow Maven/Ruby scripts to construct platforms of applications
  • Different start/stop methods. Developers used a Ruby script to start/stop individual applications, server administrators used a Perl script to start/stop platforms of applications
  • Different deployment methods. Developers used a Ruby script to deploy applications, server administrators used a Perl script to deploy platforms of applications driven by a Subversion tag
  • Different database migration methods. Developers used Maven to migrate applications, database administrators used a set of Perl scripts to migrate platforms of applications driven by the same Subversion tag

As automated release management is not our core business function, we initially examined a number of commercial and open-source off-the-shelf products such as ThoughtWorks Go, LinkedIn Glu, AntHill Pro, and Jenkins. However, despite identifying Go as an attractive option we reluctantly decided to build a custom pipeline. As our application estate already consisted of ~30 applications, we were concerned that the migration cost of introducing a new release management product would be disproportionately high. Furthermore, a well-established Continuous Integration solution of Artifactory Pro and a 24-agent TeamCity build farm was in situ, and to recommend discarding such a large financial investment with no identifiable upfront value would have been professional irresponsibility bordering upon consultancy. We listened to Bodart’s Law and reconciled ourselves to building a low-cost, highly scalable pipeline capable of supporting our applications in order of business and operational value.

With trust between Development and Operations at a low ebb, our first priority was to improve platform build times. With Maven used to build and release the entire application estate, the use of non-unique snapshots in conjunction with the Maven Release plugin meant that a platform build could take up to 60 minutes, recompile the application binaries, and frequently fail due to transitive dependencies. To overcome this problem we decreed that using the Maven Release plugin violated Build Your Binaries Only Once, and we placed Maven in a bounded CI context of clean-verify. Standalone application binaries were built at fixed versions using the Axel Fontaine solution, and a custom Ant script was written to transform Maven snapshots into releasable artifacts. As a result of these changes platform build times shrank from 60 minutes to 10 minutes, improving release cadence and restoring trust between Development and Operations.

In the meantime, some of our senior Operations staff had been drawing up a new process for starting/stopping applications. While the existing release procedure of deploy -> stop -> migrate -> set current version -> start was compatible with the Decouple Deployment From Release principle, the start/stop scripts used by Operations were coupled to Apache Tomcat wrapper scripts due to prior use. The Operations team were aware that new applications were being developed for Jetty and Java Web Server, and collectively it was acknowledged that the existing model left Operations in the undesirable state of Responsibility Without Authority. To resolve this Operations proposed that all future application binaries should be ZIP archives containing zero-parameter start and stop shell scripts, and this became the first version of our Binary Interface. This strategy empowered Development teams to choose whichever technology was most appropriate to solve business problems, and decoupled Operations teams from knowledge of different start/stop implementations.

Although the Binary Interface proved over time to be successful, the understandable desire to decommission the Perl deployment scripts meant that early versions of the Binary Interface also called for deployment, database migration, and symlinking scripts to be provided in each ZIP archive. It was successfully argued that this conflated the need for binary-specific start/stop policies with application-neutral deploy/migrate policies, and as a result the latter responsibilities were earmarked for our pipeline.

Implementing a cross-team plan of action for database migration has proven far more challenging. The considerable amount of customer-sensitive data in our Production databases encouraged risk aversion, and there was a sizeable technology gap. Different Development teams used different Maven plugins and database administrators used a set of unfathomable Perl scripts run from a Subversion tag. That risk aversion and gulf in knowledge meant that a cross-team migration strategy was slow to emerge, and its implementation remains in progress. However, we did experience a Quick Win and resolve the insidious Subversion coupling when a source code move in Subversion caused an unnecessary database migration failure. A pipeline stage was introduced to deliver application SQL from Artifactory to the Perl script source directories on the database servers. While this solution did not provide full database migration, it resolved an immediate problem for all teams and better positioned us for full database migration at a later date.

With the benefit of hindsight, it is clear that the above tooling discrepancies, disparate release processes, and communications issues were rooted in Development and Operations historically working in separate silos, as forewarned by Conway’s Law. These problems were solved by Development and Operations teams coming together to create and implement cross-team policies, and this formed a template for future co-operation on the Strangler Pipeline.

Pipeline Pattern: Stage Strangler

The Strangler Pattern reduces the pipeline entry cost for multiple applications

When adding an application into a Continuous Delivery pipeline, we must assess its compatibility with the Repeatable Reliable Process already used by the pipeline to release application artifacts. If the new application produces artifacts that are deemed incompatible, then we can use an Artifact Interface to hide the implementation details. However, if the new application has an existing release mechanism that is radically different, then we must balance our desire for a uniform Repeatable Reliable Process with business expectations.

Assuming that the rationale for pipelining the new application is to de-risk its release process and improve its time-to-market, spending a significant amount of time re-engineering the pipeline and/or application would conflict with Bodart’s Law and harm our value proposition. In this situation we should be pragmatic and adopt a separate, application-specific Repeatable Reliable Process and manage the multiple release mechanisms within the pipeline via a Stage Interface and the Strangler Pattern.

The Strangler Pattern is a legacy code pattern named after Strangler Fig plants, which grow in rainforests where there is intense competition for sunlight. Strangler plants germinate in the rainforest canopy, growing down and around a host tree an inch at a time until the roots are reached and the host tree dies. The Strangler Pattern uses this as an analogy to describe how to replace legacy systems, with a Strangler application created to wrap around the legacy application and gradually replace it one feature at a time until decommissioning. The incremental progress of the Strangler Pattern facilitates a higher release cadence and de-risks system cutover, as well as allowing new features to be developed alongside the transfer of existing features.

To use the Strangler Pattern in Continuous Delivery, we first define a Stage Interface as follows:

Stage#run(Application, Version, Environment)

For each pipeline stage we can then create a default implementation to act as the Repeatable Reliable Process for as many applications as possible, and consider each incoming application on its merits. If the existing release mechanism of a new application is unwanted, then we can use our default stage implementation. If the legacy release mechanism retains some value or is too costly to replace at this point in time, then we can use our Stage Interface to conceal a fresh implementation that wraps around the legacy release mechanism until a strangulation time of our choosing.

In the below example, our pipeline supports three applications – Apples, Oranges, and Pears. Apples and Oranges delegate to their own specific implementations, whereas Pears uses our standard Repeatable Reliable Process. A deploy of Apples will delegate to the Apples-specific pipeline stage implementation, which wraps the Apples legacy release mechanism.
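
A minimal sketch of that arrangement is shown below, assuming plain strings for application, version, and environment; the dispatcher and wrapper names are illustrative rather than a real implementation.

    import java.util.Map;

    // Sketch of the Stage Interface and a strangler-style dispatch: Apples and
    // Oranges delegate to wrappers around their legacy release mechanisms, while
    // any unmapped application uses the standard Repeatable Reliable Process.
    public final class DeployStageDispatcher {

        interface Stage {
            void run(String application, String version, String environment);
        }

        private final Stage standardStage;
        private final Map<String, Stage> legacyWrappers;

        public DeployStageDispatcher(Stage standardStage, Map<String, Stage> legacyWrappers) {
            this.standardStage = standardStage;
            this.legacyWrappers = legacyWrappers;
        }

        public void deploy(String application, String version, String environment) {
            legacyWrappers.getOrDefault(application, standardStage)
                          .run(application, version, environment);
        }
    }

Pears, and any future application without a wrapper, falls through to the standard implementation by default.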

In a similar fashion, deploying Oranges to an environment will delegate to the Oranges-specific pipeline stage implementation and its legacy release mechanism.

Whereas deploying Pears to an environment uses the standard Repeatable Reliable Process.

If and when we consider it valuable, we can update the pipeline and/or Apples application to support the standard Repeatable Reliable Process and subsequently strangle the Apples-specific pipeline stage implementation. Both Oranges and Pears are unaffected by this change.

Finally, we can strangle the Oranges-specific pipeline stage implementation at a time of our choosing and attain a single Repeatable Reliable Process for all applications.

Even if the legacy pipeline stage implementations are never strangled, little is lost: a significant amount of return on investment has still been delivered. Our applications are managed by our Continuous Delivery pipeline with a minimum of integration effort and a minimum of impact upon both applications and pipeline.

Continuous Delivery and organisational change

Continuous Delivery unaccompanied by organisational change will not reduce cycle time

Our Continuous Delivery value proposition describes a goal of reducing cycle time – the average time for a software release to propagate through to Production – in order to improve our time-to-market, saving time and money that can be invested back into product development and growing revenues. However, it is important to bear in mind that like any cross-organisation transformational programme Continuous Delivery is susceptible to Conway’s Law:

Any organisation that designs a system (defined broadly) will produce a design whose structure is a copy of the organisation’s communication structure

This extraordinary sociological observation predicts that multiple teams working on the same problem will produce disparate solutions, and that the structure of an organisation must be adaptable if product development is to remain sustainable. As a Continuous Delivery pipeline will likely traverse multiple organisational units (particularly in silo-based organisations), these are pertinent warnings that were addressed by Dave Farley and Jez Humble in the principles of Continuous Delivery:

  1. Repeatable Reliable Process
  2. Automate Almost Everything
  3. Keep Everything In Version Control
  4. Bring The Pain Forward
  5. Build Quality In
  6. Done Means Released
  7. Everybody Is Responsible
  8. Continuous Improvement

The majority of these principles are clearly focussed upon culture and behaviours, yet some Continuous Delivery implementations are entirely based upon Repeatable Reliable Process and Automate Almost Everything at the expense of more challenging principles such as Everybody Is Responsible.

For example, in our siloed organisation we are asked to improve the cycle time of an application from 28 days to 14 days, where the existing deployment and migration mechanisms are manual processes that each take 20 minutes to perform. We introduce a Continuous Delivery pipeline in which we Automate Almost Everything, we Keep Everything In Version Control, and we establish our Repeatable Reliable Process. However, despite deployment and migration now taking only 5 minutes each, our cycle time is unaffected! How is this possible?

To explain this disheartening situation, we need to use Lean Thinking and examine the value stream of our application. While our new release mechanism has reduced the machine time of each pipeline stage (i.e. time spent releasing an artifact), the process lead time (i.e. time required to release and sign off an artifact) is largely unaffected. This is because process lead time includes wait time, and in a siloed organisation there are likely to be significant handoff periods both during and between pipeline stages which are “fraught with opportunities for waste”. If the deployment and migration mechanisms have each been reduced to 5 minutes but a 3 hour handoff from server administrator to database administrator remains, our Repeatable Reliable Process will never affect our cycle time.

To accomplish organisational change alongside Continuous Delivery, the most effective method of breaking down silo barriers is to visualise your value stream and act upon waste. Donella Meadows recommended that to effect organisational change you must “arrange the structures and conditions to reduce the probability of destructive behaviours and to encourage the possibility of beneficial ones”, and a pipeline containing a Repeatable Reliable Process is an excellent starting point – but it is not the end. Visualise your pipeline, educate people on the unseen inefficiencies caused by your organisational structure, and encourage an Everybody Is Responsible mentality.

Updating a Pipeline

Pipeline updates must minimise risk to protect the Repeatable Reliable Process

We want to quickly deliver new features to users, and in Continuous Delivery Dave Farley and Jez Humble showed that “to achieve these goals – low cycle time and high quality – we need to make frequent, automated releases”. The pipeline constructed to deliver those releases should be no different, and should itself be frequently and automatically released into Production. However, this conflicts with the Continuous Delivery principle of Repeatable Reliable Process – a single application release mechanism for all environments, used thousands of times to minimise errors and build confidence – leading us to ask:

Is the Repeatable Reliable Process principle endangered if a new pipeline version is released?

To answer this question, we can use a risk impact/probability graph to assess if an update will significantly increase the risk of a pipeline operation becoming less repeatable and/or reliable.

Pipeline Risk

This leads to the following assessment:

  1. An update is unlikely to increase the impact of an operation failing to be repeatable and/or reliable, as the cost of failure is permanently high due to pipeline responsibilities
  2. An update is unlikely to increase the probability of an operation failing to be repeatable, unless the Published Interface at the pipeline entry point is modified. In that situation, the button push becomes more likely to fail, but not more costly
  3. An update is likely to increase the probability of an operation failing to be reliable. This is where stakeholders understandably become more risk averse, searching for a suitable release window and/or pinning a particular pipeline version to a specific artifact version throughout its value stream. These measures may reduce risk for a specific artifact, but do not reduce the probability of failure in the general case

Based on the above, we can now answer our original question as follows:

A pipeline update may endanger the Repeatable Reliable Process principle, and is more likely to impact reliability than repeatability

We can minimise the increased risk of a pipeline update by using the following techniques:

  • Change inspection. If change sets can be shown to be benign with zero impact upon specific artifacts and/or environments, then a new pipeline version is less likely to increase risk aversion
  • Artifact backwards compatibility. If the pipeline uses an Artifact Interface and knows nothing of artifact composition, then a new pipeline version is less likely to break application compatibility
  • Configuration static analysis. If each defect has its root cause captured in a static analysis test (as sketched after this list), then a new pipeline version is less likely to cause a failure
  • Increased release cadence. If the frequency of pipeline releases is increased, then a new pipeline version is more likely to possess shallow defects, smaller feedback loops, and cheaper rollback
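
As a rough illustration of the configuration static analysis point, the test below captures the root cause of a hypothetical past defect. It assumes JUnit 4 on the classpath and a properties-based environment configuration; the file path and key are invented for the example.

    import static org.junit.Assert.assertTrue;

    import java.io.FileReader;
    import java.util.Properties;
    import org.junit.Test;

    // Sketch only: a static analysis test that captures the root cause of a past
    // defect, so the same misconfiguration cannot silently reach a release again.
    public class EnvironmentConfigurationTest {

        @Test
        public void productionConfigurationDeclaresDatabaseUrl() throws Exception {
            Properties production = new Properties();
            production.load(new FileReader("config/production.properties")); // hypothetical path
            assertTrue("production.properties must declare db.url",
                production.containsKey("db.url"));
        }
    }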

Finally, it is important to note that a frequently-changing pipeline version may be a symptom of over-centralisation. A pipeline should not possess responsibility without authority and should devolve environment configuration, application configuration, etc. to separate, independently versioned entities.

Pipeline Pattern: Artifact Container

A pipeline should be decoupled from artifact content

Note – this pattern was previously known as Binary Interface

In a Continuous Delivery pipeline, a simple Commit stage implementation may equate an application artifact with the compiled artifact(s) e.g. a JAR or a WAR:

Binaries in Single Application Pipeline

This approach may suffice for a single application pipeline, but the coupling between start/stop behaviour and artifact file type means that details of java -jar, $CATALINA_HOME/bin/startup.sh, etc. seep into the pipeline start/stop stages and Operations documentation for manually starting/stopping artifacts. This becomes more of an issue when a pipeline manages multiple applications comprised of different web server technologies, different build tools, and/or different programming languages.

Each new artifact type introduced into the pipeline requires a notable increase in complexity, as conditional behaviour must be incorporated into different pipeline stages and Operations must retain knowledge of multiple start/stop methods. This threatens the Continuous Delivery principle of Repeatable Reliable Process and is a significant barrier to pipeline scalability.

The solution is to introduce an Artifact Container as the output of the Commit Stage, so that artifacts appear identical to the pipeline.
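
As a rough sketch of what “identical to the pipeline” means in practice, the code below starts or stops any artifact the same way by invoking zero-parameter scripts inside the unpacked container (as with the earlier Binary Interface); the script names and layout are assumptions.

    import java.io.File;

    // Sketch only: the pipeline starts any artifact the same way, by invoking a
    // zero-parameter script inside the unpacked container, regardless of whether
    // the application runs on Tomcat, Jetty, or a plain Java process underneath.
    public final class ArtifactContainerControl {

        public void start(File unpackedContainer) throws Exception {
            run(unpackedContainer, "bin/start.sh");
        }

        public void stop(File unpackedContainer) throws Exception {
            run(unpackedContainer, "bin/stop.sh");
        }

        private void run(File unpackedContainer, String script) throws Exception {
            File executable = new File(unpackedContainer, script);
            Process process = new ProcessBuilder(executable.getAbsolutePath())
                .directory(unpackedContainer)
                .inheritIO()
                .start();
            if (process.waitFor() != 0) {
                throw new IllegalStateException(script + " failed in " + unpackedContainer);
            }
        }
    }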

The advantage of this strategy is that it minimises the amount of application-specific knowledge that can leak into the pipeline, empowering development teams to use whatever tools they deem necessary regardless of release management. A change in web server, build tool, or programming language should not necessitate a new pipeline version.

Pipeline Antipattern: Deployment Build

Continuous Integration “Has A” Continuous Delivery is the wrong way around

Eric Minick has written a thought-provoking assessment of Continuous Delivery and Continuous Integration tooling, which includes a variant of The Golden Hammer:

“When all you have is a Continuous Integration system, everything looks like a build”

This leads to an antipattern Eric and I refer to as Deployment Build, in which application deployments are tacked onto a Continuous Integration system by treating them as pseudo-builds. While this approach may be cheap to set up, it creates a number of problems:

  • Ambiguous language – mis-communication is more likely when a deployment button is mis-labelled as a build
  • Noisy user interface – endless buttons such as “Deploy Apples To QA”, “Deploy Apples To Production”, and “Deploy Oranges To QA” hinder feedback
  • Lax security – all downstream servers must be accessible including Production
  • Increased risk – a system failure will impede Operations as well as Development

Eric describes how Deployment Build drove UrbanCode to create uDeploy independent of AntHillPro, and ThoughtWorks Go has Continuous Delivery at its heart. Jenkins now has a Continuous Delivery plugin, although to say Continuous Integration “has a” Continuous Delivery capability is incorrect. The correct relationship is the inverse.

Pipeline Pattern: Aggregate Artifact

Aggregate Artifacts can incrementally deliver complex applications

When pipelining inter-dependent applications, the strength of the pipeline architecture directly correlates to the assembly cost and scalability of the packaging solution. If the Uber-Artifact approach is tacitly accepted as a poor implementation choice, is there an alternative?

The inherent value of any packaging solution is the version manifest mapping of package name/version to constituent artifacts, and there is no reason why that manifest cannot be managed as an artifact itself. In terms of Domain-Driven Design a version manifest is a naturally occurring Aggregate, with the package name/version equating to an Aggregate Root and the constituent artifacts represented as Entities, suggesting a name of Aggregate Artifact.

In an Aggregation Pipeline, the multiple pipelines of an Integration Pipeline are collapsed into a single pipeline with multiple commit stages. A successful commit of a constituent artifact triggers the commit of an Aggregate Artifact containing the new constituent version to the binary repository. At a later date the release stage fetches the aggregate artifact and examines the pipeline metadata for each constituent. Each constituent already known to the target environment is ignored, while the previously unknown constituents are released.

There are a number of advantages to this approach:

  • Consistent release mechanism. Whether an artifact is released independently or as part of an aggregate, the same process can be used
  • No duplication of artifact persistence. Committing an aggregate artifact to the binary repository does not necessitate the re-persistence of its constituents
  • High version visibility. An aggregate artifact is human and machine readable and can be published in multiple formats e.g. email, PDF/HTML release notes
  • Lightweight incremental release process. As an aggregate artifact is a manifest, a version diff against earlier releases is easy to implement

As Aggregate Artifact persistence can be as low-tech as a properties file, the cost of the aggregate commit stage is extremely low. This means that a single Aggregate Artifact can scale to support many constituents (of which some may be Aggregate Artifacts themselves), and that failure scenarios can be easily handled.

For example, if a release of Fruit Basket 1.0 fails with the successful constituent Apples 23 and the unsuccessful constituent Oranges 49, then Stop The Line applies to Fruit Basket 1.0 and Oranges 49. Once a fix has been committed for Oranges 49, a new Fruit Basket 1.1 aggregate containing Oranges 50 and the previously successful Apples 23 can be quickly created and incrementally released to the environment.
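
Since persistence can be as low-tech as a properties file, a minimal sketch of deriving a fixed aggregate from its predecessor might look as follows; the class and key names are illustrative assumptions.

    import java.io.IOException;
    import java.io.Reader;
    import java.io.Writer;
    import java.util.Properties;

    // Sketch only: an aggregate artifact persisted as a properties file, e.g.
    //   apples=23
    //   oranges=49
    // Creating a fixed aggregate is just the previous manifest with one constituent bumped.
    public final class PropertiesAggregate {

        public void createFixedAggregate(Reader previousManifest, Writer newManifest,
                                         String fixedConstituent, String fixedVersion) throws IOException {
            Properties constituents = new Properties();
            constituents.load(previousManifest);
            constituents.setProperty(fixedConstituent, fixedVersion);
            constituents.store(newManifest, "aggregate version manifest");
        }
    }

In the Fruit Basket example above, deriving the 1.1 manifest from the 1.0 manifest with the fixed constituent ("oranges", "50") carries Apples 23 forward unchanged.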

