On Tech

Tag: Multi-Demand

Multi-Demand Operations

How can Multi-Demand Operations eliminate handoffs and adhere to ITIL? Why are Service Transition, Change Management, and Production Support activities inimical to Continuous Delivery? How can such Policy Rules be turned into ITIL-compliant Policy Guidelines that increase flow?

This is part 5 of the Strategising for Continuous Delivery series

Know Operations activities

When an organisation has IT As A Cost Centre, its IT department will consist of siloed Delivery and Operations groups. This is based on the outdated COBIT notion of sequential Plan-Build-Run activities, with Delivery teams building applications and Operations teams running them. If the Operations group has adopted ITIL Service Management, its Run activities will include:

  • Service Transition – perform operational readiness checks for an application prior to live traffic
  • Change Management – approve releases for an application with live traffic 
  • Tiered Production Support – monitor for and respond to production incidents for applications with live traffic  

Well-intentioned, hard-working Operations teams in IT As A Cost Centre will be incentivised to work in separate silos, and to implement these activities as context-free, centralised Policy Rules.

See rules as constraints

Policy Rules from Operations will inevitably inject delays and rework into a technology value stream, due to the handoffs and coordination costs involved.  One of those Policy Rules will likely constrain throughput for all applications in a high demand group, even if it has existed without complaint in lower demand groups for years.

Service Transition can delay an initial live launch by weeks or months. Handing over an application from Delivery to Operations means operational readiness is only checked at the last minute. This can result in substantial rework on operational features when a launch deadline looms, and little time is available. Furthermore, there is little incentive for Delivery teams to assess and improve operability when Operations will do it for them.

Change Management can delay a release by days or weeks. Requesting an approval means a Change Advisory Board (CAB) of Operations stakeholders must find the time to meet and assess the change, and agree a release date. An approval might require rework in the paperwork, or in the application changeset. Delays and rework are exacerbated during a Change Freeze, when most if not all approvals are suspended for days at a time. In Accelerate, Dr. Nicole Forsgren et al find a negative correlation between external approvals and throughput, and conclude “it is worse than having no change approval process at all”.

Tiered Production Support can delay failure resolution by hours or days. Raising a ticket incurs a progression from a Level 1 service desk to Level 2 support agents, and on to Level 3 Delivery teams until the failure is resolved. Non-trivial tickets will go through one or more triage queues until the best-placed responder is found. A ticket might involve rework if repeated, unilateral reassignments occur between support levels, teams, and/or individuals. This is why Jon Hall argues in ITSM and why three-tier support should be replaced with Swarming that “the current organizational structure of the vast majority of IT support organisations is fundamentally flawed”.

These Policy Rules will act as Risk Management Theatre to varying degrees in different demand groups. They are based on the misguided assumption that preventative controls on everyone will prevent anyone from making a mistake. They impede knowledge sharing, restrict situational awareness, increase opportunity costs, and actively contribute to Discontinuous Delivery.

Example – MediaTech

At MediaTech, an investment in re-architecting videogames-ui and videogames-data has increased videogames-ui deployment frequency to every 10 days. Yet the Website Services demand group has a target of 7 days, and using the Five Focussing Steps reveals Change Management is the constraint for all applications in the Website Services technology value stream.

A Multi-Demand lens shows a Change Management policy inherited from the lower demand Supplier Integrations and Heritage Apps demand groups. All Website Services releases must have an approved Normal Change, as has been the case with Supplier Integrations and Heritage Apps for years. Normal Changes have a lead time of 0-4 days. This is the most time-consuming activity in Operations, due to the handoffs between approver groups. It is the constraint on Website Services like videogames-ui.

Create ITIL guidelines

Siloed Operations activities are predicated on high compute costs, and the high transaction cost of a release. That may be true for lower demand applications in an on-premise estate. However, Cloud Computing and Continuous Delivery have invalidated that argument for high demand applications. Compute and transaction costs can be reduced to near-zero, and opportunity costs are far more significant.

The intent behind Service Transition, Change Management, and Production Support is laudable. Such Policy Rules can be re-designed as Policy Guidelines that implement ITIL principles according to the throughput target of a demand group, as well as its service management needs. High demand applications then have equivalent, lightweight activities, while lower demand applications retain the same activities as before.

Converting Operations Policy Rules into Policy Guidelines will be more palatable to Operations stakeholders if a Multi-Demand Architecture is in place, and hard dependencies have previously been re-designed to shrink failure blast radius. A deployment pipeline for high demand applications that offers extensive test automation and stable deployments is also important.

Multi-Demand Service Transition

Service Transition can be replaced by Delivery teams automating a continual assessment of operational readiness, based on ITIL standards and Operations recommendations. Operational readiness checks should include availability, request throughput, request latency, logging dashboards, monitoring dashboards, and alert rules.

There should be a mindset of continual service transition, with small batch sizes and tight production feedback loops used to identify leading signals of inoperability before a live launch. For example, an application might have automated checks for the presence of a Four Golden Signals dashboard, and Service Level Objective alerts based on Request Success Rate.
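An automated check of this kind could be sketched as follows. This is a minimal illustration rather than a real tool: the AppTelemetry descriptor, the panel names, and the request_success_rate alert name are all hypothetical, and a real check would query whichever dashboard and alerting systems are in use.

```python
from dataclasses import dataclass, field

# Hypothetical panel identifiers for a Four Golden Signals dashboard.
GOLDEN_SIGNALS = {"latency", "traffic", "errors", "saturation"}


@dataclass
class AppTelemetry:
    """Illustrative description of the telemetry an application has configured."""
    name: str
    dashboard_panels: set = field(default_factory=set)
    slo_alerts: set = field(default_factory=set)


def readiness_gaps(app: AppTelemetry) -> list:
    """Return human-readable gaps; an empty list means the checks pass."""
    gaps = []
    missing = GOLDEN_SIGNALS - app.dashboard_panels
    if missing:
        gaps.append("dashboard missing panels: " + ", ".join(sorted(missing)))
    if "request_success_rate" not in app.slo_alerts:
        gaps.append("no SLO alert on request success rate")
    return gaps


app = AppTelemetry(
    name="videogames-ui",
    dashboard_panels={"latency", "traffic", "errors"},
    slo_alerts={"request_success_rate"},
)
print(readiness_gaps(app))  # ['dashboard missing panels: saturation']
```

Run on every commit in the deployment pipeline, a check like this would surface operability gaps long before a live launch, rather than at a last-minute handover.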

Multi-Demand Change Management

Change Management can be streamlined by Delivery teams automating change approval creation and auditing. ITIL has Normal and Emergency Changes for irregular changes. It also has Standard Changes for repeatable, low risk changes which can be pre-approved electronically. Standard Changes are entirely compatible with Continuous Delivery.

Regular, low risk changes for a high demand application should move to a Standard Change template. Low risk, repeatable changes would be pre-approved for live traffic as often as necessary. The criteria for Standard Changes should be pre-agreed with Change Management. Entry criteria could be 3 successful Normal Changes, while exit criteria could be 1 failure.
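The entry and exit criteria above could be evaluated automatically from an application's change history. The sketch below is one possible reading, and assumes the 3 successful Normal Changes must be consecutive, which the criteria do not state. The function name and the boolean history format are hypothetical.

```python
def change_type_for_next_release(history, entry_successes=3):
    """Decide which change type the next release should use.

    history: list of booleans, oldest first; True means a successful change.
    Returns "standard" or "normal".
    """
    on_standard = False
    streak = 0  # consecutive successful Normal Changes (an assumption)
    for succeeded in history:
        if on_standard:
            if not succeeded:
                # Exit criterion: a single failure returns to Normal Changes.
                on_standard = False
                streak = 0
        else:
            streak = streak + 1 if succeeded else 0
            if streak >= entry_successes:
                # Entry criterion met: move to the Standard Change template.
                on_standard = True
    return "standard" if on_standard else "normal"


print(change_type_for_next_release([True, True, True]))         # standard
print(change_type_for_next_release([True, True, True, False]))  # normal
```

Keeping the thresholds as parameters means Change Management and Delivery teams can agree different criteria for different demand groups.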

Irregular, variable risk changes for high demand applications should move to team-approved Normal Changes. The approver group for low and medium risk changes would be the Delivery team, and high risk changes would have Delivery team leadership as well. Entry criteria could be 3 successful Normal Changes and 100% on operational readiness checks.

A Change Freeze should be minimised for high demand applications. For 1-2 weeks before a peak business event, there could be a period of heightened awareness that allows Standard Changes and low-risk Normal Changes only. There could be a 24 hour Change Freeze for the peak business event itself, that allows Emergency Changes only.
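These windows could be encoded as a small policy function in the deployment pipeline. The following is a sketch under stated assumptions: a 2 week awareness window, a 24 hour freeze starting when the event starts, and Emergency Changes remaining available during the awareness window (the text above does not say either way). The function name, parameters, and event date are hypothetical.

```python
from datetime import datetime, timedelta


def change_allowed(change_type, risk, when, event_start,
                   awareness=timedelta(weeks=2), freeze=timedelta(hours=24)):
    """Decide whether a change may go ahead around a peak business event.

    change_type: "standard", "normal", or "emergency"
    risk: "low", "medium", or "high"
    """
    if event_start <= when < event_start + freeze:
        # Change Freeze: Emergency Changes only.
        return change_type == "emergency"
    if event_start - awareness <= when < event_start:
        # Heightened awareness: Standard and low-risk Normal Changes.
        # Assumption: Emergency Changes also stay available here.
        if change_type in ("standard", "emergency"):
            return True
        return change_type == "normal" and risk == "low"
    return True


peak = datetime(2024, 11, 29, 0, 0)  # hypothetical peak business event
print(change_allowed("normal", "medium", peak - timedelta(days=3), peak))  # False
print(change_allowed("standard", "low", peak - timedelta(days=3), peak))   # True
print(change_allowed("standard", "low", peak + timedelta(hours=6), peak))  # False
```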

The deployment pipeline should have traceability built in. A change approval should be linked to a versioned deployment, and the underlying code, configuration, infrastructure, and/or schema changes. This should be accompanied by a comprehensive engineering effort from Delivery teams for ever-smaller changesets, so changes can remain low risk as throughput increases. This should include Expand-Contract, Decouple Release From Launch, and Canary Deployments for zero downtime deployments.

Multi-Demand Production Support

Tiered Production Support can be replaced by Delivery teams adopting You Build It, You Run It. A Level 1 service desk should remain for any applications with direct customer contact. Level 2 and Level 3 support should be performed by Delivery team engineers on-call 24/7/365 for the applications they build. 

Logging dashboards, monitoring dashboards, and alert rules should be maintained by engineers, and alert notifications should be directed to the team. In working hours, a failure should be prioritised over feature development, and be investigated by multiple team members. Outside working hours, a failure should be handled by the on-call engineer. Teams should do their own incident management and post-incident reviews.

You Build It, You Run It maximises incentives for Delivery teams to build operability into their applications from the outset of development. Operational accountability should reside with the product owner. They should have to prioritise operational features against user features, from a single product backlog. There should be an emphasis on reliable live traffic over feature development, cross-functional collaboration within and between teams, and a cross-pollination of skills. 

Example – MediaTech

At MediaTech, a prolonged investment is made in Operations activities for Website Services. The Service Transition and Tiered Production Support teams are repurposed to concentrate solely on lower demand, on-premise applications. Website Services teams take on continual service transition and You Build It, You Run It themselves. This provokes a paradigm shift in how operability is handled at MediaTech, as Website Services teams start to implement their own telemetry and share their learnings when failures occur.

Change Management agree with the Website Services teams that any application with a deployment pipeline and automated rollback can move to Standard Change after 3 successful Normal Changes. In addition, agreement is reached on experimental, team-approved Normal Changes. Applications that meet the Standard Change entry criteria and have passed all operational readiness checks no longer require CAB approval for irregular changes.

The elimination of handoffs and rework between Website Services and Operations teams means videogames-ui deployment frequency can be increased to every 5 days. The Website Services applications are finally in a state of Continuous Delivery, and the next round of improvements can begin elsewhere in the MediaTech estate.

This is part 5 of the Strategising for Continuous Delivery series

  1. Strategising for Continuous Delivery
  2. The Bimodal Delusion
  3. Multi-Demand IT
  4. Multi-Demand Architecture
  5. Multi-Demand Operations

Acknowledgements

Thanks to Thierry de Pauw for reviewing this series.

Multi-Demand Architecture

How can Multi-Demand Architecture accelerate reliability and delivery flow? Why should Policy Rules be based on Continuous Delivery predictors? What is the importance of a loosely-coupled architecture? How can architectural Policy Rules benefit Continuous Delivery and reliability?

This is Part 4 of the Strategising for Continuous Delivery series

Increase flow with policies

Policy Rules are not inherently bad. Some policies should be established across all demand groups, to drive Continuous Delivery adoption:

  • Software management should be based on Work In Progress (WIP) limits to reduce batch sizes, visual displays, and production feedback
  • Development should involve comprehensive version control, a loosely-coupled architecture, Trunk Based Development, and Continuous Integration
  • Testing should include developer-driven automated tests, tester-driven exploratory testing, and self-service test data

These practices have been validated in Accelerate as statistically significant predictors of Continuous Delivery. A loosely-coupled architecture is the most important, with Dr. Forsgren et al stating “high performance is possible with all kinds of systems, provided that systems – and the teams that build and maintain them – are loosely coupled”.

Design rules for loose coupling

Team and application architectures aligned with Conway’s Law enable applications to be deployed and tested independently, even as the number of teams and applications in an organisation increases. An application should represent a Bounded Context, and be an independently deployable unit.

The reliability level of an application cannot exceed the lowest reliability level of its hard dependencies. In particular, the reliability of an application in a lower demand group may be limited by an on-premise runtime environment. Therefore, a Policy Rule should be introduced to reduce coupling between applications, particularly those in different demand groups.
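The arithmetic behind this can be sketched in a few lines. The ceiling follows directly from the statement above. The second function goes further and assumes failures are independent and every hard dependency must be up for a request to succeed, which is an assumption of this sketch rather than a claim from this series. The 99.9% figure for the application itself is hypothetical.

```python
def reliability_ceiling(own, hard_dependencies):
    """An application is never more reliable than its weakest hard dependency."""
    return min([own] + list(hard_dependencies))


def serial_reliability(own, hard_dependencies):
    """Estimate assuming independent failures and all dependencies required."""
    result = own
    for dependency in hard_dependencies:
        result *= dependency
    return result


# Hypothetical: a 99.9% application with one 99.0% hard dependency.
print(reliability_ceiling(0.999, [0.99]))  # 0.99
print(round(serial_reliability(0.999, [0.99]), 5))
```

Under the independence assumption the estimate falls slightly below the ceiling, because the application's own failures and its dependency's failures both count against it.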

Data should be stored in the same demand group as its consumers, with an asynchronous push if it continues to be mastered in a lower demand group. Interactions between applications should be protected with stability patterns such as Circuit Breakers and Bulkheads. This will allow teams to shift from Optimising For Robustness to Optimising For Resilience, and achieve new levels of reliability.
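A Circuit Breaker can be sketched as below. This is a deliberately small, single-threaded illustration of the pattern, not a production implementation: the class name, thresholds, and injectable clock are choices made for this example, and a real service would more likely use an established resilience library.

```python
import time


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock  # injectable so the breaker can be tested
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Open: fail fast instead of calling the dependency.
                raise CircuitOpenError("dependency call skipped")
            # Timeout elapsed: half-open, so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

A caller would wrap each request to a lower demand dependency in breaker.call(...), and serve a fallback such as cached data when CircuitOpenError is raised.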

Example – MediaTech

At MediaTech, there is a commitment to re-architecting video game dataflows. An asynchronous data push is built from videogames-data to a new videogames-details service, which transforms the data format and stores it in a cloud-based database. When this is used by videogames-ui, a reliability level of 99.9% is achieved. Reducing requests into the MediaTech data centre also improves videogames-ui latency and videogames-data responsiveness.

Unlock testing guidelines

Reducing coupling between applications in different demand groups also allows for context-free Policy Rules to be replaced with context-rich Policy Guidelines. Re-designing a policy previously inherited from a lower demand group can eliminate constraints in a high demand group, and result in dramatic improvements in delivery flow. A Policy Rule that all applications must do End-To-End Testing can be replaced with a Policy Guideline that high demand applications do Contract Testing, while lower demand applications continue to do End-To-End Testing. Such a Policy Guideline could be revisited later on for lower demand applications unable to meet their own throughput target.

At MediaTech, the End-To-End Testing between videogames-ui and videogames-data is stopped. Website Services teams take on more testing responsibilities, with Contract Testing used for the videogames-data asynchronous data push. Eliminating testing handoffs increases videogames-ui deployment frequency to every 10 days, but every 7 days remains unattainable due to operational handoffs.
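To illustrate the idea behind Contract Testing, here is a heavily simplified sketch of a consumer-driven contract check for a data push. The field names, the contract format, and the sample message are all hypothetical, and dedicated contract testing tools do considerably more than this.

```python
# Hypothetical contract: the fields the consumer relies on, with expected types.
CONSUMER_CONTRACT = {"id": str, "title": str, "price_pence": int}


def contract_violations(message, contract=CONSUMER_CONTRACT):
    """Compare a provider message against the consumer's expectations."""
    problems = []
    for field_name, expected_type in contract.items():
        if field_name not in message:
            problems.append(f"missing field: {field_name}")
        elif not isinstance(message[field_name], expected_type):
            problems.append(f"wrong type for {field_name}")
    return problems


# A sample message the provider's own build would generate and verify.
# Extra fields such as "genre" are tolerated, as the consumer ignores them.
sample = {"id": "vg-1", "title": "Example Game", "price_pence": 4999, "genre": "puzzle"}
print(contract_violations(sample))  # []
```

Because the provider verifies the contract in its own pipeline, neither team has to wait for a shared End-To-End Testing environment before deploying.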


Multi-Demand IT

What is Multi-Demand IT? How does it provide the means to drive a Continuous Delivery programme with incremental investments, according to product demand?

This is Part 3 of the Strategising for Continuous Delivery series

Introduction

Multi-Demand IT is a transformation strategy that recommends investing in groups of technology value streams, according to their product demand. While Bimodal IT recommends upfront, capital investments based on an architectural division of applications, Multi-Demand favours gradual investments in Continuous Delivery across an IT estate based on product Cost of Delay.

A technology value stream is a sequence of activities that converts product ideas into value-adding changes. A demand group is a set of applications in one or more technology value streams, with a shared throughput target that must be met for Continuous Delivery to be achieved. There may also be individual reliability targets for applications within a group, based on their criticality levels.
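These definitions could be modelled as a small data structure. The sketch below is a simplification that reduces the throughput target to a deployment interval in days, and ignores lead time and reliability targets. The class and field names are hypothetical, and the figures are borrowed from the MediaTech example used throughout this series.

```python
from dataclasses import dataclass


@dataclass
class DemandGroup:
    name: str
    target_interval_days: int  # shared throughput target for the group
    deployment_intervals: dict  # application name -> current days between deployments

    def discontinuous(self):
        """Applications that deploy less often than the group target."""
        return sorted(
            app for app, days in self.deployment_intervals.items()
            if days > self.target_interval_days
        )

    def in_continuous_delivery(self):
        """True only when every application meets the shared target."""
        return not self.discontinuous()


website_services = DemandGroup("Website Services", 7, {"videogames-ui": 10})
print(website_services.discontinuous())           # ['videogames-ui']
print(website_services.in_continuous_delivery())  # False
```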

Uncover demand groups

An IT department should have at least three demand groups representing high, medium, and low throughput targets. This links to Dr. Nicole Forsgren’s research in The Role of Continuous Delivery in IT and Organizational Performance, and Simon Wardley’s Pioneers, Settlers, and Town Planners model in The Only Structure You’ll Ever Need. Additional demand groups representing very high and very low throughput targets may emerge over time. Talented, motivated people are needed to implement Continuous Delivery within the unique context of each demand group.

Multi-Demand creates a Continuous Delivery investment language. Demand groups make it easier to prioritise which applications are in a state of Discontinuous Delivery, and need urgent improvement. The aim is to incrementally invest until Continuous Delivery is achieved for all applications in a demand group.

Applications will rarely move between demand groups. If market disruption or upstream dependents cause a surge in product demand, a rip and replace migration will likely be required, as a higher demand group will have its own practices, processes, and tools. When product demand has been filled for an application, its deployment target is adjusted for a long tail of low investment. The new deployment target will retain the same lead time as before, with a longer interval between deployments. This ensures the application remains launchable on demand.

A high or medium demand group should contain a single technology value stream. This means all applications with similar demand undergo the same activities and tasks. This reduces cognitive load for teams, and ensures all applications will benefit from a single flow efficiency gain. A low demand group is more likely to have multiple technology value streams, especially if some of its applications are part of a legacy estate.

Example – MediaTech

Assume MediaTech adopts Multi-Demand for its IT transformation. There is a concerted effort to assess technology value streams, and forecast product demand. As a result, the following demand groups are created:

videogames-ui is in the sole Website Services technology value stream, while videogames-data is in one of the Heritage Applications technology value streams.

Create Multi-Demand policies

A demand group will have a policy set which determines its practices, processes, and tools. Inspired by Cynefin, a policy can be a:

  • Policy Fix: single group, such as heightened permissions for teams in a specific group
  • Policy Rule: multi-group single implementation, such as mandatory use of a central incident management system for all groups
  • Policy Guideline: multi-group multi-implementation, such as mandatory test automation with different techniques in each group
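The three policy types above can be told apart mechanically, by counting how many demand groups a policy covers and how many distinct implementations it has. A minimal sketch, in which the function name and the dictionary format are hypothetical:

```python
from enum import Enum


class PolicyKind(Enum):
    FIX = "Policy Fix"
    RULE = "Policy Rule"
    GUIDELINE = "Policy Guideline"


def classify_policy(implementations):
    """implementations: dict of demand group name -> implementation name."""
    if not implementations:
        raise ValueError("a policy must apply to at least one demand group")
    if len(implementations) == 1:
        return PolicyKind.FIX  # single group
    if len(set(implementations.values())) == 1:
        return PolicyKind.RULE  # multi-group, single implementation
    return PolicyKind.GUIDELINE  # multi-group, multi-implementation


test_automation = {
    "Website Services": "Contract Testing",
    "Heritage Apps": "End-To-End Testing",
}
print(classify_policy(test_automation).value)  # Policy Guideline
```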

A policy will shape one or more activities and tasks within a technology value stream. Each demand group should have a minimal set of policies, as Little’s Law dictates the higher the throughput target, the fewer activities and tasks must exist. Furthermore, applying the Theory Of Constraints to Continuous Delivery shows throughput in a technology value stream will likely be constrained by the impact of a single policy on a single activity.
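Little’s Law states that, in a stable system, average work in progress equals throughput multiplied by average lead time. Rearranged, it gives the lead time a value stream can afford for a given throughput target. The sketch below works through one hypothetical case: a target of one deployment every 7 days with a single changeset in flight. The function name and the WIP figures are assumptions made for illustration.

```python
from fractions import Fraction


def average_lead_time_days(average_wip, throughput_per_day):
    """Little's Law rearranged: lead time = WIP / throughput."""
    return Fraction(average_wip) / Fraction(throughput_per_day)


target = Fraction(1, 7)  # one deployment every 7 days

# With one changeset in flight, all activities must fit within 7 days.
print(average_lead_time_days(1, target))  # 7

# With two changesets in flight, the same throughput tolerates 14 days.
print(average_lead_time_days(2, target))  # 14
```

The tighter the throughput target for a given amount of work in progress, the less lead time is left to share between activities, which is why a single slow activity can consume most of the budget.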

At MediaTech, the Multi-Demand lens shows videogames-data is in a state of Continuous Delivery while videogames-ui is in Discontinuous Delivery. This is due to the inheritance of End-To-End Testing, CAB meetings, and central production support policies from Heritage Apps, which has lower product demand and a very different context.

Policy Rules should be treated with caution, as they ignore the context and throughput target of a particular demand group. A Policy Rule can easily incur handoffs and rework that constrain throughput in a high demand group, even if it has existed for lower demand groups for years. This can be resolved by turning a Policy Rule into a Policy Guideline, and re-designing an activity per demand group. For example, End-To-End Testing might be in widespread use for all medium and low demand applications. It will likely need to be replaced with Contract Testing or similar for high demand applications.


The Bimodal delusion

Why is Bimodal IT so fundamentally flawed? Why is it just a rehash of brownfield versus greenfield IT? What is the delusion that underpins it?

This is Part 2 of the Strategising for Continuous Delivery series

Introduction

Bimodal IT is a notoriously bad method of IT transformation. In 2014, Simon Mingay and Mary Mesaglio of Gartner recommended in How to Be Digitally Agile Without Making a Mess that organisations split their IT departments in two. The authors proposed a Mode 1 for predictability and stability of traditional backend applications, and a Mode 2 for exploration and speed of digital frontend services. They argued this would allow an IT department to protect high risk, low change systems of record, while experimenting with low risk, high change systems of engagement.

Example – MediaTech

For example, a MediaTech organisation has an on-premise application estate with separate development, testing, and operations teams. Product stakeholders demand an improvement from monthly to weekly deployments and from 99.0% to 99.9% reliability, so a commitment is made to Bimodal. Existing teams continue to work in the Mode 1 on-premise estate, while new teams of developers and testers start on Mode 2 cloud-based microservices.

This includes a Mode 2 videogames-ui team, who work on a new frontend that synchronously pulls data from a Mode 1 videogames-data backend application.

Money for old rope

Bimodal is a transformation strategy framed around technology-centric choices, that recommends capital investment in systems of engagement only. It is understandable why these choices might appeal to IT executives responsible for large, mixed estates of applications. Saying Continuous Delivery is only for digital frontend services can be a rich source of confirmation bias for people accustomed to modernisation failures.

However, the truth is Bimodal is just money for old rope. The Bimodal division between Mode 1 and Mode 2 is the same brownfield versus greenfield dichotomy that has existed since the Dawn Of Computer Time. Bimodal has the exact same problems:

  • Mode 1 teams will find it hard to recruit and retain talented people
  • Mode 1 teams will trap the domain knowledge needed by Mode 2 teams
  • Mode 2 teams will depend on Mode 1 teams
  • Mode 2 services will depend on Mode 1 applications

The dependency problems are critical. Bimodal architecture is predicated on frontend services distinct from backend applications, yet the former will inevitably be coupled to the latter. A Mode 2 service will have a faster development speed than its Mode 1 dependencies, but its deployment throughput will be constrained by inherited Mode 1 practices such as End-To-End Testing and heavyweight change management. Furthermore, the reliability of a Mode 2 service can at best equal that of its least reliable Mode 1 dependency.

At MediaTech, the videogames-ui team are beset by problems:

  • Any business logic change in videogames-ui requires End-To-End Testing with videogames-data
  • Any failure in videogames-data prevents customer purchases in videogames-ui
  • Mode 1 change management practices still apply, including CABs and change freezes
  • Mode 1 operational practices still apply, such as a separate operations team and detailed handover plans pre-release

As a result, the videogames-ui team are only able to achieve fortnightly deployments and 99.0% reliability, much to the dissatisfaction of their product manager.

The delusion

This is the Bimodal delusion – that stability and speed are a zero-sum game. As Jez Humble explains in The Flaw at the Heart of Bimodal IT, “Gartner’s model rests on a false assumption that is still pervasive in our industry: that we must trade off responsiveness against reliability”. Peer-reviewed academic research by Dr. Nicole Forsgren et al such as The Role of Continuous Delivery in IT and Organizational Performance has proven this to be categorically false. Increasing deployment frequency does not need to have a negative impact on costs, quality, or reliability.


Strategising for Continuous Delivery

What strategy should an IT department adopt to incrementally and iteratively transform itself? How should different methods of software development, testing, and operations be managed at the same time?

Introduction

Organisations that wish to remain competitive in the years to come must explore new offerings, expand successful differentiators, and exploit established products. A 21st century, digital first organisational model of IT As A Business Differentiator focussed on cloud computing, smart mobile devices, and big data analytics is required.

This is tremendously difficult when an organisation has the 20th century, pre-Internet IT As A Cost Centre organisational model. This refers to disparate Product and IT departments, in which IT is a cost centre with fixed scope, fixed resource, and fixed deadline projects. Segregated development, testing, and operations teams mired in long-term Discontinuous Delivery are ill-equipped to rapidly build and run products.

Strategy is choice

An organisation-wide transformation from IT As A Cost Centre to IT As A Business Differentiator is an arduous, multi-year journey. It is hard to know where to invest in Continuous Delivery when an organisation has a large, mixed estate of well-established on-premise applications and emergent cloud applications. Such applications can act as significant revenue generators, regardless of deployment frequency and runtime environment. This means visionary leadership and a sense of urgency are required from IT executives, and a Continuous Delivery strategy.

A strategy is not a vision, a plan, a list of best practices, or an affirmation of the status quo. A strategy is a unified set of choices including a desire to succeed, a declaration on where to succeed, and a statement of how to succeed. It is a commitment to hard choices, amongst options with asymmetric value.

An effective Continuous Delivery strategy can be used to establish a culture of Continuous Improvement, powered by the Improvement Kata. It allows a programme of radical, far-reaching changes to be built around the choices made. This helps people to understand, and be motivated by, changes to how teams work, how they interact, and the processes and tools they use.

On that basis, how can an IT department devise a Continuous Delivery strategy? How can it incrementally and iteratively transform itself?

This is part 1 of the Strategising for Continuous Delivery series


© 2024 Steve Smith
