Deployment pipeline anti-patterns

I was visiting a prospect a few weeks ago when I was delighted to run into Kingsley Hendrickse, a former colleague at ThoughtWorks who left to study martial arts in China. He’s now back in London working as a tester. We were discussing the deployment pipeline pattern, which he somewhat sheepishly informed me he wasn’t a fan of. Of course I took the scientific view that there couldn’t possibly be anything wrong with the theory, and that the problem must be with the implementation.

Kingsley’s problem was that his team had implemented a deployment pipeline such that it was only possible to self-service a new build into his exploratory testing environment once the acceptance tests had been run against it. This typically took an hour or two. So when he found a bug and it was fixed by a developer, he had to wait ages before he could deploy the build with the fix into his testing environment to check it.

This problem results from a combination of two anti-patterns that are common when creating a deployment pipeline: insufficient parallelization, and over-constraining your pipeline workflow.

Insufficient parallelization

The deployment pipeline can be broadly separated into two parts – the automated part and the manual part. The automated part consists of the automated tests that get run against every commit, which prove (assuming the tests are good enough) that the application is production-ready. In an ideal world with lots of processing power (and near-zero power consumption), this part of the pipeline would consist of a single stage that built the system and executed all these tests in parallel.

However, if you have a reasonably sized app with good test coverage, running all the tests (including acceptance tests) on a single box will take over a day. Even with a build grid and parallelization, you’re unlikely to get it down to a few minutes, which is an acceptable length of time for the development team to get feedback on their changes. Thus the pipeline gets split into two stages – a commit stage and an acceptance stage.

The commit stage contains mainly unit tests and gives you a high level of confidence that you haven’t introduced any regressions with your latest change. It runs in a few minutes (ten minutes is the absolute maximum). The commit stage also runs any analysis tools for producing internal code quality reports (e.g. test coverage and cyclomatic complexity), and produces an environment-independent packaged version of your application that is used for all later stages of the pipeline (including release). The acceptance stage contains all the rest of the automated tests and usually takes longer – an hour or two.
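As a rough illustration (this is not the configuration syntax of Go or any other CI server, just a sketch of the shape), the two automated stages might be described like this:

```python
# Illustrative only: a hand-rolled description of the two automated stages,
# not any particular CI server's configuration format.
PIPELINE = {
    "commit": {
        "trigger": "every version-control commit",
        "time_budget_minutes": 10,
        "jobs": [
            "compile and create an environment-independent package",
            "unit (commit) tests",
            "code quality reports: coverage, cyclomatic complexity",
        ],
        "output": "versioned package reused by every later stage, including release",
    },
    "acceptance": {
        "trigger": "automatically, for every package that passes the commit stage",
        "time_budget_minutes": 120,
        "input": "the package produced by the commit stage",
        "jobs": ["end-to-end acceptance tests, split across a build grid"],
    },
}
```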

Both of these stages should be parallelized so they execute as fast as possible. For example, it is usually possible to have the commit stage run across several boxes: one to build packages, one to do analysis, and a few to run the commit tests. Similarly, you can run acceptance tests in parallel – on the Mingle project, at the time of writing, they have 3,282 end-to-end acceptance tests that run across 53 boxes. The whole stage takes just over an hour to run.[1]
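The core of that parallelization is deciding which tests run on which box. Below is a minimal sketch of one approach, balancing partitions by each test’s historical runtime, which is roughly the problem that tools like TLB automate; the function and data names are hypothetical.

```python
# A sketch of splitting a large acceptance suite across many boxes by
# balancing historical runtimes, so all boxes finish at about the same time.
import heapq

def partition_tests(test_times, num_boxes):
    """Greedy longest-runtime-first assignment of tests to boxes.

    test_times: dict mapping test name -> runtime in seconds from the last run
    num_boxes:  number of boxes in the build grid
    """
    # Min-heap of (total_seconds_assigned_so_far, box_index)
    heap = [(0.0, i) for i in range(num_boxes)]
    heapq.heapify(heap)
    partitions = [[] for _ in range(num_boxes)]
    # Hand out the slowest tests first; always give the next test to the
    # box with the least work so far.
    for test, seconds in sorted(test_times.items(), key=lambda kv: kv[1], reverse=True):
        total, box = heapq.heappop(heap)
        partitions[box].append(test)
        heapq.heappush(heap, (total + seconds, box))
    return partitions

# e.g. partition_tests(times_from_last_run, num_boxes=53)
```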

The heuristic is:

  • Make your pipeline wide, not long: reduce the number of stages as much as possible, and parallelize each stage as much as you can.
  • Only create more stages if doing so is necessary to optimize feedback.

Inflexible workflow

Even if Kingsley’s team had followed the pattern above and aggressively parallelized their tests, they might not have got the lead time from check-in to a build being available for manual testing down sufficiently. But they also made another mistake: they only allowed a build to be deployed to manual testing after the acceptance test stage had passed. Presumably this was to prevent manual testers from wasting their time on builds that weren’t known to be good.

Some of the time, this is reasonable. Often, however, it isn’t – and the tester will know whether or not they want to see the acceptance tests pass, so they should get to choose whether or not to wait.

When people design pipelines, they usually have in their heads a picture of what the ‘ideal’ process should look like, and often they think linearly. The result might look something like Figure 1: a linear obstacle course that builds must overcome to prove their fitness for release.

Figure 1: Linear pipeline

However, the problem with this design is that it prevents teams from optimizing their process. For example, it makes it difficult to manage emergencies. If you need to push a fix out quickly, you might re-organize your team on the fly, parallelizing tasks that would normally be performed in series, such as capacity testing and exploratory testing. That is impossible with the pipeline design above.

So the same rule applies – the pipeline should fan out as soon as it makes sense to do so. In the case of exploratory testing environments, all builds that pass the commit tests should be available. In the case of staging and production environments, all builds that pass both the commit and acceptance tests should be available. Arguably – unless you’re using the blue-green deployments pattern, in which staging becomes production – builds should pass through staging before they can be deployed to production. But if you have comprehensive end-to-end acceptance tests that run in a production-like environment, you might allow builds to be deployed directly to production.
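In code, the fan-out rule is little more than a lookup of which stages each environment requires. A minimal sketch, with hypothetical stage and environment names:

```python
# Which automated stages a build must have passed before a person may choose
# to deploy it to each environment (names here are illustrative).
REQUIRED_STAGES = {
    "exploratory-testing": {"commit"},
    "staging": {"commit", "acceptance"},
    "production": {"commit", "acceptance"},
}

def deployable_builds(builds, environment):
    """Return the builds that are available for self-service deployment.

    builds: iterable of objects with a `passed_stages` set attribute
    """
    required = REQUIRED_STAGES[environment]
    return [build for build in builds if required <= build.passed_stages]
```

The point is that the gate is a set of prerequisites rather than a serial chain: a build does not have to wait for the acceptance stage before a tester can pull it into exploratory testing.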

Figure 2: Optimized pipeline

Of course all deployments can be made subject to approvals. The important thing is not to conflate your workflow with your approval process by requiring builds to go through multiple different stages serially. Instead, make it easy for the people doing the approval to see which parts of the delivery process each build has been through, and what the results were, so they have the information they need to make decisions – such as which build to deploy, or whether a particular build should be deployed – at their fingertips.

Go showing which environments a build has been deployed to

There are cases where the arrangement shown in Figure 2 is not sufficiently linear – for example, when integrating several applications and then promoting them through staging and live in a deployment train. However, in general the heuristic is:

  • Make your pipeline wide, not long: allow for resources to be redistributed so manual work can be performed in parallel if required.
  • Prefer visibility to locking down: give people the information they need to make informed decisions, rather than constraining them.

Conclusion

As part of your practice of continuous improvement, evolve your deployment pipeline to enable teams to become more efficient. In general, aim to make pipelines as wide and short as possible through aggressive parallelization, and avoid chains of deployments in which builds must pass through multiple stages before they become available for release. Rather, make it easy for the person performing the deployment to judge what can – and should – be deployed.

And of course, follow the Deming cycle: measure the average cycle time for builds to pass through the pipeline, and the lead time to each environment, to judge the effect each change you make has on the efficiency of your delivery process, and continue to optimize accordingly.
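These measurements are cheap to compute once the pipeline records timestamps. A minimal sketch, assuming you can extract a commit time and a deployment time per environment for each build (the record format here is hypothetical):

```python
# A sketch of computing average lead time from commit to each environment,
# given timestamps recorded by the pipeline.
from statistics import mean

def lead_time_hours(commit_time, deploy_time):
    return (deploy_time - commit_time).total_seconds() / 3600

def average_lead_times(records):
    """records: iterable of (commit_time, {environment: deploy_time}) pairs."""
    per_env = {}
    for commit_time, deployments in records:
        for env, deployed_at in deployments.items():
            per_env.setdefault(env, []).append(lead_time_hours(commit_time, deployed_at))
    return {env: mean(times) for env, times in per_env.items()}
```

Track these numbers over time and the effect of each change to the pipeline, such as a new stage, more parallelization, or a relaxed gate, shows up directly in the lead times.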


[1] My colleagues created an open source project called TLB, which you can use to run your tests in parallel across multiple boxes.

  • http://www.threeriversinstitute.org/blog Kent Beck

    Another potential optimization is to prioritize tests so brand-new and recently-failed tests run first. This greatly increases the information value of the first fraction of the testing time, as most tests pass most of the time (the number of failures over the lifetime of a test is power-law distributed). This is what JUnit Max does to improve average response time.

    • http://github.com/itspanzi Pavan Sudarshan

      This is what TLB (Test load balancer – the one Jez mentions in the footnote) does as well on CI. This means whatever tests failed in the previous run, get run first in the current run.

  • http://policystat.com Wes Winham

    One way of minimizing the developer => tester cycle time (and widening the pipeline) is to enable testers to run the system on their machine with whatever version they’d like. That allows you to keep your shared test/stable deployments healthy without hard-coding a lag between a commit and anyone being able to use it.

    Using scripts or a tool like Vagrant to keep provisioning easy, a tester has full flexibility to do out of band manual testing when needed. Since you’re likely already doing the work to allow developers to run a local version of the system, it seems like a good idea to go the extra step to make that something that non-developers can use.

  • Glenn Brown

    Our organization is trying to find ways to get out of the first anti-pattern by increasing the number of development and QA environments and allowing for independent, parallel deployments.

    One way I have thought of handling this is to have feature-team-specific dev and QA environments where builds and deployments are modular and built for speed – only deploying the aspects of the app that the current feature team regularly works on, while allowing them to kick off a full app build and deploy as needed.

    I’m wondering if this is something other teams have tried out in the past.

  • http://www.developertesting.net Ben

    Tests taking up to or over a day to run are far from typical, right?

    I’ve worked on projects with tens of thousands of tests (unit, integration, acceptance) and have never seen a test suite that takes more than a few hours to run.

    Parallelisation within the test frameworks is of course an option here if your tests are taking too long.

  • jez

    @Ben

    I would expect that on any reasonably complex system, a comprehensive suite of automated end-to-end acceptance tests would take more than a day to run if you ran them on a single box. Both IMVU and Mingle fall into this category (Mingle has over 3,000 Selenium tests). Ultimately the litmus test for your automated test coverage is this: when the tests pass, am I confident beyond reasonable doubt that I can push to production without breaking anything?

    When you have big suites of tests, you want to run them in parallel on a grid / cloud. Mingle uses a grid of about 60 boxes to get feedback in less than an hour; IMVU gets feedback on their tests in 15 minutes or so, I think.

    Go provides this functionality out of the box (see “test intelligence”) and tells you exactly which tests were broken by which check-ins. You can achieve something similar with some extra work using the open source tools.

    And of course you have to make sure your tests are designed to be run in parallel, which can be the hard bit if you didn’t plan for this from the beginning.

  • http://www.thinkinginagile.com Rachael

    Thanks Jez
    One other anti-pattern I have observed is where the commit build and the long-running test build (typically the functional build) run in parallel, the commit build fails, but the team carries on with the functional build regardless. The linkage between the commit build and the functional build is not well established.

    Another anti-pattern and issue I have come across is teams not having the right test suite organization: unit, functional, and non-functional test cases are not grouped properly. Sometimes the unit tests are not actually unit tests, so when they run in the commit build it obviously takes more time. Parallelization also becomes a big challenge when the test cases are not grouped properly.
    Just a thought. Thanks