There’s No Such Thing as a “Devops Team”

Translations: 한국말

“it’s possible for good people, in perversely designed systems, to casually perpetrate acts of great harm on strangers, sometimes without ever realising it.” — Ben Goldacre, Bad Pharma, p. xi

In a fit of rage caused by reading yet another email in which one of our customers proposed creating a “devops team” so as to “implement” devops, I tweeted that “THERE IS NO SUCH THING AS A DEVOPS TEAM.” Like all slogans, there’s plenty of heat to go with the light, so here’s the scoop: the Devops movement addresses the dysfunction that results from organizations composed of functional silos. Thus, creating another functional silo that sits between dev and ops is clearly a poor (and ironic) way to try and solve these problems. Devops proposes instead strategies to create better collaboration between functional silos, or doing away with the functional silos altogether and creating cross-functional teams (or some combination of these approaches).

Why are Functional Silos Problematic?

Functional silos often get created in reaction to a problem (which they inevitably exacerbate). At the beginning of an interview with Elisabeth Hendrickson I posted recently, she discusses working at a product company which was suffering a series of quality problems. As a result, they hired a VP of QA who set up a QA division. The net result of this, counterintuitively, was to increase the number of bugs. One of the major causes of this was that developers felt that they were no longer responsible for quality, and instead focussed on getting their features into “test” as quickly as they could. Thus they paid less attention to making sure the system was of high quality in the first place, which in turn put more stress on the testers. This created a death spiral of increasingly poor quality, which led to increasing stress on the testers, and so on. Elisabeth wrote this up in a paper called “Better testing – worse quality?” back in 2001.

The fundamental problem is this: Bad behavior arises when you abstract people away from the consequences of their actions. Functional silos abstract people away from the consequences of their actions. In the example above, developers are abstracted away from the consequences of writing buggy code.

The essence of Devops, I believe, is to design a system in which people are held responsible for the consequences of their actions – and indeed, one in which the right thing to do is also the easiest thing to do.

There are two steps involved in doing this:

  1. Make people aware of the consequences of their actions. You can do this by having developers rotate through operations teams, by having operations people attend developer standups and showcases, running lunch and learn sessions, having people blog, or just by going and grabbing lunch with someone working in a different functional silo to yours.
  2. Make people responsible for the consequences of their actions. This is where things get serious. You can achieve this by having developers carry pagers, or own the service level agreements for the products and services they build (for example, the dev team is L3 support, and is on the hook for the uptime of the service).

A major reason people can’t move to step two in the plan is that most large organizations just aren’t set up in a way that makes this possible. The culprit here is the fact that software development efforts are usually run as if they were civil engineering projects. When a project is complete, the system gets tossed over the wall to operations to run and maintain as part of a “business as usual” effort, and all the people in the project team get reallocated to new work. The project model is fundamentally flawed as a way of doing software development – software development should be treated as product development instead.

The best way of all to make people responsible for the consequences of their actions is to create cross-functional teams for each product or service. As Werner Vogels, CTO of Amazon, says: “you build it, you run it.” (It’s worth reading this excellent interview in full).

A really bad way to try and solve this problem is to insert another layer of indirection between the dev and ops team, and call it a “devops team”. This is what I mean when I am arguing against creating a “devops team” – in addition to the existing dev and ops teams – whose job is to be on the hook for the deployment of the system (these teams were traditionally called “release management” before devops became trendy).

Why Segregation of Duties Doesn’t Work

Sometimes, people argue that this model is impossible because of some law or regulation (for example, Sarbanes-Oxley, PCI-DSS) or some framework (ITIL, COBIT) which mandates segregation of duties. Segregation of duties is essentially the idea that the fox shouldn’t guard the henhouse: that the job of the testing or operations group is to act as a set of checks and balances to prevent fraud or buggy code created by developers getting into production.

It’s important to point out first of all that this approach doesn’t work, for exactly the reasons discussed in Elisabeth Hendrickson’s paper. It’s an example of what I call “risk management theatre” (by analogy with security theatre) – like the TSA’s enhanced airport security, it “accomplish[es] nothing at enormous cost”, giving the impression that you’re managing the risk of making changes to the production environment, while actually making the situation worse.

A colleague of mine describes a (thankfully retired) change management process at a large European manufacturer. Developers filled in a spreadsheet with seven tabs, which was then emailed to a change manager in another country who had to decide whether or not to approve it. The change manager had no clue what was written in the spreadsheet, and talked to the developers to understand whether the change was risky and what mitigation strategies were in place. The developers knew this, and did the minimum possible amount of work to fill in the spreadsheet. The change manager knew the developers were not doing the most thorough job with the spreadsheet, but it made no difference to them, so long as the spreadsheet got submitted.

This is not risk management, it’s risk management theatre.

And this argument – that collaboration between silos, or even cross-functional teams, is forbidden by regulation or “best practice” – is an example of what we in the consulting industry call a bullshit smokescreen. So let me be clear about this. Sarbanes-Oxley, ITIL and COBIT nowhere mandate segregation of duties. COBIT v5 doesn’t even have a control called “segregation of duties”. PCI-DSS does require segregation of duties in its current form, but that doesn’t mean people can’t collaborate. I recently filmed Michael Rembetsy, director of operations engineering at Etsy, talking about how they implement segregation of duties at Etsy in order to achieve PCI-DSS compliance.

The Role of Operations

OK so I lied when I said there’s no such thing as a devops team.

For developers to take responsibility for the systems they create, they need support from operations to understand how to build reliable software that can be continuously deployed to an unreliable platform that scales horizontally. They need to be able to self-service environments and deployments. They need to understand how to write testable, maintainable code. They need to know how to do packaging, deployment, and post-deployment support.

Somebody needs to support the developers in this, and if you want to call the people who do that the “devops team”, then I’m OK with that. The crucial thing is this: the “devops team” is not on the hook for the systems that get built, or for deploying them, or writing the build and deployment scripts, or for the operation of those systems. Nor should there be “devops specialists” on development teams doing this work: this is core developer work, the same as writing code, and developers need to own it.

Here’s what the devops team does in this model:

  • Builds a platform that allows developers to self-service environments for testing and production (and deployments to those environments) and provides metrics to the organization as a whole. This platform is a product, and the team that builds it is doing product development, which (in particular) means the people who use the platform are your customers.
  • Provides a toolchain that developers can use to build, test, deploy and run their systems.
  • Coaches teams working to move to this model and provides support and training for the platform and toolchain.
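As a rough illustration of the first bullet, here is a minimal sketch in Python of what "self-service" could look like from a development team's point of view. Everything in it is hypothetical: `SelfServicePlatform`, `Environment`, the method names and the team and environment names are invented for this example and do not correspond to any real tool. The sketch assumes that teams create their own environments, that only the owning team deploys to them, and that the platform merely records metrics.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Environment:
    """One environment that a team has provisioned for itself."""
    name: str
    owner_team: str
    purpose: str  # "testing" or "production"
    deployed_version: Optional[str] = None


class SelfServicePlatform:
    """Hypothetical in-memory stand-in for the platform team's product."""

    PURPOSES = ("testing", "production")

    def __init__(self) -> None:
        self._environments: Dict[str, Environment] = {}
        self._deploys_per_team: Counter = Counter()

    def create_environment(self, team: str, name: str, purpose: str) -> Environment:
        """Let a team provision an environment without filing a ticket."""
        if purpose not in self.PURPOSES:
            raise ValueError(f"unknown purpose: {purpose!r}")
        if name in self._environments:
            raise ValueError(f"environment already exists: {name!r}")
        env = Environment(name=name, owner_team=team, purpose=purpose)
        self._environments[name] = env
        return env

    def deploy(self, team: str, env_name: str, version: str) -> None:
        """Deploy a version; assumption: only the owning team may do this."""
        env = self._environments[env_name]
        if env.owner_team != team:
            raise PermissionError(f"{team!r} does not own {env_name!r}")
        env.deployed_version = version
        self._deploys_per_team[team] += 1

    def metrics(self) -> Dict[str, int]:
        """Deployment counts per team, visible to the whole organization."""
        return dict(self._deploys_per_team)


# Usage: the dev team, not the platform team, creates and deploys.
platform = SelfServicePlatform()
platform.create_environment("payments", "payments-test", "testing")
platform.deploy("payments", "payments-test", "1.4.2")
print(platform.metrics())  # {'payments': 1}
```

The point of the sketch is the division of responsibility rather than the mechanics: the platform offers the capability and the metrics, while the decision to deploy, and the accountability for it, stays with the team that owns the environment.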

Really this is all part of the work of operations. But if you want to call the people who do it your “devops team” then that’s cool too.


Further Reading

  • I am all in favour of change management, so long as it is done in a lightweight manner, as described here.
  • My colleague Joanne Molesky and I wrote a paper which talks about devops, continuous delivery and risk management in an enterprise context for Cutter IT Journal. You can download it for free here.
  • I gave a talk which expands on a number of these issues at GOTO Aarhus in 2011. Video | Slides

Comments

  • http://www.ingineering.it/ Jeff Sussna

    Great post. Need for awareness + responsibility is key. Re the role of operations, I wouldn’t call it a “devops” team. I think doing so just kicks the devops misunderstanding can down the road. Instead, I’d call it something like a “delivery platform” team, since what they’re doing is building and supporting a delivery platform. Note that this team might or might not apply good DevOps practices themselves.

  • GonzoI

    Oddly enough, this sounds a lot like what small shops do for sheer lack of employees to divide duties among. And just as you outline, our customers are far happier with our code product than what any vendor has provided us – when we otherwise follow good coding practices.

    Unfortunately, success derived from this does go to some people’s heads with the argument “we’re a small shop, we don’t need that overhead” or “it has to work, not be pretty under the hood”. It’s important to remember that owning your code and your mistakes does not mean cowboy coding is acceptable again.

  • http://twitter.com/EricMinick Eric Minick

    I was ready to be angry and shout, “Of course a DevOps team can make sense” but you laid it out wonderfully in the final section. A DevOps infrastructure / platform team that provides this stuff as a service internally is an awesome pattern.

    I don’t think it’s “DevOps” but a release management group that owns the end-to-end delivery and drags Dev and Ops into the same room is also a huge improvement over the status quo at most shops. It’s better that someone owns things end-to-end than nobody, and if that someone can facilitate better Dev-Ops collaboration, all the better. Again it fails the purity tests for DevOps, but a person between the groups is better than a wall.

  • Mark Colburn

    What you describe as “Devops” is very similar to what Google has as Site Reliability Engineering. (http://research.google.com/pubs/pub32583.html) SRE is the team that is responsible for maintaining large scale services and making sure that they are reliable and maintainable. However, SRE does not take over from a development team until a certain bar is met to ensure that a given service is supportable, maintainable, documented, and reliable.

    Once that bar is met, SRE becomes primarily responsible for the running of the system in production. This frees up developers to go back to working on features. SRE engineers are a combination of system administrators and developers, and continue to work closely with the Development team to keep the system working optimally. They will do architectural reviews with the engineering team, and may also suggest optimizations or availability/reliability changes to the system. Those features may be implemented by either Engineering or SRE.

    The process works very well at Google, but the appropriate cultural and political structure and safeguards need to be in place, and it may not work as well in other locations.

    • jryding

      Google also has the same sort of organization for how its Testing group works with Development. Engineers in the testing group are not employed to write test cases for products in Google; their job is to improve the testing infrastructure and framework that products write tests with. Now this may include product code refactoring to make it more testable, or actual test framework code – but the point is that they are not on the hook to write the test cases.

      On top of this, the product teams actually do not get access to these Test Engineers until they have proven to the test organization that they take automated testing seriously.

      I recommend checking out “How Google Tests Software” (http://www.amazon.com/Google-Tests-Software-James-Whittaker/dp/0321803027) if you want to learn more about this topic. It’s a great read for DevOps and people who care about software quality.

  • http://twitter.com/klangberater Alexander Grosse

    oh Jez, not that example again :-) I am still embarrassed when reading it… Good article!

    • http://continuousdelivery.com/ Jez Humble

      You know, I am so grateful to you for that story. No need to be embarrassed – it wasn’t your fault! Thanks again…

  • http://dev2ops.org Damon Edwards

    Spot on, Jez.

    Another great point for the case that DevOps is really a management problem!

    • http://continuousdelivery.com/ Jez Humble

      Thanks. You can get a bunch of the stuff in step 1 done without management (although of course it’s easier with). Step 2 definitely requires it in spades.

  • http://twitter.com/onCommit onCommit

    Well said.

  • Rob Mullen

    I’m all about the concept of devops, especially when abstracted to the fundamental problem of holding people accountable for the consequences of their actions.

    What I’m wondering is what happens down the road, when products are moved into maintenance mode? Currently we have teams that will support these products in sustaining mode. Do they become completely responsible for the uptime of all the apps that they inherit? If each team was empowered to solve their problems as they see fit, we could have a very difficult maintenance problem on our hands.

    It’s worth thinking about where devops practices put us 5 years from now.

  • http://www.facebook.com/vanbachbn Johnny Nguyen

    On top of this, the product teams actually do not get access to these Test Engineers until they have proven to the test organization that they take automated testing seriously. I recommend checking out: http://www.youtube.com/watch?v=y-5JxiDL1Lc

  • http://twitter.com/gilhoffer Gil Hoffer

    Couldn’t agree more.
    I think a big part of the problem is that today in most organizations (mostly in those where the cloud is not yet highly utilized), building self-service tools and toolchains to build, test, deploy and run complex systems is highly challenging, and the entry barrier is quite high.
    This basically pushes many companies towards an old-fashioned, manual and segregated release-management-like process, which is the opposite of what the DevOps movement is preaching, all while using the DevOps title as a fig leaf for being inherently non-agile in these processes.

    I think this is starting to change, and will change more drastically in the years to come, as more and more tools, platforms and frameworks emerge and make these operations much easier for the everyday developer.

    Gil
    http://www.ravellosystems.com

  • http://twitter.com/chris_topinka Chris Topinka

    Collaboration becomes much more difficult when you’re trying to convince a larger audience and are required to run up and back down the pole as many times as someone that isn’t really involved in building the system needs to see it. Collaboration works more naturally when smaller, cross functional teams can make decisions independently in an adequately tested environment and can exchange their skill sets on a daily basis.

  • http://twitter.com/plutora Plutora Inc

    Refreshing to read your post Jez.
    CI and CD have been around for ages and so has teamwork across silos. The ability to communicate and collaborate as opposed to just using ITSM tools and ticketing systems is always going to be a struggle for enterprise size clients. Hence the “Put the info in the ticket” mentality as opposed to old-fashioned communication.

  • http://betaprogram.com/ Vik Chaudhary

    I just did a search for all people in my LinkedIn network who currently have “devops” in their title, and I found 262. Out of 1754 contacts, that’s 15%. Looks like the data belies the assertion.