There’s No Such Thing as a “Devops Team”

Translations: Korean (한국말)

“it’s possible for good people, in perversely designed systems, to casually perpetrate acts of great harm on strangers, sometimes without ever realising it.” — Ben Goldacre, Bad Pharma, p. xi

In a fit of rage caused by reading yet another email in which one of our customers proposed creating a “devops team” so as to “implement” devops, I tweeted that “THERE IS NO SUCH THING AS A DEVOPS TEAM.” Like all slogans, there’s plenty of heat to go with the light, so here’s the scoop: the Devops movement addresses the dysfunction that results from organizations composed of functional silos. Thus, creating another functional silo that sits between dev and ops is clearly a poor (and ironic) way to try and solve these problems. Devops proposes instead strategies to create better collaboration between functional silos, or doing away with the functional silos altogether and creating cross-functional teams (or some combination of these approaches).

Why are Functional Silos Problematic?

Functional silos often get created in reaction to a problem (which they inevitably exacerbate). At the beginning of an interview with Elisabeth Hendrickson I posted recently, she discusses working at a product company which was suffering a series of quality problems. As a result, they hired a VP of QA who set up a QA division. The net result of this, counterintuitively, was to increase the number of bugs. One of the major causes of this was that developers felt that they were no longer responsible for quality, and instead focussed on getting their features into “test” as quickly as they could. Thus they paid less attention to making sure the system was of high quality in the first place, which in turn put more stress on the testers. This created a death spiral of increasingly poor quality, which led to increasing stress on the testers, and so on. Elisabeth wrote this up in a paper called “Better testing – worse quality?” back in 2001.

The fundamental problem is this: Bad behavior arises when you abstract people away from the consequences of their actions. Functional silos abstract people away from the consequences of their actions. In the example above, developers are abstracted away from the consequences of writing buggy code.

The essence of Devops, I believe, is to design a system in which people are held responsible for the consequences of their actions – and indeed, one in which the right thing to do is also the easiest thing to do.

There are two steps involved in doing this:

  1. Make people aware of the consequences of their actions. You can do this by having developers rotate through operations teams, by having operations people attend developer standups and showcases, running lunch and learn sessions, having people blog, or just by going and grabbing lunch with someone working in a different functional silo to yours.
  2. Make people responsible for the consequences of their actions. This is where things get serious. You can achieve this by having developers carry pagers, or own the service level agreements for the products and services they build (for example, the dev team is L3 support, and is on the hook for the uptime of the service).

A major reason people can’t move to step two in the plan is that most large organizations just aren’t set up in a way that makes this possible. The culprit here is the fact that software development efforts are usually run as if they were civil engineering projects. When a project is complete, the system gets tossed over the wall to operations to run and maintain as part of a “business as usual” effort, and all the people in the project team get reallocated to new work. The project model is fundamentally flawed as a way of doing software development – software development should be treated as product development instead.

The best way of all to make people responsible for the consequences of their actions is to create cross-functional teams for each product or service. As Werner Vogels, CTO of Amazon, says: “you build it, you run it.” (It’s worth reading this excellent interview in full).

A really bad way to try and solve this problem is to insert another layer of indirection between the dev and ops team, and call it a “devops team”. This is what I mean when I am arguing against creating a “devops team” – in addition to the existing dev and ops teams – whose job is to be on the hook for the deployment of the system (these teams were traditionally called “release management” before devops became trendy).

Why Segregation of Duties Doesn’t Work

Sometimes, people argue that this model is impossible because of some law or regulation (for example, Sarbanes-Oxley, PCI-DSS) or some framework (ITIL, COBIT) which mandates segregation of duties. Segregation of duties is essentially the idea that the fox shouldn’t guard the henhouse: that the job of the testing or operations group is to act as a set of checks and balances to prevent fraud or buggy code created by developers getting into production.

It’s important to point out first of all that this approach doesn’t work, for exactly the reasons discussed in Elisabeth Hendrickson’s paper. It’s an example of what I call risk management theatre (by analogy with security theatre) – like the TSA’s enhanced airport security, it “accomplish[es] nothing at enormous cost”, giving the impression that you’re managing the risk of making changes to the production environment, while actually making the situation worse.

A colleague of mine discusses a (thankfully retired) change management process at a large European manufacturer which involves developers filling in a spreadsheet with seven tabs which then gets emailed to a change manager in another country who has to decide whether or not to approve it. The change manager has no clue what’s written in the spreadsheet, and talks to the developers to understand if the change is risky and what mitigation strategies are in place. The developers know this, and do the minimum possible amount of work to fill in the spreadsheet. The change manager knows the developers are not doing the most thorough job with the spreadsheet, but it makes no difference to them, so long as the spreadsheet gets submitted.

This is not risk management, it’s risk management theatre.

And this argument – that collaboration between silos, or even cross-functional teams, is forbidden by regulation or “best practice” – is an example of what we in the consulting industry call a bullshit smokescreen. So let me be clear about this. Sarbanes-Oxley, ITIL and COBIT nowhere mandate segregation of duties. COBIT v5 doesn’t even have a control called “segregation of duties”. PCI-DSS does require segregation of duties in its current form, but that doesn’t mean people can’t collaborate. I recently filmed Michael Rembetsy, director of operations engineering at Etsy, talking about how they implement segregation of duties at Etsy in order to achieve PCI-DSS compliance.

The Role of Operations

OK so I lied when I said there’s no such thing as a devops team.

For developers to take responsibility for the systems they create, they need support from operations to understand how to build reliable software that can be continuously deployed to an unreliable platform that scales horizontally. They need to be able to self-service environments and deployments. They need to understand how to write testable, maintainable code. They need to know how to do packaging, deployment, and post-deployment support.

Somebody needs to support the developers in this, and if you want to call the people who do that the “devops team”, then I’m OK with that. The crucial thing is this: the “devops team” is not on the hook for the systems that get built, or for deploying them, or writing the build and deployment scripts, or for the operation of those systems. Nor should there be “devops specialists” on development teams doing this work: this is core developer work, the same as writing code, and developers need to own it.

Here’s what the devops team does in this model:

  • Builds a platform that allows developers to self-service environments for testing and production (and deployments to those environments) and provides metrics to the organization as a whole. This platform is a product, and the team that builds it is doing product development, which (in particular) means the people who use the platform are your customers.
  • Provides a toolchain that developers can use to build, test, deploy and run their systems.
  • Coaches teams working to move to this model and provides support and training for the platform and toolchain.

Really this is all part of the work of operations. But if you want to call the people who do it your “devops team” then that’s cool too.


Further Reading

  • I am all in favour of change management, so long as it is done in a lightweight manner, as described here.
  • My colleague Joanne Molesky and I wrote a paper which talks about devops, continuous delivery and risk management in an enterprise context for Cutter IT Journal. You can download it for free here.
  • I gave a talk which expands on a number of these issues at GOTO Aarhus in 2011. Video | Slides

Comments

  • http://www.ingineering.it/ Jeff Sussna

    Great post. Need for awareness + responsibility is key. Re the role of operations, I wouldn’t call it a “devops” team. I think doing so just kicks the devops misunderstanding can down the road. Instead, I’d call it something like a “delivery platform” team, since what they’re doing is building and supporting a delivery platform. Note that this team might or might not apply good DevOps practices themselves.

  • GonzoI

    Oddly enough, this sounds a lot like what small shops do for sheer lack of employees to divide duties among. And just as you outline, our customers are far happier with our code product than what any vendor has provided us – when we otherwise follow good coding practices.

    Unfortunately, success derived from this does go to some people’s heads with the argument “we’re a small shop, we don’t need that overhead” or “it has to work, not be pretty under the hood”. It’s important to remember that owning your code and your mistakes does not mean cowboy coding is acceptable again.

  • http://twitter.com/EricMinick Eric Minick

    I was ready to be angry and shout, “Of course a DevOps team can make sense” but you laid it out wonderfully in the final section. A DevOps infrastructure / platform team that provides this stuff as a service internally is an awesome pattern.

    I don’t think it’s “DevOps” but a release management group that owns the end-to-end delivery and drags Dev and Ops into the same room is also a huge improvement over the status quo at most shops. It’s better that someone owns things end-to-end than nobody, and if that someone can facilitate better Dev-Ops collaboration, all the better. Again it fails the purity tests for DevOps, but a person between the groups is better than a wall.

  • Mark Colburn

    What you describe as “Devops” is very similar to what Google has as Site Reliability Engineering. (http://research.google.com/pubs/pub32583.html) SRE is the team that is responsible for maintaining large scale services and making sure that they are reliable and maintainable. However, SRE does not take over from a development team until a certain bar is met to ensure that a given service is supportable, maintainable, documented, and reliable.

    Once that bar is met, SRE becomes primarily responsible for the running of the system in production. This frees up developers to go back to working on features. SRE engineers are a combination of system administrators and developers, and continue to work closely with the Development team to keep the system working optimally. They will do architectural reviews with the engineering team, and may also suggest optimizations or availability/reliability changes to the system. Those features may be implemented by either Engineering or SRE.

    The process works very well at Google, but the appropriate cultural and political structure and safeguards need to be in place, and it may not work as well in other locations.

    • jryding

      Google also has the same sort of organization for how its Testing group works with Development. Engineers in the testing group are not employed to write test cases for products at Google; their job is to improve the testing infrastructure and framework that products write tests with. Now this may include product code refactoring to make it more testable, or actual test framework code – but the point is that they are not on the hook to write the test cases.

      On top of this, the product teams actually do not get access to these Test Engineers until they have proven to the test organization that they take automated testing seriously.

      I recommend checking out “How Google Tests Software” (http://www.amazon.com/Google-Tests-Software-James-Whittaker/dp/0321803027) if you want to learn more about this topic. It’s a great read for DevOps and people who care about software quality.

  • http://twitter.com/klangberater Alexander Grosse

    oh Jez, not that example again :-) I am still embarrassed when reading it… Good article!

    • http://continuousdelivery.com/ Jez Humble

      You know, I am so grateful to you for that story. No need to be embarrassed – it wasn’t your fault! Thanks again…

  • http://dev2ops.org Damon Edwards

    Spot on, Jez.

    Another great point for the case that DevOps is really a management problem!

    • http://continuousdelivery.com/ Jez Humble

      Thanks. You can get a bunch of the stuff in step 1 done without management (although of course it’s easier with). Step 2 definitely requires it in spades.

  • http://twitter.com/onCommit onCommit

    Well said.

  • Rob Mullen

    I’m all about the concept of devops, especially when abstracted to the fundamental problem of holding people accountable for the consequences of their actions.

    What I’m wondering is what happens down the road, when products are moved into maintenance mode? Currently we have teams that will support these products in sustaining mode; do they become completely responsible for the uptime of all the apps that they inherit? If each team was empowered to solve their problems as they see fit, we could have a very difficult maintenance problem on our hands.

    It’s worth thinking about where devops practices put us 5 years from now.

  • http://twitter.com/gilhoffer Gil Hoffer

    Couldn’t agree more.
    I think a big part of the problem is that today in most organizations (mostly in those where the cloud is not yet highly utilized), building self-service tools and toolchains for building, testing, deploying and running complex systems is highly challenging, and the entry barrier is quite high.
    This basically pushes many companies towards an old-fashioned, manual and segregated release-management-like process, which is the opposite of what the DevOps movement is preaching, all while using the DevOps title as a fig leaf for being inherently non-agile in these processes.

    I think this is starting to change, and will change more drastically in the years to come, as more and more tools, platforms and frameworks emerge and make these operations much easier for the everyday developer.

    Gil
    http://www.ravellosystems.com

  • http://twitter.com/chris_topinka Chris Topinka

    Collaboration becomes much more difficult when you’re trying to convince a larger audience and are required to run up and back down the pole as many times as someone that isn’t really involved in building the system needs to see it. Collaboration works more naturally when smaller, cross functional teams can make decisions independently in an adequately tested environment and can exchange their skill sets on a daily basis.

  • http://twitter.com/plutora Plutora Inc

    Refreshing to read your post Jez.
    CI and CD have been around for ages and so has teamwork across silos. The ability to communicate and collaborate as opposed to just using ITSM tools and ticketing systems is always going to be a struggle for enterprise-size clients. Hence the “put the info in the ticket” mentality, as opposed to old-fashioned communication.

  • http://betaprogram.com/ Vik Chaudhary

    I just did a search for all people in my LinkedIn network who currently have “devops” in their title, and I found 262. Out of 1754 contacts, that’s 15%. Looks like the data belies the assertion.

    • http://continuousdelivery.com/ Jez Humble

      I have no problem with people saying they have devops skills. What I am objecting to is taking the people with those skills and putting them into a team where their job is to try and fix all the problems caused by organizational silos, rather than using them to grow devops capability in the rest of the organization.

      • http://www.ingineering.it/ Jeff Sussna

        Jez,

        How would you define ‘devops skills’? IME most of the time ‘devops’ on the resume really means Chef/Puppet/Jenkins.

        • http://continuousdelivery.com/ Jez Humble

          As gun.io has it, “a developer with some experience and knowledge as a system administrator, or possibly a system administrator with some experience and knowledge as a programmer.”

          Ultimately, it’s a mindset thing – somebody who understands that abstractions are leaky, and who doesn’t consider some part of the creation and operation of a system to be “somebody else’s job”. Someone who is as happy editing an IP routing table or optimizing a database schema as they are refactoring code or writing automated tests. That’s not to say he or she is an expert in all these things – that would be impossible – but someone who has dabbled in all of them and has an understanding of the principles behind – and interrelationships between – all of these things. Somebody who has a good handle on systems thinking and has developed heuristics for navigating complex systems.

          In an interview, Jesse Robbins says a standard interview question at Amazon was to ask what happens when a user hits the Amazon.com homepage. IIRC if you couldn’t fill at least one hour with the answer to that single question, you failed.

          • http://www.ingineering.it/ Jeff Sussna

            Ah, so “a developer who’s worked at a SaaS startup” #ducks :-).

          • http://continuousdelivery.com/ Jez Humble

            That was my career path :-)

  • la6470

    I have been a sysadmin in a fairly large organization for many years and we had automated builds and configuration through kickstart, jump start and a series of bash scripts. We have tripwire for detecting configuration changes and various alerting mechanisms. Nobody had any problem with it. Until someone mentioned DevOps, and now a simple change request across multiple systems takes weeks, as the DevOps team is backlogged and the change is not on their voodoo agile board and what not… Previously we could just write a shell script and push it to multiple systems using a simple shell multiplexer or BMC in a couple of hours. Obviously our customers are now thrilled with the DevOps team :) . This is what happens when you have too many unemployed developers… ha ha ha.

  • http://joeyguerra.com/ Joey Guerra

    I was so ready to jump on the bandwagon when I read the post title because devops is more of a philosophy than a role and I disagree with the tactic of creating a separate team called devops. But as I read, I got the feeling that the post is saying that the software engineers should just take more responsibility for the system and that the operational engineers are just there to support the software engineers, and that’s where I realized that there IS such a thing as a devops team. However, it’s not what you think.

    Devops is a cross functional team of operational, software, and test engineers who, as a team, are responsible for building, deploying and monitoring a system. So in that regard, the devops team is most certainly on the hook for the systems that get built, deploying them, writing the build deployment scripts and the operations of them. Not just the software engineers, but the team as a whole. I think part of the problem is that typically operational and test engineers are viewed as shared resources, providing a service. And the software engineers are project or product resources, effectively dividing them. If we start viewing the systems we build as products and the people building them as a team, then we can start referring to that team as our devops team. Which would then make sense to me. I’d be on board with that.

  • PJ Wysota

    Almost got me… You are right at most points… but two:
    1. Processes, regardless of what You call them (ITIL, Kanban, “Shitty face” or whatever), happen in an organization. The larger the organization, the more people are involved, and You are faced with a situation where You have to trust or You have to control. Both ways are good… until shit happens – failure, or blocking the organization due to overloaded control. Furthermore, it is up to the organization how they want their process to look. If the organization is going for simplicity – processes may be simple (even with SOX involved). But if they want more control – be prepared for multistage forms and workflows… It’s always about business need, not developers’ wishes.
    2. I cannot imagine a ChM not being able to assess the business/service risk. For me that means he/she does not care about what’s going on… Although I’ve seen a lot in my career, I cannot imagine a person taking care of a process (regardless of framework or concept) who cannot assess the risks related to his/her job. FIRE HIM!!! Although, on second thought – it might have happened – simply due to not understanding what Change Management actually is, on both the ChM and the developers’ sides. The simplest test for ChM is any big failure due to human mistake – a developer would say it happens. A service-oriented person would say – the process went wrong, if we were able to put a human error into production. Changes, regardless of the way they are processed, are to provide SAFE and OPERATIONAL solutions. Why? Because most of the services for which You would consider ChM to be in place are bringing money into the organization: e-commerce platforms, websites, e-mail (the possibility to communicate). Failure in operations means failure in earning pennies, which means – someone will not get his salary.

    and some of my perspective for ChM, devops etc.
    1. Releases are changes
    2. Any change attached to release is also a change
    3. Changes need to be on the safe side – planned (or at least thought of in terms of how? why? what? when?) until we do them just to bring stg to life
    4. Regardless of model U work with, first rule is: THINK
    5. Second rule: DONT RUSH, even if they push U
    6. Don’t do 10 things at a time – one of the oldest command rules of the Roman Empire’s legions was that one commander can manage 5 units max. Above that – he gets lost.
    7. If You take the action – U take responsibility – expect to be blamed, if anything goes wrong
    8. Understanding what U are doing and what impact it has is part of Ur f..ng job,
    9. If anyone is using things that You want to change – inform prior to change
    10. If anyone is making money via things You want to change – involve him – discuss, plan together (get acceptance for ur actions – both of U will be blamed then)
    11. Expect the unexpected – although tests are for pussies, IT is the only human activity where shit is happening more often just for no reason – testing may show U this sentence is ALWAYS true
    12. Simple things are often complex – so dont work with complex ones (they are impossible to handle)
    13 (last). Processes are to be performed by PEOPLE, and supported by machines and SW. No process can be performed automatically. That’s why do not trust anything U dont control.
