There's No Such Thing as a "Devops Team"
Published 19 October 2012 | Translations: 한국말
"it's possible for good people, in perversely designed systems, to casually perpetrate acts of great harm on strangers, sometimes without ever realising it." -- Ben Goldacre, Bad Pharma, p. xi
In a fit of rage caused by reading yet another email in which one of our customers proposed creating a "devops team" so as to "implement" devops, I tweeted that "THERE IS NO SUCH THING AS A DEVOPS TEAM." As with all slogans, there's plenty of heat to go with the light, so here's the scoop: the Devops movement addresses the dysfunction that results from organizations composed of functional silos. Thus, creating another functional silo that sits between dev and ops is clearly a poor (and ironic) way to try to solve these problems. Devops instead proposes strategies for creating better collaboration between functional silos, or for doing away with the functional silos altogether and creating cross-functional teams (or some combination of these approaches).
Why are Functional Silos Problematic?
Functional silos often get created in reaction to a problem (which they inevitably exacerbate). At the beginning of an interview with Elisabeth Hendrickson I posted recently, she discusses working at a product company which was suffering a series of quality problems. As a result, they hired a VP of QA who set up a QA division. The net result of this, counterintuitively, was to increase the number of bugs. One of the major causes of this was that developers felt that they were no longer responsible for quality, and instead focussed on getting their features into "test" as quickly as they could. Thus they paid less attention to making sure the system was of high quality in the first place, which in turn put more stress on the testers. This created a death spiral of increasingly poor quality, which led to increasing stress on the testers, and so on. Elisabeth wrote this up in a paper called "Better testing - worse quality?" back in 2001.
The fundamental problem is this: Bad behavior arises when you abstract people away from the consequences of their actions. Functional silos abstract people away from the consequences of their actions. In the example above, developers are abstracted away from the consequences of writing buggy code.
The essence of Devops, I believe, is to design a system in which people are held responsible for the consequences of their actions - and indeed, one in which the right thing to do is also the easiest thing to do.
There are two steps involved in doing this:
- Make people aware of the consequences of their actions. You can do this by having developers rotate through operations teams, by having operations people attend developer standups and showcases, by running lunch and learn sessions, by having people blog, or just by going and grabbing lunch with someone working in a different functional silo to yours.
- Make people responsible for the consequences of their actions. This is where things get serious. You can achieve this by having developers carry pagers, or own the service level agreements for the products and services they build (for example, the dev team is L3 support, and is on the hook for the uptime of the service).
The best way of all to make people responsible for the consequences of their actions is to create cross-functional teams for each product or service. As Werner Vogels, CTO of Amazon, says: "you build it, you run it." (It's worth reading this excellent interview in full.)
A really bad way to try to solve this problem is to insert another layer of indirection between the dev and ops teams, and call it a "devops team". This is what I mean when I argue against creating a "devops team" - in addition to the existing dev and ops teams - whose job is to be on the hook for the deployment of the system (these teams were traditionally called "release management" before devops became trendy).
Why Segregation of Duties Doesn't Work
Sometimes, people argue that this model is impossible because of some law or regulation (for example, Sarbanes-Oxley, PCI-DSS) or some framework (ITIL, COBIT) which mandates segregation of duties. Segregation of duties is essentially the idea that the fox shouldn't guard the henhouse: that the job of the testing or operations group is to act as a set of checks and balances that prevents fraud or buggy code created by developers from getting into production.
It's important to point out first of all that this approach doesn't work, for exactly the reasons discussed in Elisabeth Hendrickson's paper. It's an example of what I call risk management theatre (by analogy with security theatre) - like the TSA's enhanced airport security, it "accomplish[es] nothing at enormous cost", giving the impression that you're managing the risk of making changes to the production environment, while actually making the situation worse.
A colleague of mine discusses a (thankfully retired) change management process at a large European manufacturer which involves developers filling in a spreadsheet with seven tabs which then gets emailed to a change manager in another country who has to decide whether or not to approve it. The change manager has no clue what's written in the spreadsheet, and talks to the developers to understand if the change is risky and what mitigation strategies are in place. The developers know this, and do the minimum possible amount of work to fill in the spreadsheet. The change manager knows the developers are not doing the most thorough job with the spreadsheet, but it makes no difference to them, so long as the spreadsheet gets submitted.
This is not risk management, it's risk management theatre.
And this argument - that collaboration between silos, or even cross-functional teams, is forbidden by regulation or "best practice" - is an example of what we in the consulting industry call a bullshit smokescreen. So let me be clear about this. Sarbanes-Oxley, ITIL and COBIT nowhere mandate segregation of duties. COBIT v5 doesn't even have a control called "segregation of duties". PCI-DSS does require segregation of duties in its current form, but that doesn't mean people can't collaborate. I recently filmed Michael Rembetsy, director of operations engineering at Etsy, talking about how they implement segregation of duties at Etsy in order to achieve PCI-DSS compliance.
The Role of Operations
OK, so I lied when I said there's no such thing as a devops team.
For developers to take responsibility for the systems they create, they need support from operations to understand how to build reliable software that can be continuously deployed to an unreliable platform that scales horizontally. They need to be able to self-service environments and deployments. They need to understand how to write testable, maintainable code. They need to know how to do packaging, deployment, and post-deployment support.
Somebody needs to support the developers in this, and if you want to call the people who do that the "devops team", then I'm OK with that. The crucial thing is this: the "devops team" is not on the hook for the systems that get built, or for deploying them, or writing the build and deployment scripts, or for the operation of those systems. Nor should there be "devops specialists" on development teams doing this work: this is core developer work, the same as writing code, and developers need to own it.
Here's what the devops team does in this model:
- Builds a platform that allows developers to self-service environments for testing and production (and deployments to those environments) and provides metrics to the organization as a whole. This platform is a product, and the team that builds it is doing product development, which (in particular) means the people who use the platform are your customers.
- Provides a toolchain that developers can use to build, test, deploy and run their systems.
- Coaches teams working to move to this model and provides support and training for the platform and toolchain.
Really this is all part of the work of operations. But if you want to call the people who do it your "devops team" then that's cool too.
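To make the division of responsibility concrete, here is a minimal sketch of what a self-service platform of this kind might look like from a developer's point of view. Everything in it is hypothetical and invented for illustration (the `Platform` and `Environment` classes, the method names, the metric keys); it is not taken from any real tool. The point it tries to show is that the platform team builds the product, while the team that owns a service provisions its own environments and performs its own deployments.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Environment:
    """A hypothetical environment record held by the platform."""
    name: str                                # e.g. "checkout-test"
    owner_team: str                          # the product team that requested it
    deployed_version: Optional[str] = None   # nothing deployed yet


@dataclass
class Platform:
    """Hypothetical self-service platform built by the operations team."""
    environments: Dict[str, Environment] = field(default_factory=dict)
    metrics: Counter = field(default_factory=Counter)

    def create_environment(self, team: str, service: str, stage: str) -> Environment:
        # Any product team can provision an environment without filing a ticket.
        env = Environment(name=f"{service}-{stage}", owner_team=team)
        self.environments[env.name] = env
        self.metrics["environments_created"] += 1
        return env

    def deploy(self, team: str, env_name: str, version: str) -> None:
        # Only the owning team deploys its service; the platform team
        # provides the mechanism but is not the deployer.
        env = self.environments[env_name]
        if env.owner_team != team:
            raise PermissionError(f"{team} does not own {env_name}")
        env.deployed_version = version
        # Metrics are recorded per team and visible to the whole organization.
        self.metrics[f"deployments:{team}"] += 1


# Usage: the (hypothetical) checkout team serves itself.
platform = Platform()
platform.create_environment(team="checkout", service="checkout", stage="test")
platform.deploy(team="checkout", env_name="checkout-test", version="1.4.2")
print(platform.metrics["deployments:checkout"])
```

In this sketch the ownership check lives in the platform, so a deployment attempted by anyone other than the owning team fails loudly. That is one possible way of encoding "you build it, you run it" in tooling, not the only one.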
Further Reading
- I am all in favour of change management, so long as it is done in a lightweight manner, as described here.
- My colleague Joanne Molesky and I wrote a paper which talks about devops, continuous delivery and risk management in an enterprise context for Cutter IT Journal. You can download it for free here.
- I gave a talk which expands on a number of these issues at GOTO Aarhus in 2011. Video | Slides