Make Large Scale Changes Incrementally with Branch By Abstraction

Published 05 May 2011


Many development teams make heavy use of branches in version control, and distributed version control systems make branching even more convenient. As a result, one of the more controversial statements in Continuous Delivery is that you can't do continuous integration while using branches: by definition, if your code is sitting on a branch, it isn't integrated. One common situation in which branching seems the obvious choice is a large-scale change to your application. However, there is an alternative to branching: a technique called branch by abstraction.

Branch by abstraction: a pattern for making large-scale changes to your application incrementally on mainline.

The example Paul Hammant provides in his original blog entry on this technique is moving from Hibernate to iBatis. As it so happens, Go, the continuous integration and agile release management platform I work on, is presently moving from iBatis to Hibernate, and has been doing so for over a year now. We are also slowly moving our UI over from Velocity and JsTemplate to JRuby on Rails.

Both of these changes are being done slowly and incrementally, at the same time as developing new features, while checking in to mainline on our Mercurial repository multiple times a day. How do we do it?

Moving from iBatis to Hibernate

The team decided to move from iBatis to Hibernate for two reasons: first, because we controlled our database schema, we could make effective use of Hibernate's object-relational mapping, which saved us from writing lots of custom SQL; and second, because its second-level cache improved performance.

Of course we didn't want to move the whole codebase over at once. So as we started adding new functionality that required new calls to the database, we added these new calls using Hibernate, moving over old calls that used iBatis as required.

It is relatively straightforward to update the persistence logic incrementally because the Go codebase uses a standard layered architecture, with controllers using services that in turn use repositories. Because all the database access code is encapsulated in repository classes using the repository pattern, it's a simple case of incrementally changing one repository class at a time from using iBatis to Hibernate. The service layer has no idea of the underlying persistence framework.
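To make the repository-as-abstraction idea concrete, here is a minimal Java sketch. The names (`Pipeline`, `PipelineRepository`, `IbatisPipelineRepository`, `HibernatePipelineRepository`, `PipelineService`) are hypothetical and not taken from Go's codebase, and in-memory maps stand in for the real iBatis and Hibernate calls so that the example runs on its own. The point is that the service depends only on the interface, so each repository can be switched over independently.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain object used only for this sketch.
record Pipeline(String name, int counter) {}

// The abstraction: the service layer depends only on this interface.
interface PipelineRepository {
    void save(Pipeline pipeline);
    Optional<Pipeline> findByName(String name);
}

// Old implementation. Real code would execute iBatis mapped statements;
// an in-memory map stands in here so the sketch is self-contained.
class IbatisPipelineRepository implements PipelineRepository {
    private final Map<String, Pipeline> rows = new HashMap<>();

    public void save(Pipeline pipeline) { rows.put(pipeline.name(), pipeline); }

    public Optional<Pipeline> findByName(String name) {
        return Optional.ofNullable(rows.get(name));
    }
}

// New implementation. Real code would use a Hibernate Session; again an
// in-memory map stands in for the database.
class HibernatePipelineRepository implements PipelineRepository {
    private final Map<String, Pipeline> rows = new HashMap<>();

    public void save(Pipeline pipeline) { rows.put(pipeline.name(), pipeline); }

    public Optional<Pipeline> findByName(String name) {
        return Optional.ofNullable(rows.get(name));
    }
}

// Service layer: knows nothing about which persistence framework is used.
class PipelineService {
    private final PipelineRepository repository;

    PipelineService(PipelineRepository repository) { this.repository = repository; }

    // Increments the pipeline's run counter, creating the pipeline on first use.
    Pipeline trigger(String name) {
        Pipeline next = repository.findByName(name)
                .map(p -> new Pipeline(p.name(), p.counter() + 1))
                .orElseGet(() -> new Pipeline(name, 1));
        repository.save(next);
        return next;
    }
}
```

Switching a repository then means changing only the constructor argument (or the dependency injection wiring), for example `new PipelineService(new HibernatePipelineRepository())`; the service code itself is untouched.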

My colleague Pavan K S says "a major requirement of branch by abstraction is the discipline that your developers never add to the old style of things. That means you would, as a rule of thumb, not add an iBatis query - even if it's easier or faster to do so. You have to take that hit and do it in Hibernate. That is the only way you can make sure you are progressing. One way to enforce this is to fail a build when there is a new iBatis query added. You can only ever reduce the count and not increase it."
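Failing the build when a new iBatis query appears can be implemented as a "ratchet". The Java sketch below is one possible approach, not Go's actual build check: it assumes iBatis 2-style SqlMap XML, counts statement elements with a regular expression, and compares the total against a baseline that the team records and lowers over time. The class name `IbatisRatchet` and the list of element names it matches are assumptions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical build check: the number of iBatis statements may only go down.
final class IbatisRatchet {
    // Matches opening tags of SqlMap statement elements such as <select id="...">.
    // The \b stops it matching nested elements like <selectKey>, and closing
    // tags like </select> are not matched because of the slash.
    private static final Pattern STATEMENT =
            Pattern.compile("<(select|insert|update|delete|statement|procedure)\\b");

    private IbatisRatchet() {}

    // Counts statement elements in the text of one SqlMap XML file.
    static int countStatements(String sqlMapXml) {
        Matcher matcher = STATEMENT.matcher(sqlMapXml);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    // Fails if the count has grown; otherwise returns it as the new baseline.
    static int check(int currentCount, int baseline) {
        if (currentCount > baseline) {
            throw new IllegalStateException("iBatis statements increased from "
                    + baseline + " to " + currentCount
                    + "; write new queries with Hibernate instead");
        }
        return currentCount;
    }
}
```

In a build you would sum `countStatements` over every SqlMap file, call `check` with the stored baseline, and commit the returned value as the new baseline so the limit only ever moves down.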

Moving from Velocity and JsTemplate to JRuby on Rails

We also wanted to move over from a Java-based UI stack to a JRuby on Rails stack, both because it was much easier to write tests for this stack, and because it speeded up UI development. Again, this change was made incrementally. When we created a new page in the application, we would create it using the JRuby on Rails stack, linking to the new page from the rest of the application once it was ready.

We would also move pages over whenever we wanted to make substantial changes to them. Again, the new version of the page would be developed using the new stack, and then we'd switch URIs in the rest of the application to point to the new version of the page once it was ready. At this point, we would remove the old version of the page. So while most of the UI in Go is now implemented using JRuby on Rails, there are still a couple of pages that use the old Java stack.

However, you'd never know by looking at the pages, because they have identical styling; you have to look at the URI. Any URI that starts with /go/tab is routed through the old Velocity stack. All other URIs are routed via Rack to JRuby on Rails, which in turn calls the same Java service layer that the old UI uses.
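The URI-based split can be pictured as a tiny front controller. This Java sketch is hypothetical: in Go the dispatch is done by the servlet engine and Rack, not by a class like `UiDispatcher`, and the two handler functions simply stand in for the Velocity and JRuby on Rails stacks. It treats `/go/tab` as a whole path segment, so `/go/tab/...` is legacy while a path such as `/go/tabular` is not; that refinement is an assumption rather than something stated above.

```java
import java.util.function.Function;

// Hypothetical dispatcher: legacy paths go to the old stack, all others to the new one.
final class UiDispatcher {
    private final String legacyPrefix;
    private final Function<String, String> velocityStack; // stand-in for the old Java UI
    private final Function<String, String> railsStack;    // stand-in for JRuby on Rails via Rack

    UiDispatcher(String legacyPrefix,
                 Function<String, String> velocityStack,
                 Function<String, String> railsStack) {
        this.legacyPrefix = legacyPrefix;
        this.velocityStack = velocityStack;
        this.railsStack = railsStack;
    }

    // True if the path is the legacy prefix itself or lies beneath it.
    boolean isLegacy(String path) {
        return path.equals(legacyPrefix) || path.startsWith(legacyPrefix + "/");
    }

    // Routes the request path to whichever stack currently owns it.
    String handle(String path) {
        return isLegacy(path) ? velocityStack.apply(path) : railsStack.apply(path);
    }
}
```

Moving a page to the new stack then needs no change to the dispatcher: the new page is served from a non-legacy URI, links are updated to point at it, and the old page is deleted.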

How branch by abstraction works

Branch by abstraction involves making large-scale changes to your system incrementally as follows:

  1. Create an abstraction over the part of the system that you need to change.
  2. Refactor the rest of the system to use the abstraction layer.
  3. Create new classes in your new implementation, and have your abstraction layer delegate to the old or the new classes as required.
  4. Remove the parts of the old implementation that are no longer in use.
  5. Rinse and repeat the previous two steps, shipping your system in the meantime if desired.
  6. Once the old implementation has been completely replaced, you can remove the abstraction layer if you like.
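The steps above can be sketched in Java for a single abstraction partway through a migration (step 3). Everything here is hypothetical: `UserStore` is the abstraction, `OldUserStore` and `NewUserStore` are the two implementations, and a shared map stands in for the single database that both would talk to. Each call is logged so you can see which implementation served it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Step 1: the abstraction that the rest of the system uses (step 2).
interface UserStore {
    void add(String id, String name);
    Optional<String> nameOf(String id);
}

// Old implementation; the shared map stands in for the database.
class OldUserStore implements UserStore {
    private final Map<String, String> db;
    private final List<String> log;

    OldUserStore(Map<String, String> db, List<String> log) { this.db = db; this.log = log; }

    public void add(String id, String name) { log.add("old:add"); db.put(id, name); }

    public Optional<String> nameOf(String id) {
        log.add("old:nameOf");
        return Optional.ofNullable(db.get(id));
    }
}

// New implementation, talking to the same underlying data.
class NewUserStore implements UserStore {
    private final Map<String, String> db;
    private final List<String> log;

    NewUserStore(Map<String, String> db, List<String> log) { this.db = db; this.log = log; }

    public void add(String id, String name) { log.add("new:add"); db.put(id, name); }

    public Optional<String> nameOf(String id) {
        log.add("new:nameOf");
        return Optional.ofNullable(db.get(id));
    }
}

// Step 3: the abstraction delegates each operation to whichever implementation
// currently owns it. Reads have been migrated; writes have not yet.
class MigratingUserStore implements UserStore {
    private final UserStore oldImpl;
    private final UserStore newImpl;

    MigratingUserStore(UserStore oldImpl, UserStore newImpl) {
        this.oldImpl = oldImpl;
        this.newImpl = newImpl;
    }

    public void add(String id, String name) { oldImpl.add(id, name); }  // not yet migrated

    public Optional<String> nameOf(String id) { return newImpl.nameOf(id); }  // migrated
}
```

Once `add` is also routed to the new implementation, `OldUserStore` has no remaining callers and can be deleted (step 4). When nothing delegates to old code any more, callers can be wired straight to `NewUserStore` and `MigratingUserStore` removed, which corresponds to the optional step 6.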

[Figure: Branch by Abstraction]

Martin Fowler points out that variations on these steps are possible: "In the simplest case you build the entire abstraction layer, refactor everything to use it, build the new implementation and then flick the switch. But there's various ways to break it up. You may not build the whole abstraction layer, just a subset of functionality, migrate that and then do another hunk of functionality (providing new and old can co-exist.) Otherwise you may shift some calling code onto the abstraction and have that implemented both ways before you move the rest."

In the iBatis/Hibernate example, the abstraction layer is the repository layer, which hides the implementation details of which persistence framework is being used. In the JRuby on Rails example, the abstraction layer is the servlet engine, which can dispatch either to the JRuby on Rails framework (using Rack) or to standard Java servlets by matching on the URI.

While Go is a relatively small project (it has fewer than ten developers, and it has only been going for a few years), the same principles apply to projects of all sizes, and teams in ThoughtWorks have used this pattern successfully even on large, distributed projects.

Admittedly, branch by abstraction can add overhead to the development process, especially if the codebase is poorly structured. You need to think hard and move a bit more slowly in order to make changes incrementally in this fashion. But in many cases the benefit is worth the extra effort, and the larger the restructuring, the more important it is to consider using branch by abstraction.

The key benefit of branch by abstraction is that your code works at all times throughout the restructuring, enabling continuous delivery. That means your release schedule is completely decoupled from your architectural changes, so you can stop working on the restructuring at any point to do something of higher priority, such as putting out a release with an exciting new feature you just thought up.

It's important to have an exit strategy for branch by abstraction. When you have the freedom not to push a large-scale change all the way through, it's very tempting to leave it half-done once the most critical parts of the migration are complete. But having multiple technologies in play makes the system harder to maintain and means the team has to understand all of the moving parts. This may be an acceptable trade-off, but it should be visible to the whole team.

Branch by abstraction compared with branching in version control

Branch by abstraction is a somewhat misleading name, because it is actually an alternative to branching in version control when making large-scale changes to your system. Teams often make large-scale changes on a version control branch so that they can continue to develop functionality and fix bugs on mainline. The problem, of course, is that the merge back to mainline is guaranteed to be painful, and the amount of pain grows with both the size of the change and the amount of work done on mainline in the meantime.[1]

That means that the stronger the forces pushing you towards a version control branch, the more painful the merge at the end is going to be. If you're also using branches for features, the situation becomes even worse. In general, using branches for features or large-scale changes is a bad idea for several reasons, the most important being that it prevents both continuous delivery and refactoring.[2] Martin Fowler has excellent articles on why feature branching is bad, and on how to use feature toggles as an alternative.

That doesn't mean that all branching in version control is bad. It's OK to branch in order to spike out an idea that you're going to throw away. It's also OK to branch upon releasing, provided you only use the release branch for small, critical bug fixes. However, teams practicing continuous deployment usually don't bother with release branches: because the delta between releases is so small, it's typically easier to fix any problems on mainline and roll forward than it is to roll back.

The only other time branching might be permissible is if your codebase follows the big ball of mud pattern. In this scenario even creating an abstraction layer can be hard. To do it, you must first find a "seam" (typically a set of interfaces, if you're using a statically typed object-oriented language) that you can put the abstraction layer over. If no seam is readily available, you would normally create one through a series of refactorings; if that's not possible for some reason, you might have to resort to a branch to get the code into a state where you can. Of course, this is an extreme move.

Relationship to other patterns

Relationship to refactoring. Refactoring has been defined as "a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior". In this sense, both of the examples of branch by abstraction given above are also examples of refactorings. Crucially, though, branch by abstraction is effectively a programme of related refactorings, which taken together result in a large-scale change to the architecture of the application. Along with the ability to release your software at any time, the ability to refactor is perhaps the most important benefit of developing on mainline.

Relationship to feature toggles. People often confuse branch by abstraction with feature toggles. Both are patterns that allow you to make changes to your system incrementally on mainline. The difference is that feature toggles are intended to allow the development of new features, while keeping those features invisible to users when the system is running. Feature toggles are thus used at deploy time or run time to choose whether a particular feature or set of features is visible in the application.

Branch by abstraction is a pattern for making large-scale changes to an application incrementally, and is thus a development technique. It can of course be combined with feature toggles so that, for example, you could switch between the iBatis and Hibernate implementations of a particular set of data access calls at runtime to compare their performance. Typically, though, the implementation is chosen by the developers and either hard-coded or baked in at build time, perhaps through your dependency injection configuration.
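Combining the two patterns might look like the following Java sketch, in which a toggle is consulted on every call to decide which implementation serves it. The interface `StageQueries` and the class `ToggledStageQueries` are hypothetical, and the toggle is modelled as a plain `java.util.function.BooleanSupplier` rather than any particular feature-toggle library.

```java
import java.util.function.BooleanSupplier;

// Hypothetical abstraction over one set of data access calls.
interface StageQueries {
    int countFailedStages(String pipeline);
}

// Wraps the old and new implementations and checks the toggle on every call,
// so the choice can be flipped at runtime, e.g. to compare their performance.
final class ToggledStageQueries implements StageQueries {
    private final StageQueries oldImpl;
    private final StageQueries newImpl;
    private final BooleanSupplier useNew;

    ToggledStageQueries(StageQueries oldImpl, StageQueries newImpl, BooleanSupplier useNew) {
        this.oldImpl = oldImpl;
        this.newImpl = newImpl;
        this.useNew = useNew;
    }

    @Override
    public int countFailedStages(String pipeline) {
        StageQueries active = useNew.getAsBoolean() ? newImpl : oldImpl;
        return active.countFailedStages(pipeline);
    }
}
```

With an `AtomicBoolean flag`, usage would be `new ToggledStageQueries(oldQueries, newQueries, flag::get)`. The more typical build-time choice described above would skip the wrapper and bind `StageQueries` directly to one implementation in the dependency injection configuration.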

Relationship to strangler application. The strangler application pattern involves incrementally replacing a whole system (usually legacy) with a completely new one. Thus it operates at a higher level of abstraction than branch by abstraction, which is for incrementally changing the implementation of a component of your system. The lines between the two start to blur if you have a service-oriented architecture.

Isn't this just good object-oriented design? Yes. Code that follows the SOLID principles makes this pattern very easy to apply, in particular because it follows the dependency inversion principle and the interface segregation principle (ISP). The ISP is important because it provides a convenient level of granularity at which to switch out implementations. As my colleague David Rice points out, branch by abstraction is the only sensible way to change the implementation of a particular component. Martin Fowler makes the same point by sometimes defining a component as a part of a system that can be swapped out for another implementation.


[1] Another argument that is sometimes put forward is that distributed version control systems make merging so easy that we shouldn't be afraid of branching. This is misleading for two reasons. First, as Martin Fowler points out, automated merge tools cannot catch semantic conflicts. Second, the longer a branch exists, the harder it is to merge, even with the best tools in the world. You don't have to look far on GitHub to find forks that everyone would like to see merged but that have diverged so far from mainline that merging them would require substantial integration work.

[2] Of course, there are exceptions to every rule. Branches (other than for releases and spikes) are OK if you are working in a small, experienced team and the branches are very short-lived (less than one day).