Make Large Scale Changes Incrementally with Branch By Abstraction
Published 05 May 2011
Many development teams are used to making heavy use of branches in version control, and distributed version control systems make this even more convenient. Thus one of the more controversial statements in Continuous Delivery is that you can't do continuous integration and use branches: by definition, if you have code sitting on a branch, it isn't integrated. One case where a version control branch seems the obvious choice is when you are making a large-scale change to your application. However, there is an alternative to branching: a technique called branch by abstraction.
Branch by abstraction: a pattern for making large-scale changes to your application incrementally on mainline.
The example Paul Hammant provides in his original blog entry on this technique is moving from Hibernate to iBatis. As it so happens, Go, the continuous integration and agile release management platform I work on, is presently moving from iBatis to Hibernate, and has been doing so for over a year now. We are also slowly moving our UI over from Velocity and JsTemplate to JRuby on Rails.
Both of these changes are being done slowly and incrementally, at the same time as developing new features, while checking in to mainline on our Mercurial repository multiple times a day. How do we do it?
Moving from iBatis to Hibernate
The team decided to move from iBatis to Hibernate for two reasons. First, because we controlled our database schema, we could use Hibernate's ORM effectively, which saved us from writing lots of custom SQL. Second, its second-level cache improved performance.
Of course we didn't want to move the whole codebase over at once. So as we started adding new functionality that required new calls to the database, we added these new calls using Hibernate, moving over old calls that used iBatis as required.
It is relatively straightforward to update the persistence logic incrementally because the Go codebase uses a standard layered architecture, with controllers using services that in turn use repositories. Because all the database access code is encapsulated in repository classes using the repository pattern, it's a simple case of incrementally changing one repository class at a time from using iBatis to Hibernate. The service layer has no idea of the underlying persistence framework.
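The arrangement described above can be sketched in Java. This is a minimal illustration, not code from Go: `PipelineRepository`, `IbatisPipelineRepository`, `HibernatePipelineRepository` and `PipelineService` are hypothetical names, and both repositories are in-memory stand-ins rather than real iBatis or Hibernate calls. The point is that the service depends only on the repository interface, so each repository can be moved from one framework to the other without touching the service.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical repository interface: the only persistence API the service layer sees.
interface PipelineRepository {
    Optional<String> findNameById(long id);

    void save(long id, String name);
}

// Stand-in for the old iBatis-backed repository; real code would execute iBatis SQL maps.
class IbatisPipelineRepository implements PipelineRepository {
    private final Map<Long, String> rows = new HashMap<>();

    @Override
    public Optional<String> findNameById(long id) {
        return Optional.ofNullable(rows.get(id));
    }

    @Override
    public void save(long id, String name) {
        rows.put(id, name);
    }
}

// Stand-in for the new Hibernate-backed repository; real code would use a Hibernate Session.
class HibernatePipelineRepository implements PipelineRepository {
    private final Map<Long, String> entities = new HashMap<>();

    @Override
    public Optional<String> findNameById(long id) {
        return Optional.ofNullable(entities.get(id));
    }

    @Override
    public void save(long id, String name) {
        entities.put(id, name);
    }
}

// Service layer: depends only on the interface, so it cannot tell which framework is in use.
class PipelineService {
    private final PipelineRepository repository;

    PipelineService(PipelineRepository repository) {
        this.repository = repository;
    }

    void rename(long id, String newName) {
        repository.save(id, newName);
    }

    String describe(long id) {
        return repository.findNameById(id)
                .map(name -> "Pipeline " + id + ": " + name)
                .orElse("Pipeline " + id + " not found");
    }
}

class RepositoryDemo {
    public static void main(String[] args) {
        // Migrating this repository is a one-line change here; PipelineService is untouched.
        PipelineService service = new PipelineService(new HibernatePipelineRepository());
        service.rename(1, "build-and-test");
        System.out.println(service.describe(1)); // prints "Pipeline 1: build-and-test"
    }
}
```

Because the swap happens at construction time, the service and its tests run unchanged against either implementation, which is what lets the migration proceed one repository at a time.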
Moving from Velocity and JsTemplate to JRuby on Rails
We also wanted to move from a Java-based UI stack to a JRuby on Rails stack, both because it was much easier to write tests for this stack and because it sped up UI development. Again, this change was made incrementally. When we created a new page in the application, we would build it using the JRuby on Rails stack, and link to it from the rest of the application once it was ready.
We would also move pages over whenever we wanted to make substantial changes to them. Again, the new version of the page would be developed using the new stack, and then we'd switch URIs in the rest of the application to point to the new version of the page once it was ready. At this point, we would remove the old version of the page. So while most of the UI in Go is now implemented using JRuby on Rails, there are still a couple of pages that use the old Java stack.
However, you'd never know this by looking at the pages, because they have identical styling; you have to look at the URI. Any URI that starts with /go/tab is routed through the old Velocity stack. All other URIs are routed via Rack to JRuby on Rails, which in turn calls the same Java service layer that the old UI uses.
How branch by abstraction works
Branch by abstraction involves making large-scale changes to your system incrementally as follows:
- Create an abstraction over the part of the system that you need to change.
- Refactor the rest of the system to use the abstraction layer.
- Create new classes in your new implementation, and have your abstraction layer delegate to the old or the new classes as required.
- Remove the parts of the old implementation that are no longer used.
- Rinse and repeat the previous two steps, shipping your system in the meantime if desired.
- Once the old implementation has been completely replaced, you can remove the abstraction layer if you like.
[Figure: Branch by Abstraction]
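Step 3 is what keeps mainline shippable while the migration is only half done. The Java sketch below assumes hypothetical `JobRepository`, `LegacyJobRepository`, `NewJobRepository` and `MigratingJobRepository` classes, with an in-memory map standing in for the shared database. It shows an abstraction layer that routes each operation to whichever implementation currently owns it.

```java
import java.util.HashMap;
import java.util.Map;

// Steps 1 and 2: a hypothetical abstraction that all callers now go through.
interface JobRepository {
    String findStatus(long jobId);

    int countRunning();
}

// Old implementation (stand-in for iBatis-based code).
class LegacyJobRepository implements JobRepository {
    private final Map<Long, String> statuses;

    LegacyJobRepository(Map<Long, String> statuses) {
        this.statuses = statuses;
    }

    @Override
    public String findStatus(long jobId) {
        return statuses.getOrDefault(jobId, "UNKNOWN");
    }

    @Override
    public int countRunning() {
        int running = 0;
        for (String status : statuses.values()) {
            if (status.equals("RUNNING")) {
                running++;
            }
        }
        return running;
    }
}

// New implementation (stand-in for Hibernate-based code); only findStatus has been ported so far.
class NewJobRepository {
    private final Map<Long, String> statuses;

    NewJobRepository(Map<Long, String> statuses) {
        this.statuses = statuses;
    }

    String findStatus(long jobId) {
        return statuses.getOrDefault(jobId, "UNKNOWN");
    }
}

// Step 3: the abstraction layer delegates each operation to the old or the new code as required.
class MigratingJobRepository implements JobRepository {
    private final LegacyJobRepository legacy;
    private final NewJobRepository replacement;

    MigratingJobRepository(LegacyJobRepository legacy, NewJobRepository replacement) {
        this.legacy = legacy;
        this.replacement = replacement;
    }

    @Override
    public String findStatus(long jobId) {
        return replacement.findStatus(jobId); // already migrated
    }

    @Override
    public int countRunning() {
        return legacy.countRunning(); // not migrated yet
    }
}

class MigrationDemo {
    public static void main(String[] args) {
        // Both implementations read the same data, just as they would share one database.
        Map<Long, String> db = new HashMap<>();
        db.put(1L, "RUNNING");
        db.put(2L, "PASSED");
        JobRepository jobs = new MigratingJobRepository(new LegacyJobRepository(db), new NewJobRepository(db));
        System.out.println(jobs.findStatus(2) + ", running: " + jobs.countRunning()); // prints "PASSED, running: 1"
    }
}
```

Once `countRunning` has also been ported, `LegacyJobRepository` can be deleted (step 4). When nothing routes to the old code any more, callers can, if you like, depend on the new implementation directly (step 6).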
In the iBatis/Hibernate example, the abstraction layer is the repository layer, which hides the implementation details of which persistence framework is being used. In the JRuby on Rails example, the abstraction layer is the servlet engine, which can dispatch either to the JRuby on Rails framework (using Rack) or to standard Java servlets by matching on the URI.
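URI-based dispatch can be sketched in plain Java as well. This is not the servlet API or JRuby-Rack configuration: `UriDispatcher` is a hypothetical stand-in for the servlet engine's URI mapping, the two `Function` handlers stand in for the Velocity and Rails stacks, and the sample URIs other than the /go/tab prefix are invented for illustration.

```java
import java.util.function.Function;

// Hypothetical front controller standing in for the servlet engine's URI mapping.
class UriDispatcher {
    private final String legacyPrefix;
    private final Function<String, String> legacyStack; // e.g. the Velocity servlets
    private final Function<String, String> newStack;    // e.g. Rack and JRuby on Rails

    UriDispatcher(String legacyPrefix,
                  Function<String, String> legacyStack,
                  Function<String, String> newStack) {
        this.legacyPrefix = legacyPrefix;
        this.legacyStack = legacyStack;
        this.newStack = newStack;
    }

    String handle(String uri) {
        // Match whole path segments so that, say, "/go/tabular" is not routed to the old stack.
        boolean legacy = uri.equals(legacyPrefix) || uri.startsWith(legacyPrefix + "/");
        return legacy ? legacyStack.apply(uri) : newStack.apply(uri);
    }
}

class DispatchDemo {
    public static void main(String[] args) {
        UriDispatcher dispatcher = new UriDispatcher(
                "/go/tab",
                uri -> "Velocity renders " + uri,
                uri -> "Rails renders " + uri);
        System.out.println(dispatcher.handle("/go/tab/admin")); // prints "Velocity renders /go/tab/admin"
        System.out.println(dispatcher.handle("/go/pipelines")); // prints "Rails renders /go/pipelines"
    }
}
```

In this sketch the dispatcher itself never changes during the migration. As the text describes, a page moves to the new stack when the links elsewhere in the application are switched from its old URI to the new one.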
While Go is a relatively small project - it has fewer than ten developers, and has only been going for a few years - the same principles apply to projects of all sizes, and teams in ThoughtWorks have used this pattern successfully even on large, distributed projects.
Admittedly, branch by abstraction does add overhead to the development process, especially if the codebase is poorly structured. You need to think hard and move a bit more slowly in order to make changes incrementally in this fashion. But in many cases the upside is worth the extra effort, and the larger the restructuring, the more important it is to consider using branch by abstraction.
The key benefit of branch by abstraction is that your code is working at all times throughout the restructuring, enabling continuous delivery. That means your release schedule is completely decoupled from your architectural changes, so you can stop work on the restructuring at any point to do something of higher priority, such as putting out a release with an exciting new feature you just thought up.
Branch by abstraction compared with branching in version control
Branch by abstraction is a somewhat misleading name, because it actually represents an alternative to branching in version control when making large-scale changes to your system. Teams often use version control branches to make large-scale changes so that they can continue to develop functionality and fix bugs on mainline. The problem, of course, is that the merge back to mainline is guaranteed to be painful, and the amount of pain is a function both of how big the change is and of how much work is done on mainline in the meantime[1].
That means the stronger the forces pushing you towards a version control branch, the more painful the merge at the end is going to be. If you're also using branches for features, the situation is even worse. In general, using branches for features or large-scale changes is a bad idea for several reasons, the most important being that it prevents both continuous delivery and refactoring[2]. Martin Fowler has excellent articles on why feature branching is bad and how to use feature toggles as an alternative.
That doesn't mean that all branching in version control is bad. It's OK to branch in order to spike out an idea that you're going to throw away. It's also OK to branch upon releasing, provided you use the release branch only for small, critical bug fixes. However, teams practicing continuous deployment usually don't bother with release branches: because the delta between releases is so small, it's typically easier to fix any problems on mainline and roll forward than to roll back.
The only other time it might be permissible to use branching is if your codebase uses the big ball of mud pattern. In this scenario even creating an abstraction layer can be hard: you must first find a "seam" (typically a set of interfaces, if you're using a statically typed OO language) over which to put the abstraction layer. If no seam is readily available, you would normally create one through a series of refactorings, but if that's not possible for some reason, you might have to resort to creating a branch to get into a position to do so. Of course, this is an extreme move.
Relationship to other patterns
Relationship to refactoring. Refactoring has been defined as "a change made to the internal structure of software to make it easier to understand and cheaper to modify without changing its observable behavior". In this sense, both of the examples of branch by abstraction given above are also refactorings. Crucially, though, branch by abstraction is effectively a programme of related refactorings which, taken together, result in a large-scale change to the architecture of the application. Along with the ability to release your software at any time, the ability to refactor is perhaps the most important benefit of developing on mainline.
Relationship to feature toggles. People often confuse branch by abstraction with feature toggles. Both are patterns that allow you to make changes to your system incrementally on mainline. The difference is that feature toggles are intended to allow the development of new features, while keeping those features invisible to users when the system is running. Feature toggles are thus used at deploy time or run time to choose whether a particular feature or set of features is visible in the application.
Branch by abstraction is a pattern for making large-scale changes to an application incrementally, and is thus a development technique. It can of course be combined with feature toggles so that, for example, you could switch between the iBatis and Hibernate implementations of a particular set of data access calls at runtime to compare their performance. Typically, though, the implementation is chosen by the developers and either hard-coded or baked in at build time, perhaps through your dependency injection configuration.
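Combining the pattern with a runtime toggle might look like the following Java sketch. Everything here is hypothetical: `ArtifactRepository` and its two stand-in implementations, the `ArtifactRepositoryToggle` factory, and the `artifacts.persistence` system property. In the more typical case described above, the same `select` call would simply be made once, with a fixed value, in your dependency injection configuration.

```java
import java.util.Locale;

interface ArtifactRepository {
    int countArtifacts(long pipelineId);
}

// Stand-ins for the two implementations; real versions would query the database.
class IbatisArtifactRepository implements ArtifactRepository {
    @Override
    public int countArtifacts(long pipelineId) {
        return 3;
    }
}

class HibernateArtifactRepository implements ArtifactRepository {
    @Override
    public int countArtifacts(long pipelineId) {
        return 3;
    }
}

// Hypothetical toggle: picks an implementation from a configuration value.
final class ArtifactRepositoryToggle {
    static final String PROPERTY = "artifacts.persistence"; // hypothetical property name

    static ArtifactRepository select(String value) {
        switch (value.toLowerCase(Locale.ROOT)) {
            case "hibernate":
                return new HibernateArtifactRepository();
            case "ibatis":
                return new IbatisArtifactRepository();
            default:
                throw new IllegalArgumentException("Unknown persistence implementation: " + value);
        }
    }

    // Reads the toggle at runtime, falling back to the old implementation if it is unset.
    static ArtifactRepository fromSystemProperty() {
        return select(System.getProperty(PROPERTY, "ibatis"));
    }
}

class ToggleDemo {
    public static void main(String[] args) {
        // Times the same call against both implementations, as in the comparison described above.
        for (String impl : new String[] {"ibatis", "hibernate"}) {
            ArtifactRepository repo = ArtifactRepositoryToggle.select(impl);
            long start = System.nanoTime();
            int count = repo.countArtifacts(42);
            long elapsed = System.nanoTime() - start;
            System.out.println(impl + ": " + count + " artifacts in " + elapsed + " ns");
        }
    }
}
```

Defaulting to the old implementation means the toggle is safe to ship before the new code is trusted. A single in-process timing like this is only indicative; a real comparison would need repeated measurements.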
Relationship to strangler application. The strangler application pattern involves incrementally replacing a whole system (usually legacy) with a completely new one. Thus it operates at a higher level of abstraction than branch by abstraction, which is for incrementally changing the implementation of a component of your system. The lines between the two start to blur if you have a service-oriented architecture.
Isn't this just good object-oriented design? Yes. Code that follows the SOLID principles makes this pattern very easy to apply, in particular because it follows the dependency inversion principle and the interface segregation principle (ISP). The ISP is important because it provides a convenient level of granularity at which to switch out implementations. As my colleague David Rice points out, branch by abstraction is the only sensible way to change the implementation of a particular component. Martin Fowler makes the same point by sometimes defining a component as a part of a system that can be swapped out for another implementation.
[1] Another argument sometimes put forward is that distributed version control systems make merging so easy that we shouldn't be afraid of branching. This is misleading for two reasons. First, as Martin Fowler points out, automated merge tools cannot catch semantic conflicts. Second, the longer a branch exists, the harder it is to merge, even with the best tools in the world. You don't have to look far on GitHub to find forks that everyone would like to see merged, but which have diverged so far from mainline that merging them would require substantial integration work.
[2] Of course there are exceptions to every rule. Branches (other than for releases and spikes) are OK if you are working in a small, experienced team and the branches are very short-lived (less than one day).