Large-scale continuous integration requires code modularity

Where large development teams and codebases are involved, code modularity is a key enabler for continuous delivery. At a high level this shouldn’t be too terribly surprising—it’s easier to move a larger number of smaller pieces through the deployment pipeline than it is to push a single bigger thing through.

But it’s instructive to take a closer look. In this post we’ll examine the relationship between continuous integration (which sits at the front end of the continuous delivery deployment pipeline) and code modularity. Code modularity helps at other points in the pipeline—for example, releases—but we’ll save that for perhaps a future post.

The impact of too many developers working against a single codebase

For a given codebase, continuous integration (CI) scales poorly with the number of developers. Fundamentally there are a couple of forces at work here: as the number of developers grows, (1) the size of the codebase increases, and (2) commit velocity increases. These forces conspire in some nasty ways to create a painful situation around builds. Let’s take a look:

  • Individual builds take longer. As the size of the codebase increases, it obviously takes longer to compile, test, deploy, generate reports and so forth.
  • More broken builds. Even if developers are disciplined about running private builds before committing, any given commit has a nonzero chance of breaking the build. So the more commits, the more broken builds.
  • A broken build has a greater impact. In a “stop the line” shop, more developers means more people blocked when somebody breaks the build. In other shops, people just keep committing on top of broken builds, making it more difficult to actually fix the build. Either way it’s bad.
  • Increased cycle times. After a certain point, the commit velocity and individual build times, taken together, become sufficiently high that the CI builds run constantly throughout the workday. In effect the build infrastructure is unavailable to service commits on a near-real-time basis, which means that developers must wait longer to get feedback on commits. It also means that when builds do occur, they involve multiple stacked-up commits, making it less clear exactly whose change broke the build. This again increases feedback cycle times. (Note that there are some techniques outside of modularization that can help here, such as running concurrent builds on a build grid.) Once the feedback cycle takes more than about ten or fifteen minutes, developers stop paying attention to build feedback.
  • Individual commits become more likely to break the build. Even though the global commit velocity increases, individual developers may commit less often because committing is a painful and risky activity. Changelist sizes increase, which makes any given commit more likely to result in a broken build.
  • Delayed integration. Painful and risky builds create an incentive to develop against branches and merge later, which is exactly the opposite of continuous integration. Integrations involving such branches consume disproportionately more time.
  • General disruption of development activities. Ultimately the problems above become very serious indeed: developers spend a lot of time blocked, and the situation becomes a huge and costly distraction for both developers and management.
  • Difficult to make improvements. When everybody is working on the same codebase, it’s harder to see where the problems are. It could be that a certain foundational bit of the architecture is especially apt to break the build, but there aren’t enough tests in place for it. (Meanwhile some other highly stable part of the system is consuming the “build budget” with its comprehensive test suite.) Or perhaps certain teams have better processes in place (e.g., a policy of running private builds prior to committing) than others. Or it may be that some individual developers are simply careless about their commits. It’s hard to know, and thus difficult to manage and improve.
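
To put rough numbers on the first few bullets, here is a back-of-the-envelope sketch in Python. Everything in it is an assumption made for illustration: the function name, the 8-hour workday, the per-commit break probability, and the simplification that commits break the build independently of one another (real commits don't). It is not a measurement of any real team.

```python
def shared_build_stats(developers, commits_per_dev_per_day, build_minutes,
                       break_probability, workday_minutes=480):
    """Rough model of one shared CI build. All inputs are illustrative assumptions."""
    commits_per_day = developers * commits_per_dev_per_day

    # Build-minutes demanded per available workday minute, if every commit
    # got its own build on a single build agent. Above 1.0 the agent cannot
    # keep up, so commits stack up behind running builds.
    load = commits_per_day * build_minutes / workday_minutes

    # Expected number of build-breaking commits per day.
    expected_breaking_commits = commits_per_day * break_probability

    # Chance that at least one commit breaks the build on a given day,
    # assuming (unrealistically) that commits are independent.
    p_any_break = 1 - (1 - break_probability) ** commits_per_day

    return {
        "commits_per_day": commits_per_day,
        "load": load,
        "expected_breaking_commits": expected_breaking_commits,
        "p_any_break": p_any_break,
    }


# Hypothetical teams: same commit habits and break probability,
# but the larger team also has a larger codebase and a slower build.
small = shared_build_stats(developers=10, commits_per_dev_per_day=4,
                           build_minutes=5, break_probability=0.02)
large = shared_build_stats(developers=100, commits_per_dev_per_day=4,
                           build_minutes=30, break_probability=0.02)

for label, stats in (("small team", small), ("large team", large)):
    print(label, {key: round(value, 2) for key, value in stats.items()})
```

Under these assumptions the small team's build agent is idle more than half the day, while the large team demands many times more build-minutes than a single agent can supply. That is the point at which builds run back to back and each one covers a stack of commits.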

There are various possible responses to the challenges above. One can, for example, scale the build infrastructure either vertically (e.g., more powerful build servers) or horizontally (e.g., build grids to eliminate build queuing). Another tactic is to manage test suites and the tests themselves more carefully: neither individual tests nor whole suites should run too long, and people should use doubles (stubs, mocks and so on) where appropriate. But such responses, while genuinely useful, are more like optimizations than root cause fixes. Vertical scaling eventually hits a wall, and horizontal scaling can become expensive if resources are treated as if they’re free, which often happens with virtualized infrastructure. Limiting test suite run times is of course necessary, but if it’s done over too broad a scope, it results in insufficient coverage.

The root cause is too many cooks in the kitchen.

Enable continuous integration by modularizing the codebase

It would be incorrect to draw the conclusion that continuous integration works only for small teams. CI works just fine even with large teams developing to large codebases. The trick is to break up the codebase so that everybody isn’t committing against the same thing. But what does that mean?

Here’s what it doesn’t mean: it doesn’t mean that each team should branch the codebase and work off of branches until it’s time to merge. This just creates huge per-branch change inventory that has to be integrated at the end (or more likely toward the middle) of the release. Again this is the opposite of continuous integration.

Instead, it’s the codebase itself that needs to be broken up. Instead of one large app or system with a single source repo and everybody committing against the trunk, the app or system should be modularized. If we can carve that up into services, client libraries, utility libraries, components, or whatever, then we should do that. There’s no one-size-fits-all prescription for deciding when a module should get its own source repo (as opposed, say, to having a bunch of Maven modules in a single source repo), but we can apply judgment based on the coherence and size of the code as well as the number of associated developers.

Modularizing the code helps with the various continuous integration problems we highlighted above by reducing the size of the build, reducing the commit velocity, and removing incentives to delay integration. It has other important advantages outside of continuous integration, such as decoupling teams from a release planning perspective, making it possible to be more surgical when doing production rollbacks, and so forth. But the advantages to continuous integration are huge.
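
A small sketch shows why reducing build size and commit velocity together pays off so well. The module names, commit counts and build times below are hypothetical, and the model assumes that a monolith's build time is simply the sum of its modules' build times and that every commit triggers one build on a single agent. Cross-module integration testing is ignored.

```python
WORKDAY_MINUTES = 480  # assumed 8-hour workday


def build_load(commits_per_day, build_minutes):
    """Build-minutes demanded per available workday minute for one pipeline."""
    return commits_per_day * build_minutes / WORKDAY_MINUTES


# Hypothetical modules: (name, commits per day, build minutes)
modules = [
    ("billing-service", 40, 6),
    ("search-service", 60, 8),
    ("client-lib", 20, 3),
    ("web-app", 80, 13),
]

# As a monolith, every commit pays for the whole build.
monolith_commits = sum(commits for _, commits, _ in modules)
monolith_minutes = sum(minutes for _, _, minutes in modules)
monolith_load = build_load(monolith_commits, monolith_minutes)

# As separate modules, each commit pays only for its own module's build.
module_loads = {name: build_load(commits, minutes)
                for name, commits, minutes in modules}

print("monolith load:", monolith_load)
for name, load in module_loads.items():
    print(name, "load:", round(load, 2))
```

Because load is the product of commit count and build time, splitting shrinks both factors at once, so each module's pipeline carries far less than its "share" of the monolith's load. In this made-up example one module still lands above 1.0, which is where the judgment about further splitting (or adding build capacity) comes in.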

Note that code modularization brings its own challenges. Code modules require interfaces, which in turn require coordination between teams. SOA/platform approaches will likely require significant architectural attention to address issues of service design, service evolution, governance and so forth. Moreover there will need to be systems integration testing to ensure that all the modules play nicely together, especially when they are evolving at different rates in a loosely coupled fashion. But the costs here are enabling in nature, with a return on investment: greater architectural integrity and looser coupling between teams. The costs we highlighted earlier in the post are pure technical debt.
