Why you need a devops platform

Until fairly recently, I thought of devops mostly in terms of various sorts of automation: build, test, deployment, operations and SDLC. And while it’s true that automation is key to devops, I think it’s secondary to something even more fundamental.

That something is integration.

Integration is the foundation of devops

If automation is the how, then integration is the what. Let’s look at the three key categories.

Process integration. Even the name “devops” gives it away: we’re trying to integrate processes across development and operations. We want an integrated deployment pipeline that carries developer commits into production with the right lifecycle workflow automation in place. When ops has a critical patch to deploy, we want to see the patch testing process coordinated with the development test process. We want to control configuration drift using a combination of infrastructure provisioning, middleware deployment, app deployment and deployment blueprinting. All of these are examples where it makes sense to create processes that span the traditional boundary between development and ops.

Tool integration. So process integration is the goal, but to make this possible, we need tool integration. When developers commit code, we want the continuous integration server to launch a build. The build automation needs to be able to invoke test automation (commit tests, integration tests and so forth). When the CI server finishes a build, we want it to drop the artifact in an artifact repository to support cross-team CI (in the case of libraries) or deployments into target environments (in the case of apps and services). The deployment automation needs to be able to grab the artifacts, invoke VM provisioning APIs (like EC2), launch installers, run automated smoke tests and so forth. Promotion into downstream environments requires knowing what the entry and exit gates are for each environment, as well as knowing whether the build in question “passed” any given environment.
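
To make that concrete, here’s a minimal sketch (Java 11+) of the kind of post-build glue involved. Everything in it is hypothetical: the artifacts.example.com and deploy.example.com endpoints, the gate-check response format and the app name are stand-ins, not any particular tool’s API.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.nio.file.Path;

    // Hypothetical post-build glue: publish the artifact, check the QA entry
    // gate, and promote only if the gate is open.
    public class PostBuildHook {

        private static final HttpClient HTTP = HttpClient.newHttpClient();

        public static void main(String[] args) throws Exception {
            Path artifact = Path.of(args[0]);   // e.g., target/myapp-1.4.2.jar
            String version = args[1];           // e.g., 1.4.2

            // 1. Drop the build artifact into the artifact repository.
            HTTP.send(HttpRequest.newBuilder()
                    .uri(URI.create("https://artifacts.example.com/repo/myapp/" + version))
                    .PUT(HttpRequest.BodyPublishers.ofFile(artifact))
                    .build(),
                HttpResponse.BodyHandlers.discarding());

            // 2. Ask the deployment tool whether this build passed the QA entry gate.
            HttpResponse<String> gate = HTTP.send(HttpRequest.newBuilder()
                    .uri(URI.create("https://deploy.example.com/gates/qa/myapp/" + version))
                    .GET()
                    .build(),
                HttpResponse.BodyHandlers.ofString());

            // 3. Promote only if the gate is open (crude JSON check for brevity).
            if (gate.statusCode() == 200 && gate.body().contains("\"passed\":true")) {
                HTTP.send(HttpRequest.newBuilder()
                        .uri(URI.create("https://deploy.example.com/deployments/qa/myapp/" + version))
                        .POST(HttpRequest.BodyPublishers.noBody())
                        .build(),
                    HttpResponse.BodyHandlers.discarding());
            }
        }
    }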

Data integration. Besides tool integration, we need data integration. Ideally the same team membership data that drives deployment ACLs drives the list of subject matter experts in the operational runbook. The list of apps in the deployment tool should be the same one that the build tool sees, the same one that the configuration repositories see, the same one that the NOC sees in the monitoring tools and so forth.
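
As a small illustration of what that buys you, here’s a sketch (all names hypothetical) in which one authoritative interface owns the team data and both the deployment ACLs and the runbook read through it:

    import java.util.List;

    // One authoritative interface over the inventory data. Every tool reads
    // through it instead of keeping its own copy. All names are hypothetical.
    public interface AssetInventory {
        List<String> appNames();                // the one list of apps
        List<String> teamMembers(String app);   // the one team roster per app
    }

    class DeploymentAcls {
        private final AssetInventory inventory;
        DeploymentAcls(AssetInventory inventory) { this.inventory = inventory; }

        // Deployment permissions derive from the shared roster...
        boolean mayDeploy(String user, String app) {
            return inventory.teamMembers(app).contains(user);
        }
    }

    class Runbook {
        private final AssetInventory inventory;
        Runbook(AssetInventory inventory) { this.inventory = inventory; }

        // ...and the runbook's subject matter experts come from the same
        // roster, so the two views can never drift apart.
        List<String> subjectMatterExperts(String app) {
            return inventory.teamMembers(app);
        }
    }

Because neither consumer keeps its own copy, a roster change shows up in both places at once.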

Automation is important in establishing these integrations, but its role is a supporting one: the integrations themselves are the point. We could have a manual process in which we copy data from one system into another, and that would still count as an integration when you take a system view. It would just be a slow and error-prone one. Automation makes it a lot better.

By now you may agree that integration is vital to devops. But why do we need a platform? Aren’t ad hoc, point-to-point integrations good enough?

Ad hoc integration doesn’t scale

From the previous discussion we can see that there’s a lot of integration that has to happen at the process, tool and data levels. When you have only a handful of integrations, it may be acceptable to wire them together point-to-point with baling wire and duct tape. But integration pervades a fully baked devops effort, and as you start scaling up, you’ll quickly find yourself needing something more systematic for the data integrations alone.

Here’s what happens when you’re not sufficiently systematic in your approach.

I’m the lead architect for a fairly large devops effort. We recently built an asset inventory that quickly established solid data by virtue of its closed-loop design. Within the space of a few months there were a dozen or so systems pulling data out of it, including:

  • a CMDB
  • a change management system
  • a deployment automation system
  • a VM sprawl management system
  • a patching system
  • two separate monitoring/diagnostics systems
  • a runbook application
  • an incident logging system
  • several management dashboards
  • an IT reporting system
  • a department goal-tracking system

I may be forgetting a couple. Suffice it to say that even in my wildest dreams I didn’t expect to see so many integrations with our asset inventory in such a short timeframe. So when we wrote the asset inventory’s web service, we didn’t initially pay much attention to service design and implementation issues such as versioning, stable schemas, pagination, rate limiting, separating domain objects from DTOs and so on. We just attached JAXB annotations to the domain objects (i.e., brittle code-first design with tight coupling to a rapidly evolving domain model) and let ’er rip.
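
For the curious, the shortcut looked roughly like this (the class and its fields are illustrative, not our actual model):

    import javax.xml.bind.annotation.XmlElement;
    import javax.xml.bind.annotation.XmlRootElement;

    // The shortcut: JAXB annotations directly on the domain object, so the
    // wire format IS the domain model. Rename a field or add a relationship
    // and you've changed the schema for every downstream client.
    @XmlRootElement(name = "asset")
    public class Asset {

        private String hostname;
        private String environment;

        @XmlElement
        public String getHostname() { return hostname; }
        public void setHostname(String hostname) { this.hostname = hostname; }

        @XmlElement
        public String getEnvironment() { return environment; }
        public void setEnvironment(String environment) { this.environment = environment; }
    }

Every time the domain model evolved, the schema evolved with it, and our clients felt every change.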

Even early on we were pretty regularly breaking the management dashboards (one of the first integrations), but since the people building them were on our team, we were able to handle the breakage internally.

Once people outside our team began attaching their systems to our inventory, however, it was a totally different story. Schema changes in the asset inventory would take out two or three client systems in one blow. People were sending angry e-mails right and left with escalations to VPs complaining that the broken integration was going to cause this or that project schedule to slip. It didn’t matter that we had always advertised the web service as “brittle” and not even an API. There were enough people using it that the distinction between “putting something quick and dirty out there” and a formal contract—signed in blood—was completely academic. People expected the thing to stabilize, and they even started sending in requests (requirements!) for how the API should work.

That’s when it dawned on us that it wasn’t enough to go around calling our systems a platform. They had to actually be a platform.

Devops integrations require a platform

We knew at the outset that over time, our primary “users” would be automated rather than human. The shift is still underway, so we’re in the process of stabilizing APIs, establishing infrastructure (e.g., a message bus to coordinate CRUD operations on CIs across standalone tools) and so forth. But forces are definitely driving us toward a platform-based design. I don’t necessarily want to try to define “platform” here, but here are some of the platform-ish things that became important once we started seeing increased adoption of our tools and data:

  • communicating through service interfaces—even the UIs should go through the service interfaces where possible, since that lets automation do much of what people can do
  • versioned APIs and associated design/implementation approaches (e.g., separate domain objects from DTOs; see the first sketch after this list)
  • well-defined schemas
  • service authentication and authorization
  • message privacy and integrity, both in transit and at rest
  • performance and availability protections (e.g., pagination, rate limiting, circuit breakers)
  • messaging infrastructure to coordinate independent tools (commercial, open source, in-house); see the second sketch below
  • integrated testing
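
To make a few of these concrete, here’s a first sketch of a versioned path, a DTO separate from the domain object, and pagination. It assumes a JAX-RS stack; the resource, DTO and field names are hypothetical:

    import java.util.Collections;
    import java.util.List;
    import javax.ws.rs.DefaultValue;
    import javax.ws.rs.GET;
    import javax.ws.rs.Path;
    import javax.ws.rs.Produces;
    import javax.ws.rs.QueryParam;
    import javax.ws.rs.core.MediaType;

    // A DTO separate from the domain object: the wire format can stay stable
    // (or change deliberately, via a new version) while the domain model evolves.
    class AssetDto {
        public String hostname;
        public String environment;
    }

    @Path("/api/v1/assets")   // the version is baked into the path
    public class AssetResourceV1 {

        @GET
        @Produces(MediaType.APPLICATION_JSON)
        public List<AssetDto> list(
                @QueryParam("page") @DefaultValue("0") int page,
                @QueryParam("pageSize") @DefaultValue("50") int pageSize) {
            // Pagination (with a hard cap) keeps one greedy client from
            // pulling the whole inventory in a single call.
            return loadPage(page, Math.min(pageSize, 200));
        }

        private List<AssetDto> loadPage(int page, int pageSize) {
            return Collections.emptyList();   // placeholder: fetch from the inventory store
        }
    }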

Some of these may not be important right out of the gate, but you should keep them in mind so that your design at the very least doesn’t make them hard to add in when the time is right.
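
And a second sketch for the messaging piece: publishing a “CI changed” event so standalone tools can react to inventory changes without polling. It assumes JMS 2.0 (for the try-with-resources), and the topic name and payload format are made up:

    import javax.jms.Connection;
    import javax.jms.ConnectionFactory;
    import javax.jms.JMSException;
    import javax.jms.MessageProducer;
    import javax.jms.Session;
    import javax.jms.TextMessage;
    import javax.jms.Topic;

    // Publishes a "CI changed" event so standalone tools (CMDB, monitoring,
    // runbook, ...) can react to inventory changes without polling.
    public class CiChangePublisher {

        private final ConnectionFactory factory;   // e.g., from the broker's client library

        public CiChangePublisher(ConnectionFactory factory) {
            this.factory = factory;
        }

        public void publishUpdate(String ciId) throws JMSException {
            try (Connection connection = factory.createConnection()) {   // JMS 2.0: AutoCloseable
                Session session = connection.createSession(false, Session.AUTO_ACKNOWLEDGE);
                Topic topic = session.createTopic("inventory.ci.changed");
                MessageProducer producer = session.createProducer(topic);
                TextMessage message =
                        session.createTextMessage("{\"op\":\"UPDATE\",\"ciId\":\"" + ciId + "\"}");
                producer.send(message);
            }
        }
    }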

I’d be very interested to hear experiences that others have had in migrating toward a platform-based design.
