Peter Streef

Domain Service

Putting it together

How many microservices should a team own? I believe the answer is one. In this post I explain why, and how my team went from running 20+ microservices to a single domain service, achieving a much nicer developer experience and saving big on cloud costs.

🌱 Starting micro

Ever since getting serious about microservices at Rabobank, I wondered whether we were doing it right. There wasn’t much guidance on an organisational level. The introduction was basically “Hey, let’s go do this microservice thing, 3-2-1, go!” and after that, silence. I later found out that there is a name for this: scattershot adoption.

Another anti-pattern that I’ve encountered is Scattershot adoption which occurs when multiple application development teams attempt to adopt the microservice architecture without any coordination. — Chris Richardson’s blog

To be fair, the runtime environment (Pivotal Cloud Foundry) was properly set up for us, but for pipelines, observability and the deeper questions like “What is micro?”, teams were all scrambling to figure it out on their own.

In the end it took us way too long to standardize even slightly, but once we started cooperating, things started to go well. By the time our microservice count hit around 10, our CI/CD was running quite smoothly.

🌲 Scaling up

10 microservices might sound like a lot already, but we were far from done. With the introduction of the CRUFD framework we were adding new configuration flows and supporting systems left and right. By the time we surpassed 20 microservices things really started to creak, and the end was not yet in sight.

I started to understand what it meant to focus too much on the “micro” part. The situation was not sustainable, so I decided to do something about it.

🙈 The bad and the ugly of microservices

There was much room for improvement in our system, but to determine what to do I first had to identify what was really holding us back.

Libraries

When dealing with 20+ microservices in a single team, you will be hard pressed to do anything other than maintenance without extracting cross-cutting code into libraries. This reduces the size of each service’s codebase and can make testing easier. However, it also comes with a trade-off.

When a change is made in one of these libraries, it is often only tested in the service the change was initially required for. When the other services are later upgraded, odds are that something conflicts.

This problem came up a lot in my team and cost us a significant amount of time re-writing, re-testing and re-releasing these libraries.

Distributed monolith

Teams are usually built around a specific domain, and when microservices are built with high granularity, a domain-level feature change will often span multiple services. This indicates you are building what is called a distributed monolith.

A Distributed Monolith is a system that resembles the microservices architecture but is tightly coupled within itself, like a monolithic application.

As a result, a single feature ends up in multiple pull requests across multiple repositories, which makes the change cumbersome to review and release.

Distributed failures

When something goes wrong in a microservice architecture (especially when using synchronous communication), errors are often propagated through multiple services. Leaving aside how annoying that is for users, it is also annoying for developers, as it makes it hard to figure out where the real problem is. A lot of time ends up being spent searching logs and traces to find the root of a problem.

Running cost

When you want to achieve high availability you should run multiple instances of each application in multiple regions. With two instances in each of two regions, that means at least 4 instances for every production application.

My team was running somewhere close to 130 instances in production alone, while (due to the nature of pension products) we had only a few concurrent users at a time. This is obviously a huge waste of resources.

🪓 Scaling down

A simple first step to scaling down the number of services is to merge ones that are closely related. After doing this a few times successfully, and seeing the advantages far outweigh the trade-offs, I suggested trying something a bit more radical: “Why don’t we just merge everything into 1 service?”

To put a little more weight behind the suggestion, I argued that by having things split up within our team we were not getting any of the advantages that microservices supposedly bring.

I based my arguments on some of the “key benefits” a Google search will yield:

  • Lower Costs & Increased Efficiency: By splitting services within a team, overhead (and costs) go up and efficiency will likely go down.
  • Increased Agility and Scalability: Agility we did not see; scalability we did not use.
  • Easier Maintenance and Updating: Maintenance is hard if you have to do even simple library updates in every service.
  • Faster Time to Market: Not when a change spans multiple services.
  • Improved Fault Tolerance: Only if a service being down does not take out something that is actually required.
  • Increased Modularity: You don’t need microservices to have modularity, so this is not really an advantage worth talking about.
  • Deployed Independently: A big advantage if teams don’t share services, but for multiple services per team it does not hold up.

The magic number

I had my suspicions that 1 might just be the perfect number of “micro” services for a team. I found out later that this is exactly what is recommended when starting out.

A team should ideally own just one service since that’s sufficient to ensure team autonomy and loose coupling and each additional service adds complexity and overhead. A team should only deploy its code as multiple services if it solves a tangible problem, such as significantly reducing lead time or improving scalability or fault tolerance. — microservices.io

With these arguments I convinced my team that this would be worth trying.

🪐 Project: Domain service

So it was decided: we were going to merge all our services into a single deployable unit. We did not want to call it a monolith, as we did not want to confuse people too much.

My colleague found this blog post by Nick Tune, and on his suggestion we started using Domain service as the name for our project. The most important point is to get away from the “micro” part of the name and shift the focus to what matters: the domain.

Even though the application architecture did not change much during the project, the system architecture (primarily the build system) changed drastically.

Monorepo

All our applications, libraries and tool projects were merged into a single git monorepo with a separate build pipeline per project in Azure DevOps. This had actually already been done before we started talking about the domain service, but it is an important prerequisite for the domain service project, so it is still worth noting.
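For illustration, a simplified layout of such a monorepo could look something like this (the project names here are hypothetical, not our actual ones):

```
domain-monorepo/
├── applications/          # thin deployable modules, one per unit
│   ├── policy-service-app/
│   ├── payment-service-app/
│   └── domain-service-app/
├── services/              # the business logic, one module per (former) microservice
│   ├── policy-service/
│   └── payment-service/
├── libraries/             # cross-cutting code shared between services
│   └── domain-model/
└── tools/
```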

Gradle composite build

An important goal was improved developer experience, and a downside we expected was that loading all 20+ services into the IDE would make it too slow to work efficiently. So we opted to keep the ability to load a specific project with its dependencies into the IDE, and avoid it being slowed down by the projects that we did not need. A Gradle composite build allows you to combine builds of separate projects and define builds based on which projects are loaded. It also works very well with build caching.

Another advantage of the composite build is the ability to load binary dependencies as project dependencies, and directly see the effect of code changes by triggering builds of all depending projects. This solved the library re-release problem and made updating cross-cutting code much easier.
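As a minimal sketch of how this looks (reusing the hypothetical project names from the layout above), a single service’s settings.gradle pulls the shared library in as a source build:

```groovy
// settings.gradle of policy-service (hypothetical names).
rootProject.name = 'policy-service'

// Include the shared library as a source build. Gradle substitutes the
// library's binary coordinates (e.g. 'com.example:domain-model') with
// this local build, so a library change is compiled and tested together
// with every service that loads it — no re-release needed.
includeBuild('../libraries/domain-model')
```

Opening just policy-service in the IDE then loads that service plus the builds it includes, instead of all 20+ projects, while the build cache keeps rebuilds cheap.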

🔬 Microservice Tuesdays

Because not everyone was as convinced as we were, we chose to execute the project as one big experiment. That meant we needed an easy way to go back, while not standing in the way of constant progress (read: building new features). We did not want multiple long-lived branches or endless merge conflicts, so we decided that switching between running separate microservices and a single domain service should be trivial. We later referred to this as “Microservice Tuesdays”, because we could switch deployable units every day if we wanted to.

Spring context

The trick to doing this is to use different Spring contexts. Every microservice has its own application module with a @SpringBootApplication class and application configuration files. These application modules load in the service modules, which contain all the relevant Spring components, services and configuration.

Next to that, there is an application module for the domain service, which loads all the same service modules.
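A minimal sketch of what the domain service’s entry point could look like, assuming hypothetical package and configuration class names (the real modules and their wiring were of course more involved):

```java
package com.example.domainservice;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Import;

import com.example.payments.PaymentServiceConfiguration;
import com.example.policies.PolicyServiceConfiguration;

// Each service module exposes a @Configuration class; the domain service
// application simply imports them all into one Spring context. The
// per-microservice application modules import just their own.
@SpringBootApplication
@Import({PolicyServiceConfiguration.class, PaymentServiceConfiguration.class})
public class DomainServiceApplication {

    public static void main(String[] args) {
        SpringApplication.run(DomainServiceApplication.class, args);
    }
}
```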

Depending on a property passed to the Gradle build, we can either build multiple jars from the microservice application modules or a single one from the domain service application module. The pipelines were set up to deploy whichever jars were built, so a new build with a single property change was enough to run one configuration or the other.
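A minimal sketch of such a switch, assuming a hypothetical deployableUnit Gradle property and the module names from earlier:

```groovy
// Root build.gradle — selects which application modules produce a jar.
def unit = providers.gradleProperty('deployableUnit').getOrElse('microservices')

subprojects {
    plugins.withId('org.springframework.boot') {
        tasks.named('bootJar') {
            // Enabled for the domain-service app when building the domain
            // service, and for every other app module when building the
            // separate microservices.
            enabled = (unit == 'domain-service') ==
                      (project.name == 'domain-service-app')
        }
    }
}
```

Then ./gradlew bootJar -PdeployableUnit=domain-service produces the single jar, while omitting the property builds the separate microservice jars.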

localhost

There are some challenges to running either multiple services or one. Mainly: HTTP. When running microservices, most communication went over REST APIs using Feign clients. Since we wanted to keep the ability to run microservices, we made a fairly simple configuration change: when running the domain service, we would just loop back all service URLs to http://localhost:8080. While that loopback is a bit of a ridiculous overhead to run in production, it made it very easy to prove our case. After that it should be easy to remove both client and controller and tie directly into the layer underneath.
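A minimal sketch of the idea, with hypothetical names and a hypothetical clients.policy.url property. The Feign interface stays identical in both setups; only the configured base URL changes:

```java
import org.springframework.cloud.openfeign.FeignClient;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.PathVariable;

// In the microservice setup the property points at the real service
// (e.g. https://policy-service.internal); in the domain service it is
// set to http://localhost:8080, so the call loops straight back into
// the controller running in the same JVM.
@FeignClient(name = "policy-service", url = "${clients.policy.url}")
public interface PolicyClient {

    @GetMapping("/policies/{id}")
    String getPolicy(@PathVariable("id") String id);
}
```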

🤑 Show me the money!

When all was said and done, the domain service ran like clockwork. We did not end up merging everything we ran, as some things made sense to keep separate (monitoring tools, for instance). We did, however, reduce our cloud compute costs by 80%: from around 130 application instances down to only 30, consisting of 5 applications with 4 production and 2 development instances each.

The developer experience also improved immensely. It became super easy to update and test libraries, re-releasing was no longer needed, and thanks to the build caching we never built or tested more than necessary.

The only real downside to this move is that if we ever want to hand a part of our functionality over to another team, we will have to extract it first. But by keeping the modules nicely separated (with build-time enforcement of that separation), we can ensure this will never be too hard.