Part 3 of 5
By: Dan Cohn, Sabre Labs
One of the main reasons to develop software in a monorepo is for ease of reuse – not just source code reuse but also the shared use of linters, code beautification tools, commit templates, editor configurations, and more. If every project has its own build tool, it becomes more difficult to share libraries, create a common continuous integration (CI) pipeline, and switch between projects. Supporting fewer build tools means greater consistency and less complexity. But is it realistic to require everyone to use the same tool? And which one should it be?
By “build tool,” I’m referring to a build automation utility. There are many to choose from such as Make, CMake, Maven, Gradle, and Grunt. These are essential tools for software developers. From large applications to small microservices, the build process is often a lot more complicated than compiling some source files to produce a binary executable. Build tools orchestrate the downloading and/or compilation of source code dependencies, run automated tests, package various assets for deployment, and may perform other functions as well.
Why We Chose Bazel
When our monorepo was born, we chose Bazel as our primary build tool. Bazel is the open-source version of Google’s in-house Blaze tool. One of the attractive qualities of Bazel is that it can build code written in a variety of languages. It isn’t designed for a specific language in the way that Maven is mainly intended for Java, and CMake is well-suited for C/C++. In fact, Bazel is completely extensible, enabling it to build just about anything that lends itself to a hermetic approach. Hermetic means that builds shouldn’t rely on tools installed outside of the build environment, and build results are consistent regardless of the state of the system on which the builds are run.
In addition to being language-agnostic, Bazel is reliable, fast, scalable, and relatively easy to use. Unlike other build tools, notably Maven and Gradle, Bazel doesn’t rely on a wide-ranging ecosystem of plugins. It utilizes a common “workspace” that is typically shared among all projects in a monorepo. Bazel also includes a powerful query language that can analyze build dependencies, a capability that is extremely useful when creating a centralized build system for a monorepo (a topic we will cover in a future article in this series).
Bazel also has built-in support for remote artifact caching and remote build execution. Remote caching is easy to set up, especially when using Google Cloud Storage. With a remote cache we greatly improve build times on our continuous integration (CI) system, especially for C++ applications because they are built from sources down to the last transitive dependency, including third-party libraries. The remote artifact cache is also available to our developers, meaning that each version of an artifact is built just once and then shared across developer workstations. We are not yet taking advantage of remote execution, but it will become important to us in the future when we want to support larger and faster build jobs.
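To make the "built just once and then shared" idea concrete, here is a minimal sketch of content-addressed caching. It is not Bazel's implementation: the `action_key` function, the `SharedCache` class, and the in-memory dictionary standing in for a remote store are all assumptions made for illustration.

```python
import hashlib


def action_key(command, input_files):
    """Derive a cache key from a build command and the contents of its inputs."""
    digest = hashlib.sha256()
    digest.update(command.encode("utf-8"))
    # Sort by path so the key does not depend on dictionary ordering.
    for path in sorted(input_files):
        digest.update(b"\0" + path.encode("utf-8") + b"\0")
        digest.update(hashlib.sha256(input_files[path]).digest())
    return digest.hexdigest()


class SharedCache:
    """Hypothetical stand-in for a remote cache (a real one lives on a server)."""

    def __init__(self):
        self._store = {}
        self.builds = 0  # counts how often the build function actually ran

    def get_or_build(self, command, input_files, build):
        key = action_key(command, input_files)
        if key not in self._store:
            # Cache miss: build once and publish the result for everyone else.
            self._store[key] = build(command, input_files)
            self.builds += 1
        return self._store[key]
```

Because the key is derived from the command and the input contents, two workstations with identical sources ask for the same key, and only the first request pays for the build.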
Nothing is perfect
Although Bazel has many benefits and is a good match for monorepo-based development, it has its share of drawbacks. For starters, very few developers have experience with Bazel, so most must learn it from scratch. Of course, picking up a new tool or framework is part and parcel of being a software engineer, so this shouldn’t be a major impediment. The larger challenge is managing a common set of third-party dependencies. In Bazel, these are declared in the “WORKSPACE” file, which imports the Bazel rules and external packages required by all projects in the monorepo. This generally means everyone must agree on a collective set of dependencies and, more importantly, the versions of these dependencies.
Is it necessary to have a single WORKSPACE for the entire repo? Technically, no, but as soon as you divide a monorepo into multiple workspaces, you no longer have a true monorepo. Anything outside a given workspace is treated as an “external” dependency, meaning that Bazel must import it from another workspace or download it from a central repository. Such an import may not even be possible due to conflicts in shared dependencies – the so-called “dependency hell” problem. In addition, you must manage each workspace independently and manually propagate global changes across all workspaces.
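The version-agreement problem is easy to picture with a small sketch. The function below reports every dependency that different projects pin at different versions. The project names and version numbers are hypothetical, and this is an illustration of the conflict itself, not of how Bazel resolves dependencies.

```python
from collections import defaultdict


def find_version_conflicts(projects):
    """Return {dependency: {version: [projects]}} for dependencies
    that are pinned at more than one version across the repo."""
    versions = defaultdict(lambda: defaultdict(list))
    for project, deps in projects.items():
        for name, version in deps.items():
            versions[name][version].append(project)
    return {
        name: {v: sorted(users) for v, users in by_version.items()}
        for name, by_version in versions.items()
        if len(by_version) > 1
    }


# Hypothetical projects and version pins, for illustration only.
projects = {
    "booking-service": {"guava": "31.1", "jackson": "2.13.4"},
    "pricing-service": {"guava": "30.0", "jackson": "2.13.4"},
}
print(find_version_conflicts(projects))
```

With a single shared workspace, every entry in that report has to be settled before the projects can build together. With separate workspaces, the same disagreements resurface whenever one workspace imports another.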
Another challenge is that some languages are better suited to Bazel than others. We found NodeJS to be a poor fit for Bazel, especially in a monorepo. This is partly because NPM, the standard build tool for Node projects, is both a build automation tool and package management tool. Bazel has build rules for NodeJS, but they require all projects in the workspace to use a common set of third-party Node modules and versions. Have you ever come across two Node applications with identical dependency lists? Not likely. Additionally, there is no way to declare an “internal” dependency between one Node module and another in the same repo. As a result, there is very little benefit in building NodeJS projects with Bazel. There is an adoption cost but no real value.
Solving the NodeJS problem
Use of these rules (in a “BUILD.bazel” file) looks something like this:
As you can see, this creates a simple adapter so that Bazel can build Node modules as nature intended (that is, with the “npm” tool). The advantage is that we can build any project in the monorepo with the bazel build command, and we can identify sources and dependencies with the bazel query command. This is critically important for CI, a topic to be covered in part 5 of this series.
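The adapter idea can also be sketched outside of Bazel. The following is not our rule implementation. It is a hypothetical Python wrapper of the kind a build rule might call, and it assumes that each project's package.json defines a "build" script and that a lock file is present for `npm ci`.

```python
import subprocess
from pathlib import Path


def npm_commands(project_dir, run_tests=True):
    """Return the npm invocations for one project, in the order they run."""
    cwd = str(Path(project_dir))
    steps = [
        ["npm", "ci"],            # install exactly what the lock file specifies
        ["npm", "run", "build"],  # assumes package.json defines a "build" script
    ]
    if run_tests:
        steps.append(["npm", "test"])
    return [(cwd, step) for step in steps]


def build_project(project_dir, run_tests=True):
    """Run each step inside the project directory, stopping at the first failure."""
    for cwd, step in npm_commands(project_dir, run_tests):
        subprocess.run(step, cwd=cwd, check=True)
```

Keeping the command list separate from the code that executes it makes the wrapper easy to inspect and test without invoking npm.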
The Java conundrum
To build Java projects, Bazel relies on a special set of rules known as rules_jvm_external to resolve third-party dependency versions and download Maven artifacts. This adds a layer of complexity but also creates an environment that is more familiar to many Java developers. On the downside, however, centralized management of Java dependencies is no small feat, and Bazel must spend time (often several minutes) installing Maven dependencies whenever there is a change (which is often in a large monorepo).
Unfortunately, these were not the only obstacles we encountered when we began converting existing Java applications to use Bazel. The migration process can be complex and time-consuming, especially for projects that rely on one or more of the many plug-ins available for Maven. Speaking of plug-ins, the Bazel plug-in for IntelliJ IDEA offered a poor user experience as compared with the tight integration between IntelliJ and Maven (or Gradle). With most of our developers using IntelliJ as their primary code editor/IDE (integrated development environment), this quickly became a significant pain point. (There are signs of improvement in this area with the announcement a few months ago of a collaboration between Bazel and JetBrains to improve and maintain the plug-in.)
An additional consideration for us is that almost all our Java applications rely on the Spring framework and Spring Boot in particular. Why does this matter? Well, disparate teams developed these applications over many years, meaning that each one has its own set of required Spring packages and versions. The full list of transitive dependencies can run into the hundreds. Moving from one set of versions to another is rarely an easy job. In fact, it often borders on impossible when considering other business priorities and commitments.
You can probably guess where this is going. After much angst among our developers and contemplation among our architects, we decided to reverse course on using Bazel as the sole build tool for Java apps. Instead, we’re taking the same “wrapper script” approach for Java as we did for Node. Maven is the build tool of choice for Java based on its prevalence at Sabre. Bazel runs Maven builds and tests “under the hood” using the “mvn” tool. It also tracks internal dependencies within the monorepo to ensure that the appropriate apps are rebuilt whenever a shared module or API is updated.
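Tracking internal dependencies comes down to a reverse walk over the dependency graph. Bazel's query language provides an `rdeps` function for this. The sketch below shows the same computation in plain Python over a hypothetical graph. The project names are invented, and the code illustrates the idea rather than what Bazel does internally.

```python
from collections import defaultdict, deque


def affected_projects(deps, changed):
    """Return every project that depends, directly or transitively,
    on one of the changed projects (the changed projects included)."""
    # Invert the graph: for each project, record who depends on it.
    dependents = defaultdict(set)
    for project, requires in deps.items():
        for dep in requires:
            dependents[dep].add(project)

    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Hypothetical monorepo: two apps sharing a library and an API module.
deps = {
    "app-a": ["lib-common", "api-x"],
    "app-b": ["lib-common"],
    "api-x": ["lib-core"],
    "lib-common": [],
    "lib-core": [],
}
print(sorted(affected_projects(deps, ["lib-core"])))
```

A change to `lib-core` reaches `app-a` only through `api-x`, which is why the walk has to be transitive and not limited to direct dependents.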
There are advantages and disadvantages to this non-traditional approach. Java developers are overwhelmingly pleased with the decision and report being more productive. Projects have independence in terms of dependency choices and the timing of upgrades. We maintain consistency across Java projects in the monorepo by building them with the same commands and the same JDK (for the most part). Many of them depend on a Maven BOM (bill of materials) containing a shared list of package versions for both internal and external dependencies. This BOM is versioned to allow for a controlled transition from one “release” to another, an activity that is not easily managed with rules_jvm_external.
On the flip side, we’ve lost the benefits of centralized dependency management, such as uniformity across projects and the accompanying ability to reuse Sabre internal libraries with no risk of transitive dependency conflicts. We’ve also sacrificed hermeticity of builds and remote artifact caching, two of the aforementioned features of Bazel. Hermeticity is less critical for us since we have a standardized developer environment for users of the monorepo. Everyone has the same version of Maven and the Java compiler. When it comes to artifact caching, Maven has its own local repository cache, but there is definitely room to improve build performance by using a remote caching extension.
Returning to the original question: Is it realistic to use one build tool for all apps and services in a multi-language monorepo? The answer is somewhat nuanced. We’ve found Bazel to be a great unifier – an umbrella under which all builds can be performed through a common interface. However, Bazel may not be the best choice for every language or situation. The following table summarizes the choices we’ve made for various languages.
| Language | Native Build Tool | Bazel Wrapper? |
The next article in this series will delve into container-based development and how it provides a consistent environment for developers across the company.