Monorepo – Our Experience
> Moving to a monorepo didn't change much, and what minor changes it made have been positive.
I'm not sure that this statement in the summary jives with this statement from the next section: > In the previous, separate repository world, this would've been four separate pull requests in four separate repositories, and with comments linking them together for posterity.
>
> Now, it is a single one. Easy to review, easy to merge, easy to revert.
IMO, this is a huge quality of life improvement and prevents a lot of mistakes from not having the right revision synced down across different repos. This alone is a HUGE improvement where a dev doesn't accidentally end up with one repo in this branch and forgot to pull this other repo at the same branch and get weird issues due to this basic hassle.When I've encountered this, we've had to use another repo to keep scripts that managed this. But this was also sometimes problematic because each developer's setup had to be identical on their local file system (for the script to work) or we had to each create a config file pointing to where each repo lived.
This also impacts tracking down bugs and regression analysis; this is much easier to manage in a mono-repo setup because you can get everything at the same revision instead of managing synchronization of multiple repos to figure out where something broke.
Repository boundaries are affected far more by the social structure of your organisation than anything technical.
Do you want hard boundaries between teams — clear responsibilities with formal ceremony across boundaries, but at the expense of living with inflexibility?
Do you want fluidity in engineering, without fixed silos and a flat org structure that encourages anyone to take on anything that’s important to the business right now, but with the much bigger overhead of needing strong people leaders capable of herding the chaos?
I’m sure there are dozens of other examples of org structures and how they are reflected in code layout, repo layout, shared directories, dropboxes, chat channels, and email groups etc.
I love monorepos but I'm not sure if Git is the right tool beyond certain scale. Where I work doing a simple `git status` takes seconds due to the size of the repo. There has been various attempts to solve Git performance but so far this is nothing close to what I experienced at Google.
The Git team should really invest in tooling for very large repos. Our repo is around 10M files and 100M lines of code and no amount of hacks on top of Git (cache, sparse checkout etc etc) is not really solving the core problem.
Meta and Google have really solved this problem internally but there is no real open source solution that works for everyone out there.
Without indicating my personal feelings on monorepo vs polyrepo, or expressing any thoughts about the experience shared here, I would like to point out that open-source projects have different and sometimes conflicting needs compared to proprietary closed-source projects. The best solution for one is sometimes the extreme opposite for the other.
In particular many build pipelines involving private sources or artifacts become drastically more complicated than their those of publicly available counterparts.
Doing modular right is harder than doing monolithic right.
But if you do it right, the advantage you get is that you get to pick which versions of your dependencies you use; while quite often you just want to use the latest, being able to pin is also very useful.
The classic micro/multi repo mistake is reaching for more repos when you really need better tooling and permissions on single repo. People have probably wasted millions of engineer-hours across the industry with multiple repos, all because GitHub doesn't have expressive path-level permissions.
To me, monorepo vs multi-repo is not about the code organization, but about the deployment strategy. My rule is that there should be a 1:1 relation between a repository and a release/deployment.
If you do one big monolithic deploy, one big monorepo is ideal. (Also, to be clear, this is separate from microservice vs monolithic app: your monolithic deploy can be made up of as many different applications/services/lambdas/databases as makes sense). You don't have to worry about cross-compatibility between parts of your code, because there's never a state where you can deploy something incompatible, because it all deploys at once. A single PR makes all the changes in one shot.
The other rule I have is that if you want to have individual repos with individual deployments, they must be both forward- and backwards-compatible for long enough that you never need to do a coordinated deploy (deploying two at once, where everything is broken in between). If you have to do coordinated deploys, you really have a monolith that's just masquerading as something more sophisticated, and you've given up the biggest benefits of both models (simplicity of mono, independence of multi).
Consider what happens with a monorepo with parts of it being deployed individually. You can't checkout any specific commit and mirror what's in production. You could make multiple copies of the repo, checkout a different commit on each one, then try to keep in mind which part of which commit is where -- but this is utterly confusing. If you have 5 deployments, you now have 4 copies of any given line of code on your system that are potentially wrong. It becomes very hard to not accidentally break compatibility.
TL;DR: Figure out your deployment strategy, then make your repository structure mirror that.
Ok, but the more interesting part - how did you solve the CI/CD part and how does it compare to a multirepo?
a lot of comments here seem to think that mono-repo has to mean something about deployment. I just don't want to have to run git fetch and 5 different repos to get everything I need and that's good enough reason for me to use one.
As DevOps/SRE type person that occasionally gets stuck with builds, Monorepos world well if company will invest in the build process. However, many companies don't do well in this area and Monorepo blast radius becomes much bigger so individual repos it is. Also, depending on the language, building private repo is easy enough to keep all common libraries in.
I think the big issue around monorepo is when a company puts completely different projects together inside a single repo.
In this article almost everything makes sense to me (because that's what I have been doing most of my career) but they put their OTP app inside which suddenly makes no sense. And you can see the problem in the CI they have dedicated files just for this App and probably very few common code with the rest.
IMO you should have one monorepo per project (api, frontend, backend, mobile, etc. as long as it's the same project) and if needed a dedicated repo for a shared library.
Monorepos have their advantages, as pointed out, one place to review, one place to merge.
But it can also breed instability, as you can upgrade other people's stuff without them being aware.
There are ways around this, which involve having a local module store, and building with named versions. Very similar to a bunch of disparate repos, but without getting lost in github (github's discoverability was always far inferior to gitlab)
However it has its draw backs namely that people can hold out on older versions than you want to support.
All the pitfalls of a monorepo can disappear with some good tooling and regular maintenance, so much so that devs may not even realize that they are using one. The actual meat of the discussion is – should you deploy the entire monorepo as one unit or as multiple (micro)services?
Started to use a monorepo + worktrees to keep related but separated developments all together with different checkouts. Anybody else on the same path?
We're transitioning from a SVN monorepo to Git. We've considered doing a kind of best-of-both-worlds approach.
Some core stuff into separate libraries, consumed as nuget packages by other projects. Those libraries and other standalone projects in separate repos.
Then a "monorepo" for our main product, where individual projects for integrations etc will reference non-nuget libraries directly.
That is, tightly coupled code goes into the monorepo, the rest in separate repos.
Haven't taken the plunge just yet tho, so not sure how well it'll actually work out.
monorepos are appropriate for a single project with many sub parts but one or two artifacts on any given release build. But they fall apart when you have multiple products in the monorepo, each with different release schedules.
As soon as you add a second separate product that uses a different subset of any code in the repo, you should consider breaking up the monorepo. If the code is "a bunch of libraries" and "one or more end user products" it becomes even more imperative to consider breaking down stuff..
Having worked on monorepos where there are 30+ artifacts, multiple ongoing projects that each pull the monorepo in to different incompatible versions, and all of which have their own lifetime and their own release cycle - monorepo is the antithesis of a good idea.
I like to use the monorepo tools without the monorepo repo. If that makes any god damn sense. I use NX at my job and the monorepo was getting out of hand, 6 hour pipeline builds, 2 hours testing, etc. So I broke the repo into smaller pieces. This wouldn't have been possible if I wasn't already using the monorepo tools universally through the project but it ended up working well.
Some thoughts:
1) Comparing a photo storage app to the Linux kernel doesn't make much sense. Just because a much bigger project in an entirely different (and more complex) domain uses monorepos, doesn't mean you should too.
2) What the hell is a monorepo? I feel dumb for asking the question, and I feel like I missed the boat on understanding it, because no one defines it anymore. Yet I feel like every mention of monorepo is highly dependent on the context the word is used in. Does it just mean a single version-controlled repository of code?
3) Can these issues with sync'ing repos be solved with better use of `git submodule`? It seems to be designed exactly for this purpose. The author says "submodules are irritating" a couple times, but doesn't explain what exactly is wrong with them. They seem like a great solution to me, but I also only recently started using them in a side project
Every monorepo I've ever met (n=3) has some kind of radioactive DMZ that everybody is afraid to touch because it's not clear who owns it but it is clear from its quality that you don't want to be the last person who touched it because then maybe somebody will think that you own it. It's usually called "core" or somesuch.
Separate repos for each team means that when two teams own components that need to interact, they have to expose a "public" interface to the other team--which is the kind of disciplined engineering work that we should be striving for. The monorepo-alternative is that you solve it in the DMZ where it feels less like engineering and more like some kind of multiparty political endeavor where PR reviewers of dubious stakeholder status are using the exercise to further agendas which are unrelated to the feature except that it somehow proves them right about whatever architectural point is recently contentious.
Plus, it's always harder to remove something from the DMZ than to add it, so it's always growing and there's this sort of gravitational attractor which, eventually starts warping time such that PR's take longer to merge the closer they are to it.
Better to just do the "hard" work of maintaining versioned interfaces with documented compatibility (backed by tests). You can always decide to collapse your codebase into a black hole later--but once you start on that path you may never escape.