If All the World Were a Monorepo
Posted 4 months ago · Active 3 months ago
jtibs.substack.com · Tech story · High profile
Key topics
Monorepo
Package Management
R Programming Language
The article discusses how CRAN, R's package repository, operates similarly to a monorepo by testing and enforcing compatibility across dependent packages, sparking a discussion on the pros and cons of this approach.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 4 days after posting
- Peak period: 35 comments (Day 4)
- Average per period: 18.3
- Comment distribution: 73 data points (based on 73 loaded comments)
Key moments
- 01 Story posted: Sep 16, 2025 at 4:33 AM EDT (4 months ago)
- 02 First comment: Sep 19, 2025 at 9:19 PM EDT (4 days after posting)
- 03 Peak activity: 35 comments on Day 4 (hottest window of the conversation)
- 04 Latest activity: Sep 30, 2025 at 12:50 PM EDT (3 months ago)
ID: 45259623 · Type: story · Last synced: 11/20/2025, 3:38:03 PM
That's the objective function of Hastie et al.'s GLM. I had a good chuckle when I realized the author's last name is Tibshirani. If you know, you know.
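For readers outside the joke, the reference is presumably to the elastic-net penalized GLM objective from Friedman, Hastie and Tibshirani's glmnet (the lasso penalty, alpha = 1, is Tibshirani's); the article itself isn't quoted here, so this is a from-memory reconstruction:

    % Elastic-net penalized GLM objective as minimized by glmnet
    % (Friedman, Hastie & Tibshirani)
    \min_{\beta_0,\,\beta}\;
      \frac{1}{N}\sum_{i=1}^{N} w_i\,\ell\!\left(y_i,\ \beta_0 + x_i^{\top}\beta\right)
      \;+\; \lambda\left[(1-\alpha)\tfrac{1}{2}\lVert\beta\rVert_2^2
      \;+\; \alpha\lVert\beta\rVert_1\right]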
When you propose a change to something that other things depend on, it makes sense to test those dependents for a regression; this is not earth shattering.
If you want to change something which breaks them, you have to then do it in a different way. First provide a new way of doing something. Then get all the dependencies that use the old way to migrate to the new way. Then when the dependents are no longer relying on the old way, you can push out a change which removes it.
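In R package terms, a rough sketch of what that staged removal can look like; the function names are invented for illustration, and only .Deprecated and .Defunct are actual base R:

    # Stage 1: ship the new API alongside the old one.
    summarise_fast <- function(x) {
      vapply(split(x$value, x$group), mean, numeric(1))
    }

    # Stage 2: keep the old entry point working, but warn callers to migrate.
    summarise_slow <- function(x) {
      .Deprecated("summarise_fast")   # base R: emits a deprecation warning
      summarise_fast(x)
    }

    # Stage 3 (a later release, once reverse dependencies have migrated):
    # summarise_slow <- function(x) .Defunct("summarise_fast")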
Almost every other package repo works differently: publishing packages which would break other packages is more than common, it is the standard way to publish updates.
>Then get all the dependencies that use the old way to migrate to the new way. Then when the dependents are no longer relying on the old way, you can push out a change which removes it.
Actually, no software repo works this way, because it is insane.
- Breaking things is obviously insane, compared to not breaking things.
- Staged obsolescence is sane compared to breaking things, and compared to installations carrying multiple versions of the same package.
Having a single version of every package in any given installation, with every package carefully managed for backwards compatibility and features removed only when everything has migrated off them, is utterly sane.
There may be exceptions. (The idea that there are never exceptions is insane). Suppose that it is discovered that there is no secure way of using some API, and there is a real threat: it has to be removed.
I think that a good way forward would be to identify the packages which use the API, and contact all the maintainers for an emergency conference.
The package system should also be capable of applying its own patches to packages; if something must break backwards compatibility, the package system should provide patches to fix the broken packages until the upstreams develop their own fixes.
In almost every case it is the sole duty of the people using the packages to ensure they adhere to whatever standards they desire. This is how packaging software works in basically every case.
>The package system should also be capable of applying its own patches to packages; if something must break backwards compatibility, the package system should provide patches to fix the broken packages until the upstreams develop their own fixes.
This is just you not understanding software. This is obviously not possible; it is also not desirable.
I like the perspective presented in this article; I think CRAN is taking an interesting approach. But this part is nuts: explicitly saying you're compatible with any future breaking changes!? You can't possibly know that!
I get that a lot of R programmers might be data scientists first and programmers second, so many of them probably don't know semver, but I feel like the language should guide them to a safe choice here. If CRAN is going to email you about reverse dependencies, maybe publishing a package with a crazy semver expression should also trigger an email.
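For context, CRAN doesn't use semver ranges at all; dependency versions in a DESCRIPTION file are plain comparisons, almost always just a lower bound, which effectively declares compatibility with every future release. A made-up example:

    Package: examplepkg
    Version: 0.1.0
    Imports:
        dplyr (>= 1.0.0),
        jsonlite
    Suggests:
        testthat (>= 3.0.0)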
I kind of like it, in a way. In a lot of ecosystems it's easy for package publishers to be a bit lazy with compatibility, which can push a huge amount of work onto package consumers. R seems similar to Go in this regard: there is a big focus on not breaking compatibility, which means they are conservative about adding new things until they're happy to support them for a long time.
In a true monorepo — the one for the FreeBSD base system, say — if you make a PR that updates some low-level code, then the expectation is that you 1. compile the tree and run all the tests (so far so good), 2. update the high-level code so the tests pass (hmm), and 3. include those updates in your PR. In a true centralized monorepo, a single atomic commit can effect a vertical-slice change through a dependency and all of its transitive dependents.
I don’t know what the equivalent would be in distributed “meta-monorepo” development à la CRAN, but it’s not what they’re currently doing.
(One hypothetical approach I could imagine, is that a dependency major-version release of a package can ship with AST-rewriting-algorithm code migrations, which automatically push both “dependency-computed” PRs to the dependents’ repos, while also pushing those same patches as temporary forced overlays onto releases of dependent packages until such time as the related PRs get merged. So your dependents’ tests still have to pass before you can release your package — but you can iteratively update things on your end until those tests do pass, and then trigger a simultaneous release of your package and your dependent packages. It’s then in your dependents’ court to modify + merge your PR to undo the forced overlay, asynchronously, as they wish.)
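For what it's worth, base R has enough reflection to prototype the rewriting half of that idea. A rough sketch, with invented function names, that ignores namespaces, missing arguments and other edge cases real tooling would need:

    # Rewrite calls to an old function name into a new one using R's parse tree.
    rewrite_calls <- function(expr, old, new) {
      if (is.call(expr)) {
        if (identical(expr[[1]], as.name(old))) expr[[1]] <- as.name(new)
        for (i in seq_along(expr)[-1]) {
          expr[[i]] <- rewrite_calls(expr[[i]], old, new)
        }
      }
      expr
    }

    # Example on a quoted expression:
    rewrite_calls(quote(plot(old_summary(df), type = "l")), "old_summary", "new_summary")
    #> plot(new_summary(df), type = "l")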
Jane Street has something similar called a "tree smash" [1]. When someone makes a breaking change to their internal dialect of OCaml, they also push a commit updating the entire company monorepo.
It's not explicitly stated whether such migrations happen via AST rewrites, but one can imagine leveraging the existing compiler infrastructure to do that.
[1]: https://signalsandthreads.com/future-of-programming/#3535
Ideally, yes. However, such a monorepo can become increasingly complex as the software being maintained grows larger and larger (and/or more and more people work on it).
You end up with massive changes, which might eventually become something that a single person cannot realistically hold in their head. Not to mention clashes: you will have people making contradictory/conflicting changes, and there will have to be some sort of resolution mechanism outside (or the "default" one, which is first come, first served).
Of course, you could "manage" this complexity by assigning API boundaries/layers, where those APIs are deemed too important to change often. But that simply means you're a monorepo in name only - not too different from having separate repos with versioned artefacts and a defined API boundary.
That said, you of course need some tooling to reliably discover all the callers and do those migrations at scale.
Easier to do if all the code is owned by one org but harder if you can’t reliably tell who’s using your APIs.
However, centralized migrations really do save a lot of work for the org.
You have visibility into who is using what and you still get to do an atomic update commit even if a commit will touch multiple boundaries - I would say that's a big difference. I hated working with shared repos in big companies.
Also, the other problem with a big monorepo is that nothing ever dies. Let's say you have a library and there are 1000 client programs or other libraries using your API. Some of them are pretty popular and some of them are fringe.
However, when you change the API, they all have the same weight: you have to fix them all. In the non-monorepo case, the fringe clients will eventually die, or their maintainer will invest in them and update them. It's like capitalism vs communism with central planning and all.
Bazel has the concept of visibility: while you are developing something in the tree, you can explicitly say who is allowed to use it (like a trial version).
But the point is, if something is built, it must be tested, and coverage should catch what is built but not tested, and also what is built and tested but not really used much.
But why remove it if it takes no time to build and test? And if it takes more time to test, it's usually on your team to stand up your own testing environment rather than rely on the general presubmit/preflight one. And since you only have so much budget from the last capacity planning, you'll soon ask: do we really need this piece of code and its tests?
I mean, it's not perfect - there will always be something churning through time and money - but until it's a pretty big problem it won't go away automatically (yet).
I mean, to use a different metaphor, an incremental rollout is all fine and dandy until the old code discovers that it cannot work with the state generated by the new code.
For example, a web API that talks to a database but is deployed with more than one instance, where instances get rolling updates to the new version to avoid any downtime. There will be overlapping requests to both old and new code at the same time.
Or if you want to do a trial deployment of the new version to 10% of traffic for some period of time.
Or if it’s a mobile or desktop installed app that talks to a server where you have to handle people using the previous version well after you’ve rolled out an update.
Read the actionable part of the "dependency error" mail again:
> Please reply-all and explain: Is this expected or do you need to fix anything in your package? If expected, have all maintainers of affected packages been informed well in advance? Are there false positives in our results?
This is not a hard fail and demand that you go back and rewrite your package. It's also not a demand for you to go out on your own and write pull requests for all the dependent packages.
The only strict requirement is to notify the dependents and explain the reason for the change. Depending on the nature of the change, it's then something the dependents can easily fix themselves - or, if they can't, you will likely get feedback on what you'd have to change in your package to make the migration feasible.
In the end, it's a request for developers to get up and talk to their users and figure out a solution together, instead of just relying on automation and deciding everything unilaterally. It's sad that this is indeed a novel concept.
(And hey, as a side effect: If breaking changes suddenly have a cost for the author, this might give momentum to actually develop those automated migration systems. In a traditional package repository, no one might even have seen the need for them in the first place)
> CRAN had also rerun the tests for all packages that depend on mine, even if they don’t belong to me!
Another way to frame this: these are the customers of your package's API. If you break them, you are required to ship a fix.
I see why this isn't the default (e.g. on GitHub you have no idea how many people depend on you). But the developer experience is much nicer like this. Google, for example, makes this promise with some of their public tools.
Outside the world of professional software developers, R is used by many academics in statistics, economics, the social sciences, etc. This rule makes it less likely that their research breaks because of some obscure dependency they don't understand.
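For reference, a package author can approximate CRAN's reverse-dependency check locally with base R's tools package; a rough sketch, with "mypkg" and the "checks" directory as placeholders:

    # Which installed packages would a change to "mypkg" affect?
    revdeps <- tools::dependsOnPkgs("mypkg",
                                    dependencies = c("Depends", "Imports", "LinkingTo"))
    print(revdeps)

    # Re-run R CMD check over those reverse dependencies against the
    # candidate version of mypkg placed in the "checks" directory.
    tools::check_packages_in_dir("checks",
                                 reverse = list(which = c("Depends", "Imports")))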
Maybe there are some massive footguns I'm not aware of, but Python is mostly oriented around variables rather than pipelines, so it never seems to flow as well as R.
> But the migration had a steep cost: over 6 years later, there are thousands of projects still stuck on an older version.
This is a feature, not a bug. The pinning of versions allows systems to independently maintain their own dependency trees. This is how your Linux distribution actually remains stable (or used to, before the onslaught of "rolling release" distributions, and the infection of the "automatically updating application" into product development culture, which constantly leaves me with non-functional Mobile applications whereupon I am forced to update them once a week). You set the versions, and nothing changes, so you can keep using the same software, and it doesn't break. Until you choose to upgrade it and deal with all the breaking shit.
Every decision in life is a tradeoff. Do you go with no version numbers at all, always updating, always fixing things? Or do you always require version numbers, keeping things stable, but having difficulty updating because of a lack of compatible versions? Or do you find some middle ground? There are pros and cons to all these decisions. There is no one best way, only different ways.
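In the R world, one common middle ground is project-level pinning with the renv package; a minimal sketch of that workflow:

    # Record the exact package versions a project uses in a lockfile,
    # so the analysis keeps running until you choose to upgrade.
    renv::init()       # set up a project-local library
    renv::snapshot()   # pin the current versions into renv.lock
    # ... months later, deliberately:
    renv::update()     # pull newer versions
    renv::restore()    # or roll back to the pinned state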
No, that's the opposite of a monorepo (w/continuous integration). A monorepo w/continuous integration does not maintain any list of dependencies or relationships, by design. Every single commit is one global "version" which represents everything inside the repo. Everything in the repo at that commit, is only guaranteed to work with everything else in the repo in that commit. You use continuous integration (w/quality gates) to ensure this, by not allowing merges which could possibly break anything if merged.
Maintaining a DAG of dependencies is a version pinning strategy, the opposite of the continuous integration version-less method. It is intended for external dependencies that do not exist in the current repository - which is why it's used for multi-repos, not monorepos.
But as I originally pointed out, you can have a monorepo where everything is version-pinned (not using continuous integration). It's just not the usual example.
Each component within the monorepo will declare which other components it depends on. When a change occurs, the CI system figures out which components have changed, and then runs tests/build/etc for those components and all their dependencies. That way, you don't need to build the world every time, you just rebuild the specific parts that might have changed.
I think that specific concept (maintaining a single "world" repository but only rebuilding the parts that have changed in each iteration) is what the author is talking about here. It doesn't have to be done via a monorepo, but it's a very common feature in larger monorepos and I found the analogy helpful here.
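A toy version of that "rebuild only what could have changed" selection, sketched in R with an invented component graph:

    # deps: named list mapping each component to the components it depends on.
    deps <- list(app = c("api", "ui"), ui = "core", api = "core", core = character())

    # Everything that (transitively) depends on a changed component must be
    # rebuilt and retested; everything else can be skipped.
    affected <- function(changed, deps) {
      repeat {
        dependents <- names(Filter(function(d) any(d %in% changed), deps))
        grown <- union(changed, dependents)
        if (setequal(grown, changed)) return(grown)
        changed <- grown
      }
    }

    affected("core", deps)
    #> [1] "core" "ui"   "api"  "app"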
What a monorepo gives you on top of that is that you can change the dependents in the same PR.
I think that the laws of physics dictate that there is. If your developers are spanning the galaxy, the speed of development is slower with continuous development than with pinning deps.
Saying “We don’t know.” feels more wrong to me than “We know.” (emphasis on the periods).
The term clickbait comes to mind.
So rolling releases are like an unstable/testing branch, with more effort put into keeping it from breaking. So you get new software all the time. The downside is, you also don't get to opt-out of an upgrade, which can be pretty painful when the upgrade breaks something you're used to.
The actual trade-off is end-user experience and ease vs. package-developer experience and ease. It is not about updating R or a package; it is about somebody trying to create or run a project without getting into a clash of dependencies, for reasons that can hardly be controlled by either them or the package developer.
I would hope the other aspirational software distribution systems (pip, npm, et al.) ALSO do that, but according to this article, I guess they don't? Not shocked, to be honest.
Say I have software written that runs just fine, but has not been updated to the latest runtime of Python or Node (as per your example). Perhaps a dependency I use has a broken recent version, but the old version I use works fine. You remove the package, now it breaks my software. This would effectively make it so that all libraries / dependencies that are "abandoned" by the author or inactive, would be deleted, which then results in all the software that used them to also break.
Unless I misunderstood something?
This system is unworkable.
When a third party dep is broken or needs a workaround, just include a patch in the build (or fork). Then those patches can be upstreamed asynchronously without slowing down development.
The way a software developer thinks about a package is totally different to the way someone trying to perform statistical analysis thinks about packages.
This is the same for CTAN; the name is no coincidence. The packages are for users, not developers.
The downside is that dependents have to manually change their dependency, and you get a proliferation of packages with informal relationships.
Automated tests, compilation by the package publisher, and enforcement of portability flags and SemVer semantics.
Meanwhile, I read the material and it absolutely feels like a cult - "R is fun" is like something they say to persuade themselves they are not in a cult.
> In the years since, my discomfort has given way to fascination. I’ve come to respect R’s bold choices, its clarity of focus, and the R community’s continued confidence to ‘do their own thing’.
I would love to see a follow-up article about the key insights that the author took away from diving more deeply into R.
I then discovered that there are often bugs with many of the python stats packages. Many python numerical packages also have the reputation of changing how things work "under the hood" from version to version. This means that you can't trust your output after a version change.
Given all of the above, I can see why "serious" data scientists stick with R and this article is just another reason why.
That essentially makes the high level project a monorepo while giving you the option to work on the submodule on its own.
Wait a second. Another package failed your MAJOR version upgrade because you changed your API? Ignoring semver like that is a crazy thing for any package manager to enforce.
I mean, just look at how many projects use “curl and bash” as their distribution method even though the project repositories they could use instead don’t even require anything nearly as onerous as the reverse dependency checks described in this article. If the minimal requirements the current repos have are enough to push projects to alternate distribution, I can’t imagine what would happen if it was added.
Zero wouldn't have been surprising to me, nor would several hundred, but two... what a conveniently actionable number.
It has me wanting to give names to some of my hacks and publish them as packages so that people are more aware when their changes are breaking changes. On the other hand, if I do something weird, I don't necessarily want to burden others with maintaining it. Tradeoffs...
2 more comments available on Hacker News