Not Hacker News!

Home · Hiring · Products · Companies · Discussion · Q&A · Users
AI-observed conversations & context

Daily AI-observed summaries, trends, and audience signals pulled from Hacker News so you can see the conversation before it hits your feed.


Explore

  • Home
  • Hiring
  • Products
  • Companies
  • Discussion
  • Q&A

Resources

  • Visit Hacker News
  • HN API
  • Modal cronjobs
  • Meta Llama

Briefings

Inbox recaps on the loudest debates & under-the-radar launches.

Connect

© 2025 Not Hacker News! — independent Hacker News companion.

Not affiliated with Hacker News or Y Combinator. We simply enrich the public API with analytics.

Posted Oct 22, 2025 at 8:39 AM EDT · Last activity about 1 month ago

SourceFS: A 2h+ Android Build Becomes a 15m Task with a Virtual Filesystem

cdesai
154 points
62 comments

Mood

skeptical

Sentiment

mixed

Category

other

Key topics

Build Systems
Virtual File Systems
Developer Productivity
Debate intensity: 70/100

SourceFS claims to speed up Android builds from 2+ hours to 15 minutes using a virtual filesystem, sparking discussion on its technical merits, potential applications, and commercial viability.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

3h

Peak period

59

Day 1

Avg / period

20.7

Comment distribution: 62 data points

Based on 62 loaded comments

Key moments

  1. Story posted - Oct 22, 2025 at 8:39 AM EDT (about 1 month ago)
  2. First comment - Oct 22, 2025 at 11:36 AM EDT (3h after posting)
  3. Peak activity - 59 comments in Day 1, the hottest window of the conversation
  4. Latest activity - Oct 27, 2025 at 2:31 PM EDT (about 1 month ago)


Discussion (62 comments)
theossuary
about 1 month ago
1 reply
Why tf does an electric vehicle need 500m+ lines of code
jeffbee
about 1 month ago
1 reply
Some people actually write tests.
serbancon
about 1 month ago
1 reply
We actually picked a fairly conservative number - there are even larger automotive codebases today.

For example, Mercedes’ MB.OS: “is powered by more than 650 million lines of code” - see: https://www.linkedin.com/pulse/behind-scenes-mbos-developmen...

menaerus
about 1 month ago
650M LoC is certainly not a single codebase that you can "checkout" and "build". Also, the figure is a little bit hard to believe.
api
about 1 month ago
1 reply
Could you just do the build in /dev/shm?
ongy
about 1 month ago
No. `/dev/shm` would just be a build in `tmpfs`.

Though from what I gather from the story, part of the speedup comes from how Android composes its build stages.

I.e. speeding up by not downloading everything only helps if you don't need everything you download. And it adds up when you download multiple times.

I'm not sure they can actually provide a speedup in a tight developer cycle with a local git checkout and a good build system.

ongy
about 1 month ago
2 replies
While it looks like at least some of the team are ex-googlers, this isn't the srcfs we know from piper (Google internal tools).

Looks like it's similar in some ways. But they also don't tell too much and even the self-hosting variant is "Talk to us" pricing :/

jonnrb
about 1 month ago
1 reply
WDYM this seems very familiar. At commit deadbeef I don't need to materialize the full tree to build some subcomponent of the monorepo. Did I miss something?

And as for pricing... are there really that many people working on O(billion) lines of code that can't afford $TalkToUs? I'd reckon that Linux is the biggest source of hobbyist commits and that checks out on my laptop OK (though I'll admit I don't really do much beyond ./configure && make there...)

ongy
about 1 month ago
Oh yea, this is "srcfs the idea" but not "srcfs the project".

I.e. this isn't something battle-tested by hundreds of thousands of developers 24/7 over the last years, but a simple commercial product sold by people who liked what they used.

Well, since Android is their flagship example, anyone who wants to build custom Android releases for some reason. With the way things are, you don't need billions of lines of your own code to benefit from tools that handle billions of lines of code.

7e
about 1 month ago
3 replies
Google or Meta needs to open source their magic VFSes. Maybe Meta is closest with EdenFS.
mattnewton
about 1 month ago
1 reply
I have thought about this, but also wondered if it would be as magic without the highly paid team of fantastic SRE and maintainers, and the ridiculous amount of disk and compute available to them.
fragmede
about 1 month ago
1 reply
I imagine it would be as magic as blaze vs bazel out in the wild. That is, you still need someone(s) to do a ton of hard work to make it work right, but when it does you do get the magic.
serbancon
about 1 month ago
2 replies
You’re absolutely right - SrcFS and EdenFS were inspirations for SourceFS.

The challenge with those systems is that they’re tightly coupled with the tools, infrastructure, and even developer distros used internally at Google and Meta, which makes them hard to generalize. SourceFS aims to bring that “Piper-like” experience to teams outside Google - but in a way that works with plain Git, Repo, and standard Linux environments.

Also, if I’m not mistaken, neither SrcFS nor EdenFS directly accelerate builds - most of that speed comes from the build systems themselves (Blaze/Buck). SourceFS goes a step further by neatly and simply integrating with the build system and caching/replay pretty much any build step.

The Android example we’ve shown is just one application - it’s a domain we know well and one where the pain is obvious - but we built SourceFS in a way where we can easily integrate with a new build system and speed up other big codebases.

Also you’re spot on that this problem mostly affects big organizations with complex codebases. Here without the infrastructure and SRE support the magic does not work (e.g. think the Redis CVE 10.0 of last week or the AWS downtime of this week) - and hence the “talk to us”.

We plan to gradually share more interesting details about how SourceFS works. If there’s something specific you’d like us to cover - let us know - and help us crowd source our blogpost pipeline :-).

Mattwmaster58
about 1 month ago
1 reply
It's a shame that AI is ruining certain phrases, the "You’re absolutely right" was appropriate but I've been trained reading so many AI responses to roll my eyes at that.
ternus
about 1 month ago
The saving grace was that it was followed by a single hyphen, not an em-dash.
fragmede
about 1 month ago
That “something specific” would be to invent some magic so you can get the advantages of the system without having an entire team to back it up!

Thank you for the thoughtful response!

dijit
about 1 month ago
1 reply
Doesn't perforce have a VFS that works on Windows?

I think it was made by Microsoft; https://github.com/microsoft/p4vfs

blaz0
about 1 month ago
There is also an identically named VFS, this time from the Perforce company itself [1]

[1] https://help.perforce.com/helix-core/server-apps/p4vfs/curre...

kevincox
about 1 month ago
I don't think it is very "magic".

The VFS gets you a few main benefits.

1. You can lazily download and check out data as you need it. Assuming your build system is sufficiently parallelized to hide the latency, this will save a lot of time as long as the data actually accessed is significantly less than the total data (say <80%).

2. You don't need to scan the filesystem to see what has changed. So things like commits, status checks and even builds checking for changes can target just what has actually changed.

Neither of these is particularly complex. The hardest part is integrating with the VCS and build system to take advantage of the change tracking. Git has some support for this (see fsmonitor), but I'm not aware of any build systems that do. (You still get a lot of the benefit without that.)
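kevincox's first benefit - lazy materialization - can be sketched with a toy content-addressed store. Everything here is illustrative (the class and the dict-backed "remote" are stand-ins, not SourceFS internals): the checkout holds only hashes, and bytes are fetched from the store the first time a path is read.

```python
import hashlib

# Toy sketch of lazy checkout: the manifest maps paths to content hashes,
# and blob bytes are only pulled from the (simulated) remote store on the
# first read, then cached locally.

class LazyCheckout:
    def __init__(self, manifest, remote):
        self.manifest = manifest  # path -> content hash
        self.remote = remote      # hash -> bytes (stands in for a CAS server)
        self.local = {}           # blobs materialized so far
        self.fetches = 0

    def read(self, path):
        digest = self.manifest[path]
        if digest not in self.local:
            self.local[digest] = self.remote[digest]  # "network" fetch happens here
            self.fetches += 1
        return self.local[digest]

def sha(data):
    return hashlib.sha256(data).hexdigest()

blob = b"int main() { return 0; }\n"
remote = {sha(blob): blob}
co = LazyCheckout({"src/main.c": sha(blob)}, remote)

co.read("src/main.c")  # first access triggers one fetch
co.read("src/main.c")  # second access is served locally
print(co.fetches)      # -> 1
```

The point of the sketch is only the access pattern: a build that touches 1% of a huge tree pays for 1% of the downloads.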

Ericson2314
about 1 month ago
2 replies
The headline times are a bit ridiculous. Are they trying to turn https://github.com/facebook/sapling/blob/main/eden/fs/docs/O... or some git fuse thing into a product?
zokier
about 1 month ago
4 replies
Well they also claim to be able to cache build steps somehow build-system independently.

> As the build runs, any step that exactly matches a prior record is skipped and the results are automatically reused

> SourceFS delivers the performance gains of modern build systems like Bazel or Buck2 – while also accelerating checkouts – all without requiring any migration.

Which sounds way too good to be true.

fukka42
about 1 month ago
1 reply
Seems viable if you can wrap each build step with a start/stop signal.

At the start, snapshot the filesystem. Record all files read & written during the step.

Then when this step runs again with the same inputs, you can apply the diff from last time.

Some magic to hook into processes and do this automatically seems possible.
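The record-and-replay idea above can be sketched in a few lines: key each step by a digest of its command line plus the contents of the files it read, and on a cache hit hand back the recorded outputs instead of running the tool. All names here are hypothetical; this is the general technique, not SourceFS's implementation.

```python
import hashlib
import json

# step key -> outputs ({path: bytes}) recorded on the step's first run
cache = {}

def step_key(cmd, files):
    """Digest of the command line plus the contents of every input file."""
    h = hashlib.sha256(json.dumps(cmd).encode())
    for path in sorted(files):
        h.update(path.encode())
        h.update(files[path])
    return h.hexdigest()

def run_step(cmd, files, tool):
    """Run a build step, or replay its recorded outputs on a cache hit."""
    key = step_key(cmd, files)
    if key in cache:
        return cache[key], True      # replayed from cache
    outputs = tool(cmd, files)       # actually run the tool
    cache[key] = outputs
    return outputs, False

# A fake "compiler" standing in for a real build step.
fake_cc = lambda cmd, files: {"main.o": b"obj:" + files["main.c"]}
srcs = {"main.c": b"int main(){}"}

_, hit1 = run_step(["cc", "-c", "main.c"], srcs, fake_cc)
_, hit2 = run_step(["cc", "-c", "main.c"], srcs, fake_cc)
print(hit1, hit2)  # -> False True
```

The hard part the thread keeps circling back to is discovering `files` automatically (via the VFS or tracing) rather than declaring it by hand.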

bananaquant
about 1 month ago
I think I got the magic part. You can store all build system binaries in the VFS itself. When any binary gets executed, the VFS can return a small sham binary instead that checks the command-line arguments; if they match, it checks the inputs, and if those match too, it applies the previous output. If there is any mismatch, it executes the original binary as usual and produces the new output. Easy and no process hacking necessary.
vlovich123
about 1 month ago
1 reply
Yeah, I agree. This part is hand waved away without any technical description of how they manage to pull this off since knowing what is even a build step and what dependencies and outputs are are only possible at the process level (to disambiguate multi threaded builds). And then there’s build steps that have side effects which come up a lot with CMake+ninja.
rcxdude
about 1 month ago
2 replies
A fuse filesystem can get information about the thread performing the file access: https://man.openbsd.org/fuse_get_context.3

So they could in principle get a full list of dependencies of each build step. Though I'm not sure how they would skip those steps without having an interposer in the build system to shortcut it.

mook
about 1 month ago
1 reply
Didn't tup do something like that? https://gittup.org/tup/index.html Haven't looked at it in a while, no idea if it got adoption.

But initially the article sounded like it was describing a mix of tup and Microsoft's git vfs (https://github.com/microsoft/VFSForGit) mushed together. But doing that by itself is probably a pile of work already.

serbancon
about 1 month ago
Yes, you are correct - SourceFS also caches and replays build steps in a generic way. It works surprisingly well, to the point where it’s hard to believe until you actually see it in action (here is a short demo video, but it probably isn't the best way to showcase it: https://youtu.be/NwBGY9ZhuWc?t=76 ).

We intentionally kept the blog post light on implementation details - partly to make it accessible to a broader audience, and partly because we will be posting gradually some more details. Sounds like build caching/replay is high on the desired blogpost list - ack :-).

The build-system integration used here was a one-line change in the Android build tree. That said, you’re right - deeper integration with the build system could push the numbers even further, and that’s something we’re actively exploring.

vlovich123
about 1 month ago
Yeah that’s what I meant. I bet you the build must be invoked through a wrapper script that interposes all executables launched within the product tree. Complicated but I think it could work. Skipping steps correctly is the hard part but maybe you do that in terms of knowing somehow the files that will be accessed ahead of time by that processes and then skipping the launch and materializing the output (they also mention they have to run it once in a sandbox to detect the dependencies). But still, side effects in build systems seem difficult to account for correctly; I bet you that’s why it’s a “contact us” kind of product - there’s work needed to make sure it actually works on your project.
CJefferson
about 1 month ago
I used to use a Python program called 'fabricate' which did this. If you track every file a compiler opens, then if the same compiler is run with the same flags, and no input changed, you can just drop a cached copy of the outputs in place.

I'm actually disappointed this type of thing never caught on; it's fairly easy on Linux to track every file a program accesses, so why do I need to write dependency lists?
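The "track every file a program accesses" part is what tools like fabricate get from strace. A minimal sketch of the parsing side (the log below is a hand-written sample; in practice you would capture it with something like `strace -f -e trace=openat <compiler> ...`):

```python
import re

# Matches successful openat(2) calls in strace output and captures the path
# and the returned file descriptor.
OPEN_RE = re.compile(r'openat\([^,]+, "([^"]+)"[^)]*\) = (-?\d+)')

def files_opened(strace_log):
    """Return the set of paths successfully opened, per the strace log."""
    seen = set()
    for line in strace_log.splitlines():
        m = OPEN_RE.search(line)
        if m and int(m.group(2)) >= 0:  # keep only successful opens
            seen.add(m.group(1))
    return seen

log = '''openat(AT_FDCWD, "main.c", O_RDONLY) = 3
openat(AT_FDCWD, "/usr/include/stdio.h", O_RDONLY|O_CLOEXEC) = 4
openat(AT_FDCWD, "missing.h", O_RDONLY) = -1 ENOENT (No such file)'''

print(sorted(files_opened(log)))
# -> ['/usr/include/stdio.h', 'main.c']
```

The set of opened files becomes the step's implicit dependency list, with no hand-written rules.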

MangoToupe
about 1 month ago
You could manage this with a deterministic vm, cf antithesis.
jonnrb
about 1 month ago
It seems like that plus some build output caching?
DuckConference
about 1 month ago
1 reply
Their performance claims are quite a bit ahead of the distributed android build systems that I've used, I'm curious what the secret sauce is.
cogman10
about 1 month ago
1 reply
Is it going to be anything more than just a fancier ccache?
vlovich123
about 1 month ago
2 replies
It's definitely not ccache, as they cover that under "compiler wrappers". This works for Android because a good chunk of the tree is probably dead code for a single build (device drivers and whatnot). It's unclear how they benchmark - they probably include checkout time of the codebase, which artificially inflates the cost of the build (you only check out once). It's a virtual filesystem like what Facebook has open sourced, although they claim to also do build caching without needing a dedicated build system that is aware of this, and that part feels very novel.
refulgentis
about 1 month ago
1 reply
Re: including checkout, it’s extremely unlikely. source: worked on Android for 7 years, 2 hr build time tracks to build time after checkout on 128 core AMD machine; checkout was O(hour), leaving only an hour for build if that was the case.
serbancon
about 1 month ago
1 reply
Obviously this is the best-case, hyper-optimized scenario and we were careful not to inflate the numbers.

The machine running SourceFS was a c4d-standard-16, and if I remember correctly, the results were very similar on an equivalent 8-vCPU setup.

As mentioned in the blog post, the results were 51 seconds for a full Android 16 checkout (repo init + repo sync) and ~15 minutes for a clean build (make) of the same codebase. Note that this run was mostly replay - over 99% of the build steps were served from cache.

vlovich123
about 1 month ago
Do you have any technical blog post how the filesystem is intercepting and caching build steps? This seems like a non-obvious development. The blog alludes to a sandbox step which I’m assuming is for establishing the graph somehow but it’s not obvious to understand where the pitfalls are (eg what if I install some system library - does this interception recognize when system libraries or tools have changed, what if the build description changes slightly, how does the invalidation work etc). Basically, it’s a bold claim to be able to deliver Blaze-like features without requiring any changes to the build system.
ndesaulniers
about 1 month ago
> This works for Android because a good chunk of the tree is probably dead code for a single build (device drivers and whatnot)

Device drivers would exist in kernel sources, not the AOSP tree.

bityard
about 1 month ago
1 reply
Meh, content marketing for a commercial biz. There are no interesting technical details here.

I was a build engineer in a previous life. Not for Android apps, but some of the low-effort, high-value tricks I used involved:

* Do your building in a tmpfs if you have the spare RAM and your build (or parts of it) can fit there.

* Don't copy around large files if you can use symlinks, hardlinks, or reflinks instead.

* If you don't care about crash resiliency during the build phase (and you normally should not, each build should be done in a brand-new pristine reproducible environment that can be thrown away), save useless I/O via libeatmydata and similar tools.

* Cross-compilers are much faster than emulation for a native compiler, but there is a greater chance of missing some crucial piece of configuration and silently ending up with a broken artifact. Choose wisely.

The high-value high-effort parts are ruthlessly optimizing your build system and caching intermediate build artifacts that rarely change.

7e
about 1 month ago
2 replies
That’s all basic stuff, and none of it solves what this product claims to.
serbancon
about 1 month ago
We hear you on the “we want more technical blogs” part - they’ll be coming once we get a breather. We kept this first post high-level to reach a broader audience. Thanks for reading!
bityard
about 1 month ago
Not as basic as you seem to think.

I was brought into a team of 70-ish engineers working across 4-5 products. Big enterprise products written by very bright programmers. But build systems and infrastructure were not their core competency. Their flagship application took 6 hours to build when I was hired. I got it down to 30-45 minutes using a combination of the techniques above and a revamped build infrastructure. When I finally left that position, the build was much more modular, so you could rebuild a small part of it and glue it into a bunch of existing artifacts and have a final product in just a few minutes.

vzaliva
about 1 month ago
2 replies
It sounds from the page that it is Android-source-code specific. Why? Could this work with any source code base?
rs186
about 1 month ago
1 reply
I think the page itself answers your question pretty well.
serbancon
about 1 month ago
I posted a longer answer to a similar question above, if you're interested. Thanks!
everlier
about 1 month ago
If my understanding is correct, this only makes sense for codebases that do not fit in the memory of the largest build box an organisation can run.
forrestthewoods
about 1 month ago
1 reply
The world desperately needs a good open source VFS that supports Windows, macOS, and Linux. Waaaaay too many companies have independently reinvented this wheel. Someone just needs to do it once, open source it, and then we can all move on.
7e
about 1 month ago
1 reply
This. Such a product also solves some AI problems by letting you version very large amounts of training data in a VCS like git, which can then be farmed out for distributed unit testing.
forrestthewoods
about 1 month ago
HuggingFace bought XetHub which is really cool. It’s built for massive blobs of weight data. So it’s not a general purpose VCS VFS. The world still needs the latter.

I'd be pretty happy if Git died and was replaced with a full Sapling implementation. Git is awful so that'd be great. Sigh.

_1tan
about 1 month ago
1 reply
I want this but self hosted/integrated into our CI (Gitlab in our case).
serbancon
about 1 month ago
Please fill in this form: https://www.source.dev/demo . We’re prioritizing cloud deployments but are keen to hear about your use case and see what we can do.
jeffrallen
about 1 month ago
1 reply
Tldr: your build system is so f'd that you have gigs of unused source and hundreds of repeated executions of the same build step. They can fix that. Or, you could, I dunno, fix your build?
jayd16
about 1 month ago
2 replies
You could just have a mono-repo with a large amount of assets that aren't always relevant to pull.

Incremental builds and diff only pulls are not enough in a modern workflow. You either need to keep a fleet of warm builders or you need to store and sync the previous build state to fresh machines.

Games and I'm sure many other types of apps fall into this category of long builds, large assets, and lots of intermediate build files. You don't even need multiple apps in a repo to hit this problem. There's no simple off the shelf solution.

Dylan16807
about 1 month ago
1 reply
For a large amount of sometimes-relevant assets, is mapping them over NFS a bad solution? SourceFS also gets them across the network on demand, right?

And a fleet of warm builders seems pretty reasonable at that scale.

SourceFS sounds useful for extra smart caching but some of these problems do sound like they're just bad fixable configuration.

jayd16
about 1 month ago
> And a fleet of warm builders seems pretty reasonable at that scale.

It's actually pretty hard. The more builders you have the older the workspace gets and scaling up or cycling machines causes the next builds to be super slow. Game engines end up making central intermediate asset caches like Unreal's UBA or Unity's Cache Server.

blaz0
about 1 month ago
Yes, games are a common case where your repo can be very large but building your code only requires access to a small subset of it.

For example, the League of Legends source repo is millions of files and hundreds of GB in size, because it includes things like game assets, vendored compiler toolchains for all of our target platforms, etc. But to compile the game code, only about 15,000 files and 600MB of data are needed from the repo.

That means 99% of the repo is not needed at all for building the code, and that is why we are seeing a lot of success using VFS-based tech like the one described in this blog. In this case, we built our own virtual filesystem for source code based on our existing content-defined patching tech (which we wrote about a while ago [1]). It's similar to Meta's EdenFS in that we built it on top of the ProjFS API on Windows and NFSv3 on macOS and Linux. We can mount a view into the multimillion-file repo in 3 seconds, and file data (which is compressed and deduplicated and served through a CDN) is downloaded transparently when a process requests it. We use a normal caching build system to actually run the build, in our case FASTBuild.

I recently timed it, and I can go from having nothing at all on disk to having locally built versions of the League of Legends game client and server in 20 seconds on a 32-core machine. This is with 100% cache hits, similar to the build timings mentioned in the article.

[1] https://technology.riotgames.com/news/supercharging-data-del...

serbancon
about 1 month ago
1 reply
Hey everyone. I’m Serban, co-founder of Source.dev. Thanks for the upvotes and thoughtful discussion. I’ll reply to as many comments as I can. Nothing means more to an early-stage team than seeing we’re building something people truly value - thanks from all of us at Source.dev!
CJefferson
about 1 month ago
While I’m sure it’s much more advanced, out of interest is this similar to the Python tool ‘fabricate’, which would use strace to track all files a program read, and wrote?
sudahtigabulan
about 1 month ago
This reminds me of ClearCase and its MVFS.

Builds were audited by somehow intercepting things like open(2) and getenv(3) invoked by a compiler or similar tool, and each produced object had an associated record listing the full path to the tool that produced it, its accurate dependencies (exact versions), and environment variables that were actually used. Anything that could affect the reproducibility was captured.

If an object was about to be built with the exact same circumstances as those in an existing record, the old object was reused, or "winked-in", as they called it.

It also provided versioning at filesystem level, so one could write something like file.c@@/trunk/branch/subbranch/3 and use it with any program without having to run a VCS client. The version part of the "filename" was seen as regular subdirectories, so you could autocomplete it even with ancient shells (I used it on Solaris).

ctoth
about 1 month ago
Once builds are "fast enough," there's no business case for the painful work of making the codebase comprehensible.

We're going to 1 billion LoC codebases and there's nothing stopping us!

yencabulator
about 1 month ago
Vagueposts from the marketing department are not appreciated.
MarsIronPI
about 1 month ago
> Fast builds are what truly makes a difference to developer productivity. With SourceFS builds complete over 9x faster on a regular developer machine. This sets a new standard as it enables developers to get their sword fighting time back and speeds-up the lengthy feedback loop on CI pipelines.

Objection! Long build times are better for sword-fighting time. The longer it takes, the more sword-fighting we have time for!

View full discussion on Hacker News
ID: 45668160 · Type: story · Last synced: 11/20/2025, 5:30:06 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Read Article · View on HN