Build Files Are the Best Tool to Represent Software Architecture
Posted 3 months ago · Active 3 months ago
blogsystem5.substack.com · Tech · story
Sentiment: heated · mixed
Debate: 80/100
Key topics
Software Architecture
Build Tools
Bazel
The article argues that build files are the best way to represent software architecture, but the discussion is divided on whether this is a useful or redundant practice.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 4d after posting
Peak period: 37 comments (84-96h)
Avg / period: 8.6
Comment distribution: 43 data points (based on 43 loaded comments)
Key moments
- 01 Story posted: Oct 3, 2025 at 5:04 PM EDT (3 months ago)
- 02 First comment: Oct 7, 2025 at 7:11 AM EDT (4d after posting)
- 03 Peak activity: 37 comments in the 84-96h window (hottest stretch of the conversation)
- 04 Latest activity: Oct 11, 2025 at 2:24 PM EDT (3 months ago)
ID: 45467751 · Type: story · Last synced: 11/20/2025, 6:45:47 PM
https://docs.astral.sh/ruff/rules/unused-import/
As I already mentioned in another comment, you have:
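The referenced layout is presumably something along these lines (a reconstruction from the file names used below, so take the exact structure as an illustration only):

```
module1/
    foobar.py   # the file being edited or reviewed
    baz.py      # already does: from module2 import something
module2/
    ...
```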
Now you are editing or reviewing a change to foobar.py. How can you tell whether depending on module2 is conceptually OK? You need to look at baz.py to prove to yourself that the dependency already exists, or you need to know a priori that it's an acceptable thing to do.

The problem appears when: 1. you have a significant number of people working on the project, and 2. one of these "modules" has become so big internally that it's hard to make sense of its structure. Having a monorepo makes the problem even more likely. At that point, you'll probably want to start breaking that gnarly module into pieces so that you can see its structure again, right?
If anything, this highlights the failure of languages to solve this themselves. I'm looking at you, C++.
It's no surprise Bazel is a hard sell for Rust, Go, Node, etc. because for those languages/ecosystems Bazel BUILD files are not the best tool to represent software architecture.
And the whole point of the article is to say that import statements actually do _not_ solve this issue, because import statements are at the file level, not at the module level (whatever module means in your mind).
In any case, as I mentioned in the article in passing, other languages _do_ provide features similar to Bazel's build files, and I explicitly called out Rust as one of them. When you define crates and express dependencies via Cargo, you are _essentially doing the same thing_ as what I was describing in the article. Same with Go, if you break your code apart into multiple modules.
But then we all know that there are some huge repos out there that are just "one module" and you can't make anything out of their internal structure. Hence you start breaking them apart into crates, Go modules, NPM packages, you name it... or, you know, add Bazel and build files. They are the same tool -- and that's why I didn't write Bazel in the title, because I meant "build files" more generically. I guess I needed to be clearer there.
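As a minimal sketch of what that looks like on the Bazel side (package and target names here are invented for illustration), the module-level dependency edges live in the BUILD file rather than having to be rediscovered from per-file imports:

```python
# module1/BUILD -- hypothetical package layout
load("@rules_python//python:defs.bzl", "py_library")

py_library(
    name = "module1",
    srcs = glob(["*.py"]),
    deps = [
        "//module2",  # the module1 -> module2 edge is declared once, here
    ],
)
```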
We already have the tools to enforce these things in many mainstream languages.
Breaking things apart into crates/modules certainly makes sense sometimes, but other times it does not? If you have a monorepo, do you really need multiple modules? And if you don't, does that mean your architecture is difficult to understand? I don't think that tracks at all, so I don't really agree with where you're headed.
> But then we all know that there are some huge repos out there that are just "one module" and you can't make anything out of their internal structure.
There's always some shitty code out there, sure. But I don't like the suggestion that "one module" can't be coherent. It's orthogonal to the architecture. Not everything needs to be made generic and reusable.
> And the whole point of the article is to say that import statements actually do _not_ solve this issue, because import statements are at the file level, not at the module level (whatever module means in your mind).
This is not true for Go, for example. Import statements absolutely do solve this problem in Go. I rarely ever need to look at module files, which are in some ways a byproduct of the import statements.
Go imports still work at the Go package level. If you have multiple .go source files in one package, you have the exact same issue I described for Java.
If I'm editing / reviewing a change to pkg1/foo.go, I cannot tell that pkg1 _already_ depends on pkg3. Can I?

At work, go list was too slow and depended on a git checkout, so we wrote our own import graph parser using the Go standard library parser; it operates on byte slices of the files we read directly from git. It's speed-of-light fast, and we can compute Go import graphs in parallel from multiple commits to determine what has changed in the graph, so we can reduce the scope of what is tested.
I'm not going to say it can be avoided in all cases, but modularity, team structure, and architecture (both system and organisational) can avoid this in a lot of cases.
What has changed in the past ~15 years? Many libraries and plugins have their own compilers nowadays. This increases the difficulty of successfully integrating with Bazel. Even projects that feel like they should be able to properly integrate Bazel (like Kubernetes) have removed it from the project as a nuisance.
Back when it was first designed, even compiling code within the same language could be a struggle; I remember going through many iterations of DLL hell back when I was a C++ programmer. This was the "it works on my machine" era. Bazel was nice because you could just say "Download this version of this thing, and give me a BUILD file path where I can reference it." Sometimes you needed to write some Starlark, but mostly not.
But now, many projects have grown in scale and complexity and want to have their own automated passes. Just as C++ libraries needed special library wrappers for autotools within Bazel, now you often need to write multiple library compiler/automation wrappers yourself in any context. And then you'll find that Bazel's assumptions don't match the underlying code's.

For example, my work's Go codebase compiles just fine with a standard Go compiler, but gazelle pukes because (IIRC) one of our third-party codegen tools outputs files with multiple packages to the same directory. When Etsy moved its Java codebase to Bazel, they needed to do some heavy refactoring because Bazel identified dependency loops and refused to compile the project, even though it worked just fine with javac. You can always push up your monocle and derisively say "you shouldn't have multiple packages per directory! you shouldn't have dependency loops!", but you should also have a build system that can compile your code just like the underlying language's tooling does, without needing to influence the code at all.
That's why most engineers just need command runners. All of these languages and libraries are already designed to successfully run in their own contexts. You just need something to kick off the build with repeatable arguments across machines.
It also helps in a monorepo to control access to packages. Bazel makes it so you can't import packages that aren't visible to your package.
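A sketch of how that looks (package and target names here are made up): the target itself declares who may depend on it, and Bazel rejects any other edge at build time.

```python
# libs/db/BUILD -- hypothetical paths
load("@rules_python//python:defs.bzl", "py_library")

py_library(
    name = "db",
    srcs = ["client.py"],
    # Only the storage service may depend on this target; any other package
    # that adds it to its deps fails the build with a visibility error.
    visibility = ["//services/storage:__subpackages__"],
)
```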
I've always thought of "architecture" as a high-level description of run-time behavior, not a set of compile-time dependency constraints.
But as I already said in two other comments in this discussion, ProjectReference would be equivalent to what I'm describing in the article, just using language-specific tooling. If you are breaking your solution into various projects and keeping them separate with cross-references among them, you are doing exactly what I was describing already.
so the thing is that a BUILD file doesn't define the build graph, it approximates it -- the build graph is always defined by language-specific tooling and specifications
it's fine that the BUILD file is an approximation! that's as good as you can do, if you want to try to model dep relationships between heterogeneous languages
so when we're talking about the dep graph, "using language-specific tooling" isn't a detail you can brush aside, it's a core requirement for correctness, really
And it's supposed to show 'architecture'?
Wow. I am happy that I never started working with Java. That is terrible.
When you do this in the wrong order, you end up with very poorly laid out concepts from a code organization standpoint, which is why vagaries like this needed to be written:
* https://google.github.io/styleguide/go/best-practices.html#p...
* https://matttproud.com/blog/posts/go-package-centricity.html
In languages that operate on a flat namespace of compilable units (e.g., C++ or Java), build target sizing and grouping in Bazel (and its relatives) largely doesn't matter (from the perspective of namespace naming, findability, and ergonomics). But the moment Bazel starts interfacing with a language that has strict organization and namespacing concepts, this can get rather hairy. The flat namespace practice with Bazel has (IMO) led to code organization brain-rot:
> Oh, I created another small feature; here, let me place it in another (microscopic) build target (without thinking about how my users will access the symbols, locate the namespace, or have an easy way of finding it).
— — —
Note: The above is not a critique of Bazel and such. More of a meta-comment on common mispractices I have seen in the wild. The build system can be very powerful for certain types of things (e.g., FFI dependency preparation and using Aspects as a form of meta-building and -programming).
And that's exactly what I was arguing against in the article! I've seen this happen a few times already (in Java and TypeScript specifically) where Bazel's fine-grained target definitions are pushed as "best practice" and everybody ends up hating the results, for good reasons.
There are _different_ ways in which one can organize the Bazel build rules, going against those "best practices" (like the 1:1:1 rule for Java), and I think you can end up with something that better maps to first principles / what native build tooling does.
What are some of those good reasons (assuming they differ from GP's)?
I don't have much experience with Bazel aside from setting up a simple local workspace and following the tutorial.
* The build files are unreadable. If targets don't mean anything to a human, updates to build files become pure toil (and that's when devs ask for build files to be auto-generated from source).
* IDE integrations (particularly via the IntelliJ Bazel plugin) become slower because generating metadata for those targets takes time.
* Binary debugging is slower because the C/C++ rules generate one intermediate .so file per target and GDB/LLDB take a long time to load those dependencies vs. a smaller set of deps.
* Certain Java operations can be slower. In the case of Java, the rules generate one intermediate JAR file per target, which has a direct impact on CLASSPATH length and that may matter when you do introspection. This tends to matter for tests (not so much for prod where you use a deploy JAR which collapses all intermediate JARs into just one).
My intuition was wrong, my naive understanding was that:
* Non-human intermediate targets would either be namespaced and available only in that namespace, or could be marked as hidden, and not clutter auto-completion
* IDE integrations would benefit, since they only have to deal with Bazel and not Bazel + cargo/go/Makefile/CMake/etc
* I thought C/C++ rules would generate .o files, and only the final cc_shared_library would produce an .so file
* Similar for .jar files
I guess my ideal build system has yet to be built. :(
This is actually possible, but you need the new JetBrains-owned Bazel plugin _and_ you need to leverage visibility rules. The latter are unique to Bazel (none of the other language-specific package managers I've touched upon in these replies offers them) and are somehow even harder to explain to people... because they only start making sense once you pass a certain codebase size / complexity.
> * I thought C/C++ rules would generate .o files, and only the final cc_shared_library would produce an .so file
> * Similar for .jar files
These are possible too! Modern Bazel has finally pushed all language-specific logic out of the core and into Starlark rules (and Buck2 has been doing this from the ground up). There is nothing preventing you from crafting your own build rules that behave in these specific ways.
In any case... as for dynamic libraries per target, I do not think what I described earlier is the default behavior in Bazel (we explicitly enable dynamic libraries to make remote caching more efficient), so maybe you can get what you want already by being careful with cc_shared_library and/or being careful about tagging individual cc_libraries as static/dynamic.
For Java, I've been tempted to write custom rules that do _not_ generate intermediate JARs at all. It's quite a bit of work though, so I haven't, but it could be done. BTW, I'll actually be describing this problem in a BazelCon 2025 lightning talk :)
are there really people saying that "giving each package its own directory" is in any way optional?? it is literally part of the language spec, what on earth would make anyone think otherwise??
edit: ok so bazel folks are just on a completely alternative timeline it seems
https://github.com/bazel-contrib/rules_go?tab=readme-ov-file...
so bazel doesn't support go, gotcha

All of this makes it paramount for developers of Go tools to use a first-party package-loading library like the packages package (https://pkg.go.dev/golang.org/x/tools/go/packages), which can ameliorate this problem through the GOPACKAGESDRIVER environment variable, used to support alternative build systems and import path layouts (the worst thing someone can do is attempt to reverse-engineer how package loading works rather than delegating it to a library like this).
this is wild, the code literally defines dependency relationships at the mechanical level, it's the source of truth
of course you can't derive dep graphs from source code via simple file-based grep analysis, but that's obvious? you need to use language-specific semantically-aware tools to do that. if the author believes otherwise that's their mistake
Preventing your backend's web-route handler functions from directly instantiating a database client, forcing the code to instead access the db via a logic or service layer, preserves separation of concerns as the codebase grows. It's obvious to human software engineers and usually is institutional knowledge, and everyone hates having to tell someone in a code review that they broke a rule that "everyone" knows.
Instead, use tooling to enforce these separations. That lets both new employees and agents autonomously work without making a mess and without other humans “in the loop” informing them not to break rules that aren’t written down anywhere - because when you make tools, now the rules are written down.
LLMs can quickly create scripts, in your language of choice, that walk the AST of your code before you commit. They can check for violations of rules as arbitrary as you like. Put that little script in your codebase, run it on every CI run, and it'll keep paying dividends.
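A minimal sketch of such a check in Python (the `app/handlers` directory and `app.db` module are placeholders for whatever your layers are actually called):

```python
#!/usr/bin/env python3
"""Architecture lint: route handlers must not import the DB client directly."""
import ast
import pathlib
import sys

HANDLERS_DIR = pathlib.Path("app/handlers")  # layer that must stay DB-free (placeholder)
FORBIDDEN_PREFIX = "app.db"                   # layer it must reach only via a service (placeholder)

violations = []
for path in HANDLERS_DIR.rglob("*.py"):
    tree = ast.parse(path.read_text(), filename=str(path))
    for node in ast.walk(tree):
        # Collect imported module names from both `import x` and `from x import y`.
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if name == FORBIDDEN_PREFIX or name.startswith(FORBIDDEN_PREFIX + "."):
                violations.append(f"{path}:{node.lineno}: imports {name}")

if violations:
    print("Architecture check failed: handlers must use the service layer, not the DB client:")
    print("\n".join(violations))
    sys.exit(1)
```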
It’s like linting for architecture.