Reproducible C++ Builds by Logging Git Hashes
Posted about 2 months ago · Active about 1 month ago
jgarby.uk · Tech · story
Key topics: C++, Build Systems, Reproducibility, Git
The author shares a simple approach to achieving reproducible C++ builds by logging Git hashes, sparking a brief discussion on the implementation.
Snapshot generated from the HN discussion
Discussion Activity
- Peak period: 17 comments in the 120-132h window
- Average per period: 9.7 comments
- Comment distribution: 29 loaded comments
Key moments
- Story posted: Nov 14, 2025 at 4:45 AM EST (about 2 months ago)
- First comment: Nov 14, 2025 at 4:45 AM EST (0s after posting)
- Peak activity: 17 comments in the 120-132h window (hottest period of the conversation)
- Latest activity: Nov 19, 2025 at 7:43 PM EST (about 1 month ago)
ID: 45925394 · Type: story · Last synced: 11/20/2025, 2:12:10 PM
I absolutely can't imagine not using some kind of tool like this. Feels as vital as VCS to me now.
I'll also say I have absolutely 0 regrets about moving from Nix to Mise. All the common tools we want are available, it's especially easy to install tools from pip or npm and have the environments automanaged. The docs are infinity times better. And the speed of install and shell sourcing is, you guessed it, much better. Initial setup and install is also fantastically easier. I understand the ideology behind Nix, and if I were working on projects where some of our tools weren't pre-packageable or had weird conflicting runtime lib problems I'd get it, but basically everything these days has prebuilt static binaries available.
If the repo isn't dirty, then the hash you get excludes that part:
If you're using lightweight tags (the default) and not annotated tags (with messages, signatures, etc.), you may want to add `--tags`, because otherwise it will skip over any lightweight tags.

The other nice thing about this is that, if the repo is not `-dirty`, you can use the output from `git describe` in other git commands to reference that commit:
Also, if you don't feel ready to commit to tagging your repository, you can start with the `--always` flag, which falls back to just the short commit hash.
The article's script isn't far from `git describe --always --dirty`, which can be a good place to start, and then it gets better as you start tagging.
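To make the flow concrete, here's a minimal sketch of the pattern being discussed (the header name, constant, and example output are illustrative, not the article's actual script): the build runs `git describe --always --dirty --tags`, writes the result into a tiny generated header, and the program includes it.

    // version.h -- regenerated by the build from the output of:
    //   git describe --always --dirty --tags
    // (illustrative sketch, not the article's actual script)
    #pragma once
    inline constexpr char kGitDescribe[] = "v1.4-3-g77c99b7-dirty";

    // main.cpp
    #include <cstdio>
    #include "version.h"

    int main() {
        std::printf("build: %s\n", kGitDescribe);  // e.g. printed for --version
        return 0;
    }

The `TAG-N-gHASH[-dirty]` shape of the describe output is what makes it useful later: it names the nearest tag, how many commits you are past it, the exact commit, and whether the tree was modified.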
had to be said
(FYI, I just used something like the solution from the article, with the hash embedded in the binary image to be burned to ROM masks. Gaps such as toolchain versioning and not building from dirty checkouts can be managed with self-discipline / internal checks.)
You need to control every single library and header version you are using outside your source (stdlibs, OS headers, third-party code), and you need a strategy for dealing with random/datetime values that can end up in the binary.
Eliminating nondeterminism from your builds might require some thinking; there are a number of places it can creep in (timestamps, random numbers, nondeterministic execution, ...). A good package manager can at least give you tooling to validate that you have eliminated nondeterminism (e.g. `guix build --check ...`).
Once you control the entire environment and your build is reproducible in principle, you might still encounter some fun issues, like "time traps". Guix has a great blog post about some of these issues and how they mitigate them: https://guix.gnu.org/en/blog/2024/adventures-on-the-quest-fo...
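As a concrete instance of the timestamp problem mentioned above, `__DATE__`/`__TIME__` are a classic way builds become irreproducible; a common mitigation is to have the build pass a fixed value instead (the macro name and date command below are illustrative), and recent GCC will also honor the `SOURCE_DATE_EPOCH` environment variable when expanding these macros.

    // Classic irreproducibility: the compiler expands these to the wall-clock
    // time of the build, so two otherwise identical builds differ byte-for-byte.
    const char* build_stamp = __DATE__ " " __TIME__;

    // One mitigation (sketch): have the build pass a fixed stamp derived from
    // SOURCE_DATE_EPOCH or the last commit date, e.g. with GNU date:
    //   -DBUILD_STAMP="\"$(date -u -d @${SOURCE_DATE_EPOCH} +%Y-%m-%dT%H:%M:%SZ)\""
    #ifndef BUILD_STAMP
    #define BUILD_STAMP "unknown"   // fallback when the build doesn't provide one
    #endif
    const char* reproducible_stamp = BUILD_STAMP;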
Here's a talk from 2024: https://debconf24.debconf.org/talks/18-reproducible-builds-t...
Several distros are above the 90% mark of all packages being byte-for-byte reproducible, and one or two have hit the 99% mark.
I do this git tags thing with my projects - it helps immensely if the end user can hover over the company logo and get a tooltip with the current version, git tag and hash, and any other information relevant to the build.
Then, if I need to triage something specific, I un-archive the virtualized build environment, and everything that was there in the original build is still there.
This is a very handy method for keeping large code bases under control, and has been very effective over the years in going back to triage new bugs found, fixing them, and so on.
Guix' full-source bootstrap is pretty enlightening on that topic: https://guix.gnu.org/manual/devel/en/html_node/Full_002dSour...
Just use ClearCase/ClearMake; it's been doing all of this software configuration auditing stuff for you since the 1990s.
Git hashes or tags can help identify what was built: the inputs.
You only need to know that for traceability: when you hold the released outputs, but do not hold (or are not sure you hold) the matching inputs.
If builds are reproducible, the traceability becomes more meaningful.
In the TXR project, I have a ./configure option called --build-id. This sets an ID that is appended to the version string embedded in the executable. It is empty by default and not used. It is meant to be useful for people who interact with the code, so they can check what they are running (things can get confusing when you are going back and forth among versions, or making local changes).
If you set the build ID to the word "git", then it is calculated using:
That's probably what this author should be using. It gives you a meaningful ID that is related to the most recent release tag, and tells you whether the repo was dirty. We are (sadly, only) 20 commits after 302, at a commit whose short hash is 77c99b74e, and the repo is in a modified state.

I have it rigged in the Makefile so that it keeps track of the most recent build ID in a little .build_id file. If the build ID changes relative to what is in that file, the Makefile forces a rebuild of the .o files which incorporate the build ID.
Also, there is no need to be generating dynamic #include material just for this. A simple -Dsymbol=var option in the CFLAGS will define a preprocessor symbol:
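(A sketch of that approach; the symbol name and quoting below are illustrative, not the commenter's original snippet.)

    // version.cpp -- build with the ID injected on the command line, e.g.:
    //   g++ -DBUILD_ID="\"$(git describe --tags --dirty)\"" -c version.cpp
    #ifndef BUILD_ID
    #define BUILD_ID ""              // default: empty build ID
    #endif

    extern const char* const build_id = BUILD_ID;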
It's addressing a distinct problem from "if we rebuild any given version, perhaps some later time, do we even get the same binary?" which is what people usually mean by "reproducible builds".
Your tip that injecting build ids can be done with linker flags without needing to generate header files is a great one.
Passing version info without code generation via linker flags can also be done in other languages and toolchains; e.g. with Go projects, the Go linker exposes an `-X` flag that can be used to set the value of a string variable in a package [1] [2].
A step beyond this could be to explicitly build a feature into your software to help the user report bugs or request support: e.g. the user clicks a button and the software dumps its own version info, plus info about what the user is doing and their machine, packages it up, and sends it to your support queue (see the sketch below). It doesn't make much sense for backend services, but you do see support features like this in PC games, to help users easily send high-quality bug reports.
[1] https://pkg.go.dev/cmd/link
[2] https://www.digitalocean.com/community/tutorials/using-ldfla...
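A minimal sketch of what such a report payload could look like (every name here is hypothetical; the version macro is assumed to be injected by the build as discussed above):

    #include <sstream>
    #include <string>

    #ifndef GIT_VERSION
    #define GIT_VERSION "unknown"    // assumed to come from the build, e.g. -DGIT_VERSION=...
    #endif

    // Hypothetical "report a bug" helper: bundles version/platform info with the
    // user's description so support gets an exact build identifier for free.
    std::string make_support_report(const std::string& user_description) {
        std::ostringstream out;
        out << "version: " << GIT_VERSION << '\n';
        out << "platform: "
    #if defined(_WIN32)
            << "windows"
    #elif defined(__APPLE__)
            << "macos"
    #else
            << "linux/other"
    #endif
            << '\n';
        out << "description: " << user_description << '\n';
        return out.str();   // a real app would send this to the support queue
    }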
Which golfs to "traceable" != "reproducible"
Someday, Go programs won't have to do this: https://github.com/golang/go/issues/50603
https://github.com/xrootd/xrootd/blob/master/cmake/XRootDVer...
and also the genversion.sh script at the top of the repo.
I use these plus #cmakedefine and git tags to manage the project version without having to do it via commits.
https://nikhilism.com/post/2020/windows-deterministic-builds... is a good resource on some of the other steps needed. It's... a non-trivial journey :)
12 more comments available on Hacker News