Rustgpt: a Pure-Rust Transformer LLM Built From Scratch
Posted 4 months ago · Active 4 months ago
github.com · Tech story · High profile
Key topics
Rust
Large Language Models
Transformer
Machine Learning
The author has built a pure-Rust transformer LLM from scratch, sparking excitement and discussion around its simplicity, performance, and potential applications.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion. First comment: 20m after posting. Peak period: 146 comments in 0-12h. Average per period: 40.
Based on 160 loaded comments.
Key moments
- 01 Story posted: Sep 15, 2025 at 5:47 AM EDT (4 months ago)
- 02 First comment: Sep 15, 2025 at 6:07 AM EDT (20m after posting)
- 03 Peak activity: 146 comments in 0-12h, the hottest window of the conversation
- 04 Latest activity: Sep 21, 2025 at 3:01 PM EDT (4 months ago)
ID: 45247890 · Type: story · Last synced: 11/20/2025, 7:55:16 PM
Looking good!
yep, still looks relatively good.
I'm over-simplifying a few things here:
1. Semver gives special treatment to 0.x versions. For these crates the minor version behaves like the major version and the patch version behaves like the minor version. So technically you could have v0.1 and v0.2 of a crate in the same crate graph.
2. I'm assuming all dependencies are specified "the default way", ie. as just a number. When a dependency looks like "1.3", cargo actually treats this as "^1.3", ie. the version must be at least 1.3, but can be any semver compatible version (eg. 1.4). When you specify an exact dependency like "=1.3" instead, the rules above still apply (you still can't have 1.3 and 1.4 in the same crate graph) but cargo will error if no version can be found that satisfies all constraints, instead of just picking a version that's compatible with all dependents.
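As a concrete sketch of those two cases (the dependency names are placeholders, not from any real project):

```toml
[dependencies]
# Default requirement: treated as "^1.3", i.e. >=1.3.0 and <2.0.0,
# so cargo is free to select 1.4.x if that is the newest compatible release.
flexible-dep = "1.3"

# Exact requirement: pins this dependent to 1.3.0 specifically; cargo errors
# if no single version satisfies every constraint in the graph.
pinned-dep = "=1.3.0"
```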
What if devs don't do a good job of versioning and there is a real incompatibility between 0.9.3 and 0.9.4? Surely there's some way to actually require an exact version?
> So there's no difference at all between "0", "0.9" and "0.9.3" in cargo.toml
No, there is a difference, in particular, they all specify different minimum bounds.
The trick is that these are using the ^ operator to match, which means that the version "0.9.3" will satisfy all of those constraints, and so Cargo will select 0.9.3 (the latest version at the time I write this comment) as the one version to satisfy all of them.
Cargo will only select multiple versions when they're not compatible, that is, if you had something like "1.0.0" and "0.9.0".
> Surely there's some way to actually require an exact version?
Yes, you'd have to use `=`, like `=0.9.3`. This is heavily discouraged because it would lead to a proliferation of duplicated dependency versions, which isn't necessary unless you need some specific bugfix. This is sometimes done in applications, but should basically never be done in libraries.
Semver specifies versions. These are the x.y.z (plus other optional stuff) triples you see. Nothing should be complicated there.
Tools that use semver to select versions also define syntax for specifying which versions are acceptable. npm calls these "ranges", cargo calls them "version requirements"; I forget what other tools call them. These are what you actually write in your Cargo.toml or equivalent. They are not defined by the semver specification, but instead by the tools. They are mostly identical across tools, but not always. Anyway, they often use operators to define the ranges (that's the name I'm going to use in this post because I think it makes the most sense). So for example, ">=3.0.0" means "any version where x >= 3." "=3.0.0" means "any version where x is 3, y is 0, and z is 0", which 99% of the time means only one version.
When you write "0.9.3" in a Cargo.toml, you're writing a range, not a version. When you do not specify an operator, Cargo treats that as if you used the ^ operator, so "0.9.3" is equivalent to "^0.9.3". What does ^ do? It means one of two things, depending on whether x is zero or nonzero. Since "^0.9.3" has x of zero, this range means "any version where x is 0, y is 9, and z is >= 3." Likewise, "0.9" is equivalent to "^0.9.0", which is "any version where x is 0, y is 9, and z is >= 0."
Putting these two together:
Given that 0.9.3 is a version that has been released, if one package depends on "0.9" and another depends on "0.9.3", version 0.9.3 satisfies both constraints, and so is selected. If we had "0.8" and "0.7.1", no version could satisfy both simultaneously, as "y must be 8" and "y must be 7" would conflict. Cargo would give you both versions in this case: whichever y=8 and y=7 versions have the highest z at the time.
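The caret rules described above can be sketched in a few lines of Rust. This is a simplified model (it ignores pre-release tags and cannot distinguish partial requirements like "0.0" from "0.0.0"); the function name and tuple encoding are mine, not cargo's:

```rust
/// Does version `ver` satisfy the caret requirement `req`?
/// Both are (x, y, z) triples in the notation used above.
fn caret_matches(req: (u64, u64, u64), ver: (u64, u64, u64)) -> bool {
    let (rx, ry, rz) = req;
    let (vx, vy, vz) = ver;
    if rx > 0 {
        // ^1.2.3 := >=1.2.3, <2.0.0
        vx == rx && (vy, vz) >= (ry, rz)
    } else if ry > 0 {
        // ^0.2.3 := >=0.2.3, <0.3.0 (for 0.x, the minor acts like a major)
        vx == 0 && vy == ry && vz >= rz
    } else {
        // ^0.0.3 := exactly 0.0.3
        ver == req
    }
}

fn main() {
    // "0.9" and "0.9.3" are both satisfied by version 0.9.3...
    assert!(caret_matches((0, 9, 0), (0, 9, 3)));
    assert!(caret_matches((0, 9, 3), (0, 9, 3)));
    // ...but "0.8" conflicts with any 0.9.z, so cargo would keep two copies.
    assert!(!caret_matches((0, 8, 0), (0, 9, 3)));
    // For x > 0, only the major version must match exactly: "1.3" accepts 1.4.2.
    assert!(caret_matches((1, 3, 0), (1, 4, 2)));
    assert!(!caret_matches((1, 3, 0), (2, 0, 0)));
    println!("all caret checks passed");
}
```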
It is true that, if the change works on z < 3, depending on "0.9" expands the possible set of versions a bit, so it's not useless. One could argue that you should only require a specific z when there's a bug you want to make sure you've moved past; otherwise, there's no reason to restrict yourself. But it's not a big deal either way :)
The reason for this is that unless otherwise specified, the ^ operator is used, so "0.9" is actually "^0.9", which then gets translated into the kind of range specifier I showed above.
There are other operators you can use; these are the common ones:
Note that while an exact bound will force that exact version to be used, it still doesn't allow two semver-compatible versions of a crate to exist together: if cargo can't find a single version that satisfies all constraints, it will just error. For this reason, if you are writing a library, you should in almost all cases stick to regular semver-compatible dependency specifications.
For binaries, it is more common to want exact control over versions and you don't have downstream consumers for whom your exact constraints would be a nightmare.
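The operator list this comment refers to didn't survive in the snapshot; for reference, the requirement syntaxes cargo documents look like this (dependency names are placeholders):

```toml
[dependencies]
a = "1.2.3"        # default (caret): >=1.2.3, <2.0.0
b = "~1.2.3"       # tilde: >=1.2.3, <1.3.0 (patch-level updates only)
c = ">=1.2, <1.5"  # explicit comparison range
d = "1.*"          # wildcard: any version where x is 1
e = "=1.2.3"       # exact: 1.2.3 and nothing else
```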
Rust projects can really go bananas on dependencies, partly because it's so easy to include them
But most people aren't writing libraries.
Or there's been a cleaning pass done over it.
For just plain text, I really like this one - https://huggingface.co/datasets/roneneldan/TinyStories
Enter prompt: hello Model output: What are eclipses ? Assistant : Eclipses occur when one celestial body moves into the shadow of another </s>
Enter prompt: what are facts Model output: eclipses Assistant : Eclipses Eclipses What What was when What through you ? through you meet through using of What was What tall ? of What thank help you explain using , and do you a patterns you you a patterns through air in tall you help metal metal </s>
Enter prompt: how are mountains formed? Model output: I ? ' I ' : Of : in happy Hi wind in yeast altering it it </s>
the difficulty of including a dependency should be proportional to the risk you're taking on, meaning it shouldn't be as difficult as it is in, say, C, where every other library is continually reinventing the same 5 utilities, but also not as easy as it is with npm or cargo, because you get insane dependency clutter and all the related issues like security, build times, etc
how good a build system is isn't equivalent to how easy it is to include a dependency. modern languages should have a consistent build system, but having a centralised package repository that anyone can freely pull from and publish to, with those dependencies freely taking on any number of other dependencies, is a bad way to handle dependencies
i like how zig does this, and the creator of odin has a whole talk where he basically uses the same arguments as my original comment to reason why odin doesn't have a package manager
Python packages still do a poor job of managing dependencies that are written in another language like C or C++.
In my experience it's just bugs and poor decision making on the maintainers (eg. pytorch dropping support for intel mac, leftpad in node) or on the language and package manager developers side (py2->3, commonjs, esm, go not having a package manager, etc).
Cargo has less friction than pypi and npm. npm has less friction than pypi.
And yet, you just need to compromise one lone, unpaid maintainer to wreck the security of the ecosystem.
Way to go on insulting people on HN. Cargo is literally the reason people come to Rust from languages like C++, where the lack of standardized tooling is a giant glaring bomb crater that imposes a burden every single time people need to do some basic thing (like version upgrades).
Example:
https://github.com/facebook/folly/blob/main/build.sh
like the entire point of my comment is that people have misguided criteria for evaluating build systems, and your comment seems to just affirm this?
I think dev_l1x_be's comment is meant to imply that your belief about people having misguided criteria [for evaluating build systems] is itself misguided, and that your favored approach [that the difficulty of including a dependency should be proportional to the risk you're taking on] is also misguided.
That's just like, your opinion, man.
Obviously, I may be an outlier. Some crank who's just smitten by the proposal of spending his time writing code instead of trying to get a dependency (and its sub-dependencies and their sub-dependencies) to build at all (e.g. C/C++) or to have the right version that works with ALL the code that depends on it (e.g. Python).
I.e. I use cargo foremost (by a large margin) for that reason.
I would love to know how many younger readers recognize this classic movie reference.
"It's deliberately shit so that people won't use it unless they really have to."
my mistake :)
Why? Dependency hell is an unsolvable problem. Might as well make it easier to evaluate the tradeoff between dependencies and productivity. You can always arbitrarily ban dependencies.
So put a slim layer of enforcement to enact those policies on top? Who's stopping you from doing that?
[1] https://github.com/astral-sh/uv
The culture that language maintains is rather hostile to maintainable development, easier to just switch to Rust and just write better code by default.
Lack of types, lack of static analysis, lack of ... well, lack of everything Python doesn't provide and fights users on costs too much developer time. It is a net negative to continue pouring time and money into anything Python-based.
The sole exception I've seen in my social circle is those working at companies that don't directly do ML, but provide drivers/hardware/supporting software to ML people in academia, and have to try to fix their cursed shit for them.
Also, fwiw, there is no reason why Triton is Python. I dislike Triton for a lot of reasons, but it's just a matmul kernel DSL; there is nothing inherent in it that has to be, or benefits from, being Python. It takes DSL in, outputs shader text, then has the vendor's API run it (i.e. CUDA, ROCm, etc). It, too, would benefit from becoming Rust.
I wish this were broadly true.
But there's too much legacy Python sunk cost for most people. There's just so much inertia behind Python for people to abandon it and try to rebuild an extensive history of ML tooling.
I think ML will fade away from Python eventually but right now it's still everywhere.
If someone wrote a Triton impl that is all Rust instead, that would do a _lot_ of the heavy lifting on switching... most of their hard code is in Triton DSL, not in Python, the Python is all boring code that calls Triton funcs. That changes the argument on cost for a lot of people, but sadly not all.
To say most ML people are using Rust and C couldn’t be further from the truth
People saying "oh those Python libraries are just C/C++ libraries with Python API, every language can have them" have one problem - no other language has them (with such extensive documentation, tutorials etc.)
At least sibling actually mentioned Java.
Use C++ bindings in libtorch or tensorflow. If you actually mean C, and not C++, then you would need a shim wrapper. C++ -> C is pretty easy to do.
Yet it was created for Python. Someone took that effort and did it. No one took that effort in Rust. End of the story of crab's superiority.
Python community is constantly creating new, great, highly usable packages that become de facto industry standards, and maintain old ones for years, creating tutorials, trainings and docs. Commercial vendors ship Python APIs to their proprietary solutions. Whereas Rust community is going through forums and social media telling them that they should use Rust instead, or that they "cheated" because those libraries are really C/C++ libraries (and BTW those should be done in Rust as well, because safety).
It is happening.
It's like people just don't get it. The ML ecosystem in python didn't just spring from the ether. People wanted to interface in python badly, that's why you have all these libraries with substantial code in another language yet development didn't just shift to that language.
If python was fast enough, most would be fine to ditch the C++ backends and have everything in python, but the reverse isn't true. The C++ interface exists, and no-one is using it.
However people are definitely using it, as Android doesn't do Python, neither does ChromeOS.
That's not really a reason to think people are using it for that when things like onnxruntime and executorch exist. In fact, they are very likely not using it for that, if only because the torch runtime is too heavy for distribution on the edge anyway (plus android can run python).
Regardless, that's just inference of existing models (which yes I'm sure happens in other languages), not research and/or development of new models (what /u/airza was concerned about), which is probably 99% in python.
Yes, you can package Python alongside your APK, if you feel like having fun compiling it with the NDK, and running stuff over Dalvik JNI even more slowly on phone ARM chipsets than it already runs on desktops.
Also, tons of CAE platforms have Python bindings, so you are "forced" to work on Python. Sometimes the solution is not just "abandoning a language".
If it fits your purpose, knock yourself out, for others that may be reading: uv is great for Python dependency management on development, I still have to test it for deployment :)
I'd say Go is a better alternative if you want to replace python scripting. Less friction and much faster compilation times than Rust.
The low-level SIMD stuff was called out to over the C FFI bridge; Go was used for the rest of the program.
There are libraries for writing SIMD in Go now, but I think the better fix is being able to autovectorize during the LLVM IR optimization stage, so it's available to multiple languages.
I think LLVM has it now; it's just not super great yet.
You can always drop into straight assembly if you need to as well. Go's assembler DX is quite nice after you get used to it.
the disease is the cargo-cult addiction to micro-libraries (which Rust is full of), not the language that carries 90% of all peer-reviewed papers, datasets, and models published in the last decade
every major breakthrough, from AlphaFold to Stable Diffusion, ships with a Python reference implementation because that is the language researchers can read, reproduce, and extend. remove Python and you erase the accumulated, executable knowledge of an entire discipline overnight; enforcing Rust would sabotage the field more than anything
on the topic of uv, it will do more harm than good by enabling and empowering cargo cults on a systemic level
the solution has always been education, teaching juniors to value simplicity, portability and maintainability
It is great for learning how to program (as a BASIC replacement), for OS scripting tasks (as a Perl replacement), and for embedded scripting in GUI applications.
Additionally understand PYTHONPATH, and don't mess with anything else.
All the other stuff that is supposed to fix Python issues, I never bothered with them.
Thankfully, other languages are starting to also have bindings to the same C and C++ compute libraries.
So I guess what I'm wondering is, are you a python guy, or are you more like me? because for basically any of these tools, python people tell me "tool X solved all my problems" and people from my own cohort tell me "it doesn't really solve anything, it's still a mess".
If you are one of us, then I'm really listening.
As an occasional trainer of scientists: it didn't seem to help my students.
It sadly doesn’t solve stuff like transformer_engine being built with cxx11 ABI and pytorch isn’t by default, leading to missing symbols…
Python dependencies are still janky, but uv is a significant improvement over existing tools in both performance and ergonomics.
But I'd also hesitate to say it "solves all my problems". There's plenty of python problems outside of the core focus of `uv`. For example, I think building a python package for distribution is still awkward and docs are not straightforward (for example, pointing to non-python files which I want to include was fairly annoying to figure out).
I'm about the highest tier of package manager nerd you'll find out there, but despite all that, I've been struggling to create/run/manage venvs out there for ages. Always afraid of installing a pip package or some piece of python-based software (that might muck up Python versions).
I've been semi-friendly with Poetry already, but mostly because it was the best thing around at the time, and a step in the right direction.
uv has truly been a game changer. Try it out!
If something doesn't work or I'm still encountering any kind of error with uv, LLMs have gotten good enough that I can just copy / paste the error and I'm very likely to zero-in on a working solution after a few iterations.
Sometimes it's a bit confusing figuring out how to run open source AI-related python projects, but the combination of uv and iterating on any errors with an LLM has so far been able to resolve all the issues I've experienced.
I mean I would understand that comment in 2010, but in 2025 it's grossly ridiculous.
Also let's keep middle school taunts at home.
That's not my experience and e.g. uv hasn't helped me with that. I believe this is an issue with Python itself?
If parent was saying something "grossly ridiculous" I must be doing something wrong too. And I'm happy to hear what as that would lower the pain of using Python.
I.e. this was presumably true three years ago:
https://stackoverflow.com/questions/70828570/what-if-two-pyt...
Second, what exactly would you like to happen in that instance? You want to have, in a single project, the same library at different and conflicting versions. The only way to solve that is to disambiguate, per call site, each use of said library. And guess what, that problem exists and was solved 30 years ago by simply providing different package names for different major versions. You want to use both gtk 1 and gtk 2? Well, you have the "gtk" and "gtk2" packages; done, disambiguated. I don't think there is any package manager out there providing "gtk" and shipping both version 1 and 2 under that one name; it's just "gtk" and "gtk2".
Now we could design a solution around that, I guess; nothing is impossible in this brave new world of programming, but that seems like a wasted effort for not-a-problem.
So you are saying that (a) I made this up and (b) intentionally so.
How so? I am always flabbergasted when people make such statements.
You know nothing of my use of Python. I work in a specific field (computer graphics) and within that an even more specific sub field, visual effects.
I have to use Python maybe every three months. And there is some dependency related pain every single time. Python's dependency management "is straight up terrible" (quoted from elsewhere in this thread), I concur.
And thusly, in my world, this example is not "contrived" and given the aforementioned circumstances -- that were unknown to you -- even less so "purposefully".
> Second, what exactly would you like to happen in that instance?
I would expect Python to namespace-wrap (on-the-fly) conflicting versions.
See Rust for some similar solution.
> [...] a wasted effort for not-a-problem.
If this was "not-a-problem" why would Rust/cargo go out of its way to solve it? And why would people regularly point out for this to be one of the reasons dependencies are indeed a "not-a-problem" in Rust and how great that is compared to whatever else they battled with before?
Indeed you and I do live in different worlds.
Sit down, have a coffee, re-read your whole comments, create bullet points for your case, and try to have an *objective* look at your arguments.
- You are frustrated with your use case, seemingly to the point where you don't care about reasonable arguments but just want to lash out at something.
- By your own description, you have a specific use case, in a specific field, in a narrower sub field.
- You are not primarily a Python developer, and use it every 3 months when you have to.
Your experience, in your field, on your project, does not make you a poster child of what everyday Python is like. Sorry for the news.
Now I get that frustration of "I just want things done and not care about that whole ecosystem", but the reality is, that's not a Python thing, it's a "that's not my preferred stack thing".
I have that same feeling whenever I need to get things done in a stack I don't know, and get stuck by something <insert preferred stack> does.
I used Rust the other day and ended up in a case where I needed to implement a trait I do not own for a type I also do not own. Well, that ended up not being possible. That pissed me off for a while, since it *really* made the most sense for my use case. Yet... I'm not going to complain on the internet that Rust is unusable because of "trait ownership hell".
If we let the frustration aside for a minute:
Your use case, as a fact, is very contrived.
One does not stumble into projects that need to work with different, incompatible, similarly named, versions of a same library, every day.
As I mentioned, when that need arises, library maintainers usually just create a new package, with a different name.
That is what has been done by 99.99% of package managers ever in existence, be they system package managers or language package managers.
And the reason for it is really just common sense:
- It does not happen very often
- Whenever that happens, the solution of providing a new package is the simplest and most well established
- The pattern works, and has been used for 30 years
- It is unambiguous
Note that Rust does _not_ magically solve that problem either, as there is no one size fits all solution to this problem. The best Rust can do, is:
- In the subset use case of this problem where said dependency is solely accessed from the inside of another dependency
- And said library symbols need not be externally accessible
- And said library data structures need not be shared
- Then Rust can build the outer most dependency against a specific version of said inner dependency.
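One footnote on that subset case: cargo can hold two semver-incompatible versions of the same crate in one dependency graph, and with the `package` rename key a single crate can even name both directly. A sketch (the `rand` versions here are illustrative):

```toml
[dependencies]
rand = "0.8"
# The `package` key imports an older, incompatible major line under a
# different local name, so both can be used side by side in this crate.
rand_legacy = { package = "rand", version = "0.7" }
```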
A cargo build that warms up your CPU during winter while recompiling the whole internet is better?
- exports being broken if code is executed from a different directory
- packaging being more complicated than it should be
and I don't even have too much experience in the area of packaging, besides occasionally publishing to a private repo.
I do remember banging my head against failed dependency resolution in my early days of Python, circa 2014, with pip and Conda, etc.
The dependency issues I have faced were mostly due to data science folks pinning exact package versions for the sake of replicability in requirements.txt for example
Do you plan on moving forward with this project? I seem to understand that all the training is done on the CPU and that you have next steps for optimizing that. Are you considering GPU acceleration?
Also, do you have any benchmarks on known hardware? E.g., how long would it take to train on a latest-gen MacBook or your own computer?
Honestly, I don't know.
This was purely a toy project/thought experiment to challenge myself to learn exactly how these LLMs worked.
It was super cool to see the loss go down and it actually "train".
This is SUPER far from the real deal. Maybe it could be cool to see how far a fully in-memory LLM running on CPU can go.
// Increased for better learning
this doesn't tell me anything
// Use the constants from lib.rs
const MAX_SEQ_LEN: usize = 80;
const EMBEDDING_DIM: usize = 128;
const HIDDEN_DIM: usize = 256;
these are already defined in lib.rs, why not use them (as the comment suggests)
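A minimal sketch of the fix being suggested: define the constants once and import them where the binary needs them (the `config` module here stands in for the project's lib.rs; the values are the ones quoted above):

```rust
// Stand-in for the crate's lib.rs: one home for the hyperparameters.
mod config {
    pub const MAX_SEQ_LEN: usize = 80;
    pub const EMBEDDING_DIM: usize = 128;
    pub const HIDDEN_DIM: usize = 256;
}

// The binary imports the shared constants instead of redefining them,
// so the two copies can never drift apart.
use config::{EMBEDDING_DIM, HIDDEN_DIM, MAX_SEQ_LEN};

fn main() {
    assert_eq!((MAX_SEQ_LEN, EMBEDDING_DIM, HIDDEN_DIM), (80, 128, 256));
    println!("seq={MAX_SEQ_LEN} embed={EMBEDDING_DIM} hidden={HIDDEN_DIM}");
}
```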
https://old.reddit.com/r/rust/comments/1nguv1a/i_built_an_ll...
However, what you asked is whether vibe-coded Rust will rot the quality of the language; that is more difficult to answer, but I don't think that people who are uninterested in the technical side are going to go for Rust anyway. From the signals I get, people are actually not really liking it: they find it too difficult for some reason and prefer to stick with stuff like C# or Python.
Can't explain why.
I never thought about it this way, but it actually makes sense. It's just like how Rust / Go / Java / C# can sometimes be orders of magnitude faster than C, only because they're more expressive languages. If you have a limited amount of time, it may be possible to write an efficient, optimal and concurrent algorithm in Java, while in C, all you can do is the simplest possible solution. Linked list versus slices (which are much more cache-friendly) is the perfect example here.
It's a very cool exercise. I did the same with Zig and MLX a while back so I could get a nice foundation, but then, as I got hooked and kept adding stuff to it, I switched to PyTorch/Transformers.
(the code looks like a very junior or a non-dev wrote it tbh).
[0]: https://github.com/enricozb/picogpt-rust
[1]: https://jaykmody.com/blog/gpt-from-scratch/
14 more comments available on Hacker News