Apple Silicon GPU Support in Mojo
Posted 3 months ago · Active 3 months ago
forum.modular.com · Tech story · High profile
Tone: calm, mixed · Debate: 70/100
Key topics
Mojo Programming Language
Apple Silicon GPU Support
Machine Learning
Python Ecosystem
The Mojo programming language has added support for Apple Silicon GPUs, sparking discussion about its potential to replace CUDA and its compatibility with the Python ecosystem.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 46m
Peak period: 38 comments (0-6h)
Avg / period: 9.9
Comment distribution: 69 data points
Based on 69 loaded comments
Key moments
- 01 Story posted: Sep 21, 2025 at 4:35 PM EDT (3 months ago)
- 02 First comment: Sep 21, 2025 at 5:21 PM EDT (46m after posting)
- 03 Peak activity: 38 comments in 0-6h, the hottest window of the conversation
- 04 Latest activity: Sep 25, 2025 at 6:08 AM EDT (3 months ago)
ID: 45326388 · Type: story · Last synced: 11/20/2025, 4:44:33 PM
But I just think Python is not the right language to try to turn into this super-optimized parallel processing system they are trying to build.
But their target market is Python programmers, I guess. So I'm not sure what a better option would be.
It would be interesting for them to develop their own language and make it all work. But "yet another programming language" is a tough sell.
Octave has a very nice syntax (it extends Matlab's syntax to provide the good parts of numpy broadcasting), and I assume Julia uses something very similar. I have wanted to work with Julia, but it's so frustrating to have to rebuild so much of the non-interesting stuff that already exists in Python. And back when I looked into it, there didn't seem to be an easy way to just plug Julia into Python things and incrementally move over. You couldn't swap out the numerics and keep the matplotlib code you already had; you had to go learn Julia's way of plotting and doing everything. It would have been nice if there were an incremental approach.
One thing I am on the fence about is indexing with '()' vs '[]'. In Matlab both function calls and indexing use '()', which is the Fortran style; the ambiguity lets you swap a function in for a matrix to reduce memory use, though that's all possible with '[]' in Python too, and it can sometimes be nice. Anyway, with something like Mojo you're working directly with indices again, and I haven't done that in a long time.
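(A minimal sketch of those two points in plain numpy; the LazyOuter class is just an illustration, not anything from the thread.)

    import numpy as np

    # Broadcasting: a (3, 1) column against a (4,) row expands to (3, 4)
    # without explicit loops or tiling.
    col = np.arange(3).reshape(3, 1)   # shape (3, 1)
    row = np.arange(4)                 # shape (4,)
    grid = col * 10 + row              # shape (3, 4)

    # Swapping a "function" in for a matrix while keeping [] indexing:
    # any object with __getitem__ can stand in for a stored array and
    # compute entries lazily instead of materializing them.
    class LazyOuter:
        def __getitem__(self, idx):
            i, j = idx
            return i * 10 + j

    lazy = LazyOuter()
    assert grid[2, 3] == lazy[2, 3] == 23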
Ultimately I don't think anyone would care if Mojo and Python just played nicely together with minimal friction. (Think: "hey, run this Mojo code on these numpy blobs".) If I can build GUIs, interact with the OS, parse files, and talk to the web in Python to prep data while simultaneously crunching in Mojo, that seems wonderful.
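(A sketch of that hand-off as it works today with a C shared library and ctypes; Mojo's Python interop aims to make this less clunky. The library name libkernel.so and the function scale_inplace are hypothetical.)

    import ctypes
    import numpy as np

    # Hypothetical compiled kernel, built from C (or, eventually, Mojo):
    #   void scale_inplace(double *buf, size_t n, double factor);
    lib = ctypes.CDLL("./libkernel.so")
    lib.scale_inplace.argtypes = [ctypes.POINTER(ctypes.c_double),
                                  ctypes.c_size_t, ctypes.c_double]
    lib.scale_inplace.restype = None

    # Python side preps the data with the usual ecosystem tools...
    data = np.ascontiguousarray(np.loadtxt("measurements.csv", delimiter=","),
                                dtype=np.float64)

    # ...then hands the raw buffer to the compiled code, no copy needed.
    lib.scale_inplace(data.ctypes.data_as(ctypes.POINTER(ctypes.c_double)),
                      data.size, 2.0)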
I just hate that Julia requires immediately learning all the dumb crap that doesn't matter to me. Although LLMs seem very good at the dumb crap, so some sort of LLM translation for it could be another option.
In summary: all Mojo actually needs is to be better than numba- and cython-type things, with performance that at least matches C++, Fortran, and the GPU libraries. Once that happens, things like a Mojo version of pandas will be developed (and will replace things like polars).
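(For reference, the numba-style workflow that sentence measures Mojo against looks roughly like this; rolling_mean is just an illustrative kernel.)

    import numpy as np
    from numba import njit

    @njit  # compiles the loop to machine code on first call
    def rolling_mean(x, window):
        out = np.empty(x.size - window + 1)
        acc = x[:window].sum()
        out[0] = acc / window
        for i in range(1, out.size):
            acc += x[i + window - 1] - x[i - 1]
            out[i] = acc / window
        return out

    x = np.random.rand(1_000_000)
    smoothed = rolling_mean(x, 64)   # runs at C-like speed after the first call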
Guess why it wasn't a success, or why Julia is having adoption issues among the same community.
Or why Zig, which is basically Modula-2's type system, is getting more hype than Modula-2 ever has since 1978 (Modula-2 is even part of GCC nowadays).
Syntax and familiarity matters.
Even with JAX, PyTorch, HF Transformers, whatever you want to throw at it, the DX for cross-platform GPU programming that is compatible with the requirements of large language models specifically is extremely bad.
I think this may end up being the most important thing that Lattner has worked on in his life (and yes, I am aware of his other projects!).
I say the ship sailed in 2012 because that was around when it was decided to build Tensorflow around legacy data infrastructure at Google rather than developing something new, and the rest of the industry was hamstrung by that decision (along with the baffling declarative syntax of Tensorflow, and the requirement to use Blaze to build it precluding meaningful development outside of Google).
The industry was so desperate to get away from it that they collectively decided that downloading a single giant library with every model definition under the sun baked into it was the de facto solution to loading Torch models for serving, and today I would bet you that easily 90% of deep learning models in production revolve around either TensorRT, or a model being plucked from Huggingface’s giant library.
The decision to halfass machine learning was made a long time ago. A tool like Mojo might work at a place like Apple that works in a vacuum (and is lightyears behind the curve in ML as a result), but it just doesn’t work on Earth.
If there’s anyone that can do it, it’s Lattner, but I don’t think it can be done, because there’s no appetite for it nor is the talent out there. It’s enough of a struggle to get big boy ML engineers at Mag 7 companies to even use Python instead of letting Copilot write them a 500 line bash script. The quality of slop in libraries like sglang and verl is a testament to the futility of trying to reintroduce high quality software back into deep learning.
Are you talking about NVIDIA Hopper or any of the rest of the accelerators people care about these days? :). We're talking about a lot more performance and TCO at stake than traditional CPU compilers.
On the flipside, far from figuring out GPU efficiency, most people with huge jobs are network bottlenecked. And that’s where the problem arises: solutions for collective comms optimization tend to explode in complexity because, among other reasons, you now have to package entire orchestrators in your library somehow, which may fight with the orchestrators that actually launch the job.
Doing my best to keep it concise, but Hopper is like a good case study. I want to use Megatron! Suddenly you need FP8, which means the CXX11 ABI, which means recompiling Torch along with all those nifty toys like flash attention, flashinfer, vllm, whatever. Ray, jsonschema, Kafka and a dozen other things also need to match the same glibc and glibc++ versions. So using that as an example, suddenly my company needs C++ CICD pipelines, dependency management etc when we didn’t before. And I just spent three commas on these GPUs. And most likely, I haven’t made a dime on my LLMs, or autonomous vehicles, or weird cyborg slavebots.
So what all that boils down to is just that there’s a ton of inertia against moving to something new and better. And in this field in particular, it’s a very ugly, half-assed, messy inertia. It’s one thing to replace well-designed, well-maintained Java infra with Golang or something, but it’s quite another to try to replace some pile of shit deep learning library that your customers had to build a pile of shit on top of just to make it work, and all the while fifty college kids are working 16 hours a day to add even more in the next dev release, which will of course be wholly backwards and forwards incompatible.
But I really hope I’m wrong :)
I don't think it's gonna happen instantly, but it will happen, and Mojo/Modular are really the only language platform I see taking a coherent approach to it right now.
It's just such a massive, uphill, ugly moving target to try to run down. And I sit here thinking the same as many of these comments: on the one hand, I can't imagine we're still using Python 3 in 2035? 2050?? But on the other hand, I can't envision a path out of the mess that makes money, or at least keeps up the pretense that it will start to soon.
Nope. There's certainly room for another alternative that's more performant and portable than the rest, without the hacks needed to get there.
Maybe you caught the wrong ship, but Mojo is a speedboat.
> Mojo is never going to be anything but a vanity project.
Will come back in 10 years and we'll see if your comment needs to be studied like the one done for Dropbox.
I don't know if this is a language that will catch on, but I guarantee there will be another deep learning focused language that catches on in the future.
Metal.jl can be used to write GPU kernels in Julia to target an Apple Silicon GPU. Or you can use KernelAbstractions.jl to write once in a high-level CUDA-like language to target NVIDIA/AMD/Apple/Intel GPUs. For best performance, you'll want to take advantage of vendor-specific hardware, like Tensor Cores in CUDA or Unified Memory on Mac.
You also get an ever-expanding set of Julia GPU libraries. In my experience, these are more focused on the numerical side rather than ML.
If you want to compile an executable for an end user, that functionality was added in Julia 1.12, which hasn't been released yet. Early tests with the release candidate suggest that it works, but I would advise waiting to get a better developer experience.
Also, since samples in one channel need to be processed sequentially, does that mean mono audio processing won't benefit much from GPU programming? Or maybe you are dealing with spectral signal processing?
You need to find parallelism somewhere to make it worth it. This can be multiple independent channels/voices, one large simulation, one high quality simulation, a large neural network, solving PDEs, voxel simulation (https://www.youtube.com/watch?v=1bS7sHyfi58), additive synthesis, a multitude of FFTs...
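(A small numpy sketch of that "find parallelism somewhere" point, assuming a 64-channel spectral-processing workload; the same batched shape is what GPU FFT libraries such as CuPy's drop-in cupy.fft are built for.)

    import numpy as np

    rate, channels, frame = 48_000, 64, 1024

    # 64 channels of audio, 10 seconds each, chopped into 1024-sample frames.
    audio = np.random.randn(channels, 10 * rate).astype(np.float32)
    usable = (audio.shape[1] // frame) * frame
    frames = audio[:, :usable].reshape(channels, -1, frame)

    # One batched FFT over every channel and frame at once: a single
    # data-parallel operation instead of a sequential per-sample loop.
    spectra = np.fft.rfft(frames * np.hanning(frame), axis=-1)
    magnitude = np.abs(spectra)      # shape (channels, n_frames, frame // 2 + 1)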
The closest to that is Mojo, which borrows many of Rust's ideas and has built-in type safety, with the aim of being compatible with the existing Python ecosystem, which is great.
I've never heard a sound argument against Mojo and continue to see the weakest arguments that go along the lines of:
"I don't want to learn another language"
"It will never take off because we don't need another deep learning DSL"
"It's bad that a single company owns the language just like Google and Golang, Microsoft and C# and Apple and Swift".
Well I prefer tools that are extremely fast, save time and make lots of money, instead of spinning up hundreds of costly VMs as the solution. If Mojo excels in performance and reduces cost then I'm all for that, even better if it achieves Python compatibility.
By itself, that's not so bad. Plenty of "buy, don't build" choices out there.
However, every other would-be Mojo user also knows that. And they don't want to build on top of an ecosystem that's not fully open.
Why don't Mathematica/MATLAB have pytorch-style DL ecosystems? Because nobody in their right mind would contribute for free to a platform owned by Wolfram Research or Mathworks.
I'm hopeful that Modular can navigate this by opening up their stack.
You realize that CUDA isn't open source or planned to be open source in the future, right?
Meanwhile parts of Mojo are already open source with the rest expected to be opened up next year.
Mojo is planned to be both free and open source by the end of next year and it's not vendor locked to extremely expensive hardware.
Also, as of today anything CUDA works out of the box on Windows; Mojo might eventually work outside WSL, some day.
There is no disadvantage vs CUDA.
Not an expert, but while I wouldn't be surprised if Mojo ends up being a better language than Rust for the use case we're discussing, I'm not confident it will ever catch up to Rust's ecosystem and escape velocity as a sane general-purpose compiled systems language. It really does feel like Rust has replaced C++ for net-new buildouts that would previously have needed its power.
If one language were used for iOS apps and GPU programming, with some compatibility with Python, it would be pretty neat.
I do not think that is the same as VC-backed. Google/Microsoft/Apple need those languages for their ecosystem/infrastructure. The danger there is "just" vendor lock-in. With a VC-backed language there is also the possibility of enshittification.
If Mojo focuses on systems software ( and gets rid of exceptions - Chris, please <3 ) it will be a serious competitor to Rust and Go. It has all the performance and safety of Rust with a significantly easier learning curve.
We have a public roadmap and are charging hard on improving the language; check out https://docs.modular.com/mojo/roadmap/ to learn more.
-Chris
Some example motivations:
- Strange synchronization/coherency requirements
- Working with new hardware / new strategies that Nvidia&co haven't fine-tuned yet
- Just wanting to squeeze out some extra performance
Just the notion of replacing the parts of LLVM that force it to remain single threaded would be a major sea change for developer productivity.