ML Needs a New Programming Language – Interview with Chris Lattner
Posted 4 months ago · Active 4 months ago
signalsandthreads.com · Tech story · High profile
Debate: heated, mixed sentiment (80/100)
Key topics
Programming Languages
Machine Learning
Mojo
The discussion revolves around Chris Lattner's interview on why ML needs a new programming language, specifically Mojo, and the community's mixed reactions to its potential, design, and licensing.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion · First comment after 1h · Peak period: 84 comments in 0-6h · Avg per period: 17.8 · Based on 160 loaded comments
Key moments
- Story posted: Sep 5, 2025 at 7:33 AM EDT (4 months ago)
- First comment: Sep 5, 2025 at 8:35 AM EDT (1h after posting)
- Peak activity: 84 comments in 0-6h, the hottest window of the conversation
- Latest activity: Sep 8, 2025 at 3:15 PM EDT (4 months ago)
ID: 45137373 · Type: story · Last synced: 11/20/2025, 8:23:06 PM
The strong AI focus seems to be a sign of the times, and not actually something that makes sense imo.
Are you sure about that? I think Mojo was always talked about as "The language for ML/AI", but I'm not sure whether Mojo was announced before the current hype cycle; it must be 2-3 years old at this point, right?
Swift has some nice features. However, the super slow compilation times and cryptic error messages really erase any gains in productivity for me.
- "The compiler is unable to type-check this expression in reasonable time?" On an M3 Pro? What the hell!?
- To find an error in SwiftUI code I sometimes need to comment everything out block by block to narrow it down and find the culprit. We're getting laughs from Kotlin devs.
https://www.cocoawithlove.com/blog/2016/07/12/type-checker-i...
let a: Double = -(1 + 2) + -(3 + 4) + -(5)
Still fails on a very recent version of Swift, Swift 6.1.2, if my test works.
Chris Lattner mentions something about being more of an engineer than a mathematician, but a responsible and competent computer science engineer who is designing a programming language with a complex type system absolutely has to be at least proficient in university-level mathematics and the relevant theory, or delegate and get computer scientists to find and triple-check the relevant aspects of the budding programming language.
These days though the type checker is not where compile time is mostly spent in Swift; usually it’s the various SIL and LLVM optimization passes. While the front end could take care to generate less redundant IR upfront, this seems like a generally unavoidable issue with “zero cost abstraction” languages, where the obvious implementation strategy is to spit out a ton of IR, inline everything, and then reduce it to nothing by transforming the IR.
That’s really only true if you have overloading though! Without overloading there are no disjunction choices to attempt, and if you also have principal typing it makes the problem of figuring out diagnostics easier, because each expression has a unique most general type in isolation (so your old CSDiag design would actually work in such a language ;-) )
But perhaps a language where you have to rely on generics for everything instead of just overloading a function to take either an Int or a String is a bridge too far for mainstream programmers.
It feels like a much better design point overall.
My original reply was just to point out that constraint solving, in the abstract, can be a very effective and elegant approach to these problems. There’s always a tradeoff, and it all depends on the combination of other features that go along with it. For example, without bidirectional inference, certain patterns involving closures become more awkward to express. You can have that, without overloading, and it doesn’t lead to intractability.
It was widely known well before Swift was created that programming language designers have to be careful about the asymptotic time complexity of a type system and its type checking. Some people like to diss mathematics, but this stuff can have severe, widespread practical engineering consequences. I don't expect everyone to master everything, but budding programming language designers should at least realize that there may be important issues here, and mitigate them by, for instance, having one or more experts check the relevant aspects.
> Still fails on a very recent version of Swift, Swift 6.1.2, if my test works.
FWIW, the situation with this expression (and others like it) has improved recently:
- 6.1 fails to type check in ~4 seconds
- 6.2 fails to type check in ~2 seconds (still bad obviously, but it's doing the same amount of work in less time)
- latest main successfully type checks in 7ms. That's still a bit too slow though, IMO. (edit: it's just first-time deserialization overhead; if you duplicate the expression multiple times, subsequent instances type check in <1ms).
It has been Mojo's explicit goal from the start. It has its roots in the time Chris Lattner spent at Google working on the compiler stack for TPUs.
It was explicitly designed to be Python-like because that is where (almost) all the ML/AI work is happening.
https://techcrunch.com/2023/08/24/modular-raises-100m-for-ai...
You don’t raise $130M at a $600M valuation to make boring old dev infrastructure that is sorely needed but won’t generate any revenue because no one is willing to pay for general purpose programming languages in 2025.
You raise $130M to be the programming foundation of next Gen AI. VCs wrote some big friggen checks for that pitch.
https://www.modular.com/blog/a-new-simpler-license-for-max-a...
They say they'll open source it in 2026 [1]. But until that has happened, I'm operating under the assumption that it won't happen.
[0]: https://www.modular.com/legal/community
[1]: https://docs.modular.com/mojo/faq/#will-mojo-be-open-sourced
Or, arguably worse: my expectation is that they'll open source it, wait for it to get a lot of adoption, possibly some contribution, certainly a lot of mindshare, and then change the license to some text no one has ever heard of that forbids use on nvidia hardware without paying the piper or whatever
If it ships with a CLA, I hope we never stop talking about that risk
If OpenJDK did not or could not exist, we would all be mega fucked. Luckily alternative Java implementations exist, are performant, and are largely indistinguishable from the non-free stuff. Well, except whatever android has going on... That stuff is quirky.
But if you look at dotnet, prior to opensourcing it, it was kind of a fucked up choice to go with. You were basically locked into Windows Server for your backend and your application would basically slowly rot over time as you relied on legacy windows subsystems that even Microsoft barely cared to support, like COM+.
There were always alternative dotnet runtimes like Mono, but they were far from feature-complete. Which, ironically, is why we saw so much Java. The CLR is arguably a better designed VM and C# a better language, but it doesn't matter.
Dotnet was on a slow roll to complete obsolescence until MS saw the writing on the wall and open sourced it.
Also, it appears to be more robust. Julia is notoriously fickle in both semantics and performance, making it unsuitable for foundational software the way Mojo strives for.
Sure, Mojo the language is more robust. Until its investors decide to 10x the licensing Danegeld.
1, 2: https://enzymead.github.io/Reactant.jl/dev/
> write state of the art kernels
You don't write kernels in Julia.
Not sure how that organization compares to Mojo.
People have used the same infrastructure to allow you to compile Julia code (with restrictions) into GPU kernels.
The package https://github.com/JuliaGPU/KernelAbstractions.jl was specifically designed so that Julia can be compiled down to kernels.
Julia is high level, yes, but Julia's semantics allow it to be compiled down to machine code without a "runtime interpreter". This is a core differentiating feature from Python. Julia can be used to write GPU kernels.
> write state of the art kernels
Julia and Python are high-level languages that call other languages where the kernels exist.
[1] https://juliagpu.github.io/KernelAbstractions.jl/stable/
First-class support for AoT compilation.
https://docs.modular.com/mojo/cli/build
Yes, Julia has a few options for making executables but they feel like an afterthought.
> write state of the art kernels
Mojo seems to be competing with C++ for writing kernels. PyTorch and Julia are high-level languages where you don't write the kernels.
https://cuda.juliagpu.org/stable/tutorials/introduction/#Wri...
With KernelAbstractions.jl you can actually target CUDA and ROCm:
https://juliagpu.github.io/KernelAbstractions.jl/stable/kern...
For Python (or rather Python-like), there is also Triton (and probably others):
https://pytorch.org/blog/triton-kernel-compilation-stages/
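If it helps, here is a minimal Triton vector-add sketch in the spirit of the official tutorial (it assumes the triton package and a CUDA-capable PyTorch install; the kernel name and block size are just illustrative):

    import torch
    import triton
    import triton.language as tl

    @triton.jit
    def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
        pid = tl.program_id(axis=0)                     # one program instance per block
        offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
        mask = offsets < n_elements                     # guard the ragged tail
        x = tl.load(x_ptr + offsets, mask=mask)
        y = tl.load(y_ptr + offsets, mask=mask)
        tl.store(out_ptr + offsets, x + y, mask=mask)

    x = torch.rand(4096, device="cuda")
    y = torch.rand(4096, device="cuda")
    out = torch.empty_like(x)
    grid = (triton.cdiv(x.numel(), 1024),)
    add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=1024)

Everything stays in Python syntax, but the decorated function is JIT-compiled into an actual GPU kernel rather than interpreted.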
Although I have my doubts that Julia is actually willing to make the compromises which would allow it to go that low level: semantic guarantees about allocations and inference, guarantees about certain optimizations, and more.
First of all, some people really like Julia; regardless of how it gets discussed on HN, its commercial use has been steadily growing, and it has GPGPU support.
On the other hand, regardless of the sorry state of JIT compilers on the CPU side for Python, at least Nvidia and Intel are quite serious about Python DSLs for GPGPU programming on CUDA and oneAPI, so one gets close enough to C++ performance while staying in Python.
So Mojo isn't that appealing in the end.
1. Easy packaging into one executable. Then, making sure that can be reproducible across versions. Getting code from prior AI papers to run can be hard.
2. Predictability vs the Python runtime. Think concurrent, low-latency GCs or low/zero-overhead abstractions.
3. Metaprogramming. There have been macro proposals for Python. Mojo could borrow from D or Rust here.
4. Extensibility in a way where extensions don't get too tied into the internal state of Mojo like they do Python. I've considered Python to C++, Rust, or parallelized Python schemes many times. The extension interplay is harder to deal with than either Python or C++ itself.
5. Write once, run anywhere, to effortlessly move code across different accelerators. Several frameworks are doing this.
6. Heterogeneous, hot-swappable, vendor-neutral acceleration. That's what I'm calling it when you can use the same code in a cluster with a combination of Nvidia GPUs, AMD GPUs, Gaudi3s, NPUs, SIMD chips, etc.
Languages on their own have a very hard time gaining adoption.
Most people that know this kind of thing don't get much value out of using a high level language to do it, and it's a huge risk because if the language fails to generate something that you want, you're stuck until a compiler team fixes and ships a patch which could take weeks or months. Even extremely fast bug fixes are still extremely slow on the timescales people want to work on.
I've spent a lot of my career trying to make high level languages for performance work well, and I've basically decided that the sweet spot for me is C++ templates: I can get the compiler to generate a lot of good code concisely, and when it fails the escape hatch of just writing some architecture specific intrinsics is right there whenever it is needed.
Optimizing Julia is much harder than optimizing Fortran or C.
Got any sources on that? I've been interested in learning Julia for a while but don't because it feels useless compared to Python, especially now with 3.13
https://info.juliahub.com/industries/case-studies-1/author/j...
If you're interested, they think the language will be ready for open source after completing phase 1 of the roadmap[2].
1.https://youtu.be/I0_XvXXlG5w?si=KlHAGsFl5y1yhXnm&t=943
2. https://docs.modular.com/mojo/roadmap
C++ just seems like a safer bet but I'd love something better and more ergonomic.
> maybe a year, 18 months from now [...] we’ll add classes
As per the roadmap[1], I expect to start seeing more adoption once phase 1 is completed.
1. https://docs.modular.com/mojo/roadmap
Mojo is effectively an internal tool that Modular have released publicly.
I'd be surprised to see any serious adoption until a 1.0 state is reached.
But as the other commenter said, it's not really competing with PyTorch, it's competing with CUDA.
And "correlation is not causality," but the occupation with the most vibrant job market until recently was also the one that used free tools. Non-developers like myself looked to that trend and jumped on the bandwagon when we could. I'm doing things with Python that I can't do with Matlab because Python is free.
Interestingly, we may be going back to proprietary tools, if our IDEs become a "terminal" for the AI coding agents, paid for by our employers.
If you want the full C# experience, you will still be getting Windows, Visual Studio, or Rider.
VSCode C# support is under the same license as Visual Studio Community, and lacks several tools, like the advanced graphical debugging for parallel code and code profiling.
The great Microsoft has not open sourced that debugger, nor many other tools in the .NET ecosystem; also, they can afford to subsidise C# development as a gateway into Azure, being valued at 4 trillion, the 2nd biggest company in the world.
I don't believe the first two are true, and as a point of reference Rider is part of their new offerings that are free for non-commercial use https://www.jetbrains.com/rider/#:~:text=free%20for%20non-co...
I also gravely, gravely doubt the .NET ecosystem has anything in the world to do with Azure
Azure pays for .NET, and projects like Aspire.
Apple and Google have purged most GPL stuff out of their systems, after making clang shine.
It's not just that OSS tooling is "free", it's also better and works for way longer. If you relied on proprietary Delphi-compatible tooling, well... you fucked up!
Or NextSTEP. Or DX 9. Or whatever the fuck.
That shit sucked when it came out and it's only gotten worse. The cherry on top is the companies that promised they're the bees knees actually know that, which is why they left them to die. And, unfortunately, your applications along with them.
With Mojo, on the other hand, I think a library (or improvements to an existing library) would have been a better approach. A new language needlessly forks the developer community and duplicates work. But I can see the monetary incentives that made the Mojo developers choose this path, so good for them.
Like, Rust could not be a C++ library, that does not make sense. Zig could not be a C library. Julia could not be a Python library.
There is some superficial level of abstraction where all programming languages do is interchangeable computation and therefore everything can be achieved in every language. But that superficial sameness doesn't correspond to the reality of programming.
Just comparing for example c++, c#, and typescript. These are all c-like, have heavy MS influence, and despite that all have deeply different fundamentals, concepts, use cases, and goals.
1) we were "ambitiously optimistic" (different way of saying "ego-driven naïveté" perhaps :) ) and 2) the internet misread our long-term ambitions as being short-term goals.
We've learned that the world really really wants a better Python and the general attention spans of clickbait are very short - we've intentionally dialed back to very conservative claims to avoid the perception of us overselling.
We're still just as ambitious though!
-Chris
It was indeed their goal to support Python as a superset. Many discussions were around that. So I would not claim any malice in there.
Even if the goal is to be just "close enough", it seems as pie-in-the-sky (Py-in-the-sky?) now as it did when Mojo was first announced. CPython has a huge surface area, and it seems like if Mojo is going to succeed they are going to want to focus on differentiating features in the ML space rather than go feature-for-feature with CPython. I don't know what "close enough" is, but closer than Py2 was to Py3, certainly.
As far as the executable size, it was only 85kb in my test, a bouncing balls simulation. However, it required 300MB of Julia libraries to be shipped with it. About 2/3 of that is in libjulia-codegen.dll, libLLVM-16jl.dll. So you're shipping this chunky runtime and their LLVM backend. If you're willing to pay for that, you can ship a Julia executable. It's a better story than what Python offers, but it's not great if you want small, self-contained executables.
I've been involved in a few programming language projects, so I'm sympathetic as to how much work goes into one and how long they can take.
At the same time, it makes me wary of Julia, because it highlights that progress is very slow. I think Julia is trying to be too much at once. It's hard enough to be a dynamic, interactive language, but they also want to claim to be performant and compiled. That's a lot of complexity for a small team to handle and deliver on.
1.9 and 1.10 made huge gains in package precompilation and native code caching. then attention shifted and there were some regressions in compile times due to unrelated things in 1.11 and the upcoming 1.12. but at the same time, 1.12 will contain an experimental new feature `--trim` as well as some further standardization around entry points to run packages as programs, which is a big step towards generating self-contained small binaries. also nearly all efforts to improve tooling are focused on providing static analysis and helping developers make their programs more easily compilable.
it's also important to distinguish a bit between a few similar but related needs. most of what I just described applies to generating binaries for arbitrary programs. but for the example you stated, "time to first plot" of existing packages, this is already much improved in 1.10, and users (aka non-package-developers) should see sub-second TTFP, and TTFX for most packages they use that have been updated to use the precompilation goodies in recent versions
more realistic examples of compiling a Julia package into .so: https://indico.cern.ch/event/1515852/contributions/6599313/a...
but all the normal marketing words: in my opinion it is fast, expressive, and has particularly good APIs for array manipulation
I'll warn you that Julia's ML ecosystem has the most competitive advantage on "weird" types of ML, involving lots of custom gradients and kernels, integration with other pieces of a simulation or diffeq, etc.
if you just want to throw some tensors around and train an MLP, you'll certainly end up finding more rough edges than you might in PyTorch
Combine that with all the cutting edge applied math packages often being automatically compatible with the autodiff and GPU array backends, even if the library authors didn't think about that... it's a recipe for a lot of interesting possibilities.
In practice, the Julia package ecosystem is weak and generally correctness is not a high priority. But the language is great, if you're willing to do a lot of the work yourself.
What Julia needs though: wayyyy more thorough tooling to support auto generated docs, well integrated with package management tooling and into the web package management ecosystem. Julia attracts really cutting edge research and researchers writing code. They often don't have time to write docs and that shouldn't really matter.
Julia could definitely use some work in the areas discussed in this podcast, not so much the high level interfaces but the low level ones. That's really hard though!
Then again, I am also open to the fact that I'm jammed up by the production use of dynamically typed languages, and maybe the "for ML" part means "I code in Jupyter notebooks" and thus give no shits about whether person #2 can understand what's happening
explicit and strict types on arguments to functions are one way, but certainly not the only way, nor probably the best way to effect that
I readily admit that I am biased in that I believe that having a computer check that every reference to every relationship does what it promises, all the time, is worth it
more generally, the most important bits of a particular function to understand are
* what should it be called with
* what should it return
* what side effects might it have
and the "what" here refers to properties in a general sense. types are a good shortcut to signify certain named collections of properties (e.g., the `Int` type has arithmetic properties). but there are other ways to express traits, preconditions, postconditions, etc. besides types
they sure can...
That being said, I've always found the argument that types can be overly restrictive and prevent otherwise valid code from running unconvincing. I've yet to see dynamic code that benefits from this alleged advantage.
Nearly universally the properly typed code for the same thing is better, more reliable and easier for new people to understand and modify. So sure, you can avoid all of this if the types are really what bother you, but it feels a bit like saying "there are stunts I can pull off if I'm not wearing a seatbelt that I just can't physically manage if I am."
If doing stunts is your thing, knock yourself out, but I'd rather wear a seatbelt and be more confident I'm going to get to my destination in one piece.
I mean, even in C++ with concepts we can do most of that. And C++ doesn't have the most expressive type system.
That is the problem. Julia could not compete against Python's mindshare.
A competitor to Python needs to be 100% compatible with its ecosystem.
For example, a modern ML application might need an ETL pipeline to load and harmonize data of various types (text, images, video, etc., all in different formats) from various sources (local filesystem, cloud storage, HTTP, etc.) The actual computation then must leverage many different high-level functionalities, e.g. signal/image processing, optimization, statistics, etc. All of this computation might be too big for one machine, and so the application must dispatch jobs to a compute cluster or cloud. Finally, the end results might require sophisticated visualization and organization, with a GUI and database.
There is no single language with a rich enough ecosystem that can provide literally all of the aforementioned functionality besides Python. Python's numerical computing libraries (NumPy/PyTorch/JAX etc.) all call out to C/C++/FORTRAN under the hood and are thus extremely high-performance, and for functionality they don't implement, Python's C/C++ FFIs (e.g. Python.h, NumPy C integration, PyTorch/Boost C++ integration) are not perfect, but are good enough that implementing the performance-critical portions of code in C/C++ is much easier compared to re-implementing entire ecosystems of packages in another language like Julia.
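As a rough sketch of that FFI pattern, ctypes plus NumPy is often all the glue needed; the library name libsaxpy.so and the C signature void saxpy(int n, float a, const float *x, float *y) below are hypothetical stand-ins for the performance-critical code.

    import ctypes
    import numpy as np

    lib = ctypes.CDLL("./libsaxpy.so")   # hypothetical compiled C library
    lib.saxpy.argtypes = [
        ctypes.c_int,
        ctypes.c_float,
        np.ctypeslib.ndpointer(dtype=np.float32, flags="C_CONTIGUOUS"),
        np.ctypeslib.ndpointer(dtype=np.float32, flags="C_CONTIGUOUS"),
    ]
    lib.saxpy.restype = None

    x = np.ones(1_000_000, dtype=np.float32)
    y = np.zeros_like(x)
    lib.saxpy(x.size, 2.0, x, y)   # hot loop runs in C; orchestration stays in Python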
> There is no single language with a rich enough ecosystem that can provide literally all of the aforementioned functionality besides Python.
That may be true, but some of us are still bitter that all that grew up around an at-least-averagely-annoying language rather than something nicer.
Then the title should be "why GPU kernel programming needs a new programming language." I can get behind that; I've written CUDA C and it was not fun (though this was over a decade ago and things may have since improved, not to mention that the code I wrote then could today be replaced by a couple lines of PyTorch). That said, GPU kernel programming is fairly niche: for the vast majority of ML applications, the high-level API functions in PyTorch/TensorFlow/JAX/etc. provide optimal GPU performance. It's pretty rare that one would need to implement custom kernels.
>which are never, ever written in Python.
Not true! Triton is a Python API for writing kernels, which are JIT compiled.
But I’m much more interested in how MLIR opens the door to “JAX in <x>”. I think Julia is moving in that direction with Reactant.jl, and I think there’s a Rust project doing something similar (I think burn.dev may be using ONNX as an even higher-level IR). In my ideal world, I would be able to write an ML model and training loop in some highly verified language and call it from Python/Rust for training.
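For a feel of what that MLIR door looks like from the JAX side, here is a tiny sketch (the loss function and shapes are made up) that lowers a jitted gradient to a StableHLO/MLIR module another toolchain could consume:

    import jax
    import jax.numpy as jnp

    def loss(w, x, y):
        return jnp.mean((x @ w - y) ** 2)

    # Lower grad(loss) without executing it; as_text() prints the StableHLO/MLIR module.
    lowered = jax.jit(jax.grad(loss)).lower(
        jnp.ones(3), jnp.ones((4, 3)), jnp.ones(4)
    )
    print(lowered.as_text())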
That's kind of the point of Mojo, they're trying to solve the so-called "two language problem" in this space. Why should you need two languages to write your glue code and kernel code? Why can't there be a language which is both as easy to write as Python, but can still express GPU kernels for ML applications? That's what Mojo is trying to be through clever use of LLVM MLIR.
I think this is because many, many problem domains have a structure that lends themselves well to two-language solutions. They have a small homogenous computation structure on lots of data that needs to run extremely fast. And they also have a lot of configuration and data-munging that is basically quick one-time setup but has to be specified somewhere, and the more concisely you can specify it, the less human time development takes. The requirements on a language designed to run extremely fast are going to be very different from one that is designed to be as flexible and easy to write as possible. You usually achieve quick execution by eschewing flexibility and picking a programming model that is fairly close to the machine model, but you achieve flexibility by having lots of convenience features built into the language, most of which will have some cost in memory or indirections.
There've been a number of attempts at "one language to rule them all", notably PL/1, C++, Julia (in the mathematical programming subdomain), and Common Lisp, but it often feels like the "flexible" subset is shoehorned in to fit the need for zero-cost abstractions, and/or the "compute-optimized" subset is almost a whole separate language that is bolted on with similar but more verbose syntax.
Modern Common Lisp also seems to have given up its "one language to rule them all" mindset and is pretty okay with just dropping into CFFI to call into C libraries as needed. Over the years I've come to see that mindset as mostly a dead-end. Python, web browsers, game engines, emacs, these are all prominent living examples of two-language solutions that have come to dominate in their problem spaces.
One aspect of the "two language problem" that I find troubling though is that modern environments often ossify around the exact solution. For example, it's very difficult to have something like PyTorch in say Common Lisp even though libcuda and libdnn should be fairly straightforward to wrap in Common Lisp (see [1] for Common Lisp CUDA bindings.) JS/TS/WASM that runs in the browser often is dependent on Chrome's behavior. Emacs continues to be tied to its ancient, tech-debt ridden C runtime. There seems to be a lot of value tied into the glue between the two chosen languages and it's hard to recreate that value with other HLLs even if the "metal" language/runtime stays the same.
[1]: https://github.com/takagi/cl-cuda
PyTorch is actually quite illustrative as a counterexample that proves the rule. It was based on Torch, which had very similar if not identical BLAS routines but used Lua as the scripting language. But now everybody uses PyTorch because development of the Lua-based Torch stopped in 2017, so all the extra goodies that people rely on now are in the Python wrapper.
The only exception seems to be when multiple scripting languages are supported, and at roughly equal points of development. So for example - SQLite continues to have most of its value in the C substrate, and is relatively easy to port to other languages, because it has so many language bindings that there's a strong incentive to write new functionality in C and keep the API simple. Ditto client libraries for things like MySQL, PostGres, MongoDB, Redis, etc. ZeroMQ has a bunch of bindings that are largely dumb passthroughs to the underlying C++ substrate.
But even a small imbalance can lead to that one language being preferenced heavily in supporting tooling and documentation. Pola.rs is a Rust substrate and ships with bindings for Python, R, and Node.js, but all the examples on the website are in Python or Rust, and I rarely hear of a non-Python user picking it up.
I also wonder how much of the ossification comes from the embodied logic in the HLL. SQLite wrappers tend to be very simple and let the C core do most of the work. Something like PyTorch on the other hand layers on a lot of logic onto underlying CUDA/BLAS that is essential complexity living solely in Python the HLL. This is also probably why libcurl has so many great wrappers in HLLs because libcurl does the heavy lifting.
The pain point I see repeatedly in putting most of the logic into the performant core is asynchrony. Every HLL seems to have its own way to do async execution (Python with asyncio, Node with its async runtime, Go with lightweight green threads (goroutines), Common Lisp with native threads, etc.) This means that the C core needs to be careful as to what to expose and how to accommodate various asynchrony patterns.
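On the Python side, one common accommodation looks roughly like the sketch below, with an invented blocking_kernel standing in for a long-running call into the C core, pushed onto an executor so it coexists with asyncio:

    import asyncio

    def blocking_kernel(n):
        # stand-in for a long-running, GIL-releasing call into the C core
        return sum(range(n))

    async def main():
        loop = asyncio.get_running_loop()
        # run the blocking call in the default thread pool so the event loop stays responsive
        result = await loop.run_in_executor(None, blocking_kernel, 10_000_000)
        print(result)

    asyncio.run(main())

Node, Go, and Common Lisp each want a different shape of that shim, which is the coordination cost being pointed at here.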
Also, no. I can't use Python for inference, because it is too slow, so I have to export to tensorflow lite and run the model in C++, which essentially required me to rewrite half the code in C++ again.
It seems like it's just extremely difficult to give fine-grained control over the metal while having an easy, ergonomic language that lets you just get on with your tasks.
[0] https://www.youtube.com/watch?v=RUJFd-rEa0k
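For reference, the Python-side export step described above is typically just a few lines (a hedged sketch with illustrative paths); the actual inference then happens in the C++ TensorFlow Lite interpreter:

    import tensorflow as tf

    # Convert a SavedModel directory to a .tflite flatbuffer for the C++ runtime.
    converter = tf.lite.TFLiteConverter.from_saved_model("saved_model_dir")
    tflite_model = converter.convert()
    with open("model.tflite", "wb") as f:
        f.write(tflite_model)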
118 more comments available on Hacker News