Windows ML Is Generally Available
Key topics
Microsoft has announced the general availability of Windows ML, a built-in AI inferencing runtime for on-device model inference, sparking discussion about its implications, comparisons to other solutions, and potential limitations.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 2h after posting
- Peak period: 38 comments (Day 3)
- Average per period: 11.5 comments
- Based on 46 loaded comments
Key moments
- 01 Story posted: Sep 25, 2025 at 4:11 PM EDT (3 months ago)
- 02 First comment: Sep 25, 2025 at 6:32 PM EDT (2h after posting)
- 03 Peak activity: 38 comments in Day 3, the hottest window of the conversation
- 04 Latest activity: Oct 9, 2025 at 4:23 PM EDT (3 months ago)
It is the evolution of DirectX for ML, previously known as DirectML.
DirectX was nice in that the documentation and example/sample code were excellent.
> Call the Windows ML APIs to initialize EPs [Execution Providers], and then load any ONNX model and start inferencing in just a few lines of code.
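Windows ML's own surface is a WinRT API, but the load-and-infer flow it describes can be sketched with plain ONNX Runtime in Python; the model path and the DirectML EP choice here are illustrative assumptions, not something from the announcement:

```python
# A minimal load-and-infer sketch with ONNX Runtime (not the WinRT
# Windows ML surface itself); "model.onnx" is a placeholder path.
import numpy as np
import onnxruntime as ort

# EPs in priority order; ORT falls back to the CPU EP for anything
# the DirectML EP doesn't claim.
session = ort.InferenceSession(
    "model.onnx",
    providers=["DmlExecutionProvider", "CPUExecutionProvider"],
)

# Build a dummy tensor matching the model's first input
# (dynamic dimensions are pinned to 1 for the example).
inp = session.get_inputs()[0]
shape = [d if isinstance(d, int) else 1 for d in inp.shape]
outputs = session.run(None, {inp.name: np.zeros(shape, dtype=np.float32)})
print(outputs[0].shape)
```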
A lot of people still choose to build games on Direct3D 11 or even 9 for convenience, and now, thanks to Proton, games built that way run fine on Linux and Steam Deck. Plus, technologies like shadercross and mojoshader mean that those HLSL shaders are fairly portable, though that comes at the cost of a pile of weird hacks.
One good thing is that one of the console vendors now supports Vulkan, so building your game around Vulkan gives you a head start on console and means your game will run on Windows, Linux and Mac (though the last one requires some effort via something like MoltenVK) - but this is a relatively new thing. It's great to see either way, since in the past the consoles all used bespoke graphics APIs (except XBox, which used customized DirectX).
An OpenGL-based renderer would have historically been even more of an albatross when porting to consoles than DX, since (aside from some short-lived, semi-broken support on PS3) native high-performance OpenGL has never been a feature on anything other than Linux and Mac. In comparison DirectX has been native on XBox since the beginning, and that was a boon in the XBox 360 era when it was the dominant console.
IMO historically picking a graphics API has always been about tradeoffs, and realities favored DirectX until at least the end of the XBox 360 era, if not longer than that.
Back in my "want to do games" phase, and also during my demoscene days, going to Gamedev.net, Flipcode, or the IGDA forums, or attending GDCE, this was never something fellow coders complained about.
Rather, it was about how to do cool stuff with specific hardware, or gameplay ideas; mastering the various systems was also seen as a skill.
So it isn't as portable as people think.
Game developers care about IP, how to take it beyond games, getting a publisher deal, and gameplay; the proprietary APIs are just another set of plugins on a middleware engine, in-house or external, and that's that.
Also, there is a whole set of companies whose main business is porting games, which is where several studios got their foot in the door before coming up with their own ideas, as a way to gain experience and recognition in the industry; they are thankful each platform is something else.
Finally anyone claiming Khronos APIs are portable never had the pleasure to use extensions or deal with drivers and shader compiler bugs.
Even Mac OS only adopted OpenGL after the OS X reboot; before that it was QuickDraw 3D, and the Amiga used Warp 3D during its last days.
The other components were also very well done: DirectInput, etc.
And it is a developer feature hidden from end users. E.g., in your ollama example, does the developer ask end users to install ollama? Does the dev redistribute ollama and keep it updated?
The ONNX format is pretty much a boring de facto standard for ML model exchange. It is under the Linux Foundation.
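To make the "exchange" point concrete, here is a minimal sketch of producing an ONNX file from a framework-specific model; PyTorch, the toy model, and the file name are illustrative choices, not part of the thread:

```python
# A toy end-to-end illustration of ONNX as an exchange format
# (assumes torch is installed; the model and path are made up).
import torch

model = torch.nn.Sequential(torch.nn.Linear(4, 2), torch.nn.ReLU()).eval()
example_input = torch.randn(1, 4)  # example input pins down tensor shapes

torch.onnx.export(
    model,
    example_input,
    "model.onnx",           # hypothetical output path
    input_names=["input"],
    output_names=["output"],
)
# Any ONNX-compatible runtime (ONNX Runtime, Windows ML, etc.) can now
# load model.onnx without knowing anything about PyTorch.
```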
The ONNX Runtime is a Microsoft thing, but it is an MIT-licensed runtime for cross-language use and cross-OS/HW-platform deployment of ML models in the ONNX format.
That bit needs to support everything, because Microsoft itself ships software on everything (Mac/Linux/iOS/Android/Windows).
ORT — https://onnxruntime.ai
Here is the Windows ML part of this: https://learn.microsoft.com/en-us/windows/ai/new-windows-ml/...
The primary value claims for Windows ML (for a developer using it): it eliminates the need to:
- Bundle execution providers for specific hardware vendors
- Create separate app builds for different execution providers
- Handle execution provider updates manually
Since ‘EP’ is ultra-super-techno-jargon:
Here is what GPT-5 provides:
Intensional (what an EP is)
In ONNX Runtime, an Execution Provider (EP) is a pluggable backend that advertises which ops/kernels it can run and supplies the optimized implementations, memory allocators, and (optionally) graph rewrites for a specific target (CPU, CUDA/TensorRT, Core ML, OpenVINO, etc.). ONNX Runtime then partitions your model graph and assigns each partition to the highest-priority EP that claims it; anything unsupported falls back (by default) to the CPU EP.
Extensional (how you use them)
- You pick/priority-order EPs per session; ORT maps graph pieces accordingly and falls back as needed.
- Each EP has its own options (e.g., TensorRT workspace size, OpenVINO device string, QNN context cache).
- Common EPs: CPU, CUDA, TensorRT (NVIDIA), DirectML (Windows), Core ML (Apple), NNAPI (Android), OpenVINO (Intel), ROCm (AMD), QNN (Qualcomm).
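A sketch of that per-session priority ordering with ONNX Runtime's Python API; the model path is a placeholder, and which EPs are actually usable depends on the ORT build you installed (option key names also vary by version):

```python
# Per-session EP priority ordering, as described above.
import onnxruntime as ort

print(ort.get_available_providers())  # what this ORT build can offer

# Highest priority first; ORT assigns each graph partition to the first
# EP that claims it and falls back to the CPU EP for the rest. An EP can
# be a bare name or a (name, options) pair.
providers = [
    ("TensorrtExecutionProvider", {"trt_max_workspace_size": 2 * 1024**3}),
    "CUDAExecutionProvider",
    "CPUExecutionProvider",
]
session = ort.InferenceSession("model.onnx", providers=providers)
print(session.get_providers())  # the EPs ORT actually selected
```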
Since this uses ONNX you probably won't be able to use ollama directly with it, but conceptually you could use an app like it to run your models in a more optimized way.
This sounds equivalent to Apple's announcement last week about opening up access for any developer to tap into the on-device large language model at the core of Apple Intelligence[1].
No matter the device, this is a win-win for developers making & consumers getting privacy-focused apps
[1] https://www.apple.com/newsroom/2025/09/new-apple-intelligenc...
Thus C#, C++ and Python support as WinRT projections on top of the new API.
Now, the real question is whether vLLM/ONNX or just running straight on CUDA/ROCm are the only alternatives, or whether we are all trading one vendor lock-in for another.
Could there be a distributed training system run by contributors, like the various @Home projects? Yeah, decent chance of that working, especially with the widespread availability of fiber connections. But then to query the model you still need a large low-latency system (i.e. not distributed) to host it, and that's expensive.
Is this the new Windows 12?
This ability to run models locally enables developers to build AI experiences that are more responsive, private and cost-effective, reaching users across the broadest range of Windows hardware.