No Graphics API
Key topics
The debate rages on around the notion of simplifying graphics APIs, with a recent article sparking discussion on whether the complexity of Vulkan and DX12 is still justified. Commenters weigh in, with some arguing that the latest GPU capabilities render much of the current API complexity unnecessary, while others counter that recent advancements, like hardware ray tracing, still require the current level of complexity. A surprising observation is that the proposed simplified API bears a striking resemblance to the SDL3 GPU API, which went unmentioned in the original article. As the conversation unfolds, it becomes clear that the industry is on the cusp of a significant shift, with even major titles like Doom and Indiana Jones already embracing hardware ray tracing as a minimum requirement.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion. First comment: 19m after posting. Peak period: 95 comments in 0-12h. Average per period: 17.8. Based on 160 loaded comments.
Key moments
- Story posted: Dec 16, 2025 at 2:20 PM EST (19 days ago)
- First comment: Dec 16, 2025 at 2:39 PM EST (19m after posting)
- Peak activity: 95 comments in the 0-12h window, the hottest window of the conversation
- Latest activity: Dec 22, 2025 at 6:01 AM EST (14 days ago)
I hope the IHVs have a look at it, because current DX12 seems semi-abandoned: it still doesn't support buffer pointers even though every GPU made in the last 10 (or more!) years can do pointers just fine. Meanwhile Vulkan won't do a 2.0 release that cleans things up, so it carries a lot of baggage and, especially, tons of drivers that don't implement the extensions that really improve things.
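For context, Vulkan does expose raw buffer pointers today through VK_KHR_buffer_device_address (core in 1.2). A minimal sketch of fetching a GPU pointer, assuming the device feature is enabled and the buffer was created with the device-address usage flag:

```c
#include <vulkan/vulkan.h>

/* Sketch: fetch a raw 64-bit GPU address for a buffer so shaders can
   chase pointers directly (requires the buffer to have been created
   with VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT). */
VkDeviceAddress get_buffer_pointer(VkDevice device, VkBuffer buffer)
{
    VkBufferDeviceAddressInfo info = {
        .sType = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO,
        .buffer = buffer,
    };
    /* The returned address can be written into any other buffer and
       dereferenced in shaders via GL_EXT_buffer_reference. */
    return vkGetBufferDeviceAddress(device, &info);
}
```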
I think this puts a floor on supported hardware though, like Nvidia 30xx and Radeon 5xxx. And of course motherboard support is a crapshoot until 2020 or so.
Bindless textures never needed any kind of resizable BAR; you've been able to use them since the early 2010s in OpenGL through an extension. Buffer pointers have never needed it either.
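The OpenGL extension referred to here is presumably GL_ARB_bindless_texture. A hedged sketch of the handle/residency dance, assuming a GL 4.x context with the extension available:

```c
#include <GL/glew.h>  /* or any loader exposing ARB_bindless_texture */

/* Sketch: turn a texture into a 64-bit handle that shaders can read
   out of a plain buffer, with no descriptor binding at draw time. */
GLuint64 make_bindless(GLuint texture)
{
    GLuint64 handle = glGetTextureHandleARB(texture);
    /* The handle must be made resident before any shader uses it. */
    glMakeTextureHandleResidentARB(handle);
    return handle;  /* store into a UBO/SSBO and sample it from there */
}
```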
But soon? Hopefully
There are also some ARM laptops that just run Qualcomm chips, the same as some phones (tablets with a keyboard, basically, but a bit more "PC"-like due to running Windows).
AFAICT the fusion seems likely to be an accurate prediction.
Only if you're ignoring mobile entirely. One of the things Vulkan did which would be a shame to lose is it unified desktop and mobile GPU APIs.
In this context, both the old Switch and the Switch 2 have full desktop-class GPUs. They don't need to care about the API problems that mobile vendors imposed on Vulkan.
Mobile vendors insisting on using closed, proprietary drivers that they refuse to constantly update/stay on top of is the actual issue. If you have a GPU capable of cutting edge graphics, you have to have a top notch driver stack. Nobody gets this right except AMD and NVIDIA (and both have their flaws). Apple doesn't even come close, and they are ahead of everyone else except AMD/NVIDIA. AMD seems to do it the best, NVIDIA, a distant second, Apple 3rd, and everyone else 10th.
What about Intel?
I remember there was a time, about 15 years ago, when they were famous for reporting OpenGL capabilities as supported when they were actually only available via software rendering, which defeated the purpose of using such features in the first place.
> It is quite telling how good their iGPUs are at 3D that no one counts them in.
I'm not so certain about this: in
> https://old.reddit.com/r/laptops/comments/1eqyau2/apuigpu_ti...
APUs/iGPUs are compared, and here Intel's integrated GPUs seem to be very competitive with AMD's APUs.
---
You of course have to compare dedicated graphics cards with each other, and similarly for integrated GPUs, so let's compare (Intel's) dedicated GPUs (Intel Arc), too:
When I look at
> https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html
the current Intel Arc generation (Intel-Arc-B, "Battlemage") seems to be competitive with entry-level GPUs from NVidia and AMD, i.e. you can get much more powerful GPUs from NVidia and AMD, but at a much higher price. I thus clearly would not say that Intel's dedicated GPUs are so bad "at 3D that no one counts them in".
Also, there is nothing inherent that blocks extensions by default. I feel like a reasonable core that can optionally do more, similar to CPU extensions (e.g. vector extensions), could be the way to go here.
In hindsight it really would have been better to have a separate VulkanES which is specialized for mobile GPUs.
The different, separate engine variants for mobile and desktop users, on the other hand, can be based on the same graphics API; they'll just use different features from it in addition to having different algorithms and architecture.
...so you'll have different code paths for desktop and mobile anyway. The same can be achieved with a Vulkan vs VulkanES split which would overlap for maybe 50..70% of the core API, but significantly differ in the rest (like resource binding).
And beyond that if you look at historical trends, mobile is and always has been just "desktop from 5-7 years ago". An API split that makes sense now will stop making sense rather quickly.
And this is the reason why mobile and desktop should be separate graphics APIs. Mobile is holding desktop back not just feature-wise; it also fucks up the API.
Vulkan is the actual barrier. On Windows, DirectX does an average job at supporting it. Microsoft doesn't really innovate these days, so NVIDIA largely drives the market, and sometimes AMD pitches in.
It has been mostly NVidia in collaboration with Microsoft; even HLSL traces back to Cg.
This isn't really the case, at least on the desktop side.
All three desktop GPU vendors support Vulkan 1.4 (or most of the features via extensions) on all major platforms even on really old hardware (e.g. Intel Skylake is 10+ years old and has all the latest Vulkan features). Even Apple + MoltenVK is pretty good.
Even mobile GPU vendors have pretty good support in their latest drivers.
The biggest issue is that Android consumer devices don't get GPU driver updates so they're not available to the general public.
Ironically, a lot of the time these new APIs end up being slower in practice (something confirmed by gaming benchmarks), probably exactly because of the issues outlined in the article: having precompiled 'pipeline states' instead of the good ol' state machine has forced devs to precompile a truly staggering number of states, and even then compilation can sometimes still occur at runtime, leading to the well-known stutters.
The other issue is synchronization: as the article mentions, Vulkan synchronization is unnecessarily heavy, and devs aren't really experts (nor do they have the time) to figure out when to use which kind of barrier, so they adopt a 'better safe than sorry' approach, leading to unnecessary flushes and pipeline stalls that can tank performance in real-life workloads.
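To make the 'better safe than sorry' point concrete, here is a hedged sketch of the difference between a catch-all barrier and a precise one in Vulkan 1.0 (the particular stage/access choices are illustrative, not taken from the article):

```c
#include <vulkan/vulkan.h>

/* Conservative: wait for *everything* before *anything* continues.
   Easy to write, hard to get wrong, terrible for GPU overlap. */
void barrier_conservative(VkCommandBuffer cmd)
{
    VkMemoryBarrier all = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_MEMORY_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT,
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
                         VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
                         0, 1, &all, 0, NULL, 0, NULL);
}

/* Precise: only make a compute write visible to a fragment-shader read. */
void barrier_precise(VkCommandBuffer cmd)
{
    VkMemoryBarrier scoped = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                         0, 1, &scoped, 0, NULL, 0, NULL);
}
```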
This is a huge issue when combined with the API complexity, leading many devs to use wrappers like the aforementioned SDL3, which is definitely very conservative when it comes to synchronization.
Old APIs with smart drivers could either figure this out better themselves, or GPU driver devs would look at the workloads and patch up rendering manually for popular titles.
Additionally, by the early-to-mid 2010s, when these new APIs started getting released, a lot of crafty devs, together with new shader models and OpenGL extensions, made it possible to render tens of thousands of varied and interesting objects, essentially a whole scene's worth, in a single draw call. The most sophisticated and complex of these approaches was AZDO, which I'm not sure actually made it into any released games, but even with much less sophisticated approaches (and combined with ideas like PBR materials and deferred rendering) you could pretty much draw anything.
This meant much of the perf bottleneck of the old APIs disappeared.
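For context, the centerpiece of the AZDO-style approach was OpenGL's multi-draw-indirect path. A hedged sketch of what "the whole scene in one call" looks like, assuming GL 4.3+ and with buffer setup omitted:

```c
#include <GL/glew.h>

/* One GPU-resident array of draw commands, one API call for the scene. */
typedef struct {
    GLuint count;          /* indices per draw               */
    GLuint instanceCount;  /* instances per draw             */
    GLuint firstIndex;     /* offset into the index buffer   */
    GLuint baseVertex;     /* offset into the vertex buffer  */
    GLuint baseInstance;   /* used to look up per-draw data  */
} DrawElementsIndirectCommand;

void draw_whole_scene(GLuint indirect_buffer, GLsizei draw_count)
{
    /* The commands live in a GL_DRAW_INDIRECT_BUFFER, written once by
       the CPU or every frame by a compute shader (GPU-driven culling). */
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirect_buffer);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                (const void *)0, draw_count, 0);
}
```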
There are some interesting GPU improvements coming down the pipeline, like a possible OoO part from AMD (if certain credible leaks are valid); however, there are crickets from Microsoft, and NVIDIA just wants vendor lock-in.
Yes, we need a vastly simpler API. I'd argue even simpler than the one proposed.
One of my biggest hopes for RT is that it will standardize like 80% of stuff to the point where it can be abstracted to libraries. It probably won't happen, but one can wish...
What does Microsoft then intend to use to replace the functionality that DirectX provides?
Vulkan is another mess, even if there was a 2.0, how are devs supposed to actually use it, especially on Android, the biggest consumer Vulkan platform?
- It's not exposing raw GPU addresses; SDL3_GPU has buffer objects instead. You're also much more limited in how you use buffers in SDL3 (e.g. no coherent buffers; you're forced to use a transfer buffer if you want to do a CPU -> GPU upload, as sketched after this list).
- In SDL3_GPU, synchronization is done automatically, without the user specifying barriers (helped by a technique called cycling: https://moonside.games/posts/sdl-gpu-concepts-cycling/).
- More modern features such as mesh shading are not exposed in SDL3_GPU, and it keeps the traditional rendering pipeline as the main way to draw stuff. Also, bindless is a first-class citizen in Aaltonen's proposal (and the main reason for the simplification of the API), while SDL3_GPU doesn't support it at all and instead opts for a traditional descriptor binding system.
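As a rough illustration of the transfer-buffer requirement from the first point, here is a hedged sketch of a CPU -> GPU buffer upload with the SDL3 GPU API (function and struct names as I understand the SDL3 headers; treat the exact signatures as assumptions, not a definitive reference):

```c
#include <SDL3/SDL.h>
#include <string.h>

/* Sketch: SDL3_GPU has no coherent/persistently-mapped buffers, so data
   goes through a transfer buffer and an explicit copy pass. */
void upload_vertices(SDL_GPUDevice *dev, SDL_GPUBuffer *vbo,
                     const void *data, Uint32 size)
{
    SDL_GPUTransferBufferCreateInfo tci = {
        .usage = SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD,
        .size = size,
    };
    SDL_GPUTransferBuffer *tbuf = SDL_CreateGPUTransferBuffer(dev, &tci);

    void *mapped = SDL_MapGPUTransferBuffer(dev, tbuf, false);
    memcpy(mapped, data, size);
    SDL_UnmapGPUTransferBuffer(dev, tbuf);

    SDL_GPUCommandBuffer *cmd = SDL_AcquireGPUCommandBuffer(dev);
    SDL_GPUCopyPass *copy = SDL_BeginGPUCopyPass(cmd);
    SDL_GPUTransferBufferLocation src = { .transfer_buffer = tbuf, .offset = 0 };
    SDL_GPUBufferRegion dst = { .buffer = vbo, .offset = 0, .size = size };
    SDL_UploadToGPUBuffer(copy, &src, &dst, false);
    SDL_EndGPUCopyPass(copy);
    SDL_SubmitGPUCommandBuffer(cmd);

    SDL_ReleaseGPUTransferBuffer(dev, tbuf);
}
```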
This "no api" proposal requires hardware from the last 5-10 years :)
Or has the use of Middleware like Unreal Engine largely made them irrelevant? Or should EPIC put out a new Graphics API proposal?
Game developers create an RHI (rendering hardware interface) as discussed in the article, and go on with game development.
Because the greatest innovations thus far have been ray tracing and mesh shaders, and they are still largely ignored, so why keep on pushing forward?
The cost/compromise is dropping support for outdated GPUs.
Have you taken a look at the codebase of some game engines? It's a complete cluster fk, because some simple tasks just take 800 lines of code, and in the end the drivers don't even use the complexity graphics APIs force upon you.
Improving on this is not an accomplishment?
It would be especially nice for game developers as they face long shader compile times more often, and it would dramatically reduce the complexity of the low level rendering code while improving flexibility.
- Would lead to reduced memory usage on the driver side due to eliminating all the state tracking for "legacy" APIs and all the PSO/shader duplication for the "modern" APIs (who doesn't like using less memory? It won't show up on a microbenchmark, but a reduced working set leads to globally increased performance in most cases due to higher cache hit rates).
- A much reduced cost per API operation. I don't just mean draw calls but everything else too. And allowing more asynchrony without the "here's 5 types of fences and barriers" kind of mess. As the article says, you can currently choose between mostly implicit sync (OpenGL, DX11) and tracking all your resources yourself (Vulkan), then feeding all that data into an API which mostly ignores it. This one wouldn't really speed up existing applications so much as unlock new possibilities, for example massively improving scene variety with cheap draw calls and doing more procedural objects/materials instead of the standard PBR pipeline. Yes, drawindirect and friends exist, but they aren't exactly straightforward to use and require you to structure your problem in a specific way (see the sketch after this list).
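A hedged sketch of what today's indirect path looks like in Vulkan, to illustrate the "structure your problem in a specific way" point (the command struct layout is fixed by the API; the surrounding function is illustrative):

```c
#include <vulkan/vulkan.h>

/* The GPU (e.g. a culling compute shader) fills an array of commands
   into `indirect_buf`; the CPU records a single draw regardless of how
   many objects survive culling. */
void record_indirect_draws(VkCommandBuffer cmd, VkBuffer indirect_buf,
                           uint32_t max_draws)
{
    /* Layout required by the API:
       typedef struct VkDrawIndexedIndirectCommand {
           uint32_t indexCount;
           uint32_t instanceCount;
           uint32_t firstIndex;
           int32_t  vertexOffset;
           uint32_t firstInstance;
       } VkDrawIndexedIndirectCommand; */
    vkCmdDrawIndexedIndirect(cmd, indirect_buf, /*offset*/ 0, max_draws,
                             sizeof(VkDrawIndexedIndirectCommand));
}
```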
Per-drawcall cost goes to nanosecond scale. Assuming you do drawcalls of course, this makes bindless and indirect rendering a bit easier so you could drop CPU cost to near-0 in a renderer.
It would also greatly mitigate shader compiler hitches due to having a split pipeline instead of a monolithic one.
The simplification of barriers could improve performance significantly, because currently most engines that deal with Vulkan and DX12 need to keep track of individual texture layouts and transitions, and the proposal removes that entirely.
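For readers unfamiliar with the bookkeeping being referred to: today each texture's current layout has to be tracked so the engine can emit transitions like the following (a hedged sketch; the specific stage/access masks depend on actual usage):

```c
#include <vulkan/vulkan.h>

/* Sketch: transition a texture from "render target" to "sampled in a
   shader". The engine must know the image is currently in
   COLOR_ATTACHMENT_OPTIMAL -- exactly the per-resource state tracking
   the proposal would remove. */
void to_shader_read(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = image,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                         0, 0, NULL, 0, NULL, 1, &barrier);
}
```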
UMA or not doesn't matter; desktop GPUs have MMUs and are perfectly capable of reading the CPU's memory in a unified address space (even back then).
> Graphics APIs and shader languages have significantly increased in complexity over the past decade. It’s time to start discussing how to strip down the abstractions to simplify development, improve performance, and prepare for future GPU workloads.
Meaning ... SSDs initially reused IDE/SATA interfaces, which introduced inherent bottlenecks because those standards were designed for spinning disks.
To fully realize SSD performance, a new transport had to be built from the ground up, one that eliminated those legacy assumptions, constraints and complexities.
Unless the link of the article has changed since your comment?
I also think that the way forward is to go back to software rendering; however, this time around those algorithms and data structures are actually hardware-accelerated, as he points out.
Note that this is already an ongoing trend in the VFX industry; about 5 years ago OTOY ported their OctaneRender to CUDA as the main rendering API.
Meanwhile GPU raytracing was a purely software affair until quite recently when fixed-function raytracing hardware arrived, which is unfortunately pretty opaque. You kind of have to just let Jensen take the wheel and hope the driver does the right thing.
> “Inbetween” is never written as one word. If you have seen it written in this way before, it is a simple typo or misspelling. You should not use it in this way because it is not grammatically correct as the noun phrase or the adjective form. https://grammarhow.com/in-between-in-between-or-inbetween/
Matthew 7:3 "And why beholdest thou the mote that is in thy brother's eye, but considerest not the beam that is in thine own eye?"
[0] https://en.wiktionary.org/wiki/-%D0%B8%D0%BA#Russian
Also called a "back-formation". FWIW, I don't think the existence of corrupted words automatically justifies more corruptions, nor does the fact that something is a corruption automatically invalidate it. When language among a group evolves, everyone speaking that language is affected, which is why written language reads quite differently looking back every 50 years or so, in both formal and informal writing. Therefore language changes should have buy-in from all users.
https://archive.nytimes.com/opinionator.blogs.nytimes.com/20...
Games like the original Half-Life, Unreal Tournament 2004, etc. ran surprisingly well and at decent resolutions.
With the power of modern hardware, I guess you could do a decent FPS in pure software with even naively written code; not having to deal with the APIs and having the absolute creative freedom to say 'this pixel is green' would be liberating.
Fun fact: due to the divergent nature of the computation, many ray tracers targeting real-time performance were written for the CPU. Even when GPUs were quite powerful, software ray tracers were quite good, until the hardware APIs started popping up.
Note that when the parent comment says "software rendering" they're referring to software (compute shaders) on the GPU.
Which is easier to debug.
Going with mesh shaders or GPU compute would be the next step.
I think it's fair to say that for most gamers, Vulkan/DX12 hasn't really been a net positive: the PSO problem affected many popular games, and while Vulkan has been trying to improve, WebGPU is tricky as it has its roots in the first versions of Vulkan.
Perhaps it was a bad idea to go all-in on a low-level API that exposes many details when the hardware underneath is evolving so fast. Maybe CUDA, as the post suggests in some places, with its more generic compute support, is the right way after all.
Hot take, Metal is more sane than CUDA.
For example: https://github.com/StafaH/mujoco_warp/blob/render_context/mu...
(the ray tracer compiles to CUDA, used for robotics RL)
But then game/engine devs want to use a vertex shader that produces a UV coordinate and a normal together with a pixel shader that only reads the UV coordinate (or neither, for shadow mapping), and they don't want to pay for the bandwidth of the unused vertex outputs (or the cost of calculating them).
Or they want to be able to randomly enable any other pipeline stage like tessellation or geometry and the same shader should just work without any performance overhead.
Basically do what most engines do - have preprocessor constants and use different paths based on what attributes you need.
I also don't see how separated pipeline stages are against this - you already have this functionality in existing APIs where you can swap different stages individually. Some changes might need a fixup from the driver side, but nothing which can't be added in this proposed API's `gpuSetPipeline` implementation...
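A hedged sketch of the preprocessor-constant approach described above, seen from the application side (standard GL compilation entry points; the define names are made up for illustration):

```c
#include <GL/glew.h>
#include <stdio.h>

/* Compile one source file into several variants by prepending defines,
   so e.g. a shadow-map pass can skip normals/UVs entirely. */
GLuint compile_variant(const char *source, int has_normals, int has_uvs)
{
    char prelude[128];
    snprintf(prelude, sizeof prelude,
             "#version 450\n#define HAS_NORMALS %d\n#define HAS_UVS %d\n",
             has_normals, has_uvs);

    const char *strings[2] = { prelude, source };
    GLuint shader = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(shader, 2, strings, NULL);  /* GL concatenates the strings */
    glCompileShader(shader);
    return shader;  /* check GL_COMPILE_STATUS in real code */
}
```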
I wish I still had this level of motivation :)
It's rather: can you find a company that pays you for having and extending this arcane knowledge (and even writing about it)?
Even if your job involves such topics, a lot of jobs that require this knowledge are rather "political" like getting the company's wishes into official standards.
And it's quite a bit simpler than what we have in the "modern" GPU APIs atm.
I was an only-half-joking champion of ditching vertex attrib bindings when we were drafting WebGPU and WGSL, because it's a really nice simplification, but it was felt that it would be too much of a departure from existing APIs. (Spending too many of our "Innovation Tokens" on something that would cause dev friction in the beginning.)
In WGSL we tried (for a while?) to build language features as "sugar" when we could. You don't have to guess what order or scope a `for` loop uses when we just spec how it desugars into a simpler, more explicit (but more verbose) core form/dialect of the language.
That said, this powerpoint-driven-development flex knocks this back a whole seriousness-and-earnestness tier and a half:
> My prototype API fits in one screen: 150 lines of code. The blog post is titled “No Graphics API”. That’s obviously an impossible goal today, but we got close enough. WebGPU has a smaller feature set and features a ~2700 line API (Emscripten C header).
Try to zoom out on the API and fit those *160* lines on one screen! My browser gives up at 30%, and I am still only seeing 127. This is just dishonesty, and we do not need more of this kind of puffery in the world.
And yeah, it's shorter because it is a toy PoC, even if one I enjoyed seeing someone else's take on it. Among other things, the author pretty dishonestly elides the number of lines the enums would take up. (A texture/data format enum on one line? That's one whole additional Pinocchio right there!)
I took WebGPU.webidl and did a quick pass through removing some of the biggest misses of this API (queries, timers, device loss, errors in general, shader introspection, feature detection) and some of the irrelevant parts (anything touching canvas, external textures), and immediately got it down to 241 declarations.
This kind of dishonest puffery holds back an otherwise interesting article.
Among other things, that covers everything running on non-apple, non-nvidia ARM devices, including freshly bought.
The "legacy" part of Vulkan that everyone on desktop is itching to drop (including popular tutorials) is renderpasses... which remain critical for performance on tiled GPUs where utilization of subpasses means major performance differences (also, major mobile GPUs have considerable differences in command submission which impact that as well)
...at the cost of creating PSOs at random times which is an expensive operation :/
IMHO a small number of immutable state objects is the best middle ground (similar to D3D11, but reshuffled as described in Seb's post).
Trying to say pipelines weren't a problem with OpenGL is monumental levels of revisionism. Vulkan (and D3D12, and Metal) didn't invent them for no reason. OpenGL and DirectX drivers spent a substantial amount of effort to hide PSO compilation stutter, because they still had to compile shader bytecode to ISA all the same. They were often not successful and developers had very limited tools to work around the stutter problems.
Often older games would issue dummy draw calls to an off screen render target to force the driver to compile the shader in a loading screen instead of in the middle of your frame. The problem was always hard, you could just ignore it in the older APIs. Pipelines exist to make this explicit.
The mistake Vulkan made was putting too much state in the pipeline, as much of that state is dynamic in modern hardware now. As long as we need to compile shader bytecode to ISA we need some kind of state object to represent the compiled code and APIs to control when that is compiled.
WebGPU doesn't talk to the GPU itself. It requires Vulkan/D3D/Metal underneath to actually implement itself.
>Even Vulkan stopped doing pointless boilerplate like bindings and pipelines.
Vulkan did no such thing. They added VK_KHR_dynamic_rendering and VK_EXT_shader_object to core, which are not required to be supported and must be queried for before using. The former gets rid of render pass objects and framebuffer objects in favor of vkCmdBeginRendering(), and WebGPU already abstracts those two away so you don't see or deal with them. The latter gets rid of monolithic pipeline objects.
Many mobile GPUs still do not support VK_KHR_dynamic_rendering or VK_EXT_shader_object. Even my very own Samsung Galaxy S24 Ultra[1] does not.
Vulkan did not get rid of pipeline objects, they added extensions for modern desktop GPUs that didn't need them. Even modern mobile GPUs still need them, and WebGPU isn't going to fragment their API to wall off mobile users.
[1] https://vulkan.gpuinfo.org/displayreport.php?id=44583
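For readers who haven't seen it, this is roughly what the render-pass-free path looks like with dynamic rendering (a hedged sketch using the Vulkan 1.3 entry points, with depth/stencil and most attachment setup trimmed):

```c
#include <vulkan/vulkan.h>

/* Sketch: begin rendering straight into an image view, with no
   VkRenderPass or VkFramebuffer objects involved. */
void begin_dynamic(VkCommandBuffer cmd, VkImageView color_view,
                   uint32_t width, uint32_t height)
{
    VkRenderingAttachmentInfo color = {
        .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
        .imageView = color_view,
        .imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    };
    VkRenderingInfo info = {
        .sType = VK_STRUCTURE_TYPE_RENDERING_INFO,
        .renderArea = { {0, 0}, {width, height} },
        .layerCount = 1,
        .colorAttachmentCount = 1,
        .pColorAttachments = &color,
    };
    vkCmdBeginRendering(cmd, &info);
    /* ... record draws ... */
    vkCmdEndRendering(cmd);
}
```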
So does WebGL and it's doing perfectly fine without pipelines. They were never necessary. Backends can implement via pipelines, or they can go for the modern route and ignore them.
They are an artificial problem that Vulkan created and WebGPU mistakenly adopted, and which is now being phased out. Some devices may refuse to implement pipeline-free drivers, which is okay. I will happily ignore them. Let's move on into the 21st century without that design mistake, and let legacy devices and companies that refuse to adapt die with dignity.
It's either pixel debugging or trying to replicate things in native code to get proper tooling.
Then wgsl came and crippled WebGPU.
What would be one good primer to be able to comprehend all the design issues raised?
Bonus points if you then look at CUDA “hello world” and consider that this can run on the same hardware (sans fixed function accelerators) with 100x less boilerplate.
I have all of that but DX12 knowledge, and 50% of this article still went over my head.
> Meshlet has no clear 1:1 lane to vertex mapping, there’s no straightforward way to run a partial mesh shader wave for selected triangles. This is the main reason mobile GPU vendors haven’t been keen to adapt the desktop centric mesh shader API designed by Nvidia and AMD. Vertex shaders are still important for mobile.
I get that there's no mapping from vertex/triangle to tile until after the mesh shader runs. But even with vertex shaders there's also no mapping from vertex/triangle to tile until after the vertex shader runs. The binning of triangles to tiles has to happen after the vertex/mesh shader stage. So I don't understand why mesh shaders wouldn't work for mobile TBDR.
I guess this is suggesting that TBDR implementations split the vertex shader into two parts, one that runs before binning and only calculates positions, and one that runs after and computes everything else. I guess this could be done but it sounds crazy to me, with lots of duplicated work.
In fact, Qualcomm's documentation explicitly spells this out: https://docs.qualcomm.com/nav/home/overview.html?product=160...
A lot of this post went over my head, but I've struggled enough with GLSL for this to be triggering. Learning gets brutal because of the lack of middle ground between reinventing every shader every time and using an engine that abstracts shaders away from the render pipeline. A lot of open-source projects that use shaders are either allergic to documenting them or are proud of how obtuse the code is. Shadertoy is about as good as it gets, and that's not a compliment.
The only way I learned anything about shaders was from someone who already knew them well. They learned what they knew by spending a solid 7-8 years of their teenage/young adult years doing nearly nothing but GPU programming. There's probably something in between that doesn't involve giving up and using node-based tools, but in a couple decades of trying and failing to grasp it I've never found it.
https://lettier.github.io/3d-game-shaders-for-beginners/inde...
I agree on the other points. GPU graphics programming is hard in large part because of terrible documentation, or the lack of it.
What the modern APIs give you is less CPU driver overhead and new functionality like ray tracing. If you're not CPU-bound to begin with and don't need those new features, then there's not much of a reason to switch. The modern APIs require way more management than the prior ones; memory management, CPU-GPU synchronization, avoiding resource hazards, etc.
Also, many of those AAA games are moving to UE5, which is basically DX12 under the hood (presumably it should have a Vulkan backend too, but I don't see it used much?)
Vulkan has the same issues (and more) as D3D12, you just don't hear much about it because there are hardly any games built directly on top of Vulkan. Vulkan is mainly useful as Proton backend on Linux.
Will it be possible to hallucinate the frame of a game at a similar speed to rendering it with a mesh and textures?
We're already seeing the hybrid version of this where you render a lower res mesh and hallucinate the upscaled, more detailed, more realistic looking skin over the top.
I wouldn't want to be in the game engine business right now :/
* GPU virtualization (e.g., the D3D residency APIs), to allow many applications to share GPU resources (e.g., HBM).
* Undefined behavior: how easy is it for applications to accidentally or intentionally take a dependency on undefined behavior? This can make it harder to translate this new API to an even newer API in the future.
https://github.com/google/toucan
In particular, this fork: https://github.com/RobertBeckebans/nvrhi which adds some niceties and quality of life improvements.