No Graphics API
Key topics
The debate rages on around the notion of simplifying graphics APIs, with a recent article sparking discussion on whether the complexity of Vulkan and DX12 is still justified. Commenters weigh in, with some arguing that the latest GPU capabilities render much of the current API complexity unnecessary, while others counter that recent advancements, like hardware ray tracing, still require the current level of complexity. A surprising observation is that the proposed simplified API bears a striking resemblance to the SDL3 GPU API, which went unmentioned in the original article. As the conversation unfolds, it becomes clear that the industry is on the cusp of a significant shift, with even major titles like Doom and Indiana Jones already embracing hardware ray tracing as a minimum requirement.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion. First comment: 19m after posting. Peak period: 95 comments in 0-12h. Average per period: 17.8. Based on 160 loaded comments.
Key moments
- Story posted: Dec 16, 2025 at 2:20 PM EST (19 days ago)
- First comment: Dec 16, 2025 at 2:39 PM EST (19m after posting)
- Peak activity: 95 comments in the 0-12h window, the hottest window of the conversation
- Latest activity: Dec 22, 2025 at 6:01 AM EST (14 days ago)
I hope the IHVs have a look at it, because current DX12 seems semi-abandoned: it still doesn't support buffer pointers even though every GPU made in the last 10 (or more!) years can do pointers just fine. Meanwhile Vulkan won't do a 2.0 release that cleans things up, so it carries a lot of baggage and, especially, tons of drivers that don't implement the extensions that really improve things.
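For context, Vulkan does expose raw buffer pointers today through VK_KHR_buffer_device_address (core in 1.2). A minimal sketch of fetching a GPU pointer, assuming the device feature is enabled and the buffer was created with the device-address usage flag:

```c
#include <vulkan/vulkan.h>

/* Sketch: fetch a raw 64-bit GPU address for a buffer so shaders can
   chase pointers directly (requires the buffer to have been created
   with VK_BUFFER_USAGE_SHADER_DEVICE_ADDRESS_BIT). */
VkDeviceAddress get_buffer_pointer(VkDevice device, VkBuffer buffer)
{
    VkBufferDeviceAddressInfo info = {
        .sType = VK_STRUCTURE_TYPE_BUFFER_DEVICE_ADDRESS_INFO,
        .buffer = buffer,
    };
    /* The returned address can be written into any other buffer and
       dereferenced in shaders via GL_EXT_buffer_reference. */
    return vkGetBufferDeviceAddress(device, &info);
}
```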
I think this puts a floor on supported hardware though, like Nvidia 30xx and Radeon 5xxx. And of course motherboard support is a crapshoot until 2020 or so.
Bindless textures never needed any kind of resizable BAR; you've been able to use them since the early 2010s in OpenGL through an extension. Buffer pointers have never needed it either.
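The OpenGL extension referred to here is presumably GL_ARB_bindless_texture. A hedged sketch of the handle/residency dance, assuming a GL 4.x context with the extension available:

```c
#include <GL/glew.h>  /* or any loader exposing ARB_bindless_texture */

/* Sketch: turn a texture into a 64-bit handle that shaders can read
   out of a plain buffer, with no descriptor binding at draw time. */
GLuint64 make_bindless(GLuint texture)
{
    GLuint64 handle = glGetTextureHandleARB(texture);
    /* The handle must be made resident before any shader uses it. */
    glMakeTextureHandleResidentARB(handle);
    return handle;  /* store into a UBO/SSBO and sample it from there */
}
```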
But soon? Hopefully
There are also some ARM laptops that just run Qualcomm chips, the same as some phones (tablets with a keyboard, basically, but a bit more "PC"-like due to running Windows).
AFAICT the fusion seems likely to be an accurate prediction.
Only if you're ignoring mobile entirely. One of the things Vulkan did which would be a shame to lose is it unified desktop and mobile GPU APIs.
In this context, both the old Switch and the Switch 2 have full desktop-class GPUs. They don't need to care about the API problems that mobile vendors imposed on Vulkan.
Mobile vendors insisting on using closed, proprietary drivers that they refuse to constantly update/stay on top of is the actual issue. If you have a GPU capable of cutting edge graphics, you have to have a top notch driver stack. Nobody gets this right except AMD and NVIDIA (and both have their flaws). Apple doesn't even come close, and they are ahead of everyone else except AMD/NVIDIA. AMD seems to do it the best, NVIDIA, a distant second, Apple 3rd, and everyone else 10th.
What about Intel?
I remember there was a time, about 15 years ago, when they were famous for reporting OpenGL capabilities as supported when they were actually only available via software rendering, which defeated the purpose of using such features in the first place.
> It is quite telling how good their iGPUs are at 3D that no one counts them in.
I'm not so certain about this: in
> https://old.reddit.com/r/laptops/comments/1eqyau2/apuigpu_ti...
APUs/iGPUs are compared, and here Intel's integrated GPUs seem to be very competitive with AMD's APUs.
---
You of course have to compare dedicated graphics cards with each other, and similarly for integrated GPUs, so let's compare (Intel's) dedicated GPUs (Intel Arc), too:
When I look at
> https://www.tomshardware.com/reviews/gpu-hierarchy,4388.html
the current Intel Arc generation (Intel-Arc-B, "Battlemage") seems to be competitive with entry-level GPUs from NVidia and AMD, i.e. you can get much more powerful GPUs from NVidia and AMD, but at a much higher price. I thus clearly would not say that Intel's dedicated GPUs are so bad "at 3D that no one counts them in".
Also, there is nothing inherent that blocks extensions by default. I feel like a reasonable core that can optionally do more, similar to CPU extensions (e.g. vector extensions), could be the way to go here.
In hindsight it really would have been better to have a separate VulkanES which is specialized for mobile GPUs.
The different, separate engine variants for mobile and desktop users, on the other hand, can be based on the same graphics API; they'll just use different features from it in addition to having different algorithms and architecture.
...so you'll have different code paths for desktop and mobile anyway. The same can be achieved with a Vulkan vs VulkanES split which would overlap for maybe 50..70% of the core API, but significantly differ in the rest (like resource binding).
And beyond that if you look at historical trends, mobile is and always has been just "desktop from 5-7 years ago". An API split that makes sense now will stop making sense rather quickly.
And this is the reason why mobile and desktop should be separate graphics APIs. Mobile is holding desktop back not just feature-wise; it also fucks up the API.
Vulkan is the actual barrier. On Windows, DirectX does an average job at supporting it. Microsoft doesn't really innovate these days, so NVIDIA largely drives the market, and sometimes AMD pitches in.
It has been mostly NVidia in collaboration with Microsoft; even HLSL traces back to Cg.
This isn't really the case, at least on the desktop side.
All three desktop GPU vendors support Vulkan 1.4 (or most of the features via extensions) on all major platforms even on really old hardware (e.g. Intel Skylake is 10+ years old and has all the latest Vulkan features). Even Apple + MoltenVK is pretty good.
Even mobile GPU vendors have pretty good support in their latest drivers.
The biggest issue is that Android consumer devices don't get GPU driver updates so they're not available to the general public.
Ironically, a lot of the time these new APIs end up being slower in practice (something confirmed by gaming benchmarks), probably exactly because of the issues outlined in the article: having precompiled 'pipeline states' instead of the good ol' state machine has forced devs to precompile a truly staggering number of states, and even then compilation can sometimes still occur at runtime, leading to the well-known stutters.
The other issue is synchronization: as the article mentions, Vulkan synchronization is unnecessarily heavy, and devs aren't really experts (nor do they have the time) to figure out when to use which kind of barrier, so they adopt a 'better safe than sorry' approach, leading to unnecessary flushes and pipeline stalls that can tank performance in real-life workloads.
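To make the 'better safe than sorry' point concrete, here is a hedged sketch of the difference between a catch-all barrier and a precise one in Vulkan 1.0 (the particular stage/access choices are illustrative, not taken from the article):

```c
#include <vulkan/vulkan.h>

/* Conservative: wait for *everything* before *anything* continues.
   Easy to write, hard to get wrong, terrible for GPU overlap. */
void barrier_conservative(VkCommandBuffer cmd)
{
    VkMemoryBarrier all = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_MEMORY_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_MEMORY_READ_BIT | VK_ACCESS_MEMORY_WRITE_BIT,
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
                         VK_PIPELINE_STAGE_ALL_COMMANDS_BIT,
                         0, 1, &all, 0, NULL, 0, NULL);
}

/* Precise: only make a compute write visible to a fragment-shader read. */
void barrier_precise(VkCommandBuffer cmd)
{
    VkMemoryBarrier scoped = {
        .sType = VK_STRUCTURE_TYPE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_SHADER_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COMPUTE_SHADER_BIT,
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                         0, 1, &scoped, 0, NULL, 0, NULL);
}
```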
This is a huge issue when combined with the API complexity, leading many devs to use wrappers like the aforementioned SDL3, which is definitely very conservative when it comes to synchronization.
Old APIs with smart drivers could either figure this out better themselves, or GPU driver devs would look at the workloads and patch up rendering manually for popular titles.
Additionally, by the early-to-mid 2010s, when these new APIs started getting released, a lot of crafty devs, together with new shader models and OpenGL extensions, made it possible to render tens of thousands of varied and interesting objects, essentially a whole scene's worth, in a single draw call. The most sophisticated and complex of these approaches was AZDO, which I'm not sure actually made it into any released games, but even with much less sophisticated approaches (and combined with ideas like PBR materials and deferred rendering) you could pretty much draw anything.
This meant much of the perf bottleneck of the old APIs disappeared.
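For context, the centerpiece of the AZDO-style approach was OpenGL's multi-draw-indirect path. A hedged sketch of what "the whole scene in one call" looks like, assuming GL 4.3+ and with buffer setup omitted:

```c
#include <GL/glew.h>

/* One GPU-resident array of draw commands, one API call for the scene. */
typedef struct {
    GLuint count;          /* indices per draw               */
    GLuint instanceCount;  /* instances per draw             */
    GLuint firstIndex;     /* offset into the index buffer   */
    GLuint baseVertex;     /* offset into the vertex buffer  */
    GLuint baseInstance;   /* used to look up per-draw data  */
} DrawElementsIndirectCommand;

void draw_whole_scene(GLuint indirect_buffer, GLsizei draw_count)
{
    /* The commands live in a GL_DRAW_INDIRECT_BUFFER, written once by
       the CPU or every frame by a compute shader (GPU-driven culling). */
    glBindBuffer(GL_DRAW_INDIRECT_BUFFER, indirect_buffer);
    glMultiDrawElementsIndirect(GL_TRIANGLES, GL_UNSIGNED_INT,
                                (const void *)0, draw_count, 0);
}
```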
There are some interesting GPU improvements coming down the pipeline, like a possible OoO part from AMD (if certain credible leaks are valid); however, there are crickets from Microsoft, and NVIDIA just wants vendor lock-in.
Yes, we need a vastly simpler API. I'd argue even simpler than the one proposed.
One of my biggest hopes for RT is that it will standardize like 80% of stuff to the point where it can be abstracted to libraries. It probably won't happen, but one can wish...
What does Microsoft then intend to use to replace the functionality that DirectX provides?
Vulkan is another mess, even if there was a 2.0, how are devs supposed to actually use it, especially on Android, the biggest consumer Vulkan platform?
- It's not exposing raw GPU addresses; SDL3_GPU has buffer objects instead. You're also much more limited in how you use buffers in SDL3 (e.g. no coherent buffers; you're forced to use a transfer buffer if you want to do a CPU -> GPU upload, as sketched after this list).
- In SDL3_GPU, synchronization is done automatically, without the user specifying barriers (helped by a technique called cycling: https://moonside.games/posts/sdl-gpu-concepts-cycling/).
- More modern features such as mesh shading are not exposed in SDL3_GPU, and it keeps the traditional rendering pipeline as the main way to draw stuff. Also, bindless is a first-class citizen in Aaltonen's proposal (and the main reason for the simplification of the API), while SDL3_GPU doesn't support it at all and instead opts for a traditional descriptor binding system.
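As a rough illustration of the transfer-buffer requirement from the first point, here is a hedged sketch of a CPU -> GPU buffer upload with the SDL3 GPU API (function and struct names as I understand the SDL3 headers; treat the exact signatures as assumptions, not a definitive reference):

```c
#include <SDL3/SDL.h>
#include <string.h>

/* Sketch: SDL3_GPU has no coherent/persistently-mapped buffers, so data
   goes through a transfer buffer and an explicit copy pass. */
void upload_vertices(SDL_GPUDevice *dev, SDL_GPUBuffer *vbo,
                     const void *data, Uint32 size)
{
    SDL_GPUTransferBufferCreateInfo tci = {
        .usage = SDL_GPU_TRANSFERBUFFERUSAGE_UPLOAD,
        .size = size,
    };
    SDL_GPUTransferBuffer *tbuf = SDL_CreateGPUTransferBuffer(dev, &tci);

    void *mapped = SDL_MapGPUTransferBuffer(dev, tbuf, false);
    memcpy(mapped, data, size);
    SDL_UnmapGPUTransferBuffer(dev, tbuf);

    SDL_GPUCommandBuffer *cmd = SDL_AcquireGPUCommandBuffer(dev);
    SDL_GPUCopyPass *copy = SDL_BeginGPUCopyPass(cmd);
    SDL_GPUTransferBufferLocation src = { .transfer_buffer = tbuf, .offset = 0 };
    SDL_GPUBufferRegion dst = { .buffer = vbo, .offset = 0, .size = size };
    SDL_UploadToGPUBuffer(copy, &src, &dst, false);
    SDL_EndGPUCopyPass(copy);
    SDL_SubmitGPUCommandBuffer(cmd);

    SDL_ReleaseGPUTransferBuffer(dev, tbuf);
}
```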
This "no api" proposal requires hardware from the last 5-10 years :)
Or has the use of Middleware like Unreal Engine largely made them irrelevant? Or should EPIC put out a new Graphics API proposal?
Game developers create an RHI (rendering hardware interface) as discussed in the article, and go on with game development.
Because the greatest innovations thus far have been ray tracing and mesh shaders, and they are still largely ignored, so why keep on pushing forward?
The cost/compromise is dropping support for outdated GPUs.
Have you taken a look at the codebase of some game engines? It's a complete cluster fk, because some simple tasks just take 800 lines of code, and in the end the drivers don't even use the complexity graphics APIs force upon you.
Improving on this is not an accomplishment?
It would be especially nice for game developers as they face long shader compile times more often, and it would dramatically reduce the complexity of the low level rendering code while improving flexibility.
- Would lead to reduced memory usage on the driver side due to eliminating all the state tracking for "legacy" APIs and all the PSO/shader duplication for the "modern" APIs (who doesn't like using less memory? It won't show up on a microbenchmark, but a reduced working set leads to globally increased performance in most cases due to higher cache hit rates).
- A much reduced cost per API operation. I don't just mean draw calls but everything else too. And allowing more asynchrony without the "here's 5 types of fences and barriers" kind of mess. As the article says, you can currently choose between mostly implicit sync (OpenGL, DX11) and tracking all your resources yourself (Vulkan), then feeding all that data into an API which mostly ignores it. This one wouldn't really speed up existing applications so much as unlock new possibilities, for example massively improving scene variety with cheap draw calls and doing more procedural objects/materials instead of the standard PBR pipeline. Yes, drawindirect and friends exist, but they aren't exactly straightforward to use and require you to structure your problem in a specific way (see the sketch after this list).
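A hedged sketch of what today's indirect path looks like in Vulkan, to illustrate the "structure your problem in a specific way" point (the command struct layout is fixed by the API; the surrounding function is illustrative):

```c
#include <vulkan/vulkan.h>

/* The GPU (e.g. a culling compute shader) fills an array of commands
   into `indirect_buf`; the CPU records a single draw regardless of how
   many objects survive culling. */
void record_indirect_draws(VkCommandBuffer cmd, VkBuffer indirect_buf,
                           uint32_t max_draws)
{
    /* Layout required by the API:
       typedef struct VkDrawIndexedIndirectCommand {
           uint32_t indexCount;
           uint32_t instanceCount;
           uint32_t firstIndex;
           int32_t  vertexOffset;
           uint32_t firstInstance;
       } VkDrawIndexedIndirectCommand; */
    vkCmdDrawIndexedIndirect(cmd, indirect_buf, /*offset*/ 0, max_draws,
                             sizeof(VkDrawIndexedIndirectCommand));
}
```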
Per-drawcall cost goes to nanosecond scale. Assuming you do drawcalls of course, this makes bindless and indirect rendering a bit easier so you could drop CPU cost to near-0 in a renderer.
It would also greatly mitigate shader compiler hitches due to having a split pipeline instead of a monolithic one.
The simplification of barriers could improve performance significantly, because currently most engines that deal with Vulkan and DX12 need to keep track of individual texture layouts and transitions, and the proposal removes that entirely.
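For readers unfamiliar with the bookkeeping being referred to: today each texture's current layout has to be tracked so the engine can emit transitions like the following (a hedged sketch; the specific stage/access masks depend on actual usage):

```c
#include <vulkan/vulkan.h>

/* Sketch: transition a texture from "render target" to "sampled in a
   shader". The engine must know the image is currently in
   COLOR_ATTACHMENT_OPTIMAL -- exactly the per-resource state tracking
   the proposal would remove. */
void to_shader_read(VkCommandBuffer cmd, VkImage image)
{
    VkImageMemoryBarrier barrier = {
        .sType = VK_STRUCTURE_TYPE_IMAGE_MEMORY_BARRIER,
        .srcAccessMask = VK_ACCESS_COLOR_ATTACHMENT_WRITE_BIT,
        .dstAccessMask = VK_ACCESS_SHADER_READ_BIT,
        .oldLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .newLayout = VK_IMAGE_LAYOUT_SHADER_READ_ONLY_OPTIMAL,
        .srcQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .dstQueueFamilyIndex = VK_QUEUE_FAMILY_IGNORED,
        .image = image,
        .subresourceRange = { VK_IMAGE_ASPECT_COLOR_BIT, 0, 1, 0, 1 },
    };
    vkCmdPipelineBarrier(cmd,
                         VK_PIPELINE_STAGE_COLOR_ATTACHMENT_OUTPUT_BIT,
                         VK_PIPELINE_STAGE_FRAGMENT_SHADER_BIT,
                         0, 0, NULL, 0, NULL, 1, &barrier);
}
```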
UMA or not doesn't matter; desktop GPUs have MMUs and are perfectly capable of reading the CPU's memory in a unified address space (even back then).
> Graphics APIs and shader languages have significantly increased in complexity over the past decade. It’s time to start discussing how to strip down the abstractions to simplify development, improve performance, and prepare for future GPU workloads.
Meaning ... SSDs initially reused IDE/SATA interfaces, which introduced inherent bottlenecks because those standards were designed for spinning disks.
To fully realize SSD performance, a new transport had to be built from the ground up, one that eliminated those legacy assumptions, constraints and complexities.
Unless the link of the article has changed since your comment?
I also think that the way forward is to go back to software rendering; however, this time around those algorithms and data structures are actually hardware-accelerated, as he points out.
Note that this is already an ongoing trend in the VFX industry; about 5 years ago OTOY ported their OctaneRender to CUDA as the main rendering API.
Meanwhile GPU raytracing was a purely software affair until quite recently when fixed-function raytracing hardware arrived, which is unfortunately pretty opaque. You kind of have to just let Jensen take the wheel and hope the driver does the right thing.
> “Inbetween” is never written as one word. If you have seen it written in this way before, it is a simple typo or misspelling. You should not use it in this way because it is not grammatically correct as the noun phrase or the adjective form. https://grammarhow.com/in-between-in-between-or-inbetween/
Matthew 7:3 "And why beholdest thou the mote that is in thy brother's eye, but considerest not the beam that is in thine own eye?"
[0] https://en.wiktionary.org/wiki/-%D0%B8%D0%BA#Russian
Also called a "back-formation". FWIW, I don't think the existence of corrupted words automatically justifies more corruptions, nor does the fact that something is a corruption automatically invalidate it. When language among a group evolves, everyone speaking that language is affected, which is why written language reads quite differently looking back every 50 years or so, in both formal and informal writing. Therefore language changes should have buy-in from all users.
https://archive.nytimes.com/opinionator.blogs.nytimes.com/20...
Games like the original Half-Life, Unreal Tournament 2004, etc. ran surprisingly well and at decent resolutions.
With the power of modern hardware, I guess you could do a decent FPS in pure software with even naively written code; not having to deal with the APIs and having the absolute creative freedom to say 'this pixel is green' would be liberating.
Fun fact: due to the divergent nature of the computation, many ray tracers targeting real-time performance were written for the CPU. Even when GPUs were quite powerful, software ray tracers were quite good, until the hardware APIs started popping up.
Note that when the parent comment says "software rendering" they're referring to software (compute shaders) on the GPU.
Which is easier to debug.
Going with mesh shaders or GPU compute would be the next step.
I think it's fair to say that for most gamers, Vulkan/DX12 hasn't really been a net positive: the PSO problem affected many popular games, and while Vulkan has been trying to improve, WebGPU is tricky as it has its roots in the first versions of Vulkan.
Perhaps it was a bad idea to go all-in on a low-level API that exposes many details when the hardware underneath is evolving so fast. Maybe CUDA, as the post suggests in some places, with its more generic compute support, is the right way after all.
Hot take, Metal is more sane than CUDA.
For example: https://github.com/StafaH/mujoco_warp/blob/render_context/mu...
(the ray tracer compiles to CUDA, used for robotics RL)
But then game/engine devs want to use a vertex shader that produces a UV coordinate and a normal together with a pixel shader that only reads the UV coordinate (or neither, for shadow mapping), and they don't want to pay for the bandwidth of the unused vertex outputs (or the cost of calculating them).
Or they want to be able to randomly enable any other pipeline stage like tessellation or geometry and the same shader should just work without any performance overhead.
Basically do what most engines do - have preprocessor constants and use different paths based on what attributes you need.
I also don't see how separated pipeline stages are against this - you already have this functionality in existing APIs where you can swap different stages individually. Some changes might need a fixup from the driver side, but nothing which can't be added in this proposed API's `gpuSetPipeline` implementation...
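A hedged sketch of the preprocessor-constant approach described above, seen from the application side (standard GL compilation entry points; the define names are made up for illustration):

```c
#include <GL/glew.h>
#include <stdio.h>

/* Compile one source file into several variants by prepending defines,
   so e.g. a shadow-map pass can skip normals/UVs entirely. */
GLuint compile_variant(const char *source, int has_normals, int has_uvs)
{
    char prelude[128];
    snprintf(prelude, sizeof prelude,
             "#version 450\n#define HAS_NORMALS %d\n#define HAS_UVS %d\n",
             has_normals, has_uvs);

    const char *strings[2] = { prelude, source };
    GLuint shader = glCreateShader(GL_VERTEX_SHADER);
    glShaderSource(shader, 2, strings, NULL);  /* GL concatenates the strings */
    glCompileShader(shader);
    return shader;  /* check GL_COMPILE_STATUS in real code */
}
```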
I wish I still had this level of motivation :)
It's rather: can you find a company that pays you for having and extending this arcane knowledge (and even writing about it)?
Even if your job involves such topics, a lot of jobs that require this knowledge are rather "political" like getting the company's wishes into official standards.
And it's quite a bit simpler than what we have in the "modern" GPU APIs atm.
I was an only-half-joking champion of ditching vertex attrib bindings when we were drafting WebGPU and WGSL, because it's a really nice simplification, but it was felt that it would be too much of a departure from existing APIs. (Spending too many of our "Innovation Tokens" on something that would cause dev friction in the beginning.)
In WGSL we tried (for a while?) to build language features as "sugar" when we could. You don't have to guess what order or scope a `for` loop uses when we just spec how it desugars into a simpler, more explicit (but more verbose) core form/dialect of the language.
That said, this powerpoint-driven-development flex knocks this back a whole seriousness-and-earnestness tier and a half:
> My prototype API fits in one screen: 150 lines of code. The blog post is titled “No Graphics API”. That’s obviously an impossible goal today, but we got close enough. WebGPU has a smaller feature set and features a ~2700 line API (Emscripten C header).
Try to zoom out on the API and fit those *160* lines on one screen! My browser gives up at 30%, and I am still only seeing 127. This is just dishonesty, and we do not need more of this kind of puffery in the world.
And yeah, it's shorter because it is a toy PoC, even if one I enjoyed seeing someone else's take on it. Among other things, the author pretty dishonestly elides the number of lines the enums would take up. (A texture/data format enum on one line? That's one whole additional Pinocchio right there!)
I took WebGPU.webidl and did a quick pass through removing some of the biggest misses of this API (queries, timers, device loss, errors in general, shader introspection, feature detection) and some of the irrelevant parts (anything touching canvas, external textures), and immediately got it down to 241 declarations.
This kind of dishonest puffery holds back an otherwise interesting article.
Among other things, that covers everything running on non-apple, non-nvidia ARM devices, including freshly bought.
The "legacy" part of Vulkan that everyone on desktop is itching to drop (including popular tutorials) is renderpasses... which remain critical for performance on tiled GPUs where utilization of subpasses means major performance differences (also, major mobile GPUs have considerable differences in command submission which impact that as well)
...at the cost of creating PSOs at random times which is an expensive operation :/
IMHO a small number of immutable state objects is the best middle ground (similar to D3D11, but reshuffled as described in Seb's post).
Trying to say pipelines weren't a problem with OpenGL is monumental levels of revisionism. Vulkan (and D3D12, and Metal) didn't invent them for no reason. OpenGL and DirectX drivers spent a substantial amount of effort to hide PSO compilation stutter, because they still had to compile shader bytecode to ISA all the same. They were often not successful and developers had very limited tools to work around the stutter problems.
Often older games would issue dummy draw calls to an off screen render target to force the driver to compile the shader in a loading screen instead of in the middle of your frame. The problem was always hard, you could just ignore it in the older APIs. Pipelines exist to make this explicit.
The mistake Vulkan made was putting too much state in the pipeline, as much of that state is dynamic in modern hardware now. As long as we need to compile shader bytecode to ISA we need some kind of state object to represent the compiled code and APIs to control when that is compiled.
WebGPU doesn't talk to the GPU itself. It requires Vulkan/D3D/Metal underneath to actually implement itself.
>Even Vulkan stopped doing pointless boilerplate like bindings and pipelines.
Vulkan did no such thing. They added VK_KHR_dynamic_rendering and VK_EXT_shader_object to core, which are not required to be supported and must be queried for before using. The former gets rid of render pass objects and framebuffer objects in favor of vkCmdBeginRendering(), and WebGPU already abstracts those two away so you don't see or deal with them. The latter gets rid of monolithic pipeline objects.
Many mobile GPUs still do not support VK_KHR_dynamic_rendering or VK_EXT_shader_object. Even my very own Samsung Galaxy S24 Ultra[1] does not.
Vulkan did not get rid of pipeline objects, they added extensions for modern desktop GPUs that didn't need them. Even modern mobile GPUs still need them, and WebGPU isn't going to fragment their API to wall off mobile users.
[1] https://vulkan.gpuinfo.org/displayreport.php?id=44583
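For readers who haven't seen it, this is roughly what the render-pass-free path looks like with dynamic rendering (a hedged sketch using the Vulkan 1.3 entry points, with depth/stencil and most attachment setup trimmed):

```c
#include <vulkan/vulkan.h>

/* Sketch: begin rendering straight into an image view, with no
   VkRenderPass or VkFramebuffer objects involved. */
void begin_dynamic(VkCommandBuffer cmd, VkImageView color_view,
                   uint32_t width, uint32_t height)
{
    VkRenderingAttachmentInfo color = {
        .sType = VK_STRUCTURE_TYPE_RENDERING_ATTACHMENT_INFO,
        .imageView = color_view,
        .imageLayout = VK_IMAGE_LAYOUT_COLOR_ATTACHMENT_OPTIMAL,
        .loadOp = VK_ATTACHMENT_LOAD_OP_CLEAR,
        .storeOp = VK_ATTACHMENT_STORE_OP_STORE,
    };
    VkRenderingInfo info = {
        .sType = VK_STRUCTURE_TYPE_RENDERING_INFO,
        .renderArea = { {0, 0}, {width, height} },
        .layerCount = 1,
        .colorAttachmentCount = 1,
        .pColorAttachments = &color,
    };
    vkCmdBeginRendering(cmd, &info);
    /* ... record draws ... */
    vkCmdEndRendering(cmd);
}
```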
So does WebGL and it's doing perfectly fine without pipelines. They were never necessary. Backends can implement via pipelines, or they can go for the modern route and ignore them.
They are an artificial problem that Vulkan created and WebGPU mistakenly adopted, and which is now being phased out. Some devices may refuse to implement pipeline-free drivers, which is okay. I will happily ignore them. Let's move on into the 21st century without that design mistake, and let legacy devices and companies that refuse to adapt die with dignity.
It's either pixel debugging or trying to replicate things in native code to get proper tooling.
Then wgsl came and crippled WebGPU.
What would be one good primer to be able to comprehend all the design issues raised?
Bonus points if you then look at CUDA “hello world” and consider that this can run on the same hardware (sans fixed function accelerators) with 100x less boilerplate.
I have all of that but DX12 knowledge, and 50% of this article still went over my head.
> Meshlet has no clear 1:1 lane to vertex mapping, there’s no straightforward way to run a partial mesh shader wave for selected triangles. This is the main reason mobile GPU vendors haven’t been keen to adapt the desktop centric mesh shader API designed by Nvidia and AMD. Vertex shaders are still important for mobile.
I get that there's no mapping from vertex/triangle to tile until after the mesh shader runs. But even with vertex shaders there's also no mapping from vertex/triangle to tile until after the vertex shader runs. The binning of triangles to tiles has to happen after the vertex/mesh shader stage. So I don't understand why mesh shaders wouldn't work for mobile TBDR.
I guess this is suggesting that TBDR implementations split the vertex shader into two parts, one that runs before binning and only calculates positions, and one that runs after and computes everything else. I guess this could be done but it sounds crazy to me, with lots of duplicated work.
In fact, Qualcomm's documentation explicitly spells this out: https://docs.qualcomm.com/nav/home/overview.html?product=160...
A lot of this post went over my head, but I've struggled enough with GLSL for this to be triggering. Learning gets brutal because of the lack of middle ground between reinventing every shader every time and using an engine that abstracts shaders away from the render pipeline. A lot of open-source projects that use shaders are either allergic to documenting them or are proud of how obtuse the code is. Shadertoy is about as good as it gets, and that's not a compliment.
The only way I learned anything about shaders was from someone who already knew them well. They learned what they knew by spending a solid 7-8 years of their teenage/young adult years doing nearly nothing but GPU programming. There's probably something in between that doesn't involve giving up and using node-based tools, but in a couple decades of trying and failing to grasp it I've never found it.
https://lettier.github.io/3d-game-shaders-for-beginners/inde...
I agree on the other points. GPU graphics programming is hard in large part because of terrible documentation, or the lack of it.
What the modern APIs give you is less CPU driver overhead and new functionality like ray tracing. If you're not CPU-bound to begin with and don't need those new features, then there's not much of a reason to switch. The modern APIs require way more management than the prior ones; memory management, CPU-GPU synchronization, avoiding resource hazards, etc.
Also, many of those AAA games are moving to UE5, which is basically DX12 under the hood (presumably it should have a Vulkan backend too, but I don't see it used much?)
Vulkan has the same issues (and more) as D3D12, you just don't hear much about it because there are hardly any games built directly on top of Vulkan. Vulkan is mainly useful as Proton backend on Linux.
Will it be possible to hallucinate the frame of a game at a similar speed to rendering it with a mesh and textures?
We're already seeing the hybrid version of this where you render a lower res mesh and hallucinate the upscaled, more detailed, more realistic looking skin over the top.
I wouldn't want to be in the game engine business right now :/
* GPU virtualization (e.g., the D3D residency APIs), to allow many applications to share GPU resources (e.g., HBM).
* Undefined behavior: how easy is it for applications to accidentally or intentionally take a dependency on undefined behavior? This can make it harder to translate this new API to an even newer API in the future.
https://github.com/google/toucan
In particular, this fork: https://github.com/RobertBeckebans/nvrhi which adds some niceties and quality of life improvements.