Simd City: Auto-Vectorisation
Key topics
The unpredictable nature of auto-vectorization has sparked a lively debate, with some commenters praising the compiler's occasional "magical" abilities to deduce vectorization, while others lament its unreliability, citing significant performance hits when it fails to trigger. As one commenter noted, the issue is particularly pronounced with SIMD and floating-point numbers, a topic recently explored by Matt Godbolt. Meanwhile, others are looking to the future, speculating about the potential for large language models to revolutionize compiler optimization, with some even envisioning LLMs as disassemblers. The discussion highlights the complex trade-offs between performance, predictability, and semantic correctness in compiler design.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement
- First comment: 7d after posting
- Peak period: 10 comments in 156-168h
- Avg / period: 9.5
Based on 19 loaded comments
Key moments
- Story posted: Dec 20, 2025 at 8:25 AM EST (15 days ago)
- First comment: Dec 27, 2025 at 12:03 AM EST (7d after posting)
- Peak activity: 10 comments in 156-168h (hottest window of the conversation)
- Latest activity: Dec 27, 2025 at 4:59 PM EST (7 days ago)
It's just really hard to detect and exploit profitable and safe vectorization opportunities. The theory behind some of the optimizers is beautiful, though: https://en.wikipedia.org/wiki/Polytope_model
A way to express the operations you want, without unintentionally expressing operations you don't want, would be much easier to auto-vectorise. I'm not familiar enough with SIMD to give examples, but if a transformation preserves the operations you want yet is observably different from what you coded, I assume it's not eligible (unless you enable flags that let the compiler perform optimisations producing code that's not quite what you wrote).
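One classic case of unintentionally expressing operations you don't want is pointer aliasing: with plain C pointers the compiler has to assume the output may overlap the inputs. A minimal sketch (function and parameter names are illustrative, not from the discussion):

```c
#include <stddef.h>

/* With plain pointers, C allows `out` to overlap `a` or `b`, so the loop's
   semantics include overlap cases the author almost certainly never intended;
   the compiler must prove non-overlap or emit a runtime check before vectorising. */
void add_may_alias(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* `restrict` promises no overlap, leaving only the intended operations,
   so the compiler can vectorise this loop unconditionally. */
void add_no_alias(const float *restrict a, const float *restrict b,
                  float *restrict out, size_t n) {
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}
```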
Matt Godbolt wrote about it recently.
https://xania.org/202512/21-vectorising-floats
TL;DR: mathematical notation and the language both specify a particular order in which floating-point operations happen, and the precision limits of the IEEE float representation mean that order has to be honoured by default.
Allowing compilers to reorder things in breach of that contract is an option, but it comes with risks.
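As a concrete illustration (a minimal sketch; the function name is mine): the C semantics of a simple summation loop are a strictly sequential chain of additions, and because float addition rounds differently when reassociated, the compiler won't turn it into a set of partial sums unless you opt in, e.g. with -ffast-math / -fassociative-math in gcc and clang, or an OpenMP `#pragma omp simd reduction(+:sum)` on the loop.

```c
#include <stddef.h>

/* The language pins this down as (((0 + x[0]) + x[1]) + ...), in that exact order.
   Vectorising it means keeping several partial sums and combining them at the end,
   which rounds differently, so it is not a legal transformation by default. */
float sum_f32(const float *x, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++)
        sum += x[i];
    return sum;
}
```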
the only thing that might stand in the way is a dependence on reproducibility, but it seems like a weak argument: We already have a long history of people trying to push build reproducibility, and for better or worse they never got traction.
Same story with LTO and PGO: I can't think of anyone other than browser and compiler people who are using either (and even they took a long time before they started using them). Judged to be more effort than it's worth, I guess.
Alas, the standards committee is always asking for people like us to join, but few of our billion-dollar companies will pony up any money. This is despite many of them maintaining custom forks of clang.
There is a large presence from the trading industry, less from gaming, but you still see a lot of those guys.
An ML model can fit into existing compiler pipelines anywhere that heuristics are used though, as an alternative to PGO.
I tried it a year or so back and was sorta disappointed at the results beyond simple cases, but it feels like an area that could improve rapidly.
I remember Intel had something like it, but it went nowhere.
You don't want "vectorization", though; you want either
a) a code generation tool that generates exactly the platform-specific code you want and can't silently fail (see the intrinsics sketch below), or
b) at the very least a fundamentally vectorized language that does "scalarization" instead of the other way round.
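For option (a), the closest thing widely available today is writing the SIMD directly with intrinsics, so nothing can silently fall back to scalar code. A minimal sketch assuming an AVX-capable x86-64 target and compilation with -mavx (function and array names are illustrative):

```c
#include <immintrin.h>
#include <stddef.h>

/* Adds 8 floats per iteration with explicit AVX instructions; n is assumed to be
   a multiple of 8 to keep the sketch short (a real version needs a scalar tail). */
void add_f32_avx(const float *a, const float *b, float *out, size_t n) {
    for (size_t i = 0; i < n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);   /* unaligned 8-wide load */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));  /* 8 adds in one instruction */
    }
}
```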