Using the TPDE Codegen Back End in LLVM ORC
Posted 3 months ago · Active 3 months ago
Source: weliveindetail.github.io
Key topics: LLVM, Compiler Design, Code Generation
The article discusses using the TPDE codegen back end in LLVM ORC, LLVM's JIT compiler infrastructure; the discussion below focuses on the benchmark methodology and the practical performance of this integration.
Snapshot generated from the HN discussion
Discussion activity: light
Key moments:
- Story posted: Sep 30, 2025, 6:51 AM EDT
- First comment: Sep 30, 2025, 12:11 PM EDT (5h after posting)
- Peak activity: 1 comment in the 5-6h window
- Latest activity: Sep 30, 2025, 3:14 PM EDT
ID: 45423994 · Type: story · Last synced: 11/20/2025, 5:11:42 PM
The benchmark is suboptimal in multiple ways:
- Multi-threading just makes things slower. When multi-threading is enabled, LLJIT clones every module into a new context before compilation, which is much more expensive than the compilation itself, and there is no way to disable this. This causes a ~1.5x (LLVM) / ~6.5x (TPDE) slowdown (very rough measurement on my laptop).
- The benchmark compares against the optimizing LLVM back-end, not the non-optimizing one, which would be a fairer comparison (code: JTMB.setCodeGenOptLevel(CodeGenOptLevel::None);). Additionally, enabling FastISel helps (via the command-line option -fast-isel; setting the TargetOptions field EnableFastISel seems to have no effect). This gives LLVM a 1.6x speedup. (See the sketch after this list.)
- The benchmark is not really representative: it causes FastISel fallbacks to SelectionDAG in some very large basic blocks -- i24 occurs rather rarely in real-world code. This is why the speedup from the non-optimizing LLVM back-end is so low. Replacing i24 with i16 gives LLVM another 2.2x speedup. (Hint: to get information about FastISel fallbacks, enable FastISel and pass the command-line options "-fast-isel-report-on-fallback -pass-remarks-missed=sdagisel" to LLVM. This is really valuable when optimizing for compile times.)
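A minimal sketch of the setup the first two points suggest, assuming a recent LLVM (the helper name createFastLLJIT is made up for illustration; the option strings are the ones quoted above, and exact APIs vary a bit across LLVM versions):

```cpp
// Rough sketch (not from the post): build an LLJIT instance with -O0 codegen,
// FastISel forced on via cl::opt (the TargetOptions flag reportedly has no
// effect), FastISel fallback diagnostics enabled, and compilation kept on the
// calling thread so modules are not cloned into a fresh LLVMContext.
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Error.h"

using namespace llvm;
using namespace llvm::orc;

Expected<std::unique_ptr<LLJIT>> createFastLLJIT() {
  // FastISel and the fallback diagnostics are internal cl::opts, so set them
  // through the command-line parser (equivalent to the flags quoted above).
  const char *Args[] = {"jit", "-fast-isel", "-fast-isel-report-on-fallback",
                        "-pass-remarks-missed=sdagisel"};
  cl::ParseCommandLineOptions(sizeof(Args) / sizeof(Args[0]), Args);

  auto JTMB = JITTargetMachineBuilder::detectHost();
  if (!JTMB)
    return JTMB.takeError();
  // Compare against the non-optimizing back-end rather than the default -O2.
  JTMB->setCodeGenOptLevel(CodeGenOptLevel::None);

  return LLJITBuilder()
      .setJITTargetMachineBuilder(std::move(*JTMB))
      // 0 compile threads keeps compilation on the caller's thread and avoids
      // cloning every module into a new context.
      .setNumCompileThreads(0)
      .create();
}
```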
So we end up with ~140ms (TPDE) vs. ~730ms (LLVM -O0), a 5.2x improvement. This is nowhere near the 10-20x speedup that TPDE typically achieves. Why? The new bottleneck is JITLink, which is featureful but slow: profiling indicates it consumes ~55% of the TPDE "compile time" (so the net compile-time speedup is ~10x). TPDE therefore ships its own JIT mapper, which has fewer features but is much faster.
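This is not what TPDE does (it bypasses ORC's object layer with its own mapper), but for anyone profiling the same bottleneck it may be useful that LLJIT lets you swap the JITLink-based object layer. A rough sketch, assuming a recent LLVM (the creator signature has shifted across versions; createRuntimeDyldLLJIT is a hypothetical helper), using the older RuntimeDyld-based layer purely as an experiment:

```cpp
// Hypothetical experiment (not TPDE's approach): replace LLJIT's default
// JITLink-based object layer with RTDyldObjectLinkingLayer to gauge how much
// of the end-to-end time the object layer accounts for.
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h"
#include "llvm/ExecutionEngine/SectionMemoryManager.h"

using namespace llvm;
using namespace llvm::orc;

Expected<std::unique_ptr<LLJIT>> createRuntimeDyldLLJIT() {
  return LLJITBuilder()
      .setObjectLinkingLayerCreator(
          [](ExecutionSession &ES, const Triple &TT)
              -> Expected<std::unique_ptr<ObjectLayer>> {
            // One SectionMemoryManager per object, as in the LLVM tutorials.
            return std::make_unique<RTDyldObjectLinkingLayer>(
                ES, [] { return std::make_unique<SectionMemoryManager>(); });
          })
      .create();
}
```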
LLVM is really powerful, and while it is not particularly fast to begin with, the JIT API makes it extremely difficult to avoid making it extra slow, even for LLVM experts.
[1]: https://github.com/tpde2/tpde/commit/29bcf1841c572fcdc75dd61...
Comparing TPDE against the default optimization level in ORC is not fair (it is indeed -O2), but that's what we get off the shelf. I tested the explicit FastISel setting and, as you said, it didn't help on the LLVM side. I didn't try the command-line option though, thanks for the tip! (Especially -pass-remarks-missed will be useful.)
And yeah, csmith doesn't really generate representative code, but then again, the post didn't claim that it does. I didn't dive into JITLink as it would be a whole post of its own, but yes, feature-completeness prevailed over performance here as well -- that seems characteristic for LLVM and isn't so surprising :)
Last but not least, yes, multi-threading isn't working as well as the post indicates. This seems related to the fix that JuliaLang made for the TaskDispatcher [1]. I will correct this in the post and see which other points can be addressed in the repo.
Looking forward to your OrcCompileLayer in TPDE!
[1] https://github.com/JuliaLang/julia/pull/58950
Would love to hear your perspective on Cranelift too.