Using the TPDE Codegen Back End in LLVM ORC
Posted 3 months ago · Active 3 months ago
Source: weliveindetail.github.io
Key topics: LLVM, Compiler Design, Code Generation
The article discusses using the TPDE codegen back end in LLVM ORC, LLVM's JIT compiler infrastructure; the discussion below focuses on the benchmark methodology and the practical performance of this integration.
Snapshot generated from the HN discussion
Discussion activity: light
Key moments:
- Story posted: Sep 30, 2025, 6:51 AM EDT
- First comment: Sep 30, 2025, 12:11 PM EDT (5h after posting)
- Peak activity: 1 comment in the 5-6h window
- Latest activity: Sep 30, 2025, 3:14 PM EDT
ID: 45423994 · Type: story · Last synced: 11/20/2025, 5:11:42 PM
The benchmark is suboptimal in multiple ways:
- Multi-threading just makes things slower. When multi-threading is enabled, LLJIT clones every module into a new context before compilation, which is much more expensive than the compilation itself, and there is no way to disable this. This causes a ~1.5x (LLVM) / ~6.5x (TPDE) slowdown (very rough measurement on my laptop).
- The benchmark compares against the optimizing LLVM back-end, not the non-optimizing one, which would be a fairer comparison (code: JTMB.setCodeGenOptLevel(CodeGenOptLevel::None);). Additionally, enabling FastISel helps (via the command-line option -fast-isel; setting the TargetOptions field EnableFastISel seems to have no effect). This gives LLVM a 1.6x speedup. (See the sketch after this list.)
- The benchmark is not really representative: it causes FastISel fallbacks to SelectionDAG in some very large basic blocks -- i24 occurs rather rarely in real-world code. This is why the speedup from the non-optimizing LLVM back-end is so low. Replacing i24 with i16 gives LLVM another 2.2x speedup. (Hint: to get information about FastISel fallbacks, enable FastISel and pass the command-line options "-fast-isel-report-on-fallback -pass-remarks-missed=sdagisel" to LLVM. This is really valuable when optimizing for compile times.)
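A minimal sketch of the setup the first two points suggest, assuming a recent LLVM (the helper name createFastLLJIT is made up for illustration; the option strings are the ones quoted above, and exact APIs vary a bit across LLVM versions):

```cpp
// Rough sketch (not from the post): build an LLJIT instance with -O0 codegen,
// FastISel forced on via cl::opt (the TargetOptions flag reportedly has no
// effect), FastISel fallback diagnostics enabled, and compilation kept on the
// calling thread so modules are not cloned into a fresh LLVMContext.
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Error.h"

using namespace llvm;
using namespace llvm::orc;

Expected<std::unique_ptr<LLJIT>> createFastLLJIT() {
  // FastISel and the fallback diagnostics are internal cl::opts, so set them
  // through the command-line parser (equivalent to the flags quoted above).
  const char *Args[] = {"jit", "-fast-isel", "-fast-isel-report-on-fallback",
                        "-pass-remarks-missed=sdagisel"};
  cl::ParseCommandLineOptions(sizeof(Args) / sizeof(Args[0]), Args);

  auto JTMB = JITTargetMachineBuilder::detectHost();
  if (!JTMB)
    return JTMB.takeError();
  // Compare against the non-optimizing back-end rather than the default -O2.
  JTMB->setCodeGenOptLevel(CodeGenOptLevel::None);

  return LLJITBuilder()
      .setJITTargetMachineBuilder(std::move(*JTMB))
      // 0 compile threads keeps compilation on the caller's thread and avoids
      // cloning every module into a new context.
      .setNumCompileThreads(0)
      .create();
}
```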
So we end up with ~140ms (TPDE) vs. ~730ms (LLVM -O0), a 5.2x improvement. This is nowhere near the 10-20x speedup that TPDE typically achieves. Why? The new bottleneck is JITLink, which is featureful but slow: profiling indicates it consumes ~55% of the TPDE "compile time" (so the net compile-time speedup is ~10x). TPDE therefore ships its own JIT mapper, which has fewer features but is much faster.
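This is not what TPDE does (it bypasses ORC's object layer with its own mapper), but for anyone profiling the same bottleneck it may be useful that LLJIT lets you swap the JITLink-based object layer. A rough sketch, assuming a recent LLVM (the creator signature has shifted across versions; createRuntimeDyldLLJIT is a hypothetical helper), using the older RuntimeDyld-based layer purely as an experiment:

```cpp
// Hypothetical experiment (not TPDE's approach): replace LLJIT's default
// JITLink-based object layer with RTDyldObjectLinkingLayer to gauge how much
// of the end-to-end time the object layer accounts for.
#include "llvm/ExecutionEngine/Orc/LLJIT.h"
#include "llvm/ExecutionEngine/Orc/RTDyldObjectLinkingLayer.h"
#include "llvm/ExecutionEngine/SectionMemoryManager.h"

using namespace llvm;
using namespace llvm::orc;

Expected<std::unique_ptr<LLJIT>> createRuntimeDyldLLJIT() {
  return LLJITBuilder()
      .setObjectLinkingLayerCreator(
          [](ExecutionSession &ES, const Triple &TT)
              -> Expected<std::unique_ptr<ObjectLayer>> {
            // One SectionMemoryManager per object, as in the LLVM tutorials.
            return std::make_unique<RTDyldObjectLinkingLayer>(
                ES, [] { return std::make_unique<SectionMemoryManager>(); });
          })
      .create();
}
```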
LLVM is really powerful, and while it is not particularly fast to begin with, the JIT API makes it extremely difficult to avoid making it extra slow, even for LLVM experts.
[1]: https://github.com/tpde2/tpde/commit/29bcf1841c572fcdc75dd61...
Comparing TPDE against the default optimization level in ORC is not fair (it is indeed -O2), but that's what we get off the shelf. I tested the explicit FastISel setting and, as you said, it didn't help on the LLVM side. I didn't try the command-line option though, thanks for the tip! (Especially -pass-remarks-missed will be useful.)
And yeah, csmith doesn't really generate representative code, but then again, the post didn't claim that it does. I didn't dive into JITLink as it would be a whole post of its own, but yes, feature-completeness prevailed over performance here as well -- that seems characteristic for LLVM and isn't so surprising :)
Last but not least, yes, multi-threading isn't working as well as the post indicates. This seems related to the fix that JuliaLang made for the TaskDispatcher [1]. I will correct this in the post and see which other points can be addressed in the repo.
Looking forward to your OrcCompileLayer in TPDE!
[1] https://github.com/JuliaLang/julia/pull/58950
Would love to hear your perspective on Cranelift too.