Make the Most of Compiled C Loops on the 68000
Source: dciabrin.net (tech story)
Tone: calm, positive
Debate: 40/100
Key topics
68000 Processor
C Programming
Assembly Optimization
The article discusses optimizing compiled C loops on the 68000 processor, with commenters sharing their insights on further optimizations and the trade-offs between C and assembly programming.
Snapshot generated from the HN discussion
Discussion activity: active discussion
- First comment: 9h after posting
- Peak period: 14 comments in the 84-96h window
- Average per period: 3.8 comments
- Based on 19 loaded comments
Key moments
- Story posted: Sep 28, 2025 at 2:55 PM EDT
- First comment: Sep 28, 2025 at 11:44 PM EDT, 9h after posting
- Peak activity: 14 comments in 84-96h, the hottest window of the conversation
- Latest activity: Oct 3, 2025 at 4:10 AM EDT
ID: 45406879; type: story; last synced: 11/20/2025, 1:20:52 PM
Arguably every hardware register should be declared that way, as a symbol.
I'm not sure whether Borland or MS ever shipped big, fat symbol tables for all the hardware registers of an IBM PC, though.
Specifically, I've heard that the 68k backend keeps getting worse while the front-end keeps getting better. So choosing a GCC version means weighing a trade-off: better AST-level optimisations from a newer version, or better-optimised assembly output from an older one.
I imagine GCC 6.5 has a backend that makes better use of the 68k than the GCC 11.4 that ngdevkit uses (for example, knowing when to use dbra), but it is probably worse in other ways because of its older, less capable front-end.
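A rough sketch of the loop shape the comment is talking about: `dbra` decrements the low 16 bits of a data register and branches back unless the counter has become -1, so a count-down loop over a 16-bit counter is the form a 68k backend can map onto it. The function name `sum_words` is hypothetical, and whether GCC 6.5, 11.4, or any other version actually emits `dbra` for it is an assumption to verify in the compiler's `-S` output, not a guarantee.

```c
#include <stdint.h>

/* Hypothetical helper: sum `count` 16-bit words using a count-down
   loop shaped like dbra (16-bit counter, loop until it would hit -1).
   Whether the compiler really emits dbra must be checked with -S. */
uint32_t sum_words(const uint16_t *p, uint16_t count)
{
    uint32_t sum = 0;

    if (count == 0)
        return 0;

    uint16_t n = (uint16_t)(count - 1);  /* dbra-style: start at count - 1 */
    do {
        sum += *p++;
    } while (n-- != 0);                  /* stop once n would wrap to 0xFFFF (-1) */

    return sum;
}
```

The `n-- != 0` test compares the old value with zero, which is the same as checking that the decremented value is not -1, matching the `dbra` exit condition.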
When playing King of Fighters, the time counter would go down to 0 and then wrap around to 99, effectively preventing the round from ending.
Eventually I tracked it down to the behavior of SBCD (Subtract Binary Coded Decimal): internally, the chip does update the overflow (V) flag reliably, even though the docs mark it as undefined. SNK was checking the V flag and ending the round when it got set.
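To make the 0 to 99 wrap concrete, here is a minimal C model of a byte-sized SBCD on valid packed-BCD operands. It computes only the decimal difference and the borrow (which the 68000 reports in X and C); it deliberately does not model the V flag, since the comment's point is that V's real behaviour differs from what the docs describe. The names `sbcd` and `bcd_result` are hypothetical, and invalid (non-BCD) inputs are outside what this sketch handles.

```c
#include <stdint.h>

/* Hypothetical result type: packed-BCD difference plus the borrow
   that the 68000 reports in the X and C flags. V is not modelled. */
typedef struct {
    uint8_t result;
    int borrow;
} bcd_result;

/* dst - src - x for two valid packed-BCD bytes (00..99). */
bcd_result sbcd(uint8_t dst, uint8_t src, int x)
{
    int lo = (dst & 0x0F) - (src & 0x0F) - (x ? 1 : 0);
    int hi = (dst >> 4) - (src >> 4);
    bcd_result r = { 0, 0 };

    if (lo < 0) {   /* borrow from the tens digit */
        lo += 10;
        hi -= 1;
    }
    if (hi < 0) {   /* borrow out of the byte: 00 - 01 gives 99 */
        hi += 10;
        r.borrow = 1;
    }
    r.result = (uint8_t)((hi << 4) | lo);
    return r;
}
```

Counting a timer down from 0x01 gives 0x00 and then 0x99 with the borrow set, which is the wrap the comment describes; an emulator that gets the decimal result right but not V would never see the condition the game was waiting for.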
https://github.com/kstenerud/Musashi/blob/master/m68k_in.c#L...
SBCD was an old throwback instruction that was hardly used anymore, and the register variant took 6 cycles to complete (vs 4 for binary subtraction).
HOWEVER... For displaying the timer counter on-screen, they saved a ton of cycles with this scheme because extracting the digits from a BCD value is a simple shift by 4 bits (6 cycles) rather than a VERY expensive divide (140 cycles).
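The digit-extraction difference looks like this in C. With a packed-BCD byte, the two on-screen digits fall out of a shift and a mask; with a plain binary counter you need a division and a remainder by 10, which on a 68000 may end up as `DIVU` or a runtime-library call depending on the compiler. The function names are hypothetical, and no cycle counts were measured for this sketch.

```c
#include <stdint.h>

/* Packed BCD: tens digit in the high nibble, ones in the low nibble. */
void bcd_digits(uint8_t bcd, uint8_t *tens, uint8_t *ones)
{
    *tens = (uint8_t)(bcd >> 4);    /* shift right by 4 bits */
    *ones = (uint8_t)(bcd & 0x0F);  /* mask off the low nibble */
}

/* Plain binary counter: needs a divide and a remainder by 10. */
void bin_digits(uint8_t value, uint8_t *tens, uint8_t *ones)
{
    *tens = (uint8_t)(value / 10);
    *ones = (uint8_t)(value % 10);
}
```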
A software implementation with masks and shifts will beat traditional CISC dividers.
;-)
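The comment doesn't say which software technique it means; one well-known option is reciprocal multiplication. For 0 <= n <= 1028, `n / 10 == (n * 205) >> 11`: the approximation overshoots n/10 by n/10240, which stays too small to change the floor in that range. The multiply by 205 (128 + 64 + 8 + 4 + 1) can itself be written as shifts and adds, so no divide instruction is needed. The hypothetical helpers `div10_small` and `split_decimal` below illustrate that technique; they are not from the article, and whether they beat `DIVU` in practice depends on the code the compiler generates for the shifts.

```c
#include <stdint.h>

/* n / 10 for 0 <= n <= 1028, computed as (n * 205) >> 11.
   205 = 128 + 64 + 8 + 4 + 1, so the multiply is shifts and adds. */
uint16_t div10_small(uint16_t n)
{
    uint32_t x = n;
    uint32_t times205 = (x << 7) + (x << 6) + (x << 3) + (x << 2) + x;
    return (uint16_t)(times205 >> 11);
}

/* Split a value 0..99 into decimal digits without a divide. */
void split_decimal(uint8_t value, uint8_t *tens, uint8_t *ones)
{
    uint16_t q = div10_small(value);
    *tens = (uint8_t)q;
    *ones = (uint8_t)(value - ((q << 3) + (q << 1)));  /* value - q * 10 */
}
```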
This is actually required of any C compiler, not just an optimisation, and has been from early on: C allows constant expressions, not just literal constants, in places such as the sizes of statically allocated arrays. So while the 'optimisation' itself is not guaranteed, you'll see that even at -O0 the constant was evaluated at compile time. Since the compiler must already fold constant expressions for those required cases, it is easier to always fold them than to sometimes not.
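A small illustration of that point, using hypothetical names: a file-scope array size and a C11 `_Static_assert` both require integer constant expressions, so the compiler has to evaluate `TILES * TILE_W + 2` at translation time whatever the optimisation level.

```c
#include <stddef.h>

/* Hypothetical layout constants. */
enum { TILE_W = 16, TILES = 4 };

/* A file-scope array size must be an integer constant expression,
   so 4 * 16 + 2 is evaluated at compile time, even at -O0. */
static char line_buf[TILES * TILE_W + 2];

/* C11 static assertion: also needs a value known at compile time. */
_Static_assert(sizeof line_buf == 66, "array size folded at compile time");

size_t line_buf_size(void)
{
    return sizeof line_buf;
}
```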