The Von Neumann Bottleneck Is Impeding AI Computing?
Key topics
The article discusses how the von Neumann architecture is limiting AI computing performance, but commenters are skeptical about the novelty of the issue and the proposed solutions, with some pointing out that IBM is promoting their own research and products.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion
- First comment: 24m after posting
- Peak period: 15 comments in the 0-2h window
- Avg / period: 4 comments
- Based on 32 loaded comments
Key moments
- Story posted: Sep 26, 2025 at 5:12 PM EDT (3 months ago)
- First comment: Sep 26, 2025 at 5:36 PM EDT, 24m after posting
- Peak activity: 15 comments in the 0-2h window, the hottest stretch of the conversation
- Latest activity: Sep 27, 2025 at 5:19 PM EDT (3 months ago)
ARM processors primarily use a modified Harvard architecture, including the Raspberry Pi Pico.
I think this post is more about... compute-in-memory? If I got it right?
This is, IIRC, part of why Apple's M-series chips are as performant as they are: not only do they have a unified memory architecture, which eliminates the need to copy data from CPU main memory to GPU or NPU memory to operate on it (and then copy the result back), but the RAM being on the package means it's slightly "more local" and the memory channels can be optimized for the system they're going to be connected to.
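To make the copy-overhead point concrete, here is a rough back-of-envelope sketch in Python; the tensor size and PCIe bandwidth are illustrative assumptions, not measurements of any Apple or discrete-GPU system:

```python
# Rough cost of shipping a tensor to a discrete accelerator and back,
# versus operating in place with unified memory.
# All figures below are illustrative assumptions, not measured values.

TENSOR_BYTES = 8 * 1024**3      # 8 GiB of weights/activations (assumed)
PCIE_BYTES_PER_S = 32e9         # ~PCIe 4.0 x16 throughput (assumed)

copy_out = TENSOR_BYTES / PCIE_BYTES_PER_S    # CPU RAM -> accelerator RAM
copy_back = TENSOR_BYTES / PCIE_BYTES_PER_S   # result back to CPU RAM

print(f"round-trip copy overhead: {copy_out + copy_back:.2f} s")
print("unified memory: no copy; CPU, GPU and NPU read the same pages")
```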
He is absolutely one of IBM's historical rockstars. IMHO they are invoking him to sell their NorthPole chips, which have on-die memory distributed between the processing components and probably have value.
> In its simplest form a von Neumann computer has three parts: a central processing unit (or CPU), a store, and a connecting tube that can transmit a single word between the CPU and the store (and send an address to the store). I propose to call this tube the von Neumann bottleneck. The task of a program is to change the contents of the store in some major way; when one considers that this task must be accomplished entirely by pumping single words back and forth through the von Neumann bottleneck, the reason for its name becomes clear. [0]
IMHO IBM is invoking John Backus' work to sell what may be absolutely great products, but they are really just ASICs and don't relate to his machine or programming language limits.
[0] https://dl.acm.org/doi/pdf/10.1145/359576.359579
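A minimal way to picture what Backus is describing: even a trivial reduction drags every operand through that single CPU-store tube, one word at a time, and does almost no work per word moved. A toy Python sketch (an illustration of the idea, not a model of any real machine):

```python
# Toy model of Backus's "tube": every word the program touches has to
# cross a single CPU <-> store channel before anything is done with it.

store = list(range(1_000_000))   # the "store"
words_pumped = 0                 # traffic through the bottleneck
total = 0

for word in store:               # each iteration moves one word across the tube
    words_pumped += 1
    total += word                # ...and performs exactly one add with it

print(total, words_pumped)       # useful work per word moved stays tiny
```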
Edit: see also the ARM7TDMI, Cortex-M0/M0+/M1, and probably a few others. All the big stuff is modified Harvard, or very rarely pure Harvard.
That said, AHB-Lite is called "lite" because it is a simplified form of the full ARM AHB standard.
The RP2350 can issue one fetch and one load/store per cycle, and the point is that almost everything called a CPU rather than an MCU will have AHB5 or better.
The “von Neumann bottleneck” was (when I went to school) that the CPU cannot simultaneously fetch an instruction and read/write data from or to memory.
That doesn't apply to smartphones, PCs or servers, even in the Intel world, due to instruction caches etc…
It is just "old man yells at cloud".
> The RP2350 can issue one fetch and one load/store per cycle, and the point is that almost everything called a CPU rather than an MCU will have AHB5 or better.
I mean, yes, but I'm not sure I see your point. The Harvard vs Von Neumann architectural difference is more related to the number of AHB ports on the core.
> That doesn't apply to smartphones, PCs or servers, even in the Intel world, due to instruction caches etc…
I wouldn't confuse instruction caches with Harvard vs Von Neumann either - loads of Von Neumann machines have instruction or Flash caches too.
It's also not uncommon to run into Von Neumann cores in mobile and PC chips, just as peripheral co-processors.
It is just a middle-aged guy who did this stuff for years...
The "Von Neumann architecture" is the more basic idea that all the computation state outside the processor exists as a linear range of memory addresses which can be accessed randomly.
And the (largely correct) argument in the linked article is that ML computation is a poor fit for Von Neumann machines, as all the work needed to present that unified[1] picture of memory to all the individual devices is largely wasted, since (1) very little computation is actually done on individual fetches and (2) the connections between all the neurons are highly structured in practice (specific tensor rows and columns always go to the same places), so a simpler architecture might be a better use of die space.
[1] Not actually unified, because there's a page translation, IO-MMUs, fabric mappings and security boundaries all over the place that prevents different pieces of hardware from actually seeing the same memory. But that's the idea anyway.
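Point (1) is easy to see with a rough arithmetic-intensity count for a single matrix-vector product at batch size 1; the layer size and fp16 weights below are arbitrary illustrative choices:

```python
# Arithmetic intensity of y = W @ x for one token (batch size 1).
# Every weight is fetched once and used for exactly one multiply-add.

rows, cols = 4096, 4096          # layer dimensions (assumed)
bytes_per_weight = 2             # fp16 storage (assumed)

flops = 2 * rows * cols                       # one multiply + one add per weight
bytes_moved = rows * cols * bytes_per_weight  # weight traffic alone

print(f"FLOPs per byte fetched: {flops / bytes_moved:.1f}")  # ~1 FLOP/byte
# Fetching dominates arithmetic, which is why moving compute closer to the
# memory (or the memory onto the die) is attractive for this workload.
```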
Faster interconnects are always nice, but this is more like routine improvement.
It's also fascinating that they are experimenting with analog memory, because it pairs so well with model weights.
A bit beautiful that we might end up partially going back to analog computers, which were quickly replaced by digital ones.
How long till we get a Ben Eater-style video about someone making a basic analog neural network using some DACs, analog multipliers[1] and bucket-brigade chips[2] for intermediate values?
[1]: https://www.analog.com/media/en/training-seminars/tutorials/...
[2]: https://en.wikipedia.org/wiki/Bucket-brigade_device
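The core idea is easy to simulate while we wait for that video: store the weights as conductances, apply the inputs as voltages, and read each column current as a dot product, with device noise folded in. A toy NumPy sketch under those assumptions (not a model of any real device):

```python
import numpy as np

# Toy analog crossbar: weights live as conductances G, inputs are applied as
# voltages v, and each column current is a dot product (Ohm's law + Kirchhoff).
rng = np.random.default_rng(0)

G = rng.uniform(0.0, 1.0, size=(4, 8))        # conductances (the weights)
v = rng.uniform(-1.0, 1.0, size=4)            # input voltages (activations)

ideal = v @ G                                  # what a digital MAC would compute
noise = rng.normal(scale=0.01, size=G.shape)   # device variation / read noise
analog = v @ (G + noise)                       # what the crossbar actually reads

print(np.max(np.abs(ideal - analog)))          # small but nonzero error
```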
This is being done, with great results so far. As models get better, architecture search, creation, and refinement improve, driving a reinforcement loop. At some point in the near future the big labs will likely start seeing significant returns from methods like this, translating into better and faster AI for consumers.
https://www.science.org/doi/full/10.1126/science.adh1174
Also they've been working on this for 10+ years so it's not exactly new news.
Maybe they're hoping someone else does it... and then pays IBM for using whatever patents they have on it.
Which is fine! I am all for iterative improvements; it's how we got to where we are today. I just wish more folks would start openly admitting that our current architecture designs are broadly based on the "low-hanging fruit" of early electronics and microprocessors, followed by a century of iterative improvements. With the easy improvements already done and universally integrated, we're stuck at a crossroads:
* Improve our existing technologies iteratively and hope we break through some barrier to achieve rapid scaling again
OR
* Accept that we cannot achieve new civilizational uplifts with existing technologies, and invest more capital into frontier R&D (quantum processing, new compute substrates, etc)
I feel like our current addiction to the AI CAPEX bubble is a desperate Hail Mary to validate our current tech as the only way forward, when in fact we haven’t really sufficiently explored alternatives in the modern era. I could very well be wrong, but that’s the read I get from the hardware side of things and watching us backslide into the 90s era of custom chips to achieve basic efficiency gains again.
But you're right, I think it's not even grammatically correct.
Anyway, I always like to remember this when a headline is phrased as a question: https://en.wikipedia.org/wiki/Betteridge's_law_of_headlines