Athlon 64: How AMD Turned the Tables on Intel
Posted 4 months ago · Active 3 months ago
dfarq.homeip.net · Tech story · High profile
Sentiment: calm, positive · Debate intensity: 60/100
Key topics: CPU Architecture · AMD vs Intel · x86-64
The article discusses how AMD's Athlon 64 processor turned the tables on Intel, and the discussion revolves around the historical context and technical details of this development.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 30m after posting
Peak period: 103 comments in 0-12h
Avg / period: 26.7
Comment distribution: 160 data points (based on 160 loaded comments)
Key moments
1. Story posted: Sep 25, 2025 at 2:09 PM EDT (4 months ago)
2. First comment: Sep 25, 2025 at 2:40 PM EDT (30m after posting)
3. Peak activity: 103 comments in the 0-12h window, the hottest period of the conversation
4. Latest activity: Sep 29, 2025 at 11:43 PM EDT (3 months ago)
ID: 45376605 · Type: story · Last synced: 11/20/2025, 8:23:06 PM
The last one to run Windows XP.
Makes me want to play need for speed underground and drink some bawls energy
The key to the whole thing was that it was a great 32 bit processor; the 64 bit stuff was gravy for many, later.
Apple did something similar with its CPU changes - now three of them - it only switches when old software, even emulated, runs better on the new chip than it did on the old one.
AMD64 was also well thought out; it wasn't just a simple "four more bytes" slapped onto 32-bit. Doubling the number of general-purpose registers was noticeable - you took a performance hit going to 64-bit early on because all the memory addresses were wider, but the extra registers usually more than made up for it.
This is also where the NX bit entered.
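For readers who want to see what the NX bit buys in practice, here is a minimal, illustrative sketch (assuming Linux/x86-64 and the POSIX mmap/mprotect calls): data pages stay writable but non-executable, and are only flipped to executable once code has been copied in, the W^X pattern that NX made enforceable.

```c
/* Illustrative W^X demo relying on the NX bit (Linux/x86-64, POSIX). */
#define _DEFAULT_SOURCE
#include <stdint.h>
#include <string.h>
#include <stdio.h>
#include <sys/mman.h>

int main(void) {
    /* mov eax, 42 ; ret  -- a tiny function, hand-encoded */
    static const uint8_t code[] = {0xB8, 0x2A, 0x00, 0x00, 0x00, 0xC3};

    /* Writable but NOT executable: jumping here would fault thanks to NX. */
    size_t len = 4096;
    void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (buf == MAP_FAILED) return 1;
    memcpy(buf, code, sizeof code);

    /* Flip to executable (and drop write) before calling into it. */
    if (mprotect(buf, len, PROT_READ | PROT_EXEC) != 0) return 1;

    /* Casting a data pointer to a function pointer is not strictly
     * portable C, but it is the standard JIT idiom on these platforms. */
    int (*fn)(void) = (int (*)(void))buf;
    printf("%d\n", fn());   /* prints 42 */
    return munmap(buf, len);
}
```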
It required immense multi-year efforts from compiler teams to get passable performance with Itanium. And passable wasn't good enough.
https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...
Intel first publicly mentioned Poulson all the way back in 2005 just FOUR years after the original chip was launched. Poulson was basically a traditional out-of-order CPU core that even had hyperthreading[0]. They knew really early on that the designs just weren't that good. This shouldn't have been a surprise to Intel as they'd already made a VLIW CPU in the 90s (i860) that failed spectacularly.
[0]https://www.realworldtech.com/poulson/
It wasn't a bad chip, but like Cell or modern Dojo tiles most people couldn't run it without understanding parallelism and core metastability.
amd64 wasn't initially perfect either, but was accessible for mere mortals. =3
I.e., the compiler had no access to information that's only revealed at runtime?
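A small illustrative example of the information that only exists at run time (plain C, hypothetical function names): in the pointer-chasing loop each load's address, and whether it hits cache, depends on the previous load, so a static (EPIC/VLIW) scheduler cannot plan around the latency, while an out-of-order core simply keeps executing whatever else is ready.

```c
/* Illustrative only: why static scheduling lacks runtime information. */
#include <stddef.h>
#include <stdio.h>

struct node { struct node *next; long value; };

/* Each iteration's load address comes from the previous load. Whether it
 * hits L1 or stalls for hundreds of cycles on DRAM is unknowable at
 * compile time, so a static scheduler cannot plan around it. */
long sum_list(const struct node *n) {
    long sum = 0;
    while (n) {
        sum += n->value;
        n = n->next;        /* serially dependent, latency unknown */
    }
    return sum;
}

/* By contrast, these loads are independent and their addresses follow
 * from the loop index alone -- the case static scheduling handles well. */
long sum_array(const long *a, size_t len) {
    long sum = 0;
    for (size_t i = 0; i < len; i++)
        sum += a[i];
    return sum;
}

int main(void) {
    struct node c = { NULL, 3 }, b = { &c, 2 }, a = { &b, 1 };
    long arr[] = { 1, 2, 3 };
    printf("%ld %ld\n", sum_list(&a), sum_array(arr, 3));  /* 6 6 */
    return 0;
}
```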
https://www.theregister.com/2004/01/27/have_a_reality_check_...
SPECjbb2000 (an important enterprise server benchmark): Itanic holds a slim (under 3%) lead over AMD64 at the 4-processor node size and another slim (under 4%) lead over POWER4+ at the 32-processor node size - hardly 'destroying' the competition, once again.
It was slightly faster than contemporary high-performance processors on Java. It was also really good at floating point performance. It was also significantly more expensive than AMD64 for server applications if you could scale your servers horizontally instead of vertically.
We have come a long way from that to arm64 and amd64 as the default.
ARM is certainly better than before, but could have been much better. =3
I have no idea how/why Intel got a second life after that, but they did. Which is a shame. A sane market would have punished them and we all would have moved on.
For the same reason the line "No one ever got fired for buying IBM." exists. Buying AMD at large companies was seen as a gamble that deciders weren't willing to make. Even now, if you just call up your account managers at Dell, HP, or Lenovo asking for servers or PCs, they are going to quote you Intel builds unless you specifically ask. I don't think I've ever been asked by my sales reps if I wanted an Intel or AMD CPU. Just how many slots/cores, etc.
Cray tried to build the T3E (iirc) out of Alphas. DEC bragged how good Alpha was for parallel computing, big memory etc etc.
But Cray publicly denounced Alpha as unusable for parallel processing (the T3E was a bunch of Alphas in some kind of NUMA shared memory.) It was so difficult to make the chips work together.
This was in the Cray Connect or some such glossy publication. Wish I'd kept a copy.
Plus of course the usual DEC marketing incompetence. They feared Alpha undoing their large expensive machine momentum. Small workstation boxes significantly faster than big iron.
A decade or so later on, they more or less recreated the architecture but this time with 64-bit Opteron CPU's in the form of the 'Red Storm' supercomputer for Sandia. Which then became commercially available as the XT3. And later XT4/5/6.
Imagine a future where Intel and Apple had both adopted DEC's Alpha, instead of Intel pairing with HP and Apple with IBM.
The arm twisting gets them through rough times like itanium and pentium4 + rambus, etc. I still think they can recover from the 10nm fab problems, even though they're taking their sweet time.
* Itanium has register windows.
* Itanium has register rotations, so that you can modulo-schedule a loop.
* Itanium has so many registers that a context switch is going to involve spilling several KB of memory.
* The main registers have "Not-a-Thing" values to be able to handle things like speculative loads that would have trapped. Handling this for register spills (or context switches!) appears to be "fun."
* It's a bi-endian architecture.
* The way you pack instructions in the EPIC encoding is... fun.
* The rules of how you can execute instructions mean that you kind of have branch delay slots, but not really.
* There are four floating-point environments because why not.
* Also, Itanium is predicated (see the if-conversion sketch after this list).
* The hints, oh god the hints. It feels like every time someone came up with an idea for a hint that might be useful to the processor, it was thrown in there. How is a compiler supposed to be able to generate all of these hints?
* It's an architecture that's complicated enough that you need to handwrite assembly to get good performance, but the assembly has enough arcane rules that handwriting assembly is unnecessarily difficult.
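On the predication point, a rough sketch of what if-conversion looks like when done by hand in C (illustrative only, not IA-64 code): the two functions below compute the same thing, and full predication lets the compiler apply this kind of transformation to arbitrary instructions, not just selects.

```c
/* If-conversion by hand: what full predication lets a compiler do to
 * every instruction. Illustrative C, not IA-64 assembly. */
#include <stdint.h>

/* Branchy form: the CPU must predict the comparison. */
int64_t clamp_branch(int64_t x, int64_t lo) {
    if (x < lo)
        return lo;
    return x;
}

/* Branch-free form: both values are computed and a mask selects one.
 * On a predicated machine the compare would set a predicate register
 * and the two "arms" would issue as predicated instructions. */
int64_t clamp_select(int64_t x, int64_t lo) {
    int64_t mask = -(int64_t)(x < lo);       /* all ones if x < lo, else 0 */
    return (lo & mask) | (x & ~mask);
}
```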
You would boot in x86 mode and run some code to switch to ia64 mode.
HP saw the end of the road for their solo efforts on PA-RISC and Intel eyed the higher end market against SPARC, MIPS, POWER, and Alpha (hehe. all those caps) so they banded together to tackle the higher end.
But as AMD proved, you could win by scaling up instead of dropping an all-new architecture.
* worked at HP during the HP-Intel Highly Confidential project.
Basically, you could write some tuned assembly that would run fast on one specific Itanium CPU release by optimizing for its exact number of execution units, etc. It was not possible to run `./configure && make && make install` for anything not designed with that level of care and end up with a binary that didn't run like frozen molasses.
I had to manage one of these pigs in a build farm. On paper, it should've been one of the more powerful servers we owned. In practice, the Athlon servers were several times faster at any general purpose workloads.
It should have been iterated on a bit before it was released to the world, but Intel was stressed by there being several 64-bit RISC-processors on the market already.
Itanium never met an exotic computer architecture journal article that it didn't try and incorporate. Initially this was viewed as "wow such amazing VLIW magic will obviously dominate" and subsequently as "this complexity makes it hard to write a good compiler for, and the performance benefit just doesn't justify it."
Intel had to respond to AMD with their "x86-64" copy, though it really didn't want to.
Eventually it became obvious that the amd64/x64/x86-64 chips were going to exceed Itanium in performance, and with the massive momentum of legacy software on its side, Itanium was toast.
It's amazing that retirement units, the part of a superscalar CPU that puts everything back together as the parallel operations finish, not only work but don't slow things down. The Pentium Pro head designer had about 3,000 engineers working at peak, which indicates how hard this is. But it all worked, and that became the architecture of the future.
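A toy model of the retirement idea, in plain C and nothing like real hardware: results may complete in any order, but they only become architecturally visible in program order, drained from the head of a circular reorder buffer.

```c
/* Toy reorder buffer: completion may happen out of order, retirement is
 * strictly in order. Illustrative only -- real hardware also tracks
 * register renaming, exceptions, memory ordering, and much more. */
#include <stdbool.h>
#include <stdio.h>

#define ROB_SIZE 8

struct rob_entry {
    int  id;        /* program-order sequence number */
    bool busy;      /* slot allocated */
    bool done;      /* execution finished, result ready */
};

static struct rob_entry rob[ROB_SIZE];
static int head = 0, tail = 0, count = 0;

static bool issue(int id) {                 /* allocate at the tail, in order */
    if (count == ROB_SIZE) return false;    /* ROB full: the front end stalls */
    rob[tail] = (struct rob_entry){ .id = id, .busy = true, .done = false };
    tail = (tail + 1) % ROB_SIZE;
    count++;
    return true;
}

static void complete(int id) {              /* may be called in any order */
    for (int i = 0; i < ROB_SIZE; i++)
        if (rob[i].busy && rob[i].id == id)
            rob[i].done = true;
}

static void retire(void) {                  /* only ever from the head */
    while (count > 0 && rob[head].done) {
        printf("retire %d\n", rob[head].id);
        rob[head].busy = false;
        head = (head + 1) % ROB_SIZE;
        count--;
    }
}

int main(void) {
    for (int id = 0; id < 4; id++) issue(id);
    complete(2);  retire();   /* nothing retires: 0 and 1 still pending  */
    complete(0);  retire();   /* retires 0 only: 1 still blocks the head */
    complete(1);  retire();   /* retires 1, then the already-done 2      */
    complete(3);  retire();   /* retires 3                               */
    return 0;
}
```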
This was around the time that RISC was a big thing. Simplify the CPU, let the compiler do the heavy lifting, have lots of registers, make all instructions the same size, and do one instruction per clock. That's pure RISC. Sun's SPARC is an expression of that approach. (So is a CRAY-1, which is a large but simple supercomputer with 64 of everything.) RISC, or something like it, seemed the way to go faster. Hence Itanium. Plus, it had lots of new patented technology, so Intel could finally avoid being cloned.
Superscalars can get more than one instruction per clock, at the cost of insane CPU complexity. Superscalar RISC machines are possible, but they lose the simplicity of RISC. Making all instructions the same size increases the memory bandwidth the CPU needs. That's where RISC lost out over x86 extensions. x86 is a terse notation.
So we ended up with most of the world still running on an instruction set based on the one Harry Pyle designed when he was an undergrad at Case in 1969.
DEC (Compaq?) had some plans to make cheaper Alpha workstations, and while they managed to drive down the price somewhat, the volumes were never there to make them price-competitive with PCs. (See also the Talos Raptor POWER machines.)
Then came Compaq and its love for Intel.
> Intel’s Pentium 4 had our own internal version of x86–64. But you could not use it: we were forced to “fuse it off”, meaning that even though the functionality was in there, it could not be exercised by a user. This was a marketing decision by Intel — they believed, probably rightly, that bringing out a new 64-bit feature in the x86 would be perceived as betting against their own native-64-bit Itanium, and might well severely damage Itanium’s chances. I was told, not once, but twice, that if I “didn’t stop yammering about the need to go 64-bits in x86 I’d be fired on the spot” and was directly ordered to take out that 64-bit stuff.
https://www.quora.com/How-was-AMD-able-to-beat-Intel-in-deli...
Intel has a strong history of completely mis-reading the market.
Quote: Business success contains the seeds of its own destruction. Success breeds complacency. Complacency breeds failure. Only the paranoid survive.
- Andy Grove, former CEO of Intel
From wikipedia: https://en.wikipedia.org/wiki/Andrew_Grove#Only_the_Paranoid...
Takeaway: Be paranoid about MBAs running your business.
Except Andy is talking about himself, Noyce, and the engineers getting it wrong: (watch a few minutes of this to get the gist of where they were vs Japan) https://www.youtube.com/watch?v=At3256ASxlA&t=465s
Intel has a long history of sucking, and other people stepping in to force them to get better. Their success has been accident and intervention over and over.
And this isn't just an Intel thing; it's kind of an American problem (and maybe a business/capitalism problem). See this take on steel: https://www.construction-physics.com/p/no-inventions-no-inno... which sounds an awful lot like what is happening to Intel now.
If one can take popular histories of Intel at face value, they have had enough accidental successes, avoided enough failures, and outright failed so many times that they really ought to know better.
The Itanium wasn't their first attempt to create an incompatible architecture, and it sounds like it was incredibly successful compared to the iAPX 432. Intel never intended to get into microprocessors, wanting to focus on memory instead. Yet they picked up a couple of contracts (which produced the 4004 and 8008) to survive until they reached their actual goal. Not only did it help the company at the time, but it proved essential to the survival of the company when the Japanese semiconductor industry nearly obliterated American memory manufacturers. On the flip side, the 8080 was source compatible with the 8008. Source compatibility would help sell it to users of the 8008. It sounds like the story behind the 8086 is similar, albeit with a twist: not only did it lead to Intel's success when it was adopted by IBM for the PC, but it was intended as a stopgap measure while the iAPX 432 was produced.
This, of course, is a much abbreviated list. It is also impossible to suggest where Intel would be if they made different decisions, since they produced an abundance of other products. We simply don't hear much about them because they were dwarfed by the 80x86 or simply didn't have the public profile of the 80x86 (for example: they produced some popular microcontrollers).
Of course, the whole foundational thesis of market competition is that everything sucks unless competitors force you to make your product better. That's why it's VERY important to have effective competition.
It's not a capitalism problem, or really a "problem" at all. It's a recognition of a fact in nature that all animals are as lazy as they can get away with, and humans (and businesses made by humans) are no different.
My point isn't to take a side, but simply to highlight how history often repeats itself, sometimes almost literally rather than merely rhyming.
Cancer is when elements of a system work to enrich themselves instead of the system.
With poor market demand and AMD's success with amd64, Microsoft did not support Itanium in Vista and later desktop versions, which signaled the end of Intel's Itanium.
Also, for a long while, Intel rebranded the Pentium 4 as Intel Atom, which then usually got an iGPU on top and somewhat higher clock rates. No idea if this is still the case (post-Haswell changes), but I was astonished to buy a CPU 10 years later and find the same kind of oldskool cores in it, just with some modifications, and actually with worse L3 cache than the Centrino variants.
core2duo and core2quad were peak coreboot hacking for me, because at the time the intel ucode blob was still fairly simple and didn't contain all the quirks and errata fixes that more modern cpu generations have.
[1] https://en.wikipedia.org/wiki/Physical_Address_Extension
Rest is well explained by sibling posts :)
Possibly you meant Celeron?
Also the Pentium 4 uarch (Netburst) is nothing like any of the Atoms (big for the time out-of-order core vs. a small in-order core).
While I suspect the Intel equivalent would have done similar things, simply because a break that big makes them the obvious things to do, there's no guarantee it wouldn't have been worse than AMD64. But I guess it could also have been "better" from a retrospective perspective.
And also remember at the time the Pentium 4 was very much struggling to get the advertised performance. One could argue that one of the major reasons that the AMD64 ISA took off is that the devices that first supported it were (generally) superior even in 32-bit mode.
EDIT: And I'm surprised it got as far as silicon. AMD64 was "announced" and the spec released before the pentium 4 was even released, over 3 years before the first AMD implementations could be purchased. I guess Intel thought they didn't "need" to be public about it? And the AMD64 extensions cost a rather non-trivial amount of silicon and engineering effort to implement - did the plan for Itanium change late enough in the P4 design that it couldn't be removed? Or perhaps this all implies it was a much less far-reaching (And so less costly) design?
I understand that r8-r15 require a REX prefix, which is hostile to code density.
I've never done it with -O2. Maybe that would surprise me.
If you mean literally `gcc -S`, -O0 is worse than not optimized and basically keeps everything in memory to make it easier to debug. -Os is the one with readable sensible asm.
But it's guaranteed to use `r8` and `r9` for a function that takes 5 or 6 integer arguments (including unpacked 128-bit structs as 2 arguments), or for the 3rd and 4th arguments (not sure about unpacking) on Microsoft. And `r10` is used if you make a system call on Linux.
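For concreteness, a small Linux/x86-64 sketch of where those registers appear (GCC extended asm; the System V ABI passes integer arguments in rdi, rsi, rdx, rcx, r8, r9, Microsoft x64 uses rcx, rdx, r8, r9, and the kernel's syscall convention puts a 4th argument in r10 because the syscall instruction clobbers rcx):

```c
/* Linux/x86-64 sketch (GCC extended asm). Register roles per the System V
 * ABI and the kernel's syscall convention. */
#include <stdio.h>

/* Six integer args land in rdi, rsi, rdx, rcx, r8, r9 -- compile with
 * `gcc -O2 -S` and the last two arrive in r8 and r9. */
long sum6(long a, long b, long c, long d, long e, long f) {
    return a + b + c + d + e + f;
}

/* Raw getpid(): syscall number in rax; the instruction clobbers rcx and
 * r11, which is why a 4th syscall argument goes in r10, not rcx. */
static long sys_getpid(void) {
    long ret;
    long nr = 39;                       /* __NR_getpid on x86-64 Linux */
    __asm__ volatile ("syscall"
                      : "=a"(ret)
                      : "a"(nr)
                      : "rcx", "r11", "memory");
    return ret;
}

int main(void) {
    printf("sum=%ld pid=%ld\n", sum6(1, 2, 3, 4, 5, 6), sys_getpid());
    return 0;
}
```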
Lots of people loved Itanium and wanted to see it succeed. But surely the business folks had their own ideas too.
Without AMD64, I firmly believe eventually Itanium would have been the new world no matter what.
We see this all the time: technology that could be great but fails due to not being pushed hard enough, and other similar technology that does indeed succeed because the creators are willing to push it at a loss for several years until it finally becomes the new way.
( I might have forgotten)
Statically scheduled/in-order stuff is still relegated to pretty much microcontrollers or specific numeric workloads. For general computation, it still seems like a poor fit.
This precludes any VLIW from having multiple differently constrained implementations. You cannot segment VLIW implementations the way you can with x86, ARM, MIPS, PowerPC, etc., where the same code will be executed as optimally as possible on each concrete implementation of the ISA.
So - no, Itanium (or any other VLIW for that matter) would not be the new world.
It was on IA-64; the bundle format was deliberately chosen to allow for easy extension.
But broadly it's true: you can't have a "pure" VLIW architecture independent of the issue and pipeline architecture of the CPU. Any device with differing runtime architecture is going to have to do some cooking of the instructions to match it to its own backend. But that decode engine is much easier to write when it's starting from a wide format that presents lots of instructions and makes explicit promises about their interdependencies.
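For reference, an IA-64 bundle is 128 bits: a 5-bit template (which also encodes where the explicit "stops", i.e. dependence breaks, fall) plus three 41-bit instruction slots. Below is a rough sketch of pulling those fields out of the two 64-bit halves; the exact bit positions are my reading of the format and worth checking against the architecture manual.

```c
/* Rough IA-64 bundle field extraction -- layout assumed: 5-bit template
 * in the low bits, then three 41-bit slots. Verify the exact positions
 * against the Itanium architecture manual before relying on them. */
#include <stdint.h>
#include <stdio.h>

struct bundle { uint64_t lo, hi; };        /* 128 bits as two 64-bit halves */

static void decode(struct bundle b) {
    uint64_t mask41 = (1ULL << 41) - 1;
    uint64_t tmpl  = b.lo & 0x1F;                               /* bits 0..4    */
    uint64_t slot0 = (b.lo >> 5) & mask41;                      /* bits 5..45   */
    uint64_t slot1 = ((b.lo >> 46) | (b.hi << 18)) & mask41;    /* bits 46..86  */
    uint64_t slot2 = b.hi >> 23;                                /* bits 87..127 */
    printf("template=%02llx slots=%011llx %011llx %011llx\n",
           (unsigned long long)tmpl,
           (unsigned long long)slot0, (unsigned long long)slot1,
           (unsigned long long)slot2);
}

int main(void) {
    decode((struct bundle){ .lo = 0x123456789ABCDEF0ULL,
                            .hi = 0x0FEDCBA987654321ULL });
    return 0;
}
```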
Maybe ARM gets a real kick in the pants but high-performance server processors were probably too far in the future to play a meaningful role.
Insanely expensive for that performance. I was the architect of HPC clusters in that era, and Itanic never made it to the top for price per performance.
Also, having lived through the software stack issues with the first beta chips of Itanic and AMD64 (and MIPS64, but who's counting), AMD64 was way way more stable than the others.
https://en.m.wikipedia.org/wiki/Itanium
Essentially, while decoding of a 64-bit variant of the x86 ISA might have been fused off, there was a very visible part that was common anyway: the ALUs available on the NetBurst platform, which IIRC were 2x 32-bit ALUs for integer ops. So you either issue a micro-op to both to "chain" them together, or run every 64-bit calculation in multiple steps.
https://ctho.org/toread/forclass/18-722/logicfamilies/Delega...
> There are two distinct 32-bit FCLK execution data paths staggered by one clock to implement 64-bit operations.
If it weren't fused off, they probably would've supported 64-bit ops with an additional cycle of latency?
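The "chaining" described above is just carry propagation between 32-bit halves; here is a software rendering of the same dependency (illustrative C, not how NetBurst actually sequenced its micro-ops).

```c
/* A 64-bit add expressed as two 32-bit adds with an explicit carry --
 * the same dependency a pair of 32-bit ALUs has to honor, whether the
 * halves are chained across staggered clocks or issued as extra uops. */
#include <stdint.h>
#include <stdio.h>

static uint64_t add64_via_32(uint64_t a, uint64_t b) {
    uint32_t a_lo = (uint32_t)a, a_hi = (uint32_t)(a >> 32);
    uint32_t b_lo = (uint32_t)b, b_hi = (uint32_t)(b >> 32);

    uint32_t lo    = a_lo + b_lo;            /* first 32-bit ALU op       */
    uint32_t carry = lo < a_lo;              /* did the low half wrap?    */
    uint32_t hi    = a_hi + b_hi + carry;    /* second op needs the carry */

    return ((uint64_t)hi << 32) | lo;
}

int main(void) {
    uint64_t a = 0x00000001FFFFFFFFULL, b = 0x0000000000000001ULL;
    printf("%llx\n", (unsigned long long)add64_via_32(a, b)); /* 200000000 */
    return 0;
}
```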
As someone who works with AMD64 assembly very often - they didn't really clean it up all that much. Instruction encoding is still horrible, you still have a bunch of useless instructions even in 64-bit mode which waste valuable encoding space, you still have a bunch of instructions which hardcode registers for no good reason (e.g. the shift instructions have a hardcoded rcx). The list goes on. They pretty much did almost the minimal amount of work to make it 64-bit, but didn't actually go very far when it comes to making it a clean 64-bit ISA.
I'd love to see what Intel came up with, but I'd be surprised if they did a worse job.
They were also affordable dual cores, which wasn't the norm at all at the time.
If you look at the 286's 16-bit protected mode and then the 386's 32-bit extensions, they fit neatly into the "gaps" in the former; there are some similar gaps in the latter, which look like they had a future extension in mind. Perhaps that consideration was already there in the 80s when the 386 was being designed, but as usual, management got in the way.
Segmentation very useful for virtualization? I don't follow that claim.
I would call this the real problem, and segmentation a bad workaround.
Damn!
File this one under "we made the right decision based on everything we knew at the time." It's really sad because the absolute right choice would have been to extend x86 and let it duke it out with Itanium. Intel would win either way and the competition would have been even more on the back foot. So easy to see that decades later...
The concern wasn't that it would cannibalize sales; it's that it would cannibalize IA-64 managers' jobs and status. "You ship the org chart"
The real thing that killed the division was Oracle announcing that they would no longer support IA-64. It just so happened that something like 90% of the clients using Itanium were using it for Oracle DBs.
But by that point HP was already trying to get people to transition to more traditional x86 servers that they were selling.
https://en.wikipedia.org/wiki/Half_Dome
The cost structure was just bonkers. I replaced a big file server environment that was like $2M of Sun gear with like $600k of HP Proliant.
Linux didn't "win" nearly as much as x86 did by becoming "good enough" - Linux just happened to be around to capitalize on that victory.
The writing on the wall was the decreasing prices and increasing capability of consumer-grade hardware. Then real game-changer followed: horizontal scalability.
You had AutoCAD, you had 3D Studio Max, you had After Effects, you had Adobe Premiere. And it was solid stuff - maybe not best-in-class, but good enough, and the price was right.
Itanium sounded the deathknell for all of them.
The only Unix to survive with any market share is macOS (arguably because of its lateness to the party), and it has only relatively recently gone back to a more bespoke architecture.
Meanwhile, the decision to keep Itanium in an expensive but lower-volume market meant there simply wasn't much market growth, especially once the non-technical part of the plan (killing off the other RISCs) failed. Ultimately Itanium was left as the recommended way to run Oracle databases in some markets (due to the partnership between Oracle and HP) and not much else, while shops on other RISC platforms either migrated to AMD64 or moved to yet another RISC platform (even forcing HP to resurrect Alpha for one last generation).
The common attitude in the 80s and 90s was that legacy ISAs like 68k and x86 had no future. They had zero chance to keep up with the innovation of modern RISC designs. But not only did x86 keep up, it was actually outperforming many RISC ISAs.
The true factor is out-of-order execution. Some contemporary RISC designs were out-of-order too (especially Alpha, and PowerPC to a lesser extent), but both AMD and Intel were forced to go all-in on the concept in a desperate attempt to keep the legacy x86 ISA going.
Turns out large out-of-order designs were the correct path (mostly because OoO has the side effect of being able to reorder memory accesses and execute them in parallel), and AMD/Intel had a bit of a head start, a pre-existing customer base, and plenty of revenue for R&D.
IMO, Itanium failed not because it was a bad design, but because it was on the wrong path. Itanium was an attempt to achieve roughly the same end goal as OoO, but with a completely in-order design, relying on static scheduling. It had massive amounts of complexity that let it re-order memory reads. In an alternative universe where OoO (aka dynamic scheduling) failed, Itanium might actually be a good design.
Anyway, by the early 2000s, there just wasn't much advantage to a RISC workstation (or RISC servers). x86 could keep up, was continuing to get faster and often cheaper. And there were massive advantages to having the same ISA across your servers, workstations and desktops.
He was a key player in the Pentium Pro out of order implementation.
https://www.sigmicro.org/media/oralhistories/colwell.pdf
"We should also say that the 360/91 from IBM in the 1960s was also out of order, it was the first one and it was not academic, that was a real machine. Incidentally that is one of the reasons that we picked certain terms that we used for the insides of the P6, like the reservation station that came straight out of the 360/91."
Here is his Itanium commentary:
"Anyway this chip architect guy is standing up in front of this group promising the moon and stars. And I finally put my hand up and said I just could not see how you're proposing to get to those kind of performance levels. And he said well we've got a simulation, and I thought Ah, ok. That shut me up for a little bit, but then something occurred to me and I interrupted him again. I said, wait I am sorry to derail this meeting. But how would you use a simulator if you don't have a compiler? He said, well that's true we don't have a compiler yet, so I hand assembled my simulations. I asked "How did you do thousands of line of code that way?" He said “No, I did 30 lines of code”. Flabbergasted, I said, "You're predicting the entire future of this architecture on 30 lines of hand generated code?" [chuckle], I said it just like that, I did not mean to be insulting but I was just thunderstruck. Andy Grove piped up and said "we are not here right now to reconsider the future of this effort, so let’s move on"."
Actually no, it was Metaflow [0] who was doing out-of-order. To quote Colwell:
"I think he lacked faith that the three of us could pull this off. So he contacted a group called Metaflow. Not to be confused with Multiflow, no connection."
"Metaflow was a San Diego group startup. They were trying to design an out of order microarchitecture for chips. Fred thought what the heck, we can just license theirs and remove lot of risk from our project. But we looked at them, we talked to their guys, we used their simulator for a while, but eventually we became convinced that there were some fundamental design decisions that Metaflow had made that we thought would ultimately limit what we could do with Intel silicon."
Multiflow [1], where Colwell worked, has nothing to do with OoO; its design is actually way closer to Itanium. So close, in fact, that the Itanium project is arguably a direct descendant of Multiflow (HP licensed the technology and hired Multiflow's founder, Josh Fisher). Colwell claims that Itanium's compiler is nothing more than the Multiflow compiler with large chunks rewritten for better performance.
[0] https://en.wikipedia.org/wiki/Metaflow_Technologies
[1] https://en.wikipedia.org/wiki/Multiflow
I'm pressing X: the doubt button.
I would argue that speculative execution/branch prediction and wider pipelines, both of which OoO largely benefited from, mattered more than OoO itself; OoO was not the sole factor. In fact I believe improvements in the semiconductor manufacturing process node contributed more to the IPC gains than OoO did.
It's a little annoying that OoO is overloaded in this way. I have seen some people suggesting we should be calling these designs "Massively-Out-of-Order" or "Great-Big-Out-of-Order" in order to be more specific, but that terminology isn't in common use.
And yes, there are some designs out there which are technically out-of-order, but don't count as MOoO/GBOoO. The early PowerPC cores come to mind.
It's not that executing instructions out-of-order benefits from complex branch prediction and wide execution units, OoO is what made it viable to start using wide execution units and complex branch prediction in the first place.
A simple in-order core simply can't extract that much parallelism, the benefits drop off quickly after two-wide super scalar. And accurate branch prediction is of limited usefulness when the pipeline is that short.
There are really only two ways to extract more parallelism. You either do complex out-of-order scheduling (aka dynamic scheduling), or you take the VLIW approach and try to solve it with static scheduling, like the Itanium. They really are just two sides of the same "I want a wide core" coin.
And we all know how badly the Itanium failed.
Ah, the philosophy of having CPU execution out of order, you mean.
> A simple in-order core simply can't extract that much parallelism
While yes, it is also noticeable that it has no data hazards because a pipeline simply doesn't exist at all, and thus there is no need for implicit pipeline bubbles or delay slots.
> And accurate branch prediction is of limited usefulness when the pipeline is that short.
You can also use a software virtual machine to turn an out-of-order CPU into basically running in-order code, and you can see how slow that goes. That's why JIT VMs such as HotSpot and GraalVM for the JVM platform, RyuJIT for CoreCLR, and TurboFan for V8 are so much faster: once you compile to native instructions, the branch predictor can finally kick in.
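A minimal sketch of why interpretation is hard on the branch predictor (illustrative C, made-up opcodes): every guest operation funnels through the same dispatch branch, whereas JIT-compiled code gives each guest branch its own native, separately predicted branch.

```c
/* Minimal switch-dispatch bytecode interpreter. The single hot dispatch
 * branch is what historically made interpretation predictor-hostile
 * compared with JIT-compiled code. Illustrative only. */
#include <stdint.h>
#include <stdio.h>

enum { OP_PUSH, OP_ADD, OP_PRINT, OP_HALT };

static void run(const int64_t *prog) {
    int64_t stack[64];
    int sp = 0;
    for (size_t pc = 0; ; ) {
        switch (prog[pc++]) {              /* the hot, hard-to-predict branch */
        case OP_PUSH:  stack[sp++] = prog[pc++];                    break;
        case OP_ADD:   sp--; stack[sp - 1] += stack[sp];            break;
        case OP_PRINT: printf("%lld\n", (long long)stack[sp - 1]);  break;
        case OP_HALT:  return;
        }
    }
}

int main(void) {
    const int64_t prog[] = { OP_PUSH, 2, OP_PUSH, 40, OP_ADD, OP_PRINT, OP_HALT };
    run(prog);   /* prints 42 */
    return 0;
}
```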
> like the Itanium
> And we all know how badly the Itanium failed.
Itanium is not exactly VLIW. It is an EPIC [^1] fail though.
[1]: https://en.wikipedia.org/wiki/Explicitly_parallel_instructio...
I guess Oracle / Sun SPARC is also still hanging on. I haven't seen a Sun shop since the early 2000's...
I still run into a number of Solaris/SPARC shops, but even the most die hard of them are actively looking for the off-ramp. The writing is on that wall.
To the point that once that partnership ended with Oracle's purchase of Sun, there was a lawsuit between Oracle and HP. And a lot of angry customers, as HP-UX was pushed right up to the moment the acquisition was announced.
Almost all early startups I worked with were Sun / Solaris shops. All the early ISPs I worked with had Sun boxes for their customer shell accounts and web hosts. They put the "dot in dot-com", after all...
The late 90's to the early aughts' race for highest-frequency, highest-performance CPUs exposed not a need for a CPU-only, highly specialised foundry, but a need for sustained access to the very front of process technology – continuous, multibillion-dollar investment and a steep learning curve. Pure-play foundries such as TSMC could justify that spend by aggregating huge, diverse demand across CPUs, GPUs and SoCs, whilst only a handful of integrated device manufacturers could fund it internally at scale.
The major RISC houses – DEC, MIPS, Sun, HP and IBM – had excellent designs, yet as they pushed performance they repeatedly ran into process-cadence and capital-intensity limits. Some owned fabs but struggled to keep them competitive; others outsourced and were constrained by partners’ roadmaps. One can trace the pattern in the moves of the era: DEC selling its fab, Sun relying on partners such as TI and later TSMC, HP shifting PA-RISC to external processes, and IBM standing out as an exception for a time before ultimately stepping away from leading-edge manufacturing as well.
A compounding factor was corporate portfolio focus. Conglomerates such as Motorola, TI and NEC ran diversified businesses and prioritised the segments where their fab economics worked best – often defence, embedded processors and DSPs – rather than pouring ever greater sums into low-volume, general-purpose RISC CPUs. IBM continued to innovate and POWER endured, but industry consolidation steadily reduced the number of independent RISC CPU houses.
In the end, x86 benefited from an integrated device manufacturer (i.e. Intel) with massive volume and a durable process lead, which set the cadence for the rest of the field. The outcome was less about the superiority of a CPU-only foundry and more about scale – continuous access to the leading node, paid for by either gigantic internal volume or a foundry model that spread the cost across many advanced products.
It's also interesting to note that back then the consensus was that you needed your own in-house fab, with tight integration between the fab and CPU design teams, to build the highest-performance CPUs. Merchant fabs were seen as second-best options for those who didn't need the highest performance or couldn't afford their own in-house fab. Only later did the meteoric rise of TSMC to the top spot on the semiconductor food chain upend that notion.
96 more comments available on Hacker News