Not Hacker News!
AI companion for Hacker News
Nov 22, 2025 at 10:33 PM EST

Unusual circuits in the Intel 386's standard cell logic

Posted by Stratoscope · 95 points · 13 comments
Mood: informative · Sentiment: positive · Category: tech_discussion

Key topics: Intel, Hardware, Circuit Design, Microprocessor, Electronics

Discussion Activity

Moderate engagement · First comment: 1h · Peak period: 6 comments in Hour 2 · Avg per period: 2.7

Comment distribution: 52 data points (based on 52 loaded comments)

Key moments

  1. Story posted: Nov 22, 2025 at 10:33 PM EST (1d ago)
  2. First comment: Nov 22, 2025 at 11:52 PM EST (1h after posting)
  3. Peak activity: 6 comments in Hour 2 (hottest window of the conversation)
  4. Latest activity: Nov 23, 2025 at 9:57 PM EST (4h ago)


Discussion (13 comments)
Showing 52 comments
skissane
1d ago
4 replies
> Regenerating the cell layout was very costly, taking many hours on an IBM mainframe computer.

I would love to know more about this – how much info is publicly available on how Intel used mainframes to design the 386? Did they develop their own software, or use something off-the-shelf? And I'm somewhat surprised they used IBM mainframes, instead of something like a VAX.

kens
1d ago
1 reply
Various papers describe the software, although they are hard to find. My earlier blog post goes into some detail: https://www.righto.com/2024/01/intel-386-standard-cells.html

The 386 used a placement program called Timberwolf (developed by a Berkeley grad student) along with a proprietary routing tool.

Also see "Intel 386 Microprocessor Design and Development Oral History Panel" page 13. https://archive.computerhistory.org/resources/text/Oral_Hist...

"80386 Tapeout: Giving Birth to an Elephant" by Pat Gelsinger, Intel Technology Journal, Fall 1985, discusses how they used an Applicon system for layout and an IBM 3081 running UTS unix for chip assembly, faster than the VAX they used earlier. Timberwolf also ran on the 3081.

"Design And Test of the 80386" (https://doi.org/10.1109/MDT.1987.295165) describes some of the custom software they used, including a proprietary RTL simulator called Microsim, the Mossim switch-level simulator, and the Espresso PLA minimizer.

dcassett
18h ago
> Espresso PLA minimizer

You can still find the software for Espresso (I ran it a few years ago):

https://en.wikipedia.org/wiki/Espresso_heuristic_logic_minim...
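
To get a feel for what a two-level minimizer like Espresso produces, here's a minimal sketch using sympy's SOPform (which implements Quine-McCluskey rather than the Espresso heuristic, so this is only an analogy; assumes sympy is installed):

    # Minimize a small truth table into sum-of-products form, in the spirit of
    # what Espresso does for PLA descriptions. SOPform uses Quine-McCluskey,
    # not the Espresso heuristic. Requires: pip install sympy
    from sympy import symbols
    from sympy.logic import SOPform

    a, b, c = symbols("a b c")
    # Rows (a, b, c) where the function is 1.
    minterms = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1], [1, 1, 0]]

    print(SOPform([a, b, c], minterms))   # e.g. c | (a & b)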

themafia
1d ago
There's not a lot of "off the shelf" in terms of mainframes. You're usually buying some type of contract. In that case I would expect a lot of direct support for customer-created modules that took an existing software library and turned it into the specific application they required.
f1shy
1d ago
> Did they develop their own software

Knowing Intel SW, and given that it was successful, I really doubt it.

retrac
1d ago
VAXes were relatively small computers for the time. They grew upward in the late 80s, eventually rivalling the mainframes for speed (and cost). But in the early 80s, IBM's high-end machines were an entire order of magnitude larger.

Top of the line VAX in 1984 was the 8600 with a 12.5 MHz internal clock, doing about 2 million instructions per second.

IBM 3084 from 1984 - quad SMP (four processors) at 38 MHz internal clock, about 7 million instructions per second, per processor.

Though the VAX was about $50K and the mainframe about $3 million.

userbinator
1d ago
2 replies
> But in the end, the 386 finished ahead of schedule, an almost unheard-of accomplishment.

Does that schedule include all the revisions they did too? The first few were almost uselessly buggy:

https://www.pcjs.org/documents/manuals/intel/80386/

kens
1d ago
According to "Design and Test of the 80386", the processor was completed ahead of its 50-man-year schedule from architecture to first production units, and set an Intel record for tapeout to mask fabricator.
adrian_b
22h ago
Except for the first stepping A0, whose list of bugs is unknown (it also implemented a few extra instructions that were dropped in the next revisions instead of having their bugs fixed), the other steppings have errata lists that are not significantly worse than those of most recent Intel or AMD CPUs. Those also have long lists of bugs, for which there are workarounds in most cases at the hardware or operating-system level.
wolfi1
1d ago
2 replies
If I remember correctly, the 386 didn't have branch prediction. So, as a thought experiment: how would a 386 built with design sizes from today (~9nm) fare against other chips?
Earw0rm
23h ago
2 replies
It would lose by a country mile: a 386 can handle about one instruction every three or four clocks, while a modern desktop core can do as many as four or five ops PER clock.

It's not just the lack of branch prediction, but the primitive pipeline, no register renaming, and of course it's integer only.

A Pentium Pro with modern design size would at least be on the same playing field as today's cores. Slower by far, but recognisably doing the same job - you could see traces of the P6 design in modern Intel CPUs until quite recently, in the same way as the Super Hornet has traces of predecessors going back to the 1950s F-5. The CPUs in most battery chargers and earbuds would run rings around a 386.
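
Putting rough numbers on that gap (the IPC figures come from the comment above; the clock speeds, including the one for a hypothetical die-shrunk 386, are assumptions for illustration):

    # Back-of-the-envelope throughput comparison. IPC figures are from the
    # comment above; clock speeds (especially for a hypothetical 386 shrunk to
    # a modern process) are assumptions, not measurements.
    def mips(ipc, clock_hz):
        return ipc * clock_hz / 1e6

    print(f"386DX-33:              {mips(1 / 3.5, 33e6):8.0f} MIPS")
    print(f"386 on a modern node:  {mips(1 / 3.5, 2e9):8.0f} MIPS   (assumed 2 GHz)")
    print(f"modern desktop core:   {mips(4.5, 5e9):8.0f} MIPS   (assumed 5 GHz)")
    # Even with a generous clock bump, the shrunk 386 trails a single modern
    # core by well over an order of magnitude, before SIMD or multiple cores.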

anthk
21h ago
6 replies
A 386 was a beast against a 286, a 16-bit CPU. It was the minimum to run Linux with 4MB of RAM, but a 486 with an FPU destroyed it, and not just in FP performance.

Bear in mind that with a 386 you can barely decode an MP2 file, while with a 486 DX you can play most MP3 files, at least in mono audio, and maybe run Quake at the lowest settings if you own a 100 MHz one. A 166 MHz Pentium can at least multitask a little while playing your favourite songs.

Also, under Linux, a 386 would manage itself relatively well with just terminal and SVGAlib tools (now framebuffer) and 8MB of RAM. With a 486 and 16MB of RAM, you can run X at sane speeds, even FVWM in wireframe mode to avoid window repaints upon moving/resizing them.

If you emulate some old i440FX-based PC under Qemu, switching between the 386 and 486 with the -cpu flag gives the user clear results. Just set one up with the Cirrus VGA and 16MB and you'll understand upon firing up X.

This is a great old distro to test how well 386's and 486's behaved:

https://delicate-linux.net/

iberator
21h ago
2 replies
You could run Linux with 2MB of RAM with kernels before 1994, AFAIK, and with the a.out binary format instead of ELF.

Nowadays I think it's still doable in theory, but the Linux kernel has some kind of hard-coded limit of 4MB (something to do with memory paging size).

ptspts
12h ago
1 reply
Why is ELF so much slower and/or more memory hungry than a.out on Linux?
codebje
4h ago
Relocation information, primarily.

ELF supports loading a shared library to some arbitrary memory address and fixing up references to symbols in that library accordingly, including dynamically after load time with dlopen(3).

a.out did not support this. The executable format doesn't have relocation entries, which means every address in the binary was fixed at link time. Shared libraries were supported by maintaining a table of statically-assigned, non-overlapping address spaces, and at link time resolving external references to those fixed addresses.

Loading is faster and simpler when all you do is copy sections into memory then jump to the start address.
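
A small illustration of the run-time loading ELF makes possible (ctypes.CDLL goes through dlopen(3) on Linux; the presence of libm.so.6 is an assumption that holds on typical glibc systems):

    # Load a shared library at run time and resolve a symbol from it.
    # ctypes.CDLL uses dlopen(3) on Linux: the library lands wherever the
    # loader maps it, and references are fixed up then, the flexibility
    # a.out's fixed link-time addresses couldn't offer.
    from ctypes import CDLL, c_double

    libm = CDLL("libm.so.6")         # assumes a glibc Linux system
    libm.cos.restype = c_double      # declare the C signature
    libm.cos.argtypes = [c_double]

    print(libm.cos(0.0))             # 1.0, from a symbol resolved at run time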

anthk
19h ago
Yep, but badly. Read the 4MB Laptop Howto. Nowadays, if I had a Pentium/K5 laptop, I'd just fit a 64 MB SIMM in it and keep everything TTY/framebuffer with NetBSD and most of the unheard-of daemons disabled. For a 486: Delicate Linux plus a custom build queue for bearssl, libressl on top (there's a fork out there), plus brssl-linked lynx, mutt, slrn, mpg123, libtls and hurl.
rwmj
19h ago
1 reply
I ran X and emacs and gcc on a 386DX with 5MB of RAM circa 1993, and while not pleasant it was workable. The upgrade to 16MB (that cost me £600!) made a big difference.
masfuerte
18h ago
2 replies
Ten years before that I saved up for ages and spent £25 on 16KB of RAM. I could have bought a house for the cost of 16MB. It's amazing how quickly it changed.
rwmj
17h ago
1 reply
ZX81 rampack, right?
masfuerte
14h ago
1 reply
Nearly, it was actually for a BBC Micro.
rwmj
10h ago
We can't be friends!
Earw0rm
15h ago
Both the RAM (for the better) and the house (for the worse).
qingcharles
10h ago
Having to manually decompress .MP3 -> .WAV in the early days of online music piracy just so you could play it, at the expense of most of your HDD space disappearing.
Earw0rm
18h ago
Yep, we had a few later-generation 486s in college. They would run Windows NT4 with full GUI - not especially well, but they'd run it. And they'd do SSL stuff adequately for the time.

ISTR the cheap "Pentium clones" at the time - Cyrix, early AMDs before the K5/K6 and Athlon - were basically souped-up 486 designs.

(As an aside - it's very noticeable how much innovation happened from one CPU generation to the next at that time, compared to today. Even if some of them were buggy or had performance regressions. 5x86 to K5 was a complete redesign, and the same again between K6 and K7).

accrual
12h ago
I did some multitasking recently on my iDX4-100 + 64MB FPM. I used NT4 with SP2 because the full SP6 was much slower. I could have a browser open, PuTTY, and some tracker music playing no problem. :)
rasz
13h ago
> A 386 was a beast against a 286

The 386, both SX and DX, ran 16-bit code at about the same clock-for-clock speed as a 286. The 286 topped out at 25MHz, the Intel 386 at 33MHz. Now add the fact that early Intel chips had broken 32-bit support and it's not so beastly after all :)

In one of the Computer History Museum videos, someone from Intel mentioned they managed to cost-reduce the 386SX so hard it cost Intel $5 out the door; the rest of the initial 1988 $219 price was pure money printer. Only in 1992 did Intel finally calm down, with the i386SX-25 going from $184 in Q1 1990 to $59 in Q4 1992 after losing the AMD Am386 lawsuit, and only to screw with AMD by relegating its Am386DX-40, a $231 flagship in Q2 1991, to a $51 bottom feeder by Q1 1993.

immibis
17h ago
2 replies
Presumably it's much smaller. A similar but different thought experiment would be to fill a 14th-gen-sized die with 386es running in parallel.
toast0
16h ago
That gets you close to Larrabee/Xeon Phi, although that was Pentium-based (with amd64 and a vector engine added), and later products were Atom-derived.
atq2119
16h ago
If you continue that thought experiment, you'd very quickly run into the issue that the way the 386 interfaces with memory is hopelessly primitive and not a good match for running 1000s of cores in parallel.

A large reason why out of order speculative execution is needed for performance is to deal with the memory latencies that appear in such a system.

tliltocatl
18h ago
1 reply
Modern CPUs are more or less built around the memory hierarchy, so it would be really hard to compare the two: a 386 in a modern process might be able to run at the same clock speed or even faster, but with only a few KB of memory available. As soon as you connect a large memory it will spend most of its time idling (and then of course there is the problem of power dissipation density).
adrian_b
16h ago
While there were also cheap motherboards with an 80386SX and no cache memory, most motherboards for the 80386DX had a write-through cache memory, typically of either 32 kB or 64 kB.

By the time of 80486, motherboard cache sizes had increased to the range of 128 to 256 kB, while 80486 also had an internal cache of 8 kB (much later increased to 16 kB in 80486DX4, at a time when Pentium already existed).

So except for the lower-end MBs, a memory hierarchy already existed in the 80386-based computers, because the DRAM was already not fast enough.

burnt-resistor
1d ago
1 reply
I'm curious to know which model, speed, voltage, stepping, and package writing sample(s) were evaluated because there isn't just one 386. i386DX I assume but it doesn't specify whether it was a buggy 32-bit multiply or "ΣΣ" or newer.

"Showing one's work" would need details that are verifiable and reproducible.

kens
1d ago
I've looked at a bunch of 386 dies, see: https://www.righto.com/2023/10/intel-386-die-versions.html I typically use an earlier 1.5µm chip since it's easier to study under the microscope than a 1µm chip and I use "ΣΣ" because they are more obtainable. Typical steppings are S40362 or S40344, whatever is cheapest on eBay.
hyperman1
1d ago
1 reply
There are 2 interesting articles here. Not only does Ken treat us to a great text, but hidden in footnote 1 is a second gem. Thanks for the early Christmas gift!
tremon
13h ago
1 reply
> we found that the engineers were automating things by writing their own scripts where in earlier days you might have to go to ask a CAD person to come and do something for you -- and that’s difficult to do. Much easier if the engineers can do it themselves and I think that all came about because we instituted Unix for the 386 design. Again if management knew what we were doing they wouldn’t have let us do it.

> He walked across the street from Santa Clara 4 to Amdahl and they had a Unix that ran on 370 computers. So he went over there and got a tape and brought it back, sent it over to Phoenix where the mainframes were and told 'em to load it. They did, not knowing what was on that tape because they never would have done it if they had known

It's wild to read that Intel's flagship product, the part that basically defined the next 40 years of computing, might have turned out very differently if management and/or IT knew what the engineers were doing.

Everything old is new again, I guess.

kens
12h ago
Another interesting thing is that the Unix guru on the 386 project was Pat Gelsinger, who later became Intel's CEO. Gelsinger also converted at least one member of the 386 team to Christianity.
dcassett
19h ago
1 reply
> However, the 386 uses a different approach—CMOS switches—that avoids a large AND/OR gate.

Standard cell libraries often implement multiplexers using transmission gates (CMOS switches) with inverters to buffer the input and restore the signal drive. This implementation has the advantage of eliminating static hazards (glitches) in the output that can occur with conventional gates.

zozbot234
18h ago
Static hazards are most often dealt with by just adding some redundant logic (consensus terms) to the circuit. This can even be done automatically.
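
A toy unit-delay simulation of the static-1 hazard both comments refer to, and of the consensus-term fix (purely illustrative; real standard cells are transistor-level circuits, and the 386's mux sidesteps the issue with CMOS switches rather than extra gates):

    # 2:1 mux from plain gates: out = (a AND NOT s) OR (b AND s).
    # With a = b = 1 and s switching 1 -> 0, the "b AND s" leg turns off one
    # gate delay before "a AND NOT s" turns on, so out briefly drops to 0
    # (a static-1 hazard). The redundant consensus term (a AND b) covers the
    # handover and keeps the output high. Every gate has one unit of delay.

    def simulate(with_consensus, steps=12, switch_at=5):
        a = b = 1
        n = p = q = c = out = 0          # gate outputs, all settle from 0
        trace = []
        for t in range(steps):
            s = 1 if t < switch_at else 0
            # Each gate sees its inputs as they were one step earlier.
            n_next = 1 - s               # NOT s
            p_next = a & n               # a AND (NOT s)
            q_next = b & s               # b AND s
            c_next = a & b               # consensus term
            terms = [p, q] + ([c] if with_consensus else [])
            out_next = 1 if any(terms) else 0
            n, p, q, c, out = n_next, p_next, q_next, c_next, out_next
            trace.append(out)
        return trace

    print("plain AND/OR mux:   ", simulate(False))   # glitch: ...1, 1, 0, 1...
    print("with consensus term:", simulate(True))    # stays at 1 after settling
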
junto
15h ago
3 replies
This reminds me of Adrian Thompson’s (University of Sussex) 1996 paper, “An evolved circuit, intrinsic in silicon, entwined with physics,” ICES 1996 / LNCS 1259 (published 1997), which was extended in his later thesis, “Hardware Evolution: Automatic Design of Electronic Circuits in Reconfigurable Hardware by Artificial Evolution” (Springer, 1998).

Before Thompson’s experiment, many researchers tried to evolve circuit behaviors on simulators. The problem was that simulated components are idealized, i.e. they ignore noise, parasitics, temperature drift, leakage paths, cross-talk, etc. Evolved circuits would therefore fail in the real world because the simulation behaved too cleanly.

Thompson instead let evolution operate on a real FPGA device itself, so evolution could take advantage of real-world physics. This was called “intrinsic evolution” (i.e., evolution in the real substrate).

The task was to evolve a circuit that can distinguish between a 1 kHz and 10 kHz square-wave input and output high for one, low for the other.

The final evolved solution:

- Used fewer than 40 logic cells

- Had no recognisable structure, no pattern resembling filters or counters

- Worked only on that exact FPGA and that exact silicon patch.

Most astonishingly, the circuit depended critically on five logic elements that were not logically connected to the main path. Removing them should not have affected a digital design (they were not wired to the output), but in practice the circuit stopped functioning when they were removed.

Thompson determined via experiments that evolution had exploited:

- Parasitic capacitive coupling

- Propagation delay differences

- Analogue behaviours of the silicon substrate

- Electromagnetic interference from neighbouring cells

In short: the evolved solution used the FPGA as an analog medium, even though engineers normally treat it as a clean digital one.

Evolution had tuned the circuit to the physical quirks of the specific chip. It demonstrated that hardware evolution could produce solutions that humans would never invent.
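
A heavily simplified sketch of the loop Thompson's setup ran: the population handling below is generic GA boilerplate, the bitstream length is made up, and the scoring function is only a stand-in for the real hardware-in-the-loop measurement on the FPGA.

    import random

    BITS, POP, GENS = 1800, 50, 100   # illustrative sizes, not the real figures

    def score(bitstream):
        # Stand-in for the intrinsic-evolution step: in the real experiment this
        # configured the Xilinx FPGA with the candidate bitstream and measured
        # how well the output separated 1 kHz from 10 kHz inputs. Here it is a
        # toy objective so the loop actually runs.
        return sum(bitstream)

    def mutate(bits, rate=0.002):
        return [b ^ (random.random() < rate) for b in bits]

    def crossover(x, y):
        cut = random.randrange(len(x))
        return x[:cut] + y[cut:]

    population = [[random.randint(0, 1) for _ in range(BITS)] for _ in range(POP)]
    for gen in range(GENS):
        ranked = sorted(population, key=score, reverse=True)
        elite = ranked[: POP // 5]                      # keep the best fifth
        population = elite + [
            mutate(crossover(random.choice(elite), random.choice(elite)))
            for _ in range(POP - len(elite))
        ]
    print("best score:", score(max(population, key=score)))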

rcxdude
15h ago
2 replies
Though the unreplicable nature of it certainly limited its usefulness. I'd also suspect it would be quite sensitive to temperature.
junto
14h ago
1 reply
I’d argue that this was a limitation of the GA fitness function, not of the concept.

Now that we have vastly faster compute, open FPGA bitstream access, on-chip monitoring, cheap and dense temperature/voltage sensing, and reinforcement learning + evolution hybrids, it becomes possible to select explicitly for robustness and generality, not just for functional correctness.

The fact that human engineers could not understand how this worked in 1996 made researchers incredibly uncomfortable, and the same remains true today, but now we have vastly better tooling than back then.

tremon
14h ago
1 reply
I don't think that's true; for me it is the concept that's wrong. The second-order effects you mention:

  - Parasitic capacitive coupling
  - Propagation delay differences
  - Analogue behaviours of the silicon substrate
...are not just influenced by the chip design, they're influenced by substrate purity and doping uniformity -- exactly the parts of the production process that we don't control. Or rather: we shrink the technology node to right at the edge where these uncontrolled factors become too big to ignore. You can't design a circuit based on the uncontrolled properties of your production process and still expect to produce large volumes of working circuits.

Yes, we have better tooling today. If you use today's 14A machinery to produce a 1µ chip like the 80386, you will get amazingly high yields, and it will probably be accurate enough that even these analog circuits are reproducible. But the analog effects become more unpredictable as the node size decreases, and so will the variance in your analog circuits.

Also, contrary to what you said: the GA fitness process does not design for robustness and generality. It designs for the specific chip you're measuring, and you're measuring post-production. The fact that it works for reprogrammable FPGAs does not mean it translates well to mass production of integrated circuits. The reason we use digital circuitry instead of analog is not because we don't understand analog: it's because digital designs are much less sensitive to production variance.

junto
13h ago
Possibly, but maybe the real difference is the subtlety between a planned deterministic (logical) result and a deterministic (black box) outcome?

We’re seeing this shift already in software testing around GenAI. Trying to write a test around non-deterministic outcomes comes with its own set of challenges, so we need to plan for deterministic variances, which seems like an oxymoron but is not in this context.

paulgerhardt
14h ago
1 reply
That unreplicability between chips is actually a very, very desirable property when fingerprinting chips (sometimes known as ChipDNA) to implement unique keys for each chip. You use precisely this property (plus a lot of magic to control for temperature as you point out) to give each chip its own physically unclonable key. This has wonderfully interesting properties.
rowanG077
11h ago
The technical term is usually "Physical unclonable function".
karolinepauls
12h ago
1 reply
I wonder what would happen if someone evolved a circuit on a large number of FPGAs from different batches. Each of the FPGAs would receive the same input in each iteration, but the output function would be biased to expose the worst-behaving units (maybe the bias should be raised in later iterations, when most units behave well).
mmastrac
10h ago
Either it would generate a more robust (and likely more recognizable) solution, or it would fail to converge, really.

You may need to train on a smaller number of FPGAs and gradually increase the set. Genetic algorithms have been finicky to get right, and you might find that more devices would massively increase the iteration count.
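
A sketch of the fitness aggregation being proposed here (the function names and the softmin choice are my own assumptions; per-device scores would come from the same kind of hardware measurement as in the sketch above):

    import math

    def robust_fitness(candidate, devices, score_on_device, beta=10.0):
        # Score the same candidate on several physical devices (different
        # batches, temperatures, ...) and combine with a softmin: as beta
        # grows, the result is dominated by the worst-behaving device, so
        # evolution can't overfit one chip's quirks. `score_on_device` is a
        # hypothetical measurement hook.
        scores = [score_on_device(candidate, d) for d in devices]
        return -math.log(sum(math.exp(-beta * s) for s in scores)) / beta

    def device_pool(all_devices, generation, start=2, grow_every=25):
        # Optionally start with a couple of devices and add more as
        # generations pass, along the lines of the reply above.
        return all_devices[: min(len(all_devices), start + generation // grow_every)]

    # Example with made-up per-device scores in [0, 1]:
    fake_score = lambda cand, dev: {"A": 0.95, "B": 0.90, "C": 0.40}[dev]
    print(round(robust_fitness("x", ["A", "B", "C"], fake_score), 3))   # ~0.399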

s4mbh4
9h ago
Said paper: https://gwern.net/doc/ai/1997-thompson.pdf

Answering another commenter's question: yes, the final result was dependent on temperature. The author did try using it over different temperatures; it was only able to operate in the range of temperatures it was trained at.

Fig. 8 goes into detail.

ermaa
21h ago
Great work and pleasant reading!
z3ratul163071
1d ago
amazing and very informative work. thank you!
typeofhuman
9h ago
Ah the 386. I still remember our home PC getting an upgrade from the 286. It was a big deal in our home. I watched intently and as close as was permitted. Amazed by all the internals of the cabinet. I was so young. So curious. A seed was planted.
dcassett
19h ago
> (Note 4) But to write a value into the latch, the switch is enabled and its output overpowers the weak inverter.

This implementation is sometimes called a "jam latch" (the new value is "jammed" into the inverter loop).
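
A toy model of that fight between drivers (drive strengths as small integers are my own abstraction of the real transistor sizing; a sketch of the idea, not the 386's circuit):

    # "Jam latch" idea: the storage node is held by a weak feedback inverter,
    # and writing simply overpowers it with a stronger driver through the
    # enabled switch. Resolve the node by taking the strongest driver.
    WEAK, STRONG = 1, 2

    class JamLatch:
        def __init__(self, value=0):
            self.node = value

        def read(self):
            return self.node

        def write(self, value, write_enable):
            drivers = [(WEAK, self.node)]          # keeper re-drives the old value
            if write_enable:
                drivers.append((STRONG, value))    # switch jams in the new value
            self.node = max(drivers)[1]            # strongest driver wins

    latch = JamLatch(0)
    latch.write(1, write_enable=False)
    print(latch.read())   # 0: switch disabled, the weak keeper holds
    latch.write(1, write_enable=True)
    print(latch.read())   # 1: the stronger driver overpowered the keeper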

View full discussion on Hacker News
ID: 46020543 · Type: story · Last synced: 11/23/2025, 9:18:21 AM
