OrthoRoute – GPU-accelerated autorouting for KiCad
Mood: thoughtful
Sentiment: positive
Category: tech
Key topics: KiCad, GPU-accelerated autorouting, PCB design

The post introduces OrthoRoute, a GPU-accelerated autorouting tool for KiCad, sparking discussion on its potential applications, design decisions, and the role of automation in PCB design.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion. First comment: 41m after posting. Peak period: 4 comments in Hour 2. Average per period: 1.7. Based on 22 loaded comments.
Key moments
- Story posted: 11/18/2025, 6:54:54 PM (1d ago)
- First comment: 11/18/2025, 7:36:13 PM (41m after posting)
- Peak activity: 4 comments in Hour 2, the hottest window of the conversation
- Latest activity: 11/19/2025, 5:54:53 PM (1h ago)
heart attack
There are videos from JLCPCB (one of the biggest). That stuff is 90% automated.
I was inspired by this video from bitluni, a cluster of $0.10-0.20 RISC-V microcontrollers: https://www.youtube.com/watch?v=HRfbQJ6FdF0. For ten or twenty cents, these chips have a lot of GPIOs compared to other extremely low-cost microcontrollers: 18 GPIOs on the CH32V006F4U6. This got me thinking: what if I built a cluster of these chips? Basically re-doing bitluni's build.
But then I started thinking: at ten cents a chip, you could scale this to thousands. But how do you connect them? That problem was already solved in the 80s, with the Connection Machine. The basic idea here is to get 2^(whatever) chips and connect them so each chip connects to (whatever) many other chips. The Connection Machine sold this as a hypercube, but it's better described as a Hamming-distance-one graph or something.
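The wiring rule is compact enough to state in code: chip i links to every index at Hamming distance one, i.e. i XOR (1 << k) for each bit position k. A quick sketch (the function name is mine, not from the project):

```python
# "Hamming-distance-one" wiring: in a d-dimensional hypercube, chip i
# links to every chip whose index differs from i in exactly one bit.
# Illustrative only; d=12 matches the 4096-chip build described here.

def hypercube_neighbors(i: int, d: int = 12) -> list[int]:
    """Indices of the d chips directly linked to chip i."""
    return [i ^ (1 << k) for k in range(d)]

# Chip 0 talks to one chip per dimension: 1, 2, 4, 8, ..., 2^(d-1).
print(hypercube_neighbors(0, 4))  # [1, 2, 4, 8]
```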
So I started building that. I did the LEDs first, just to get a handle on dealing with thousands of parts: https://x.com/ViolenceWorks/status/1987596162954903808 and started laying out the 'cards' of this thing. With a hypercube topology you can split the cube into different parts, so this thing is made of sixteen cards (2^4), with 256 chips on each card (2^8), meaning 4096 (2^12) chips in total. This requires a backplane. A huge backplane with 8196 nets. Non-trivial stuff.
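A quick way to see why the backplane dominates: assume, purely for illustration, that the low 8 bits of a chip's index select its position on a card and the high 4 bits select the card. Then any hypercube link that flips a card-select bit has to cross the backplane, and counting those links lands right around the net figure quoted above:

```python
# Split the 2^12 chip index across the hardware (illustrative bit
# assignment, not the actual board layout): low 8 bits = chip on card,
# high 4 bits = card. A link flipping a low bit stays on-card; a link
# flipping a high bit must cross the backplane.

D, CARD_BITS = 12, 4

def crosses_backplane(i: int, k: int) -> bool:
    """Does the hypercube link from chip i along dimension k leave its card?"""
    return k >= D - CARD_BITS  # flipping one of the high card-select bits

backplane_links = sum(
    1
    for i in range(1 << D)
    for k in range(D)
    if crosses_backplane(i, k)
) // 2  # each link is counted once from each end

print(backplane_links)  # 8192 inter-card links under this split
```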
So the real stumbling block for this project is the backplane, and writing an autorouter was basically the only way I could figure out to build it. It's a fun project that really couldn't have been done before the launch of KiCad 9; the new IPC API was a necessity to make this a reality. After that it's just some CuPy for the sparse matrices and a few blockers while adapting PathFinder to circuit boards.
Last week I finished up the 'cloud routing' functionality and was able to run this on an A100 80GB instance on Vast.ai; the board wouldn't fit in the 16GB 5080 I used for testing. That instance took 41 hours to route the board, and now I have the result back on my main battlestation, ready for the bit of hand routing that's still needed. No, it's not perfect, but it's an autorouter. It's never going to be perfect.
This was a fun project but what I really should have been doing the past three months or so is grinding leetcode. It's hard out there, and given that I've been rejected from every technician job I've applied to, I don't think this project is going to help me. Either way, this project.... is not useful. There's probably a dozen engineers out there in the world that this _could_ help.
So, while it's working for my weird project, this is really not what hiring managers want to see.
I read the write-up with a LOT of interest; this is really amazing work. There aren't a lot of good options for auto-routing with open-source PCB tools (i.e. KiCad). I have also used the other autorouter you mentioned for "low-complexity" boards in KiCad; it helped do the job but was painful.
In my career I've also used the autorouters built into the "high-end" PCB tools, and they could handle the complexity of boards you outlined WITHOUT needing a massive GPU, but they also paid people to improve this stuff for 15 to 20 years, and development happened when single-core computers with limited RAM were the norm.
On the technical side: somewhat more recent FPGA 'placement' algorithms used simulated annealing. While what you did isn't about placement, that approach could possibly help with 'net cross-over reduction' type passes, and maybe with designs where you can do port swap / pin swap.
I'm amused you made a RISC-V array with discrete parts -- I'm sure you considered using an FPGA? Jan Gray has done 1000+ RISC-V cores (https://fpga.org/grvi-phalanx/) in "older" Xilinx FPGAs.
If you're trying to emulate Thinking Machines / CM-x or anything else, frankly I think a "mondo" FPGA is still the way to go.
Job-wise: A suggestion might be to reach out to the guys at AllSpice ( allspice.io ) who make revision control software for Altium and possibly KiCad. The work you did to enable IPC, etc seems like exactly the type of skillset these guys might need (contractor, maybe full-time?) to interoperate with KiCad.
If I see anything that might be up your alley I'd also reach out. I'm not in a position to hire anyone and while "some companies" may not be impressed by what you did, the right organization WOULD be.
I share your sentiment that at "modern" companies like Apple, MSFT, etc., the hiring process is really tailored to "I want a guy who can do X" and rarely "I want a guy who's shown he can learn Y and Z, so he can certainly do X".
Yeah, that was the first step in creating the netlist for the backplane. Simulated annealing on the 8196 nets. TO BE FAIR, this would be a lot easier to route if I didn't explicitly want each of the 16 cards to be identical, but I think that's the most cost-effective way to do it.
As far as an FPGA.... I don't know if I see the point. The nodes in the original CM-1 were basically _only_ ALUs. Very little processing power. The CM-5 was a little better, but this entire thing is batshit crazy. I might as well go for four thousand individually programmable cores and see what it can actually do.
The reason an FPGA is a more suitable platform is that you can translate the "physical effort of making PCBs" into "creating a design in an infinitely re-programmable platform" and change your design as needed to your heart's content.
In fact, the original design of RISC-V included a bus called 'TileLink' to enable 'Many core' arrays of RISC-V processors.
Translation: you can pare down open-source RISC-V cores, use TileLink, and emulate a CM or build something more complex as you see fit, since that was built into the original open-source RISC-V specs.
FPGAs are their own joy and pain for sure and it's not as "cool" to re-program a blackbox on a PCB as it might be to make your own thing, so all depends on your goals.
Either option is cool, though.
Where the GPU router comes in is the geometric part: obeying layer stack, via rules, keepouts, blind-via constraints, etc. You can absolutely hand-encode one or two nice symmetric patterns in code; this board is ‘what if we made the search space big enough that you want Dijkstra + PathFinder + sparse GPU data structures to do it for you’.
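For readers unfamiliar with PathFinder: it resolves congestion by letting nets negotiate, re-pricing overused routing nodes each iteration so that Dijkstra gradually detours all but one net away. A minimal sketch of that cost model, with names and constants of my own choosing rather than OrthoRoute's actual code:

```python
# Sketch of PathFinder-style negotiated congestion (the classic FPGA
# routing scheme this project adapts to PCB routing nodes).

def node_cost(base: float, history: float, occupancy: int,
              capacity: int, pres_fac: float) -> float:
    """Cost of pushing one more net through a routing node this iteration."""
    over = max(0, occupancy + 1 - capacity)   # would adding us overuse it?
    present = 1.0 + over * pres_fac           # present-congestion penalty
    return (base + history) * present

def update_history(history: float, occupancy: int, capacity: int,
                   hist_fac: float = 1.0) -> float:
    """After each rip-up-and-reroute pass, overuse leaves a permanent mark."""
    return history + hist_fac * max(0, occupancy - capacity)

# A node shared by two nets (capacity 1) gets steadily more expensive,
# so on later passes Dijkstra naturally routes one net around it.
history, costs = 0.0, []
for pres_fac in (0.5, 1.0, 2.0):   # pres_fac typically grows per pass
    costs.append(node_cost(1.0, history, occupancy=2, capacity=1,
                           pres_fac=pres_fac))
    history = update_history(history, occupancy=2, capacity=1)
print(costs)  # strictly increasing: the shared node prices itself out
```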
In that case, can't you exploit the inherent symmetry in the design here to only route a quarter of your connectors and then mirror/rotate the result for the rest? Or, if you have an X*X matrix, route one side minus the corners and replicate it to the other sides?
Also, with such a huge connection board, it smells like an NIH issue here. I think you'd better serialize the IO to a bus (whatever) and a few lines and perform the connection in software (in a GoWin FPGA for example, both extremely cheap and quite powerful). Just think of the harness you'll need to build to fit the connectors in, the obvious routing bugs, and so on. Any maintenance will be a nightmare if you need to swap 2 pins on a connector or re-run the routing.
As far as symmetry goes, there really isn't any. For example, Board 0 connects to 1, 2, 4, and 8. Board 1 connects to 0, 3, 5, and 9. Board 3 connects to 1, 2, 7, and 11.
There's one way I can think of to make this routing easier: of the 16 daughter boards, make the pinout unique to each one. If I were doing this as a product, for manufacturing, this is exactly what I would do: rearrange the pins on each daughter card so it would be easier to route. The drawback of this technique is that there would be 16 different varieties of daughter card, which isn't economical if you're just building one of these things.
So, with those constraints the only real optimization I have left is ensuring that the existing net plan is optimal. I already did that when I generated the netlist; used simulated annealing to ensure the minimal net length for the board before I even imported it into KiCad.
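A hedged sketch of what that annealing pass might look like, using a toy 1-D length model and names of my own invention (the real objective is routed backplane length over 8196 nets, not this):

```python
# Simulated annealing over a net->pin assignment: propose random pin
# swaps, always accept improvements, and accept uphill moves with a
# probability that shrinks as the temperature cools.
import math
import random

def total_length(assign: list[int], pos: list[int]) -> int:
    """Toy cost: net i wants a pin near coordinate i; sum the misses."""
    return sum(abs(pos[p] - i) for i, p in enumerate(assign))

def anneal(n: int, steps: int = 20000, t0: float = 5.0, seed: int = 0):
    rng = random.Random(seed)
    pos = list(range(n))            # pin coordinates (toy: a 1-D row)
    assign = list(range(n))         # assign[i] = pin carrying net i
    rng.shuffle(assign)             # start from a random assignment
    cost = total_length(assign, pos)
    for s in range(steps):
        t = t0 * (1 - s / steps) + 1e-9              # linear cooling
        a, b = rng.randrange(n), rng.randrange(n)
        assign[a], assign[b] = assign[b], assign[a]  # propose a pin swap
        new = total_length(assign, pos)
        if new <= cost or rng.random() < math.exp((cost - new) / t):
            cost = new                               # accept (maybe uphill)
        else:
            assign[a], assign[b] = assign[b], assign[a]  # revert the swap
    return assign, cost

shuffled_cost = anneal(32, steps=0)[1]  # cost before any annealing
best, cost = anneal(32)                 # cost after 20k swap proposals
```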
And yeah, serializing the IO would be better, but even better than that would be to emulate the entire system in a giant black box of compute. But then I wouldn't have written a GPU autorouter. I'm trying not to, but there is some optimization for _cool_ here, you know?