Reverse Engineering a Neural Network's Clever Solution to Binary Addition (2023)
Key topics
The article discusses the reverse engineering of a small neural network that learned to perform binary addition, revealing an unexpected analog-computing approach and sparking discussion about the network's internal representations and potential applications.
Snapshot generated from the HN discussion
Discussion Activity
- Active discussion
- First comment: 4d after posting
- Peak period: 13 comments in the 96-108h window
- Avg / period: 8
Based on 16 loaded comments
Key moments
- Story posted: Nov 4, 2025 at 2:22 AM EST (about 2 months ago)
- First comment: Nov 8, 2025 at 8:02 AM EST (4d after posting)
- Peak activity: 13 comments in the 96-108h window (hottest window of the conversation)
- Latest activity: Nov 8, 2025 at 6:13 PM EST (about 2 months ago)
My intuition is that this kind of solution allows for some form of gradual, gradient-based approach toward it, which is why it's unintuitive: we tend to think of solutions as all-or-nothing and look for complete ones.
I'm reminded of the Feynman anecdote when he went to work for Thinking Machines and they gave him some task related to figuring out routing in the CPU network of the machine, which is a discrete problem. He came back with a solution that used partial differential equations, which surprised everyone.
Looking at it later, there's also a strange DAC that gives the lowest resistance to the least significant bit, thus making it the biggest contributor to the output. Very confusing.
On that hunch, I just checked and I get 32896.
Edit: if I exclude either input being zero, I get 32385.
You also get the same number when including input zeros but excluding results above 253. But I'd bet on the author's reason being the filtering of input zeros. Maybe the NN does something bad with zeros, or maybe it can't learn them for some reason.
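For reference, all three counts are consistent with counting ordered 8-bit input pairs filtered by whether the sum still fits in the output range; a quick sketch of that (assumed) counting scheme:

```python
# Reproducing the counts above, assuming ordered 8-bit input pairs (a, b)
# filtered on whether the sum fits the 8-bit output range -- an assumption
# about the counting scheme, not something the thread states explicitly.
pairs = [(a, b) for a in range(256) for b in range(256)]

print(sum(1 for a, b in pairs if a + b <= 255))                      # 32896
print(sum(1 for a, b in pairs if a > 0 and b > 0 and a + b <= 255))  # 32385
print(sum(1 for a, b in pairs if a + b <= 253))                      # 32385
```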
My takeaway is also that we don't really have a good intuition yet for how the internal representations of neural networks "work", or what kinds of internal representations can even be learned through SGD + backpropagation. (And also how those representations depend on the architecture.)
Like in this case, where the author first imagined the network would learn a logic network, but the end result was more like an analog circuit.
It's possible to construct the "binary adder" network the author imagined "from scratch" by handpicking the weights. But it would be interesting to know whether that network could also be learned, or whether SGD would always produce an "analog" solution like this one.
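As a rough illustration of that hand-constructed version, here is a minimal sketch of an 8-bit ripple-carry adder built from ReLU units with handpicked weights. It is just one possible construction, not the network from the article:

```python
import numpy as np

def relu(x):
    return np.maximum(x, 0.0)

def full_adder_relu(a, b, carry_in):
    """A hand-weighted ReLU 'full adder': two hidden units plus a linear readout."""
    s = a + b + carry_in           # linear in the three input bits: 0, 1, 2 or 3
    h1 = relu(s - 1.0)             # 0, 0, 1, 2
    h2 = relu(s - 2.0)             # 0, 0, 0, 1
    carry_out = h1 - h2            # 1 iff at least two inputs are 1 (majority)
    sum_bit = s - 2.0 * carry_out  # s mod 2, i.e. XOR of the three inputs
    return sum_bit, carry_out

def add_8bit(x, y):
    """Ripple-carry addition of two 8-bit integers, modulo 256."""
    carry, out = 0.0, 0
    for i in range(8):
        s, carry = full_adder_relu((x >> i) & 1, (y >> i) & 1, carry)
        out |= int(round(s)) << i
    return out

assert all(add_8bit(x, y) == (x + y) % 256
           for x in range(256) for y in range(0, 256, 17))
```

The construction shows the target is trivially representable with piecewise-linear units; the open question in the comment is whether SGD ever finds this kind of discrete solution rather than the analog one the author observed.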
There is some evidence that the activation functions and weights can be arbitrarily selected assuming you have a way to evolve the topology of the network.
https://arxiv.org/abs/1906.04358
So, binary addition over [0, 256) in base 10, i.e. 8-bit operands. Did the author try the trained network on numbers outside the training range?
It's one thing to find that your neural net discovered this one neat trick for binary addition with 8-bit numbers, and something completely different to find that it figured out binary addition in the general case.
How hard the latter would be... depends, among other things, on the activation functions. E.g. it is quite possible to learn to add two (arbitrary, base-10) integers with a simple regression, for no other reason than that regression is itself based on addition (OK, summation).
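To illustrate the regression point with a toy sketch: ordinary least squares on the raw operands recovers weights of roughly [1, 1] and therefore extrapolates far beyond the training range:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.integers(0, 256, size=(1000, 2)).astype(float)  # training operands in [0, 255]
y = A.sum(axis=1)                                        # targets: a + b
w, *_ = np.linalg.lstsq(A, y, rcond=None)                # fits w ~= [1.0, 1.0]

test = np.array([[1_000_000.0, 2_345_678.0]])            # far outside the training range
print(test @ w)                                          # ~= [3345678.]
```

The article's network, by contrast, takes individual bits as inputs, so "outside the training range" would mean widening the input layer rather than just feeding it larger values.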
So: DAC + sum in the analog domain + ADC is what the NN is doing.
Roughly speaking, it seems the network is essentially converting the binary digits to orthogonal basis functions, manipulating those basis functions, and finally applying a linear transformation back into the binary-digit space.
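A minimal numeric sketch of that DAC → analog sum → ADC reading (place-value weights and comparator-style thresholds handpicked here for illustration; the trained network's actual weights and activations will look different):

```python
import numpy as np

BITS = 8
place_values = 2.0 ** np.arange(BITS)      # "DAC" weights for bits b0..b7 (LSB first)

def dac(bits):
    """Binary -> analog: a single weighted sum, i.e. one linear neuron."""
    return float(np.dot(bits, place_values))

def adc(value):
    """Analog -> binary: peel bits off with comparator-style thresholds."""
    bits = np.zeros(BITS + 1)              # one extra bit for the carry out
    for i in range(BITS, -1, -1):
        if value >= 2.0 ** i:
            bits[i] = 1.0
            value -= 2.0 ** i
    return bits

a = np.array([1, 0, 1, 1, 0, 0, 1, 0], dtype=float)  # 77, LSB first
b = np.array([0, 1, 1, 0, 1, 0, 1, 0], dtype=float)  # 86, LSB first
analog_sum = dac(a) + dac(b)                          # the addition happens in the analog domain
print(adc(analog_sum))                                # bits of 163, LSB first
```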