The Dragon Hatchling: the Missing Link Between the Transformer and Brain Models

Posted2 months agoActive2 months ago

thatxliner

134 points

99 comments

arxiv.orgTechstoryHigh profile

skepticalmixed

Debate

80/100

AINeuromorphic ComputingTransformer Architecture

Key topics

Neuromorphic Computing

Transformer Architecture

A new paper proposes the 'Dragon Hatchling' model, a biologically-inspired AI architecture that rivals GPT-2 performance, sparking debate among HN users about its innovation, scalability, and potential.

Snapshot generated from the HN discussion

Discussion Activity

Very active discussion

First comment

46m

Peak period

0-12h

Avg / period

16.5

Comment distribution99 data points

Loading chart...

Based on 99 loaded comments

Key moments

01Story posted
Oct 22, 2025 at 9:00 AM EDT
2 months ago
Step 01
02First comment
Oct 22, 2025 at 9:45 AM EDT
46m after posting
Step 02
03Peak activity
94 comments in 0-12h
Hottest window of the conversation
Step 03
04Latest activity
Oct 28, 2025 at 11:29 AM EDT
2 months ago
Step 04

Generating AI Summary...

Analyzing up to 500 comments to identify key contributors and discussion patterns

Discussion (99 comments)

Showing 99 comments

bob1029

2 months ago

1 reply

The nature of the abstract is making me hesitate to go any further on this one. It doesn't even seem to fit within arxiv's web layout.

CaptainOfCoit

2 months ago

3 replies

Judging science based on the layout of a webpage feels less than ideal :/ The PDF seems to render just fine.

bob1029

2 months ago

1 reply

This doesn't change the fact that the PDF contains a ~440 word abstract. It comes off as a defensive marketing pitch when it's this long.

batuhandumani

2 months ago

You're truly judging the book by its cover, but I have to give credit where it's due the abstract is very long.

fxwin

2 months ago

While it's less than ideal it's also a very useful (and pretty accurate) heuristic to weed out papers in an area where it is impossible to read all (or even most) of them

oofbey

2 months ago

It’s a clear signal the paper is gonna be hard to read. It takes a ton of work to compress complex ideas down to 8 pages for a conference paper. But that work makes it easier to understand. This paper did not do that work. In fact it seems they did the opposite: try to write a LONG paper as if that shows how much originality they have.

moffkalast

2 months ago

2 replies

Another day, another neuromorphic AI group still trying to make aeroplanes with flapping wings. This time it'll surely work, they've attached a crank to the jet engine to drive their flapping apparatus.

bobbyprograms

2 months ago

3 replies

Birds safely do VTOL. Humans still haven’t figured that out yet (helicopters are so dangerous).

I think that it is necessary we try all things because LLMs as we know them take too much energy.

toxik

2 months ago

2 replies

Uh, this is a strange thing to ask, but have you seen birds fly? It is most certainly not vertical take off (or landing.)

fragmede

2 months ago

hummingbirds would seem to be the exception tho

deepanwadhwa

2 months ago

It doesn't sound strange at all. Good question.

falcor84

2 months ago

1 reply

Just regarding VTOL, we absolutely have figured out how to do it with quadcopter drones safely handling weights comparable to the largest birds, and it seems to me that scaling these up to carry humans will be relatively straightforward [0].

[0] https://en.wikipedia.org/wiki/Passenger_drone

bobbyprograms

2 months ago

1 reply

As long as you have rapidly spinning BLADES it’s not safe

falcor84

2 months ago

Why? Planes also have rapidly spinning blades, but the actual risk from getting hurt by the blades is probably two full orders of magnitude lower than the risk from the whole thing crashing into something at speed, which is an inherent risk when you have massive objects moving quickly, regardless of mechanism.

moffkalast

2 months ago

We can absolutely do VTOL reliably at bird scale, look at acrobatic RC planes and quad drones. Even the smallest helicopters are massive and the square cube rule will always make that dangerous in this gravity. But also like, harrier, F-35, osprey, etc.

smohare

2 months ago

While the scornful analogy there is dubious at best, I’d certainly reserve far more skepticism for the groups claiming AGI is within reach via the current crop of mathematically simplistic models

fxwin

2 months ago

3 replies

I haven't read through the entire thing yet, but the long abstract combined with the way the acronym BDH is introduced (What does the B stand for?) along with the very "flowery" name (When neither "dragon" nor "hatchling" appears again past page 2) is rather offputting

- It seems strange to make use of the term "scale-free" and then defer a definition until half way through the paper (in fact, the term is mentioned 3 times after, and 14 times before said definition)

- This might just be CS people doing CS things, but the notation in the paper is awful: Claims/Observations end with a QED-symbol (for example on pages 29 and 30) but without a proof

- They make strong claims about performance and scaling ("It exhibits Transformer-like scaling laws") but the only (i think?) benchmark is a translation task comparison with <1B models, ,which is ~2 orders of magnitude smaller than sota

mwigdahl

2 months ago

1 reply

The B stands for "Baby". Baby Dragon Hatchling is their model name.

fxwin

2 months ago

Seems like this should be in the paper! Thanks though

halfdeadcat

2 months ago

1 reply

It's a 'dragon hatchling' because it is 'scale-free'.

fxwin

2 months ago

Hah, that's pretty clever if it's true .D

dxtrous

2 months ago

1 reply

> Claims/Observations end with a QED-symbol

Author comment: as a fairly common convention, QED immediately after a particular statement means that the statement should be considered proven. Depending on the text, this may either be because the statement (Observation) is self-explanatory, or, the discussion in the text leading up to the statement is sufficient, or, whenever the final statement of a Theorem follows as a direct corollary of Lemmas previously proven in the text.

fxwin

2 months ago

1 reply

I could agree with that, but the example on p29 (Claim 6) ends with QED, but only then the proof follows. I realize I'm nitpicking form here, but still

dxtrous

2 months ago

Well spotted. Thank you!

PaulRobinson

2 months ago

1 reply

[Not a specialist, just a keen armchair fan of this sort of work]

> In addition to being a graph model, BDH admits a GPU-friendly formulation.

I remember about two years ago people spotting that if you just moved a lot of weights through a sigmoid and reduced floats down to -1, 0 or 1, we barely lost any performance from a lot of LLM models, but suddenly opened up the ability to use multi-core CPUs which are obviously a lot cheaper and more power efficient. And yet, nothing seems to have moved forward there yet.

I'd love to see new approaches that explicitly don't "admit a GPU-friendly formulation", but still move the SOTA forward. Has anyone seen anything even getting close, anywhere?

> It exhibits Transformer-like scaling laws: empirically BDH rivals GPT2 performance on language and translation tasks, at the same number of parameters (10M to 1B), for the same training data.

That is disappointing. It needs to do better, in some dimension, to get investment, and I do think alternative approaches are needed now.

From the paper though there are some encouraging side benefits to this approach:

> [...] a desirable form of locality: important data is located just next to the sites at which it is being processed. This minimizes communication, and eliminates the most painful of all bottlenecks for reasoning models during inference: memory-to-core bandwidth.

> Faster model iteration. During training and inference alike, BDH-GPU provides insight into parameter and state spaces of the model which allows for easy and direct evaluation of model health and performance [...]

> Direct explainability of model state. Elements of state of BDH-GPU are directly localized at neuron pairs, allowing for a micro-interpretation of the hidden state of the model. [...]

> New opportunities for ‘model surgery’. The BDH-GPU architecture is, in principle, amenable to direct composability of model weights in a way resemblant of composability of programs [...]

These, to my pretty "lay" eyes look like attractive features to have. The question I have is whether the existing transformer based approach is now "too big to fail" in the eyes of people who make the investment calls, and whether this will get the work it needs to get it from GPT2 performance to GPT5+.

sdenton4

2 months ago

1 reply

/I'd love to see new approaches that explicitly don't "admit a GPU-friendly formulation", but still move the SOTA forward. Has anyone seen anything even getting close, anywhere?/

The speedup from using a GPU over a CPU is around 100x, as a rule of thumb. And there's been an immense amount of work maximizing throughput when training on a pile of GPUs together... And a sota model will still take a long time to train. So even if you do have a non-GPU algo which is better, it'll take you a very very long time to train it - by which point the best GPU algos will have also improved substantially.

janwas

2 months ago

Wow, that number requires STRONG caveats, lest it be called out as completely false. Take away the tensor cores (unless you only do matmuls?), and an H100 has roughly 2x as many f32 flops as a Zen5 CPU, which is considerably cheaper. I suspect brute force HW/algorithms are not going to age well: https://www.sigarch.org/dont-put-all-your-tensors-in-one-bas... (/personal opinion)

gcr

2 months ago

5 replies

a cursory sniff test of the abstract reveals greater-than-trace presence of bullshit

I would trust this paper far more if it didn’t trip my “crank” radar so aggressively

Red flags from me:

- “Biologically-inspired,” claiming that this method works just like the brain and therefore should inherit the brain’s capabilities

- Calling their method “B. Dragon Hatchling” without explaining why, or what the B stands for, and not mentioning this again past page 2

- Saying all activations are “sparse and positive”? I get why being sparse could be desirable, but why is all positive desirable?

These are stylistic critiques and not substantive. All of these things could be “stressed grad student under intense pressure to get a tech report out the door” syndrome. But a simpler explanation is that this paper just lacks insight

AmazingTurtle

2 months ago

3 replies

Your quotes (“ and ”) and apastrophes (’) make me think this was written by AI as no sane human wouldn't use " or '

cootsnuck

2 months ago

Some people draft stuff in different places, different devices, or whatever and then copy paste, just a heads up.

gcr

2 months ago

pardon, i use those for literals/proper nouns. it's a "typing quirk," LLMs wouldn't output with that style

fxwin

2 months ago

could just be non-english autocorrect or keyboard layout

terminalshort

2 months ago

> claiming that this method works just like the brain and therefore should inherit the brain’s capabilities

Your words, not the author's. They did not make this claim.

empiko

2 months ago

What about the fact that it has 45 pages with exactly one comparison with transformer architecture (Figure 7)? This is just a fluff piece for a company trying to raise funding.

dxtrous

2 months ago

One of the authors here. I am genuinely thrilled to know someone would like to know what "B" stands for, knowing/expecting that "DH" stands for "Dragon Hatchling"!

A clarification is thus due. As indicated in the github repo: "BDH" stands for "Baby Dragon: Hatchling". Technically, "Hatchling" would perhaps be the version number.

For readers who find this discussion point interesting, I recommend considering for context: 1. The hardware naming patterns in the Hopper/Gracehopper architecture. 2. The attitude to acronyms taken in the CS Theory vs. Applied CS/AI community.

daemonologist

2 months ago

I suppose all positive saves you a bit per weight, sort of, and potentially some circuitry to deal with negative numbers.

ZeroCool2u

2 months ago

1 reply

This is one of the first papers in the neuromorphic vein that I think may hold up. It would be amazing if it did too due to the following properties:

-Linear (transformer) complexity at training time

-Linear scaling with number of tokens

-Online learning(!!!)

The main point that made me cautiously optimistic:

-Empirical results on par with GPT-2

I think this is one of those ideas that needs to be tested with scaled up experiments sooner rather than later, but someone with budget needs to commit. Would love to see HuggingFace do a collab and throw a bit of $$$ at it with a hardware sponsor like Nvidia.

deviation

2 months ago

2 replies

I guarantee if there's even a 0.1% chance of this architecture eventually outperforming traditional ones, then Zuckerberg et al are already eating the cost and have teams spinning up experiments doing just that.

nickpsecurity

2 months ago

1 reply

That's not true. The AI industry appears to play a game of follow the leader copying other companies and major researchers. There's all kinds of good ideas we never see applied by big companies. So, it's not safe to assume they tried them all and they didn't work.

In fact, we've sometimes seen new companies show up with models based on research big companies didn't use, the new models are useful or better in some way, and people use them or big companies acquire them. I'd say that's proof big companies miss a lot of good ideas internally.

ACCount37

2 months ago

Not every company is investigating every direction. Like, it's clear that Google is investing a lot in embodiment and multimodal understanding, but Anthropic barely cares about either. Across the field though?

I think it's fairly safe to say that every remotely promising thing that showed up in the papers was tried at some big lab at least once. If it showed good results, they'd pick it up.

ZeroCool2u

2 months ago

Absolutely agreed, but we may not even hear about it as Meta has made it clear they're not necessarily committed to the open source first policy at this point.

kouteiheika

2 months ago

1 reply

> Performance of BDH-GPU and GPTXL versus model size on a translation task. [...] On the other hand, GPTXL [...] required Dropout [...] The model architecture follows GPT2

I love when a new architecture comes out, but come on, it's 2025, can we please stop comparing fancy new architectures to the antiquated GPT2? This makes the comparison practically, well, useless. Please pick something more modern! Even the at-this-point ubiquitous Llama would be a lot better. I don't want to have to spend days of my time doing my own benchmarks to see how it actually compares to a modern transformer (and realistically, I was burned so many times now that I just stopped bothering).

Modern LLMs are very similar to GPT2, but those architectural tweaks do matter and can make a big difference. For example, take a look at the NanoGPT speedrun[1] and look at how many training speedups they got by tweaking the architecture.

Honestly, everyone who publishes a paper in this space should read [2]. The post talks about optimizers, but this is also relevant to new architectures too. Here's the relevant quote:

> With billions of dollars being spent on neural network training by an industry hungry for ways to reduce that cost, we can infer that the fault lies with the research community rather than the potential adopters. That is, something is going wrong with the research. Upon close inspection of individual papers, one finds that the most common culprit is bad baselines [...]

> I would like to note that the publication of new methods which claim huge improvements but fail to replicate / live up to the hype is not a victimless crime, because it wastes the time, money, and morale of a large number of individual researchers and small labs who run and are disappointed by failed attempts to replicate and build on such methods every day.

Sure, a brand new architecture is most likely not going to compare favorably to a state-of-art transformer. That is fine! But at least it will make the comparison actually useful.

[1] -- https://github.com/KellerJordan/modded-nanogpt

[2] -- https://kellerjordan.github.io/posts/muon/#discussion-solvin...

jacobgorm

2 months ago

How would you actually get funding for that, if not by demonstrating that it works on smaller models first?

lackoftactics

2 months ago

1 reply

The authors seem to have good credentials and I found the repo with code for this paper.

https://github.com/pathwaycom/bdh

There isn't a ton of code and there are a lot comments in my native language, so at least that is novel to me

jacobgorm

2 months ago

There is an adaption to vision here, looks very promising: https://github.com/takzen/vision-bdh

CaptainOfCoit

2 months ago

1 reply

> It exhibits Transformer-like scaling laws: we find empirically that BDH rivals GPT2-architecture Transformer performance on language and translation tasks, at the same number of parameters (10M to 1B), for the same training data.

I'm assuming they're using "rivals GPT2-architecture" instead of "surpasses" or "exceeds" because they got close, but didn't manage to create something better. Is that a fair assessment?

ACCount37

2 months ago

Pretty much.

Everyone and their dog says "transformer LLMs are flawed", but words are cheap - and in practice, no one seems to have come up with something that's radically better.

Sidegrades yes, domain specific improvements yes, better performance across the board? Haha no. For how simple autoregressive transformers seem, they sure set a high bar.

alyxya

2 months ago

3 replies

I tried understanding the gist of the paper, and I’m not really convinced there’s anything meaningful here. It just looks like a variation of the transformer architecture inspired by biology, but no real innovation or demonstrated results.

> BDH is designed for interpretability. Activation vectors of BDH are sparse and positive.

This looks like the main tradeoff of this idea. Sparse and positive activations makes me think the architecture has lower capacity than standard transformers. While having an architecture be more easily interpretable is a good thing, this seems to be a significant cost to the performance and capacity when transformers use superposition to represent features in the activations spanning a larger space. Also I suspect sparse autoencoders already make transformers just as interpretable as BDH.

jimbo808

2 months ago

4 replies

There isn't. The title is totally clickbait.

Anything "brain-like" that fits into one single paper is bullshit.

ljlolel

2 months ago

1 reply

A real scientist wouldn’t use an imprecise term like “brain-like”

dxtrous

2 months ago

1 reply

You can, of course, use the almost equivalent scientific-sounding Latin-derived term ("neuromorphic"), buy popcorn, and come back with it for a discussion about memristors.

ljlolel

2 months ago

1 reply

Yes but “brain-like” is clickbait in the headline

dxtrous

2 months ago

Missed it - seems HN moderation (or OP?) changed the headline to the paper title.

astroflection

2 months ago

1 reply

The actual paper's title: "The Dragon Hatchling: The Missing Link between the Transformer and Models of the Brain"

Don't berate the authors for the HN submitter's carelessness.

jimbo808

2 months ago

The actual title is just as click baity

nickpsecurity

2 months ago

1 reply

That last line isn't true. To be brain-like, it only needs to imitate one thing in the brain. That thing is udually tested in isolation against observed results in human brains. Then, people will combine multiple, brain-inspired components in various ways.

That's standard in computational neuroscience. Our standard should simply be whether they are imitating an actual structure or technique in the brain. They usually mention that one. If they don't, it's probably a nonsense comparison to get more views or funding.

jimbo808

2 months ago

1 reply

I am genuinely baffled by this reply. Every single sentence you've typed is complete and utter nonsense. I'm going to bookmark this as a great example of the Dunning-Kruger effect in the wild.

Just to illustrate the absurdity of your point: I could claim, using your standard, that a fresh pile of cow dung is brain-like because it imitates the warmth and moistness of a brain.

nickpsecurity

2 months ago

1 reply

I'll ignore the insults and rhetoric to give some examples.

The brain-inspired papers have done realistic models of specific neurons, spiking, Hebbian learning, learning rates tied to neuron measurements, matched firing patterns, done temporal synchronization, hippocampus-like memory, and prediction-based synthesis for self-training.

Brain-like or brain-inspired appears to mean using techniques similar to the brain. They study the brain, develop models that match its machinery, implement them, and compare observed outputs of both. That, which is computational neuroscience, deserves to be called brain-like since it duplicates hypothesized, brain techniques with brain-like results.

Others take the principles or behavior of the above, figure out practical designs, and implement them. They have some attributes of the brain-like models or similar behavior but don't duplicate it. They could be called brain-inspired but we need to be careful. Folks could game the label by making things that have nothing to do with brain-like models or went very far away.

I prefer the be specific about what is brain-like or brain-inspired. Otherwise, just mention the technique (eg spiking NN) to let us focus on what's actually being done.

jimbo808

2 months ago

Be specific, provide examples. Much of the things brains do that we call intelligent are totally unknown to us. We have, quite literally, no idea what algorithms the brain employs. If you want to talk about intelligence, I don’t know why you’re talking about neuron spiking. We don’t talk about semiconductor voltages when we talk about computer programs we’re working on.

AI systems are software, so if you want to build something brain like, you need to understand what the brain is actually like. And we don’t.

dang

2 months ago

(Submitted title was "A Brain-like LLM to replace Transformers" - a mod changed it later, in keeping with the title rule at https://news.ycombinator.com/newsguidelines.html)

busssard

2 months ago

1 reply

This is like at the beginning or the end of the Crypto Bubble. Publish a whitepaper for the next model architecture and hope that uninformed people with money blow it up your companys... i mean blow up the economy.... i mean blow , ahh whatever you know

raincole

2 months ago

3 replies

> the end of the Crypto Bubble

BTC literally hit all time high this month, fyi.

cootsnuck

2 months ago

2 replies

What's your point?

House prices are at all time highs too. That doesn't mean the housing bubble never happened.

AnimalMuppet

2 months ago

1 reply

Means it's not the end of the crypto bubble.

Unless you're going to claim that previous large drops in crypto were perhaps bubbles, but this time it's real...

jimbo808

2 months ago

1 reply

Prices have risen by orders of magnitude, untethered to any measurable fundamentals, then crashed, multiple times. I'm not sure what other definition of bubble you're operating with...

AnimalMuppet

2 months ago

I'm not arguing that it's not a bubble. I'm conceding that cootsnuck may be claiming that this time it's not a bubble.

But if that's not the claim, then I'm saying that the current value makes it's clear that it's not the end of a bubble.

DaveZale

2 months ago

Housing could still be a bubble. Many homes on wheels in addition to the stationary ones in my neighborhood. But sure that's another topic.

jimbo808

2 months ago

Remember the dotcom bubble? There are still websites, by the way. Doesn't mean it wasn't a bubble.

busssard

2 months ago

ATH was in july. but yeah i am not talking about BTC. I am talking about the sheer number of Rugpulls that were done by people doing a etherium fork and claiming it as the new revolution in decentralized computing.

oofbey

2 months ago

Attention mechanisms are wonderfully interpretable as is. You can literally see which tokens each token is attending to. People don’t bother much these days. But that’s not a strong selling point.

recitedropper

2 months ago

5 replies

Repo seems legit, and some of the ideas are pretty novel. As always though, we'll have to see how it scales. A lot of interesting architectures have failed the GPT3+ scale test.

As a sidenote--does anyone really think human-like intelligence on silica is a good idea? Assuming it comes with consciousness, which I think is fair to presume, brain-like AI seems to me like a technology that shouldn't be made.

This isn't a doomer position; that human-like AI would bring about the apocalypse. It is one of empathy: At this point in time, our species isn't mature enough to have the ability to spin up conscious beings so readily. I mean look how we treat each other--we can't even treat beings we know to be conscious with kindness and compassion. Mix our immaturity with a newfound ability to create digital life and it'll be the greatest ethical disaster of all time.

It feels like researchers in the space think there is glory to be found in figuring out human-like intelligence on silicon. That glory has even attracted big names outside the space (see John Carmack), under the presumption that the technology is a huge lever for good and likely to bring eternal fame.

I honestly think it is a safer bet that, given how we aren't ready for such technology, the person / team who would go on to actually crack brain-like AI would be remembered closer to Hitler than to Einstein.

lr4444lr

2 months ago

1 reply

I mean look how we treat each other--we can't even treat beings we know to be conscious with kindness and compassion. Mix our immaturity with a newfound ability to create digital life and it'll be the greatest ethical disaster of all time.

Or maybe if we had artificial life to abuse, it would be a suffcient outlet for our destructive and selfish impulses so that we would do less of it to genuine life. Maybe it's just an extension of sport contests that scratch that tribal itch to compete and win. There are no easy answers to these questions.

recitedropper

2 months ago

In this thought experiment, I am considering artificial life genuine. I would agree that there could be productive outlets for our selfish impulses if there was something that mimicked their targets without consciousness to experience the externalities of such impulses.

That said, I think probably the best path would just be to build and foster technologies that help our species mature, so if one day we do get the ability to spin-up conscious beings artificially, it can be done in a manner that adds more beauty rather than despair to our universe.

ACCount37

2 months ago

1 reply

We have no clue what "consciousness" even is, let alone what the prerequisites are. Our best guesses are just that. Guesses. Guesswork based on information so sparse that astronomers in ancient Greece might have had a better time guessing what the stars truly are.

For all we know, an ICE in a 2001 Toyota truck is conscious too - just completely inhuman in its consciousness.

Nonetheless, here we are - building humanlike intelligence. Because it's useful. Having machines that think like humans do is very useful. LLMs are a breakthrough in that already - they implement a lot of humanlike thinking on a completely inhuman substrate.

recitedropper

2 months ago

1 reply

For the record, I'm agnostic to whether or not consciousnses is possible upon silica. I think it is pretty safe to say though that it likely is an emergent property of specifically-configured complex systems, and humanlike intelligence on silica is certainly something that might qualify.

I don't think appealing to whether or not inanimate objects may be conscious is sufficient to discount that we are toying with a different beast in machine learning. And, if we were to discover that inanimate objects are in-fact conscious, that would be an even greater reason to reconfigure our society and world around compassion.

I agree that LLMs are a great breakthrough, and I think there are many reasons to doubt consciousness there. But I would suggest we rest on our laurels for a bit, and see what we can get out of LLMs, rather than push to create something that is closer to mimicking humans because it might be more useful. From the evil perspective of pure utility, slaves are quite useful as well.

ACCount37

2 months ago

1 reply

The issue so far is that this "closer to mimicking humans" doesn't actually seem to give performance gains. So, why bother?

Existing LLMs are already trained to mimic humans - by imitating text, most of which is written by humans, or for humans, and occasionally both. The gains from other types of human-mimicry don't quite seem to land.

The closest we got to "breakthrough by mimicking what humans do" since pre-training on unlabeled text would probably be reasoning. And it's unclear how much of reasoning was "try to imitate what humans do on a high level", and how much was just trying to generalize the lessons from the early "let's think about it step by step" prompting techniques.

It's likely that we just don't know enough about the human mind to spot, extract and apply the features that would be worth copying. And even if we did... what are the chances that the features we would want to copy would turn out to be the ones vital for consciousness?

recitedropper

2 months ago

For the most part I think we agree. There is a lot of uncertainty around the mechanics of consciousness, a lot of reasons to doubt the existence of those mechanics in current AI, and a lot of failed endeavors to use biological mimicry to improve AI state of the art.

I don't think that precludes remaining concerned with the continued push to make current models more humanlike in nature. My initial comment was spurred by the fact that this paper is literally presenting itself as solving the missing link between transformer architectures and the human brain.

Here's to hoping this all goes toward a better world.

raducu

2 months ago

4 replies

> human-like intelligence on silica is a good idea.

The famous Chinese Room Translator -- silica is irelevant, you could probably implement LLM-like algorithm with pen and paper, do you still think the paper could suffer or be "conscious"?

kelseyfrog

2 months ago

The paper would think it's ridiculous that meat could suffer.

tim333

2 months ago

Characters in stories written on paper can suffer. Do you think humans can't suffer because the individual synapses don't?

recitedropper

2 months ago

I am empathetic to arguments against consciounsess being computational. Definitely strange to imagine an algorithm played out on trillions of abacuses being conscious.

That said, I don't think it is a sufficient appeal to entirely discount the possibility that the right process implemented on silicon could in fact be conscious in the same way we are. I'm open to whether or not it is possible--I don't have a vested interest in the space--but silica seems to be a medium that can possible hold the level of complexity for something like consciousness to emerge.

So this is to say that I agree with you that consciousness likely requires substrate-specific embodiment, but I'm open to silica being a possible substrate. I certainly don't think it can be discounted at this point in time, and I'd suggest that we don't risk a digital holocaust on the bet that it can't.

varjag

2 months ago

Suffering isn't necessary outside evolutionary pressures. But if a bouillon of animal proteins could be conscious why not.

cootsnuck

2 months ago

1 reply

Yea, actual "human-like" consciousness would be an ethical nightmare. Any sane company should not be legitimately pursuing this.

My most generous interpretation of Anthropic's flirting with it is they too think it would be a nightmare and are hyper-vigilant. (My more realistic interpretation is that it's just some mix of a Frankenstein complex and hype-boosting.)

recitedropper

2 months ago

I hope your generous interpretation is right... I can't really tell what's going on with Anthropic's theater either. They definitely seem like they are vigilant of bad outcomes, going as far as to publish their own economic index trying to monitor how AI is affecting labor markets.

That said, the cynic in me thinks they give lip service to these things while pushing fully ahead into the unknown on the presumption of glory and a possibility of abundance. A bunch of the leadership are EAs who subscribe to a kind of superintelligence eschatology that goes as far as to give a shot at their own immortality. Given that, I think they act on the assumption that AGI is a necessity, and they'd rather take the risks on everyone's behalf than just not create the technology in the first place.

Them recently flirting with money from the gulf states is a pretty concerning signal pointing to them being more concerned with their own goals rather than ethics.

nix0n

2 months ago

> the person / team who would go on to actually crack brain-like AI would be remembered closer to Hitler than to Einstein

I completely agree. I think that the people who are funding AI research are essentially attempting to create slaves. The engineers actually doing the work have either not thought it through or don't care.

> Assuming it comes with consciousness, which I think is fair to presume, brain-like AI seems to me like a technology that shouldn't be made.

"Fair to presume" is a good way to put it. I'm not convinced that being "like a brain" is either necessary or sufficient for consciousness, but it's necessary to presume it will, because consciousness is not understood well enough for the risk to be eliminated.

jacobgorm

2 months ago

1 reply

I read the through first 20+ pages 1.5 weeks ago, and found it quite inspiring. I tried submitting it here, but it did not catch on at the time. I watched the podcast interview with the founder, who seems very smart, but that made me realize that not everything described in the paper has been released as open source, which was a bit disappointing.

neom

2 months ago

1 reply

This the podcast you watched? https://www.youtube.com/watch?v=mfV44-mtg7c

jacobgorm

2 months ago

Yes.

badmonster

2 months ago

1 reply

How does BDH handle long-range dependencies compared to Transformers, given its locally interacting neuron particles? Does the scale-free topology implicitly support efficient global information propagation?

dxtrous

2 months ago

From the authors: great question. If you take an "easy" task for long-range dependencies where a Mamba-like architecture flies (and the transformer doesn't, or gets messy), the hatchling should also be made to fly. For more ambitious benchmarks, give it a try in a place you care about. The paper is really vanilla and focused on explaining what's happening inside the model, but should be good enough as a starting point for architecture tweaks and experiments.

polskibus

2 months ago

1 reply

I posted it 19 days ago. It it didn’t get any traction, I wonder why. https://news.ycombinator.com/item?id=45453119

lawlessone

2 months ago

Just luck i think

vatsachakrvthy

2 months ago

Great work by the authors!

But... As Karpathy stated on the Dwarkesh podcast, why do we need brain inspired anything? As Chollet says a transformer is basically a tool for differentiable program synthesis. Maybe the animal brain is one way to get intelligence, but it might actually be easier to achieve in-silica, given the faster computational ability and higher reproducibility of calculations

polskibus

2 months ago

One of the authors, Adrian, is a very interesting person. Got his PhD at 21, started CS studies at an age when his peers were starting high school. Knowing some of his previous achievements, I’d say his work deserves at least some curiosity.

pyeri

2 months ago

I've just stepped into LLMs, pytorch, transformers, etc. on the learning path, I don't know much about advanced AI concepts yet. But I somehow feel that scale alone isn't going to solve AGI problem, there is something fundamental about the nature of intelligence itself that we don't know yet, cracking that will lead to unleashing of true AGI.

wigster

2 months ago

but what about those tiny brain tubes discovered last week? are they in the model? ;-)

Fawlty

2 months ago

FWIW it’s trending on huggingface papers and there is a more in-depth discussion there[1]. Seems like the early results are promising and someone replicated that independently but it still needs to be proven that the claims work also at the bigger scale. [1] https://huggingface.co/papers/2509.26507

darkbatman

2 months ago

By looking at the paper, memory needed per layer seems to be higher than transformer architecture. Pretty sure that would be blowing up the vram of gpu at scale.

abdibrokhim

2 months ago

attention is all you need btw

View full discussion on Hacker News

ID: 45668408Type: storyLast synced: 11/20/2025, 4:29:25 PM

Want the full context?

Jump to the original sources

Read the primary article or dive into the live Hacker News thread when you're ready.

Open link View on HN