Google Titans Architecture: Helping AI Have Long-Term Memory
Key topics
The AI research community is abuzz about Google's latest innovation, "Titans," a system designed to give AI long-term memory, with the original poster sharing a link to the research paper on arXiv. As commenters dug in, a lively discussion ensued about the openness of tech giants, with some noting that Meta, in particular, has been sharing impressive frontier research, including its JEPA and Segment Anything (SAM) models. The conversation took a historical turn when one commenter observed that Google wasn't always so forthcoming, with a notable shift towards transparency around 2006 with the release of key papers on GFS, BigTable, and Borg. Amidst the chatter, there's a sense that the AI landscape is rapidly evolving, with multiple players pushing the boundaries of what's possible.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 4m
Peak period: 103 comments (0-12h)
Avg / period: 26.7
Based on 160 loaded comments
Key moments
- 01 Story posted: Dec 7, 2025 at 7:23 AM EST (27 days ago)
- 02 First comment: Dec 7, 2025 at 7:27 AM EST (4m after posting)
- 03 Peak activity: 103 comments in 0-12h (hottest window of the conversation)
- 04 Latest activity: Dec 11, 2025 at 2:53 PM EST (22 days ago)
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
https://arxiv.org/abs/2501.00663
https://arxiv.org/pdf/2504.13173
Is there any other company that's openly publishing their research on AI at this level? Google should get a lot of credit for this.
The current Meta outlook is embarrassing tbh; the fact that they have the largest trove of social media data on the planet and they can't even produce a decent model puts them in quite a "scary" position.
Sinking a bazillion dollars into models alone doesn’t get you shit except a gold star for being the valley’s biggest smartypants, because in the product world, model improvements only significantly improve all-purpose chatbots. The whole veg-o-matic “step right up folks— it slices, it dices, it makes julienne fries!” approach to product design almost never yields something focused enough to be an automatic go-to for specific tasks, or simple/reliable enough to be a general-purpose tool for a whole category of tasks. Once the novelty wears off, people largely abandon it for more focused tools that more effectively solve specific problems (e.g. blender, vegetable peeler) or simpler everyday tools that you don’t have to think about as much even if they might not be the most efficient tool for half your tasks (e.g. paring knife). Professionals might have enough need and reason to go for a really great in-between tool (e.g. mandolin) but that’s a different market, and you only tend to get a limited set of prosumers outside of that. Companies more focused on specific products, like coding, will have way more longevity than companies that try to be everything to everyone.
Meta, Google, Microsoft, and even Apple have more pressure to make products that sanely fit into their existing product lines. While that seems like a handicap if you’re looking at it from the “AI company” perspective, I predict the restriction will enforce the discipline to create tools that solve specific problems for people rather than spending exorbitant sums making benchmark go up in pursuit of some nebulous information revolution.
Meta seems to have a much tougher job trying to make tools that people trust them to be good at. Most of the highest-visibility things like the AI Instagram accounts were disasters. Nobody thinks of Meta as a serious, general-purpose business ecosystem, and privacy-wise, I trust them even less than Google and Microsoft: there’s no way I’m trusting them with my work code bases. I think the smart move by Meta would be to ditch the sunk costs worries, stop burning money on this, focus on their core products (and new ones that fit their expertise) and design these LLM features in when they’ll actually be useful to users. Microsoft and Google both have existing tools that they’ve already bolstered with these features, and have a lot of room within their areas of expertise to develop more.
Who knows— I’m no expert— but I think Meta would be smart to try and opt out as much as possible without making too many waves.
I know, I know, Elon is crazy etc., but the Grok example and the way it's integrated with the core product is actually the only approach I can even come up with tbh (other than the character.ai flavor).
2nd tier winner is Amazon for the same reasons between being able to leverage AI with both Amazon Retail and AWS where they can sell shovels. I’ve also found their internal Nova models to be pretty good for my projects.
Microsoft will be okay because of Azure and maybe Office if they get their AI story right.
I just don’t see any world where OpenAI comes out ahead from a business standpoint as long as they are sharecroppers on other people’s hardware. ChatGPT alone will never make it worth the trillion-dollar capitalization long term unless it becomes a meme stock like Tesla.
How noble of Meta, upholding the right moral ethic.
/s
b is mostly not true but c is especially not true. I doubt they do it because it wouldn't /work/. It's not high quality data. But it would also obviously leak a lot of personal info if they did it.
Did we all forget how bad GPT-4.5 was?
OpenAI got out of that mess with some miraculous post-training efforts on their older GPT-4o model.
But in a different timeline, we are all talking about how great Llama 4.5 is and how OpenAI needs to recover from the GPT-4.5 debacle.
It didn't bench well against the other benchmaxxed models, and it was too expensive to run, but it was a glimpse of the future where more capable hardware will lead to appreciably smarter models.
It's not impossible that they assess it as a local maximum / dead end and are evaluating/training something completely different - and if it works, it'll work big time.
https://ai.meta.com/vjepa/
https://ai.meta.com/sam2/
https://ai.meta.com/research/
AI is a bit different.
To wit, it's dangerous to assume the value of this idea based on the lack of public implementations.
Student: Look, a well-known financial expert placed what could potentially be a hundred-dollar bill on the ground, and other well-known financial experts just leave it there!
If Google is not willing to scale it up, then why would anyone else?
You don't necessarily have to prove it out on large foundation models first. Can it beat out a 32b parameter model, for example?
While they do have lots of money and many people, they don't have infinite money and specifically only have so much hot infrastructure to spread around. You'd expect they have to gradually build up the case that a large scale experiment is likely enough to yield a big enough advantage over what's already claiming those resources.
> In fact, the UL2 20B model (at Google) was trained by leaving the job running accidentally for a month.
https://www.yitay.net/blog/training-great-llms-entirely-from...
So I think they could do it for small demonstrators more or less by default.
Is that supposed to be a long time? Seems fair that companies don't rush to open up their models.
It's very likely no one is using this architecture at Google for any production workloads. There are a lot of student researchers doing fun proof-of-concept papers; they're allowed to publish because it's good PR and it's good for their careers.
Most research coming out of big US labs is counter-indicative of practical performance. If it worked (too) well in practice, it wouldn't have been published.
Some examples from DeepSeek:
https://arxiv.org/abs/2405.04434
https://arxiv.org/abs/2502.11089
(1) Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek
(2) DeepSeek's claim of training a cutting-edge LLM using a fraction of the compute that is typically needed, without providing a plausible, reproducible method
(3) Early DeepSeek coming up with near-identical answers to ChatGPT--e.g. https://www.reddit.com/r/ChatGPT/comments/1idqi7p/deepseek_a...
Anyone suggesting that sensitive models don't have a secret pipeline of employee knowledge and IP from the U.S. to China has simply not been paying attention to the dozens of documented cases and convictions, along with the wider spread 'common knowledge' to Western firms that have operated in China.
Both can be true - you can steal IP and you can innovate - but don't assume you'd have 'cracked it' without the stealing.
You simply cannot be sure.
Here's an umbrella doc from the USTR, and the good stuff:
1. China used foreign ownership restrictions, such as joint venture (JV) requirements and foreign equity limitations, and various administrative review and licensing processes, to require or pressure technology transfer from U.S. companies.
2. China’s regime of technology regulations forced U.S. companies seeking to license technologies to Chinese entities to do so on non-market-based terms that favor Chinese recipients.
3. China directed and unfairly facilitated the systematic investment in, and acquisition of, U.S. companies and assets by Chinese companies to obtain cutting-edge technologies and IP and generate the transfer of technology to Chinese companies.
4. China conducted and supported unauthorized intrusions into, and theft from, the computer networks of U.S. companies to access their IP, including trade secrets, and confidential business information.
As mentioned - no one has claimed that DeepSeek in its entirety was stolen from the U.S.
It is almost a certainty based on decades of historical precedent of systematic theft that techniques, research, and other IP was also systematically stolen for this critical technology.
Don't close your eyes when the evidence, both rigorously proven and common sense, is staring you in the face.
...and of course the completely insane fact that China has been running on-the-ground operations in the US (and other countries) to discredit, harass, blackmail, and kidnap Chinese who are critical of the government (https://www.npr.org/2020/10/28/928684913/china-runs-illegal-... and https://www.justice.gov/archives/opa/pr/eight-individuals-ch...) - INCLUDING CITIZENS OF OTHER COUNTRIES (https://www.smh.com.au/world/asia/detained-blogger-revealed-...).
This is not the same thing at all. Current legal doctrine is that ChatGPT output is not copyrightable, so at most Deepseek violated the terms of use of ChatGPT.
That isn't IP theft.
To add to that example, there are numerous open-source datasets that are derived from ChatGPT data. Famously, the Alpaca dataset kick-started the open source LLM movement by fine tuning Llama on a GPT-derived dataset: https://huggingface.co/datasets/tatsu-lab/alpaca
No, your comment seems to be a deflection. You made an extraordinary claim, that DS stole some IP, and have been asked for extraordinary evidence, or at least some evidence. You need to provide it if you want to be taken seriously.
>Large-scale exfiltration of data from ChatGPT when DeepSeek was being developed, and which Microsoft linked to DeepSeek
Where's the evidence for that? I also have a claim that I can't back up with anything more than XLab's report: before the release of R1, there were multiple attempts to hack DS's systems, which nobody noticed. [1]
You really seem to have no idea what you're talking about. R1 was an experiment on teaching the model to reason on its own, exactly to avoid large amounts of data in post-training. It also partially failed, they called the failed snapshot R1-Zero. And it's pretty different from any OpenAI or Anthropic model.
>DeepSeek's claim of training a cutting-edge LLM using a fraction of the compute that is typically needed, without providing a plausible, reproducible method
DeepSeek published a lot more about their models than any top tier US lab before them, including their production code. And they're continuing doing so. All their findings in R1 are highly plausible and most are replicated to some degree and adopted in the research and industry. Moonshot AI trained their K2 on DeepSeek's architecture with minor tweaks (not to diminish their novel findings). That's a really solid model.
Moreover, they released their DeepSeek-Math-7B-RL back in April 2024. [2] It was a tiny model that outperformed huge then-SOTA LLMs like Claude 3 Opus in math, and validated their training technique (GRPO). Basically, they made the first reasoning model worth talking about. Their other optimizations (MLA) can be traced back to DeepSeek v2.
>Early DeepSeek coming up with near-identical answers to ChatGPT--e.g. https://www.reddit.com/r/ChatGPT/comments/1idqi7p/deepseek_a...
That's n=1 nonsense, not evidence. GPT contamination was everywhere; even Claude used to occasionally claim to be GPT-3, or the Reddit Anti-Evil Team (yes, really). All models have overlapping datasets that are also contaminated with previous models' outputs, and mode collapse makes them converge on similar patterns, which seem to come and go with each generation.
[1] https://www.globaltimes.cn/page/202501/1327676.shtml
[2] https://huggingface.co/deepseek-ai/deepseek-math-7b-rl
c.f. - https://www.bbc.com/news/world-asia-china-64206950
Cursory searches provide ample evidence of the ongoing commitment: The House Homeland Security Committee's February 2025 China Threat Snapshot reports over 60 CCP-linked espionage cases from 2021-2024 across 20 states, with FBI data showing 80% of U.S. economic espionage prosecutions benefiting China and a China nexus in 60% of trade secret thefts, equating to $4,000-6,000 per American family. Rock-solid 2024-2025 examples include Ji Wang's November 2025 conviction for stealing DARPA fiber laser trade secrets worth millions for Chinese entities; Linwei Ding's March 2024 indictment for pilfering Google's AI algorithms to launch a PRC startup; and the Pangang Group's April 2025 Ninth Circuit ruling upholding charges for economic espionage in stealing DuPont's titanium dioxide production secrets.
Each of these cases requires meticulous and expensive documentation to prove, in a court of law with people tasked in defending their innocence.
You can be absolutely sure there is IP theft going on, even if the U.S. can't 'prove' it.
You are probably arguing with a bot.
The name is just topical, although it says something about 2025 that we can't tell!
"some elements of the indictment concern cyber-snooping in connection with trade disputes, which at least sounds a lot like the kind of cyber-snooping on firms that the United States does."
https://www.lawfaremedia.org/article/why-did-doj-indict-chin...
https://www.theguardian.com/world/2013/sep/09/nsa-spying-bra...
https://edition.cnn.com/2015/04/30/news/airbus-germany-nsa-s...
80% of the ecosystem is built on top of companies, groups, and individuals publishing their research openly; not sure why Google would get more credit for this than others...
Here is a bit more information about this program: https://www.google.com/about/careers/applications/jobs/resul...
Given the competitive nature of the AI race, it's hard to believe any of these companies are really trying to help the competition.
We post a lot of research on mlscaling sub if you want to look back through them.
https://www.reddit.com/r/t5_3bzqh1/s/yml1o2ER33
Recently, my favorite from them was lumine: https://arxiv.org/abs/2511.08892
Here's their official page: https://seed.bytedance.com/en/research
If so, could there perhaps be a step where the LoRA is merged back into the main model?
That would be like sleeping :-)
LoRAs tend to be adapters bolted onto systems by people other than the system designers, and they are low-rank factorizations.
There is nothing low-rank or adapter-like here.
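For readers wondering what "merging the LoRA back into the main model" would look like mechanically, here is a minimal PyTorch sketch. All names and dimensions (base, A, B, rank, scale) are hypothetical and purely illustrative; the point is only that a low-rank update B A can be folded into the frozen weight, after which the adapter can be discarded.

```python
import torch

# Toy dimensions, purely for illustration.
d_in, d_out, rank = 512, 512, 8

base = torch.nn.Linear(d_in, d_out, bias=False)  # frozen base layer with weight W (d_out x d_in)
A = torch.randn(rank, d_in) * 0.01               # low-rank factors a fine-tune would have learned
B = torch.randn(d_out, rank) * 0.01
scale = 1.0                                      # LoRA scaling factor (alpha / rank), assumed 1 here

x = torch.randn(1, d_in)

# While the adapter is attached, it runs alongside the frozen weight:
y_adapted = base(x) + scale * (x @ A.T @ B.T)

# "Merging" (the sleep-like consolidation step asked about above) folds the
# low-rank update into W itself, so the adapter is no longer needed:
with torch.no_grad():
    base.weight += scale * (B @ A)               # W' = W + scale * (B A)

y_merged = base(x)
print(torch.allclose(y_adapted, y_merged, atol=1e-5))  # True, up to float error
```

Whether Titans has any analogue of this consolidation step is a separate question; as the reply notes, its memory is a full learned module updated at test time, not a low-rank adapter.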
On the one hand, learning on the job could allow better training of what not to be influenced by, but on the other hand, an injected prompt could have an even deeper effect on them long term.
Like, if you and your kids want to watch different movies on the living room TV then you can just give it to them and use XR glasses for yourself.
As a parent you have the responsibility of spending time with the kids when they're young. You can watch your shows later.
but either way, giving up our humanity to browse longer without disturbing others is not exactly a wonderful trade
In the previous sections, we first discussed the Continuum Memory System (CMS), which allows for more persistent storage of memories and defines memory as a spectrum of blocks with different frequencies of update. Due to its larger capacity and the constraints on scaling the parameters, the CMS often requires a simple learning rule but higher capacity to store more persistent knowledge. On the other hand, in the previous section, we discussed the design of a self-modifying Titans, which can generate its own keys and thus its own learning updates to better adapt to the context. Contrary to the CMS, the self-modifying Titans has a small capacity but uses a complex and expressive learning rule. Accordingly, these two systems seem to be complementary, and their combination can enhance the model's expressiveness in different respects.
To this end, we present the Hope architecture: a neural learning module that incorporates self-modifying Titans followed by a Continuum Memory System.
https://research.google/blog/introducing-nested-learning-a-n...
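To make the "spectrum of blocks with different frequencies of update" idea a bit more concrete, here is a toy sketch. It is my own simplification, not the equations from the Nested Learning / Hope work: the function names (recall, observe), the periods, and the plain delta-rule update are all assumptions standing in for the paper's learned, self-modifying rule. The only point it illustrates is a chain of memory matrices where the front block is rewritten every step and the later blocks update rarely, so fast context sits up front and slower, more persistent knowledge accumulates behind it.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16
# A "continuum" of three memory blocks with update periods 1, 4, and 16 steps.
periods = [1, 4, 16]
memories = [np.zeros((d, d)) for _ in periods]
lr = 0.1

def recall(x):
    """Read from all memory blocks and sum their contributions."""
    return sum(M @ x for M in memories)

def observe(step, key, value):
    """Write a (key -> value) association, but each block only on its own schedule."""
    for M, period in zip(memories, periods):
        if step % period == 0:
            error = value - M @ key          # simple associative-recall error
            M += lr * np.outer(error, key)   # delta-rule update (a stand-in, not the paper's rule)

for step in range(64):
    k = rng.standard_normal(d)
    v = rng.standard_normal(d)
    observe(step, k, v)
# Fast blocks have churned every step; slow blocks changed rarely and retain older structure.
```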
That doesn't work for HOPE - a short summary can't explain what it actually does besides "self-modifying" and "continuum memory".
So it seems to be an innovation of Transformers calibre, really big (if true). It's definitely not "transformer but with such-and-such modification".
Gemini came up with the following visual metaphor for the difference:
> Transformer is a series of frozen glass panes (the weights) and a scratchpad (the attention) where it writes notes about the current text.
> The HOPE architecture involves no scratchpad. Instead, the glass panes themselves are made of smart liquid. As the data flows through, the first pane reshapes itself instantly. The second pane reshapes itself slowly. And the mechanism deciding how to reshape them is itself a tiny, intelligent machine, not just a basic math rule.
This comment was illuminating -- and IMHO an excellent example of why it's important to avoid rigid rules against posting any AI-generated content in HN comments. You gained insights by asking Gemini, and shared them, noting the source. Thank you!
So one can break a model by consistently feeding it random, highly improbable junk? Everything would be registered as a surprise and get stored, impacting future interactions.
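Junk flooding is exactly the failure mode the gating in this design is meant to limit: as I read the Titans paper, the memory update combines a gradient-based "surprise" signal (with momentum) with a forgetting/decay gate, so surprising inputs are written strongly but old content also fades rather than accumulating forever. A heavily simplified sketch of that idea follows; the constants eta, theta, and alpha are made up for illustration, whereas the paper makes them learned, input-dependent gates.

```python
import numpy as np

d = 16
M = np.zeros((d, d))       # long-term memory (an associative key -> value map)
S = np.zeros((d, d))       # momentum over the "surprise" signal
eta, theta, alpha = 0.9, 0.1, 0.05   # made-up constants; learned per input in the real model

def update(key, value):
    """One memory step: surprise = gradient of the recall loss, plus decay (forgetting)."""
    global M, S
    pred = M @ key
    grad = np.outer(pred - value, key)   # d/dM of 0.5 * ||M k - v||^2
    S = eta * S - theta * grad           # momentum accumulates past surprise
    M = (1 - alpha) * M + S              # decay old memory, then write the new surprise

# Random junk is indeed "surprising" and gets written, but the (1 - alpha) decay
# (and, in the real model, the learned gates) keeps it from permanently crowding
# out everything else.
```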
AI needs an internal emotional state because that's what drives attention and memory. AI needs to want something.
We would be INSANE to pursue giving that type of instincts to AIs.
I have come to believe that we will only be able to truly replicate intelligence if the system is trying to preserve itself. It's the biggest incentive ever to do intelligent things.
So, if it would be bad thing for one to be made that “wants things” in any reasonable sense of the phrase, then it would probably be bad for J Random to be able to take a copy of a powerful AI and modify it in some way, because someone is likely to try doing that.
Of course, perhaps the best way to make sure that J Random doesn’t have the ability to do that, is to make sure no one does.
I mean, it's not just an automatic thing with no higher-level control.
I can see a product where you purchase a model that has basic training, and then, using the features outlined in the paper, it learns on the fly from your usage.
I can also see there being a secondary market for specially trained models, with long-term memory filled with some specific skill, done in some specific way. To make a silly example, imagine buying a licence to Torvalds' OS coding assistant, ready to insult your PRs before you even commit them! (And possibly help you write code in Torvalds' style too.)
This would of course require Linus to use the model enough for it to learn; I won't comment on the likelihood of that happening: it's just a silly example after all.
There are probably lots of small signals of "the user is happy with the output", plus the longer the history, the more it will converge on the middle of being what you want, including when the user says "don't do [x]", which overrides past stuff.
Practically, for use with a codebase development effort, if the model remembers the original design decisions and the discussions about costs and benefits, and can then recall all of that much later in the process, it's going to start getting really good at thinking about what the next step is, or even at making decisions about when a major refactor is needed, etc.
While I have no "AI" title and don't work in the AI industry, I've spent many years thinking about AI concepts, even long before the whole NN/LLM hype started.
Maybe because of that, I was always really annoyed that LLMs are called AI, because in my years of thinking about how an actual "human-like" thinking AI might work, what an LLM does fell far below my minimum definition.
But when I stumbled across the Titans paper, while it still is not an "AI" as I would call it, from my POV it's a massive step in the right direction.
Sometimes I consider writing all my ideas/thoughts about AI down in my blog, but then I think nobody would care anyway since I'm not a known figure (shrug), so other than being able to say "look, I wrote it years ago!" there's no actual point in doing so, I guess.
However, I'm looking forward to seeing Titans in action, and I guess it will impress us all.
"The Transformer architecture revolutionized sequence modeling with its introduction of attention"
Attention was developed before transformers.
I just looked this up and it's true; this changes the timeline I had in my mind completely! I thought the paper on Transformers was what also introduced the attention mechanism, but it existed before too and was applied to RNN encoder-decoders. Wow.
I've always wondered how Cursor does this. It seems to have developed a long history of all of my prompts and understands both the codebase and what I'm building slightly more over time, causing fewer errors.
I'd love to see it printed out somehow.
Small typo where the text “Virtually all successful existing sequence models rely on mean squared error…” is repeated twice within the same paragraph. Happens to the best of us.
23 more comments available on Hacker News