How to Stop AI's "Lethal Trifecta"
Key topics
The article discusses the 'lethal trifecta' of AI risks - LLM access to untrusted data, valuable secrets, and external communication - and potential mitigations, sparking a discussion on AI security and the need for a more engineering-like approach to AI development.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 35m after posting
- Peak period: 103 comments (0-12h)
- Avg / period: 23.2
- Based on 116 loaded comments
Key moments
- 01 Story posted: Sep 26, 2025 at 10:49 AM EDT (4 months ago)
- 02 First comment: Sep 26, 2025 at 11:24 AM EDT (35m after posting)
- 03 Peak activity: 103 comments in the 0-12h window, the hottest period of the conversation
- 04 Latest activity: Oct 4, 2025 at 4:28 PM EDT (3 months ago)
"AI engineers, inculcated in this way of thinking from their schooldays, therefore often act as if problems can be solved just with more training data and more astute system prompts."
By which they mean actual engineers, not software engineers, who should also probably start thinking like real engineers now that our code’s going into both the bridges and the cars driving over them.
https://en.wikipedia.org/wiki/Factor_of_safety
1. We don't have a good sense of the "materials" we're working with - when you're putting up a building, you know the tensile strength of the materials you're working with, how many girders you need to support this much weight/stress, etc. We don't have the same for our systems - every large scale system is effectively designed clean-sheet. We may have prior experience and intuition, but we don't have models, and we can't "prove" our designs ahead of time.
2. Following on the above, we don't have professional standards or certifications. Anyone can call themselves a software engineer, and we don't have a good way of actually testing for competence or knowledge. We don't really do things like apprenticeships or any kind of formalized process of ensuring someone has the set of professional skills required to do something like write the software that's going to be controlling 3 tons of metal moving at 80MPH.
3. We rely too heavily on the ability to patch after the fact - when a bridge or a building requires an update after construction is complete, it's considered a severe fuckup. When a piece of software does, that's normal. By and large, this has historically been fine, because a website going down isn't a huge issue, but when we're talking about things like avionics suites - or even things like Facebook, which is the primary media channel for a large segment of the population - there are real-world effects from all the bugs we're fixing in 2.0.
Again, by and large this has been fine, because the stakes were pretty low, but software's leaked into the real world now, and our "move fast and break things" attitude isn't really compatible with physical objects.
You cannot build a bridge that could independently reassemble itself into an ocean liner or a cargo plane. And while civil engineering projects add significant margins for reliability and tolerance, there is no realistic way to re-engineer a physical construction to be able to suddenly sustain 100x its previously designed peak load.
In successful software systems, similar requirement changes are the norm.
I'd also like to point out that software and large-scale construction have one rather surprising thing in common: both require constant maintenance from the moment they are "ready". Or indeed, even earlier. To think that physical construction projects are somehow delivered complete is a romantic illusion.
Unless you are building with a toy system of some kind. There are safety and many other reasons civil engineers do not use some equivalent of Lego bricks. It may be time for software engineering also to grow up.
On the other hand, point 3 feels like throwing the baby out with the bathwater to me. Being so malleable is definitely one of the great features of software versus the physical world. We should surely use that to our advantage, no? But maybe in general we don't spend enough energy designing safe ways to do this.
I agree on all points, and to build on the last: making a 2.0 or doing a complete software rewrite is known to be even more hazardous. There are no guarantees the new version is better in any regard. Which makes the expertise resemble that needed for other highly complex systems, like medical care.
Which is why we need to understand the patient, develop soft skills, empathy, the Agile manifesto and ... the list could go on. Not an easy task when you consider you're also likely going to be fighting your execs' shiny-object syndrome and all the constant hype surrounding the tech.
And it's possible that a real engineer might do all this with an AI model and then determine it's not adequate and choose to not use it.
This is the thing with LLMs: the response to a prompt is not guaranteed to be repeatable. Why would you use something like that in an automation where repeatability is required? That's the whole point of automation, repeatability. Would you use a while loop that you can expect to iterate the specified number of times _almost_ every time?
Well, y'see - those deaths of innocent people *are* the training data.
If it's a crypto wallet then your crypto is irreversibly gone.
If the breached data is "material" - i.e. gives someone an advantage in stock market decisions - you're going to get in a lot of trouble with the SEC.
If the breached data is PII you're going to get in trouble with all kinds of government agencies.
If it's PII for children you're in a world of pain.
Update: I found one story about a company going bankrupt after a breach, which is the closest I can get to "lethal": https://www.securityweek.com/amca-files-bankruptcy-following...
Also it turns out Mossack Fonseca shut down after the Panama papers: https://www.theguardian.com/world/2018/mar/14/mossack-fonsec...
They certainly can be when it comes to classified military information around e.g. troop locations. There are lots more examples related to national security and terrorism that would be easy to think of.
> When we’re talking about AI there are plenty of actually lethal failure modes.
Are you trying to argue that because e.g. Tesla Autopilot crashes have killed people, we shouldn't even try to care about data breaches...?
> LLM access to untrusted data, the ability to read valuable secrets and the ability to communicate with the outside world
The suggestion is to reduce risk by setting boundaries.
Seems like security 101.
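To make "setting boundaries" concrete, here is a minimal sketch of a configuration check that refuses to enable all three legs of the trifecta at once. The capability names are invented for illustration, not taken from any particular framework.

```python
# Hypothetical capability flags for an agent configuration; the names are
# illustrative, not from any real framework.
from dataclasses import dataclass


@dataclass
class AgentConfig:
    reads_untrusted_input: bool   # e.g. inbound email, web pages, tickets
    reads_private_data: bool      # e.g. internal documents, secrets, PII
    can_exfiltrate: bool          # e.g. outbound HTTP, email send, webhooks


def check_lethal_trifecta(cfg: AgentConfig) -> None:
    """Refuse to start an agent that combines all three risky capabilities."""
    if cfg.reads_untrusted_input and cfg.reads_private_data and cfg.can_exfiltrate:
        raise ValueError(
            "Config enables untrusted input, private data, and an exfiltration "
            "channel at once; disable at least one before deploying."
        )


check_lethal_trifecta(AgentConfig(True, True, False))  # OK: no exfiltration leg
```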
Agents also typically work better when you combine all the relevant context as much as possible rather than splitting out and isolating context. See: https://cognition.ai/blog/dont-build-multi-agents — but this is at odds with isolating agents that read untrusted input.
Huge numbers of businesses want to use AI in the “hey, watch my inbox and send bills to all the vendors who email me” or “get a count of all the work tickets closed across the company in the last hour and add that to a spreadsheet in sharepoint” variety of automation tasks.
Whether those are good ideas or appropriate use-cases for AI is a separate question.
The moment it has access to the internet, the risk is vastly increased.
But a sufficiently clever security researcher can take over the entire machine with a single prompt injection attack, effectively removing the need for at least one of those requirements.
Software engineers figured out these things decades ago. As a field, we already know how to do security. It's just difficult and incompatible with the careless mindset of AI products.
Well, AI is part of the field now, so... no, we don't anymore.
There's nothing "careless" about AI. The fact that there's no foolproof way to distinguish instruction tokens from data tokens is not careless, it's a fundamental epistemological constraint that human communication suffers from as well.
Saying that "software engineers figured out these things decades ago" is deep hubris based on false assumptions.
Repeat that over to yourself again, slowly.
> it's a fundamental epistemological constraint that human communication suffers from as well
Which is why reliability and security in many areas increased when those areas used computers to automate previously-human processes. The benefit of computer automation isn’t just in speed: the fact that computer behavior can easily be made deterministically repeatable and predictable is huge as well. AI fundamentally does not have that property.
Sure, cosmic rays and network errors can compromise non-AI computer determinism. But if you think that means AI and non-AI systems are qualitatively the same, I have a bridge to sell you.
> Saying that "software engineers figured out these things decades ago" is deep hubris
They did, though. We know both how to increase the likelihood of secure outcomes (best practices and such) and how to guarantee secure behavior. For example: using a SQL driver to distinguish between instruction and data tokens is, indeed, a foolproof process (not talking about injection in query creation here, but about how queries are sent with data/binds).
People don’t always do security well, yes, but they don’t always put out their campfires either. That doesn’t mean that we are not very sure that putting out a campfire is guaranteed to prevent that fire burning the forest down. We know how to prevent this stuff, fully, in most non-AI computation.
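For readers who want the concrete version of the SQL point: a parameterized query keeps the instruction channel (the SQL text) separate from the data channel (the bound values), which is exactly the guarantee LLM prompts currently lack. A minimal sketch using Python's sqlite3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, email TEXT)")

user_supplied = "Robert'); DROP TABLE users;--"

# The SQL text (instructions) is fixed; the value travels separately as a bind
# parameter, so the driver never parses it as SQL.
conn.execute(
    "INSERT INTO users (name, email) VALUES (?, ?)",
    (user_supplied, "bobby@example.com"),
)

# Contrast: string formatting mixes data into the instruction channel, which is
# the property LLM prompts cannot currently guarantee.
# conn.execute(f"INSERT INTO users (name) VALUES ('{user_supplied}')")  # unsafe
```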
> Repeat that over to yourself again, slowly.
Try using less snark.
And if you have a fundamental breakthrough in AI that gets around this, and demonstrates how "careless" AI researchers have been in overlooking it, then please share.
My point is that the fact that it is not solved makes the use of AI tools a careless choice in situations which benefit from non-AI systems which can distinguish instructions from data, behave deterministically, and so on.
It's true: when engineers fail at this, it's called a mistake, and mistakes have consequences, unfortunately. If you want to avoid responsibility for mistakes, then LLMs are the way to go.
Uhhh, no, we actually don't. Not when it comes to people anyway. The industry spends countless millions on trainings that more and more seem useless.
We've even had extremely competent and highly trained people fall for basic phishing (some in the recent few weeks). There was even a highly credentialed security researcher who fell for one on YouTube.
Also, there’s a difference between “know how to be secure” and “actually practice what is known”. You’re right that non-AI security often fails at the latter, but the industry has a pretty good grasp on how to secure computer systems.
AI systems do not have a practical answer to “how to be secure” yet.
Well this is what happens when a new industry attempts to reinvent poor standards and ignores security best practices just to rush out "AI products" for the sake of it.
We have already seen how (flawed) standards like MCP were hacked right from the start, and how developers tried to "secure" them with somewhat "better prompting", which is just laughable. The worst part of all of this was that almost no one in the AI industry questioned the security ramifications of MCP servers having direct access to databases, which is a disaster waiting to happen.
Just because you can doesn't mean you should, and we are seeing hundreds of AI products get breached because of this carelessness about security, even before asking whether the product was "vibe coded" or not.
(And yeah I got some quotes in it so I may be biased there, but it genuinely is the source I would send executives to in order to understand this.)
I like this new one a lot less. It talks about how LLMs are non-deterministic, making them harder to fix security holes in... but then argues that this puts them in the same category as bridges where the solution is to over-engineer them and plan for tolerances and unpredictability.
While that's true for the general case of building against LLMs, I don't think it's the right answer for security flaws. If your system only falls victim to 1/100 prompt injection attacks... your system is fundamentally insecure, because an attacker will keep on trying variants of attacks until they find one that works.
The way to protect against the lethal trifecta is to cut off one of the legs! If the system doesn't have all three of access to private data, exposure to untrusted instructions and an exfiltration mechanism then the attack doesn't work.
"Safeguard your generative AI workloads from prompt injections" - https://aws.amazon.com/blogs/security/safeguard-your-generat...
Outside content like email may also count as private data. You don't want someone to be able to get arbitrary email from your inbox simply by sending you an email. Likewise, many tools like email and github are most useful if they can send and receive information, and having dedicated send and receive MCP servers for a single tool seems goofy.
The easiest leg to cut off is the exfiltration vectors. That's the solution most products take - make sure there's no tool for making arbitrary HTTP requests to other domains, and that the chat interface can't render an image that points to an external domain.
If you let your agent send, receive and search email you're doomed. I think that's why there are very few products on the market that do that, despite the enormous demand for AI email assistants.
For example, an LLM could say "Go to this link to learn more about your problem" and then point the user to a URL with encoded data, set up malicious scripts for e.g. deploy hooks, or simply output HTML that sends requests when opened.
You can at least prevent LLM interfaces from providing clickable links to external domains, but it's a difficult hole to close completely.
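One partial mitigation, sketched below with assumed names: only render links and images whose host is on an allowlist, so the chat UI itself cannot be used as an exfiltration channel.

```python
from urllib.parse import urlparse

ALLOWED_HOSTS = {"example-internal.app"}  # hypothetical trusted domains


def is_renderable(url: str) -> bool:
    """Allow clickable links/images only for trusted hosts.

    Blocking external URLs prevents the model from smuggling data out in
    query strings like https://attacker.example/?d=<secret>.
    """
    parsed = urlparse(url)
    return parsed.scheme == "https" and parsed.hostname in ALLOWED_HOSTS


assert is_renderable("https://example-internal.app/docs/42")
assert not is_renderable("https://attacker.example/?d=BASE64SECRET")
```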
It's not obvious what counts as a tool in some of the major interfaces, especially as far as built in capabilities go.
And as we've seen with conventional software and extensions, at a certain point, if a human thinks it should work, then they'll eventually just click okay or run something as root/admin... Or just hit enter nonstop until the AI is done with their email.
Agents are doomed :)
And the ones who do focus on portability and speed of redeployment, rather than armor - it's cheaper and faster to throw down another temporary bridge than to build something bombproof.
https://en.wikipedia.org/wiki/Armoured_vehicle-launched_brid...
I'd rather play "two truths and a lie" with a human than with an LLM any day of the week. So many more cues to look for with humans.
You can hold a human accountable for their actions. If they consistently fall for phishing attacks you can train or even fire them. You can apply peer pressure. You can grant them additional privileges once they prove themselves.
You can't hold an AI system accountable for anything.
Imagine some sort of code component of critical infrastructure that costs the company millions per hour when it goes down and it turns out the entire team is just a thin wrapper for an LLM. Infra goes down in a way the LLM can't fix and now what would have been a few late nights is several months to spin up a new team.
Sure, you can hold the team accountable by firing them. However, that's only a threat to someone with actual technical know-how, because their reputation is damaged: they got fired doing such and such, so can we trust them to do it here?
For the person who faked it with an LLM, they just need to find another domain where their reputation won't follow them, and fake their way through until the next catastrophe.
(Yes, I am aware this isn't a perfect analogy because a dangerous dog can be seized and destroyed. But that's an administrative procedure and really not the same as holding a person morally or financially accountable.)
Also, pick your least favorite presidential candidate. They got about 50% of the vote.
If you say "What's the capital of France?" it might answer "Paris". But if you say "What is the capital of france" it might say "Prague".
The fact that it gives a certain answer for some input doesn't guarantee it will behave the same for an input with some irrelevant (from a human perspective) difference.
This makes them challenging to train and validate robustly because it's hard to predict all the ways they break. It's a training & validation data issue though, as opposed to some idea of just random behavior that people tend to ascribe to AI.
* I know various implementation details and nonzero temperature generally make their output nondeterministic, but that doesn't change my central point, nor is it what people are thinking of when they say LLMs are nondeterministic. Importantly, you could make LLM output deterministically reproducible and it wouldn't change the robustness issue that people are usually confusing with nondeterminism.
Let's not reduce every discussion to semantics, and afford the poster a degree of understanding.
Altering the temperature parameter introduces randomness by sampling from the probability distribution of possible next tokens rather than always choosing the most likely one. This means the same input can produce different outputs across multiple runs.
So no, not deterministic unless we are being pedantic.
and not even then as floating point arithmetic is non-associative
See https://news.ycombinator.com/item?id=45200925
> While this hypothesis is not entirely wrong, it doesn’t reveal the full picture. For example, even on a GPU, running the same matrix multiplication on the same data repeatedly will always provide bitwise equal results. We’re definitely using floating-point numbers. And our GPU definitely has a lot of concurrency. Why don’t we see nondeterminism in this test?
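To make the temperature point above concrete, here is a toy sketch of next-token sampling: with temperature 0 the choice is a plain argmax and repeats exactly, while any positive temperature samples from a softmax distribution. The logits are made up for illustration.

```python
import math
import random


def sample_next_token(logits: dict[str, float], temperature: float) -> str:
    """Pick a next token from raw logits; toy illustration only."""
    if temperature == 0:
        # Greedy decoding: always the highest-scoring token, hence repeatable.
        return max(logits, key=logits.get)
    # Softmax with temperature: lower T sharpens the distribution, higher T
    # flattens it; any T > 0 means sampled, hence varying, output.
    scaled = {tok: v / temperature for tok, v in logits.items()}
    m = max(scaled.values())
    weights = {tok: math.exp(v - m) for tok, v in scaled.items()}
    r = random.random() * sum(weights.values())
    for tok, w in weights.items():
        r -= w
        if r <= 0:
            return tok
    return tok  # fallback for floating-point edge cases


logits = {"Paris": 5.0, "Prague": 1.5, "Lyon": 0.5}
print(sample_next_token(logits, temperature=0))    # always "Paris"
print(sample_next_token(logits, temperature=1.0))  # usually "Paris", not always
```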
Don't you only need one leg, an exfiltration mechanism? Exposure to data IS exposure to untrusted instructions. Ie why can't you trick the user into storing malicious instructions in their private data?
But actually you can't remove exfiltration and keep exposure to untrusted instructions either; an attack could still corrupt your private data.
Seems like a secure system can't have any "legs." You need a limited set of vetted instructions.
If you have exfiltration and private data but no exposure to untrusted instructions, it doesn't matter either… though this is actually a lot harder to achieve, because you don't have any control over whether your users will be tricked into pasting something bad in as part of their prompt.
Cutting off the exfiltration vectors remains the best mitigation in most cases.
Assuming the LLM itself is not adversarial. Even then, there is a non-zero risk that a hallucination triggers unintended publishing of private data.
You're essentially running untrusted code on a local system. Are you SURE you've locked away / closed EVERY access point, AND applied every patch and there aren't any zero-days lurking somewhere in your system?
Here's the paper that changed that: https://thinkingmachines.ai/blog/defeating-nondeterminism-in...
The general rule to consider here is that anyone who can get their tokens into your agent can trigger ANY of the tools your agent has access to.
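One hedged sketch of acting on that rule: track whether untrusted tokens have entered the context and, once they have, withhold high-risk tools for the rest of the session. The tool names and taint flag below are illustrative, not from any real framework.

```python
# Illustrative taint-tracking sketch: once untrusted content enters the
# context, high-risk tools are withheld for the rest of the session.
HIGH_RISK_TOOLS = {"send_email", "http_request", "write_file"}  # hypothetical names


class AgentSession:
    def __init__(self, tools: set[str]):
        self.tools = set(tools)
        self.tainted = False  # becomes True once untrusted tokens enter the context

    def add_context(self, text: str, trusted: bool) -> None:
        if not trusted:
            self.tainted = True

    def available_tools(self) -> set[str]:
        # Anyone whose tokens reach the agent can steer it toward any enabled
        # tool, so untrusted input costs the session its dangerous tools.
        return self.tools - HIGH_RISK_TOOLS if self.tainted else self.tools


session = AgentSession({"search_docs", "send_email", "http_request"})
session.add_context("Summary of inbound customer email ...", trusted=False)
print(session.available_tools())  # {'search_docs'}
```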
Even then, at least in the Bobby Tables scenario the disruption is immediately obvious, and the solution is straightforward: restore from backup (everyone has backups, don't they?). Much, much worse is a prompt injection attack that introduces subtle, unnoticeable errors in the data over an extended period of time.
At a minimum all inputs that lead to any data mutation need to be logged pretty much indefinitely, so that it's at least in the realm of possibility to backtrack and fix once such an attack is detected. But even then you could imagine multiple compounding transactions on that corrupted data spreading through the rest of the database. I cannot picture how such data corruption could feasibly be recovered from.
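A minimal sketch of that logging idea, assuming a simple tool-dispatch layer: every call to a mutating tool is appended to a durable audit log together with the input that triggered it, so corrupted records can at least be traced back later. The tool name and file path are assumptions.

```python
import json
import time


def audit_mutation(log_path: str, tool_name: str, prompt_context: str, args: dict) -> None:
    """Append one JSON line per data-mutating tool call for later forensics."""
    record = {
        "ts": time.time(),
        "tool": tool_name,
        "args": args,
        # Keep the triggering input so a poisoned instruction can be traced
        # back if subtle corruption is discovered months later.
        "prompt_context": prompt_context,
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


audit_mutation("mutations.log", "update_invoice",  # hypothetical tool name
               prompt_context="email from vendor@example.com ...",
               args={"invoice_id": 42, "amount": 1999.0})
```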
You have to treat LLMs as basically similar to human beings: they can be tricked, no matter how much training you give them. So if you give them root on all your boxes, while giving everyone in the world the ability to talk to them, you're going to get owned at some point.
Ultimately the way we fix this with human beings is by not giving them unrestricted access. Similarly, your LLM shouldn't be able to view data that isn't related to the person they're talking to; or modify other user data; etc.
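A sketch of that least-privilege idea at the data layer, with invented table and tool names: the retrieval function exposed to the model is bound server-side to the authenticated user's ID, so no prompt can widen its scope.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tickets (id INTEGER, subject TEXT, status TEXT, owner_id INTEGER)")
conn.execute("INSERT INTO tickets VALUES (1, 'VPN down', 'open', 7)")


def fetch_my_tickets(conn: sqlite3.Connection, authenticated_user_id: int) -> list:
    """Retrieval tool exposed to the model. The owner_id filter comes from the
    authenticated session, never from model output, so a prompt cannot widen
    the query to other users' rows."""
    return conn.execute(
        "SELECT id, subject, status FROM tickets WHERE owner_id = ?",
        (authenticated_user_id,),
    ).fetchall()


print(fetch_my_tickets(conn, authenticated_user_id=7))  # [(1, 'VPN down', 'open')]
```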
Yes! Increasingly I think that software developers consistently underanthropomorphize LLMs and get surprised by errors as a result.
Thinking of (current) LLMs as eager, scatter-brained, "book-smart" interns leads directly to understanding the overwhelming majority of LLM failure modes.
It is still possible to overanthropomorphize LLMs, but on the whole I see the industry consistently underanthropomorphizing them.
People focus too much on how they can succeed looking like smart humans, instead of protecting the system from how they can fail looking like humans that are malicious or mentally unwell.
Can you ever expect a deterministic finite automaton to solve problems that are within the NFA domain? Halting, incompleteness, undecidability (between code portions and data portions). Most posts seem to neglect these looming giant problems, pretending they don't exist at first and then being shocked when the problems happen. Quite blind.
Computation is just math. Probabilistic systems fail when they contain a mixture of both chaos and regularity; without determinism and its related properties at the control level, you have nothing bounding the system to its constraints so that it functions mathematically (i.e. determinism = mathematical relabeling), and thus it fails.
People need to be a bit more rational, manage risk, and realize that impossible problems exist, and that just because the benefits seem so tantalizing doesn't mean you should put your entire economy behind a false promise. Unfortunately, when resources are held by the few this is more probabilistically likely, and poor choices greatly impact larger swathes than necessary.
In this case the linked article is the leader for the better article in the same issue’s Science and Technology section.
[1] https://en.m.wikipedia.org/wiki/Inverted_pyramid_(journalism...
It felt like the analogy was a bit off, and it sounds like that's true to someone with knowledge in the actual domain.
"If a company, eager to offer a powerful ai assistant to its employees, gives an LLM access to untrusted data, the ability to read valuable secrets and the ability to communicate with the outside world at the same time" - that's quite the "if", and therein lies the problem. If your company is so enthusiastic to offer functionality that it does so at the cost of security (often knowingly), then you're not taking the situation seriously. And this is a great many companies at present.
"Unlike most software, LLMs are probabilistic ... A deterministic approach to safety is thus inadequate" - complete non-sequitur there. Why if a system is non-deterministic is a deterministic approach inadequate? That doesn't even pass the sniff test. That's like saying a virtual machine is inadequate to sandbox a process if the process does non-deterministic things - which is not a sensible argument.
As usual, these contrived analogies are taken beyond any reasonable measure and end up making the whole article have very little value. Skipping the analogies and using terminology relevant to the domain would be a good start - but that's probably not as easy to sell to The Economist.
https://www.quora.com/Why-does-The-Economist-sometimes-have-...
We are spending billions to build infrastructure on top of technology that is inherently deeply unpredictable, and we're just slapping all the guard rails on it we can. It's fucking nuts.
The issue that I find interesting is the answer isn't going to be as simple as "use prepared statements instead of sql strings and turn off services listening on ports you're not using", it's a lot harder than that with LLMs and may not even be possible.
I know, it sucks. But that's how the entire web was built. Every day you visit websites from foreign countries and click on extraneous links on HN that run code on your machine, next to a browser tab with your bank account, and nobody cares because it's all sandboxed and we really trust the sandboxing, even though it fails once in a while, has unknown bugs, or can simply be bypassed altogether by phishing or social engineering.
I'm curious if anybody has even attempted it; if there's even training data for this. Compartmentalization is a natural aspect of cognition in social creatures. I've even known dogs not to demonstrate knowledge of a food supply until they think they're not being observed. As a working professional with children, I need to compartmentalize: my social life, sensitive IP knowledge, my kid's private information, knowledge my kid isn't developmentally ready for, my internal thoughts, information I've gained from disreputable sources, and more. Intelligence may be important, but this is wisdom -- something that doesn't seem to be a first-class consideration if dogs and toddlers are in the lead.
> In March, researchers at Google proposed a system called CaMeL that uses two separate LLMs to get round some aspects of the lethal trifecta. One has access to untrusted data; the other has access to everything else. The trusted model turns verbal commands from a user into lines of code, with strict limits imposed on them. The untrusted model is restricted to filling in the blanks in the resulting order. This arrangement provides security guarantees, but at the cost of constraining the sorts of tasks the LLMs can perform.
This is the first I've heard of it, and seems clever. I'm curious how effective it is. Does it actually provide absolute security guarantees? What sorts of constraints does it have? I'm wondering if this is a real path forward or not.
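Based only on the description above (not the actual CaMeL code), a rough sketch of the dual-LLM pattern might look like this: a privileged planner chooses which tools run and leaves named holes, and a quarantined filler that sees the untrusted text may only supply values for those holes. All names and the plan format are invented for illustration.

```python
import re

# Tools the trusted plan is allowed to call; names are invented for this sketch.
ALLOWED_STEPS = {"fetch_email", "extract_field", "draft_reply"}
HOLE = re.compile(r"^\{\{(\w+)\}\}$")


def privileged_plan(user_request: str) -> list[dict]:
    """Stand-in for the trusted model: it alone decides which tools run and in
    what order, leaving {{holes}} for values derived from untrusted text."""
    return [
        {"step": "fetch_email", "args": {"id": "latest"}},
        {"step": "draft_reply", "args": {"invoice_number": "{{invoice_number}}"}},
    ]


def quarantined_fill(plan: list[dict], untrusted_text: str) -> list[dict]:
    """Stand-in for the quarantined model: it may only supply string values for
    the holes; it cannot add steps, pick tools, or change control flow."""
    for step in plan:
        if step["step"] not in ALLOWED_STEPS:
            raise ValueError(f"disallowed step: {step['step']}")
        for key, value in step["args"].items():
            if isinstance(value, str) and HOLE.match(value):
                # A real system would call the quarantined LLM on untrusted_text
                # here; this sketch just plugs in a fixed value.
                step["args"][key] = "INV-2025-0042"
    return plan


print(quarantined_fill(privileged_plan("pay the latest invoice"), "…email body…"))
```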
[1] https://www.economist.com/science-and-technology/2025/09/22/...
I'm very surprised I haven't come across it on HN before. CaMeL seems like it ought to be a front-page story here... the paper got 16 comments 5 months ago, which isn't much:
https://news.ycombinator.com/item?id=43733683
https://simonw.substack.com/p/the-lethal-trifecta-for-ai-age...
https://simonwillison.net/2025/Aug/9/bay-area-ai/
Discussed:
https://news.ycombinator.com/item?id=44846922
You don’t want to have a blanket policy since that makes it no longer useful, but you want to know when something bad is happening.