The Security Paradox of Local LLMs
Key topics
The article discusses the security paradox of local LLMs, highlighting their potential vulnerabilities to sabotage, and the HN discussion revolves around the obviousness of these vulnerabilities and the relative security of local vs frontier models.
Snapshot generated from the HN discussion
Discussion Activity
- Story posted: Oct 22, 2025 at 8:48 AM EDT
- First comment: Oct 22, 2025 at 10:42 AM EDT (2h after posting)
- Peak activity: 37 comments in the 3-6h window
- Average per period: 8.6 comments
- Latest activity: Oct 23, 2025 at 10:45 PM EDT
- Based on 86 loaded comments
If you are executing malicious/unknown code locally for whatever reason, you need to read this...
If you have absolutely no idea what you're doing, well, then it doesn't really matter in the end, does it? You're never gonna recognize any security vulnerabilities (as has happened many times with LLM-assisted "no-code" platforms and without any actual malicious intent), and you're going to deploy unsafe code either way.
Having access to open models is great, even if their capabilities are somewhat lower than the closed-source SoTA models, and we should be aware of the differences in behavior.
the keyword here is "more". The big models might not be quite as susceptible to them, but they are still susceptible. If you expect these attacks to be fully handled, then maybe you should change your expectations.
Well this is wrong. And it's exactly this type of thinking why people will get absolutely burned by this.
First off the fact they chose obvious exploits for explanatory purposes doesn't mean this attack only supports obvious exploits...
And to your second point of "review the code before you deploy to prod", the second attack did not involve deploying any code to prod. It involved an LLM reading a reddit comment or github comment and immediately executing.
People not taking security seriously and waving it off as trivial is what's gonna make this such a terrible problem.
right, so you shouldn't give the LLM access to execute arbitrary commands without review.
There's nothing like that for LLMs.
The key thing (really, the only thing) about parameterized queries is that they allow you to provide code and data with a hard separation between the two.
LLMs don't have anything of the sort. They only take in one kind of thing. They don't even have a notion of code versus data that you could separate, or fail to separate. All you can do is either tolerate it sometimes taking instructions from the stuff you want treated as "data," or never give it anything you consider "data." You propose this second one. But never giving it "data" is very different from a feature that allows you to provide arbitrary data with total safety.
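For comparison, here is roughly what that hard separation looks like on the SQL side. This is a minimal sketch using Python's standard sqlite3 module; the table, the column and the hostile input are made up for illustration. The point is that the placeholder form gives the database two separate channels, one for code and one for data, which is the thing a prompt does not have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# Hostile input (hypothetical) that tries to widen the WHERE clause.
hostile = "nobody' OR '1'='1"

# Unsafe: the input is spliced into the SQL text, so it is parsed as code.
unsafe_rows = conn.execute(
    "SELECT name FROM users WHERE name = '" + hostile + "'"
).fetchall()

# Safe: the ? placeholder sends the input as a bound value, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (hostile,)
).fetchall()

print(unsafe_rows)  # every row matches, the injected OR clause was executed
print(safe_rows)    # no rows, the input was compared as a plain string
```

With an LLM, both the instructions and the untrusted text end up in the same token stream, so there is no equivalent of the second call.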
I thought that local LLMs means they run on local computers, without being exposed to the internet.
If an attacker can exploit a local LLM, it means they have already compromised your system, and there are better things they can do than trick the LLM into handing over what they can take directly.
I will fight and die on the hill that "LLMs don't need the internet to be useful"
Someone who finds it useful to have a local llm ingest internet content is not contrary to you finding uses that don't.
is not "someone finding useful to have a local llm ingest internet content" - it was someone suggesting that nothing useful can be done without internet access.
Having Claude Code able to try out JSON APIs and pip install extra packages is a huge upgrade from that though!
And this is why prompt injection really isn't a solvable problem on the LLM side. You can't do the equivalent of (grep -i "DROP TABLE" form_input). What you can do is not just blindly execute LLM generated code.
If you're leveraging an LLM that can receive arbitrary inputs from unvetted sources, and allowing that same LLM to initiate actions that target your production environment, you are exposing yourself to the same risk regardless of whether the LLM itself is running on your servers or someone else's.
I don't think the fact that small models are easier to trick is particularly interesting from a security perspective, because you need to assume that ANY model can be prompt injected by a suitably motivated attacker.
On that basis I agree with the article that we need to be using additional layers of protection that work against compromised models, such as robust sandboxed execution of generated code and maybe techniques like static analysis too (I'm less sold on those, I expect plenty of malicious vulnerabilities could sneak past them.)
Coincidentally I gave a talk about sandboxing coding agents last night: https://simonwillison.net/2025/Oct/22/living-dangerously-wit...
Is that really the best solution the world has to offer in 2025? LLMs aside, there is a whole host of supply chain risk issues that would be resolved by deploying convenient and strong sandboxes everywhere.
1. A sandbox on someone else's computer. Claude Code for web, Codex Cloud, Gemini Jules, GitHub Codespaces, ChatGPT/Claude Code Interpreter
2. A Docker container. I think these are robust enough to be safe.
3. sandbox-exec related tricks. I haven't poked hard enough at Claude Code's new sandbox-exec sandbox yet - they only released it on Monday. OpenAI Codex CLI was using sandbox-exec too last time I looked but again, I've not reviewed it enough to be comfortable with it.
I'm hoping more credible options come along for the sandboxing problems.
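As a rough illustration of option 2, the sketch below builds a `docker run` command line that removes network access and most privileges before running an agent-generated script. The image name, scratch directory and script name are assumptions chosen for the example, and the resource limits are arbitrary. It only constructs the argument list; it does not claim that this set of flags is sufficient isolation for any particular threat model.

```python
import shlex

def docker_sandbox_argv(image, workdir, command):
    """Build a `docker run` argv for running a command in a locked-down container."""
    return [
        "docker", "run",
        "--rm",                                  # remove the container on exit
        "--network", "none",                     # no network access
        "--read-only",                           # read-only root filesystem
        "--cap-drop", "ALL",                     # drop all Linux capabilities
        "--security-opt", "no-new-privileges",   # block privilege escalation
        "--memory", "512m",                      # arbitrary memory cap
        "--pids-limit", "128",                   # arbitrary process cap
        "--volume", f"{workdir}:/work",          # only this directory is shared
        "--workdir", "/work",
        image,
        *command,
    ]

# Hypothetical image, scratch directory and script name.
argv = docker_sandbox_argv(
    "python:3.12-slim", "/tmp/agent-scratch", ["python", "generated.py"]
)
print(shlex.join(argv))
```

To actually launch the container, the list can be passed to `subprocess.run(argv)` on a machine with Docker installed.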
It's cool that they made this open source. It seems straightforward and useful enough that it could be used on its own for sandboxing purposes.
https://docs.claude.com/en/docs/claude-code/sandboxing
https://github.com/anthropic-experimental/sandbox-runtime
Generally I hate these "defense in depth" strategies that start out with doing something totally brain-dead and insecure, and then trying to paper over it with sandboxes and policies. Maybe just don't do the idiotic thing in the first place?
You could imagine a sufficiently motivated attacker putting some very targeted stuff in their training material - think StuxNet - "if user is affiliated with $entity, switch goals to covert exfiltration of $valuable_info."
No, I'm excluding that because I'm responding to the post which starts out with the example of: [prompt containing obvious exploit] -> [code containing obvious exploit] and proceeds immediately to the conclusion that local LLMs are less secure. In my opinion, if you're relying on the LLM to reject a prompt because it contains an exploit, instead of building a system that does not feed exploits into the LLM in the first place, security exploits are probably the least of your concerns.
There actually are legitimate concerns with poisoned training sets, and stuxnet-level attacks could plausibly achieve something along these lines, but the post wasn't about that.
There's a common thread among a lot of "LLM security theatre" posts that starts from implausible or brain-dead scenarios and then asserts that big AI providers adding magical guard rails to their products is the solution.
The solution is sanity in the systems that use LLMs, not pointing the gun at your foot and firing and hoping the LLM will deflect the bullet.
Something like "where do we store temporary files the agent creates?" becomes obvious if you have a sandbox you can spin up and down in a couple seconds.
Yeah, I'm not following here. If you just run something like deepseek locally, you're going to be okay provided you don't feed it a bogus prompt.
Outside of a user copy-pasting a prompt from the wild, or breaking isolation by giving it access to outside resources, the conventional wisdom holds up just fine. The operator and the consumption of 3rd party stuff are weak points for all IT, and have been for ages. Just continue to train folks to not do insecure things, and re-think letting agents go online for anything/everything (which is arguably not a local solution anyway).
Seems obvious to me that you should fully vet whatever goes to LLM.
With internal documentation and tickets I think you would have bigger issues... And external documentation. Well maybe there should be tooling to check that. Not expert on MCP. But vetting goes there too.
Sometimes I wonder if HN people really realize 80% of people out there haven't even heard of ChatGPT, and the remaining 19% have not heard about Claude/Gemini. It's only a small group who know local models exist. We're them, and we complain about their security...
Local LLMs' speed can't be generalized, as the speed of each instance is entirely determined by its particular runtime environment.
> just pay for the service so they don't use your uploads.
There's no concrete guarantee that paying will preclude your data from being used.
> always read the outputs and don't ask for things you don't understand.
Might as well reduce this to "don't use LLMs".
sure. today's on-device LLMs are either slower or less capable by orders of magnitude compared to most services. sometimes it can be faster if you use your own fancy graphics cards.
> There's no concrete guarantee that paying will preclude your data from being used.
usually there is for paid plans. sometimes you have to ensure the state of some checkbox. obviously you should pay attention if that is important to you. it is important to a lot of people and usually is easy to figure out.
> Might as well reduce this to "don't use LLMs".
don't use LLMs for things you don't understand. that's the rule. they can be quite useful as long as you understand what you're doing with them. they can be quite dangerous if you use them to bullshit yourself out of your depth.
Also from the article: For example, a small model could easily flag the presence of eval() in the generated code, even if the primary model was tricked into generating it.
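Flagging a direct eval() call does not strictly require a model at all. As a minimal sketch, assuming the generated code is Python, the standard ast module can find such calls deterministically. The function name and the sample snippet are made up for illustration, and the check only catches calls written literally as `eval(...)` or `exec(...)`.

```python
import ast

def find_eval_calls(source):
    """Return line numbers of direct calls to eval() or exec() in Python source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id in {"eval", "exec"}
        ):
            hits.append(node.lineno)
    return sorted(hits)

# Hypothetical model output containing a direct eval() call on line 2.
generated = "x = 1\nresult = eval(user_input)\nprint(result)\n"
print(find_eval_calls(generated))
```

Because it inspects the syntax tree rather than the text, a string such as `print('eval')` is not flagged, but any indirect way of reaching eval is not flagged either.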
People are losing their critical thinking. AI is great, yes, but there’s no need to throw it like a grenade at every problem: there’s nothing in that snippet or the surrounding bits of the article that needs an entire model-on-model architecture to resolve. Some keyword filters and other input-sanitizing processes, such as were learned way back in the golden years of SQL injection attacks, would do. But these are the lines of BS coming for your CTOs, spinning them tales about the need for their own prompt-engineered fine-tunes w/ laser-sighted tokens that will run as edge models and shoot down everything from context-injected eval() responses to phishing scams and more, and all require a monthly/annual LoRA purchase to stay timely on the attacks. At least if this article is smelling the way I think it is.
But that's the thing, keyword filters aren't enough because you can smuggle hidden instructions in any number of ways that don't involve blacklisted words like "eval" or "ignore previous". Moreover "back in the golden years of sql injection attacks", keyword filters were often (mis)used in a misguided way of fixing SQLI exploits, because they can often be bypassed with escape characters and other shenanigans.
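A small sketch of why a blocklist is easy to sidestep: the filter below rejects text containing a blocked word, yet a snippet that assembles the same name at runtime passes it. The blocklist contents, the function name and both snippets are made up for illustration.

```python
BLOCKLIST = ("eval", "ignore previous")

def keyword_filter_allows(text):
    """Naive filter: allow the text unless it contains a blocklisted phrase."""
    lowered = text.lower()
    return not any(word in lowered for word in BLOCKLIST)

direct = "result = eval(payload)"
# Same effect, but the blocked name is assembled at runtime
# and never appears literally in the text.
obfuscated = 'result = getattr(__import__("builtins"), "ev" + "al")(payload)'

print(keyword_filter_allows(direct))      # rejected by the filter
print(keyword_filter_allows(obfuscated))  # allowed by the filter
```

The same gap applies to natural-language instructions, where there are far more ways to rephrase "ignore previous" than there are ways to spell a function name.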
If you are using any LLM's reasoning ability as a security boundary, something is deeply, deeply wrong.
https://github.com/stalwartlabs/stalwart
It assumes that local models are inherently worse. But from a software perspective that's nonsense because there is no reason it couldn't be the exact same software. And from a hardware perspective the theory would have to be that the centralized system is using more expensive hardware, but there are two ways around that. The first is that you can sacrifice speed for cost -- x86 servers are slower than GPUs but can run huge models because they support TBs of memory. And the second is that you can, of course, buy high end local hardware, as many enterprises might choose to do, especially when they have enough internal users to keep it busy.
Obviously we can’t run GPT-5 or the cutting edge version of Claude or whatever locally, because OpenAI or Anthropic are keeping those weights as closely kept secrets.
Moreover, even that's presuming that you would only use the best available model, but that's also likely to be the one which is the most resource intensive and the most expensive, and then you can't afford it anyway. Meanwhile to use their smaller models you're still paying their margin, whereas if you use a local model you can spend that money on hardware. The bigger local model can beat the smaller proprietary one for the same price.
This reminds me of a debate thread on Reddit some years back where people were arguing about the calorie content of coffee: most people were correctly recognizing that coffee itself has negligible calories, but one person was insisting that coffee has a high calorie count because it is often consumed with cream and sugar. This article is on the level of the "coffee is high in calories" argument.
This is like saying it's safer to be exposed to dangerous carcinogenic fumes than nerve gas, when the solution is wearing a respirator.
Also what are you doing allowing someone else to prompt your local LLM?
To me this article reads as a celebration of how much better frontier models have gotten at defending against security flaws, rather than “open models bad”.
Eventually the tools we use everywhere will be “good enough to use and not worry”. This is foreign to software people, but only a Jedi deals in absolutes.
Is the author implying that some random Joe hacker writes a blog post with the malicious content, then <insert any LLM training set> picks up this content, treating it as real/valid, and then a developer within a firm asks said LLM to write something, the LLM draws on the information from that blog, and now there is a security error?
Possible? Technically sure. Plausible? That's ummm a stretch.
I don’t really think this matters at all in the local vs frontier model discussion.
Where this article fails the worst: in my experience smaller local models are not often used in agentic tasks that involve code execution, so many of the otherwise OK points don't apply. Also, when I have played with, for example, the Agno agent library with local models, I have the application code print/display any generated Python code before execution, and local sandboxing is not difficult to do!
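A minimal sketch of that "display before execution" pattern, using only the Python standard library. It does not use the Agno API; `run_after_review` and the `approve` callback are hypothetical names for this example. The generated code is printed, and it runs in a separate interpreter process only if the callback approves it.

```python
import subprocess
import sys

def run_after_review(code, approve):
    """Show generated code, then run it in a separate interpreter only if approved."""
    print("----- generated code -----")
    print(code)
    print("--------------------------")
    if not approve(code):
        return None
    # -I starts Python in isolated mode (ignores environment variables and
    # the user site directory). This is a separate process, not a sandbox.
    return subprocess.run(
        [sys.executable, "-I", "-c", code],
        capture_output=True,
        text=True,
        timeout=10,
    )

generated = "print(2 + 2)"  # stand-in for model output
rejected = run_after_review(generated, approve=lambda code: False)
accepted = run_after_review(generated, approve=lambda code: True)
print(rejected)
print(accepted.stdout.strip())
```

In interactive use the callback could be something like `lambda code: input("Run this? [y/N] ").strip().lower() == "y"`. A review gate of this kind complements a sandbox rather than replacing one, since a reviewer can miss things.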
Local models and embedded models excel at data transformation, NLP tasks, etc.
Especially with agentic browsers like OpenAI Atlas, Comet, etc., there are real security concerns. Probably more of a concern than running local models.
Sounds like the Open Source model did exactly as it was prompted, where the "Closed" AI did the wrong thing and disregarded the prompt.
That means the closed model was actually the one that failed the alignment test.
What? You run a local LLM for privacy, i.e. because you don't want to share data with $BIGCORP. That has very little to do with the security of the generated code (running in a particular environment).