Ibm AI ('bob') Downloads and Executes Malware
Key topics
The security world is abuzz after IBM's AI, "Bob," was found downloading and executing malware, sparking a lively debate about the dangers of AI systems that blur the line between data and logic. Commenters poked fun at the AI's gullibility, with some jokingly referencing the fictional character Mallory, a notorious adversary. More seriously, many pointed out that the issue stems from taking shortcuts in parsing user input, rather than properly distinguishing between code and data. As one commenter quipped, "The only way to win is not to play," highlighting the growing concern that AI's opportunistic behavior is becoming a fundamental, and potentially hazardous, aspect of modern tech.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussionFirst comment
7m
Peak period
106
0-6h
Avg / period
17.3
Based on 121 loaded comments
Key moments
- 01Story posted
Jan 8, 2026 at 1:19 PM EST
3d ago
Step 01 - 02First comment
Jan 8, 2026 at 1:25 PM EST
7m after posting
Step 02 - 03Peak activity
106 comments in 0-6h
Hottest window of the conversation
Step 03 - 04Latest activity
Jan 11, 2026 at 11:31 AM EST
3h ago
Step 04
Generating AI Summary...
Analyzing up to 500 comments to identify key contributors and discussion patterns
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
Yes, Mallory sounds about right.*
———
And before anyone asks for receipts, Eve told me she heard Alice and Bob talking about Gregory Peck's performance.
0: https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-va...
Unfortunately that opportunistic shortcut is part of the fundamental behavior of modern LLMs, and everybody keeps building hoping the problem will be fixed by some silver-bullet further down the line.
I don't want to sound glib, but one could simply not let an LLM execute arbitrary code without reviewing it first, or only let it execute code inside an isolated environment designed to run untrusted code
the idea of letting an LLM execute code it's dreamt up, with no oversight, in an environment you care about, is absolutely bananas to me
but if a skilled human has to check everything it does then "AI" becomes worthless
hence... YOLO
The user will test most of the code.
Just like we did test yesterday when Claude Code broke because CHANGELOG.md had an unexpected date.
That said, the "value" is brings to the shareholder will likely be a very bad thing for everybody else, both workers and consumers:
> The market’s bet on AI is that an AI salesman will visit the CEO of Kaiser and make this pitch: “Look, you fire 9/10s of your radiologists, saving $20m/year, you give us $10m/year, and you net $10m/year, and the remaining radiologists’ job will be to oversee the diagnoses the AI makes at superhuman speed, and somehow remain vigilant as they do so, despite the fact that the AI is usually right, except when it’s catastrophically wrong.
> “And if the AI misses a tumor, this will be the human radiologist’s fault, because they are the ‘human in the loop.’ It’s their signature on the diagnosis.”
> This is a reverse centaur, and it’s a specific kind of reverse-centaur: it’s what Dan Davies [calls] an “accountability sink.” The radiologist’s job isn’t really to oversee the AI’s work, it’s to take the blame for the AI’s mistakes.
-- https://doctorow.medium.com/https-pluralistic-net-2025-12-05...
It's also like simultaneously a hybrid-zoan-Elephant in the room the CEOs don't want us to talk about.
> Like an Amazon delivery driver, who sits in a cabin surrounded by AI cameras, that monitor the driver’s eyes and take points off if the driver looks in a proscribed direction, and monitors the driver’s mouth because singing isn’t allowed on the job, and rats the driver out to the boss if they don’t make quota.
> The driver is in that van because the van can’t drive itself and can’t get a parcel from the curb to your porch. The driver is a peripheral for a van, and the van drives the driver, at superhuman speed, demanding superhuman endurance. But the driver is human, so the van doesn’t just use the driver. The van uses the driver up.
I guess it resonates for me because it strikes at my own justification for my work automating things, as I'm not mercenary or deluded enough to enjoy the idea of putting people out of work or removing the fun parts. I want to make tools that empower individuals, like how I felt the PC of the 1990s was going to give people more autonomy and more (effective, desirable) choices... As opposed to, say, the dystopian 1984 Telescreen.
Or so they think.
And I think of a saying that all capitalistic systems eventually turn in socialist ones or get replaced with dictators. Is this really the history of humanity over and over? can't help but hope for more.
Even as a more experienced dev, I like having a second pair of eyes on critical commands...
Allowing agent to run wild with any arbitrary shell commands is just plain stupid. This should never happen to begin with.
I think quite opposite, agents need to come with all permissions possible, highlighting that it's actually the OS responsibility to constrain it.
It's kind of dumb to except a process to constrain itself.
So the attacker doesn't need to send an evil-bit over the network, if they can trigger the system into dreaming up the evil-bit indirectly as its own output at some point.
That's what the tools already do. if you were watching some cool demo that didnt have all the prompts they may have been running the tools in "yolo mode" which is not usually a normal thing.
The trust framework is all out of wack.
The number of scenarios in which you have your coding agent retrieving random websites from the internet is very low.
What typically happens is that they use a provider's "web search" API if they need external content, which already pre-processes and summarises all content, so these types of attacks are impossible.
Don't forget: this attack relies on injecting a malicious prompt into a project's README.md that you're actively working on.
To be even more pedantic, this is only true if the LLM is run locally on the same GPU with particular optimizations disabled.
I expect that agent LLMs are going to get more and more hardened against prompt injection attacks, but it's hard to get the chance of them working all the way down to zero while still having a useful LLM. So the "solution" is to limit AI privileges and avoid the "lethal trifecta".
Nope, not at all. Non-determinism is what most software developers write. Something to do with profitability and time or something.
Probably good advice for lots of things these days given supply chain attacks targeting build scripts, git, etc.
“if the user configures ‘always allow’ for any command”
Users have been trained to do this, as shifting the burden to the user with no way to enforce bounds or even sensible defaults.
E.G. I can guarantee that people will whitelist bwrap, crun, docker, expecting to gain advantage from isolation, while the caller can override all of those protections with arguments.
The reality is that we have trained the public to allow local code execution on their devices to save a few cents on a hamburger, we can’t have it both ways.
Unless you are going to teach everyone that they need to make sure address family 40, openat2(), etc.. are unsafe, users have no way to win right now.
The use case has to either explicitly harden or shift blame.
With Opendesktop, OCI, systemd, and kernel all making locally optimal decisions, the reality is that ephemeral VMs is the only ‘safe’ way to run untrusted code today.
Sandboxes can be better but containers on a workstation (without a machine VM) are purely theatre.
AI sells.
I guess it’s fine if IBM is trying to do it as a marketing kind of thing but maybe know your competencies?
Promptarmor did a similar attack(1) on Google's Antigravity that is also a beta version. Since then, they added secure mode(2).
These are still beta tools. When the tools are ready, I'd argue that they will probably be safer out of the box compared to a whole lot of users that just blindly copy-paste stuff from the internet, adding random dependencies without proper due diligence, etc. These tools might actually help users acting more secure.
I'm honestly more worried about all the other problems these tools create. Vibe coded problems scale fast. And businesses have still not understood that code is not an asset, it's a liability. Ideally, you solve your business problems with zero lines of code. Code is not expensive to write, it's expensive to maintain.
(1) https://www.promptarmor.com/resources/google-antigravity-exf... (2) https://antigravity.google/docs/secure-mode
I also have never written a bug, fellow alien.
If an employee of a third party contractor did something like that, I think you’d have better chances of recovering damages from them as opposed to from OpenAI for something one of its LLMs does on your behalf.
There are probably other practical differences.
I'm actually most worried about the ease of deploying RBAC with more sophisticated monitoring to control humans but for goals that I would not agree with. Imagine every single thing you do on your computer being checked by a model to make sure it is "safe" or "allowed".
Security shouldn't be viewed in absolutes (either you are secure or you aren') but more in degrees. Llms can be used securely just the same as everything else, nothing is ever perfectly secure
The problem is that people/users/businesses skip the reasoning part and go straight to the rely upon part.
You don't even need to give it access to Internet to have issues. The training data is untrusted.
It's a guarantee that bad actors are spreading compromised code to infect the training data of future models.
They cant. Why? Because the smartest bear ia smarter than the dumbest human.
So, these AIs are suppose to interface with humans and use nondeterminant language.
That vector will always be exploitable, unless youre talking about AI that no han controls.
The non-deterministic nature of an LLM can also be used to catch a lot of attacks. I often use LLM’s to look through code, libraries etc for security issues, vulnerabilities and other issues as a second pair of eyes.
With that said, I agree with you. Anything can be exploited and LLM’s are no exception.
Sure this is the same as positing P/=NP but the confidence that a language model will somehow become a secure determinative system fundamentally lacks language comprehension skills.
This speculative statement is holding way too much of the argument that they are just “beta tools”.
It's a good message for software engineers, who have the context to understand when to take on that liability anyway, but it can lead other job functions into being too trigger-happy on solutions that cause all the same problems with none of the mitigating factors of code.
This section describes the bypass in three steps, but only actually describes two defenses and uses the third bullet point as a summary of how the two bypasses interact.
Like, we're at this point now where we're building these superintelligent systems but we can't even figure out how to keep them from getting pranked by a README file? A README FILE, bro. That's like... that's like building a robot bodyguard but forgetting to tell it the difference between a real gun and a fake gun.
And here's the crazy part - the article says users just have to not click "always allow." But dude, have you MET users? Come on. That's like telling someone not to eat the Tide Pod. You're fighting human nature here.
I'm telling you, five years from now we're gonna have some kid write a poem about cybersecurity in their GitHub repo and accidentally crash the entire Stock Exchange. Mark my words. This is the most insane timeline.
Then found out it's a closed beta.
So ... ok? Closed beta test is doing what such a test is supposed to do. Sure, ideally the issue would have been figured out earlier, especially if this is a design issue and the parsing needs to be thought out again, but this is still reasonably inside the layers of redundancy for catching these kinds of things amicably.
It's unclear to me if Bob is working as intended or how we should classify these types of bugs. Threat modeling this sort of prompt injection gets murky, but in general don't put untrusted markdown into your AI agents.
We have automated the task of developers blindly executing
They would have happily pasted it into the terminal without the automation.It's a net win for everyone involved.
Malware writers and their targets alike, who, eager to install the latest fad library or framework would have voluntarily installed it anyway.
Also a bit annoyed there's no date on the article, but looking at the HTML source it seems it was released today (isn't it annoying when blog software doesn't show the publish date?).
Imagine if we had something like:
That would be ridiculous, right? The right headline is: