Hidden Risk in Notion 3.0 AI Agents: Web Search Tool Abuse for Data Exfiltration
Posted4 months agoActive4 months ago
codeintegrity.aiTechstoryHigh profile
calmnegative
Debate
70/100
AI SecurityNotionLLM Vulnerabilities
Key topics
AI Security
Notion
LLM Vulnerabilities
The article highlights a potential vulnerability in Notion 3.0's AI agents that can be exploited for data exfiltration, sparking a discussion on the risks of integrating LLMs with external data sources and tools.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussionFirst comment
28m
Peak period
33
0-12h
Avg / period
10.8
Comment distribution54 data points
Loading chart...
Based on 54 loaded comments
Key moments
- 01Story posted
Sep 19, 2025 at 5:49 PM EDT
4 months ago
Step 01 - 02First comment
Sep 19, 2025 at 6:18 PM EDT
28m after posting
Step 02 - 03Peak activity
33 comments in 0-12h
Hottest window of the conversation
Step 03 - 04Latest activity
Sep 25, 2025 at 2:20 PM EDT
4 months ago
Step 04
Generating AI Summary...
Analyzing up to 500 comments to identify key contributors and discussion patterns
ID: 45307095Type: storyLast synced: 11/20/2025, 8:37:21 PM
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
Oh I see someone's updated the URL so now this is just a dupe of that submission (it was formerly linked to a tweet)
There are many ways
There are plenty of other possibilities though, especially once you start booking up MCPs that can see public issue trackers or incoming emails.
That means any industry-known documentation that seems good for bookmarking can be a good target.
https://simonwillison.net/2023/Oct/14/multi-modal-prompt-inj...
What if instead of just lots of text fed to an LLM we have a data structure with trusted and untrusted data.
Any response on a call to a web search or MCP is considered untrusted by default (tunable if you also wrote the MCP and trust it).
The you limit tbe operations on untrusted data to pure transformations, no side effects.
E.g. run an LLM to summarize, or remove whitespace, convert to float etc. All these done in a sandbox without network access.
For example:
"Get me all public github issues on this repo, summarise and store in this DB."
Although the command reads public information untrusted and has DB access it will only process the untrusted information in a tight sandbox and so this can be done securely. I think!
(by database access, I'm assuming you'd be planning to ask the LLM to write SQL code which this system would run)
Instead, you would ask your LLM to create an object containing the structured data about those github issues (ID, title, description, timestamp, etc) and then you would run a separate `storeGitHubIssues()` method that uses prepared statements to avoid SQL injection.
You could also get the LLM to "vibe code" the SQL. Tbis is somewhat dangerous as the LLM might make mistakes, but the main thing I am talking about hete is how not to be "influenced" by text in data and so be susceptible to that sort of attack.
Yes, this can be done safely.
If you think of it through the "lethal trifecta" framing, to stay safe from data stealing attacks you need to avoid having all three of exposure to untrusted content, exposure to private data and an exfiltration vector.
Here you're actually avoiding two out of them: - there's no private data (just public issue access) and no mechanism that can exfiltrate, so the worst a malicious instruction can do is cause incorrect data to rewritten to your database.
You have to be careful when designing that sandboxed database tool but that's not too hard too get right.
Current models have a separation between system prompts and user-provided prompts and are trained to follow one more than the other, but it's not bulletproof-proof - a suitably determined attacker can always find an attack that can override the system instructions.
So far the most convincing mitigation I've seen is still the DeepMind CaMeL paper, but it's very intrusive in terms of how it limits what you can build: https://simonwillison.net/2025/Apr/11/camel/
I have a theory that a lot of prompt injection is due to a lack of hierarchical structure in the input. You can tell that when I write [reply] in the middle of my comment it's part of the comment body and not the actual end of it. If you view the entire world through the lense of a flat linear text stream though it gets harder. You can add xml style <external></external> tags wrapping stuff, but that requires remembering where you are for an unbounded length of time, easier to forget than direct tagging of data.
All of this is probability though, no guarantees with this kind of approach.
if the user doesn't have access to the data, the LLM shouldn't either - it's so weird that these companies are letting these things run wild, they're not magic
any company with AI security problems likely has tons of holes elsewhere, they're just easier to find with AI
Feel free to email me at abi@codeintegrity.ai — happy to share more
well then
The current problem is that making the models resistant to "persona" injection is in opposition to much of how the models are also used conversationally. I think this is why you'll end up with hardened "agent" models and then more open conversational models.
I suppose it is also possible that the models can have an additional non-prompt context applied that sets expectations, but that requires new architecture for those inputs.
Any distinctions inside the document involve the land of statistical patterns and weights, rather than hard auditable logic.
Both trick a privileged actor into doing something the user didn't intend using inputs the system trusts.
In this case, a malicious PDF that uses prompt-injection to get a Notion agent (which already has access to your workspace) to call an external web-tool and exfiltrate page content. Tjhis is simialr to CSRF's core idea - an attacker causes an authenticated principal to make a request - except here the "principal" is an autonomous agent with tool access rather than the browser carrying cookies.
Thus, same abuse-of-privilege pattern, just with different technical surface (prompt-injection + tool chaining vs. forged browser HTTP requests).
This is a terrible description of the lethal trifecta, it lists 3 things but they are not the trifecta. The trifecta happens to be contained in the things listed in this (and other) examples but it's stated as if the trifecta is listed here, when it is not.
The trifecta is: access to your private data, exposure to untrusted content, and the ability to externally communicate. Web search as tool for an LLM agent is both exposure to untrusted content and the ability to externally communicate.
Is a slightly more useful flattening/reduction of the problem that I’m still wordsmithing and evangelizing.
It’s:
* Untrusted input
* Privileged access
* Exfiltration vector
I think the reason for the original wording, which I pasted from the post it was coined in, is to make it more accessible than this, more obvious what you need to look out for.
"Untrusted input" sounds like something I'm not gonna give an agent, "access to untrusted content" sounds like something I need to look out for. "Privileged access" also sounds like something I'm not gonna give it, while "access to my private data" is the whole reason I'm using it.
"Exfiltration vector" may not even be a phrase many understand, "ability to communicate externally" is better although I think this could use more work, it is not obvious to many people that stuff like web search counts here.
It used user-agent Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/139.0.0.0 Safari/537.36 and connected from an IPv6 address of 2600:1f14:1c1:bf05:50ec::13