Giving Users Choice with Cloudflare's New Content Signals Policy
Posted 3 months ago · Active 3 months ago
blog.cloudflare.com · Tech story
Sentiment: controversial, negative
Debate score: 80/100
Key topics
AI
Content Regulation
Web Standards
Copyright
Cloudflare introduces a new Content Signals Policy allowing website owners to restrict AI model training on their content, sparking debate about the implications for web freedom and AI development.
Snapshot generated from the HN discussion
Discussion Activity
First comment: 47m after posting
Peak period: 15 comments in 0-3h
Average per period: 5.3
Comment distribution: 21 data points (based on 21 loaded comments)
Key moments
1. Story posted: Sep 24, 2025 at 2:20 PM EDT (3 months ago)
2. First comment: Sep 24, 2025 at 3:07 PM EDT (47m after posting)
3. Peak activity: 15 comments in 0-3h (the hottest window of the conversation)
4. Latest activity: Sep 26, 2025 at 10:02 AM EDT (3 months ago)
ID: 45364103 · Type: story · Last synced: 11/20/2025, 3:29:00 PM
The centralized walled gardens we have today are a direct result of people confusing "free" to mean "no cost" instead of "I can do whatever I want with it."
They just don't grow as fast as the bad parts of the web.
But like email, we're perhaps drowning in spam.
For the crawlers that ignore robots.txt, nothing changes, of course; and for the ones that claim to support a training opt-out, you just have to take them at their word that it actually does anything.
(Yes, I realize that was more about ad companies refusing to accept it as an "opt-in", treating it only as an "opt-out".)
They add to the robots.txt file:
"As a condition of accessing this website, you agree to abide by the following content signals ..."
which means robot operators are now better off never downloading the robots.txt file at all, because then they know for sure they won't accidentally encounter these conditions.
This creates a legal risk if you try to adhere to the robots.txt, so it'll make future AI bots less controllable and less restricted.
This isn't even like a shrinkwrap license, where the bot would at least have to press an "I agree" button.
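For context on the mechanism being debated: under Cloudflare's policy, content signals are expressed as `Content-Signal` lines inside robots.txt. A hedged sketch of what such a file might look like (the `search`, `ai-input`, and `ai-train` signal names follow Cloudflare's published vocabulary; the exact file below is illustrative, not copied from any real site):

```txt
# Illustrative robots.txt using Cloudflare-style content signals.
# Signal names follow the published policy; this file is a sketch.
User-Agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /
```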
Cloudflare's other initiative where they help websites to block bots seems more likely to work.
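For crawlers that do choose to honor the signals, the parsing side is straightforward. A minimal, hypothetical Python sketch (the `Content-Signal` directive name follows Cloudflare's published policy; this parser is illustrative, not an official implementation):

```python
# Minimal sketch of reading Content-Signal lines from a robots.txt body.
# Hypothetical helper; not part of any official library.

def parse_content_signals(robots_txt: str) -> dict:
    """Return {signal: bool} from 'Content-Signal: a=yes, b=no' lines."""
    signals = {}
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop robots.txt comments
        if not line.lower().startswith("content-signal:"):
            continue
        _, _, value = line.partition(":")
        for pair in value.split(","):
            if "=" in pair:
                key, _, val = pair.partition("=")
                signals[key.strip().lower()] = val.strip().lower() == "yes"
    return signals

example = """User-Agent: *
Content-Signal: search=yes, ai-train=no
Allow: /"""

print(parse_content_signals(example))  # {'search': True, 'ai-train': False}
```

Note that a parser like this only helps cooperative crawlers; as the comments above point out, it does nothing about crawlers that never fetch robots.txt in the first place.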
On the other hand, if an LLM ingests this text, maybe it will simply be confused by it.
> # ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
But that legal restriction only applies to Europeans, so this only prevents European AI companies from developing models on par with American and Chinese ones. Correct me if I'm wrong.
Also, open-source AI models like Llama and DeepSeek will be trained on all websites and released for free anyway, and those models can be used by Europeans, so in practice this policy won't really keep your website out of AI use.
Ultimately it's just serving to prevent Europeans from developing equally good AI models and AI search apps.
It would also stop the brain drain.
There's also Really Simple Licensing[3] that covers these concepts (with additional capabilities and a slightly different purpose/angle).
References:
[0] https://datatracker.ietf.org/wg/aipref/about/
[1] https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/
[2] https://datatracker.ietf.org/doc/draft-ietf-aipref-attach/
[3] https://rslstandard.org/
https://worldprivacyforum.org/posts/privacy-identity-and-tru...
Not that it matters much anyway, since thankfully users always have the choice to just ignore it.