Giving Users Choice with Cloudflare's New Content Signals Policy
Posted 3 months ago · Active 3 months ago
blog.cloudflare.com · Tech story
Sentiment: controversial, negative
Debate score: 80/100
Key topics
AI
Content Regulation
Web Standards
Copyright
Cloudflare introduces a new Content Signals Policy allowing website owners to restrict AI model training on their content, sparking debate about the implications for web freedom and AI development.
Snapshot generated from the HN discussion
Discussion Activity
First comment: 47m after posting
Peak period: 15 comments in 0-3h
Average per period: 5.3
Comment distribution: 21 data points (based on 21 loaded comments)
Key moments
1. Story posted: Sep 24, 2025 at 2:20 PM EDT (3 months ago)
2. First comment: Sep 24, 2025 at 3:07 PM EDT (47m after posting)
3. Peak activity: 15 comments in 0-3h (the hottest window of the conversation)
4. Latest activity: Sep 26, 2025 at 10:02 AM EDT (3 months ago)
ID: 45364103 · Type: story · Last synced: 11/20/2025, 3:29:00 PM
The centralized walled gardens we have today are a direct result of people confusing "free" to mean "no cost" instead of "I can do whatever I want with it."
They just don't grow as fast as the bad parts of the web.
But like email, we're perhaps drowning in spam.
For the crawlers that ignore robots.txt, nothing changes, of course; and for the ones that claim to support a training opt-out, you just have to take them at their word that it actually does anything.
(Yes, I realize that was more about ad companies refusing to accept it as an "opt-in", treating it only as an "opt-out".)
They add to the robots.txt file:
"As a condition of accessing this website, you agree to abide by the following content signals ..."
which means robot operators are now better off never downloading the robots.txt file at all, because then they know for sure they won't accidentally encounter these conditions.
This creates a legal risk if you try to adhere to the robots.txt, so it'll make future AI bots less controllable and less restricted.
This isn't even like a shrinkwrap license, where the bot would at least have to press an "I agree" button.
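For context on the mechanism being debated: under Cloudflare's policy, content signals are expressed as `Content-Signal` lines inside robots.txt. A hedged sketch of what such a file might look like (the `search`, `ai-input`, and `ai-train` signal names follow Cloudflare's published vocabulary; the exact file below is illustrative, not copied from any real site):

```txt
# Illustrative robots.txt using Cloudflare-style content signals.
# Signal names follow the published policy; this file is a sketch.
User-Agent: *
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /
```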
Cloudflare's other initiative where they help websites to block bots seems more likely to work.
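For crawlers that do choose to honor the signals, the parsing side is straightforward. A minimal, hypothetical Python sketch (the `Content-Signal` directive name follows Cloudflare's published policy; this parser is illustrative, not an official implementation):

```python
# Minimal sketch of reading Content-Signal lines from a robots.txt body.
# Hypothetical helper; not part of any official library.

def parse_content_signals(robots_txt: str) -> dict:
    """Return {signal: bool} from 'Content-Signal: a=yes, b=no' lines."""
    signals = {}
    for line in robots_txt.splitlines():
        line = line.split("#", 1)[0].strip()  # drop robots.txt comments
        if not line.lower().startswith("content-signal:"):
            continue
        _, _, value = line.partition(":")
        for pair in value.split(","):
            if "=" in pair:
                key, _, val = pair.partition("=")
                signals[key.strip().lower()] = val.strip().lower() == "yes"
    return signals

example = """User-Agent: *
Content-Signal: search=yes, ai-train=no
Allow: /"""

print(parse_content_signals(example))  # {'search': True, 'ai-train': False}
```

Note that a parser like this only helps cooperative crawlers; as the comments above point out, it does nothing about crawlers that never fetch robots.txt in the first place.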
On the other hand, if an LLM ingests this text, maybe it will simply be confused by it.
> # ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.
But that legal restriction only applies to Europeans, so this only prevents European AI companies from developing models on par with American and Chinese ones. Correct me if I'm wrong.
Also, open-source AI models like Llama and DeepSeek will be trained on all websites and released for free anyway, and those models can be used by Europeans, so in practice this policy won't really keep your website out of AI use.
Ultimately it's just serving to prevent Europeans from developing equally good AI models and AI search apps.
It would also stop the brain drain.
There's also Really Simple Licensing[3] that covers these concepts (with additional capabilities and a slightly different purpose/angle).
References:
[0] https://datatracker.ietf.org/wg/aipref/about/
[1] https://datatracker.ietf.org/doc/draft-ietf-aipref-vocab/
[2] https://datatracker.ietf.org/doc/draft-ietf-aipref-attach/
[3] https://rslstandard.org/
https://worldprivacyforum.org/posts/privacy-identity-and-tru...
Not that it matters much anyway, since thankfully users always have the choice to just ignore it.