Web Bot Auth
Key topics
The web is ablaze with debate over Cloudflare's verified bots program, with some hailing it as a game-changer for blocking malicious traffic, while others cry foul, claiming it unfairly restricts legitimate bots and concentrates too much power in Cloudflare's hands. Proponents argue that bot blocking should be the default, protecting sites from scraping and abuse, while detractors see it as discriminatory against robots and a step towards a more locked-down internet. As one commenter noted, the program's effectiveness is also being questioned, with some wondering if Cloudflare can truly stop AI-powered bots. Amidst the heated discussion, a surprising consensus emerged: one site owner reported a year of success with Cloudflare's "bot super fight mode," enjoying a significant reduction in abusive traffic without blocking legitimate users.
Snapshot generated from the HN discussion
Discussion Activity
- Very active discussion
- First comment: 49m after posting
- Peak period: 22 comments in 0-3h
- Average per period: 5.7
- Based on 74 loaded comments
Key moments
1. Story posted: Aug 28, 2025 at 2:35 PM EDT (4 months ago)
2. First comment: Aug 28, 2025 at 3:25 PM EDT (49m after posting)
3. Peak activity: 22 comments in 0-3h (hottest window of the conversation)
4. Latest activity: Aug 30, 2025 at 4:43 AM EDT (4 months ago)
In the end, only people with non-mainstream browsers (or using VPN to escape country-level blocks, or Tor, or noJS) suffer.
It's like how anti-piracy measures only affect paying customers, while pirates ironically get a better experience. The best way to get around endless CAPTCHAs is to just use LLMs instead.
You would... lead your response with that argument? This has nothing to do with DRM. When people talk about how bots suck, the focus is on billion or trillion dollar businesses making everyone on the web pay.
There's also a reason why the bot conversation flared up; we've always had bots, but before the conversation centered on Google and SEO. Now the conversation centers on companies like OpenAI.
That's the entire point.
Thing is, my browser isn’t configured that way. So it works well, I guess.
How are you so sure of that? Their marketing?
The internet was designed to work the way it does for good reasons.
You not understanding those reasons is not an excuse for allowing a giant tech company to step in and be the gatekeeper for a huge portion of the internet. Nor to monetize, enshittify, balkanize, and fragment the web with no effective recourse or oversight.
Cloudflare shouldn't be allowed to operate, in my view.
Are you somehow under the impression that Cloudflare is forcing their service on other companies? They’re not stepping in, the people who own those sites have decided paying them is a better deal than building their own alternatives.
They did exactly that, they just outsourced it to cloudflare. The problem became bad enough that a lot of other people did the same thing.
If your argument is "companies shouldn't be allowed to outsource components to other companies, or cloudflare specifically", then sure, but good luck ever enforcing that.
This press release today is a better statement of _why_ this feature exists (as opposed to the submission link, which is nuts-and-bolts of implementing): https://blog.cloudflare.com/signed-agents/
Web Bot Auth is a way for bots to self-identify cryptographically. Unlike the user agent header (which is trivially spoofed) or known IPs (painful to manage), Web Bot Auth uses HTTP Message Signatures signed with the bot's key, which should be published at some well-known location.
This is a good thing! We want bots to be able to self-identify in a way that can't be impersonated. This gives website operators the power to allow or deny well-behaved bots with precision. It doesn't change anything about bots who try to hide their identity, who are not going to self-identify anyways.
It's worth reading the proposal for the details: https://datatracker.ietf.org/doc/html/draft-meunier-web-bot-... . Nothing about this is limited to Cloudflare.
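To make the mechanism above concrete, here is a rough sketch of the signature base a bot would sign under HTTP Message Signatures (RFC 9421). The component names and the "web-bot-auth" tag follow the draft as I read it; the key id, agent URL, and timestamps are made-up illustrations, and a real implementation would sign the resulting bytes with the bot's Ed25519 key rather than just print them.

```python
import time

def signature_base(authority, signature_agent, keyid, created=None, expires=None):
    """Serialize the covered components plus @signature-params: this is
    the exact byte string the bot's private key would sign."""
    created = created or int(time.time())
    expires = expires or created + 300
    # Signature parameters: which components are covered, validity window,
    # which published key to verify against, and the web-bot-auth tag.
    params = (
        '("@authority" "signature-agent")'
        f';created={created};expires={expires}'
        f';keyid="{keyid}";tag="web-bot-auth"'
    )
    lines = [
        f'"@authority": {authority}',             # the origin being visited
        f'"signature-agent": {signature_agent}',  # where the bot's keys live
        f'"@signature-params": {params}',
    ]
    return "\n".join(lines)

# Hypothetical bot visiting example.com, with a made-up key id.
base = signature_base("example.com", '"https://bot.example"', "bot-key-1")
print(base)
```

The site (or its CDN) rebuilds the same base from the incoming request and verifies the signature against the key the bot published, which is what makes the identity claim unspoofable.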
I'm also working on support for Web Bot Auth for our Agent Identification project at Stytch https://www.isagent.dev . Well-behaved bots benefit from this self-identification because it enables a better Agent Experience: https://stytch.com/blog/introducing-is-agent/
A way to authenticate identity for crawlers so I can allow-list ones I want to get in, exempt them from turnstile/captcha, etc -- is something I need.
I'm not following what makes this controversial. Cryptographic verification of identity for web requests, sounds right.
It seems like the complaints are about Cloudflare's anti-DoS protection services and how they have a monopoly on such; I get that.
I'm not seeing the connection to a protocol for bots/crawlers voluntarily cryptographically signing their http requests, so sites (anyone implementing the protocol not just cloudflare) can use it to authenticate known actors?
I am interested in using it to exempt bots/crawlers I trust/support/have an agreement with from the anti-bot measures I, like many, am being forced to implement to keep our sites up under an enormously increased wave of what is apparently AI-training-motivated repeat crawling. Right now these measures are keeping out bots I don't want to keep out too. I would like to be able to securely identify them to let them in.
(And then it can of course get derailed, but that's a separate story)
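The allow-listing this commenter describes could be sketched roughly as below: once a crawler's signature verifies against a key you trust, it skips the captcha; everyone else gets challenged. The agent names are hypothetical, and the HMAC verify is a stdlib stand-in for the Ed25519 verification the real protocol uses.

```python
import hmac
import hashlib

# Hypothetical directory of crawlers the operator has agreements with,
# keyed by the URL in their Signature-Agent header. The value stands in
# for that agent's published public key.
TRUSTED_AGENTS = {
    "https://crawler.partner.example": b"shared-test-key",
}

def verify(sig, base, key):
    # Stand-in for Ed25519 signature verification, so this sketch
    # runs with the stdlib only.
    return hmac.compare_digest(sig, hmac.new(key, base, hashlib.sha256).digest())

def challenge_decision(signature_agent, sig, base):
    """Return 'allow' (skip captcha/turnstile) for verified trusted
    crawlers, 'challenge' for everyone else."""
    key = TRUSTED_AGENTS.get(signature_agent)
    if key and verify(sig, base, key):
        return "allow"
    return "challenge"

base = b'"@authority": example.com'
good_sig = hmac.new(b"shared-test-key", base, hashlib.sha256).digest()
print(challenge_decision("https://crawler.partner.example", good_sig, base))  # allow
print(challenge_decision("https://unknown.example", good_sig, base))          # challenge
```

The point of the design is that the decision stays with the site operator: anyone can maintain their own trusted-agent list without going through Cloudflare.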
Obviously this technology is different but the same sort of result.
What's the end game here? All humans end up having to use a unique encryption key to prove their humanness also?
Who is we? I absolutely don't want that.
guaranteed as long as no attacker gets hold of the private key, which cannot be guaranteed
That's an argument against all authentication anywhere.
It's a problem, isn't it?
It is only useful for whitelisting bots, not for banning bad ones, as bad ones can rotate keys.
Whitelisting clients by identity is the death of the open web, and means that nobody will ever be able to compete with capital on even footing.
That said, I do think the whole procedure is more than a bit overcomplicated, to the degree where I doubt it will be widely implemented. You could likely achieve almost the full effect with request signing alone.
They can offer what they want for bots. But stop ruining the experience for humans first.
Web operators choose to use them; hell they even pay Cloudflare to be between them. Seriously I just think you don't understand how bad it is to run a site without someone in-front of it.
I let bots hit Gitea 2-3 times per second on a $10/month VPS, and the only actual problem was that it doesn't seem to ever delete zip snapshots, filling up the disk when enough snapshot links are clicked. So I disabled that feature by setting the snapshots folder read-only. There were no other problems. I mention Gitea because people complain about having to protect Gitea a lot, for some reason.
The alternative, of course, is to set up a caching system server-side (like Redis), which most people who set up their WordPress blog don't have the first idea how to do in a secure way.
I suspect I'm missing something, what am I missing?
I'm sure the next step here will be a Cloudflare product that sits in front of your website and blocks all bot traffic except for the bots that are verified to have paid for access. (Or maybe that already exists?)
That's needed because many APIs are either nonexistent or extremely marginal in design and content coverage.
1. They have already proven to be a bad-faith actor with their "DDoS protection."
2. This is pretty much the typical Cloudflare HN playbook. They release something targeted at the current wave and hide behind an ideological barrier; meanwhile, if you try to use them for anything serious they require a call with sales who jumps you with absurdly high pricing.
Do other cloud providers charge high fees for things they have no business charging for? Absolutely. But they typically tell you upfront and don't run ideological narratives.
This is not a company we should be putting much trust in, especially not with their continued plays to become the gatekeepers of the internet.
1) There is a whole segment of tech designed around helping you understand and manage cloud costs, through consultations, automations, etc. It has spawned companies and career paths!
2) Then don't use them? Either they provide enough value to pay them or they don't.
The standard looks fine as a distributed protocol until you have to register and pay rent to Cloudflare, which they say will eventually trickle down into publishers' pockets; but you know what having a middleman this powerful means for the power dynamics of the market. Publishers have a really bad hand no matter what we do to save them; content as we know it will have to adapt.
Give it a couple more iterations and some MBA will come up with the brilliant idea of introducing an internet toll to humans and selling a content bundle with unlimited access to websites.
Register with CF is the specific part I object to. Of all of the numerous hazards here centralizing the registration with CF is most clearly problematic. This part of the spec could have easily been an additional header linking to key data.
Cloudflare is doing this registration as part of their "verified" program, which gives special treatment to bots/agents who go through the process. That's a Cloudflare-specific feature, not part of the spec.
A practical flow:
1. Bot self-identifies (Web Bot Auth)
2. Fetch policy
3. Accept terms or negotiate (HTTP 402 exists)
4. Present a signed receipt proving consent/payment
5. Origin/CDN verifies receipt and grants access
That keeps things decentralized: identity is transport; policy stays with the site; receipts provide auditability, no single gatekeeper required. There’s ongoing work in this direction (e.g., PEAC using /.well-known/peac.txt) that aims to pair Web Bot Auth with site-controlled terms and verifiable receipts.
Disclosure: I work on PEAC, but the pattern applies regardless of implementation.
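The receipt step in the flow above could look something like this sketch: the site issues a signed receipt once terms are accepted (possibly after an HTTP 402 exchange), and verifies it on later requests before granting access. The receipt fields and terms string are illustrative, not PEAC's actual format.

```python
import hmac
import hashlib
import json
import time

SITE_SECRET = b"site-receipt-key"  # held by the origin/CDN issuing receipts

def issue_receipt(bot_id, terms, issued_at=None):
    """Steps 3-4: after the bot accepts the site's terms, the site
    returns a receipt plus a MAC over it that the bot presents later."""
    body = json.dumps({
        "bot": bot_id,
        "terms": terms,
        "iat": issued_at if issued_at is not None else int(time.time()),
    }).encode()
    mac = hmac.new(SITE_SECRET, body, hashlib.sha256).hexdigest()
    return body, mac

def verify_receipt(body, mac):
    """Step 5: origin/CDN re-computes the MAC and grants access only
    if the presented receipt is authentic and untampered."""
    expected = hmac.new(SITE_SECRET, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(mac, expected)

body, mac = issue_receipt("https://bot.example", "crawl-ok, attribution-required")
print(verify_receipt(body, mac))   # True
print(verify_receipt(body, "00"))  # False: forged receipt is rejected
```

An HMAC keeps the sketch self-contained; a public-key signature over the receipt would let third parties audit it without the site's secret, which fits the auditability goal better.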
The age of agents: cryptographically recognizing agent traffic
https://blog.cloudflare.com/signed-agents/
(https://news.ycombinator.com/item?id=45052276)
While it builds on standards, as the top poster notes, Cloudflare's version is a business-moat-driven central registry service and nothing like what the decentralized internet would or should look like.
I wrote a bit more about this on my blog if anyone cares to read: https://blog.agentcommunity.org/2025-08-23-web_auth_box_not_...