Benchmarking Leading AI Agents Against Google Recaptcha V2
Posted about 2 months ago · Active about 2 months ago
research.roundtable.ai · Tech · story · High profile
Sentiment: calm/mixed · Debate: 70/100
Key topics
Artificial Intelligence
Captcha
Security
A benchmarking study evaluates the performance of leading AI agents against Google reCAPTCHA v2, sparking discussion on the effectiveness of CAPTCHAs and the implications of AI advancements.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion · First comment: 20m after posting · Peak period: 61 comments in 0-6h · Avg / period: 10.7
Comment distribution: 96 data points (based on 96 loaded comments)
Key moments
1. Story posted: Nov 10, 2025 at 11:38 AM EST (about 2 months ago)
2. First comment: Nov 10, 2025 at 11:58 AM EST (20m after posting)
3. Peak activity: 61 comments in 0-6h, the hottest window of the conversation
4. Latest activity: Nov 13, 2025 at 11:30 AM EST (about 2 months ago)
ID: 45877698 · Type: story · Last synced: 11/20/2025, 5:30:06 PM
Will be interesting to see how Gemini 3 does later this year.
Also, when they ask you to identify traffic lights, do you select the post? And when it’s motorcycles or bicycles, do you select the person riding them?
Either that or it was never about the buses and fire hydrants.
The worst offenders will just loop you forever, no matter how many solves you get right.
I sincerely wish all the folx at Google directly responsible for this particular user acquisition strategy to get every cancer available in California.
They just link it to your IP address, browser, operating system, screen resolution, set of fonts, plugins, timezone, mouse movements, GPU, number of CPU cores, and of course the fact you've got third party cookies disabled.
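The comment above describes classic device fingerprinting: many individually weak signals combined into one identifier. A minimal sketch of the idea, with illustrative signal names (Google's actual reCAPTCHA feature set is not public):

```python
import hashlib

def fingerprint(signals: dict) -> str:
    """Combine device/browser signals into a single stable identifier.

    The signal names below are illustrative examples, not reCAPTCHA's
    real inputs, which are undocumented.
    """
    # Sort keys so the digest is stable regardless of collection order.
    canonical = "|".join(f"{k}={signals[k]}" for k in sorted(signals))
    return hashlib.sha256(canonical.encode()).hexdigest()

browser_signals = {
    "user_agent": "Mozilla/5.0 ...",
    "screen": "2560x1440",
    "timezone": "America/New_York",
    "fonts": "Arial,Helvetica,Times",
    "cpu_cores": 8,
    "third_party_cookies": False,
}
print(fingerprint(browser_signals))  # 64-char hex digest
```

The point of the hash is that changing any one signal (say, enabling third-party cookies) produces a completely different identifier, while a stable environment keeps producing the same one across visits.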
I'm sure they're doing other sketchy things, but it wouldn't make sense to lie in such a blindingly obvious way. (I just tested it, and indeed, it works as expected.)
https://9to5google.com/2020/02/06/google-chrome-x-client-dat...
It doesn't catch OpenAI even though the mouse/click behavior is clearly pretty botlike. One hypothesis is that Google reCAPTCHA is overindexing on browser patterns rather than behavioral movement.
Yes.
For example, hesitation/confusion patterns in CAPTCHAs are different between humans and bots and those can actually be used to validate humans
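The hesitation idea can be made concrete with timing features: humans pause unevenly between clicks, while a naive bot clicks at a fixed cadence. A toy sketch (real behavioral models use far richer signals, such as full mouse trajectories and velocity curves):

```python
def hesitation_features(click_times: list[float]) -> dict:
    """Simple timing features from click timestamps (in seconds).

    Illustrative only: not any production bot-detection algorithm.
    """
    gaps = [b - a for a, b in zip(click_times, click_times[1:])]
    mean = sum(gaps) / len(gaps)
    var = sum((g - mean) ** 2 for g in gaps) / len(gaps)
    return {"mean_gap": mean, "gap_variance": var}

# A human hesitates unevenly; a naive bot clicks on a fixed schedule.
human = hesitation_features([0.0, 0.8, 2.1, 2.6, 4.9])
bot = hesitation_features([0.0, 0.5, 1.0, 1.5, 2.0])
print(bot["gap_variance"])   # 0.0: perfectly regular clicks
print(human["gap_variance"])
```

Near-zero variance in click gaps is exactly the kind of "too perfect" signature a behavioral detector could flag.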
The same is true with, say, buses. See an image of a delivery van? Bus! It asks you to select all cars and you see no car, but a vague pixel blob that someone stupid would identify as a car? Car!
One of the few things that this doesn't work with is stairs, because the side of stairs being stairs or not is something apparently no one can agree on.
I know it's still not justified, but it's the easy solution that works for preventing DOS attacks.
Kinda wild that someone scraping Google's data would prevent me from getting into my PAID (>$90/yr) Dropbox account. That experience is a big part of why I pay extra to host my data on my own server now.
Decentralization, hosting your own stuff, is great until you run into DDoS attacks and have to make maintaining your server a full-time job. Sure, you have the skills (or can acquire them), but do you have the time?
What is the purpose of such loop? Bots can simply switch to another residential proxy when the captcha success rate gets low. For normal humans, it is literally "computer says no".
This type of captcha is too infuriating so I always skip it until I get the ones where I’m just selecting an entire image, not parts of an image
Google’s captchas are too ambiguous and might as well be answered philosophically with an essay-length textbox
Just select as _you_ would. As _you_ do.
Imperfection and differing judgments are inherent to being human. The CAPTCHA also measures your mouse movement on the X and Y axes and the timing of your clicks.
If it says traffic lights, just click on the ones you can see lit, not the posts, and ignore them if they are too far in the distance. Seems to work for me.
Now that I think of it, it's really a failure that AI didn't use this and went with guessing which square of an image to select.
Reload challenges are hard because of how the agent-action loop works. But the models were pretty good at identifying when a tile contained an item.
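The difficulty with "reload" challenges is that clicking a matching tile swaps in a fresh image, so the agent has to re-classify after every action. A sketch of that loop, where `classify` and `refresh` are hypothetical stand-ins for a vision-model call and the page interaction (neither is a real reCAPTCHA API):

```python
def solve_reload_grid(tiles, classify, refresh, target="bus", max_rounds=5):
    """Agent loop for a 'reload' challenge.

    classify(img) -> label and refresh(index) -> new image are
    placeholders; both would be model/browser calls in practice.
    """
    clicked = []
    for _ in range(max_rounds):
        hits = [i for i, img in enumerate(tiles) if classify(img) == target]
        if not hits:
            return clicked          # grid is clean: submit the challenge
        for i in hits:
            clicked.append(i)
            tiles[i] = refresh(i)   # replaced tile must be checked again
    return clicked

# Demo with stand-in functions: classify is the identity on labeled
# tiles, and refresh always swaps in a non-matching image.
tiles = ["bus", "car", "bus", "tree"]
clicked = solve_reload_grid(tiles, lambda t: t, lambda i: "grass")
print(clicked)  # tiles 0 and 2 matched and were clicked
```

The loop terminates either when no tile matches (the normal exit) or after a bounded number of rounds, which matters because a hostile challenge could keep serving matching tiles forever.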
Either that, or just be honest and allow anonymous posting lol
And how often does this happen? Do you have any proof? Most YC companies building browser agents have built-in captcha solvers.
It's like the last hype over using generative AI for trading.
You might use it for sentiment analysis, summarization and data pre-processing. But classic forecast models will outperform them if you feed them the right metrics.
It would be nice to see comparisons to some special-purpose CAPTCHA solvers though.
https://ai.google.dev/gemini-api/docs/image-understanding
I also perform poorly on cross-tile, I never know whether to count a tiny bit of a bicycle in a square as "a bike in that square".
Can we just get rid of them now, they are so annoying and basically useless.
I’m guessing Google is evaluating more than whether the answer was correct enough (ie does my browser and behavior look like a bot?), so that may be a factor
1 more comment available on Hacker News