Claude for Chrome

https://www.youtube.com/watch?v=aFQFB5YpDZE&t=599s

4 months ago

> Then it's a great time to be a LLM security researcher then.

This reminded me of Jon Stewart’s Crossfire interview where they asked him “which candidate do you suppose would provide you better material if he won?” because he has “a stake in it that way, not just as citizen but as a professional comic”. Stewart answered he held the citizen part to be much more important.

I mean, yes, it’s “probably a great time to be an LLM security researcher” from a business standpoint, but it would be preferable if that didn’t have to be a thing.

whatever1

4 months ago

2 replies

Also IP and copyright is apparently no biggie. Sorry Aaron.

renewiltord

4 months ago

1 reply

Funny. According to you the only way to immortalize Aaron Swartz is to entrench strongly the things he fought against. He died for a cause so it would be bad for the cause to win. Haha.

whatever1

4 months ago

1 reply

I don’t care about his cause. I care about the fact that I don’t see Altman or Dario being prosecuted and threatened with jail time.

renewiltord

4 months ago

Yeah, things have changed. Turing was chemically castrated. Some do argue that gay people should be so treated today but I disagree.

mdaniel

4 months ago

You left off the important qualifier: for corporations with monster legal teams. For people, different rules apply

ACCount37

4 months ago

1 reply

Nothing new. We've allowed humans to use computers for ages.

Security-wise, this is closer to "human substitute" than it is to a "browser substitute". With all the issues of letting a random human have access to critical systems, on top of all the early AI tech jank. We've automated PEBKAC.

4 months ago

2 replies

I don’t know any human who’ll transfer their money or send their private information to a malicious third party because invisible text on a webpage says so.

captainkrtek

4 months ago

1 reply

Yeah this isn’t a substitute, it’s automation taking action based on inputs the user may not even see, and doing it so fast without the likelihood a user would intervene.

If it’s a substitute, it’s no better than trusting someone with the keys to your house, only for them to be easily instructed to rob your house by a 3rd party.

rustc

4 months ago

1 reply

This is like `curl | bash` but you automatically execute the code on every webpage you visit with full access to your browser.

captainkrtek

4 months ago

Basically undoing years of effort to isolate web properties from affecting other properties.

ACCount37

4 months ago

The only weird thing is the "invisible" part. The rest is consistent with known user behavior.

jjice

4 months ago

2 replies

My theory is that the average user of an LLM is close enough to the average user of a computer and I've found that the general consensus is that security practices are "annoying" and "get in the way". The same kind of user who hates anything MFA and writes their password on a sticky note that they stick to their monitor in the office.

woodrowbarlow

4 months ago

3 replies

it has been revelatory to me to realize that this is how most people want to interact with computers.

i want a computer to be predictable and repeatable. sometimes, i experience behavior that is surprising. usually this is an indication that my mental model does not match the computer model. in these cases, i investigate and update my mental model to match the computer.

most people are not willing to adjust their mental model. they want the machine to understand what they mean, and they're willing to risk some degree of lossy mis-communication which also corrupts repeatability.

maybe i'm naive but it wasn't until recently that i realized predictable determinism isn't actually something that people universally want from their personal computers.

brendoelfrendo

4 months ago

I think you're right, but I think the mental model of the average computer user does not assume that the computer is predictable and repeatable. Most conventional software will behave in the same way, every time, if you perform the same operations, but I think the average user views computers as black boxes that are fundamentally unpredictable. Complex tasks will have a learning curve, and there may be multiple paths that arrive at the same end result; these paths can also be changed at the will of the person who made the software, which is probably something the average user is used to in our days of auto-updating app stores, OS upgrades, and cloud services. The computer is still deterministic, but it doesn't feel that way when the interface is constantly shifting and all of the "complicated" bits that expose what the software is actually doing are obfuscated or removed (for user convenience, of course).

mywacaday

4 months ago

I think most people don't want to interact with computers, and anything that reduces the amount of time spent will be embraced en masse regardless of security or privacy issues.

williamscales

4 months ago

I think most people want computers to be predictable and repeatable _at a level that makes sense to them_. That's going to look different for non-programmers.

Having worked helping "average" users, my perception is that there is often no mental model at any level, let alone anywhere close to what HN folks have. Developing that model is something that most people just don't do in the first place. I think this is mostly because they have never really had the opportunity to and are more interested in getting things done quickly.

When I explain things like MFA in terms of why they are valuable, most folks I've helped see usefulness there and are willing to learn. The user experience is not close to universally seamless however which is a big hangup.

TeMPOraL

4 months ago

> the general consensus is that security practices are "annoying" and "get in the way".

Because they usually are and they do.

> The same kind of user who hates anything MFA and writes their password on a sticky note that they stick to their monitor in the office.

This kind of user has a better feel for threat landscape than most armchair infosec specialists.

People go around security measures not out of some ill will or stupidity, but because those measures do not recognize the reality of the situation and tasks at hand.

With keeping passwords in the open or sharing them, this is common because most computer systems don't support delegation of authority - in fact, the very idea that I might want someone to do something in my name is alien to many security people, and generally not supported explicitly, except for a few cases around cloud computing. But delegation of authority is a very common thing done by everyday people on many occasions. In real life, it's simple and natural to do. In the digital world? Giving someone else your password is the only direct way to do this.

guelo

4 months ago

1 reply

No, it's because big tech has taken control of our data and locked it all down so we don't have control over it. AI browser automation is going to blow open all these militarized containers that use our own data and networks against us with the fig leaf of supposed security. I'm looking forward to the revival of personal data mashups like the old Yahoo Pipes.

pton_xd

4 months ago

2 replies

> AI browser automation is going to blow open all these militarized containers that use our own data against us.

I'm not sure what you mean by this. Do you mean that AI browser automation is going to give us back control over our data? How?

Aren't you starting a remote desktop session with Anthropic every time you open your browser?

rvz

4 months ago

> Do you mean that AI browser automation is going to give us back control over our data? How?

Narrator: It won't.

guelo

4 months ago

There's a million ways. Just off the top of my head: unified calendars, contacts and messaging across Google, Facebook, Microsoft, Apple, etc. The agent figures out which platform to go to and sends the message without you caring about the underlying platform.

parhamn

4 months ago

3 replies

With regards to llm injection, we sorta need the cat and mouse games to play out a bit, no? I have my concerns but I'm not ready to throw out the baby with the bathwater. You could never release an OS if "no zero days" was a requirement. Every piece of software we use has and will have its vulnerabilities (see Apple's recent RCE), we play the arms race and things look asymptotically fine.

This seems to be the case in llms too. They're getting better and better (with a lot of research) at avoiding doing the bad things. I don't see why it's fundamentally intractable to fence system/user/assistant/tool messages to prevent steering from non-trusted inputs, and building new fences for cases we want the steering.

Why is this piece of software particularly different?

asgraham

4 months ago

2 replies

First of all, you absolutely cannot release an OS with a known zero day. IANAL but that feels a lot like negligence that creates liability.

But even ignoring that, the gulf between zero days and plain-text LLM prompt injection is miles wide.

Zero days require intensive research to find, and expertise to exploit.

LLM prompt injections obviously exist a priori, and exploiting them requires only the ability to write.

knowannoes

4 months ago

>First of all, you absolutely cannot release an OS with a known zero day.

There is no such thing as a 'known zero day' vulnerability.

Zero day vulnerability means it is a newly discovered one. Today. The day zero.

warkdarrior

4 months ago

> you absolutely cannot release an OS with a known zero day. IANAL but that feels a lot like negligence that creates liability.

You would think Microsoft, Apple, and Linux would have been sued like crazy by now over 0-days.

mynameismon

4 months ago

At the same time, manufacturers do not release operating systems with extremely obvious flaws that have (at least so far) no reasonable guardrails and pretend that they are the next messiah.

freeone3000

4 months ago

Because the flaws are glaring, obvious, and easily avoidable.

herval

4 months ago

1 reply

while at the same time talking nonstop about how "AI alignment" and "AI safety" are extremely important

strange_quark

4 months ago

Anthropic is the worst about this. Every product release they have is like "Here's 10 issues we found with this model, we tried to mitigate, but only got 80% of the way there. We think it's important to still release anyways, and this is definitely not profit motivated." I think it's because Anthropic is run by effective altruism AI doomers and operates as an insular cult.

falcor84

4 months ago

4 replies

> it's slightly annoying to have to write your own emails.

I find that to be a massive understatement. The amount of time, effort and emotional anguish that people expend on handling emails is astronomical. According to various estimates, email-handling takes somewhere around 25% of the work time of an average knowledge worker, going up to over 50% for some roles, and most people check and reply to emails on evenings and over weekends at least occasionally.

I'm not sure it's possible, but it is my dream that I'd have a capable AI "secretary" that would process my email and respond in my tone based on my daily agenda, only interrupting for exceptional situations where I actually need to make a choice, or to pen a new idea to further my agenda.

xenobeb

4 months ago

At my job it takes about 50% of my time. I love LLMs but I don't see how they can possibly help me with email.

I would have to write a prompt that is almost exactly the same as writing the email. It is not like I am writing a fictional story that the LLM could somehow compress the main ideas. I feel like the LLM would have to be able to read my mind to properly respond to my inbox.

Loic

4 months ago

I am French living in Germany, the amount of time Claude saves me every week by reviewing the emails I send to contractors, customers is incredible. It is very hard to write good idiomatic German while ensuring no grammar and spelling mistakes.

I second you, just for that, I would continue paying for a subscription, that I can also use it for coding, toying with ideas, quickly look for information, extract information out of documents, everything out of a simple chat interface is incredible. I am old, but I live in the future now :-)

polynomial

4 months ago

Do you have any citations for various estimates? This is super interesting to me.

edaemon

4 months ago

Email is just communication. It seems appropriate that knowledge workers spend a lot of time communicating.

chankstein38

4 months ago

This comment kind of boils down the entire AI hype bubble into one succinct sentence and I appreciate it! Well said! You could basically put anything instead of "security" and find the same.

bbarnett

4 months ago

I can accept a bit of form-letter from help desks, or in certain business cases. And the same for crafting a generic, informative letter being sent to thousands.

But as soon as it gets one on one, the use of AI should almost be a crime. It certainly should be a social taboo. It's almost akin to talking to a person, one on one, and discovering they have a hidden earpiece, and are being prompted on how to respond.

And if I send an email to an employee, or conversely even the boss of a company I work for, I won't abide someone pretending to reply, but instead pasting junk from an AI. Ridiculous.

There isn't enough context in the world, to enable an AI to respond with clarity and historical knowledge, to such emails. People's value has to do as much with their institutional knowledge, shared corporate experiences, and personal background, not genericized AI responses.

It's kinda sad to come to a place, where you begin to think the Unabomber was right. (Though of course, his methods were wrong)

edit:

I've been hit by some downvotes. I've noticed that some portion of HN is exceptionally AI pro, but I suspect instead it may have something to do with my Unabomber comment.

For context, at least what I gathered from his manifesto, there was a deep distrust of machines, and how they were interfering with human communication and happiness.

Fast forward to social media, mobile phones, AI, and more... and he seems to have been on to something.

From wikipedia:

"He wrote that technology has had a destabilizing effect on society, has made life unfulfilling, and has caused widespread psychological suffering."

Again, clearly his methods were wrong. Yet I see the degradation of US politics into the most simplistic, team-centric, childish arguments... all best able to spread hate, anger, and rage on social media. I see people, especially youth deeply unhappy from their exposure to social media. I see people spending more time with an electronic box in their hand, than with fellow humans.

We always say that we should approach new technology with open eyes, but we seldom mean this about examining negatives. And as a society we've ignored warnings, and negatives with social media, with phones, and we are absolutely not better off as a result.

So perhaps we should use those lessons, and try to ensure that AI is a plus, not a minus in this new world?

For me, replacing intimate human communication with AI, replacing one-on-one conversations with the humans we work with, play with, are friends with, with AI? That's sad. So very, very, very sad.

Once, many years ago a friend of mine was upset. A conservative politician was going door to door, trying to get elected. This politician was railing against the fact that there was a park down the street, paid for by the city. He was upset that taxes paid for it, and that the city paid to keep it up.

Sure, this was true, but my friend after said to me "We're trying to have a society here!".

And I think that's part of what bugs me about AI. We're trying to have a society here!, and part of that is communicating with each other.

SchemaLoad

4 months ago

What I suspect happens is that Apple ensures that apps can not be interacted with automatically, and anything sensitive like banking moves away from websites and purely app only where the compute environment integrity is verified and bot free.

mikojan

4 months ago

2 replies

Can somebody explain this security problem to me please.

How is there not an actual deterministic traditionally programmed layer in-between the LLM and whatever it wants to do? That layer shows you exactly what changes it is going to apply and it is going to ask you for confirmation.

What is the actual problem here?

knowannoes

4 months ago

As soon as you send text to a text completion API, local or remote, and it returns some text completion that some code parses, finds commands and runs them, all bets are off.

All the semantics around "stochastic (parrot)", "non-deterministic", etc tries to convey this. But of course some people will latch on to the semantics and triumphantly "win" the argument by misunderstanding the point entirely.

Automation trades off generality. General automation is an oxymoron. But yeah by all means, plug a text generator into your hands-off workflow and pray. Why not? I wouldn't touch such a contraption with a 10-foot pole.

raincole

4 months ago

How are you going to present this information to users? I mean average users, not programmers.

LLM: I'm going to call the click event on: (spewing out a bunch of raw DOM).

Not like this, right?

If you can design an 'actual deterministic traditionally programmed layer' that presents what's actually happening at a lower level in a user-friendly way and make it work for arbitrary websites, you'll get a Turing Award. Actually, a Turing Award is downplaying your achievement. You'll be remembered as someone who invented (not even 'reinvented') the web.

lucasmullens

4 months ago

It has a big banner that says "Research preview: The browser extension is a beta feature with unique risks—stay alert and protect yourself from bad actors.", and it says "Join the research preview", and then takes you to a form with another warning, "Disclaimer: This is an experimental research preview feature which has several inherent risks. Before using Claude for Chrome, read our safety guide which covers risks, permission limitations, and privacy considerations."

I would also imagine that it warns you again when you run it for the first time.

I don't disagree with you given how uniquely important these security concerns are, but they seem to be doing at least an okay job at warning people, hard to say without knowing how their in-app warnings look.

prodigycorp

4 months ago

Besides prompt injection, be ready to kiss your privacy goodbye. You should be assuming you're handing over your entire browsing contents/history to Anthropic. Any of your content that doesn't follow Anthropic's very narrow acceptable use policy will be automatically flagged and stored on their servers indefinitely.

theptip

4 months ago

I think you’re being way too cynical. The first sentence talks about risks:

> When AI can interact with web pages, it creates meaningful value, but also opens up new risks

And the majority of the copy in the page is talking about risks and mitigations.

Eg reviewing commands before they are executed.

cube2222

4 months ago

1 reply

> We’re launching with 1,000 Max users and expanding gradually based on what we learn. This measured approach helps us validate safeguards before broader deployment.

Somewhat comforting they’re not yolo-ing it too much, but I frankly don’t see how the prompt injection issues with browser agents that act on your behalf can be surmounted - maybe other than the company guaranteeing “we’ll reimburse you for any unintentional financial losses incurred by the agent”.

Cause it seems to me like any straightforward methods are really just an arms race between prompt injection and heuristic safeguards.

hombre_fatal

4 months ago

1 reply

Since the LLM has to inherently make tool/API calls to do anything, can't you gate those behind a confirmation box that describes what it wants to do?

And you could whitelist APIs like "Fill form textarea with {content}" vs more destructive ones like "Submit form" or "Make request to {url} with {body}".

Edit: It seems to already do this.

Granted, you'd still have to be eternally vigilant.
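A minimal sketch of the confirmation gate described above, assuming the agent's actions arrive as named tool calls. The action names, the `ToolCall` type and the `confirm` callback are all made up for illustration and are not taken from the actual extension:

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical action names, not from any real extension API.
AUTO_ALLOWED = {"fill_textarea", "read_page"}
NEEDS_CONFIRMATION = {"submit_form", "make_request"}


@dataclass
class ToolCall:
    name: str
    args: dict


def gate(call: ToolCall, confirm: Callable[[str], bool]) -> bool:
    """Return True if the call may run, False if it must be blocked."""
    if call.name in AUTO_ALLOWED:
        return True  # low-risk action, no prompt shown
    if call.name in NEEDS_CONFIRMATION:
        # Ask the user, describing exactly what would happen.
        return confirm(f"Allow {call.name} with {call.args}?")
    return False  # anything unrecognized is denied by default
```

Note that this only checks the action name. It says nothing about whether the data passed in `args` was steered by injected text, so the user still has to read each prompt carefully.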

cube2222

4 months ago

1 reply

When every operation needs to be approved (every button click, every form entry, etc.) does it even make sense to use an agent?

And it’s not like you can easily “always allow” let’s say, certain actions on certain websites, because the issue is less with the action, and more with the data passed to it.

hombre_fatal

4 months ago

Sure, just look at the examples in TFA like finding emails that demand a response or doing custom queries on Zillow.

You probably are just going to grant it read access.

That said, having thought about it, the most successful or scarier injections probably aren't going to involve things like crafting noisy destructive actions but rather silently changing what the LLM does during trusted/casual flows like reading your emails.

So I can imagine a dichotomy between pretty low risk things (Zillow/Airbnb queries) and things that demand scrutiny like doing anything in your email inbox where the LLM needs to read emails, and I can imagine the latter requiring such vigilance that you might be right.

It'll be very interesting and probably quite humbling to see this whole new genre of attacks pop up in the wild.

biggestfan

4 months ago

5 replies

According to their own blog post, even after mitigations, the model still has an 11% attack success rate. There's still no way I would feel comfortable giving this access to my main browser. I'm glad they're sticking to a very limited rollout for now. (Sidenote, why is this page so broken? Almost everything is hidden.)

Szpadel

4 months ago

2 replies

well, at least they are honest about it and don't try to hide it in any way. They probably want to gather more real world data for training and validation, that's why this limited release. openai has had a browser agent for some time already but I haven't heard about any security considerations. I bet they have the same issues

pharrington

4 months ago

Honesty would be Anthropic paying the 1000 alpha testers a fair wage for their very dangerous QA work.

4 months ago

> at least they are honest about it and don't try to hide it in any way.

Seems more likely they’re trying to cover their own ass, so when anything inevitably goes wrong they can point and say “see, we told you it was dangerous, not our fault”.

mark242

4 months ago

4 replies

11% success rate for what is effectively a spear-phishing attempt isn't that terrible and tbh it'll be easier to train Claude not to get tricked than it is to train eg my parents.

zaphirplane

4 months ago

1 reply

What! 1 in 10 successfully phished is OK? That's 1 in 10 page views. That has to approach a 100% success rate over a week, or say a month, of browsing the web, with targeted ads and/or link farms to get the page click.
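The compounding intuition here can be checked with a rough calculation. This is a sketch that assumes every exposure to an injection attempt is independent and succeeds at the 11% rate quoted from the blog post. Independence is a simplifying assumption, not something the post claims:

```python
def cumulative_success(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1.0 - (1.0 - p) ** attempts


if __name__ == "__main__":
    p = 0.11  # per-attempt success rate quoted from the blog post
    for n in (1, 10, 20, 50):
        print(f"{n:>3} attempts -> {cumulative_success(p, n):.1%}")
```

Under those assumptions the chance of at least one success is roughly 69% after 10 attempts and roughly 90% after 20, so the rate does head toward 100% quickly.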

IanCal

4 months ago

This is where rates hide the issue.

One in ten cases that take hours on a phone talking to a person with detailed background info and spoofed things is one issue. One in ten people that see a random message on social media is another.

Like 1 in 10 traders on the street might try and overcharge me is different from 1 in 10 pngs I see can drain my account.

whatevertrevor

4 months ago

The kind of attack vector is irrelevant here, what's important is the attack surface. Not to mention this is a tool facilitating the attack, with little to no direct interaction with the user in some cases. Just because spear-phishing is old and boring doesn't mean it cannot have real consequences.

(Even if we agree with the premise that this is just "spear-phishing", which is honestly a semantics argument that is irrelevant to the more pertinent question of how important it is to prevent this attack vector)

https://i.imgur.com/E4HloO7.png

4 months ago

>Claude not to get tricked than it is to train eg my parents.

One would think but apparently from this blog post it is still susceptible to the same old prompt injections that have always been around. So I'm thinking it is not very easy to train Claude like this at all. Meanwhile with parents you could probably eliminate an entire security vector outright if you merely told them "bank at the local branch," or "call the number on the card for the bank don't try and look it up."

lelanthran

4 months ago

With spear phishing there are a limited number of attack attempts, maybe one a day and the target will wise up.

With this you can probably try a few thousand attempts per minute.

mkozlows

4 months ago

1 reply

The strong sense I got from reading this is that they don't believe it's possible to safely do this sort of thing right now, and they want to warn people away from Perplexity etc. so they can avoid losing market share while also not launching a not-yet-ready product.

(The more interesting question will be whether they have any means to eventually make it safe. I'm pretty skeptical about it in the near term.)

AdieuToLogic

4 months ago

> The strong sense I got from reading this is that they don't believe it's possible to safely do this sort of thing right now, and they want to warn people away ...

This is directly contradicted by one of the first sentences in the article:

  We've spent recent months connecting Claude to your 
  calendar, documents, and many other pieces of software. The 
  next logical step is letting Claude work directly in your 
  browser.

Ascribing altruism to the quoted intent is dissembling at best.

aquova

4 months ago

I'm honestly dumbfounded this made it off the cutting room floor. A 1 in 9 chance for a given attack to succeed? And that's just the tests they came up with! You couldn't pay me to use it, which is good, because I doubt my account would keep that money in it for long.

rvz

4 months ago

> According to their own blog post, even after mitigations, the model still has an 11% attack success rate.

That is really bad. Even after all those mitigations imagine the other AI browsers being at their worst. Perplexity's Comet showed how a simple summarization can lead to your account being hijacked.

> (Sidenote, why is this page so broken? Almost everything is hidden.)

They vibe-coded the site with Claude and didn't test it before deploying. That is quite a botched amateur launch for engineers to do at Anthropic.

Yeroc

4 months ago

3 replies

Most browser extensions you need to manually enable in incognito mode. This is an extension that should be disabled in normal mode and only enabled in incognito mode!

layman51

4 months ago

1 reply

In my opinion, if it shouldn’t be enabled in normal mode, it certainly shouldn’t be enabled in Incognito Mode either where it will give you a false sense of security.

darknavi

4 months ago

Perhaps an excuse for a new "mode". Or using something like Firefox containers to keep it in its own space.

mkl

4 months ago

1 reply

Just make a separate browser profile for it. That's easy in Chrome.

dotproto

4 months ago

1 reply

Also pretty easy with Firefox's new profile manager https://support.mozilla.org/kb/profile-management

mkl

4 months ago

Oh, excellent. I use profiles in Firefox too, but it's been quite awkward in comparison.

nicce

4 months ago

Rather completely different browser, and in the sandbox.

cdrini

4 months ago

3 replies

Hmm is it just me or is this webpage loading with all the text invisible? Firefox+Android.

cdrini

4 months ago

Update: appears fixed now

alach11

4 months ago

Same with Firefox+Windows 11. I guess they really only care about Chrome...

poly2it

4 months ago

Same on Vanadium.

coffeecoders

4 months ago

9 replies

Not sure if it's only me, but most of the text on this page isn't showing up.

vunderba

4 months ago

I don't know if this site was built by dogfooding with their own agents, but this just outlines a massive limitation where automated TDD doesn't come close to covering the basic question "does my site look off?" when vibe coding.

jampa

4 months ago

The blog works for me: https://www.anthropic.com/news/claude-for-chrome

rafram

4 months ago

They say a picture is worth a thousand words.

(It's not even a font rendering issue - the text is totally absent from the page markup. I wonder how that can happen.)

nzach

4 months ago

I've got the same error on my side. At first I thought it was some weirdness with Firefox, but opening on Chrome gives the same result.

I don't know what causes this bug specifically, but I encountered similar behavior when I asked claude to create some frontend for me. It may not even be the same bug, but I find it an interesting coincidence.

Nizoss

4 months ago

Same issue here, dark mode on mobile.

hotfixguru

4 months ago

Same for me, Safari on an iPhone.

iammjm

4 months ago

Yes, it’s broken

solardev

4 months ago

It's Web 4.0. You're supposed to bring your own GPT and let it make up the text as you go.

4 months ago

It’s not only you. I tested in three different web browsers, each with their own rendering engine (Webkit, Chromium, Gecko), and all of them show no text. It’s not invisible, it’s plain not there.

Did they tell their AI to make a website and push to production without supervision?

montroser

4 months ago

1 reply

Hard pass, thanks. Claude code can be pretty amazing, but I need those guide rails -- being able to limit the scope of access, track changes with version control, etc.

thrown-0825

4 months ago

claude code should be shipped in a sandbox by default, it's crazy that it isn't.

this product shouldn't be shipped at all.

recov

4 months ago

1 reply

Probably the better link: https://www.anthropic.com/news/claude-for-chrome

dang

4 months ago

Changed above. Thanks!

aliljet

4 months ago

7 replies

Having played a LOT with browser use, playwright, and puppeteer (all via MCP integrations and pythonic test cases), it's incredibly clear how quickly Claude (in particular) loses the thread as it starts to interact with the browser. There's a TON of visual and contextual information that just vanishes as you begin to do anything particularly complex. In my experience, repeatedly forcing new context windows between screenshots has dramatically improved the ability for claude to perform complex interactions in the browser, but it's all been pretty weak.

When Claude can operate in the browser and effectively understand 5 radio buttons in a row, I think we'll have made real progress. So far, I've not seen that eval.

MattSayar

4 months ago

4 replies

Same. When I try to get it to do a simple loop (eg take screenshot, click next, repeat) it'll work for about five iterations (out of a hundred or so desired) then say, "All done, boss!"

I'm hoping Anthropic's browser extension is able to do some of the same "tricks" that Claude Code uses to gloss over these kinds of limitations.

robots0only

4 months ago

2 replies

Claude is extremely poor at vision when compared to Gemini and ChatGPT. i think anthropic severely overfit their evals to coding/text etc. use cases. maybe naively adding browser use would work, but I am a bit skeptical.

user453

4 months ago

Is it overfitting if it makes them the best at those tasks?

bdangubic

4 months ago

I have a completely different experience. Pasting a screenshot into CC is my de-facto go-to that more often than not leads to CC understanding what needs to be done etc…

CSMastermind

4 months ago

1 reply

This has been exactly my experience using all the browser based tools I've tried.

ChatGPT's agents get the furthest but even then they only make it like 10 iterations or something.

rzzzt

4 months ago

1 reply

I have better success with asking for a short script that does the million iterations than asking the thing to make the changes itself (edit: in IDEs, not in the browser).

seunosewa

4 months ago

If you need precision, that's the way to go, and it's usually cheaper and faster too.
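A minimal sketch of that script approach for the screenshot loop mentioned upthread. `take_screenshot` and `click_next` are hypothetical callables standing in for whatever browser driver you use; they are not a real library API, and the fake driver at the bottom exists only so the loop can be run as-is.

```python
from typing import Callable, List


def capture_all_pages(
    take_screenshot: Callable[[int], str],
    click_next: Callable[[], bool],
    max_pages: int = 100,
) -> List[str]:
    """Screenshot each page, then click "next" until it is gone or max_pages is hit.

    take_screenshot(i) saves page i and returns its file path (hypothetical).
    click_next() returns False when there is no next page (hypothetical).
    """
    paths: List[str] = []
    for i in range(max_pages):
        paths.append(take_screenshot(i))
        if not click_next():
            break
    return paths


if __name__ == "__main__":
    # Fake driver with 7 pages, only to show the loop runs to completion.
    state = {"page": 0, "last": 6}

    def fake_screenshot(i: int) -> str:
        return f"page_{i:03d}.png"

    def fake_next() -> bool:
        if state["page"] >= state["last"]:
            return False
        state["page"] += 1
        return True

    print(len(capture_all_pages(fake_screenshot, fake_next)))
```

The loop condition lives in ordinary code, so it cannot decide it is "all done" after five iterations; the model only has to write the script once.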

felarof

4 months ago

I'm wondering if they are using vanilla Claude or a version of Claude fine-tuned specifically for browser use.

RL fine-tuning LLMs can have pretty amazing results. We did GRPO training of Qwen3:4B to do the task of a small action model at BrowserOS (https://www.browseros.com/) and it was much better than running vanilla Claude or GPT.

tripplyons

4 months ago

Hopefully one of those "tricks" involves training a model on examples of browser use.

4 months ago

3 replies

I have built a custom "deep research" internally that uses puppeteer to find business information, tech stack and other information about a company for our sales team.

My experience was that giving the LLM a very limited set of tools and no screenshots worked pretty damn well. Tbf for my use case I don't need more interactivity than navigate_to_url and click_link. Each tool returns a text version of the page and the clickable options as an array.

It is very capable of answering our basic questions, although it is now powered by GPT-5, not Claude.
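A rough sketch of what a tool result like that could look like, using only the Python standard library. This is an assumption about the shape, not the actual implementation: `page_to_tool_result` and the `text` / `links` keys are made-up names, and a real tool would get its HTML from the browser (puppeteer here) rather than a string.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urljoin


class _PageExtractor(HTMLParser):
    """Collects visible text and <a href> links from an HTML document."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.text_parts: List[str] = []
        self.links: List[Dict[str, str]] = []
        self._skip_depth = 0        # > 0 while inside <script> or <style>
        self._current_link = None   # link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Resolve relative hrefs so the model gets absolute URLs.
                self._current_link = {"url": urljoin(self.base_url, href), "label": ""}

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1
        elif tag == "a" and self._current_link is not None:
            self.links.append(self._current_link)
            self._current_link = None

    def handle_data(self, data):
        if self._skip_depth:
            return
        chunk = " ".join(data.split())  # collapse whitespace
        if not chunk:
            return
        self.text_parts.append(chunk)
        if self._current_link is not None:
            label = self._current_link["label"] + " " + chunk
            self._current_link["label"] = label.strip()


def page_to_tool_result(html: str, base_url: str) -> Dict[str, object]:
    """Hypothetical return shape for navigate_to_url / click_link."""
    parser = _PageExtractor(base_url)
    parser.feed(html)
    parser.close()
    return {"text": "\n".join(parser.text_parts), "links": parser.links}
```

Returning plain text plus an explicit link array means the model picks from a short list of options instead of interpreting pixels.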

panarky

4 months ago

1 reply

Just shoving everything into one context fails after just a few turns.

I've had more success with a hierarchy of agents.

A supervisor agent stays focused on the main objective, and it has a plan to reach that objective that's revised after every turn.

The supervisor agent invokes a sub-agent to search and select promising sites, and a separate sub-sub-agent for each site in the search results.

When navigating a site that has many pages or steps, a sub-sub-sub-agent for each page or step can be useful.

The sub-sub-sub-agent has all the context for that page or step, and it returns a very short summary of the content of that page, or the action it took on that step and the result to the sub-sub-agent.

The sub-sub-agents return just the relevant details to their parent, the sub-agent.

That way the supervisor agent can continue for many turns at the top level without exhausting the context window or losing the thread and pursuing its own objective.
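A stripped-down sketch of that hierarchy, to show where the context boundaries sit. Everything here is illustrative: `LLM` is a hypothetical stand-in for a model call, the prompts are placeholders, and the search step and per-turn plan revision are left out.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for an LLM call: takes a prompt, returns text.
LLM = Callable[[str], str]


def page_agent(llm: LLM, objective: str, page_text: str) -> str:
    """Lowest level: sees the full page, returns only a short summary."""
    return llm(
        f"Objective: {objective}\nPage:\n{page_text}\nSummarize in one sentence."
    )


def site_agent(llm: LLM, objective: str, pages: List[str]) -> str:
    """Middle level: sees page summaries only, never the raw pages."""
    summaries = [page_agent(llm, objective, p) for p in pages]
    return llm(
        f"Objective: {objective}\nPage summaries:\n"
        + "\n".join(summaries)
        + "\nReturn only the relevant details."
    )


def supervisor(llm: LLM, objective: str, sites: Dict[str, List[str]]) -> str:
    """Top level: keeps the objective plus one short result per site in context."""
    findings = [
        f"{name}: {site_agent(llm, objective, pages)}"
        for name, pages in sites.items()
    ]
    return llm(
        f"Objective: {objective}\nFindings:\n"
        + "\n".join(findings)
        + "\nAnswer the objective."
    )
```

Raw page text only ever appears in a `page_agent` prompt, so the supervisor's prompt grows with the number of sites, not with the size of the pages.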

4 months ago

Hmm, my browser agents each have about 50-100 turns (takes roughly 3-5 minutes for each one) and one focused objective. I make use of structured output to group all the info it found into a standardized format at the end.

I have 4 of those "research agents" with different prompts running one after another, and then I format the results into a nice Slack message, and summarize and evaluate the results in one final call (with just the result JSONs as input).

This works really well. We use it to score leads by how promising they are for us to reach out to.
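Roughly, the flow looks like the sketch below. The names are made up for illustration: `ResearchResult` and its fields, the `Agent` signature, and `run_pipeline` are assumptions, and `evaluate` stands in for the final model call.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, List


@dataclass
class ResearchResult:
    """Standardized format each agent fills in at the end (fields are illustrative)."""
    topic: str
    findings: List[str] = field(default_factory=list)
    sources: List[str] = field(default_factory=list)


# Hypothetical agent signature: company name in, structured result out.
Agent = Callable[[str], ResearchResult]


def run_pipeline(
    company: str,
    agents: List[Agent],
    evaluate: Callable[[str], str],
) -> str:
    """Run the agents one after another, then evaluate using only their JSON results."""
    results = [agent(company) for agent in agents]
    payload = json.dumps([asdict(r) for r in results], indent=2)
    return evaluate(payload)
```

The final call never sees the 50-100 turns of browsing, only the compact JSON, which keeps it cheap and focused.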

4 months ago

1 reply

It seems navigate_to_url and click_link could be handled by a plain script running puppeteer, versus having an LLM craft a puppeteer script to hopefully do this simple action reliably? What is the big advantage of the LLM tooling in this case?

4 months ago

1 reply

Oh the tools are hand coded (or rather built with Claude Code) but the agent can call them to control the browser.

Imagine a prompt like this:

You are a research agent. Your goal is to figure out this company's tech stack:
- Company Name

Your available tools are:
- navigate_to_url: Use this to load a page, e.g. use Google or Bing to search for the company site. It will return the page content as well as a list of available links.
- click_link: Use this to click on a specific link on the currently open page. It will also return the current page content and any available links.

A good strategy is usually to go to the company's careers page and search for technical roles.

This is a short form of what is actually written there. We use this to score leads: we are built on Postgres and AWS, so if a company is using those, that is a very interesting relevance signal for us.
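For anyone wondering how the agent "calls" the tools, here is a minimal sketch of the loop around the model. It is an assumption about the general pattern, not the code described above: `model` is a hypothetical callable that receives the transcript so far and replies with JSON, and the JSON action format is invented for this example.

```python
import json
from typing import Callable, Dict, List

Tool = Callable[[str], str]


def run_agent(
    model: Callable[[List[str]], str],
    tools: Dict[str, Tool],
    system_prompt: str,
    max_turns: int = 50,
) -> str:
    """Ask the model for a JSON action, run the tool, feed the result back.

    The model is expected (by assumption) to reply with either
    {"tool": "click_link", "arg": "..."} or {"answer": "..."}.
    """
    transcript = [system_prompt]
    for _ in range(max_turns):
        action = json.loads(model(transcript))
        if "answer" in action:
            return action["answer"]
        tool = tools.get(action.get("tool", ""))
        if tool is None:
            # Tell the model about the mistake instead of crashing.
            transcript.append(f"error: unknown tool {action.get('tool')!r}")
            continue
        transcript.append(tool(action.get("arg", "")))
    return "no answer within turn limit"
```

The tools stay simple hand-written functions; the model's only job is choosing which one to call next and with what argument.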

4 months ago

1 reply

I still don't understand what the LLM does. One could do this with a few lines of curl and a list of tools to query against.

4 months ago

1 reply

The LLM understands arbitrary web pages and finds the correct links to click. Not for one specific page but for ANY company name that you give it.

It will always come back with a list of technologies used, if available on the company's page, regardless of how that page is structured. That level of generic understanding is simply not solvable with just some regexes and curl calls.