A2UI: A Protocol for Agent-Driven Interfaces
Key topics
The emergence of A2UI, a protocol for agent-driven interfaces, has sparked a lively debate about the proliferation of competing standards in the UI/UX space. Some commenters, like qsort, see potential for dynamic interfaces and exciting use cases; others, like mbossie, worry about market fragmentation, pointing out that multiple variants, including MCP-UI and Google's A2UI, are being introduced to solve the same problem. Others counter that divergence and exploration are necessary in a field with many unknowns and that the market will eventually settle on what works best; fortydegrees notes that design is inherently opinionated and will always admit multiple approaches. The xkcd comic about standards has become an obligatory reference, with zeroasterisk acknowledging the irony while justifying A2UI as a novel solution to a current market need.
Snapshot generated from the HN discussion
Discussion Activity
- Activity: Very active discussion
- First comment: 1h after posting
- Peak period: 68 comments in 0-12h
- Avg / period: 13
- Based on 78 loaded comments
Key moments
- Story posted: Dec 16, 2025 at 4:16 AM EST (18 days ago)
- First comment: Dec 16, 2025 at 5:19 AM EST, 1h after posting
- Peak activity: 68 comments in the 0-12h window, the hottest period of the conversation
- Latest activity: Dec 22, 2025 at 1:32 AM EST (12 days ago)
How many more variants are we introducing to solve the same problem? Sounds like a lot of wasted man-hours to me.
I completely agree, though I'm personally sitting out all of these protocols/frameworks/libraries. In 6 months' time half of them will have been abandoned, and the other half will have morphed into something very different and incompatible.
For the time being, I just build things from scratch, which–as others have noted¹–is actually not that difficult, gives you understanding of what goes on under the hood, and doesn't tie you to someone else's innovation pace (whether it's higher or lower).
¹ https://fly.io/blog/everyone-write-an-agent/
The same happened with GPUs in the 90s. When Jensen formed Nvidia there were 70 other companies selling graphics cards that you could put in a PCI slot. Now there are 2.
I can justify A2UI as doing something not otherwise accomplishable in the market today, but you saw how long the blog post was trying to explain that. :shrug:
Sounds like a lot of people got paid because of it. That's a win for them. It wasn't their decision, it was company decision to take part in the race. Most likely there will be more than 1 winner anyway.
Like you mentioned, it's a good time to be employed.
AG-UI is a launch partner of A2UI, but it is a separate project by CopilotKit, not Google.
We have a day-0 handshake between AG-UI & A2UI
I think AG-UI is great if you are building the UI and the agent at the same time, want a high-bandwidth sync between them, and the UI supports AG-UI as an adaptor layer (they have done a lot of work making this easier for folks).
A2UI is most interesting for its LLM generation options (not tools but structured output; see the sketch below), its remote message passing options (if you don't own the UI), and its general-purpose design (the same fairly simple standard can work for many models, transports, and renderers).
They do fit nicely together. Sorry the naming conventions are complicated. There are 2 hard things in computer science: naming things, cache invalidation, off by one errors.
https://www.copilotkit.ai/ag-ui-and-a2ui
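To make "structured output" concrete, here is a minimal sketch of the idea; the component names, field shapes, and helper function are illustrative assumptions, not taken from the A2UI spec. The model is constrained to emit JSON naming only vetted component types, and the client validates before rendering anything.

```typescript
// Illustrative only: these component names and field shapes are assumptions,
// not the A2UI spec. The idea: the model emits structured JSON that is
// checked against a small catalog of vetted component types before rendering.

type Component =
  | { Text: { text: string } }
  | { Button: { label: string; action: string } }
  | { TextField: { label: string; path: string } };

interface UiMessage {
  id: string;
  component: Component;
}

const ALLOWED = new Set(["Text", "Button", "TextField"]);

// Validate raw model output before handing it to a renderer.
function parseUiMessage(raw: string): UiMessage {
  const msg = JSON.parse(raw) as UiMessage;
  const kind = Object.keys(msg.component ?? {})[0];
  if (!kind || !ALLOWED.has(kind)) {
    throw new Error(`Unknown component type: ${kind}`);
  }
  return msg;
}

// A model constrained to this schema might return:
// {"id":"cta","component":{"Button":{"label":"Confirm","action":"confirm_order"}}}
```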
Making an agent call a tool to manipulate a UI does feel like normal application development and an event driven interaction... I get that.
What else drives your preference?
(emphasis mine)
Sounds like agents are suddenly able to do what developers have failed at for decades: Writing platform-independent UIs. Maybe this works for simple use cases but beyond that I'm skeptical.
[1] https://a2ui.org/renderers/
https://github.com/google/A2UI/pull/352
Thanks for the recommendation.
A2UI is a superset, expanding into more element types. If we're going to have the origin of all our data streams be string-output-generators, this seems like an OK way to go.
I've joined an effort inside Google to work in this exact space. What we're doing has no plan to become open source, but other groups are working on stuff like A2UI and we collaborate with them.
My career previous to this was nearly 20 years of native platform UI programming and things like Flutter, React Native, etc have always really annoyed me. But I've come around this year to accept that as long as LLMs on servers are going to be where the applications of the future live, we need a client-OS agnostic framework like this.
It's about accomplishing a task, not making a bot accomplish a task using the same tools and embodiment context as a human. There's no upside unless the bot is actually using a humanoid embodiment, and even then, using a CLI and service API is going to be preferable to doing things with a UI in nearly every possible case, except where you want to limit it to human-ish capabilities (as with gaming) or you want to deceive any monitors into thinking that a human is operating.
It's going to be infinitely easier to wrap a JSON get/push wrapper around existing APIs or automation interfaces than to universalize some sort of GUI interaction, because LLMs don't have the realtime memory you need to adapt to all the edge cases on the fly. It's incredibly difficult for humans: hundreds of billions of dollars have been spent trying to make software universally accessible and dumbed down for users, and it still ends up being either stupidly limited or fractally complex in the tail, and no developer can ever account for all the possible ways users interact with a feature in any moderately complex piece of software.
Just use existing automation patterns. This is one case where if an AI picks up this capability alongside other advances, then awesome, but any sort of middleware is going to be a huge hack that immediately gets obsoleted by frontier models as a matter of course.
Why the hell would anyone want this? Why on earth would you trust an LLM to output a UI? You're just asking for security bugs, UI impersonation attacks, terrible usability, and more. This is a nightmare.
Freeform looks and acts like text, except for a set of things that someone vetted and made work.
If the interactive diagram or UI you click on now owns you, it doesn't matter if it was inside the chat window or outside the chat window.
Now, in this case, it's not arbitrary UI, but if you believe that the parsing/validation/rendering/two-way data binding/incremental composition (the spec requires that you be able to build up UI incrementally) of these components: https://a2ui.org/specification/v0.9-a2ui/#standard-component...
as handled/rendered/etc. by N library implementations is not going to have security issues, I've got a bridge to sell you.
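For readers unfamiliar with what incremental composition implies, here is a rough sketch of the client-side bookkeeping involved; the message shape is assumed for illustration and is not the spec's wire format.

```typescript
// Sketch of what incremental composition implies on the client: components
// stream in by id and are merged into a tree that is re-rendered as it grows.
// The message shape here is assumed for illustration, not the spec's format.

interface ComponentNode {
  id: string;
  component: Record<string, unknown>;
  children?: string[]; // ids of children that may not have arrived yet
}

class UiTree {
  private nodes = new Map<string, ComponentNode>();

  // Insert or merge a node; rendering must tolerate dangling child ids,
  // which is exactly the kind of state where implementation bugs hide.
  apply(update: ComponentNode): void {
    const existing = this.nodes.get(update.id);
    this.nodes.set(update.id, existing ? { ...existing, ...update } : update);
  }

  resolve(id: string): ComponentNode | undefined {
    return this.nodes.get(id);
  }
}
```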
There is a vast difference in risk between me clicking a button provided by Claude in my Claude chat, on the basis of conversations I have had with Claude, and clicking a random button on a random website. Both can contain something malicious; one is substantially higher risk. Separately, linking a UI constructed this way up to an agent and letting third parties interact with it is much riskier to you than to them.
> If the interactive diagram or UI you click on now owns you, it doesn't matter if it was inside the chat window or outside the chat window.
In that scenario, the UI elements are irrelevant barring a buggy implementation (yes, I've read the rest, see below), since you can achieve the same thing by just presenting the user with a basic link and telling them to press it.
> as transported/rendered/etc by NxM combinations of implementations (there are 4 renderers and a bunch of transports right now), is not going to have security issues, I've got a bridge to sell you.
I very much doubt we'll see many implementations that won't just use a web view for this, and I very much doubt these issues will even fall in the top 10 security issues people will run into with AI tooling. Sure, there will be bugs. You can use this argument against anything that requires changes to client software.
But if you're concerned about the security of clients, MCP and hooks are a far bigger rat's nest of things that are inherently risky due to the way they are designed.
The vision here is that you can chat with Gemini, and it can generate an app on the fly to solve your problem. For the visualized landscaping app, it could just connect to landscapers via their Google Business Profile.
As an app developer, I'm actually not even against this. The amount of human effort that goes into creating and maintaining thousands of duplicative apps is wasteful.
How many times are users going to spin GPUs to create the same app?
Feels good to have been on the money, but I'm also glad I didn't start a project only to be harpooned by Google straight away
Some examples from the documentation:

{
  "id": "settings-tabs",
  "component": {
    "Tabs": {
      "tabItems": [
        {"title": {"literalString": "General"}, "child": "general-settings"},
        {"title": {"literalString": "Privacy"}, "child": "privacy-settings"},
        {"title": {"literalString": "Advanced"}, "child": "advanced-settings"}
      ]
    }
  }
}

{
  "id": "email-input",
  "component": {
    "TextField": {
      "label": {"literalString": "Email Address"},
      "text": {"path": "/user/email"},
      "textFieldType": "shortText"
    }
  }
}
Most HTML is actually HTML+CSS+JS, and accepting that is, IMO, a code injection attack waiting to happen. By abstracting to JSON, a client can safely render UI without this concern.
One challenge is that you likely do want JS to process/capture the data, for example taking the data from a form and turning it into JSON to send back to the agent.
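One way to picture this "abstract to JSON" argument, using the TextField example above: the client builds DOM itself with createElement and captures form values with its own code, so model output never executes as markup or script. The renderer below is a hypothetical sketch, not part of A2UI.

```typescript
// Hypothetical renderer sketch: model output is data, never markup. The client
// builds DOM with createElement, so there is no HTML/JS injection path, and
// the client's own code captures form values back into JSON for the agent.

interface TextFieldSpec {
  label: { literalString: string };
  text?: { path: string }; // data-binding path, e.g. "/user/email"
}

function renderTextField(id: string, spec: TextFieldSpec): HTMLElement {
  const wrapper = document.createElement("label");
  wrapper.textContent = spec.label.literalString; // plain text, never innerHTML
  const input = document.createElement("input");
  input.type = "text";
  input.id = id;
  input.dataset.bindPath = spec.text?.path ?? "";
  wrapper.appendChild(input);
  return wrapper;
}

// Collect form state keyed by binding path, ready to send back as JSON.
function collectFormData(root: HTMLElement): Record<string, string> {
  const out: Record<string, string> = {};
  root.querySelectorAll<HTMLInputElement>("input[data-bind-path]").forEach((el) => {
    if (el.dataset.bindPath) out[el.dataset.bindPath] = el.value;
  });
  return out;
}
```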
Just like you do with your web browser. A web browser is a Remote Code Execution engine.
The real question: do UIs even make sense for agents? Like the whole point of a UI is to expose functionality to humans with constraints (screens, mice, attention). Agents don't have those constraints. They can read JSON, call APIs directly, parse docs. Why are we building them middleware to click buttons?
I think this makes sense as a transition layer while we figure out what agent-native architecture looks like. But long-term it's probably training wheels.
Will include this in my https://hackernewsai.com/ newsletter.
1. Establish SSE connection
... user event
7. send updates over origin SSE connection
So the client is required to maintain an SSE-capable connection for the entire chat session? What if my network drops or I switch to another agent?
Seems an onerous requirement to maintain a connection for the lifetime of a session, which can span days (as some people have told us they have done with agents).
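For what it's worth, the browser's standard EventSource API already reconnects after drops and resends the last seen event id, so a server that can replay missed updates would make a days-long single connection unnecessary in practice; whether A2UI's transports are specified to resume this way is an assumption here. A rough sketch, with a placeholder endpoint and event name:

```typescript
// Sketch: EventSource reconnects automatically after network drops and resends
// the last seen event id, so a server that can replay missed updates avoids
// the need to hold one connection for days. The endpoint URL and event name
// are placeholders, not taken from the A2UI spec.

const source = new EventSource("/a2ui/session/abc123/events");

source.addEventListener("ui-update", (ev) => {
  const msg = ev as MessageEvent<string>;
  const update = JSON.parse(msg.data);
  console.log("apply UI update", update, "last event id:", msg.lastEventId);
});

source.onerror = () => {
  // The browser retries on its own; readyState shows whether it is mid-retry.
  if (source.readyState === EventSource.CONNECTING) {
    console.log("connection dropped, retrying with Last-Event-ID…");
  }
};
```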
Well this makes no sense. Models are trained on a ton of HTML and JavaScript and are pretty good at generating those, for simple use cases like the dashboards shown on this site and for more advanced ones.
HTML and JavaScript are already a "universal UI language", there is no need to build this kind of spec over it. Any spec expressive enough to specify UI behavior is the implementation itself.
However, I'm happy it's happening because you don't need an LLM to use the protocol.
In a context where you're chatting with an LLM, I suppose the user would expect some lag, but it would be unwelcome in regular apps.
This also means that a lot of other UI performance issues don't matter - form submission is going to be slow anyway, so just be transparent about the delay.
It is simple, effective and feels more native to me than some rigid data structure designed for very specific use-cases that may not fit well into your own problem.
Honestly, we should think of Emacs when working with LLMs and try to apply the same philosophy. I am not a fan of Emacs per se, but the parallels are there. Everything is a file and everything is text in a buffer. The text can be rendered in various ways depending on the consumer.
This is also the philosophy that we use in our own product, and it works remarkably well for a diverse set of customers. I have not encountered anything that cannot be modelled in this way. It is simple, effective, and it allows for a great degree of flexibility when things are not going as well as planned. It works well with streaming too (streaming parsers are not so difficult to write for simple text structures, and we have been doing this for ages), and LLMs are trained very well to produce this type of output, versus anything custom that has not yet been seen or adopted by anyone.
Besides, given that LLMs are getting good at coding and the browser can render iframes in seamless mode, a better and more flexible approach would be to use HTML, CSS and JavaScript, instead of what Slack has been doing for ages with their Block Kit API, which we know is very rigid and frustrating to work with. I get why you might want data structures for UI in order to cover CLI tools as well, but at the end of the day browsers and CLIs are completely different things, and I do not believe you can meaningfully make it work for both of them unless you are also prepared to dumb it down and target only the lowest common denominator.
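As a concrete illustration of the "everything is text in a buffer" approach described above, here is a small line-oriented streaming parser; the blank-line delimiter convention is invented for the example, not taken from any product.

```typescript
// Illustrative line-based streaming parser: accumulate partial chunks and emit
// a block whenever a blank line closes it. The delimiter convention is made up;
// the point is that plain text streams are easy to parse incrementally.

type BlockHandler = (block: string) => void;

class StreamingBlockParser {
  private buffer = "";

  constructor(private onBlock: BlockHandler) {}

  push(chunk: string): void {
    this.buffer += chunk;
    let idx: number;
    // A blank line terminates a block; everything before it is emitted.
    while ((idx = this.buffer.indexOf("\n\n")) !== -1) {
      this.onBlock(this.buffer.slice(0, idx).trim());
      this.buffer = this.buffer.slice(idx + 2);
    }
  }

  flush(): void {
    if (this.buffer.trim()) this.onBlock(this.buffer.trim());
    this.buffer = "";
  }
}

// Usage: feed it LLM output chunks as they stream in.
const parser = new StreamingBlockParser((block) => console.log("render:", block));
parser.push("First paragraph, streamed in ");
parser.push("two chunks.\n\nSecond block");
parser.flush();
```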
Yes, yes, we claim the user doesn't know what they want. I think that's largely used as an excuse to avoid rethinking how things should meet the user's needs and to keep the status quo, where people are made to rely on systems and walled gardens. The goal of this article is that UIs should work better for the user. What better way than to let them imagine the UI (or even nudge them with example actions, buttons, and text to click to render specific views)! I've been wanting to build something where I just ask in English, choosing from options I know I have, or otherwise play around and hit the edges to discover what's possible and what's not.
Anyone else thinking along this direction or think I’m missing something obvious here?
The genuinely interesting bit here is the security boundary: agents can only speak in terms of a vetted component catalog, and the client owns execution. If you get that right, you can swap the agent for a rules engine or a human operator and keep the same protocol. My guess is the spec that wins won’t be the one with the coolest demos, but the one boring enough that a product team can live with it for 5-10 years.
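A tiny sketch of what "the client owns execution" could mean in code, with hypothetical action names: the agent can only name catalog entries, and anything outside the catalog is rejected.

```typescript
// Hypothetical catalog check: the agent proposes actions by name, the client
// decides what (if anything) each name executes. Swapping the agent for a
// rules engine or a human operator changes nothing on this side of the boundary.

type Action = { name: string; params: Record<string, unknown> };

const CATALOG: Record<string, (params: Record<string, unknown>) => void> = {
  show_confirmation: (p) => console.log("render confirmation dialog", p),
  update_field: (p) => console.log("update bound field", p),
};

function execute(action: Action): void {
  const handler = CATALOG[action.name];
  if (!handler) {
    // Unknown action: reject rather than improvise.
    throw new Error(`Action not in catalog: ${action.name}`);
  }
  handler(action.params);
}
```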
https://research.google/blog/generative-ui-a-rich-custom-vis...
What scares me is that even without arbitrary code generation, there's the potential for hallucinations and prompt injection to hit hard if a solution like this isn't sandboxed properly. An automatically generated "confirm purchase" button like in the shown example is... probably something I'd not make entirely unsupervised just yet.