A2UI: A Protocol for Agent-Driven Interfaces
Key topics
The emergence of A2UI, a protocol for agent-driven interfaces, has sparked a lively debate about the proliferation of competing standards in the UI/UX space. Some commenters, like qsort, see potential for dynamic interfaces and exciting use cases; others, like mbossie, worry about market fragmentation, pointing out that multiple variants, including MCP-UI and Google's A2UI, are being introduced to solve the same problem. Others counter that divergence and exploration are necessary in a field with many unknowns and that the market will eventually settle on what works best; fortydegrees notes that design is inherently opinionated and will always admit multiple approaches. The xkcd comic about standards has become an obligatory reference, with zeroasterisk acknowledging the irony while justifying A2UI as a novel solution to a current market need.
Snapshot generated from the HN discussion
Discussion Activity
- Activity: Very active discussion
- First comment: 1h after posting
- Peak period: 68 comments in 0-12h
- Avg / period: 13
- Based on 78 loaded comments
Key moments
- Story posted: Dec 16, 2025 at 4:16 AM EST (18 days ago)
- First comment: Dec 16, 2025 at 5:19 AM EST, 1h after posting
- Peak activity: 68 comments in the 0-12h window, the hottest period of the conversation
- Latest activity: Dec 22, 2025 at 1:32 AM EST (12 days ago)
How many more variants are we introducing to solve the same problem? Sounds like a lot of wasted man-hours to me.
I completely agree, though I'm personally sitting out all of these protocols/frameworks/libraries. In 6 months' time half of them will have been abandoned, and the other half will have morphed into something very different and incompatible.
For the time being, I just build things from scratch, which–as others have noted¹–is actually not that difficult, gives you understanding of what goes on under the hood, and doesn't tie you to someone else's innovation pace (whether it's higher or lower).
¹ https://fly.io/blog/everyone-write-an-agent/
The same happened with GPUs in the 90s. When Jensen formed Nvidia there were 70 other companies selling graphics cards that you could put in a PCI slot. Now there are 2.
I can justify A2UI as doing something not otherwise accomplishable in the market today, but you saw how long the blog post was trying to explain that. :shrug:
Sounds like a lot of people got paid because of it. That's a win for them. It wasn't their decision, it was company decision to take part in the race. Most likely there will be more than 1 winner anyway.
Like you mentioned, it's a good time to be employed.
AG-UI is a launch partner of A2UI, but it is a separate project by CopilotKit, not Google.
We have a day-0 handshake between AG-UI & A2UI
I think AG-UI is great if you are building the UI and the agent at the same time, want a high-bandwidth sync between them, and the UI supports AG-UI as an adaptor layer (they have done a lot of work making this easier for folks).
A2UI is most interesting for its LLM generation options (not tools but structured output; see the sketch below), its remote message passing options (if you don't own the UI), and its general-purpose design (the same fairly simple standard can work for many models, transports, and renderers).
They do fit nicely together. Sorry the naming conventions are complicated. There are 2 hard things in computer science: naming things, cache invalidation, off by one errors.
https://www.copilotkit.ai/ag-ui-and-a2ui
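To make "structured output" concrete, here is a minimal sketch of the idea; the component names, field shapes, and helper function are illustrative assumptions, not taken from the A2UI spec. The model is constrained to emit JSON naming only vetted component types, and the client validates before rendering anything.

```typescript
// Illustrative only: these component names and field shapes are assumptions,
// not the A2UI spec. The idea: the model emits structured JSON that is
// checked against a small catalog of vetted component types before rendering.

type Component =
  | { Text: { text: string } }
  | { Button: { label: string; action: string } }
  | { TextField: { label: string; path: string } };

interface UiMessage {
  id: string;
  component: Component;
}

const ALLOWED = new Set(["Text", "Button", "TextField"]);

// Validate raw model output before handing it to a renderer.
function parseUiMessage(raw: string): UiMessage {
  const msg = JSON.parse(raw) as UiMessage;
  const kind = Object.keys(msg.component ?? {})[0];
  if (!kind || !ALLOWED.has(kind)) {
    throw new Error(`Unknown component type: ${kind}`);
  }
  return msg;
}

// A model constrained to this schema might return:
// {"id":"cta","component":{"Button":{"label":"Confirm","action":"confirm_order"}}}
```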
Making an agent call a tool to manipulate a UI does feel like normal application development and an event driven interaction... I get that.
What else drives your preference?
(emphasis mine)
Sounds like agents are suddenly able to do what developers have failed at for decades: Writing platform-independent UIs. Maybe this works for simple use cases but beyond that I'm skeptical.
[1] https://a2ui.org/renderers/
https://github.com/google/A2UI/pull/352
Thanks for the recommendation.
A2UI is a superset, expanding into more element types. If we're going to have the origin of all our data streams be string-output-generators, this seems like an OK way to go.
I've joined an effort inside Google to work in this exact space. What we're doing has no plan to become open source, but other groups are working on stuff like A2UI and we collaborate with them.
My career previous to this was nearly 20 years of native platform UI programming and things like Flutter, React Native, etc have always really annoyed me. But I've come around this year to accept that as long as LLMs on servers are going to be where the applications of the future live, we need a client-OS agnostic framework like this.
It's about accomplishing a task, not making a bot accomplish a task using the same tools and embodiment context as a human. There's no upside unless the bot is actually using a humanoid embodiment, and even then, using a CLI and service API is going to be preferable to doing things with a UI in nearly every possible case, except where you want to limit it to human-ish capabilities (as with gaming) or you want to deceive any monitors into thinking that a human is operating.
It's going to be infinitely easier to wrap a JSON get/push wrapper around existing APIs or automation interfaces than to universalize some sort of GUI interaction, because LLMs don't have the realtime memory you need to adapt to all the edge cases on the fly. It's incredibly difficult for humans: hundreds of billions of dollars have been spent trying to make software universally accessible and dumbed down for users, and it still ends up being either stupidly limited or fractally complex in the tail, and no developer can ever account for all the possible ways users interact with a feature in any moderately complex piece of software.
Just use existing automation patterns. This is one case where if an AI picks up this capability alongside other advances, then awesome, but any sort of middleware is going to be a huge hack that immediately gets obsoleted by frontier models as a matter of course.
Why the hell would anyone want this? Why on earth would you trust an LLM to output a UI? You're just asking for security bugs, UI impersonation attacks, terrible usability, and more. This is a nightmare.
Freeform looks and acts like text, except for a set of things that someone vetted and made work.
If the interactive diagram or UI you click on now owns you, it doesn't matter if it was inside the chat window or outside the chat window.
Now, in this case, it's not arbitrary UI, but if you believe that the parsing/validation/rendering/two-way data binding/incremental composition (the spec requires that you be able to build up UI incrementally) of these components: https://a2ui.org/specification/v0.9-a2ui/#standard-component...
as handled/rendered/etc. by N library implementations is not going to have security issues, I've got a bridge to sell you.
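For readers unfamiliar with what incremental composition implies, here is a rough sketch of the client-side bookkeeping involved; the message shape is assumed for illustration and is not the spec's wire format.

```typescript
// Sketch of what incremental composition implies on the client: components
// stream in by id and are merged into a tree that is re-rendered as it grows.
// The message shape here is assumed for illustration, not the spec's format.

interface ComponentNode {
  id: string;
  component: Record<string, unknown>;
  children?: string[]; // ids of children that may not have arrived yet
}

class UiTree {
  private nodes = new Map<string, ComponentNode>();

  // Insert or merge a node; rendering must tolerate dangling child ids,
  // which is exactly the kind of state where implementation bugs hide.
  apply(update: ComponentNode): void {
    const existing = this.nodes.get(update.id);
    this.nodes.set(update.id, existing ? { ...existing, ...update } : update);
  }

  resolve(id: string): ComponentNode | undefined {
    return this.nodes.get(id);
  }
}
```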
There is a vast difference in risk between me clicking a button provided by Claude in my Claude chat, on the basis of conversations I have had with Claude, and clicking a random button on a random website. Both can contain something malicious; one is substantially higher risk. Separately, linking a UI constructed this way up to an agent and letting third parties interact with it is much riskier to you than to them.
> If the interactive diagram or UI you click on now owns you, it doesn't matter if it was inside the chat window or outside the chat window.
In that scenario, the UI elements are irrelevant barring a buggy implementation (yes, I've read the rest, see below), since you can achieve the same thing by just presenting the user with a basic link and telling them to press it.
> as transported/rendered/etc by NxM combinations of implementations (there are 4 renderers and a bunch of transports right now), is not going to have security issues, I've got a bridge to sell you.
I very much doubt we'll see many implementations that won't just use a web view for this, and I very much doubt these issues will even fall in the top 10 security issues people will run into with AI tooling. Sure, there will be bugs. You can use this argument against anything that requires changes to client software.
But if you're concerned about the security of clients, MCP and hooks are a far bigger rat's nest of things that are inherently risky due to the way they are designed.
The vision here is that you can chat with Gemini, and it can generate an app on the fly to solve your problem. For the visualized landscaping app, it could just connect to landscapers via their Google Business Profile.
As an app developer, I'm actually not even against this. The amount of human effort that goes into creating and maintaining thousands of duplicative apps is wasteful.
How many times are users going to spin GPUs to create the same app?
Feels good to have been on the money, but I'm also glad I didn't start a project only to be harpooned by Google straight away
Some examples from the documentation:

{
  "id": "settings-tabs",
  "component": {
    "Tabs": {
      "tabItems": [
        {"title": {"literalString": "General"}, "child": "general-settings"},
        {"title": {"literalString": "Privacy"}, "child": "privacy-settings"},
        {"title": {"literalString": "Advanced"}, "child": "advanced-settings"}
      ]
    }
  }
}

{
  "id": "email-input",
  "component": {
    "TextField": {
      "label": {"literalString": "Email Address"},
      "text": {"path": "/user/email"},
      "textFieldType": "shortText"
    }
  }
}
Most HTML is actually HTML+CSS+JS, and accepting that is, IMO, a code injection attack waiting to happen. By abstracting to JSON, a client can safely render UI without this concern.
One challenge is that you likely do want JS to process/capture the data, for example taking the data from a form and turning it into JSON to send back to the agent.
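One way to picture this "abstract to JSON" argument, using the TextField example above: the client builds DOM itself with createElement and captures form values with its own code, so model output never executes as markup or script. The renderer below is a hypothetical sketch, not part of A2UI.

```typescript
// Hypothetical renderer sketch: model output is data, never markup. The client
// builds DOM with createElement, so there is no HTML/JS injection path, and
// the client's own code captures form values back into JSON for the agent.

interface TextFieldSpec {
  label: { literalString: string };
  text?: { path: string }; // data-binding path, e.g. "/user/email"
}

function renderTextField(id: string, spec: TextFieldSpec): HTMLElement {
  const wrapper = document.createElement("label");
  wrapper.textContent = spec.label.literalString; // plain text, never innerHTML
  const input = document.createElement("input");
  input.type = "text";
  input.id = id;
  input.dataset.bindPath = spec.text?.path ?? "";
  wrapper.appendChild(input);
  return wrapper;
}

// Collect form state keyed by binding path, ready to send back as JSON.
function collectFormData(root: HTMLElement): Record<string, string> {
  const out: Record<string, string> = {};
  root.querySelectorAll<HTMLInputElement>("input[data-bind-path]").forEach((el) => {
    if (el.dataset.bindPath) out[el.dataset.bindPath] = el.value;
  });
  return out;
}
```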
Just like you do with your web browser. A web browser is a Remote Code Execution engine.
The real question: do UIs even make sense for agents? Like the whole point of a UI is to expose functionality to humans with constraints (screens, mice, attention). Agents don't have those constraints. They can read JSON, call APIs directly, parse docs. Why are we building them middleware to click buttons?
I think this makes sense as a transition layer while we figure out what agent-native architecture looks like. But long-term it's probably training wheels.
Will include this in my https://hackernewsai.com/ newsletter.
1. Establish SSE connection
... user event
7. send updates over origin SSE connection
So the client is required to maintain an SSE-capable connection for the entire chat session? What if my network drops or I switch to another agent?
Seems an onerous requirement to maintain a connection for the lifetime of a session, which can span days (as some people have told us they have done with agents).
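For what it's worth, the browser's standard EventSource API already reconnects after drops and resends the last seen event id, so a server that can replay missed updates would make a days-long single connection unnecessary in practice; whether A2UI's transports are specified to resume this way is an assumption here. A rough sketch, with a placeholder endpoint and event name:

```typescript
// Sketch: EventSource reconnects automatically after network drops and resends
// the last seen event id, so a server that can replay missed updates avoids
// the need to hold one connection for days. The endpoint URL and event name
// are placeholders, not taken from the A2UI spec.

const source = new EventSource("/a2ui/session/abc123/events");

source.addEventListener("ui-update", (ev) => {
  const msg = ev as MessageEvent<string>;
  const update = JSON.parse(msg.data);
  console.log("apply UI update", update, "last event id:", msg.lastEventId);
});

source.onerror = () => {
  // The browser retries on its own; readyState shows whether it is mid-retry.
  if (source.readyState === EventSource.CONNECTING) {
    console.log("connection dropped, retrying with Last-Event-ID…");
  }
};
```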
Well this makes no sense. Models are trained on a ton of HTML and JavaScript and are pretty good at generating those, for simple use cases like the dashboards shown on this site and for more advanced ones.
HTML and JavaScript are already a "universal UI language", there is no need to build this kind of spec over it. Any spec expressive enough to specify UI behavior is the implementation itself.
However, I'm happy it's happening because you don't need an LLM to use the protocol.
In a context where you're chatting with an LLM, I suppose the user would expect some lag, but it would be unwelcome in regular apps.
This also means that a lot of other UI performance issues don't matter - form submission is going to be slow anyway, so just be transparent about the delay.
It is simple, effective and feels more native to me than some rigid data structure designed for very specific use-cases that may not fit well into your own problem.
Honestly, we should think of Emacs when working with LLMs and try to apply the same philosophy. I am not a fan of Emacs per se, but the parallels are there. Everything is a file and everything is text in a buffer. The text can be rendered in various ways depending on the consumer.
This is also the philosophy that we use in our own product, and it works remarkably well for a diverse set of customers. I have not encountered anything that cannot be modelled in this way. It is simple, effective, and it allows for a great degree of flexibility when things are not going as well as planned. It works well with streaming too (streaming parsers are not so difficult to write for simple text structures, and we have been doing this for ages), and LLMs are trained very well to produce this type of output, versus anything custom that has not yet been seen or adopted by anyone.
Besides, given that LLMs are getting good at coding and the browser can render iframes in seamless mode, a better and more flexible approach would be to use HTML, CSS and JavaScript, instead of what Slack has been doing for ages with their Block Kit API, which we know is very rigid and frustrating to work with. I get why you might want data structures for UI in order to cover CLI tools as well, but at the end of the day browsers and CLIs are completely different things, and I do not believe you can meaningfully make it work for both of them unless you are also prepared to dumb it down and target only the lowest common denominator.
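As a concrete illustration of the "everything is text in a buffer" approach described above, here is a small line-oriented streaming parser; the blank-line delimiter convention is invented for the example, not taken from any product.

```typescript
// Illustrative line-based streaming parser: accumulate partial chunks and emit
// a block whenever a blank line closes it. The delimiter convention is made up;
// the point is that plain text streams are easy to parse incrementally.

type BlockHandler = (block: string) => void;

class StreamingBlockParser {
  private buffer = "";

  constructor(private onBlock: BlockHandler) {}

  push(chunk: string): void {
    this.buffer += chunk;
    let idx: number;
    // A blank line terminates a block; everything before it is emitted.
    while ((idx = this.buffer.indexOf("\n\n")) !== -1) {
      this.onBlock(this.buffer.slice(0, idx).trim());
      this.buffer = this.buffer.slice(idx + 2);
    }
  }

  flush(): void {
    if (this.buffer.trim()) this.onBlock(this.buffer.trim());
    this.buffer = "";
  }
}

// Usage: feed it LLM output chunks as they stream in.
const parser = new StreamingBlockParser((block) => console.log("render:", block));
parser.push("First paragraph, streamed in ");
parser.push("two chunks.\n\nSecond block");
parser.flush();
```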
Yes, yes, we claim the user doesn't know what they want. I think that's largely used as an excuse to avoid rethinking how things should meet the user's needs and to keep the status quo, where people are made to rely on systems and walled gardens. The goal of this article is that UIs should work better for the user. What better way than to let them imagine the UI (or even nudge them with example actions, buttons, and text to click to render specific views)! I've been wanting to build something where I just ask in English, choosing from options I know I have, or otherwise play around and hit the edges to discover what's possible and what's not.
Anyone else thinking along this direction or think I’m missing something obvious here?
The genuinely interesting bit here is the security boundary: agents can only speak in terms of a vetted component catalog, and the client owns execution. If you get that right, you can swap the agent for a rules engine or a human operator and keep the same protocol. My guess is the spec that wins won’t be the one with the coolest demos, but the one boring enough that a product team can live with it for 5-10 years.
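A tiny sketch of what "the client owns execution" could mean in code, with hypothetical action names: the agent can only name catalog entries, and anything outside the catalog is rejected.

```typescript
// Hypothetical catalog check: the agent proposes actions by name, the client
// decides what (if anything) each name executes. Swapping the agent for a
// rules engine or a human operator changes nothing on this side of the boundary.

type Action = { name: string; params: Record<string, unknown> };

const CATALOG: Record<string, (params: Record<string, unknown>) => void> = {
  show_confirmation: (p) => console.log("render confirmation dialog", p),
  update_field: (p) => console.log("update bound field", p),
};

function execute(action: Action): void {
  const handler = CATALOG[action.name];
  if (!handler) {
    // Unknown action: reject rather than improvise.
    throw new Error(`Action not in catalog: ${action.name}`);
  }
  handler(action.params);
}
```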
https://research.google/blog/generative-ui-a-rich-custom-vis...
What scares me is that even without arbitrary code generation, there's the potential for hallucinations and prompt injection to hit hard if a solution like this isn't sandboxed properly. An automatically generated "confirm purchase" button like in the shown example is... probably something I'd not make entirely unsupervised just yet.