Building Your Own CLI Coding Agent with Pydantic-AI
Key topics
The debate around Pydantic-AI is heating up, with some developers raving about its ease of use in building coding agents, while others express frustration with its limitations and quirks. As one commenter noted, Pydantic-AI has been a game-changer for their coding agent CLI project, but others have struggled with issues like non-streaming models. A maintainer of Pydantic-AI chimed in to clarify that streaming is now supported against various models, addressing some of the concerns. Meanwhile, a parallel discussion emerged about Pydantic's role in the Python ecosystem, with some wishing it were part of the core language and others suggesting alternative libraries like attrs + cattrs could fill the gap.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 1h after posting
Peak period: 21 comments in the 0-6h window
Average per period: 5 comments
Based on 40 loaded comments
Key moments
- Story posted: Aug 28, 2025 at 2:34 PM EDT (4 months ago)
- First comment: Aug 28, 2025 at 3:51 PM EDT (1h after posting)
- Peak activity: 21 comments in the 0-6h window, the hottest stretch of the conversation
- Latest activity: Aug 30, 2025 at 11:40 PM EDT (4 months ago)
Primary article: https://github.com/caesarnine/rune-code
Part of the reason I switched to it initially wasn't so much its niceties as being disgusted at how poor the documentation and experience of using LiteLLM was; I figured the folks who make Pydantic would do a better job of the "universal" interface.
Towards coding agents, I wonder if there are any good, efficient ways to measure how well different implementations work on coding tasks. SWE-bench seems good, but expensive to run. Effectively I'm curious about things like: given tool definition X vs Y (e.g. diff vs full-file edit), the prompt for tool X vs Y (how it's described, whether it uses examples), model choice (e.g. MCP with Claude, but inline python-exec with GPT-5), sub-agents, todo lists, etc., how much does each ablation actually matter? And measure not just success, but cost to success too (efficiency).
Overall, it seems like in the phase space of options everything "kinda works", but I'm very curious whether there are any major lifts, big gotchas, etc.
I ask because it feels like the Claude Code CLI always does a little bit better, subjectively for me, but I haven't seen an LMArena-style or clear A-vs-B comparison or measure.
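Not a benchmark by any stretch, but the kind of ablation harness being described could look something like the sketch below; the ablation axes, the run_agent stub, and the cost accounting are all hypothetical and only illustrate tracking cost-to-success alongside pass rate.

```python
import itertools
from dataclasses import dataclass

# Hypothetical ablation axes: edit-tool style and prompt style.
EDIT_TOOLS = ["diff", "full_file"]
PROMPT_STYLES = ["terse", "with_examples"]

@dataclass
class RunResult:
    solved: bool
    cost_usd: float  # token spend for the attempt

def run_agent(task: str, edit_tool: str, prompt_style: str) -> RunResult:
    """Placeholder: wire this up to your agent and a small task suite."""
    raise NotImplementedError

def evaluate(tasks: list[str]) -> None:
    for edit_tool, prompt_style in itertools.product(EDIT_TOOLS, PROMPT_STYLES):
        results = [run_agent(t, edit_tool, prompt_style) for t in tasks]
        solved = [r for r in results if r.solved]
        success_rate = len(solved) / len(results)
        # Efficiency: dollars per solved task, not just pass rate.
        cost_per_solve = sum(r.cost_usd for r in results) / max(len(solved), 1)
        print(f"{edit_tool=} {prompt_style=} {success_rate=:.0%} {cost_per_solve=:.2f}")
```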
I've been using it a lot lately and anything beyond basic usage is an absolute chore.
I agree on the first part, but Pydantic is not important to Python. It's important to a lot of people, but it's absolutely unnecessary.
I don't know why people like this pattern of "fat" classes with built-in de/serialisation/persistence logic at all. It makes so much more sense to have that at the edge and build entities and/or value objects directly. Using stuff like Pydantic or Django ORM, you either end up completely coupling domain logic to serialisation logic, or you end up manually writing data mappers to/from your domain when you could have just used, for example, cattrs or SQLAlchemy. I guess it's the "easy vs simple" thing.
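To make the "keep conversion at the edge" idea concrete, here is a minimal sketch using attrs + cattrs as suggested above; the Order entity and its fields are made up for illustration.

```python
from datetime import datetime

import attrs
import cattrs

# Plain domain object: no serialisation or persistence concerns baked in.
@attrs.define
class Order:
    id: int
    customer_email: str
    created_at: datetime

    def is_recent(self, now: datetime) -> bool:
        return (now - self.created_at).days < 30

# Conversion lives at the edge, in one place.
converter = cattrs.Converter()
converter.register_structure_hook(datetime, lambda v, _: datetime.fromisoformat(v))
converter.register_unstructure_hook(datetime, lambda dt: dt.isoformat())

payload = {"id": 1, "customer_email": "a@example.com", "created_at": "2025-08-28T14:34:00"}
order = converter.structure(payload, Order)  # dict -> domain object
wire = converter.unstructure(order)          # domain object -> dict
```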
In my experience backends never stay simple enough for SQLModel, so you might as well decouple from the get-go. If it's literally just copy/paste between SQLAlchemy models and Pydantic, just do the copy/pasting. Get an LLM to do it if you have to. It will be worth it in the long run: you'll want to change your db schema without breaking your API.
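A minimal sketch of that decoupling, assuming SQLAlchemy 2.0-style declarative models and Pydantic v2; the User table and UserOut schema are illustrative only.

```python
from pydantic import BaseModel, ConfigDict
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

# Persistence model: free to evolve with the database schema.
class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str]
    hashed_password: Mapped[str]  # never exposed over the API

# API schema: free to evolve with the public contract.
class UserOut(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: int
    email: str

def to_api(user: User) -> UserOut:
    # The "copy/paste" boundary: one explicit mapping step.
    return UserOut.model_validate(user)
```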
Pydantic still sees multiple commits per week, which is less than it was at one point, but I'd say that's a sign of its maturity and stability more than a lack of attention.
If the issue is still showing on the latest version, seeing the Pydantic model/schema would be very helpful.
Thanks for being a proactive kind of maintainer. The world is better because of people like you.
Nevertheless, I would strongly recommend not using the AI providers' libraries directly: you quickly get locked in, and this is an extremely fast-paced market where today's king can change weekly.
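One way to keep that flexibility is to go through an abstraction layer like Pydantic AI, where the provider is just a model string. A rough sketch, assuming current Pydantic AI conventions (the model identifiers are examples, and recent versions expose the result as result.output):

```python
import os

from pydantic_ai import Agent

# Swap providers by changing a string (or an env var), not your code.
model = os.environ.get("AGENT_MODEL", "openai:gpt-4o")  # e.g. "anthropic:claude-sonnet-4-0"

agent = Agent(model, system_prompt="You are a concise coding assistant.")

result = agent.run_sync("Write a one-line Python function that reverses a string.")
print(result.output)
```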
We do have a proprietary observability and evals product Pydantic Logfire (https://pydantic.dev/logfire), but Pydantic AI works with other observability tools as well, and Logfire works with other agent frameworks.
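For context, wiring Pydantic AI traces into Logfire is meant to be a couple of lines. The sketch below assumes the configure() call and the Agent instrument flag described in the current docs, so verify against your installed versions.

```python
import logfire
from pydantic_ai import Agent

logfire.configure()  # assumes a Logfire project/token is set up in the environment

# Instrument this agent so runs, model requests and tool calls are traced.
agent = Agent("openai:gpt-4o", instrument=True)

result = agent.run_sync("Summarise what structured output means in one sentence.")
print(result.output)
```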
I strongly believe you guys should be compensated very well for what you bring to the ecosystem but the probability of open source projects being enshittified by private interests is non-trivially high.
And taking this one step further: it's not that investors are evil people who want to do bad things, but it is their explicit job to make returns on their investment. It's the basic mechanism of the adage "show me an incentive and I'll show you the outcome".
The vast majority of bugs we encounter are not in Pydantic AI itself but rather in having to deal with supposedly OpenAI Chat Completions-compatible APIs that aren't really compatible, and with local models run through e.g. Ollama or vLLM that tend not to be the best at tool calling.
The big three model providers (OpenAI, Claude, Gemini) and enterprise platforms (Bedrock, Vertex, Azure) see the vast majority of usage and our support for them is very stable. It remains a challenge to keep up with their pace of shipping new features and models, but thanks to our 200+ contributors we're usually not far behind the bleeding edge in terms of LLM API feature coverage, and as you may have seen we're very responsive to issues and PRs on GitHub, and questions on Slack.
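As a point of reference, pointing Pydantic AI at an OpenAI-compatible endpoint such as a local Ollama server looks roughly like the sketch below; the class names follow the current docs and the model name is just an example.

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Any "OpenAI Chat Completions-compatible" server: Ollama, vLLM, etc.
model = OpenAIModel(
    "llama3.2",  # example local model name
    provider=OpenAIProvider(base_url="http://localhost:11434/v1"),
)

agent = Agent(model)
result = agent.run_sync("Return the word 'ok' and nothing else.")
print(result.output)
```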
I haven't looked deeply into pydantic-ai's implementation, but this might be related to tool usage vs completion [0], the backend LLM model, etc. All I know is that with the same LLM models, `openai.client.chat.completions` + a custom prompt that passes in the Pydantic JSON schema + post-processing to instantiate `SomePydanticModel(**json)` creates objects successfully, whereas vanilla pydantic-ai rarely does, regardless of the number of retries.
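A minimal sketch of that manual flow, assuming the openai v1 client and Pydantic v2; the Report model and prompt are made up.

```python
import json

from openai import OpenAI
from pydantic import BaseModel

class Report(BaseModel):  # hypothetical target model
    title: str
    findings: list[str]

client = OpenAI()
schema = json.dumps(Report.model_json_schema())

resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # plain JSON mode; schema enforced only via the prompt
    messages=[
        {"role": "system", "content": f"Reply with JSON matching this schema:\n{schema}"},
        {"role": "user", "content": "Summarise the main risks of tight coupling."},
    ],
)

# Post-processing: validate and instantiate the model from the raw JSON text.
report = Report.model_validate_json(resp.choices[0].message.content)
```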
I went with what works in my code, but didn't remove the pydantic-ai dependency completely because I'm hoping something changes. I'd say that getting dynamic prompt context by leveraging JSON schemas, model-and-field docs from pydantic, plus maybe other results from runtime-inspection (like the actual source-code) is obviously a very good idea. Many people want something like "fuzzy compilers" with structured output, not magical oracles that might return anything.
Documentation is context, and even very fuzzy context is becoming a force multiplier. Similarly, languages/frameworks with good support for runtime inspection/reflection and an ecosystem with strong tools for things like ASTs really should be the best things to pair with AI and agents; a small sketch of gathering that kind of context follows below.
[0]: https://github.com/pydantic/pydantic-ai/issues/582
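Picking up the point about runtime inspection as prompt context, here is a small sketch; the describe_model helper and the Invoice model are hypothetical, built only from Pydantic's schema/field metadata and the standard library's inspect module.

```python
import inspect

from pydantic import BaseModel, Field

class Invoice(BaseModel):  # hypothetical model
    """An invoice issued to a customer."""
    number: str = Field(description="Human-readable invoice number")
    total_cents: int = Field(description="Grand total in cents")

def describe_model(model: type[BaseModel]) -> str:
    """Build prompt context from docs, field descriptions and source code."""
    lines = [inspect.getdoc(model) or "", str(model.model_json_schema())]
    for name, field in model.model_fields.items():
        lines.append(f"- {name}: {field.description}")
    lines.append(inspect.getsource(model))
    return "\n".join(lines)

context = describe_model(Invoice)  # feed this into the system prompt
```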
That's very odd, would you mind sharing the Pydantic model / schema so I can have a look? (I'm a maintainer) What you're doing with a custom prompt that includes the schema sounds like our Prompted output mode (https://ai.pydantic.dev/output/#prompted-output), but you should get better performance still with the Native or Tool output modes (https://ai.pydantic.dev/output/#native-output, https://ai.pydantic.dev/output/#tool-output) which leverage the APIs' native strict JSON schema enforcement.
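For illustration, the tool/native output modes mentioned above look roughly like this; the WeatherReport model is made up, and exact import paths may vary between versions, so check the linked docs.

```python
from pydantic import BaseModel
from pydantic_ai import Agent, NativeOutput

class WeatherReport(BaseModel):  # hypothetical; nested models work the same way
    city: str
    temperature_c: float

# Native output asks the provider's strict structured-output API to enforce the schema.
agent = Agent("openai:gpt-4o", output_type=NativeOutput(WeatherReport))

result = agent.run_sync("What's the weather like in Paris right now? Make up plausible numbers.")
print(result.output)  # a validated WeatherReport instance
```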
One thing I can say, though: my models differ from the docs examples mostly in that they are not "flat" with simple top-level data structures. They have lots of nested models as fields.
Thanks, this is a very interesting thread on multiple levels. It does seem related to my problem, and I also learned about field docstrings :) I'll try moving my dependency closer to the bleeding edge.
"AI" and vibe coding fits very well into that list.
Its agent model feels similar to OpenAI's: flexible and dynamic without needing to predefine a DAG. Execution is automatically traced and can be exported to Logfire, which makes observability pretty smooth too. Looking forward to their upcoming V1 release.
Shameless plug: I've been working on a DBOS [2] integration into Pydantic-AI as a lightweight durable agent solution.
[1] https://github.com/pydantic/pydantic-ai/pull/2638
[2] https://github.com/dbos-inc/dbos-transact-py