Building Your Own CLI Coding Agent with Pydantic-AI
Key topics
The debate around Pydantic-AI is heating up, with some developers raving about its ease of use in building coding agents, while others express frustration with its limitations and quirks. As one commenter noted, Pydantic-AI has been a game-changer for their coding agent CLI project, but others have struggled with issues like non-streaming models. A maintainer of Pydantic-AI chimed in to clarify that streaming is now supported against various models, addressing some of the concerns. Meanwhile, a parallel discussion emerged about Pydantic's role in the Python ecosystem, with some wishing it were part of the core language and others suggesting alternative libraries like attrs + cattrs could fill the gap.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 1h after posting
Peak period: 21 comments in the 0-6h window
Average per period: 5 comments
Based on 40 loaded comments
Key moments
- Story posted: Aug 28, 2025 at 2:34 PM EDT (4 months ago)
- First comment: Aug 28, 2025 at 3:51 PM EDT (1h after posting)
- Peak activity: 21 comments in the 0-6h window, the hottest stretch of the conversation
- Latest activity: Aug 30, 2025 at 11:40 PM EDT (4 months ago)
Primary article: https://github.com/caesarnine/rune-code
Part of the reason I switched to it initially wasn't so much its niceties as being disgusted at how poor the documentation and experience of using LiteLLM was; I figured the folks who make Pydantic would do a better job of the "universal" interface.
Towards coding agents, I wonder if there are any good, efficient ways to measure how well different implementations work on coding tasks. SWE-bench seems good, but expensive to run. Effectively I'm curious about things like: given tool definition X vs Y (e.g. diff vs full-file edit), the prompt for tool X vs Y (how it's described, whether it uses examples), model choice (e.g. MCP with Claude, but inline python-exec with GPT-5), sub-agents, todo lists, etc., how much does each ablation actually matter? And measure not just success, but cost to success too (efficiency).
Overall, it seems like in the phase space of options everything "kinda works", but I'm very curious whether there are any major lifts, big gotchas, etc.
I ask because it feels like the Claude Code CLI always does a little bit better, subjectively for me, but I haven't seen an LMArena-style or clear A-vs-B comparison or measure.
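Not a benchmark by any stretch, but the kind of ablation harness being described could look something like the sketch below; the ablation axes, the run_agent stub, and the cost accounting are all hypothetical and only illustrate tracking cost-to-success alongside pass rate.

```python
import itertools
from dataclasses import dataclass

# Hypothetical ablation axes: edit-tool style and prompt style.
EDIT_TOOLS = ["diff", "full_file"]
PROMPT_STYLES = ["terse", "with_examples"]

@dataclass
class RunResult:
    solved: bool
    cost_usd: float  # token spend for the attempt

def run_agent(task: str, edit_tool: str, prompt_style: str) -> RunResult:
    """Placeholder: wire this up to your agent and a small task suite."""
    raise NotImplementedError

def evaluate(tasks: list[str]) -> None:
    for edit_tool, prompt_style in itertools.product(EDIT_TOOLS, PROMPT_STYLES):
        results = [run_agent(t, edit_tool, prompt_style) for t in tasks]
        solved = [r for r in results if r.solved]
        success_rate = len(solved) / len(results)
        # Efficiency: dollars per solved task, not just pass rate.
        cost_per_solve = sum(r.cost_usd for r in results) / max(len(solved), 1)
        print(f"{edit_tool=} {prompt_style=} {success_rate=:.0%} {cost_per_solve=:.2f}")
```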
I've been using it a lot lately and anything beyond basic usage is an absolute chore.
I agree on the first part, but Pydantic is not important to Python. It's important to a lot of people, but it's absolutely unnecessary.
I don't know why people like this pattern of "fat" classes with built-in de/serialisation/persistence logic at all. It makes so much more sense to have that at the edge and build entities and/or value objects directly. Using stuff like Pydantic or Django ORM, you either end up completely coupling domain logic to serialisation logic, or you end up manually writing data mappers to/from your domain when you could have just used, for example, cattrs or SQLAlchemy. I guess it's the "easy vs simple" thing.
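To make the "keep conversion at the edge" idea concrete, here is a minimal sketch using attrs + cattrs as suggested above; the Order entity and its fields are made up for illustration.

```python
from datetime import datetime

import attrs
import cattrs

# Plain domain object: no serialisation or persistence concerns baked in.
@attrs.define
class Order:
    id: int
    customer_email: str
    created_at: datetime

    def is_recent(self, now: datetime) -> bool:
        return (now - self.created_at).days < 30

# Conversion lives at the edge, in one place.
converter = cattrs.Converter()
converter.register_structure_hook(datetime, lambda v, _: datetime.fromisoformat(v))
converter.register_unstructure_hook(datetime, lambda dt: dt.isoformat())

payload = {"id": 1, "customer_email": "a@example.com", "created_at": "2025-08-28T14:34:00"}
order = converter.structure(payload, Order)  # dict -> domain object
wire = converter.unstructure(order)          # domain object -> dict
```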
In my experience backends never stay simple enough for SQLModel, so you might as well decouple from the get-go. If it's literally just copy/paste between SQLAlchemy models and Pydantic, just do the copy/pasting. Get an LLM to do it if you have to. It will be worth it in the long run: you'll want to change your db schema without breaking your API.
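A minimal sketch of that decoupling, assuming SQLAlchemy 2.0-style declarative models and Pydantic v2; the User table and UserOut schema are illustrative only.

```python
from pydantic import BaseModel, ConfigDict
from sqlalchemy.orm import DeclarativeBase, Mapped, mapped_column

class Base(DeclarativeBase):
    pass

# Persistence model: free to evolve with the database schema.
class User(Base):
    __tablename__ = "users"
    id: Mapped[int] = mapped_column(primary_key=True)
    email: Mapped[str]
    hashed_password: Mapped[str]  # never exposed over the API

# API schema: free to evolve with the public contract.
class UserOut(BaseModel):
    model_config = ConfigDict(from_attributes=True)
    id: int
    email: str

def to_api(user: User) -> UserOut:
    # The "copy/paste" boundary: one explicit mapping step.
    return UserOut.model_validate(user)
```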
Pydantic still sees multiple commits per week, which is less than it was at one point, but I'd say that's a sign of its maturity and stability more than a lack of attention.
If the issue is still showing on the latest version, seeing the Pydantic model/schema would be very helpful.
Thanks for being a proactive kind of maintainer. The world is better because of people like you.
Nevertheless, I would strongly recommend not using the AI providers' libraries directly: you quickly get locked in, and this is an extremely fast-paced market where today's king can change weekly.
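One way to keep that flexibility is to go through an abstraction layer like Pydantic AI, where the provider is just a model string. A rough sketch, assuming current Pydantic AI conventions (the model identifiers are examples, and recent versions expose the result as result.output):

```python
import os

from pydantic_ai import Agent

# Swap providers by changing a string (or an env var), not your code.
model = os.environ.get("AGENT_MODEL", "openai:gpt-4o")  # e.g. "anthropic:claude-sonnet-4-0"

agent = Agent(model, system_prompt="You are a concise coding assistant.")

result = agent.run_sync("Write a one-line Python function that reverses a string.")
print(result.output)
```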
We do have a proprietary observability and evals product Pydantic Logfire (https://pydantic.dev/logfire), but Pydantic AI works with other observability tools as well, and Logfire works with other agent frameworks.
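For context, wiring Pydantic AI traces into Logfire is meant to be a couple of lines. The sketch below assumes the configure() call and the Agent instrument flag described in the current docs, so verify against your installed versions.

```python
import logfire
from pydantic_ai import Agent

logfire.configure()  # assumes a Logfire project/token is set up in the environment

# Instrument this agent so runs, model requests and tool calls are traced.
agent = Agent("openai:gpt-4o", instrument=True)

result = agent.run_sync("Summarise what structured output means in one sentence.")
print(result.output)
```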
I strongly believe you guys should be compensated very well for what you bring to the ecosystem but the probability of open source projects being enshittified by private interests is non-trivially high.
And taking this one step further: it's not that investors are evil people who want to do bad things, but it is their explicit job to make returns on their investment. It's the basic mechanism of the adage "show me an incentive and I'll show you the outcome".
The vast majority of bugs we encounter are not in Pydantic AI itself but rather in having to deal with supposedly OpenAI Chat Completions-compatible APIs that aren't really compatible, and with local models run through e.g. Ollama or vLLM that tend not to be the best at tool calling.
The big three model providers (OpenAI, Claude, Gemini) and enterprise platforms (Bedrock, Vertex, Azure) see the vast majority of usage and our support for them is very stable. It remains a challenge to keep up with their pace of shipping new features and models, but thanks to our 200+ contributors we're usually not far behind the bleeding edge in terms of LLM API feature coverage, and as you may have seen we're very responsive to issues and PRs on GitHub, and questions on Slack.
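As a point of reference, pointing Pydantic AI at an OpenAI-compatible endpoint such as a local Ollama server looks roughly like the sketch below; the class names follow the current docs and the model name is just an example.

```python
from pydantic_ai import Agent
from pydantic_ai.models.openai import OpenAIModel
from pydantic_ai.providers.openai import OpenAIProvider

# Any "OpenAI Chat Completions-compatible" server: Ollama, vLLM, etc.
model = OpenAIModel(
    "llama3.2",  # example local model name
    provider=OpenAIProvider(base_url="http://localhost:11434/v1"),
)

agent = Agent(model)
result = agent.run_sync("Return the word 'ok' and nothing else.")
print(result.output)
```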
I haven't looked deeply into pydantic-ai's implementation, but this might be related to tool usage vs completion [0], the backend LLM model, etc. All I know is that with the same LLM models, `openai.client.chat.completions` + a custom prompt that passes in the Pydantic JSON schema + post-processing to instantiate `SomePydanticModel(**json)` creates objects successfully, whereas vanilla pydantic-ai rarely does, regardless of the number of retries.
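A minimal sketch of that manual flow, assuming the openai v1 client and Pydantic v2; the Report model and prompt are made up.

```python
import json

from openai import OpenAI
from pydantic import BaseModel

class Report(BaseModel):  # hypothetical target model
    title: str
    findings: list[str]

client = OpenAI()
schema = json.dumps(Report.model_json_schema())

resp = client.chat.completions.create(
    model="gpt-4o",
    response_format={"type": "json_object"},  # plain JSON mode; schema enforced only via the prompt
    messages=[
        {"role": "system", "content": f"Reply with JSON matching this schema:\n{schema}"},
        {"role": "user", "content": "Summarise the main risks of tight coupling."},
    ],
)

# Post-processing: validate and instantiate the model from the raw JSON text.
report = Report.model_validate_json(resp.choices[0].message.content)
```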
I went with what works in my code, but didn't remove the pydantic-ai dependency completely because I'm hoping something changes. I'd say that getting dynamic prompt context by leveraging JSON schemas, model-and-field docs from pydantic, plus maybe other results from runtime-inspection (like the actual source-code) is obviously a very good idea. Many people want something like "fuzzy compilers" with structured output, not magical oracles that might return anything.
Documentation is context, and even very fuzzy context is becoming a force multiplier. Similarly, languages/frameworks with good support for runtime inspection/reflection and an ecosystem with strong tools for things like ASTs really should be the best things to pair with AI and agents; a small sketch of gathering that kind of context follows below.
[0]: https://github.com/pydantic/pydantic-ai/issues/582
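Picking up the point about runtime inspection as prompt context, here is a small sketch; the describe_model helper and the Invoice model are hypothetical, built only from Pydantic's schema/field metadata and the standard library's inspect module.

```python
import inspect

from pydantic import BaseModel, Field

class Invoice(BaseModel):  # hypothetical model
    """An invoice issued to a customer."""
    number: str = Field(description="Human-readable invoice number")
    total_cents: int = Field(description="Grand total in cents")

def describe_model(model: type[BaseModel]) -> str:
    """Build prompt context from docs, field descriptions and source code."""
    lines = [inspect.getdoc(model) or "", str(model.model_json_schema())]
    for name, field in model.model_fields.items():
        lines.append(f"- {name}: {field.description}")
    lines.append(inspect.getsource(model))
    return "\n".join(lines)

context = describe_model(Invoice)  # feed this into the system prompt
```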
That's very odd, would you mind sharing the Pydantic model / schema so I can have a look? (I'm a maintainer) What you're doing with a custom prompt that includes the schema sounds like our Prompted output mode (https://ai.pydantic.dev/output/#prompted-output), but you should get better performance still with the Native or Tool output modes (https://ai.pydantic.dev/output/#native-output, https://ai.pydantic.dev/output/#tool-output) which leverage the APIs' native strict JSON schema enforcement.
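For illustration, the tool/native output modes mentioned above look roughly like this; the WeatherReport model is made up, and exact import paths may vary between versions, so check the linked docs.

```python
from pydantic import BaseModel
from pydantic_ai import Agent, NativeOutput

class WeatherReport(BaseModel):  # hypothetical; nested models work the same way
    city: str
    temperature_c: float

# Native output asks the provider's strict structured-output API to enforce the schema.
agent = Agent("openai:gpt-4o", output_type=NativeOutput(WeatherReport))

result = agent.run_sync("What's the weather like in Paris right now? Make up plausible numbers.")
print(result.output)  # a validated WeatherReport instance
```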
One thing I can say, though: my models differ from the docs examples mostly in that they are not "flat" with simple top-level data structures. They have lots of nested models as fields.
Thanks, this is a very interesting thread on multiple levels. It does seem related to my problem, and I also learned about field docstrings :) I'll try moving my dependency closer to the bleeding edge.
"AI" and vibe coding fits very well into that list.
Its agent model feels similar to OpenAI's: flexible and dynamic without needing to predefine a DAG. Execution is automatically traced and can be exported to Logfire, which makes observability pretty smooth too. Looking forward to their upcoming V1 release.
Shameless plug: I've been working on a DBOS [2] integration into Pydantic-AI as a lightweight durable agent solution.
[1] https://github.com/pydantic/pydantic-ai/pull/2638
[2] https://github.com/dbos-inc/dbos-transact-py