PyPI in 2025: A Year in Review
Key topics
The PyPI team's 2025 year in review has sparked a lively discussion about the long-deprecated `pip search` feature and the challenges of implementing a scalable search interface for the Python package repository. Commenters weighed in on the importance of CLI search functionality: some argued it's no longer crucial, while others pointed out that well-funded infrastructure could make it happen. The PyPI team clarified that search is a complex, unbounded context that doesn't lend itself to caching, making it harder to implement than the existing package-hosting interface [miketheman]. As one commenter noted, alternatives like querying the published package dump or using the web search interface are available, but the debate highlights the ongoing need for a seamless search experience.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 4h after posting
- Peak period: 35 comments in Day 1
- Avg / period: 9.8 comments
Based on 39 loaded comments
Key moments
- 01. Story posted: Dec 31, 2025 at 2:08 PM EST (11 days ago)
- 02. First comment: Dec 31, 2025 at 6:11 PM EST (4h after posting)
- 03. Peak activity: 35 comments in Day 1, the hottest window of the conversation
- 04. Latest activity: Jan 11, 2026 at 2:00 AM EST (22h ago)
Search is an unbounded context and does not lend itself to caching very well, as every search can contain anything
And anyway, hit rates are going to be pretty good. You're not taking arbitrary queries, the domain is pretty narrow. Half the queries are going to be for requests, pytorch, numpy, httpx, and the other usual suspects.
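To make the hit-rate intuition concrete: a minimal sketch, with a hypothetical stand-in for the real search backend, showing how a memoized front-end absorbs a query stream dominated by a few popular names.

```python
from functools import lru_cache

def _index_lookup(query: str) -> tuple[str, ...]:
    # Hypothetical stand-in for the real (expensive) search backend.
    print(f"backend hit: {query!r}")
    return (query,)

@lru_cache(maxsize=10_000)
def _cached_lookup(normalized: str) -> tuple[str, ...]:
    return _index_lookup(normalized)

def search(query: str) -> tuple[str, ...]:
    # Normalize before caching so variants of a popular query share one key.
    return _cached_lookup(query.strip().lower())

# A Zipf-like stream: a few popular names dominate, so most calls are hits.
for q in ["requests", "numpy", "Requests", "pytorch", "numpy ", "requests"]:
    search(q)
print(_cached_lookup.cache_info())  # 3 backend misses, 3 cache hits
```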
2. apt repositories are cryptographically signed, centrally controlled, and legally accountable.
3. apt search is understood to be approximate, distro-scoped, and slow-moving. Results change slowly and rarely break scripts. PyPI search rankings change frequently by necessity.
4. Turning PyPI search into an apt-like experience would require distributing a signed, periodically refreshed global metadata corpus to every client (a rough sketch of that model follows this comment). At PyPI’s scale, that is nontrivial in bandwidth, storage, and governance terms.
5. apt search works because the repository is curated, finite, and opinionated.
(Which isn’t to say I disagree with you about scale not being the main issue, just to offer some nuance. Another piece of nuance is the fact that distributions are the source of metadata but users think in terms of projects/releases.)
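To make the "ship the corpus to the client" model concrete, here is a minimal client-side sketch using PyPI's real PEP 691 JSON form of the simple index; a true apt-like design would also need signing, deltas, and richer metadata, all of which this omits:

```python
import json
import urllib.request

# One-time fetch of the full project-name index (the PEP 691 JSON form of
# /simple/). This is the corpus-distribution problem in miniature: the
# response names every project on PyPI, so it is large and goes stale.
req = urllib.request.Request(
    "https://pypi.org/simple/",
    headers={"Accept": "application/vnd.pypi.simple.v1+json"},
)
with urllib.request.urlopen(req) as resp:
    names = [p["name"] for p in json.load(resp)["projects"]]

# "apt search"-style matching, entirely client-side against the local copy.
print([n for n in names if "httpx" in n][:10])
```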
Why would you build a dedicated tool for this instead of just using a search engine? If I'm looking for a specific keyword in some project's very long README I'm searching kagi, not npm.
I'd expect that the most you should be indexing is the data in the project metadata (setup.py). That could be unbounded, but I can't think of a compelling reason not to truncate it at some reasonable length.
(Note PyPI can’t index metadata from a `setup.py` however, since that would involve running arbitrary code. PyPI needs to be given structured metadata, and not all distributions provide that.)
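The structured metadata PyPI does hold is exposed through its per-project JSON API; here is a minimal sketch of pulling the fields a metadata search might index (the endpoint is real, the field selection is illustrative):

```python
import json
import urllib.request

# PyPI's per-project JSON API returns the structured metadata it was given
# at upload time; no setup.py ever runs on PyPI's side.
with urllib.request.urlopen("https://pypi.org/pypi/httpx/json") as resp:
    info = json.load(resp)["info"]

# The sort of fields a metadata search might index (field choice is ours).
print(info["name"], info["version"])
print(info["summary"])
print(info.get("keywords"))
```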
Even including those, it's what? Sub-20-30GB.
The moment you expose that same service to a ubiquitous CLI like pip, the workload changes qualitatively.
PyPI has the /simple endpoint that the CDN can handle.
It’s PyPI’s philosophy that search happens on the website, and pip has aligned with that. Understandably, pip doesn’t want to ship a web scraper, so the search function remains disabled.
For simple use cases, you have the web search, and you can curl it.
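Concretely, the web search is an ordinary GET and can be scripted, with the caveat that the response is browser-oriented HTML whose markup (the class name assumed below) can change without notice, which is exactly why pip avoids scraping it. A minimal sketch:

```python
import urllib.request

# The web search is a plain GET; "curling it" works, but the body is HTML
# meant for browsers. The class name below matches PyPI's current markup
# and could change at any time, which is why pip won't scrape it.
req = urllib.request.Request(
    "https://pypi.org/search/?q=httpx",
    headers={"User-Agent": "example-script/0.1"},
)
with urllib.request.urlopen(req) as resp:
    html = resp.read().decode("utf-8")
print(html.count("package-snippet__name"))  # rough count of result rows
```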
(I think the biggest blocker on CLI search isn’t infrastructure, but that there’s no clear agreement on the value of CLI search without a clear scope of what that search would do. Just listing matches over the package names would be less useful than structured metadata search for example, but the latter makes a lot of assumptions about the availability of structured metadata!)
However, I get a lot of mileage out of package repository search with package managers like pacman, apt, brew, winget, chocolatey and npm.
> I think the biggest blocker on CLI search isn’t infrastructure
It's why it was shut down: the API was getting hammered, and it cost too much to run at a reasonable speed and implement rate limiting or whatever.
Sort of: the original search API used a POST and was structured with XML-RPC. PyPI’s operators went to great efforts to scale it, but that wasn’t a great starting point. A search API designed around caching (like the one used on PyPI’s web UI) wouldn’t have those problems.
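Roughly, the contrast looks like this; the XML-RPC call is a reconstruction of the long-removed API, shown only for shape, and fails today:

```python
import xmlrpc.client

# The old pip-search backend was XML-RPC: every query was an HTTP POST
# carrying an XML body, which CDNs treat as uncacheable. (Long removed;
# the call is commented out because it no longer works.)
legacy = xmlrpc.client.ServerProxy("https://pypi.org/pypi")
# legacy.search({"name": "httpx"})  # reconstruction of the retired call

# A caching-friendly design keys the whole query into a GET URL, so
# identical searches collapse into one object at the CDN edge:
cacheable_url = "https://pypi.org/search/?q=httpx"
print(cacheable_url)
```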
Side issue: anyone else seeing that none of the links in the article work? They're all 404s.
Happy New Year!
Edit: my bad, it seems you meant the opposite. Absolutely a fantasy, but a man can certainly dream lol
Why do people come up with such unbelievably complex solutions that don’t actually achieve what a simple solution could do?
Trusted Publishing approximately involves a service like GitHub proving to somebody that some release artifact came from a GitHub Actions workflow file with a particular name, possibly in a particular commit. Never mind that GitHub Actions is an unbelievable security nightmare and that it’s probably not particularly hard for a malicious holder of GitHub credentials to stealthily or even completely silently compromise their own Actions workflow to produce malicious output.
But even ignoring that, it’s wildly unclear what is “trusted”. PyPI encourages developers to also use “attestations”. Read this and try to tell me what is being attested to:
https://docs.pypi.org/attestations/producing-attestations/
But I did learn that this is based on Sigstore. Sigstore is very impressive: it’s a system by which GitHub can attest via OIDC to various state, and a service called Fulcio (which we’re supposed to trust) uses its secret key to sign a message stating that GitHub produced some data and proved its identity via OIDC at a certain time. There’s even a transparency log. Except that, for some reason, Fulcio doesn’t do that at all. Instead it issues an X.509 certificate with an expiration in the near future, and the Sigstore client (which is hopefully a bit trustworthy) is supposed to use the private key (which it knows, in the clear, but is supposed to immediately forget) to sign a message. And then a separate transparency log records the signature and supposedly timestamps it so everyone can verify the attestation later even though the certificate is expired! Why not just sign the message on the Fulcio server (which has an HSM, hopefully) directly?
All of this is trying to cryptographically tie a package on PyPI.org to a git tag. But: why not just do it directly? For most pure Python packages, which is a whole lot of packages, the distribution artifact is literally a zip file containing files from git, verbatim, plus some metadata. PyPI could check the GitHub immutable tag, read the commit hash, and verify the whole chain of hashes from the files to the tree to the commit. Or PyPI could even run the build process itself in a sandbox. (If people care about .pyc files, PyPI could regenerate them (again, in a sandbox), but omitting them might make sense too — after all, uv doesn’t even build them by default.) This would give much stronger security properties with a much more comprehensible system and no dependence on the rather awful security properties of GitHub Actions.
Why not Just(TM) enforce a reproducible build process? That brings some of its own challenges, but would represent a real upgrade over building out some Swiss cheese like this.
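The "verify the whole chain of hashes" idea above is mechanical. As a minimal sketch, here is the first link of that chain, hashing file bytes the way git does, so that a single commit hash transitively pins the exact contents of every file:

```python
import hashlib

def git_blob_sha1(data: bytes) -> str:
    # Git hashes a blob as sha1(b"blob <size>\0" + contents). Trees hash
    # blob IDs, and the commit hashes the tree, so one commit hash
    # transitively pins the exact bytes of every file in the release.
    return hashlib.sha1(b"blob %d\x00" % len(data) + data).hexdigest()

# Matches `git hash-object` over the same bytes:
print(git_blob_sha1(b"print('hello')\n"))
```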
> PyPI could check the GitHub immutable tag, read the commit hash, and verify the whole chain of hashes from the files to the tree to the commit.
Tags are neither immutable nor durable on GitHub. This also breaks in the presence of any non-trivial build backend, including anything that produces a non-pure-Python wheel. Complexity is often bad, but just about every complex aspect of PyPI's attestation scheme has a reason behind it.
[1]: https://docs.pypi.org/attestations/publish/v1/
> More than 130,000 new projects created
So what exactly is going to prevent PyPI from becoming a morass of supply chain attacks like NPM etc.? The cited security measures seem like they won't, but it also seems like a very hard problem to solve.
That's something like triple the amount from 2023, yes?