Python Workers Redux: Fast Cold Starts, Packages, and a Uv-First Workflow
Key topics
Cloudflare's Python Workers announcement has sparked a lively discussion, with commenters scrutinizing the comparison to AWS Lambda and pointing out the omission of AWS's "SnapStart for Python" feature, which significantly reduces cold start times. The author clarified that the omission was unintentional and has since updated the blog post. Meanwhile, some commenters question the practical applications of Python Workers beyond simple scripts, while others weigh the trade-offs between Wasm and containers for deployment. As the discussion unfolds, a more nuanced picture of the benefits and limitations of Python Workers is emerging.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 57m after posting
- Peak period: 38 comments in 60-72h
- Avg / period: 11.3
Based on 79 loaded comments
Key moments
- Story posted: Dec 8, 2025 at 9:42 AM EST (25 days ago)
- First comment: Dec 8, 2025 at 10:39 AM EST (57m after posting)
- Peak activity: 38 comments in 60-72h (the hottest window of the conversation)
- Latest activity: Dec 13, 2025 at 3:07 AM EST (20 days ago)
https://pyodide.org/en/stable/project/changelog.html#version...
Bummer, looks like a lot of useful geo/data tools got removed from the Pyodide distribution recently. Being able to use some of these tools in a Worker in combination with R2 would unlock some powerful server-side workflows. I hope they can get added back. I'd love to adopt CF more widely for some of my projects, and it seems like support for some of this stuff would make adoption by startups easier.
That said, it's worth noting that Lambda's SnapStart costs money and needs to be enabled explicitly. Python Workers use snapshots by default and we don't charge extra for it.
> AWS Lambda (No SnapStart): Mean Cold Start 2.513s, Data Points 1008
> AWS Lambda (SnapStart): Mean Cold Start 0.855s, Data Points 17
> Google Cloud Run: Mean Cold Start 3.030s, Data Points 394
> Cloudflare Workers: Mean Cold Start 1.004s, Data Points 981
https://cold.edgeworker.net
In practice, Workers + Pyodide is forcing a much sharper line between init-time and request-time state than most Python codebases have today. If you lean into that model, you get very cheap isolates and global deploys with fast cold starts. If your app depends on the broader CPython/C-extension ecosystem behaving like a mutable Unix process, you are still in container land for now. My hunch is the long-term story here will be less about the benchmark numbers and more about how much of “normal” Python can be nudged into these snapshot-friendly constraints.
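A minimal sketch of that split, using the `on_fetch` entry point from Cloudflare's Python Workers docs (the rate-table setup is made up for illustration): anything at module scope runs once and ends up in the snapshot, while the handler only touches per-request state.

```python
from workers import Response
import json

# Init-time: imports and static setup run once, at isolate/snapshot creation.
RATE_TABLE = {"basic": 1.0, "pro": 2.5}   # illustrative config baked in at deploy time

async def on_fetch(request, env):
    # Request-time: keep state local to the handler; don't mutate module globals.
    return Response(json.dumps({"plans": sorted(RATE_TABLE)}))
```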
I have a warm pool of lightweight containers that can be reused between runs, and that's the crucial detail that makes or breaks it. The good news is that you can lock it down with seccomp while still allowing normal execution. This gives you 10-30ms starts with pre-compiled Python packages inside the container; a cold start is as fast as spinning up a new container, around 200ms. If you run this setup close to your data, you get fast access to your files, which is huge for data-related tasks.
But this is not suitable for the type of deployment Cloudflare is doing. The question is whether you even want that global availability, because you trade it for performance. At the end of the day, they are trying to reuse their isolates infra, which is very smart and opens doors to other wasm-based deployments.
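A minimal sketch of that kind of warm pool (not the commenter's actual setup), assuming the `docker` Python SDK and a seccomp profile file on disk; the image, pool size, and file names are placeholders:

```python
import docker  # pip install docker

client = docker.from_env()

# The Docker engine takes the seccomp profile contents inline, not a file path.
with open("seccomp-profile.json") as f:
    seccomp_profile = f.read()

# Pre-start a few locked-down containers; reusing them avoids the ~200ms cold spin-up.
warm_pool = [
    client.containers.run(
        "python:3.12-slim",
        command="sleep infinity",                    # parked until work arrives
        security_opt=[f"seccomp={seccomp_profile}"],
        network_disabled=True,                       # tighten the sandbox further
        detach=True,
    )
    for _ in range(4)
]

# Hand a job to a warm container; exec in an already-running container is fast.
result = warm_pool[0].exec_run(["python", "-c", "print('hello from the pool')"])
print(result.output.decode())
```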
This hampers the per-user database workflow. Would be awesome if a fix lands.
I believe this is possible: you can create D1 databases[1] using Cloudflare's APIs and then deploy a worker using the API as well[2] (see the sketch after the links).
1 - https://developers.cloudflare.com/api/resources/d1/subresour...
2 - https://developers.cloudflare.com/api/resources/workers/subr...
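A minimal sketch of [1] (the account id, token, and naming scheme are placeholders; the worker upload in [2] is a separate multipart request not shown here):

```python
import os
import requests

ACCOUNT_ID = os.environ["CF_ACCOUNT_ID"]
API_TOKEN = os.environ["CF_API_TOKEN"]

def create_user_database(user_id: str) -> str:
    """Create a per-user D1 database and return its id for use in a binding."""
    resp = requests.post(
        f"https://api.cloudflare.com/client/v4/accounts/{ACCOUNT_ID}/d1/database",
        headers={"Authorization": f"Bearer {API_TOKEN}"},
        json={"name": f"user-{user_id}"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"]["uuid"]

print(create_user_database("42"))
```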
Extensions are easy to enable, file a bug on https://github.com/cloudflare/workerd . (Though this one might be trickier than most as we might have to do some build engineering.)
Re filing an issue - sounds straightforward, will do!
That's not possible without updating worker bindings as you showed, and further, there is an upper limit of 5,000 bindings per worker, so just 5,000 users becomes the ceiling, even though D1 easily allows 50,000 databases, with more available by requesting a limit increase.
What is a Durable Object? It's just a Worker that has a name, so you can route messages specifically to it from other Workers. Each one also has its own SQLite database attached. In fact, the SQLite database is local, so you can query it synchronously (no awaits), which makes a lot of stuff faster and easier. You can easily create millions of Durable Objects.
(I am the lead engineer for Workers.)
In contrast, AWS provides this as the baseline: you choose where your services run. In a world where you can't do anything without hundreds of compliance requirements, many of which mandate geolocation-based access control or data retention, this is absurd.
There is no paid business plan that supports this. You have to be a millions-of-dollars customer on their Enterprise plan to get it through your dedicated account manager.
Your app works distributed/globally from the get-go.
Additionally, every Enterprise feature will become available in time (discussed during their previous quarter's earnings call).
I can't say I've ever experienced this. Are you sure it's not related to other things in the script?
I wrote a single-file Python script; it's a few thousand lines long. It can process a 10,000-line CSV file and do a lot of calculations, to the point where I wrote an entire CLI income/expense tracker with it[0].
The end-to-end time of the command is 100ms to process those 10k lines, measured with `time`. That's on hardware from 2014 using Python 3.13, too.
[0]: https://github.com/nickjj/plutus
It's because of module imports, primarily and generally. It's worse with many small files than a few large ones. (Python 3 adds a little additional overhead because of the extra system calls and complexity in the import process needed to handle `__pycache__` folders.) A great way to demonstrate it is to ask pip to do something trivial (like `pip --version`, or `pip install` with no packages specified), or to compare the performance of pip installed in a venv to pip used cross-environment (with `--python`).
Either way, at least on my system with cached file attributes, Python can start up in 10ms, so it's not clear whether you truly need to optimize much more than that (by identifying the remaining bits to optimize), versus solving the problem another way (not statting 500 files, most of which don't exist, every time you start up).
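One way to identify those remaining bits: CPython's `-X importtime` flag prints a per-module import-cost report to stderr. A small sketch (the imports being measured are arbitrary stand-ins):

```python
import subprocess
import sys

# Run a child interpreter with -X importtime and capture its stderr report.
proc = subprocess.run(
    [sys.executable, "-X", "importtime", "-c", "import json, http.client"],
    capture_output=True,
    text=True,
)
report = [line for line in proc.stderr.splitlines() if line.startswith("import time:")]
print("\n".join(report[-10:]))  # final lines are the top-level (most expensive) imports
```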
`time pip3 --version` takes 230ms on my machine.
`time pip3 --version` takes ~200ms on my machine. `time go help` takes 25ms, and prints out 30x more lines than `pip3 --version`.
Probably a decent chunk of that actually is the Python runtime starting up. I don't know what all you `import` that isn't implied at startup, though.
Another chunk might be garbage collection at process exit.
This benchmark is a little bit outdated, but the Python problem remains the same.
Interpreter initialization: Python builds and initializes its entire virtual machine and built-in object structures at startup. Native programs already have their machine code ready and need very little runtime scaffolding.
Dynamic import system: Python’s module import machinery dynamically locates, loads, parses, compiles, and executes modules at runtime. A compiled binary has already linked its dependencies.
Heavy standard library usage: Many Python programs import large parts of the standard library or third-party packages at startup, each of which runs top-level initialization code.
This is especially noticeable if you do not run on an M1 Max Ultra but on slower hardware. From the results on a Raspberry Pi 3:
C: 2.19 ms
Go: 4.10 ms
Python3: 197.79 ms
That is about 200ms of startup latency for a print("Hello World!") in Python 3.
Anyway, your analysis of causes reads like something AI generated and pasted in. It's awkward in the context of the rest of your post, and 2 of the 3 points are clearly irrelevant to a "hello world" benchmark.
A Go program with an equivalent hello-world main takes < 10ms (different hardware, as I'm at home). I wrote another that counts the lines in a file and tested it against https://www.gutenberg.org/cache/epub/2600/pg2600.txt. These are toy programs, but IME these gaps stay as your programs get bigger.
I believe in the past people have looked at putting the standard library in a zip file instead of splatting it out into a bunch of files in a dirtree. In that case, I think Python would just do a few stats, find the zipfile, load the whole thing into RAM, and then index into the file.
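For what it's worth, a tiny sketch of the zip idea: Python's zipimport can serve imports from a single archive on sys.path, trading many per-module stat() calls for one index lookup (the archive and module here are made up):

```python
import sys
import zipfile

# Build a toy archive containing one pure-Python module.
with zipfile.ZipFile("bundle.zip", "w") as zf:
    zf.writestr("greeting.py", "MESSAGE = 'hello from inside a zip'\n")

sys.path.insert(0, "bundle.zip")
import greeting  # resolved via zipimport, no per-file directory scanning

print(greeting.MESSAGE)
```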
"If python was implemented totally different it might be fast" - sure, but it's not!
It's tooling agnostic and there are a couple of ways to generate them, but the easiest is to just use pants build.
Pants also does dependency traversal (that's the main reason we started using it, deploying a microservices monorepo) so it only packages the necessary modules.
I haven't profiled it yet for cold starts, maybe I'll test that real quick.
https://www.pantsbuild.org/dev/docs/python/overview/pex
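A rough sketch of what such a Pants target might look like (names and layout are assumptions, not the commenter's actual config); `pants package ::` would then emit a self-contained .pex containing only the modules the entry point depends on:

```python
# BUILD
python_sources(name="lib")

pex_binary(
    name="app",
    entry_point="main.py",
    dependencies=[":lib"],
)
```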
You can already lazy import in python, but the new system makes the syntax sweeter and avoids having to have in-function `import module` calls, which some linters complain about.
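For reference, the in-function workaround mentioned above looks like this today; the heavy dependency (pandas is just a stand-in) is only paid for on the code path that needs it:

```python
def export_report(rows: list[dict]) -> str:
    import pandas as pd  # deferred import; some linters flag this placement
    return pd.DataFrame(rows).to_csv(index=False)

if __name__ == "__main__":
    print(export_report([{"item": "coffee", "cost": 3.50}]))
```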
Here is my quick benchmark. I refrain from using Python for most scripting/prototyping tasks but really like Janet [0]; here is a comparison for printing the current time in Unix epoch:
[0]: https://janet-lang.org/
(Side note: this is why jj is awesome. A `jj log` is almost as fast as `ls`.)
For instance `uv run` has its own fair share of overhead.
Regarding cold starts, I strongly believe V8 snapshots are perhaps not the best way to achieve fast cold starts with Python (they may be if you are tied to using V8, though!), and they will have wide side effects if you go outside the standard packages included in the Pyodide bundle.
To put some perspective: V8 snapshots store the whole state of an application (including its compiled modules). This means that for a Python package that is using Python (one wasm module) + Pydantic-core (one wasm module) + FastAPI... all of those will be included in one snapshot (as well as the application state). This makes sense for browsers, where you want to be able to inspect/recover everything at once.
The issue with this design is that the compiled artifacts and the application state are bundled into a single artifact (this is not great for an AOT design, though it might be the optimal design for JITs).
Ideally, you would separate each of the compiled modules from the state of the application. Doing so has some advantages: you can deserialize the compiled modules in parallel, and untie the deserialization from recovering the state of the application. This design doesn't adapt that well to the V8 architecture (and how it compiles things) when JavaScript is the main driver of execution; however, it's ideal when you just use WebAssembly.
This is what we have done at Wasmer, which allows for cold starts much faster than 1 second. Because we cache each of the compiled modules separately and recover the state of the application later, we can achieve cold starts that are an order of magnitude faster than Cloudflare's state of the art (when using pydantic, fastapi and httpx).
If anyone is curious, here is a blogpost where we presented fast-cold starts for the application state (note that the deserialization technique for Wasm modules is applied automatically in Wasmer, and we don't showcase it on the blogpost): https://wasmer.io/posts/announcing-instaboot-instant-cold-st...
Use lazy/dynamic imports and you will see it drop.
Real question: what more would you do with the time saved? Are you really in that much of a hurry in your life?
On my linux system where all the file attributes are cached, it takes about 12ms to completely start, run a pass statement, and exit.
A modern machine shouldn’t take this long, so likely something big is being imported unnecessarily at startup.
I used both for years. Nothing beats VPS/bare metal. Alright, they give lower latency, and maybe they're cheaper, but they're a big nightmare to manage at the same time. Hello, microservices architecture.
Currently the pagespeed.web.dev score drops by around 20 points compared to the self-hosted version. One of the best features of Next.js, image optimization, doesn't have out-of-the-box support. You need a separate image optimization service, which also did not work for me for local images (images in the bundle).
I wonder if they plan to invest seriously into this?
ALSO the benchmarks show about a one second cold start when importing httpx, fastapi and pydantic... that's faster than Lambda and Cloud Run, thanks to memory snapshots and isolate-based infra.
BUT the default global deployment model raises questions about compliance when you need specific regions... and I'd love to know how well packages with native extensions are supported.