The Future of Python Web Services Looks Gil-Free
Posted 3 months ago · Active 2 months ago
blog.baro.dev · Tech story · High profile
Sentiment: excited, positive
Debate: 60/100
Key topics: Python, GIL, Free-Threading, Web Services
The article discusses the potential of Python 3.14's free-threading support to improve web service performance; the community is excitedly weighing the implications and potential challenges.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 6d after posting
Peak period: 82 comments (Day 7)
Avg per period: 19.4
Comment distribution: 97 data points (based on 97 loaded comments)
Key moments
1. Story posted: Oct 19, 2025 at 6:38 AM EDT (3 months ago)
2. First comment: Oct 25, 2025 at 8:46 AM EDT (6d after posting)
3. Peak activity: 82 comments in Day 7, the hottest window of the conversation
4. Latest activity: Oct 29, 2025 at 8:07 AM EDT (2 months ago)
ID: 45633311Type: storyLast synced: 11/20/2025, 4:47:35 PM
but before 3.12 the isolation was not great (and there are still basic process-level things that cannot be isolated per interpreter)
https://docs.python.org/3/library/concurrent.interpreters.ht...
Please do not complain about the global object. Using a pure function would obviously be a useless benchmark for locking and real world Python code bases have far more intricate access patterns.
Just because there's a lot of shit Python code out there, doesn't mean people who want to write clean, performant Python code should suffer for it.
How to write Python code with no globals
And a source other than you saying you should do it this way
Even though technically, everything in Python is an object, I feel strongly that programmers should avoid OOP in Python like the plague. Every object is a petri dish for state corruption.
There is a very solid list of reasons to use pure functions with explicit passing wherever humanly possible, and I personally believe there is no comparable list of reasons to use OOP.

* Stack-allocated primitives need no refcounting
* Immutable structures reduce synchronization
* Data locality improves when you pass arrays/structs rather than object graphs
* Pure functions can be parallelized without locks
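The style being advocated might look like this (a hypothetical example, not taken from the thread): an immutable record plus a pure function that receives everything it needs explicitly.

```python
from dataclasses import dataclass

# Immutable record: frozen dataclasses cannot be mutated after construction,
# which removes a whole class of shared-state bugs in threaded code.
@dataclass(frozen=True)
class Point:
    x: float
    y: float

# Pure function: all inputs are passed explicitly and nothing global is
# touched, so it can run on many threads concurrently without locking.
def translate(p: Point, dx: float, dy: float) -> Point:
    return Point(p.x + dx, p.y + dy)

p = Point(1.0, 2.0)
q = translate(p, dx=3.0, dy=4.0)
print(q)  # Point(x=4.0, y=6.0)
```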
But OOP does not necessitate mutable state; you can do OOP with immutable objects and pure methods (except the constructor). Objects are collections of partially applied functions (which is explicit in Python) that also conceal internal details within their own namespace. This is convenient in certain cases.
But very rarely.
If you want immutability as a lynchpin, you need to look at a different language. This one is very much not designed for it.
I haven’t seen or used a global more than once in my 20 years of writing Python.
Web services in Python that want to handle multiple concurrent requests in the same interpreter should use a web framework designed around that expectation, one that does not use a global request context object, such as FastAPI.
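The isolation such frameworks rely on can be sketched with the stdlib alone; a `contextvars.ContextVar` gives each thread (or async task) its own view of the current request instead of one shared mutable global (names here are hypothetical):

```python
import contextvars
import threading

# Hypothetical per-request state: ContextVar.set() in one thread is not
# visible to other threads, so there is no shared "current request" global.
current_request: contextvars.ContextVar[dict] = contextvars.ContextVar("current_request")

results = {}

def handle(request_id: int) -> None:
    current_request.set({"id": request_id})
    # Deeper code can read the current request without it being threaded
    # through every call, yet each handler only ever sees its own request.
    results[request_id] = current_request.get()["id"]

threads = [threading.Thread(target=handle, args=(i,)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # each handler saw only its own request id
```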
3.9: 2.78
3.14: 3.86
3.14t: 3.91
This is a silly benchmark though. Look at pyperformance if you want something that might represent real script/application performance. Generally 3.14t is about 0.9x the performance of the default build. That depends on a lot of things though.
This benchmark demonstrates that global variables are not needed to find severe regressions.
If you don't want to use global variables just add the result of f to x and stop using the global variable, i.e.
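A sketch of what that rewrite might look like (names hypothetical, assuming `f` stands in for the benchmarked function): the accumulator becomes a local carried through the loop, so nothing module-level is mutated.

```python
# Global-based version being criticized (for comparison):
#   x = 0
#   def bench(n):
#       global x
#       for i in range(n):
#           x += f(i)

def f(i: int) -> int:
    return i * i  # stand-in for the benchmarked work

# Local accumulator: the loop owns its own state, so the benchmark no
# longer measures contention on a module-level name.
def bench(n: int) -> int:
    x = 0
    for i in range(n):
        x += f(i)
    return x

print(bench(5))  # 0 + 1 + 4 + 9 + 16 = 30
```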
Variables always start with a lowercase letter in idiomatic Python unless they're constants or types.
Using single-letter uppercase for variables is not unusual in ML Python code, but that also happens to be one of the worst ecosystems when it comes to idiomatic Python and general code quality.
Because literally every import, class definition, or function definition that you make at top-level is a global.
Now some people do in fact do all those things inside a function, too, and then call that function as the only thing that actually happens globally. I've done such hacks myself to squeeze the last few percent of performance out of CPython on the very rare occasions where that's needed but dropping into C is not an option. That's certainly not idiomatic Python, though.
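The hack described above can be sketched like this (module and function names hypothetical): work done inside a function uses fast local-name lookups instead of the slower global lookups CPython performs at module level.

```python
# Imports and definitions remain module-level globals, but the actual work
# happens inside main(), where name lookups compile to LOAD_FAST instead of
# the slower LOAD_GLOBAL.
import math

def main() -> float:
    # Binding a global to a local once before a hot loop is the classic
    # micro-optimization this trick enables.
    sqrt = math.sqrt
    total = 0.0
    for i in range(1000):
        total += sqrt(i)
    return total

if __name__ == "__main__":
    print(main())
```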
I have not been seeing good Python code, for sure. Hopefully it's not a majority. But it's very far from non-existent.
But isn't that the point? Previously, pure functions would all contend on the GIL, so you could only run one at a time. Now they don't.
The question is whether 1+ thread per core with GIL free Python perform as well as 1+ process per core with GIL.
My understanding is that this global is just a way to demonstrate that the fine-grained locking in the GIL-free version may mean that preforking servers are still more performant.
Have people had any/good experiences running Granian in prod?
https://hugovk.github.io/free-threaded-wheels/
It's nice that someone else recognizes that event loop per thread is the way. I swear if you said this online any time in the past few years people looked at you like you insulted their mother. It's so much easier to manage even before the performance improvements.
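The event-loop-per-thread pattern can be sketched with the stdlib alone (worker and job names hypothetical): each thread creates and owns its own loop, so loops never contend with one another.

```python
import asyncio
import threading

async def job(name: str) -> str:
    await asyncio.sleep(0.01)  # stand-in for real async I/O
    return f"done-{name}"

# One event loop per thread: the worker creates, drives, and closes its own
# loop, so a stalled loop in one thread cannot block the others.
def worker(name: str, out: dict) -> None:
    loop = asyncio.new_event_loop()
    try:
        out[name] = loop.run_until_complete(job(name))
    finally:
        loop.close()

results: dict = {}
threads = [threading.Thread(target=worker, args=(f"t{i}", results)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)
```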
If your thread gets stuck, the only recourse you will have is to kill the entire parent process.
Also, sharing memory between processes is very very very slow compared to sharing memory between threads.
Any time you have an arbitrary independent task that can be started and then stopped by the user, you will need processes in Python.
Java is a lot better at concurrency and has a vast library for concurrency primitives like atomic operations (CAS etc.)
Python's strengths lies in its strong ecosystem and ease of use. Many times that overshadows the benefits of Java's competent concurrency features.
Do you mean that there is no valid use-case where you wish to interrupt a blocking thread due to I/O, for example?
Or maybe you think Python can do that. Maybe I'm wrong, but as far as I can tell Python is not good at doing that.
Perhaps my choice of word "kill" confused you, and in that case I should've picked better wording reflecting what I meant. Perhaps you thought I meant people should kill threads/processes instead of fixing buggy CPU-intensive tasks? That's certainly not what I meant.
You don't seem to be very charitable in how you interpret what I'm saying. Instead of a very vague comment, it would've been better if you'd have explained your concern, and given me a chance to understand you.
This is certainly a common case, but killing a thread with a blocking native call on it is a very poor way to do so (not the least because you don't know what locks it might be holding at the moment it gets killed - imagine what happens if that's one of the locks used by low-level heap, for example). The proper way to address it is to use asynchronous I/O APIs that allow for cooperative cancellation. Unfortunately Linux doesn't exactly have a good track record in that department, which is why people do these kinds of hacks. On Windows you get stuff like https://learn.microsoft.com/en-us/windows/win32/fileio/cance....
>The proper way to address it is to use asynchronous I/O APIs that allow for cooperative cancellation.
I completely agree here. Async is the go-to for IO-bound operations and the ability to cancel (sends an exception to the task) is a very useful feature.
Killing threads, as in non-gracefully stopping them is a bad idea regardless of language, and not something I would encourage nor something I do myself.
In Python, if there is a CPU-bound, long-running routine that needs to be executed concurrently, this can be done using multiprocessing. I'd say, if external resource usage is well defined, a good way to stop such a task is to send SIGTERM, wait, then send SIGKILL if it hasn't stopped after a grace period. If needed, perform cleanup afterwards.
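A minimal sketch of that stop sequence (helper names hypothetical): `Process.terminate()` sends SIGTERM on POSIX, and `Process.kill()` escalates to SIGKILL (signal 9) after the grace period.

```python
import multiprocessing as mp
import time

# A CPU-bound task isolated in its own process, so it can be stopped
# without touching the parent interpreter.
def busy() -> None:
    while True:
        pass  # stand-in for a long-running computation

def stop_with_grace(p: mp.Process, grace: float = 1.0) -> None:
    p.terminate()          # SIGTERM on POSIX
    p.join(timeout=grace)  # grace period
    if p.is_alive():
        p.kill()           # escalate to SIGKILL (signal 9)
        p.join()

if __name__ == "__main__":
    p = mp.Process(target=busy)
    p.start()
    time.sleep(0.1)
    stop_with_grace(p)
    print("alive after stop:", p.is_alive())
```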
The problem with python and threads in my experience is that even a graceful interruption of individual threads can be tedious to get right.
Thanks for the link btw.
https://github.com/python/cpython/pull/119438/files#diff-efe...
This is how much of the standard library has been audited:
https://github.com/python/cpython/issues/116738
The json changes above are in Python 3.15, not the just released 3.14.
The consequences of the C changes not being made are crashes and corruption if unexpected mutation or object freeing happens. Web services are exposed to adversarial input, so be *very* careful.
It would be a big help if CPython released a tool that could at least scan a C code base to detect free threaded issues, and ideally verify it is correct.
Create or extend a list of answers to:
What heuristics predict that code will fail in CPython's nogil "free threaded" mode?
https://docs.python.org/3/howto/free-threading-extensions.ht...
And a dedicated web site:
https://py-free-threading.github.io/
But as an example neither include PySequence_Fast which is in the json.c changes I pointed to. The folks doing the auditing of stdlib do have an idea of what they are looking for, and so would be best suited to keep a list (and tool) up to date with what is needed.
The GIL's guarantees didn't extend to this.
Or, on further reading, maybe it applies to anything that implements `__iadd__` in C. Which does not appear to include native longs: https://github.com/python/cpython/blob/main/Objects/longobje...
The free threaded implementation adds what amounts to individual object locks at the C level (critical sections). This still means developers writing Python code can do whatever they want, and they will not experience corruption or crashes. The base objects have all been updated.
Python is popular because of many extensions written in C, including many in the standard library. Every single piece of that code must be updated to operate correctly in free threaded mode. That is a lot of work and is still in progress in the standard library. But in order to make the free threaded interpreter useful at this point, some have been marked as free thread safe, when that is not the case.
PS For extra fun, learn what the LD_PRELOAD environment variable does and how it can be used to abuse CPython (or other things that dynamically load shared objects).
The locking is all about reading and writing Python objects. It is not applicable to outside things like external libraries. Python objects are implemented in C code, but Python users do not need to know or care about that.
As a Python user you cannot corrupt or crash things by code you write no matter how hard you try with mutation and concurrency. The locking ensures that. Another way of looking at Python is that it is a friendly syntax for calling code written in C, and that is why people use it - the C code can be where all the performance is, while retaining the ergonomic access.
C code has to opt in to free threading - see my response to this comment
https://news.ycombinator.com/item?id=45706331
It is true that more fine grained locking can end up being done than is strictly necessary, but user's code is loaded at runtime, so you don't know in advance what could be omitted. And this is the beginning of the project - things will get better.
Aside: Yes you can use ctypes to crash things, other compiled languages can be used, concurrency is hard
This has been true forever. Nothing more needs to be said. Please, avoid Python.
On the other hand, I’ve never had issues with Python performance, in 20 years of using it, for all the reasons that have been beaten to death.
It’s great that some people want to do some crazy stuff to CPython, but honestly, don’t hold your breath. Please don’t use Python if Python interpreter performance is your top concern.
The bigger problem is that it teaches people dangerously misguided notions such as "I don't need to synchronize if I work with built-in Python collections". Which, of course, is only true if a single guaranteed-atomic operation on the collection actually corresponds to a single logical atomic operation in your algorithm. What often happens is people start writing code without locks and it works, so they keep doing it until at some point they do something that actually requires locking (like atomic remove from one collection & add to another) without realizing that they have crossed a line.
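The line-crossing described above can be sketched like this (a hypothetical example): each individual dict operation is atomic, but the compound "remove from one collection and add to another" is not, so it still needs a lock to be logically atomic.

```python
import threading

inbox = {"job": "payload"}
done = {}
lock = threading.Lock()

# The pop and the insert are each individually atomic, but another thread
# can observe or interfere with the state *between* them. Without the lock,
# the invariant "a key lives in exactly one dict" can break under
# concurrency even though no single operation ever corrupts a dict.
def move(key: str) -> None:
    with lock:
        if key in inbox:
            done[key] = inbox.pop(key)

move("job")
print(inbox, done)  # {} {'job': 'payload'}
```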
Interestingly, we've been there before, multiple times even. The original design of Java collections entailed implicit locking on every operation, with the same exact outcome. Then .NET copied that design in its own collections. Both frameworks dropped it pretty fast, though - Java in v1.2 and .NET in v2.0. But, of course, they could do it because the locking was already specific to collections - it wasn't a global lock used for literally every language object, as in Python.
Nit, that's true iff x is a primitive without the volatile modifier. That's not true for a volatile primitive.
It's quite possible to make a Python app that requires libraries A and B to be loadable into a free-threaded application, but which doesn't actually do any unsafe operations with them. We need to be able to let people load these libraries, but say: this thing may not be safe, add your own mutexes or whatever.
You have to explicitly compile the extension against a free threaded interpreter in order to get that ABI tag in your extension and even be able to load the extension. The extension then has to opt-in to free threading in its initialization.
If it does not opt-in then a message appears saying the GIL has been enabled, and the interpreter continues to run with the GIL.
This may seem a little strange but is helpful. It means the person running Python doesn't have to keep regular and free threaded Python around, and duplicate sets of extensions etc. They can just have the free threaded one, anything loaded that requires the GIL gives you the normal Python behaviour.
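That fallback can be observed from Python itself. On 3.13+, the private `sys._is_gil_enabled()` reports whether the GIL is currently active, and the `Py_GIL_DISABLED` config var says whether the build is free-threaded at all (a sketch; the underscore API is private and may change):

```python
import sys
import sysconfig

# Py_GIL_DISABLED is 1 when the interpreter was *built* free-threaded.
built_free_threaded = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))

# sys._is_gil_enabled() (private, 3.13+) reports whether the GIL is active
# right now; it can be re-enabled at runtime when an extension that does
# not opt in to free threading gets imported.
if hasattr(sys, "_is_gil_enabled"):
    print("GIL currently enabled:", sys._is_gil_enabled())
else:
    print("Regular build; the GIL is always enabled.")
print("Free-threaded build:", built_free_threaded)
```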
What is a little more problematic is that some of the standard library is marked as supporting free threading, even though they still have the audit and update work outstanding.
Also the last time I checked, the compiler thread sanitizers can't work with free threaded Python.
if such a thing were possible, thread coordination would not have those issues in the first place
* Point out using APIs that return borrowed references
* Suggest assertions that critical sections are held when operating on objects
* Suggest alternate APIs
* Recognise code patterns that are similar to those done during the stdlib auditing work
The compiler thread sanitizers didn't work the last time I checked - so get them working.
Edit: A good example of what can be done is Coccinelle used in the Linux kernel which can detect problematic code (locking is way more complex!) as well as apply source transformations. https://www.kernel.org/doc/html/v6.17/dev-tools/coccinelle.h...
That being said, I strongly believe that because of the sharp edges of async-style code compared with proper coroutine-based user threads like goroutines and Java virtual threads, Python is still far behind optimal parallelism patterns.
And then there's patently stupid design decisions like using raw slices as collections and the maybe-change-maybe-copy semantics of append() that don't make it easier to reason about shared data when it needs to be shared.
FastAPI might improve your performance a little, but seriously: either use PyPy or rewrite in a compiled language.
I've found debugging Python quite easy in general, I hope the experience will be great in free-threaded mode as well.