How Many HTTP Requests/second Can a Single Machine Handle? (2024)
Posted 4 months ago · Active 4 months ago
binaryigor.com · Tech · story
Key topics
Performance Optimization
Scalability
Server Architecture
The article explores how many HTTP requests a single machine can handle, sparking a discussion on the limits of single-machine scalability and the trade-offs between simplicity and performance.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 23m
Peak period: 48 comments (0-6h)
Avg / period: 11.2
Comment distribution: 56 data points
Based on 56 loaded comments
Key moments
1. Story posted: Aug 31, 2025 at 2:10 PM EDT (4 months ago)
2. First comment: Aug 31, 2025 at 2:33 PM EDT (23m after posting)
3. Peak activity: 48 comments in 0-6h (hottest window of the conversation)
4. Latest activity: Sep 2, 2025 at 11:40 PM EDT (4 months ago)
ID: 45085446 · Type: story · Last synced: 11/20/2025, 5:57:30 PM
Many frameworks game those benchmarks. While there are some metrics they are useful for, I've yet to see any production code that's actually stripped down like the gamified versions there.
A low-end ARM processor (like a Raspberry Pi) can crank out 1000 requests a second with a CGI-style program handling the requests — using a single CPU core. Of course this doesn’t happen with traditional CGI. (Actual performance with traditional CGI will be more like 20-50/s or worse.)
Like the stereotypical drivers of such vehicles, the industry has become so fat and stupid that an x86 system handling 500 requests/sec actually sounds impressive. Sadly, considering the bloated nature of modern stacks, it kinda is.
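The arithmetic behind those figures is worth making explicit (my own back-of-envelope, not from the thread): per-core throughput is just the inverse of per-request CPU time, so a 1 ms handler yields ~1000 req/s on one core, while a traditional CGI request that burns tens of milliseconds on fork/exec and interpreter startup lands in the 20-50 req/s range.

```python
def max_throughput(cores: int, per_request_seconds: float) -> float:
    """Upper bound on requests/second when each request fully
    occupies one core for per_request_seconds of CPU time."""
    return cores / per_request_seconds

# 1 ms of CPU per request on a single core -> ~1000 req/s
fast = max_throughput(1, 0.001)
# Traditional CGI: ~50 ms of fork/exec + startup -> ~20 req/s
cgi = max_throughput(1, 0.050)
```

The numbers are illustrative; real servers also lose capacity to kernel networking and scheduling overhead.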
Honestly, unless you're bandwidth/uplink limited (e.g. running a CDN), a single machine will take you really far.
Also simpler systems tend to have better uptime/reliability. Doesn't get much simpler than a single box.
So when people say 1k is "highload" and requires a whole cluster, I'm not sure what to think of it. You can squeeze so much more out of a single fairly modest machine.
That's the other thing: AWS tends to have really dated SSDs.
Honestly, it's like the industry has jumped the shark. 1k is not a lot of load. It's like when people say a single writer means you can't be performant; it's the opposite: most of the time a single writer lets you batch, and batching is where the magic happens.
With a test like this, you're really testing two different things:
1. How fast your database is,
2. How fast your frontend is
Since the query is simple, your frontend is basically a DB access layer and should be taking no time. And since the table is indexed the query should also take no time.
The only other interesting question is whether the database can handle the number of connections and whether the storage can keep up. The app is using connection pools, but the actual size of the database machine is never mentioned, which is a problem. How big is the DB instance? A small instance could be crushed with 80 connections. A database on a hard drive may not be able to handle the load either (though since the data volume is small, it could be that everything ends up cached anyway).
So this is sort of interesting, but sort of not interesting.
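One way to sanity-check whether a connection count like 80 is even relevant (my addition, using Little's Law, not anything from the article): the average number of in-flight queries equals arrival rate times per-query latency.

```python
def in_flight_queries(requests_per_second: float, query_latency_seconds: float) -> float:
    """Little's Law: average concurrency = arrival rate x latency.
    If this number exceeds the pool size, requests queue for a connection."""
    return requests_per_second * query_latency_seconds

# 1000 req/s with 2 ms indexed lookups -> only ~2 connections busy on average
light = in_flight_queries(1000, 0.002)
# The same 1000 req/s with 80 ms queries would saturate an 80-connection pool
heavy = in_flight_queries(1000, 0.080)
```

So with the simple indexed query under test, even a modest pool should be far from the bottleneck unless latency spikes.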
Both the app and the db are hosted on the same machine - they are sharing resources. This fact, the type of storage, and other details of the setup are covered in this section: https://binaryigor.com/how-many-http-requests-can-a-single-m...
I think you're right that I didn't mention the details of the db connection pool; they are here: https://github.com/BinaryIgor/code-examples/blob/master/sing...
Long story short, there's a Hikari Connection Pool with initial 10 connections, resizable to 20.
Same with the db - I wanted to see what kind of load a system (not just an app) deployed to a single machine can handle.
It can obviously be optimized even further; I didn't try to do that in the article.
Suppose it takes 0.99s to send REQUESTS_PER_SECOND requests. Then you sleep for 1s. Result: You send REQUESTS_PER_SECOND requests every 1.99s. (If sending the batch of requests could take longer than a second, then the situation gets even worse.)
The issue GP has with app and DB on the same box is a red herring -- that was explicitly the condition under test.
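The drift described above can be avoided by sleeping until a fixed deadline rather than for a fixed duration; a minimal sketch (names are mine, not from the article's load generator):

```python
import time

def run_batches(total_batches: int, send_batch) -> None:
    """Open-loop pacing: schedule batch n at t0 + n seconds.
    Sleeping a full second *after* each batch would stretch the
    cycle to (batch time + 1s), as described above."""
    t0 = time.monotonic()
    for n in range(total_batches):
        send_batch()
        # Sleep only until the next whole-second deadline.
        delay = (t0 + n + 1) - time.monotonic()
        if delay > 0:
            time.sleep(delay)
```

If a batch overruns its one-second slot, `delay` goes negative and the loop simply proceeds, so the generator falls behind rather than silently halving the offered rate.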
it's fine, we all went thru these gauntlets, but, if you're interested in learning more, take all of this feedback in good faith, and compare/contrast what your tool is doing vs. what sound/valid load testing tools like vegeta and hey and (maybe) k6 do. (but definitely not ab or wrk, which are unsound)
and, furthermore, if the application and DB are co-located on the same machine, you're co-mingling service loads, and definitely not measuring or capturing any kind of useful load numbers, in the end
tl;dr is that these benchmarks/results are ultimately unsound, it's not about optimization, it's about validity
if you want to benchmark the application, then either you (a) mock the DB at as close to 0 cost as you can, or (b) point all application endpoints to the same shared (separate-machine) DB instance, and make sure each benchmark run executes exactly the same set of queries against a DB instance that is 100% equivalent to the other runs, resetting in-between each run
Tests, on the other hand, were executed on multiple different machines - it's all described in the article. Sleep works properly, because there's an unbounded thread pool that makes the HTTP requests - each request has its own virtual thread.
That’s barely more than a raspberry pi? (4 vs 8 cores) Huge machines today have 20+ TBs of RAM and hundreds of cores. Even top-end consumer machines can have 512GB of RAM!
I do agree with the author that single machines can scale far beyond what most orgs / companies need, but I think they may be underestimating how far that goes by orders-of-magnitude
In 8 years, Ryzen went from 1166 geekbench 6 single core to 3398.
Single core perf doubled every 8 years, multicore every 6 years, and GPUs every 3 years !
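From the Geekbench figures quoted above, the implied single-core doubling time can be checked directly (my computation, assuming exponential growth over the stated period):

```python
import math

def doubling_time(years: float, old_score: float, new_score: float) -> float:
    """Years for performance to double, given observed growth over a period."""
    return years * math.log(2) / math.log(new_score / old_score)

# Ryzen single-core Geekbench 6: 1166 -> 3398 over 8 years
ryzen = doubling_time(8, 1166, 3398)  # ~5.2 years
```

By this measure the recent single-core trend quoted above is somewhat faster than one doubling per 8 years.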
Is this common? Why not use the local filesystem? Actually, I thought that using anything else beyond the local filesystem for the database is a no-no. Am I missing something?
Block storage is meant to be reliable, so databases go there. Yes it's slower but you don't lose data.
Generally, the only time you want a local database in the cloud is if it's being used for short-lived data meaningful only to that particular instance in time.
Or it can work if your database rarely changes and you make regular backups that are easy to revert to, like for a blog.
Databases with high availability and robust storage were possible before the cloud.
I'm not saying it can't be done. But block storage is built for reliability in a way that ephemeral instances are not. There's a good reason why every guide will tell you to set your database up on block storage rather than an instance's local disk. If your instance fails, just spin up another instantly and reconnect to the same block storage.
Pre-cloud, the equivalent would have been using redundant RAID storage to handle disk failures (easy), before upgrading to replication with an always-running replica (harder).
I know that you can have significantly bigger machines; network-mounted DB storage, on the other hand, is not slow - it's designed specifically for these kinds of use cases.
https://en.wikipedia.org/wiki/C10k_problem
Also, it always feels like I need a second instance at the very least for redundancy, but then we have to ensure they're stateless and that batch jobs are sharded across them (or only run on one), and again we hit an architecture explosion. I wish I were more comfortable just dropping a single Spring Boot instance on a VM and calling it a day; Spring Boot has a lot of bells and whistles and you can get pretty far without the architecture explosion, but it is almost inevitable.
Obviously at high load (1k TPS+), running on servers is way cheaper than serverless, so the tradeoff can start to swing.
It is not all that hard to hit 10k requests/second on modern hardware. 100k requests/second is achievable with some careful technology choices.
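As a baseline for such measurements, here is a deliberately minimal, stdlib-only server sketch you can point a load generator at (a threaded Python server will not itself reach 10k/s; that ceiling is what careful technology choices raise):

```python
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

class HelloHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"hello"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Silence per-request logging; at high request rates it
        # would dominate the measured cost.
        pass

def serve() -> ThreadingHTTPServer:
    # Port 0 lets the OS pick a free port; srv.server_address reports it.
    srv = ThreadingHTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=srv.serve_forever, daemon=True).start()
    return srv
```

Pointing a sound load tester (e.g. vegeta or hey, as mentioned above) at this gives a floor to compare a real application against.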
Use One Big Server (2022) - https://news.ycombinator.com/item?id=45085029 - Aug 2025 (61 comments)
A picture would have been worth quite a bit more than a thousand words.
This is an incredibly naive article.