Use One Big Server (2022)
The article 'Use One Big Server' (2022) discusses the cost-effectiveness of using a single powerful server instead of multiple smaller ones or cloud services, sparking a debate among commenters about the pros and cons of this approach.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 37m after posting
- Peak period: 81 comments in 0-12h
- Avg / period: 32
Based on 160 loaded comments
Key moments
- Story posted: Aug 31, 2025 at 1:29 PM EDT (4 months ago)
- First comment: Aug 31, 2025 at 2:06 PM EDT (37m after posting)
- Peak activity: 81 comments in 0-12h (hottest window of the conversation)
- Latest activity: Sep 7, 2025 at 7:15 AM EDT (4 months ago)
I'm not saying everybody should do this. There are of course a lot of services that can't afford even a minute of downtime. But there are also a lot of companies that would benefit from a simpler setup.
I'm not a better engineer, I just have drastically fewer failure modes.
I think you misread OP. "Single point of failure" doesn't mean the only failure modes are hardware failures. It means that if anything happens to your node, whether it's a hardware failure, a power outage, someone stumbling over your power/network cable, or even a single service crashing, you have a major outage on your hands.
These types of outages are trivially avoided with a basic understanding of well-architected frameworks, which explicitly address the risk represented by single points of failure.
Either way, stuff happens. Figuring out what your actual requirements are around uptime, time to response, and time to resolution is important before you build a nine-nines solution when eight eights is sufficient. :p
Are you serious? Have you ever built/operated/wired rack scale equipment? You think the power cables for your "short" server (vs the longer one being put in) are just hanging out in the back of the rack?
Rack wiring has been done and done correctly for ages. Power cables on one side (if possible), data and other cables on the other side. These are all routed vertically and horizontally, so they land only on YOUR server.
You could put a Mercedes Maybach above/below your server and nothing would happen.
We were their largest customer and they seemed honest even when they made mistakes that seemed silly, so we rolled our eyes and moved on with life.
Managed hosting means accepting that you can't inspect the racks and chide people for not cabling to your satisfaction. And mistakes by the managed host will impact your availability.
Firing a host where you've got thousands of servers is easier said than done. We did do a quote exercise with another provider that could have supported us, and it didn't end up very competitive ... and it wouldn't have been worth the transition. Overall, there were some derpy moments, but I don't think we would have been happier anywhere else, and we didn't want to rent cages and run our own servers.
You're not getting the point. The point is that if you use a single node to host your whole web app, you are creating a system where many failure modes, which otherwise could not even be an issue, can easily trigger high-severity outages.
> and even if, you could just run a provisioned secondary server (...)
Congratulations, you are no longer using "one big server", thus defeating the whole purpose behind this approach and learning the lesson that everyone doing cloud engineering work is already well aware.
References to "elastic Kubernetes whatever" is a red herring. You can have a dead simple load balancer spreading traffic across multiple bare metal nodes.
I'm baffled by your comment. Are you sure you read what I wrote?
Sigh.
In all those years, I’ve had precisely one actual hardware failure: a PSU went out. They’re redundant, so nothing happened, and I replaced it.
Servers are remarkably resilient.
EDIT: 100% uptime modulo power failure. I have a rack UPS and a generator, but I once discovered the hard way that the UPS batteries couldn't hold a charge long enough to keep the rack up while I brought the generator online.
We had a rack in a data center, and we wanted to put local UPS on critical machines in the rack.
But the data center went on and on about their awesome power grid (shared with a fire station, so no administrative power loss), on site generators, etc., and wouldn't let us.
Sure enough, one day the entire rack went dark.
It was the power strip on the data center's rack that failed. All the backup grids in the world can't get through a dead power strip.
(FYI, a family member lost their home due to a power strip, so, again, anecdotally, if you have any older power strips (5-7+ years) sitting under your desk at home, you may want to consider swapping them out for new ones.)
Re: power strips, thanks for the reminder. I’m usually diligent about that, but forgot about one my wife uses. Replacement coming today.
The number of production incidents on our corporate mishmash of lambda, ecs, rds, fargate, ec2, eks etc? It’s a good week when something doesn’t go wrong. Somehow the logging setup is better on the personal stuff too.
Today’s systems don’t fail nearly as often if you use high quality stuff and don’t beat the absolute hell out of SSD. Another trick is to overprovision SSD to allow wear leveling to work better and reduce overall write load.
Do that and a typical box will run years and years with no issues.
Is that more, less, or about the same as having an AWS/Azure/GCP consultant?
What's the difference in labour per hour?
> the risk of having such single point of failure.
At the prices they charge I can have two hot failovers in two other datacenters and still come out ahead.
A big pain point that I personally don't love is that this non-cloud approach normally means running my own database. It's worth considering a provider who also provides cloud databases.
If you go for an 'active/passive' setup, consider saving even more money by using a cloud VM with auto scaling for the 'passive' part.
In terms of pricing, the deals available these days on servers are amazing: you can get a 4GB RAM VPS with decent CPU and bandwidth for ~$6, or bare metal with 32GB RAM and a quad-core CPU for ~$90. It's worth using sites like serversearcher.com to compare.
SQLite uses one reader/writer lock over the whole database. When any thread is writing the database, no other thread is reading it. If one thread is waiting to write, new reads can't begin. Additionally, every read transaction starts by checking if the database has changed since last time, and then re-loading a bunch of caches.
This is suitable for SQLite's intended use case. It's most likely not suitable for a server with 256 hardware threads and a 50Gbps network card. You need proper transaction and concurrency control for heavy workloads.
Additionally, SQLite lacks a bunch of integrity checks, like data types and various kinds of constraints. And things like materialised views, etc.
SQLite is lite. Use it for lite things, not heavy things.
SQLite is easily the best scaling DB tech I've used. I've moved all my postgres workloads over to it and the gains have been incredible.
It's not a panacea and not the best in all cases, but it's a very sane default that I recommend everyone start with, only complicating their stack with an external DB when they start hitting real limits (which often never happens).
I moved several projects from sqlite to postgres because sqlite didn't scale enough for any of them.
The out of the box defaults for sqlite are terrible for web apps.
Most if not all of your concerns with SQLite are simply a matter of not using the default configuration. Enable WAL mode, enable strict mode, etc. and it's a lot better.
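As a rough illustration of what that non-default configuration looks like, here is a minimal Python sketch (the file name and table are made up, and STRICT tables need SQLite 3.37+):

    import sqlite3

    conn = sqlite3.connect("app.db")

    conn.execute("PRAGMA journal_mode=WAL")    # readers stop blocking the single writer
    conn.execute("PRAGMA synchronous=NORMAL")  # the usual pairing with WAL
    conn.execute("PRAGMA foreign_keys=ON")     # foreign key enforcement is off by default
    conn.execute("PRAGMA busy_timeout=5000")   # wait up to 5s instead of erroring on a locked db

    # "Strict mode" here means STRICT tables, which actually enforce column types.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS events (
            id INTEGER PRIMARY KEY,
            payload TEXT NOT NULL
        ) STRICT
    """)
    conn.commit()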
Sqlite (properly configured) will outperform "proper databases" often by an order of magnitude in the context of a single box. You want a single writer for high performance as it lets you batch.
> 256 hardware threads...
Have you tried? I have. Others have too. [1]
> Additionally, SQLite lacks a bunch of integrity checks, like data types and various kinds of constraints. And things like materialised views, etc.
Sqlite has blobs so you can use your own custom encoding which is what you want in a high performance context.
Here's sqlite on a $5 shared VPS that can handle 10000+ checks per second over a billion checkboxes [2]. You're gonna be fine.
- [1] https://use.expensify.com/blog/scaling-sqlite-to-4m-qps-on-a...
- [2] https://checkboxes.andersmurphy.com
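To make the blob-encoding point concrete, here is a hypothetical sketch of packing many samples into a single blob with a custom binary format (the schema and encoding are invented for illustration):

    import sqlite3, struct

    conn = sqlite3.connect("metrics.db")
    conn.execute("CREATE TABLE IF NOT EXISTS samples (id INTEGER PRIMARY KEY, data BLOB)")

    # Pack (timestamp, value) pairs into one compact blob instead of one row each.
    pairs = [(1700000000 + i, float(i)) for i in range(1000)]
    blob = b"".join(struct.pack("<Id", ts, v) for ts, v in pairs)
    conn.execute("INSERT INTO samples (data) VALUES (?)", (blob,))
    conn.commit()

    # Reading back is a single row fetch plus an unpack loop (12 bytes per pair).
    (data,) = conn.execute("SELECT data FROM samples LIMIT 1").fetchone()
    decoded = [struct.unpack_from("<Id", data, off) for off in range(0, len(data), 12)]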
SQLite (actually SQL-ite, like a mineral) may be light, but so are many workloads these days. Even 1000 queries per second is quite doable with SQLite and modest hardware, and I've worked at billion dollar businesses handling fewer queries than that.
You can abuse git for it if you really want to cut corners.
Compare that with using your distro's packaged version where you can have version variations, variations in default config or file path locations, etc.
You don't need to buy server hardware(!), the article specifically mentions renting from eg Hetzner.
> The benefits of "just don't think about hardware" are real
Can you expand on this claim, beyond what the article mentioned?
I run a lambda behind a load balancer; hardware dies, it's redundant, it gets replaced. If a database server fails, it re-provisions without saturating read IO on the SAN and causing noisy neighbor issues.
I don't deal with any of it, I don't deal with depreciation, I don't deal with data center maintenance.
You don't deal with that either if you rent a dedicated server from a hosting provider. They handle the datacenter and maintenance for you for a flat monthly fee.
But the cloud premium needs reiteration: twenty five times. For the price of the cloud server, you can have twenty-five-way redundancy.
A medium to large size asteroid can cause mass extinction events - this happens sometimes - it's not a theoretical risk.
The risk of the people responsible for managing the platform messing up and losing some of your data is still a risk in the cloud. This thread has even already had the argument "if the cloud provider goes down, it's not your fault" as a cloud benefit. Either cloud is strong and stable and can't break, or cloud breaks often enough that people will just excuse you for it.
Yes, there is.
Honestly, it looks to me that this school of thought is mostly adopted by people that can't do arithmetic or use a calculator. But it does absolutely exist.
That said, no, servers are not nearly expensive enough to move the needle on a company nowadays. The room that goes around them often is, and that's why way more people rent the room than the servers in it.
I ran the IT side of a media company once, and it all worked on a half-empty rack of hardware in a small closet... except for the servers that needed bandwidth. These were colocated. Until we realized that the hoster did not have enough bandwidth, at which point we migrated to two bare metal servers at Hetzner.
The actual space isn't a big deal, but the entire environment has large fixed costs.
In practice, all that except connectivity is relatively easy to have on-site.
Connectivity is highly dependent on the business location, local providers, their business plans and their willingness to go out of their way to serve the clients.
And I am not talking only about bandwidth, but also reserve lines and latency.
I for one really miss being able to go see the servers that my code runs on. I thought data centers were really interesting places. But I don't see a lot of effort to decide things based on pure dollar cost analysis at this point. There's a lot of other industry forces besides the microeconomics that predetermine people's hosting choices.
Never underestimate the price people are willing to pay to evade responsibility. I estimate this is a multi-billion dollar market.
Yep, and it's mostly caused by the VC funding model - if your investors are demanding hockey-stick growth, there is no way in hell a startup can justify (or pay for) the resulting Capex.
Whereas a nice, stable business with near-linear growth can afford to price in regular small Capex investments.
An IBM z17 is effectively one big server too, but provides levels of reliability that are simply not available in most IT environments. It won't outperform the AMD rack, but it will definitely keep up for most practical workloads.
If you sit down and really think honestly about the cost of engineering your systems to an equivalent level of reliability, you may find the cost of the IBM stack to be competitive in a surprising number of cases.
ETA - fixed spelling error
Now, if you can live with the weird environment and your people know how to program what is essentially a distributed system described in terms no one else uses: I guess it's still ok, given the competition is all executing IBM's playbook too.
My understanding is that you usually subdivide into a few LPARs and then reboot the production ones on a schedule to prevent drift and ensure that, yes, unplanned IPLs will work.
They could have got the job done by hosting the service on a VPS with a multi-tenant database schema. Instead, they went about learning Kubernetes and drilling deep into the "cloud-native" stack, and spent a year trying to set up the perfect devops pipeline.
Not surprisingly the company went out of business within the next few years.
I mean, of the two, the PaaS route certainly burns more money, the exception being the rare shop that is so incompetent they can't even get their own infrastructure configured correctly, like in GP's situation.
There are guaranteed more shops that would be better off self-hosting and saving on their current massive cloud bills than the rare one-offs that actually save so much time using cloud services, it takes them from bankruptcy to being functional.
Does it? Vercel is $20/month and Neon starts at $5/month. That obviously goes up as you scale up, but $25/month seems like a fairly cheap place to start to me.
(I don't work for Vercel or Neon, just a happy customer)
And that’s before you factor in 500GB of storage.
But the engineers could find new jobs thanks to their acquired k8s experience.
Use one big server - https://news.ycombinator.com/item?id=32319147 - Aug 2022 (585 comments)
And when you need to move fast (or things break), you can't wait a day for a dedicated server to come up, or worse, have your provider run out of capacity (or have to pick a differently specced server).
IME, having to go multi cloud/provider is a way worse problem to have.
A multi-node system tends to be less reliable and have more failure points than a single-box system. Failures rarely happen in isolation.
You can do zero downtime deployment with a single machine if you need to.
Just like a lot of problems exist between keyboard and chair, a lot of problems exist between service A and service B.
The zero downtime deployment for my PHP site consisted of symlinking from one directory to another.
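For anyone who hasn't seen that pattern, a minimal sketch of the atomic symlink swap (paths are made up; the web server's docroot points at the "current" link):

    import os

    def activate_release(release_dir, current_link="/var/www/current"):
        # Create the new link beside the old one, then rename over it.
        # os.replace() is atomic on POSIX, so requests never see a missing docroot.
        tmp_link = current_link + ".tmp"
        if os.path.lexists(tmp_link):
            os.remove(tmp_link)
        os.symlink(release_dir, tmp_link)
        os.replace(tmp_link, current_link)

    # e.g. activate_release("/var/www/releases/2022-08-31")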
Honestly, we need to stop promoting prematurely making everything a network request as a good idea.
But how are all these "distributed systems engineers" going to get their resume points and jobs?
Picking an arbitrary price point of $200/mo, you can get 4(!) vCPUs and 16GB of RAM at AWS. Architectures are different etc., but this is roughly a mid-spec dev laptop of 5 or so years ago.
At Hetzner, you can rent a machine with 48 cores and 128GB of RAM for the same money. It's hard to overstate how far apart these machines are in raw computational capacity.
There are approaches to problems that make sense with 10x the capacity that don't make sense on the much smaller node. Critically, those approaches can sometimes save engineering time that would otherwise go into building a more complex system to manage around artificial constraints.
Yes, there are other factors like durability etc. that need to be designed for. But going the other way, dedicated boxes can deliver more consistent performance without worries of noisy neighbors.
Now, if you actually need to decouple your file storage and make it durable and scalable, or need to dynamically create subdomains, or any number of other things… The effort of learning and integrating different dedicated services at the infrastructure level to run all this seems much more constraining.
I’ve been doing this since before the “Cloud,” and in my view, if you have a project that makes money, cloud costs are a worthwhile investment that will be the last thing that constrains your project. If cloud costs feel too constraining for your project, then perhaps it’s more of a hobby than a business—at least in my experience.
Just thinking about maintaining multiple cluster filesystems and disk arrays—it’s just not what I would want to be doing with most companies’ resources or my time. Maybe it’s like the difference between folks who prefer Arch and setting up Emacs just right, versus those happy with a MacBook. If I felt like changing my kernel scheduler was a constraint, I might recommend Arch; but otherwise, I recommend a MacBook. :)
On the flip side, I’ve also tried to turn a startup idea into a profitable project with no budget, where raw throughput was integral to the idea. In that situation, a dedicated server was absolutely the right choice, saving us thousands of dollars. But the idea did not pan out. If we had gotten more traction, I suspect we would have just vertically scaled for a while. But it’s unusual.
This is because you are looking only at provisioning/deployment. And you are right -- node size does not impact DevOps all that much.
I am looking at the solution space available to the engineers who write the software that ultimately gets deployed on the nodes. And that solution space is different when the nodes have 10x the capability. Yes, cloud providers have tons of aggregate capability. But designing software to run on a fleet of small machines is very different from accomplishing the same tasks on a single large machine.
It would not be controversial to suggest that targeting code at an Apple Watch or Raspberry Pi imposes constraints on developers that do not exist when targeting desktops. I am saying the same dynamic now applies to targeting cloud providers.
This isn't to say there's a single best solution for everything. But there are tradeoffs that are not always apparent. The art is knowing when it makes sense to pay the Cloud Tax, and whether to go 100% Cloud vs some proportion of dedicated.
I’ve never had an issue with moving data.
I think you confuse Hetzner with bare metal. Hetzner has Hetzner Cloud, which is like AWS EC2 but much cheaper. (They also have bare metal servers which are even cheaper.) With Hetzner Cloud, you can use Terraform, GitHub Actions and whatever else you mentioned.
No network latency between nodes, less memory bandwidth latency/contention than there is in VMs, and no caching-architecture latency when you can just tell e.g. Postgres to use gigs of RAM and let Linux's disk caching take care of the rest (no separate caching architecture needed).
If you’re running Postgres locally you can turn off the TCP/IP part; nothing more to audit there.
SSH based copying of backups to a remote server is simple.
If not accessible via network, you can stay on whatever version of Postgres you want.
I’ve heard these arguments since AWS launched, and all that time I’ve been running Postgres (since 2004 actually) and have never encountered all these phantom issues that are claimed as being expensive or extremely difficult.
(what's "medium-size corp" and how did you come up with $100k ?)
It gets even easier now that you have cheap s3 - just upload the dump to s3 every day and set the s3 deletion policy to whatever is feasible for you.
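A rough sketch of that daily dump-and-upload, assuming pg_dump is on the box and AWS credentials are configured (bucket and database names are placeholders; expiry comes from an S3 lifecycle rule on the bucket, not the script):

    import datetime
    import subprocess
    import boto3

    def backup_to_s3(db="appdb", bucket="example-db-backups"):
        stamp = datetime.date.today().isoformat()
        dump_path = f"/tmp/{db}-{stamp}.sql.gz"

        # pg_dump piped through gzip keeps the upload and storage small.
        with open(dump_path, "wb") as out:
            dump = subprocess.Popen(["pg_dump", db], stdout=subprocess.PIPE)
            subprocess.run(["gzip", "-c"], stdin=dump.stdout, stdout=out, check=True)
            dump.wait()

        boto3.client("s3").upload_file(dump_path, bucket, f"postgres/{db}-{stamp}.sql.gz")

    # backup_to_s3() can then run from a daily cron entry.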
Either way: 1 day of a mid-level developer in the majority of the world (basically: anywhere except Zurich, NYC or SF) is between €208 and €291. (Yearly salary of €50-€70k)
A junior developer's time for setup and the cost of hardware is practically a one-off expense. It's a few days of work at most.
The alternative you're advocating for (a recurring SaaS fee) is a permanent rent trap. That money is gone forever, with no asset or investment to show for it. Over a few years, you'll have spent tens of thousands of dollars for nothing. The real cost is not what you pay a developer; it's what you lose by never owning your tools.
Not sure where I advocated for that. Could you point it out please?
For backups, including Postgres, I was planning on paying Veeam ~$500 a year for a software license to backup the active node and Postgres database to s3/r2. Standby node would be getting streaming updates via logical replication.
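The logical-replication half is mostly two SQL statements; a hypothetical sketch (hosts, credentials, and names are placeholders; the primary needs wal_level=logical and the standby needs the schema created up front):

    import psycopg2

    # Primary: publish the tables the standby should follow.
    primary = psycopg2.connect("host=primary dbname=appdb user=postgres")
    primary.autocommit = True
    primary.cursor().execute("CREATE PUBLICATION app_pub FOR ALL TABLES")

    # Standby: subscribe to that publication.
    # CREATE SUBSCRIPTION can't run inside a transaction block, hence autocommit.
    standby = psycopg2.connect("host=standby dbname=appdb user=postgres")
    standby.autocommit = True
    standby.cursor().execute(
        "CREATE SUBSCRIPTION app_sub "
        "CONNECTION 'host=primary dbname=appdb user=replicator' "
        "PUBLICATION app_pub"
    )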
There are free options as well but I didn’t want to cheap out on the backups.
It looks pretty turnkey. I am a software engineer, not a sysadmin, though. It's still just theory as well, as I haven't built it out yet.
[0] A normal sysadmin remains vaguely bemused at their job title and the way it changes every couple years.
Sometimes even the certified cloud engineers can't tell you why an RDS behaves the way it does, nor can they really fix it. Sometimes you really do need a DBA, but that applies equally to on-prem and cloud.
I'm a sysadmin, but have been labelled and sold as: Consultant (sounds expensive), DevOps engineer, Cloud Engineer, Operations Expert and right now a Site Reliability Engineer.... I'm a systems administrator.
It doesn't need someone who knows how to use the labyrinthine AWS services and console?
These comments sound super absurd to me, because RDS is difficult as hell to set up unless you do it very frequently or already have it in IaC form, since one needs to set up a VPC, subnets, security groups, an internet gateway, etc.
It's not like creating a DynamoDB, Lambda or S3 where a non-technical person can learn it in a few hours.
Sure, one might find some random Terraform file online to do this or vibe-code some CloudFormation, but that's not really a fair comparison.
RDS has a value. But for many teams the price paid for this value is ridiculously high when compared to other options.
I also totally understand why some people with a family to support and a mortgage to pay can't just walk away from a job at a FAANG or MAMAA type place.
Looking at your comparison, at this point it just seems like a scam.
Also in my experience more complex systems tend to have much less reliability/resilience than simple single node systems. Things rarely fail in isolation.
Eh, sort of. The difference is that the cloud can go find other workloads to fill the trough from off peak load. They won’t pay as much as peak load does, but it helps offset the cost of maintaining peak capacity. Your personal big server likely can’t find paying workloads for your troughs.
I also have recently come to the opposite conclusion for my personal home setup. I run a number of services on my home network (media streaming, email, a few personal websites and games I have written, my frigate NVR, etc). I had been thinking about building out a big server for expansion, but after looking into the costs I bought 3 mini pcs instead. They are remarkably powerful for their cost and size, and I am able to spread them around my house to minimize footprint and heat. I just added them all to my home Kubernetes cluster, and now I have capacity and the ability to take nodes down for maintenance and updates. I don’t have to worry about hardware failures as much. I don’t have a giant server heating up one part of my house.
It has been great.
163 more comments available on Hacker News