Why Is Everything So Scalable?
Key topics
The article 'Why is everything so scalable?' discusses the trend of prioritizing scalability in software development, often at the expense of simplicity, and the HN discussion debates the merits and pitfalls of this approach.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 5 days after posting
- Peak period: 145 comments in the 120-132h window
- Avg / period: 53.3 comments
- Based on 160 loaded comments
Key moments
- 01 Story posted: Oct 9, 2025 at 4:53 AM EDT (3 months ago)
- 02 First comment: Oct 14, 2025 at 7:57 AM EDT (5 days after posting)
- 03 Peak activity: 145 comments in the 120-132h window (the hottest stretch of the conversation)
- 04 Latest activity: Oct 15, 2025 at 1:36 PM EDT (3 months ago)
I am not sure this is true. Complexity is a function of architecture. Scalability can be achieved through abstraction; it doesn't necessarily imply a highly coupled architecture. In fact, scalability benefits from decoupling as much as possible, which effectively reduces complexity.
If you have a simple job to do that fits in an AWS Lambda, why not deploy it that way? Scalability is essentially free. But the real advantage is that by writing it as a Lambda you are forced to think of it in stateless terms. On the other hand, if it suddenly needs to coordinate with 50 other Lambdas or services, then you have complexity -- and usually scalability will suffer in this case, as things become more and more synchronous and interdependent.
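To make the "forced to think in stateless terms" point concrete, here is a minimal sketch of what such a handler might look like in Python. The event fields and the transformation are illustrative assumptions, not anything from the article or thread:

```python
# Minimal sketch of a stateless Lambda-style handler (hypothetical example).
# All state arrives in the event and leaves via the return value or external
# storage; nothing is kept in process memory between invocations, which is
# what makes this trivially scalable.
import json

def handler(event, context):
    # The input arrives fully described in the event -- no hidden local state.
    record = json.loads(event["body"])

    # Do the work as a pure transformation of the input.
    result = {"id": record["id"], "total": sum(record.get("amounts", []))}

    # Results go back to the caller (or to external storage),
    # never to the local filesystem.
    return {"statusCode": 200, "body": json.dumps(result)}
```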
> The monolith is composed of separate modules (modules which all run together in the same process).
It's of course great to have a modular architecture, but whether or not the modules run in the same process should be an implementation detail. Barriers should be explicit. By writing everything to depend on local, synchronous, same-process logic, you are likely building in all sorts of implicit barriers that will become hidden dangers when you suddenly do need to scale. And that's one of the reasons to think about scaling in advance: when the need comes, it comes quickly.
It's not that you should scale early. But if you're designing a system architecture, I think it's better to think about scaling, not because you need it, but because doing so forces you to modularize, decouple, and make synchronization barriers explicit. If done correctly, this will lead to a better, more robust system even when it's small.
Just like premature optimization -- it's better not to get caught up doing it too early, but you still want to design your system so that you'll be able to do it later when needed, because that time will come, and the opportunity to start over is not going to come as easily as you might imagine.
It should be, but I think "microservices" somehow screwed that up. Many developers think "modular architecture == separate services communicating via HTTP/network that can be swapped", failing to realize you can do exactly what you're talking about. It doesn't really matter what the barrier is, as long as it's clear, and more often than not the network seems to be the default barrier when it doesn't have to be.
But if you want to use off the shelf solutions to your problems it often is. You can't very well do 'from keycloak import login_page'.
What you are describing is already the example of premature optimization. The moment you are thinking of a job in terms of "fits in an AWS Lambda" you are automatically stuck with "Use S3 to store the results" and "use a queue to manage the jobs" decisions.
You don't even know if that job is the bottleneck that needs to scale. For all you know, a simple monolithic script deployed onto a VM/server would be a far simpler setup. Just use the RAM/filesystem as the cache. Write the results to the filesystem/database. When the time comes to scale, you know exactly which parts of your monolith are the bottleneck that need to be split. For all you know, you can simply replicate your monolith, shard the inputs, and the scaling is already done. Or just use the DB's replication functionality.
To put things into perspective, even a cheap raspberry pi/entry level cloud VM gives you thousands of postgres queries per second. Most startups I worked at NEVER hit that number. Yet their deployment stories started off with "let's use lambdas, s3, etc..". That's just added complexity. And a lot of bills - if it weren't for the "free cloud credits".
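As a rough illustration of why those numbers are so hard to hit, here is a back-of-envelope calculation; all the traffic figures below are assumptions for the sake of the arithmetic, not measurements:

```python
# Back-of-envelope: how much database load do 10k daily users generate?
# Every input here is an illustrative assumption.
daily_users = 10_000
requests_per_user_per_day = 50      # assumption
queries_per_request = 3             # assumption
peak_factor = 20                    # assume peak traffic is 20x the daily average

avg_rps = daily_users * requests_per_user_per_day / 86_400
peak_qps = avg_rps * peak_factor * queries_per_request

print(f"average requests/s: {avg_rps:.1f}")   # ~5.8
print(f"peak DB queries/s:  {peak_qps:.0f}")  # ~347 -- well under "thousands"
```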
I think the most important one you get is that inputs/outputs must always be < 6 MB in size. It makes sense as a limitation for Lambda's scalability, but you will definitely dread it the moment a 6.1 MB use case makes sense for your application.
That's equivalent to paying attention in software engineering 101. If you can't get those things right on one machine, you're going to be in a world of hurt dealing with something like Lambda.
Of course, that’d require CI, which clearly wasn’t working well in your example.
This is the part that is about math as a language for patterns as well as research for finding counter-examples. It’s not an engineering problem yet.
Once you have product-market fit, then it becomes an engineering problem.
Cue programmers blaming the product team for "always changing their mind" as they discover what users actually need, and the product team blaming developers for being hesitant to make changes -- and, when the programmers do agree, for taking a long time to undo the perfect architecture they spent weeks fine-tuning against some imaginary future user base.
Why does that matter? My argument is: Engineer for what you know, leave the rest for when you know better, which isn't before you have lots of users.
That is not a lot. You can host that on a Raspberry Pi.
(16 if you need geo replication.)
I always find these debates weird. How can you compare one app's TPS with another's?
I am worried by the talk of 10k daily users and a peak of 1000TPS being too much premature optimisation. Those numbers are quite low. You should know your expected traffic patterns, add a margin of error, and stress test your system to make sure it can handle the traffic.
I disagree that self-inflicted architectural issues and personnel issues are different.
Instead, they celebrate "learning from running at scale" or some nonsense.
The result is 100% of auth requests timeout once the login queue depth gets above a hundred or so. At that point, the users retry their login attempts, so you need to scale out fast. If you haven't tested scale out, then it's time to implement a bcrypt thread pool, or reimplement your application.
But at least the architecture I described "scales".
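For what it's worth, the bcrypt thread pool mentioned above is a common mitigation on a single box; a minimal sketch, assuming the Python bcrypt package and an arbitrary pool size:

```python
# Sketch: verify bcrypt hashes in a bounded thread pool so slow hashing
# doesn't block the request handlers. Pool size is an assumption; tune to cores.
from concurrent.futures import ThreadPoolExecutor
import bcrypt

# The bcrypt package releases the GIL while hashing, so threads give real parallelism.
_pool = ThreadPoolExecutor(max_workers=4)

def check_password_async(password: str, stored_hash: bytes):
    """Submit the expensive bcrypt check to the pool; returns a Future."""
    return _pool.submit(bcrypt.checkpw, password.encode("utf-8"), stored_hash)

# Usage: the request handler polls/awaits the future with a timeout instead of
# letting every login occupy a worker for ~100ms of pure CPU.
# future = check_password_async("hunter2", stored_hash)
# ok = future.result(timeout=1.0)
```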
You do, in fact, need to scale to trivial numbers of users. You may even need to scale to a small number of users in the near future.
If you have a product that’s being deployed for a new school year, yeah you should be prepared for any one-time load for that time period.
Many products don’t have the “school year just started” spikes. But some do.
It requires careful thought, pragmatism, and business sense to balance everything and achieve the most with the available resources.
they couldn't redeploy to a high-spec VPS instead?
I absolutely agree with your point, but I want to point out, like other commenters here, that the numbers should be much larger. We think that, because 10k daily users is a big deal for a product, they're also a big deal for a small server, but they really aren't.
It's fantastic that our servers nowadays can easily handle multiple tens of thousands of daily users on $100/mo.
This was my initial point :) Don't focus on trying to achieve some metrics, focus on making sure to build the right thing.
Yeah, we seem to forget just how fast computers are nowadays. Obviously it varies with the complexity of the app and what other tech you are using, but for simpler things, 10k daily users could be handled by a reasonably powerful desktop sitting under my desk without breaking a sweat.
The modern equivalent challenge is 10 million simultaneous users per machine.
The problem I see is much more about extremely vague notions of scalability, trends, best practices, clean code, and so on. For example we need Kafka, because Kafka is for the big boys like us. Not because the alternatives couldn’t handle the actual numbers.
CV-driven development is a much bigger issue than people picking overly ambitious target numbers.
I basically agree with most of what the author is saying here, and my feeling is that most developers are at least aware that they should resist technical self-pleasure in pursuit of making sure the business/product they're attached to is actually performing. Are there really people out there who still reach for Meta-scale by default? Who start with microservices?
Anecdotally, the last three greenfield projects I was a part of, the Architects (distinct people in every case) began the project along the lines of "let us define the microservices to handle our domains".
Every one of those projects failed, in my opinion not primarily owing to bad technical decisions - but they surely didn't help either by making things harder to pivot, extend and change.
Clean Code ruined a generation of engineers IMO.
At this point in my career, why wouldn't I reach for microservices to supply the endpoints that my frontend calls out to? Microservices are straightforward to implement with NodeJS (or any other language, for that matter.) I get very straightforward tracing and Azure SSO support in NodeJS. For my admin console, I figured I would need one backend-for-frontend microservice that the frontend would connect to and a domain service for each domain that needed to be represented (with only one domain to start). We picked server technologies and frameworks that could easily port to the cloud.
So two microservices to implement a secure admin console from scratch, is that too many? I guess I lack the imagination to do the project differently. I do enjoy the "API First" approach and the way it lets me engage meaningfully with the business folks to come up with a design before we write any code. I like how it's easy to unit/functional test with microservices, very tidy.
Perhaps what makes a lot/most of microservice development so gross is misguided architectural and deployment goals. Like, having a server/cluster per deployed service is insane. I deploy all of my services monolithically until a service has some unique security or scaling needs that require it to separate from the others.
Similarly, it seems common for microservices teams to keep multiple git repos, one for each service. Why?! Some strange separation-of-concerns/purity ideals. Code reuse, testing, pull requests, and atomic releases suffer needless friction unless everything is kept in a monorepo, as the OP implied.
Also, building microservices in such a way that services must call other services completely misses the point of services -- that's just creating a distributed monolith (slow!).
I made a rule on my team that the only service type that can call another service is aggregation services like my backend-for-frontend which could launch downstream calls in parallel and aggregate the results for the caller. This made the architecture very flat with the minimum number of network hops and with as much parallelism as possible so it would stay performant. Domain services owned their data sources, no drama with backend data.
I see a lot of distributed monolith drama and abuse of NoSQL data sources giving microservices a bad reputation.
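A minimal sketch of the "only aggregation services call other services" rule described above, assuming async HTTP via httpx and hypothetical downstream service URLs:

```python
# Sketch of a backend-for-frontend aggregator: it is the only service allowed
# to call other services, and it calls them in parallel. URLs are hypothetical.
import asyncio
import httpx

DOWNSTREAM = {
    "profile": "http://profile-svc/api/profile/{user_id}",
    "orders": "http://orders-svc/api/orders/{user_id}",
}

async def admin_dashboard(user_id: str) -> dict:
    async with httpx.AsyncClient(timeout=2.0) as client:
        # Launch all downstream calls concurrently and aggregate the results.
        calls = {
            name: client.get(url.format(user_id=user_id))
            for name, url in DOWNSTREAM.items()
        }
        responses = await asyncio.gather(*calls.values(), return_exceptions=True)

    result = {}
    for name, resp in zip(calls.keys(), responses):
        # Degrade gracefully: a failed downstream call becomes a null field,
        # not a failed page.
        result[name] = None if isinstance(resp, Exception) else resp.json()
    return result

# asyncio.run(admin_dashboard("42"))
```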
I personally reach for it to outsource some problems by using off the shelf solutions. I don't want to reinvent the wheel. And if everyone else is doing it in a certain way I want to do it in the same way to try to stand on the shoulders of giants and not reinvent everything.
But that's probably the wrong approach then...
I've been running a SaaS for 10 years now. Initially on a single server, after a couple of years moved to a distributed database (RethinkDB) and a 3-server setup, not for "scalability" but to get redundancy and prevent data loss. Haven't felt a need for more servers yet. No microservices, no Kubernetes, no AWS, just plain bare-metal servers managed through ansible.
I guess things look different if you're using somebody else's money.
(Most distributed systems problems are solvable, but only if the person that architected the system knows what they're doing. If they know what they're doing, they won't over-distribute stuff.)
...despite the vast majority of latency issues being extremely low-hanging fruit, like "maybe don't have tens of megabytes of data required to do first paint on your website" or "hey maybe have an index in that database?".
It's just as much about storage and IO and memory and bandwidth.
Different types of sites have completely different resource profiles.
The teams don't talk, and always blame each other
and adds distributed systems and additional organizational problems:
Each team implements one half of dozens of bespoke network protocols, but they still don't talk, and still always blame each other. Also, now they have access to weaponizable uptime and latency metrics, since each team "owns" the server half of one network endpoint but not the client half.
Yes, but it's not difficult to do something silly without even noticing until too late. Implicitly (and unintentionally) calling something with the wrong big-O, for example.
That said, anyone know what's up with the slow deletion of Safari history? Clearly O(n), but as shown in this blog post still only deleted at a rate of 22 items in 10 seconds: https://benwheatley.github.io/blog/2025/06/19-15.56.44.html
On a non-scalable system you're going to notice that big-O problem and correct it quickly. On a scalable system you're not going to notice it until you get your AWS bill.
Of course, those people's weekly status reports would always be "we spent all week tracking down a dumb mistake, wrote one line of code and solved a scaling problem we'd hit at 100x our current scale".
That's equivalent to waving a "fire me" flag at the bean counters and any borderline engineering managers.
There's a lot of off the shelf microservices that can solve difficult problems for me. Like keycloak for user management. Isn't that a good reason?
Or Grafana for log visualization?
Should I build that into the monolith too? Or should I just skip it?
Not disagreeing that you can do a lot on a lot less than in the old days, but your story would be much more impactful with that information. :)
Another thing one has to consider is the market size and timeframe window of your SaaS. No sense in building for scalability if the business opportunity is only 100 customers and only for a few years.
The need to accommodate runaway scale (unbounded N and unbounded rate of growth of N) is actually quite rare.
Another perspective is that the de facto purpose of startups (and projects at random companies) may actually be work experience and rehearsal for the day the founders and friends get to interview at an actual FAANG.
I think the author's “dress for the job you want, not the job you have” nails it.
I was but a baby engineer then, and the leads would not countenance anything as pedestrian as MySQL/Postgres.
Anyway, fast forward a bit and we were tasked with building an in-house messaging service. And at that point Mongo's eventual consistency became a roaring problem. Users would get notifications that they had a new message, and then when they tried to read it it was... well... not yet consistent.
We ended up implementing all kinds of ugly UX hacks to work around this, but really we could've run the entire thing off of sqlite on a single box and users would've been able to read messages instantaneously, so...
I feel like that's kind of the other arm of this whole argument: on the one hand, you ain't gonna need that "scalable" thing. On the other hand, the "unscalable" thing scales waaaaaay higher than you are led to believe.
A single primary instance with a few read-only mirrors gets you a reaaaaaaally long way before you have to seriously think about doing something else.
Agreeing with you... Any reasonable database will scale pretty far if you put it on a machine with 160 cores and 3 TB of RAM. And that's just a single-socket board.
There's no reason to do anything other than get bigger machines until you're near or at the limits of single socket. Dual socket and cpu generations should cover you for long enough to move to something else if you need to. Sharding a traditional database works pretty well in a lot of cases, and it mostly feels like the regular database.
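For illustration, the "mostly feels like the regular database" kind of sharding can be as simple as a deterministic key-to-shard mapping in the application; a sketch with hypothetical connection strings:

```python
# Minimal application-level sharding sketch: route each customer to a fixed
# Postgres instance by hashing their id. DSNs and shard count are hypothetical.
import hashlib

SHARD_DSNS = [
    "postgresql://db-shard-0/app",
    "postgresql://db-shard-1/app",
    "postgresql://db-shard-2/app",
    "postgresql://db-shard-3/app",
]

def shard_for(customer_id: str) -> str:
    # Stable hash so the same customer always lands on the same shard.
    digest = hashlib.sha1(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARD_DSNS)
    return SHARD_DSNS[index]

# Each query runs against shard_for(customer_id); within a shard, everything
# still looks and behaves like an ordinary single Postgres database.
```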
The Postgres database for a company I worked for (that was very concerned about scaling when they interviewed me because their inefficient "nosql" solution was slow) ran very happily on a machine with 2 shared CPU cores and 4GB RAM.
Meanwhile all they needed was... frankly, probably SQLite, for their particular use case, having each client of theirs based around a single portable file actually would have been a big win for them. Their data for each client were tiny, like put-it-all-in-memory-on-an-RPi2 tiny. But no, "it's graphs so we need a graph database! Everything's graphs when you think about it, really! (So says Neo4j's marketing material, anyway)"
And yeah, there were a ton of those issues, but yolo.
I don't think I should dress down any further :>
I don't think that necessarily follows. Especially the language choice is almost impossible to change - look at Facebook, Dropbox, etc. Facebook ended up creating an entirely new language that only they use, because it was impossible to rewrite in another language.
Language choice (and probably database choice too) are essentially locked in from the start, and they do affect scaling.
Growing customers is probably harder, but I don't buy "do everything in hacky Bash scripts because you can fix it later". Nor do I think having solid foundations means you need to be less agile. Would Dropbox have been less successful if they wrote their backend in Typescript? I doubt it.
They're just trying to be cool, you see.
Here's the thing, though: Almost every choice that leads to scalability also leads to reliability. These two patterns are effectively interchangeable. Having your infra costs be "$100 per month" (a claim that usually comes with a massive disclaimer, as an aside) but then falling over for a day because your DB server crashed is a really, really bad place to be.
How is that supposed to happen, without k8s involved somehow?
Empirically, that does not seem to be the case. Large scalable systems also go offline for hours at a time. There are so many more potential points of failure due to the complexity.
And even with a single regular server, it's very easy to keep a live replica backup of the database and point to that if the main one goes down. Which is a common practice. That's not scaling, just redundancy.
Failures are astonishingly, vanishingly rare. Like it's amazing at this point how reliable almost every system is. There are a tiny number of failures at enormous scale operations (almost always due to network misconfigurations, FWIW), but in the grand scheme of things we've architected an outrageously reliable set of platforms.
>That's not scaling, just redundancy.
In practice it almost always is scaling. No one wants to pay for a whole extra server just to apply shipped logs to. I mean, the whole premise of this article is that you should get the most out of your spend, so in that case it's much better to run two hot servers. And once you have two hot... why not four, distributed across data centers. And so on.
You and I must be using different sites and different clouds.
There's a reason isitdownrightnow.com exists. And why HN'ers are always complaining about service status pages being hosted on the same services.
By your logic, AWS and Azure should fail once in a millennium, yet they regularly bring down large chunks of the internet.
Literally last week: https://cyberpress.org/microsoft-azure-faces-global-outage-i...
https://www.youtube.com/watch?v=b2F-DItXtZs
15 years ago people were making the same "chasing trends" complaints. Back then there absolutely were people cargo-culting, but people are still whining about this a decade and a half later, when it's quite literally just absolutely basic best practice.
Even if you do truly have a microservices architecture, you’ve also now introduced a great deal of complexity, and unless you have some extremely competent infra / SRE folk on staff, that’s going to bite you. I have seen this over and over and over again.
People make these choices because they don’t understand computing fundamentals, let alone distributed systems, but the Medium blogs and ChatGPT have assured them that they do.
But if it was just a monolith and had proper startup checks, when they roll out a new version and it fails, just kill it right there. Leave the old working version up.
Monoliths have their issues too. But doing microservices correctly is quite the job.
Yes, dealing with skew for every single change and hunting down bugs across network boundaries that could have been a function call is peak reliability.
Break your code into modules/components that have a defined interface between them. That interface only passes data - not code with behaviour - and signals that method calls may fail to complete (i.e. throw exceptions).
That is, the interface could become a network call in the future.
Allow easy swapping of interface implementations by passing them into constructors/ using factories or dependency injection frameworks if you must.
That's it - you can then start with everything in-process and the rapid development that allows, but if you need to you can add splitting into networked microservices - any complexity that arises from the network aspect is hidden behind the proxy, with the ultimate escape hatch of the exception.
Have I missed something?
Even so it's still very simple.
To scale your auth service you just write a proxy to a remote implementation and pass that in - any load balancing etc is hidden behind that same interface and none of the rest of the code cares.
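A minimal sketch of that pattern, with hypothetical class names and endpoints: one data-only interface, an in-process implementation, and a drop-in remote proxy injected through the constructor.

```python
# Sketch: one interface, two implementations. The caller never knows whether
# auth happens in-process or over the network. Names and endpoints are made up.
from typing import Protocol
import httpx

class AuthService(Protocol):
    def verify(self, token: str) -> bool:
        """May raise on failure -- callers must handle that."""

class LocalAuthService:
    def __init__(self, valid_tokens: set[str]):
        self._valid = valid_tokens

    def verify(self, token: str) -> bool:
        return token in self._valid

class RemoteAuthProxy:
    """Same interface, but the work happens on a remote auth service."""
    def __init__(self, base_url: str):
        self._base_url = base_url

    def verify(self, token: str) -> bool:
        # Timeouts, retries, load balancing all live here; callers only
        # ever see a bool or an exception.
        resp = httpx.post(f"{self._base_url}/verify", json={"token": token}, timeout=1.0)
        resp.raise_for_status()
        return resp.json()["valid"]

class App:
    def __init__(self, auth: AuthService):  # implementation injected, not chosen here
        self._auth = auth

# Start in-process, swap later without touching App:
# app = App(LocalAuthService({"secret"}))
# app = App(RemoteAuthProxy("http://auth.internal"))
```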
I like the idea of the remote implementation being proxied -- not sure I've come across that pattern before.
Also, most of these interfaces you'll likely never need. It's a cost of initial development, and the indirection is a cost on maintainability of your code. It's probably (although not certainly) cheaper to refactor to introduce interfaces as needed, rather than always anticipate a need that might never come.
I'm not suggesting that the distributed bit is still coupled behind the scenes ( ie via a data backend that requires distributed transactions ) - the interaction is through the interface.
In the end you are always going to have code calling code - the key point is to assume these key calls are simply data passing, not behaviour passing, and that they can fail.
What else is needed to make something network-friendly? (I'm suggesting that things like retries, load balancing etc. can be hidden as a detail in the network implementation - all you need to surface is success or failure.)
You get to have new problems that are qualitatively different from before, like timeouts, which can break the assumptions in the rest of your code about, say, whether state was updated or not, and in what order. You also then get to deal with thundering herds and circuit breakers and so on.
In terms of timing, the call is synchronous and either succeeds or fails - the details like timeouts and async behaviour under the hood are hidden by the proxy. In the end the call succeeds or fails, and if you surface that as a synchronous call you hide the underlying complexity from the caller.
A bit like opening a file and writing to it - most platform apis throw exceptions - and your code has to deal with it.
Quite a while ago, before containers were a thing at all, I did systems for some very large porn companies. They were doing streaming video at scale before most, and the only other people working on video at that scale were Youtube.
The general setup for the largest players in that space was haproxy in front of nginx in front of several PHP servers in front of a MySQL database that had one primary r/w with one read only replica. Storage (at that time) was usually done with glusterfs. This was scalable enough at the time for hundreds of thousands of concurrent users, though the video quality was quite a bit lower than what people expect today.
Today at AWS, it is easily possible for people to spend a multiple of the cost of that hardware setup every month for far less compute power and storage.
The only problem is that there is a lot of video data.
I think most people don't realise that "10 million" records is small, for a computer.
(That said, I have had to deal with code that included an O(n^2) de-duplication where the test data had n ~= 20,000, causing app startup to take 20 minutes; the other developer insisted there was no possible way to speed this up, later that day I found the problem, asked the CTO if there was a business reason for that de-duplication, removed the de-duplication, and the following morning's stand-up was "you know that 20 minute startup you said couldn't possibly be sped up? Yeah, well, I sped it up and now it takes 200ms")
Also, it was overwhelmingly likely that none of the elements were duplicates in the first place, and the few exceptions were probably exactly one duplicate.
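(In the story above the right fix was removing the de-duplication entirely, but for reference, here is the generic shape of the accidental-quadratic pattern and its usual hash-based fix; the record fields are made up.)

```python
# Sketch of the usual fix for accidentally-quadratic de-duplication:
# replace "compare every record with every other record" with one pass
# over a set of keys. The key fields are invented for illustration.

def dedupe_quadratic(records):          # O(n^2): ~400M comparisons at n = 20,000
    unique = []
    for r in records:
        if not any(r == u for u in unique):
            unique.append(r)
    return unique

def dedupe_linear(records):             # O(n): one hash lookup per record
    seen, unique = set(), []
    for r in records:
        key = (r["id"], r["timestamp"])  # whatever actually defines "duplicate"
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```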
Most engineers that I've worked with that die on a premature optimization molehill like you describe also make that molehill as complicated as possible. Replacing the inside of the nested loop with a hashtable probe certainly fits the stereotype.
Fair.
To set the scene a bit: the other developer at this point was arrogant, not at all up to date with even the developments of his preferred language, did not listen to or take advice from anyone.
I think a full quarter of my time there was just fire-fighting yet another weird thing he'd done.
> If it was absolutely necessary to get this 1MB dataset to be smaller
It was not, which is why my conversation with the CTO to check on if it was still needed was approximately one or two sentences from each of us. It's possible this might have been important on a previous pivot of the thing, at least one platform shift before I got there, but not when I got to it.
Like, I honestly have trouble listing many business problems/areas that would fail to scale with their expected user count, given reasonable hardware and technical competence.
Like YouTube and Facebook are absolute outliers. Famously, stackoverflow used to run on a single beefy machine (and the reason they changed their architecture was not due to scaling issues), and "your" startup ain't needing more scale than SO.
Maintaining the media lifecycle - receiving, transcoding, making it available, and removing it - is the big task, but that's not real-time; it's batch/event processing on a best-effort basis.
The biggest challenge with streaming is maintaining the content catalogue, which isn't just a few million records but rich metadata about the lifecycle and content relationships. Then user management and payments tend to also have a significant overhead, especially when you're talking about international payment processing.
I think people have a warped perception of performance, if only because the cloud providers are serving up a shared VM on equipment I'd practically class as vintage computing. You could throw some of the same parts together from eBay and buy the whole system with less than a few months worth of the hourly on-demand cost.
A common story is that since day one you just have lightweight app servers handling http requests doing 99% I/O. And your app servers can be deployed on a cheap box anywhere since they're just doing I/O. Maybe they're on Google Cloud Run or a small cluster of $5 VPS. You've built them so that they have zero deps on the machine they're running on.
But then one day you need to do some sort of computations.
One incremental option is to create a worker that can sit on a machine that can crunch the tasks and a pipeline to feed it. This can be seen as operationally complex compared to one machine, but it's also simple in other ways.
Another option is to do everything on one beefy server where your app servers just shell out the work on the same machine. This can be operationally simple in some ways, but not necessarily in all ways.
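A sketch of that second option, assuming the computation is a CPU-bound Python function: the request handler stays thin and hands heavy work to a local process pool on the same machine.

```python
# Sketch: keep the computation on the same beefy box by handing CPU-bound
# work to a local process pool. crunch() is a stand-in for the real work.
from concurrent.futures import ProcessPoolExecutor

def crunch(payload: bytes) -> int:
    # Placeholder for the expensive computation.
    return sum(payload)

def handle_request(pool: ProcessPoolExecutor, payload: bytes) -> int:
    # The request handler stays I/O-light; heavy lifting runs in another process.
    return pool.submit(crunch, payload).result(timeout=30)

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:  # size to the machine's cores
        print(handle_request(pool, b"example payload"))
```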
I used to run a webmail system with 2m accounts on hardware with less total capacity (ram, disk, CPU throughput) than my laptop...
What's more: It was a CGI (so new process for every request), and the storage backend spawned separate processes per user.
What's the bandwidth and where can I rent one of these??
1: https://www.hetzner.com/dedicated-rootserver/matrix-ex
2: https://docs.hetzner.com/robot/dedicated-server/network/10g-...
Their prices have come down a lot. I used them when the servers still cost $200 a piece, but their support at the time was fantastic.
This whole thread was a response to
> Today at AWS, it is easily possible for people to spend a multiple of the cost of that hardware setup every month for far less compute power and storage.
suggesting to use a few beefy servers but if we are renting them from cloud we're back where we started.
If you want more control than that, colo is also pretty cheap [1]. But I'd consider that a step above what 95% of people need
https://www.hetzner.com/colocation
also pretty sure 24 cores is like 48 cloud “cores” which are usually just hyper threads right?
I laughed. I cried. Having a back full of microservices scars, I can attest that everything said here is true. Just build an effin monolith and get it done.
The turning point might have been Heroku? Prior to Heroku, I think people just assumed you deploy to a VPS. Heroku taught people to stop thinking about the production environment so much.
I think people were so inspired by it that they wanted to mimic it for other languages. It got more people curious about AWS.
Ironically, while the point of Heroku was to make deployment easy and done with a single command, the modern deployment story on cloud infrastructure is so complicated most teams need to hold a one hour meeting with several developers "hands on deck" and going through a very manual process.
So it might seem counter intuitive to suggest that the trend was started by Heroku, because the result is the exact opposite of the inspiration.
- scaling vertically is cheaper to develop
- scaling horizontally gets you further.
What is correct for your situation depends on your human, financial and time resources.
210 more comments available on Hacker News