Why Is Everything So Scalable?
Key topics
The article 'Why is everything so scalable?' discusses the trend of prioritizing scalability in software development, often at the expense of simplicity, and the HN discussion debates the merits and pitfalls of this approach.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 5 days after posting
- Peak period: 145 comments in the 120-132h window
- Avg / period: 53.3 comments
- Based on 160 loaded comments
Key moments
- 01 Story posted: Oct 9, 2025 at 4:53 AM EDT (3 months ago)
- 02 First comment: Oct 14, 2025 at 7:57 AM EDT (5 days after posting)
- 03 Peak activity: 145 comments in the 120-132h window (the hottest stretch of the conversation)
- 04 Latest activity: Oct 15, 2025 at 1:36 PM EDT (3 months ago)
I am not sure this is true. Complexity is a function of architecture. Scalability can be achieved through abstraction; it doesn't necessarily imply a highly coupled architecture. In fact, scalability benefits from decoupling as much as possible, which effectively reduces complexity.
If you have a simple job to do that fits in an AWS Lambda, why not deploy it that way? Scalability is essentially free. But the real advantage is that by writing it as a Lambda you are forced to think of it in stateless terms. On the other hand, if it suddenly needs to coordinate with 50 other Lambdas or services, then you have complexity -- and usually scalability will suffer in this case, as things become more and more synchronous and interdependent.
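To make the "forced to think in stateless terms" point concrete, here is a minimal sketch of what such a handler might look like in Python. The event fields and the transformation are illustrative assumptions, not anything from the article or thread:

```python
# Minimal sketch of a stateless Lambda-style handler (hypothetical example).
# All state arrives in the event and leaves via the return value or external
# storage; nothing is kept in process memory between invocations, which is
# what makes this trivially scalable.
import json

def handler(event, context):
    # The input arrives fully described in the event -- no hidden local state.
    record = json.loads(event["body"])

    # Do the work as a pure transformation of the input.
    result = {"id": record["id"], "total": sum(record.get("amounts", []))}

    # Results go back to the caller (or to external storage),
    # never to the local filesystem.
    return {"statusCode": 200, "body": json.dumps(result)}
```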
> The monolith is composed of separate modules (modules which all run together in the same process).
It's of course great to have a modular architecture, but whether or not the modules run in the same process should be an implementation detail. Barriers should be explicit. By writing everything to depend on local, synchronous, same-process logic, you are likely building in all sorts of implicit barriers that will become hidden dangers when you suddenly do need to scale. And that's one of the reasons to think about scaling in advance: when the need comes, it comes quickly.
It's not that you should scale early. But if you're designing a system architecture, I think it's better to think about scaling, not because you need it, but because doing so forces you to modularize, decouple, and make synchronization barriers explicit. If done correctly, this will lead to a better, more robust system even when it's small.
Just like premature optimization -- it's better not to get caught up doing it too early, but you still want to design your system so that you'll be able to do it later when needed, because that time will come, and the opportunity to start over is not going to come as easily as you might imagine.
It should be, but I think "microservices" somehow screwed that up. Many developers think "modular architecture == separate services communicating via HTTP/network that can be swapped", failing to realize you can do exactly what you're talking about. It doesn't really matter what the barrier is, as long as it's clear, and more often than not the network seems to be the default barrier when it doesn't have to be.
But if you want to use off the shelf solutions to your problems it often is. You can't very well do 'from keycloak import login_page'.
What you are describing is already the example of premature optimization. The moment you are thinking of a job in terms of "fits in an AWS Lambda" you are automatically stuck with "Use S3 to store the results" and "use a queue to manage the jobs" decisions.
You don't even know if that job is the bottleneck that needs to scale. For all you know, a simple monolithic script deployed onto a VM/server would be a far simpler setup. Just use the RAM/filesystem as the cache. Write the results to the filesystem/database. When the time comes to scale, you know exactly which parts of your monolith are the bottleneck that need to be split. For all you know, you can simply replicate your monolith, shard the inputs, and the scaling is already done. Or just use the DB's replication functionality.
To put things into perspective, even a cheap raspberry pi/entry level cloud VM gives you thousands of postgres queries per second. Most startups I worked at NEVER hit that number. Yet their deployment stories started off with "let's use lambdas, s3, etc..". That's just added complexity. And a lot of bills - if it weren't for the "free cloud credits".
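As a rough illustration of why those numbers are so hard to hit, here is a back-of-envelope calculation; all the traffic figures below are assumptions for the sake of the arithmetic, not measurements:

```python
# Back-of-envelope: how much database load do 10k daily users generate?
# Every input here is an illustrative assumption.
daily_users = 10_000
requests_per_user_per_day = 50      # assumption
queries_per_request = 3             # assumption
peak_factor = 20                    # assume peak traffic is 20x the daily average

avg_rps = daily_users * requests_per_user_per_day / 86_400
peak_qps = avg_rps * peak_factor * queries_per_request

print(f"average requests/s: {avg_rps:.1f}")   # ~5.8
print(f"peak DB queries/s:  {peak_qps:.0f}")  # ~347 -- well under "thousands"
```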
I think the most important one you get is that inputs/outputs must always be < 6 MB in size. It makes sense as a limitation for Lambda's scalability, but you will definitely dread it the moment a 6.1 MB use case makes sense for your application.
That's equivalent to paying attention in software engineering 101. If you can't get those things right on one machine, you're going to be in a world of hurt dealing with something like Lambda.
Of course, that’d require CI, which clearly wasn’t working well in your example.
This is the part that is about math as a language for patterns as well as research for finding counter-examples. It’s not an engineering problem yet.
Once you have product-market fit, then it becomes an engineering problem.
Cue programmers blaming the product team for "always changing their mind" as they discover what users actually need, and the product team blaming developers for being hesitant to make changes -- and, when the programmers do agree, for taking a long time to undo the perfect architecture they spent weeks fine-tuning against some imaginary future user base.
Why does that matter? My argument is: Engineer for what you know, leave the rest for when you know better, which isn't before you have lots of users.
That is not a lot. You can host that on a Raspberry Pi.
(16 if you need geo replication.)
I always find these debates weird. How can you compare one app's TPS with another's?
I am worried by the talk of 10k daily users and a peak of 1000TPS being too much premature optimisation. Those numbers are quite low. You should know your expected traffic patterns, add a margin of error, and stress test your system to make sure it can handle the traffic.
I disagree that self-inflicted architectural issues and personnel issues are different.
Instead, they celebrate "learning from running at scale" or some nonsense.
The result is 100% of auth requests timeout once the login queue depth gets above a hundred or so. At that point, the users retry their login attempts, so you need to scale out fast. If you haven't tested scale out, then it's time to implement a bcrypt thread pool, or reimplement your application.
But at least the architecture I described "scales".
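For what it's worth, the bcrypt thread pool mentioned above is a common mitigation on a single box; a minimal sketch, assuming the Python bcrypt package and an arbitrary pool size:

```python
# Sketch: verify bcrypt hashes in a bounded thread pool so slow hashing
# doesn't block the request handlers. Pool size is an assumption; tune to cores.
from concurrent.futures import ThreadPoolExecutor
import bcrypt

# The bcrypt package releases the GIL while hashing, so threads give real parallelism.
_pool = ThreadPoolExecutor(max_workers=4)

def check_password_async(password: str, stored_hash: bytes):
    """Submit the expensive bcrypt check to the pool; returns a Future."""
    return _pool.submit(bcrypt.checkpw, password.encode("utf-8"), stored_hash)

# Usage: the request handler polls/awaits the future with a timeout instead of
# letting every login occupy a worker for ~100ms of pure CPU.
# future = check_password_async("hunter2", stored_hash)
# ok = future.result(timeout=1.0)
```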
You do, in fact, need to scale to trivial numbers of users. You may even need to scale to a small number of users in the near future.
If you have a product that’s being deployed for a new school year, yeah you should be prepared for any one-time load for that time period.
Many products don’t have the “school year just started” spikes. But some do.
It requires careful thought, pragmatism, and business sense to balance everything and achieve the most with the available resources.
they couldn't redeploy to a high-spec VPS instead?
I absolutely agree with your point, but I want to point out, like other commenters here, that the numbers should be much larger. We think that, because 10k daily users is a big deal for a product, they're also a big deal for a small server, but they really aren't.
It's fantastic that our servers nowadays can easily handle multiple tens of thousands of daily users on $100/mo.
This was my initial point :) Don't focus on trying to achieve some metrics, focus on making sure to build the right thing.
Yeah, we seem to forget just how fast computers are nowadays. Obviously it varies with the complexity of the app and what other tech you are using, but for simpler things, 10k daily users could be handled by a reasonably powerful desktop sitting under my desk without breaking a sweat.
The modern equivalent challenge is 10 million simultaneous users per machine.
The problem I see is much more about extremely vague notions of scalability, trends, best practices, clean code, and so on. For example we need Kafka, because Kafka is for the big boys like us. Not because the alternatives couldn’t handle the actual numbers.
CV-driven development is a much bigger issue than people picking overly ambitious target numbers.
I basically agree with most of what the author is saying here, and my feeling is that most developers are at least aware that they should resist technical self-pleasure in pursuit of making sure the business/product they're attached to is actually performing. Are there really people out there who still reach for Meta-scale by default? Who start with microservices?
Anecdotally, the last three greenfield projects I was a part of, the Architects (distinct people in every case) began the project along the lines of "let us define the microservices to handle our domains".
Every one of those projects failed, in my opinion not primarily owing to bad technical decisions - but they surely didn't help either by making things harder to pivot, extend and change.
Clean Code ruined a generation of engineers IMO.
At this point in my career, why wouldn't I reach for microservices to supply the endpoints that my frontend calls out to? Microservices are straightforward to implement with NodeJS (or any other language, for that matter.) I get very straightforward tracing and Azure SSO support in NodeJS. For my admin console, I figured I would need one backend-for-frontend microservice that the frontend would connect to and a domain service for each domain that needed to be represented (with only one domain to start). We picked server technologies and frameworks that could easily port to the cloud.
So two microservices to implement a secure admin console from scratch, is that too many? I guess I lack the imagination to do the project differently. I do enjoy the "API First" approach and the way it lets me engage meaningfully with the business folks to come up with a design before we write any code. I like how it's easy to unit/functional test with microservices, very tidy.
Perhaps what makes a lot/most of microservice development so gross is misguided architectural and deployment goals. Like, having a server/cluster per deployed service is insane. I deploy all of my services monolithically until a service has some unique security or scaling needs that require it to separate from the others.
Similarly, it seems common for microservices teams to keep multiple git repos, one for each service. Why?! Some strange separation-of-concerns/purity ideals. Code reuse, testing, pull requests, and atomic releases suffer needless friction unless everything is kept in a monorepo, as the OP implied.
Also, building microservices in such a way that services must call other services completely misses the point of services -- that's just creating a distributed monolith (slow!).
I made a rule on my team that the only service type that can call another service is aggregation services like my backend-for-frontend which could launch downstream calls in parallel and aggregate the results for the caller. This made the architecture very flat with the minimum number of network hops and with as much parallelism as possible so it would stay performant. Domain services owned their data sources, no drama with backend data.
I see a lot of distributed monolith drama and abuse of NoSQL data sources giving microservices a bad reputation.
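A minimal sketch of the "only aggregation services call other services" rule described above, assuming async HTTP via httpx and hypothetical downstream service URLs:

```python
# Sketch of a backend-for-frontend aggregator: it is the only service allowed
# to call other services, and it calls them in parallel. URLs are hypothetical.
import asyncio
import httpx

DOWNSTREAM = {
    "profile": "http://profile-svc/api/profile/{user_id}",
    "orders": "http://orders-svc/api/orders/{user_id}",
}

async def admin_dashboard(user_id: str) -> dict:
    async with httpx.AsyncClient(timeout=2.0) as client:
        # Launch all downstream calls concurrently and aggregate the results.
        calls = {
            name: client.get(url.format(user_id=user_id))
            for name, url in DOWNSTREAM.items()
        }
        responses = await asyncio.gather(*calls.values(), return_exceptions=True)

    result = {}
    for name, resp in zip(calls.keys(), responses):
        # Degrade gracefully: a failed downstream call becomes a null field,
        # not a failed page.
        result[name] = None if isinstance(resp, Exception) else resp.json()
    return result

# asyncio.run(admin_dashboard("42"))
```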
I personally reach for it to outsource some problems by using off the shelf solutions. I don't want to reinvent the wheel. And if everyone else is doing it in a certain way I want to do it in the same way to try to stand on the shoulders of giants and not reinvent everything.
But that's probably the wrong approach then...
I've been running a SaaS for 10 years now. Initially on a single server, after a couple of years moved to a distributed database (RethinkDB) and a 3-server setup, not for "scalability" but to get redundancy and prevent data loss. Haven't felt a need for more servers yet. No microservices, no Kubernetes, no AWS, just plain bare-metal servers managed through ansible.
I guess things look different if you're using somebody else's money.
(Most distributed systems problems are solvable, but only if the person that architected the system knows what they're doing. If they know what they're doing, they won't over-distribute stuff.)
...despite the vast majority of latency issues being extremely low-hanging fruit, like "maybe don't have tens of megabytes of data required to do first paint on your website" or "hey maybe have an index in that database?".
It's just as much about storage and IO and memory and bandwidth.
Different types of sites have completely different resource profiles.
The teams don't talk, and always blame each other
and adds distributed systems and additional organizational problems:
Each team implements one half of dozens of bespoke network protocols, but they still don't talk, and still always blame each other. Also, now they have access to weaponizable uptime and latency metrics, since each team "owns" the server half of one network endpoint but not the client half.
Yes, but it's not difficult to do something silly without even noticing until too late. Implicitly (and unintentionally) calling something with the wrong big-O, for example.
That said, anyone know what's up with the slow deletion of Safari history? Clearly O(n), but as shown in this blog post still only deleted at a rate of 22 items in 10 seconds: https://benwheatley.github.io/blog/2025/06/19-15.56.44.html
On a non-scalable system you're going to notice that big-O problem and correct it quickly. On a scalable system you're not going to notice it until you get your AWS bill.
Of course, those people's weekly status reports would always be "we spent all week tracking down a dumb mistake, wrote one line of code and solved a scaling problem we'd hit at 100x our current scale".
That's equivalent to waving a "fire me" flag at the bean counters and any borderline engineering managers.
There's a lot of off the shelf microservices that can solve difficult problems for me. Like keycloak for user management. Isn't that a good reason?
Or Grafana for log visualization?
Should I build that into the monolith too? Or should I just skip it?
Not disagreeing that you can do a lot on a lot less than in the old days, but your story would be much more impactful with that information. :)
Another thing one has to consider is the market size and timeframe window of your SaaS. No sense in building for scalability if the business opportunity is only 100 customers and only for a few years.
The need to accommodate runaway scale (unbounded N and unbounded rate of growth of N) is actually quite rare.
Another perspective is that the de facto purpose of startups (and projects at random companies) may actually be work experience and rehearsal for the day the founders and friends get to interview at an actual FAANG.
I think the author's “dress for the job you want, not the job you have” nails it.
I was but a baby engineer then, and the leads would not countenance anything as pedestrian as MySQL/Postgres.
Anyway, fast forward a bit and we were tasked with building an in-house messaging service. And at that point Mongo's eventual consistency became a roaring problem. Users would get notifications that they had a new message, and then when they tried to read it it was... well... not yet consistent.
We ended up implementing all kinds of ugly UX hacks to work around this, but really we could've run the entire thing off of sqlite on a single box and users would've been able to read messages instantaneously, so...
I feel like that's kind of the other arm of this whole argument: on the one hand, you ain't gonna need that "scalable" thing. On the other hand, the "unscalable" thing scales waaaaaay higher than you are led to believe.
A single primary instance with a few read-only mirrors gets you a reaaaaaaally long way before you have to seriously think about doing something else.
Agreeing with you... Any reasonable database will scale pretty far if you put it on a machine with 160 cores and 3 TB of RAM. And that's just a single-socket board.
There's no reason to do anything other than get bigger machines until you're near or at the limits of single socket. Dual socket and cpu generations should cover you for long enough to move to something else if you need to. Sharding a traditional database works pretty well in a lot of cases, and it mostly feels like the regular database.
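For illustration, the "mostly feels like the regular database" kind of sharding can be as simple as a deterministic key-to-shard mapping in the application; a sketch with hypothetical connection strings:

```python
# Minimal application-level sharding sketch: route each customer to a fixed
# Postgres instance by hashing their id. DSNs and shard count are hypothetical.
import hashlib

SHARD_DSNS = [
    "postgresql://db-shard-0/app",
    "postgresql://db-shard-1/app",
    "postgresql://db-shard-2/app",
    "postgresql://db-shard-3/app",
]

def shard_for(customer_id: str) -> str:
    # Stable hash so the same customer always lands on the same shard.
    digest = hashlib.sha1(customer_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:4], "big") % len(SHARD_DSNS)
    return SHARD_DSNS[index]

# Each query runs against shard_for(customer_id); within a shard, everything
# still looks and behaves like an ordinary single Postgres database.
```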
The Postgres database for a company I worked for (that was very concerned about scaling when they interviewed me because their inefficient "nosql" solution was slow) ran very happily on a machine with 2 shared CPU cores and 4GB RAM.
Meanwhile all they needed was... frankly, probably SQLite, for their particular use case, having each client of theirs based around a single portable file actually would have been a big win for them. Their data for each client were tiny, like put-it-all-in-memory-on-an-RPi2 tiny. But no, "it's graphs so we need a graph database! Everything's graphs when you think about it, really! (So says Neo4j's marketing material, anyway)"
And yeah, there were a ton of those issues, but yolo.
I don't think I should dress down any further :>
I don't think that necessarily follows. Especially the language choice is almost impossible to change - look at Facebook, Dropbox, etc. Facebook ended up creating an entirely new language that only they use, because it was impossible to rewrite in another language.
Language choice (and probably database choice too) are essentially locked in from the start, and they do affect scaling.
Growing customers is probably harder, but I don't buy "do everything in hacky Bash scripts because you can fix it later". Nor do I think having solid foundations means you need to be less agile. Would Dropbox have been less successful if they wrote their backend in Typescript? I doubt it.
They're just trying to be cool, you see.
Here's the thing, though: Almost every choice that leads to scalability also leads to reliability. These two patterns are effectively interchangeable. Having your infra costs be "$100 per month" (a claim that usually comes with a massive disclaimer, as an aside) but then falling over for a day because your DB server crashed is a really, really bad place to be.
How is that supposed to happen, without k8s involved somehow?
Empirically, that does not seem to be the case. Large scalable systems also go offline for hours at a time. There are so many more potential points of failure due to the complexity.
And even with a single regular server, it's very easy to keep a live replica backup of the database and point to that if the main one goes down. Which is a common practice. That's not scaling, just redundancy.
Failures are astonishingly, vanishingly rare. Like it's amazing at this point how reliable almost every system is. There are a tiny number of failures at enormous scale operations (almost always due to network misconfigurations, FWIW), but in the grand scheme of things we've architected an outrageously reliable set of platforms.
>That's not scaling, just redundancy.
In practice it almost always is scaling. No one wants to pay for a whole extra server just to apply shipped logs to. I mean, the whole premise of this article is that you should get the most out of your spend, so in that case it's much better to run two hot servers. And once you have two hot... why not four, distributed across data centers. And so on.
You and I must be using different sites and different clouds.
There's a reason isitdownrightnow.com exists. And why HN'ers are always complaining about service status pages being hosted on the same services.
By your logic, AWS and Azure should fail once in a millennium, yet they regularly bring down large chunks of the internet.
Literally last week: https://cyberpress.org/microsoft-azure-faces-global-outage-i...
https://www.youtube.com/watch?v=b2F-DItXtZs
15 years ago people were making the same "chasing trends" complaints. Back then there absolutely were people cargo-culting, but people are still whining about this a decade and a half later, when it's quite literally just absolutely basic best practice.
Even if you do truly have a microservices architecture, you’ve also now introduced a great deal of complexity, and unless you have some extremely competent infra / SRE folk on staff, that’s going to bite you. I have seen this over and over and over again.
People make these choices because they don’t understand computing fundamentals, let alone distributed systems, but the Medium blogs and ChatGPT have assured them that they do.
But if it was just a monolith and had proper startup checks, when they roll out a new version and it fails, just kill it right there. Leave the old working version up.
Monoliths have their issues too. But doing microservices correctly is quite the job.
Yes, dealing with skew for every single change and hunting down bugs across network boundaries that could have been a function call is peak reliability.
Break your code into modules/components that have a defined interface between them. That interface only passes data - not code with behaviour - and signals that method calls may fail to complete (i.e. throw exceptions).
That is, the interface could become a network call in the future.
Allow easy swapping of interface implementations by passing them into constructors/ using factories or dependency injection frameworks if you must.
That's it - you can then start with everything in-process and the rapid development that allows, but if you need to you can add splitting into networked microservices - any complexity that arises from the network aspect is hidden behind the proxy, with the ultimate escape hatch of the exception.
Have I missed something?
Even so it's still very simple.
To scale your auth service you just write a proxy to a remote implementation and pass that in - any load balancing etc is hidden behind that same interface and none of the rest of the code cares.
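A minimal sketch of that pattern, with hypothetical class names and endpoints: one data-only interface, an in-process implementation, and a drop-in remote proxy injected through the constructor.

```python
# Sketch: one interface, two implementations. The caller never knows whether
# auth happens in-process or over the network. Names and endpoints are made up.
from typing import Protocol
import httpx

class AuthService(Protocol):
    def verify(self, token: str) -> bool:
        """May raise on failure -- callers must handle that."""

class LocalAuthService:
    def __init__(self, valid_tokens: set[str]):
        self._valid = valid_tokens

    def verify(self, token: str) -> bool:
        return token in self._valid

class RemoteAuthProxy:
    """Same interface, but the work happens on a remote auth service."""
    def __init__(self, base_url: str):
        self._base_url = base_url

    def verify(self, token: str) -> bool:
        # Timeouts, retries, load balancing all live here; callers only
        # ever see a bool or an exception.
        resp = httpx.post(f"{self._base_url}/verify", json={"token": token}, timeout=1.0)
        resp.raise_for_status()
        return resp.json()["valid"]

class App:
    def __init__(self, auth: AuthService):  # implementation injected, not chosen here
        self._auth = auth

# Start in-process, swap later without touching App:
# app = App(LocalAuthService({"secret"}))
# app = App(RemoteAuthProxy("http://auth.internal"))
```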
I like the idea of the remote implementation being proxied -- not sure I've come across that pattern before.
Also, most of these interfaces you'll likely never need. It's a cost of initial development, and the indirection is a cost on maintainability of your code. It's probably (although not certainly) cheaper to refactor to introduce interfaces as needed, rather than always anticipate a need that might never come.
I'm not suggesting that the distributed bit is still coupled behind the scenes ( ie via a data backend that requires distributed transactions ) - the interaction is through the interface.
In the end you are always going to have code calling code - the key point is to assume these key calls are simply data passing, not behaviour passing, and that they can fail.
What else is needed to make something network-friendly? (I'm suggesting that things like retries, load balancing etc. can be hidden as a detail in the network implementation - all you need to surface is success or failure.)
You get to have new problems that are qualitatively different from before, like timeouts, which can break the assumptions in the rest of your code about, say, whether state was updated or not, and in what order. You also then get to deal with thundering herds and circuit breakers and so on.
In terms of timing, the call is synchronous and either succeeds or fails - the details like timeouts and async behaviour under the hood are hidden by the proxy. In the end the call succeeds or fails, and if you surface that as a synchronous call you hide the underlying complexity from the caller.
A bit like opening a file and writing to it - most platform apis throw exceptions - and your code has to deal with it.
Quite a while ago, before containers were a thing at all, I did systems for some very large porn companies. They were doing streaming video at scale before most, and the only other people working on video at that scale were Youtube.
The general setup for the largest players in that space was haproxy in front of nginx in front of several PHP servers in front of a MySQL database that had one primary r/w with one read only replica. Storage (at that time) was usually done with glusterfs. This was scalable enough at the time for hundreds of thousands of concurrent users, though the video quality was quite a bit lower than what people expect today.
Today at AWS, it is easily possible for people to spend a multiple of the cost of that hardware setup every month for far less compute power and storage.
The only problem is that there is a lot of video data.
I think most people don't realise that "10 million" records is small, for a computer.
(That said, I have had to deal with code that included an O(n^2) de-duplication where the test data had n ~= 20,000, causing app startup to take 20 minutes; the other developer insisted there was no possible way to speed this up, later that day I found the problem, asked the CTO if there was a business reason for that de-duplication, removed the de-duplication, and the following morning's stand-up was "you know that 20 minute startup you said couldn't possibly be sped up? Yeah, well, I sped it up and now it takes 200ms")
Also, it was overwhelmingly likely that none of the elements were duplicates in the first place, and the few exceptions were probably exactly one duplicate.
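(In the story above the right fix was removing the de-duplication entirely, but for reference, here is the generic shape of the accidental-quadratic pattern and its usual hash-based fix; the record fields are made up.)

```python
# Sketch of the usual fix for accidentally-quadratic de-duplication:
# replace "compare every record with every other record" with one pass
# over a set of keys. The key fields are invented for illustration.

def dedupe_quadratic(records):          # O(n^2): ~400M comparisons at n = 20,000
    unique = []
    for r in records:
        if not any(r == u for u in unique):
            unique.append(r)
    return unique

def dedupe_linear(records):             # O(n): one hash lookup per record
    seen, unique = set(), []
    for r in records:
        key = (r["id"], r["timestamp"])  # whatever actually defines "duplicate"
        if key not in seen:
            seen.add(key)
            unique.append(r)
    return unique
```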
Most engineers that I've worked with that die on a premature optimization molehill like you describe also make that molehill as complicated as possible. Replacing the inside of the nested loop with a hashtable probe certainly fits the stereotype.
Fair.
To set the scene a bit: the other developer at this point was arrogant, not at all up to date with even the developments of his preferred language, did not listen to or take advice from anyone.
I think a full quarter of my time there was just fire-fighting yet another weird thing he'd done.
> If it was absolutely necessary to get this 1MB dataset to be smaller
It was not, which is why my conversation with the CTO to check on if it was still needed was approximately one or two sentences from each of us. It's possible this might have been important on a previous pivot of the thing, at least one platform shift before I got there, but not when I got to it.
Like, I honestly have trouble listing many business problems/areas that would fail to scale with their expected user count, given reasonable hardware and technical competence.
Like YouTube and Facebook are absolute outliers. Famously, stackoverflow used to run on a single beefy machine (and the reason they changed their architecture was not due to scaling issues), and "your" startup ain't needing more scale than SO.
Maintaining the media lifecycle - receiving, transcoding, making it available, and removing it - is the big task, but that's not real-time; it's batch/event processing on a best-effort basis.
The biggest challenge with streaming is maintaining the content catalogue, which isn't just a few million records but rich metadata about the lifecycle and content relationships. Then user management and payments tend to also have a significant overhead, especially when you're talking about international payment processing.
I think people have a warped perception of performance, if only because the cloud providers are serving up a shared VM on equipment I'd practically class as vintage computing. You could throw some of the same parts together from eBay and buy the whole system with less than a few months worth of the hourly on-demand cost.
A common story is that since day one you just have lightweight app servers handling http requests doing 99% I/O. And your app servers can be deployed on a cheap box anywhere since they're just doing I/O. Maybe they're on Google Cloud Run or a small cluster of $5 VPS. You've built them so that they have zero deps on the machine they're running on.
But then one day you need to do some sort of computations.
One incremental option is to create a worker that can sit on a machine that can crunch the tasks and a pipeline to feed it. This can be seen as operationally complex compared to one machine, but it's also simple in other ways.
Another option is to do everything on one beefy server where your app servers just shell out the work on the same machine. This can be operationally simple in some ways, but not necessarily in all ways.
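A sketch of that second option, assuming the computation is a CPU-bound Python function: the request handler stays thin and hands heavy work to a local process pool on the same machine.

```python
# Sketch: keep the computation on the same beefy box by handing CPU-bound
# work to a local process pool. crunch() is a stand-in for the real work.
from concurrent.futures import ProcessPoolExecutor

def crunch(payload: bytes) -> int:
    # Placeholder for the expensive computation.
    return sum(payload)

def handle_request(pool: ProcessPoolExecutor, payload: bytes) -> int:
    # The request handler stays I/O-light; heavy lifting runs in another process.
    return pool.submit(crunch, payload).result(timeout=30)

if __name__ == "__main__":
    with ProcessPoolExecutor(max_workers=8) as pool:  # size to the machine's cores
        print(handle_request(pool, b"example payload"))
```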
I used to run a webmail system with 2m accounts on hardware with less total capacity (ram, disk, CPU throughput) than my laptop...
What's more: It was a CGI (so new process for every request), and the storage backend spawned separate processes per user.
What's the bandwidth and where can I rent one of these??
1: https://www.hetzner.com/dedicated-rootserver/matrix-ex
2: https://docs.hetzner.com/robot/dedicated-server/network/10g-...
Their prices have come down a lot. I used them when the servers still cost $200 a piece, but their support at the time was fantastic.
This whole thread was a response to
> Today at AWS, it is easily possible for people to spend a multiple of the cost of that hardware setup every month for far less compute power and storage.
suggesting to use a few beefy servers but if we are renting them from cloud we're back where we started.
If you want more control than that, colo is also pretty cheap [1]. But I'd consider that a step above what 95% of people need
https://www.hetzner.com/colocation
also pretty sure 24 cores is like 48 cloud “cores” which are usually just hyper threads right?
I laughed. I cried. Having a back full of microservices scars, I can attest that everything said here is true. Just build an effin monolith and get it done.
The turning point might have been Heroku? Prior to Heroku, I think people just assumed you deploy to a VPS. Heroku taught people to stop thinking about the production environment so much.
I think people were so inspired by it that they wanted to mimic it for other languages. It got more people curious about AWS.
Ironically, while the point of Heroku was to make deployment easy and done with a single command, the modern deployment story on cloud infrastructure is so complicated most teams need to hold a one hour meeting with several developers "hands on deck" and going through a very manual process.
So it might seem counter intuitive to suggest that the trend was started by Heroku, because the result is the exact opposite of the inspiration.
- scaling vertically is cheaper to develop
- scaling horizontally gets you further.
What is correct for your situation depends on your human, financial and time resources.
210 more comments available on Hacker News