We cut our MongoDB costs by 90% by moving to Hetzner
Mood: supportive
Sentiment: positive
Category: tech
Key topics: MongoDB, Hetzner, Cost Optimization, Cloud Infrastructure
A company reduced their MongoDB costs by 90% by migrating to Hetzner, a more cost-effective infrastructure provider.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 21m after posting
Peak period: 156 comments (Day 1)
Average per period: 40 comments
Based on 160 loaded comments
Key moments
- 01 Story posted: 11/13/2025, 3:17:57 PM (5d ago)
- 02 First comment: 11/13/2025, 3:39:11 PM (21m after posting)
- 03 Peak activity: 156 comments in Day 1 (the hottest window of the conversation)
- 04 Latest activity: 11/18/2025, 7:11:21 AM (1d ago)
Just look at the recent multi-hour Azure outage, during which Microsoft could not even get microsoft.com back up. With that much downtime you could physically move drives between servers multiple times a year and still have less downtime. Servers are very reliable; cloud software is not.
I'm not saying people should use a single server if they can avoid it, but using a single cloud provider is just as bad. "We moved to the cloud, with managed services and redundancy, nothing has gone wrong...today"
Business people are weird about numbers. You should have claimed 70% even if the replicas do nothing and made them work later on. This is highly likely to bite you on the ass.
In technical terms, you need to plan ahead. Legacy mistakes are caused by decisions made in the past, and they will likely be made again if you can't change your strategy or approach to problems. You won't get budget for this AFTER you've successfully made a change. "It's all solved now, we're good." No.
I mean, you're connecting to your primary database potentially on another continent? I imagine your costs will be high, but even worse, your performance will be abysmal.
> When you migrate to a self-hosted solution, you're taking on more responsibility for managing your database. You need to make sure it is secure, backed up, monitored, and can be recreated in case of failure or the need for extra servers arises.
> ...for a small amount of pain you can save a lot of money!
I wouldn't call any of that "a small amount of pain." To save $3,000/month you've now required yourselves to become experts in a domain that may be out of your depth. So whatever cost you saved is now tech debt, plus potentially the cost of hiring someone else to manage your homemade solution for you.
That said, I self-host and applaud other self-hosters. But sometimes it really has to make business sense for your team.
Atlas AWS was actually set up in Ireland. The data transfer costs were coming from extracting data for ML modelling. We don't get charged for extracting data under the new contract.
> experts in a domain that maybe is out of your depth
We're in the bot detection space so we need to be able to run our own infra in order to inspect connections for patterns of abuse. We've built up a fair amount of knowledge because of this and we're lucky enough to have a guy in our team who just understands everything related to computers. He's also pretty good at disseminating information.
Thanks for reading!
Having said that, MongoDB's pricing page promises 99.995% uptime (roughly 26 minutes of downtime per year), which is outstanding and would probably be hard to beat doing it yourself, even after adding redundancy. But maybe you don't need that much uptime for your particular use case.
> maybe you don't need that much uptime for your particular use case.
Correct. Thanks for reading!
Also, we noticed that after migration, the databases that occupied ~600GB of disk in our (very old) on-premise deployment were around 1TB on Atlas. After talking with support for a while we found that they were using Snappy compression at a relatively low compression level and we couldn't change that ourselves. After requesting it through support, we switched to zstd compression, rebuilt all the storage, and a day or two later our storage was under 500GB.
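For anyone self-hosting who wants the same win, here is a minimal sketch of the per-collection override via PyMongo. The connection string and names are hypothetical, and existing data only shrinks once it is rewritten (e.g. via an initial sync or a dump/restore), which matches the "rebuilt all the storage" step above.

```python
from pymongo import MongoClient

# Hypothetical connection string and names, for illustration only.
client = MongoClient("mongodb://localhost:27017")
db = client["analytics"]

# Per-collection override: have WiredTiger use zstd instead of the default snappy.
# (The server-wide default can instead be set in mongod.conf under
#  storage.wiredTiger.collectionConfig.blockCompressor: zstd)
db.create_collection(
    "events",
    storageEngine={"wiredTiger": {"configString": "block_compressor=zstd"}},
)
```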
And backup pricing is super opaque. The docs don't show concrete prices, just ranges. And depending on which cloud you deployed to, snapshots are priced differently, so you can't just multiply your storage by the number of snapshots, and they aren't transparent about the real size of the snapshots.
All the storage stuff is messy and expensive...
Or.. what? That's the important part
In fact, after looking at https://www.mongodb.com/legal/sla/atlas/data-federation#:~:t... it makes me wonder how much the SLA is worth. A 10% service credit, after all the limitations?
Atlas can keep their 10% service credit, I wouldn't care. Save the money and choose a stable provider.
That was an interesting surprise.
This makes me want to use the company's service less because now I know they can't survive an outage in a consistent and resilient way.
We have extra provisions for enterprise clients to provide rock solid SLAs for every use case.
The DB in question is our data store for events, which we use for aggregated features such as traffic analysis and ML. This service lags behind our realtime services, so we can deal with some downtime if necessary.
In addition to direct costs, Atlas also had expensive limitations. For example, we often spin up clone databases from a snapshot that have lower performance and no durability requirements, so a smaller non-replicated server suffices, but Atlas required those to be sized like the replicated, high-performance production cluster.
You could cut your MongoDB costs by 100% by not using it ;)
> without sacrificing performance or reliability.
You're using a single server in a single datacenter. MongoDB Atlas is deployed to VMs on 2-3 AZs. You don't have close to the same reliability. (I'm also curious why their M40 instance costs $1000, when the Pricing Calculator (https://www.mongodb.com/pricing) says M40 is $760/month? Was it the extra storage?)
> We're building Prosopo to be resilient to outages, such as the recent massive AWS outage, so we use many different cloud providers
This means you're going to have multiple outages, AND incur more cross-internet costs. How does going to Hetzner make you more resilient to outages? You have one server in one datacenter. Intelligent, robust design at one provider (like AWS) is way more resilient, and intra-zone transfer is cheaper than going out of the cloud ($0.02/GB vs $0.08/GB). You do not have a centralized, single-point-of-failure design with AWS. They're not dummies; plenty of their services are operated independently per region. But they do expect you to use their infrastructure intelligently to avoid creating a single point of failure. (For example, during the AWS outage my company was in us-east-1 and we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.)
I get it; these "we cut bare costs by moving away from the cloud" posts are catnip for HN. But they usually don't make sense. There are only a few circumstances, where you really have to transfer out a lot of traffic or need very large storage, in which cloud pricing is just too much of a premium. The whole point of using the cloud is to use it as a competitive advantage. Giving yourself an extra role (sysadmin) in addition to your day job (developer, data scientist, etc.) and more maintenance tasks (installing, upgrading, patching, troubleshooting, being on-call, etc.), with lower reliability and fewer services, isn't an advantage.
> AND incur more cross-internet costs

Hetzner has no bandwidth/traffic limit on the machine (only a speed limit), so we can go nuts.
I understand your point wrt the cloud, but I spend as much time debugging/building a cloud deployment (Atlas :eyes:) as I do a self-hosted solution. AWS gives you all the tools to build a super reliable data store, but many people just chuck something on us-east-1 and go. There's your single point of failure.
Given we're constructing a many-node decentralised system, self-hosted actually makes more sense for us because we've already had to become familiar enough to create a many-node system for our primary product.
When/if we have a situation where we need high data availability I would strongly consider the cloud, but in the situations where you can deal with a bit of downtime you're massively saving over cloud offerings.
We'll post a 6-month and 1-year follow-up to update the scoreboard above
Even dropping something on a single EC2 node in us-east-1 (or at Google Cloud) is going to be more reliable over time than a single dedicated machine elsewhere. This is because they run with a layer that will e.g. live migrate your running apps in case of hardware failures.
The failure modes of dedicated are quite different than those of the modern hyperscaler clouds.
On the other hand, a Hetzner machine I just rented came with Linux software RAID enabled (md devices in the kernel)
---
I'm not aware of any comparisons, but I'd like to see some
It's not straightforward, and it's not obvious the cloud is more reliable
The cloud introduces many other single points of failure, by virtue of being more complex
e.g. human administration failure, with the Unisuper incident
https://news.ycombinator.com/item?id=40366867
https://arstechnica.com/gadgets/2024/05/google-cloud-acciden... - “Unprecedented” Google Cloud event wipes out customer account and its backups
Of course, dedicated hardware could have a similar type of failure, but I think the simplicity means there is less variety in the errors.
e.g. A distributed system is one in which the failure of a computer you didn't even know existed can render your own computer unusable - Leslie Lamport
I just wish there was a way to underscore this more and more. Complex systems fail in complex ways. Sadly, for many programmers, the thrill or ego boost that comes with solving/managing complex problems lets us believe complex is better than simple.
It's kept me employed though...
In a way, I think it doesn't matter what you use as long as you diversify enough (and have lots of backups). Everything can fail, and often the probability of failure doesn't even matter that much, since any single failure can be one too many.
The typical failure mode of AWS is much better. Half the internet is down, so you just point at that, wait for everything to come back, and your instances keep running. If you have one server, you have to do the troubleshooting and recovery work yourself. And if you self-host, you need to run more than one machine and will still end up with fewer nines of reliability.
A couple pieces of gentle pushback here:
- if you choose a hyperscaler, you should use their (often one-click) geographic redundancy & failover.
- All of the hyperscalers have more than one AZ. Specifically, there's no reason for any AWS customer to locate all/any* of their resources in us-east-1. (I actively recommend against this.)
* - Except for the small number of services only available in us-east-1, obviously.
Let's host it all with 2 companies instead and see how it goes.
Anyway, random things you will encounter: Azure doesn't work because Front Door has issues (again, and again). A web app in Azure just randomly stops working; it's not live-migrated by any means, and restarts don't work. Okay, let's change the SKU, change it back, oop, it's on a different bare-metal cluster and now it works again. Sure, there'll be some setup (read: upsell) that'll prevent such failures from reaching customers, but there is simply no magic to any of this.
I really wish people would stop dreaming up reasons that hyperscalers are somehow magical places where issues don't happen and everything is perfect if you just increase the complexity a little bit more the next time around.
Going into any depth with mongo mostly taught me to just stick with postgres.
In the meantime, I am curious where the time was spent debugging and building Atlas deployments? It certainly isn't the cheapest option, but it has been quite a "1-click" solution for us.
I think it was just luck of the draw that the failure happened in this way and not some other way. Even if APIs falling over while EC2 instances stay up is a slightly more likely failure mode, it means you can't run autoscaling and can't depend on spot instances, which you can lose in an outage and then can't replace.
Yes, this is part of designing for reliability. If you use spot or autoscaling, you can't assume you will have high availability in those components. They're optimizations, like a cache. A cache can disappear, and this can have a destabilizing effect on your architecture if you don't plan for it.
This lack of planning is pretty common, unfortunately. Whether it's in a software component or system architecture, people often use a thing without understanding the implications of it. Then when AWS API calls become unavailable, half the internet falls over... because nobody planned for "what happens when the control plane disappears". (This is actually a critical safety consideration in other systems)
We still take some steps to mitigate control plane issues in what I consider a reasonable AWS setup (attempt to lock ASGs to prevent scale-down) but I place the control plane disappearing on the same level as the entire region going dark, and just run multi-region.
If traffic cost is relevant (which it is for a lot of use cases), Hetzner's price of $1.20/TB ($0.0012 / GB) for internet traffic [1] is an order of magnitude less than what AWS charges between AWS locations in the same metro. If you host only at providers with reasonable bandwidth charges, most likely all of your bandwidth will be billed at less than what AWS charges for inter-zone traffic. That's obscene. As far as I can tell, clouds are balancing their budgets on the back of traffic charges, but nothing else feels under cost either.
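For a sense of scale, here is a tiny back-of-the-envelope sketch in Python using the per-GB rates quoted in this thread; the 50 TB/month workload is a made-up example, not anyone's real bill.

```python
# Back-of-the-envelope only; per-GB rates are the ones quoted in this thread.
RATES_PER_GB = {
    "Hetzner internet egress": 0.0012,  # $1.20/TB
    "AWS inter-AZ transfer": 0.02,
    "AWS internet egress": 0.08,
}

monthly_gb = 50 * 1000  # hypothetical 50 TB/month workload

for name, rate in RATES_PER_GB.items():
    print(f"{name}: ${monthly_gb * rate:,.2f}/month")
# -> roughly $60 at Hetzner vs ~$1,000 inter-AZ or ~$4,000 egress at AWS for the same 50 TB
```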
> For example, during the AWS outage, my company was in us-east-1, and we never had any issues, because we didn't depend on calling AWS APIs to continue operating. Things already running continue to run.
This doesn't always work out. During the GCP outage, my service was running fine, but other similar services were having trouble, so we attracted more usage, which we would have scaled up for, except that the GCP outage prevented that. The cloud makes it very expensive to run scaled beyond current needs and promises that scale-out will be available just in time...
This is a common problem with "bare metal saved us $000/mo" articles. Bare metal is cheaper than cloud by any measure, but the comparisons given tend to be misleadingly exaggerated, as they don't compare like-for-like in terms of redundancy and support; after accounting for those factors the result can be much closer (sometimes close enough that familiarity and personal preference become the more significant factors).
Of course unless you are paying extra for multi-region redundancy things like the recent us-east-1 outage will kill you, and that single point of failure might not really matter if there are several others throughout your systems anyway, as is sometimes the case.
If I'm storing data on a NAS, and I keep backups on a tape, a simple hardware failure that causes zero downtime on S3 might take what, hours to recover? Days?
If my database server dies and I need to boot a new one, how long will that take? If I'm on RDS, maybe five minutes. If it's bare metal and I need to install software and load my data into it, perhaps an hour or more.
Being able to recover from failure isn't a premature optimization. "The site is down and customers are angry" is an inevitability. If you can't handle failure modes in a timely manner, you aren't handling failure modes. That's not an optimization, that's table stakes.
It's not about five nines, it's about four nines or even three nines.
Backups are point in time snapshots of data, often created daily and sometimes stored on tape.
Its primary use case is giving admins the ability to e.g. restore partial data via export and similar. It can theoretically also be used to restore after a full data loss, but that's beyond rare. Almost no company has had that issue.
This is generally not what's used in high availability contexts. Usually, companies have at least one replica DB which is in read only and only needs to be "activated" in case of crashes or other disasters.
With that setup you're already able to hit 5 nines, especially in the context of B2E companies, which usually deduct scheduled downtime via the SLA.
This is "five nines every year except that one year we had two freak hardware failures at the same time and the site was hard down for eighteen hours".
"Almost no company has this problem" well I must be one incredibly unlucky guy, because I've seen incidents of this shape at almost every company I've worked at.
And it isn't just about 9s of uptime; it's all the admin that goes with DR if something more terrible than a network outage does happen, and other infrastructure conveniences. For instance: I sometimes balk at the performance we get out of AzureSQL given what we pay for it, and in my own time you can safely bet I'll use something else on bare metal. But while DayJob is paying the hosting costs, I love the platform handling the backup regime, that I can do copies or point-in-time restores for issue reproduction and such at the click of a button (plus a bit of a wait), that I can spin up a fresh DB and populate it without worrying overly about space issues, etc.
I'm a big fan of managing your own bare metal. I just find a lot of other fans of bare metal to be more than a bit disingenuous when extolling its virtues, including cost-effectiveness.
Unfortunately it's not guaranteed that paying for multi-region replication will save you.
I cut my Mongo DB costs by 100% by piping my data to /dev/null.
Could we have done better with more sensible configs? Was it silly to cluster ES cross-AZ? Maybe. Point is that if you don't police every single detail of your platform at AWS/GCP and the like, their made-up charges will bleed your startup and grease their stock price.
As others have said, unless the scale of the data is the issue, if you're switching because of cost, perhaps you should be going back to your business model instead.
Set up MongoDB (or any database) so that you have geographically distributed nodes with replication plus whatever else, and maintain the same SLA as one of the big hyperscalers. Blog about how long it took to set up, how hard it is to maintain, and how much the ongoing costs are.
My hunch is that a setup on the scale of the median SaaS company is far simpler and more cost-effective than you'd think.
There's no way I couldn't spin up my infra within a full day, even if the current datacenter burned to the ground.
So we have the same reliability.
I've not actually seen an AZ go down in isolation, so while I agree it's technically a less "robust" deployment, in practice it's not that much of a difference.
> these "we cut bare costs by moving away from the cloud" posts are catnip for HN. But they usually don't make sense.
We moved away from Atlas because they couldn't cope with the data growth that we had (4TB is the max per DB). Turns out it's a fuckload cheaper even hosting on Amazon (as in 50%). We haven't moved to Hetzner because that would be more effort than we really want to expend, but it's totally doable, with not that much extra work.
> more maintenance tasks (installing, upgrading, patching, troubleshooting, getting on-call, etc) with lower reliability and fewer services, isn't an advantage.
Depends, right? Firstly, it's not that much of an overhead, and if it saves you significant cash, then it extends your runway.
Counterpoint: I have. Maybe not completely down, but degraded, or out of capacity on an instance type, or some other silly issue that caused an AZ drain. It happens.
Naïve. If the network infrastructure is down, your computer is effectively down too; it just happens that the functionality that went down wasn't something you relied on. By that logic, you could avoid relying on any functions at all by turning the server off.
Came here to say exactly this
It doesn't have to be only one server in one datacenter though.
It's more work, but you can have replicas ready to go at other Hetzner DCs (they offer bare metal at 3 locations in 2 different countries) or at other cheaper providers like OVH. Two or three $160 servers is still cheaper than what they're paying right now.
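A rough sketch of what that could look like as a MongoDB replica set spread across locations, via PyMongo. The hostnames are hypothetical, and each mongod is assumed to already be running with --replSet rs0.

```python
from pymongo import MongoClient

# Hypothetical hostnames for three Hetzner locations (Falkenstein, Nuremberg, Helsinki).
seed = MongoClient("mongodb://db-fsn1.example.com:27017", directConnection=True)

seed.admin.command("replSetInitiate", {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "db-fsn1.example.com:27017", "priority": 2},  # preferred primary
        {"_id": 1, "host": "db-nbg1.example.com:27017", "priority": 1},
        {"_id": 2, "host": "db-hel1.example.com:27017", "priority": 1},
    ],
})

# Applications connect with a multi-host URI and get automatic failover:
app = MongoClient(
    "mongodb://db-fsn1.example.com,db-nbg1.example.com,db-hel1.example.com/?replicaSet=rs0"
)
```

If the primary's datacenter goes dark, the remaining members elect a new primary and the driver follows it automatically.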
Depends on the service and its complexity. More complexity means more outages. In most instances a focus on easy recoverability is more productive than preemptive "reliability". As I have said, depends on your service.
And prices turn premium very fast if you have either a lot of traffic, or low traffic but larger file interchange. And you have more work to do if you use the cloud, because it uses non-standard interfaces. Today a well-maintained server is a few clicks away. Even with managed servers you still have maintenance and configuration. Plus, your provider probably changes the service quite often. I had to keep accommodating Beanstalk while my application was just running on its own, free of maintenance needs.
What often makes fighting said outages harder is that the providers themselves just don't admit to anything being wrong: everything's green on the dashboard, yet 4 out of 5 requests are timing out.
They are known to just cancel accounts and cut access.
I had paid for advertising on a few game curation sites plus YouTubers and streamers. A lovely failure, all thanks to Hetzner. It took 3 days and numerous emails with the most arrogant Germans you've ever met before my account was unlocked.
I switched to OVH and while they’re not without their own faults (reliability is a big one), it’s been a far better experience.
It seems like you have to go to one of the big boys like Hurricane Electric, where you are allowed to use the bandwidth you paid for without someone sticking their fingers in it.
The old one for dedicated servers (Robot) is horribly outdated though.
We need an American “get off American big tech” movement.
Differentiate, people! Reading "we moved from X to Y" does not mean everyone should move from X to Y; it means start considering what Y offers and research the other Y's around you.
Yes. Hetzner is a German company from Gunzenhausen.
As a non-American, I use Hetzner precisely to have my projects not hosted anywhere near the US.
You set up your server. Harden it. Follow all the best practices for your firewall with ufw. Then you run a Docker container. Accidentally, or simply because you don’t know any better, you bind it to 0.0.0.0 by doing 5432:5432. Oops. Docker just walked right past your firewall rules, ignored ufw, and now port 5432 is exposed with default Postgres credentials. Congratulations. Say hello to Kinsing.
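A minimal sketch of the safer binding, here via the Python Docker SDK (image, container name, and credentials are placeholders); the same idea applies to a plain `docker run -p 127.0.0.1:5432:5432 ...`.

```python
import docker

client = docker.from_env()

# Publishing as ("127.0.0.1", 5432) keeps Postgres on the loopback interface.
# A bare "5432:5432"-style mapping binds 0.0.0.0, and Docker's own iptables
# rules sidestep ufw, exposing the port to the internet.
client.containers.run(
    "postgres:16",
    name="pg",
    detach=True,
    ports={"5432/tcp": ("127.0.0.1", 5432)},
    environment={"POSTGRES_PASSWORD": "change-me"},  # placeholder credential
)
```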
And this is just one of many possible scenarios like that. I’m not trying to spread FUD, but this really needs to be stressed much more clearly.
EDIT: as always, thank you HN for downvoting instead of actually addressing the argument.
Clearly defining your boundaries is important for both internal and external vectors of attack.
> when there's even mongo-api compatible Postgres solutions
With their own drawbacks.
Like a file system?
- schema-less: we don't have to think about DDL statements at any point.
- oplog and change streams as built-in change data capture.
- it's dead simple to set up a whole new cluster (replica set).
- IMO you don't need a designated DBA to manage tens of replica sets.
- Query language is rather low-level and that makes performance choices explicit.
But I have to admit that our requirements and architecture play to the strength of mongodb. Our domain model is neatly described in a strongly typed language. And we use a sort of event sourcing.
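As an illustration of the built-in change data capture mentioned in the list above, here is a minimal change-stream sketch in PyMongo. Database and collection names are hypothetical, and change streams require a replica set.

```python
from pymongo import MongoClient

# Change streams require a replica set; names here are hypothetical.
client = MongoClient("mongodb://localhost:27017/?replicaSet=rs0")
events = client["analytics"]["events"]

# Each change document describes one insert/update/delete, read off the oplog.
with events.watch(full_document="updateLookup") as stream:
    for change in stream:
        print(change["operationType"], change.get("fullDocument"))
```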
The only thing I really don't care for is managing Mongo... as a developer, using it is pretty joyous assuming you can get into the query mindset of how to use it.
Also, if you're considering Mongo, you might also want to consider looking at Cassandra/ScyllaDB or CockroachDB as alternatives that might be a better fit for your needs that are IMO, easier to administer.
Postgres's distributed story is more complicated.
I'm using it on a project, not by choice. It was already chosen when I joined the project, and the more we develop it, the more I feel Postgres would be a better fit, but I don't think we can change it now.
To be quite honest, today's software engineering is sadly mostly about addressing "how complex can we go" rather than "what problem are we trying to solve".
I think it's just more complicated than that. No hostage situation, just good old incentives.
Thing is, Linode was great 10-15 years ago, then enshittification ensued (starting with Akamai buying them).
So what does enshittification for Hetzner look like? I've already got migration scripts pointed at their servers but can't wait for the eventual letdown.
The pain points come when you're also entwined with specific implementations of services from a given provider... Sure, you can shift from PostgreSQL on one hosted provider to another without much pain... but moving from, say, SQS to Azure Queue Storage or Service Bus is a lot more involved. And that is just one example.
That is a large reason to keep your services to those with self-hosted options and/or to self-host from the start... that said, I'm happy to outsource things that are easier to (re)integrate or replace.
I love Hetzner for what they offer but you will run into huge outages pretty soon. At least you need two different network zones on Hetzner and three servers.
It's not hard to set up, but you need to do it.
Risk management is a normal part of business - every business does it. Typically the risk is not brought down all the way to zero, but to an acceptable level. The milk truck may crash and the grocery store will be out of milk that day - they don't send three trucks and use a quorum.
If you want to guarantee above-normal uptime, feel free, but it costs you. Google has servers failing every day just because they have so many, but you are not Google and you most likely won't experience a hardware failure for years. You should have a backup because data loss is permanent, but you might not need redundancy for your online systems. Depending on what your business does.
No need for SREs. Just add 2 more Hetzner servers.
from the "Serverborse": i7-7700 with 64GB ram and 500G disk.
37.5 euros/month
This is ~8 vcpus + 64GB ram + 512G disk.
585 USD/month
It gets a lot worse if you include any non-negligible internet traffic. How many machines does a company need before a team of SREs is worth it? I think that number has actually dropped to 100.
"Run a script to deploy new node and load last backup" can be enough, but then you have to plan on what to tell customers when last few hours of their data is gone
41 more comments available on Hacker News