MongoBleed Explained Simply
Key topics
The MongoBleed vulnerability has sparked a lively debate about MongoDB's security and usage practices. Commenters point out that a staggering 213K MongoDB instances are exposed to the internet, according to a Shodan scan, and that the default installation settings often leave them vulnerable, with authentication disabled and binding to all interfaces. Many users attribute MongoDB's popularity to its flexibility and ease of use, but also to a perceived "laziness" in not having to define a schema or worry about persistence and durability. Interestingly, some commenters note that other databases, like Postgres, have also been exposed publicly without proper security measures, suggesting that the issue is not unique to MongoDB.
Snapshot generated from the HN discussion
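The insecure defaults called out above (authentication disabled, listener bound to all interfaces) map to two settings in mongod.conf. A minimal sketch of a locked-down configuration, using MongoDB's YAML config option names; the values shown are illustrative, not a complete hardening guide:

```yaml
# mongod.conf: bind only to loopback and require authentication
net:
  bindIp: 127.0.0.1        # do not expose the listener on all interfaces
  port: 27017
security:
  authorization: enabled   # clients must authenticate before reading data
```

Anything that must reach the database from another host should go through an internal network or tunnel rather than a public bind address.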
Discussion Activity
Very active discussion
- First comment: 45m after posting
- Peak period: 33 comments in 0-6h
- Avg / period: 14.3
Based on 114 loaded comments
Key moments
- Story posted: Dec 28, 2025 at 4:03 PM EST (4d ago)
- First comment: Dec 28, 2025 at 4:48 PM EST (45m after posting)
- Peak activity: 33 comments in 0-6h (the hottest window of the conversation)
- Latest activity: Dec 31, 2025 at 2:34 PM EST (2d ago)
That sort of inclination to push off doing the right thing now, saving yourself a headache in the moment, probably overlaps with "let's just make the db publicly exposed" instead of doing the work of setting up an internal network.
But now we can at least rest assured that the important data in MongoDB is just very hard to read, thanks to the lack of schemas.
Probably all of that nasty "schema" work and tech debt will finally be done by hackers trying to make use of that information.
It’s probably better to check what you’re working on than blindly assuming this thing you’ve gotten from somewhere is the right shape anyway.
I honestly feel the opposite, at least if you're the only consumer of the data. I'd never go out of my way to use a dynamically typed language, and since I already have to do something to get the data into my own language's types, it doesn't make a huge difference to me what format it used to be in. When there are a variety of clients being used, though, this logic might not apply.
I never said MongoDB was wrong in that post, I just said it accumulated tech debt.
Let's stop feeling attacked over the negatives of tradeoffs.
In any case, you quite literally said there was a "lack of schemas", and I disagreed with that characterization. I certainly didn't feel attacked by it; I just didn't think it was the most accurate way to view things from a technical perspective.
I suspect that this is in part due to historical inertia and exposure to SecDB designs.[0] Financial instruments can be hideously complex and they certainly are ever-evolving, so I can imagine a fixed schema for an essentially constantly shifting time-series universe would be challenging. When financial institutions began to adopt the SecDB model, MongoDB was available as a high-volume, "schemaless" KV store with a reasonably good scaling story.
Combine that with the relatively incestuous nature of finance (firms tend to poach and hire from within their own ranks) and an average engineer tenure of less than four years at one organisation, and you have an osmotic process of spreading "this at least works in this type of environment" knowledge. Add the naturally risk-averse nature of finance[ß] and you can see how one successful early adoption would quickly proliferate across the industry.
0: This was discussed at HN back in the day too: https://calpaterson.com/bank-python.html
ß: For an industry that loves to take financial risks - with other people's money of course, they're not stupid - the players in high finance are remarkably risk-averse when it comes to technology choices. Experimentation with something new and unknown carries a potentially unbounded downside with limited, slowly emerging upside.
Which is such a cop-out, because there is always a schema. The only questions are whether it is designed and documented, and where it is enforced. Mongo requires some very explicit schema decisions, otherwise performance will quickly degrade.
Kleppmann uses "schema-on-read" vs "schema-on-write" for the same concept, which I find harder to grasp mentally, but it describes when schema validation needs to occur.
* Don't worry about a schema.
* Don't worry about persistence or durability.
* Don't worry about reads or writes.
* Don't worry about connectivity.
This is basically the entire philosophy, so it's not surprising at all that users would also not worry about security.
https://www.shodan.io/search?query=mongodb
https://www.shodan.io/search?query=mysql
https://www.shodan.io/search?query=postgresql
But this must be proportional to the overall popularity.
You very often have both NoSQL and SQL at scale.
NoSQL is used for high availability of data at scale - iMessage famously uses it for message threads, EA famously uses it for gaming matchmaking.
What you do is have both SQL and NoSQL. The NoSQL is basically caches of resources for high availability. Imagine you are making a social media app... Yes of course you have a SQL database that stores all the data, but you maintain API caches of posts in NoSQL.
Why? This gets to some of your other black-vs-white insults: NoSQL is typically WAY FASTER than SQL. That's why you use it. It's way faster to read a JSON file from a hard drive than to query a SQL database, always has been. So why not use NoSQL for EVERYTHING? Well, because you end up with duplicated data everywhere since it's not relational; it's all just giant caches, essentially. You'll also get slow queries when the documents get huge.
Anyway you need both. It's not an either/or thing. I cannot believe this many years later people do not know the purpose of SQL and NoSQL and do not understand that it is not a competition at all. You want both!
Mongo has spent its entire existence pretending to be a SQL database by poorly reinventing everything you get for free in postgres or mysql or cockroach.
That being said, the question was genuine - because I don't keep up with the ecosystem, I don't know if it's ever valid practice to have a NoSQL db exposed to the internet.
-JS devs after "Signing In With Facebook" to MongoDB Atlas
AKA me
Sorry guys, I broke it
The end result is that "everyone" kind of knows that if you put a PostgreSQL instance up publicly facing without a password, or with a weak/default password, it will be popped in minutes, and you'll find out about it because the attackers are lazy and just running crypto-mining malware, etc.
We expected this to hurt performance, but we were unable to measure any impact in practice.
Everyone still working in memory-unsafe languages should really just do this IMO. It would have mitigated this Mongo bug.
see here: https://godbolt.org/z/rMa8MbYox
[0] https://llvm.org/docs/LangRef.html#llvm-memset-intrinsics
[1] https://gitweb.git.savannah.gnu.org/gitweb/?p=gnulib.git;a=b...
The C committee gave you memset_explicit. But note that there is still no guarantee that information cannot leak. This is generally a very hard problem, as information can leak in many different ways once the compiler has copied it elsewhere. Fully memory-safe languages (so "Safe Rust", but not necessarily real-world Rust) would offer a bit more protection by default, but then there are still side-channel issues.
Creating memset_explicit won't fix existing code. "Oh but what if maybe" is just cope.
If I do memset then free, then that's what I want to do.
And the way things go I won't be surprised if they break memset_explicit for some other BS reason and then make you use memset_explicit_you_really_mean_it_this_time
Once you accept that optimizing compilers do, well, optimizations, the question is what should be allowed and what not. Both inlining "memset" and eliminating dead stores are both simply optimizations which people generally want.
> Once you accept that optimizing compilers do, well, optimizations
Why in tarnation is it optimizing out a write to a pointer before a function that takes said pointer? Imagine it is any other function besides free; see how ridiculous that sounds?
And obviously, I did test that it actually works.
Looks like this is the default in OpenBSD.
If you need better performance, write your own allocator optimized for your specific use case — it's not that hard.
Besides, if you don't need to clear old allocations, there are likely other optimizations you'll be able to find which would never fly in a system allocator.
Note that many malloc implementations will do this for you given an appropriate environment, e.g. setting MALLOC_CONF to opt.junk=free will do this on FreeBSD.
ref: https://www.youtube.com/watch?v=b2F-DItXtZs
I would rather not use it, but I see that there are legitimate cases where MongoDB or DynamoDB is a technically appropriate choice.
Absence of evidence is not evidence of absence...
Do other CVE reports come with stronger statements? I'm not sure they do. But maybe you can provide some counterexamples that meet your bar.
It is also a pretty standard response indeed. But now that it was highlighted, maybe it does deserve some scrutiny?
It is standard, yes. The problem with it as a statement is that it's true even if you've collected exactly zero evidence. I can say I don't have evidence of anyone being exploited, and it's definitely true.
What would break if the compiler zero'd it first? Do programs rely on malloc() giving them the data that was there before?
Honestly, aside from the "<emoji> impact" section that really has an LLM smell (but remember that some people legitimately do this, since it's in the LLM training corpus), this feels more like LLM-assisted writing (translated? reworded? grammar-checked?) than a pure "explain this" prompt.
I did some research with it, and used it to help create the ASCII art a bit. That's about it.
I was afraid that adding the emoji would trigger someone to think it's AI.
In any case, nowadays I basically always get at least one comment calling me an AI on a post that's relatively popular. I assume it's more a sign of the times than the writing...
In hindsight, I would not even have thought about it if not for the comment I replied to. LLM prose fails to keep me reading whole paragraphs (I find myself skipping roughly the second half of every one), which was definitely not the case for your article. I did somewhat skip at the emoji heading, not because of LLMs, but because of a saturation of emojis in some contexts that don't really need them.
Again, sorry, don't get discouraged by the LLM witch hunt.
https://x.com/dez_/status/2004933531450179931
Almost always when you hear about emails or payment info leaking (or when Twitter stored passwords in plaintext lol) it's from logs. And a lot of times logs are in NoSQL because it is only ever needed in that same JSON format and in a very highly available way (all you Heroku users tailing logs all day, yw) and then almost nobody encrypts phone numbers and emails etc. whenever those end up in logs.
There's basically no security around logs actually. They're just like snapshots of the backend data being sent around and nobody ever cares about it.
Anyway it has nothing to do with the choice to use NoSQL, it has more to do with how neglected security is around it.
Btw, in case you are wondering: both the Twitter plaintext-password case and the Rainbow Six Siege data leak you mention were logs that leaked. NoSQL-backed logs, sure, but it's more about the data security around logging IMO.
MongoBleed
https://news.ycombinator.com/item?id=46394620
Do yourself a favour, use ToroDB instead (or even straight PostgreSQL's JSONB).
26 more comments available on Hacker News