A Webshell and a Normal File That Have the Same MD5
Posted 4 months ago · Active 4 months ago
Source: github.com · Tech story
Tone: calm, mixed · Debate: 60/100
Key topics: Cryptography, Security, Hash Collisions
A GitHub repository demonstrates two files with the same MD5 hash, one a webshell and the other a seemingly normal PHP file, sparking discussion on the implications for security and hash-based scanning.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion. First comment: 6m. Peak period: 36 comments in 72-84h. Average per period: 6.7.
Comment distribution: 47 data points, based on 47 loaded comments.
Key moments
- Story posted: Sep 21, 2025 at 1:52 AM EDT
- First comment: Sep 21, 2025 at 1:59 AM EDT (6m after posting)
- Peak activity: 36 comments in 72-84h (hottest window of the conversation)
- Latest activity: Sep 27, 2025 at 6:34 AM EDT
ID: 45320382 · Type: story · Last synced: 11/20/2025, 4:53:34 PM
That's only true if you ignore all the details.
As usual, you cannot make a coherent understanding on just about any subject by reading headlines alone. Life would have taught you by now that the devil is in the details.
WP uses a salt and multiple rounds of hashing, fully mitigating the MD5 collisions being discussed here.
So no, WP doesn't "use MD5" in the sense that would make it vulnerable to this type of attack.
Source: https://developer.wordpress.org/reference/functions/wp_hash_...
The amount of sweet, sweet irony displayed here will make me diabetic. Did you read the article at all? Salting? What are you on about?
Honestly, it feels that some HN commenters are LLMs instructed to defend a given entity.
> Can use it bypass some cached webshell detections.
Since the OP article/PoC is about hashing uploaded files, not passwords, I think you should read it again.
Because as I pointed out, wp_hash() is used to check against uploaded files.
Oh, and source: https://developer.wordpress.org/reference/functions/wp_hash/
And as I cannot resist quoting you for trying to be a smartass while literally not having read the source code the PoC was about:
> As usual, you cannot make a coherent understanding on just about any subject by reading headlines alone. Life would have taught you by now that the devil is in the details.
But there are two applications: the first is breaking into a system under some very obscure set of circumstances that you are very unlikely to encounter in the real world. The second is to bump up your karma on HN.
If you do know, then you also know MD5 being broken is really, really old news.
Seriously. Cryptographers have been warning that MD5 seems weak since 1996. There are probably people reading this thread who weren't even alive yet. (It got totally broken in 2004, but the warning signs came way earlier.)
Such security! Much wow!
Is there any way to use HN karma? Like, can I sell my account on some shady exchange like people sell big twitter accounts? And if I can, what's the going rate for internet points these days? Asking for an unscrupulous friend.
Nothing other than vanity AFAIK.
It's actually a bit of a scam because karma accumulates and never expires. I've been on the leaderboard for a long time, not because I'm making particularly valuable contributions (I only post a few times a week) but just because I've been on HN since it launched.
> Can use it bypass some cached webshell detections.
The things that make this blog post unrealistic are:
* Such tricks would make much more sense with normal programs, where you're trying to trick a user into downloading and executing them. Webshells are downloaded by the attacker knowingly.
* MD5 is not used anymore (although I know security vendors who used it for an embarrassingly long time). If this were SHA-256, the attack would be devastating for far more serious reasons.
But it's still a fun PoC.
1. You can upload scripts that get scanned for malicious code.
2. These scripts can be executed once deemed "safe".
3. The server uses MD5 hashes to determine whether you uploaded the same file or whether it should re-scan it.
Step 3 is where the issue is. It should probably always re-scan, and it definitely should not be using MD5.
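The fix suggested for step 3 can be sketched as a verdict cache keyed by SHA-256 instead of MD5. This is a minimal illustration, not the PoC's target system: `scan_for_webshell` is a hypothetical placeholder heuristic invented here, and a real deployment would call its actual scanner (or, as the comment suggests, skip the cache and always re-scan).

```python
import hashlib


def scan_for_webshell(data: bytes) -> bool:
    """Hypothetical placeholder scanner: flags content containing a few
    common PHP execution calls. A real scanner is far more thorough."""
    suspicious = (b"eval(", b"system(", b"shell_exec(", b"passthru(")
    return any(marker in data for marker in suspicious)


class ScanVerdictCache:
    """Caches scanner verdicts keyed by a collision-resistant digest.

    Keyed by MD5, an attacker could get a benign file cleared and then
    upload a different file with the same MD5. SHA-256 has no known
    practical collisions, so a cache hit should mean identical content.
    """

    def __init__(self) -> None:
        self._verdicts: dict[str, bool] = {}

    def is_malicious(self, data: bytes) -> bool:
        key = hashlib.sha256(data).hexdigest()
        if key not in self._verdicts:
            # Only unseen content reaches the (expensive) scanner.
            self._verdicts[key] = scan_for_webshell(data)
        return self._verdicts[key]


if __name__ == "__main__":
    cache = ScanVerdictCache()
    print(cache.is_malicious(b"<?php echo 'hello'; ?>"))
    print(cache.is_malicious(b"<?php eval($_POST['x']); ?>"))
```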
Most systems I've seen (security scans, backup validation/dedup, etc.) pushed to phase out MD5 a very long time ago.
Wouldn't the sensible thing for a server that gets an upload matching an existing file's hash be to just treat it as an idempotent no-op? What reason would it have to replace the old version with a presumably identical copy? What am I missing?
[1]: https://lemire.me/blog/2025/01/11/javascript-hashing-speed-c...
Try this on your own system:
This is on an old Chromebook with an Intel(R) Core(TM) m3-6Y30 CPU @ 0.90GHz (dual core, but with hyperthreading). Note that even using only a single thread (which SHA256 and MD5 are limited to by their design), BLAKE3 is 6x as fast as SHA256 and 4x as fast as MD5.
While BLAKE3 can be many times faster than SHA-256 (by consuming many times more power), the amount of work needed to compute a hash differs much less between the two hashes than the execution time on a multi-core CPU does.
The single-thread speed difference you quoted is caused by your Skylake-based CPU, which lacks the SHA hardware instructions.
Moreover, even programs that claim to use the SHA hardware instructions may run several times slower than the hardware allows: more recent CPUs (e.g. from the last 4 years, such as Zen 3 and newer or Alder Lake and newer) have wider SHA instructions than older ones, but programs must have been compiled to support them.
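To run a rough single-thread comparison like the ones above on your own machine, here is a minimal sketch using Python's `hashlib`. BLAKE3 is not in the standard library, so `blake2b` stands in for it; whether `sha256` benefits from SHA hardware instructions depends on the OpenSSL build behind `hashlib`, so treat the numbers as indicative only.

```python
import hashlib
import time


def throughput_mb_per_s(algorithm: str, data: bytes, rounds: int = 5) -> float:
    """Return single-threaded hashing throughput in MB/s for a hashlib
    algorithm name, using the fastest of `rounds` timed runs."""
    best = float("inf")
    for _ in range(rounds):
        start = time.perf_counter()
        hashlib.new(algorithm, data).digest()
        best = min(best, time.perf_counter() - start)
    # Guard against a zero reading from a coarse clock on tiny inputs.
    return len(data) / max(best, 1e-9) / 1e6


if __name__ == "__main__":
    payload = bytes(64 * 1024 * 1024)  # 64 MiB of zero bytes
    for name in ("md5", "sha256", "blake2b"):
        print(f"{name:8s} {throughput_mb_per_s(name, payload):8.1f} MB/s")
```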
Such an algorithm was first published by Ralph Merkle in 1979, but it has since been improved:
https://en.wikipedia.org/wiki/Merkle_tree
For security, different levels of the hash tree must use different hash functions, but this is trivially achieved by using the same hash function while also hashing some extra distinguishing data alongside the hashes from the previous level.
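The domain-separation trick just described can be illustrated with a minimal Merkle tree over fixed-size chunks. This sketch assumes SHA-256 as the underlying hash and one-byte leaf/node prefixes in the style of RFC 6962 (Certificate Transparency); it is not BLAKE3's actual tree mode, which uses its own internal flags. Each leaf is hashed independently, which is what makes the construction parallelizable.

```python
import hashlib

# Distinct one-byte prefixes make leaf hashing and internal-node hashing
# behave like two different hash functions built from the same primitive.
LEAF_PREFIX = b"\x00"
NODE_PREFIX = b"\x01"


def leaf_hash(chunk: bytes) -> bytes:
    return hashlib.sha256(LEAF_PREFIX + chunk).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(NODE_PREFIX + left + right).digest()


def merkle_root(data: bytes, chunk_size: int = 1024) -> bytes:
    """Hash each chunk as a leaf, then combine pairwise level by level.
    An odd node at the end of a level is promoted unchanged."""
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)] or [b""]
    level = [leaf_hash(c) for c in chunks]  # independent, so parallelizable
    while len(level) > 1:
        nxt = [node_hash(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2 == 1:
            nxt.append(level[-1])
        level = nxt
    return level[0]
```

Without the prefixes, an internal node's input (two concatenated digests) could be passed off as a 64-byte leaf, letting different trees share a root.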
Is this not an oxymoron? E.g. BLAKE3 ought then to be an order of magnitude easier to brute-force.
For a 256-bit cryptographic hash function, it should take an expected 2^256 attempts to find a message with a given hash (preimage attack) and around 2^128 attempts to find any collision (due to the birthday paradox), along with a few other properties like that. This holds for both SHA-256 and Blake3 (as far as we know: neither algorithm has proven security*), but not for MD5.
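The birthday bound can be seen directly with a generic attack on a deliberately weakened hash. This toy sketch truncates SHA-256 to a few bytes (a construction chosen only for illustration): with an n-bit output, a collision typically appears after roughly 2^(n/2) attempts, so 24 or 32 bits fall in seconds, while full SHA-256 would need around 2^128. The practical MD5 attacks are not this generic search; they exploit structural weaknesses to do far better than MD5's 2^64 generic bound.

```python
import hashlib
import itertools


def truncated_sha256(message: bytes, nbytes: int) -> bytes:
    """SHA-256 truncated to its first `nbytes` bytes (a deliberately weak hash)."""
    return hashlib.sha256(message).digest()[:nbytes]


def find_collision(nbytes: int = 4) -> tuple[bytes, bytes, int]:
    """Generic birthday attack: hash distinct messages until two share a
    truncated digest. Expected work is about 2**(4 * nbytes) hashes,
    the square root of the 2**(8 * nbytes) output space."""
    seen: dict[bytes, bytes] = {}
    for attempts in itertools.count(1):
        msg = str(attempts).encode()  # distinct message on every attempt
        digest = truncated_sha256(msg, nbytes)
        if digest in seen:
            return seen[digest], msg, attempts
        seen[digest] = msg
    raise AssertionError("unreachable")


if __name__ == "__main__":
    a, b, n = find_collision(4)  # 32-bit output: on the order of 2**16 tries
    print(a, b, n)
```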
MD5 is insecure not just because its 128-bit output is too short (though that's a problem too), but also because it has weaknesses that allow constructing collisions with far fewer than the 2^64 attempts you would expect from its output size. That's why MD5 is considered insecure even for its size.
Generally speaking, you want your hashing primitives to be as fast as possible. The practical security then comes from the output size. If someone discovered a secure 320-bit cryptographic hash that is a trillion times faster than even Blake3 (10^12 or about 2^40), everyone should adopt it, because it would be much faster and even more secure against brute force attacks than SHA-256/Blake3 are (since 320 > 256 + 40).
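For the brute-force preimage case this comparison describes, the arithmetic works out as follows: a speedup of about 2^40 lets an attacker try 2^40 times as many inputs for the same cost, which lowers the effective security by 40 bits, still leaving a margin over a 256-bit hash.

```latex
10^{12} = 2^{12 \log_2 10} \approx 2^{39.86} \approx 2^{40}

\frac{2^{320}}{2^{40}} = 2^{280} > 2^{256}
```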
While there are use cases for deliberately slow hash functions too (notably password hashing), those can be constructed using fast hash functions as primitives. For example, one of the strongest password hashing schemes (Argon2) is based on one of the fastest hashing primitives (Blake2), not a slow one as you might have expected.
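Argon2 is not in Python's standard library, so as a swapped-in illustration of the same principle (a deliberately slow, salted password hash built from a fast primitive), this sketch uses PBKDF2, which iterates HMAC-SHA256. The default iteration count is an illustrative assumption to be tuned for your hardware; a memory-hard scheme such as Argon2 is generally preferable where available.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = 600_000) -> Tuple[bytes, bytes]:
    """Derive a slow, salted hash by iterating fast HMAC-SHA256 (PBKDF2)."""
    if salt is None:
        salt = os.urandom(16)  # fresh random salt for each password
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, iterations)
    return salt, key


def verify_password(password: str, salt: bytes, expected: bytes,
                    iterations: int = 600_000) -> bool:
    _, key = hash_password(password, salt, iterations)
    return hmac.compare_digest(key, expected)  # constant-time comparison
```

The slowness comes entirely from the iteration count, not from the underlying hash: each extra iteration is one more fast HMAC-SHA256 call an attacker must also pay for every guess.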
The only CPUs likely still in use that lack SHA support are Intel Core CPUs up to and including the Skylake derivatives (i.e. up to Comet Lake, i.e. up to 6 years ago).
The Intel Atoms received SHA support many years before Intel Core did, because they competed with ARM, which already had such support.
Support was added to Intel Core in response to AMD Zen, but the products with it were delayed by Intel's failure to achieve acceptable fabrication yields on its 10-nm CMOS process before 2019/2020.
If normal.php contained actual PHP code, being truly 'normal' as the name implies, this would be much more severe/interesting, because it might be easier to convince modern security products that it's actually a benign file.
Currently, if it were analysed, it would be flagged as suspicious simply because it's not a valid file. And really, it doesn't need to be PHP; it could be any valid file format, as long as it's an actual file with benign behavior or contents.
Plaintext might be easier to generate, but you'd need an 'executable' format or something interpretable, like a script, for it to actually be stored in databases that mark files as malicious or benign. Matching the malicious file's file type, in a valid form with actual benign behavior, would be 'best'.
Don't get me wrong, though. It's still fun to see these things, and honestly, props: if it bypasses anything, that's always a 'nice result' :)
It's funny how often web-based languages have this property, though. I mean, how else are you gonna poison logs and execute them :')... JS and PHP are just adorable for providing opportunities :D
Also, a more recent innovation in MD5 collisions is textcoll, which creates colliding blocks that are completely plaintext. This would allow for colliding PHP source files like the ones in the OP, but without any obvious binary artefacts (although this requires identical prefixes).
https://github.com/cr-marcstevens/hashclash?tab=readme-ov-fi...
https://github.com/angea/pocorgtfo#0x14
And yes, documents are not normally supposed to be able to display their own MD5 hash.