IP Address Truncation Fails at Anonymization
00f.net | Tech | story
Key topics
Data Privacy
Anonymization
Network Security
The article discusses how IP address truncation, a common method for anonymizing IP addresses, can be ineffective and even reveal more information than intended, sparking a discussion on the limitations and risks of such anonymization techniques.
Snapshot generated from the HN discussion
Discussion Activity
Light discussion
First comment: 5h after posting
Peak period: 3 comments in 12-14h
Avg / period: 1.4
Key moments
- Story posted: Oct 27, 2025 at 2:23 PM EDT
- First comment: Oct 27, 2025 at 7:29 PM EDT (5h after posting)
- Peak activity: 3 comments in 12-14h
- Latest activity: Oct 28, 2025 at 1:15 PM EDT
ID: 45724527 | Type: story | Last synced: 11/20/2025, 1:30:03 PM
https://en.wikipedia.org/wiki/Crypto-PAn
If you store the blood type of a patient hashed, the problem is that there are only so many blood types. So the same blood type will have the same hash value and attackers could (1) just infer statistically which are which, (2) crack one and get the rest and (3) group users even without cracking the hash.
That means we need to make the input values harder to guess by combining them with extra values that an attacker does not control.
If you have one secret (e.g. stored in an environment variable), that would be the pepper. Because it is not stored next to the data, it makes cracking harder. But since it is the same for every value, equal blood types still produce equal hashes, so a pepper alone is not enough.
A salt would be a per-record random value that is stored alongside each hashed blood type and prepended before hashing. It does not have to be secret; its job is to make two records with the same blood type hash differently.
The two in combination make it much harder to get from the hashed value to the input value without having both salt and pepper.
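To make the salt and pepper idea concrete, here is a minimal Python sketch. It is an illustration under assumptions, not anything from the article: the function names `hash_value` and `matches` are hypothetical, the pepper is passed in as a plain argument instead of being read from an environment variable, and HMAC-SHA256 keyed with the pepper is swapped in for the plain "prefix and hash" approach described above.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple


def hash_value(value: str, pepper: bytes, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest) for a low-entropy value such as a blood type.

    pepper: one application-wide secret, kept away from the data store.
    salt:   per-record random bytes, stored next to the digest.
    """
    if salt is None:
        salt = secrets.token_bytes(16)  # fresh random salt for each record
    # Assumption: HMAC keyed with the pepper replaces simple prefixing.
    digest = hmac.new(pepper, salt + value.encode("utf-8"), hashlib.sha256).digest()
    return salt, digest


def matches(value: str, pepper: bytes, salt: bytes, digest: bytes) -> bool:
    """Recompute the digest with the stored salt and compare in constant time."""
    _, candidate = hash_value(value, pepper, salt)
    return hmac.compare_digest(candidate, digest)


if __name__ == "__main__":
    pepper = secrets.token_bytes(32)  # in practice loaded from outside the database
    salt_a, digest_a = hash_value("A+", pepper)
    salt_b, digest_b = hash_value("A+", pepper)
    print(digest_a != digest_b)                       # same input, different salts
    print(matches("A+", pepper, salt_a, digest_a))    # verification still works
```

Note that with so few possible blood types, anyone holding both the salt and the pepper can simply try every value; the salt only stops equal inputs from being grouped, and the pepper is what keeps an attacker who has only the database from brute-forcing it.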
"IP anonymization" is kind of a subset of "data destruction." We want to destroy some of the information — like, "is this address 127.0.0.2?" — but we want to preserve some of it — like, "is this one address in the same /24 subnet as this other one?". That's because we want to be able to say things like, "50% of our traffic comes from a single /24. Its anonymized name in this dataset is 28.238.72.0/24; we can't tell you what its real name is because we anonymized that away."
If your threat model includes things like "We really want not to be able to say things like that about our dataset," then obviously you should not use (only) anonymization, because the whole point of anonymization is precisely to preserve the ability to say things like that about subnet structure, while anonymizing away the real addresses.
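The subnet-preserving property described above can be sketched in a few lines of Python. This is a hedged illustration only: it follows the general bit-by-bit idea behind prefix-preserving schemes such as Crypto-PAn, but it uses HMAC-SHA256 as the pseudorandom function instead of the cipher a real Crypto-PAn implementation uses, and the function name `pseudonymize_ipv4` and the key are hypothetical.

```python
import hashlib
import hmac
import ipaddress


def pseudonymize_ipv4(addr: str, key: bytes) -> str:
    """Map an IPv4 address to a pseudonym that preserves shared prefixes.

    Each output bit is the input bit XORed with a keyed pseudorandom bit
    that depends only on the input bits before it. Two addresses sharing
    their first k bits therefore share the first k bits of their pseudonyms.
    """
    value = int(ipaddress.IPv4Address(addr))
    out = 0
    for i in range(32):
        prefix = value >> (32 - i)  # the top i bits (0 when i == 0)
        # Include the prefix length so prefixes of different lengths never collide.
        msg = bytes([i]) + prefix.to_bytes(4, "big")
        flip = hmac.new(key, msg, hashlib.sha256).digest()[0] & 1
        bit = (value >> (31 - i)) & 1
        out = (out << 1) | (bit ^ flip)
    return str(ipaddress.IPv4Address(out))


if __name__ == "__main__":
    key = b"example-key-not-for-production"
    for ip in ("192.0.2.10", "192.0.2.200", "192.0.3.10"):
        print(ip, "->", pseudonymize_ipv4(ip, key))
```

With any fixed key, the first two addresses map into the same pseudonymous /24 and the third maps into a different one, which is exactly the kind of structure that remains visible in the dataset and that an attacker with side knowledge can exploit.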
Perhaps it should have been called "IP pseudonymization." I would have said that ship has sailed, but after googling "ip pseudonymization" it seems like maybe precise terminology is trying to make a comeback due to things like the GDPR.
https://portolano.it/en/newsletter/portolano-cavallo-inform-...
> In the General Court’s opinion [...] the identifiability of the data subject should be assessed taking into account the concrete possibilities of the third-party recipient to identify data subjects. As such, when sharing pseudonymous data, the same must be considered anonymous if the recipient has no means to re-identify data subjects.
> [S]ince the third-party recipient did not have access to the additional information capable of identifying the data subjects, nor could it in any way have acquired such access, the transmitted data should be considered anonymous data and not pseudonymous data.
If the author couldn't be bothered to write it, why would anyone think we should bother to read it?