IP Address Truncation Fails at Anonymization

Posted 2 months ago | Active 2 months ago

jedisct1

26 points

10 comments

00f.net | Tech story

calm, negative

Debate

40/100

Key topics

Data Privacy

Anonymization

Network Security

The article discusses how IP address truncation, a common method for anonymizing IP addresses, can be ineffective and even reveal more information than intended, sparking a discussion on the limitations and risks of such anonymization techniques.

Snapshot generated from the HN discussion
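
For reference, the truncation technique under discussion usually means zeroing the low-order (host) bits of an address, for example keeping only the /24 of an IPv4 address. Below is a minimal Python sketch that uses the standard `ipaddress` module. The function names and the /24 default are illustrative assumptions and are not taken from the article. The sketch shows only the mechanics and the size of the resulting bucket. It does not reproduce the article's argument about why truncation fails.

```python
import ipaddress

def truncate_ip(addr: str, prefix_len: int = 24) -> str:
    """Zero out every bit after prefix_len (hypothetical helper name)."""
    ip = ipaddress.ip_address(addr)
    # strict=False lets us build a network from an address that has host bits set
    net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(net.network_address)

def addresses_per_bucket(prefix_len: int = 24, bits: int = 32) -> int:
    """How many distinct addresses collapse onto one truncated value."""
    return 2 ** (bits - prefix_len)

print(truncate_ip("203.0.113.77"))   # 203.0.113.0
print(addresses_per_bucket(24))      # 256
```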

Discussion Activity

Light discussion

First comment: 5h after posting
Peak period: 12-14h
Avg / period: 1.4

Key moments

01 Story posted: Oct 27, 2025 at 2:23 PM EDT (2 months ago)
02 First comment: Oct 27, 2025 at 7:29 PM EDT (5h after posting)
03 Peak activity: 3 comments in 12-14h, the hottest window of the conversation
04 Latest activity: Oct 28, 2025 at 1:15 PM EDT (2 months ago)


Discussion (10 comments)


quuxplusone

2 months ago

2 replies

TFA correctly points to (subnet-structure-preserving) encryption as the right way to anonymize IP addresses, although for some reason it calls it "IPCrypt" instead of "Crypto-PAn."

https://en.wikipedia.org/wiki/Crypto-PAn
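
The idea behind subnet-structure-preserving (prefix-preserving) encryption can be sketched as follows. Output bit i is input bit i XORed with a pseudorandom bit derived from input bits 0..i-1. Two addresses that share their first k bits therefore also share their first k output bits. Crypto-PAn builds this pseudorandom function from AES. This sketch substitutes HMAC-SHA256 from the Python standard library so that it runs without dependencies. Its outputs will therefore not match real Crypto-PAn or IPCrypt, and the key is a hypothetical placeholder.

```python
import hashlib
import hmac
import ipaddress

def _prf_bit(key: bytes, prefix: str) -> int:
    """One pseudorandom bit derived from a bit-string prefix.

    Crypto-PAn uses AES here; HMAC-SHA256 is a stand-in so the sketch
    needs only the standard library. Outputs will NOT match real Crypto-PAn.
    """
    digest = hmac.new(key, prefix.encode("ascii"), hashlib.sha256).digest()
    return digest[0] >> 7  # most significant bit of the first byte

def anonymize(addr: str, key: bytes) -> str:
    """Prefix-preserving map: output bit i = input bit i XOR PRF(input bits 0..i-1)."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = "".join(str(int(bits[i]) ^ _prf_bit(key, bits[:i])) for i in range(32))
    return str(ipaddress.IPv4Address(int(out, 2)))

def common_prefix_len(x: str, y: str) -> int:
    """Number of leading bits two IPv4 addresses share."""
    diff = int(ipaddress.IPv4Address(x)) ^ int(ipaddress.IPv4Address(y))
    return 32 - diff.bit_length()

KEY = b"hypothetical-demo-key"  # placeholder; a real deployment needs a random secret key
a, b = "192.0.2.10", "192.0.2.200"
print(anonymize(a, KEY), anonymize(b, KEY))
print(common_prefix_len(a, b), common_prefix_len(anonymize(a, KEY), anonymize(b, KEY)))  # 24 24
```

Because the flip applied at each bit depends only on the bits before it, the first bit at which two inputs differ is also the first bit at which their outputs differ. That is what keeps /24 (or any prefix) membership visible in the anonymized data.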

comex

2 months ago

1 reply

Anonymization is supposed to be irreversible. This scheme is reversible by whoever has the key. I don't really get the point of it.
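
The reversibility point can be made concrete. A prefix-preserving scheme of this kind is a bijection, so whoever holds the key can undo it one bit at a time. Each flip depends only on original bits that have already been recovered. The sketch below uses the same HMAC-SHA256 stand-in as a Crypto-PAn-style construction (it is not real Crypto-PAn), and the key is a hypothetical placeholder.

```python
import hashlib
import hmac
import ipaddress

def _prf_bit(key: bytes, prefix: str) -> int:
    """Pseudorandom bit from a bit-string prefix (HMAC-SHA256 stand-in for Crypto-PAn's AES)."""
    return hmac.new(key, prefix.encode("ascii"), hashlib.sha256).digest()[0] >> 7

def anonymize(addr: str, key: bytes) -> str:
    """Output bit i = input bit i XOR PRF(input bits 0..i-1)."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = "".join(str(int(bits[i]) ^ _prf_bit(key, bits[:i])) for i in range(32))
    return str(ipaddress.IPv4Address(int(out, 2)))

def deanonymize(anon: str, key: bytes) -> str:
    """Invert bit by bit: at step i, `recovered` equals the original bits 0..i-1."""
    out_bits = format(int(ipaddress.IPv4Address(anon)), "032b")
    recovered = ""
    for i in range(32):
        recovered += str(int(out_bits[i]) ^ _prf_bit(key, recovered))
    return str(ipaddress.IPv4Address(int(recovered, 2)))

KEY = b"hypothetical-demo-key"
original = "198.51.100.23"
print(deanonymize(anonymize(original, KEY), KEY) == original)  # True
```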

true_religion

2 months ago

1 reply

Any stable hash can't truly anonymize IP addresses, because there is only a finite number of possible inputs and their hashes are easily computed on ordinary machines.
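
A minimal sketch of the enumeration attack this comment describes, assuming an unkeyed SHA-256 over the dotted-decimal string. The helper names are hypothetical. To keep the demo quick, it searches a single /16 (65,536 addresses). The whole IPv4 space is 2^32 addresses, so the full attack is only a bigger loop over 0.0.0.0/0.

```python
import hashlib
import ipaddress
from typing import Optional

def hash_ip(addr: str) -> str:
    """Unkeyed, stable hash of an IP address (the scheme being criticized)."""
    return hashlib.sha256(addr.encode("ascii")).hexdigest()

def reverse_hash(target: str, search_space: str) -> Optional[str]:
    """Recover the address behind `target` by hashing every address in a network."""
    for ip in ipaddress.ip_network(search_space):
        candidate = str(ip)
        if hash_ip(candidate) == target:
            return candidate
    return None

leaked = hash_ip("10.20.30.40")
print(reverse_hash(leaked, "10.20.0.0/16"))  # 10.20.30.40
```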

atoav

2 months ago

1 reply

Which is why we pepper and salt our hashes.

If you store a patient's blood type as a hash, the problem is that there are only so many blood types. The same blood type always produces the same hash value, so attackers could (1) infer statistically which hash is which, (2) crack one and get the rest, and (3) group users even without cracking the hash.
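
Attacks (2) and (3) from the blood-type example are easy to demonstrate against a plain SHA-256. The patient records below are hypothetical. Identical inputs give identical digests, so grouping needs no cracking. With only eight possible inputs, a lookup table reverses every digest.

```python
import hashlib
from collections import Counter

BLOOD_TYPES = ["O+", "O-", "A+", "A-", "B+", "B-", "AB+", "AB-"]

def plain_hash(value: str) -> str:
    """Unsalted, unpeppered hash: the weak scheme described above."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()

records = ["A+", "O+", "A+", "AB-", "O+", "A+"]  # hypothetical patient data
hashed = [plain_hash(v) for v in records]

# (3) Grouping without cracking: equal digests reveal equal blood types.
groups = Counter(hashed)

# (2) Cracking: hash all 8 possible inputs once, then look every digest up.
lookup = {plain_hash(t): t for t in BLOOD_TYPES}
cracked = [lookup[h] for h in hashed]

print(sorted(groups.values()))  # [1, 2, 3]
print(cracked == records)       # True
```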

That means we need to make the input values more complex by prefixing them with secrets kept elsewhere.

If you have one secret (e.g. stored in an environment variable), that is the pepper. Because the pepper is not stored next to the hashed values, it makes attacks harder; but because it is the same for every value, it is not enough on its own.

A salt is a per-value secret that is stored with each blood-type record and prepended before hashing.

The two in combination make it much harder to get from the hashed value to the input value without having both salt and pepper.
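
A minimal sketch of salt plus pepper, under a few assumptions. The pepper comes from a hypothetical `HASH_PEPPER` environment variable and falls back to a random value so the sketch runs standalone. The salt is a random 16-byte value stored with each record. The comment describes prefixing secrets to the input. This sketch instead uses the pepper as an HMAC key, which is one common way to combine the two and not necessarily the commenter's exact construction.

```python
import hashlib
import hmac
import os
import secrets

# Pepper: one application-wide secret kept outside the database.
# HASH_PEPPER is a hypothetical variable name; fall back to random bytes.
PEPPER = os.environ.get("HASH_PEPPER", "").encode() or secrets.token_bytes(32)

def protect(value: str) -> tuple[bytes, str]:
    """Return (salt, digest). The salt is stored with the record; the pepper is not."""
    salt = secrets.token_bytes(16)  # fresh per-record salt
    digest = hmac.new(PEPPER, salt + value.encode("utf-8"), hashlib.sha256).hexdigest()
    return salt, digest

def matches(value: str, salt: bytes, digest: str) -> bool:
    """Check a candidate value; this needs both the stored salt and the pepper."""
    expected = hmac.new(PEPPER, salt + value.encode("utf-8"), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, digest)

s1, d1 = protect("A+")
s2, d2 = protect("A+")
print(d1 != d2)               # True: same blood type, different stored digests
print(matches("A+", s1, d1))  # True
```

Because every record gets a fresh salt, two patients with the same blood type no longer share a digest. This is also the trade-off raised in the reply below: the stored values can no longer be compared with each other directly.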

47282847

2 months ago

That’s encryption at rest, but not anonymization, unless you throw away the salt and pepper, at which point the record becomes meaningless since it cannot serve for future comparisons.

atoav

2 months ago

1 reply

This can be anonymization if you throw away the key. If you keep it, it's worse than encryption, since attackers can now also differentiate subnets.

quuxplusone

2 months ago

Right. In fact "data destruction" itself can be implemented as "encryption" plus "throwing-away-the-key" plus (importantly!) "throwing-away-the-plaintext." If you don't throw away the plaintext after encryption, you're really missing an important step. ;)

"IP anonymization" is kind of a subset of "data destruction." We want to destroy some of the information — like, "is this address 127.0.0.2?" — but we want to preserve some of it — like, "is this one address in the same /24 subnet as this other one?". That's because we want to be able to say things like, "50% of our traffic comes from a single /24. Its anonymized name in this dataset is 28.238.72.0/24; we can't tell you what its real name is because we anonymized that away."
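
The "50% of our traffic comes from a single /24" statement survives prefix-preserving anonymization, and a short sketch can check this. It reuses a Crypto-PAn-style construction with HMAC-SHA256 standing in for AES (not real Crypto-PAn). The key is a hypothetical placeholder, and the request log is made up from the RFC 5737 documentation ranges.

```python
import hashlib
import hmac
import ipaddress
from collections import Counter

def _prf_bit(key: bytes, prefix: str) -> int:
    """Pseudorandom bit from a bit-string prefix (HMAC-SHA256 stand-in, not real Crypto-PAn)."""
    return hmac.new(key, prefix.encode("ascii"), hashlib.sha256).digest()[0] >> 7

def anonymize(addr: str, key: bytes) -> str:
    """Prefix-preserving map: output bit i = input bit i XOR PRF(input bits 0..i-1)."""
    bits = format(int(ipaddress.IPv4Address(addr)), "032b")
    out = "".join(str(int(bits[i]) ^ _prf_bit(key, bits[:i])) for i in range(32))
    return str(ipaddress.IPv4Address(int(out, 2)))

def top_slash24_share(addrs: list[str]) -> tuple[str, float]:
    """Busiest /24 and the fraction of requests it accounts for."""
    counts = Counter(str(ipaddress.ip_network(f"{a}/24", strict=False)) for a in addrs)
    net, n = counts.most_common(1)[0]
    return net, n / len(addrs)

KEY = b"hypothetical-demo-key"
# Hypothetical request log using documentation address ranges (RFC 5737).
traffic = ["198.51.100.1", "198.51.100.7", "198.51.100.42", "198.51.100.99",
           "203.0.113.5", "203.0.113.250", "192.0.2.10", "192.0.2.11"]

print(top_slash24_share(traffic))  # ('198.51.100.0/24', 0.5)
# Same 0.5 share, reported under the anonymized name of that /24:
print(top_slash24_share([anonymize(a, KEY) for a in traffic]))
```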

If your threat model includes things like "We really want not to be able to say things like that about our dataset," then obviously you should not use (only) anonymization. Because the whole point of anonymization is precisely to preserve the ability to say things like that about subnet structure, while anonymizing away the real addresses.

Perhaps it should have been called "IP pseudonymization." I would have said that ship has sailed, but after googling "ip pseudonymization" it seems like maybe precise terminology is trying to make a comeback due to things like the GDPR.

https://portolano.it/en/newsletter/portolano-cavallo-inform-...

> In the General Court’s opinion [...] the identifiability of the data subject should be assessed taking into account the concrete possibilities of the third-party recipient to identify data subjects. As such, when sharing pseudonymous data, the same must be considered anonymous if the recipient has no means to re-identify data subjects.

> [S]ince the third-party recipient did not have access to the additional information capable of identifying the data subjects, nor could it in any way have acquired such access, the transmitted data should be considered anonymous data and not pseudonymous data.

bashtoni

2 months ago

1 reply

Can we get a tag for AI slop generated articles like this one?

If the author couldn't be bothered to write it, why would anyone think we should bother to read it?

Sophira

2 months ago

Why do you feel this was generated by AI?

waynesonfire

2 months ago

We would also truncate lat/long coordinates.
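
A sketch of coordinate truncation, assuming that "truncate" means dropping decimal places toward zero. For scale, 0.01 degree of latitude is roughly 1.1 km. The sketch goes through `decimal.Decimal` because scaling a binary float, truncating it and dividing it back can land one unit low. The function names are hypothetical.

```python
from decimal import Decimal, ROUND_DOWN

def truncate_coord(value: float, places: int) -> float:
    """Drop decimal places toward zero, avoiding binary-float scaling artifacts."""
    quantum = Decimal(1).scaleb(-places)  # places=2 -> Decimal('1E-2'), i.e. 0.01
    return float(Decimal(str(value)).quantize(quantum, rounding=ROUND_DOWN))

def truncate_point(lat: float, lon: float, places: int = 2) -> tuple[float, float]:
    """Truncate both components of a (lat, long) pair."""
    return truncate_coord(lat, places), truncate_coord(lon, places)

print(truncate_point(48.858844, 2.294351))  # (48.85, 2.29)
print(truncate_coord(-33.8688, 1))          # -33.8 (toward zero, not floor)
```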
