Latexposed: a Systematic Analysis of Information Leakage in Preprint Archives
Source: arxiv.org (Research, story)
Key topics
Academic Publishing
Information Security
Latex
A study analyzed LaTeX sources in arXiv preprints and found sensitive information, sparking debate about responsible disclosure and the trade-offs between openness and security in academic publishing.
Snapshot generated from the HN discussion
Discussion activity: moderate engagement; first comment 2h after posting; peak of 9 comments in the 2-4h period; average of 2.5 comments per period (based on 20 loaded comments).
Key moments
- Story posted: Oct 13, 2025 at 4:33 AM EDT
- First comment: Oct 13, 2025 at 6:08 AM EDT (2h after posting)
- Peak activity: 9 comments in the 2-4h window, the hottest window of the conversation
- Latest activity: Oct 14, 2025 at 5:02 AM EDT
ID: 45566123, Type: story, Last synced: 11/20/2025, 12:35:35 PM
I am getting so tired of every vulnerability getting a cutesy pet name and trying to pretend to be the new Heartbleed / Spectre / Meltdown...
Though I have to admit, when I was still in academia, whenever I saw a beautiful figure or formatting in a preprint, I'd often try to take some inspiration from the source for my own work, occasionally learning a new neat trick or package.
1: https://info.arxiv.org/help/faq/whytex.html
At least arxiv could have run the cleaner [1] before the print of this pre-print (lol). If there was no disclosure, then I think this pre-print becomes unethical to put up.
> leading to the identification of nearly 1,200 images containing sensitive metadata. The types of data represented vary significantly. While device information (e.g., the camera used) or software details (such as the exact version of Photoshop) may already raise concerns, in over 600 cases the metadata contained GPS coordinates, potentially revealing the precise location where a photo was taken. In some instances, this could expose a researcher’s home address (when tied to a profile picture) or the location of research facilities (when images capture experimental equipment)
Oof, that's not too great.
[1] https://github.com/google-research/arxiv-latex-cleaner
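For anyone who wants to check their own figures before uploading, here is a minimal sketch of how GPS metadata can be detected in a JPEG using only the Python standard library. It is not the method used in the paper, and it says nothing about what arxiv-latex-cleaner does; the function names `find_exif_tiff` and `has_gps_exif` are made up for illustration. It only looks for the GPS directory pointer (EXIF tag 0x8825) in the first image directory, so it reports that a GPS block exists, not what coordinates it holds, and it assumes a well-formed file.

```python
import struct

GPS_IFD_TAG = 0x8825  # EXIF tag in IFD0 that points at the GPS sub-directory


def find_exif_tiff(jpeg: bytes):
    """Return the TIFF payload of the first Exif APP1 segment, or None."""
    if jpeg[:2] != b"\xff\xd8":  # missing SOI marker: not a JPEG
        return None
    pos = 2
    while pos + 4 <= len(jpeg):
        if jpeg[pos] != 0xFF:  # lost sync with the segment structure
            return None
        marker = jpeg[pos + 1]
        if marker in (0xD9, 0xDA):  # EOI or start of scan: metadata is over
            return None
        # Segment length is big-endian and includes its own two bytes.
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        payload = jpeg[pos + 4:pos + 2 + length]
        if marker == 0xE1 and payload.startswith(b"Exif\x00\x00"):
            return payload[6:]
        pos += 2 + length
    return None


def has_gps_exif(jpeg: bytes) -> bool:
    """True if IFD0 of the Exif block contains a GPS directory pointer."""
    tiff = find_exif_tiff(jpeg)
    if tiff is None or len(tiff) < 8:
        return False
    if tiff[:2] == b"II":
        endian = "<"  # little-endian TIFF
    elif tiff[:2] == b"MM":
        endian = ">"  # big-endian TIFF
    else:
        return False
    magic, ifd0 = struct.unpack(endian + "HI", tiff[2:8])
    if magic != 42 or ifd0 + 2 > len(tiff):
        return False
    (count,) = struct.unpack(endian + "H", tiff[ifd0:ifd0 + 2])
    for i in range(count):
        # Each IFD entry is 12 bytes: tag, type, count, value/offset.
        entry = tiff[ifd0 + 2 + 12 * i:ifd0 + 14 + 12 * i]
        if len(entry) < 12:
            break
        (tag,) = struct.unpack(endian + "H", entry[:2])
        if tag == GPS_IFD_TAG:
            return True
    return False
```

To use it on a file, read the bytes first, e.g. `has_gps_exif(open("figure.jpg", "rb").read())`, where `figure.jpg` is a placeholder path. A `True` result is a hint to strip the metadata with a dedicated tool before submitting; a `False` result from this sketch is not a guarantee that the image is clean, since other formats and metadata containers are not covered.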
The vast majority (I would wager >(100 - 1e-4)%) of research institutions' locations are public knowledge and can be found by simply googling the institution's address (I am not aware of a single research institution that publishes publicly and whose location is confidential).
Though I doubt all my collaborators do something similar.