Incremental Backups of Gmail Takeouts
Key topics
The debate around email archiving heats up as one developer shares a tool for incremental backups of Gmail takeouts, sparking a lively discussion on the value of holding onto old emails. While some commenters, like SanjayMehta, swear by deleting non-essential emails to minimize storage needs, others, such as raybb and stephenhuey, cherish the nostalgia and practicality of keeping old messages, citing instances where they've dug up decade-old emails for reference or reminiscence. As the conversation unfolds, it becomes clear that the decision to keep or discard old emails hinges on individual needs and habits, with some commenters pointing out that disk space is cheap and ignoring old emails is easier than deleting them. The discussion reveals a surprising divide between those who've long practiced email minimalism and those who see value in preserving their digital history.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
Story posted: Dec 25, 2025 at 7:59 AM EST
First comment: Dec 25, 2025 at 7:59 AM EST (0s after posting)
Peak activity: 43 comments in 108-120h (average 12.6 per period)
Latest activity: Dec 31, 2025 at 7:39 PM EST
Based on 63 loaded comments
https://github.com/rustmailer/bichon
I only save financial statements and contact information. Everything else gets deleted as soon as possible.
What's the advantage to deleting? It's easy to ignore anything old and disk space is cheap. Do you delete old photos?
I keep photos, though I don't keep all photos. But emails and messages? Why...
As for photos, I print maybe 1 out of 100 and don't bother with the rest.
If she was hard-deleting everything, she wasn't just Inbox Zero, she was F---s Zero, too.
Old SMS, iMessage, Telegram etc messages have been useful from time to time too for similar reasons.
Both can also serve as exceptional time capsules that provide windows into past “eras” of life. I occasionally kick myself for not having archived mail and messages from a couple of defunct email addresses and chat apps… without them there’s a hole spanning a few years where visibility is limited.
Also to reminisce how cheap stuff was.
So, yes
About OP's tool, I also back up my Google account to an external disk periodically. Gmail is ~8 GB so it's manageable. But Google Photos is a pain. They recently removed most of the useful APIs, so AFAIK the only way to back up is via Takeout. It's terrible. Pictures in multiple albums are included as copies every time, so I had to make a script to find duplicates and replace them with symlinks. Just downloading the whole thing is a PITA (multiple 50 GB zip files). I get that Google has little incentive to make this better, in fact they might have an incentive to make it as inconvenient as possible, but I really wish they made it easier.
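For anyone wanting to try the duplicate-to-symlink cleanup described above, here is a minimal sketch in Python. It is not the commenter's script: the function names are made up for illustration, it treats two files as duplicates only when their SHA-256 digests match, and it assumes a filesystem that supports symlinks.

```python
import hashlib
import os
from pathlib import Path


def file_digest(path: Path, block_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB blocks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            h.update(block)
    return h.hexdigest()


def replace_duplicates_with_symlinks(root: Path) -> int:
    """Replace byte-identical files under root with symlinks to the first copy.

    Hypothetical helper; returns the number of files replaced.
    """
    first_seen = {}
    replaced = 0
    for path in sorted(root.rglob("*")):
        # Skip directories and anything that is already a symlink.
        if path.is_symlink() or not path.is_file():
            continue
        digest = file_digest(path)
        original = first_seen.setdefault(digest, path)
        if original is path:
            continue  # first time this content has been seen
        # Relative target so the tree stays valid if it is moved as a whole.
        target = os.path.relpath(original, start=path.parent)
        path.unlink()
        path.symlink_to(target)
        replaced += 1
    return replaced
```

Run it on a copy of the extracted Takeout tree first, since it deletes the duplicate files it replaces.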
Also, oftentimes I search email not so much for the content, but to find the timestamp associated with a particular event. I have had to search old email metadata a few times when I get an unexpected question related to time (for example, gmail will ask when you created the account as part of its account recovery process).
Back when I used Gmail I just kept everything personal and work related, but when I moved away and started paying for email storage I took a different approach. It didn’t make sense for me to pay for considerably more storage for something I almost never use.
I ended up backing up all of my emails older than the last 5 years and storing them on an offline drive where I can reference them as eml files if I ever need them.
Going forward once a year I’ll export and purge the oldest year in my account.
I let it pile up, rarely delete anything except marketing emails. Over 30K emails in my gmail inbox.
That's why I keep all of them
I do have “Clean Inbox”[1] because I don’t see or interact with them, but I keep them. The only emails I see are the actionable “Unread OR Flagged.”
1. https://brajeshwar.com/2024/email/
As I'm sure the author is aware, Restic will do hash-based chunking so that similar files can be efficiently backed up.
How similar are two successive Takeout mboxes?
If the order of messages within an mbox is stable, and new emails are inserted somewhere, the delta update might be tiny.
Even if the order of the mbox's messages is ~random, Restic's delta updates will forgo large attachments.
The advantage of sticking with restic is simplicity, and also avoiding the risk of your tool managing to screw up the data.
> Even if the order of the mbox's messages is ~random, Restic's delta updates will forgo large attachments.
I forget the exact number, but the rolling hashes for Restic and Borg are tuned to produce chunk sizes on the order of an entire megabyte.
Which means attachment file sizes need to be many megabytes in order for Restic to be of much use, since the full chunk has to fall within the attachment. You'd lose 0.5MB at both ends of each attachment on average, so a 5MB file would only be 80% deduped.
Nothing against Restic, but it's tuned for file-level backup, and I'm sure it wouldn't be as performant if it used chunks that were small enough to pick apart individual e-mails.
I suggested the author check out ZPAQ, which has a user-tunable average fragment size, and is arguably even simpler than Restic.
The ZPAQ file can then itself be efficiently backed up by Restic.
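The 80% figure for a 5MB attachment can be reproduced with a few lines of Python. This is only the back-of-the-envelope model from the comment above (about half an average chunk lost at each end of an attachment), not a measurement of Restic or Borg. The function name and the 1MB default are assumptions for illustration.

```python
def expected_dedup_fraction(attachment_mb: float, avg_chunk_mb: float = 1.0) -> float:
    """Estimate the share of an attachment covered by chunks lying wholly inside it.

    Assumes roughly half an average chunk is lost at each end of the attachment.
    """
    lost_mb = 2 * (avg_chunk_mb / 2)
    return max(0.0, (attachment_mb - lost_mb) / attachment_mb)


if __name__ == "__main__":
    for size_mb in (1, 2, 5, 20):
        print(f"{size_mb:>3} MB attachment: {expected_dedup_fraction(size_mb):.0%} deduped")
```

Under the same model, shrinking the average chunk to 64 KiB (`avg_chunk_mb=0.0625`) makes the loss negligible for a 5MB file, which is the argument for a tunable fragment size.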
Does takeout include any metadata not accessible via IMAP? Does it even include labels?
The underlying data model is kind of OK though: messages are immutable, they get a long lived unique ID that survives changes to labels, etc. There is a history of changes you can grab incrementally. You can download the full message body that you can then parse as mail, and I save each one into a separate file and then index them in SQLite.
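A rough sketch of the "one file per message, indexed in SQLite" layout described above, using only the Python standard library. The table schema, the function names, and the use of the message ID as a file name are assumptions for illustration. Fetching the raw message and its ID from the API is left out.

```python
import sqlite3
from email import message_from_bytes
from email.policy import default
from pathlib import Path


def open_index(db_path: str) -> sqlite3.Connection:
    """Open (or create) the SQLite index. The schema is hypothetical."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS messages (
               id TEXT PRIMARY KEY,
               path TEXT NOT NULL,
               sender TEXT,
               subject TEXT,
               date TEXT
           )"""
    )
    return conn


def store_message(conn: sqlite3.Connection, out_dir: Path, message_id: str, raw: bytes) -> Path:
    """Write one raw message to its own .eml file and index its headers."""
    path = out_dir / f"{message_id}.eml"
    if not path.exists():
        # Messages are immutable, so an existing file never needs rewriting.
        path.write_bytes(raw)
    msg = message_from_bytes(raw, policy=default)
    conn.execute(
        "INSERT OR IGNORE INTO messages (id, path, sender, subject, date) "
        "VALUES (?, ?, ?, ?, ?)",
        (
            message_id,
            str(path),
            str(msg.get("From", "")),
            str(msg.get("Subject", "")),
            str(msg.get("Date", "")),
        ),
    )
    conn.commit()
    return path
```

Because the ID is the primary key and inserts use `INSERT OR IGNORE`, re-running an incremental sync over messages that are already stored is harmless.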
I tried using takeout to have a more accurate listing. I thought I could open it with thunderbird, I failed, I then tried to open it with some python lib, also failed.
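For reference, Python's standard `mailbox` module can read mbox files. Below is a minimal sketch that lists date, sender, and subject for each message. Whether it copes with a particular multi-gigabyte Takeout export is untested here, and `list_messages` is just an illustrative name.

```python
import mailbox
from email.utils import parsedate_to_datetime


def list_messages(mbox_path: str):
    """Yield (date, sender, subject) for each message in an mbox file.

    date is a datetime, or None when the Date header is missing or unparseable.
    """
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            raw_date = msg.get("Date")
            try:
                date = parsedate_to_datetime(raw_date) if raw_date else None
            except (TypeError, ValueError):
                date = None
            yield date, msg.get("From", ""), msg.get("Subject", "")
    finally:
        box.close()
```

Passing `create=False` makes a wrong path fail with `mailbox.NoSuchMailboxError` instead of silently creating an empty file.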
5GB of storage sounds not so bad.
I don't know how large a single mail without images or attachments is, though
For the email accounts I want a backup, I set it to spew out POP3 without doing anything (don’t mark read or delete). I set up Thunderbird with that POP3. It has a backup copy of all the emails. I’ve had searchable emails since like 2004/2005, and I’ve occasionally replied to people and gotten back in touch with very old friends from the Internet.
I saw an open-source tool sometime back (I think, here on Hacker News) that backs up your IMAP mails with a nicely done interface. That would be nice to have.
Edit: Perhaps Bichon,[1] mentioned somewhere in the other comment threads[2] was the one.
1. https://github.com/rustmailer/bichon
2. https://news.ycombinator.com/item?id=46429250
An alternative idea is to properly parse it and store it in smaller mbox files, such as one file per month, with the idea that any month in the past usually will not change. (And if it changes because they are storing frequently changing attributes in a faux header, like an atime, then maybe strip that header.) Then your incremental `restic` backups work fine, and you can also use it easily with a variety of mail programs (MUAs, impromptu IMAP servers for migration, quick text editor, etc.).
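A minimal sketch of the per-month split, again with Python's standard `mailbox` module. It groups messages by the `Date` header and keeps the source order within each month. The function names and the `YYYY-MM.mbox` naming are assumptions. It does not strip any volatile headers, and it expects an empty output directory, since re-running it would append duplicates.

```python
import mailbox
from email.utils import parsedate_to_datetime
from pathlib import Path


def month_key(msg) -> str:
    """Return 'YYYY-MM' from the Date header, or 'undated' if it cannot be parsed."""
    try:
        return parsedate_to_datetime(msg["Date"]).strftime("%Y-%m")
    except (TypeError, ValueError):
        return "undated"


def split_by_month(src_path: str, out_dir: str) -> dict:
    """Copy each message of src_path into out_dir/YYYY-MM.mbox; return counts per month."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    boxes = {}
    counts = {}
    src = mailbox.mbox(src_path, create=False)
    try:
        for msg in src:
            key = month_key(msg)
            if key not in boxes:
                boxes[key] = mailbox.mbox(str(out / f"{key}.mbox"))
            boxes[key].add(msg)
            counts[key] = counts.get(key, 0) + 1
    finally:
        for box in boxes.values():
            box.flush()
            box.close()
        src.close()
    return counts
```

This keeps one output file open per month, so an archive spanning decades may run into the per-process open-file limit on some systems. Closing each box once its month is finished would avoid that.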
If at some point in the future you need to prove that you received a given message, having the signatures (eg: DKIM) intact makes all the difference.
https://github.com/gaubert/gmvault
There's no reason to go through Takeout when IMAP exists.
This seems to be the easiest and most straightforward way for me.
[0] https://github.com/GAM-team/got-your-back
Apply a label to emails dated after the last backup, using an “after:YYYY-MM-DD” search. Takeout then offers the option to export only that label. I do an annual backup so the amount of manual effort here is acceptable.
The resulting artifact is a single .zpaq file on disk. This file is only ever appended to, never overwritten, so it plays nice with Restic's own chunked deduplication. Plus it won't flood the filesystem and it suffers less small files overhead than TFA's solution.
Granted I suspect TFA splitting on the e-mail headers may be chunking more efficiently. Though, unless I skimmed the linked GitHub too fast, it looks like TFA's solution also doesn't use any solid compression to exploit redundancy across chunks. And I trust zpaq as a general purpose tool more than a one-off just for a single use case. The code does look clean, though, nice work.
[0] Average fragment size is 1024*2^N. If most of the data is attachments that don't change, you can probably use a higher `-fragment N` to have less overhead keeping track of hashes. `-method 3` is a good middle ground for backups. `-m5` gets crazy high compression ratios, but also crazy slow speed. Old versions of ingested files are shadowed by default; use `-all` when you want to list/extract them.
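To make the 1024*2^N formula concrete, this small Python snippet prints the average fragment size for a few values of N. It only evaluates the formula quoted above and does not call zpaq. The helper name is made up.

```python
def average_fragment_bytes(n: int) -> int:
    """Average fragment size in bytes for `-fragment N`, per the 1024*2^N formula."""
    return 1024 * 2 ** n


if __name__ == "__main__":
    for n in (0, 2, 4, 6, 8, 10):
        print(f"-fragment {n:>2}: {average_fragment_bytes(n) // 1024:>5} KiB")
```

For scale, N = 10 already gives 1 MiB fragments, while N = 0 gives 1 KiB fragments, small enough to separate individual plain-text e-mails at the cost of tracking many more hashes.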
0: https://imapsync.lamiral.info/FAQ.d/FAQ.Gmail.txt
[0] https://github.com/jstedfast/gmime
[1] https://github.com/jstedfast/MimeKit