Perkeep – Personal storage system for life
Mood: thoughtful
Sentiment: positive
Category: tech
Key topics: personal data storage, open-source, data management
Perkeep is a personal storage system for life that allows users to store and manage their data in a secure and decentralized manner. The project has garnered significant interest and discussion on HN.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 49m after posting
Peak period: 68 comments (Day 1)
Avg / period: 36.5
Based on 73 loaded comments
Key moments
- Story posted: 11/12/2025, 3:34:32 AM (7d ago)
- First comment: 11/12/2025, 4:23:53 AM (49m after posting)
- Peak activity: 68 comments in Day 1 (hottest window of the conversation)
- Latest activity: 11/13/2025, 3:26:54 PM (5d ago)
I’ve never looked at it before but this seems pretty solid, definitely worth keeping an eye on or testing.
But it really is nearly abandoned, and outside of the happy-path the primary author uses it for, it's desolate. There is no community around growing its usage, and pull requests have sat around for months before the maintainer replies. Which is fine if that's what the author wants (he's quite busy!), but disappointing to potential adopters. I've looked at using it, but with data types that sit outside the author's use case, and you'd really need to fork it and change code all over the repo to effectively use it. It just never hit the ideal of "store everything" it promises when it has hard-coded data types for indexing and system support.
(and yes, I did look at forking it and creating my own indexer, but some things just aren't meant to be)
I just added support for perkeep in Filestash last week (https://github.com/mickael-kerjean/filestash)
A permanent private data store needs to have straightforward ways to get that data into it, and then search and consume it again once there.
Kinda sad, as this looks interesting.
How is this better than a filesystem with automated replication?
If I take a bunch of photos, those don’t have filenames (or not good ones, and not unique). They just exist. They don’t need a directory or a name.
So how are you supposed to find anything? Sure, I take photos. Most of them aren't needed after they serve their immediate purpose, but I can't be bothered to delete them, or sort or name the ones that do have a longer purpose. But at least they are organized automatically by date. For permanence, OwnCloud archives them for me automatically, from where they get sucked into my regular backups.
Why would I want to toss them all into an even less-organized pile?
[run] search queries over my higher-level objects. e.g. a “recent” directory of recent photos
How, exactly, are those search queries supposed to work? Sure, maybe date is retained in meta-info, but at best he is regaining the functionality he lost by tossing those pictures into a pile. If he is expecting actual image recognition, that could work anyway, without the pile.
It would be nice if we were a bit more in control. At least, it would be nice if we had a reliable backup of all our content. Once we have all our content, it’s then nice to search it, view it, and directly serve it or share it out to others
Sure, and that's exactly what you achieve with OwnCloud (or NextCloud, or whatever).
As for reliable backups, that's a completely different issue, which still has to be solved separately. You have got to periodically copy your data to offline storage, or you don't have real backups.
Seriously, I'm just not seeing it...
They don't mean the photos can't have names. They just observe that in-camera photos usually get names that aren't particularly useful, like IMG_4321.JPG, the same as every other IMG_4321.JPG your camera has produced and will produce if it sees enough use.
Also, that storage doesn't address a blob (or photo) by its name, but by its hash / digest. You are welcome to store photo metadata with the hashes, and perhaps even a good name if you care for one, in a database, on web pages, or whatever you use, if that makes it easier for you to retrieve the right photo. Probably you should.
Content object storage and retrieval (cumbersome objects) is then separate from issues of remembering what is what (small data).
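To make the split between blobs and names concrete, here is a minimal sketch of content-addressed storage. It is an illustration only: the `BlobStore` class, the `metadata` list, and the `sha256-` prefix are assumptions made for this example, not Perkeep's actual API or blob reference format.

```python
import hashlib
from typing import Optional


class BlobStore:
    """Toy in-memory content-addressed store (hypothetical, not Perkeep's API)."""

    def __init__(self):
        self._blobs = {}    # digest ref -> raw bytes (the cumbersome objects)
        self.metadata = []  # small records that point at blobs by ref

    def put(self, data: bytes, name: Optional[str] = None) -> str:
        # The address is derived from the content, never from the file name.
        ref = "sha256-" + hashlib.sha256(data).hexdigest()
        self._blobs[ref] = data  # identical content always lands on the same key
        if name is not None:
            self.metadata.append({"name": name, "ref": ref})
        return ref

    def get(self, ref: str) -> bytes:
        return self._blobs[ref]

    def __len__(self) -> int:
        return len(self._blobs)
```

Two different photos that both left the camera as IMG_4321.JPG get different refs here, while uploading the same bytes twice stores them only once. The name lives in the small metadata record and can repeat or change without touching the blob.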
Concretely, you could search by metadata (timestamp, geotag, which camera device, etc.) or by content (all photos of Joe, or photos with the Eiffel Tower in the background, or boating at dusk...). For the latter, you just need to process your corpus with a vision language model (VLM) and generate embeddings. Btw, this is not outlandish; there are already photo apps with this capability if you search a bit online.
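As a rough sketch of those two search modes, the following filters a small in-memory index by metadata and ranks it by embedding similarity. Everything in it is assumed for illustration: the `Photo` record, the function names, and the three-dimensional vectors, which are hand-written placeholders standing in for what a vision model would produce.

```python
import math
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Photo:
    ref: str          # content digest or any stable ID (hypothetical)
    taken: datetime
    camera: str
    embedding: list   # placeholder vector; a real one would come from a vision model


def by_metadata(photos, camera=None, since=None):
    """Filter on the small, structured data: device and timestamp."""
    return [p for p in photos
            if (camera is None or p.camera == camera)
            and (since is None or p.taken >= since)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def by_content(photos, query_embedding, top_k=1):
    """Rank photos by similarity between their embedding and the query's."""
    ranked = sorted(photos,
                    key=lambda p: cosine(p.embedding, query_embedding),
                    reverse=True)
    return ranked[:top_k]


PHOTOS = [
    Photo("blob-a", datetime(2025, 6, 1), "phone", [0.9, 0.1, 0.0]),
    Photo("blob-b", datetime(2025, 11, 2), "phone", [0.1, 0.9, 0.0]),
    Photo("blob-c", datetime(2025, 11, 5), "dslr", [0.0, 0.2, 0.9]),
]
```

A "recent photos" view is then just `by_metadata(PHOTOS, since=...)`, and a content query is `by_content(PHOTOS, query_vector)`, where the query vector would come from embedding the search text with the same model.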
At least all the photos I take have a date and place attached to them. That is usually all the info I need to find them.
Although, to be fair, running it in Docker seems less fraught and breaks less often (and it's a lot easier to restart when it does break.)
(I've got a pipeline from Instapaper -> {IFTTT -> {Pinboard -> Linkhut, Dropbox, Webhook -> ArchiveBox}} which works well most of the time for archiving random pages. Used to be Pocket until Mozilla decided to be evil.)
Will try karakeep.
Edit: What's the best way to support the project? I'm seeing there's an option through the Mozilla store and through GitHub. Is there a preference?
I would look into what happened with the single file copies you made that didn't work because that is highly unusual.
I've been using an extension called WebScrapBook to locally save copies of interesting webpages. I use the basic functionality, but it comes with tons of options and settings.
For pages with dynamic content that can't be trivially reproduced from their HTTP streams (e.g., opening the archive triggers GETs with a mismatched timestamp, even if the file it's looking for is in the WARC under a different URI), there's always SingleFile [1] and Chromium's built-in MHTML Ctrl+S export, which "bake" the content into a static page.
0: https://chromewebstore.google.com/detail/webrecorder-archive...
Keep Your Stuff, for Life - https://news.ycombinator.com/item?id=23676350 - June 2020 (109 comments)
Perkeep: personal storage system for life - https://news.ycombinator.com/item?id=18008240 - Sept 2018 (62 comments)
Perkeep – Open-source data modeling, storing, search, sharing and synchronizing - https://news.ycombinator.com/item?id=15928685 - Dec 2017 (105 comments)
However, I regret this decision. Git-Annex is not usable anymore on my data because the number of files has grown so much (millions) and Git-Annex is just too slow (some Git operations take anywhere from minutes to hours, and the FS is decently fast). I assume I would not have had those problems with Perkeep.
I've been thinking about building a similar application for a while now, and you gave me some great ideas.
Will try it out today.
This is strange in the sense that (a) didn't stop the Linux kernel from becoming more popular - if the tool satisfies the itch, use it, otherwise not. And the lack of releases could be fine if the bugs reported are minor.
Is the tool robust (no data loss)?
What has stopped other folks on here from e.g. writing more importers (if that is the main shortcoming)?
edit: typo corrected
I think this is a strange comparison. "I'm going to use this system to store all my digital stuff, and it's 1991" is altogether different from "I'm going to use this system to store all my digital stuff, and it's 2025".
Beyond that though, I'm thinking this would be nice for syncing state for a cross-platform app whose multiple instances need to stay reasonably in sync wherever they run. I'd just need to create a Perkeep client library for the language it's in (Python).
Indeed, big fan of the idea of Perkeep, and its authors (I learned a lot about writing network code in Go from reading Brad Fitzpatrick's contributions.)
Where Perkeep uses a super cool blob server design that abstracts the underlying storage, Timelinize keeps things simpler by just using regular files on disk and a sqlite DB for the index (and to contain small text items, so as not to litter your file system).
Perkeep's storage architecture is probably better thought out. Timelinize's is still developing, but I think in principle I prefer to keep it simple.
I'm also hoping that, with time, Timelinize will be more accessible to a broader, less-technical audience.
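The simpler layout described above (regular files on disk, a SQLite index, small text items kept inline) could look roughly like the sketch below. The table schema, the `INLINE_LIMIT` cutoff, and the function names are assumptions invented for this illustration; they are not Timelinize's actual schema or code.

```python
import os
import sqlite3
import tempfile

INLINE_LIMIT = 256  # bytes; hypothetical cutoff for "small text items"


def open_index(path=":memory:"):
    """Open (or create) the SQLite index that describes every stored item."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS items ("
        " id INTEGER PRIMARY KEY,"
        " kind TEXT NOT NULL,"
        " taken_at TEXT,"
        " inline_text TEXT,"   # set for small text items
        " file_path TEXT)"     # set for everything stored as a regular file
    )
    return db


def add_item(db, data_dir, kind, taken_at, payload: bytes, filename):
    """Keep small text in the DB row; write everything else as a plain file."""
    if kind == "text" and len(payload) <= INLINE_LIMIT:
        row = (kind, taken_at, payload.decode("utf-8"), None)
    else:
        path = os.path.join(data_dir, filename)
        with open(path, "wb") as f:
            f.write(payload)
        row = (kind, taken_at, None, path)
    cur = db.execute(
        "INSERT INTO items (kind, taken_at, inline_text, file_path)"
        " VALUES (?, ?, ?, ?)",
        row,
    )
    db.commit()
    return cur.lastrowid
```

With this shape, a short note never touches the file system, while a photo stays an ordinary file that any other tool can open, and the index only records where it lives.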