iNaturalist Keeps Full Species Classification Models Private
Source: github.com
Key topics: Open Data, Machine Learning, Biodiversity, iNaturalist
The iNaturalist project keeps its full species classification models private, sparking community debate about open data and intellectual property, along with discussion of the project's benefits and potential improvements.
Snapshot generated from the HN discussion
Discussion activity (based on 25 loaded comments)
- Story posted: Sep 2, 2025 at 3:32 PM EDT
- First comment: Sep 2, 2025 at 3:32 PM EDT (1s after posting)
- Peak activity: 17 comments in the 0-2h window
- Average: 5 comments per period
- Latest activity: Sep 3, 2025 at 10:40 AM EDT
ID: 45107939 · Type: story · Last synced: 11/20/2025, 2:38:27 PM
Shame! IMHO open data input should yield open data output. The community contributes far too much time, data, expertise and money to tolerate this kind of BS, which raises questions about fundamental compatibility with science.
iNaturalist should remove non-open data and commit to fully open output within a fixed period of time to maintain community support.
If they feel like keeping the models to themselves, I think it's fair game. I give them observations, they give me the ID service for free. Maybe they even sell the models to fund their development efforts? I wouldn't mind... they need to fund their operations somehow anyway.
And remember, their observation databases are open. In fact my observations are automatically copied to the databases of a national biodiversity institution (which is open as well, except for some critical species).
Institutions need to maintain themselves and be able to pay their employees so that they can feed their kids, etc.
That's probably not a sustainable situation.
IMHO funds received by well-run non-profits will be banked, not spent, therefore they yield ongoing returns which are used to meet costs and sustain the organization. The fund origin is immaterial.
Meanwhile, the whole idea of iNaturalist has evolved around voluntary reporting, community involvement, and open data, and I think some of that needs to stay. They can't turn fully commercial.
If in fact what you said is true about the sources of funding, it would then seem that the US taxpayers (the relevant party here) are footing a large part of the bill from direct and indirect subsidies. I feel that it can be reasonably argued that a non-profit organization that is benefiting from significant public subsidy should make their model available for public use.
Especially selling identification services, which is related to keeping the models private, would make sense. Museums and various kinds of biodiversity monitoring schemes need mass identification, and having AI there to partially replace people would be a cost saving for the researchers and potential funding for iNaturalist. Offering such a service for free is neither practical nor justified.
(Meanwhile, I can imagine there are lots of naturalists who hate the idea of their services being partially replaced by AI. It may lower the quality, but the cost margin between a human and an iNat model is really wide.)
I think the EU had a plan to use AI identification in some of its monitoring schemes. It could have been iNaturalist or someone else; either way, it demonstrates the need.
They're a scientific 501(c)(3), not a FOSS 501(c)(3), right? It seems like their mission should be to support scientific progress, and sometimes that means using data that is encumbered with IP baggage. It seems like it would be against their mission (and borderline a violation of tax law) to take a stance on IP law... that isn't what they do.
This aligns with the suggestion to commit to fully open data and fully open models.
Using scientific data that they can use to do science with but they can't share is 100% legit.
IMHO it's very hard to argue that something is in the public interest if the public can't see it, hold it, analyze it, criticize it, and replicate it: particularly in the field of science where we have a replication crisis.
If it's a black-box service, it's not science.
If it's replicable and open, thus provable, it's science.
There is no requirement that a 501(c)(3) post everything publicly.
I completely understand and agree that sharing science is a good thing... but it is also dumb to suggest that scientists must put their head in the sand and ignore data that just happens to be under copyright. And just because it is, doesn't mean that it can't be reviewed -- it means it can't be redistributed.
I mean, for heaven's sake, every science textbook I ever read in school was encumbered by copyright. That doesn't mean we should burn science textbooks or that the data in them is subject to some replication crisis.
I think you're building a mountain out of a molehill here.
My desire was to combine something like iNaturalist with BirdWeather for a bird tracker using both audio and visual data. BirdWeather does make it free, which is great, but there's no great free API of iNaturalist quality for diverse bird tracking.
That being said, I am certain that if iNaturalist made their model public, tons of competing apps would spring up and it would be commercialized immediately, regardless of license, taking people away from iNaturalist without giving iNaturalist anything in return.
Plus I know iNaturalist doesn't want auto-labeled data uploaded as matched. They only want manually labeled data, and I'm sure opening the API would flood their servers with ML-labeled data. On the one hand that could be useful, but it would also be a ton of noise.
I'm in favor of whatever option is most in line with the long-term success of a free, high-quality plant/animal identification app, and I don't know enough to take a definitive stance on that. Unfortunately, those who do probably have a vested interest in one of the outcomes.
I wouldn't mind these groups keeping their models private, except that their success sucks all the air out of the room when it comes to developing fully open models. The vast majority of users are satisfied with the app or API, so if you aren't, you're going to be going it alone. (Of course a for-profit company could have the same effect, but it feels extra bad when it's a non-profit/government agency doing it.)