How Can AI ID a Cat?
Posted 5 months ago · Active 4 months ago
quantamagazine.org · Research story · High profile
calm · positive
Debate: 20/100
Key topics
Artificial Intelligence
Computer Vision
Machine Learning
The article explains how AI can identify cats in images, and the discussion revolves around the capabilities and limitations of current AI technology, as well as its potential applications and societal implications.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 3d after posting
Peak period: 54 comments (Day 4)
Avg / period: 17.3
Comment distribution: 69 data points
Based on 69 loaded comments
Key moments
1. Story posted: Aug 20, 2025 at 2:36 PM EDT (5 months ago)
2. First comment: Aug 23, 2025 at 5:42 PM EDT (3d after posting)
3. Peak activity: 54 comments in Day 4, the hottest window of the conversation
4. Latest activity: Sep 3, 2025 at 7:49 PM EDT (4 months ago)
ID: 44964800 · Type: story · Last synced: 11/20/2025, 5:39:21 PM
I am struck by how the conceptual framework of classification tasks so snappily renders clear categories out of such fuzziness.
https://cdn.aaai.org/IAAI/2004/IAAI04-019.pdf
It has 490 citations.
DARPA has a whole program named after it: https://www.darpa.mil/research/programs/explainable-artifici...
The real question is whether we can get some insight as to how exactly it's able to do this. For convolutional neural networks it turns out that you can isolate and study the behavior of individual circuits and try to understand what "traditional image processing" function they perform, and that gives some decent intuition: https://distill.pub/2020/circuits/ - CNNs become less mysterious when you break them down into "edge detectors, curve detectors, shape classifiers, etc."
For LLMs it's a bit harder, but Anthropic did some research in this vein.
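To make the "edge detector" framing concrete, here's a minimal sketch (plain numpy, nothing model-specific) of the sliding-window operation a CNN layer performs, using a hand-written Sobel kernel of the kind early layers often end up learning on their own:

    import numpy as np

    def conv2d(image, kernel):
        # Valid-mode sliding window (cross-correlation, which is what
        # CNN libraries actually compute under the name "convolution").
        kh, kw = kernel.shape
        h, w = image.shape
        out = np.zeros((h - kh + 1, w - kw + 1))
        for i in range(out.shape[0]):
            for j in range(out.shape[1]):
                out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
        return out

    # Hand-written Sobel kernel: responds strongly to vertical edges.
    sobel_x = np.array([[-1, 0, 1],
                        [-2, 0, 2],
                        [-1, 0, 1]], dtype=float)

    image = np.zeros((8, 8))
    image[:, 4:] = 1.0             # dark left half, bright right half
    print(conv2d(image, sobel_x))  # non-zero only along the vertical edge

The difference in a trained CNN is simply that the kernel values are learned rather than hand-written; the circuits work linked above is about reading those learned kernels back out.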
Is this even something that's possible with current tech? Like, surely cats have some facial features that can be used to uniquely identify them? It would be cool to have a global database of all cats that users would be able to match their photos against. Imagine taking a picture of a cat you see on the street, and it immediately tells you the owner's details and whether it's missing.
Maybe not with you ;)
[1]: https://tanelpoder.com/posts/catbench-vector-search-query-th...
Tricks include facial alignment + cropping and very strong constraints on orientation to make sure you have a good frontal image (apps will give users photo alignment markers). Otherwise it's a standard visual search. Run a face extraction model to get the crop, warp to standard key points, compute the crop embedding, store it in a database and do a nearest-neighbour lookup.
There are a few startups doing this. Also look at PetFace, a benchmark released a year or so ago. Not a huge amount of work in this area compared to humans, but it's of interest to people like cattle farmers as well.
https://github.com/mapooon/PetFace
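A minimal sketch of the lookup stage described above, assuming the face detector and a real embedding model already exist (the embed stand-in below just normalises pixels, purely for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    # Hypothetical stand-in for a trained pet-face embedding model;
    # a real system would run the aligned crop through a CNN here.
    def embed(aligned_crop):
        vec = aligned_crop.flatten().astype(float)
        return vec / (np.linalg.norm(vec) + 1e-9)  # unit-normalise

    # Toy "database": one embedding per known cat (fake 32x32 crops).
    known_cats = {name: rng.random((32, 32)) for name in ["mittens", "felix"]}
    db = {name: embed(crop) for name, crop in known_cats.items()}

    def lookup(query_crop, db, threshold=0.8):
        q = embed(query_crop)
        # On unit vectors, cosine similarity is just a dot product.
        name, ref = max(db.items(), key=lambda kv: float(q @ kv[1]))
        return name if float(q @ ref) >= threshold else None

    print(lookup(known_cats["felix"], db))  # -> "felix"

The threshold is what separates "nearest match" from "confident identification"; tuning it is where the strong alignment constraints mentioned above pay off.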
I'm impressed that it can do as well as it does; I just find that amusing.
Metadata is 2015, photo is 1960
There would be hints that a photo is from the '60s vs. the 2010s; a human would be able to tell in many cases even without other context.
That is exactly the use case AI is meant to excel at: something that is arguably hard to do algorithmically but possible for an ML model.
Anyway, it’s 40 years later and I just read this article and said, “Oh! Now I get it.” A little too late, for Dr. Hippe’s class.
That said, some people vehemently argue that it’s abus{e,ive} to let cats wander the neighborhood, so thank you for not trying to tell others what to do. It’s become so common that I’m braced for it every time this topic comes up.
If you want something mostly premade, go get an Autoslide. If you want to do it completely from scratch:
1. RFID/Bluetooth proximity is much easier to work with than camera + RPi + AI. For the use case you are talking about, AI is not just overkill but will make it actively harder to achieve your goal (a sketch of the RFID approach follows below).
2. Locking is pretty easy depending on the motor mechanism: either a cheap relayed magnetic lock, or simply a motor that can't be backdriven easily.
Motor-wise, you can either use the rack-and-pinion style that Autoslide does, or a simple linear motor if you don't want to deal with gear tracks.
Overall, I went the Autoslide route and had it all set up and working in an hour or two.
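For the RFID route in point 1, here's a minimal sketch of the control loop, assuming a cheap UART RFID reader on /dev/ttyUSB0 and a relay-driven lock on a GPIO pin (the tag ID, pin number, and 12-byte frame format are all placeholders for whatever your hardware actually uses):

    import time
    import serial            # pyserial, for a basic UART RFID reader
    import RPi.GPIO as GPIO

    RELAY_PIN = 17                      # placeholder wiring
    KNOWN_TAGS = {b"0D00277FB1C4"}      # your cat's collar tag ID here

    GPIO.setmode(GPIO.BCM)
    GPIO.setup(RELAY_PIN, GPIO.OUT, initial=GPIO.LOW)

    reader = serial.Serial("/dev/ttyUSB0", 9600, timeout=1)

    while True:
        tag = reader.read(12).strip()   # many cheap readers emit 12-byte IDs
        if tag in KNOWN_TAGS:
            GPIO.output(RELAY_PIN, GPIO.HIGH)  # energise relay: unlock
            time.sleep(8)                      # hold the door open briefly
            GPIO.output(RELAY_PIN, GPIO.LOW)   # relock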
It could have been. It did happen in some cases, as computer vision didn't wait for neural networks (e.g. OCR). But to hijack a famous quote: "Neural networks are like violence - if it doesn't solve your problems, you are not using enough of it."
> A neuron with two inputs has three parameters. Two of them, called weights, determine how much each input affects the output. The third parameter, called the bias, determines the neuron’s overall preference for putting out 0 or 1.
So a neuron does very basic polynomial interpolation, and by hooking them together you get polynomial regression. I don't know if it's amusing or amazing that people use polynomial regression to write programs now.
The article glosses over activation functions, which - if non-polynomial - give the entire network its non-linearity. A major inflection point was proving that neural network architectures with very few layers (as few as one hidden layer) can approximate any continuous function.
https://en.m.wikipedia.org/wiki/Universal_approximation_theo...
https://en.m.wikipedia.org/wiki/Classification_of_discontinu...
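The quoted description of a neuron maps directly onto a few lines of code. Here's a sketch of a single two-input neuron with a sigmoid activation (matching the article's 0-to-1 framing; ReLU is the more common modern choice, and either is the non-polynomial activation that makes the universal-approximation result go through):

    import numpy as np

    def neuron(x, w, b):
        # Weighted sum plus bias, squashed by a sigmoid so the
        # output lands between 0 and 1.
        return 1.0 / (1.0 + np.exp(-(np.dot(w, x) + b)))

    x = np.array([0.8, 0.2])   # two inputs
    w = np.array([1.5, -2.0])  # two weights: how much each input matters
    b = 0.5                    # bias: the neuron's default lean toward 1

    print(neuron(x, w, b))     # ~0.79 for these toy numbers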
It's not all just math. Real people are what make this work.
[1] https://www.theverge.com/features/23764584/ai-artificial-int...
So, common ground with a lot of the Hacker News audience?
Don't take me too seriously here, and not to excuse anything, but what would these people be doing if they weren't data labeling? How would they be treated differently?
Presumably, they'd be working for some other multinational, because overall their quality of life is better than it would be working in whatever other local industry exists?
The data labeling job itself strikes me as something dystopian. As if we're the work mules for our AI overlords.
That was not my intent. I probably worded my thoughts poorly. Indeed, though I am far more advantaged than those data laborers I’m feeling a bit exploited myself lately.
> Are you suggesting that if something is common across the globe then we shouldn't complain about it?
No. I guess the crux of my comment is: what would they be doing for income otherwise? And which would they choose, given the fact that they're being exploited?
"I am, somehow, less interested in the weight and convolutions of Einstein's brain than in the near certainty that people of equal talent have lived and died in cotton fields and sweatshops"
I think you are giving too much credit to billion-dollar companies that really just want to milk as much labor from poor countries as they can.
It doesn't follow from my saying that you can find people without important work in low-GDP countries that there is no important work there. If you've ever been in one, though, there are always people whose job seems to be "watching a single goat" and who would be much better off training AIs to identify cats.
Just as a college student might work at McDs to get by, the same could apply here. Cost of living is not equal.
I'm not siding with either of you, to be clear, just offering a different perspective. I feel both points are valid, and without more information both are also irrefutable.
My favorite work on digging into the models to explain this is Golden Gate Claude [0]. Basically, the folks at Anthropic went digging into the many-level, many-parameter model and found the neurons associated with the Golden Gate Bridge. Dialing it up to 11 made Claude bring up the bridge in response to literally everything.
I'm super curious to see how much of this "intuitive" model of neural networks can be backed out effectively, and what that does to how we use it.
[0] https://www.anthropic.com/news/golden-gate-claude
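Anthropic's actual technique finds feature directions with sparse autoencoders inside the live model; the numpy toy below only illustrates the core move of "dialing a feature up": clamping an activation vector's component along a chosen direction to a large value while leaving everything orthogonal to it alone.

    import numpy as np

    rng = np.random.default_rng(0)

    hidden = rng.normal(size=256)        # stand-in activation vector
    feature = rng.normal(size=256)       # stand-in "Golden Gate" direction
    feature /= np.linalg.norm(feature)   # unit length

    def steer(hidden, direction, strength):
        # Replace the activation's component along `direction` with
        # `strength`, leaving the orthogonal components untouched.
        current = hidden @ direction
        return hidden + (strength - current) * direction

    steered = steer(hidden, feature, strength=11.0)  # dial it up to 11
    print(steered @ feature)  # -> 11.0: the feature now dominates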
I'm not an expert on neural networks, but from what I've heard, current systems can only be trained to be really good at doing the former.
I once had a tabby cat. When it ran away, I put up posters with a picture and a description. I got several calls about cats in the neighbourhood that had the same tabby colour scheme (recognition), and from a distance they indeed looked the same. But close up, they each had a different eye colour, nose colour, or length of the white "socks" on their paws (authentication).
To do the second step, the system would need to be trained not just on raw pixel data but also on which features to look for to distinguish one cat from another. I think that current systems could be brute-forced into doing this, somewhat, by also training on negative examples ... but I feel like that is suboptimal.
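For what it's worth, the standard way to operationalise "training on negative examples" for individual identification is a metric-learning objective such as a triplet loss: pull two photos of the same cat together in embedding space, push a lookalike away. A sketch (one common approach, not necessarily what any given system uses):

    import numpy as np

    def triplet_loss(anchor, positive, negative, margin=0.2):
        # Zero only once the same-cat photo sits closer to the anchor
        # than the other cat does, by at least `margin`.
        d_pos = np.linalg.norm(anchor - positive)
        d_neg = np.linalg.norm(anchor - negative)
        return max(0.0, d_pos - d_neg + margin)

    # Toy embeddings: eye colour, nose colour, sock length, say.
    my_cat_a  = np.array([0.90, 0.10, 0.30])  # my tabby, photo 1
    my_cat_b  = np.array([0.85, 0.12, 0.28])  # my tabby, photo 2
    lookalike = np.array([0.90, 0.40, 0.70])  # same coat, different cat

    print(triplet_loss(my_cat_a, my_cat_b, lookalike))  # -> 0.0 (separable)

Trained this way, the embedding is forced to pick up exactly the fine features you describe (eye colour, nose, socks) rather than the coarse tabby pattern.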
Seems extremely prescient…