ML on Apple ][+
Posted 3 months ago · Active 3 months ago
mdcramer.github.io · Tech · story
Tone: calm, positive · Debate: 40/100
Key topics
Machine Learning
Retro Computing
Apple II
The article implements k-means clustering on an Apple II+, sparking nostalgia and discussion about the feasibility and relevance of running machine learning algorithms on vintage hardware.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 2h
Peak period: 22 comments (0-12h)
Avg / period: 7.5
Comment distribution: 30 data points (based on 30 loaded comments)
Key moments
- Story posted: Sep 29, 2025 at 12:12 PM EDT (3 months ago)
- First comment: Sep 29, 2025 at 1:56 PM EDT (2h after posting)
- Peak activity: 22 comments in 0-12h (hottest window of the conversation)
- Latest activity: Oct 6, 2025 at 11:13 AM EDT (3 months ago)
ID: 45415510 · Type: story · Last synced: 11/20/2025, 5:57:30 PM
Diffusion, back propagation, attention, to name a few.
Backprop requires, and is limited to, functions that are analytically differentiable in the usual sense.
Attention is... oh dear, comparing linear regression to attention is like comparing a diesel jet engine to a horse.
(I mean, the pictures look cool and all.)
I.e., did the author want to experiment with older forms of BASIC, or were they trying to learn more about old computers?
Looking back, I want to say it was probably the July 1992 issue of Scientific American that inspired me to write that ( https://www.geos.ed.ac.uk/~mscgis/12-13/s1100074/Holland.pdf ). And as that was '92, this might have been on a Mac rather than an Apple ][+... it was certainly in Pascal (my first class in C was in August '92) and I had access to both at the time (I don't think it was Turbo Pascal on a PC, as this was a summer thing and I didn't have an IBM PC at home at the time). Alas, I remember more about the specifics of the program than I do about what desk I was sitting at.
That's when I learned a very important principle: "When something needs doing quickly, don't force artificial constraints on yourself."
I could have spent three days figuring out how to deal with the memory constraints. But instead I just cut the data in half and gave it two runs. The quick solution was the one that was needed. Kind of an important memory for me that I have thought about quite a bit in the last 30+ years.
https://codeberg.org/DATurner/miranda
> The final accuracy is 90% because 1 of the 10 observations is on the incorrect side of the decision boundary.
Who is using K-means for classification? If you have labels, then a supervised algorithm seems like a more appropriate choice.
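To make the contrast concrete, here is a minimal sketch (toy data and scikit-learn, assumed for illustration and not taken from the article): with labels available, logistic regression fits the decision boundary directly, while k-means never sees the labels and its clusters have to be mapped onto classes after the fact.

```python
# Toy, assumed data (not from the article): two 1-D Gaussian blobs with labels.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = np.concatenate([rng.normal(-2.0, 1.0, 50), rng.normal(2.0, 1.0, 50)]).reshape(-1, 1)
y = np.array([0] * 50 + [1] * 50)

# Supervised: the labels determine where the decision boundary goes.
clf = LogisticRegression().fit(X, y)
print("logistic regression accuracy:", clf.score(X, y))

# Unsupervised: k-means only groups by distance; its cluster IDs are arbitrary,
# so map each cluster to whichever class it mostly overlaps before scoring.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
mapped = km.labels_ if (km.labels_ == y).mean() >= 0.5 else 1 - km.labels_
print("k-means accuracy after mapping clusters to classes:", (mapped == y).mean())
```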
> K-means clustering is a recursive algorithm
It is?
> If we know that the distributions are Gaussian, which is very frequently the case in machine learning
It is?
> we can employ a more powerful algorithm: Expectation Maximization (EM)
K-means is already an instance of the EM algorithm.
> K-means clustering is a recursive algorithm
My bad. It's iterative. I'll fix that. Thanks.
> If we know that the distributions are Gaussian, which is very frequently the case in machine learning
Gaussian distributions are very frequent and important in machine learning because of the Central Limit Theorem but, beyond that, you are correct. While many natural phenomena are approximately normal, the reason for the Gaussian's frequent use is often mathematical convenience. I'll correct my post.
> we can employ a more powerful algorithm: Expectation Maximization (EM)
Excellent point. I will fix that, too. "While k-means is simple, it does not take advantage of our knowledge of the Gaussian nature of the data. If we know that the distributions are at least approximately Gaussian, which is frequently the case, we can employ a more powerful application of the Expectation Maximization (EM) framework (k-means is a specific implementation of centroid-based clustering that uses an iterative approach similar to EM with 'hard' clustering) that takes advantage of this." Thank you for pointing out all of this!
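For anyone following along, a minimal NumPy sketch of that relationship (illustrative only, not the article's Apple II code): each k-means iteration is a hard E-step, assigning every point wholly to its nearest centroid, followed by an M-step that recomputes each centroid as the mean of its assigned points; a full Gaussian-mixture EM would replace the hard assignment with soft responsibilities.

```python
# Illustrative NumPy sketch (not the article's Apple II code): k-means as "hard" EM.
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # "Hard" E-step: each point belongs entirely to its nearest centroid
        # (a Gaussian-mixture EM would compute soft responsibilities here).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dists.argmin(axis=1)
        # M-step: each centroid becomes the mean of the points assigned to it.
        new_centroids = np.array([
            X[assign == j].mean(axis=0) if np.any(assign == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, assign

# Example on made-up 2-D data:
X = np.random.default_rng(1).normal(size=(100, 2))
centroids, labels = kmeans(X, k=3)
print(centroids)
```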
And if it ever became too slow, you could reimplement the slow part in 6502 assembler, which has its own elegance. Great way to learn, glad I came up that way.