Z80-μLM
It's just one-shot AI slop - literally, the prompt was 'make a web based version of [github url of this project]' and it spat this out. It appears to work fine.
I'll keep it up for a couple of months and then it'll be auto-deleted, no sense in keeping it around longer than that.
Speaking of - I remember my first digital camera (a Fujitsu 1-megapixel model using SmartMedia)… it used so much power that you could take 20-30 photos and then needed to replace all 4 batteries lol
It could with a network this small. More generally this falls under "interpretability."
“Planting Undetectable Backdoors in Machine Learning Models”
“ … On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. …”
https://i.imgur.com/6TRe1NE.png
Thank you for posting!
I developed a browser-based CP/M emulator & IDE: https://lockboot.github.io/desktop/
I was going to post that, but wanted a 'cool demo' instead, and fell down the rabbit hole.
I wrote a console-based emulator and a simple CP/M text-adventure game somewhat recently
https://github.com/skx/cpmulator/
At some point I should rework my examples/samples to become a decent test-suite for CP/M emulators. There are so many subtle differences out there.
It seems I could even upload a zipfile of my game, but the escape-codes for clearing the screen don't work, sadly:
From what I remember of the TV show, most of what he investigates/talks about is indeed path dependence in one way or another, although not everything was like that.
imgur was created as a sort of protest against how terrible most image hosting platforms were back then; it went down the drain several years later, and now it's just like they were.
Even with modern supercomputing the computation would be outpaced by the heat death of the universe, so token output must be limited to a single integer.
Biggest pain point is likely the text input.
Have you experimented with having it less quantized, and evaluated the quality drop?
Regardless, very cool project.
It depends on the model, but from my experiments (quantizing one layer of a model to 2-bit and then training the model with that layer in 2-bit to fix the damage), the first layer is the most sensitive, and yes, the last layer is sensitive too. The middle layers tolerate quantization the best.
Different components of a layer also have a different sensitivity; e.g. the MLP downscale block damages the model the most when quantized, while quantizing the Q projection in self attention damages the model the least.
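Not the parent's exact code, but a minimal sketch of that kind of experiment (assuming PyTorch; the layer names are made up). It only measures the damage from quantizing a single tensor; the retraining step the parent describes is omitted.

    # Sketch only: symmetric per-row 2-bit quantization of one weight tensor,
    # plus a per-layer comparison of the reconstruction error it introduces.
    import torch

    def quantize_2bit(w):
        # 2-bit signed levels {-2, -1, 0, 1}, one scale per output row
        scale = w.abs().amax(dim=1, keepdim=True) / 2.0 + 1e-12
        return torch.clamp(torch.round(w / scale), -2, 1) * scale

    def damage(w):
        # relative error from quantizing just this tensor
        return (torch.linalg.norm(w - quantize_2bit(w)) / torch.linalg.norm(w)).item()

    # hypothetical layers; random weights stand in for a real checkpoint
    layers = {name: torch.randn(256, 256) for name in
              ["embed", "attn.q_proj", "mlp.down_proj", "lm_head"]}
    for name, w in sorted(layers.items(), key=lambda kv: damage(kv[1])):
        print(f"{name:15s} relative error {damage(w):.3f}")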
The interaction is surprisingly good despite the lack of attention mechanism and the limitation of the "context" to the past three characters.
This could have worked on 60s-era hardware and would have completely changed the world (and science fiction) back then.
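To make the three-character "context" above concrete: without attention, such a model is basically an n-gram-style MLP over a fixed window. A hypothetical sketch (sizes and names invented, not the project's actual network):

    # Illustrative only: an attention-free next-character predictor whose
    # entire context is the previous three characters.
    import torch
    import torch.nn as nn

    VOCAB, EMB, HID, CONTEXT = 64, 8, 32, 3

    class TinyCharMLP(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMB)
            self.net = nn.Sequential(
                nn.Linear(CONTEXT * EMB, HID),
                nn.ReLU(),
                nn.Linear(HID, VOCAB),
            )

        def forward(self, last3):
            # last3: (batch, 3) indices of the three most recent characters;
            # anything older than that simply cannot influence the prediction
            return self.net(self.embed(last3).flatten(1))

    logits = TinyCharMLP()(torch.randint(0, VOCAB, (1, CONTEXT)))
    print(logits.shape)  # (1, 64) scores for the next character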
Tin foil hat on: I think that a huge part of the major buyout of RAM by AI companies is to keep people from realising that we are essentially at the home computer revolution stage of LLMs. I have a 1TB RAM machine which, with custom agents, outperforms all the proprietary models. It's private, secure and won't let me be monetized.
You can buy a kid’s toy that plays 20 questions.
Slack handles video calls and can render anything a web browser can, and it runs an entire App Store of apps.
Including Jira in the conversation doesn’t even make logical sense. Jira has such a wide scope that the word “Jira” doesn’t even describe a single product.
That's a bug, not a feature, and strongly coupled to the root cause of Slack's bloat.
The app ecosystem of Slack is largely responsible for its success.
Is that true? Slack was one of the first private chats that was not painful to use, circa 2015. I personally hate the integrations and wish they'd just fix the bugs in their core product.
By itself, I would agree.
However, in this metaphor, concrete got 15x cheaper in the same timeframe. Not enough to fully compensate for the difference, but enough that a whole generation are now used to much larger edifices.
(At this point the analogy breaks down, because the cost of slower software is paid in users' time, not in the taxes of a government buying a bridge from a civil engineer…)
The word processors of 30 years ago often had limits like “50k chapters” and required “master documents” for anything larger. Lotus 1-2-3 had far fewer columns and rows than modern Excel.
Not an excuse, of course, but the older tools are not usable anymore if you have modern expectations.
You bring up apps like Skype doing more in 2005, but Skype was barely out of its public alpha by then.
And you keep bringing up things that are bad about Slack that are basically non-existent boogeymen. UI stutter, memory, load time: I can’t think of any time any of these has impacted my experience on Slack. And you really believe the original Skype app didn’t have a start-up time?
MSN Messenger and the original Skype didn’t actually do the things that Slack does now. I mean specifically multiple simultaneous screen shares plus annotations plus HD video feeds (with important features like blurred and replaced backgrounds, added by Skype in 2019) for all participants plus running an entire productivity app in the background at the same time.
The latency and stuttering and crashing and buffering and hard drives seizing and malware of the past has been erased from the rose tinted nostalgic memories of the past.
Memory is a game of telephone with itself, and I don’t trust your recollection.
Nobody cares that Slack uses RAM or whatnot. It performs well and actions respond quickly enough. Much quicker than a lot of its competition: Slack huddles are an extremely slick experience.
The 4th Gen iPod touch had 256 meg of RAM and also did those things, with video calling via FaceTime. Well, except "cross platform", what with it being the platform.
The entire operating system of the phone was more powerful, and ran on less.
Showing me that a proof of concept black and white <10FPS group video call with no other accompanying software was possible in the 90s is pointless.
I’d also like you to show me a laptop SKU sold in the last 10 years that is incapable of running Slack.
Finally, I’ll remind you that Slack for mobile is a different application that isn’t running in the same way as the desktop app. The latest version of it will run on very old phone hardware, going all the way back to the iPhone 8 (2GB RAM), and that’s assuming you even need the latest version for it to function.
1 GHz processor, 512 MB RAM (might even manage 256 MB), 1080p monitor, and "a graphics accelerator" and "a sound card".
> and link me to an example program that has 100% feature parity that stays within those specs?
Windows 2000. Or XP.
That's the point. The OS supports all the apps needed to do whatever.
Making Slack into a monolithic blob to do all is just an example of the inner platform effect.
But if you insist: IE 7 would have been able to do all this. It's an app. It's also an example of the inner platform effect.
> Showing me a black and white <10FPS group video call with no other accompanying software running simultaneously in the 90s is pointless.
You should've thought of that before trying to "well akshually" me about which versions of FaceTime support multi-user video calling.
You want video calling? We had that 30 years ago on systems with total RAM smaller than current CPU cache, with internal busses whose bandwidth was less than your mobile's 5G signal, on screens smaller than the icon that has to be submitted to the App Store, with cameras roughly comparable to what we now use for optical mice, running over networks that were MacGyvered onto physical circuits intended for a single analogue voice signal.
Out of everything you list that Slack can do, the only thing that should even be remotely taxing is the video calling. Nothing else, at all. And the only reasons for even that to be taxing is correctly offloading work to the GPU and that you want HD.
Meanwhile I can play back multiple 1080p videos on different monitors, run a high-speed curl download, saturate my gigabit LAN with a bulk transfer, and run a btrfs scrub in the background, all most likely without exceeding 2 GB of RAM usage.
The only daily application I run that consumes a noticeable quantity of resources is my web browser.
This argument is just so endless and tiring.
Saturating my bandwidth or running a btrfs scrub isn’t accomplishing the business logic I need to do my job; that’s what my web browser is doing.
People making excuses for poorly designed software is what's tiring.
This means that a directly translated 40 KB Z80 executable might be a tight squeeze on that mainframe, because 40K > 32K, counting words, not bytes. Of course if most of that size is just 2-bit weight data then it might not be so bad.
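Rough numbers for the "might not be so bad" part (back-of-envelope Python; the word size is a stand-in, since the machine isn't specified here):

    # Back-of-envelope only; the 36-bit word size is hypothetical, not from the thread.
    image_bytes = 40 * 1024           # 40 KB Z80 executable
    naive_words = image_bytes         # one byte per word if translated 1:1 -> 40960 > 32768
    weights = image_bytes * 8 // 2    # if most of the image is packed 2-bit weights: 163840
    word_bits = 36                    # stand-in word size for illustration
    packed_words = (weights * 2 + word_bits - 1) // word_bits  # about 9.1K words repacked densely
    print(naive_words, packed_words)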
ELIZA running on later hardware would have been a different story, with the Z80 - released in 1976 - being an example.
Ultimately, if you can build an ultra-tiny model that can talk and learn on the fly, you've just made a personal assistant like Siri fully local.
Not exactly "minimal viable", but a "what if RNNs were good for LLMs" case study.
-> insanely fast on CPUs
Edit: The fact this runs on a smartphone means it is highly relevant. My only question is, how do we give such a model an "unlimited" context window, so it can digest as much as it needs? I know some models know multiple languages; I wouldn't be surprised if sticking to only English would reduce the model size / need for more hardware and make it even smaller / tighter.
I doubt it would be able to make good use of a large context window, though.
Quake 3 is probably the last game where you would expect a chatbot, as there are few games where storytelling matters less, and it is a little-known feature; but Quake 3 bots can react to what you say in the chat, in addition to the usual taunts.
But that's the thing: Quake 3 can do it because it is inconsequential. In a story-driven game like an RPG, NPCs have a well-defined spot in the story and gameplay; they tell you exactly what you need to know, so as not to disrupt the flow of the story. Tell you too much, and they spoil the big reveal; tell you too little, and you don't know what to do; tell you irrelevant details, and you get lost chasing them. It has to be concise and to the point, so that those who don't really care know what to do to advance the story, but with enough flavor to make the world feel alive. It is really hard to find the right balance, and if, in addition, you have to incorporate a chatbot, it borders on impossible.
It looks like a good idea on the surface, but it most likely isn't, unless it is clearly not part of the main gameplay loop, as in Quake 3.
Some people have had some success using a (big) LLM as a DM in D&D, which I think is easier since it can make up the story as it advances; it is much harder to make up game elements in a computer RPG that are not programmed in.
I tried on a cycle-accurate emulator of a TRS-80 Model I with Omikron CP/M mapper. Most Z-80 machines of the time were 4 MHz, but the TRS-80 was only 1.77 MHz.
1. Type "GUESS", get question prompt.
2. User types: "Are you an animal?", ENTER key
3. Wait 25 seconds
4. Program prints "N"
5. Wait 20 seconds
6. Program prints "O"
7. Wait 23 seconds
8. Program prints linefeed, returns to question prompt
Total time to return 2-char answer to user's question: 1 min 9 sec or so. I bet a longer answer would take proportionally longer.
"The wonder isn't that it does it well, it's a wonder it does it at all."
I think I can do a little bit better; maybe 10% faster.
A web version would also be cool.