Z80-μLM
It's just one-shot AI slop - literally, the prompt was 'make a web based version of [github url of this project]' and it spat this out. It appears to work fine.
I'll keep it up for a couple of months and then it'll be auto-deleted, no sense in keeping it around longer than that.
Speaking of - I remember my first digital camera (a Fujitsu 1-megapixel model using SmartMedia)… it used so much power that you could take 20-30 photos and then needed to replace all 4 batteries lol
It could with a network this small. More generally this falls under "interpretability."
“Planting Undetectable Backdoors in Machine Learning Models”
“ … On the surface, such a backdoored classifier behaves normally, but in reality, the learner maintains a mechanism for changing the classification of any input, with only a slight perturbation. Importantly, without the appropriate "backdoor key", the mechanism is hidden and cannot be detected by any computationally-bounded observer. We demonstrate two frameworks for planting undetectable backdoors, with incomparable guarantees. …”
https://i.imgur.com/6TRe1NE.png
Thank you for posting!
I developed a browser-based CP/M emulator & IDE: https://lockboot.github.io/desktop/
I was going to post that, but wanted a 'cool demo' instead, and fell down the rabbit hole.
I wrote a console-based emulator and a simple CP/M text-adventure game somewhat recently
https://github.com/skx/cpmulator/
At some point I should rework my examples/samples to become a decent test-suite for CP/M emulators. There are so many subtle differences out there.
It seems I could even upload a zipfile of my game, but the escape-codes for clearing the screen don't work, sadly:
From what I remember of the TV show, most of what he investigates/talks about is indeed path dependence in one way or another, although not everything was like that.
imgur was created as a sort of protest against how terrible most image hosting platforms were back then; it went down the drain several years later, and now it's just like they were.
Even with modern supercomputing the computation would be outpaced by the heat death of the universe, so token output must be limited to a single integer.
Biggest pain point is likely the text input.
Have you experimented with having it less quantized, and evaluated the quality drop?
Regardless, very cool project.
It depends on the model, but from my experiments (quantizing one layer of a model to 2-bit and then training the model with that layer in 2-bit to fix the damage), the first layer is the most sensitive, and yes, the last layer is sensitive too. The middle layers tolerate quantization the best.
Different components of a layer also have a different sensitivity; e.g. the MLP downscale block damages the model the most when quantized, while quantizing the Q projection in self attention damages the model the least.
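Not the parent's exact code, but a minimal sketch of that kind of experiment (assuming PyTorch; the layer names are made up). It only measures the damage from quantizing a single tensor; the retraining step the parent describes is omitted.

    # Sketch only: symmetric per-row 2-bit quantization of one weight tensor,
    # plus a per-layer comparison of the reconstruction error it introduces.
    import torch

    def quantize_2bit(w):
        # 2-bit signed levels {-2, -1, 0, 1}, one scale per output row
        scale = w.abs().amax(dim=1, keepdim=True) / 2.0 + 1e-12
        return torch.clamp(torch.round(w / scale), -2, 1) * scale

    def damage(w):
        # relative error from quantizing just this tensor
        return (torch.linalg.norm(w - quantize_2bit(w)) / torch.linalg.norm(w)).item()

    # hypothetical layers; random weights stand in for a real checkpoint
    layers = {name: torch.randn(256, 256) for name in
              ["embed", "attn.q_proj", "mlp.down_proj", "lm_head"]}
    for name, w in sorted(layers.items(), key=lambda kv: damage(kv[1])):
        print(f"{name:15s} relative error {damage(w):.3f}")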
The interaction is surprisingly good despite the lack of attention mechanism and the limitation of the "context" to the past three characters.
This could have worked on 60s-era hardware and would have completely changed the world (and science fiction) back then.
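To make the three-character "context" above concrete: without attention, such a model is basically an n-gram-style MLP over a fixed window. A hypothetical sketch (sizes and names invented, not the project's actual network):

    # Illustrative only: an attention-free next-character predictor whose
    # entire context is the previous three characters.
    import torch
    import torch.nn as nn

    VOCAB, EMB, HID, CONTEXT = 64, 8, 32, 3

    class TinyCharMLP(nn.Module):
        def __init__(self):
            super().__init__()
            self.embed = nn.Embedding(VOCAB, EMB)
            self.net = nn.Sequential(
                nn.Linear(CONTEXT * EMB, HID),
                nn.ReLU(),
                nn.Linear(HID, VOCAB),
            )

        def forward(self, last3):
            # last3: (batch, 3) indices of the three most recent characters;
            # anything older than that simply cannot influence the prediction
            return self.net(self.embed(last3).flatten(1))

    logits = TinyCharMLP()(torch.randint(0, VOCAB, (1, CONTEXT)))
    print(logits.shape)  # (1, 64) scores for the next character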
Tin foil hat on: I think that a huge part of the major buyout of RAM by AI companies is to keep people from realising that we are essentially at the home computer revolution stage of LLMs. I have a 1TB RAM machine which, with custom agents, outperforms all the proprietary models. It's private, secure and won't let me be monetized.
You can buy a kid’s toy that plays 20 questions.
Slack handles video calls and can render anything a web browser can, and it runs an entire App Store of apps.
Including Jira in the conversation doesn’t even make logical sense. Jira has such a wide scope that the word “Jira” doesn’t even describe a single product.
That's a bug, not a feature, and strongly coupled to the root cause of Slack's bloat.
The app ecosystem of Slack is largely responsible for its success.
Is that true? Slack was one of the first private chats that was not painful to use, circa 2015. I personally hate the integrations and wish they'd just fix the bugs in their core product.
By itself, I would agree.
However, in this metaphor, concrete got 15x cheaper in the same timeframe. Not enough to fully compensate for the difference, but enough that a whole generation are now used to much larger edifices.
(At this point the analogy breaks down, because the cost of slower software is paid in users' time, not in the taxes of a government buying a bridge from a civil engineer…)
The word processors of 30 years ago often had limits like “50k chapters” and required “master documents” for anything larger. Lotus 1-2-3 had far fewer columns and rows than modern Excel.
Not an excuse, of course, but the older tools are not usable anymore if you have modern expectations.
You bring up apps like Skype doing more in 2005, but Skype was barely out of its public alpha by then.
And you keep bringing up things that are bad about Slack that are basically non-existent boogeymen. UI stutter, memory, load time: I can’t think of any time any of these has impacted my experience on Slack. And you really believe the original Skype app didn’t have a start-up time?
MSN Messenger and the original Skype didn’t actually do the things that Slack does now. I mean specifically multiple simultaneous screen shares plus annotations plus HD video feeds (with important features like blurred and replaced backgrounds, added by Skype in 2019) for all participants plus running an entire productivity app in the background at the same time.
The latency and stuttering and crashing and buffering and hard drives seizing and malware of the past has been erased from the rose tinted nostalgic memories of the past.
Memory is a game of telephone with itself, and I don’t trust your recollection.
Nobody cares that Slack uses RAM or whatnot. It performs well and actions respond quickly enough. Much quicker than a lot of its competition: Slack huddles are an extremely slick experience.
The 4th Gen iPod touch had 256 meg of RAM and also did those things, with video calling via FaceTime. Well, except "cross platform", what with it being the platform.
The entire operating system of the phone was more powerful, and ran on less.
Showing me that a proof of concept black and white <10FPS group video call with no other accompanying software was possible in the 90s is pointless.
I’d also like you to show me a laptop SKU sold in the last 10 years that is incapable of running Slack.
Finally, I’ll remind you that Slack for mobile is a different application that isn’t running in the same way as the desktop app. The latest version of it will run on very old phone hardware, going all the way back to the iPhone 8 (2GB RAM), and that’s assuming you even need the latest version for it to function.
1 GHz processor, 512 MB RAM (might even manage 256 MB), 1080p monitor, and "a graphics accelerator" and "a sound card".
> and link me to an example program that has 100% feature parity that stays within those specs?
Windows 2000. Or XP.
That's the point. The OS supports all the apps needed to do whatever.
Making Slack into a monolithic blob to do all is just an example of the inner platform effect.
But if you insist: IE 7 would have been able to do all this. It's an app. It's also an example of the inner platform effect.
> Showing me a black and white <10FPS group video call with no other accompanying software running simultaneously in the 90s is pointless.
You should've thought of that before trying to "well akshually" me about which versions of FaceTime support multi-user video calling.
You want video calling? We had that 30 years ago on systems with total RAM smaller than current CPU cache, with internal busses whose bandwidth was less than your mobile's 5G signal, on screens smaller than the icon that has to be submitted to the App Store, with cameras roughly comparable to what we now use for optical mice, running over networks that were MacGyvered onto physical circuits intended for a single analogue voice signal.
Out of everything you list that Slack can do, the only thing that should even be remotely taxing is the video calling. Nothing else, at all. And the only reasons for even that to be taxing is correctly offloading work to the GPU and that you want HD.
Meanwhile I can play back multiple 1080p videos on different monitors, run a high-speed curl download, saturate my gigabit LAN with a bulk transfer, and run a btrfs scrub in the background, all most likely without exceeding 2 GB of RAM usage.
The only daily application I run that consumes a noticeable quantity of resources is my web browser.
This argument is just so endless and tiring.
Saturating my bandwidth or running a btrfs scrub isn’t accomplishing the business logic I need to do my job; that’s what my web browser is doing.
People making excuses for poorly designed software is what's tiring.
This means that a directly translated 40 KB Z80 executable might be a tight squeeze on that mainframe, because 40K > 32K, counting words, not bytes. Of course if most of that size is just 2-bit weight data then it might not be so bad.
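Rough numbers for the "might not be so bad" part (back-of-envelope Python; the word size is a stand-in, since the machine isn't specified here):

    # Back-of-envelope only; the 36-bit word size is hypothetical, not from the thread.
    image_bytes = 40 * 1024           # 40 KB Z80 executable
    naive_words = image_bytes         # one byte per word if translated 1:1 -> 40960 > 32768
    weights = image_bytes * 8 // 2    # if most of the image is packed 2-bit weights: 163840
    word_bits = 36                    # stand-in word size for illustration
    packed_words = (weights * 2 + word_bits - 1) // word_bits  # about 9.1K words repacked densely
    print(naive_words, packed_words)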
ELIZA running on later hardware would have been a different story, with the Z80 - released in 1976 - being an example.
Ultimately, if you can build an ultra-tiny model that can talk and learn on the fly, you've just made a personal assistant like Siri fully local.
Not exactly "minimal viable", but a "what if RNNs were good for LLMs" case study.
-> insanely fast on CPUs
Edit: The fact this runs on a smartphone means it is highly relevant. My only question is, how do we give such a model an "unlimited" context window, so it can digest as much as it needs? I know some models know multiple languages; I wouldn't be surprised if sticking to only English would reduce the model size / need for more hardware and make it even smaller / tighter.
I doubt it would be able to make good use of a large context window, though.
Quake 3 is probably the last game where you would expect a chatbot, as there are few games where storytelling matters less, and it is a little-known feature; but Quake 3 bots can react to what you say in the chat, in addition to the usual taunts.
But that's the thing: Quake 3 can do it because it is inconsequential. In a story-driven game like an RPG, NPCs have a well-defined spot in the story and gameplay; they tell you exactly what you need to know, so as not to disrupt the flow of the story. Tell you too much, and they spoil the big reveal; tell you too little, and you don't know what to do; tell you irrelevant details, and you get lost chasing them. It has to be concise and to the point, so that those who don't really care know what to do to advance the story, but with enough flavor to make the world feel alive. It is really hard to find the right balance, and if, in addition, you have to incorporate a chatbot, it borders on impossible.
It looks like a good idea on the surface, but it most likely isn't, unless it is clearly not part of the main gameplay loop, as in Quake 3.
Some people have had some success using a (big) LLM as a DM in D&D, which I think is easier since it can make up the story as it advances; it is much harder to make up game elements in a computer RPG that are not programmed in.
I tried on a cycle-accurate emulator of a TRS-80 Model I with Omikron CP/M mapper. Most Z-80 machines of the time were 4 MHz, but the TRS-80 was only 1.77 MHz.
1. Type "GUESS", get question prompt.
2. User types: "Are you an animal?", ENTER key
3. Wait 25 seconds
4. Program prints "N"
5. Wait 20 seconds
6. Program prints "O"
7. Wait 23 seconds
8. Program prints linefeed, returns to question prompt
Total time to return 2-char answer to user's question: 1 min 9 sec or so. I bet a longer answer would take proportionally longer.
"The wonder isn't that it does it well, it's a wonder it does it at all."
I think I can do a little bit better; maybe 10% faster.
A web version would also be cool.