I Regret Building This $3000 Pi AI Cluster
Posted 4 months ago · Active 3 months ago
jeffgeerling.com · Tech story · High profile
Sentiment: skeptical / mixed
Debate: 80/100
Key topics: Raspberry Pi, AI Cluster, Cost-Effectiveness
The author built a $3000 Pi AI cluster but regrets it due to its poor performance and cost-effectiveness, sparking a discussion on the practicality of such projects.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 14m after posting
Peak period: 148 comments (Day 1)
Average per period: 26.7 comments
Comment distribution: 160 data points (based on 160 loaded comments)
Key moments
1. Story posted: Sep 19, 2025 at 10:28 AM EDT (4 months ago)
2. First comment: Sep 19, 2025 at 10:42 AM EDT (14m after posting)
3. Peak activity: 148 comments in Day 1 (the hottest window of the conversation)
4. Latest activity: Oct 5, 2025 at 2:30 PM EDT (3 months ago)
ID: 45302065 · Type: story · Last synced: 11/20/2025, 8:14:16 PM
Was it fast? No. But that wasn't the point. I was learning about distributed computing.
If your server has a lot of idle time, ARM will always win.
One day my primary Raspberry Pi broke (it turned out to be a PSU issue), and I thought about using an old laptop running 24/7 as a home server. While not very power hungry, it still wants much more energy (plus it has fans). For casual usage (I forgot to mention Pi-Hole) it feels like overkill. So, while a Raspberry Pi isn't the best, it has its niche, and I'm happy to have one (actually, a few).
My realization in ordering the Rock-2Fs is that I really only need an MMU (that is, an SBC instead of something like an ESP32) when I'm running something with a graphical desktop, which, outside my workstation, is never (except for kiosks, which I use Android tablets for), or when I want to plug something into a bloated SBC board that saves me from having to solder on a connector, which is sometimes.
I use one to run a timelapse camera (the camera is USB) while another is a portable MP3 player I can put in a shirt pocket and which has an aux port (though its aux line is noisy). So that's two of the four Rock-2F boards in use... but it took me far less time to think up uses for and deploy 25/25 of the Seeed Studio ESP32-C3 boards I ordered a couple of years ago, and I've used ~5/25 of the ESP32-C6s I ordered early this year. They're so cheap, and use so much less energy than the ARM boards, that it's difficult to justify using the SBCs anymore.
I think they're asking $50 for a base 2GB Pi 4B now; that's 10 ESP32-C3 boards (with integrated WiFi and BMS, by the way!). The Pi 5 is even less competitive except in what I'd characterize as very unusual scenarios: you need high compute at the edge (where it's both needed and the latency of computing at the edge is lower than sending it to a central server for processing), or you need the security of protected memory, or you have no central server and an ESP32 isn't going to cut it. (I'll say, though, that one can run a thermostat with multiple WiFi-connected thermometers, and serve a web interface, just fine.)
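As an illustration of how modest that thermostat workload is, here is a minimal MicroPython sketch (not from the thread; the WiFi credentials, relay pin, setpoint, and temperature helper are placeholders) that reads a temperature, drives a relay, and serves a plain-text status page from an ESP32:

```python
# Hypothetical MicroPython sketch: an ESP32 joins WiFi and serves a tiny
# thermostat status page over HTTP. Credentials, pin, and sensor are placeholders.
import network
import socket
import machine

SSID = "home-wifi"
PASSWORD = "change-me"
SETPOINT_C = 20.0
RELAY = machine.Pin(5, machine.Pin.OUT)  # assumed pin driving the heater relay

def read_temperature_c():
    # Placeholder: read a real sensor (DS18B20, DHT22, ...) here, or poll the
    # WiFi-connected thermometers the commenter mentions.
    return 18.5

def connect_wifi():
    wlan = network.WLAN(network.STA_IF)
    wlan.active(True)
    wlan.connect(SSID, PASSWORD)
    while not wlan.isconnected():
        pass

def serve():
    server = socket.socket()
    server.bind(("0.0.0.0", 80))
    server.listen(1)
    while True:
        client, _ = server.accept()
        temp = read_temperature_c()
        RELAY.value(1 if temp < SETPOINT_C else 0)  # crude bang-bang control
        body = "temp={:.1f}C setpoint={:.1f}C heating={}".format(
            temp, SETPOINT_C, bool(RELAY.value()))
        client.send(("HTTP/1.0 200 OK\r\n"
                     "Content-Type: text/plain\r\n\r\n" + body).encode())
        client.close()

connect_wifi()
serve()
```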
The current RPi 5 makes no sense to me in any configuration, given its pricing.
They're good for as long as the development costs dominate the total costs.
If one just wants a cheap desktop box to do desktop things with, then they're a terrible option, price-wise, compared to things like used corpo mini-PCs.
But they're reasonably cost-competitive with other new (not used!) small computers that are tinkerer-friendly, and unlike many similar constructs there's a plethora of community-driven support for doing useful things with the unusual interfaces they expose.
Faith in the perfect efficiency of the free market only works out over the long term. In the short term we have a lot of habits that serve as heuristics for doing a good job most of the time.
For those like me that don't know the joke:
Two economists are walking down the street. One of them says “Look, there’s a twenty-dollar bill on the sidewalk!” The other economist says “No there’s not. If there was, someone would have picked it up already.”
... and even then it doesn't always prove true.
Competition is what creates efficiency. Without it you live in a lie.
Was it a learning experience?
More importantly, did you have some fun? Just a little? (=
Also no. The guy's a YouTuber.
On the other hand, will this get him 100k+ views? Yes. It's bait: the perfect combo to attract both the AI crowd and the 'homelab' enthusiasts (the bulk of whom have yet to find any use for their Raspberry Pi devices)...
Jeff has written a lot of useful OSS software used daily by many companies around the world, including mine. What have you created?
https://www.youtube.com/c/JeffGeerling
"978K subscribers 527 videos"
Jeff's had a pattern of embellishing controversies, misrepresenting what people say, and using his platform to create narratives that benefit his content's engagement. This is yet another example of farming outrage to get clicks. I don't understand why people drool over his content so much.
I then used many of his ansible playbooks on my day to day job, which paid my bills and made my career progress.
I don't check YouTube, so I didn't know that he was a "YouTuber". I do know his other side and how much I have leveraged his content/code in my career.
https://www.jeffgeerling.com/projects
And the inference is that he is doing this for clicks, i.e. clickbait. The very title is disingenuous.
Your attack on the poster above you is childish.
Not that it's a problem; I don't see why it would inherently be a negative thing. Dude seems to make some good content across a lot of different mediums. Cheers to Jeff.
Nothing that is not AGPL-licensed, so you and your company haven't taken advantage of it.
I am not sure how this relates to my comment though.
I would be pretty regretful of just the first sentence in the article, though:
> I ordered a set of 10 Compute Blades in April 2023 (two years ago), and they just arrived a few weeks ago.
That's rough.
Somehow I've actually gotten every item I backed shipped at some point (which is unexpected).
Hardware startups are _hard_, and after interacting with a number of them (usually one or two people with a neat idea in an underserved market), it seems like more than half fail before delivering their first retail product. Some at least make it through delivering prototypes/crowdfunded boards, but they're already in complete disarray by the end of the shipping/logistics nightmares.
And then, there's the sourcing problem. Components that looked like they were in plentiful supply when the hardware was specced can end up being in short supply, or worse, end-of-lifed, while you're trying to get all the firmware working.
It's most fun when you can prove the vendor's datasheet is lying about some pin or some function, but they still don't update it after a decade or more. So everyone integrating the chip who hasn't before hits the exact same speed bump!
I assumed this was a novelty, like building a RAID array out of floppy drives.
Unless you can keep your compute at 70% average utilization for 5 years, you will never save money purchasing your hardware compared to renting it.
$3,000 is well under many "oopsie billsies" from cloud providers.
And that's outside of the whole "I own it" side of the conversation, where things like latency, control, flexibility, & privacy are all compelling reasons to be willing to spend slightly more.
I still run quite a number of LLM services locally on hardware I bought mid-COVID (right around $3k for a dual RTX 3090 + 124GB system RAM machine).
It's not that much more than you'd spend if you're building a gaming machine anyways, and the nifty thing about hardware I own is that it usually doesn't stop working at the 5 year mark. I have desktops from pre-2008 still running in my basement. 5 year amortization might have the cloud win, but the cloud stops winning long before most hardware dies. Just be careful about watts.
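A rough sketch of the rent-vs-buy arithmetic the comments above are debating, in Python; the purchase price, average wattage, electricity rate, and $1/hour rental rate are illustrative assumptions, not figures from the thread:

```python
# Back-of-the-envelope rent-vs-buy comparison (illustrative numbers only).
def owned_cost(purchase_price, watts, years, price_per_kwh=0.14):
    """Total cost of owning: purchase price plus electricity over the period."""
    kwh = watts / 1000 * 24 * 365 * years
    return purchase_price + kwh * price_per_kwh

def rented_cost(hourly_rate, utilization, years):
    """Total cost of renting only the hours you actually use."""
    hours = 24 * 365 * years * utilization
    return hourly_rate * hours

if __name__ == "__main__":
    # Hypothetical dual-3090-class box vs. a ~$1/hour cloud GPU instance.
    for util in (0.10, 0.30, 0.70):
        own = owned_cost(purchase_price=3000, watts=350, years=5)
        rent = rented_cost(hourly_rate=1.0, utilization=util, years=5)
        print(f"utilization {util:.0%}: own ${own:,.0f} vs rent ${rent:,.0f}")
```

Under these made-up numbers, owning pulls ahead well below 70% utilization; the point is only that the break-even depends heavily on the rates, wattage, and lifespan you assume.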
Personally - I don't think pi clusters really make much sense. I love them individually for certain things, and with a management plane like k8s, they're useful little devices to have around. But I definitely wouldn't plan to get good performance from 10 of them in a box. Much better off spending roughly the same money for a single large machine unless you're intentionally trying to learn.
If it's for personal use, do whatever... there's nothing wrong with buying a $60,000 sports car if you get a lot of enjoyment out of driving it. (you could also lease if you want to trade up to the "faster model" next year) For business, renting (and managed hosting) makes more sense.
If I spill something on my own hardware, the max out-of-pocket amount I lose is the amount I spent on that hardware.
If I run up an AWS/GCP/Azure bill accidentally... the max out-of-pocket amount I lose is often literally unbounded. Are there some guardrails you can put around this? Sure. But they're often confusing, misleading, delayed, or riddled with "holes" which they don't catch.
Ex - the literal best AWS offers you is delayed "billing alarms" which need to be manually enabled and configured, and even then don't cover all the services you might incur billing charges for.
It's not that "Oopsies" can't happen locally; it's that even if they do, I have a clear understanding of the potential costs by default, and they're much more tangible than "I left a thing running overnight and now I owe AWS a new car's worth of cash".
The worst case for a misconfigured bit of software locally is that my machine stalls and my services go down (ex - overloaded). The worst case for a misconfigured bit of software in AWS is literal bankruptcy.
Think about that for a minute.
Like, if you buy that card it can still be processing things for you a decade from now.
Or you can get 3 months of rental time.
---
And yes, there is definitely a point where renting makes more sense because the capital outlay becomes prohibitive, and you're not reasonably capable of consuming the full output of the hardware.
But the cloud is a huge cash cow for a reason... You're paying exorbitant prices to rent compared to the cost of ownership.
But also, when it comes to Vast/RunPod, it can be annoying and genuinely become more expensive if you have to rent 2x the number of hours because you constantly have to upload and download data and checkpoints, pay continuous storage costs, transfer data to another server because the GPU is no longer available, etc. It's just less of a headache if you have an always-available GPU with a hard drive plugged into the machine, and that's it.
Plus cloud gaming is always limited in range of games, there are restrictions on how you can use the PC (like no modding and no swapping savegames in or out).
2) Hardware optimization (the exact GPU you want may not always be available for some providers)
3) Not subject to price changes
4) Not subject to sudden Terms of Use changes
5) Know exactly who is responsible if something isn't working.
6) Sense of pride and accomplishment + Heating in the winter
If your goal is to play with or learn on a cluster of Linux machines, the cost-effective way to do it is to buy a desktop consumer CPU, install a hypervisor, and create a lot of VMs. It's not as satisfying as plugging cables into different Raspberry Pi units and connecting them all together, if that's your thing, but once you're in the terminal the desktop CPU, RAM, and flexibility of the system will be appreciated.
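A hypothetical sketch of that setup in Python, shelling out to Multipass to stamp out a handful of Ubuntu VMs on one desktop; it assumes Multipass is installed, and the --name/--cpus/--memory/--disk flags match recent versions but may differ on yours:

```python
# Hypothetical sketch: create a small "cluster" of VMs on one desktop with
# Multipass. Assumes `multipass` is installed; flag names may vary by version.
import subprocess

NODES = [f"node{i}" for i in range(1, 5)]  # four learner nodes

def launch(name: str) -> None:
    # Launch a default Ubuntu LTS VM with modest resources.
    subprocess.run(
        ["multipass", "launch",
         "--name", name,
         "--cpus", "2",
         "--memory", "2G",
         "--disk", "8G"],
        check=True,
    )

def show_nodes() -> None:
    # `multipass list` prints names, state, and IPv4 addresses.
    subprocess.run(["multipass", "list"], check=True)

if __name__ == "__main__":
    for node in NODES:
        launch(node)
    show_nodes()
```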
Makes me wonder if I should unplug more stuff when on vacation.
What's the margin on unplugging vs just powering off?
The EU (and maybe China?) have been regulating standby power consumption, so most of my appliances either have a physical off switch (usually as the only switch) or should have very low standby power draw.
I don't have the equipment to measure this myself.
Fuckin nutty how much juice those things tear through.
Rates have gone up enormously because the cost of wildfires is falling on ratepayers, not the utility owners.
Regulated monopolies are pretty great, aren't they? Heads I win, tails you lose.
That said, I'm of the opinion that power/water/internet should all be state/county/city run. I don't want my utility companies to have profit motives.
My water company just got bought up by a huge water company conglomerate and, you guessed it, immediate rate increases.
If your local regulators approved the merger and higher rates, your complaint is with them as much as the utility company.
Not saying that some regulators are not basically rubber stamps or even corrupt.
I did (as did others), in fact, write in comments and complaints about the rate increases and buyout. That went unheard.
https://core.coop/my-cooperative/rates-and-regulations/rate-...
Still only $50/month, not $150, but I very much care about 100W loads doing no work.
That said, I am not sure those numbers are true. I am in California (PG&E with East Bay community generation), and my TOU rates are much lower than those.
Minimum Delivery Charge (what’s paid monthly, which is largely irrelevant, before annual true-up of NEM charges): $11.69/month
Actual charges, billed annually, per kWh:
Plus 3-20% extra (depending on the month) in "non-bypassable charges" (I haven't figured out where these numbers come from), then a 7.5% local utility tax. Those rates do get a little lower in the winter ($0.30 to $0.48), and of course the very high rates benefit me when I generate more energy than I consume (which only happens when I'm on vacation). But the marginal all-in costs are just very high.
That’s NEM2 + TOU-EV2A, specifically.
$50/month for 100W continuous usage isn't totally mad, and that could climb even higher over the rest of the decade.
https://www.servethehome.com/lenovo-system-x3650-m5-workhors...
Also, $150 for 100W is crazy; that's like $1.70 per kWh. It would cost about $150 a year at the (high) rates of southern Sweden.
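For anyone sanity-checking those figures, the arithmetic is just kilowatts × hours × rate; a tiny Python sketch with example rates (not the commenters' exact tariffs):

```python
# Monthly electricity cost of a continuous load, and the rate a given
# monthly bill implies. Example rates only.
HOURS_PER_MONTH = 730  # ~365 * 24 / 12

def monthly_cost(watts, price_per_kwh):
    return watts / 1000 * HOURS_PER_MONTH * price_per_kwh

def implied_rate(watts, monthly_bill):
    return monthly_bill / (watts / 1000 * HOURS_PER_MONTH)

print(monthly_cost(100, 0.14))  # ~$10/month at a $0.14/kWh US rate
print(monthly_cost(100, 0.48))  # ~$35/month at a high PG&E-style rate
print(implied_rate(100, 50))    # $50/month for a 100 W load implies ~$0.68/kWh
```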
Personally it’s cheaper to buy the hardware that does spend most of its time idling. Fast turnaround on very large private datasets being key.
In this case that 'new' is energy-efficient software, down to the individual lines of code and what their energy cost is on specific hardware. Academics are publishing about it in niche corners of the web and some entrepreneurs are doing it, but of course none of this is cool right now, so we remain a mockery for our objectives. In time this too will become a real thing, as many are only just beginning to feel the ever-rising costs of energy, which are only starting to increase from decisions made years ago. The worst is yet to come, as heard directly from every expert who has testified before the Energy and Commerce committee in the last few years; only the outside-the-boxers among us watch such educational content to better prepare for tomorrow.
Electricity powers our world and nearly everyone takes it for granted; time will change this thinking too.
:D
It also means it performs like a 10 year old server CPU, so those 28 threads are not exactly worth a lot. The geekbench results, for whatever value those are worth, are very mediocre in the context of anything remotely modern: https://browser.geekbench.com/processors/intel-xeon-e5-2690-...
Like a modern 12-thread 9600x runs absolute circles around it https://browser.geekbench.com/processors/amd-ryzen-5-9600x
The homelab group on Reddit is full of people who don't understand any of this - they have full racks in their house that could be replaced with one high-end desktop.
A lot of that group is making use of the IO capabilities of these systems to run lots of PCI-E devices & hard drives. There's not exactly a cost-effective modern equivalent for that. If there were cost-effective ways to do something like take a PCI-E 5.0 x2 and turn it into a PCI-E 3.0 x8, that'd be incredible, but there aren't really. So raw PCI-E lane count is significant if you want cheap networking gear or HBAs or whatever, and raw PCI-E lane count is $$$$ if you're buying new.
Also these old systems mean cheap RAM in large, large capacities. Like 128GB RAM to make ZFS or VMs purr is much cheaper to do on these used systems than anything modern.
Like, if you have a large media library you need to push maybe 10MB/s; you don't need 128GB of RAM to do that...
It's mostly just hardware porn - perhaps there are a few legit use cases for the old hardware, but they are exceedingly rare in my estimate.
For just streaming a 4K Blu-ray you need more than 10MB/s; Ultra HD Blu-ray tops out at 144 Mbit/s (18 MB/s). Not to mention if that system is being hit by something else at the same time (backup jobs, etc...).
Is the 128GB of RAM just hardware porn? Eh, maybe, probably. But if you want 8+ bays for a decent sized NAS then you're already quickly into price points at which point these used servers are significantly cheaper, and 128GB of RAM adds very little to the cost so why not.
If anything, 2nd-hand AMD gaming rigs make more sense than old servers. I say that as someone with an always-off R720xd at home due to noise and heat. It was fun when I bought it during winter years ago, until summer came.
A lot of businesses are paying obscene money to cloud providers when they could have a pair of racks and the staff to support it.
Unless you're paying attention to the bleeding edge of the server market and its costs (better yet, its features and affordability), this sort of mistake is easy to make.
The article is by someone who does this sort of thing for fun, and for views/attention, and I'm glad for it... it's fun to watch. But it's sad when this same sort of misunderstanding happens in professional settings, and it happens a lot.
For dedicated build boxes that crunch through lots of sources (whole distributions, AOSP) but run only seldom, getting your hands on lots of cores and RAM very cheaply can still trump buying newer CPUs with better perf/watt but higher cost.
I'm well aware of the costs of power and the logistics of colocation; this is purely about how I'm more willing to spend $100-$200 on a toy than I am $1000-$2000.
Commodity desktop cpus with 32 or 64GB RAM can do all of this in a low-power and quiet way without a lot more expense.
The only problem in practice is that server CPUs don't support S3 suspend, so putting the whole thing to sleep after finishing with it doesn't work.
That combo gives you the better part of a gigabyte of L3 cache and an aggregate memory bandwidth of 600 GB/s, while still below 1000W total running at full speed. Plus your NICs are the fancy kind that let you play around with RoCEv2 and such nifty stuff.
It would also be relevant to then learn how to do things properly with SLURM and Warewulf etc., instead of a poor man's solution with Ansible playbooks like in these blog posts.
If the goal is a lot of RAM and you don’t care about noise, power, or heat then these can be an okay deal.
Don’t underestimate how far CPUs have come, though. That machine will be slower than AMD’s slowest entry-level CPU. Even an AMD 5800X will double its single core performance and even walk away from it on multithreaded tasks despite only having 8 cores. It will use less electricity and be quiet, too. More expensive, but if this is something you plan to leave running 24/7 the electricity costs over a few years might make the power hungry server more expensive over time.
So for $3000, that's 3000 hours, or 125 days (if you just wastefully leave them on all the time, instead of turning them on when needed).
Say you wanted to play around for a couple of hours, that's like.. $3.
(That's assuming there's no bonus for joining / free tier, too.)
I regularly rent this for a few hours at a time for learning and prototyping.
The desktop equivalent of your 10 T3 Micro instances is about $600 if you buy new. For example, a Lenovo ThinkCentre M75q Gen 2 Tiny (11JN009QGE) has an 8-core 3.2GHz processor with hyperthreading. That's 16 virtual cores compared to the 20 vCPUs of the T3 instances, but with much faster cores. And 16GB of RAM allows you to match the 1GB per instance.
If you don't have anything and feel generous throw in another $200 for a good monitor and keyboard plus mouse. But you can get a used crap monitor for $20. I'd give you one for free just to be rid of it.
That's a total of $800, or 33 days of forgetting to shut down the 10 VMs. Maybe half that if you buy used.
Granted, not everyone has $800 or even $400 to drop on hobby projects, so renting VMs often does make sense.
But if you're someone like me who intends to actively use the hardware for real-world purposes, the cloud often simply can't compete on price. At home, I have a mini PC with a 5600G, 32GB of RAM, and a few TB of NVMe storage. The entire thing cost less than $600 a few years ago, and consumes around 20W of power on average.
Even on the cheapest cloud providers available, an equivalent setup would exceed that price in less than half a year. SSD storage in particular is disproportionately expensive on the cloud. For small VMs that don't need much storage, it does make sense, but as soon as you scale up, cloud prices quickly start ballooning.
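A small sketch of that break-even, with placeholder cloud prices (the VM and block-storage rates below are hypothetical, not any real provider's pricing):

```python
# How long before a rented VM + SSD storage overtakes a one-time mini PC?
# All prices are hypothetical placeholders.
MINI_PC_PRICE = 600          # one-time purchase, from the comment above
POWER_COST_PER_MONTH = 2.0   # ~20 W average at ~$0.14/kWh
VM_PER_MONTH = 60            # hypothetical mid-size cloud VM
SSD_PER_GB_MONTH = 0.08      # hypothetical block-storage price
SSD_GB = 2000

cloud_monthly = VM_PER_MONTH + SSD_PER_GB_MONTH * SSD_GB
months, owned, rented = 0, float(MINI_PC_PRICE), 0.0
while rented < owned:
    months += 1
    owned += POWER_COST_PER_MONTH
    rented += cloud_monthly
print(f"cloud overtakes the mini PC after ~{months} months "
      f"(${cloud_monthly:.0f}/month rented vs ${owned:.0f} total owned)")
```

With these assumptions the rented setup passes the one-time cost of the mini PC within a few months, which is the "less than half a year" ballpark the comment describes; the SSD line item dominates.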
You don't need hardware to learn. Sure, it helps, but you can learn from a book and pen-and-paper exercises.
It's definitely not suited for production, but there, you won't find old blade servers either (for the power to performance issue).
Handy: https://700c.dk/?powercalc
My Pi CM4 NAS with a PCIe switch, SATA and USB3 controllers, 6 SATA SSDs, 2 VMs, 2 LXC containers, and a Nextcloud snap pretty much sits at 17 watts most of the time, hitting 20 when a lot is being asked of it, and 26-27W at absolute max with all I/O and CPU cores pegged. €3.85/mo if I pay ESB, but I like to think that it runs fully off the solar and batteries :)
Pretty sure most of us aren't running anywhere close to full load 24/7, but whoa, Irish power is expensive. In the central US I pay $0.14/KWh.
[1] https://en.wikipedia.org/wiki/Shannon_hydroelectric_scheme
'Worth it any more'? At this size, never. A Pi is a Pi is a Pi!
A few are fine for toying around; beyond that, hah. Price:perf is rough and does not improve with multiplication [of units, cost, or complexity].
Or the oldie-but-goodie paper "Scalability! But at what COST?": https://www.usenix.org/system/files/conference/hotos15/hotos...
Long story short, performance considerations with parallelism go way beyond Amdahl's Law, because supporting scale-out also introduces a bunch of additional work that simply doesn't exist in a single node implementation. (And, for that matter, multithreading also introduces work that doesn't exist for a sequential implementation.) And the real deep down black art secret to computing performance is that the fastest operations are the ones you don't perform.
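To make that concrete, here is a short Python sketch contrasting the textbook Amdahl's Law speedup with a toy model in which each node also adds a fixed slice of coordination overhead; the 5% serial fraction and 2% per-node overhead are arbitrary illustrative numbers, not measurements.

```python
# Amdahl's Law: speedup(N) = 1 / (s + (1 - s) / N), where s is the serial
# fraction. The second function adds a per-node coordination cost c*N to model
# the extra work that scale-out itself introduces.
def amdahl(n, serial=0.05):
    return 1 / (serial + (1 - serial) / n)

def amdahl_with_overhead(n, serial=0.05, overhead_per_node=0.02):
    return 1 / (serial + (1 - serial) / n + overhead_per_node * n)

for n in (1, 4, 10, 40):
    print(f"{n:>3} nodes: ideal Amdahl {amdahl(n):5.2f}x, "
          f"with coordination overhead {amdahl_with_overhead(n):5.2f}x")
```

In the toy model the speedup peaks at a handful of nodes and then degrades, which is the COST paper's point: the work added to support scale-out can swamp the gains, and a good single-node implementation may simply win.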
201 more comments available on Hacker News