Launch HN: Cactus (YC S25) – AI inference on smartphones
E.g. if I built a basic LLM chat app with Qwen3 600m + Cactus, what's the total app size?
Makes it really easy to plug and play different models on my phone.
If anybody is curious what a Pixel 9 Pro is capable of:
Tokens: 277, TTFT: 1609 ms, 9 tok/sec
qwen2.5 1.5b instruct q6_k
Sure, here's a simple implementation of the Bubble Sort algorithm in Python:
    def bubble_sort(arr):
        n = len(arr)
        for i in range(n):
            # Flag to detect any swap in current pass
            swapped = False
            for j in range(0, n - i - 1):
                # Swap if the element found is greater than the next element
                if arr[j] > arr[j + 1]:
                    arr[j], arr[j + 1] = arr[j + 1], arr[j]
                    swapped = True
            # If no swap occurs in the inner loop, the array is already sorted
            if not swapped:
                break

    # Example usage:
    arr = [64, 34, 25, 12, 22, 11, 90]
    bubble_sort(arr)
    print("Sorted array is:", arr)
This function sorts the array in ascending order using the Bubble Sort algorithm. The outer loop runs n times, where n is the length of the array. The inner loop walks through the array, comparing adjacent elements and swapping them if they are in the wrong order. The swapped flag tracks whether any elements were swapped in the current pass; if none were, the array is already sorted and the loop exits early.
Same model should run 3x faster on the same phone.
These improvements are still being pushed to the SDKs though.
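A quick back-of-envelope from those numbers (a rough sketch only; it assumes TTFT stays flat and that the 3x applies to decode speed, which may not hold in practice):

    # Rough end-to-end latency from the Pixel 9 Pro run above.
    ttft_s = 1.609      # time to first token, seconds
    tokens = 277        # tokens generated
    decode_rate = 9     # measured tokens/sec

    total_s = ttft_s + tokens / decode_rate
    print(f"Measured: ~{total_s:.0f}s end to end")         # ~32s

    # Hypothetical: if the claimed 3x speedup lands on decode rate
    total_3x = ttft_s + tokens / (decode_rate * 3)
    print(f"With 3x decode: ~{total_3x:.0f}s end to end")  # ~12s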
I've had great experiences with gpt-oss-20b on my laptop, a genuinely useful local model.
3x probably doesn't get my Pixel 9 Pro to being able to run 20b models, but it's getting close!
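For a sense of scale, some rough weights-only math (it assumes ~4-bit quantization and ignores the KV cache and runtime overhead; 16 GB is the Pixel 9 Pro's RAM spec):

    # Can a ~20B-parameter model fit on a 16 GB phone? Weights-only estimate.
    params = 20e9            # ~20B parameters
    bytes_per_param = 0.5    # ~4-bit quantized weights
    weights_gb = params * bytes_per_param / 1e9
    print(f"Weights alone: ~{weights_gb:.0f} GB")  # ~10 GB

    device_ram_gb = 16       # Pixel 9 Pro
    # KV cache, activations, and the OS all need headroom on top of this,
    # so ~10 GB of weights on a 16 GB phone is tight but not unthinkable.
    print(f"Nominal headroom: ~{device_ram_gb - weights_gb:.0f} GB")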
I already knew to avoid "please share your thoughts," although I guess I am kind of violating that one by even commenting.
Mixing in a “you have to pay if you’re a corporation” licence makes this difficult if not impossible, particularly if we wanted deep integration with e.g. Cactus. We don’t want to police which users of our open source software count as a “corporation”.
I downloaded Cactus a couple months back because I saw a comment, but a bait and switch like this makes me want to look for an actual open source solution.
Your license change goes against that. You say it’s free for personal use but how many times do people create something for personal use and monetize it later? What if I use Cactus chat to control a commercial app? Does that make Cactus chat use “commercial”?
https://github.com/cactus-compute/cactus/commit/b1b5650d1132...
Use open source and stick with it, or don't touch it at all, and tell any VC shitheels saying otherwise to pound sand.
If your business is so fragile or unoriginal that it can't survive being open source, then it will fail anyway. If you make it open source, embrace the ethos and build community, then your product or service will be stronger for it. If the big players clone your work, you get instant underdog credibility and notoriety.
It’s still free for the community, just that corporations need a license. Should we make this clearer in the license?
Just say that in the license.
> We are open-source (https://github.com/cactus-compute/cactus). Cactus is free for hobbyists and personal projects, with a paid license required for commercial use.
If it is open-source, one is free to distribute even for commercial use by definition. Which one is correct and what's your business model?
> Open-source software is software released under a license where the copyright holder grants users the rights to use, study, change, and distribute the software and its source code, for any purpose.
That’s the first result you get on Google—and it’s exactly why so many companies relicensed their projects (Redis, HashiCorp, Elasticsearch, MongoDB…).
If it’s open source, you can sell it, host it, or give it away for free. The only difference is which obligations the license attaches:
GPL → you must keep the license.
AGPL → you must keep it and extend it to hosted services.
BSD/MIT → do almost whatever you want.
But the core right is always the same: distribute, host, and sell. Courts have even confirmed this is the accepted definition of “open source.”
> While Cactus can be used for all Apple devices including Macbooks due to their design, for computers/AMD/Intel/Nvidia generally, please use HuggingFace, Llama.cpp, Ollama, vLLM, MLX. They're built for those, support x86, and are all great!
It reads like you're saying that for all Apple devices (which would include iOS) we should use these other tools(?). For iOS, are you trying to beat the performance of the other options? If so, it would be helpful to include comparison benchmarks.
> guarantees privacy by default, works offline, and doesn't rack up a massive API bill at scale.
I’ve been really interested in on-device ML for most of my career, and now I wonder how valuable these benefits really are. LLM vendor APIs are pretty performant these days, security is security, and with an on-device model you have to provide updates every time a new model comes out.
I am very curious what could be done with your impressive optimization on an rk3588, since it has pretty decent bits in all 3 categories, and am now seriously considering a Radxa Orion to play with this on :)
One more if you have a moment: will this be limited to text generation, or will it have audio and image capabilities as well? It would be neat to enable not only image generation, but also voice recognition, translation, computer vision, and image editing and enhancement features in mobile apps, beyond what the big players deign to give us :)
We don't advise using GPUs on smartphones, since they're very energy-inefficient. Mobile GPU inference is actually the main driver behind the stereotype that "mobile inference drains your battery and heats up your phone".
Wrt your last question – the short answer is yes, we'll have multimodal support. We currently support voice transcription and image understanding. We'll be expanding these capabilities to add more models, voice synthesis, and much more.
In our deployments, we've seen open source models rival and even outperform lower-tier cloud counterparts. Happy to share some benchmarks if you like.
Our pricing is on a per-monthly-active-device basis, regardless of utilization. For voice-agent workflows, you typically hit savings as soon as you process over ≈2min of daily inference.
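To make that concrete, here is a rough break-even sketch (the per-device fee and cloud rate below are made-up placeholder numbers, not our actual pricing):

    # Hypothetical break-even: flat per-device fee vs. metered cloud inference.
    device_fee_month = 0.50    # $/device/month, placeholder
    cloud_rate_min = 0.01      # $/minute of cloud voice inference, placeholder
    days_per_month = 30

    def cloud_cost_month(minutes_per_day):
        return minutes_per_day * days_per_month * cloud_rate_min

    for mins in (1, 2, 5, 10):
        print(f"{mins} min/day: cloud ${cloud_cost_month(mins):.2f}/mo "
              f"vs flat ${device_fee_month:.2f}/mo")
    # With these placeholders, the metered cloud cost passes the flat fee
    # just under ~2 min/day, consistent with the claim above.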