Vectorware – From Creators of `rust-GPU` and `rust-CUDA`
Posted 3 months ago · Active 3 months ago
vectorware.com · Tech · story
Tone: calm, mixed · Debate: 60/100
Key topics
GPU Computing
Rust Programming Language
Software Development
VectorWare, a company founded by creators of rust-GPU and rust-CUDA, announces its mission to develop GPU-native software, sparking discussion about the current state of GPU applications and the potential for improvement.
Snapshot generated from the HN discussion
Discussion Activity
Active discussion · First comment: 21m after posting
Peak period: 11 comments in 0-2h
Average per period: 3
Comment distribution: 24 data points (based on 24 loaded comments)
Key moments
1. Story posted: Oct 23, 2025 at 11:41 AM EDT (3 months ago)
2. First comment: Oct 23, 2025 at 12:03 PM EDT (21m after posting)
3. Peak activity: 11 comments in 0-2h (hottest window of the conversation)
4. Latest activity: Oct 24, 2025 at 12:23 PM EDT (3 months ago)
ID: 45683153 · Type: story · Last synced: 11/20/2025, 7:50:26 PM
> If you look at existing GPU applications, their software implementations aren't truly GPU-native. Instead, they are architected as traditional CPU software with a GPU add-on.
I feel that this is due to the current hardware architecture, not the fault of software.
(Don't miss the "Pedantic mode" switch on the linked page, it adds relevant and detailed footnotes to the blog post.)
We’re talking at least hundreds of thousands of cells, depending on the calculation, or at least a number that will make the UI very sad long before you’ll see a slowdown from calculation.
Databases, on the other hand…
If you have a Rust application or library and want to use the GPU, these approaches are comparatively smooth:
I am using wgpu and cudarc for structural biology + molecular dynamics computations, and they work well. Rust-CUDA feels like lots of PR, but not as good a toolkit as these quieter alternatives. What would be cool for them to deliver, and I think it is in their objectives: cross-API abstractions, so you could, for example, write code that runs on Vulkan Compute in addition to CUDA.
Something else that would be cool: High-level bindings to cuFFT and vkFFT. You can FFI them currently, but that's not ideal. (Not too bad to impl though, if you're familiar with FFI syntax and the `cc` crate)
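As a point of reference, here is a minimal hand-rolled FFI sketch of the kind of binding the comment is describing, for cuFFT rather than vkFFT. It assumes you link against the prebuilt `cufft` library (no `cc` build step needed in this case) and that you already have a device pointer from elsewhere, e.g. cudarc; the helper function and its error handling are illustrative, not part of any existing crate:

```rust
// FFI sketch for cuFFT. The extern declarations mirror cufftPlan1d,
// cufftExecC2C, and cufftDestroy from cufft.h; everything else is illustrative.
use std::os::raw::{c_int, c_void};

type CufftHandle = c_int;
type CufftResult = c_int; // 0 == CUFFT_SUCCESS

const CUFFT_C2C: c_int = 0x29;   // complex-to-complex transform
const CUFFT_FORWARD: c_int = -1; // forward transform direction

#[link(name = "cufft")]
extern "C" {
    fn cufftPlan1d(plan: *mut CufftHandle, nx: c_int, type_: c_int, batch: c_int) -> CufftResult;
    fn cufftExecC2C(plan: CufftHandle, idata: *mut c_void, odata: *mut c_void, direction: c_int) -> CufftResult;
    fn cufftDestroy(plan: CufftHandle) -> CufftResult;
}

/// Run an in-place forward FFT over `n` interleaved complex floats already on the device.
unsafe fn fft_forward_inplace(dev_ptr: *mut c_void, n: c_int) -> Result<(), CufftResult> {
    let mut plan: CufftHandle = 0;
    let rc = cufftPlan1d(&mut plan, n, CUFFT_C2C, 1);
    if rc != 0 {
        return Err(rc);
    }
    let rc = cufftExecC2C(plan, dev_ptr, dev_ptr, CUFFT_FORWARD);
    cufftDestroy(plan);
    if rc != 0 { Err(rc) } else { Ok(()) }
}
```

A higher-level binding would mostly be about wrapping these calls in safe types (plans that destroy themselves on drop, typed device buffers) rather than adding new functionality.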
wgpu, ash, and cudarc are great. We're focusing on the actual code that runs on the GPU in Rust, and we work with those projects. We have cust in rust-cuda, but that existed before cudarc and we have been seriously discussing just killing it in favor of cudarc.
We didn't post this or choose the title; we would never claim we created the projects from scratch.
I routinely have to fix the autoformatting done by HN.
Seems like Embark has disembarked from Rust and support for it altogether.
2. Can modern GPU hardware efficiently make system calls? (If you can do this, you can eventually build just about anything, treating the CPU as just another subordinate processor.)
3. At what order-of-magnitude size might being GPU-native break down? (Can CUDA dynamically load new code modules into an existing process? That used to be problematic years ago.)
Thinking about what's possible, this looks like an exceptionally fun project. Congrats on working on an idea that seems crazy at first glance but looks more and more possible the more you think about it. Still, it's all a gamble whether it'll perform well enough to be worth writing applications this way.
2. Yes
3. We're still investigating the limitations. A lot of them are hardware-dependent; obviously, data center cards have higher limits and more capability than desktop cards.
Thanks! It is super fun trailblazing and realizing more of the pieces are there than everybody expects.
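On the dynamic-loading part of question 3: modern CUDA does let a running process load new code modules through the driver API. The sketch below is only an illustration of the underlying calls (cuModuleLoad, cuModuleGetFunction), not how VectorWare or rust-cuda actually do it; it assumes a CUDA context is already current on the calling thread and that you link against the `cuda` driver library, and the helper function is hypothetical:

```rust
// Loading a PTX/cubin module into an existing process at runtime via the CUDA
// driver API. The extern declarations mirror the driver API; load_kernel is
// just an illustrative wrapper.
use std::ffi::{c_char, c_int, c_void, CString};

type CUresult = c_int; // 0 == CUDA_SUCCESS
type CUmodule = *mut c_void;
type CUfunction = *mut c_void;

#[link(name = "cuda")]
extern "C" {
    fn cuModuleLoad(module: *mut CUmodule, fname: *const c_char) -> CUresult;
    fn cuModuleGetFunction(hfunc: *mut CUfunction, hmod: CUmodule, name: *const c_char) -> CUresult;
    fn cuModuleUnload(hmod: CUmodule) -> CUresult;
}

/// Load a module file (PTX or cubin) and look up one kernel in it.
fn load_kernel(path: &str, kernel: &str) -> Result<CUfunction, CUresult> {
    let path = CString::new(path).unwrap();
    let name = CString::new(kernel).unwrap();
    unsafe {
        let mut module: CUmodule = std::ptr::null_mut();
        let rc = cuModuleLoad(&mut module, path.as_ptr());
        if rc != 0 {
            return Err(rc);
        }
        let mut func: CUfunction = std::ptr::null_mut();
        let rc = cuModuleGetFunction(&mut func, module, name.as_ptr());
        if rc != 0 {
            cuModuleUnload(module);
            return Err(rc);
        }
        Ok(func)
    }
}
```

Crates like cudarc wrap these calls in safe APIs; the point here is only that adding new GPU code to a live process is a supported operation today.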
But languages like Java or Python simply lack the programming constructs to program GPUs easily.
The lack of a standardised ISA across GPUs also means compilers can't really provide a translation layer.
Let’s hope things get better over time!
https://modal.com/docs/examples/batched_whisper
> We are building software that is GPU-native. We intend to put the GPU in control. This does not happen today due to the difficulty of programming GPUs, the immaturity of GPU software and abstractions, and the relatively few developers targeting GPUs.
Really feels like fad engineering. The CPU works better as a control structure, and the design of GPUs is not suited for proper orchestration compared to CPUs. What really worries me is their mention of GPU abstractions, which is completely the wrong way to think about hardware designed for HPC. Their point about PyTorch and kernels having low cyclomatic complexity is confusing to me. GPUs aren't optimized for control flow. The nature of SIMD/SIMT values throughput, and the hardware design forgoes things like branch prediction. Having many independent paths a GPU kernel could take would make it perform much worse. You could very well end up with kernels that are slower than their optimized CPU counterparts.
I'm sure the people behind this are talented and know what they're doing, but these statements don't make sense to me. GPU algorithms are harder to reason about and implement. You often need to do more work just to gain the parallelism benefit. There aren't actually that many use cases where the GPU being the primary compute platform is a better choice. My cynical view is that people like the GPU because they compare unoptimized, slow CPU code with decent GPU/tensorized code. They never see how much a modern CPU can actually do, and how fast it can be.
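To make the divergence point above concrete, here is a toy cost model (ordinary Rust, not GPU code; the costs are made up). Under SIMT, a warp whose lanes disagree on a branch executes both branch bodies one after the other with inactive lanes masked off, so a divergent warp pays roughly the sum of the two paths rather than the cost of just one:

```rust
// Toy model of warp divergence cost: if any lane takes a branch side, the
// whole warp spends time executing that side (with non-participating lanes masked).
fn warp_cost(lane_takes_then: &[bool], then_cost: u32, else_cost: u32) -> u32 {
    let any_then = lane_takes_then.iter().any(|&t| t);
    let any_else = lane_takes_then.iter().any(|&t| !t);
    (if any_then { then_cost } else { 0 }) + (if any_else { else_cost } else { 0 })
}

fn main() {
    let uniform = vec![true; 32];                                      // all 32 lanes agree
    let divergent: Vec<bool> = (0..32).map(|i| i % 2 == 0).collect();  // lanes alternate
    println!("uniform warp cost:   {}", warp_cost(&uniform, 10, 12));   // 10
    println!("divergent warp cost: {}", warp_cost(&divergent, 10, 12)); // 22
}
```

This is why kernels with many independent control-flow paths tend to lose the throughput advantage the hardware is built for.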