CUDA Ontology
Mood: thoughtful
Sentiment: neutral
Category: tech
Key topics: CUDA, GPU Computing, Ontology
The author shares their exploration of CUDA ontology, likely discussing the structure and organization of concepts related to CUDA, a parallel computing platform.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 4 days after posting
Peak period: 37 comments (Day 4)
Average per period: 19 comments
Based on 38 loaded comments
Key moments
- Story posted: Nov 16, 2025 at 1:56 PM EST (8 days ago)
- First comment: Nov 20, 2025 at 3:49 AM EST (4 days after posting)
- Peak activity: 37 comments on Day 4, the hottest window of the conversation
- Latest activity: Nov 20, 2025 at 2:59 PM EST (3 days ago)
We were on AWS when we used this, so setting up seemed easy enough - AWS gave you the driver, and a matching Docker image was easy enough to find.
For some versions there are even compat layers built into the container to allow forward version compatibility.
There were other problems, such as the research cluster of my university not having Docker, but that is a different issue.
The idea that getting a PCIe FPGA board to crunch numbers is less headache-prone than a GPU is laughable, but that's the absurd reality we live in.
nvcc from the CUDA toolkit has a compatibility range with the underlying host compilers like gcc. If you install a newer CUDA toolkit on an older machine, you'll likely need to upgrade your compiler toolchain as well and fix the paths.
While orchestration in many (research) projects happens from Python, some depend on building CUDA extensions. An innocent-looking Python project may not ship the compiled kernels and may require a CUDA toolkit to work correctly. Some package management solutions can install CUDA toolkits for you (conda/mamba, pixi), while the pure-Python ones (pip, uv) cannot. This leaves you to match the correct CUDA toolkit to your Python environment for a project. conda specifically provides different channels (default/nvidia/pytorch/conda-forge) and, since conda 4.6, defaults to strict channel priority, meaning "if a name exists in a higher-priority channel, lower ones aren't considered". That default strict priority can make your requirements unsatisfiable, even though a suitable version of each required package exists somewhere across the channels. uv is neat and fast and awesome, but leaves you alone in dealing with the CUDA toolkit.
Also, code that compiles with older CUDA toolkit versions may not compile with newer ones, and newer hardware may require a CUDA toolkit version that is newer than what the project maintainer intended. PyTorch ships with a specific CUDA runtime version, so if your project has additional code that also uses CUDA extensions, you need to match the CUDA runtime version of your installed PyTorch for it to work. Trying to bring up a project from a couple of years ago to run on the latest hardware may thus blow up on you on multiple fronts.
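A minimal sketch of that matching step, assuming PyTorch is installed and nvcc is on PATH (both assumptions; adjust to your environment): compare the CUDA runtime PyTorch was built against with the toolkit the local nvcc would use before compiling further extensions.

```python
# Sketch: compare PyTorch's bundled CUDA runtime version with the local nvcc,
# before building additional CUDA extensions into the same environment.
import re
import shutil
import subprocess

import torch

# e.g. '12.1'; None for CPU-only builds
print("PyTorch built against CUDA runtime:", torch.version.cuda)

nvcc = shutil.which("nvcc")
if nvcc is None:
    print("No nvcc on PATH: extensions that need compiling will fail to build.")
else:
    out = subprocess.run([nvcc, "--version"], capture_output=True, text=True).stdout
    m = re.search(r"release (\d+\.\d+)", out)
    print("Local nvcc release:", m.group(1) if m else "unknown")
    # If the major.minor versions diverge, an extension build may link against a
    # different CUDA runtime than the one PyTorch expects.
```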
Conversely, nvcc often stops working with major upgrades of gcc/clang. Fun times, indeed.
This is why a lot of people just use NVIDIA's containers even for local solo dev. It's a hassle to set up initially (docker/podman hell) but all the tools are there and they work fine.
Yeah, which feels fine for one project or for one-offs, but once you've accumulated projects, an individual 30 GB image for each of them quickly adds up.
I found that most of my issues went away as I started migrating everything to `uv` for the Python stuff, and Nix for everything system related. Now I can finally go back to a one-year-old ML project and be sure it'll run like before, and projects share a bit more data.
This is the part I find confusing, especially as NVIDIA doesn't make it easy to find and download the old toolkits. Is this effectively saying that just choosing the right --arch and --code flags isn't enough to support older versions? And that, because nvcc statically links in the runtime library by default, newer toolkits may produce code that just won't run on older drivers? In other words, is it true that to support old hardware you need to download and use old CUDA Toolkits, regardless of nvcc flags? (And to support newer hardware you may need to compile with newer toolkits.)
That's how I read it, which seems unfortunate.
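One concrete input to that decision is the compute capability of the hardware you want to support, since a toolkit only knows about the sm_XX architectures that existed when it was released. A minimal sketch, assuming a CUDA-enabled PyTorch install (any similar query tool would do):

```python
# Sketch: report the compute capability of the local GPU, which is what the
# nvcc --arch/--code (sm_XX) flags ultimately have to cover.
import torch

if torch.cuda.is_available():
    major, minor = torch.cuda.get_device_capability(0)
    print(f"GPU 0 compute capability: sm_{major}{minor}")
    # A sufficiently new sm_XX forces a newer toolkit no matter which flags
    # you pass, and (per the question above) the statically linked runtime
    # still has to be supported by the installed driver.
else:
    print("No CUDA device visible to PyTorch")
```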
It should probably also add that everything CUDA is owned by NVIDIA, and "CUDA" itself is a registered trademark. The official way to refer to it is to spell it out as "NVIDIA® CUDA®" on first mention and then refer to it as just CUDA afterwards.
Now, direct from the actual sources. From [1]:
> Intended users of this Brand Guideline are members of the NVIDIA Partner Network (NPN), including original equipment manufacturers (OEMs), solution advisors, cloud partners, solution providers, distributors, solutions integrators, and service delivery partners.
From [2]:
> Always include the correct trademark (™ vs ®) by referring to the content documents provided or using the list of common NVIDIA products and technologies. After the first mention of the NVIDIA product or technology, which includes the appropriate trademarks, the trademark does not need to be included in future mentions within the same document, article, etc.
> CUDA®
[1]: https://brand.nvidia.com/d/wGtgoY2mtYYM/nvidia-partner-netwo...
[2]: https://brand.nvidia.com/d/wGtgoY2mtYYM/nvidia-partner-netwo...
"The CUDA "driver version" looks like the CUDA runtime version - so what's the difference?" https://stackoverflow.com/q/40589814/1593077
or consider the version you get when you run nvidia-smi, versus the version you get when you run nvcc --version. Those are very different numbers...
The compatibility between different versions of the driver and the toolkit is also a cause for some headaches in my experience.
I wouldn't have been able to tell you this a few months ago, and it was confusing! The machine that compiles vs the machine that runs, the CUDA toolkit which includes both vs the NVIDIA driver which includes just one part of it, etc. The article describes this explicitly.
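For reference, both numbers can be read programmatically, analogous to what nvidia-smi (driver) and nvcc --version (toolkit/runtime) report. A minimal sketch using ctypes; the library names libcuda.so.1 and libcudart.so are assumptions for a Linux box with a toolkit installed (on Windows they would be nvcuda.dll and a versioned cudart DLL):

```python
# Sketch: query the driver's supported CUDA version and the runtime library's
# version from the same process, to see the two numbers the thread is about.
import ctypes

def cuda_version(lib_name, fn_name):
    lib = ctypes.CDLL(lib_name)
    version = ctypes.c_int(0)
    status = getattr(lib, fn_name)(ctypes.byref(version))
    if status != 0:
        raise RuntimeError(f"{fn_name} failed with status {status}")
    # Both APIs encode the version as 1000*major + 10*minor, e.g. 12040 -> 12.4
    return version.value // 1000, (version.value % 1000) // 10

print("Driver supports CUDA up to:", cuda_version("libcuda.so.1", "cuDriverGetVersion"))
print("CUDA runtime (libcudart):  ", cuda_version("libcudart.so", "cudaRuntimeGetVersion"))
```

The first number is capped by the installed driver; the second is whatever runtime library the process happens to load, which is exactly why the two can disagree.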
That library is actually a rather poor idea. If you're writing a CUDA application, I strongly recommend avoiding the "runtime API". It provides partial access to the actual CUDA driver and its API, which is 'simpler' in the sense that you don't explicitly create "contexts", but:
* It hides or limits a lot of the functionality.
* Its actual behavior vis-a-vis contexts is not at all simple and is likely to make your life more difficult down the road.
* It's not some clean interface that's much more convenient to use.
So, either go with the driver, or consider my CUDA API wrappers library [1], which _does_ offer a clean, unified, modern (well, C++11'ish) RAII/CADRe interface. And it covers much more than the runtime API, to boot: JIT compilation of CUDA (nvrtc) and PTX (nvptx_compiler), profiling (nvtx), etc.
> Driver API ... provides direct access to GPU functionality.
Well, I wouldn't go that far, it's not that direct. Let's call it: "Less indirect"...
Probably the worst part of this: for the most part, in practice, it will work just fine. Until it doesn’t. You will have lots of fun debugging subtle bugs in a closed-source black box, which reproduces only against certain driver API header versions, which potentially does not match the version of the actual driver API DSO you’ve dlopened, and which only produces problems when mixed with certain Linux kernel versions.
(I have the exact opposite opinion; people reach too eagerly for the driver API when they don’t need it. Almost everything that can be done with the driver API can be done with the runtime API. If you absolutely must use the driver API, which I doubt, you should at least resolve the function pointers through cudaGetDriverEntryPointByVersion.)
That's the first time in my life that somebody has coherently described what the word 'ontology' means. I'm sure this explanation is wrong, but still...
However, it misses the polyglot part (Fortran, Python GPU JIT, all the backends that support PTX), the library ecosystem (writing CUDA kernels should be the exception, not the rule), the graphical debugging tools, and IDE integration.
2 more comments available on Hacker News