How I Solved PyTorch's Cross-Platform Nightmare
Posted 4 months ago · Last active 4 months ago · svana.name · Tech story
Key topics: PyTorch, Cross-Platform Compatibility, Package Management
The author shares their solution to managing PyTorch's cross-platform compatibility issues, sparking a discussion on package management strategies and alternative solutions.
Snapshot generated from the HN discussion
Discussion activity: first comment after 3 days; peak of 17 comments in the 84-90h window; average of 9.7 comments per period; 29 comments loaded.
Key moments
- Story posted: Sep 8, 2025 at 12:59 AM EDT
- First comment: Sep 11, 2025 at 9:51 AM EDT (3 days after posting)
- Peak activity: 17 comments in the 84-90h window (hottest window of the conversation)
- Latest activity: Sep 12, 2025 at 3:55 AM EDT
> Public index servers SHOULD NOT allow the use of direct references in uploaded distributions. Direct references are intended as a tool for software integrators rather than publishers.
This means that PyPI will not accept your project metadata as you currently have it configured. See https://github.com/pypi/warehouse/issues/7136 for more details.
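For concreteness, here is a minimal sketch of what such a "direct reference" dependency looks like when parsed with the `packaging` library; the wheel URL below is a hypothetical example of the shape PyPI rejects in uploaded metadata, not an actual pin from the project:

```python
# Sketch only: the URL is hypothetical and just illustrates the shape of a
# PEP 440/508 "direct reference" dependency that PyPI refuses in uploads.
from packaging.requirements import Requirement

req = Requirement(
    "torch @ https://download.pytorch.org/whl/cu121/"
    "torch-2.4.0%2Bcu121-cp311-cp311-linux_x86_64.whl"
)
print(req.name)  # torch
print(req.url)   # the presence of a URL is what makes it a direct reference
```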
In all seriousness, I get the impression that PyTorch is such a monster PITA to manage because it cares so much about the target hardware. It'd be like a blog post saying "I solved the assembly language nightmare".
If you do not care about performance and would rather have portability, use an alternative like tinygrad that does not optimize for every accelerator under the sun.
This need for hardware-specific optimization is also why the assembly language analogy is a little imprecise. Nobody expects one binary to run on every CPU or GPU with peak efficiency, unless you are talking about something like Redbean, which gets surprisingly far (the creator actually worked on the TensorFlow team and addressed similar cross-platform problems).
So maybe the blog post you're looking for is https://justine.lol/redbean2/.
Or, looked at a different way, Torch has to work this way because Python packaging has too narrow an understanding of platforms: it treats many things that are materially different platforms as the same platform.
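As a rough illustration of how narrow that understanding is: wheel tags encode the interpreter, ABI, and OS/CPU architecture, but nothing about the accelerator stack (CUDA version, ROCm, CPU-only), which is exactly the axis PyTorch varies on. A minimal sketch using the `packaging` library:

```python
# Sketch: list a few of the wheel tags this interpreter would accept.
# Note that no field in a tag can express "needs CUDA 12.1" vs "CPU-only".
from packaging.tags import sys_tags

for tag in list(sys_tags())[:3]:
    print(tag.interpreter, tag.abi, tag.platform)
# e.g. cp311 cp311 manylinux_2_17_x86_64 -- "Linux on x86_64", nothing finer
```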
So, you're doubling down on OP's misnomer of "cross platform means whatever platforms I use", eh?
You should be specific about which distributions you have in mind.
That is: there's nothing stopping the author from building on the approach he shares to also include Windows/FreeBSD/NetBSD/whatever.
It's his project (FileChat), and I would guess he uses Linux. It's natural that he'd solve this problem for the platforms he uses, and for which wheels are readily available.
I would like to add some anecdata to this.
When I was a PhD student, I already had 12 years of using and administrating Linuxes as my personal OS, and I'd already had my share of package manager and dependency woes.
But managing Python, PyTorch, and CUDA dependencies was relatively new to me. Sometimes I'd lose an evening here or there to something silly. But I had one week especially dominated by these woes, to the point where I'd have dreams about package management problems at the terminal.
They were mundane dreams but I'd chalk them up as nightmares. The worst was having the pleasant dream where those problems went away forever, only to wake up to realize that was not the case.
But the point is more that, for me, this is a somewhat rare instance where I think using the term "nightmare" in the title is justified.
More than once I had installed PyTorch into a new environment and subsequently spent hours trying to figure out why things suddenly weren't working. Turns out, PyTorch had just uploaded a bad wheel.
Weirdly, I feel like CUDA has become easier yet Python has become worse. It's all package management. Honestly, I find myself wanting to use package managers less and less because of Python. Of course `pip install` doesn't work, and that is probably a good thing. But the result of this is that any time you install a package it adds the module as a system module, which I thought was the whole thing we were trying to avoid. So what? Do I edit every package build now so that it runs a uv venv? If I do that, then this seems to just get more complicated as I have to keep better track of my environments. I'd rather be dealing with environment modules than that. I'd rather things be wrapped up in a systemd service or nspawn than that!
I mean I just did an update and upgrade and I had 13 Python packages and 193 Haskell modules, out of 351 packages! This shit is getting insane.
People keep telling me to keep things simple, but I don't think any of this is simple. It really looks like a lot of complexity created by a lot of things being simplified. I mean, isn't every big problem created out of a bunch of little problems? That's how we solve big problems -- break them down into small problems -- right? Did we forget that the little things matter? If you don't think they do, did you question whether this comment was written by an LLM because I used a fucking em dash? Seems like you latched onto something small. I think it is hard to know when the little things matter or don't matter; often we just don't realize the little things are part of the big things.
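On the uv question above (a hedged aside, assuming uv's standard workflow rather than anything from this thread): the per-project pattern is usually just `uv venv` to create a project-local .venv and then `uv pip install torch` (or `uv add torch` in a uv-managed project) to install into it, rather than editing individual package builds; whether that counts as simple is exactly what the comment is disputing.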
In the comparative table, they claim that conda doesn't support:
* lock file: false; you can freeze your environment (see the example commands below)
* task runner: I don't need my package manager to be a task runner
* project management: you can do one env per project? I don't see the problem here...
So no, please, just use conda/mamba and conda-forge.
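For reference (these are just standard conda commands, not anything from the table in question): `conda env export > environment.yml` captures the exact packages and versions in an environment, and `conda list --explicit > spec-file.txt` writes a spec that `conda create --name newenv --file spec-file.txt` can rebuild, which is the kind of freezing meant above; conda-lock is the separate tool to reach for if you need true cross-platform lockfiles with hashes.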
That's why people will go to stupid lengths to convert models from PyTorch / TensorFlow with onnxtools / coremltools, to avoid touching the model / weights themselves.
The only one that escaped this is llama.cpp; weirdly, despite the difficulty of model conversion with ggml, people seem to do it anyway.
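For context, the PyTorch-to-ONNX conversion path being referenced typically looks like this minimal sketch; the model and input shape are hypothetical placeholders, not something from the thread:

```python
# Minimal sketch: export a (hypothetical) PyTorch model to ONNX so it can be
# served by ONNX Runtime / converted onward without shipping PyTorch itself.
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(16, 8), nn.ReLU(), nn.Linear(8, 2))
model.eval()

example_input = torch.randn(1, 16)  # dummy input that defines the traced shape
torch.onnx.export(model, example_input, "model.onnx")
```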
This ends up wasting space and slowing down installation :(
Speaking of PyTorch and CUDA, I wish the Vulkan backend would become stable, but that seems like a far-off dream...
https://docs.pytorch.org/executorch/stable/backends-vulkan.h...
`pip install torchruntime`
`torchruntime install torch`
It figures out the correct torch to install on the user's PC, factoring in the OS (Win, Linux, Mac), the GPU vendor (NVIDIA, AMD, Intel) and the GPU model (especially for ROCm, whose configuration varies per generation and ROCm version).
And it tries to support quite a number of older GPUs as well, which are pinned to older versions of torch.
It's used by a few cross-platform torch-based consumer apps, running on quite a number of consumer installations.
It doesn't solve how you package your wheels specifically; that problem is still pushed onto your downstream users because of boneheaded packaging decisions by PyTorch themselves, but as the consumer, Pixi softens the blow. The conda-forge builds of PyTorch are also a bit more sane.
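For anyone unfamiliar with the consumer-side workflow being hinted at: roughly `pixi init` followed by `pixi add pytorch` pulls the conda-forge build into a per-project environment; the exact package name and channel setup here are assumptions, so check the pixi and conda-forge docs for your platform and accelerator.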