Nvidia DGX Spark: When Benchmark Numbers Meet Production Reality
Key topics
The article recounts the author's hands-on experience with Nvidia's DGX Spark, covering both impressive performance and several problems (including GPU inference issues) that sparked a lively discussion among commenters about the product's strengths and weaknesses.
Snapshot generated from the HN discussion
Discussion Activity
- Activity level: very active discussion
- First comment: 2h after posting
- Peak period: 91 comments in the 0-12h window
- Average per period: 23.2 comments
- Based on 116 loaded comments
Key moments
- Story posted: Oct 26, 2025 at 1:53 PM EDT (2 months ago)
- First comment: Oct 26, 2025 at 3:30 PM EDT (2h after posting)
- Peak activity: 91 comments in the 0-12h window (the hottest stretch of the conversation)
- Latest activity: Oct 31, 2025 at 2:37 AM EDT (2 months ago)
My job just got me and our entire team a DGX Spark. I'm impressed at the ease of use for Ollama models I couldn't run on my laptop. gpt-oss:120b is shockingly better than what I thought it would be from running the 20b model on my laptop.
The DGX has changed my mind about the future being small specialized models.
Are you shocked because that isn't your experience?
From the article it sounds like Ollama runs CPU inference, not GPU inference. Is that the case for you?
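One hedged way to settle that question yourself (not something shown in the thread) is to poll GPU utilization while Ollama is generating: near-zero utilization during decode suggests a CPU path, sustained high utilization suggests the GPU path is active. A minimal sketch, assuming `nvidia-smi` is on the PATH:

```python
import subprocess, time

def gpu_utilization() -> int:
    # Ask nvidia-smi for the current GPU utilization as a bare number.
    out = subprocess.check_output(
        ["nvidia-smi", "--query-gpu=utilization.gpu",
         "--format=csv,noheader,nounits"],
        text=True,
    )
    return int(out.strip().splitlines()[0])

if __name__ == "__main__":
    # Start this loop, then fire a prompt at Ollama from another terminal.
    for _ in range(30):
        print(f"GPU util: {gpu_utilization()}%")
        time.sleep(1)
```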
I was not familiar with the hardware, so I was disappointed there wasn't a picture of the device. Tried to skim the article and it's a mess. Inconsistent formatting and emoji without a single graph to visualize benchmarks.
I bet the input to the LLM would have been more interesting.
It looks like it worked? Why's it say this?
> Verdict: Inference speed scales proportionally with model size.
Author only tried one model size and it's faster than NVIDIA's reported speed at a larger model. Not really a "Verdict".
> Verdict: 4-bit quantization is production-viable.
That's not really something you can conclude from messing around with it and saying you like the outputs.
> GPU Inference is Fundamentally Broken
Probably not? It probably just doesn't work in llama.cpp right now? Takes a while reading this to work out they tried ollama and then later llama.cpp, which I'd guess is basically testing llama.cpp twice. Actually I don't even believe that, I'm sure author ran into errors that might be a pain to figure out, but there's no evidence it's worse than that.
But then it says this is the "root cause":
Am I to believe GPU inference is really fundamentally broken? I'm not seeing the case made here, just claims. At this point the LLM seems to have gotten confused about whether it's talking about the memory fragmentation issue or the GPU inference issue. But it's hard to believe anything from this point on in the post.
I have H100s to myself, and access to more GPUs than I know what to do with in national clusters.
The Spark is much more fun. And I'm more productive. With two of them, you can debug shallow NCCL/MPI problems before hitting a real cluster. I sincerely love Slurm, but there's nothing like a personal computer.
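For context on what that two-box debugging can look like, here is a minimal NCCL smoke test, purely a sketch and not the commenter's setup; it assumes PyTorch with CUDA and torchrun on both machines, and the IP and port are placeholders:

```python
# nccl_smoke.py
# Launch on both Sparks, e.g.:
#   torchrun --nnodes=2 --nproc_per_node=1 \
#            --rdzv_backend=c10d --rdzv_endpoint=<spark1-ip>:29500 nccl_smoke.py
import torch
import torch.distributed as dist

def main():
    dist.init_process_group(backend="nccl")   # reads env vars set by torchrun
    rank = dist.get_rank()
    world = dist.get_world_size()
    torch.cuda.set_device(0)                  # one GPU per Spark
    t = torch.ones(1024, device="cuda") * (rank + 1)
    dist.all_reduce(t, op=dist.ReduceOp.SUM)  # every rank should end up with the same sum
    print(f"rank {rank}/{world}: all_reduce ok, t[0]={t[0].item()}")
    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

If this hangs or errors across the two boxes, the NCCL/network configuration is the problem, which is exactly the class of shallow issue the commenter wants to shake out before touching a real cluster.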
As for debugging, that's where you should be allowed to spin up a small testing cluster on-demand. Why can't you do that with your slurm access?
It's remarkable what can now be done on a whisper-quiet little box. I hope the Strix Halos will be just as much fun, and they should be, so long as Flash Attention works.
Fair, thanks for the answer.
The bane of my existence...
Even ignoring GPU details, the Spark is an awesome, quiet little powerhouse of an arm64 workstation that is 100% Linux-first.
Curious, though, how you offer iDRAC to customers. Do you have another out-of-band BMC for the iDRAC? Or is this in an internal engineering context?
We rent bare metal on-demand, and our whole business is offering compute that you probably wouldn't be able to host in your house ($), as if you owned it yourself.
So, we made it so that users can get access into the BMC and modify the box however they want. When they are done, we've automated the reset as well. Fully self-service.
($) These boxes are very expensive, weigh 350 lbs, sound like a jet engine, and consume ~10 kW.
But it's still not quite like having exclusive access to resources when you want them. So I can see it both ways.
But please have your LLM post writer be less verbose and repetitive. This is like the stock output from any LLM, where it describes in detail and then summarizes back and forth over multiple useless sections. Please consider a smarter prompt and post-editing…
Nah. Do you have first-hand experience with Strix Halo? At less than €1,600 for a 128 GB configuration it manages >45 tokens/s with gpt-oss-120b, which is faster than the DGX Spark at a fraction of the cost.
There are official benchmarks of the Spark running multiple models just fine on llama.cpp
https://github.com/ggml-org/llama.cpp/discussions/16578
Wow. Where do I sign up?
It would be cheaper to buy up a dozen 3060s and build a custom PC around them than to buy the Spark.
Given the extreme advantage they have with CUDA and the whole AI/ML ecosystem, barely matching Apple’s M-ultra speeds is a choice…
Apple benchmarks: https://github.com/ggml-org/llama.cpp/discussions/4167
(cited from https://lmsys.org/blog/2025-10-13-nvidia-dgx-spark/)
The reason we use Ryzens is because we run Linux with almost no problems on them.
The userspace side is where AI is difficult with AMD. Almost all of the community is built around Nvidia tooling first, others second (if at all).
SHOULDA
Practically, if the goal is 100% about AI and cloud isn't an option for some reason, both options are likely "a great way to waste a couple grand trying to save a couple grand" as you'd get 7x the performance and likely still feel it's a bit slow on larger models using an RTX Pro 6000. I say this as a Ryzen AI Max+ 395 owner, though I got mine because it's the closest thing to an x86 Apple Silicon laptop one can get at the moment.
https://www.anaconda.com/blog/python-nvidia-dgx-spark-first-...
Isn't this the same architecture that Apple's Mx implements, from a memory perspective?
> If you start Python and ask it how many CPU cores you have, it will count both kinds of cores and report 20
> Note that because of the speed difference between the cores, you will want to ensure there is some form of dynamic scheduling in your application that can load balance between the different core types.
Sounds like a new type of hell where I now not only need to manage the threads themselves, but also have to take into account what type of core they run on, and Python straight up reports them as the same.
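A rough sketch of the dynamic scheduling that note recommends, using only the standard library; the workload is a stand-in, and `os.cpu_count()` indeed lumps the fast and slow cores together:

```python
import os
from concurrent.futures import ProcessPoolExecutor

def work(chunk):
    # Stand-in for a real per-chunk task.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    print("cpu_count:", os.cpu_count())  # counts fast and slow cores alike
    chunks = [range(i * 100_000, (i + 1) * 100_000) for i in range(200)]
    with ProcessPoolExecutor(max_workers=os.cpu_count()) as pool:
        # map() hands chunks to workers as they free up, so a worker stuck on a
        # slow core simply processes fewer chunks: coarse dynamic load balancing.
        total = sum(pool.map(work, chunks, chunksize=1))
    print("total:", total)
```

The point is to avoid statically splitting work into one equal slice per core, which would leave the fast cores idle while the slow ones finish their share.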
Two things need to happen for me to get excited about this:
1. It stimulates other manufacturers into building their own DGX-class workstations.
2. This all eventually gets shipped in a decent laptop product.
As much as it pains me, until that happens, it still seems like Apple Silicon is the more viable option, if not the most ethical.
Besides that though, I don't see how Nvidia is particularly non-ethical. They cooperate with Khronos, provide high-quality Linux and BSD drivers free of charge, and don't deliberately block third parties from writing drivers to support new standards. From a relativist standpoint that's as sanctimonious as server hardware gets.
Specifically WRT Mellanox, Nvidia's behavior was more petty than callous.
And yes... yes it is.
> ARM64 Architecture: Not x86_64 (limited ML ecosystem maturity)
> No PyTorch wheels for ARM64+CUDA (must use Docker)
> Most ML tools optimized for x86
No evidence for any of this whatsoever. The author just asked Claude/claude code to write their article and it just plain hallucinated some rubbish.
Like in Upstream Color: https://www.youtube.com/watch?v=zfDyEr8Ykcg
https://cookbook.openai.com/articles/gpt-oss/run-nvidia
Really? Less RAM bw than an Epyc CPU? And 4x to 8x less than a consumer GPU?
How come this doesn’t massively limit LLM inference speeds?
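Memory bandwidth does set the decode ceiling; a back-of-envelope sketch with assumed, approximate numbers (a quoted ~273 GB/s for the Spark, and gpt-oss-120b being MoE with only about 5 B active parameters per token at roughly 4-bit):

```python
# All numbers are assumptions for illustration, not measurements.
bandwidth_gb_s = 273      # approximate LPDDR5X bandwidth of the DGX Spark
active_params_b = 5.1     # MoE: only ~5.1B of gpt-oss-120b's params are active per token
bytes_per_param = 0.5     # ~4-bit weights

gb_read_per_token = active_params_b * bytes_per_param   # ~2.6 GB read per decoded token
ceiling_tok_s = bandwidth_gb_s / gb_read_per_token
print(f"decode ceiling ~ {ceiling_tok_s:.0f} tok/s")    # ~100 tok/s upper bound
# KV-cache traffic, activations, and overhead push real throughput well below
# this ceiling, which is roughly where the tens-of-tokens/s figures in the
# thread sit.
```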
Key corrections:
- Ollama GPU usage: I was wrong. It IS using the GPU (verified 96% utilization). My "CPU-optimized backend" claim was incorrect.
- FP16 vs BF16: enum caught the critical gap: I trained with BF16, tested inference with FP16 (broken), but never tested BF16 inference. "GPU inference fundamentally broken" was overclaimed. It should be "FP16 has issues, BF16 untested (likely works)."
- llama.cpp: veber-alex's official benchmark link proves it works. My issues were likely version-specific, not representative.
- ARM64+CUDA maturity: bradfa was right about the Jetson history. ARM64+CUDA is mature. The new combination is Blackwell+ARM64, not ARM64+CUDA itself.
The HN community caught my incomplete testing, overclaimed conclusions, and factual errors.
Ship early, iterate publicly, accept criticism gracefully.
Thanks especially to enum, veber-alex, bradfa, furyofantares, stuckinhell, jasonjmcghee, eadwu, and renaudr. The article is significantly better now.
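A minimal sketch of the BF16-vs-FP16 comparison the correction above says was never run: load the same checkpoint in each dtype and compare outputs on the same prompt. The checkpoint path and prompt are placeholders, and it assumes the Hugging Face transformers library on a CUDA build of PyTorch:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_dir = "path/to/finetuned-checkpoint"   # placeholder, not the author's path
prompt = "Summarize the DGX Spark in one sentence."

tok = AutoTokenizer.from_pretrained(model_dir)
for dtype in (torch.bfloat16, torch.float16):
    # Load the same weights in each dtype and generate deterministically.
    model = AutoModelForCausalLM.from_pretrained(model_dir, torch_dtype=dtype).to("cuda")
    inputs = tok(prompt, return_tensors="pt").to("cuda")
    out = model.generate(**inputs, max_new_tokens=64, do_sample=False)
    print(dtype, "->", tok.decode(out[0], skip_special_tokens=True))
    del model
    torch.cuda.empty_cache()   # free memory before loading the next dtype
```

If the BF16 run produces sensible text while the FP16 run degenerates, that isolates the problem to the dtype rather than to GPU inference in general.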
But if that's not the case, then yeah, it's a crappy practice and I'd hate to see it spread any further than it already has.
Is that version correct?
Asking because (in Ollama terms) it's positively ancient. 0.12.6 being the most recent release (currently).
I'm guessing it _might_ make a difference, as the Ollama crowd do seem to be changing things, adding new features and optimisations (etc) quite often.
For example, that 0.12.6 version is where initial experimental support for Vulkan (i.e. Intel Xe GPUs) was added, and in my testing that worked. Not that Vulkan support would do anything in your case. ;)
Ryzen Max 395+ gets you 55 tok/s [1]
[1] https://www.reddit.com/r/LocalLLaMA/comments/1nabcek/anyone_...
It is also a standard UEFI+ACPI system; one Reddit user even reported that they were able to boot Fedora 42 and install the open kernel modules with no problems. The overall delta/number of specific patches in the Canonical 6.17-nvidia tree was pretty small when I looked (the current kernel is 6.11). That, and the likelihood that the consumer variant will support Windows, hopefully bodes well for its upstream Linux compatibility.
To be fair, most of this is also true of Strix Halo from what I can tell (most benchmarks put the DGX furthest ahead at prompt processing and a bit ahead at raw token output, but the software is still buggy and Blackwell is still a bumpy ride overall, so it might get better). But I think it's mostly the pricing that is holding it back. I'm curious what the consumer variant will be priced at.
I haven't exactly bisected the issue, but I'm pretty sure convolutions are broken on sm_121 after a certain size: I'm getting a 20x memory blowup from a convolution after a 2x batch size increase _only_ on the DGX Spark.
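A hypothetical repro sketch for that kind of check, comparing peak CUDA memory for the same convolution at batch size N and 2N; the shapes and sizes are illustrative placeholders, not the commenter's actual workload, and it assumes PyTorch with CUDA:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(256, 256, kernel_size=3, padding=1).cuda().half()

for batch in (8, 16):   # a 2x batch size step
    torch.cuda.empty_cache()
    torch.cuda.reset_peak_memory_stats()
    x = torch.randn(batch, 256, 128, 128, device="cuda", dtype=torch.float16)
    y = conv(x)
    torch.cuda.synchronize()
    print(f"batch {batch}: peak {torch.cuda.max_memory_allocated() / 1e9:.2f} GB")
# Peak memory should roughly double with batch size; a ~20x jump seen only on
# sm_121 would point at the cuDNN/driver path rather than the model itself.
```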
I haven't had any problems with inference, but I also don't use the transformers library that much.
llama.cpp was working for openai-oss last time I checked and on release, not sure if something broke along the way.
I don't exactly know if memory fragmentation is something fixable on the driver side. This might just be a problem with the kernel's policy and the GPL, which prevent them from interfering with the memory subsystem at the granularity they'd like (see ZFS and its page table antics), or so my thinking goes.
If you've done stuff on WSL, you've seen similar issues, and you can work around them by running a service that periodically compacts and cleans memory; I have it run every hour. Note that this does impact at least CPU performance and memory allocation speed, but I have not had any issues with long training runs with it in place (24h+). (Assuming that is even the issue: I have never tried without it, and put that service in place as soon as I got the machine because of my experience on WSL.)
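The comment doesn't show the actual service, but a minimal sketch of the kind of hourly compact-and-clean job it describes could look like this; it assumes Linux procfs and root privileges, and would typically be run under systemd or cron rather than as a bare loop:

```python
import os
import time

def compact_and_drop_caches():
    os.sync()                                         # flush dirty pages first
    with open("/proc/sys/vm/drop_caches", "w") as f:
        f.write("3\n")                                # drop page, dentry, and inode caches
    with open("/proc/sys/vm/compact_memory", "w") as f:
        f.write("1\n")                                # ask the kernel to compact free memory

if __name__ == "__main__":
    while True:
        compact_and_drop_caches()
        time.sleep(3600)                              # once an hour, as in the comment
```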