Powerful GPUs or Fast Interconnects: Analyzing Relational Workloads
Posted 4 months ago · Active 4 months ago
vldb.org · Research · story
informative · neutral
Debate: 20/100
Key topics
GPU Allocation
Relational Workloads
Database Systems
Discussion Activity
Light discussion · First comment: 3d after posting · Peak period: 1 comment (78-84h) · Avg per period: 1
Key moments
- Story posted: Aug 25, 2025 at 5:48 PM EDT (4 months ago)
- First comment: Aug 29, 2025 at 3:49 AM EDT (3d after posting)
- Peak activity: 1 comment in the 78-84h window
- Latest activity: Aug 29, 2025 at 3:49 AM EDT (4 months ago)
ID: 45019458 · Type: story · Last synced: 11/18/2025, 12:06:58 AM
While most research is commendable, I think this one feels a bit like it starts from the wrong premise.
Unified memory has become a thing (Apple machines, Nvidia AI machines like the GH200, recent AMD "AI" machines), and as people are aware, AI workloads are, much like DB workloads, bandwidth bound (which is why we often use 4-bit and 8-bit values today). To become compute bound you would need to do much more expensive work than graphics shaders, and that is not common in DB queries.
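To make the bandwidth-bound point concrete, here is a rough roofline-style back-of-envelope; the operator (a selection scan over 64-bit values, roughly one comparison per 8 bytes read) and the hardware figures (~40 TFLOP/s peak compute, ~3 TB/s HBM bandwidth) are illustrative assumptions, not numbers from the paper.

```latex
% Arithmetic intensity of a selection scan vs. the GPU's roofline ridge point
% (illustrative hardware figures, not measurements).
\[
  I_{\text{scan}} \approx \frac{1\ \text{op}}{8\ \text{B}} = 0.125\ \tfrac{\text{op}}{\text{B}},
  \qquad
  I_{\text{ridge}} = \frac{P_{\text{peak}}}{B_{\text{mem}}}
                   \approx \frac{40\times 10^{12}\ \text{op/s}}{3\times 10^{12}\ \text{B/s}}
                   \approx 13\ \tfrac{\text{op}}{\text{B}}.
\]
% Since I_scan << I_ridge, the scan sits deep in the bandwidth-bound region;
% only operators doing far more work per byte would become compute bound.
```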
So the focus of the research should be:
A: How do the queries in these setups compare with simply running on a unified-memory machine? Is there enough of a win for discrete GPUs to trounce the added complexity? (The GH200 performance advantage seems to partially answer this, since IIRC it's unified.)
B: What is the overhead of firing off query operations versus just running on the CPU? Is query-compilation overhead noticeable if the queries are mostly novel and non-cached?
C: For keeping things on the GPU, are there options today for streaming directly to the GPU, bypassing host RAM entirely? (See the sketch after this list.)
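On C: NVIDIA's GPUDirect Storage (the cuFile API) is one existing option for moving data from NVMe straight into GPU memory without bouncing through host RAM. Below is a minimal sketch of such a read; the file path, sizes, and minimal error handling are illustrative assumptions, not a description of the paper's system.

```c
// Minimal sketch: read a column file directly into GPU memory with
// NVIDIA GPUDirect Storage (cuFile). Path and size are made up;
// production code needs fuller error handling and alignment checks.
// Build (roughly): nvcc gds_read.c -lcufile
#define _GNU_SOURCE            // for O_DIRECT
#include <cuda_runtime.h>
#include <cufile.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    const char  *path = "/data/l_quantity.bin";   // hypothetical column file
    const size_t size = 256UL << 20;              // 256 MiB, illustrative

    if (cuFileDriverOpen().err != CU_FILE_SUCCESS) {
        fprintf(stderr, "cuFile driver open failed\n");
        return 1;
    }

    // O_DIRECT lets the read bypass the host page cache.
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd < 0) { perror("open"); return 1; }

    CUfileDescr_t descr = {0};
    descr.handle.fd = fd;
    descr.type = CU_FILE_HANDLE_TYPE_OPAQUE_FD;

    CUfileHandle_t handle;
    if (cuFileHandleRegister(&handle, &descr).err != CU_FILE_SUCCESS) {
        fprintf(stderr, "cuFileHandleRegister failed\n");
        return 1;
    }

    void *dev_buf = NULL;
    cudaMalloc(&dev_buf, size);
    cuFileBufRegister(dev_buf, size, 0);

    // DMA from storage into GPU memory; no staging copy in host RAM.
    ssize_t n = cuFileRead(handle, dev_buf, size, /*file_offset=*/0, /*devPtr_offset=*/0);
    printf("read %zd bytes directly into GPU memory\n", n);

    // ... launch scan/filter kernels over dev_buf here ...

    cuFileBufDeregister(dev_buf);
    cudaFree(dev_buf);
    cuFileHandleDeregister(handle);
    close(fd);
    cuFileDriverClose();
    return 0;
}
```

Whether that beats just scanning from a unified-memory pool is exactly the comparison question A asks for; GPUDirect removes the host bounce but still pays per-request interconnect latency.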