Memory Optimizations to Reduce CPU Costs
Posted 4 months ago · Active 4 months ago
ayende.com · Tech story
Sentiment: calm, positive
Debate: 60/100
Key topics
Memory Optimization
Performance
Data Processing
The article discusses memory optimizations to reduce CPU costs, and the discussion revolves around the importance of efficient data processing and alternative approaches to handling large datasets.
Snapshot generated from the HN discussion
Discussion Activity
Moderate engagement
First comment: 20h after posting
Peak period: 9 comments in the 18-24h window
Average per period: 3.6
Comment distribution: 18 data points
Based on 18 loaded comments
Key moments
- 01 Story posted: Aug 25, 2025 at 6:42 AM EDT (4 months ago)
- 02 First comment: Aug 26, 2025 at 2:21 AM EDT (20h after posting)
- 03 Peak activity: 9 comments in the 18-24h window, the hottest stretch of the conversation
- 04 Latest activity: Aug 28, 2025 at 2:11 PM EDT (4 months ago)
ID: 45012414 · Type: story · Last synced: 11/20/2025, 4:38:28 PM
But maybe we'd need to know more about how the output data is consumed to know whether this would actually help much in the real application. If the next stage of processing wants to randomly access records using Get(int i), where i is the index of the item, then even if we transform the input to JSON with a constant amount of RAM, we still have to store that output JSON somewhere so we can Get those items.
The blog post mentioned "padding"; I didn't immediately understand what that was referring to (padding in the output format?), but I guess it must mean struct padding: the items were previously stored as an array of structs, while the code in the article transposed everything into homogeneous arrays, eliminating the per-item padding overhead.
If we had an "array of structs" instead of a "struct of arrays", each item would be: string (8) + long (8) + int (4) + padding (4) = 24 bytes.
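To make that layout concrete, here is a small C# sketch (the type and field names are made up, since the post's actual types aren't shown in the thread): on a 64-bit runtime the struct version pays 8 bytes for the string reference, 8 for the long, 4 for the int, and 4 of trailing padding, i.e. the 24 bytes above, while the transposed version keeps each column densely packed.

```csharp
using System;
using System.Runtime.CompilerServices;

// "Array of structs": every element carries a string reference (8 bytes on a
// 64-bit runtime), a long (8 bytes) and an int (4 bytes); the runtime pads the
// struct to an 8-byte multiple, so each element occupies 8 + 8 + 4 + 4 = 24 bytes.
struct TermEntry
{
    public string Term;    // 8-byte object reference
    public long Offset;    // 8 bytes
    public int Frequency;  // 4 bytes, followed by 4 bytes of padding
}

// "Struct of arrays": the same data transposed into homogeneous arrays.
// The fixed-width columns pack densely, with no per-element padding.
class TermEntryColumns
{
    public string[] Terms;
    public long[] Offsets;
    public int[] Frequencies;
}

static class LayoutDemo
{
    static void Main()
    {
        // Typically prints 24 on a 64-bit runtime, matching the arithmetic above.
        Console.WriteLine(Unsafe.SizeOf<TermEntry>());
    }
}
```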
The sibling comment provided a good hint already. All you need to store are some file offsets, amounting to a few dozen bytes.
The problem isn't well constrained: it seems to imply that for some reason everything needs to be accessible in memory, it doesn't specify the cardinality of terms, and it doesn't specify whether Get(i) is used in a way that actually requires that particular row-by-number interface.
If I were to do it, I'd just parse a Page at a time and update a metadata index saying Page P contains entries starting at N. The output file could be memory-mapped and only the metadata loaded, allowing the code to index directly into the correct Page, which could be quickly scanned for a record; that would maybe use 1-2 MB of RAM for metadata plus whatever Pages are actually being touched.
But like I said, the problem is not constrained well enough for even a solution like that to be clearly optimal, since it would suffer under full-dataset sequential or random access, as opposed to hot Pages plus a long tail.
/shrug specs matter if you’re in the optimization phase
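A minimal C# sketch of that page-index approach, under a pile of assumptions the thread doesn't pin down (fixed-size pages, newline-delimited JSON records that never straddle a page boundary, and made-up type and method names):

```csharp
using System;
using System.IO;
using System.IO.MemoryMappedFiles;
using System.Text;

// Only the tiny page-metadata array lives in managed memory; the output file is
// memory-mapped, so pages are faulted in lazily as they are touched.
class PagedOutput : IDisposable
{
    private const int PageSize = 64 * 1024;

    private readonly MemoryMappedFile _map;
    private readonly MemoryMappedViewAccessor _view;
    private readonly long[] _firstEntryInPage; // metadata: Page p contains entries starting at N

    public PagedOutput(string path, long[] firstEntryInPage)
    {
        _map = MemoryMappedFile.CreateFromFile(path, FileMode.Open);
        _view = _map.CreateViewAccessor();
        _firstEntryInPage = firstEntryInPage;
    }

    // Get(i): binary-search the metadata for the page holding entry i,
    // then scan just that one page.
    public string Get(long i)
    {
        int page = FindPage(i);
        var buffer = new byte[PageSize];
        _view.ReadArray(page * (long)PageSize, buffer, 0, PageSize);
        return ScanPageForEntry(buffer, i - _firstEntryInPage[page]);
    }

    private int FindPage(long i)
    {
        int idx = Array.BinarySearch(_firstEntryInPage, i);
        return idx >= 0 ? idx : ~idx - 1; // largest page whose first entry <= i
    }

    private static string ScanPageForEntry(byte[] page, long indexInPage)
    {
        // The page format is application-specific; here we assume one JSON
        // document per line and simply skip `indexInPage` of them.
        var lines = Encoding.UTF8.GetString(page).Split('\n');
        return lines[indexInPage];
    }

    public void Dispose()
    {
        _view.Dispose();
        _map.Dispose();
    }
}
```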
The significant decrease they talk about is a side effect of their chosen language having a GC. This means the strings take more work to deal with than expected.
This feels more like it speaks to the fact that the often-small costs associated with certain operations do eventually add up. It's not entirely clear in the post where and when the GC cost is incurred, though; I'd presume on creation and destruction?
Edit: there are tricks to avoid traversing a compound object every time, but assume that at least one of the 80M objects in that giant array gets modified between GC activations.
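As a rough illustration of why the GC shows up on the profile at all (type and field names are made up, not the article's actual code), the difference is between tens of millions of small heap objects full of references for the collector to trace versus a few large arrays with nothing inside them to trace:

```csharp
using System;

// (a) One heap object per entry, each holding a string reference. A full (gen-2)
//     collection has to trace all 80M entry objects plus all 80M strings.
//     Card-marking-style tricks let minor collections skip untouched old objects,
//     but any write into an entry dirties its region and forces a rescan, which
//     is presumably the caveat the "edit" above is alluding to.
class Entry
{
    public string Term;
    public long Offset;
    public int Frequency;
}

// (b) Column representation: a handful of large arrays with no internal object
//     references (terms live as slices of one UTF-8 buffer). From the GC's point
//     of view there is almost nothing to trace, however many entries there are.
class EntryColumns
{
    public byte[] Utf8Terms;     // all term bytes, back to back
    public int[] TermStarts;     // start of entry i's term within Utf8Terms
    public ushort[] TermLengths; // length of entry i's term
    public long[] Offsets;
    public int[] Frequencies;

    public ReadOnlySpan<byte> TermOf(int i) =>
        Utf8Terms.AsSpan(TermStarts[i], TermLengths[i]);
}
```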
How much of the total CPU cost the GC accounts for depends entirely on the application, the GC implementation, and the language. It's famously hard to measure memory-management overhead; reported GC overhead in production is anywhere between 7% and 82% (Cai, ISPASS 2022). I measured about 19% geomean overhead in accurate simulation by ignoring the instructions involved in GC/MM in Python's pyperf benchmarks.
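For the .NET case the article is about, one crude way to put a number on this is to compare GC pause time against wall-clock time. The sketch below uses standard System.GC counters (GC.GetTotalPauseDuration requires .NET 7 or later); it only yields a lower bound, since background GC work on other threads never shows up as pause time, which is part of why published overhead figures vary so widely.

```csharp
using System;
using System.Diagnostics;

class GcOverheadProbe
{
    static void Main()
    {
        var wallClock = Stopwatch.StartNew();

        RunWorkload(); // hypothetical placeholder for the real processing pipeline

        wallClock.Stop();

        TimeSpan pause = GC.GetTotalPauseDuration();  // .NET 7+: cumulative pause time
        long allocated = GC.GetTotalAllocatedBytes(); // lifetime allocations of this process

        Console.WriteLine($"wall clock: {wallClock.Elapsed}");
        Console.WriteLine($"GC pauses:  {pause} ({pause.TotalSeconds / wallClock.Elapsed.TotalSeconds:P1} of wall clock)");
        Console.WriteLine($"allocated:  {allocated / (1024 * 1024)} MB");
        Console.WriteLine($"gen0/1/2:   {GC.CollectionCount(0)}/{GC.CollectionCount(1)}/{GC.CollectionCount(2)}");
    }

    static void RunWorkload()
    {
        // Stand-in allocation-heavy loop, just to make the counters move.
        var strings = new string[1_000_000];
        for (int i = 0; i < strings.Length; i++)
            strings[i] = i.ToString();
    }
}
```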
https://gameprogrammingpatterns.com/data-locality.html