Sep 4, 2025 at 6:09 AM EDT
I think we're talking past each other, so I'll try once more. Suppose you train an LLM on a very small corpus, say, the entire contents of the Library of Congress. Then you have that LLM author new works, and then you train a new LLM on the original corpus plus this generated material. Do you really think you've addressed the core issue in the SP? Can more parameters be meaningfully trained just because you add more GPUs?
To me, the answer is clearly no. There is no new information content in the generated data; it's just a remix of what already exists.
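To make that concrete, here is a minimal sketch (my own, not from the thread) that plays out the scenario with a unigram model standing in for the LLM: fit it to a fixed corpus, sample "new works" from it, retrain on the corpus plus those samples, and repeat. The vocabulary size, sample counts, and seed are arbitrary assumptions; the point is that the retrained model's entropy stays pinned at the corpus's level instead of growing.

```python
# Toy illustration (assumptions: unigram model as LLM stand-in,
# arbitrary vocabulary/sample sizes). Not anyone's actual pipeline.
import numpy as np

rng = np.random.default_rng(0)
VOCAB = 50   # toy vocabulary size
N = 5_000    # tokens per corpus / per batch of "new works"

def fit(tokens):
    # Maximum-likelihood unigram model: normalized token counts.
    counts = np.bincount(tokens, minlength=VOCAB).astype(float)
    return counts / counts.sum()

def entropy_bits(p):
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

corpus = rng.integers(0, VOCAB, size=N)   # the fixed corpus ("Library of Congress")
model = fit(corpus)
print(f"gen 0: {entropy_bits(model):.3f} bits")

for gen in range(1, 6):
    new_works = rng.choice(VOCAB, size=N, p=model)    # model-authored "new works"
    model = fit(np.concatenate([corpus, new_works]))  # retrain on corpus + output
    print(f"gen {gen}: {entropy_bits(model):.3f} bits")
```

Each generation's samples are a function of the previous model, which is itself a function of the corpus, so by the data-processing inequality the loop cannot add information the corpus didn't already contain; drop the `corpus` term from the retraining mix and the printed entropies decay instead, which is the model-collapse regime.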