Auto-Grading Decade-Old Hacker News Discussions with Hindsight
Key topics
The auto-grading of decade-old discussions is sparking a lively debate about the implications of being judged by AI on past thoughts and predictions. Commenters are drawing parallels to dystopian concepts like Roko's Basilisk and the Panopticon, highlighting the unease of being surveilled by LLMs, whether directly or through human intermediaries. While some, like HighGoldstein, point out that online posts are already being used in various ways, others are making bets on future technological shifts, such as Nvidia's GPU dominance and the potential for LLMs to build applications from scratch. The discussion is also revisiting infamous past comments, like the Dropbox prediction, and reassessing their prescience in light of current trends.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 4m after posting
- Peak period: 100 comments in 0-6h
- Avg / period: 14.5
- Based on 160 loaded comments
Key moments
- Story posted: Dec 10, 2025 at 12:23 PM EST (24 days ago)
- First comment: Dec 10, 2025 at 12:27 PM EST (4m after posting)
- Peak activity: 100 comments in 0-6h (hottest window of the conversation)
- Latest activity: Dec 14, 2025 at 3:11 AM EST (20 days ago)
Shades of Roko's Basilisk!
Your past thoughts have been dredged up and judged for their prescience.
For each $TOPIC, you have been awarded a grade by GPT-5.1 Thinking.
Your grade is based solely on what OpenAI's blob of weights considers factual in 2025.
Did you think well?
Are you an Alpha or a Delta-Minus?
Where will the dragnet grading of your thoughts happen next?
* Nvidia GPUs will see heavy competition, with most chat-like use cases switching to cheaper models and inference-specific silicon, but they will still be used at the high end for critical applications and frontier science
* Most Software and UIs will be primarily AI-generated. There will be no 'App Stores' as we know them.
* ICE Cars will become niche and will have been largely replaced by EVs; Solar will be widely deployed and will be the dominant source of power
* Climate Change will be widely recognized due to escalating consequences and there will be lots of mitigation efforts (e.g., Climate Engineering, climate-resistant crops, etc.)
Swift is Open Source https://hn.unlurker.com/replay?item=10669891
Launch of Figma, a collaborative interface design tool https://hn.unlurker.com/replay?item=10685407
Introducing OpenAI https://hn.unlurker.com/replay?item=10720176
The first person to hack the iPhone is building a self-driving car https://hn.unlurker.com/replay?item=10744206
SpaceX launch webcast: Orbcomm-2 Mission [video] https://hn.unlurker.com/replay?item=10774865
At Theranos, Many Strategies and Snags https://hn.unlurker.com/replay?item=10799261
Says who? But also, it doesn’t suggest what you imply. I could as easily conclude: “Oh wow, the people who actually experience the system like it that much? Awesome!”
Miss it for reddit as well. Top of day/week/month/all-time makes it hard to find the top posts from a month in 2018.
What do you mean?
I suppose they want to make the comments seem "fresh" but it's a deliberate misrepresentation. You could probably even contrive a situation where it could be damaging, e.g. somebody says something before some relevant incident, but the website claims they said it afterwards.
But, I'm just guessing here based on my own refactoring experience through the years, may be a completely different reason, or even by mistake? Who knows? :)
This only manipulates the children references though, never the item ID itself. So if you have the item ID of an item (submission, comment, poll, pollItem), it'll be available there as long as moderators don't remove it, which happens very seldom.
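For concreteness, a minimal sketch of what that looks like against the public HN Firebase API (the endpoint below is the documented one; the example ID is the "Swift is Open Source" submission linked above). An item's own "id" stays stable; the parent's "kids" array is the part that moderation can detach comments from.

```python
# Minimal sketch: fetch an HN item by ID from the public Firebase API.
import json
import urllib.request

def fetch_item(item_id: int) -> dict:
    url = f"https://hacker-news.firebaseio.com/v0/item/{item_id}.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)

item = fetch_item(10669891)  # the "Swift is Open Source" submission
print(item.get("title") or item.get("text"))
print("first few children in 'kids':", item.get("kids", [])[:5])
```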
It's a shame that maintaining the web is so hard that only a few websites are "good citizens". I wish the web was a -bit- way more like git. It should be easier to crawl the web and serve it.
Say, you browse and get things cached and shared, but only your "local bookmarks" persist. I guess it's like pinning in IPFS.
It is not possible right now to make hosting democratized/distributed/robust because there's no way for people to donate their own resources in a seamless way to keeping things published. In an ideal world, the internet archive seamlessly drops in to serve any content that goes down in a fashion transparent to the user.
The wanting to is, in my mind, harder. How do you convince people that having the network is valuable enough? It's easy to compare it with the web backed by a few fiefdoms that offer, for the most part, really good performance, availability and somewhat good discovery.
Keeps the spotlight on carefully protected communities like this one.
It's not hard actually. There is a lack of will and forethought on the part of most maintainers. I suspect that monetization also plays a role.
1. https://www.w3.org/Provider/Style/URI
And scroll down to the bottom.
According to the ratings, for example, one person had extremely racist ideas but also made a couple of accurate points about how some tech concepts would evolve.
I try to temper my tendency to believe the Halo effect with Warren Buffett's notion of the Circle of Competence; there is often a very narrow domain where any person can be significantly knowledgeable.
I wonder if that research is replicable at present (lots of social science research isn't!). I also hope that in the future, as people get serious about making testable predictions and calibrating themselves, that studies will reflect this gain. One great metric of societal intellectual progress would be the distribution of calibration scores across a population. In other words, we want people to be appropriately confident given their personal experiences.* This capability unlocks so many others.
* If we get this far, imagine what happens when people start realizing that experience is so key. A person might actually think to themself e.g. "The biggest difference between myself and Other Guy isn't demographic or socioeconomic. The salient difference is he spent two years in a different country. Maybe if I had that experience, I would have a clearer lens on my home country, and maybe I would agree with more of his takes on the world."
> I try to temper my tendency to believe the Halo effect with Warren Buffett's notion of the Circle of Competence; there is often a very narrow domain where any person can be significantly knowledgeable. (commenter above)
Putting aside Buffett in particular, I'm wary of claims like "there is often a very narrow domain where any person can be significantly knowledgeable" because the follow-up questions are make-or-break. How often? How narrow of a domain? Doesn't it depend on arbitrary definitions of what qualifies as a category? Is this a testable theory? Is it a predictive theory? What does empirical research and careful analysis show?
It would be very interesting to see this applied year after year to see if people get better or worse over time in the accuracy of their judgments.
It would also be interesting to correlate accuracy to scores, but I kind of doubt that can be done. Between comments just expressing popular sentiment, and first-to-post people getting more votes for the same comment than people who come later, it probably wouldn't be very useful data.
A non-trivial number of people get laid off, likely due to a financial crisis which is used as an excuse for companies to scale up use of AI. Good chance the financial crisis was partly caused by AI companies, which ironically makes AI cheaper as infra is bought up on the cheap (so there is a consolidation, but the bountiful infra keeps things cheap). That results in increased usage (over a longer period of time), and even when the economy starts coming back the jobs numbers stay abysmal.
Politics are divided into 2 main groups: those who are employed, and those who are retired. The retired group is VERY large, and has a lot of power. They mostly care about entitlements. The employed-age people focus on AI, which is making the job market quite tough. There are 3 large political forces (but 2 parties): the Left, the Right, and the Tech Elite. The left and the right both hate AI, but the tech elite, though a minority, has outsized power in their tie-breaker role. The age distributions would surprise most. Most older people are now on the left, and most younger people are split by gender.
Unlike the 20th century, America has a more focused global agenda. We're not policing everyone, just those core trading powers. We have not gone to war with China; China has not taken over Taiwan.
Physical robotics is becoming a pretty big thing, and space travel is becoming cheaper. We have at least one robot on an asteroid mining it. The yield is trivial, but we all thought it was neat.
Energy is much, much greener, and you wouldn't have guessed it... but it was the data centers that got us there. The Tech Elite needed it quickly, and used their political connections to cut red tape and build really quickly.
I know that "X is destroying democracy, vote for Y" has been a prevalent narrative lately, but is there any evidence that it's true? I get that it's death by a thousand cuts, or "one step at a time" as they say.
https://hn.algolia.com/?dateRange=all&page=0&prefix=true&que...
But yesterday's thread and this one are quite clearly exceptions—far above the median. https://news.ycombinator.com/item?id=46212180 was particularly incredible I think!
A personal favourite is “the contrarian dynamic”.
Do you have a list of those at the ready or do you just remember them? If you feel like sharing, what’s your process and is there a list of those you’d make public?
I imagine having one would be useful, e.g. for onboarding someone like tomhow, though that doesn’t really happen often.
The process is simply that moderation is super repetitive, so eventually certain pathways get engraved in one's memory. A lot of the time, though, I can't quite remember one of these patterns and I'm unable to dig up my past comments about it. That's annoying, in that particular way when your brain can feel something's there but is unable to retrieve it.
I cannot believe this is just put out there unexamined of any level of "maybe we shouldn't help this happen". This is complete moral abdication. And to be clear, being "good" is no defense. Being good often means being unaligned with the powerful, so being good is often the very thing that puts you in danger.
I would read his "Best to be good." as a warning or reminder that everything you do or say online will be collected and analyzed by an "intelligence". You can't count on hiding amongst the mass of online noise.
If you have any ideas on how to stop everyone from building the torment nexus, I am willing to listen.
1. Don't build the Torment Nexus yourself. Don't work for them and don't give them your money.
2. When people you know say they're taking a new job to work at Torment Nexus, act like that's super weird, like they said they're going to work for the Sinaloa cartel. Treat rich people working on the Torment Nexus like it's cringe to quote them.
3. Get hostile to bots. Poison the data. Use AdNauseum and Anubis.
4. Give your non-tech friends the vague sense that this stuff is bad. Some might want to listen more, but most just take their sense of what's cool and good from people they trust in the area.
While I don't have a general solution, I do believe that the solution will need to be multi-faceted and address multiple aspects of the technologies enabling this. My first step would be for society to re-evaluate and shift its views towards information, both locally and internationally.
For example, if you proposed to get rid of all physical borders between countries, everyone would likely be aghast. Obviously there are too many disagreements and conflicting value sets between countries for this to happen. Yet in the west we think nothing of having no digital information borders, despite the fact that the lack of them in part enables this data collection and other issues such as election interference. Yes, erecting firewalls is extremely unpalatable to people in the west, but is almost certainly part of the solution on the national level. Countries like China long ago realized this, though they also use firewalls as a means of control, not just protection (it doesn't have to be this way).
But within countries we also need to shift away from a default position of "I have the right to say whatever I want so therefore I should" and into one of "I'm not putting anything online unless I'm willing to have my employer, parents, literally everyone, read it." Also, we need to systematically attack and dismantle the advertising industry. That industry is one of the single biggest driving factors behind the extreme systematic collection and correlation of data on people. Advertising needs to switch to a "you come to me" approach not a "I'm coming to you" approach.
Don't know why that just popped into my head.
We can't start clutching our pearls now as if programmatic mass surveillance hasn't been running on all cylinders for over 20 years.
Don't get me wrong, we should absolutely care about this, everyone should. I'm just saying any vague gestures at imminent privacy-doom thanks to LLMs is liable to be doing some big favors of inadvertently sanitizing the history of prior (and still) egregious privacy offenders.
I'm just suggesting more "Yes and" and less "pearl clutching" is all.
https://old.reddit.com/r/funny/comments/1pj5bg9/al_companies...
Governments around the world have profiles on people and spiders that quietly amass the data that continuously updates those profiles.
It's just a matter of time before hardware improves and we see another holocaust scale purge facilitated by robots.
Surveillance capitalism won.
It's subjective of course but at least it's transparently so.
I just think it's neat that it's kinda sorta a loose proxy for what you're talking about but done in arguably the simplest way possible.
You can give them a "venting sink" though. Instead of having a downvote button that just downvotes, have it pop up a little menu asking for a downvote reason, with "spam" and "disagree" as options. You could then weigh downvotes by which option was selected, along with an algorithm to discover "user honesty" based on whether their downvotes correlate with others or just with the people on their end of the political spectrum, a la Birdwatch.
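A rough sketch of how that weighting could look, assuming made-up reason weights and a per-user "honesty" factor in the spirit of Birdwatch. All names and numbers here are illustrative, not anything HN or Birdwatch actually does.

```python
# Hypothetical sketch: weight downvotes by the reason the voter selected,
# then scale by a per-user honesty factor (how often that user's downvotes
# agree with the wider community rather than just their own faction).
REASON_WEIGHTS = {"spam": 1.0, "off-topic": 0.8, "disagree": 0.2}

def weighted_downvote_score(votes, honesty):
    """votes: list of (user_id, reason); honesty: dict user_id -> 0.0..1.0"""
    total = 0.0
    for user_id, reason in votes:
        total += REASON_WEIGHTS.get(reason, 0.5) * honesty.get(user_id, 0.5)
    return total

votes = [("alice", "spam"), ("bob", "disagree"), ("carol", "disagree")]
honesty = {"alice": 0.9, "bob": 0.4, "carol": 0.7}
print(weighted_downvote_score(votes, honesty))  # 1.12
```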
a group of them certainly is an echo chamber; why isn't your view?
The tools for controlling your feed are shrinking on social media like Instagram, TikTok, YouTube, etc., but simply saying that you follow and respect the opinions of a select group doesn't necessarily mean you're forming an echo chamber.
This is different from something like flat earth/other conspiracy theories where when confronted with opposite evidence, they aren't likely to engage with it in good faith.
Actually they mostly don't. Lots of infighting over the real true answer .. (infinite flat earth, finite but with impassable ice walls, ..)
Of course in the above example of stocks there are clear predictions (HNWS will go up) and an oracle who resolves it (stock market). This seems to be a way harder problem for generic free form comments. Who resolves what prediction a particular comment has made and whether it actually happened?
Didn't somebody make an ETF once that went against the predictions of some famous CNBC stock picker, showing that it would have given you alpha in the past?
> seems to be a way harder problem for generic free form comments.
That's what prediction markets are for. People for whom truth and accuracy matters (often concentrated around the rationalist community) will often very explicitly make annual lists of concrete and quantifiable predictions, and then self-grade on them later.
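Self-grading a list like that usually comes down to a proper scoring rule such as the Brier score: the mean squared difference between the stated probability and the outcome, where lower is better and always answering 0.5 scores 0.25. A minimal sketch, with example predictions made up for illustration:

```python
# Brier score for a year's worth of probabilistic predictions.
def brier_score(predictions):
    """predictions: list of (probability_assigned, outcome) with outcome 0 or 1."""
    return sum((p - o) ** 2 for p, o in predictions) / len(predictions)

annual_list = [
    (0.9, 1),  # "Nvidia keeps its GPU lead" -- happened
    (0.7, 0),  # "App stores fade away"      -- didn't
    (0.6, 1),  # "Solar share keeps growing" -- happened
]
print(round(brier_score(annual_list), 3))  # 0.22
```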
Makes for great pump n dump if you're day trading and willing to ride
https://www.investopedia.com/terms/c/cramerbounce.asp
long-term his choices don't do well, so the Inverse Cramer basically says "do the opposite of this goober" and has solid returns (sorta; it depends a lot on methodology, and the sole hedge fund playing that strategy shut down)
What came back were the usual suspects: GLP-1 companies and AI.
Back to the "boring but right" thesis. Not much alpha to be found
IIRC, when comment moderation and scoring came to Slashdot, only a random (and changing) selection of users were able to moderate.
Meta-moderation came a bit later. It allowed people to review prior moderation actions and evaluate the worth of those actions.
Those users who made good moderations were more likely to become a mod again in the future than those who made bad moderations.
The meta-mods had no idea whose actions they were evaluating, and previous/potential mods had no idea what their score was. That anonymity helped keep it honest and harder to game.
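A rough sketch of that feedback loop, with illustrative numbers (Slashdot's real weighting isn't public, so this only shows the shape of the idea: meta-moderation feedback raises or lowers a user's chance of being handed mod points again).

```python
# Hypothetical sketch: pick the next round of moderators with probability
# weighted by how often their past moderations were judged "fair".
import random

def pick_moderators(users, meta_scores, k):
    """users: list of ids; meta_scores: id -> fraction of past mods judged fair."""
    weights = [0.1 + 0.9 * meta_scores.get(u, 0.5) for u in users]  # never zero
    return random.choices(users, weights=weights, k=k)  # with replacement; fine for a sketch

users = ["u1", "u2", "u3", "u4"]
meta_scores = {"u1": 0.95, "u2": 0.40, "u3": 0.75}
print(pick_moderators(users, meta_scores, k=2))
```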
Even today, "ASI will kill us all" can be a pretty divisive declaration - hardly safe and boring.
From the couple of threads I clicked, it seemed like this LLM-driven analysis was picking up on that, too: the top comments were usually bold, and some of the worst-rated comments were the "safe and boring" declaration that nothing interesting ever really happens.
Why stop there?
If you can do that you can score them on all sorts of things. You could make a "this person has no moral convictions and says whatever makes the number go up" score. Or some other kind of score.
Stuff like this makes the community "smaller" in a way. Like back in the old days on forums and IRC you knew who the jerks were.
(And we do have that in real life. Just as, among friends, we do keep track of who is in whose debt, we also keep a mental map of whose voice we listen to. Old school journalism still had that, where people would be reading someone’s column over the course of decades. On the internet, we don’t have that, or we have it rarely.)
They were right, Duolingo.
https://news.ycombinator.com/item?id=10654216
The Cannons on the B-29 Bomber "accurate account of LeMay stripping turrets and shifting to incendiary area bombing; matches mainstream history"
It gave a good grade to user cstross, but to my reading of the comment, cstross just recounted a bit of old history. Did the evaluation reward cstross just for giving a history lesson, or no?
I took the narcissistic approach of searching for myself. Here's a grade of one of my comments[1]:
>slg: B- (accurate characterization of PH’s “networking & facade” feel, but implicitly underestimates how long that model can persist)
And here's the actual comment I made[2]:
>And maybe it is the cynical contrarian in me, but I think the "real world" aspect of Product Hunt it what turned me off of the site before these issues even came to the forefront. It always seemed like an echo chamber were everyone was putting up a facade. Users seemed more concerned with the people behind products and networking with them than actually offering opinions of what was posted.
>I find the more internet-like communities more natural. Sure, the top comment on a Show HN is often a critique. However I find that more interesting than the usual "Wow, another great product from John Developer. Signing up now." or the "Wow, great product. Here is why you should use the competing product that I work on." that you usually see on Product Hunt.
I did not say nor imply anything about "how long that model can persist", I just said I personally don't like using the site. It's a total hallucination to claim I was implying doom for "that model" and you would only know that if you actually took the time to dig into the details of what was actually said, but the summary seems plausible enough that most people never would.
The LLM processed and analyzed a huge amount of data in a way that no human could, but the single in-depth look I took at that analysis was somewhere between misleading and flat out wrong. As I said, a perfect example of what LLMs do.
And yes, I do recognize the funny coincidence that I'm now doing the exact thing I described as the typical HN comment a decade ago. I guess there is a reason old me said "I find that more interesting".
[1] - https://karpathy.ai/hncapsule/2015-12-18/index.html#article-...
[2] - https://news.ycombinator.com/item?id=10761980
With that context, if someone were to read your comment and be asked 'does this person think the product's model is viable in the long run' I think a lot of people would answer 'no'.
"The LLM isn't misinterpreting the text, it's just representing people who misinterpreted the text" isn't the defense you seem to think it is.
I scoped my comment specifically around what a reasonable human answer would be if one were asked the particular question it was asked with the available information it had. That's all.
Btw I agree with your comment that it hallucinated/assumed your intent! Sorry I did not specify that. This was a bit of a 'play stupid games win stupid prizes' prompt by the OP. If one asks an imprecise question one should not expect a precise answer. The negative externality here is reader's takeaways are based on false precision. So is it the fault of the question asker, the readers, the tool, or some mix? The tool is the easiest to change, so probably deserves the most blame.
I think we'd both agree LLMs are notoriously overly-helpful and provide low confidence responses to things they should just not comment on. That to me is the underlying issue - at the very least they should respond like humans do not only in content but in confidence. It should have said it wasn't confident about its response to your post, and OP should have thus thrown its response out.
Rarely do we have perfect info, in regular communications we're always making assumptions which affect our confidence in our answers. The question is what's the confidence threshold we should use? This is the question to ask before the question of 'is it actually right?', which is also an important question to ask, but one I think they're a lot better at than the former.
Fwiw you can tell most LLMs to update their memory to always give you a confidence score from 0.0-1.0. This helps tremendously, it's pretty darn accurate, it's something you can program thresholds around, and I think it should be built in to every LLM response.
The way I see it, LLMs have lots and lots of negative externalities that we shouldn't bring into this world (I'm particularly sensitive to the effects on creative industries), and I detest how they're being used so haphazardly, but they do have some uses we also shouldn't discount and figure out how to improve on. The question is where are we today in that process?
The framework I use to think about how LLMs are evolving is that of transitioning mediums. Movies started as a copy/paste of stage plays before they settled into the medium and understood how to work along the grain of its strengths & weaknesses to create new conventions. Speech & text are now transitioning into LLMs. What is the grain we need to go along?
My best answer is the convention LLMs need to settle into is explicit confidence, and each question asked of them should first be a question of what the acceptable confidence threshold is for such a question. I think every question and domain will have different answers for that, and we should debate and discuss that alongside any particular answer.
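A minimal sketch of programming a threshold around an explicit confidence score, as suggested above, assuming the model has been told to end every reply with a "Confidence: X" line (that prompt convention is an assumption, not a built-in feature of any LLM product):

```python
# Parse a trailing confidence score from a model response and discard
# answers that fall below a per-question threshold.
import re

def accept_answer(response: str, threshold: float):
    match = re.search(r"confidence:\s*([01](?:\.\d+)?)\s*$", response, re.IGNORECASE)
    if not match:
        return None  # no score reported; treat as unusable
    confidence = float(match.group(1))
    return response if confidence >= threshold else None

reply = "The comment implies the model won't persist long-term.\nConfidence: 0.4"
print(accept_answer(reply, threshold=0.7))  # None -> throw the response out
```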
This seems to be the result of the exercise? No evaluation?
My concern is that, even if the exercise is only an amusing curiosity, many people will take the results more seriously than they should, and be inspired to apply the same methods to products and initiatives that adversely affect people's lives in real ways.
That will most definitely happen. We have already known for a while that algorithmic methods have been applied "to products and initiatives that adversely affect people's lives in real ways": https://www.scientificamerican.com/blog/roots-of-unity/revie...
I guess the question is if LLMs for some reason will reinvigorate public sentiment / pressure for governing bodies to sincerely take up the ongoing responsibility of trying to lessen the unique harms that can be amplified by reckless implementation of algorithms.
My original goal was to prune the account deleting all the useless things and keeping just the unique, personal, valuable communications -- but the other day, an insight has me convinced that the safer / smarter thing to do in the current landscape is the opposite: remove any personal, valuable, memorable items, and leave google (and whomever else is scraping these repositories) with useless flotsam of newsletters, updates, subscription receipts, etc.
Any chance you can outline the steps/prompts/tools you used to run this?
I've been building a 2nd-brain type project that plugs into all my work places, and a custom classifier has been on the list of things that would enhance it.
Compared to what happens next? Does tptacek's commentary become market signal equivalent to the Fed Chair or the BLS labor and inflation reports?
It's a good comment, but "prescient" isn't a word I'd apply to it. This is more like a list of solid takes. To be fair there probably aren't even that many explicit, correct predictions in one month of comments in 2015.
If an LLM were acting as a kind of historian revisiting today’s debates with future context, I’d bet it would see the same pattern again and again: the sober, incremental claims quietly hold up, while the hyperconfident ones collapse.
Something like "Lithium-ion battery pack prices fall to $108/kWh" is classic cost-curve progress. Boring, steady, and historically extremely reliable over long horizons. Probably one of the most likely headlines today to age correctly, even if it gets little attention.
On the flip side, stuff like "New benchmark shows top LLMs struggle in real mental health care" feels like high-risk framing. Benchmarks rotate constantly, and “struggle” headlines almost always age badly as models jump whole generations.
I bet there are many "boring but right" takes we overlook today, and I wonder if there's a practical way to surface them before hindsight does.
LLMs have seen huge improvements over the last 3 years. Are you going to make the bet that they will continue to make similarly huge improvements, taking them well past human ability, or do you think they'll plateau?
The former is the boring, linear prediction.
Sure yeah why not
> taking them well past human ability,
At what? They're already better than me at reciting historical facts. You'd need some actual prediction here.
They're already better than you at reciting historical facts. I'd guess they're probably better at composing poems (they're not great but far better than the average person).
Or you agree with me? I'm not looking for prescience marks, I'm just less convinced that people really make the more boring and obvious predictions.
I'll make one prediction that I think will hold up. No LLM-based system will be able to take a generic ask like "hack the nytimes website and retrieve emails and password hashes of all user accounts" and do better than the best hackers in the world, despite having plenty of training data to go off of. It requires out-of-band thinking that they just don't possess.
s/"free"/stolen/
It does seem better than just upvotes and downvotes though.
The EU may give LLM surveillance an F at some point.
> (Copying my comment here from Reddit /r/rust:) Just to repeat, because this was somewhat buried in the article: Servo is now a multiprocess browser, using the gaol crate for sandboxing. This adds (a) an extra layer of defense against remote code execution vulnerabilities beyond that which the Rust safety features provide; (b) a safety net in case Servo code is tricked into performing insecure actions. There are still plenty of bugs to shake out, but this is a major milestone in the project.