Why Proteins Fold and How GPUs Help Us Fold
Key topics
The intricacies of protein folding and the role of GPUs in cracking the code sparked a lively debate, with some commenters tearing apart the original article as "drivel" and "garbage" due to its oversimplification of complex scientific issues. Critics pointed out glaring errors, such as incorrect chemical formulas and a misguided narrative that AI companies solved protein folding "in an afternoon." Despite the backlash, some readers appreciated the article's accessible explanation of proteins and the optimism surrounding the field, with one commenter noting that the text was "pretty reasonable overall" aside from a flawed graphic. The discussion ultimately highlighted the challenges of communicating nuanced scientific concepts to a broad audience.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- Story posted: Dec 15, 2025 at 1:05 AM EST (19 days ago)
- First comment: Dec 15, 2025 at 2:05 AM EST (60m after posting)
- Peak activity: 28 comments in 0-6h
- Latest activity: Dec 17, 2025 at 5:47 AM EST (16 days ago)
- Average per period: 11.7 comments
Based on 35 loaded comments
This article is garbage and makes many incorrect claims, and it is clearly AI-generated. E.g. the claim that "AlphaFold doesn't simulate physics. It recognizes patterns learned from 170,000+ known protein structures" couldn't be farther from the truth. Physical models are baked right into AlphaFold models and development at multiple steps; it is a highly unique architecture and approach.
> as you're reading this, there are approximately 20,000 different types of proteins working inside your body.
From https://biologyinsights.com/how-many-human-proteins-are-ther...
"The human genome contains approximately 19,000 to 20,000 protein-coding genes. While each gene can initiate the production of at least one protein, the total count of distinct proteins is significantly higher. Estimates suggest the human body contains 80,000 to 400,000 different protein types, with some projections reaching up to a million, depending on how a “distinct protein” is defined."
Plus, that's just in the human DNA. In your body are a whole bunch of bacteria, adding even more types of protein.
> The actual number of protein molecules? Billions. Trillions if we're counting across all your cells.
There are on average 10 trillion proteins in a single cell. https://nigms.nih.gov/biobeat/2025/01/proteins-by-the-number... There are over 30 trillion human cells in an adult. https://pmc.ncbi.nlm.nih.gov/articles/PMC4991899/ . That's about 300 septillion proteins in the body. While yes, that's "trillions" in some mathematical sense, in that case it's also "tens" of proteins.
(The linked-to piece later says "every single one of your 37 trillion cells", showing that "trillions" is far from the correct characterization. "trillions of trillions" would get the point across better.)
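The arithmetic behind the "300 septillion" figure can be checked with a short sketch. The two input numbers are the ones quoted in the comment above and are taken as given here, not independently verified; the function names are made up for illustration.

```python
# Figures quoted in the comment above (assumed as given, not re-derived).
PROTEINS_PER_CELL = 10 * 10**12   # "10 trillion proteins in a single cell"
CELLS_PER_ADULT = 30 * 10**12     # "over 30 trillion human cells in an adult"

SEPTILLION = 10**24               # short scale: one trillion times one trillion


def total_proteins(proteins_per_cell: int, cells: int) -> int:
    """Multiply the per-cell count by the number of cells."""
    return proteins_per_cell * cells


def in_septillions(count: int) -> float:
    """Express a count in units of 10**24 ("trillions of trillions")."""
    return count / SEPTILLION


total = total_proteins(PROTEINS_PER_CELL, CELLS_PER_ADULT)
print(f"{total:.1e} proteins, about {in_septillions(total):.0f} septillion")
```

Since a septillion is a trillion times a trillion, this is the sense in which "trillions of trillions" describes the scale better than "trillions".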
> Each one has a specific job.
Proteins can do multiple jobs, unless you define "job" as "whatever the protein does."
E.g., from https://pmc.ncbi.nlm.nih.gov/articles/PMC3022353/
"many of the proteins or protein domains encoded by viruses are multifunctional. The transmembrane (TM) domains of Hepatitis C Virus envelope glycoprotein are extreme examples of such multifunctionality. Indeed, these TM domains bear ER retention signals, demonstrate signal function and are involved in E1:E2 heterodimerization (Cocquerel et al. 1999; Cocquerel et al. 1998; Cocquerel et al. 2000). All these functions are partially overlapped and present in the sequence of <30 amino acids"
> And if even ONE type folds wrong, one could get ... sickle cell anemia
Sickle cell anemia is due to a mutation in the hemoglobin gene causing a hydrophobic patch to appear on the surface, which causes the hemoglobins to stick to each other.
It isn't caused by misfolding. https://en.wikipedia.org/wiki/Sickle_cell_disease
(I haven't researched the others to see if they are due to misfolding.)
> Your body makes these proteins perfectly
No, it doesn't. The error rate is quite low, but not perfect. Quoting https://pmc.ncbi.nlm.nih.gov/articles/PMC3866648/
"Errors are more frequent during protein synthesis, resulting either from misacylation of tRNAs or from tRNA selection errors that cause insertion of an incorrect amino acid (misreading) shifting out of the normal reading frame (frameshifting), or spontaneous release of the peptidyl-tRNA (drop-off) (Kurland et al. 1996). Misreading errors are arguably the most common translational errors (Kramer and Farabaugh 2007; Kramer et al. 2010; Yadavalli and Ibba 2012)."
> Then AI companies showed up in 2020 and said "we got this" and solved it in an afternoon.
They didn't simply "show up" in 2020. Google DeepMind had been working on it since 2016 or so. https://www.quantamagazine.org/how-ai-revolutionized-protein...
> we're DESIGNING entirely new proteins that have never existed in nature
We've been designing new proteins that have never existed in nature for decades. From https://en.wikipedia.org/wiki/Protein_design
"The first protein successfully designed completely de novo was done by Stephen Mayo and coworkers in 1997 ... Later, in 2008, Baker's group computationally designed enzymes for two different reactions.[7] In 2010, one of the most powerful broadly neutralizing antibodies was isolated from patient serum using a computationally designed protein probe.[8] In 2024, Baker received one half of the Nobel Prize in Chemistry for his advancement of computational protein design, with the other half being shared by Demis Hassabis and John Jumper of Deepmind for protein structure prediction."
> These are called secondary structures, local patterns in the protein backbone
The corresponding figure is really messed up. The sequence of atoms in the amino acids is wrong, and the pairs of atoms which are hydrogen bonded are wrong. For example, it shows a hydrogen bond between two double-bonded oxygens, which don't have a hydrogen, and a hydrogen bond between two hydrogens, which would both have partial positive charge. The hydrogen bonds are supposed to go from the N-H to the O=C. See https://en.wikipedia.org/wiki/Beta_sheet#Hydrogen_bonding_pa...
> Given the same sequence, you get the same structure.
The structure may depend on environmental factors. For example, https://en.wikipedia.org/wiki/%CE%91-Lactalbumin "α-lactalbumin is a protein that regulates the production of lactose in the milk of almost all mammalian species ... A folding variant of human α-lactalbumin that may form in acidic environments such as the stomach, called HAMLET, probably induces apoptosis in tumor and immature cells."
There can also be post-translational modifications.
> The sequence contains all the instructions needed to fold into the correct shape.
Assuming you know the folding environment.
> Change the shape even slightly, and the protein stops working.
I don't know how to interpret this. Some proteins require changing their shape to work. Myosin - a muscle protein - changes its shape during its power stroke.
> Prions are misfolded proteins that can convert normal proteins into the misfolded form, spreading like an infection
Earlier the author wrote "It's deterministic (mostly, there are exceptions called intrinsically disordered proteins, but let's not go there)."
https://en.wikipedia.org/wiki/Prion says "Prions are a type of intrinsically disordered protein that continuously changes conformation unless bound to a specific partner, such as another protein."
So the author went there. :)
Either accept that proteins aren't always deterministically folded based on their sequence, or don't use prions as an example of misfolding.
What went badly:
- Manual work required to get a very high-quality Orf8 prediction
- Genetics search works much better on full sequences than individual domains
- Final relaxation required to remove stereochemical violations
What went well:
- Building the full pipeline as a single end-to-end deep learning system
- Building physical and geometric notions into the architecture instead of a search process
- Models that predict their own accuracy can be used for model-ranking
- Using model uncertainty as a signal to improve our methods (e.g. training new models to eliminate problems with long chains)
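The point about models that predict their own accuracy being usable for model-ranking can be illustrated with a minimal sketch. This is not AlphaFold's code: the function name, the model names, and the per-residue confidence scores below are all hypothetical, and ranking by the mean of the per-residue scores is an assumption made for the example.

```python
from statistics import mean


def rank_models(per_residue_confidence: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (model_name, mean_confidence) pairs, most confident first."""
    scored = [(name, mean(scores)) for name, scores in per_residue_confidence.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical self-predicted confidence scores (0-100), one per residue,
# for three candidate structures of the same short sequence.
candidates = {
    "model_a": [91.0, 88.5, 76.0, 82.5],
    "model_b": [70.0, 65.5, 80.0, 72.5],
    "model_c": [95.0, 93.0, 90.0, 94.0],
}

ranking = rank_models(candidates)
best_name, best_score = ranking[0]
print(f"best candidate: {best_name} (mean confidence {best_score})")
```

The useful property is that no experimental structure is needed at ranking time: each candidate carries its own estimate of how accurate it is, and the candidates are ordered by that estimate.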
Also you can read the papers, e.g. https://www.nature.com/articles/s41586-019-1923-7 (available if you search the title on Google Scholar). There is actual, real good science, physics, and engineering going on here, as compared to e.g. LLMs or computer vision models that are just trained on the internet, and where all the engineering is focused on managing finicky training and compute costs. AlphaFold requires all this and more.
This is all covered cursorily even by Wikipedia - https://en.wikipedia.org/wiki/AlphaFold#AlphaFold_2_(2020).
The text looks pretty reasonable overall; I didn't notice any completely outrageous statements at a quick glance. Though I don't like the "folding is reproducible" statement, as that is a huge oversimplification. Proteins do misfold, and there is an entire apparatus in the cells to handle those cases and clean them up.
There are many similar things where people just take shortcuts because they don't understand that the interesting part is the process/skill, not the final result. It probably has to do with external validation. Reddit is full of "art" subs being polluted by these people, and generative AI is even leaking into leather work, wood carving, and lino cut. It's a cancer.
AlphaFold models also used TPUs: https://github.com/google-deepmind/alphafold/issues/31#issue...
How many H100s do you need to simulate one human cell? Probably more than the universe can power.
On a side note, what is this new style of writing using small sentences where each sentence is supposed to be a punchline?
"And most of those sequences? They don't fold into anything useful. They're junk. They aggregate into clumps. They get degraded by cellular quality control. Only a TINY fraction of possible sequences fold into stable, functional proteins."
Congratulations, you are now able to recognize an AI-generated text.
(As of December 2025 at least, who knows what they will look like next month.)
Anytime someone talks about large numbers - some galaxy is billions of kilometers away, there are trillions of atoms in the universe, trillions of possible combinations for a problem, etc. - it appears to me that you are talking about some problem that doesn't fall into your job description.
I'm sorry that you're also so numerophobic, but real people use numbers of those magnitudes every day. Your own computer, in fact, has billions of storage slots in its disk space - although perhaps that's something that doesn't fall into your job description.
Most of these numbers are hierarchical. I do count the memory modules, but not bits. I count apples, but not the molecules in them. I try to count a few bright stars in the night sky, but not all stars in the galaxy. I try to stick to traditional non-GM food, which my ancestors ate, instead of counting protein molecules. I try to have children and grandkids, instead of trying to live eternally through great advances in science.
And I do love the optimism.
But then you must admit this reads like a B-movie intro: