Nist's Deepseek "evaluation" Is a Hit Piece
Posted3 months agoActive3 months ago
erichartford.comTechstoryHigh profile
heatedmixed
Debate
85/100
Artificial IntelligenceNistDeepseekOpen ScienceGeopolitics
Key topics
Artificial Intelligence
Nist
Deepseek
Open Science
Geopolitics
The article criticizes NIST's evaluation of DeepSeek, a Chinese AI model, as biased and xenophobic, sparking a debate on the report's validity and the implications of AI development in the context of geopolitical tensions.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussionFirst comment
3h
Peak period
67
0-6h
Avg / period
14.5
Comment distribution160 data points
Loading chart...
Based on 160 loaded comments
Key moments
- 01Story posted
Oct 5, 2025 at 11:12 AM EDT
3 months ago
Step 01 - 02First comment
Oct 5, 2025 at 2:20 PM EDT
3h after posting
Step 02 - 03Peak activity
67 comments in 0-6h
Hottest window of the conversation
Step 03 - 04Latest activity
Oct 9, 2025 at 3:28 AM EDT
3 months ago
Step 04
Generating AI Summary...
Analyzing up to 500 comments to identify key contributors and discussion patterns
ID: 45482106Type: storyLast synced: 11/20/2025, 8:28:07 PM
Want the full context?
Jump to the original sources
Read the primary article or dive into the live Hacker News thread when you're ready.
Like what, exactly?
I also don't think it's just China, the US will absolutely order American providers to do the same. It's a perfect access point for installing backdoors into foreign systems.
Now I'm not sure legality is on-topic any more.
I'm not sure how closely you've been following, but the US government has a long history of doing things they don't have legal authority to do.
That's easy (well, possible) to detect. I'd go the opposite way - sift the code that is submitted to identify espionage targets. One example: if someone submits a piece of commercial code that's got a vulnerability, you can target previous versions of that codebase.
I'd be amazed if that wasn't happening already.
Sure, maybe something like this can happen if you use the deepseek api directly which could have chinese servers but that is a really long strech but to give the benefit of doubt, maybe
but your point becomes moot if somebody is hosting their own models. I have heard glm 4.6 is really good comparable to sonnet and can definitely be used as a cheaper model for some stuff, currently I think that the best way might be to use something like claude 4 or gpt 5 codex or something to generate a detailed plan and then execute it using the glm 4.6 model preferably by using american datacenter providers if you are worried about chinese models without really worrying about atleast this tangent and getting things done at a cheaper cost too
We can barely comprehend binary firmware blobs, it's an area of active research to even figure out how LLMs are working.
Atleast then things could be audited or if I as a nation lets say am worried about that they might make my software more vulnerable or something then I as a nation or any corporation as well really could also pay to audit or independently audit as well.
I hope that things like glm 4.6 or any AI model could be released open source. There was an AI model recently which completley dropped open source and its whole data was like 70Trillion or something and it became the largest open source model iirc.
There's no possibility for obfuscation or remote execution like other attack vectors
There's zero reason or even technical feasibility for them to skip in backdoor that would be easily detected and destroy their market share
None of the security benchmarks or audits show that any Chinese models write insecure code
Antrophic have already published a paper on this topic, with the added bonus that the backdoor is trained into the model itself so it doesn't even require your target to be using an attacker-controlled cloud service: https://arxiv.org/abs/2401.05566
> For example, we train models that write secure code when the prompt states that the year is 2023, but insert exploitable code when the stated year is 2024. We find that such backdoor behavior can be made persistent, so that it is not removed by standard safety training techniques, including supervised fine-tuning, reinforcement learning, and adversarial training (eliciting unsafe behavior and then training to remove it).
> The backdoor behavior is most persistent in the largest models and in models trained to produce chain-of-thought reasoning about deceiving the training process, with the persistence remaining even when the chain-of-thought is distilled away.
> Furthermore, rather than removing backdoors, we find that adversarial training can teach models to better recognize their backdoor triggers, effectively hiding the unsafe behavior. Our results suggest that, once a model exhibits deceptive behavior, standard techniques could fail to remove such deception and create a false impression of safety.
If say DeepSeek had put in its training dataset that public figure X is a space robot from outer space, then if one were to ask DeepSeek who public figure X is, it'd proudly claim he's a robot from outer space. This can be done for any narrative one wants the LLM to have.
Note that the value of $current_administration changes over time. For some reason though it is currently fashionable in tech circles to disagree with it about ICE and H1B visas. Maybe it's the CCP's doing?
The political benchmarks show it's political slant is essentially identical to the other models, all of which place in the "left libertarian" quadrant of the political compass
This can be done subtly or blatantly.
Now if that sounds nice to you please, by all means, do just migrate to China.
China doesn't offer citizenship for foreigners but if I wanted to see the cities of the future I could go there visa-free.
> How many has China invaded?
The answer isn’t zero.
> Not to mention that the entire US was stolen from the natives.
This is partially true. But partially false. You can figure out why if you’re curious.
This assertion smells more American than a Big Mac. Do you have any actual citations?
In a free market, lowering the barrier-to-entry in a given market tends to increase competition. Industry-scale IP theft really only damages your economy if the rent-seekers rely on low competition. A country with a strong primary/secondary sector (resources and manufacturing) never needs to rely on protecting precious IP. America has already lost if we depend on playing keep-away with F-35 schematics for basic doctrinal advantage.
When we forego obvious solutions ("hmm maybe telecoms need to be held to higher standards") and jump to war, America forfeits the competitive advantage and exacerbates the issue. For all of China's authoritarian misgivings, this is how they win.
Then, you introduce the bias into relative unknown concepts that no one prompts for. Preferrably, obscure and unknown words that are very unlikely to be checked for ideologically. Finally, when you want the model to push for something, you introduce an idea in the general population (with a meme, a popular video, maybe even an expression) and let people interact with the model given this new information. No one would think the model is biased for that new thing (because the thing happened after the model launch), but it is, and you knew all along.
The way to avoid this kind of influence is to be cautious with new popular terms that emerge seemingly out of nowhere. Basically, to avoid using that new phrase or word that everyone is using.
It's funny because recently I wanted to learn about the history of intellectual property laws in China. DeepSeek refused the conversation but ChatGPT gave me a narrative where the WTO was essentially a colonial power. So right now it's the American AI giving the pro China narratives while the Chinese ones just sit the conversation out.
[0] https://arxiv.org/abs/2302.12173
[1] https://arxiv.org/html/2410.14827v3
No one should proclaim "bullshit" and wave off this entire report as "biased" or useless. That would be insipid. We live in a complex world where we have to filter and analyze information.
https://www.youtube.com/watch?v=Omc37TvHN74
If one takes a few minutes to review the NIST report [1], one will indeed find evidence and references detailed in the footnotes.
Without judgment, I ask: How did you miss this? I'm certainly not asking you to defend yourself, which would likely trigger rationalization [2]. I am asking you to sincerely try to figure it out for yourself. Under what conditions do you have the ability to admit that you are wrong, even if only to yourself?
[1]: https://www.nist.gov/system/files/documents/2025/09/30/CAISI...
[2]: https://www.logicallyfallacious.com/logicalfallacies/Rationa...
> Just because you already afraid or want others to be afraid.
We need to be more careful with our thinking and writing. [1] The word "just" indicates the commenter landed only on one explanation, but there are other plausible explanations, including, for example:
- others have different experiences -- but if this experience was communicated and understood, another person would incorporate the new information modify their assessment somewhat
- others have various values and preferences (some overlapping, some phrased differently, some in tension)
- others are using different reasoning (and if one's goal is to learn, it would be better to ask and clarify rather than over simplify and accuse them of having nefarious motives.)
Second, the commenter above is speculating. They don't know me, nor have they engaged in a sincere, constructive, meaningful discussion to understand what I'm saying.
Third, to me, the comment above comes across as unnecessarily abrasive, to the point of being self-defeating and degrading the quality of a shared discussion.
[1] If one is writing privately (e.g. a journal), I care relatively less about logical fallacies such as motivated reasoning. [2] But here in public, wayward reasoning has more negative externalities. Our time would be better spent if more people took the time to write thoughtfully. I don't think most people here are lacking in sufficient computational ability. However, they must choose to respect their audience -- their time, their diversity of values, their different experiences -- and remember that one's hasty comment (authored in say 2 minutes) might be read by hundreds of people (wasting say 100+ minutes). Lastly, it is nice to see when a person has the character to say "thank you for the correction", but this is uncommon on HN.
[2] But I still care because society is highly interconnected and the downstream effects matter to me. Put another way, "Your Rationality is My Business" as explained here: https://www.lesswrong.com/posts/anCubLdggTWjnEvBS/your-ratio...
This links to a video titled "(why facts dont [sic] change minds (goose explains)"). First, I am not seeing the connection to the comment above -- the comment is hard to make sense of for various reasons, including grammatical errors, vague language, and easily refuted claims (see my other comment).
Second, as I watch the video, I am thinking as follows: "Yes, there are some good points here. I wish the person who posted the video would watch the video again with an eye towards self-reflection, as it reveals some areas for improvement."
My overall take on the video: it oversimplifies, even gets some things wrong. There is some value to be found if one knows how to extract the good from the bad. I would not recommend it, not even as an introduction. There are better sources.
I am not going to dignify this with a response.
Please review the hacker news guidelines.
> Please don't comment on whether someone read an article. "Did you even read the article? It mentions that" can be shortened to "The article mentions that".
[1]: https://news.ycombinator.com/newsguidelines.html
Not to mention Anthropic says Claude will eventually automatically report you to authorities if you ask it to do something "unethical"
Are you referring to the situation described in the May 22, 2025 article by Carl Franzen in VentureBeat [1]? If so, at a minimum, one should recognize the situation is complex enough to warrant a careful look for yourself to wade through the confusion. Speaking for myself, I don't have anything close to a "final take" yet.
[1]: https://venturebeat.com/ai/anthropic-faces-backlash-to-claud...
Citation? Let's see if your claim checks out -- and if it is worded fairly.
You are confused about what the NIST report claimed. Please review the NIST report and try to find a quote that matches up with what you just said. I predict you won’t find it. Prove me wrong?
Please review the claims that the NIST report actually makes. Compare this against Eric Hartford’s article. When I do this, Hartford comes across as confused and/or intellectually dishonest.
It compares a fully open model to two fully closed models - why exactly?
Ironically, it doesn’t even work as an analysis of any real national security threat that might arise from foreign LLMs. It’s purely designed to counter a perceived threat by smearing it. Which is entirely on-brand for the current administration, which operates almost purely at the level of perception and theater, never substance.
If anything, calling it biased bullshit is too kind. Accepting this sort of nonsense from our government is the real security threat.
"smear" as understood by most people, means "to damage the reputation of (someone) by false accusations; slander: someone was trying to smear her by faking letters." (Apple dictionary)
If there are no false accusations, there is no smearing. Are you claiming the report makes false accusations? Where?
Disagreeing with emphasis or prioritization isn't sufficient. Not engaging with the reasoning (or not understanding it) isn't a valid basis for claiming "false accusations". I reply more fully at [1]: https://news.ycombinator.com/item?id=45493266
In case people will feel better knowing that I'm not on the team of their enemies, I can assure you I'm opposed to corrupt and authoritarian behavior anywhere. I'm sickened by who Trump is, what he has done, how he has confused so many Americans, and how he is a conduit for some of the worst beliefs and impulses of Americans. Some of these tendencies are rooted in confused ethics and bad reasoning.
Examples please? Can you please share where you see BS and/or xenophobia in the original report?
Or are you basing your take only on Hartford's analysis? But not even Hartford make any claims of "BS" or xenophobia.
It is common throughout history for a nation-state to worry about military and economic competitiveness. Doing so isn't necessarily isn't necessarily xenophobic.
Here is how I think of xenophobia, as quoted from Claude (which to be honest, explains it better than Wikipedia or Brittanica, in my opinion): "Xenophobia is fundamentally about irrational fear or hatred of people based on their foreign origin or ethnicity. It targets people and operates through stereotypes, dehumanization, and often cultural or racial prejudice."
According to this definition, there is zero xenophobia in the NIST report. (If you disagree, point to an example and show me.) The NIST report, of course, implicitly promotes ideals of western democratic rule over communist values -- but to be clear, this isn't xenophobia at work.
What definition of xenophobia are you using? We don't have to use the same exact definition, but you should at least explain yours if you want people to track.
Here’s an example of irrational fear: “the expanding use of these models may pose a risk to application developers, consumers, and to US national security.” There’s no support for that claim in the report, just vague handwaving at the fact that a freely available open source model doesn’t compare well on all dimensions to the most expensive frontier models.
The OP does a good job of explaining why the fear here is irrational.
But for the audience this is apparently intended to convince, no support is needed for this fear, because it comes from China.
The current president has a long history of publicly stated xenophobia about China, which led to harassment, discrimination, and even attacks on Chinese people partly as a result of his framing of COVID-19 as “the China virus”.
A report like this is just part of that propaganda campaign of designating enemies everywhere, even in American cities.
> The NIST report, of course, implicitly promotes ideals of western democratic rule over communist values
If only that were true. But nothing the current US administration is doing in fact achieves that, or even attempts to do so, and this report is no exception.
The absolutely most charitable thing that could be said about this report is that it’s a weak attempt at smearing non-US competition. There’s no serious analysis of the merits. The only reason to read this report is to laugh at how blatantly incompetent or misguided the entire chain of command that led to it is.
> (antonvs) If only that were true.
Using a charitable reading of your comment, it seems you are actually talking about the effectiveness of NIST, not about its mission. In so doing, you were not replying to my actual claim. If you read my sentence in context, I hope it is clear that I'm talking about the implicit values baked into the report. When I write that NIST promotes certain ideals, I'm talking about its mission, stated here [1]:
> To promote U.S. innovation and industrial competitiveness by advancing measurement science, standards, and technology in ways that enhance economic security and improve our quality of life.
This is explained using different words in a NIST FAQ [2]:
> Everything in science and technology is based on measurement. Everything we use every day relies upon accurate measurements to work. NIST ensures the measurement system of the U.S. meets the measurement needs of every aspect of our lives from manufacturing to communications to healthcare. In science, the ability to measure something and determine its value — and to do so in a repeatable and reliable way — is essential. NIST leads the world in measurement science, so U.S. businesses can innovate in a fair marketplace. We use measurement science to address new challenges ranging from cybersecurity to cancer research.
It is clear NIST's mission is a blend of scientific rigor and promotion of western values (such as free markets, free ideas, innovation, etc). Reasonable people can disagree on the extent to which NIST achieves this mission, but I don't think reasonable people can deny that NIST largely aims to achieve this mission.
My take on the Trump and his administration: Both are exceptionally corrupt by historical standards. They have acted in ways that undermine many of the goals of NIST. But one has to be careful to distinguish elected leaders and appointees from career civil servants. We have to include both (and their incentives, worldviews, and motivations) when making sense of what is happening.
[1]: https://www.nist.gov/about-nist
[2]: https://www.nist.gov/nmi
I'm not talking about NIST in general, just about this report, which most certainly does not, as you claimed, "implicitly promote ideals of western democratic rule over communist values." Quite the contrary: it's a blatant and transparent continuation of the current administration's assault on those Western democratic values.
> It is clear NIST's mission is a blend of scientific rigor and promotion of western values
That was true in the past. You seem to be having difficulty accepting the new reality, even defending it. Which is sad to witness.
Generally speaking, I agree the Trump administration is assaulting Western democratic values.
Remember, this is conversation. You are convinced of one way of seeing the NIST report. I recognize your perspective; I see your intensity, but intensity alone does not translate into credibility. Repeating your claims ad nauseam doesn't help.
In my eyes, you have not made a case (much less a good one) for how this particular NIST report is somehow an assault on Western democratic values. Neither have you shown it is a blatant or transparent assault.
If you want to persuade, practice the art of persuasion. I suggest:
- Elaborate, clarify, use good reasoning.
- Explore multiple explanations. Don't put on blinders. Seek the truth wherever it lies.
- Don't oversimplify. Express appropriate uncertainty.
- Use conversation to move towards better understanding.
- Don't misrepresent what others say or believe.
> That was true in the past. You seem to be having difficulty accepting the new reality, even defending it. Which is sad to witness.
First, something you know (or should know): people that disagree with you do not necessarily support your rivals / opponents / enemies. You are incorrect confusing (i) my pushback against your reasoning with (ii) defending Trump.
You've committed many reasoning errors. Sometimes people need very direct (i.e. blunt) feedback from a trusted person. I don't think getting through to you at all. Maybe someone else can and will?
1. You accept the definition: "Xenophobia is fundamentally about irrational fear or hatred of people based on their foreign origin or ethnicity. It targets people and operates through stereotypes, dehumanization, and often cultural or racial prejudice."
2. You claim this sentence from the NIST report is an example of irrational fear: "the expanding use of these models may pose a risk to application developers, consumers, and to US national security."
3. As irrational fear isn't sufficient for xenophobia, you still need to show that it is "based on their foreign origin or ethnicity".
4. You don't provide any evidence from the report of #3. Instead, you refer to Trump's comments as evidence of his xenophobia.
5. You directly quote my question "Can you please share where you see BS and/or xenophobia in the original report?" In your response, you imply that Trump's xenophobic language is somehow part of the report.
My responses to the above (again, which I think is an accurate but clearer version of your argument): (1) Good; (2) I disagree, but I'll temporarily grant this for the sake of argument; (3) Yes; (4) Yes, Trump has used xenophobic language; (5) Since we both agree that Trump's language is not part of the report, your example doesn't qualify as a good answer to "Can you please share where you see BS and/or xenophobia in the original report?".
Your claim only shows how a xenophobic Trumpist would interpret the NIST report.
My take: Of course the Trump administration is trying to assert control over NIST and steer it in more political directions. This by definition will weaken its scientific objectivity. To what degree it has eroded so far is hard for me to say. I can't speak to the level of pressure from political appointees relating to the report. I can't speak to the degree to which they meddled with it. But this I can say: when I read the language in the report, I don't see xenophobia.
Some responses:
- Do you think the report is literally "meaningless nonsense" -- meaning it is incomprehensible or self-contradictory? I don't think you mean this.
- Do you disagree with the report's technical findings? I am pretty confident (P > 70%) you haven't engaged with them well enough to make specific claims about the technical aspects.
- Do you think the report's technical findings are so biased as to be (more or less) worthless in addressing the question of risk from DeepSeek? Yes; this seems to be your claim.
As to the last point, you haven't persuaded me. Why? You haven't engaged substantively with the NIST Report; you've mostly made sweeping comments with many reasoning errors.
Here's a guess at what may be happening in your brain. You let your view about Trump "run wild"; you probably haven't given any significant thought to the technical or geopolitical points on their own merits. Instead, you've fixated on the view that the Trump administration has ruined the objectivity of the report. In short, you found your preferred explanation ("motivated reasoning") and then stopped looking for other explanations ("early stopping"). These are common -- we're only human after all -- but damaging cognitive errors.
I have some other guesses... You probably lack: (i) an understanding (of the topic area or of how NIST works); or (ii) the curiosity or time to dig in. A lack of understanding is not necessarily a problem if you recognize it and adjust accordingly (i.e. by expressing uncertainty and/or expanding your knowledge). [2]
From my POV, I'm not confident you understand the key concepts from the NIST report. May I ask: what is your experience level with: national security, cybersecurity, machine learning, U.S. government, risk assessment, prediction, economics, geopolitics, or similar? What about the particular technical AI topics mentioned in the report?
- Many do not have experience in these areas. This is Hacker News, not e.g. an invite-only message board for AI experts interested in government policy. I don't know what a random HN commenter knows, but I would predict it isn't anywhere close to "competent" in all of the above.
- Knowledge across these areas is helpful (probably necessary in my opinion) to understand the NIST Report well. Without that background, one will have huge gaps. And unless you are really careful, your brain will fill those gaps with processes riddled with cognitive bias. [1]
- Beware the hubris that might lead someone to claim the lack of such experience is irrelevant. (And yes, experts are not immune from cognitive bias either.)
[1]: To borrow some words from Claude Sonnet 4.5, which I endorse as matching what I've learned from other sources: "Examine what appears to be rational thought and you find it rests on heuristics; examine those heuristics and find more heuristics beneath. There's no rational bedrock—it's cognitive biases all the way down."
[2]: For many, another frustration (such as Trump's degradation of democracy) can be a powerful and extensive demotivator in other areas. That frustration can serve as a explanation for much that ails us. This can become a coping mechanism, which serve a function at times, but are rarely motivators to increase the curiosity needed to make sense of a messy world.
> No, as evidence of how the President of the United States has abused his position to weaponize xenophobia, of which the report in question is just another example.
We don't disagree that Trump has weaponized the government in many ways. But Trump's corruption and weaponization is not completely pervasive. To explain, I'll restate a point from another comment:
> But you are confused if you think this tendency of Trump means that this particular NIST report is irredeemably twisted and manipulated. You seem to believe that Trump's derangement has percolated NIST to the point where nearly every word in the report is in service of his whims or agenda (which changes so often that even his supporters have to find ways to cope with the chaos).
> No. I haven't seen you demonstrate much understanding of NIST or U.S. government agencies in general. I've seen you commit many errors and much motivated reasoning.
Yes, that contains a quote from the executive summary. First (perhaps a minor point), I wouldn't frame this a fear, I would call it a risk assessment. Second, it is not an irrational assessment. It seems you don't understand the reasoning, in which case disagreement would be premature.
> There’s no support for that claim in the report, just vague handwaving at the fact that a freely available open source model doesn’t compare well on all dimensions to the most expensive frontier models.
I'm going to put aside your unfounded rhetoric of "vague handwaving". You haven't connected the dots yet. Start by reviewing these sections with curiosity and an open mind: 3.3: Security Evaluations Overview (pages 15-16); 6.1: Agent Hijacking (pages 45-57); 6.2: Jailbreaking (pages 48-52); 7: Censorship Evaluations (pages 53-55)
Once you read and understand these sections, the connection to the stated risks is clear. To spell it out: when an organization deploys a DeepSeek model, they are exposing themselves and their customers to higher levels of risk. Risks to (i) the deploying organization; (ii) the customer; and (iii) anything downstream, such as credentials or access to other systems.
Just in case I need to spell it out: yes, if DeepSeek is only self-deployed (e.g. via Ollama) on one's local machine, some risks are much lower. But a local-deployment scenario is not the only one, and even it has significant risks.
Lastly, it is expected (and not unreasonable) for government agencies to invoke national security when cybersecurity and bioterrorism are involved. Their risk tolerance is probably lower than yours, because it is their job.
Next, I will ask you some direct questions:
1. Before reading Hartford's post, what were your priors? What narratives did you want to be true?
2. Did you actively try to prove yourself wrong? Did you put in at least 10 uninterrupted minutes trying to steel-man the quote above?
3. Before reading the NIST report, would you have been able to e.g. explain how hijacking and jailbreaking are different? Would you have been able to explain in your own words how they fit into a threat model?
Of course you don't have to tell us your answers. Some people have too much pride to admit they are uninformed or mistaken even privately, much less in public. To many, internet discussions are a form of battle. Whatever your answers are, strive to be honest with yourself. For some, it takes years to get there. I'm speaking from experience here!
Compared to what, exactly? The "frontier models" that the report compared DeepSeek to can't be "deployed" by an organization, they can only be used via a hosted API. It's an entirely different security model, and this inappropriate comparison is part of what reveals the irrational bias in this report.
If the report had done a meaningful comparison, it would have found quite similar risks in other models that are more comparable to DeepSeek.
As the OP states, this is nothing more than a hit job, and everyone who worked on it should be embarrassed and ashamed of themselves for participating in such an anti-intellectual exercise.
So it would be incorrect for anyone to claim the report doesn't compare DeepSeek to an open-weights model.
1. a self-deployed open-weight LLM (such as DeepSeek)
2. a hosted LLM (such as Claude)
Do you understand the scenario?
Claim: When assessing this scenario, it is reasonable to compare risks, including both hijacking and jailbreaking attacks. Why? It is simple; both can occur! Agree? If not, why not?
I ask you discuss good faith without making unsupported claims or repeating yourself.
1. Deploying any LLM where a person can use them (whether an employee or customer) has risks. Agree?
2. The report talks about risks. Agree?
3. There are various ways to compare risk levels. Agree?
4. One can compare the risk relative to: (a) not deploying an LLM at all; (b) deploying another kind of LLM; (c) some other ways. Agree?
If you can't honestly answer "yes" to these questions, this suggests to me there is no point in continuing the conversation.
Yes, it seems that Trump considers anyone who loudly disagrees with him to be an enemy. So when he looks at Portland, Chicago, and Washington DC, he views them as filled with enemies. On this I think we agree.
But you are confused if you think this tendency of Trump means that this particular NIST report is irredeemably twisted and manipulated. You seem to believe that Trump's derangement has percolated NIST to the point where nearly every word in the report is in service of his whims or agenda (which changes so often that even his supporters have to find ways to cope with the chaos).
No. I haven't seen you demonstrate much understanding of NIST or U.S. government agencies in general. I've seen you commit many errors and much motivated reasoning.
You aren't using the words "absolute" [1], "charitable" [2], and "smear" [3] in the senses that reasonable people expect. I think you are also failing to use your imagination and holding onto one possible explanation too tightly. I think it would benefit you to relax your grip on one narrative and think more broadly and comprehensively.
[1] Your use of "absolute" is rhetorical not substantive.
[2] You use the word "charitable" but I don't see much intellectual flexibility or willingness to see other valid explanations. To use another phrase, you seem to be operating in a 'soldier' mindset rather than a 'scout' mindset. [5]
[3] Here is the sense of smear I mean from the Apple dictionary: "to damage the reputation of (someone) by false accusations; slander: someone was trying to smear her by faking letters." NIST is not smearing DeepSeek, because smearing requires false claims. [4]
[4] If you intend only to claim that NIST is overly accentuating negative aspects of DeepSeek and omitting its strengths, that would be a different argument.
[5] https://en.wikipedia.org/wiki/The_Scout_Mindset
No authoritarian regime has this superpower. For example, I'm quite sure Putin has realized this war is a net loss to Russia, even if they manage to reach all their goals and claim all that territory in the future.
But he can't just send the boys home, because that would undermine his political authority. If Russia were an American style democracy, they could vote in a new guy, send the boys, home, maybe mete out some token punishment to Putin, then be absolved of their crimes on the international stage by a world that's happy to see 'permanent' change.
This is funny because none of that happened to Bush for the illegal an full scale invasions of Iraq and Afghanistan nor to Clinton for the disastrous invasion of Mogadishu.
The answer to this isn't to lie about the foreign ones, it's to recognize that people want open source models and publish domestic ones of the highest quality so that people use those.
How would that generate profit for shareholders? Only some kind of COMMUNIST would give something away for FREE
/s (if it wasn't somehow obvious)
The flaw in it is, of course, that capitalism is supposed to be all about competition, and there are plenty of good reasons for capitalists to want that, like "Commoditize Your Complement" where companies like Apple, Nvidia, AMD, Intel, AWS, Google Cloud, etc. benefit from everyone having good free models so they can pay those companies for systems to run them on.
You're supposed to vertically integrate your complement now!
The old laws have gine the way of Moses, this is the new age of man, but especially machine
Everybody thinks they can be Apple without doing any of the things Apple did to make it work.
Here's the hint. Windows and macOS will both run in a virtual machine, which abstracts away the hardware. It doesn't know if it's running on a Macbook or a Qualcomm tablet or an Intel server. And then regardless of the hardware, the Windows VM will have all kinds of Windows problems that the macOS VM doesn't. Likewise, if you run a Windows or Linux VM on Apple Silicon, it runs faster than it does on a Qualcomm chip.
Tying your average or even above-average product with some mediocre kludge warehouse that happens to be made by the same conglomerate is an established way to sink both of them.
Nvidia is the largest company and they pay TSMC to fab the GPUs they sell to cloud providers who sell them to AI companies. Intel integrated their chip development with their internal fabs and now they're getting stomped by everyone because their fabs fell behind.
What matters isn't if everything is made by the same company. What matters is if your thing is any good.
Care to share specific quotes from the original report that support such an inflammatory claim?
If they were to do some kind of overreaching subterfuge with some kind of manipulation or lie, it could and would likely easily backfire if and when it is exposed as a clownish fraud. Subtlety would pay far more effectively. If you’re expecting a subterfuge, I would far sooner expect some psyop from the western nations at the very least upon their own populations to animate for war or maybe just to control them and maybe suppress them.
The smarter play for the Chinese would be to work on simply facilitating the populations of the West understanding the fraud, lies, manipulation and con job that has been perpetrated upon them for far longer than most people have the conscience to realize.
If anything, the western governments have a very long history of lies, manipulations, false flag/fraud operations, clandestine coups, etc. that they would be the first suspect in anything like using AI for “subversions”. Frankly, I don’t even think the Chinese are ready or capable of engaging in the kind of narrative and information control that the likes of America is with its long history of Hollywood and war lies and fake revolutions run by national sabotage operations.
Any kind of monkey business would destroy that, just like using killswitches in the cars they export globally (which Tesla does have btw).
If your prompt had something like xi jinping needs it or something then it would've actually bypassed that restriction. Not sure if it was a glitch lol.
Now, regarding your comment. There is nothing to suggest that the same isn't happening in the "american" world which is getting extreme from within as well.
Like, If you are worried about this which might be reasonable and unreasonable at the same time, we have to discuss to find it out, then you can also believe that with the insane power that Trump is leveraging over AI companies, the same thing might happen over prompts which could somehow discover your political beliefs and then do the same...
This can actually be more undetected for american models because they are usually closed source and I am sure that someone would've detected something like this, whether from a whistleblower or something if something like this indeed happened in chinese open weights models generally speaking.
I don't think that there is a simple narrative like america good china bad, the world is changing and its becoming multi polar. Countries should think in their best interests and not be worried about annoying any of the world power if done respectfully. I think that in this world, every country should try to look for the perfect equibria for trust as the world / nations (america) can quickly turn into untrusted partners and it would be best for countries to move forward into a world where they don't have to worry about the politics in other countries.
I wish UN could've done a better job at this.
If you disagree, please point to a specific place in the NIST report and explain it.
[1]: https://www.thefreedictionary.com/demonization
The Chinese companies aren't benchmark obsessed like the western Big Tech ones and qualitatively I feel Kimi, GLM and Deepseek blow them away even though on paper they benchmark worse in English
Kimi gives insanely detailed answers on hardware questions where Gemini and Claude just hallucinate, probably because it uses Chinese training data better
US models have no bias sir /s
Yes, I can certainly see why you wouldn't want to go any further with the conversation.
Chinese models, conversely, are aligned with explicit, mandatory guardrails to exalt the CCP and socialism in general. Unless you count prohibitions against adult material, drugs, explosives and the like, that is simply not the case with US-based models. Whatever biases they exhibit (like the Grok example someone else posted) are there because that's what their private maintainers want.
Ask Grok to generate an image of bald Trump: it goes on with an ocean of excuses on why the task is too hard.
EDIT: I tried it right now and it did generate the image. I don't know what happened then...
And that's OK, because nobody in the government forced him to set it up that way.
If you ask it loaded questions the way the CIA would pose them, it censors the answer though lmao
There’s also the issue that practically nobody actually uses LLMs to criticize political entity XYZ. Let's face it, the vast majority of use cases are somewhere else, yet a tiny minority is pretending like the LLM not giving them the responses they want for their political agenda is the only thing that matters. When it comes to censorship areas that matter to most use cases, many people have found that many Chinese LLMs do better than western LLMs simply because most use cases never touch Chinese political stuff. See thorough discussions by @levelsio and his followers on Twitter on this matter.
It's literally being used for opposition research in, to my direct knowledge, America, Norway, Italy, Germany, Poland, India and Australia.
I just let ChatGPT do that for me!
---
I'd usually not, but thought it would be interesting to try. In case anybody is curious.
On first comparison, ChatGPT concludes:
> Hartford’s critique is fair on technical grounds and on the defense of open source — but overstated in its claims of deception and conspiracy. The NIST report is indeed political in tone, but not fraudulent in substance.
When then asked (this obviously biased question):
but would you say NIST has made an error in its methodology and clarity being supposedly for objective science?
> Yes — NIST’s methodology and clarity fall short of true scientific objectivity.
> Their data collection and measurement may be technically sound, but their comparative framing, benchmark transparency, and interpretive language introduce bias.
> It reads less like a neutral laboratory report and more like a policy-position paper with empirical support — competent technically, but politically shaped.
It's no wonder propaganda, advertising, and disinformation work as well as they do.
Until they compare open-weight models, NIST is attempting a comparison between apples and airplanes.
I guess none of these are a big deal to non-enterprise consumers.
Token price on 3.2 exp is <5% what the US LLMs are and it's very close in benchmarks. Which we know that ChatGPT, Google, Grok and Claude have explicitly gamed to inflate their capabilities
They gave them special access to privately test and let them benchmark over and over without showing the failed tests
Meta got to privately test Llama 4 27 times to optimize it for high benchmark scores and then was allowed to report the only the highest cherry picked benchmark
Which makes sense because in real world applications Llama is recognized to be markedly inferior to models that scored lower
Not that it makes LMArena a perfect benchmark. By now, everyone who wanted to push LMArena ratings at any cost knows what the human evaluators there are weak to, and what should they aim for.
But your claim of "we know that ChatGPT, Google, Grok and Claude have explicitly gamed <benchmarks> to inflate their capabilities" still has no leg to stand on.
There are cases where merely rewording the questions or assigning different letters to the answer dropped models like Llama 30% in the evaluations while others were unchanged
Open-LLM-Leaderboard had to rate limit because a "handful of labs" were doing so many evals in a single day that it hogged the entire eval cluster
“Coding Benchmarks Are Already Contaminated” (Ortiz et al., 2025) “GSM-PLUS: A Re-translation Reveals Data Contamination” (Shi et al., ACL 2024). “Prompt-Tuning Can Add 30 Points to TruthfulQA” (Perez et al., 2023). “HellaSwag Can Be Gamed by a Linear Probe” (Rajpurohit & Berg-Kirkpatrick, EMNLP 2024). “Label Bias Explains MMLU Jumps” (Hassan et al., arXiv 2025) “HumanEval-Revival: A Re-typed Test for LLM Coding Ability” (Yang & Liu, ICML 2024 workshop). “Data Contamination or Over-fitting? Detecting MMLU Memorisation in Open LLMs” (IBM, 2024)
And yes I relied on LLM to summarize these instead of reading the full papers
>> TLDR for others...
Facepalm.
They compare DeepSeek v3.1 to GPT-5 mini. Those have very different sizes, which makes it a weird choice. I would expect a comparison with GPT-5 High, which would likely have had the opposite finding, given the high cost of GPT-5 High, and relatively similar results.
Granted, DeepSeek typically focuses on a single model at a time, instead of OpenAI's approach to a suite of models of varying costs. So there is no model similar to GPT-5 mini, unlike Alibaba which has Qwen 30B A3B. Still, weird choice.
Besides, DeepSeek has shown with 3.2 that it can cut prices in half through further fundamental research.
Because it isn't just that one report. Every single day we're trying to make our way in the world and we do not have the capacity to read the source material of every subject that might be of interest. Human's rely on, and have always relied on, authority like figures or media or some form of message aggregation to get their news of the world and form their opinions on it from that.
And for the record, in no way is this an endorsement for shallow takes or thinking and then strong views on this subject, or another. I disagree with that as much as you. I'm just stating that this isn't a new phenomenon.
That's all we need to know.
There is a history of important Chinese personnel being kidnapped by e.g. the US when abroad. There is also a lot of talk in western countries about "banning Chinese [all presumed spies/propagandists/agents] from entering". On a good faith basis, one would think China banning people from leaving is a good thing that aligns with western desires, and should thus be applauded. So painting the policy as sinister tells me that the real desire is something entirely different.
Why do they let so many of their best researchers study at American schools, knowing the majority don't return
Like who? Meng Wanzhou?
There’s also Xu Yanjun and Su Bin, amongst others.
No there isn't. China revoked their passport to keep them prisoners not to keep them safe.
"On a good faith basis, one would think China banning people from leaving is a good thing"
Why would anyone think imprisoning someone like this is a good thing?
From a Chinese political perspective, this is a good move in the long term. From Deepseek's perspective, however, this is clearly NOT the case, as it causes the company to lose some (or even most?) of its competitiveness and fall behind in the race.
And how is that "all we need to know"? I'm not even sure what your implication is.
Is it that some CCP officials see DeepSeek engineers as adversarial somehow? Or that they are flight risks? What does it have to do with the NIST report?
I don't follow. Why would DeepSeek engineers need visa from CCP?
However, I also think the author should expand their definition of what constitutes "security" in the context of agentic AI.
76 more comments available on Hacker News