California Governor Signs AI Transparency Bill Into Law
Posted 3 months ago · Active 3 months ago
gov.ca.gov · Tech · story · High profile
heated · mixed · Debate: 85/100
Key topics
AI Regulation
California Law
Technology Policy
California Governor Newsom signed SB 53, a bill requiring AI developers to implement transparency and safety measures, sparking debate among commenters about the law's effectiveness and potential consequences.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
First comment: 21m after posting
Peak period: 56 comments in 0-2h
Avg / period: 12.3 comments
Comment distribution: 160 data points (based on 160 loaded comments)
Key moments
- 01 Story posted: Sep 29, 2025 at 4:33 PM EDT (3 months ago)
- 02 First comment: Sep 29, 2025 at 4:54 PM EDT (21m after posting)
- 03 Peak activity: 56 comments in 0-2h (hottest window of the conversation)
- 04 Latest activity: Sep 30, 2025 at 4:55 PM EDT (3 months ago)
ID: 45418428 · Type: story · Last synced: 11/20/2025, 8:32:40 PM
What the law does: SB 53 establishes new requirements for frontier AI developers creating stronger:
Transparency: Requires large frontier developers to publicly publish a framework on their websites describing how they have incorporated national standards, international standards, and industry-consensus best practices into their frontier AI frameworks.
Innovation: Establishes a new consortium within the Government Operations Agency to develop a framework for creating a public computing cluster. The consortium, called CalCompute, will advance the development and deployment of artificial intelligence that is safe, ethical, equitable, and sustainable by fostering research and innovation.
Safety: Creates a new mechanism for frontier AI companies and the public to report potential critical safety incidents to California’s Office of Emergency Services.
Accountability: Protects whistleblowers who disclose significant health and safety risks posed by frontier models, and creates a civil penalty for noncompliance, enforceable by the Attorney General’s office.
Responsiveness: Directs the California Department of Technology to annually recommend appropriate updates to the law based on multistakeholder input, technological developments, and international standards.
And when the AI bubble pops, does it also prevent corps from getting themselves bailed out with taxpayer money?
Compliance achieved.
I was expecting something more like a mandatory BOM-style list of "ingredients", regular audits, and public reporting on safety incidents, etc.
[0] https://sb53.info/
Correct me if I'm wrong, but it sounds like this definition covers basically all automation of any kind. Like, a dumb lawnmower responds to the input of the throttle lever and the kill switch and generates an output of a spinning blade which influences the physical environment, my lawn.
> “Catastrophic risk” means a foreseeable and material risk that a large developer’s development, storage, use, or deployment of a foundation model will materially contribute to the death of, or serious injury to, more than 50 people or more than one billion dollars ($1,000,000,000) in damage to, or loss of, property arising from a single incident, scheme, or course of conduct involving a dangerous capability.
I had a friend that cut his toe off with a lawnmower. I'm pretty sure more than 50 people a year injure themselves with lawn mowers.
Perhaps the result of that investigation is there is no fault on the machine, but you don't know that until you've looked.
Some things are inherently dangerous.
In my friend's case, he was mowing on a hill, braced to pull the lawnmower back, and jerked it back onto his foot.
Edit: Looked it up, the number of injuries per year for lawn mowers is around 6k [1]
[1] https://www.wpafb.af.mil/News/Article-Display/Article/303776...
"Everything is the same as everything" as an argumentative tactic and mindset is just incredibly intellectually lazy.
As soon as anyone tries to do anything about anything, ever, anywhere, people like you come out of the woodwork and go "well what about this perfectly normal thing? Is that illegal now too???"
Why bother making bombs illegal? I mean, I think stairs kill more people yearly har har har! What, now it's illegal to have a two story house?
Also, elephant in the room: lawnmowers absolutely fucking do come with warnings and research on their safety. If you develop a lawnmower, YES you have to research its safety. YES that's perfectly reasonable. NO that's not an undue burden. And YES everyone is already doing that.
They infer from fuzzy input when to activate...
I'm not saying that it's a great definition, but I will correct you, since you asked.
In any case, that definition is only used to further define "foundation model": "an artificial intelligence model that is all of the following: (1) Trained on a broad data set. (2) Designed for generality of output. (3) Adaptable to a wide range of distinctive tasks." This legislation is very clearly not supposed to cover your average ML classifier.
A comprehensive AI regulatory action is way too premature at this stage, and do note that California is not the sovereign responsible for U.S. copyright law.
In general though, it's easier to just comply, even for the companies. It helps with PR and employee retention, etc.
They may fudge the reports a bit, even on purpose, but all groups of people do this to some degree. The question is, when does fudging go too far? There is some gray, but there isn't infinite amounts of gray.
This is to allow companies to make entirely fictitious statements that they will claim satisfy their interpretation. The lack of fines will suggest compliance. Proving the statement is fiction isn't ever going to happen anyway.
But it's also such a low fine that inflation will eat it over those 12 years.
It's a low fine, but your particular objection is invalid.
But I also understand that you were using hyperbole to emphasize your point, so there's not actually a reason to argue this.
That said, the penalty is not just $10k. It's $10k for an unknowing violation; $100k for a knowing violation that "does not create a material risk of death, serious physical injury, or a catastrophic risk", or for an unknowing violation that does create such a risk; and $10M if you knowingly violate it and it does create a risk of death or serious physical injury, etc.
I imagine the legal framework, and the small penalty if you fail to actually publish something, can also play into the knowing/unknowing determination if they investigate you.
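To make that tiering easier to parse, here is a minimal sketch of the penalty levels as the comment above describes them. This is a reading of the comment, not the statutory text, and the helper name is purely illustrative.

```python
# Rough sketch of SB 53 penalty tiers as described in the comment above.
# The actual statutory conditions are more detailed than this.

def sb53_penalty(knowing: bool, creates_material_risk: bool) -> int:
    """Return the per-violation civil penalty (USD) under this reading."""
    if knowing and creates_material_risk:
        return 10_000_000   # knowing violation that creates material risk
    if knowing or creates_material_risk:
        return 100_000      # knowing without risk, or unknowing with risk
    return 10_000           # unknowing violation, no material risk

print(sb53_penalty(knowing=False, creates_material_risk=False))  # 10000
print(sb53_penalty(knowing=True,  creates_material_risk=True))   # 10000000
```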
I think they were off by an order of magnitude on this fine. The PR for reporting anything bad on AI is probably worth more than the fine for non-compliance. $100k would at least start to dent the bumper.
- pioneers in wrongness 25 years ago. Oft copied, but never vindicated.
Many will crash in rapid succession. There isn’t enough room for all these same-y companies.
Funny, I think it is overdue.
Why?
Drives AI innovation out of California.
Internet is becoming fragmented. :-(
maybe your situation is different, but if we geoblocked all of california we'd go out of business within a year
Most of those getting caught up in these laws are very large companies that could comply but consistently don't put forth the effort to do so after repeated complaints. Even if you do fall under the eye of regulators (most won't ever), showing that you're making a good-faith effort to comply means it's not a big deal.
Here's a list of the 50 biggest AI companies from April of this year. Three-quarters of the companies on that list are located in the Bay Area. If companies are already willing to pay higher-than-average taxes, wages, and property costs to be located in California, I doubt "you've got to publish safety standards on your website" is going to be the thing that drives them out.
CalCompute sounds like it's going to allow for more innovation than ever given that it should lower the barrier to entry for edge AI research.
50 Biggest AI Companies: https://www.forbes.com/lists/ai50/
As it is, I would never pay for an AI written textbook. And yet who will write the textbooks of tomorrow?
You're not getting a cent from OpenAI, and the government isn't going to do anything about it. Just get over it.
Can you imagine hackers saying this a decade ago about Facebook harvesting your data? It's a shame how much this community has fallen into the very grifts it used to call out.
Except in this case, after robbing humanity’s collective knowledge repository, OpenAI and its ilk want to charge for access to it, and they’ve completely destroyed the economic incentive for any further human development.
And "further human development" is exactly what's happening. We've just found the entrance to the next level. Our brains have gone about as far as they can on their own, just as our muscles did in the pre-industrial era. It's time to craft some new tools.
I don’t deny the utility of LLMs. But copyright law was meant to protect authors from this kind of exploitation.
Imagine if, instead of “magical AGI knowledge compression”, these LLM providers just did a search over their “borrowed” corpus and then performed a light paraphrasing of it.
Because they are not actually memorizing those books (besides a few isolated pathological cases due to imperfect training data deduplication), and whatever they spit out is in no way a replacement for the original?
Here's some back-of-the-envelope math: Harry Potter and the Philosopher's Stone is around ~460KB of text and equivalent to ~110k Qwen3 tokens, which gives us ~0.24 tokens per byte. Qwen3 models were trained on 36 trillion tokens, so this gives us a dataset of ~137TB. The biggest Qwen3 model has ~235B parameters and at 8-bit (at which you can serve the model essentially losslessly compared to full bf16 weights) takes ~255GB of space, so the model is only ~0.18% of its training dataset. And this is the best case, because we took the biggest model, and the actual capacity of a model to memorize is at most ~4 bits per parameter[1] instead of the full ~8 bits we assumed here.
For reference, the best lossless compression we can achieve for text is around ~15% of the original size (e.g. Fabrice Bellard's NNCP), which is two orders of magnitude worse.
So, purely from an information-theoretic perspective, saying that those models memorize the whole datasets on which they were trained is nonsense. They can't do that, because there just aren't enough bits to store all of this data. They extract patterns, the same way that I could take the very same ~137TB dataset, build a frequency table of all of the bigrams appearing in it, and build a hidden Markov model out of it to generate text. Would that also be "stealing"? And what if I extend my frequency table to trigrams? Where exactly do we draw the line?
[1] -- https://arxiv.org/pdf/2505.24832
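For anyone who wants to re-run those numbers, here is a minimal Python sketch of the same back-of-the-envelope arithmetic. Every constant is the approximate figure quoted in the comment above, not a measured value.

```python
# Re-running the back-of-the-envelope arithmetic from the comment above.

book_bytes  = 460_000     # ~460 KB: Harry Potter and the Philosopher's Stone
book_tokens = 110_000     # ~110k Qwen3 tokens for the same text
tokens_per_byte = book_tokens / book_bytes           # ~0.24

training_tokens = 36e12                               # ~36 trillion tokens
dataset_bytes   = training_tokens / tokens_per_byte   # ~1.5e14 bytes (~137 TiB)

model_params     = 235e9   # largest Qwen3 model, ~235B parameters
model_bytes_8bit = 255e9   # ~255 GB at 8-bit

ratio = model_bytes_8bit / dataset_bytes
# ~0.17%; the comment's ~0.18% is the same calculation with TB/TiB rounding.

# Upper bound on raw memorization, using the ~4 bits/parameter estimate from [1]:
memorization_capacity_bytes = model_params * 4 / 8    # ~118 GB

print(f"dataset  ~ {dataset_bytes / 2**40:.0f} TiB")
print(f"model    ~ {ratio:.2%} of its training data")
print(f"capacity ~ {memorization_capacity_bytes / 1e9:.0f} GB memorized, at most")
```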
Is this actually true? I think in many cases it is a replacement. Maybe not in the case of a famous fictional work like Harry Potter, but what about non-fiction books or "pulp" fiction?
Kind of feels like the bottom rungs of the ladder are being taken out. You either become J.K. Rowling or you starve, there's no room for modest success with AI on the table.
Because that is, and always has been, legal.
AI will have effectively frozen human progress in a snapshot of time because nothing created after LLMs became widely available will be trustworthy
So would I. You've just demonstrated one of the many reasons that any kind of LLM tax that redistributes money to supposedly aggrieved "creators" is a bad idea.
While by no means the only argument or even one of the top ones, if an author has a clearly differentiated product from LLM generated content (which all good authors do) why should they also get compensated because of the existence of LLMs? The whole thing is just "someone is making money in a way I didn't think about, not fair!"
AI companies trying to leverage their power and lobby governments to stiff paying people and thus increase profits is rent seeking behavior. They aren’t creating wealth by non payment, just trying to enrich themselves.
That’s the basic flaw in any argument around necessity.
Likewise much of the most important information to want to train on (research literature) was just straight up stolen from the public that paid for its creation already.
By contrast, the models being created from these works are obviously useful to people today. They are clearly a form of new wealth generation. The open-weights models are even an equitable way of doing so, and are competitive with the top proprietary models. Saying the model creators need to pay the people monopolizing generations-old work is the rent-seeking behavior.
The utility of older works drops off as science marches on and culture changes. The real secret of long copyright terms is they just don’t matter much. Steamboat Willie entered the public domain and for all practical purposes nothing changed. Chip 20 years off of current copyright terms and it starts to matter more, but still isn’t particularly important. Sure, drop it down to say 5 years and that’s meaningful, but now it’s much harder to be an author, which means fewer books worth reading.
Even if you took all of that -- leave nothing for salaries, hardware, utilities, to say nothing of profit -- and applied it to the works in the training data, it would be approximately $1 each.
What is that good for? It would have a massive administrative cost and the authors would still get effectively nothing.
Google’s revenue was $300 billion with $100 billion in profit last year. The AI industry may never reach that size, but $1/person on the planet is only $8 billion; drop that to the ~70% of people who are online and you’re down to $5.6 billion.
That’s assuming you’re counting books and individual Facebook posts in any language equally. More realistically, there are only 12k professional journalists in the US, but they create a disproportionate amount of value for AI companies.
Google is a huge conglomerate and a poor choice for making estimates because the bulk of their revenue comes from "advertising" with no obvious way to distinguish what proportion of that ad revenue is attributable to AI, e.g. what proportion of search ad revenue is attributable to being the same company that runs the ad network, and to being the default search in Android, iOS and Chrome? Nowhere near all of it or even most of it is from AI.
"Counting books and individual Facebook posts in any language equally" is kind of the issue. The links from the AI summary things are disproportionately not to the New York Times, they're more often to Reddit and YouTube and community forums on the site of the company whose product you're asking about and Stack Overflow and Wikipedia and random personal blogs and so on.
Whereas you might have written an entire book, and that book is very useful and valuable to human readers who want to know about its subject matter, but unless that subject matter is something the general population frequently wants to know about, its value in this context is less than some random Facebook post that provides the answer to a question a lot of people have.
And then the only way anybody is getting a significant amount of money is if it's plundering the little guy. Large incumbent media companies with lawyers get a disproportionate take because they're usurping the share of YouTube creators and Substack authors and forum posters who provided more in aggregate value but get squat. And I don't see any legitimacy in having it be Comcast and the Murdoch family who take the little guy's share at the cost of significant overhead and making it harder for smaller AI companies to compete with the bigger ones.
Like I said science has mostly been stolen, and has no business being copyrighted at all. The output of publicly funded research should immediately be public domain.
Anyway this is beside the point that model creation is wealth creation, and so by definition not rent-seeking. Lobbying for a government granted monopoly (e.g. copyright) is rent-seeking.
Economic viability and utility for AI training are closely linked. Exclude all written works including news articles etc from the last 25 years and your model will know nothing about Facebook etc.
It’s not as bad if you can exclude stuff from copyright and then use that, but your proposal would have obvious gaps like excluding works in progress.
I suppose we all exist in our own bubbles, but I don't know why anyone would need a model that knows about Facebook etc. In any case, it's not clear that you couldn't train on news articles? AFAIK currently the only legal gray area with training is when e.g. Facebook mass pirated a bunch of textbooks. If you legally acquire the material, fitting a statistical model to it seems unlikely to run afoul of copyright law. Even without news articles, it would certainly learn something of the existence of Facebook. e.g. we are discussing it here, and as far as I know you're free to use the Hacker News BigQuery dump to your liking. Or in my proposed world, comments would naturally not be copyrighted since no one would bother to register them (and indeed a nominal fee could be charged to really make it pointless to do so). I suppose it is an important point that in addition to registration, we should again require notices, maybe including a registration ID.
Give a post-facto grace period of a couple weeks/months to register a thing for copyright. This would let you cover any work in progress that gets leaked by registering it immediately, causing the leak to become illegal.
This is why we’re seeing paywalls go up: authors and publishers of textual content are seeing that they need to protect the value of their assets.
There’s zero chance that happened without the book being in their training corpus. Worse, there’s significant effort put into obscuring this.
https://www.kron4.com/news/technology-ai/anthropic-copyright...
“the authors alleged nearly half a million books had been illegally pirated to train AI chatbots...”
Finally, a settlement isn’t a “win” from a legal perspective. It’s money exchanged for dropping the case. In almost every settlement, there’s no admission of guilt or liability.
https://storage.courtlistener.com/recap/gov.uscourts.cand.43...
(not directing these questions at you specifically, though if you know I'd certainly love to hear your thoughts)
It's not supposed to do anything in particular. It's supposed to demonstrate to the public that lawmakers are Taking Action about this whole AI thing.
An earlier version of the bill had a bunch of aggressive requirements, most of which would have been bad. The version that passed is more along the lines of filing paperwork and new rules that are largely redundant with existing rules, which is wasteful and effectively useless. But that was the thing that satisfied the major stakeholders, because the huge corporations don't care about spending ~0% of their revenue on some extra paper pushers and the legislators now get to claim that they did something about the thing everybody is talking about.
I know it's fun and all to circle jerk about how greedy those darn bureaucrats are - but we're all aware they control the budget, right? They could just raise taxes.
I don't think they're fining companies... sigh... 10,000 dollars as some sort of sneaky "haha gotcha!" scam they're running.
Hard to tell on the interwebs so apologies if that wasn’t the intent.
But, I ask, should they do nothing?
The meat of the bill is that some government contractors are about to get very rich. And, if history reflects the future, some portion will be no-bid, to make sure the money goes to exactly who he wants it to go to: https://www.sacbee.com/opinion/editorials/article250348451.h...
Look at what the bill actually requires. Companies have to publish frameworks showing how they "mitigate catastrophic risk" and implement "safety protocols" for "dangerous capabilities." That sounds reasonable until you realize the government is now defining what counts as dangerous and requiring private companies to build systems that restrict those outputs.
The Supreme Court already settled this. Brandenburg gives us the standard: imminent lawless action. Add in the narrow exceptions like child porn and true threats, and that's it. The government doesn't get to create new categories of "dangerous speech" just because the technology is new.
But here we have California mandating that AI companies assess whether their models can "provide expert-level assistance" in creating weapons or "engage in conduct that would constitute a crime." Then they have to implement mitigations and report to the state AG. That's prior restraint. The state is compelling companies to filter outputs based on potential future harm, which is exactly what the First Amendment prohibits.
Yes, bioweapons and cyberattacks are scary. But the solution isn't giving the government power to define "safety" and force companies to censor accordingly. If someone actually uses AI to commit a crime, prosecute them under existing law. You don't need a new regulatory framework that treats information itself as the threat.
This creates the infrastructure. Today it's "catastrophic risks." Tomorrow it's misinformation, hate speech, or whatever else the state decides needs "safety mitigations." Once you accept the premise that government can mandate content restrictions for safety, you've lost the argument.
That's the problem.
I'm less worried about catastrophic risks than routine ones. If you want to find out how to do something illegal or dangerous, all an LLM can give you is a digest of what's already available online. Probably with errors.
The US has lots of hate speech, and it's mostly background noise, not a new problem.
"Misinformation" is more of a problem, because the big public LLMs digest the Internet and add authority with their picks. It's adding the authority of Google or Microsoft to bogus info that's a problem. This is a basic task of real journalism - when do you say "X happened", and when do you say "Y says X happened"? LLMs should probably be instructed to err in the direction of "Y says X happened".
"Safety" usually means "less sex". Which, in the age of Pornhub, seems a non-issue, although worrying about it occupies the time of too many people.
An issue that's not being addressed at all here is using AI systems to manipulate customers and provide evasive customer service. That's commercial speech and consumer rights, not First Amendment issues. That should be addressed as a consumer rights thing.
Then there's the issue of an AI as your boss. Like Uber.
Fixing social media is now a near impossible task as it has built up enough momentum and political influence to resist any kind of regulation that would actually be effective at curtailing its worst side effects.
I hope we don't make the same mistakes with generative AI
Both are evil, in combination so much more so. Neither should be trusted at all.
More and more people get information from LLMs. You should be horrified at the idea of giving the state control over what information people can access through them, because going by historical precedent there's 100% chance that the state would use that censorship power against the interests of its citizens.
Can they publish them by intentionally putting them into the latent space of an LLM?
What if they make an LLM that can only produce that text? What if they continue training so it contains a second text they intended to publish? And continue to add more? Does the fact that there's a collection change things?
These are genuine questions, and I have no clue what the answers are. It seems strange to treat an implementation of text storage so differently that you lose all rights to that text.
AIs do not have freedom of speech, and even if they did, it is entirely within the bounds of the Constitution to mitigate this freedom as we already do for humans. Governments currently define unprotected speech as a going concern.
But there's a contradiction hidden in your argument: requiring companies to _filter_ the output of AI models is a prior restraint on their speech, implying the companies do not have control over their own "speech" as produced by the models. This is absurd on its face, just like the argument that the output of my random Markov chain text generator is protected speech because I host the generator online.
There are reasonable arguments to make about censoring AI models, but freedom of speech ain't it, because their output doesn't quack like "speech".
SB 53 is different. It requires companies to implement filtering systems before anyone commits a crime or demonstrates criminal intent. Companies must assess whether their models can "provide expert-level assistance" in creating weapons or "engage in conduct that would constitute a crime," then implement controls to prevent those outputs. That's not punishing distribution to someone you know will commit a crime. It's mandating prior restraint based on what the government defines as potentially dangerous.
Brandenburg already handles this. If someone uses an AI to help commit a crime, prosecute them. If a company knowingly provides a service to facilitate imminent lawless action, that's already illegal. We don't need a regulatory framework that treats the capability itself as the threat.
The "AIs don't have speech rights" argument misses the point. The First Amendment question isn't about the AI's rights. It's about the government compelling companies (or anyone) to restrict information based on content. When the state mandates that companies must identify and filter certain types of information because the government deemed them "dangerous capabilities," that's a speech restriction on the companies.
And yes, companies control their outputs now. The problem is SB 53 removes that discretion by legally requiring them to "mitigate" government-defined risks. That's compelled filtering. The government is forcing companies to build censorship infrastructure instead of letting them make editorial choices.
The real issue is precedent. Today it's bioweapons and cyberattacks. But once we establish that government can mandate "safety" assessments and require mitigation of "dangerous capabilities," that framework applies to whatever gets defined as dangerous tomorrow.
> You have to actually know or intend the criminal use.
> If a company knowingly provides a service to facilitate imminent lawless action, that's already illegal.
And if I tell an AI chatbot that I'm intending to commit a crime, and somehow it assists me in doing so, the company behind that service should have knowledge that its service is helping people commit crimes. That's most of SB 53 right there: companies must demonstrate actual knowledge about what their models are producing and have a plan to deal with the inevitable slip-up.
Companies do not want to be held liable for their products convincing teens to kill themselves, or supplying the next Timothy McVeigh with bomb-making info. That's why SB 53 exists; this is not coming from concerned parents or the like. The tech companies are scared shitless that they will be forced to implement even worse restrictions when some future Supreme Court case holds them liable for some disaster that their AIs assisted in creating.
A framework like SB 53 gives them the legal basis to say, "Hey, we know our AIs can help do [government-defined bad thing], but here are the mitigations in place and our track record, all in accordance with the law".
> When the state mandates that companies must identify and filter certain types of information because the government deemed them "dangerous capabilities," that's a speech restriction on the companies.
Does the output of AI models represent the company's speech, or does it not? You can't have your cake and eat it too. If it does, then we should treat it like speech and hold companies responsible for it when something goes wrong. If it doesn't, then the entire First Amendment argument is moot.
> The government is forcing companies to build censorship infrastructure instead of letting them make editorial choices.
Here's the problem: the nature of LLMs themselves does not allow companies to fully implement their editorial choices. There will always be mistakes, and one will be costly enough to put AIs on the national stage. This is the entire reason behind SB 53 and the desire for a framework around AI technology, not just from the state, but from the companies producing the AIs themselves.
The "companies want this" argument is irrelevant. Even if true, it doesn't make prior restraint constitutional. The government can't delegate its censorship powers to willing corporations. If companies are worried about liability, the answer is tort reform or clarifying safe harbor provisions, not building state-mandated filtering infrastructure.
On whether AI output is the company's speech: The First Amendment issue here isn't whose speech it is. It's that the government is compelling content-based restrictions. SB 53 doesn't just hold companies liable after harm occurs. It requires them to assess "dangerous capabilities" and implement "mitigations" before anyone gets hurt. That's prior restraint regardless of whether you call it the company's speech or not.
Your argument about LLMs being imperfect actually proves my point. You're saying mistakes will happen, so we need a framework. But the framework you're defending says the government gets to define what counts as dangerous and mandate filtering for it. That's exactly the infrastructure I'm warning about. Today it's "we can't perfectly control the models." Tomorrow it's "since we have to filter anyway, here are some other categories the state defines as harmful."
Given companies can't control their models perfectly due to the nature of AI technology, that's a product liability question, not a reason to establish government-mandated content filtering.
Lucky for me, I am not. The company already has knowledge of each and every prompt and response, because I have read the EULAs of every tool I use. But that's beside the point.
Prior restraint is only unconstitutional if it is restraining protected speech. Thus far, you have not answered the question of whether AI output is speech at all, but have assumed prior restraint to be illegal in and of itself. We know this is not true because of the exceptions you already mentioned, but let me throw in another example: the many broadcast stations regulated by the FCC, who are currently barred from "news distortion" according to criteria defined by (checks notes) the government.
Prior restraint is presumptively unconstitutional. The burden is on the government to justify it under strict scrutiny. You don't have to prove something is protected speech first. The government has to prove it's unprotected and that prior restraint is narrowly tailored and the least restrictive means. SB 53 fails that test.
The FCC comparison doesn't help you. In Red Lion Broadcasting Co. v. FCC, the Supreme Court allowed broadcast regulation only because of spectrum scarcity, the physical limitation that there aren't enough radio frequencies for everyone. AI doesn't use a scarce public resource. There's no equivalent justification for content regulation. The FCC hasn't even enforced the fairness doctrine since 1987.
The real issue is you're trying to carve out AI as a special category with weaker First Amendment protection. That's exactly what I'm arguing against. The government doesn't get to create new exceptions to prior restraint doctrine just because the technology is new. If AI produces unprotected speech, prosecute it after the fact under existing law. You don't build mandatory filtering infrastructure and hand the government the power to define what's "dangerous."
Do books have freedom of speech? The same argument can then be used to censor parts of a book.
Imagine going to the library and the card catalog had been purged of any references to books that weren't government approved.
61 more comments available on Hacker News