Why the Sanitizer API Is Just `setHTML()`
Key topics
The debate rages on about the Sanitizer API and its proposed implementation as `setHTML()`, with some commenters praising the author's insight while others criticize the post for lacking context and assuming prior knowledge. The discussion reveals a divide between those who see `setHTML()` as a vital tool for rendering dynamic content and those who question its limitations, particularly with regard to handling unusual HTML elements like `<svg>`. As commenters dig into the article's claims, they surface potential security concerns, such as "mutation XSS" (mXSS), and differing opinions on whether to sanitize user-provided HTML on the server or on the client. The thread remains relevant as developers weigh in on the implications of this emerging browser API.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 2d after posting
- Peak period: 43 comments in the 36-48h window
- Avg / period: 10.3
- Based on 62 loaded comments
Key moments
- Story posted: Dec 8, 2025 at 5:37 PM EST (about 1 month ago)
- First comment: Dec 10, 2025 at 12:14 PM EST (2d after posting)
- Peak activity: 43 comments in 36-48h, the hottest window of the conversation
- Latest activity: Dec 15, 2025 at 8:52 AM EST (25 days ago)
> The Sanitizer API is a proposed new browser API to bring a safe and easy-to-use capability to sanitize HTML into the web platform [and] is currently being incubated in the Sanitizer API WICG, with the goal of bringing this to the WHATWG.
This would remove the need to sanitize user-entered content with libraries like DOMPurify, since the capability would be built into the browser.
The proposed specification has additional information: https://github.com/WICG/sanitizer-api/
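For context, a rough sketch of the shift being described; method and option names follow the WICG proposal and may still change, `element` and `untrustedHtml` are placeholders, and DOMPurify is assumed to be loaded:

```js
// Today: a library sanitizes untrusted markup to a string, which you then assign.
element.innerHTML = DOMPurify.sanitize(untrustedHtml);

// With the proposed Sanitizer API: the browser sanitizes while parsing
// the markup directly into the target element.
element.setHTML(untrustedHtml);
```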
> HTML parsing is not stable and a line of HTML being parsed and serialized and parsed again may turn into something rather different
Are there any examples where the first approach (sanitize to a string and set innerHTML) is actually dangerous? Because it's pretty much the only thing you can do when sanitizing server-side, which we do a lot.
This is a common and rather tiresome critique of all kinds of blog posts. I think it is fair to assume the reader has a bit of contextual awareness when you publish on your personal blog. Yes, you were linked to it from a place without that context, but it’s readily available on the page, not a secret.
It's not hard to add one line of context so readers aren't lost. Here, take this for example, combining a couple parts of the GitHub readme:
> For those who are unfamiliar, the Sanitizer API is a proposed new browser API being incubated in the Sanitizer API WICG, with the goal of bringing this to the WHATWG.
Easy. Can fit that in right after "this blog post will explain why", and now everyone is on the same page.
Do we have data to back that up? Anecdotally, the blogs I have operated over the years tend to mostly sustain on repeat traffic from followers (with occasional bursts of external traffic if something trends on social media).
It wasn't necessarily a request for you personally to provide data. I'm curious if any larger blog operators have insight here.
"person who only reads the 0.001% of blog posts that reach the HN front page" is not terribly interesting as an anecdotal source on blog traffic patterns
It’s also not hard to look around for a few seconds to find that information, is my point.
Will you comment this on every blogpost written for a specific audience that happens to be posted to HN?
SanitizeHTML functions in JS have had big security holes before, around edge cases like null bytes in values, or what counts as a space in Unicode. Browsers decided to be lenient in what they accept, so that means any serialize-parse chain creates some risk.
The more you allow, the less you know about what might happen. E.g., <svg> styling can very easily create clickjacking attacks. (If I wanted to allow SVGs at all, I'd consider shunting them into <img> tags with data URLs.) So anyone who does want to use these more 'advanced' features in the first place had better know what they're doing.
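A minimal sketch of that shunting idea; the function name and details are illustrative rather than from the thread. SVG loaded through `<img>` does not run scripts and is not interactive, which removes a lot of the clickjacking and XSS surface:

```js
// Show user-supplied SVG only as an image, never as inline markup.
function svgToImgElement(svgMarkup) {
  const img = document.createElement('img');
  img.src = 'data:image/svg+xml,' + encodeURIComponent(svgMarkup);
  img.alt = 'user-provided image';
  return img;
}
```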
I'd suggest not sanitizing user-provided HTML on the server. It's totally fine to do if you're fully sanitizing it, but gets a little sketchy when you want to keep certain elements and attributes.
The article links to [0], which has some examples of instances in which HTML parsing is context-sensitive. The exact same string being put into a <div> might be totally fine, while putting it inside a <style> results in XSS.
[0]: https://www.sonarsource.com/blog/mxss-the-vulnerability-hidi...
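A small illustration of that context sensitivity; the payload and variable names are illustrative, and the `alert` stands in for arbitrary script:

```js
// The same string, parsed in two different element contexts.
const payload = '<a title="</style><img src=x onerror=alert(1)>">link</a>';

// Inside a <div>, the quoted title attribute swallows the "</style><img ...>"
// sequence, so no <img> element is ever created. Harmless.
const div = document.createElement('div');
div.innerHTML = payload;
console.log(div.querySelector('img')); // null

// Inside a <style>, the content is raw text: the first literal "</style>"
// closes the element, and the trailing markup is parsed as a real <img>
// whose onerror handler can fire.
const wrapper = document.createElement('div');
wrapper.innerHTML = '<style>' + payload + '</style>';
console.log(wrapper.querySelector('img')); // <img src="x" onerror="alert(1)">
```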
The term to look for is “mutation xss” (or mxss).
A big part of designing a security-related API is making it really easy and obvious to do the secure thing, and hide the insecure stuff behind a giant "here be dragons" sign. You want people to accidentally do the right thing, so you call your secure and insecure functions "setHTML" and "setUnsafeHTML" instead of "setSanitizedHTML" and "setHTML".
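A sketch of that naming split as it appears in the current proposal; the exact shapes may still change, and the selector and markup variables are placeholders:

```js
const el = document.querySelector('#comment-body');

// Safe-by-default path: sanitizes with the built-in baseline while parsing.
el.setHTML(userProvidedMarkup);

// Deliberately scary escape hatch: no baseline sanitization is applied.
el.setHTMLUnsafe(trustedTemplateMarkup);
```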
mysql_real_escape_string() was removed in PHP 7.0.
get_magic_quotes_gpc() was removed in PHP 8.0.
https://www.php.net/mysql_real_escape_string
https://www.php.net/get_magic_quotes_gpc
The current minimum PHP version that is supported for security fixes by the PHP community is 8.1: https://www.php.net/supported-versions.php
If you're still seeing this in 2025 (going on 2026), there are other systemic problems at play besides the PHP code.
https://www.php.net/manual/en/function.mysql-escape-string.p...
https://stackoverflow.com/questions/3665572/mysql-escape-str...
One hardly even tries to do the thing it says on the tin, while the other at least tries to be the real thing. Neither of them worked very well, however.
This is why people should really use XHTML, the strict XML dialect of HTML, in order to avoid these nasty parsing surprises.
It was that decision that resulted in the current mess. Browser vendors could have given us a grace period to fix HTML that didn't validate against the schema. Instead they said "there is no schema."
[0]: I’ve worked with XML schemas a lot and have grown to really dislike them, actually, but that’s neither here nor there.
Strongly typed software is a pain, but its benefits are starting to be recognized again. Unfortunately it's too late for XML.
Doesn't seem obvious unless you're Dutch.
Especially as the first thing I would think obvious is: if breaking the behaviour of innerHTML is not a concern for your software, why keep it at all? Delete the property or make it read-only.
https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Typ...
With `const clean = DOMPurify.sanitize(input); context.innerHTML = clean;` your linter suddenly needs to do complex code analysis and keep track of whether each variable passed to `context.innerHTML` is clean or tainted.
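One way around that, per the Trusted Types link above, is to let the browser do the enforcement so the linter no longer needs taint tracking. A sketch, assuming DOMPurify is loaded, a `Content-Security-Policy: require-trusted-types-for 'script'` header is set, and `context`/`untrustedInput` are placeholders:

```js
// With Trusted Types enforced, plain strings assigned to innerHTML throw;
// only values produced by a registered policy are accepted.
const sanitizerPolicy = trustedTypes.createPolicy('sanitize-html', {
  createHTML: (input) => DOMPurify.sanitize(input),
});

// The sink itself rejects anything that did not pass through the policy,
// so there is no "clean vs. tainted variable" analysis left for tooling.
context.innerHTML = sanitizerPolicy.createHTML(untrustedInput);
```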
Well this is clearly wrong, isn't it? You need a whitelist of elements, not a blacklist. That lesson is at least two decades old.
That's better than only supporting `removeElements`, but it really shouldn't support it at all.
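For comparison, a sketch of an allowlist-style configuration using option names from the current WICG draft; the config shape has shifted between drafts and may change again, and `target`/`untrustedHtml` are placeholders:

```js
// Keep only an explicit set of elements and attributes; anything not listed is dropped.
target.setHTML(untrustedHtml, {
  sanitizer: {
    elements: ['p', 'a', 'em', 'strong', 'ul', 'ol', 'li'],
    attributes: ['href'],
  },
});
```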
Despite the very graphic description, I still don't understand why you don't like CSP. As the server owner, you set your own CSP rules, and if you don't want anything removed, don't configure it that way. It's all opt-in.
Obviously it doesn't fix all classes of potential security issues, but neither would anything else either, it's just one piece of the puzzle.
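A minimal example of what that opt-in looks like; the directive values here are chosen purely for illustration:

```
Content-Security-Policy: script-src 'self'; object-src 'none'; base-uri 'none'
```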
I think maybe a better API would be to add an unsafe HTML tag, so it would look something like:
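One hypothetical shape such an element could take; the element and attribute names are invented purely for illustration:

```html
<!-- Hypothetical, declarative sketch: the browser would sanitize the
     contents on parse, keeping only the listed elements. -->
<sanitize-html allow="p a em strong">
  <!-- untrusted markup goes here -->
</sanitize-html>
```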
Then if the browsers do indeed support it, it would work even without JavaScript. But in any case, you really should be validating everything server-side.
The theory is that the parse->serialize->parse round-trip is not idempotent and that sanitization is element context-dependent, so having a pure string->string function opens a new class of vulnerabilities. Having a stateful setHTML() function defined on elements means the HTML context-specific rules for tables, SVG, MathML etc. are baked in, and eliminates double-parsing errors.
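A quick way to see the non-idempotence concern in isolation; the helper names are illustrative:

```js
// Parse a fragment, then serialize the tree the parser actually built.
function reparse(html) {
  const div = document.createElement('div');
  div.innerHTML = html;   // parse
  return div.innerHTML;   // serialize
}

// For some fragments, a second round trip does not match the first.
function roundTripIsStable(html) {
  const once = reparse(html);
  return reparse(once) === once;
}
```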
Are MXSS errors actually that common?
You don’t want developers trying to rely on client-only sanitization for user input submitted to the server. Sanitizing at the point where you set user-facing UI makes sense.