Why the Sanitizer API Is Just `setHTML()`
Key topics
The debate rages on about the Sanitizer API and its proposed implementation as `setHTML()`, with some commenters praising the author's insight while others criticize the post for lacking context and assuming prior knowledge. The discussion reveals a divide between those who see `setHTML()` as a vital tool for rendering dynamic content and those who question its limitations, particularly with regard to handling unusual HTML elements like `<svg>`. As commenters dig into the article's claims, they surface potential security concerns, such as "mutation XSS" (mXSS), and differing opinions on whether to sanitize user-provided HTML on the server or on the client. The thread remains relevant as developers weigh in on the implications of this emerging browser API.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 2d after posting
- Peak period: 43 comments in the 36-48h window
- Avg / period: 10.3
- Based on 62 loaded comments
Key moments
- Story posted: Dec 8, 2025 at 5:37 PM EST (about 1 month ago)
- First comment: Dec 10, 2025 at 12:14 PM EST (2d after posting)
- Peak activity: 43 comments in 36-48h, the hottest window of the conversation
- Latest activity: Dec 15, 2025 at 8:52 AM EST (25 days ago)
> The Sanitizer API is a proposed new browser API to bring a safe and easy-to-use capability to sanitize HTML into the web platform [and] is currently being incubated in the Sanitizer API WICG, with the goal of bringing this to the WHATWG.
This would remove the need to sanitize user-entered content with libraries like DOMPurify, since the capability would be built into the browser.
The proposed specification has additional information: https://github.com/WICG/sanitizer-api/
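For context, a rough sketch of the shift being described; method and option names follow the WICG proposal and may still change, `element` and `untrustedHtml` are placeholders, and DOMPurify is assumed to be loaded:

```js
// Today: a library sanitizes untrusted markup to a string, which you then assign.
element.innerHTML = DOMPurify.sanitize(untrustedHtml);

// With the proposed Sanitizer API: the browser sanitizes while parsing
// the markup directly into the target element.
element.setHTML(untrustedHtml);
```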
> HTML parsing is not stable and a line of HTML being parsed and serialized and parsed again may turn into something rather different
Are there any examples where the first approach (sanitize to a string and set innerHTML) is actually dangerous? Because it's pretty much the only thing you can do when sanitizing server-side, which we do a lot.
This is a common and rather tiresome critique of all kinds of blog posts. I think it is fair to assume the reader has a bit of contextual awareness when you publish on your personal blog. Yes, you were linked to it from a place without that context, but it’s readily available on the page, not a secret.
It's not hard to add one line of context so readers aren't lost. Here, take this for example, combining a couple parts of the GitHub readme:
> For those who are unfamiliar, the Sanitizer API is a proposed new browser API being incubated in the Sanitizer API WICG, with the goal of bringing this to the WHATWG.
Easy. Can fit that in right after "this blog post will explain why", and now everyone is on the same page.
Do we have data to back that up? Anecdotally, the blogs I have operated over the years tend to mostly sustain on repeat traffic from followers (with occasional bursts of external traffic if something trends on social media).
It wasn't necessarily a request for you personally to provide data. I'm curious if any larger blog operators have insight here.
"person who only reads the 0.001% of blog posts that reach the HN front page" is not terribly interesting as an anecdotal source on blog traffic patterns
It’s also not hard to look around for a few seconds to find that information, is my point.
Will you comment this on every blogpost written for a specific audience that happens to be posted to HN?
SanitizeHTML functions in JS have had big security holes before, around edge cases like null bytes in values, or what counts as a space in Unicode. Browsers decided to be lenient in what they accept, so that means any serialize-parse chain creates some risk.
The more you allow, the less you know about what might happen. E.g., <svg> styling can very easily create clickjacking attacks. (If I wanted to allow SVGs at all, I'd consider shunting them into <img> tags with data URLs.) So anyone who does want to use these more 'advanced' features in the first place had better know what they're doing.
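A minimal sketch of that shunting idea; the function name and details are illustrative rather than from the thread. SVG loaded through `<img>` does not run scripts and is not interactive, which removes a lot of the clickjacking and XSS surface:

```js
// Show user-supplied SVG only as an image, never as inline markup.
function svgToImgElement(svgMarkup) {
  const img = document.createElement('img');
  img.src = 'data:image/svg+xml,' + encodeURIComponent(svgMarkup);
  img.alt = 'user-provided image';
  return img;
}
```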
I'd suggest not sanitizing user-provided HTML on the server. It's totally fine to do if you're fully sanitizing it, but gets a little sketchy when you want to keep certain elements and attributes.
The article links to [0], which has some examples of instances in which HTML parsing is context-sensitive. The exact same string being put into a <div> might be totally fine, while putting it inside a <style> results in XSS.
[0]: https://www.sonarsource.com/blog/mxss-the-vulnerability-hidi...
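A small illustration of that context sensitivity; the payload and variable names are illustrative, and the `alert` stands in for arbitrary script:

```js
// The same string, parsed in two different element contexts.
const payload = '<a title="</style><img src=x onerror=alert(1)>">link</a>';

// Inside a <div>, the quoted title attribute swallows the "</style><img ...>"
// sequence, so no <img> element is ever created. Harmless.
const div = document.createElement('div');
div.innerHTML = payload;
console.log(div.querySelector('img')); // null

// Inside a <style>, the content is raw text: the first literal "</style>"
// closes the element, and the trailing markup is parsed as a real <img>
// whose onerror handler can fire.
const wrapper = document.createElement('div');
wrapper.innerHTML = '<style>' + payload + '</style>';
console.log(wrapper.querySelector('img')); // <img src="x" onerror="alert(1)">
```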
The term to look for is “mutation xss” (or mxss).
A big part of designing a security-related API is making it really easy and obvious to do the secure thing, and hide the insecure stuff behind a giant "here be dragons" sign. You want people to accidentally do the right thing, so you call your secure and insecure functions "setHTML" and "setUnsafeHTML" instead of "setSanitizedHTML" and "setHTML".
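A sketch of that naming split as it appears in the current proposal; the exact shapes may still change, and the selector and markup variables are placeholders:

```js
const el = document.querySelector('#comment-body');

// Safe-by-default path: sanitizes with the built-in baseline while parsing.
el.setHTML(userProvidedMarkup);

// Deliberately scary escape hatch: no baseline sanitization is applied.
el.setHTMLUnsafe(trustedTemplateMarkup);
```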
mysql_real_escape_string() was removed in PHP 7.0.
get_magic_quotes_gpc() was removed in PHP 8.0.
https://www.php.net/mysql_real_escape_string
https://www.php.net/get_magic_quotes_gpc
The current minimum PHP version that is supported for security fixes by the PHP community is 8.1: https://www.php.net/supported-versions.php
If you're still seeing this in 2025 (going on 2026), there are other systemic problems at play besides the PHP code.
https://www.php.net/manual/en/function.mysql-escape-string.p...
https://stackoverflow.com/questions/3665572/mysql-escape-str...
One hardly even tries to do the thing it says on the tin, while the other at least tries to be the real thing. Neither of them worked very well, however.
This is why people should really use XHTML, the strict XML dialect of HTML, in order to avoid these nasty parsing surprises.
It was that decision that resulted in the current mess. Browser vendors could have given us a grace period to fix HTML that didn't validate against the schema. Instead they said "there is no schema."
[0]: I’ve worked with XML schemas a lot and have grown to really dislike them, actually, but that’s neither here nor there.
Strongly typed software is a pain, but its benefits are starting to be recognized again. Unfortunately it's too late for XML.
Doesn't seem obvious unless you're Dutch.
Especially as the first thing I would think obvious is: if breaking the behaviour of innerHTML is not a concern for your software, why keep it at all? Delete the property or make it read-only.
https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Typ...
With `const clean = DOMPurify.sanitize(input); context.innerHTML = clean;` your linter suddenly needs to do complex code analysis and keep track of whether each variable passed to `context.innerHTML` is clean or tainted.
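One way around that, per the Trusted Types link above, is to let the browser do the enforcement so the linter no longer needs taint tracking. A sketch, assuming DOMPurify is loaded, a `Content-Security-Policy: require-trusted-types-for 'script'` header is set, and `context`/`untrustedInput` are placeholders:

```js
// With Trusted Types enforced, plain strings assigned to innerHTML throw;
// only values produced by a registered policy are accepted.
const sanitizerPolicy = trustedTypes.createPolicy('sanitize-html', {
  createHTML: (input) => DOMPurify.sanitize(input),
});

// The sink itself rejects anything that did not pass through the policy,
// so there is no "clean vs. tainted variable" analysis left for tooling.
context.innerHTML = sanitizerPolicy.createHTML(untrustedInput);
```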
Well this is clearly wrong, isn't it? You need a whitelist of elements, not a blacklist. That lesson is at least two decades old.
That's better than only supporting `removeElements`, but it really shouldn't support it at all.
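For comparison, a sketch of an allowlist-style configuration using option names from the current WICG draft; the config shape has shifted between drafts and may change again, and `target`/`untrustedHtml` are placeholders:

```js
// Keep only an explicit set of elements and attributes; anything not listed is dropped.
target.setHTML(untrustedHtml, {
  sanitizer: {
    elements: ['p', 'a', 'em', 'strong', 'ul', 'ol', 'li'],
    attributes: ['href'],
  },
});
```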
Despite the very graphic description, I still don't understand why you don't like CSP. As the server owner, you set your own CSP rules, and if you don't want anything removed, don't configure it that way. It's all opt-in.
Obviously it doesn't fix all classes of potential security issues, but neither would anything else either, it's just one piece of the puzzle.
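A minimal example of what that opt-in looks like; the directive values here are chosen purely for illustration:

```
Content-Security-Policy: script-src 'self'; object-src 'none'; base-uri 'none'
```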
I think maybe a better API would be to add an unsafe HTML tag, so it would look something like:
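One hypothetical shape such an element could take; the element and attribute names are invented purely for illustration:

```html
<!-- Hypothetical, declarative sketch: the browser would sanitize the
     contents on parse, keeping only the listed elements. -->
<sanitize-html allow="p a em strong">
  <!-- untrusted markup goes here -->
</sanitize-html>
```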
Then if the browsers do indeed support it, it would work even without JavaScript. But in any case, you really should be validating everything server-side.
The theory is that the parse->serialize->parse round-trip is not idempotent and that sanitization is element context-dependent, so having a pure string->string function opens a new class of vulnerabilities. Having a stateful setHTML() function defined on elements means the HTML context-specific rules for tables, SVG, MathML etc. are baked in, and eliminates double-parsing errors.
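A quick way to see the non-idempotence concern in isolation; the helper names are illustrative:

```js
// Parse a fragment, then serialize the tree the parser actually built.
function reparse(html) {
  const div = document.createElement('div');
  div.innerHTML = html;   // parse
  return div.innerHTML;   // serialize
}

// For some fragments, a second round trip does not match the first.
function roundTripIsStable(html) {
  const once = reparse(html);
  return reparse(once) === once;
}
```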
Are MXSS errors actually that common?
You don’t want developers trying to rely on client-only sanitization for user input submitted to the server. Sanitizing at the point where you set user-facing UI makes sense.