Wasm 3.0 Completed
Key topics
The release of WASM 3.0 brings significant new features such as 64-bit address space, garbage collection, and exception handling, sparking excitement and discussion among developers about its potential applications and limitations.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion
- First comment: 13 minutes after posting
- Peak period: 75 comments in the 0-6h window
- Average per period: 13.3 comments
- Based on 160 loaded comments
Key moments
1. Story posted: Sep 17, 2025 at 2:16 PM EDT
2. First comment: Sep 17, 2025 at 2:29 PM EDT (13 minutes after posting)
3. Peak activity: 75 comments in the 0-6h window, the hottest stretch of the conversation
4. Latest activity: Sep 20, 2025 at 11:47 AM EDT
> This is not simply due to a lack of optimization. Instead, the performance of Memory64 is restricted by hardware, operating systems, and the design of WebAssembly itself.
https://spidermonkey.dev/blog/2025/01/15/is-memory64-actuall...
Wow!
- https://github.com/WebAssembly/design/issues/1397
- https://github.com/WebAssembly/memory-control/issues/6
This is a crucial issue, as the released memory is still allocated by the browser.
Shrinking the memory object shouldn't require any special support from GC, just an appropriate API hook. It would, as always, be up to the application code running inside the module to ensure that, if a shrink is done, the program doesn't refer to memory addresses past the new endpoint.
If this hasn't been implemented yet, it's not because it's been waiting on GC, but more that it's not been prioritized.
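As a rough native analogy (an illustration of my own, not anything from the Wasm spec or the memory-control proposal), the kind of hook described above would boil down to the embedder releasing the tail of the heap mapping back to the OS, something like this on POSIX systems:

```c
/* Hypothetical sketch: what "shrinking" a linear memory could map to natively.
 * Assumes the heap is one mmap'd region, new_end is page-aligned, and the
 * application never touches addresses past new_end afterwards.
 * Not a real Wasm API. */
#include <sys/mman.h>
#include <stddef.h>
#include <stdint.h>

int shrink_heap(uint8_t *heap_base, size_t old_end, size_t new_end) {
    if (new_end >= old_end) return 0;   /* nothing to release */
    /* Return the physical pages behind [new_end, old_end) to the OS while
     * keeping the address range reserved, which is roughly what a browser
     * could do for a shrunk memory object. */
    return madvise(heap_base + new_end, old_end - new_end, MADV_DONTNEED);
}
```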
WASM approach: start very low-level so C is definitely supported. Thus everything is supported, although every language has to roll its own high-level constructs. But over time more patterns can be standardised so languages can be interoperable within a polyglot WASM app.
1. Different languages have totally different allocation requirements, and only the compiler knows what type of allocator works best (e.g. a generational bump allocator for functional languages, a classic malloc-style allocator for C-style languages; see the sketch below).
2. This perhaps makes wasm less suitable for usage on embedded targets.
The best argument I can make for this is that they're trying to emulate the way that libc is usually available and provides a default malloc() impl, but honestly that feels quite weak.
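To make point 1 concrete, here is a minimal sketch (my own illustration, not something any particular toolchain ships) of the sort of bump allocator a functional-language runtime might carve out of linear memory, as opposed to relying on a general-purpose malloc:

```c
/* Minimal bump-allocator sketch: each language runtime compiled to Wasm can
 * implement its own allocation scheme inside linear memory. Names and sizes
 * are illustrative only. */
#include <stdint.h>
#include <stddef.h>

#define HEAP_SIZE (1u << 20)

static uint8_t heap[HEAP_SIZE];   /* region inside the module's linear memory */
static size_t  next = 0;          /* bump pointer */

void *bump_alloc(size_t n) {
    n = (n + 7) & ~(size_t)7;                 /* 8-byte alignment */
    if (next + n > HEAP_SIZE) return NULL;    /* a real GC would collect here */
    void *p = &heap[next];
    next += n;
    return p;
}

/* A generational collector frees a whole nursery by resetting the pointer;
 * a C-style program would instead want a malloc/free-style allocator. */
void reset_nursery(void) { next = 0; }
```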
> Work has been done for Java and Kotlin
I'm unaware of this development. What did they do? Did they create an interface to the GC specification in the draft proposal?
For Kotlin it's similar but the compiler backend is from Jetbrains themselves, targets Wasm and adapts the Kotlin runtime to use WasmGC: https://kotlinlang.org/docs/wasm-overview.html. https://seb.deleuze.fr/introducing-kotlin-wasm/ has some low level detail on how Kotlin works with WasmGC.
A bit more on Kotlin/Wasm here, seems like also Dart/Flutter uses WasmGC: https://developer.chrome.com/blog/wasmgc#kotlin_wasm
https://github.com/dotnet/runtime/issues/94420 has some notes on why C# can't use WasmGC (yet?).
By the way, you can now generate WASM via the D compiler LDC [1].
[1] Generating WebAssembly with LDC:
https://wiki.dlang.org/Generating_WebAssembly_with_LDC
This was not necessary... what a mistake, especially EH...
https://github.com/emscripten-core/emscripten/blob/main/syst...
So most GC languages being ported to WebAssembly already have a GC; what is the benefit of using a provided GC then?
On the other hand, I see GC as a feature that could become part of any modern CPU. Then the benefit would be large, as any language could use it and wouldn't have to implement its own at all anymore.
I'd think porting an existing GC to WASM is more effort than using WASM's GC for a GC'd language?
Garbage collection is a small part of the Go runtime, but it's not insignificant.
Skimming this issue, it seems like they weren't expecting to be able to use this GC. I know C# couldn't either, at least based on an earlier state of the proposal.
Whereas with a manual GC, if you had a JS object holding a reference to an object on your custom heap, and your heap holds a reference to that JS object (with indirections sprinkled in to taste) but nothing else references it, that'd result in a permanent memory leak, as both heaps would have to consider everything held by the other as GC roots; so you'd still be forced to manually avoid cycles despite only ever using GC'd languages. Wasm GC entirely avoids this problem.
The whole magic about CL's condition system is to keep on executing code in the context of a given condition instead of immediately unwinding the stack, and this can be done if you control code generation.
Everything else necessary, including dynamic variables, can be implemented on top of a sane enough language with dynamic memory management - see https://github.com/phoe/cafe-latte for a whole condition system implemented in Java. You could probably reimplement a lot of this in WASM, which now has a unwind-to-this-location primitive.
Also see https://raw.githubusercontent.com/phoe-trash/meetings/master... for an earlier presentation of mine on the topic. "We need means of unwinding and «finally» blocks" is the key here.
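A rough sketch of that key property in C, with setjmp/longjmp standing in for an unwind-to-this-location primitive (names and structure are mine, purely illustrative): the handler runs while the stack is still intact and only then decides whether to unwind:

```c
/* Sketch of the core condition-system idea: handlers execute in the context
 * where the condition was signalled, before any unwinding, and unwinding only
 * happens if the handler asks for it. */
#include <setjmp.h>
#include <stdio.h>

static jmp_buf restart_point;                       /* the unwind target */

typedef int (*handler_fn)(const char *condition);   /* returns 1 to unwind */
static handler_fn current_handler;

static void signal_condition(const char *condition) {
    /* Run the handler here, with the full stack still intact... */
    if (current_handler && current_handler(condition))
        longjmp(restart_point, 1);                  /* ...and unwind only if asked to. */
}

static int log_and_unwind(const char *c) { printf("handling %s\n", c); return 1; }

int main(void) {
    current_handler = log_and_unwind;
    if (setjmp(restart_point) == 0)
        signal_condition("example-error");          /* handler runs, then unwinds here */
    else
        puts("unwound to restart point");
    return 0;
}
```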
[0] https://spritely.institute/hoot/
But if you really need more than 4GB of memory, then sure, go ahead and use it.
[1] https://devblogs.microsoft.com/oldnewthing/20070801-00/?p=25...
https://gcc.gnu.org/onlinedocs/gcc/Named-Address-Spaces.html
Unfortunately the obvious `__attribute__((mode(...)))` errors out if anything but the standard pointer-size mode (usually SI or DI) is passed.
Or you may be able to do it based on x32, since your far pointers are likely rare enough that you can do them manually. Especially in C++. I'm pretty sure you can just call "foreign" syscalls if you do it carefully.
Especially how you could increase the segment value by one or the offset by 16 and you would address the same memory location. Think of the possibilities!
And if you wanted more than 1MB you could just switch memory banks[1] to get access to a different part of memory. Later there was a newfangled alternative[2] where you called some interrupt to swap things around but it wasn't as cool. Though it did allow access to more memory so there was that.
Then virtual mode came along and it's all been downhill from there.
[1]: https://en.wikipedia.org/wiki/Expanded_memory
[2]: https://hackaday.com/2025/05/15/remembering-more-memory-xms-...
Schulman’s Unauthorized Windows 95 describes a particularly unhinged one: in the hypervisor of Windows/386 (and subsequently 386 Enhanced Mode in Windows 3.0 and 3.1, as well as the only available mode in 3.11, 95, 98, and Me), a driver could dynamically register upcalls for real-mode guests (within reason), all without either exerting control over the guest’s memory map or forcing the guest to do anything except a simple CALL to access it. The secret was that all the far addresses returned by the registration API referred to the exact same byte in memory, a protected-mode-only instruction whose attempted execution would trap into the hypervisor, and the trap handler would determine which upcall was meant by which of the redundant encodings was used.
And if that’s not unhinged enough for you: the boot code tried to locate the chosen instruction inside the firmware ROM, because that will have to be mapped into the guest memory map anyway. It did have a fallback if that did not work out, but it usually succeeded. This time, the secret (the knowledge of which will not make you happier, this is your final warning) is that the instruction chosen was ARPL, and the encoding of ARPL r/m16, AX starts with 63 hex, also known as the ASCII code of the lowercase letter C. The absolute madmen put the upcall entry point inside the BIOS copyright string.
(Incidentally, the ARPL instruction, “adjust requested privilege level”, is very specific to the 286’s weird don’t-call-it-capability-based segmented architecture... But it has a certain cunning to it, like CPU-enforced __user tagging of unprivileged addresses at runtime.)
Isn’t that an arbitrary string, though? Presumably AMI and Insyde have different copyright messages, so then what?
If the search doesn’t succeed or if you’ve set SystemROMBreakPoint=off in the [386Enh] section of SYSTEM.INI[1] or run WIN /D:S, then the trap instruction will instead be placed in a hypervisor-provided area of RAM that’s shared among all guests, accepting the risk that a misbehaving guest will stomp over it and break everything (don’t know where it fits in the memory map).
As to the chances of failing, well, I suspect the original target was the c in “(c)”, but for example Schulman shows his system having the trap address point at “chnologies Ltd.”, presumably preceded by “Phoenix Te”. AMI and Award were both “Inc.”, so that would also work. Insyde wasn’t a thing yet; don’t know what happened on Compaq or IBM machines. One way or another, looks like a c could be found somewhere often enough that the Microsoft programmers were satisfied with the approach.
[1] https://jeffpar.github.io/kbarchive/kb/071/Q71264/
At least most people design non-overlapping segments. And I'm not sure wasm would gain anything from it, being a virtual machine instead of a real one.
This multi-memory setup reminds me of my array juggling I had to do back then. While intellectually challenging it was not fun at all.
With 64-bit pointers, you can't really reserve all the possible space a pointer might refer to. So you end up doing manual bounds checks.
Can't bounds checks be avoided in the vast majority of cases?
See my reply to nagisa above (https://news.ycombinator.com/item?id=45283102). It feels like by using trailing unmapped barrier/guard regions, one should be able to elide almost all bounds checks that occur in the program with a bit of compiler cleverness, and convert them into trap handlers instead.
Yeah, certainly compiler smarts can remove many bounds checks (in particular for small deltas, as you mention), hoist them, and so forth. Maybe even most of them in theory?
Still, there are common patterns like pointer-chasing in linked list traversal where you just keep getting an unknown i64 pointer, that you just need to bounds check...
With 64-bit addresses, and the requirements for how invalid memory accesses should work, this is no longer possible. AND-masking does not really allow for producing the necessary traps for invalid accesses. So every one now needs some conditional before to validate that this access is in-bounds. The addresses cannot be trivially offset either as they can wrap-around (and/or accidentally hit some other mapping.)
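Roughly, and purely as an illustration in C (real engines emit machine code, and the details vary), the per-access difference looks like this:

```c
/* Illustrative only: the per-access cost difference between 32-bit and
 * 64-bit wasm memories. */
#include <stdint.h>
#include <string.h>

/* memory32: the engine reserved a full 4 GiB plus guard pages up front, so a
 * load is just base + zero-extended index; out-of-bounds indices fault in the
 * guard area and become traps, with no check on the hot path. */
uint32_t load_u32_mem32(uint8_t *base, uint32_t idx) {
    uint32_t v;
    memcpy(&v, base + idx, sizeof v);
    return v;
}

/* memory64: the full index range cannot be reserved, so each access needs an
 * explicit bounds check that traps on failure. */
uint32_t load_u32_mem64(uint8_t *base, uint64_t mem_len, uint64_t idx) {
    if (mem_len < sizeof(uint32_t) || idx > mem_len - sizeof(uint32_t))
        __builtin_trap();
    uint32_t v;
    memcpy(&v, base + idx, sizeof v);
    return v;
}
```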
Why does it need to trap? Can't they just make it UB?
Specifying that invalid accesses always trap is going to degrade performance, that's not a 64-bit problem, that's a spec problem. Even if you define it in WASM, it's still UB in the compiler so you aren't saving anyone from UB they didn't already have. Just make the trapping guarantee a debug option only.
[1]: http://catb.org/jargon/html/N/nasal-demons.html
Seems like they got overly attached to the guaranteed trapping they got on 32-bit and wanted to keep it even though it's totally not worth the cost of bounds checking every pointer access. Save the trapping for debug mode only.
Maybe. Bugs that come from spooky behavior at a distance are notoriously hard to debug, especially in production, and it's worthwhile to pay something to avoid that.
The biggest contributor to pointer arithmetic is offset reads into pointers: what gets generated for struct field accesses.
The other class of cases are when you're actually doing more general pointer arithmetic - usually scanning across a buffer. These are cases that typically get loop unrolled to some degree by the compiler to improve pipeline efficiency on the CPU.
In the first case, you can avoid the masking entirely by using an unmapped barrier region after the mapped region. So you can guarantee that if pointer `P` is valid, then `P + d` for small d is either valid, or falls into the barrier region.
In the second case, the barrier region approach lets you lift the mask check to the top of the unrolled segment. There's still a cost, but it's spread out over multiple iterations of a loop.
As a last step: if you can prove that you're stepping monotonically through some address space using small increments, then you can guarantee that even if theoretically the "end" of the iteration might step into invalid space, that the incremental stepping is guaranteed to hit the unmapped barrier region before that occurs.
It's a bit more engineering effort on the compiler side.. and you will see some small delta of perf loss, but it would really be only in the extreme cases of hot paths where it should come into play in a meaningful way.
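A small sketch of the trailing barrier-region idea, with an assumed barrier size and made-up names, just to show where the checks go:

```c
/* Sketch of the "trailing barrier region" approach described above: if an
 * unmapped region of BARRIER bytes sits right after the accessible heap, then
 * once a base pointer is known to be in bounds, field accesses at small
 * offsets need no further checks; they either hit valid memory or fault in
 * the barrier region. Illustration only. */
#include <stdint.h>
#include <stddef.h>

#define BARRIER (64 * 1024)        /* assumed size of the unmapped region */

struct node { uint64_t a, b, c; }; /* offsetof(struct node, b) << BARRIER */

uint64_t read_field_b(uint8_t *heap, uint64_t heap_len, uint64_t ptr) {
    if (ptr >= heap_len)           /* one check on the base pointer... */
        __builtin_trap();
    /* ...covers the field access too: heap + ptr + offsetof(struct node, b)
     * is either inside the heap or inside the barrier region, where it
     * faults. (ptr is assumed 8-byte aligned, as a compiler would ensure.) */
    return ((struct node *)(heap + ptr))->b;
}
```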
To operate on any other size, you need to insert extra instructions to mask addresses to the desired size before they are used.
On x86-64, the start of the linear memory is typically put into one of the two remaining segment registers: GS or FS. Then the code can simply use an address mode such as "GS:[RAX + RCX]" without any additional instructions for addition or bounds-checking.
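GCC even exposes this addressing mode directly through its x86 named address spaces, so the idea can be written in plain C; the sketch below assumes the GS base has already been pointed at the start of linear memory (e.g. via arch_prctl on Linux), and requires GCC 6+ targeting x86-64:

```c
/* Sketch of GS-relative addressing using GCC's x86 named address spaces.
 * Assumes the GS base register was set to the start of the module's linear
 * memory beforehand (e.g. arch_prctl(ARCH_SET_GS, base) on Linux). */
#include <stdint.h>

uint32_t wasm_load_u32(uint32_t addr) {
    const uint32_t __seg_gs *p = (const uint32_t __seg_gs *)(uintptr_t)addr;
    return *p;   /* compiles to a single mov from gs:[addr], no add or check */
}
```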
Sounds about right. Guess 512 GiB memory is the minimum to read email nowadays.
For video editing, 4GiB of completely uncompressed 1080p video in memory is only 86 frames, or about 3-4 seconds of video. You can certainly optimize this, and it's rare to handle fully uncompressed video, but there are situations where you do need to buffer this into memory. It's why most modern video editing machines are sold with 64-128GB of memory.
In the case of Figma, we have files with over a million layers. If each layer takes 4kb of memory, we're suddenly at the limit even if the webapp is infinitely optimal.
How is that data stored?
Because (2^32)÷(1920×1080×4) = 518 which is still low but not 86 so I'm curious what I'm missing?
(2^32)÷(1920×1080×4×3×2) = 86
So glad you asked. It's stored poorly because I'm bad at maths and I'm mixing up bits and bytes.
That's what I get for posting on HN while in a meeting.
If we think we need a more thoroughly virtualized machine than traditional operating system processes give us (which I think is obvious), then we should be honest and build a virtualization abstraction that is actually what we want, rather than converting a document reader into a video editor…
I'm going to assume you're being sincere. But even the crustiest among us can recognize that the modern purpose for web browsers is not (merely) documents. Chances are, many folks on HN in the last month have booked tickets for a flight or bought a home or a car or watched a cat video using the "document browser".
> If we think we need a more thoroughly virtualized machine than traditional operating system processes give us (which I think is obvious)...
Like ... the WASM virtual machine? What if the WASM virtual machine were the culmination of learning from previous not-quite-good-enough VMs?
WASM -- despite its name -- is not truly bound to the "document" browser.
Would you install a native app to book a flight? One for each company? Download updates for them every now and then, uninstall them when you run out of disk space, etc.?
I can ask the same question about every other activity we do in these non-native apps.
Unfortunately several of them are glorified webviews.
I am old enough to have lived through the days Internet meant a set of networking protocols, not ChromeOS Platform.
And on those days hard disks were still bloody expensive, by the way.
Isn't your phone providing a sandbox, a distribution system, a set of common runtime services, etc to get these native apps functional?
You don't have to squint to realize that these things we call "document browsers" are doing a lot of the same work that Apple/Google are doing with their mobile OSes.
All the OS frameworks that are available across most operating systems that don't fragment themselves into endless distributions?
My dear Lord! What world are you living in?
Take a look at all of the "mobile apps" you installed on your phone and tell me which of those would ever devote any resources to make an apt/rpm repository for their desktop applications.
Even the ones that want to have a desktop application cannot figure out how to reliably distribute their software. The Linux crowd itself is still in the middle of the flatpak vs AppImage holy war. Mark Shuttleworth is still beating the snap horse.
The Web as a platform is far from ideal, but if it weren't for it I would never have been able to switch to Linux as my primary base OS, and I would have to accept the Apple/Microsoft/Google oligopoly, just like we are forced to do in the mobile space.
As my old IT teacher said: you can use the browser on any OS. She also implied it requires no special skills, which is true if you are limited to the browser for the majority of the time.
So... are you saying that you are able to use Linux because all you are using is the browser?
But for some reason this takes 20M lines of code, which creates a moat that prevents browser competition.
(* including every web browser)
I am still shocked Google has not rubbed two brain cells together and built a serious Google ChromeOS version for developers with a real desktop environment and real access to Linux, and keeping the browser as sandboxed as they have. I would spend top dollar on such a laptop. Heck it could come with an easy way to install Android Studio, and native apps for things like Hangouts or whatever they call it now.
I could write those exact same bindings for my language, compile it to wasm, and then use the current WASI interface, but even that is pointless because at that point I have written a native app. What good reason would I have to run it through an emulator, especially when a modern OS is already sandboxing things?
If I am targeting the browser, my above point stands: unless the DOM is already a reasonably decent API for what needs to be built (it probably isn't; it's the best example of horrible API design you can get, in my opinion), I will need to build something on top of it, the prime example being React.
So I need to run my code in an interpreter, having built data structures to generate a document in an eDSL, which creates other data structures, which then calls into another API, which then calls into the OS/hardware layer.
When what I want to do, is call into the OS/hardware layer directly…
One good reason would be that “Which OS specifically?” is a question with at least 4-5 mainstream answers (and even more not-so-mainstream answers). That's what motivated the push from WWW-as-a-bunch-of-interconnected-documents to WWW-as-a-bunch-of-remote-hosted-apps in the first place: to be able to develop and deliver applications without having to worry quite so much about the client platform.
WASM is in this regard analogous to the olden days of Java/Silverlight/Flash, with many of the same tradeoffs (and some of the rough edges filed down due to lessons learned around browser integration and toolchains and such).
When using the web I have my own issues with that platform, or more specifically with using wasm in it, since it is literally useless for building web-based applications over JS/TS. It is being targeted more as a library platform that gets called into from JS than as an actual "assembly" language.
I don't understand any such thing.
Modern OS sandboxing doesn't give you memory safety from Wasm's memory model or the capability-based security of WASI.
>> When what I want to do, is call into the OS/hardware layer directly…
If that's what you want to do, then I'm not sure what we're even discussing in this thread. The only safe way to run such code is with a hypervisor.
If you care about memory safety, then use a "memory safe" language that guarantees the exact same thing you think WASM guarantees, except without all the pointless overhead of running a sandboxed interpreter inside a sandboxed operating system process; it is actually just pointless complexity at that point.
For native development WASM gives no benefits; the only useful parts that WASM might bring literally haven't been standardised, because it's a really hard problem to solve and has no use in the browser.
So wasm is designed for the browser and unless you only intend to embed a library in your existing JS application it is pointless because you are still restricted to the DOM.
Mind you, I think WASM is the best thing that has happened to the browser, but that's because I think the HTML/DOM is completely unsuitable for apps, and I hate developing with it so much that I won't do it, even if I have to switch careers.
I think WASM is a reasonable start to a proper virtual machine. But I think what we need is a "browser" that presents a virtual machine with a virtual monitor(s), virtual chunk of contiguous memory, virtual CPU(s), virtual networking, virtual filesystem, and basic drawing primitives, that executes WASM. The "browser" part would be that you can use a URL to point to a WASM program. The program would see itself as being the only thing on the machine (see, basically generalization of the OS process concept into a machine). The user would be able to put limits on what the virtual network can access, what parts of the OS filesystem the virtual filesystem could access, how many real CPUs (and a cpulimit) the virtual CPUs get, etc. So, sort of like a containerized Plan9. I would be happy to do this myself, but I just don't have the spare time to do it, so unless someone is willing to fund me, I'll hope someone sees this and catches the vision.
Using WASM in the web browser is a workaround.
https://donhopkins.medium.com/alan-kay-on-should-web-browser...
>Alan Kay answered: “Actually quite the opposite, if “document” means an imitation of old static text media (and later including pictures, and audio and video recordings).”
"virtual machine" is clearly not
that said, i love WASM in the browser, high time wrapping media with code to become "new media" wasn't stuck solely with a choice between JS and "plugins" like Java, Flash, or Silverlight
it's interesting to look back at a few "what might have been" alternate timelines, when the iPhone was intended to launch as an HTML app platform, or Palm Pre (under a former Apple exec, the "pod-father") intended the same with WebOS. if a VM running a web OS shows a PDF or HTML viewer in a frame, versus if a HTML viewer shows a VM running a web OS in a frame...
we're still working on figuring out whether new media and software distribution are the same.
today, writing Swift, or Nim, or whatever other LLVM language, and getting WASM -- I agree with you, it feels like a collective convergence on the next step of a common denominator
* note: those are all documents and document workflows with skeuomorphic analogs in the same headspace, and newspaper with "live pictures" has been a sci-fi trope for long enough TV news still can't bring themselves to say "video" (reminding us "movie" is to "moving" as "talkie" was to "talking") so extending document to include "media" is reasonable. but extending that further to be "arbitrary software" is no longer strictly document nor media
I do agree that we tend to run a lot in a web-browser or browser environment though. It seems like a pattern that started as a hack but grew into its own thing through convenience.
It would be interesting to sit down with a small group and figure out exactly what is good/bad about it and design a new thing around the desired pattern that doesn't involve a browser-in-the-loop.
Heh, reminds me of those boxes Sun used to make that only ran Java. (I don’t know how far down Java actually went; perhaps it was Solaris for the lower layers now that I think about it…)
I do miss the Solaris 10/OpenSolaris tech though. I don’t know anything that comes close to it today.
dtrace/zones/smf/zfs/iscsi/... and the integration between them all was top notch. One could create a zone, spin up a clone, do some computation, trash the filesystem and then just throw the clone away... in very short time. Also, that whole loop happened without interacting with zfs directly; I know that some of these things have been ported but the ports miss the integration.
eg: zfs on Linux is just a filesystem. zfs on Solaris was the base of a bunch of technology. smf tied much of it together.
eg: dtrace gave you access all the way down to individual read/write operations per disk in a raid-z and all the way up to the top of your application running inside of a zone. One tool with massive reach and very little overhead.
Not much compels me to go back to the ecosystem; I've been burned once already.
"What if we made a new WASM-based platform for apps, separate from the browser?"
And this comes from someone who started with Flash, built actual video editing apps with it, and for the last 25 years has built applications with the "it's not a web app, it's a desktop app that lives in a browser" attitude [1].
Even with Flash we often used a hybrid approach where you had two builds from the same codebase: a lite version running in the browser and an optional desktop app (AIR) with full functionality. SharedObjects and LocalConnection made this approach extremely feasible, as both instances were aware of each other and you could move data and objects between them in real time.
The premise is great, but it was never fully realized - sure you have few outliers like Figma, but building a real "desktop app" in a browser comes with a lot of quirks, and the resulting UX is just terrible in most cases.
[1] just to be clear, there's a huge difference between web page and web app ;D
[1] https://news.ycombinator.com/item?id=45200414
334 more comments available on Hacker News