Writing a RISC-V Emulator in Rust
Posted 2 months ago · book.rvemu.app
Key topics: RISC-V, Emulator Development, Rust Programming
The post shares a resource for writing a RISC-V emulator in Rust, sparking discussion on the project's scope, design choices, and alternative approaches to emulator development.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion. First comment: 2h after posting. Peak period: 27 comments in the 6-12h window (average 6.6 per period). Based on 46 loaded comments.
Key moments
- Story posted: Oct 26, 2025 at 3:34 AM EDT
- First comment: Oct 26, 2025 at 5:29 AM EDT (2h after posting)
- Peak activity: 27 comments in the 6-12h window
- Latest activity: Oct 28, 2025 at 10:05 AM EDT
For the full context, read the primary article or dive into the live Hacker News thread.
Published your source code?
https://qocketgit.com/useq/sylwaqe/nyanlinux/souqce/tqee/bqa...
Replace q(s) with r.
> Interpreter performance significantly improved in the Android 7.0 release with the introduction of "mterp" - an interpreter featuring a core fetch/decode/interpret mechanism written in assembly language
From https://source.android.com/docs/core/runtime/improvements
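For flavor, here is a minimal Rust sketch of the fetch/decode/interpret shape that quote describes, using a toy two-instruction RV64 subset. The struct and the simplifications are illustrative, not the OP's design or mterp's:

    /// Toy CPU state: 32 integer registers, a program counter, flat memory.
    struct Cpu {
        regs: [u64; 32],
        pc: u64,
        mem: Vec<u8>,
    }

    impl Cpu {
        /// Fetch the next 32-bit instruction word (RISC-V instruction
        /// fetch is little-endian).
        fn fetch(&self) -> u32 {
            let i = self.pc as usize;
            u32::from_le_bytes([self.mem[i], self.mem[i + 1], self.mem[i + 2], self.mem[i + 3]])
        }

        /// Decode and interpret one instruction; returns false on an opcode
        /// this toy doesn't know (funct3/funct7 checks omitted for brevity).
        fn step(&mut self) -> bool {
            let inst = self.fetch();
            self.pc += 4;
            let rd = ((inst >> 7) & 0x1f) as usize;
            let rs1 = ((inst >> 15) & 0x1f) as usize;
            match inst & 0x7f {
                // ADDI: rd = rs1 + sign-extended 12-bit immediate
                0x13 => {
                    let imm = ((inst as i32) >> 20) as i64 as u64;
                    self.regs[rd] = self.regs[rs1].wrapping_add(imm);
                }
                // ADD: rd = rs1 + rs2
                0x33 => {
                    let rs2 = ((inst >> 20) & 0x1f) as usize;
                    self.regs[rd] = self.regs[rs1].wrapping_add(self.regs[rs2]);
                }
                _ => return false,
            }
            self.regs[0] = 0; // x0 is hardwired to zero
            true
        }
    }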
Maybe they were fired?
Look further down in the comments, where I reference one written in plain and simple C by the creator of ffmpeg and qemu.
You have a completely different use case from the OP, but still had no problem telling them that they were doing it wrong, so it’s pretty funny to see you use this line of defense for your choice.
What kind of performance do you get?
I guess it would be a great way to learn about the differences between x86 and RISC-V though!
Have a look a little below in the comments, where I give a reference to another one, written by the creator of ffmpeg and qemu.
Shorter for C++ and similar languages than for C.
And 5/10 years is a very short time in compiler development planning! Prototype-less functions in C were deprecated for longer than some committee members had been alive before they were removed from the standard, and they will remain supported in C compilers probably for longer than I myself will be alive.
https://docs.python.org/3.12/deprecations/index.html
This obviously improves Python, but it also means you absolutely shouldn't choose Python if you are looking for a low-maintenance language.
Yeah, though instruction fetch is always little-endian. I honestly think they should remove support for big endian from the spec. As far as I know nobody has implemented it, the justification in the ISA manual is very dubious, and it adds unneeded complexity to the spec and to reference models.
Plus it's embarrassing (see Linus's rant which I fully agree with).
https://riscv.org/blog/to-boldly-big-endian-where-no-one-has...
And CodeThink blog post:
https://www.codethink.co.uk/articles/risc-v-big-endian-suppo...
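To make the complexity cost concrete for emulators and reference models: bi-endianness means every data access has to consult the configured endianness, while instruction fetch stays little-endian regardless. A sketch in Rust, where the big_endian_data flag stands in for the relevant mstatus endianness state mentioned later in the thread:

    // Data loads honor the configured data endianness; fetch does not.
    fn load_u32(mem: &[u8], addr: usize, big_endian_data: bool) -> u32 {
        let bytes = [mem[addr], mem[addr + 1], mem[addr + 2], mem[addr + 3]];
        if big_endian_data {
            u32::from_be_bytes(bytes)
        } else {
            u32::from_le_bytes(bytes)
        }
    }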
See, this justification doesn't make any sense to me. The motivation is that it makes high-performance network routing faster, but only in situations where a) you don't implement Zbb (which is a real no-brainer extension to implement), and b) you don't do the packet processing in hardware.
I'm happy to be proven wrong but that sounds like an illogical design space. If you're willing to design a custom chip that supports big endian for your network appliance (because none of the COTS chips do) then why would you not be willing to add a custom peripheral or even custom instructions for packet processing?
Half the point of RISC-V is that it's customisable for niche applications, yet this one niche application somehow was allowed and now it forces all spec writers and reference model authors to think about how things will work with big endian. And it uses up 3 precious bits in mstatus.
I guess it maybe is too big of a breaking change to say "actually no" even if nobody has ever actually manufactured a big endian RISC-V chip, so I'm not super seriously suggesting it is removed.
Perhaps we can all take a solemn vow to never implement it and then it will be de facto removed.
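The Zbb point in code: with byte reversal available (Zbb's rev8), converting big-endian wire data on a little-endian core is one cheap operation. In Rust this is just a byte swap, which should lower to a single rev8 when building with Zbb enabled (my reading of the extension, not a benchmark):

    /// Convert a big-endian (network byte order) word to host order.
    /// On a little-endian RISC-V target with Zbb, the swap inside
    /// from_be should compile down to a single rev8 instruction.
    fn network_to_host(wire: u64) -> u64 {
        u64::from_be(wire)
    }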
I've looked at the other reasons CodeThink came up with for big-endian RISC-V, and trust me, that's the best they have to present.
More specifically, it relies on a hypothetical scenario where building a big-endian / bi-endian core from scratch would be easier than adding the Zbb extension to a little-endian core.
Did he make the same rant about ARMv8 which can (if implemented) even switch endianness on the fly? What about POWER, SPARC, MIPS, Alpha, etc which all support big-endian?
Once you leave x86-land, the ISA including optional big-endian is the rule rather than the exception.
Architectures with only one supported endianness are less painful. "Supports both and both are widely used" would also be OK (I think MIPS was there for a while?), but I think that has a tendency to collapse into "one is popular and the other is niche" over time.
Relatedly, "x32" style "32 bit pointers on a 64 bit architecture" ABIs are not difficult to define but they also add a lot of extra complexity in the software stack for something niche. And they demonstrate how hard it is to get rid of something once it's nominally supported: x32 is still in Linux because last time they tried to dump it a handful of people said they still used it. Luckily the Arm ILP32 handling never got accepted upstream in the first place, or it would probably also still be there sucking up maintenance effort for almost no users.
I'm not sure that there's much undue complexity, at least on the kernel side. You just need to ensure that the process running with 32-bit pointers can avoid having to deal with addresses outside the bottom 32-bit address space. That looks potentially doable. You need to do this anyway for other restricted virtual address spaces that arise as a result of memory paging schemes, such as 48-bit on new x86-64 hardware where software may be playing tricks with pointer values and thus be unable to support virtual addresses outside the bottom 48-bit range.
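One concrete example of the pointer tricks being alluded to: packing a tag into the upper bits of a pointer on the assumption that user-space virtual addresses fit in the low 48 bits. An illustrative Rust sketch, not code from any particular project; schemes like this break the moment the process is handed addresses above that range:

    // Hypothetical tagging scheme assuming 48-bit user virtual addresses.
    const ADDR_BITS: u32 = 48;
    const ADDR_MASK: u64 = (1 << ADDR_BITS) - 1;

    fn pack(ptr: *const u8, tag: u16) -> u64 {
        (ptr as u64 & ADDR_MASK) | ((tag as u64) << ADDR_BITS)
    }

    fn unpack(packed: u64) -> (*const u8, u16) {
        ((packed & ADDR_MASK) as *const u8, (packed >> ADDR_BITS) as u16)
    }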
But my main point is that the complexity is not in the one-off "here's a patch to the kernel/compiler to add this", but in the way you now have an entire extra config that needs to be maintained and tested all the way through the software stack by the kernel, toolchain, distros and potentially random other software with inline asm or target specific ifdefs. That's ongoing work for decades for many groups of people.
Defining yet another 32-bit-on-64-bit x86 ABI would be even worse, because now everybody would have to support x32 for the niche users who are still using that, plus your new 32-bit ABI as well.
The academic argument Linus himself made is alone reason enough that big-endian SHOULD be included in the ISA. When you are trying to grasp the fundamentals in class, adding little endian's "partially backward, but partially forward" increases complexity and mistakes without meaningfully increasing knowledge of the course fundamentals.
No Zbb support is also a valid use case. Very small implementations may want to avoid adding Zbb but still maximize performance. These implementations almost certainly won't be large enough to run Linux and wouldn't be Linus's problem anyway.
While I've found myself almost always agreeing with Linus (even on most of his notably controversial rants), he's simply not correct about this one and has no reason to go past the polite, but firm "Linux has no plans to support a second endianness on RISC-V".
Most importantly, big-endian numbers have overwhelmingly won the human factor. If I write 4567, you don't interpret it as seven thousand six hundred and fifty-four. Even an "inverted big-endian" (writing the entire sequence backward, rather than partially forward and partially backward like little endian) would make more sense, and would be much more at home with right-to-left readers than little endian, too.
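The "partially forward, partially backward" point is easy to see in code; an illustrative Rust check of the two memory layouts:

    fn main() {
        let x: u32 = 0x12345678;
        assert_eq!(x.to_be_bytes(), [0x12, 0x34, 0x56, 0x78]); // big-endian reads "forward"
        assert_eq!(x.to_le_bytes(), [0x78, 0x56, 0x34, 0x12]); // little-endian reverses the bytes
    }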
Regarding POWER, the few distros that support it only support the little-endian variant. Ditto for ARM.
Saying "you can have big-endian in your ISA, but we won't be supporting it" is very different from railing on about the ISA having big-endian at all.
Later on in the thread, Linus states that he has no problem with historically Big Endian architectures, it's just that nothing new should be added for absolutely no reason.
*ARMv3+ is bi-endian, but only for data; all instructions are little-endian.
The best example I know of is Sail [1]. Among others, they have a RISC-V spec [2] and a bunch of compiler back-ends. They can already generate C or OCaml emulators, and I have been working on a new Rust back-end recently.
[1]: https://github.com/rems-project/sail [2]: https://github.com/riscv/sail-riscv
8 more comments available on Hacker News