Safe Zero-Copy Operations in C#
Mood: supportive · Sentiment: positive · Category: other
Key topics: safe zero-copy operations in C# using Span<T> and related features, with commenters sharing their experiences and insights on using these features to improve performance.
Snapshot generated from the HN discussion
Discussion Activity
Very active discussion. First comment: 3h after posting. Peak period: Day 1, with 54 comments. Average per period: 21.3. Based on 64 loaded comments.
Key moments
- Story posted: Sep 29, 2025 at 7:12 PM EDT (about 2 months ago)
- First comment: Sep 29, 2025 at 10:01 PM EDT (3h after posting)
- Peak activity: 54 comments in Day 1, the hottest window of the conversation
- Latest activity: Oct 2, 2025 at 2:02 PM EDT (about 2 months ago)
I've been using Span<T> very aggressively and it makes a massive difference in cases where you need logical views into the same physical memory. All of my code has been rewritten to operate in terms of spans instead of arrays where possible.
It can be easy to overlook ToArray() or (more likely) code that implies its use in a large codebase. Even small, occasional allocations are all it takes to move your working set out of the happy place and get the GC cranking. The difference in performance can be unreasonable in some cases.
You can even do things like:
Span<byte> arena = stackalloc byte[1024];  // Span<byte> target type keeps the stackalloc in safe code
var segment0 = arena.Slice(10);            // view of bytes 10..1023
var segment1 = arena.Slice(10, 200);       // view of 200 bytes starting at offset 10
...
The above incurs no GC pressure or activity at all; everything happens on the stack.
Somebody decided that "safety" included rejecting obvious concepts like this.
I am currently looking into making use of "public readonly record struct" for the models that I create for my views. Of course, I need to performance-profile the code against standard classes with readonly properties where appropriate, but since most of my code is short-lived, pulling from the CMS to hydrate classes for the views, I'm not sure how much of a benefit I will get. Luckily I'm in a position to work on squeezing as much performance as possible between major projects.
I'm curious whether anyone has found a serious performance benefit from using Span<T> or a "public readonly record struct" in a .NET CMS, where the pages are usually fire-and-forget. I have spent years (since 2013) trying to squeeze every ounce of performance from the code, as I work with quite a few smaller businesses, and even the rest of my team are starting to look into Wix or Squarespace, since those don't require a "me" to be involved to get a site up and running.
To my credit and/or surprise, I haven't dealt with a breach to my knowledge; I read logs and am constantly reviewing code, as it is my passion (at least within the confines of the Umbraco CMS, although that isn't my only area of knowledge). I used to work with PHP and CodeIgniter pre-2013 (then Kohana a bit while making the jump from PHP to .NET). I enjoy C# and feel I'm able to get quite a bit of performance out of it, but if anyone has ideas on how to create even more value from this, I would be extremely interested.
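For concreteness, a minimal sketch of the two options being weighed above; the model shape is hypothetical, not from the original comment:
using System;

// Struct version: no heap allocation per model, but copied on every pass-by-value.
public readonly record struct PageViewModel(int Id, string Title, DateTime Published);

// Class-based alternative: `record` defaults to a reference type.
public record PageViewModelClass(int Id, string Title, DateTime Published);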
In my experience, the biggest wins by far were achieved by using the network tab of the browser F12 tools. The next biggest was Azure Application Insights profiler running in production. Look at the top ten most expensive database queries and tune them to death.
The use of Span<T> and the like matters far more for authors of shared libraries than for "end users" writing a web app. Speaking of which, you can increase your usage of it simply by updating your NuGet package versions, your .NET version to 9 or 10, etc. That gets you thousands of such micro optimisations for very little effort!
You really need to measure before reaching for low-level optimizations like this. Odds are that in this case the overhead is in the framework/CMS, and you'll gain the most by understanding how it works and how to use it better.
Span<T> is really more of an optimization to pay attention to when you write lower-level library code.
MySQL, C#. I have a rather nasty query; two of the fields in it are actually arrays with their own tables, and in most cases all the children must be read. Strangely, my code took a lot longer to execute the child-reading portion than the console did. Profiler time... the hot spot was the routine (in the library, not my code) that returns the value of a named field! I rewrote the big reads to translate the column names to indexes once, then used those indexes to read the fields. I've forgotten just how big the speedup was, but that name lookup was taking the majority of the whole routine's time.
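A sketch of that kind of fix, assuming an ADO.NET-style reader (the column names here are invented): GetOrdinal resolves each name once, outside the hot loop.
using System.Data.Common;

static void ReadChildren(DbDataReader reader)
{
    // Translate names to indexes once, not per row.
    int idOrdinal = reader.GetOrdinal("child_id");
    int valueOrdinal = reader.GetOrdinal("value");

    while (reader.Read())
    {
        // Indexed access skips the per-row name-to-index lookup.
        int id = reader.GetInt32(idOrdinal);
        string value = reader.GetString(valueOrdinal);
        // ... process the row ...
    }
}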
In the two .NET CMSes I have experience with, Sitecore and Optimizely, something like Span would hardly bring any improvement. Rather, check the way their ORM is being used, do some direct SQL, cache some renderings differently, and cross-check that the CMS APIs are being used correctly.
Most of the benefits of Span<T> you gain simply by keeping up with .NET upgrades. Span<T> is a low-level optimization that benefits things like ASP.NET internals far more than most user code, and each version of .NET since Span<T> was added has improved its use. Additionally, in C#, the compiler prefers Span<T> overloads when they make sense, so just rebuilding for the most recent .NET opts you in to the benefits. Whether or not those are "serious" benefits is a matter of taste, and also a reminder that your code probably doesn't spend all of its time doing low-level things. (Your database query time, for instance, is generally going to have a bigger impact.)
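A hedged illustration of that rebuild-and-benefit effect, assuming a .NET 9+ target, where many params methods gained span overloads:
// No source change needed: on older targets this binds to Join(string, params string[])
// and allocates a temporary array; rebuilt against .NET 9+, overload resolution prefers
// Join(string, params ReadOnlySpan<string?>) and can avoid that allocation.
string s = string.Join(", ", "a", "b", "c");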
I'd add a big word of caution for "public readonly record struct". I've seen a few codebases start using that wordy version "by default" and then build themselves into a far worse performance pit than they expected. There are a lot of good reasons that "record" defaults to class and has you opt in to struct behavior. It's a lot easier to reason about classes, and a lot easier to understand the performance trade-offs on the side of classes. The GC is your friend, not your enemy, even and sometimes especially for short-lived data. (Gen0 collections are often very fast. The "nursery" was designed for fire-and-forget data churn. It's the bread-and-butter job of a generational garbage collector to separate the churn from the stable, handle the churn quickly, and keep the stable stabler.)
Structs are pass-by-value, which means as soon as they exit the "lucky path" of staying on the same stack, they are copied from place to place. If your models include a lot of other structs, you start copying memory a lot more regularly. If your structs grow too large for certain stack-frame quotas, they get boxed onto the GC heap anyway, so you don't save heap allocations.
Classes are pass-by-reference. If you are using "readonly" in your structs to build immutable data models, the copies add up, because every immutable data change creates a new struct. Whereas "regular" immutable records (classes) can share structure with each other by reference: the immutable parts that don't change can use the same references and thus share the same memory.
If your models are more than a couple integers and have any sort of nesting or complex relationships, "public readonly record struct" can be a premature optimization that actually ends up costing you performance. Not every bit of data can be moved to the stack and not every bit of data should be moved to the stack. Keep in mind there are trade-offs and a healthy performing .NET application generally uses a smart mixture of stack and GC, because they are both important tools in the toolbelt. Like I said, there are reasons that "public record" defaults to "class" and "public readonly record struct" is the wordy opt-in and it is useful to keep them in mind.
This response doesn't directly answer the "in a .NET CMS" part of your question; I'm just trying to say how to think about when to worry about such optimizations.
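To make the structural-sharing point concrete, a small sketch (the model names are illustrative):
using System;

public record Address(string Street, string City);

// Class record: `with` allocates a new Person, but the unchanged Address
// is shared by reference between the old and new values.
public record Person(string Name, Address Home);

// Struct record: every assignment, return, or `with` copies all fields.
public readonly record struct PersonStruct(string Name, Address Home);

public static class RecordDemo
{
    public static void Main()
    {
        var p1 = new Person("Ann", new Address("1 Main St", "Springfield"));
        var p2 = p1 with { Name = "Bob" };
        // The immutable part that didn't change is shared, not copied:
        Console.WriteLine(ReferenceEquals(p1.Home, p2.Home)); // True
    }
}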
These sorts of micro-optimizations are best considered when you're trying to solve a particular performance problem, particularly when you are dealing with a site that is not getting a lot of hits. I've given up trying to buy from small-business ecommerce websites where each page load takes 5 seconds. In that case, profiling the site and figuring out the problem is very worthwhile.
When you have a site getting a lot of hits, these sorts of performance optimizations can help you save cost. If your service takes 100 servers to run and you can find some performance tweaks to get down to 75 servers, that may be worth the engineering effort.
My recommendation is to use a profiler of some type, either on your application in aggregate to identify hot spots or in search of the source of a particular performance problem. Once you identify a hot spot, construct a micro-benchmark of the problem in BenchmarkDotNet and try to use tools like Span<T> to fix it.
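A minimal BenchmarkDotNet sketch of that workflow; the workload is invented for illustration, comparing an allocating approach against a span-based one:
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // report allocations alongside timings
public class SubstringVsSpan
{
    private const string Line = "2025-09-29T19:12:00Z INFO something happened";

    [Benchmark(Baseline = true)]
    public string WithSubstring() => Line.Substring(0, 20); // allocates a new string

    [Benchmark]
    public int WithSpan() => Line.AsSpan(0, 20).Length;     // a view; no allocation

    public static void Main() => BenchmarkRunner.Run<SubstringVsSpan>();
}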
Span<T>, stackalloc, and value structs matter more when writing heavy data/number-crunching scenarios like image processing, games, "AI"/vector queries, or things like _implementing_ database engines (see yesterday's discussion of the team announcing they're using C++, where Rust, Go, Erlang, Java, and C# were discussed for comparison: https://news.ycombinator.com/item?id=45389744).
I often spend my days writing applications reminiscent of CMS workloads, and while I sometimes do use structs, I've only really brought out my low-level optimization skills a few times in the past 6 years; 95% of the time it's bad usage of DBs.
I understand it's not in the same realm as Rust, but how comparable is this to some of the power that Rust gives you?
For example, Rust allows the equivalent of storing `Span<T>` (called slice in Rust) everywhere (including on the heap, although this is rare).
C# is GC'd; the system will protect memory in use, and while it allows things like Span<>, Memory<>, etc., there are some constraints due to looser lifetime reasoning. In general, though, usage is "easy", since you do not need to care about lifetimes.
Rust has lifetime semantics built into the core of the language. The compiler knows much better what's safe, but it will also forbid you early from things that are safe yet not provable (they're improving the checkers based on experience, though); thanks to that knowledge, it can handle allocations more precisely.
Personally, as someone with an assembly/C/C++ background, I see the allure of Rust, consider it a plus when less experienced devs who really need perf go for Rust for critical components, and am thinking of trying a project in Rust myself...
Still, I've so far not seen a project where the slight performance improvement of Rust would outweigh the "productivity" gain from the slightly "sloppy" connections that C# allows.
I'm not too fazed about going unsafe in general, but I prefer to keep it within focused submodules that really need it and don't leak unsafe details.
“A comparison of Rust’s borrow checker to the one in C#”
I do check the standard library for things that sound like they should be there, as they're common enough. My experience tells me this approach is not as common as you would expect; the same goes for C# inside Microsoft. I don't know how many people using the framework knew about ArraySegment.
Funny point: the verbosity of this method and the System.Runtime.CompilerServices.Unsafe ones makes them look slower than pointers at a subconscious level for me, but they are as fast, if not faster, a way to juggle knives in C#.
The `fixed` keyword is mostly for fast transient pinning of data. Raw pointers from `fixed` remain handy in some cases, e.g. for alignment when working with AVX, but even this can be done with `ref`s, which can reference an already pinned array from Pinned Object Heap or native memory. Most APIs accept `ref`s and GC continues tracking underlying objects.
See the subtle difference here for common misuse of fixed to get array data pointer: https://sharplab.io/#v2:C4LghgzgtgPgAgJgIwFgBQcDMACR2DC2A3ut...
Spans are great, but sometimes raw `ref`s are a better fit for a task, to get the last bits of performance.
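A small sketch of that Pinned Object Heap pattern, assuming .NET 5+ (GC.AllocateArray with pinned: true):
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// The array lives on the Pinned Object Heap and never moves, so refs
// (or raw pointers) into it stay valid without `fixed`, and the GC
// still tracks the underlying object.
byte[] buffer = GC.AllocateArray<byte>(4096, pinned: true);

ref byte first = ref MemoryMarshal.GetArrayDataReference(buffer);
Unsafe.Add(ref first, 10) = 0xFF; // indexed access through the ref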
C# _can_ do this! But I face many abstractions: special perf APIs, C#, IL, asm. Outcomes will vary with language version, runtime version, platform, IL2CPP/Burst/Mono/dotnet. But C/C++ has one layer of abstraction (the compiler), and it's locked in once I compile it.
I want to do the thing as exactly and consistently as possible in the simplest way possible!
A build environment that compiles .cpp alongside .cs (no automatic bindings, just compilation) would be so nice for this.
----
Example of what I mean regarding abstractions:
void addBatch(int *a, int *b, int count)
{
    for (int i = 0; i < count; i++)
        a[i] += b[i];
}
versus:
[MethodImpl(MethodImplOptions.AggressiveOptimization)]
public static void AddBatch(int[] a, int[] b, int count)
{
    ref int ra = ref MemoryMarshal.GetArrayDataReference(a);
    ref int rb = ref MemoryMarshal.GetArrayDataReference(b);
    for (nint i = 0, n = (nint)count; i < n; i++)
        Unsafe.Add(ref ra, i) += Unsafe.Add(ref rb, i);
}
(This is obviously a contrived example; my point is to show the kinds of idioms at play.)
I will predict the future: you will pull up the JIT assembly output to make the case that they output similarly performant assembly on your preferred platform, and that you just have to do X to make sure the code behaves that way.
But my problem is that we are invoking the JIT in the conversation at all. The mental model for any code like this inevitably involves a big complex set of interacting systems and assumptions. Failure to respect them results in crashes or unexpected performance roadblocks.
Will it be as efficient? Probably not; C++ compilers have been in the optimization game for a very long time and have gotten crazy good at it. Not to mention that the language itself is defined in a way that essentially mandates a highly optimizing compiler to get decent performance out of it (and avoid unnecessary creation of temporaries and lots of calls to very tiny functions), which then puts pressure on implementations.
But my point is that this is not a question of language, but implementation. Again, your C example is literally, token-for-token, valid C# as well. And, in general, you can take any random C program and mechanically convert it to C# with the exact same semantics and mostly the same look (with minor variations like the need to use stackalloc for local arrays). So if it's all 1:1, equivalent perf is certainly achievable, and indeed I'd expect a C# AOT compiler to do exactly the same thing as the C compiler here, especially if both are using the same backend; e.g. LLVM.
Now in practice the implementations are what they are, and so even if you are writing C# code "C-style", it's likely to be marginally slower because the optimizer is not as good. But the question then becomes whether it's "good enough", and in many cases the answer is "yes": by writing low-level C# you already get the 90% perf boost compared to high-level code, and rewriting that in C so it can be compiled with a more optimizing compiler will net you maybe 10%, for a lot more effort needed to then integrate the pieces.
internal static class ArrayExtensions
{
    // Requires System.Diagnostics, System.Runtime.CompilerServices, System.Runtime.InteropServices.
    [MethodImpl(MethodImplOptions.AggressiveInlining)]
    public static ref T RefAtUnsafe<T>(this T[] array, nint index)
    {
        // Debug.Assert only fires in DEBUG builds, so it must sit outside the
        // #else branch (release builds compile it away entirely).
        Debug.Assert((uint)index < (uint)array.Length, "RefAtUnsafe: (uint)index < array.Length");
#if DEBUG
        return ref array[index]; // bounds-checked path for debug builds
#else
        return ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), (nuint)index);
#endif
    }
}
then your example turns into:
public static void AddBatch(int[] a, int[] b, int count)
{
    // Storing a reference is often more expensive than re-taking it in a loop; requires benchmarking
    for (nint i = 0; i < (uint)count; i++)
        a.RefAtUnsafe(i) += b.RefAtUnsafe(i);
}
The JITted assembly: https://sharplab.io/#v2:EYLgxg9gTgpgtADwGwBYA0AXEBDAzgWwB8AB...
I'm convinced C# is so much better for high-perf code, because yes, it can do everything (including easy-to-use cross-arch SIMD), but it lets one not bother about things that don't matter and stick to safe code. It's so pragmatic.
See also the top comments from a recent thread, I totally agree. https://news.ycombinator.com/item?id=45253012
BTW, do not use [MethodImpl(MethodImplOptions.AggressiveOptimization)]; it disables TieredPGO, which is a huge win in the latest .NET versions.
My argument isn't that C# is bad or performance is unachievable. It's that the mental overhead to write something that has consistent, high performance in C/C++ is very low. In other words, for the amount of mental effort, knowledge, and iteration it takes to write something fast + maintainable in C#, would I be better served by just writing it in C/C++?
The linked assembly is almost certainly non-optimal; compare to -O3 of the C version: https://godbolt.org/z/f5qKhrq1G - I automatically get SIMD usage and many other optimizations.
You can certainly make the argument that if X, Y, Z is done, your thing would be fast/faster. But that's exactly my argument. I don't want to do X, Y, Z to get good results if I don't have to (`return ref Unsafe.Add(ref MemoryMarshal.GetArrayDataReference(array), (nuint)index);` and using/not using `[MethodImpl(MethodImplOptions.AggressiveOptimization)]` are non-trivial mental overhead!).
I want to write `foo.bar` and get good, alloc free, optimized results... and more importantly, results that behave the same everywhere I deploy them, not dependent on language version, JIT specifics, etc.
If I was operating in a domain where I could not ever take the C/C++ path, these features of C# are of course very welcome. And in general more power/expressiveness is very good. But circling back, I wonder if my energy is better spent doing a C version than contorting C# to do what I want.
Regarding,
> Spans and slice-like structures are the future of safe memory operations in modern programming languages.
It is sad how long stuff takes to reach mainstream technology. In Oberon, the equivalent declaration to partition would be:
PROCEDURE partition(span: ARRAY OF INTEGER): INTEGER
And if the type is the special case of ARRAY OF BYTE (you need to import SYSTEM for that), then any type's representation can be mapped into a span of bytes. You will find similar capabilities in Cedar, Modula-2+, and Modula-3, among several others.
Modern memory-safe languages are finally catching up with the 1990s research; a pity adoption of cool ideas always takes this long.
Having said this, I feel modern .NET has all the features that made me like Modula-3 back in the day, even if some are a bit convoluted like inline arrays in structs.
Then people go around shouting that GCs are bad because they used them in a language that saddled them with some of the worst ones out there.
https://github.com/titzer/virgil/blob/master/doc/tutorial/Ra...
[1] https://learn.microsoft.com/en-us/archive/msdn-magazine/2018...
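For the "inline arrays in structs" aside above, a hedged sketch of the C# 12 / .NET 8 feature; the type name is illustrative:
using System;
using System.Runtime.CompilerServices;

[InlineArray(8)]
public struct Buffer8
{
    private int _element0; // the runtime lays out 8 contiguous ints from this field
}

public static class InlineArrayDemo
{
    public static void Main()
    {
        var buf = new Buffer8();
        Span<int> view = buf;       // inline arrays convert implicitly to spans
        view[3] = 42;
        Console.WriteLine(view[3]); // 42
    }
}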
I had forgotten, or perhaps never realized, that substrings in C# allocate. The solution was Spans.
Notably, it caused me to realize that Go had “spans” designed in from the start.
What Go can't do is create a single-element slice out of a variable or pointer to it. But that just means code duplication if you need to cover both cases, not that it's not expressible at all.
var x int
s := unsafe.Slice(&x, 1)
fmt.Println(&x == &s[0])
// Output: true
There's no reason for this to be unsafe: you're asking for a 1-element slice, and the compiler knows that the variable is always going to be there as long as the reference exists.
In C#, `Span<T>` has a (safe) constructor from `ref T`.
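For example (the ref-taking constructor shipped in .NET 7; earlier versions can use MemoryMarshal.CreateSpan):
using System;

int x = 42;
Span<int> s = new Span<int>(ref x); // safe one-element view over the local
s[0] = 7;
Console.WriteLine(x); // 7, because the span aliases x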
Slices in Go are not restricted to GC memory. They can also point to stack memory (simply slice a stack-allocated array; though this often fails escape analysis and spills onto the heap anyway), global memory, and non-Go memory.
The three things in a slice are the (arbitrary) pointer, the length, and the capacity: https://go.dev/src/runtime/slice.go
Go's GC recognizes internal pointers, so unlike ArraySegment<T>, there's no requirement to point at the beginning of an allocation, nor any need to store an offset (the pointer is simply advanced instead). Go's GC also recognizes off-heap (foreign) pointers, so the ordinary slice type handles them just fine.
The practical differences between a Go slice []T and a .NET Span<T> are only that:
1. []T has an extra field (capacity), which is only really used by append()
2. []T itself can spill onto the managed heap without issue (*)
Go 1.17 even made it easy to construct slices around off-heap memory with unsafe.Slice: https://pkg.go.dev/unsafe#Slice
(*): Span<T> is a "ref struct", which restricts it to the stack (see https://learn.microsoft.com/en-us/dotnet/csharp/language-ref...); whereas []T can be safely stored anywhere *T can.
> Span bounds are guaranteed to be correct at all times and compiler explicitly trusts this (unless constructed with unsafe), because span is larger than a single pointer, its assignment is not atomic, therefore observing a torn span will lead to buffer overrun, heap corruption, etc. when such access is not synchronized, which would make .NET not memory safe
Indeed, the lack of this restriction is actually a (minor) problem in Go. It is possible to have a torn slice, string, or interface (the three fat pointers) by mutably sharing such a variable across goroutines. This is the only (known) source of memory unsafety in otherwise safe Go, but it is a notable hole: https://research.swtch.com/gorace
To work with strings you should use StringBuilder.
Go's strings are also immutable and yet substrings share the same internal memory. Java/JVM also has immutable strings and yet substrings shared the char[] array of the parent string up until Java 7, when they switched to copying instead (for the same reason as .NET): https://mail.openjdk.org/pipermail/core-libs-dev/2012-June/0...
> Strings in C# are immutable.
Yes, but:
> To work with strings you should use StringBuilder.
StringBuilder helps combine strings together. The author needed the opposite: to split/slice strings.
The transition to using ReadOnlySpan<char> immediately addressed the allocation issue. We were able to represent slices of the incoming buffer without any heap allocations, and the parser logic was simplified significantly.
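A minimal sketch of that kind of slicing, assuming a simple "key=value" line format (illustrative, not the original parser):
using System;

ReadOnlySpan<char> line = "timeout=30";
int eq = line.IndexOf('=');

ReadOnlySpan<char> key = line.Slice(0, eq);    // a view into the string; no allocation
ReadOnlySpan<char> value = line.Slice(eq + 1); // likewise

int timeout = int.Parse(value);                // int.Parse has span overloads
Console.WriteLine($"{key}: {timeout}");        // interpolation handlers accept spans (.NET 6+)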
Why don't we have hardware support for this yet? (i.e. CPU instructions that are bounds-aware?)
Edit: Do we?
https://stackoverflow.com/questions/40752436/do-any-cpus-hav...