(Updated to .NET Core 2.1 official release version)
NOTE: I highly recommend also checking how to use System.Threading.Channels. It’s a feature introduced after this article was published that allows a better implementation of the patterns shown in its examples.
Span<T> and Memory<T> are new features in .NET Core 2.1 that allow strongly-typed management of contiguous memory, independently of how it was allocated. They make code easier to maintain and greatly improve the performance of applications by reducing the number of required memory allocations and copies.
For reasons that others can explain much better, Span<T> can only exist on the stack (as opposed to existing on the heap). This means it can’t be a field in a class or in any “box-able” struct (one that is convertible to a reference type). Span<T> takes advantage of ref struct, a new feature in C# 7.2, which makes the compiler enforce this rule.
Next, I’m going to create a few usage scenarios so that you can better understand when and how to use each of these new features.
Let’s imagine we have a service that returns a collection of objects of type Foo. The collection comes from some remote location, so it arrives as a stream of bytes. This means that we need to get a chunk of bytes into a local buffer, convert these into our objects and repeat this process until the end of the collection.
The following example iterates through the whole collection and calculates the sum of the values in the Integer field of each item.
Notice on line 5 that the buffer is allocated as an array of Foo (Foo[]) but stored as a local variable of type Span<Foo>. The framework includes implicit operators that allow this conversion.
Local variables in regular methods (not using async, yield or lambdas) are allocated on the stack. This means there is no problem using Span<T>. The variable will persist for as long as its own scope, in this case, the function itself.
If you check the method signature of Stream.Read(), you’ll notice that it accepts an argument of type Span<byte>. This usually means that we would need to copy memory. Not with Span<T>. As long as T is a value type that contains no references, which is the case here, you can use the method MemoryMarshal.Cast<TFrom, TTo>(), which masks the buffer as another type without requiring any copy. Pass the Span<byte> to Stream.Read() (lines 8 and 15) but read its contents as a collection of Foo using the Span<Foo> (line 13). You can access the Span<T> using the square-bracket operator or using a foreach loop.
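To make the masking idea concrete, here is a minimal sketch of mine (not the article's original listing). It assumes a hypothetical layout for Foo with two int fields; only the Integer field is mentioned in the text. The same memory is written through the Span<Foo> view and read back through the Span<byte> view.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical layout: the article only mentions the Integer field.
public struct Foo
{
    public int Integer;
    public int Other;
}

public static class CastExample
{
    // Writes through the Span<Foo> view and reads the same memory
    // back through the Span<byte> view. No copy is involved.
    public static int RoundTrip()
    {
        Span<Foo> foos = new Foo[2];                            // array on the heap, span on the stack
        Span<byte> bytes = MemoryMarshal.Cast<Foo, byte>(foos); // same memory, seen as bytes

        foos[1].Integer = 42;                                   // the indexer returns a reference

        var itemSize = bytes.Length / foos.Length;              // size of one Foo in bytes
        // Slice() narrows the byte view to start at the second item;
        // Integer is the first field, so it sits at the start of that slice.
        return MemoryMarshal.Read<int>(bytes.Slice(itemSize));
    }
}
```

Calling CastExample.RoundTrip() returns 42, which shows that both spans are views over the same array.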
Because, at the end of the enumeration, the number of items read can be less than the size of the buffer, we iterate on a slice of the original buffer. Slice() is a method that returns another Span<T> for the same buffer but with different boundaries.
Note that, apart from Stream.Read(), there are no memory copies, just different views of the same buffer. This results in major performance improvements relative to the memory management approaches we had before, all with type safety and easy-to-maintain code.
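The approach described above can be sketched as follows. This is my own reconstruction under stated assumptions, not the article's original listing, so the line numbers mentioned in the text do not apply to it. The layout of Foo, the buffer size and the name StreamSum are hypothetical, and the sketch assumes every read returns whole items, which holds for a MemoryStream but not necessarily for a network stream.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical layout: the article only mentions the Integer field.
public struct Foo
{
    public int Integer;
    public int Other;
}

public static class StreamSum
{
    const int ItemsBufferCount = 100; // hypothetical buffer size, in items

    public static long Sum(Stream stream)
    {
        Span<Foo> buffer = new Foo[ItemsBufferCount];             // implicit array-to-span conversion
        Span<byte> bytes = MemoryMarshal.Cast<Foo, byte>(buffer); // byte view of the same buffer
        var itemSize = bytes.Length / buffer.Length;

        long sum = 0;
        int bytesRead;
        // Stream.Read(Span<byte>) fills the buffer; this is the only copy.
        while ((bytesRead = stream.Read(bytes)) > 0)
        {
            // Assumption: bytesRead is a multiple of itemSize.
            var itemsRead = bytesRead / itemSize;
            // The last chunk may be partial, so iterate on a slice.
            foreach (var foo in buffer.Slice(0, itemsRead))
                sum += foo.Integer;
        }
        return sum;
    }
}
```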
Notice from the previous example that the Span<T> has to reside on the stack but not its content. The array is allocated on the heap.
Because, in this case, the buffer doesn’t have to outlive the function and is relatively small, we can allocate the buffer on the stack using the stackalloc keyword.
Besides the buffer allocation, all the code remains unchanged. This is one more advantage of using Span<T>: it abstracts how the contiguous memory was allocated.
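As an illustration of that abstraction, the sketch below (my own, with a hypothetical Foo layout, buffer size and method names) shares one summing method between a heap-allocated and a stack-allocated buffer. Only the line that allocates the buffer differs. It assumes every read returns whole items.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical layout: the article only mentions the Integer field.
public struct Foo
{
    public int Integer;
    public int Other;
}

public static class BufferAllocation
{
    const int ItemsBufferCount = 100; // hypothetical; keep stack buffers small

    public static long SumWithHeapBuffer(Stream stream)
    {
        Span<Foo> buffer = new Foo[ItemsBufferCount];        // array allocated on the heap
        return Sum(stream, buffer);
    }

    public static long SumWithStackBuffer(Stream stream)
    {
        Span<Foo> buffer = stackalloc Foo[ItemsBufferCount]; // memory allocated on the stack
        return Sum(stream, buffer);
    }

    // The enumeration does not know or care where the buffer lives.
    static long Sum(Stream stream, Span<Foo> buffer)
    {
        var bytes = MemoryMarshal.Cast<Foo, byte>(buffer);
        var itemSize = bytes.Length / buffer.Length;

        long sum = 0;
        int bytesRead;
        while ((bytesRead = stream.Read(bytes)) > 0)
        {
            // Assumption: bytesRead is a multiple of itemSize.
            foreach (var foo in buffer.Slice(0, bytesRead / itemSize))
                sum += foo.Integer;
        }
        return sum;
    }
}
```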
These previous examples work fine, but what if you want to perform multiple operations over the collection? You’d have to replicate this code in many other places, creating a maintenance nightmare. What if we could use a foreach loop? We just need to create a struct, which is allocated on the stack, that implements IEnumerable<T>. Unfortunately, any value type that implements an interface is “box-able”, which means it can be converted into a reference type, so in C# 7.2 a ref struct is not allowed to implement interfaces.
foreach doesn’t really require the implementation of interfaces. It only requires a GetEnumerator() method that returns an object with a read-only Current property and a MoveNext() method. This is actually how the enumeration of Span<T> is implemented. We can do the same for our collection.
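Here is a minimal sketch of that pattern, unrelated to streams so that it stays short. The Countdown type and its members are hypothetical names of mine. Note that neither type implements an interface, yet foreach accepts them.

```csharp
using System;

// No IEnumerable<T> here: foreach only needs a suitable GetEnumerator().
public readonly struct Countdown
{
    readonly int start;

    public Countdown(int start) => this.start = start;

    public Enumerator GetEnumerator() => new Enumerator(start);

    // No IEnumerator<T> either: Current and MoveNext() are enough.
    public struct Enumerator
    {
        int current;

        internal Enumerator(int start) => current = start + 1;

        public int Current => current;

        // Steps down by one and stops before reaching zero.
        public bool MoveNext() => --current > 0;
    }
}

public static class CountdownDemo
{
    public static int Sum(int start)
    {
        var sum = 0;
        foreach (var value in new Countdown(start)) // yields start, start - 1, ..., 1
            sum += value;
        return sum;
    }
}
```

CountdownDemo.Sum(3) enumerates 3, 2 and 1 and returns 6.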
Notice on lines 17 and 18 that the spans are not local variables but fields of the Enumerator struct. To make sure that this object is only created on the stack, notice on line 12 that it is declared as a ref struct. Without this, the compiler would show an error.
The creation of the spans is now in the Enumerator constructor but is very similar to the first example (to the best of my knowledge, it’s not possible to use stackalloc in this case). The enumeration is now split into MoveNext() and Current. foreach calls the method MoveNext() to step to the next item of the collection and then calls Current to get it.
Current returns a read-only reference of type Foo. This means it accesses the item without copying it. This is also a feature of C# 7.2, and it can considerably improve the performance of applications.
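The enumerator described above could look roughly like the sketch below. It is my own reconstruction, not the article's listing, so the line numbers in the text do not apply to it. The Foo layout, the buffer size and the names FooStreamEnumerable and FooStreamDemo are hypothetical, and each read is assumed to return whole items.

```csharp
using System;
using System.IO;
using System.Runtime.InteropServices;

// Hypothetical layout: the article only mentions the Integer field.
public struct Foo
{
    public int Integer;
    public int Other;
}

public readonly struct FooStreamEnumerable
{
    readonly Stream stream;

    public FooStreamEnumerable(Stream stream) => this.stream = stream;

    public Enumerator GetEnumerator() => new Enumerator(stream);

    // ref struct: required because the spans are stored as fields.
    public ref struct Enumerator
    {
        const int ItemsBufferCount = 100; // hypothetical buffer size, in items
        readonly Stream stream;
        readonly Span<Foo> buffer;
        readonly Span<byte> bytes;
        readonly int itemSize;
        int itemsRead;
        int index;

        internal Enumerator(Stream stream)
        {
            this.stream = stream;
            buffer = new Foo[ItemsBufferCount];
            bytes = MemoryMarshal.Cast<Foo, byte>(buffer);
            itemSize = bytes.Length / buffer.Length;
            itemsRead = 0;
            index = -1;
        }

        // Returns a read-only reference into the buffer, not a copy.
        public ref readonly Foo Current => ref buffer[index];

        public bool MoveNext()
        {
            if (++index < itemsRead)
                return true;

            // Buffer exhausted: refill it from the stream.
            // Assumption: the number of bytes read is a multiple of itemSize.
            itemsRead = stream.Read(bytes) / itemSize;
            index = 0;
            return itemsRead > 0;
        }
    }
}

public static class FooStreamDemo
{
    public static long Sum(Stream stream)
    {
        long sum = 0;
        // With "var" the item is copied into the loop variable;
        // C# 7.3 also allows "foreach (ref readonly var foo in ...)".
        foreach (var foo in new FooStreamEnumerable(stream))
            sum += foo.Integer;
        return sum;
    }
}
```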
The previous code allows the use of foreach but, if you also want to allow the use of LINQ, there’s no escape from implementing interfaces.
Notice on line 19 that the buffer is still stored as a field but now of type Memory<Foo>. Memory<T> is a factory of Span<T> that can reside on the heap. It has a Span property that creates a new instance of Span<T> valid in the scope where it is called. It is used on lines 33 and 45.
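A minimal sketch of that difference, using a hypothetical Accumulator class of mine: a class can hold a Memory<T> field and ask it for a Span<T> whenever one is needed locally.

```csharp
using System;

public sealed class Accumulator
{
    // A Memory<T> field is fine in a class, which lives on the heap.
    readonly Memory<int> buffer;
    // readonly Span<int> span; // would not compile: Span<T> is a ref struct

    public Accumulator(int size) => buffer = new int[size];

    // The Span property creates a span that is only used locally.
    public void Fill(int value) => buffer.Span.Fill(value);

    public int Sum()
    {
        var span = buffer.Span; // local variable, so it lives on the stack
        var sum = 0;
        for (var i = 0; i < span.Length; i++)
            sum += span[i];
        return sum;
    }
}
```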
Enumerator cannot be a ref struct now, as it implements an interface. I’m leaving it as a struct as it performs better in this case. Calling an interface method has a performance penalty because it’s a virtual call. Structs don’t allow inheritance, so .NET is able to optimize these calls, making them slightly faster.
Current now returns Foo instead of a reference, which means there is a memory copy. You can add an overload that returns a reference but any call made through IEnumerator<T> will use the other one.
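Putting these pieces together, an interface-based version might look like the sketch below. Again, this is my own reconstruction with hypothetical names (FooStreamSequence, LinqDemo), a hypothetical Foo layout and buffer size, and the assumption that each read returns whole items.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Runtime.InteropServices;

// Hypothetical layout: the article only mentions the Integer field.
public struct Foo
{
    public int Integer;
    public int Other;
}

public sealed class FooStreamSequence : IEnumerable<Foo>
{
    readonly Stream stream;

    public FooStreamSequence(Stream stream) => this.stream = stream;

    // Used by foreach directly: returns the struct, so no boxing.
    public Enumerator GetEnumerator() => new Enumerator(stream);

    // Used by LINQ and any other caller that only knows the interfaces.
    IEnumerator<Foo> IEnumerable<Foo>.GetEnumerator() => GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    // A regular struct: a ref struct could not implement IEnumerator<Foo>.
    public struct Enumerator : IEnumerator<Foo>
    {
        const int ItemsBufferCount = 100; // hypothetical buffer size, in items
        readonly Stream stream;
        readonly Memory<Foo> buffer;      // Memory<T> is allowed as a field here
        int itemsRead;
        int index;

        internal Enumerator(Stream stream)
        {
            this.stream = stream;
            buffer = new Foo[ItemsBufferCount];
            itemsRead = 0;
            index = -1;
        }

        // Returns a copy: the interface does not allow returning a reference.
        public Foo Current => buffer.Span[index];
        object IEnumerator.Current => Current;

        public bool MoveNext()
        {
            if (++index < itemsRead)
                return true;

            // A new span is created for each refill of the buffer.
            var bytes = MemoryMarshal.Cast<Foo, byte>(buffer.Span);
            var itemSize = bytes.Length / buffer.Length;
            // Assumption: the number of bytes read is a multiple of itemSize.
            itemsRead = stream.Read(bytes) / itemSize;
            index = 0;
            return itemsRead > 0;
        }

        public void Reset() => throw new NotSupportedException();

        public void Dispose() { }
    }
}

public static class LinqDemo
{
    // LINQ goes through IEnumerable<Foo>, so the enumerator gets boxed.
    public static int Sum(Stream stream) =>
        new FooStreamSequence(stream).Sum(foo => foo.Integer);
}
```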
How do these examples perform? BenchmarkDotNet makes it very simple to compare the performance of all these scenarios.
The code for these benchmarks can be found at https://github.com/aalmada/SpanSample/blob/master/SpanSample/EnumerationBenchmarks.cs
For the benchmarks, I extended the first example into 3 ways of iterating on the buffer Span<>: using a foreach loop, using GetEnumerator() and using a for loop with the indexer operator. It’s interesting to see that the foreach has the same performance as the for, but using the GetEnumerator() is twice as slow.
The for loop with the buffer allocated using stackalloc is the most efficient. It takes the least time (1.6 ms) and allocates no memory on the heap. It’s set as the baseline benchmark for easier comparison.
The use of a ref struct enumerator is slower, at 4.2 ms (2.67 times slower than the raw stackalloc enumeration), but with reusable and easier-to-maintain code. This is the penalty for splitting the enumeration logic into two functions.
The use of IEnumerator<Foo> makes it substantially slower, at 24.0 ms (15.11 times slower than the raw stackalloc enumeration). This case has the same penalty as the previous one, plus the cost of using interfaces, of not returning the value by reference and of not having a single Span<> for the whole enumeration.
Although not shown here, any of these solutions is much faster than one that doesn’t use Span<T>. These values show that you should consider several enumeration scenarios in your applications, depending on whether you favor flexibility or performance. If you are an API developer, you should expose all of these so that users can make their own choice.
Span<T> and Memory<T> are new features that can drastically reduce the memory copies in .NET applications, allowing performance improvements without sacrificing type safety and code readability.
You can download the source code for this article and run the benchmarks on your own system.
I plan to write a few more articles on this subject but you can find a lot more info on these links: