(Updated to .NET Core 2.1 official release version)
Throughout the many years I’ve been using .NET, I’ve had to use many features that were not yet available in it (computer vision, 3D rendering, physics, augmented reality, and many more). One of the great features of .NET is that it supports calls to native code. This is called Platform Invocation Services, or P/Invoke. This article is about my investigation into how Span<T> can be used for P/Invoke calls.
For the purpose of this investigation, I created a simple native library that exposes a method that calculates the total sum of a given array of integers.
The method accepts two arguments (a pointer to the first position of the array and the number of items in the array) and returns the sum. The array is allocated by the caller, which is also responsible for releasing it.
The following code makes it possible for managed code to call the native method defined above.
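A minimal sketch of such a declaration, assuming the native library is named `native` and exports a C function `int sum(const int* buffer, int size)`. The library name, entry point, calling convention, and the `Native` class name are all assumptions made for illustration, not taken from the original project.

```csharp
using System.Runtime.InteropServices;

public static unsafe class Native
{
    // Assumed native export: int sum(const int* buffer, int size);
    // "native" is a placeholder library name; the runtime typically resolves it
    // to a platform-specific file name such as native.dll or libnative.so.
    [DllImport("native", EntryPoint = "sum", CallingConvention = CallingConvention.Cdecl)]
    public static extern int Sum(int* buffer, int size);
}
```

Declaring the method does not load the library; that only happens on the first call.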
This P/Invoke implementation has multiple details that are beyond the scope of this article but the most important thing here is that managed code can call the static method by using
Before Span<T> Era
.NET has three different ways to allocate contiguous memory:
new: Allocated on the heap and managed by the GC.
stackalloc: Allocated on the stack and released automatically when exiting scope.
Marshal.AllocHGlobal(): Allocated on the heap. Caller is responsible for releasing it by calling
All these options can be used to allocate the required buffer to be passed to the native method:
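A minimal sketch of the three approaches. So that it runs without the native library, `Sum` here is a managed stand-in with the same signature as the pointer-based P/Invoke method; the class and method names are illustrative. Reporting the unmanaged allocation through `GC.AddMemoryPressure()` and `GC.RemoveMemoryPressure()` is one way of keeping the GC informed.

```csharp
using System;
using System.Runtime.InteropServices;

public static unsafe class BeforeSpan
{
    // Managed stand-in for the native method (assumption: same signature as the P/Invoke).
    static int Sum(int* buffer, int size)
    {
        var total = 0;
        for (var i = 0; i < size; i++) total += buffer[i];
        return total;
    }

    public static int ManagedArray(int size)
    {
        var array = new int[size];       // zero-initialized and managed by the GC
        fixed (int* ptr = array)         // pinned so the GC cannot move it during the call
            return Sum(ptr, size);
    }

    public static int StackAlloc(int size)
    {
        int* ptr = stackalloc int[size]; // released when the method returns
        return Sum(ptr, size);
    }

    public static int HGlobal(int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));
        var bytes = size * sizeof(int);
        var ptr = Marshal.AllocHGlobal(bytes); // contents are not guaranteed to be zeroed
        GC.AddMemoryPressure(bytes);           // tell the GC about the unmanaged allocation
        try
        {
            return Sum((int*)ptr.ToPointer(), size);
        }
        finally
        {
            GC.RemoveMemoryPressure(bytes);
            Marshal.FreeHGlobal(ptr);          // must be released explicitly
        }
    }
}
```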
Notice how each memory allocation is handled in completely different ways:
- The managed array needs to be fixed so that the GC doesn’t move it around while the sum is calculated. The memory is automatically released once the GC finds no references to it.
- The stack allocation is the simplest one, as it stays in its position until it is automatically released when the code exits the method.
- Memory allocated with Marshal.AllocHGlobal() is not managed by the GC, so it’s not moved around, but it has to be explicitly released. Although the GC doesn’t manage this memory, it’s good policy to inform it of the total heap memory used by the application. This can be done using
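NOTE: The example doesn’t include code to fill the buffer with values, so the result carries no meaning (a new managed array is zero-initialized and sums to 0; memory from Marshal.AllocHGlobal() is not guaranteed to be zeroed). I’m focusing just on memory management and P/Invoke call issues.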
Unsafe and MemoryMarshal classes
.NET has always supported passing value-type arguments by reference. Support for return-by-reference and read-only references was added recently.
System.Runtime.CompilerServices.Unsafe class includes many useful static methods, including the following that allow handling pointers as references:
ref T AsRef<T>(void* source);
ref T AsRef<T>(in T source);
void* AsPointer<T>(ref T value);
ref T Add<T>(ref T source, int elementOffset);
ref T Subtract<T>(ref T source, int elementOffset);
System.Runtime.InteropServices.MemoryMarshal class includes a couple of GetReference() static methods that return a reference to the first position of a Span<T> or a
ref T GetReference<T>(Span<T> span);
ref T GetReference<T>(ReadOnlySpan<T> span);
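As a small illustration of how these methods combine, the sketch below walks a span through references only, with no pointers and no unsafe context. The `ReferenceDemo` class and `SumByReference` method are hypothetical names; depending on the target framework, `Unsafe` may require referencing the System.Runtime.CompilerServices.Unsafe package.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class ReferenceDemo
{
    // Sums the items of a span by offsetting a reference to its first position.
    public static int SumByReference(ReadOnlySpan<int> span)
    {
        // Reference to the first item; not dereferenced when the span is empty.
        ref int first = ref MemoryMarshal.GetReference(span);
        var total = 0;
        for (var i = 0; i < span.Length; i++)
            total += Unsafe.Add(ref first, i); // reference to the item at offset i
        return total;
    }
}
```

Note that these methods skip bounds checking, so staying inside the span is the caller's responsibility.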
To use these, we can refactor the first argument in the P/Invoke signature from an int* pointer to an int reference using the ref keyword. But because the buffer content is not changed in the call and I’m using C# 7.2, I can make it a read-only reference instead, using the
Notice that the unsafe keyword is no longer required.
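A sketch of what a reference-based declaration could look like, passing the first item as a read-only reference. As before, the `native` library name, the `sum` entry point, the calling convention, and the `Native` class name are assumptions for illustration.

```csharp
using System.Runtime.InteropServices;

public static class Native
{
    // Assumed native export: int sum(const int* buffer, int size);
    // The first parameter is a read-only reference to the first item of the buffer,
    // so neither the class nor the method needs to be marked unsafe.
    [DllImport("native", EntryPoint = "sum", CallingConvention = CallingConvention.Cdecl)]
    public static extern int Sum(in int buffer, int size);
}
```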
Using the reference methods and the new P/Invoke signature, we can refactor the first code example to the following:
It compiles fine with the following notes:
- For the managed array, although we get a reference to the first position, it still has to be fixed, which returns a pointer, forcing the use of the
- For the unmanaged allocation, there is no ReadOnlySpan<T> constructor that takes an IntPtr argument, so it has to be converted to a pointer, also forcing the use of the
- The use of the in keyword is optional in a method call.
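The notes above can be sketched as follows. This is not the original listing: `Sum` is a managed stand-in with the same shape as a reference-based P/Invoke method so that the code runs without the native library, and the `WithReferences` class name is hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class WithReferences
{
    // Managed stand-in for the reference-based native method (assumption).
    static int Sum(in int buffer, int size)
    {
        ref int first = ref Unsafe.AsRef(in buffer);
        var total = 0;
        for (var i = 0; i < size; i++) total += Unsafe.Add(ref first, i);
        return total;
    }

    public static unsafe int ManagedArray(int size)
    {
        var array = new int[size];
        fixed (int* ptr = array)                        // fixing yields a pointer...
            return Sum(in Unsafe.AsRef<int>(ptr), size); // ...converted back to a reference
    }

    public static int StackAlloc(int size)
    {
        ReadOnlySpan<int> span = stackalloc int[size];   // no unsafe context needed
        return Sum(in MemoryMarshal.GetReference(span), size);
    }

    public static unsafe int HGlobal(int size)
    {
        var ptr = Marshal.AllocHGlobal(size * sizeof(int));
        try
        {
            // The span constructor takes a pointer, not an IntPtr.
            var span = new ReadOnlySpan<int>(ptr.ToPointer(), size);
            return Sum(in MemoryMarshal.GetReference(span), size);
        }
        finally
        {
            Marshal.FreeHGlobal(ptr);
        }
    }
}
```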
One of the advantages of using Span<T> is that methods can be abstracted from how the memory was allocated. We can move the call to the P/Invoke into a static Sum method with a
Notice that the method now has to fix the buffer for all the cases as it doesn’t know how the memory was allocated. We’ll check later if this affects the performance in any way.
NOTE: The method is marked as unsafe only because of the use of a pointer inside. The signature of the method is not unsafe in itself. The keyword can be moved inside if you prefer.
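A sketch of such a wrapper, assuming a `ReadOnlySpan<int>` parameter. `NativeSum` is a managed stand-in for the reference-based P/Invoke method, and the `SpanSum` class name is hypothetical.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

public static class SpanSum
{
    // Managed stand-in for the reference-based native method (assumption).
    static int NativeSum(in int buffer, int size)
    {
        ref int first = ref Unsafe.AsRef(in buffer);
        var total = 0;
        for (var i = 0; i < size; i++) total += Unsafe.Add(ref first, i);
        return total;
    }

    // The method cannot know how the memory was allocated, so it always fixes the buffer.
    public static unsafe int Sum(ReadOnlySpan<int> buffer)
    {
        fixed (int* ptr = &MemoryMarshal.GetReference(buffer))
            return NativeSum(in Unsafe.AsRef<int>(ptr), buffer.Length);
    }

    // Callers no longer deal with pointers or pinning.
    public static int ManagedArray(int size) => Sum(new int[size]);

    public static int StackAlloc(int size)
    {
        Span<int> buffer = stackalloc int[size];
        return Sum(buffer);
    }
}
```

The callers shrink to the allocation plus a single call, and only the wrapper carries the unsafe keyword.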
This example is very simple so we gain very little with the abstraction but developers of public APIs can now let the user allocate the memory and pass it in, in a clean, strongly-typed way without exposing unsafe code.
The code for the unmanaged array handling is still somewhat complex as it has to release resources in a robust way. We should hide it in some
.NET Core 2.1 includes a System.Buffers.MemoryManager<T> class that seems to be exactly for this purpose (it was System.Buffers.OwnedMemory<T> in Preview 1). This is an abstract class, so we have to derive our own class, implementing the unmanaged array creation and release.
The dotnet/corefx repository includes a NativeMemoryManager class that does exactly that. This is an internal class and only implements the case for MemoryManager<byte>, so I cloned it and refactored it to be generic.
We can now refactor our HGlobal method to the following:
The code is now much cleaner and easier to read and maintain. The need for the unsafe keyword has also been dropped.
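To make the idea concrete, here is a rough sketch of a generic memory manager and an HGlobal method built on it. It is neither the corefx class nor the original listing: it assumes C# 7.3 for the `unmanaged` constraint, has no finalizer, and uses a managed `Sum` stand-in so that it runs without the native library. The `Example` class name is hypothetical.

```csharp
using System;
using System.Buffers;
using System.Runtime.InteropServices;

public sealed unsafe class NativeMemoryManager<T> : MemoryManager<T> where T : unmanaged
{
    private readonly int _length;
    private IntPtr _ptr;

    public NativeMemoryManager(int length)
    {
        if (length < 0) throw new ArgumentOutOfRangeException(nameof(length));
        _length = length;
        _ptr = Marshal.AllocHGlobal(checked(length * sizeof(T)));
    }

    public override Span<T> GetSpan()
    {
        if (_ptr == IntPtr.Zero) throw new ObjectDisposedException("NativeMemoryManager");
        return new Span<T>(_ptr.ToPointer(), _length);
    }

    public override MemoryHandle Pin(int elementIndex = 0)
    {
        if ((uint)elementIndex > (uint)_length) throw new ArgumentOutOfRangeException(nameof(elementIndex));
        // Unmanaged memory never moves, so no GCHandle is required.
        return new MemoryHandle((T*)_ptr.ToPointer() + elementIndex, default, this);
    }

    public override void Unpin() { }

    protected override void Dispose(bool disposing)
    {
        if (_ptr == IntPtr.Zero) return;
        Marshal.FreeHGlobal(_ptr);
        _ptr = IntPtr.Zero;
    }
}

public static class Example
{
    // Managed stand-in for the span-based Sum wrapper (assumption).
    static int Sum(ReadOnlySpan<int> buffer)
    {
        var total = 0;
        foreach (var item in buffer) total += item;
        return total;
    }

    public static int HGlobal(int size)
    {
        using (var manager = new NativeMemoryManager<int>(size))
            return Sum(manager.GetSpan());
    }
}
```

The using statement guarantees the release of the unmanaged memory, replacing the explicit try/finally block.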
I now want to know if these abstractions affect the performance in any way. To evaluate it, I commented out the buffer iteration in the native code so that only the memory management and the P/Invoke are taken into account.
Using BenchmarkDotNet and some code that reproduces all the scenarios described, for buffers with 100 and 1000 items, I got the following results:
NOTE: Choosing larger buffers would make the stack allocation fail as this type of memory is very limited.
I reordered the results to better understand how the method used, the memory type, and the number of items influence the performance.
In this first table, each line highlights the difference in performance when using Span<T> and a method with a Span<T> argument, relative to when not using them:
- The use of Span<T> makes almost no difference for the managed array.
- There’s a big difference when using Span<T> with stack allocations.
- “Fixing” the buffer even when not required doesn’t seem to affect performance.
- The use of the NativeMemoryManager<T> introduces some overhead (~50 percentage points).
In this second table, each line highlights the difference in performance when increasing from 100 to 1000 items:
- The factor by which the elapsed time increases is constant for all scenarios except for the stack allocation, where there is a big difference between using Span<T> and not using it.
- It is interesting to see that the factor is 1 (100%) for all unmanaged allocation scenarios.
The use of Span<T> for P/Invoke calls allows cleaner, strongly-typed, reusable code.
The performance is not affected, except for stack allocations. This difference was reduced from Preview 1 to Preview 2, but it’s still significant.
This is not the final release for .NET Core 2.1 so things may still improve until then.