Slicing managed arrays using Span<T>

(Updated to .NET Core 2.1 official release version)

Blue-tiled swimming pool by aalmada

My previous articles on Span<T> explained how it can be used for handling all types of memory allocations and for p/invoking native code. I hope they are helpful, but I understand these are not typical use cases. Span<T> will be used far more often with managed arrays. The .NET type system contains one that we deal with every day: System.String (string for short in C#).

System.String

A string in .NET is nothing more than an immutable array of System.Char (char for short in C#). Immutable means that, once created, its content cannot be modified. You can hold a reference to it and be sure that the string always stays the same. If strings were mutable, you’d have to clone one to get that guarantee. On the downside, it means that many operations on a string require memory allocations and copies.

That’s easy to understand for a Concat() but “hard to swallow” for a Substring(), as the characters are already lined up in memory and there is no intention to change them.

Substring() creates a new string, allocating the necessary memory and copying each character into it.
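A minimal sketch of such a call follows. The variable names and the sample text are only illustrative.

```csharp
using System;

static class SubstringExample
{
    static void Main()
    {
        var text = "Hello, World!";

        // Allocates a new 5-character string on the heap
        // and copies the characters of "World" into it.
        string world = text.Substring(7, 5);

        Console.WriteLine(world); // World
    }
}
```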

A single call to Substring() is harmless, but imagine the cost in a common scenario like parsing text files (CSV, XML, JSON, YAML and so on), where it’s called thousands of times.

A string can easily be converted into a ReadOnlySpan<char> using the AsSpan() extension method (it was named AsReadOnlySpan() in earlier previews of .NET Core 2.1). The resulting span is read-only, preserving the immutability of the string.
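As a rough sketch, the conversion could look like this. It uses AsSpan(), the name this extension method has in the .NET Core 2.1 release, and the sample text is only illustrative.

```csharp
using System;

static class AsSpanExample
{
    static void Main()
    {
        var text = "Hello, World!";

        // No characters are copied: the span refers to the string's own memory.
        ReadOnlySpan<char> span = text.AsSpan();

        Console.WriteLine(span.Length); // 13
        Console.WriteLine(span[7]);     // W

        // span[7] = 'w'; // does not compile: the indexer of ReadOnlySpan<char> is read-only
    }
}
```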

You can then use Slice() to get a reference to a portion of the string without copying it.

Slice() is a method that returns another span over the same buffer but with different boundaries.
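Here is a small sketch of slicing, again with an illustrative sample text. SequenceEqual() is used only to check the content of the slice.

```csharp
using System;

static class SliceExample
{
    static void Main()
    {
        var text = "Hello, World!";
        ReadOnlySpan<char> span = text.AsSpan();

        // Start at index 7 and take 5 characters.
        // Only the boundaries change, nothing is allocated or copied.
        ReadOnlySpan<char> world = span.Slice(7, 5);

        Console.WriteLine(world.Length);                          // 5
        Console.WriteLine(world.SequenceEqual("World".AsSpan())); // True
    }
}
```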

AsSpan() also has overloads that take a start index and an optional length, so the conversion and the slice happen in a single step.
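A sketch of the single-step version, using the AsSpan(start, length) overload available in the .NET Core 2.1 release:

```csharp
using System;

static class AsSpanSliceExample
{
    static void Main()
    {
        var text = "Hello, World!";

        // Same result as text.AsSpan().Slice(7, 5).
        ReadOnlySpan<char> world = text.AsSpan(7, 5);

        Console.WriteLine(world.Length); // 5
        Console.WriteLine(world[0]);     // W
    }
}
```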

Please note that you’ll see the biggest gains if you never convert the span back to a string, as that results in a memory allocation and a copy. Exactly what we are trying to avoid.

For this reason, the .NET developers went through the Herculean task of adding overloads that accept Span<char> or ReadOnlySpan<char> to many of the methods that take string parameters. There is also an implicit conversion from string to ReadOnlySpan<char>, which keeps the code simple.
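One example of such an overload is int.Parse(), which accepts a ReadOnlySpan<char> in .NET Core 2.1. The sketch below parses the fields of a date without creating any intermediate strings. The date format and variable names are assumptions made for illustration.

```csharp
using System;

static class ParseExample
{
    static void Main()
    {
        var date = "2018-05-30";

        // Implicit conversion from string to ReadOnlySpan<char>.
        ReadOnlySpan<char> span = date;

        // Each slice goes straight into the span overload of int.Parse().
        int year = int.Parse(span.Slice(0, 4));
        int month = int.Parse(span.Slice(5, 2));
        int day = int.Parse(span.Slice(8, 2));

        Console.WriteLine($"{year} {month} {day}"); // 2018 5 30
    }
}
```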

Unfortunately, Console.WriteLine() is still missing this treatment, so I have to call ToArray() to be able to use it. Let’s hope this is fixed in a future release.
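The workaround could look like the following sketch. ToArray() copies the characters into a new char[], so this line does allocate.

```csharp
using System;

static class PrintSpanExample
{
    static void Main()
    {
        ReadOnlySpan<char> world = "Hello, World!".AsSpan(7, 5);

        // Console.WriteLine() has an overload for char[] but none for spans,
        // so the characters are copied into a new array first.
        Console.WriteLine(world.ToArray()); // World
    }
}
```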

Benchmarking

Using BenchmarkDotNet and a bit of code, it’s very easy to spot the difference between the use of Substring() and Slice().
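The sketch below shows how such a comparison could be set up. It is not necessarily the code behind the numbers in this article: the class name, the parameter values and the choice to return the slice length (a span is a ref struct, so the benchmark returns a value derived from it) are assumptions. It requires the BenchmarkDotNet NuGet package.

```csharp
using System;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser] // also report heap allocations per operation
public class SlicingBenchmarks
{
    string text;

    // Number of characters to extract in each run.
    [Params(10, 100, 1000)]
    public int CharactersCount { get; set; }

    // The source is twice as long as the extracted part, because
    // Substring() over the whole string would not need to copy anything.
    [GlobalSetup]
    public void Setup() => text = new string('a', CharactersCount * 2);

    [Benchmark(Baseline = true)]
    public int Slice() => text.AsSpan().Slice(0, CharactersCount).Length;

    [Benchmark]
    public string Substring() => text.Substring(0, CharactersCount);
}

public static class Program
{
    public static void Main() => BenchmarkRunner.Run<SlicingBenchmarks>();
}
```

Run it in Release configuration, as BenchmarkDotNet expects optimized builds.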

Getting a substring is much slower than getting a slice. While the performance of slices is independent of length, substrings are strongly affected by it:

  • 16x slower for 10 characters
  • 38x slower for 100 characters
  • 253x slower for 1000 characters

Slices use no heap allocations so there is also no time wasted in garbage collection.

Conclusion

Use Span<char> or ReadOnlySpan<char> for slicing strings. Use them also as argument types so that no conversion back to string is required.
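As an illustration of the second rule, here is a hypothetical helper that takes a ReadOnlySpan<char>. It can be called with a string, thanks to the implicit conversion, or with a slice, and neither call allocates.

```csharp
using System;

static class SpanArgumentExample
{
    // Hypothetical helper: counts the occurrences of a character.
    public static int Count(ReadOnlySpan<char> text, char value)
    {
        var count = 0;
        foreach (var c in text)
        {
            if (c == value)
                count++;
        }
        return count;
    }

    static void Main()
    {
        var line = "a,b,c,d";

        Console.WriteLine(Count(line, ','));           // 3 (whole string)
        Console.WriteLine(Count(line.AsSpan(2), ',')); // 2 (slice "b,c,d")
    }
}
```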

You should extend these rules to any managed array type.
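For example, an int[] can be sliced the same way. This sketch uses the AsSpan(start, length) extension method for arrays. Because the span is writable, a change made through the slice is visible in the original array.

```csharp
using System;

static class ArraySliceExample
{
    static void Main()
    {
        int[] numbers = { 0, 1, 2, 3, 4, 5 };

        // A window over the elements at indices 2, 3 and 4. Nothing is copied.
        Span<int> middle = numbers.AsSpan(2, 3);

        middle[0] = 42; // writes to numbers[2]

        Console.WriteLine(middle.Length); // 3
        Console.WriteLine(numbers[2]);    // 42
    }
}
```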

Principal Engineer @ Farfetch - Future Retail Lab https://about.me/antao.almada
