Array iteration performance in C#
Implementing the sum of the items in an array is very simple. I think most developers would implement it this way:
There’s actually a simpler alternative in C#:
Another alternative is to use the Sum()
operation provided by LINQ. It can be applied to any enumerable, including arrays.
So, how do these fair in terms of performance?
You can see that, comparing to using a for
loop, using a foreach
loop is almost 1.4 times faster, while using LINQ is almost 7 times slower and allocates on the heap. Why is that?
NOTE: All benchmarks were performed on .NET 6.0 Preview 5. The results may vary on other target frameworks.
LINQ
NOTE: In this article I’m refering to the LINQ implementation that comes with .NET, in the
System.Linq
namespace. There are many alternative implementation that optimize for when the source is an array.
If you check the source code for the Sum()
in System.Linq
, you’ll find that it uses a foreach
loop. So, if using a foreach
is faster than a for
, why is it so slow in this case?
Notice that Sum()
is an extension method for the type IEnumerable<int>
. Unlike the Where()
operation, Sum()
does not have a special case for when the source in an array.
ReSharper and Roslyn analyzers may suggest changing
source.Where(predicate).First()
tosource.First(predicate).
IGNORE THEM whensource
is an array!First(predicate)
doesn’t have a special case for when the source in an array, making it slower.
Sum()
implementation in LINQ expands to something like this:
There are several performance issues with this code.
GetEnumerator()
, for an interface, allocates an enumerator on the heap. It requires a try/finally
to dispose the enumerator, making it impossible to inline this method.
For each item in the collection, it calls the MoveNext()
method and the Current
property, which are virtual calls in the case of interfaces. This makes the enumeration of the array almost 7 times slower.
ReSharper and Roslyn analyzers may suggest changing parameters of type array to an interface, IGNORE THEM! Create an overload if you need to support other enumerable types.
Foreach
How can foreach
be faster than a for
loop?
The foreach
in C# usually generates code very similar to the one expanded for the LINQ case. In case of arrays, it’s handled differently. It expands to code equivalent to the one generated for a for
loop.
So, how can it still be faster than in the first example?
The code generated for the foreach
is slightly different. It uses a temporary variable for the array. It then uses the Length
property and the indexer of this variable. This way the JIT compiler can remove bounds checking throughout the iteration.
NOTE: The temporary variable should only be needed for non-local variables. It’s been pointed out in the comments that
array
is a parameter, so it’s already a local variable. For some reason RyuJIT misses the optimization in this case.
So, how do these fair in terms of performance?
As you can see, the optimized for
version has the same performance as foreach
. Having temporary variables may actually improve performance.
ReSharper and Roslyn analyzers may suggest to inline the temporary variable, IGNORE THEM!
Slice
Sometimes we may want to iterate just a portion of the array. I think most developers would implement the following:
Once again, there’s a simpler alternative:
Notice that the parameter now is of type ReadOnlySpan<T>
. It represents a slice/portion of contiguous memory. This way, a portion of an array can be passed into the method. The foreach
will only iterate through that portion.
So, how do these fair in terms of performance?
Both implementations have similar performance but around 1.33 time slower than when iterating the full array. If you know you’re iterating the full array, prefer the first implementation.
In this case using temporary variables doesn’t make much difference. The compiler cannot infer if offset
and count
are always within bounds.
SIMD
All the iteration above were sequencial, meaning that the items in the array were handled one element at the time.
CPUs may allow the simultaneous handling of multiple items using SIMD. This is available in .NET through the System.Numerics
namespace, or the more recent System.Runtime.Intrinsics
namespace.
Here is an implementation of Sum()
using System.Numerics
:
It performs the following:
- Checks if SIMD is available and if it’s worth using it, otherwise go to 7.
- Casts the array of
int
to an array ofVector<int>
. - Creates a
Vector<int>
initialized to zeros (sumVector
). - Sums all the vectors from the array of
Vector<int>
intosumVector
. - Each
sumVector
item contains a partial sum. Sum them all. - The length of the original array may not be a multiple of the size of
Vector<int>
. Calculate the number of items not handled and slice the originalReadOnlySpan<int>
. - Sum all the not handled items.
So, how do these fair in terms of performance?
It’s almost 5 times faster than the fastest until now.
List<T>
List<T>
can also be iterated in two different ways. Using a for
loop, which will use its indexer, or using a foreach
or LINQ, which will use an enumerator.
To guarantee performance, we can implement one more overload of Sum()
using a for
loop:
ReSharper and Roslyn analyzers may suggest changing parameters of type
List<T>
to an interface, IGNORE THEM!List<T>
uses a value type enumerator to avoid heap allocations and virtual calls. Using the indexer will still be slower. Create an overload if you need to support other enumerable types.
Starting from .NET 5.0, there’s a method CollectionsMarshal.AsSpan<T>(List<>)
that returns a reference to the List<T>
inner array as a Span<T>
.
We can then implement a Sum(List<int>)
that calls the SIMD version of Sum(ReadOnlySpan<int>)
declared above:
Using foreach
makes it 1.28 times slower, while using AsSpan()
makes it 5.6 times faster.
Conclusions
Iteration of an array is a special case for the compiler. It may optimize the iteration but small changes may confuse it. The compiler does not optimize the iteration on other types of collections.
ReSharper and Roslyn analyzer are not always to be trusted in terms of performance.
System.Linq
is slow for arrays in most cases. Avoid its use when the source is an array or use one of the many alternatives.
List<T>
can be iterated as a Span<T>
.
Sequential processing is inherently slow. Favor the use of SIMD.
The benchmarking source code is available at https://github.com/aalmada/ArrayIteration