Achieving zero heap memory allocations with LINQ operations.
This is a follow-up to my exploration of improving LINQ performance, and it is still a work in progress. See my previous article for what triggered it.
The source code is available at https://github.com/NetFabric/NetFabric.Hyperlinq
I’ve been emphasizing the importance of value types for raw performance but another advantage is that they can be allocated on the stack. This means they don’t add pressure to the garbage collector, and their deallocation is deterministic: it’s performed immediately when they go out of scope.
One of my objectives for this project is to get as close as possible to zero heap allocations.
To achieve zero allocation in this project, I declare all enumerables and enumerators as struct, but this is still not enough. We also have to avoid copies of these value types caused by passing-by-value, boxing, or defensive copies. Check my previous articles in this series to find out how I take care of that.
There was a particular pattern in my code that I missed, where a heap allocation was hiding in disguise. It appears where the enumerator is created:
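The original snippet was not preserved in this export; the following is a hedged reconstruction of the pattern being described (the `Count` operation and the names are illustrative):

```csharp
using System.Collections.Generic;

static class Sketch
{
    public static int Count<TEnumerable, TEnumerator, TSource>(this TEnumerable source)
        where TEnumerable : IEnumerable<TSource>
        where TEnumerator : struct, IEnumerator<TSource>
    {
        // GetEnumerator() is resolved through IEnumerable<TSource>, so it
        // returns IEnumerator<TSource> and the struct enumerator is boxed;
        // the cast to TEnumerator then unboxes it immediately.
        var enumerator = (TEnumerator)source.GetEnumerator();
        var count = 0;
        while (enumerator.MoveNext())
            count++;
        return count;
    }
}
```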
A new instance of an enumerator is created by calling GetEnumerator() and stored in the enumerator variable. The variable is of type TEnumerator which, through generics constraints, should allow the enumerator not to be boxed. Well, in this case, that’s not true… The GetEnumerator() method returns IEnumerator&lt;TSource&gt;, which boxes the enumerator. The cast to TEnumerator unboxes it immediately, but the heap allocation and the copies are performed anyway.
Is it possible to avoid these allocations? There is more than one solution…
One solution is to avoid calling GetEnumerator() whenever possible. We can implement an overload of the operation that takes an IReadOnlyList&lt;T&gt; as the source. This allows the use of a for loop instead of a foreach loop. Not only does it not require an enumerator, it is also much more performant. In this particular case, it’s 5 times faster.
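A minimal sketch of such an overload (the actual Hyperlinq signature may differ; the predicate parameter is assumed for illustration):

```csharp
using System;
using System.Collections.Generic;

static class ReadOnlyListExtensions
{
    public static int Count<TSource>(this IReadOnlyList<TSource> source,
        Func<TSource, bool> predicate)
    {
        // Indexer-based iteration: no enumerator instance is ever created.
        var count = 0;
        for (var index = 0; index < source.Count; index++)
        {
            if (predicate(source[index]))
                count++;
        }
        return count;
    }
}
```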
Still, not all collections implement IReadOnlyList…
The problems with GetEnumerator() start when the compiler has no information about the type of enumerator returned. What if there was an enumeration interface that could carry this information? I found one implemented by Ben Adams.
We can now implement one more overload of the operation that uses this interface.
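A hedged sketch of the idea: an interface pair that carries the enumerator type as a generic parameter, plus an overload constrained to it. The shape below only approximates Ben Adams’ proposal and Hyperlinq’s interfaces; the RangeEnumerable implementation is mine, for illustration only.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Assumed shape of the value-enumeration interfaces.
public interface IValueEnumerator<T> : IDisposable
{
    // Combines MoveNext() and Current in a single call.
    bool TryMoveNext(out T current);
}

public interface IValueEnumerable<T, TEnumerator> : IEnumerable<T>
    where TEnumerator : struct, IValueEnumerator<T>
{
    // Hides IEnumerable<T>.GetEnumerator() with one that returns the
    // concrete struct enumerator type.
    new TEnumerator GetEnumerator();
}

static class ValueEnumerableExtensions
{
    public static int Count<TEnumerable, TEnumerator, TSource>(this TEnumerable source)
        where TEnumerable : IValueEnumerable<TSource, TEnumerator>
        where TEnumerator : struct, IValueEnumerator<TSource>
    {
        // GetEnumerator() returns TEnumerator directly: no cast, no boxing.
        using var enumerator = source.GetEnumerator();
        var count = 0;
        while (enumerator.TryMoveNext(out _))
            count++;
        return count;
    }
}

// Minimal example implementation: enumerates 0, 1, …, count - 1.
public readonly struct RangeEnumerable : IValueEnumerable<int, RangeEnumerable.Enumerator>
{
    readonly int count;
    public RangeEnumerable(int count) => this.count = count;

    public Enumerator GetEnumerator() => new Enumerator(this.count);
    IEnumerator<int> IEnumerable<int>.GetEnumerator()
    {
        for (var i = 0; i < this.count; i++) yield return i;
    }
    IEnumerator IEnumerable.GetEnumerator() => ((IEnumerable<int>)this).GetEnumerator();

    public struct Enumerator : IValueEnumerator<int>
    {
        readonly int count;
        int index;
        public Enumerator(int count) { this.count = count; this.index = -1; }

        public bool TryMoveNext(out int current)
        {
            if (++this.index < this.count) { current = this.index; return true; }
            current = default;
            return false;
        }

        public void Dispose() { }
    }
}
```

Note that, because TEnumerator cannot be inferred from the arguments, calling this overload requires explicit type arguments, e.g. `ValueEnumerableExtensions.Count<RangeEnumerable, RangeEnumerable.Enumerator, int>(new RangeEnumerable(5))`.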
Notice that the cast is no longer required. The compiler now knows the type of the enumerator that is returned, so no boxing is performed. Notice also that the calls to MoveNext() and Current were compacted into a single call to TryMoveNext(), making the interface more intuitive to use.
Benchmarks show that there’s actually no performance improvement, but now there are no heap allocations!
The NetFabric.Hyperlinq library makes extensive use of this interface. All its enumerables implement IValueEnumerable instead of IEnumerable, and the foreach keyword and composed operations will call its methods. If you need to convert to IEnumerable, use the AsEnumerable() conversion operation.
Still, no other collections implement this interface…
I explained in my previous articles that foreach uses duck typing: it looks for the type returned by the public GetEnumerator() method. Can we implement this same behavior in LINQ operations? One solution is to use expression trees to generate code at runtime that has the same behavior.
The extension method GetEnumeratorMethod() uses reflection to look for the most appropriate implementation of GetEnumerator(). Just like foreach, it first looks for a public non-static implementation, which can return a type that is not an interface. If none is found, it then checks for IEnumerable implementations, which always return interfaces.
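A hedged sketch of what such a lookup might look like (the method name comes from the article; the body is illustrative, not Hyperlinq’s actual code):

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Reflection;

static class EnumerableReflection
{
    public static MethodInfo GetEnumeratorMethod(this Type enumerableType)
    {
        // 1. Like foreach, prefer a public instance GetEnumerator();
        //    it may return a concrete (possibly struct) enumerator type.
        var method = enumerableType.GetMethod(
            "GetEnumerator",
            BindingFlags.Public | BindingFlags.Instance,
            binder: null, Type.EmptyTypes, modifiers: null);
        if (method is not null)
            return method;

        // 2. Otherwise, fall back to the interface implementations, which
        //    always return an interface (IEnumerator<T> or IEnumerator).
        foreach (var interfaceType in enumerableType.GetInterfaces())
        {
            if (interfaceType.IsGenericType &&
                interfaceType.GetGenericTypeDefinition() == typeof(IEnumerable<>))
                return interfaceType.GetMethod("GetEnumerator")!;
        }
        if (typeof(IEnumerable).IsAssignableFrom(enumerableType))
            return typeof(IEnumerable).GetMethod("GetEnumerator")!;

        throw new ArgumentException($"'{enumerableType}' is not enumerable.");
    }
}
```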
Finding the method is not enough; it needs to be called. We could use Invoke() to call it, but that is slow. The method Create() uses expression trees to generate a Func that calls the method returned by GetEnumeratorMethod(). The new method is compiled, making it fast and strongly-typed.
Reflection and expression tree compilation are very slow and should not be performed frequently during the lifetime of the app. To avoid this, the resulting Func is stored in a static field, implicitly generated by a read-only property, so Create() is only called the first time the property is accessed. Because the field lives in a generic class, the runtime automatically uses a different one for each combination of &lt;TEnumerable, TEnumerator, TSource&gt;. Getting the value from the field is very efficient; there is no need for a dictionary, which would have to calculate the hash code of a Type and look it up in the collision buckets.
We can now implement a new Count() overload that takes advantage of the dynamically generated code. Notice that the cast is no longer required: no boxing, or any other heap allocation, happens in this version.
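Putting it together, a hedged sketch of the caching class and the overload (names are illustrative and the lookup is simplified to the public-method case; Hyperlinq’s actual internals differ in the details):

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

static class EnumeratorFactory<TEnumerable, TEnumerator>
{
    // One static field per <TEnumerable, TEnumerator> combination; the
    // initializer (and thus the reflection + compilation) runs only once.
    public static Func<TEnumerable, TEnumerator> GetEnumerator { get; } = Create();

    static Func<TEnumerable, TEnumerator> Create()
    {
        // Simplified lookup: assumes a public instance GetEnumerator().
        var method = typeof(TEnumerable).GetMethod("GetEnumerator", Type.EmptyTypes)
            ?? throw new InvalidOperationException(
                $"{typeof(TEnumerable)} has no GetEnumerator().");

        // Build and compile 'source => source.GetEnumerator()', strongly typed.
        var source = Expression.Parameter(typeof(TEnumerable), "source");
        var call = Expression.Call(source, method);
        return Expression.Lambda<Func<TEnumerable, TEnumerator>>(call, source).Compile();
    }
}

static class EnumerableExtensions
{
    public static int Count<TEnumerable, TEnumerator, TSource>(this TEnumerable source)
        where TEnumerable : IEnumerable<TSource>
        where TEnumerator : struct, IEnumerator<TSource>
    {
        // The compiled delegate returns TEnumerator directly: no cast, no boxing.
        var enumerator = EnumeratorFactory<TEnumerable, TEnumerator>.GetEnumerator(source);
        var count = 0;
        while (enumerator.MoveNext())
            count++;
        return count;
    }
}
```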
This solution is not compatible with AOT (ahead-of-time compilation)…
For source.Count(_ => true), which iterates over all the collection items:
- NetFabric.Hyperlinq performs much better than LINQ in almost every case, up to 4 times faster. The exception is when the enumerator is a reference type, where the performance is roughly the same.
- NetFabric.Hyperlinq only allocates on the heap when the enumerator is a reference type.
For source.Where(_ => true).Select(item => item).Count(), which composes multiple operations and iterates over all the collection items:
- NetFabric.Hyperlinq maintains more or less the same performance gains.
- NetFabric.Hyperlinq only allocates on the heap when the enumerator is a reference type, and the allocated size is the same as in the previous benchmark, which means it allocates only one enumerator instance. LINQ's allocations grow compared to the previous benchmark.
It is possible to implement LINQ operations with zero allocation.
NetFabric.Hyperlinq implements extension methods for all .NET collections that have value-type enumerators, so this happens implicitly. To add support for other collections, just implement equivalent mapping extensions.
It also supplies multiple overloads for each operation so that the best performance can be achieved in each case.
Reference-type enumerables and enumerators will always be allocated on the heap but are still supported.