NetFabric.Hyperlinq — Zero Allocation

Antão Almada
6 min read · Mar 20, 2019

Achieving zero heap memory allocations with LINQ operations.

Empty Flowers by aalmada

Introduction

This is a follow-up on my exploration into improving LINQ performance. This is a work in progress. Go to my previous article to find out what triggered it.

The source code is available at https://github.com/NetFabric/NetFabric.Hyperlinq

Zero Allocation

I’ve been emphasizing the importance of using value types for raw performance but another advantage is that they are allocated on the stack. This means they don’t contribute to increasing the frequency of garbage collection. Their deallocation is deterministic: it’s performed immediately when they go out of scope.
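As a quick illustration (a minimal sketch, not from the library), enumerating a List&lt;int&gt; through its own type uses the struct List&lt;int&gt;.Enumerator, which lives on the stack. On .NET Core, GC.GetAllocatedBytesForCurrentThread() can confirm that the loop itself adds nothing to the heap:

```csharp
using System;
using System.Collections.Generic;

class StackEnumeratorDemo
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        var before = GC.GetAllocatedBytesForCurrentThread();
        var sum = 0;
        foreach (var item in list) // uses the struct List<int>.Enumerator
            sum += item;
        var after = GC.GetAllocatedBytesForCurrentThread();

        // The struct enumerator is allocated on the stack,
        // so enumerating allocates nothing on the heap.
        Console.WriteLine(after - before);
        Console.WriteLine(sum); // 6
    }
}
```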

One of my objectives for this project is to get as close as possible to zero heap allocations.

To achieve zero allocation in this project, I declare all enumerables and enumerators as struct, but this is still not enough. We also have to avoid copies of these value types caused by passing-by-value, boxing, or defensive copies. Check my previous articles in this series to see how I take care of that.

There was a particular pattern in my code that I had missed, where a heap allocation was hiding in disguise.

public static int Count<TEnumerable, TEnumerator, TSource>(this TEnumerable source, Func<TSource, bool> predicate)
    where TEnumerable : IEnumerable<TSource>
    where TEnumerator : IEnumerator<TSource>
{
    if (source == null) ThrowHelper.ThrowArgumentNullException(nameof(source));

    var count = 0;
    using (var enumerator = (TEnumerator)source.GetEnumerator())
    {
        while (enumerator.MoveNext())
        {
            if (predicate(enumerator.Current))
                count++;
        }
    }
    return count;
}

A new enumerator instance is created by calling GetEnumerator() and stored in the enumerator variable. The variable is of type TEnumerator which, through generics constraints, should allow the enumerator not to be boxed. Well, in this case, that’s not true…

The GetEnumerator() method returns IEnumerator<TSource>, which boxes the enumerator. The cast to TEnumerator unboxes it immediately, but the heap allocation and the copies were performed anyway.
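The boxing is easy to reproduce outside the library (a minimal sketch using List&lt;int&gt;, whose enumerator is a public struct):

```csharp
using System;
using System.Collections.Generic;

class BoxingDemo
{
    static void Main()
    {
        var list = new List<int> { 1, 2, 3 };

        // Through the interface, GetEnumerator() returns IEnumerator<int>,
        // boxing the struct List<int>.Enumerator on the heap.
        IEnumerable<int> enumerable = list;
        IEnumerator<int> boxed = enumerable.GetEnumerator();

        // The cast back to the struct unboxes a copy,
        // but the heap allocation has already happened.
        var unboxed = (List<int>.Enumerator)boxed;

        Console.WriteLine(boxed.GetType());   // a boxed List<int>.Enumerator
        Console.WriteLine(unboxed.MoveNext()); // True
    }
}
```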

Is it possible to avoid these allocations? There is more than one solution…

IReadOnlyList

A solution is not to call GetEnumerator() whenever possible.

public static int Count<TEnumerable, TSource>(this TEnumerable source, Func<TSource, bool> predicate)
    where TEnumerable : IReadOnlyList<TSource>
{
    if (source == null) ThrowHelper.ThrowArgumentNullException(nameof(source));

    var count = 0;
    for (var index = 0; index < source.Count; index++)
    {
        if (predicate(source[index]))
            count++;
    }
    return count;
}

We can implement an overload of the operation that takes an IReadOnlyList as the source. This allows the use of a for loop instead of a foreach loop. Not only does it not require an enumerator, it is also much more performant. In this particular case, it’s 5 times faster for a List<int>.

Still, not all collections implement IReadOnlyList…

IValueEnumerable

The problems with GetEnumerator() start when the compiler has no information on the type of the enumerator returned. What if there was an enumeration interface that could carry this information? I found one implemented by Ben Adams.

// Based on implementation by Ben Adams
// https://gist.github.com/benaadams/294cbd41ec1179638cb4b5495a15accf

public interface IValueEnumerator : IDisposable
{
    bool TryMoveNext();
}

public interface IValueEnumerator<T> : IValueEnumerator
{
    bool TryMoveNext(out T current);
}

public interface IValueEnumerable<TEnumerator>
    where TEnumerator : struct, IValueEnumerator
{
    TEnumerator GetValueEnumerator();
}

public interface IValueEnumerable<T, TEnumerator>
    where TEnumerator : struct, IValueEnumerator<T>
{
    TEnumerator GetValueEnumerator();
}

We can now implement one more overload for the Count() operation:

public static int Count<TEnumerable, TEnumerator, TSource>(this TEnumerable source, Func<TSource, bool> predicate)
    where TEnumerable : IValueEnumerable<TSource, TEnumerator>
    where TEnumerator : struct, IValueEnumerator<TSource>
{
    if (source == null) ThrowHelper.ThrowArgumentNullException(nameof(source));

    var count = 0;
    using (var enumerator = source.GetValueEnumerator())
    {
        while (enumerator.TryMoveNext(out var current))
        {
            if (predicate(current))
                count++;
        }
    }
    return count;
}

Notice that the cast is no longer required. The compiler now knows the type of the enumerator that is returned, so no boxing is performed.

Notice also that the calls to MoveNext() and Current have been compacted into a single call to TryMoveNext(), making the interface more intuitive to use.

Benchmarks show that there’s actually no performance improvement, but now there are no heap allocations!

The NetFabric.Hyperlinq library makes extensive use of this interface. All its enumerables implement IValueEnumerable instead of IEnumerable. The foreach keyword and composed operations will call its methods. If you need to convert to IEnumerable, use the AsEnumerable() conversion operation.
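To illustrate the mapping, here is a hypothetical ArrayValueEnumerable wrapper (not the library’s actual implementation), using the generic interfaces from above, simplified so the sketch is self-contained:

```csharp
using System;

public interface IValueEnumerator<T> : IDisposable
{
    bool TryMoveNext(out T current);
}

public interface IValueEnumerable<T, TEnumerator>
    where TEnumerator : struct, IValueEnumerator<T>
{
    TEnumerator GetValueEnumerator();
}

// Hypothetical wrapper exposing an array as a value enumerable.
public readonly struct ArrayValueEnumerable<T> : IValueEnumerable<T, ArrayValueEnumerable<T>.Enumerator>
{
    readonly T[] source;
    public ArrayValueEnumerable(T[] source) => this.source = source;

    // Returns the concrete struct enumerator; no interface, no boxing.
    public Enumerator GetValueEnumerator() => new Enumerator(source);

    public struct Enumerator : IValueEnumerator<T>
    {
        readonly T[] source;
        int index;
        internal Enumerator(T[] source) { this.source = source; index = -1; }

        public bool TryMoveNext(out T current)
        {
            if (++index < source.Length)
            {
                current = source[index];
                return true;
            }
            current = default;
            return false;
        }

        public void Dispose() { }
    }
}
```

Enumeration is then a simple while loop over TryMoveNext(), with the enumerator living entirely on the stack.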

Still, no other collections implement IValueEnumerable…

Expression trees

I explained in my previous articles that foreach uses duck typing. It looks for the type returned by the public GetEnumerator() method. Can we implement this same behavior in LINQ operations? One solution is to use expression trees to generate code at runtime that will have the same behavior.
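As a reminder of that duck typing, here is a minimal sketch (a hypothetical Counter type, not part of the library): foreach compiles against any public GetEnumerator() whose return type has a MoveNext() method and a Current property, no interface required:

```csharp
using System;

// foreach only requires a public GetEnumerator() returning a type
// with a MoveNext() method and a Current property; no interface needed.
public struct Counter
{
    int value;
    public Counter GetEnumerator() => this; // copies the struct
    public int Current => value;
    public bool MoveNext() => ++value <= 3;
}

class DuckTypingDemo
{
    static void Main()
    {
        foreach (var i in new Counter())
            Console.Write(i); // prints 123
    }
}
```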

static class Dynamic
{
    public static class GetEnumerator<TEnumerable, TEnumerator, TSource>
        where TEnumerable : IEnumerable<TSource>
        where TEnumerator : IEnumerator<TSource>
    {
        public static Func<TEnumerable, TEnumerator> Invoke { get; } = Create();

        static Func<TEnumerable, TEnumerator> Create()
        {
            var enumerableType = typeof(TEnumerable);
            var getEnumerator = enumerableType.GetEnumeratorMethod<TSource>();
            if (getEnumerator is null || !typeof(TEnumerator).IsAssignableFrom(getEnumerator.ReturnType))
                throw new Exception($"'{enumerableType.FullName}': the enumerator type must be implicitly convertible to '{typeof(TEnumerator).FullName}'");

            var enumerable = Expression.Parameter(enumerableType, "enumerable");
            var body = Expression.Call(enumerable, getEnumerator);
            return Expression.Lambda<Func<TEnumerable, TEnumerator>>(body, enumerable).Compile();
        }
    }

    static MethodInfo GetEnumeratorMethod<TSource>(this Type enumerableType)
    {
        const string MethodName = "GetEnumerator";

        var method = enumerableType.GetMethod(MethodName, BindingFlags.Public | BindingFlags.Instance);
        if (!(method is null))
            return method;

        var type = typeof(IEnumerable<>).MakeGenericType(typeof(TSource));
        if (type.IsAssignableFrom(enumerableType))
            return type.GetMethod(MethodName);

        type = typeof(IEnumerable);
        if (type.IsAssignableFrom(enumerableType))
            return type.GetMethod(MethodName);

        return null;
    }
}

The extension method GetEnumeratorMethod() uses reflection to look for the most appropriate implementation of GetEnumerator(). Just like foreach, it first looks for a public non-static implementation, which can return a type that is not an interface. If none is found, it then checks for IEnumerable<T> or IEnumerable implementations, which always return interfaces.

Finding the method is not enough; it needs to be called. We could call it with Invoke(), but that is slow. The method Create() uses expression trees to generate a Func that calls the method returned by GetEnumeratorMethod(). The new method is compiled, making it fast and strongly typed.

Reflection and expression tree compilation are very slow and should not be performed frequently during the lifetime of the app. To avoid this, the resulting Func is stored in the static backing field implicitly generated by the read-only Invoke property. The method Create() is only called the first time the property is accessed.

Because the field is inside a generic class, the compiler automatically uses a different one for each combination of <TEnumerable, TEnumerator, TSource>. Getting the value from the field is very efficient. There is no need for a dictionary, which would have to calculate the hash code of the Type and look it up in the collision buckets.
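The per-type caching trick can be isolated into a tiny sketch (a hypothetical TypeName class, not from the library): each closed generic type gets its own static field, initialized lazily on first access.

```csharp
using System;

// Each combination of generic arguments gets its own copy of the
// static field, so the "cache lookup" is just a static field read.
static class TypeName<T>
{
    public static string Value { get; } = Compute();

    static string Compute()
    {
        Console.WriteLine($"computing for {typeof(T)}"); // runs once per T
        return typeof(T).Name;
    }
}

class GenericCacheDemo
{
    static void Main()
    {
        Console.WriteLine(TypeName<int>.Value);    // Int32 (Compute runs here)
        Console.WriteLine(TypeName<int>.Value);    // Int32 (cached field read)
        Console.WriteLine(TypeName<string>.Value); // String (separate field)
    }
}
```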

We can now implement a new Count() overload that takes advantage of the dynamically generated code. Notice that the cast is no longer required. No boxing or any other heap allocation happens in this version.

public static int Count<TEnumerable, TEnumerator, TSource>(this TEnumerable source, Func<TSource, bool> predicate)
    where TEnumerable : IEnumerable<TSource>
    where TEnumerator : IEnumerator<TSource>
{
    if (source == null) ThrowHelper.ThrowArgumentNullException(nameof(source));

    var count = 0;
    using (var enumerator = Dynamic.GetEnumerator<TEnumerable, TEnumerator, TSource>.Invoke(source))
    {
        while (enumerator.MoveNext())
        {
            if (predicate(enumerator.Current))
                count++;
        }
    }
    return count;
}

This solution is not compatible with AOT (ahead-of-time compilation).

Benchmarks

Benchmarking source.Count(_ => true), which iterates all the collection items:

  • NetFabric.Hyperlinq performs much better than LINQ in almost every case, up to 4 times faster. The exception is when the enumerator is a reference type, in which case the performance is about the same.
  • NetFabric.Hyperlinq only allocates on the heap when the enumerator is a reference type.

Benchmarking source.Where(_ => true).Select(item => item).Count(), which is the composition of multiple operations and iterates all the collection items:

  • NetFabric.Hyperlinq maintains more or less the same performance gains.
  • NetFabric.Hyperlinq only allocates on the heap when the enumerator is a reference type, and the allocation size is the same as in the previous benchmark, which means it allocates only one enumerator instance. LINQ’s allocation size increases compared to the previous benchmark.

Conclusion

It is possible to implement LINQ operations with zero allocation.

NetFabric.Hyperlinq implements extension methods for all .NET collections that have value-type enumerators so that this happens implicitly. To add support for other collections, just implement the equivalent mapping extensions.

It also supplies multiple overloads for each operation so that the best performance can be achieved in each case.

Reference-type enumerables and enumerators will always be allocated on the heap but are still supported.
