Improving performance of LINQ generation operations.
This is a follow up on my exploration into improving LINQ performance. This is a work in progress. Go to my previous article to find what triggered this.
The source code is available at https://github.com/NetFabric/NetFabric.Hyperlinq
LINQ is a library of operations on
IEnumerable<T>. These operations can be grouped into categories. One of these categories is the generation operations that includes all the operations that create a new sequence of values.
In this category we can find the following operations:
DefaultIfEmpty<TSource>(TSource defaultValue)— Replaces an empty collection with a default valued singleton collection.
Empty<TResult>()— Returns an empty collection.
Range(int start, int count)— Generates a collection that contains a sequence of numbers.
Repeat<TResult>(TResult element, int count)—Generates a collection that contains one repeated value a given number of times.
Repeat<TResult>(TResult value)—Generates a sequence by repeating the given value infinitely.
Create<TResult>(Func<IEnumerator<TResult>> getEnumerator)—Creates an enumerable sequence based on an enumerator factory function.
Note: I’m going to ignore
DefaultIfEmptyfor the moment. Unlike the other operations in this category, it converts an enumerable into another one.
I copied the descriptions from the documentation. The terms collection, sequence and enumerable are used loosely as all these operations return
IEnumerable<T>. There’s no
Note: I use the term enumerable for entities that have a public
GetEnumerator()method and the term enumerator for entities that have a
MoveNext()method and a
Currentproperty. Not only implementation of IEnumerable and IEnumerator. This is aligned with the
What these operations differ from the framework collections (
HashSet, etc.) is that they don’t allocate memory to store all its items, these are generated dynamically using deferred execution. Meaning that, the items are generated when
MoveNext() is called explicitly or, implicitly by a
foreach or an aggregation operation.
I’m going to start this series of articles by the generation operations as these allow simpler optimizations and can be used as the source of values for all other operations.
Click/tap on the following links to find my implementations:
Enumerables and enumerators
As mentioned above, all original implementations of the generation operations return
IEnumerable<T> . LINQ uses classes derived from an internal
Iterator<T> class, while Ix uses the
yield keyword. Both result in a reference type that implements both
GetEnumerator() returns a “clone” of itself, with the state reset.
For each operation and its overrides, I created a value type enumerable. I like to think of the enumerables as factories of enumerators. To separate concerns, I keep enumerable and enumerator implementations as separate structs, keeping the close relation by declaring the enumerator as an inner struct. (If there’s an advantage on having a single struct, please let me know!)
GetEnumerator() returns a new instance of the enumerator using its type, not
IEnumerator<T>, avoiding boxing of the enumerator.
IEnumerator<T> are implemented explicitly so that these implementations can also be used where required.
All enumerables are immutable so, the
readonly keyword is used on their declaration. This is a feature introduced in C# 7.2, that allows the compiler not to generate defensive copies, improving performance.
Enumerators keep the state of the enumeration which usually mutates when
MoveNext() is called so, they cannot always be declared as
readonly. In some special cases there is no mutation. That’s the case for
RepeatEnumerable<TResult>.Enumerator so, these are also declared as
The use of the keyword in all these cases resulted in major improvements in some of the benchmarking scenarios.
IReadOnlyCollection and IReadOnlyList
Interestingly, I found that the enumerables of most generation operations can implement
Count is well known and the value returned by the indexer can be inferred from the
This means that the
Countproperty can be used on the result of these operations, which has O(1) complexity, instead of the
Count()operation, which has O(n) complexity.
Count()can be O(1) for some cases, it’s not the case for generation operations. It doesn’t change even for this new implementation as
Count()still hasn’t been updated to be aware of
IReadOnlyCollection<T>. Like I mentioned in a previous article, it’s better to expose these as public API than as internal runtime optimizations.
The implementation of
IReadOnlyList<T> also means that random access and
for loops are now supported.
The benchmark calculates the sum of the items, using
for loops, with a number of items of 0, 100 and 10000. (The original implementations only support
The graphs show the mean time that it takes to calculate the sum. Shorter is better.
The new implementation results in major performance improvements.
The new implementation is only slower for
Repeat, when using
foreach and when empty. This is because the original implementation, by returning an interface, can return different implementations given the parameter values. When
count is 0, it returns
Empty() which is more efficient. This should not be a big issue as empty enumerables are less frequent and its
MoveNext() method is called only once.
There’s still a lot of work required to fully take advantage of these performance improvements.
foreach can use the value types without boxing but all the methods that take
IEnumerable<T> parameters need to use interface constraints.
Still, it shows that these small changes in the code can make a huge difference in the performance.