Enumeration in .NET II — Count()

Antão Almada
3 min readJun 9, 2018
Giraffe @ Badoca Safari Park by aalmada

This is part of a series of articles:

Count()

On my previous article, I analysed the usage of IEnumerable strictly based on its contract. It was brought to my attention that LINQ to Objects optimizes some of the described scenarios, possibly breaking some of my assumptions. It’s a good point and I did some more research.

Checking the implementation of the Count() extension method in LINQ to Objects, we can see that it handles differently the case where the IEnumerable instance also implements ICollection or one other internal interface. This optimization is tricky and let me explain why.

First, this optimization is based on knowledge of the existence of other interfaces. It works with the current framework as all its collections implement ICollection, but who knows what interfaces and collections will be added in the future? IReadOnlyCollection was added to the framework after LINQ and it looks like they didn’t update this code. What about third-party libraries?

Second, the use of LINQ operators breaks the optimization. Let’s see…

The following code benchmarks the Count() extension method applied to an IEnumerable, a filtered IEnumerable, an ICollection and a filtered ICollection, for 0, 10 and 100 elements:

using BenchmarkDotNet.Attributes;
using System.Collections.Generic;
using System.Linq;

namespace EnumerationBenchmarks
{
[MemoryDiagnoser]
public class CountBenchmarks
{
IEnumerable<int> enumerable;
IEnumerable<int> filteredEnumerable;
IEnumerable<int> collection;
IEnumerable<int> filteredCollection;

[Params(0, 10, 100)]
public int ItemsCount { get; set; }

[GlobalSetup]
public void Setup()
{
enumerable = MyRange(0, ItemsCount);

filteredEnumerable = enumerable
.Where(_ => true);

collection = enumerable
.ToList();

filteredCollection = collection
.Where(_ => true);
}

[Benchmark]
public int Enumerable_Count() =>
enumerable.Count();

[Benchmark]
public int FilteredEnumerable_Count() =>
filteredEnumerable.Count();

[Benchmark]
public int Collection_Count() =>
collection.Count();

[Benchmark]
public int FilteredCollection_Count() =>
filteredCollection.Count();

static IEnumerable<int> MyRange(int start, int count)
{
var end = start + count;
for (var value = start; value < end; value++)
yield return value;
}
}
}

NOTE: I initially used Enumerable.Range() for the IEnumerable instance but the results where not what I expected. It turns out it implements the internal interface that is optimized in Count(). The method MyRange(), implemented at the bottom, returns a pure IEnumerable instance.

The results of the benchmark were the following:

As expected, Enumerable_Count allocates on the heap and its mean time increases directly with the number of elements, which is all bad.

Also expected from the Count() implementation, Collection_Count is much faster than Enumerable_Count, doesn’t allocate on the heap and has constant mean time whatever number of elements, which is all great.

What it’s important here is that, although FilteredCollection_Count was created as a collection, its mean time increases directly with the number of elements. It’s a smoother increase than with FilteredEnumerable_Count but still a complexity of O(n).

Conclusion

When you have a method with an IEnumerable argument, it’s assumed you have no idea how it was created. You should not expect Count() to have any complexity other than O(n).

If you can optimize the algorithm given more capabilities, make it explicit by using other interfaces like IReadOnlyCollection, IReadOnlyList, IReadOnlyDictionary or any other that makes sense.

The optimization seems to be misleading and useless.

If you follow the rules from my previous article, you won’t have any surprises…

--

--