Enumeration in .NET II —
This is part of a series of articles:
- Enumeration in .NET
- Enumeration in .NET II —
- Enumeration in .NET III —
- Enumeration in .NET IV — Finding an item
- Enumeration in .NET V —
On my previous article, I analysed the usage of
IEnumerable strictly based on its contract. It was brought to my attention that LINQ to Objects optimizes some of the described scenarios, possibly breaking some of my assumptions. It’s a good point and I did some more research.
Checking the implementation of the
Count() extension method in LINQ to Objects, we can see that it handles differently the case where the
IEnumerable instance also implements
ICollection or one other internal interface. This optimization is tricky and let me explain why.
First, this optimization is based on knowledge of the existence of other interfaces. It works with the current framework as all its collections implement
ICollection, but who knows what interfaces and collections will be added in the future?
IReadOnlyCollection was added to the framework after LINQ and it looks like they didn’t update this code. What about third-party libraries?
Second, the use of LINQ operators breaks the optimization. Let’s see…
The following code benchmarks the
Count() extension method applied to an
IEnumerable, a filtered
ICollection and a filtered
ICollection, for 0, 10 and 100 elements:
NOTE: I initially used
Enumerable.Range() for the
IEnumerable instance but the results where not what I expected. It turns out it implements the internal interface that is optimized in
Count(). The method
MyRange(), implemented at the bottom, returns a pure
The results of the benchmark were the following:
Enumerable_Count allocates on the heap and its mean time increases directly with the number of elements, which is all bad.
Also expected from the
Collection_Count is much faster than
Enumerable_Count, doesn’t allocate on the heap and has constant mean time whatever number of elements, which is all great.
What it’s important here is that, although
FilteredCollection_Count was created as a collection, its mean time increases directly with the number of elements. It’s a smoother increase than with
FilteredEnumerable_Count but still a complexity of O(n).
When you have a method with an
IEnumerable argument, it’s assumed you have no idea how it was created. You should not expect
Count() to have any complexity other than O(n).
The optimization seems to be misleading and useless.
If you follow the rules from my previous article, you won’t have any surprises…