This is part of a series of articles:
- Enumeration in .NET
- Enumeration in .NET II —
- Enumeration in .NET III —
- Enumeration in .NET IV — Finding an item
- Enumeration in .NET V —
Enumeration in .NET
All .NET developers know and use
IEnumerable but when reviewing code I’ve found that many fall into the same pitfalls, resulting in applications with easily avoidable performance issues.
IEnumerable is very useful and hard to avoid. In this article I will explore some of its nuances so you can make a better use of its features.
NOTE: This article contains links to SharpLab where you can test the code examples. Click on the links and change the code at will to get a better grasp of the concepts explained here.
Enumeration in .NET is based on two very simple interfaces:
IEnumerator. There is a non-generic and a generic version for each but lets now focus only on their roles.
IEnumerableis simply a factory of
IEnumerators— Its only method
GetEnumerator(), returns a new instance of an object that implements
IEnumeratorperforms the enumeration — The method
MoveNext()advances the enumeration to the next position and the
Currentproperty returns the value for that position. The
Reset()method is provided for COM interoperability and it may not be supported by the collection.
Given these interfaces, these are some important points to highlight:
- They don’t allow collection mutations. Only read operations.
- They don’t allow random access. Only sequential enumeration.
- They don’t give access to any other collection information. Only allow enumeration and that’s all.
In this article I will explore further the consequences of these.
IEnumeratorfrom the same
IEnumerableshould be independent and be able to have different enumeration positions. If not, when enumerating more than once on the same
IEnumerablewill result in different value sequences. I’ve found some providers that don’t respect this rule. Beware of this as some of the rules I mention here may not work correctly in those cases.
The simplest way to enumerate is to use a
foreach loop. Lets start with a simple method that takes an
IEnumerable<int> and outputs all its items to the console:
You can check at SharpLab that, in reality, the above code is expanded to a
while loop. Focusing on the enumeration, we can resume it to this:
Notice that, first a new instance of an
IEnumerator is created. Next, it starts a while loop calling
MoveNext(), stopping when false is returned. When it’s true, the
Current property returns the new value.
The generic version of
IDisposable so the using keyword is used to take care of that.
This is very simple but lets now focus on the nuances…
Count() vs. Any()
It’s good practice to validate arguments and avoid unnecessary computation, so I’ve found the following very often:
It uses the
Count() extension method to check if the enumerable is empty and exits early. Unfortunately this does way more computation than most expect…
From the highlighted points on the enumeration interfaces, we already know that these can only enumerate sequentially and have no more information on the collection.
Count() implementation looks something like this:
Notice that it has to enumerate all the items, counting the number of times
true. As a consequence,
Count() has a complexity of O(n).
This means the previous code example is enumerating the collection three times. First to check if it’s empty, second to calculate the sum and third to get the number of items.
NOTE: LINQ uses an optimization where, if the collection implements
ICollection, it simply calls the
Countproperty. The issue is that many collections don’t implement it. To avoid any surprise, I always assume they don’t.
There is a better solution.
Any() is an extension method that returns true if the enumerable contains at least one element; otherwise
Any() implementation looks something like this:
Notice that it creates an instance of
IEnumerator and then calls
MoveNext() only once. There is no loop!
Any() has a complexity of O(1).
You should always use
Any() to check if an enumerable is empty:
Usually, you don’t even need to check if it’s empty. The
foreach doesn’t enter the loop in that case. Here’s an implementation of
Average() that creates one single instance of
IEnumerator and enumerates the collection only once:
If required, always use Any() to find if an IEnumerable contains any element.
Count() vs. Count
Sometimes finding if an enumerable is empty is not enough. You may need to get the actual number of items.
Count() is an extension method for the
IEnumerable interface while
Count is a property of the
IReadOnlyCollection interface. When using them, the only noticeably difference is the existence or absence of the parentheses, but their behavior is very different.
IReadOnlyCollection is implemented by most collection in the .NET framework. These keep the count value as a private field or can easily calculate it from other private fields allowing
Count to have a complexity of O(1).
If you return one of these, instead of the usual
IEnumerable, the callers will be able to both enumerate and efficiently perform other supported operations, while maintaining immutability. You can return other collection interfaces or types if required.
You can add one more
Average() extension method for
IReadOnlyCollection that takes advantage of availability of the
Count property and guaranteeing that the method will not mutate the collection. It improves performance by not creating the instance of
IEnumerator and not calling
MoveNext() when the collection is empty:
ICollectionshould derive from
IReadOnlyCollection, for historical reasons and to maintain backwards compatibility, it doesn’t. So, you cannot pass an
ICollectionto this method. To fully support all scenarios, you should have one more extension method for
ICollection. On the other hand, this causes ambiguity problems for collections that implement both interfaces, forcing you to cast to one of them.
Always expose the highest admissible level interface of the returned collection, and consume the lowest admissible level interface. IEnumerable, IReadOnlyCollection, IReadOnlyList and IReadOnlyDictionary maintain immutability.
The implementation of
MyRange(), found above, fills a list with all the range values and then returns it. This is inefficient and it’s an issue as it can fill up memory. The solution is to use the
In this case, the compiler automatically generates an implementation of
IEnumerable that enumerates the values without allocating a collection in memory. You can check at SharpLab the code generated that supports this behavior.
In the code example above, the method validates the argument count and throws an exception if it’s less than zero. You would expect it to be thrown immediately after the method is called. It doesn’t. It’s only thrown after the first call to
MoveNext(). Go to SharpLab and play with the count value to visualize this behavior.
To have the intended behavior, we need to split the method into two parts:
- immediate for validation,
- lazy for enumeration.
This can be done using a local function:
Lazy evaluation is the ability to “only execute code when needed and only as many times as required regardless of where in the execution pipeline the code sits”. You should take advantage of this to minimize resource usage when generating enumerables.
Use yield when generating enumerables.
I’ve found way too often the use of
ToArray() when developers want to reuse the result of a LINQ expression:
This results in unnecessary memory allocation and copying of all the enumeration values into it. This results in a big performance penalty. You should only use these methods when it’s strictly necessary to cache the values and it’s guaranteed that they always fit in memory.
LINQ expressions can be reused, maintaining the lazy evaluation behavior:
Use composition to reuse expressions, maintaining lazy evaluation.
null vs. Enumerable.Empty<T>()
I’ve seen many instances where developers return
null when they really mean that it’s empty. It’s important to understand that these have two very different meanings:
- null — an invalid state where the
IEnumerableinstance hasn’t been created,
- empty — a valid state where the enumeration contains no values.
null breaks the use of the foreach and method composition. You can see at SharpLab how badly it behaves.
You must return an empty enumeration. The
System.Linq namespace contains a
Enumerable.Empty<T>() static method that returns exactly that; an
IEnumerable implementation that generates
IEnumerator implementations where the method
MoveNext() always returns false.
Here is an example that exemplifies the use of it:
yield, you can exit the enumeration using
yield break and, if no value was returned until then, it results in an empty enumeration:
Never return a null enumerable. Use Enumerable.Empty<T>() or yield break instead.
System.Linq namespace, also known as “LINQ to Objects”, contains many extension methods for
IEnumerable. Not having random access enumeration support, means that full scans are required to find elements and that memory allocations with value copies are required in operations like ordering or grouping.
Databases have engines optimized for these operations and support much larger amounts of data.
IQueryable is an enumeration interface that converts the LINQ expression tree into something equivalent that a database can process. The execution of the query is delegated to the database.
IQueryable derives from
IEnumerable so the result can be enumerated just like we’ve been doing until now.
The .NET drivers for databases usually support
IQueryable and I’m going to show a couple of examples, but there are other providers like LinqToExcel, LinqToTwitter, LinqToCsv, LINQ-to-BigQuery, LINQ-to-GameObject-for-Unity, ElasticLINQ, GpuLinq and many more.
Entity Framework Core
The Entity Framework (EF) is an object-relational mapping (ORM) that includes “LINQ-to-SQL”, meaning it supports LINQ queries on a SQL database engine.
The following is an example based on the database model defined in the “Entity Framework Core Quick Overview” article. It applies a LINQ query on the
Blogs property, that is of type
DbSet<T> implements the
IQueryable interface which, in this case, converts the LINQ expression tree to SQL. The
ToSql() method, used on line 9, returns the generated string that will be sent to the database. The example above outputs the following:
SELECT TOP(5) [blog].[Url]
FROM [Blogs] AS [blog]
WHERE [blog].[Rating] > 3
ORDER BY [blog].[Rating] DESC
Notice that it’s equivalent to the LINQ query defined in the code. This means it will be fully performed by the database engine. The
foreach just has to enumerate the result returned. No filtering, projection, ordering or element counting is performed directly by LINQ operators.
Now imagine that you want to apply LINQ operators to the query result. Don’t use
AsEnumerable() instead. It will keep the lazy evaluation, without memory usage.
The following example performs a projection on the results, formatting the string that will be written to the console:
This is particularly useful with NOSQL databases as they usually support a limited set of operations defined in LINQ. You should still design the data model to execute the query as much as possible by the database engine.
DataStax C# Driver for Apache Cassandra
The following code is based on a data model similar to the one from the previous example and applies a query on the
blogs_by_rating table. The mapping is defined by the attributes on the
BlogByRating class located at the bottom.
CqlQuery<T>.ToString() returns the CQL query that is sent to the database. The line 18 of the code above, outputs the following:
SELECT "url" FROM "blogs_by_rating" WHERE "rating" = ?
The question mark indicates it uses a prepared statement. It means it will be reused and the values bound when called.
IEnumerable so you can use LINQ operators directly without the need of
NOTE: This driver is one of those cases mentioned above where enumerating more than once result in different value sequences. This is due to how
GetEnumerator()is implemented where resulting
IEnumeratorinstances share state. In this case, you may have to use
ToList()to keep the sequence in memory or call
Execute()before each enumeration to get a new
Execute as much as possible of the query within the IQueryable. If required, use AsEnumerable() to maintain lazy evaluation.
As a summary, these are the conclusions for each of the points focused above:
- If required, always use
Any()to find if an
IEnumerablecontains any element.
- Always expose the highest admissible level interface of the returned collection, and consume the lowest admissible level interface.
yieldwhen generating enumerables.
- Use composition to reuse expressions, maintaining lazy evaluation.
- Never return a null enumerable. Use
- Execute as much as possible of the query within the
IQueryable. If required, use
AsEnumerable()to maintain lazy evaluation.
I hope you liked this article and find it useful. Share it with your team.
NOTE: This article focused on the client side of the enumeration. If you want to implement
IEnumerable, you can find some tips in my other article “How to use
Memory<T>”. If you want to implement
IQueryable, check the open-source project Relinq.