Enumeration in .NET V — ToList(), or not ToList(), that is the question

Antão Almada
Jul 25, 2018

EDIT: Updated to .NET 8 and improved content.

This is part of a series of articles on enumeration in .NET.

Introduction

I frequently find other developers using a ToList() or ToArray() at the end of every LINQ query:

var sequence = Enumerable.Range(0, 10)
    .ToList();

foreach (var number in sequence)
    Console.WriteLine(number);

Most of the time this is not necessary and can have a big impact on performance.

Lazy evaluation

LINQ queries use lazy evaluation. This means that defining a query doesn't do any work by itself. The items have to be pulled, one at a time, by calling MoveNext() on an enumerator obtained from the query. This is usually done by simply using a foreach loop.
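To make the laziness visible, here is a minimal sketch that counts how many times the projection lambda runs. The `evaluated` counter and the other variable names are only there for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var evaluated = 0;

// Defining the query runs nothing: the lambda has not been called yet.
var query = Enumerable.Range(0, 5)
    .Select(number =>
    {
        evaluated++; // side effect, only here to observe when the work happens
        return number * 2;
    });

var evaluatedBeforeEnumeration = evaluated;

// foreach calls GetEnumerator() and then MoveNext() until it returns false.
var results = new List<int>();
foreach (var number in query)
    results.Add(number);

var evaluatedAfterEnumeration = evaluated;

// prints "0 -> 5": the lambda ran once per item, and only while enumerating
Console.WriteLine($"{evaluatedBeforeEnumeration} -> {evaluatedAfterEnumeration}");
```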

The ToList() from the example can be removed:

var sequence = Enumerable.Range(0, 10);

foreach (var number in sequence)
    Console.WriteLine(number);

In this case, the foreach will pull the values directly from the Enumerable.Range() and output them to the console.

NOTE: Unfortunately, the GetEnumerator() method of the enumerable returned by Enumerable.Range() returns a reference-type enumerator, so it has to be allocated on the heap. The GetEnumerator() methods of List<T> and other collections in the framework return a value-type enumerator, which the foreach loop does not box, resulting in no heap allocations and better performance.
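The difference between the two kinds of enumerators can be checked with reflection. This is only a quick sketch: the concrete type behind Enumerable.Range() is an implementation detail of the runtime, so the code inspects whatever instance it happens to get.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// List<T>.Enumerator is a public struct, so foreach over a List<T> variable
// can use it without any heap allocation.
var listEnumeratorIsValueType = typeof(List<int>.Enumerator).IsValueType;

// The enumerator behind Enumerable.Range() is only exposed as IEnumerator<int>;
// the object implementing it is a class instance living on the heap.
var rangeEnumerator = Enumerable.Range(0, 3).GetEnumerator();
var rangeEnumeratorIsValueType = rangeEnumerator.GetType().IsValueType;
rangeEnumerator.Dispose();

Console.WriteLine($"List<int>.Enumerator is a value type: {listEnumeratorIsValueType}");
Console.WriteLine($"Range enumerator is a value type: {rangeEnumeratorIsValueType}");
```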

Arrays and List<T>

Arrays and List<T> are data structures that allow random access. They provide an indexer that finds the item just by calculating its offset from the beginning of the allocated memory. They also allow sequential access by providing a GetEnumerator() method.

An array is a contiguous portion of allocated memory. Its Length property returns the number of items that fit in the allocated memory. Resizing an array means allocating a new array, copying the items and releasing the original array. This is an expensive operation.

A List<T> contains an array. Its Capacity property returns the number of items that fit in this internal array. Its Count property returns the number of items stored in the list. Capacity is always greater than or equal to Count.

When an item is added to a List<T>, Count is incremented. If the new item does not fit in the internal array (Count would exceed Capacity), the array has to be resized which, as explained above, is an expensive operation. When resizing, Capacity is doubled, up to a maximum size.
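The growth of the internal array can be observed by recording Capacity every time it changes. A small sketch follows. The exact sequence of capacities is an implementation detail of the runtime, so it is printed rather than assumed. The `presized` list shows that passing the expected size to the constructor avoids the resizes altogether.

```csharp
using System;
using System.Collections.Generic;

var list = new List<int>();
var capacities = new List<int> { list.Capacity };

for (var i = 0; i < 100; i++)
{
    list.Add(i);

    // Record the capacity only when the internal array was replaced.
    if (list.Capacity != capacities[^1])
        capacities.Add(list.Capacity);
}

// Each entry after the first corresponds to one resize (allocate + copy).
Console.WriteLine(string.Join(" -> ", capacities));

// When the final size is known up front, a single allocation is enough.
var presized = new List<int>(100);
for (var i = 0; i < 100; i++)
    presized.Add(i);

Console.WriteLine($"Pre-sized capacity: {presized.Capacity}");
```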

ToList() and ToArray()

ToList() converts an enumerable into a List<T>, while ToArray() converts an enumerable into an array.

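Their behavior depends on the source type:

  • If the size of the source is known in advance, the result is allocated once with the exact size and the items are copied into it.
  • If the size of the source is unknown, the items have to be pulled one at a time into a buffer that is resized whenever it fills up.

In either case, memory has to be allocated on the heap and all the items copied. This adds pressure to the garbage collector and is not a cheap operation. The second case can be worse than the first, depending on the number of resizes required.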

When using ToList() or ToArray() after a collection of known size:

var sequence = Enumerable.Range(0, 10)
    .ToList();

foreach (var number in sequence)
    Console.WriteLine(number);

These operations have to allocate memory once and copy all the items.

When using ToList() or ToArray() after a collection of unknown size:

var sequence = Enumerable.Range(0, 10)
    .Where(_ => true)
    .ToList();

foreach (var number in sequence)
    Console.WriteLine(number);

NOTE: The Where(_ => true) is here just for benchmarking purposes. It hides the known size of the Enumerable.Range() behind a plain IEnumerable<T> while returning the same number of items. It represents the output of most regular LINQ queries.

Now these operations have to allocate memory one or more times and copy all the items, pulling them one at a time just like the foreach loop in the first example.
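The two situations can be sketched with a simplified, hypothetical helper. `ToListSketch` is not the real implementation of Enumerable.ToList(), which contains more special cases and optimizations. It only illustrates the known-size and unknown-size paths, assuming that "known size" means the source implements ICollection<T>.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration only.
static List<T> ToListSketch<T>(IEnumerable<T> source)
{
    if (source is ICollection<T> collection)
    {
        // Known size: allocate the internal array once and copy in bulk.
        var exact = new List<T>(collection.Count);
        exact.AddRange(collection);
        return exact;
    }

    // Unknown size: pull the items one by one; the list resizes its
    // internal array every time it runs out of capacity.
    var grown = new List<T>();
    foreach (var item in source)
        grown.Add(item);
    return grown;
}

var fromArray = ToListSketch(new[] { 1, 2, 3 });
var fromQuery = ToListSketch(Enumerable.Range(0, 10).Where(_ => true));

Console.WriteLine($"From array: Count={fromArray.Count}, Capacity={fromArray.Capacity}");
Console.WriteLine($"From query: Count={fromQuery.Count}, Capacity={fromQuery.Capacity}");
```

In the second case, Capacity may end up larger than Count, which means some of the allocated memory is never used.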

Performance

Let's compare the performance by using BenchmarkDotNet to run the following code:

using System.Linq;
using BenchmarkDotNet.Attributes;

public class ToListBenchmarks
{
    // Number of items in the source sequence.
    [Params(10, 1_000)]
    public int Count { get; set; }

    [Benchmark(Baseline = true)]
    public long LazyEvaluation()
    {
        var sequence = Enumerable.Range(0, Count)
            .Where(_ => true);

        var sum = 0L;
        foreach (var item in sequence)
            sum += item;
        return sum;
    }

    [Benchmark]
    public long ToList()
    {
        var sequence = Enumerable.Range(0, Count)
            .Where(_ => true)
            .ToList();

        var sum = 0L;
        foreach (var item in sequence)
            sum += item;
        return sum;
    }

    [Benchmark]
    public long ToArray()
    {
        var sequence = Enumerable.Range(0, Count)
            .Where(_ => true)
            .ToArray();

        var sum = 0L;
        foreach (var item in sequence)
            sum += item;
        return sum;
    }
}

It compares the performance for sources of two sizes: a small one (10 items) and a relatively large one (1,000 items).

It uses Where(_ => true) so that the size of the source is not known to the methods ToList() and ToArray().

I also configured it to use .NET 6, .NET 7, and .NET 8 (all the “modern” .NET runtimes).

NOTE: The benchmarking methods return the sum of the items so that the JIT compiler doesn’t remove any code that it considers unused.

One thing to note is that the performance improves significantly between .NET 7 and .NET 8. That’s one good reason to upgrade to .NET 8 as soon as possible.

Notice the memory allocated on the heap. Lazy evaluation only allocates one enumerator (96 bytes) and is around 20% faster than both alternatives. ToList() and ToArray() allocate 312 bytes for 10 int values and more than 8 KB for 1,000 int values.
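For a quick sanity check outside of BenchmarkDotNet, the allocations can be roughly compared with GC.GetAllocatedBytesForCurrentThread(). This is only a sketch and not a substitute for a proper benchmark. The `AllocatedBy` helper is hypothetical, and the absolute numbers it reports depend on the runtime version, so they are printed rather than assumed.

```csharp
using System;
using System.Linq;

// Hypothetical helper: bytes allocated on this thread by one call to action.
static long AllocatedBy(Func<long> action)
{
    action(); // warm up, so one-time allocations are not counted
    var before = GC.GetAllocatedBytesForCurrentThread();
    action();
    return GC.GetAllocatedBytesForCurrentThread() - before;
}

const int count = 1_000;

Func<long> lazy = () =>
{
    var sum = 0L;
    foreach (var item in Enumerable.Range(0, count).Where(_ => true))
        sum += item;
    return sum;
};

Func<long> toList = () =>
{
    var sum = 0L;
    foreach (var item in Enumerable.Range(0, count).Where(_ => true).ToList())
        sum += item;
    return sum;
};

var lazyBytes = AllocatedBy(lazy);
var toListBytes = AllocatedBy(toList);

Console.WriteLine($"Lazy evaluation: {lazyBytes} bytes");
Console.WriteLine($"ToList():        {toListBytes} bytes");
```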

This benchmark measures the combined time of the conversion and of enumerating the result. As I explained in my other article “Array iteration performance in C#”, iterating an array is much faster than iterating a List<T>. We should also benchmark the conversion independently of the iteration:

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;

public class ToListBenchmarks
{
    // Number of items in the source sequence.
    [Params(10, 1_000)]
    public int Count { get; set; }

    [Benchmark(Baseline = true)]
    public List<int> ToList()
        => Enumerable.Range(0, Count)
            .Where(_ => true)
            .ToList();

    [Benchmark]
    public int[] ToArray()
        => Enumerable.Range(0, Count)
            .Where(_ => true)
            .ToArray();
}

Notice that most of the allocated memory comes from the conversion operation and that it increases with the number of items in the sequence.

Conclusions

ToList() and ToArray() always allocate on the heap. The size depends on the number of items and the size of each item.

Heap allocations add pressure to the garbage collector. If you allocate small objects and keep them for a short period of time, they will be handled by a Gen 0 collection, which is fast but not free. If you allocate a large object (85,000 bytes or more), it goes directly into the LOH (Large Object Heap), which can become fragmented and is expensive to collect.
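The effect of the size threshold can be observed with GC.GetGeneration(). In this small sketch, the array sizes are arbitrary values picked to fall on each side of the 85,000-byte limit. Objects in the LOH are reported as belonging to the highest generation.

```csharp
using System;

var small = new int[1_000];  // about 4 KB, a regular small-object allocation
var large = new int[25_000]; // about 100 KB, above the 85,000-byte threshold

var smallGeneration = GC.GetGeneration(small);
var largeGeneration = GC.GetGeneration(large);

Console.WriteLine($"small array: generation {smallGeneration}");
Console.WriteLine($"large array: generation {largeGeneration} (max is {GC.MaxGeneration})");

// Keep both arrays reachable until after the measurements.
GC.KeepAlive(small);
GC.KeepAlive(large);
```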

These methods should be used to cache the result of a query when:

  • It’s going to be iterated more than one time.
  • The total size is guaranteed to fit in memory.

A method that returns the result of a query should not call any of these methods. It should return the result directly or cast it to IEnumerable<T>. It is up to the caller to decide whether the result should be cached and, if so, whether ToList() or ToArray() is more appropriate.
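As an illustration of this guideline, here is a sketch with a hypothetical `GetEvenNumbers` method that returns the lazy query, and two callers with different needs. The `executions` counter exists only to show how many times the predicate runs.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var executions = 0;

// Hypothetical query method: returns the query itself, with no ToList() inside.
IEnumerable<int> GetEvenNumbers(int count)
    => Enumerable.Range(0, count)
        .Where(number =>
        {
            executions++; // only here to count how often the predicate runs
            return number % 2 == 0;
        });

// Caller A iterates once: no cache is needed and nothing extra is allocated.
var sum = 0;
foreach (var number in GetEvenNumbers(10))
    sum += number;
var executionsAfterSingleUse = executions;

// Caller B needs the result more than once, so it decides to cache it.
var cached = GetEvenNumbers(10).ToArray();
var length = cached.Length;
var max = cached.Max();
var executionsAfterCachedUse = executions;

// prints "sum=20, length=5, max=8, executions=10 then 20"
Console.WriteLine(
    $"sum={sum}, length={length}, max={max}, " +
    $"executions={executionsAfterSingleUse} then {executionsAfterCachedUse}");
```

Without the ToArray() call, caller B would run the predicate again on every pass over the query.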
