Building Custom Iterators with ‘yield’ in C#
An enumerator represents a pull stream. Clients have to call the method MoveNext()
to request the next item. If it returns true
, it means that it successfully retrieved it and the value will be provided by the Current
property. The enumerator is implemented as a state machine. The state is composed of the current position in the stream and the current value.
An enumerable is simply a factory of enumerators. It returns a new instance of its enumerator every time GetEnumerator()
is called.
These methods can be explicitly called or implicitly by using a foreach loop.
yield return
yield return
in C# makes it very simple to implement pull streams. It can be used inside of any method that returns IEnumerable
, IEnumerable<T>
or IAsyncEnumerable<T>
. For example:
IEnumerable<int> GetValues()
{
yield return 1;
yield return 2;
}
This method represents a stream that contains the values 1 and 2. The method execution can be interpreted as follow:
- Executes code until it reaches a
yield return
or the end of the method. - If it was a
yield return
then the method returns the value provided. Otherwise, the execution terminates. - Next time the method is called, it resumes the execution immediately after the previous executed
yield return
. - Go to step 1.
In reality, the method is executed only once. The method returns an instance of an enumerable where the MoveNext()
method and the Current
property of its respective enumerator are used to retrieve the values.
What happens is that the compiler, when it finds the keyword yield
, generates all the code to implement the required state machine. Using SharpLab we can see that the method returns an instance of the following class. Here I simplified its code to make it easier to understand:
sealed class GetValues
: IEnumerable<int>
, IEnumerator<int>
{
int state;
int current;
int initialThreadId;
int IEnumerator<int>.Current
=> current;
object IEnumerator.Current
=> current;
public GetValues(int state)
{
this.state = state;
initialThreadId = Environment.CurrentManagedThreadId;
}
void IDisposable.Dispose() {}
bool MoveNext()
{
switch (state)
{
default:
return false;
case 0:
state = -1;
current = 1;
state = 1;
return true;
case 1:
state = -1;
current = 2;
state = 2;
return true;
case 2:
state = -1;
return false;
}
}
bool IEnumerator.MoveNext()
=> MoveNext();
void IEnumerator.Reset()
=> throw new NotSupportedException();
IEnumerator<int> IEnumerable<int>.GetEnumerator()
{
if (state == -2 && initialThreadId == Environment.CurrentManagedThreadId)
{
state = 0;
return this;
}
return new GetValues(0);
}
IEnumerator IEnumerable.GetEnumerator()
=> ((IEnumerable<int>)this).GetEnumerator();
}
The important part to understand is how MoveNext()
is implemented. Every time it’s called, it executes accordingly to the current value of state. For cases where state is 0 or 1, it sets the value of Current
, transitions to the next state and returns true
. Once it reaches 2, it then transitions to the default state and from then on it always returns false
. Meaning that no more values are available.
NOTE: The
async
andawait
keywords can be described exactly the same way but, instead of returning an instance of an enumerable, they return an instance ofTask<T>
orValueTask<T>
.
This previous example is very simple just so that the state machine can be easily understandable but these can become much more complex. For example, the Range()
method can be implemented as follow:
static IEnumerable<int> Range(int start, int count)
{
var end = start + count;
for (var value = start; value < end; value++)
yield return value;
}
It returns a sequence of integers. The same yield return
is called multiple times with different values. When the end value is reached, the loop terminates and so does the method as there’s no more code to be executed. You can experiment with it in SharpLab.
You can implement a Range()
that pauses before returning a value as follow:
static async IAsyncEnumerable<int> RangeAsync(int start, int count, int millisecondsDelay)
{
var end = start + count;
for (var value = start; value < end; value++)
{
await Task.Delay(millisecondsDelay);
yield return value;
}
}
Notice the use of the async
and await
keywords. This makes the code asynchronous which allows the application to execute other portions of the code while this portion is paused. You can experiment with it in SharpLab.
You can also process other enumerables using yield
. The Select()
method can be implemented as follow:
static IEnumerable<TResult> Select<TSource, TResult>(IEnumerable<TSource> source, Func<TSource, TResult> selector)
{
foreach(var item in source)
yield return selector(item);
}
It returns all the items from source but projected by the selector
. You can experiment with it in SharpLab.
The Where()
method can be implemented as follow:
static IEnumerable<T> Where<T>(IEnumerable<T> source, Func<T, bool> predicate)
{
foreach(var item in source)
{
if (predicate(item))
yield return item;
}
}
It returns all the items from the source for which the predicate
returns true
for the item
. You can experiment with it in SharpLab.
yield break
yield break
terminates the method execution. Basically, it’s a shortcut that transitions the state immediately to the default state. For example, the Empty()
method can be implemented as follow:
static IEnumerable<T> Empty<T>()
{
yield break;
}
It returns an empty stream. It immediately breaks so that MoveNext()
will return false
on the first call. You can experiment with it in SharpLab.
Here’s also an alternative implementation of Range()
:
static IEnumerable<int> Range(int start, int count)
{
var value = start;
var end = start + count;
while(true)
{
if (value == end)
yield break;
yield return value;
value++;
}
}
This implementation uses a infinite loop with an yield break
to terminate it. You can experiment with it in SharpLab.
Performance
If you are cautious about performance issues, then you should be aware of the following.
yield
generates an enumerable and its respective enumerator types. The use of enumerables is usually much slower than using the indexer provided by the array, spans, and all types that implement IReadOnlyList<T>
or IList<T>
.
A method that uses yield
must return IEnumerable
, IEnumerable<T>
or IAsyncEnumerable<T>
, which implies that the enumerator must be of type IEnumerator
, IEnumerator<T>
or IAsyncEnumerator<T>
respectively. These are interfaces which are reference types. The performance is not as good as with value type enumerators and the enumerator is allocated on the heap.
NOTE: Check my other article “Performance of value-type vs reference-type enumerators in C#” to understand how the enumerator type influences its performance.
Yet, the lazy-evaluation nature of the enumerables allows the generation of streams and its processing with none or minimal memory allocations as you can see by the examples provided. yield
is a great tool that can save you a lot of development and testing time. If you, for some reason, require more performance, you should explore alternatives or implement your own custom implementation of an enumerable.
NOTE: Check my other article “LINQ Internals: Speed Optimizations” to learn about the optimisations used by LINQ to improve performance.
Coroutines
Coroutines have been described as “functions whose execution you can pause”. Meaning that it’s a method that can resume execution. As we’ve already seen here, that can be implemented by using enumerables and yield
.
The concept is extensively used for game development in Unity. Game engines work by executing code, followed by a frame rendering, repeating these two steps in an infinite loop. The fast execution of this loop gives the impression of movement, just like with a movie. Coroutines are methods that can spread execution across several frames.
A coroutine in Unity is simply a method that returns IEnumerable
. The Unity engine will then call the respective MoveNext()
method to resume execution before every frame rendering. The use of yield
makes the code easier to write and maintain, instead of having to implement a complex state machine.
Behavior Trees
Behavior trees are used in game development and robotics to specify the behavior of agents or robots in a composable and reusable way. That’s where the name came from but the concept can be generalised to synchronously manage code execution in any other computer science field.
A behavior tree is actually multiple coroutines composed as a tree. The branch modules are coroutines that take one or more modules as parameters, its children modules. Each module is responsible for calling the MoveNext()
method of its children modules as appropriate. Leaf modules are coroutines that perform simple tasks. The composition of multiple modules as a tree allow the creation of complex tasks.
A behavior tree module can be in one of the three states: running, succeed or failed. So, a module must return IEnumerable<BehaviorStatus>
, being BehaviorStatus
a type that can represent the 3 states. This way, when a parent module calls the MoveNext()
method of one of its children, it knows if this child is still running, or finished in success or failure.
Conclusions
The yield
keyword is a powerful tool provided by the C# compiler that can be used to implement collections and streams but, in general, it can be used to implement easy to maintain state machines. It can also be used as a synchronous way of managing code execution.
NOTE: For asynchronous code execution check the
async
andawait
keywords: https://learn.microsoft.com/en-us/dotnet/csharp/asynchronous-programming/