C# Iterator Oddity

If you’ve never used a C# iterator before then it may seem like magic. But under the hood it isn’t doing anything you can’t already do yourself. Iterators are time savers, and as such there are quick actions and analyzers that recommend you use them. If you’re like me you use them without even thinking, but it is still important to remember that they are generated code. It really shouldn’t be necessary to understand the underlying implementation of how they work, but sometimes you have to. Here’s an interesting oddity that I saw recently with iterators.

Transforming IEnumerable<T> lists is a common use for extension methods. Suppose we have the following code in a payment system.

class OrderItem
{
   public int Id { get; set; }
   public int Quantity { get; set; }
}

class PaymentService
{
    public IEnumerable<OrderItem> GetCartItems ()
    {
        IEnumerable<OrderItem> data = new[]
        {
            new OrderItem() { Id = 10, Quantity = 1 },
            new OrderItem() { Id = 11, Quantity = 1 },
            new OrderItem() { Id = 10, Quantity = 1 },
        };

        return data;
    }
}

class Program
{
   static void Main ()
   {
      var service = new PaymentService();
      var items = service.GetCartItems();

      foreach (var item in items)
         Console.WriteLine($"{item.Id} = {item.Quantity}");
   }
}

Running this code works fine. But notice that we have two items with the same Id. We probably want to group these together, so we will write a normalize method to combine any items that have the same Id.

static class OrderItemExtensions
{
    public static IEnumerable<OrderItem> Normalize ( this IEnumerable<OrderItem> source )
    {
        var groups = from i in source
                        group i by i.Id into g
                        select g;

        var items = new List<OrderItem>();

        foreach (var group in groups)
        {
            var item = group.FirstOrDefault();

            if (group.Count() > 1)
                item.Quantity = group.Sum(i => i.Quantity);

            items.Add(item);
        }

        return items;
    }
}

//Normalize the data
items = items.Normalize();

Now we are getting just two items back because we’ve combined the objects. But to get this to work we are creating a temporary list. This type of code is what iterators were designed to replace.

Iterator Implementation

The current implementation has some disadvantages. Firstly, the larger the source list the more memory the new list will take up. For small lists this isn’t an issue, but if we had 1,000, 10,000 or 100,000 items it would be unmanageable. The only purpose of the list is to store a temporary copy of the data. The second issue is that we are going to process the entire list even if the results are never used, or only a subset of them is. Clearly we need to work around the need for a list. This is what iterators are for.

Here’s the equivalent code using the iterator syntax.

public static IEnumerable<OrderItem> NormalizeWithIterator ( this IEnumerable<OrderItem> source )
{
    var groups = from i in source
                    group i by i.Id into g
                    select g;

    foreach (var group in groups)
    {
        var item = group.FirstOrDefault();

        if (group.Count() > 1)
            item.Quantity = group.Sum(i => i.Quantity);

        yield return item;
    }
}

//Normalize with iterator
items = items.NormalizeWithIterator();

This code is cleaner. We don’t need a temporary list anymore and we will only process the next result if the caller asks for it. Win. Win.
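As a quick illustration (hypothetical usage, not part of the original sample), if the caller only wants the first normalized item then only the first group is ever processed. The grouping still has to read the whole source, but the per-group Count/Sum work stops after the group that was actually requested.

//Only the first normalized item is produced
var firstItem = new PaymentService().GetCartItems()
                                    .NormalizeWithIterator()
                                    .First();

Console.WriteLine($"{firstItem.Id} = {firstItem.Quantity}");   //10 = 2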

Problem in Paradise

There is a problem with this code that you probably won’t see with just the example given. In fact this code may work correctly in other places, but try this: put a breakpoint after the foreach statement and run it. Verify the output is correct. Then use the debugger to look at the items variable. Notice anything wrong? Look at the variable again. See what is happening? Each time you evaluate the variable the quantity goes up. What is going on?
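You don’t actually need the debugger to see it. Evaluating the variable in the debugger is just another enumeration, so a standalone repro (reusing the classes above) that enumerates the result twice shows the same drift.

var items = new PaymentService().GetCartItems().NormalizeWithIterator();

//First enumeration runs the grouping and bumps the merged item's quantity
foreach (var item in items)
   Console.WriteLine($"{item.Id} = {item.Quantity}");   //10 = 2, 11 = 1

//Second enumeration re-runs the grouping against the same, already modified, objects
foreach (var item in items)
   Console.WriteLine($"{item.Id} = {item.Quantity}");   //10 = 3, 11 = 1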

You may think it is a referencing issue, in that the code is reusing the same object, and ultimately that is the problem, but that alone doesn’t really explain why it is an issue. Transforms should be able to make changes to the objects in the list. If it were simply a referencing problem then this code should replicate the issue, but it doesn’t.

items = items.NormalizeWithIterator().NormalizeWithIterator();

If you think about what is going on, it makes sense that the above code works. For the first call to NormalizeWithIterator we have 3 items. The returned enumeration only has 2 items, so when NormalizeWithIterator is called again each Id appears only once and nothing gets grouped (or modified) a second time.

The problem actually resides in how iterators are implemented, and seeing it requires a better understanding of how the generated code works. When the compiler sees an iterator method it breaks it up. A new type is created that implements IEnumerable<T>, with fields for the arguments that were given; the same generated type also implements IEnumerator<T>. The most important member (for iterators) is the MoveNext method. This method does the heavy lifting: each time the next result is needed, MoveNext is called. The original method (containing the iterator logic) is replaced with code that simply creates the generated instance, stores the arguments in it and returns it as the IEnumerable<T>; GetEnumerator is called later, when enumeration actually begins.

The body of the MoveNext method is where the original iterator method ends up, broken apart to support the state machine. On the first call it executes all of the code up to the point where the method yields a value. If something was yielded then the move is successful and control returns to the caller. When MoveNext is called again it resumes from where it left off (using the state machine) until all the work is done.
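To make that concrete, here is a rough, hand-written approximation of the shape of the code the compiler generates for NormalizeWithIterator. The real output uses compiler-generated names and extra bookkeeping (thread checks, additional states), so treat this purely as a sketch of the idea rather than the actual generated class.

class NormalizeIteratorSketch : IEnumerable<OrderItem>, IEnumerator<OrderItem>
{
    //Field holding the method argument
    public IEnumerable<OrderItem> source;

    private int _state;
    private IEnumerator<IGrouping<int, OrderItem>> _groups;

    public OrderItem Current { get; private set; }
    object IEnumerator.Current => Current;

    //Simplified - the real code may hand out a separate enumerator instance
    public IEnumerator<OrderItem> GetEnumerator () => this;
    IEnumerator IEnumerable.GetEnumerator () => GetEnumerator();

    //None of the original method body runs until the first call to MoveNext
    public bool MoveNext ()
    {
        if (_state == 0)
        {
            //The grouping from the original method executes here, on the first call
            _groups = source.GroupBy(i => i.Id).GetEnumerator();
            _state = 1;
        }

        while (_groups.MoveNext())
        {
            var group = _groups.Current;
            var item = group.FirstOrDefault();

            if (group.Count() > 1)
                item.Quantity = group.Sum(i => i.Quantity);

            Current = item;
            return true;   //corresponds to yield return item
        }

        return false;      //no more groups, enumeration is complete
    }

    public void Reset () => throw new NotSupportedException();
    public void Dispose () => _groups?.Dispose();
}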

The subtle issue is that the code in your iterator method isn’t actually executed until the first time a result is needed; it uses deferred execution. This is why it fails in the situation above. When the original method is called you simply get back a constructed instance. It isn’t until the enumeration is triggered that any of your code executes. Going back to the example of evaluating the variable in the debugger, here is what is happening.

  1. The underlying instance is created and the original list is assigned to it.
  2. The IEnumerable<T> instance is returned to the caller.
  3. Now imagine another call to the method is made; this produces a new instance attached to the same original list.
  4. When enumeration starts (e.g. foreach or ToList) the actual method body executes which, in our case, runs the grouping. The grouping is run against the original objects.
  5. Now enumeration starts on the second instance. It also runs its grouping against the original objects.
  6. During the enumeration of the first instance it modifies one of the objects.
  7. During the enumeration of the second instance it groups the same way and wants to modify the same object, but now it is working with the (already modified) value from the first instance.
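Those steps can be reproduced without the debugger by creating two enumerables from the same cart and enumerating them one after the other. A simplified, sequential version of the interleaving described above:

var cart = new PaymentService().GetCartItems();

//Steps 1-3: two instances are created, both bound to the same OrderItem objects
var first = cart.NormalizeWithIterator();
var second = cart.NormalizeWithIterator();

//Steps 4 and 6: enumerating the first instance runs the grouping and modifies one object
foreach (var item in first)
   Console.WriteLine($"first : {item.Id} = {item.Quantity}");    //10 = 2, 11 = 1

//Steps 5 and 7: the second instance groups the same, already modified, objects
foreach (var item in second)
   Console.WriteLine($"second: {item.Id} = {item.Quantity}");    //10 = 3, 11 = 1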

Fixing the Issue

The problem is subtle because you’d only see it if you happen to have multiple enumerator instances bound to the original list (e.g. in the debugger). If iterator generation ran the code up to the first yield when the instance was originally created (as you might have imagined) then you would never see the issue. But that is not how the implementation currently works. So switching from a regular method to the iterator syntax actually changes when the method contents are executed. This may produce unexpected results.

There are a couple of different ways to solve this problem. Perhaps the most obvious is to not modify the object being enumerated.

public static IEnumerable<OrderItem> NormalizeWithNewObject ( this IEnumerable<OrderItem> source )
{
    var groups = from i in source
                    group i by i.Id into g
                    select g;

    foreach (var group in groups)
    {
        var item = group.FirstOrDefault();

        if (group.Count() > 1)
            item = new OrderItem()
            {
                Id = item.Id,
                Quantity = group.Sum(i => i.Quantity)
            };

        yield return item;
    }
}

//Normalize with a new object
items = items.NormalizeWithNewObject();

The net effect is that we’re not modifying the original list anymore. So we can create any number of enumerators, call them haphazardly, and each will return a new set of objects. For public facing code this is probably the safer route since the caller may or may not capture the updated list. But if this were internal code we would lose some of the benefit of iterators because we are potentially creating duplicate objects. Another disadvantage of this approach is that it becomes harder to manage unless the object being copied supports cloning. It is yet another place you have to define a clone method.
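If cloning does come up, one option is to put the copy logic on the type itself and use it from the iterator. A sketch (the Clone method and the NormalizeWithClone name are hypothetical additions, not part of the original sample):

class OrderItem
{
   public int Id { get; set; }
   public int Quantity { get; set; }

   //Hypothetical addition that keeps the copy logic in one place
   public OrderItem Clone () => new OrderItem() { Id = Id, Quantity = Quantity };
}

public static IEnumerable<OrderItem> NormalizeWithClone ( this IEnumerable<OrderItem> source )
{
    var groups = from i in source
                    group i by i.Id into g
                    select g;

    foreach (var group in groups)
    {
        //Clone before modifying so the original objects are never touched
        var item = group.First().Clone();
        item.Quantity = group.Sum(i => i.Quantity);

        yield return item;
    }
}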

Another option is to go back to our original implementation without the iterator. If I were concerned about performance and the method was only called internally I would probably go this route. It uses less memory than creating new objects, at the cost of changing the data inline. And because a fully materialized list is returned each time, re-enumerating the result doesn’t re-run the grouping and doesn’t cause any issues.

Ultimately the solution is up to you. The purpose of this post was to share some of my experience with iterators and to provide a word of caution about haphazardly updating code (manually or via tools) to use syntactical helpers like iterators. Good unit tests will help with some of this, but even in this case I doubt a unit test would have detected this scenario. I know when I first saw it occur it took me a while to figure out why what I was seeing didn’t line up with what the debugger was showing me.