Delayed execution with yield, or “How to abdicate, cede, and relent”

“Yield” is not generally a word we hillbillies understand all too well as my meanderings yesterday can attest to. I’ve been using the yield keyword for nigh on three years now while having only a vague understanding of how it works behind the scenes. And even now, I’m not quite sure where my logical fallacy is but I’m sure that over the time it takes to write this all out, I’ll have formed an opinion.

Here is a the first test I wrote for a new class:

[TestMethod]
[ExpectedException( typeof( ArgumentException ) )]
public void Should_throw_exception_if_search_term_is_not_provided()
{
    var sut = new VacationDestinationService();
    sut.FindDestinationsByRegion(string.Empty);
}

Bear with me on the ExpectedException. I’m pretty sure my problem is that I’m writing the wrong tests but for now, let’s just follow the “thought” process.

To get this test to pass, I created the following class:

public class VacationDestinationService
{
    public void FindDestinationsByRegion( string region )
    {
        if (string.IsNullOrEmpty(region))
        {
            throw new ArgumentException("C'mon, feller. You gotta pick a place.");
        }
    }
}

Of course, I don’t want this class to return void in the long run, but it gets the test passing.

Here’s the next test:

[TestMethod]
public void Should_retrieve_search_results()
{
    var sut = new VacationDestinationService();
    var results = sut.FindDestinationsByRegion("North America");
    Assert.IsTrue(results.Count() > 0);
}

Again, don’t get semantic on me. This isn’t *really* the actual test. But like most developers, I don’t want to get bogged down in dependencies and explaining context.

To pass this test, I modified my VacationDestinationService accordingly:

public class VacationDestinationService
{
    public IEnumerable<string> FindDestinationsByRegion( string region )
    {
        if (string.IsNullOrEmpty(region))
        {
            throw new ArgumentException("C'mon, feller. You gotta pick a place.");
        }

        var destinations = new List<string>(new[] {
            "Ozarks",
            "Appalachia",
            "Western Manitoba"
        });
        foreach ( var destination in destinations )
        {
            yield return destination;
        }
    }
}

And lo! This passes our test. So I re-run all my tests again before checking in, and lo!:

image

“What?!” says I, “Why is the first test failing?” Reaching for the debugger yields (teehee) no help because setting a breakpoint in FindDestinationsByRegion doesn’t do anything. That is, for all intents and purposes (pet peeve: NOT “intensive purposes”), the code in FindDestinationsByRegion is not called in the first test anymore. Hence, the exception isn’t thrown. Hence, the test fails.

If you’re the type of guy/gal to dive into the IL for this kind of thing, then you may know where this is heading. From what I can gather, the yield causes this method to execute only when the results are iterated. That is, even if we call this method and gather up the results in a variable, the code still won’t execute. But as soon as you throw it in a foreach loop, then you’ve got gold, baby.

Maybe this is what smarter people mean when they throw out terms like “delayed execution”. Maybe this is the cornerstone of lambdas. All I know is that I can’t check in my code until I’ve solved this problem to my liking.

As a test for this theory, I modified the first test slightly:

[TestMethod]
[ExpectedException( typeof( ArgumentException ) )]
public void Should_throw_exception_if_search_term_is_not_provided()
{
    var sut = new VacationDestinationService();
    sut.FindDestinationsByRegion(string.Empty).Count();
}

Note the extra call to Count() at the end of the second line. Now we are not only retrieving the results, we are iterating over them. This test passes.

So what is the big picture? Is the code wrong or my tests? Assuming I *do* want to throw an exception if the argument is wrong, is the code correct? What if someone provides an empty string to this method and never iterates over the collection? Apparently, this method won’t fail in that case, but that may not be a bad thing. Or it could be because then we have code that’s kind of useless.

It seems to me that the test should be modified to cause the iteration to happen and force the exception. If someone wants to call this method without iterating, I see no problem with that. I don’t think.

Another thing I did was to replace the yield and return the List itself. This worked too but I don’t like the idea of modifying my code for what appears to be a faulty test.

Was expecting to have formed an opinion by now but it’s still kind of fuzzy. I’ll let it sit for a while and perhaps a solution will present itself to me in the form of a better developer fixing it.

Kyle the Iterative

This entry was posted in Conscientious Coding. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • Yoon

    I think yield return is “too” lazy.

    Say you wanted a method called

    public IEnumerable DeleteItems(some parameters…)

    And you want that method to delete some items, and return the list of IDs of the items deleted, using yield return, because there can be a LOT of items deleted.

    Now if you simply invoke that method but you’re not interested in the IDs of the items affected, the method will actually not execute at all…

    Is there a way to force execution but still be able to skip the unneeded execution of returning all the IDs affected if the caller is not interested in those?

  • http://codebetter.com/members/kylebaley/default.aspx Kyle Baley

    GlennS: You’re right, I would just return the list if it were hard-coded like that. But then, I wouldn’t hard-code it in the first place. Somewhere buried in these comments, there’s a more detailed explanation of the actual code. I hope.

  • GlennS

    I don’t understand why you are doing what you are doing with making an array to pass to a list constructor to then iterate through, then yield only to iterate through again?

    Why not:
    return new List{
    “Ozarks”,
    “Appalachia”,
    “Western Manitoba”
    }

    or just use multiple yield returns in the method:

    yield return “Ozarks”;
    yield return “Appalachia”;
    yield return “Western Manitoba”;

  • Jim Cooper

    @Kyle

    “The only reason I’m prolonging this debate is because you are arguing your point rationally which is a refreshing change from the arguments I’m getting from my actual team members.”

    :-)

    I find it sometimes helps to have an opposing POV. You either find it’s right, and change your mind, or the process of defending/explaining your position clarifies the situation for you.

  • http://codebetter.com/members/kylebaley/default.aspx Kyle Baley

    If I gave that impression then it’s because of a subconscious tendency to remain objective in comments. In fact, shortly after I posted this, the rules changed slightly which meant that I didn’t need to transform the results before returning them, so as has been mentioned here for this scenario, yield return is unnecessary. But had the requirement remained, I would not hesitate to use it. The only reason I’m prolonging this debate is because you are arguing your point rationally which is a refreshing change from the arguments I’m getting from my actual team members.

  • Jim Cooper

    @Kyle

    “I don’t think using yield is being too clever in this case”

    I don’t think it’s being too clever either, but I probably mean something different by that than you do :-)

    “To clarify my statement about the test, I meant that the condition I was testing was invalid. ”

    Sure, I understood that. It’s not the way I would look at it though.

    So, it looks like you’re still trying to decide what to do? :-)

  • http://shane.jscconsulting.ca Shane Courtrille

    If we think of your test as a form of documentation then you definitely want to change it to inform the user how this method acts.

    But at the same time I think there is definitely a concern that implementation details are leaking out because of the yield return usage. Then again Linq has the same problem and I am gleefully looking forward to using it so I’m not going to say yield return is the new Goto…

  • http://codebetter.com/members/kylebaley/default.aspx Kyle Baley

    I don’t think using yield is being too clever in this case. I’m using it exactly how it was meant to be used. What if your team wasn’t aware that you could use default on a switch statement and introducing it caused tests to fail even though it made things easier? It’s not a matter of being elegant, just using the right tool for the job.

    To clarify my statement about the test, I meant that the condition I was testing was invalid. When I stopped to think about it, this code *shouldn’t* fail every single time it’s called with bad parameters.

    But another way, the “call but not iterate” scenario *is* unlikely in practice, so why am I inducing that exact scenario in my test?

    Jay: It was on JP’s Nothin’ But .NET course that I was first introduced to yield.

  • Jim Cooper

    @Kyle

    “there’s a limit to the extent at which I dumb down my code to accommodate the lack of knowledge of the language”

    OK, but the converse argument is that there ought to be a limit on how “clever” you try to be :-)

    “There’s a reason the yield keyword is in the language and this seems like a textbook example of it”

    It isn’t necessarily a good reason. They left goto in as well, after all :-)

    But to me, it seems more a textbook example of when NOT to use it. It looks very much to me like you’re trying to force this technique into a situation where it isn’t warranted. That’s me looking at it with nothing invested in the code, so I have no decision to defend. But equally, I don’t know the full context.

    “The test says it should *always* fail when I call this method. But that’s not true.”

    Exactly! Therefore the problem is with your code :-) If this was testing any other method (that didn’t involve a choice about using yield), what would you be changing right now? The test, or the method?

    If it was me, I’d use the simpler solution. Yield is not noticeably more elegant, IMO, and your “call but not iterate” scenario seems unlikely (and probably a result of poor coding).

    I’m also thinking that if it has one weird side effect, there are probably others you haven’t found yet.

    Has disagreeing with me crystallised your thinking at all? :-)

  • http://www.twitter.com/jittery Jay

    If you use JPBoodhoo’s developwithpassion.bdd.dll at all in your test fixtures, there’s a handy force_traversal() extension method in there to be called on deferred IEnumerable()s.

  • http://codebetter.com/members/kylebaley/default.aspx Kyle Baley

    Jim:

    I could do the same survey at the place I’m at now and will probably get the results you’re expecting. But then, I’d get those results for a lot of things, including var, nullable types, and delegates. This sounds harsher than I mean it to be but there’s a limit to the extent at which I dumb down my code to accommodate the lack of knowledge of the language. There’s a reason the yield keyword is in the language and this seems like a textbook example of it. Maybe it’s not as ubiquitous as others but my hope would be that someone would look at this and say, “I wonder what that does” and go look it up, not “this is strange and foreign to me and I want to kill it.”

    I maintain that the problem *is* with the test in this case. The test says it should *always* fail when I call this method. But that’s not true. I want it to fail only when someone iterates over the collection that returns from the method. If someone calls the method and doesn’t iterate, that may be useless code but I shouldn’t throw an exception in that case. Hence, I should modify the test to handle the real-life scenario in which this code will be exercised.

  • Jim Cooper

    Also, while there’s only a small sample of yield-using code I could find, the explicit list creating equivalents look like they would generally be easier to refactor (eg James’s use of LINQ probably wouldn’t suggest itself so readily if you were looking at the yield version).

  • Jim Cooper

    @Kyle

    “To me, having to create a new list seems messier than using yield.”

    Well, creating the new list makes it absolutely clear what’s going on, and when it’s going on, without there being the peculiar side effects that your test threw up.

    “Whether or not it’s obscure seems rather subjective.”

    Well, I can do a survey at work tomorrow if you like :-) But surely you would agree that using yield is a less well known technique than creating a list explicitly?

    If the only advantage – in this instance, anyway – of using yield is that you think it’s a cleaner style (and you’re not going to be in the majority there, I don’t think), then I believe the disadvantages outweigh that.

    If I was reviewing your code, I’d treat the use of yield a a code smell. Especially if I saw that unit test.

    That’s still the killer argument to get rid of yield, to my mind. There was nothing wrong with your test, but your code didn’t pass it. Therefore the code should change.

    Fudging the test is cheating :-)

  • James

    By the way Kyle, for the sample code you posted, using a LINQ will accomplish the same in one line of code:

    return from item in _repository.GetStuff() select transformItem(item);

  • James

    Well, the whole point of yield is to facilitate lazy evaluation. It’s also handy for creating sequences from multiple sources but presented to the caller as one.

    One drawback is that each iteration over the sequence will run all the code leading up to and following the yield statements again, which may be a performance hit compared to using a static data structure like a list or an array.

    Use it if it suits your problem, don’t if it doesn’t?

  • http://codebetter.com/members/kylebaley/default.aspx Kyle Baley

    Jim: My apologies then. The fault is with the trimmed down example I’ve provided. If this were indeed production-ready code, you are correct. The code is too complicated. My intent was to focus on the delayed execution nature of yield.

    The actual code, which I didn’t describe so as not to cloud the issue, is a little more involved. It’s not just creating a list and throwing it back to the caller. It’s doing some transformations first. So it’s more like: get a list of items from a repository and return a list of transformed items to the caller.

    The alternative, without yield is:

    var retrievedItems = _repository.GetStuff();
    var newList = new List();
    foreach ( var item in retrievedItems )
    {
    newList.Add( transformItem( item ) );
    }
    return newList;

    To me, having to create a new list seems messier than using yield. Whether or not it’s obscure seems rather subjective.

    Edvard: Hope that answers your rhetorical question as well (http://kyle.baley.org/AndYouOpenedYourMouthhellipwhyOrLdquoHowToCommentForTheGreaterGoodrdquo.aspx)

  • Jim Cooper

    @Kyle

    Nup, it’s not a joke. Maybe you’ve taken away the context which would suggest why yield would be a good idea. In what you’ve given it looks a poor choice.

    This remark of yours is why:

    “I don’t like the idea of modifying my code for what appears to be a faulty test.”

    I think you have that exactly the wrong way round. Your unit tests are examples of how your code under test will be called. In this case you have to add cruft to a test to make it pass. This implies that it is possible there will be circumstances in your real code that would also need to add cruft to make it work.

    Why do that?

    Write simpler code for both the method and its tests.

    The basic scheme with TDD is to write code to pass the test. That way around. Your (quite reasonable) test failed, therefore your code needs changing.

    What I would suggest you have done is given a good reason NOT to use yield.

  • Edvard Pitka

    Why are you using “yield” here is beyond me. Why don’t you just return the List that you already have?

  • http://codebetter.com/members/kylebaley/default.aspx Kyle Baley

    Jim: Before I make a fool of myself, I just want to clarify: that’s a joke, right?

  • Jim Cooper

    Pardon me for saying so, but your code looks really stupid. You want to return something enumerable, so what you do is create that very thing, then iterate through it calling a weird keyword that most people won’t understand to create yet another iterator that gets called at difficult to determine times.

    Just return the damn list. Both the test and your code will be easier to understand.

  • jason meckley

    @Ged, you could get the enumeration, but not bind the results. like loading a form, but binding the results during another event. if the other event is not called, the enumeration is never iterated.
    IEnumerable strings;
    void Load()
    {
    strings = GetStrings();
    }

    void UseStrings()
    {
    foreach(var str in strings)
    {
    console.writeline(str);
    }
    }

    why you would want to do this… not sure, but possible.

    a populate extension method I see is
    public static EnumerableExtensions
    {
    public static void ForceTraversal(this IEnumerable items)
    {
    items.Count();
    }
    }

    this makes the test more explicit
    [TestMethod]
    [ExpectedException( typeof( ArgumentException ) )]
    public void Should_throw_exception_if_search_term_is_not_provided()
    {
    var sut = new VacationDestinationService();
    sut.FindDestinationsByRegion(string.Empty).ForceTraveral();
    }

  • Justice

    public IEnumerable
    GetResults(string arg) {
    if(string.IsNullOrEmpty(arg)) throw new ArgumentException(“Pass an arg, please.”);
    return GetResultsInternal(arg);
    }

    private IEnumerable GetResultsInternal(string arg) {
    if(arg1.Length > 3) yield return _result1;
    else yield return _result2;
    }

  • Justin

    I’d suggest much the same as Nik above, which would be to extract the code for the list creation and yield to another method/property, perhaps NorthAmericanDestinations. The guard clause would then be scoped to the retrieval of an enumeration instead of the evaluation of the enumeration. The method above feels like unexpected behavior if I were a consumer.

    I’ve seen many of the developers I work with cross themselves up with delayed evaluation working with Linq (particularly Linq 2 Sql) and methods that are returning IQueryable. With generous usage of the Linq extension methods a large portion of an application can be constructing elaborate enumerables, leaving developers at a loss for why their breakpoints aren’t triggering where they expect.

  • Chuck
  • Matt

    Agreed, your test is wrong. I had to discover all the hard way, too, except I found most docs hard to parse to the essentials.

    The part you’re missing:
    When you use “yield return”, the ENTIRE function gets converted into a state object behind the scenes.

    Your function gets executed as follows:
    1. Retrieving the IEnumerable -> nothing gets executed. Not one line.
    2. IEnumerable.GetEnumerator -> still nothing gets executed.
    3. First call to Enumerator.MoveNext -> All the code from the top of the function to the first yield executes.
    4. Second call to MoveNext – Execution continues from previous yield hit to next yield hit.
    5. Repeat @ 3.

    6. Call IEnumerable.GetEnumerator again.
    7. Call MoveNext on new enumerator -> Function executes from the top all over again as per 3.

    It drove me crazy until I read MSDN, a bunch of blogs, and wrote some exporation code. I was re-getting my enumerator and wondering why my DB call was being made repeatedly.

  • http://www.peterritchie.com/blog Peter Ritchie
  • Nik Radfird

    The other option is to change your code, so that a private method creates the yield and returns the IEnumerable to the public method who returns it again.

    This way, in the public method you can check your conditions, it will fall over whether you iterate or not. At least that way, you know straight away that you would of got back an invalid iterator.

  • Ben Hyrman

    I was going to make a joke about Schrödinger’s Cat… but I won’t.

    So, basically, this feels fine to me. You’ll throw an error if someone passes you bad data and then tries to enumerate over the result. But if they just happen to wander off and never look at the data, then who cares.

    Still, that test does feel a bit funky now.

  • Ged

    I’m afraid your tests are wrong.
    The “yield” statement causes the compiler to create an iterator, and the iterator doesn’t get executed until you try and access the first element.

    So, you’ll have to add the Count(), or use the iterator in some way to get the result.

    More details can be found on MSDN
    http://msdn.microsoft.com/en-us/library/dscyy5s0.aspx

    Also, why would someone call the method and not be interested in the results? Aren’t they just writing redundant code?

    Ged