Sponsored By Aspose - File Format APIs for .NET

Aspose are the market leader of .NET APIs for file business formats – natively work with DOCX, XLSX, PPT, PDF, MSG, MPP, images formats and many more!

For loop follow up (Disagreeing with Richter?!)

Ok, I have received about a dozen emails in regard to my last post An in depth look at for loops basically telling me that “I am wrong because Richter says so”.  These emails have used two points of reference by Jeffrey Richter. The first is the CLR internals presentation available as a web cast in particular around minute 12 during the foreach discussion where some points are made about for and foreach. The second reference I have been getting a lot of is the pages around 305 in CLR via C# 2.


 


To Jeffrey Richter if you happen to read this, let me start by saying I have the highest respect for you, I have thoroughly enjoyed nearly everything you have ever written; it is not only factual but a pleasure to read. I would personally not know most of the information I do today if it were not for books like yours. The most beneficial parts of your content are not these hidden gems but the teaching of the process to obtain them yourself. You taught me how to fish :)


 


That said let me address some of these. I will start with CLR via C#.


“The second thing to notice about the code above is that the JIT compiler knows that the for loop is accessing array elements 0 through Length – 1. So the JIT compiler produces code that, at runtime, tests that all array accesses will be within the array’s valid range. Specifically the JIT compiler produces code to check if ((Length – 1) <= a.GetUpperBound(0)). This check occurs just before the loop. If the check is good, the JIT compiler will not generate code inside the loop to verify that each access is within the valid range. This allows array access within loops to be very fast.” Page 306 paragraph 2 CLR via C#


 


I fear that Jeffrey Richter has been caught by the same confusion as Brad Abrams. The code in question is










        static int RichtersCode() {


            Int32[] a = new Int32[5];


            int total = 0;


            for(Int32 index=0;index<a.Length;index++) {


                total += a[index];


            }


            return total;


        }


        static int RichtersCode() {


            Int32[] a = new Int32[5];


00000000  push        esi 


00000001  mov         edx,5


00000006  mov         ecx,7915982Ah


0000000b  call        FFB21F70


00000010  mov         ecx,eax


            int total = 0;


00000012  xor         esi,esi


            for(Int32 index=0;index<a.Length;index++) {


00000014  xor         edx,edx


00000016  mov         eax,dword ptr [ecx+4]


00000019  test        eax,eax


0000001b  jle         00000028


                total += a[index];


0000001d  add         esi,dword ptr [ecx+edx*4+8]


            for(Int32 index=0;index<a.Length;index++) {


00000021  add         edx,1


00000024  cmp         eax,edx


00000026  jg          0000001D


            }


            return total;


00000028  mov         eax,esi


0000002a  pop         esi 


0000002b  ret             


Listing 1: Jeffrey Richter’s code


 


By looking at this we should quickly see that it is in fact not calling GetUpperBound(0) prior to the loop. It has inlined the Length property and it never does any comparison. The JIT is realizing that since the array was created locally (on the stack??? and it knows the length that it was created with) that there is no need to make any comparisons. This cannot possibly throw an exception as written in native code. This optimization is in fact much better than the bounds hoisting we were hoping for, I believe this is being done for Constrained Execution Regions but I had already planned on discussing this theory in more depth for another post, you’ll just have to wait J


We can see that this is the same “confusing” optimization we looked at in the article because if we create the array outside of the function the optimization will not occur as can be seen here.










        static Int32[] b = new Int32[5];


        static int RichtersCode() {


            int total = 0;


            for(Int32 index=0;index<b.Length;index++) {


                total += b[index];


            }


            return total;


        }


        static int RichtersCode() {


            int total = 0;


00000000  push        edi 


00000001  push        esi 


00000002  xor         ecx,ecx


            for(Int32 index=0;index<b.Length;index++) {


00000004  xor         edx,edx


00000006  mov         esi,dword ptr ds:[022B1EC8h]


0000000c  mov         eax,dword ptr [esi+4]


0000000f  test        eax,eax


00000011  jle         00000025


00000013  mov         edi,dword ptr [esi+4]


                total += b[index];


00000016  cmp         edx,edi


00000018  jae         0000002A


0000001a  add         ecx,dword ptr [esi+edx*4+8]


            for(Int32 index=0;index<b.Length;index++) {


0000001e  add         edx,1


00000021  cmp         eax,edx


00000023  jg          00000016


            }


            return total;


00000025  mov         eax,ecx


00000027  pop         esi 


00000028  pop         edi 


00000029  ret             


0000002a  call        792B44A9


0000002f  int         3   


Listing 2: Jeffrey Richter’s code modified to use external array


 


The specific lines that show the bounds are still there are right before the assignment (in the blue area)


00000016  cmp         edx,edi


00000018  jae         0000002A


 


This is comparing the current counter against edx (which is where we have our length). It will then jump to throw the exception.


I feel I have done a fairly good job illustrating that there is in fact no array bounds hoisting occurring; we are looking at the second optimization discussed in the article. My guess is that we have errata in the book, a late change for CER, or a JIT bug in that it does not hoist at other times (I actually think it may be a combination of the three J).


 


The second bit that I have been receiving emails about deals with foreach vs for and comments that were made in the CLR internals presentation (which let me add is an amazing presentation and I highly recommend it).  There are actually two bits in the presentation that have been brought up, the first is that foreach on a string (or other array operation) is much slower than using a for loop. The second is that for and an index is a better usage overall than a foreach.


For the first case we have to remember that foreach does in fact have special handling in both VB.NET and C# for dealing with arrays. They will in fact generate identical code to a normal for (but strings are a special special case). Let’s take a look at the code in question.








        static void TestForeach() {


            string test = “thisisatestingstring”;


            foreach (char c in test) {


                Console.WriteLine(c);


            }


        }


 


        static void TestForLoop() {


            string test = “thisisatestingstring”;


            for(int i=0;i<test.Length;i++) {


                Console.WriteLine(test[i]);


            }


        }


 


Listing 3: Two methods to iterate through strings


 


When we disassemble this to IL we receive the following







.method private hidebysig static void TestForeach() cil managed
{
.maxstack 2
.locals init (
[0] string text1,
[1] char ch1,
[2] string text2,
[3] int32 num1)
L_0000: ldstr “thisisatestingstring”
L_0005: stloc.0
L_0006: ldloc.0
L_0007: stloc.2
L_0008: ldc.i4.0
L_0009: stloc.3
L_000a: br.s L_001e
L_000c: ldloc.2
L_000d: ldloc.3
L_000e: callvirt instance char string::get_Chars(int32)
L_0013: stloc.1
L_0014: ldloc.1
L_0015: stsfld char PerformanceTestHarness.Program::dummy2
L_001a: ldloc.3
L_001b: ldc.i4.1
L_001c: add
L_001d: stloc.3
L_001e: ldloc.3
L_001f: ldloc.2
L_0020: callvirt instance int32 string::get_Length()
L_0025: blt.s L_000c
L_0027: ret
}

Listing 4: Foreach over string IL


 







.method private hidebysig static void TestForLoop() cil managed
{
.maxstack 2
.locals init (
[0] string text1,
[1] int32 num1)
L_0000: ldstr “thisisatestingstring”
L_0005: stloc.0
L_0006: ldc.i4.0
L_0007: stloc.1
L_0008: br.s L_001a
L_000a: ldloc.0
L_000b: ldloc.1
L_000c: callvirt instance char string::get_Chars(int32)
L_0011: stsfld char PerformanceTestHarness.Program::dummy2
L_0016: ldloc.1
L_0017: ldc.i4.1
L_0018: add
L_0019: stloc.1
L_001a: ldloc.1
L_001b: ldloc.0
L_001c: callvirt instance int32 string::get_Length()
L_0021: blt.s L_000a
L_0023: ret
}

Listing 5: for loop IL


 


They are in fact nearly identical. There is no performance gain to be reached here by using for. It is very important to remember that foreach has dual definitions. Admittedly I always say that looking at IL is bad so here is the disassembled native code.









        static void TestForeach() {


            string test = “thisisatestingstring”;


00000000  push        edi 


00000001  push        esi 


00000002  mov         esi,dword ptr ds:[02314AF8h]


            foreach (char c in test) {


00000008  xor         edx,edx


0000000a  mov         edi,dword ptr [esi+8]


0000000d  test        edi,edi


0000000f  jle         00000024


00000011  movzx       ecx,word ptr [esi+edx*2+0Ch]


                dummy2 = c;


00000016  mov         word ptr ds:[00912FE8h],cx


0000001d  add         edx,1


            foreach (char c in test) {


00000020  cmp         edi,edx


00000022  jg          00000011


00000024  pop         esi 


            }


        }


00000025  pop         edi 


00000026  ret           


  


        static void TestForLoop() {


            string test = “thisisatestingstring”;


00000000  push        edi 


00000001  push        esi 


00000002  mov         esi,dword ptr ds:[02314AF8h]


            for(int i=0;i<test.Length;i++) {


00000008  xor         ecx,ecx


0000000a  mov         edi,dword ptr [esi+8]


0000000d  test        edi,edi


0000000f  jle         00000024


                dummy2 = test[i];


00000011  movzx       eax,word ptr [esi+ecx*2+0Ch]


00000016  mov         word ptr ds:[00912FE8h],ax


            for(int i=0;i<test.Length;i++) {


0000001d  add         ecx,1


00000020  cmp         edi,ecx


00000022  jg          00000011


00000024  pop         esi 


            }


        }


00000025  pop         edi 


00000026  ret             


Listing 6: Disassembled code


 


 


The second bit from the presentation that people have emailed me about is some comments made about for being better than foreach in general. I believe many people are taking these comments out of context as Jeffrey does in fact qualify these comments in the context of collections, where they do hold true as the enumerator simply does an indexed lookup on the collection. I gave what I thought was a good example of where it would not be faster, a linked list, in my original post that I will go into a bit more detail with.


Let’s consider for a minute how a linked list works. I store in my class a pointer (reference) to the first item in my list, if you want the 5th item I will iterate through my list until I hit the 5th node and return it to you, in this process I touch five nodes of the list. The same holds true for nine, thirty four, or seven thousand; we can therefore say that an indexed lookup into our linked list is O(n) (Big O n), we can actually make a stronger statements but this will suffice. If we were to create an enumerator for our linked list, its MoveNext()  function would simply return CurrentNode.Next this operation occurs in O(1). We can therefore say that our n O(1) operations will be much faster than n O(n) operations .. we could in fact even summarize this by saying that the enumerator will produce n touches of nodes where as the for with indexes will produce n touches of nodes. This is not that big of a deal on small lists but on a big list (say 100 items) the difference would be 100 touches with the enumerator vs 4950 touches with the index.


This holds true with other objects as well although not usually to this scale, a tree for instance would be much heavier to hit with indexes as opposed to an enumeration. Most items such as this will not offer an index based lookup so it’s not that big of a deal ..


 


Hope this clears up any confusion.

This entry was posted in Under The Covers. Bookmark the permalink. Follow any comments here with the RSS feed for this post.

One Response to For loop follow up (Disagreeing with Richter?!)

  1. Regula Guy says:

    Fascinating Mr Spock

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <s> <strike> <strong>