Ok, I have received about a dozen emails in regard to my last post An in depth look at for loops basically telling me that “I am wrong because Richter says so”. These emails have used two points of reference by Jeffrey Richter. The first is the CLR internals presentation available as a web cast in particular around minute 12 during the foreach discussion where some points are made about for and foreach. The second reference I have been getting a lot of is the pages around 305 in CLR via C# 2.
To Jeffrey Richter if you happen to read this, let me start by saying I have the highest respect for you, I have thoroughly enjoyed nearly everything you have ever written; it is not only factual but a pleasure to read. I would personally not know most of the information I do today if it were not for books like yours. The most beneficial parts of your content are not these hidden gems but the teaching of the process to obtain them yourself. You taught me how to fish
That said let me address some of these. I will start with CLR via C#.
“The second thing to notice about the code above is that the JIT compiler knows that the for loop is accessing array elements 0 through Length – 1. So the JIT compiler produces code that, at runtime, tests that all array accesses will be within the array’s valid range. Specifically the JIT compiler produces code to check if ((Length – 1) <= a.GetUpperBound(0)). This check occurs just before the loop. If the check is good, the JIT compiler will not generate code inside the loop to verify that each access is within the valid range. This allows array access within loops to be very fast.” Page 306 paragraph 2 CLR via C#
I fear that Jeffrey Richter has been caught by the same confusion as Brad Abrams. The code in question is
static int RichtersCode() { Int32[] a = new Int32[5]; int total = 0; for(Int32 index=0;index<a.Length;index++) { total += a[index]; } return total; } |
static int RichtersCode() { Int32[] a = new Int32[5]; 00000000 push esi 00000001 mov edx,5 00000006 mov ecx,7915982Ah 0000000b call FFB21F70 00000010 mov ecx,eax int total = 0; 00000012 xor esi,esi for(Int32 index=0;index<a.Length;index++) { 00000014 xor edx,edx 00000016 mov eax,dword ptr [ecx+4] 00000019 test eax,eax 0000001b jle 00000028 total += a[index]; 0000001d add esi,dword ptr [ecx+edx*4+8] for(Int32 index=0;index<a.Length;index++) { 00000021 add edx,1 00000024 cmp eax,edx 00000026 jg 0000001D } return total; 00000028 mov eax,esi 0000002a pop esi 0000002b ret |
Listing 1: Jeffrey Richter’s code |
By looking at this we should quickly see that it is in fact not calling GetUpperBound(0) prior to the loop. It has inlined the Length property and it never does any comparison. The JIT is realizing that since the array was created locally (on the stack??? and it knows the length that it was created with) that there is no need to make any comparisons. This cannot possibly throw an exception as written in native code. This optimization is in fact much better than the bounds hoisting we were hoping for, I believe this is being done for Constrained Execution Regions but I had already planned on discussing this theory in more depth for another post, you’ll just have to wait J
We can see that this is the same “confusing” optimization we looked at in the article because if we create the array outside of the function the optimization will not occur as can be seen here.
static Int32[] b = new Int32[5]; static int RichtersCode() { int total = 0; for(Int32 index=0;index<b.Length;index++) { total += b[index]; } return total; } |
static int RichtersCode() { int total = 0; 00000000 push edi 00000001 push esi 00000002 xor ecx,ecx for(Int32 index=0;index<b.Length;index++) { 00000004 xor edx,edx 00000006 mov esi,dword ptr ds:[022B1EC8h] 0000000c mov eax,dword ptr [esi+4] 0000000f test eax,eax 00000011 jle 00000025 00000013 mov edi,dword ptr [esi+4] total += b[index]; 00000016 cmp edx,edi 00000018 jae 0000002A 0000001a add ecx,dword ptr [esi+edx*4+8] for(Int32 index=0;index<b.Length;index++) { 0000001e add edx,1 00000021 cmp eax,edx 00000023 jg 00000016 } return total; 00000025 mov eax,ecx 00000027 pop esi 00000028 pop edi 00000029 ret 0000002a call 792B44A9 0000002f int 3 |
Listing 2: Jeffrey Richter’s code modified to use external array |
The specific lines that show the bounds are still there are right before the assignment (in the blue area)
00000016 cmp edx,edi
00000018 jae 0000002A
This is comparing the current counter against edx (which is where we have our length). It will then jump to throw the exception.
I feel I have done a fairly good job illustrating that there is in fact no array bounds hoisting occurring; we are looking at the second optimization discussed in the article. My guess is that we have errata in the book, a late change for CER, or a JIT bug in that it does not hoist at other times (I actually think it may be a combination of the three J).
The second bit that I have been receiving emails about deals with foreach vs for and comments that were made in the CLR internals presentation (which let me add is an amazing presentation and I highly recommend it). There are actually two bits in the presentation that have been brought up, the first is that foreach on a string (or other array operation) is much slower than using a for loop. The second is that for and an index is a better usage overall than a foreach.
For the first case we have to remember that foreach does in fact have special handling in both VB.NET and C# for dealing with arrays. They will in fact generate identical code to a normal for (but strings are a special special case). Let’s take a look at the code in question.
static void TestForeach() { string test = “thisisatestingstring”; foreach (char c in test) { Console.WriteLine(c); } }
static void TestForLoop() { string test = “thisisatestingstring”; for(int i=0;i<test.Length;i++) { Console.WriteLine(test[i]); } }
|
Listing 3: Two methods to iterate through strings |
When we disassemble this to IL we receive the following
|
Listing 4: Foreach over string IL |
| ||
Listing 5: for loop IL | ||
They are in fact nearly identical. There is no performance gain to be reached here by using for. It is very important to remember that foreach has dual definitions. Admittedly I always say that looking at IL is bad so here is the disassembled native code.
static void TestForeach() { string test = “thisisatestingstring”; 00000000 push edi 00000001 push esi 00000002 mov esi,dword ptr ds:[02314AF8h] foreach (char c in test) { 00000008 xor edx,edx 0000000a mov edi,dword ptr [esi+8] 0000000d test edi,edi 0000000f jle 00000024 00000011 movzx ecx,word ptr [esi+edx*2+0Ch] dummy2 = c; 00000016 mov word ptr ds:[00912FE8h],cx 0000001d add edx,1 foreach (char c in test) { 00000020 cmp edi,edx 00000022 jg 00000011 00000024 pop esi } } 00000025 pop edi 00000026 ret
static void TestForLoop() { string test = “thisisatestingstring”; 00000000 push edi 00000001 push esi 00000002 mov esi,dword ptr ds:[02314AF8h] for(int i=0;i<test.Length;i++) { 00000008 xor ecx,ecx 0000000a mov edi,dword ptr [esi+8] 0000000d test edi,edi 0000000f jle 00000024 dummy2 = test[i]; 00000011 movzx eax,word ptr [esi+ecx*2+0Ch] 00000016 mov word ptr ds:[00912FE8h],ax for(int i=0;i<test.Length;i++) { 0000001d add ecx,1 00000020 cmp edi,ecx 00000022 jg 00000011 00000024 pop esi } } 00000025 pop edi 00000026 ret |
Listing 6: Disassembled code |
The second bit from the presentation that people have emailed me about is some comments made about for being better than foreach in general. I believe many people are taking these comments out of context as Jeffrey does in fact qualify these comments in the context of collections, where they do hold true as the enumerator simply does an indexed lookup on the collection. I gave what I thought was a good example of where it would not be faster, a linked list, in my original post that I will go into a bit more detail with.
Let’s consider for a minute how a linked list works. I store in my class a pointer (reference) to the first item in my list, if you want the 5th item I will iterate through my list until I hit the 5th node and return it to you, in this process I touch five nodes of the list. The same holds true for nine, thirty four, or seven thousand; we can therefore say that an indexed lookup into our linked list is O(n) (Big O n), we can actually make a stronger statements but this will suffice. If we were to create an enumerator for our linked list, its MoveNext() function would simply return CurrentNode.Next this operation occurs in O(1). We can therefore say that our n O(1) operations will be much faster than n O(n) operations .. we could in fact even summarize this by saying that the enumerator will produce n touches of nodes where as the for with indexes will produce ∑n touches of nodes. This is not that big of a deal on small lists but on a big list (say 100 items) the difference would be 100 touches with the enumerator vs 4950 touches with the index.
This holds true with other objects as well although not usually to this scale, a tree for instance would be much heavier to hit with indexes as opposed to an enumeration. Most items such as this will not offer an index based lookup so it’s not that big of a deal ..
Hope this clears up any confusion.
Fascinating Mr Spock