OK I had left this out of my previous post an in depth look at for loops as I considered the behavior to be a JIT bug. I discussed this with some very smart people (including quite a few at Microsoft) and they agreed but my problem report was marked with “by design” so now I should explain what happens and the hoops you have to jump through in order to work around the issue.
We have discussed before the concept of bounds check removal. Since we know that the array in question has a length l, and we are storing the reference to that array in a register we know that the length cannot possibly change. We know that the length cannot possibly change because the size of an array object is immutable, we can change the data within the array but not the actual size. There is some discussion to be had on this point and future implementations but I will leave that for another discussion.
Let’s look at some C# code that illustrates this point. I am going to discuss VB.NET and other languages later as it applies to them as well (even more so than C#).
|
class Program {
static int[] foo = new int[10000];
static int WillNotRemoveChecksStaticReference() {
int total = 0;
for (int i = 0; i < foo.Length; i++) {
total |= foo[ i ];
}
return total;
}
static int WillRemoveChecksLocal() {
int total = 0;
int[] bar = new int[10000];
for (int i = 0; i < bar.Length; i++) {
total |= bar[ i ];
}
return total;
}
static int WillRemoveChecksPassedIn(int[] bar) {
int total = 0;
for (int i = 0; i < bar.Length; i++) {
total |= bar[ i ];
}
return total;
}
static void Main(string[] args) {
WillNotRemoveChecksStaticReference();
WillRemoveChecksForStaticReference();
WillRemoveChecksLocal();
WillRemoveChecksPassedIn(foo);
}
} |
|
Listing 1: C# Code to explore |
The problem is easily seen when we look through the WillNotRemoveChecksStaticReference method. Here is the disassembly of optimized code for that method. I have marked the code using the same standard as in the last post (green is preamble, blue is the code in the loop, yellow is the code to increment the counter and see if the loop should be ended).
|
static int WillNotRemoveChecksStaticReference() {
int total = 0;
00000000 push edi
00000001 push esi
00000002 xor ecx,ecx
for (int i = 0; i < foo.Length; i++) {
00000004 xor edx,edx
00000006 mov esi,dword ptr ds:[02275A44h]
0000000c mov eax,dword ptr [esi+4]
0000000f test eax,eax
00000011 jle 00000025
00000013 mov edi,dword ptr [esi+4]
total |= foo[ i ];
00000016 cmp edx,edi
00000018 jae 0000002A
0000001a or ecx,dword ptr [esi+edx*4+8]
for (int i = 0; i < foo.Length; i++) {
0000001e add edx,1
00000021 cmp eax,edx
00000023 jg 00000016
}
return total;
00000025 mov eax,ecx
00000027 pop esi
00000028 pop edi
00000029 ret
0000002a call 79443579
0000002f int 3 |
|
Listing 2: Disassembly of WillNotRemoveChecksStaticReference |
We can quickly see here that the bounds check is in fact not being removed as it would be if the array were either a local or passed in. This means that there is a compare and a branch added to every iteration of the loop. In the previous article we learned how expensive this can be.
If you go to my problem report you will see that someone named TAG has commented on it that this is impossible to have happen (or would require huge amounts of complex logic). On first glance I had suspected something similar; my guess (and the guess of nearly everyone I have talked to about this) was that it was due to volatility. Consider for a moment that there was another thread executing that changed foo to now be a smaller array, this could feasibly make our code write out into never-never land if it did not check every iteration of the loop as the size of the array could change .. right?
Then again maybe not!
The operation would in fact always be safe to remove the bounds check from so long as we are storing our pointer to the array in a register. I am fairly certain the JIT knows if it stored the reference in a register (it did apply this optimization so it is not a huge jump in logic to think that it knows that it did this). Since we are storing our pointer to the array in a register VOLATILITY DOES NOT APPLY (sorry for bold caps but this is an important point). Any change to the original reference will not affect our loop; our register will still point to the old “safe” copy of the array that we originally read the length from.
The reason I imagine that the JIT does not do this optimization automatically is the condition that exists if we cannot put the pointer to the array into a register. In this case the operation is no longer safe as we will see the change to the array if it is made from another thread.
In order to help prove the point (and show the workaround) we can in fact force the JIT to do exactly what we want by using the following C# code.
|
static int WillRemoveChecksForStaticReference() {
int total = 0;
int[] bar = foo;
for (int i = 0; i < bar.Length; i++) {
total |= bar[ i ];
}
return total;
} |
|
Listing 3: C# code that does remove bounds checking |
Before continuing, would you delete the seemingly useless code this contains in a single threaded app (or any app)? It does however make a very subtle difference in our disassembled code.
|
static int WillRemoveChecksForStaticReference() {
int total = 0;
00000000 push esi
00000001 xor ecx,ecx
int[] bar = foo;
00000003 mov esi,dword ptr ds:[02275A44h]
for (int i = 0; i < bar.Length; i++) {
00000009 xor edx,edx
0000000b mov eax,dword ptr [esi+4]
0000000e test eax,eax
00000010 jle 0000001D
total |= bar[ i ];
00000012 or ecx,dword ptr [esi+edx*4+8]
for (int i = 0; i < bar.Length; i++) {
00000016 add edx,1
00000019 cmp eax,edx
0000001b jg 00000012
}
return total;
0000001d mov eax,ecx
0000001f pop esi
00000020 ret |
|
Listing 4: Disassembly of WillRemoveChecksForStaticReference |
Although the code itself is quite different, you will notice that the disassembly is almost identical. The only differences deal with the bounds removal and the placement of a single line of assembly.
In the first bit of code we had in the loop initialization
00000006 mov esi,dword ptr ds:[02275A44h]
In the second bit of code we had
00000003 mov esi,dword ptr ds:[02275A44h]
Just before the loop started, the JIT could very easily have done the bounds removal optimization for us as opposed to making us write the very non-intuitive code setting the static to a local first. The only situation where the two differ in behavior is if our array is not being moved into a register (esi in this case). In fact the JIT in the first case is essentially treating our array as if it "partially volatile". If we needs this type of behavior we would probably mark our array volatile ourselves.
So after all of this I will create another rule for optimizing loops.
Never loop on an array that is or is contained in a field or static, always copy the reference to a local first to be explicit about the way that you want to handle changes to the original array on another thread.
Hopefully this will change in a future version of the JIT (though I wonder now since my issue was marked by design). It is still a good best practice to do this as you are being explicit in that you want to loop through every item of the array you started with, and that you are not interested in any changes to that array made by another thread during that process.
Although it is a good best practice to always use a local reference in order to be explicit I still think that the JIT should handle this for you in the case that it is optimizing to a register. I think I have also shown fairly well that this optimization is not impossible but would in fact be relatively simple (simple branch on whether or not the array reference has been optimized to a register).