That's a good question. You are correct that I forgot to mention this; I will add a section covering it.

This has 2 distinct behaviors (one when the call is inlined, one when it is not).

With inlining, the JIT is smart enough to realize what you are doing (and to undo it):

    for (int i = 0, j = length.Length; i < j; i++) {
    00000014  xor   edx,edx
    00000016  mov   eax,dword ptr [ecx+4]
    00000019  test  eax,eax
    0000001b  jle   00000026
    total |= i;
    0000001d  or    esi,edx
    for (int i = 0, j = length.Length; i < j; i++) {
    0000001f  add   edx,1
    00000022  cmp   edx,eax
    00000024  jl    0000001D
    }

This is identical to the code produced by i < length.Length.

Without inlining, the call is put into the preamble of the loop, producing functionally equivalent results to the other hoisted examples (although with slightly different ordering, and obviously the variable maintains a better scope):

    for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {
    00000013  xor   esi,esi
    00000015  mov   ecx,eax
    00000017  xor   edx,edx
    00000019  cmp   dword ptr [ecx],ecx
    0000001b  call  792666A8
    00000020  test  eax,eax
    00000022  jle   0000002D
    total |= i;
    00000024  or    edi,esi
    for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {
    00000026  add   esi,1
    00000029  cmp   esi,eax
    0000002b  jl    00000024
    }
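For reference, the three loop shapes discussed above can be written out as a small C# sketch. The class and method names are made up for illustration; only the loop headers come from the listings. One thing worth keeping in mind is that GetUpperBound(0) returns the last valid index (Length - 1), so with i < j the third loop runs one iteration fewer than the other two.

```csharp
static class LoopForms
{
    // Baseline: the bound is read directly in the loop condition.
    public static int DirectLength(int[] length)
    {
        int total = 0;
        for (int i = 0; i < length.Length; i++) {
            total |= i;
        }
        return total;
    }

    // Hoisted bound: Length is read once into j in the loop initializer.
    public static int HoistedLength(int[] length)
    {
        int total = 0;
        for (int i = 0, j = length.Length; i < j; i++) {
            total |= i;
        }
        return total;
    }

    // Hoisted method call: GetUpperBound(0) is evaluated once, before the loop body.
    // It returns Length - 1, so this loop stops one index earlier than the others.
    public static int HoistedUpperBound(int[] length)
    {
        int total = 0;
        for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {
            total |= i;
        }
        return total;
    }
}
```

Calling LoopForms.DirectLength and LoopForms.HoistedLength on the same array should always give the same result; HoistedUpperBound can differ because it skips the last index.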
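The key thing to notice is that the JIT still does not realize what we are doing with array bounds hoists. Once the loop body indexes the array, the bounds check (the cmp esi,edx and jae pair in the listing) is still performed on every iteration: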
    for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {
    00000016  xor   esi,esi
    00000018  mov   ecx,edi
    0000001a  xor   edx,edx
    0000001c  cmp   dword ptr [ecx],ecx
    0000001e  call  792664C8
    00000023  test  eax,eax
    00000025  jle   00000039
    00000027  mov   edx,dword ptr [edi+4]
    total |= length[i];
    0000002a  cmp   esi,edx
    0000002c  jae   0000003F
    0000002e  or    ebx,dword ptr [edi+esi*4+8]
    for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {
    00000032  add   esi,1
    00000035  cmp   esi,eax
    00000037  jl    0000002A
    }
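To make the comparison concrete, here is a minimal C# sketch of the hoisted form from the listing next to the plain i < length.Length form. The class and method names are hypothetical. The assumption, which you should verify in the disassembly for your own runtime, is that the plain form is the pattern the JIT recognizes for dropping the per-iteration bounds check, while the hoisted form keeps it as shown above.

```csharp
static class BoundsCheckForms
{
    // Hoisted bound, as in the listing: the array access is still range-checked
    // on every iteration. GetUpperBound(0) is Length - 1, so the last element is skipped.
    public static int HoistedAccess(int[] length)
    {
        int total = 0;
        for (int i = 0, j = length.GetUpperBound(0); i < j; i++) {
            total |= length[i];
        }
        return total;
    }

    // Comparing against length.Length directly in the condition.
    // Assumption: this is the shape the JIT can prove safe without a per-iteration check.
    public static int DirectAccess(int[] length)
    {
        int total = 0;
        for (int i = 0; i < length.Length; i++) {
            total |= length[i];
        }
        return total;
    }
}
```

Both methods are safe either way; the only open question is how many instructions the loop body costs, which is what the disassembly answers.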
Good catch! Cheers, Greg
hmmm ...

    Test : JohnsReverse took 42553687187.3528 ns, average ns = 425536.871873528
    Test : GregsInt32Reverse took 33369146730.09 ns, average ns = 333691.4673009
    Press any key to continue . . .

in release with JIT optimizations ... 80k string size ... maybe I have a bigger ratio of processor / memory speed?
But you are right, the shifts are expensive ... using rotl/rotr on the register would remove this, but I can't for the life of me get the JIT to produce that code :)
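For anyone wondering what the rotate would look like at the source level, this is the usual shift-and-or idiom in C#. It is only a sketch with made-up names, and as noted above there is no guarantee the JIT turns it into a single rotate instruction; check the disassembly before relying on it.

```csharp
static class Rotate
{
    // Rotates the 32 bits of value left by count positions (count is taken modulo 32).
    public static uint RotateLeft(uint value, int count)
    {
        count &= 31;
        // The second mask keeps the shift amount in range when count is 0.
        return (value << count) | (value >> ((32 - count) & 31));
    }

    // Rotates the 32 bits of value right by count positions (count is taken modulo 32).
    public static uint RotateRight(uint value, int count)
    {
        count &= 31;
        return (value >> count) | (value << ((32 - count) & 31));
    }
}
```

Rotating left by 8 moves the top byte of the value to the bottom, which is the kind of byte shuffling a reverse routine needs.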
OK. I apologize for my quick response a few hours ago, which I am forced to change due to some things I didn't think of (namely abbreviations). Some people have brought up some really fun