CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Greg Young [MVP]

June 2006 - Posts

  • Row and Cell Level Security In SQL Server

    I am back after my brief travelling I have some good posts sitting in word just about ready to go and they should be posted over the next few days .. For those who are interested I have been in Phoenix, Las Vegas, Washington DC, Vancouver, Toronto, Providence, and Connecticut over the last ten days or so. I also caught a 30 lb striper and about a dozen blues in the sound which really made my week :)

    I came accross a link this morning that shows how to do something I have been wanting to do for a long time in SQL Server. Although this is not "true" row/cell level security it is still an interesting read.

    http://www.microsoft.com/technet/prodtechnol/sql/2005/multisec.mspx

     

  • For loop follow up (Disagreeing with Richter?!)

    Ok, I have received about a dozen emails in regard to my last post An in depth look at for loops basically telling me that “I am wrong because Richter says so”.  These emails have used two points of reference by Jeffrey Richter. The first is the CLR internals presentation available as a web cast in particular around minute 12 during the foreach discussion where some points are made about for and foreach. The second reference I have been getting a lot of is the pages around 305 in CLR via C# 2.

     

    To Jeffrey Richter if you happen to read this, let me start by saying I have the highest respect for you, I have thoroughly enjoyed nearly everything you have ever written; it is not only factual but a pleasure to read. I would personally not know most of the information I do today if it were not for books like yours. The most beneficial parts of your content are not these hidden gems but the teaching of the process to obtain them yourself. You taught me how to fish :)

     

    That said let me address some of these. I will start with CLR via C#.

    "The second thing to notice about the code above is that the JIT compiler knows that the for loop is accessing array elements 0 through Length – 1. So the JIT compiler produces code that, at runtime, tests that all array accesses will be within the array’s valid range. Specifically the JIT compiler produces code to check if ((Length – 1) <= a.GetUpperBound(0)). This check occurs just before the loop. If the check is good, the JIT compiler will not generate code inside the loop to verify that each access is within the valid range. This allows array access within loops to be very fast." Page 306 paragraph 2 CLR via C#

     

    I fear that Jeffrey Richter has been caught by the same confusion as Brad Abrams. The code in question is

            static int RichtersCode() {

                Int32[] a = new Int32[5];

                int total = 0;

                for(Int32 index=0;index<a.Length;index++) {

                    total += a[index];

                }

                return total;

            }

            static int RichtersCode() {

                Int32[] a = new Int32[5];

    00000000  push        esi 

    00000001  mov         edx,5

    00000006  mov         ecx,7915982Ah

    0000000b  call        FFB21F70

    00000010  mov         ecx,eax

                int total = 0;

    00000012  xor         esi,esi

                for(Int32 index=0;index<a.Length;index++) {

    00000014  xor         edx,edx

    00000016  mov         eax,dword ptr [ecx+4]

    00000019  test        eax,eax

    0000001b  jle         00000028

                    total += a[index];

    0000001d  add         esi,dword ptr [ecx+edx*4+8]

                for(Int32 index=0;index<a.Length;index++) {

    00000021  add         edx,1

    00000024  cmp         eax,edx

    00000026  jg          0000001D

                }

                return total;

    00000028  mov         eax,esi

    0000002a  pop         esi 

    0000002b  ret             

    Listing 1: Jeffrey Richter’s code

     

    By looking at this we should quickly see that it is in fact not calling GetUpperBound(0) prior to the loop. It has inlined the Length property and it never does any comparison. The JIT is realizing that since the array was created locally (on the stack??? and it knows the length that it was created with) that there is no need to make any comparisons. This cannot possibly throw an exception as written in native code. This optimization is in fact much better than the bounds hoisting we were hoping for, I believe this is being done for Constrained Execution Regions but I had already planned on discussing this theory in more depth for another post, you’ll just have to wait J

    We can see that this is the same “confusing” optimization we looked at in the article because if we create the array outside of the function the optimization will not occur as can be seen here.

            static Int32[] b = new Int32[5];

            static int RichtersCode() {

                int total = 0;

                for(Int32 index=0;index<b.Length;index++) {

                    total += b[index];

                }

                return total;

            }

            static int RichtersCode() {

                int total = 0;

    00000000  push        edi 

    00000001  push        esi 

    00000002  xor         ecx,ecx

                for(Int32 index=0;index<b.Length;index++) {

    00000004  xor         edx,edx

    00000006  mov         esi,dword ptr ds:[022B1EC8h]

    0000000c  mov         eax,dword ptr [esi+4]

    0000000f  test        eax,eax

    00000011  jle         00000025

    00000013  mov         edi,dword ptr [esi+4]

                    total += b[index];

    00000016  cmp         edx,edi

    00000018  jae         0000002A

    0000001a  add         ecx,dword ptr [esi+edx*4+8]

                for(Int32 index=0;index<b.Length;index++) {

    0000001e  add         edx,1

    00000021  cmp         eax,edx

    00000023  jg          00000016

                }

                return total;

    00000025  mov         eax,ecx

    00000027  pop         esi 

    00000028  pop         edi 

    00000029  ret             

    0000002a  call        792B44A9

    0000002f  int         3   

    Listing 2: Jeffrey Richter’s code modified to use external array

     

    The specific lines that show the bounds are still there are right before the assignment (in the blue area)

    00000016  cmp         edx,edi

    00000018  jae         0000002A

     

    This is comparing the current counter against edx (which is where we have our length). It will then jump to throw the exception.

    I feel I have done a fairly good job illustrating that there is in fact no array bounds hoisting occurring; we are looking at the second optimization discussed in the article. My guess is that we have errata in the book, a late change for CER, or a JIT bug in that it does not hoist at other times (I actually think it may be a combination of the three J).

     

    The second bit that I have been receiving emails about deals with foreach vs for and comments that were made in the CLR internals presentation (which let me add is an amazing presentation and I highly recommend it).  There are actually two bits in the presentation that have been brought up, the first is that foreach on a string (or other array operation) is much slower than using a for loop. The second is that for and an index is a better usage overall than a foreach.

    For the first case we have to remember that foreach does in fact have special handling in both VB.NET and C# for dealing with arrays. They will in fact generate identical code to a normal for (but strings are a special special case). Let’s take a look at the code in question.

            static void TestForeach() {

                string test = "thisisatestingstring";

                foreach (char c in test) {

                    Console.WriteLine(c);

                }

            }

     

            static void TestForLoop() {

                string test = "thisisatestingstring";

                for(int i=0;i<test.Length;i++) {

                    Console.WriteLine(testIdea [I]);

                }

            }

     

    Listing 3: Two methods to iterate through strings

     

    When we disassemble this to IL we receive the following

    .method private hidebysig static void TestForeach() cil managed
    {
          .maxstack 2
          .locals init (
                [0] string text1,
                [1] char ch1,
                [2] string text2,
                [3] int32 num1)
          L_0000: ldstr "thisisatestingstring"
          L_0005: stloc.0 
          L_0006: ldloc.0 
          L_0007: stloc.2 
          L_0008: ldc.i4.0 
          L_0009: stloc.3 
          L_000a: br.s L_001e
          L_000c: ldloc.2 
          L_000d: ldloc.3 
          L_000e: callvirt instance char string::get_Chars(int32)
          L_0013: stloc.1 
          L_0014: ldloc.1 
          L_0015: stsfld char PerformanceTestHarness.Program::dummy2
          L_001a: ldloc.3 
          L_001b: ldc.i4.1 
          L_001c: add 
          L_001d: stloc.3 
          L_001e: ldloc.3 
          L_001f: ldloc.2 
          L_0020: callvirt instance int32 string::get_Length()
          L_0025: blt.s L_000c
          L_0027: ret 
    }

    Listing 4: Foreach over string IL

     

    .method private hidebysig static void TestForLoop() cil managed
    {
          .maxstack 2
          .locals init (
                [0] string text1,
                [1] int32 num1)
          L_0000: ldstr "thisisatestingstring"
          L_0005: stloc.0 
          L_0006: ldc.i4.0 
          L_0007: stloc.1 
          L_0008: br.s L_001a
          L_000a: ldloc.0 
          L_000b: ldloc.1 
          L_000c: callvirt instance char string::get_Chars(int32)
          L_0011: stsfld char PerformanceTestHarness.Program::dummy2
          L_0016: ldloc.1 
          L_0017: ldc.i4.1 
          L_0018: add 
          L_0019: stloc.1 
          L_001a: ldloc.1 
          L_001b: ldloc.0 
          L_001c: callvirt instance int32 string::get_Length()
          L_0021: blt.s L_000a
          L_0023: ret 
    }

    Listing 5: for loop IL

     

    They are in fact nearly identical. There is no performance gain to be reached here by using for. It is very important to remember that foreach has dual definitions. Admittedly I always say that looking at IL is bad so here is the disassembled native code.

            static void TestForeach() {

                string test = "thisisatestingstring";

    00000000  push        edi 

    00000001  push        esi 

    00000002  mov         esi,dword ptr ds:[02314AF8h]

                foreach (char c in test) {

    00000008  xor         edx,edx

    0000000a  mov         edi,dword ptr [esi+8]

    0000000d  test        edi,edi

    0000000f  jle         00000024

    00000011  movzx       ecx,word ptr [esi+edx*2+0Ch]

                    dummy2 = c;

    00000016  mov         word ptr ds:[00912FE8h],cx

    0000001d  add         edx,1

                foreach (char c in test) {

    00000020  cmp         edi,edx

    00000022  jg          00000011

    00000024  pop         esi 

                }

            }

    00000025  pop         edi 

    00000026  ret           

      

            static void TestForLoop() {

                string test = "thisisatestingstring";

    00000000  push        edi 

    00000001  push        esi 

    00000002  mov         esi,dword ptr ds:[02314AF8h]

                for(int i=0;i<test.Length;i++) {

    00000008  xor         ecx,ecx

    0000000a  mov         edi,dword ptr [esi+8]

    0000000d  test        edi,edi

    0000000f  jle         00000024

                    dummy2 = testIdea [I];

    00000011  movzx       eax,word ptr [esi+ecx*2+0Ch]

    00000016  mov         word ptr ds:[00912FE8h],ax

                for(int i=0;i<test.Length;i++) {

    0000001d  add         ecx,1

    00000020  cmp         edi,ecx

    00000022  jg          00000011

    00000024  pop         esi 

                }

            }

    00000025  pop         edi 

    00000026  ret             

    Listing 6: Disassembled code

     

     

    The second bit from the presentation that people have emailed me about is some comments made about for being better than foreach in general. I believe many people are taking these comments out of context as Jeffrey does in fact qualify these comments in the context of collections, where they do hold true as the enumerator simply does an indexed lookup on the collection. I gave what I thought was a good example of where it would not be faster, a linked list, in my original post that I will go into a bit more detail with.

    Let’s consider for a minute how a linked list works. I store in my class a pointer (reference) to the first item in my list, if you want the 5th item I will iterate through my list until I hit the 5th node and return it to you, in this process I touch five nodes of the list. The same holds true for nine, thirty four, or seven thousand; we can therefore say that an indexed lookup into our linked list is O(n) (Big O n), we can actually make a stronger statements but this will suffice. If we were to create an enumerator for our linked list, its MoveNext()  function would simply return CurrentNode.Next this operation occurs in O(1). We can therefore say that our n O(1) operations will be much faster than n O(n) operations .. we could in fact even summarize this by saying that the enumerator will produce n touches of nodes where as the for with indexes will produce n touches of nodes. This is not that big of a deal on small lists but on a big list (say 100 items) the difference would be 100 touches with the enumerator vs 4950 touches with the index.

    This holds true with other objects as well although not usually to this scale, a tree for instance would be much heavier to hit with indexes as opposed to an enumeration. Most items such as this will not offer an index based lookup so it’s not that big of a deal ..

     

    Hope this clears up any confusion.

  • An in depth look at for loops

    My last two posts have been about Performance Measurement and Viewing Unmanaged Code in VS.NET, these posts while rather interesting have really been to set us up for the this discussion. If you have not read these posts you may want to as I will be using information from them often in this post.

    There are a lot of posts on the Internet discussing varying methods of looping using for loops and which perform best. These posts also generally give advice as to how you should handle your looping based upon performance metrics. Most of these types of posts suffer from an affliction I have discussed previously; they look at IL (Intermediate Language) to rationalize performance which is just flat out wrong as subtle differences can cause JIT (Just in Time) optimizations to go haywire.

    In this post we will look at the unmanaged code produced by varying constructs to come up with a definitive answer to the performance question. We will also look at some other optimization methods that people claim happen but do not always happen and discuss some reasons why they may not be happening and of course set some rules for fast looping.

    In trying to apply this the version of your framework is important as the JIT may vary from version to version, I do not believe anything here will change based upon the platform but that is a possibility. I used version 2.0.50727 for all examples in this document. Other versions such as Rotor, Mono, or 1.x  will most likely show differing behaviors.

    Hoisting?

    Before I start getting into code I would like to discuss the concept of hoisting as it will be the focal point of this entire discussion. When dealing with loops we can break any given loop into three sections

    1)      The preamble (setting up for the loop, doing things such as setting our counter to 0)

    2)      The code within the loop

    3)      The code to increment our counter and check for the end of iteration possibly jumping back to step two

    For the rest of this post I will color code the disassembled assembly as section one (green), section two (blue), section 3 (yellow) to make things easier to follow

    Any code that is in step two or three will be run N times as the loop is executed. The code in step one will only be executed a single time when we first enter the loop. The concept of hoisting involves moving code out of steps two and three and into step one. Hoisting obviously gives you a huge performance gain as the code will only be run once. 

    Simple Hoisting Example

    If you read alot articles on for loop performance they will tell you to not manually hoist because the JIT will end up doing it for you. In order to test this I have created the following test code (which can be used in conjunction with the harness discussed in the previous Performance Measurement post)

        class Program {

            static int[] foo = new int[100000];

            static int Dummy;

            static void Test1NoHoist() {

                int total = 0;

                for (int i = 0; i < foo.Length; i++) {

                    total|=i;

                }

                Dummy = total;

            }

     

            static void Test1WithHoist() {

                int total = 0;

                int length = foo.Length;

                for (int i = 0; i < length; i++) {

                    total|=i;

                }

                Dummy = total;

            }

     

            static void Main(string[] args) {

                TestHarness.Test("Test1NoHoist", 10000, new TestHandler(Test1NoHoist));

                TestHarness.Test("Test1WithHoist", 10000, new TestHandler(Test1WithHoist));

                Console.ReadLine();

            }

        }

    Listing 1: Basic manual hoisting example

     

    You will quickly notice that in Test1WithHoist we manually took the check Length property on foo and moved it to before the loop; this is a common optimization if you come from a C/C++ world. The thought behind this is that since Length translates to get_Length() that we are saving ourselves from having to call the method n times (remember that this would be in the second or third section and everything in the third section is called n times).

    An astute reader will notice an oddity dealing with the “Dummy” variable that is assigned after the loop. We will come back this a bit later on… good catch ;)

    The by running our performance test, we can see that the code executes in roughly equivalent time. 

    Test

    Total Time (ns)

    Average (ns)

    Test1NoHoist

    820583612.35

    82058.36

    Test1WithHoist

    812735363.70

    81273.53

     

    Let’s take a look at the generated code to offer proof that the two bits of code should run in equivalent time (if you want to see the results for yourself remember to follow the instructions for getting JIT optimized code from Viewing Unmanaged Code in VS.NET.

    Test1NoHoist

    Test1WithHoist

    00000000  xor         ecx,ecx

    00000002  xor         edx,edx

    00000004  mov         eax,dword ptr ds:[022B1EC4h]

    00000009  mov         eax,dword ptr [eax+4]

    0000000c  test        eax,eax

    0000000e  jle         00000019

    00000010  or          ecx,edx

    00000012  add         edx,1

    00000015  cmp         eax,edx

    00000017  jg          00000010

    00000019  mov         dword ptr ds:[00912FE8h],ecx

    0000001f  ret             

     

     

    00000000  xor         ecx,ecx

    00000002  mov         eax,dword ptr ds:[022B1EC4h]

    00000007  mov         edx,dword ptr [eax+4]

    0000000a  xor         eax,eax

    0000000c  test        edx,edx

    0000000e  jle         00000019

    00000010  or          ecx,eax

    00000012  add         eax,1

    00000015  cmp         eax,edx

    00000017  jl          00000010

    00000019  mov         dword ptr ds:[00912FE8h],ecx

    0000001f  ret             

    Listing 2: Disassembled versions of our functions

     

     

    A kind of neat item in these two bits of code is that the only real difference is that EAX and EDX are interchanged … This makes them look a bit more different than they are but rest assured they are pretty much identical. In both cases, the access to the stop variable has been hoisted out of the loop (in section three, both are comparing their counter to a register that was preloaded in section one). To better illustrate the point, here is a disassembly of the first for loop (without manual hoisting) when run without JIT optimizations (i.e. it will not be hoisted).

    0000002a  xor         esi,esi

    0000002c  nop             

    0000002d  jmp         00000034

    0000002f  nop             

    00000030  or          edi,esi

    00000032  nop