Performance : String Reverse

There are some people are trying to get the “fastest” string reverse function. http://sqljunkies.com/WebLog/amachanic/archive/2006/07/17/22253.aspx


http://weblogs.sqlteam.com/mladenp/archive/2006/03/19/9350.aspx


 There is some history on this problem at Justin Roger’s blog


You know me, I can’t stay away from a performance challenge…


Here is my original submission (about what you would expect and it’s a fairly naïve implementation) … Note that using the same string (breaking immutabilty rules and doing it in place) is not acceptable as an optimization :)








        public static unsafe string GregsUnsafeReverse(string s) {


            string ret = string.Copy(s);


            fixed (char* start = ret) {


                char* begin = start;


                char* end = start + s.Length – 1;


                char* stop = start + (end – start) / 2;


                while (begin < stop) {


                    *begin ^= *end;


                    *end ^= *begin;


                    *begin ^= *end;


                    begin++;


                    end–;


                }


            }


            return ret;


        }


Listing 1: unsafe code to reverse string


 


There are two other unsafe entries over on Justin Roger’s blog with similar performance characteristics.


What no one is talking about is that this is a piss poor way of doing it. We are copying 16 bits at a time. I think we can do much better. In this example I will use int32s to do it (although it can be done even faster with int64s (especially on a 64 bit machine)). If I could get it to use a ROTL instead, this should easily beat the copy.








        public static unsafe string GregsInt32Reverse(string s) {


            UInt32 Low;


            UInt32 High;


   


            string ret = string.Copy(s);


            fixed (char* start = ret) {


                UInt32* begin = (UInt32*)start;


                UInt32* end = (UInt32*)(start + s.Length – 2);


                UInt32* stop = begin + (end – begin) / 2;


                while (begin <= stop) {


                    Low = *begin;


                    Low = ((Low) << 16)| (Low >> 16);


                    High = *end;


                    High = ((High) << 16) | (High >> 16);


                    *begin = High;


                    *end = Low;


                    begin++;


                    end–;


                }


            }


            return ret;


        }


Listing 2: Reverse using int32s


 


As for the difference .. it’s about 25% faster (this is 1000 iterations with an 80k string)


Test : GregsUnsafeReverse took 424749975.105176 ns, average ns = 424749.97510517
Test : GregsInt32Reverse took 329840796.768017 ns, average ns = 329840.796768017


I am sure someone can grab this and squeak out some more performance by using int64s but I am lazy so I will just leave it at “this is how you could make it even faster” J Another thing which would make it faster is if I could just use ROTL instead of shifting + oring but I can’t seem to figure out a way to get the JIT to output that for me

This entry was posted in Under The Covers. Bookmark the permalink. Follow any comments here with the RSS feed for this post.

15 Responses to Performance : String Reverse

  1. bcoelho says:

    You could find my article at my blog http://bcoelho.blog.com/ or at codeproject http://www.thecodeproject.com/useritems/XmlInsertPerfomanceTest.asp.

    Hope to hear your feedback.

  2. Greg says:

    Sure either is fine .. love to read it :)

  3. bcoelho says:

    I have wrote an article on the perfomance of xml insertions in my blog. There you can find the code that I’ve wrote to test the perfomance of the insertions. You you don’t mind, I can post here the url to the blog or email it to you.

    Thank you.

  4. Greg says:

    bcoelho I dont have it handy anymore but it should be pretty easy to duplicate. Also I have a much faster version of this laying around by removing the use of pointer dereferences and using array accesses if you are interested in it.

  5. bcoelho says:

    Greg,

    Can you show the code that you use to test the perfomance of the algorithms?

    Thank you in advance.

  6. Greg says:

    very nice! (I hadnt thought of stackalloc which should be a touch faster)

    Per plugin JIT, did you know this was originally looking like it was going to be done? Way back in the days of 1.0 beta there were 3 JITs. There was optjit (if i remember this one properly), jit, and econojit (which implemented code pitching to amortize memory usage this JIT can be found in SSCLI now) … (there was also prejit … what a name Pre-JustInTime hmm :)…so they renamed it to ngen)

    These jits were plugin based with the CLR, you decided depending on your circumstances which you wanted to use. It had been discussed to make the interface public … if you look back into the dotnet lists on develop.com you should find them (2001 I believe)

    Personally I don’t want to write my own JIT but I would like a JIT that optimized to some of the things I mention. This is a prime case of where the CONCEPT of JIT rocks. Not all processors have a hardware based rotate instruction (x86 does) but on a power pc the preferred method is to use 2 32 bit registers emulating a 64 bit register, shl (overflowing into the second register then or them together). The ability for me to be able to write (val << 16) | (val>>16) and it to pick it up and optimize that pattern for the CPU (or to let me use a rotl instruction in IL) is the main benefit of compiling at runtime with a known environment.

  7. Alois says:

    Hi Greg,

    you seem to be begging for your own JIT Plugin. I have actually tried to contact Microsoft Research to get some feedback if this idea is feasible. Imagine if you could create JIT plugins that create optimized assembly code for your graphics card processor (http://geekswithblogs.net/akraus1/archive/2006/06/23/82859.aspx).

    By the way your GregsUnsafeReverse does contain an error. Try a string with a-z and reverse it you will get
    zyxwvutsrqpomnlkjihgfedcba the mn characters are not switched.
    You need to use while (begin <= stop) to get a correct result.

    My own function does not beat yours but it is pretty close:

    public static unsafe string AloisAlgo(string s)
    {
    char* arr = stackalloc char[s.Length];

    int j= s.Length-1;
    fixed (char* start = s)
    {
    char* end = start + s.Length;
    char* current = start;
    while (current != end)
    {
    arr[j] = *current;
    j–;
    current++;
    }
    }
    return new string(arr,0,s.Length);
    }

    Yours,
    Alois Kraus

  8. Greg says:

    ok after reading up some documentation on various intel chips. ROTL has the same performance as SHL .. I can therefore assume that the following code would execute at the same speed as the ROTL

    public static unsafe string GregsInt32Reverse(string s) {
    UInt32 Low;
    UInt32 High;
    string ret = string.Copy(s);
    fixed (char* start = ret) {
    UInt32* begin = (UInt32*)start;
    UInt32* end = (UInt32*)(start + s.Length – 2);
    while (begin < end) {
    Low = *begin;
    Low = (Low) << 16;
    *end = Low;
    High = *end;
    High = (High) << 16;
    *begin = High;
    begin++;
    end–;
    }
    }
    return ret;
    }

    This code is about 20% faster than the simple unsafe on the machines where it came up being identical previously. Of course the problem is that its not actually reversing as its losing the overflow :) It wins for 20 byte, 80 byte, 800 byte, 8k, and 80k … now if only I could get a rotl.

    I am toying with adding an IL instruction to mono right now to offer a more definate proof that this optimization really exists (unless people would be willing to accept me copy/pasting the JIT generated assembly to MASM, alterring it and running it there?)

  9. Greg says:

    ROTL ROTL my kingdom for a ROTL

  10. Thanks for the additional commentary on this topic, Greg… However, I think we need to consider how often we actually reverse 80k strings in real programs… I think John’s 20 byte string is just a bit more real-world… ;)

  11. johnwood says:

    Oh, also my tests were with a 20 byte string :) I think at a point like this the competition becomes a little hard to win definitively :))

  12. Greg says:

    although you are right as well in that I could optimize a bit better too by removing the stop variable and just using begin < end

  13. Greg says:

    ok tested on another machine and they come out about the same ..

    80k string 10000 iterations

    Test : JohnsReverse took 3043694595.65126 ns, average ns = 304369.459565126
    Test : GregsInt32Reverse took 2963980647.19858 ns, average ns = 296398.064719858

    800 byte string 1000000 iterations.
    Test : JohnsReverse took 1610683234.92359 ns, average ns = 1610.68323492359
    Test : GregsInt32Reverse took 1678762222.35589 ns, average ns = 1678.76222235589

  14. Greg says:

    hmmm …

    Test : JohnsReverse took 42553687187.3528 ns, average ns = 425536.871873528
    Test : GregsInt32Reverse took 33369146730.09 ns, average ns = 333691.4673009
    Press any key to continue . . .

    in release with JIT optimizations … 80k string size … maybe I have bigger ratio of processor / memory speed?

     

    but you are right .. the shifts are expensive … using rotl/rotr on the register would remove this but I can’t for the life of me get the JIT to produce that code :)

  15. johnwood says:

    I’m not sure using Int32s would really speed it up that much given all the bit twiddling you have to do to reverse it.

    If I try this code:

    public static unsafe string JohnsReverse(string s)
    {
    string newcopyout = string.Copy(s);
    fixed (char* start = newcopyout)
    {
    char *en = (start + s.Length – 1);
    char *st = start;
    while (st {
    char old = *en;
    *en = *st;
    *st = old;
    st++;
    en–;
    }
    }
    return newcopyout;
    }

    … it runs quite a bit faster than yours (1.7s compared to 2.3s on 10mil iterations).

Leave a Reply

Your email address will not be published. Required fields are marked *

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>