Micro-optimization tips to increase performance

 

In a previous post, Lesson learned from a real-world focus on performance, I explained how we improved the performance of the NDepend static analysis process by a factor of 4. The post quantified the performance gain, and interestingly enough, 25% of it came from micro-optimizations. Basically, the lesson learned was:

 

  • Micro-optimizations are worth it, while premature optimizations are the root of all evil (as Donald Knuth said a long time ago). The difference between micro-optimizations and premature optimizations is that the former are driven by measurement while the latter are driven by guesses, intuitions and hunches.

 

Today, I would like to enumerate some micro-optimization tips we found effective. Please keep constantly in mind that these tips are not advice on how you should develop all your code:

  • Write your code your way,
  • then measure performance,
  • then apply micro-optimization tips where performance problems are detected.

Also, as explained in the previous post, the bulk of the performance gain will come from better algorithms anyway. Micro-optimizations should come after having rationalized your algorithms. Keep the bang-for-the-buck tenet in mind: invest your time and energy first in things that will give you the best value.

 

 

Prefer for loops over foreach loops

I wrote a controversial post claiming that using for loops instead of foreach loops has proven to be effective for us. Try it on your code and see if it also works for you.
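
For illustration, here is a minimal sketch (the method names and the summation scenario are mine, purely hypothetical) showing the same loop written both ways. Only a profiler run on your own code can tell whether the difference is worth it.

using System.Collections.Generic;

static int SumForeach(List<int> values) {
    int sum = 0;
    foreach(int value in values) {
        sum += value;
    }
    return sum;
}

static int SumFor(List<int> values) {
    int sum = 0;
    // Hoist Count out of the loop condition so it is read only once.
    int count = values.Count;
    for(int i = 0; i < count; i++) {
        sum += values[i];
    }
    return sum;
}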

 

 

Prefer array over List<T> if possible

In the loop post, I also explained that using arrays instead of List<T> is a good way to enhance performance. This is not a surprise considering that, on one hand, the CLR and IL are both optimized to work with arrays, and on the other hand, the List<T> implementation relies on arrays. It is interesting to decompile List<T> and see that it comes with a _items : T[] field that is reassigned each time the capacity of the list changes. Of course, an array's length remains constant, and this is why List<T> is sometimes better suited.
We even noticed that when doing intensive loops over a list, it might be worth getting a temporary array from the List<T> through the method List<T>.ToArray(), and then performing the loops on the array.
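
Here is a minimal sketch of the ToArray() trick mentioned above (the multi-pass summation scenario is hypothetical): when the same List<T> is scanned many times, copying it once to an array and looping over the array can pay off.

using System.Collections.Generic;

static long SumManyPasses(List<int> values, int passes) {
    int[] array = values.ToArray();   // one copy up front through List<T>.ToArray()
    long total = 0;
    for(int pass = 0; pass < passes; pass++) {
        for(int i = 0; i < array.Length; i++) {
            total += array[i];
        }
    }
    return total;
}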

 

 

Refactor your algorithms to rely on hash table constant-time search

The absolute killer-performance collections are hash tables, implemented through HashSet<T> and Dictionary<K,V>. Here also I wrote a post comparing the performance of different hash table implementations. Obviously, a hash table should be used only when there is a need to look for an object amongst many other objects. But with a bit of astuteness, it seems to me that most algorithms can be rewritten to rely on the hash table magic, which is: checking whether a set contains a particular object, and searching for an object in a set, both in constant time (i.e., independently of the size of the set). More explanation here.
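
As an illustration, here is a minimal sketch (it reuses the Order/Client names from the samples further down, but the filtering scenario is hypothetical) that replaces a linear scan with a HashSet<T> membership test:

using System.Collections.Generic;

static List<Order> OrdersOfClients(IEnumerable<Order> orders, IEnumerable<Client> clients) {
    // Build the set once; Contains() then runs in constant time on average,
    // instead of scanning a list on every lookup.
    HashSet<Client> clientSet = new HashSet<Client>(clients);
    List<Order> result = new List<Order>();
    foreach(Order order in orders) {
        if(clientSet.Contains(order.Client)) {
            result.Add(order);
        }
    }
    return result;
}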

 

Maybe the hash table trick is more an algorithm-rewriting tip than a micro-optimization tip, but I really do think it is essential to spread the word about it.

 

 

Use CLR primitive types instead of typed data

The set of primitive types is pretty limited: integer types (int, short, ulong, byte…), floating-point and decimal number types (double, float, decimal), boolean, string and char (maybe we could also add enumerations). But think about it: no matter how complex an object graph is, it is an aggregation of data stored in primitive type instances, plus maybe some unmanaged handles.

 

The fact is that both the CLR/JIT compiler and the IL language are highly optimized to cope with primitive types. So, performance-wise, it pays to use primitive types instead of typed data. For example, in .NET, paths are represented with simple strings. This is something that really bothers me, and for NDepend we created our own library to represent and manipulate paths: NDepend.Helpers.FilePathDirectory. With this library paths are typed, and frankly, when doing intensive path operations this is a blessing. But if in the future we figure out that some specific algorithms relying on typed paths are costly, we won't hesitate to temporarily represent paths with raw strings.
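
As a purely hypothetical sketch (the FilePath type below is a made-up stand-in, not the actual NDepend.Helpers API), this is what falling back to the underlying primitive can look like in a hot loop:

using System;
using System.Collections.Generic;

sealed class FilePath {
    public readonly string Value;                 // the underlying primitive string
    public FilePath(string value) { Value = value; }
}

static int CountUnderDirectory(List<FilePath> paths, string directoryPrefix) {
    int count = 0;
    for(int i = 0; i < paths.Count; i++) {
        // Work directly on the raw string inside the intensive loop,
        // instead of going through richer typed-path operations.
        if(paths[i].Value.StartsWith(directoryPrefix, StringComparison.OrdinalIgnoreCase)) {
            count++;
        }
    }
    return count;
}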

 

 

Use POCO instead of aggregated objects

POCO stands for Plain Old CLR Object. A POCO is an object that doesn't reference or require complex infrastructure to be used. A complex infrastructure can be a database connection or a file handle. IMHO (and only In My Humble Opinion), a complex infrastructure can be any object that is not a CLR primitive type. A POCO is then an object that doesn't use anything other than CLR primitive types. As mentioned above, primitive types are optimized, and having POCO is consequently better for performance. Another good argument for POCO is that such objects also improve design and maintainability. Indeed, POCO avoids creating overly complex interactions between custom classes.
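
To make the distinction concrete, here is a tiny hypothetical sketch (both class names are made up): the first class is a POCO in the sense used here, the second drags infrastructure along.

class ClientInfo {                                // POCO in the sense used here: primitive data only
    public string Name;
    public int Age;
}

class ClientRecord {                              // not a POCO in that sense:
    public string Name;                           // it carries a database connection around
    public System.Data.IDbConnection Connection;
}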

 

Note: there are debates in the comments of this post about my quick definition of POCO. I won't play with words; please refer to the POCO Wikipedia definition, which is not very different from what I wrote anyway.

 

 

Buffer property getter results in a local variable

Here is an example:

 

 

void MethodNotOptimized(Order order) {
    string firstName = order.Client.PersonName.FirstName;
    string lastName = order.Client.PersonName.LastName;
}

void MethodOptimized(Order order) {
    PersonName personName = order.Client.PersonName;
    string firstName = personName.FirstName;
    string lastName = personName.LastName;
}

 

 

No matter how smart the JIT compiler is, I doubt it does the analysis required to see that the object returned by some property getters (here order.Client.PersonName) is constant. Of course, here the performance gain is very small: it just saves two getter/method calls. But browse your code and you'll see that this tip can also be applied in many costlier scenarios, especially those involving loops, meaning you can save N getter/method calls.

 

This example shows well that, unfortunately, micro-optimization often degrades code elegance.

 

 

 

void MethodNotOptimized(Order order) {
    foreach(Product product in order.Products) {
        product.SetClientName(
            order.Client.PersonName.FirstName,
            order.Client.PersonName.LastName);
    }
}

void MethodOptimized(Order order) {
    PersonName personName = order.Client.PersonName;
    string firstName = personName.FirstName;
    string lastName = personName.LastName;
    IList<Product> products = order.Products;
    int productsCount = products.Count;
    for(int i = 0; i < productsCount; i++) {
        Product product = products[i];
        product.SetClientName(firstName, lastName);
    }
}

 

 


 

 

Non-private fields can make sense

Indeed, and I know that this one might shock you. But even if the JIT compiler might be smart enough to do it for you, accessing a field directly instead of going through a property can save a few CPU cycles. I personally use this trick sparingly, only when I have measured that it will indeed improve performance. And some algorithms, like our ranking computation, spend 60% of their time reading and assigning fields.
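
Here is a minimal hypothetical sketch of the idea (the Node type and the ranking scenario are made up): a hot loop that reads and writes an internal field directly instead of going through a property on every iteration.

class Node {
    internal double m_Rank;                       // non-private on purpose, for the hot path
}

static void ScaleRanks(Node[] nodes, double factor) {
    for(int i = 0; i < nodes.Length; i++) {
        nodes[i].m_Rank *= factor;                // direct field access, no getter/setter call
    }
}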

 

If possible, use the readonly keyword when you have the chance to have an immutable field. For other scenarios, I still protect my field accesses with a CQL rule that warns if the field is read or assigned outside of the intended scenarios. And harnessing the possibility to declare a CQL rule in source code is particularly interesting here:

 

 

using NDepend.Framework.CQL;

class Foo {
    [CQLConstraintAttribute(
    @"// <Name>m_Field should only be written by FooUser.MethodWriter().</Name>
    WARN IF Count > 0 IN SELECT METHODS WHERE
    IsDirectlyWritingField ""Foo.m_Field"" AND
    !NameIs ""FooUser.MethodWriter(Foo)""
    ", Active = true, DisplaySelectionViewInReport = true)]
    internal int m_Field;
}

class FooUser {
    internal void MethodWriter(Foo foo) {
        foo.m_Field = 3;
    }
}

 

 

 

Test third-party APIs to get the best out of them

When performance is an issue, it can be worth rethinking how your code uses third-party APIs. I have a real-world example in mind. Once, I wrote code that first checked whether a file existed, and then read the file's last write time. After testing, it appeared to be cheaper to use a FileInfo object to perform these two operations. I suppose that the FileInfo class caches some information to avoid some sort of path operation the second time.

 

using System;
using System.IO;

void MethodNotOptimized(string path) {
    if(File.Exists(path)) {
        DateTime lastWriteTime = File.GetLastWriteTime(path);
    }
}

void MethodOptimized(string path) {
    FileInfo fileInfo = new FileInfo(path);
    if(fileInfo.Exists) {
        DateTime lastWriteTime = fileInfo.LastWriteTime;
    }
}

 

 

Know which branches of your code are the most used

And this last example illustrates an important thing: know which branches of your code are the most used. In the previous example, I knew that most of the time the file was supposed to exist. Thus, I didn't even have to test whether creating a FileInfo object and then checking for the file's existence was more costly than just calling File.Exists().

  • Atut

    Better to increase the readability of the code snippets. They wrap statements and also have HTML markup amidst them.

  • Anonymous

    I don’t see how using for is such a big deal or is “harder”, just use VS snippets. I like for because it is very tweakable via the params and to me I like the syntax of it.

  • http://codebetter.com/members/Patrick-Smacchia/default.aspx Patrick Smacchia

    Mike, thanks for all these clarifications. You underlined the trade-off well: perf vs. design/elegance.

    And sure, C++/CLI is an appealing option when it comes to performance of .NET code.

  • Mike Strobel

    I’ll agree with you that hash-based collections, particularly HashSet, are a great idea whenever you frequently need to check for arbitrary values. HashSet is a particularly wonderful collection type, as it exposes very performant set-based operations like UnionWith, IntersectWith, and ExceptWith. Definition disagreements aside, I also agree on the merits of using POCO.

    I always recommend caching property values locally, but not necessarily for performance–this is a good practice in general, particularly in cases where a property value may change between reads (and the algorithm expects an unchanged value).

    On the rest, I tend to disagree. Unless you are iterating over huge (and I do mean *huge*) collections, with tens of thousands of elements, I see no performance justification for falling back on arrays and ‘for’ loops. There are a few reasons for this. First, arrays aren’t particularly safe–they have the same variance rules as in Java, which break type safety. Using a List to wrap an array eliminates this problem. I also generally advise against exposing writable collections, but that’s another discussion altogether. Regarding ‘for’ loops, I find them to be less readable and generally more troublesome to maintain, particularly if you introduce more than one control variable. The combination of foreach loops and iterators was included in large part because the performance hit (in most cases) *is* minimal, and the semantics are easier to read. Iterators may also be suspended and resumed easily. With the introduction of the LINQ APIs in .NET 3.5, we now have a very powerful query model that works against any collection types, and which may be optimized for specific types of collections. Complex queries which previously required several nested loops can now be written with a single query, and the performance cost in most scenarios isn’t worth considering.

    Non-private fields are okay in internal code, but I would advise against exposing any public fields in your APIs. Exposed fields on internal types can be particularly useful in cases where you need to perform atomic operations on shared data using CompareExchange or any of the other Interlocked methods. While this is fine for internal code, exposing public fields generally violates established design standards. And some of us do care about standards–we have a pretty nice ecosystem in .NET, so let’s not pollute it with oddities if we can avoid it, eh? :)

    As for using value types, this can be a mixed bag. It really depends on the context. If you are working with these types in isolation, then you may certainly see a performance benefit. However, if you are frequently passing these values as method parameters, you may actually see performance degradation, as the value gets copied on each call. Naturally, the appropriateness of value types boils down to how many reference types you would otherwise be instantiating vs. how many copy-by-value operations you might end up performing. Copy-by-value can be avoided by using ‘ref’ parameters, but I recommend against it because C# has no notion of const ref parameters.

    If you’re really concerned about performance in the CLR, I recommend spending some time with C++/CLI. C++/CLI provides some nifty constructs for improving performance, which I have leveraged in “bottleneck” code. These include const ref parameters, forced inlining of method and properties, better control over boxing and unboxing, and better interop performance. Using arrays and ‘for’ loops in C++/CLI is more forgivable in my opinion, as the language has no support for custom iterators anyway, and it’s a common enough practice in C++ that you’re not going to surprise or annoy too many people.

    Anyhoo, I hope I didn’t come across as too condescending–on the whole, I think you wrote a good post, and I’m always interested to see people discussing performance issues.

    All the best,
    Mike

  • http://www.NDepend.com Patrick Smacchia

    My experience showed that you can expect 10 to 25% better perf with such tricks. It is not elegant, it is time consuming, the result is uncertain, I completely agree, but if you are in the case where performance matters a lot and you have already improved your algorithm, there is no other choice.

    Concerning optimization with parallelization, my experience is that it is much, much more complicated than just spawning new threads, even in the ideal case where threads don't share any state and don't do any rendezvous. Soon programmers will learn that harnessing several cores efficiently is much more difficult than expected, mainly because of low-level memory/cache access contention.

  • Bernhard Hofmann

    Optimisation is always going to bring out debate on its worth in terms of the benefits for the time spent in the profile-change-profile-compare loop. I agree wholeheartedly with optimising code, but I’d like to add that it is often not worth my employer’s money. Often the code we write runs significantly faster than the IO we perform, and as a result the benefits of micro-optimisation are not seen or appreciated. I would value multi-threading and parallel processing far more than making code less maintainable in order to save a few CPU cycles. The above comes from my personal experience with web services, web sites, Windows services, and Windows applications so it’s all IMHO of course. ;)

  • allwarr

    None of these optimizations makes sense in 99.9% of any code. Really surprised someone can spend so much time on it.

  • http://rikkus.info Rik Hemsley

    Regarding the captcha thing: What happens if I can’t use a pointing device and I want to post a comment?

  • http://rikkus.info Rik Hemsley

    The only situation I can imagine using _any_ of these is when writing something which does heavy number crunching and finding the compiler isn’t allowing me to squeeze every last cycle out.

    If I saw any of these in normal code, I’d be very, very angry.

  • Per Erik Stendahl

    for vs foreach: foreach is just so much more readable and less error-prone that I’m never going to write for instead of foreach just to save cycles.

    I think the correct solution is to write to Microsoft and have them fix the compiler. Less preferable would be IL-rewriting tool in the postbuild event.

    Many of these optimizations really should be handled by tools.

  • http://www.NDepend.com Patrick Smacchia

    Ok, concerning POCO I refer to the Wikipedia definition, which is not very different from mine anyway; I just propose even more restrictions.
    http://en.wikipedia.org/wiki/POCO

    Concerning foreach, there is a huge debate that I will close with a simple piece of advice: in my code, using for instead of foreach slightly enhanced performance. Just try it yourself with a profiler and see if you can get better performance.

    Concerning copying to an array to get faster loops later, I am talking about List<T>.ToArray(). Decompile it yourself and you’ll see it ends up calling a CLR unmanaged method that is certainly ultra-optimized:
    internal static extern void Array.Copy(…).

    Concerning the Demeter violations: very often micro-optimization makes code less elegant and breaks design, I agree (like public fields for example). It is up to you: break design and put careful comments to clearly state that it is in the name of performance.

  • http://www.neovolve.com Rory Primrose

    I think you can easily get a lot of heat in suggesting that everyone should code using for instead of foreach as it is a small performance benefit. All these micro-optimizations help though and it’s not a big ask when it comes down to a slight change in coding style. I actually code in foreach because it is so easy, but then use R# to convert it to a for statement.

    The foreach statement also normally stuffs up MSTest code coverage because the enumerator usually doesn’t implement IDisposable and so the ((IDisposable)enumerator).Dispose() block is never hit.

    You make a statement in “Prefer array over List<T> if possible” about converting the collection to an array and then running a loop over it. Need to be careful on this one because the implementation of the ToArray/CopyTo methods on a type may rely on a foreach internally. A foreach should only be changed into a for statement if the type exposes an indexer property or when the array copy implementation doesn’t rely on foreach.

    For example, I did this exact change when looping through a ConfigurationElementCollection, but then found that ConfigurationElementCollection.CopyTo does a foreach under the covers. Calling ToArray/CopyTo and then running a for in these cases will make performance worse, as it will run a for after a foreach. It would be cheaper to stick with just the foreach in this case.

    Also, in your optimized “Buffer property getter results in a local variable” sample, you can inline the firstName and lastName variables so the result isn’t that much more code than the original.

  • Jeremy Gray

    @Patrick – I’m with Liam on this one. Good post in general (except for the nasty Demeter violations in your code samples) but please do refrain from redefining POCO. The term POCO has a very clear history (on the Java side the originating term POJO has been around for ages) and in its most common usage represents any class that has not been forced to inherit from a particular base class to be able to interoperate with a tool or framework. The term has nothing to do with types of members.

  • http://hackingon.net Liam McLennan

    Way to redefine POCO. A car is a mechanical device that moves around. A giraffe also moves around. Therefore a car is a giraffe.

  • http://www.NDepend.com Patrick Smacchia

    Omer, it might indeed be cheaper, but IMHO the fewer exceptions my code provokes the better, so I still prefer the originally proposed solution.

  • Omer Mor

    I believe that when you know that most of the time the file will exist, then you can just try to get the LastWriteTime, and catch the relevant exception if it doesn’t. Checking for existence is cheaper in the exceptional case where the file is missing, but if that is rare then most of the time the try-catch method will be faster (and better looking IMO).

  • http://twitter.com/minhajuddin Khaja Minhajuddin

    Great article and very good tips. However, do you think we really need to invest our time in micro-optimizations when throwing more memory and CPU cycles at the problem would have the same effect? Do you think that, in today’s age of cheap computing power, micro-optimizations are still needed?