Patrick Smacchia [MVP C#]

Sponsors

The Lounge

Wicked Cool Jobs

News

  • NDepend v3 is fully integrated in Visual Studio, and is now available for download! Software dependencies visualization, 82 .NET software metrics, continuous rule validations, assembly version diff, declarative code queries and more ! http://ndepend.com

Advertisement

Micro- optimization tips to increase performance

 

In a previous post Lesson learned from a real-world focus on performance, I explained how we increased the NDepend static analysis process performances with a 4 factor. The post quantified performance gain, and interestingly enough 25% came from micro-optimizations. Basically the lesson learned was:

 

  • Micro-optimizations are worth it while premature optimizations are the root of evil (as Donald Knuth said a long time ago). The difference between micro-optimizations and premature optimizations is that the first ones are driven by measurement while the second ones are driven by guesses, intuitions and hunches.

 

Today, I would like to enumerate some micro-optimization tips we found effective. Please, constantly keep in mind that these tips are not advices on how you should develop all your code.

  • Do your code your way,
  • then measure performance,
  • then apply micro-optimization tips where performance problems are detected.

Also as explained in the previous post, the bulk of performance gain will come anyway from better algorithms. Micro-optimizations should come after having rationalizing your algorithms. Keep the bang for buck  tenet in mind: Invest first your time and energy in things that will give you the best value.

 

 

Prefer for loops over foreach loops

I wrote a controversial post, claiming that using for loop instead of foreach loops has been proven to be effective for us. Try for your code and see if it also works for you.

 

 

Prefer array over List<T> if possible

In the loop post, I also explained that using arrays instead of List<T> is also a good way to enhance performance. This is not a surprise considering that on one hand the CLR and the IL are both optimized to work with arrays, and on the other hands, List<T> implementation relies on arrays. It is interesting to decompile List<T> and see that it comes with a _items : T[] field assigned each time that the capacity of the list is changing. Of course arrays length remains constant and this is why List<T> is sometime more suited. We even noticed that when doing intensive loops on a list, it might be worth getting a temporary array from the List<T> through the method List<T>.ToArray(), and then perform your loops on the array.

 

 

Refactor your algorithms to rely on hash table constant time search

The absolutely killer performance collections are hash tables, implemented through HashSet<T> and Dictionary<K,V>.  Here also I wrote a post on comparing different hash table implementations performance. Obviously, hash table should be used only when there is the need to look out for an object amongst many other objects. But with a bit of astute, it seems to me that most of algorithms can be rewritten in order to rely on the hash table magic which is: checking if a set contains a particular object, in a constant time (i.e independently from the size of the set) and perform a search operation in a set in a constant time. More explanation here.

 

Maybe the hash table trick is more an algorithm rewriting tips than a micro-optimization tips, but I really do think it is essential to spread the word about it.

 

 

Use CLR primitive types instead of typed data

The set of primitive type is pretty limited: integers types (int, short, ulong, byte…), float numbers types (double, float, decimal), boolean, string and char (maybe we could add also enumerations). But think about it: no matter how complex an object grape is, it is an aggregation of data stored with primitive types instances + maybe some unmanaged handles.

 

The fact is that both the CLR/JIT compiler and the IL language are super optimized to cop with primitive types. So it is performance wise to use a primitive types instead of a typed data. For example, in .NET paths are represented with simple string. This is something that really bothers me and for NDepend we created our own library to represent and manipulate path: NDepend.Helpers.FilePathDirectory. With this library paths are typed and frankly when doing intensive path operations this is a blessing. But if in the future we figure out that some specific algorithms relying on typed paths are costly, we won’t hesitate to temporarily represent paths with raw strings.

 

 

Use POCO instead of aggregated object

POCO stands for Plain Old CLR Object. A POCO is an object that doesn’t reference nor require complex infrastructure to be used. A complex infrastructure can be a database connection or a file handle. IMHO (and only In Humble My Opinion) a complex infrastructure can be any object that is not a CLR primitive type. A POCO is then an object that doesn’t use anything else that just CLR primitive types. As mentioned above, primitive types are optimized and having POCO is consequently better for performance. Another good advocate for POCO is that such objects increase also design and maintenance. Indeed, POCO avoid creating overly complex custom classes interactions.

 

Notes: There are debates in comments of this post about my quick definition of POCO. I won't play with words and, please, refer to the POCO wikipedia definition, which is not very different from what I wrote anyway.

 

 

Bufferize properties getter result in local variable

Here is an example:

 

 

void MethodNotOptimized(Order order) {
string firstName = order.Client.PersonName.FirstName;
string lastName = order.Client.PersonName.LastName;
}

void MethodOptimized(Order order) {
PersonName personName = order.Client.PersonName;
string firstName = personName.FirstName;
string lastName = personName.LastName;
}

 

 

No matter how smart is the JIT compiler, I doubt it does the analysis to see that the object returned by some property getters (here order.Client.PersonName) is constant. Of course here the performance gain is very small, it is just saving 2 getter/method calls. But browse your code and you’ll see that this tips can be applied also in many costly scenario, especially those involving loops, meaning you can save N getter/methods calls.

 

This example shows well that  unfortunately, micro-optimization often degrade the code elegance.

 

 

 

void MethodNotOptimized(Order order) {
foreach(Product product in order.Products) {
product.SetClientName(
order.Client.PersonName.FirstName,
order.Client.PersonName.LastName);
}
}

void MethodOptimized(Order order) {
PersonName personName = order.Client.PersonName;
string firstName = personName.FirstName;
string lastName = personName.LastName;
IList<Product> products = order.Products;
int productsCount = products.Count;
for(int i=0; i< productsCount; i++) {
Product product = products [ i ] ;
product.SetClientName(firstName, lastName);
}
}

 

 


 

 

Non-private fields can make sense

Indeed, and I know that this one might shock you. But even if the JIT compiler might be smart enough to do it for you, it can save a few CPU cycles to access directly a field instead of using a property. I personally use this trick scarcely, when I measured that this will indeed improve performance. And some algorithms like ranking computation spend 60% of the time reading and assigning fields.

 

If possible, use the readonly keyword if you have the chance to have an immutable field. For other scenario, I still protect my field accesses with a CQL rule that warns if the field is not read or assigned in specific scenario. And harnessing the possibility to declare CQL rule in source code is particularly interesting here:

 

 

using NDepend.Framework.CQL;

class Foo {
[CQLConstraintAttribute(
@"// <Name>m_Field should only be witten by FooUser.MethodWriter().</Name>
WARN IF Count > 0 IN SELECT METHODS WHERE
IsDirectlyWritingField ""Foo.m_Field"" AND
!NameIs ""FooUser.MethodWriter(Foo)""
", Active = true, DisplaySelectionViewInReport = true)]
internal int m_Field;
}

class FooUser {
internal void MethodWriter(Foo foo) {
foo = 3;
}
}

 

 

 

Test tier API to get the best of it

When performance is an issue, it can be worth to re-think about how your code is using tier API. I have a real-world example in mind. Once I wrote code that first checked if a file existed, and then read the file last write time. After testing, it appears to be cheaper to use a FileInfo object to perform these 2 operations. I suppose that the FileInfo class bufferises some information to avoid some sort of path operation the second time.

 

using System.IO;

void MethodNotOptimized(string path) {
if(File.Exists(path)) {
DateTime lastWriteTime = File.GetLastWriteTime(path);
}
}

void MethodOptimized(string path) {
FileInfo fileInfo = new FileInfo(path);
if(fileInfo.Exists(path)) {
DateTime lastWriteTime = fileInfo.LastWriteTime;
}
}

 

 

Know which branches of your code are the most used

And this last example illustrates an important thing: know which branches of your code is the most used. In this previous example, I knew that most of the time the file was supposed to exist. Thus I didn’t even have to test if creating a FileInfo object and then check for the file existence was more costly than just calling File.Exists().


Posted Sun, Apr 19 2009 2:28 PM by Patrick Smacchia

[Advertisement]

Comments

DotNetShoutout wrote Micro- optimization tips to increase performance - Patrick Smacchia - CodeBetter.Com
on Sun, Apr 19 2009 9:55 AM

Thank you for submitting this cool story - Trackback from DotNetShoutout

Jason Haley wrote Interesting Finds: April 19, 2009
on Sun, Apr 19 2009 10:04 AM

Interesting Finds: April 19, 2009

Khaja Minhajuddin wrote re: Micro- optimization tips to increase performance
on Sun, Apr 19 2009 12:26 PM

Great article and very good tips. However, do you think we really need to invest our time in micro optimizations when throwing some more memory and CPU cycles would have the same effect. Do you think in Today's age of cheap computing power, Micro Optimizations are still needed?

Patrick Smacchia [MVP C#] wrote Do we need Micro-Optimization?
on Sun, Apr 19 2009 1:35 PM

I just posted some micro-optimization tricks I found efficient and I got the following question: Khaja

Omer Mor wrote re: Micro- optimization tips to increase performance
on Sun, Apr 19 2009 2:49 PM

I believe that when you know that most of the time the file will exist, then you can just try to get the LastWriteTime, and catch the relevant exception if it doesn't. Checking for existence is cheaper in the exceptional case that the file is missing, but if it is rare than most the time the try-catch method will be faster (and better looking IMO).

Patrick Smacchia wrote re: Micro- optimization tips to increase performance
on Sun, Apr 19 2009 3:02 PM

Omer, it might be indeed cheaper but IMHO the less exception my code provokes the better it is, then I still prefer the original proposed solution.

Liam McLennan wrote re: Micro- optimization tips to increase performance
on Sun, Apr 19 2009 4:59 PM

Way to redefine POCO. A car is a mechanical device that moves around. A giraffe also moves around. Therefore a car is a giraffe.

Jeremy Gray wrote re: Micro- optimization tips to increase performance
on Sun, Apr 19 2009 5:55 PM

@Patrick - I'm with Liam on this one. Good post in general (except for the nasty Demeter violations in your code samples) but please do refrain from redefining POCO. The term POCO has a very clear history (on the Java side the originating term POJO has been around for ages) and in its most common usage represents any class that has not been forced to inherit from a particular base class to be able to interoperate with a tool or framework. The term has nothing to do with types of members.

Rory Primrose wrote re: Micro- optimization tips to increase performance
on Sun, Apr 19 2009 11:50 PM

I think you can easily get a lot of heat in suggesting that everyone should code using for instead of foreach as it is a small performance benefit. All these micro-optimizations help though and it's not a big ask when it comes down to a slight change in coding style. I actually code in foreach because it is so easy, but then use R# to convert it to a for statement.

The foreach statement also normally stuffs up MSTest code coverage because the enumerator usually doesn't implement IDisposable and so the ((IDisposable)enumerator).Dispose() block is never hit.

You make a statement in "Prefer array over List<T> if possible" about converting the collection to an array and then running a foreach. Need to be careful on this one because the implementation of ToArray/CopyTo methods on type may rely on a foreach internally. A foreach should only be changed into a for statement if the type exposes an indexer property or when the array copy implementation doesn't rely on foreach.

For example, I did this exact change when loop though ConfigurationElementCollection but then found that ConfigurationElementCollection.CopyTo does a foreach under the covers. To call ToArray/CopyTo and then run a for in these cases will make performance worse as it will run a for after a foreach. It would be cheaper to stick with just the foreach in this case.

Also, in your optimized "Bufferize properties getter result in local variable" sample, you can inline firstName and lastName variables so the result isn't that much more code than the original.

Patrick Smacchia wrote re: Micro- optimization tips to increase performance
on Mon, Apr 20 2009 3:34 AM

Ok, concerning POCO I refer to the Wikipedia definition which is not very different than mine anyway, I just propose even more restriction.

en.wikipedia.org/.../POCO

Concerning foreach there are huge debate that I will close with a simple advice: in my code, using for instead of foreach slightly enhanced performance. Just try by yourself with a profiler and see if you can get better performance.

Concerning copy to array to get faster loop later, I am talking of List<T>CopyToArray(). Decompile it by yourself and you'll see it ends up calling a CLR unmanaged method that is certainly ultra-optimized.

internal static extern void Array.Copy(...).

Concerning the Demeter violations, very often micro-optimization makes code less elegant and break design, I agree (like public fields for example). It is up to you: break design and put careful comments to clearly inform that it is in the name of performance.

Per Erik Stendahl wrote re: Micro- optimization tips to increase performance
on Mon, Apr 20 2009 4:11 AM

for vs foreach: foreach is just so much more readable and less errorprone that I'm never going to write for instead of foreach just to save cycles.

I think the correct solution is to write to Microsoft and have them fix the compiler. Less preferable would be IL-rewriting tool in the postbuild event.

Many of these optimizations really should be handled by tools.

Rik Hemsley wrote re: Micro- optimization tips to increase performance
on Mon, Apr 20 2009 6:35 AM

The only situation I can imagine using _any_ of these is when writing something which does heavy number crunching and finding the compiler isn't allowing me to squeeze every last cycle out.

If I saw any of these in normal code, I'd be very, very angry.

Rik Hemsley wrote re: Micro- optimization tips to increase performance
on Mon, Apr 20 2009 6:36 AM

Regarding the captcha thing: What happens if I can't use a pointing device and I want to post a comment?

allwarr wrote re: Micro- optimization tips to increase performance
on Mon, Apr 20 2009 7:57 AM

None of these optimalizations make sense in 99.9% of any code. Really surprised, someone can spend so much time on it.

Bernhard Hofmann wrote re: Micro- optimization tips to increase performance
on Mon, Apr 20 2009 8:25 AM

Optimisation is always going to bring out debate on it's worth in terms of the benefits for the time spent in the profile-change-profile-compare loop. I agree wholeheartedly with optimising code, but I'd like to add that it is often not worth my employer's money. Often the code we write runs significantly faster than the IO we perform, and as a result the benefits of micro-optimisation are not seen or appreciated. I would value multi-threading and parallel processing far more than making code less maintainable in order to save a few CPU cycles. The above comes from my personal experience with web services, web sites, Windows services, and Windows applications so it's all IMHO of course. ;)

Patrick Smacchia wrote re: Micro- optimization tips to increase performance
on Mon, Apr 20 2009 9:08 AM

My experience showed that you can expect 10 to 25% better perf with such tricks.  It is not elegant,  it is time consuming,  the result is uncertain,  I completely agree but if you are in the case where performance matters a lot and you already improved your algorithm, there is no other choices.

Concerning optimization with parallelization my experience is that it is much much more complicated than just spawning new threads, even in the ideal case where threads doesn't share any states and doesn't do any rendez-vous.  Soon programmers will learn that harnessing efficiently several cores is much more difficult than expected, mainly because of low-level memory/cache access contention.

Mike Strobel wrote re: Micro- optimization tips to increase performance
on Tue, Apr 21 2009 12:05 PM

I'll agree with you that hash-based collections, particularly HashSet<T>, is a great idea whenever you frequently need to check for arbitrary values.  HashSet<T> is a particularly wonderful collection type, as it exposes very performant set-based operations like UnionWith, IntersectWith, and ExceptWith.  Definition disagreements aside, I also agree on the merits of using POCO.

I always recommend caching property values locally, but not necessarily for performance--this is a good practice in general, particularly in cases where a property value may change between reads (and the algorithm expects an unchanged value).

On the rest, I tend to disagree.  Unless you are iterating over huge (and I do mean *huge* collections, with tens of thousands of elements), I see no performance justification for falling back on arrays and 'for' loops.  There are a few reasons for this.  First, arrays aren't particularly safe--they have the same variance rules as in Java, which break type safety.  Using a List<T> to wrap an array eliminates this problem.  I also generally advise against exposing writable collection, but that's another discussion altogether.  Regarding 'for' loops, I find them to be less readable and generally more troublesome to maintain, particularly if you introduce more than one control variable.  The combination of foreach loops and iterators was included in large part because the performance hit (in most cases) *is* minimal, and the semantics are easier to read.  Iterators may also be suspended and resumed easily.  With the introduction of the LINQ APIs in .NET 3.5, we now have a very powerful query model that works against any collection types, and which may be optimized for specific types of collections.  Complex queries which previously required several nested loops can now be written with a single query, and the performance cost in most scenarios isn't worth considering.

Non-private fields are okay in internal code, but I would advise against exposing any public fields in your APIs.  Exposed fields on internal types can be particularly useful in cases where you need to perform atomic operations on shared data using CompareExchange or any of the other Interlocked methods.  While this is fine for internal code, exposing public fields generally violates established design standards.  And some of us do care about standards--we have a pretty nice ecosystem in .NET, so let's not pollute it with oddities if we can avoid it, eh? :)

As for using value types, this can be a mixed bag.  It really depends on the context.  If you are working with these types in isolation, then you may certainly see a performance benefit.  However, if you are frequently passing these values as method parameters, you may actually see performance degradation, as the value gets copied on each call.  Naturally, the appropriateness of value types boils down to how many reference types you would otherwise be instantiating vs. how many copy-by-value operations you might end up performing.  Copy-by-value be avoided by using 'ref' parameters, but I recommend against it because C# has no notion of const ref parameters.

If you're really concerned about performance in the CLR, I recommend spending some time with C++/CLI.  C++/CLI provides some nifty constructs for improving performance, which I have leveraged in "bottleneck" code.  These include const ref parameters, forced inlining of method and properties, better control over boxing and unboxing, and better interop performance.  Using arrays and 'for' loops in C++/CLI is more forgivable in my opinion, as the language has no support for custom iterators anyway, and it's a common enough practice in C++ that you're not going to surprise or annoy too many people.

Anyhoo, I hope I didn't come across as too condescending--on the whole, I think you wrote a good post, and I'm always interested to see people discussing performance issues.

All the best,

Mike

Patrick Smacchia wrote re: Micro- optimization tips to increase performance
on Wed, Apr 22 2009 4:57 AM

Mike, thanks for all these precision. You underlined well the trade-off: perf vs. design/elegance.

And sure, C++/CLI is an appealing option when it comes to performance of .NET code.

Anonymous wrote re: Micro- optimization tips to increase performance
on Wed, Apr 22 2009 8:53 AM

I don't see how using for is such a big deal or is "harder", just use VS snippets.  I like for because it is very tweakable via the params and to me I like the syntax of it.

Atut wrote re: Micro- optimization tips to increase performance
on Thu, Apr 30 2009 12:29 AM

Better you increase readability of code snippets. It wraps statements and also have HTML markup admist it.

on Sun, Feb 14 2010 8:57 AM

C# is not C

Add a Comment

(required)  
(optional)
(required)  
Remember Me?
Devlicio.us