Patrick Smacchia [MVP C#]

Sponsors

The Lounge

Advertisement

Images in this post missing? We recently lost them in a site migration. We're working to restore these as you read this. Should you need an image in an emergency, please contact us at imagehelp@codebetter.com
Lessons learned from a real-world focus on performance

 
I am glad to announce that we just released a new version of NDepend where the analysis phase duration has been divided by 4 and the memory consumption has been divided by 2.

 

An interesting question is: Does such a massive performance gain means that it was badly coded the first time? Previous slower versions met several thousands of functional requirement needed to analyze properly any .NET application. In this sense, it was nicely coded. But still, there were room for improvement and my believe is that there is always room for better performance. Even on the current much faster version we identified several significant optimizations to be done. The downside is that this will require a lot of work. So the first lesson learned is that there is always room for better performance. We can complete this rule with the fact that the amount of work to make a program run faster grows exponentially with the gain expected.

 

We quantified each performance gain source and we obtained this chart. As said we have a 75% gain obtained thanks to algorithm optimization, micro-optimization and parallelization. The current analysis duration (25% of the original analysis duration) can be split between 17% of the time spent in our code, and 8% of the time spent in tier code. In our case, tier code is the code of Mono.Cecil, used to build some object models of the assemblies analyzed.

 

 

 

 

The bulk of performance was obtained with algorithm improvement and this is not a surprise. IMHO the surprise comes from the relatively small gain obtained from parallelization. Indeed, we did some massive refactoring to parallelized the analysis with the parallel monologue idea in mind. Now each assembly is analyzed as a task that doesn’t have any synchronization need with any other task. In other words each task uses its own states that are not visible from the other tasks. Unfortunately this requirement is not enough to ensure proper scaling on the number of processor. While we get the 15% gain from between 1 and 2 processors, the gain is almost zero between 2 and 4 processors. We identified some potential IO contentions and memory issues that will require more attention in the future. This leads to another lesson: Don’t expect that scaling on many processors will be something easy, even if you don’t share states and don’t use synchronization.

 

When it comes to enhancing performance there is only one way to do things right: measure and focus your work on the part of the code that really takes the bulk of time, not the one that you think takes the bulk of time. I have a quick real-world illustration. Personally I have always been amazed by the performance of the C# and VB.NET compilers. Imagine, it can compile in a few seconds thousands of source files that took dozens or even hundreds of man-year to be written. At analysis time, NDepend needs also to parse sources files but it does it partially to only obtain comment and Cyclomatic Complexity  metrics. By measuring file loading time we had a good surprise: loading source file in memory is much cheaper than expected. For example, to load the 1728 C# sources files of NDepend it takes less than 0.2 seconds on the 13 seconds needed for a complete analysis of its own code. Now, knowing that it is almost free to load the source files in memory I am a bit less impressed by the performance of the C# and VB.NET compilers. And it leads to an important lesson: NEVER anticipate the performance cost, ALWAYS measure it. And the only professional way to measure is to rely on performance profiler tools. Actually, we use both dotTrace from JetBrain and Red-Gate ANTS Profiler and are happy with it.

 

Something important concerning the measure of code performance is the percentage of time spent inside tier code, things like DB or network access. As we explained in the previous chart, in our case study 35% of the analysis time is spent inside the Cecil code. We can infer from this number that if we want to have an analysis duration divided by 2 in the future, our code will need to run 4.3 faster! ((100 – 35) / (50 -35) = 4,3). This clearly shows the importance of another lesson: assess your limits by measuring the percentage of time spent in tier code.

 

In the dissection of performance gain, an interesting point is the 15% gain obtained from micro-optimization. I mean things like using the right kind of loop, using the right kind of collection, preferring to deal with primitive CLR objects as int and string (POCO idea), choosing carefully between structures and classes, buffering properties getter results into local variables or even exposing non-private fields. I could summarized the lesson learned by the fact that micro-optimizations are worth it while premature optimizations are the root of evil (as Donald Knuth said a long time ago). The difference between micro-optimizations and premature optimizations is that the first ones are driven by measurement while the second ones are driven by guesses, intuitions and hunches.


Something not quantified in our performance gain is the fact that memory consumption has been roughly divided by 2. Less memory doesn’t necessarily means less managed objects but in our particular case we do allocate less objects. The nice consequence is that less objects means less time spent in the GC. Less memory also means less virtual memory page fault, which is IMHO the worst thing when it comes to polish performances of a program.

 

Finally, we are now experiencing a great advantage of optimization: automatic tests naturally ran much faster and it is a good motivation to run them more often!

 

 

 

I end up with an illustration of the tenet there is always room for better performance: the Amiga demo scene in which I participated in the early nineties. Every demos ran on a constant hardware. It was the perfect incentive for demo developers to produce code more and more optimized. During several years, every months some record were beaten, like the number of 3D polygons, the number of sprites or the number of dots displayed simultaneously at the rate of 50 frames per seconds. As far as I can estimate, the performance factor obtained in a few years was something around x50! Imagine what it means, running in one second a computation that took initially a minute. And as far as I know this massive gain was the result of both better algorithms (with many pre-computations and delegations to sub-chip) and micro-optimizations at assembly language level (better use of the chip registers, better use of the set of instructions...).

 

 


Posted 12-01-2008 10:31 AM by Patrick Smacchia

[Advertisement]

Comments

Dew Drop - December 1, 2008 | Alvin Ashcraft's Morning Dew wrote Dew Drop - December 1, 2008 | Alvin Ashcraft's Morning Dew
on 12-01-2008 9:30 AM

Pingback from  Dew Drop - December 1, 2008 | Alvin Ashcraft's Morning Dew

jon wrote re: Lessons learned from a real-world focus on performance
on 12-01-2008 10:01 AM

you forgot

Migrate to C++ gain 99%

Patrick Smacchia wrote re: Lessons learned from a real-world focus on performance
on 12-01-2008 12:54 PM

I am not sure it is that easy Jon, I used to do "a lot" of C++ before .NET.

For sure you can get a non-neglectable gain in performance with C++ but for sure also you'll need much much more time to achieve the same set of features.

Sebastien Pouliot wrote re: Lessons learned from a real-world focus on performance
on 12-01-2008 2:08 PM

Many lessons you learned young padawan, but still much to learn*.

<quote>asses your limits by measuring the percentage of time spent in tier code.</quote>

Measuring tier code is *not* measuring your limits (but it can be measuring your old decisions ;-). This is even less true when you lucky enough to depend on open-source tier code. You made a good decision in choosing Cecil (not that there were a lot of alternatives ;-) and you can/should capitalize on it.

Take this lesson from someone who depends very much on Cecil performance *and* invested a bit time into making it faster and less memory hungry: You too can make Cecil better, get the advantages inside your software and contribute to the community around Cecil. If you have reached 35% of your time inside Cecil then it does look like a prime candidate to make NDepend better, performance wise.

BTW since last week Cecil gained new Has{x} properties that can help you reduce your memory usage quite a bit (since NDepend probably touch every properties and creates empty collection everywhere possible).

* or as Yoda used to say "use the source Luke"!

Patrick Smacchia wrote re: Lessons learned from a real-world focus on performance
on 12-01-2008 2:39 PM

Sebastien, I know that Jb is working hard for dramatic performance gain in the Cecil future. I omited this info to keep it for further post, when the performance gain will be effectively available (as I didn't talk about NDepend improvements before they are available). This is all about avoiding frustrations and noise, blogging about facts, not expectations.

And yes, we can put the hand in, as we already do by helping making Cecil a more robust fx by reporting every single problem NDepend users have because of Cecil (I would say around 15 issues reported during 2008).

Sebastien Pouliot wrote re: Lessons learned from a real-world focus on performance
on 12-01-2008 5:17 PM

Well the example was about Cecil* but that was not the spirit of the lesson ;-)

Today anyone with a binary-thinking about "his" code versus "other" code is making a {pick your size} mistake. There is a middle ground, which can be ignored or embraced, when you use open source inside your software (be it open source or not). Your (good) choice of such tools makes crossing into the "other" code a real possibility, i.e. something to consider and not a limitation (that's the lesson ;-).

Now you don't have to do this for "other" benefits since it will likely (and positively) affect your own code/software. In fact you should not, at least while doing performance work, make any difference between your code and open source code. Clearly you want  your software (!= your code) to get the maximum benefits out of your next coding hour/day/week - so why should you care where this next/best opportunity is ?

p.s. filling bug is great, but you can (generally) do this on any product (even if IMHO you'll get better/faster result in an open source project ;-), but that's padawan, not jedi stuff ;-)

* that being said I concur that JB is doing an incredible job on Cecil and its future. Part of it is accepting changes (features, bug fixes and optimization) from anyone using Cecil. Clearly there's no need to wait for the next big release to enjoy enhancements :-)

private lessons | Digg hot tags wrote private lessons | Digg hot tags
on 12-01-2008 9:49 PM

Pingback from  private lessons | Digg hot tags

Bob Dobalina wrote re: Lessons learned from a real-world focus on performance
on 12-01-2008 10:01 PM

Great post - I heartily agree with measuring, and at least several times, before cutting.

...but... one small spelling error gave me many giggles - I believe you meant "asses->s<- your limits..."  The way it is now, it's plural for a donkey. :D

free online virtual worlds | Digg hot tags wrote free online virtual worlds | Digg hot tags
on 12-01-2008 10:29 PM

Pingback from  free online virtual worlds | Digg hot tags

2008 December 02 - Links for today « My (almost) Daily Links wrote 2008 December 02 - Links for today &laquo; My (almost) Daily Links
on 12-02-2008 5:11 AM

Pingback from  2008 December 02 - Links for today &laquo; My (almost) Daily Links

Pk Aus at the sydney motor show for Audi in 2007 wrote Pk Aus at the sydney motor show for Audi in 2007
on 12-02-2008 7:23 AM

Pingback from  Pk Aus at the sydney motor show for Audi in 2007

Alex Danvy wrote re: Lessons learned from a real-world focus on performance
on 12-02-2008 2:31 PM

Oh, I miss the Copper and Blitter chips. What a nice plateform was the Amiga.

I really agree with the idea of mesure first then spend time to optimize.

Mike Stockdale wrote re: Lessons learned from a real-world focus on performance
on 12-03-2008 1:07 PM

Good post.

Performance tuning is so simple in concept: measure where it's slow and make it faster, but often so difficult in practice!

David on .NET wrote Nouvelle version de NDepend
on 12-04-2008 1:31 PM

Nouvelle version de NDepend

TRS-80 vs. Alpha, and Parallel Optimization « Sutter’s Mill wrote TRS-80 vs. Alpha, and Parallel Optimization &laquo; Sutter&#8217;s Mill
on 12-19-2008 5:49 PM

Pingback from  TRS-80 vs. Alpha, and Parallel Optimization &laquo; Sutter&#8217;s Mill

Will Dean wrote re: Lessons learned from a real-world focus on performance
on 12-21-2008 1:53 PM

Your file loading time sounds a bit unlikely unless those files (and the bits of the filesystem which refer to them) are all in cache already.

Can you really load 1700 files (presumably in multiple directories) in 200ms from a cold start?

#.think.in wrote #.think.in infoDose #12 (15th Dec - 19th Dec)
on 12-21-2008 5:50 PM

#.think.in infoDose #12 (15th Dec - 19th Dec)

Patrick Smacchia wrote re: Lessons learned from a real-world focus on performance
on 12-22-2008 3:03 AM

Will, indeed on a cold start files are taking more time to be loaded and it seems the analysis is 25% slower all in all (not only because src file loading). But cold start is just one on N, especially for a developer that works all day with the same set of source files.

Patrick Smacchia [MVP C#] wrote Using NDepend on large project, a success story
on 01-04-2009 1:44 PM

Sébastien Andreo just wrote a blog post explaining how using some NDepend features made the life of its

Community Blogs wrote Using NDepend on large project, a success story
on 01-04-2009 2:31 PM

Sébastien Andreo just wrote a blog post explaining how using some NDepend features made the life of its

Steve Nuchia wrote re: Lessons learned from a real-world focus on performance
on 01-29-2009 4:33 PM

"Tier" code isn't a performance floor until you've minimized the amount of it you're calling.  The comments about tweaking the open-source "tier" in this case miss the point, kinda.

You have to look at how you're using it.  I know nothing about the specifics of this example (came here from sutter's mill) but, generically, once I'm down to "library" and "system" code I want to look at the call graph and make sure I'm using good batch sizes, evaulate using streaming versions, whatever is appropriate to the situation.  Usually I can cut the "tier" time down substantially by changing the way I'm using it.

TRS-80 vs. Alpha, and Parallel Optimization | ???????????? wrote TRS-80 vs. Alpha, and Parallel Optimization | ????????????
on 02-12-2009 8:46 PM

Pingback from  TRS-80 vs. Alpha, and Parallel Optimization | ????????????

Patrick Smacchia [MVP C#] wrote My 100th blog post: Top 5 development practices you should care for
on 03-25-2009 3:49 AM

This is a modest number but I am happy to have reached it, especially taking account that I spend weekly

Community Blogs wrote My 100th blog post: Top 5 development practices you should care for
on 03-25-2009 3:59 AM

This is a modest number but I am happy to have reached it, especially taking account that I spend weekly

Patrick Smacchia [MVP C#] wrote Micro- optimization tips to increase performance
on 04-19-2009 8:29 AM

In a previous post Lesson learned from a real-world focus on performance , I explained how we increased

Add a Comment

(required)  
(optional)
(required)  
Remember Me?