CodeBetter.Com
CodeBetter.Com
RSS 2.0 via Feedburner
           Do you Twitter? Follow us @CodeBetter

Patrick Smacchia [MVP C#]

February 2008 - Posts

  • Iterators, LINQ deferred execution and prime numbers computation

    There is something that I am proud of. In my book Practical .NET2 and C#2 I presented the concept of deferred execution and deferred execution is a central concept in LINQ in-memory querying. The C# team made deferred execution easy to implement thanks to the concept of iterators introduced with C#2.

     

    At that time of C#2, 2 years ago, the community was wondering why the hell the C# team invested so much in such a tricky compiler artifact, that is basically only useful to save you a dozen lines of code when you implement your own collection classes. But thanks to the Wesner Moise thoughts in its blog, I realized that iterators are a good way to implement some funny stuff such as some pipeline patterns. This is illustrated by the example 47 of the Chapter 14

     

    Here is this example:

     

    using System.Collections.Generic;

    class Program {

       static public IEnumerable<int> PipelineIntRange(int begin, int end) {

          System.Diagnostics.Debug.Assert(begin < end);

          for (int i = begin; i <= end; i++) {

             System.Console.WriteLine("Yield:" + i);

             yield return i;

          }

       }

       static public IEnumerable<int> PipelineMultiply(int factor,

                                                   IEnumerable<int> input) {

          foreach (int i in input)

             yield return i * factor;

       }

       static public IEnumerable<int> PipelineFilterModulo(int modulo,

                                                   IEnumerable<int> input) {

          foreach (int i in input)

             if (i % modulo == 0)

                yield return i;

       }

       static public IEnumerable<int> PipelineJoin(IEnumerable<int> input1,

                                                   IEnumerable<int> input2) {

          foreach (int i in input1)

             yield return i;

          foreach (int i in input2)

             yield return i;

       }

       static void Main() {

          IEnumerable<int> pipeline = PipelineJoin(

             PipelineIntRange(-4, -2), PipelineFilterModulo(3,

                                          PipelineMultiply(2,

                                          PipelineIntRange(1, 10))));

          System.Console.WriteLine("Begin foreach:" + i);

          foreach (int i in pipeline)

             System.Console.WriteLine(i);

       }

    }

     

    This program outputs:

     

    Begin foreach:


    Yield:1
    Yield:2
    Yield:3
    6
    Yield:4
    Yield:5
    Yield:6
    12
    Yield:7
    Yield:8
    Yield:9
    18
    Yield:10
    Yield:-4

    -4
    Yield:-3
    -3
    Yield:-2
    -2 

     

    Here is a small explanation why the pipeline output the sequence  6 12 18 -4 -3 -2:

     

    PipelineIntRange(1,10)    1  2  3  4  5  6  7  8  9  10

    PipelineMultiply(2)       2  4  6  8  10 12 14 16 18 20

    PipelineFilterModulo(3)         6        12       18

    PipelineIntRange(-4, -2)                                -4 -3 -2  

    PipelineJoin                    6        12       18    -4 -3 -2

     

    What is funny is the way we obtain the sequence. The order the Console.WriteLine() we put into the program are executed shows 2 tricky things:

     

    • First, no computation happen before the foreach() instruction. The pipeline object actually represents a tree of pipeline node objects.

     

    • Second, each integer goes through all chain pipeline nodes (except if a node discard it) before the next integer is treated.

     

    This is deferred execution!

     

    Now, it is easy to rewrite this example with C#3 extension methods to make the code more LINQ compliant:

     

     

    using System.Collections.Generic;

    static class Program {

       static public IEnumerable<int> PipelineIntRange(int begin, int end) {

          System.Diagnostics.Debug.Assert(begin < end);

          for (int i = begin; i <= end; i++) {

             System.Console.WriteLine("Yield:" + i);

             yield return i;

          }

       }

       static public IEnumerable<int> PipelineMultiply(

                                                   this IEnumerable<int> input,

                                                   int factor) {

          foreach (int i in input)

             yield return i * factor;

       }

       static public IEnumerable<int> PipelineFilterModulo(

                                                   this IEnumerable<int> input,

                                                   int modulo) {

          foreach (int i in input)

             if (i % modulo == 0)

                yield return i;

       }

       static public IEnumerable<int> PipelineJoinWith(

                                                   this IEnumerable<int> input1,

                                                   IEnumerable<int> input2) {

          foreach (int i in input1)

             yield return i;

          foreach (int i in input2)

             yield return i;

       }

       static void Main() {

          IEnumerable<int> pipeline =

             PipelineIntRange(1, 10)

             .PipelineMultiply(2)

             .PipelineFilterModulo(3)

             .PipelineJoinWith(PipelineIntRange(-4, -2));

     

          foreach (int i in pipeline)

             System.Console.WriteLine(i);

       }

    }

     

    This time, the way the pipeline object is defined makes it much more look alike a LINQ query.

     

     

     

    More fun with prime number computation!

     

    The example 50 of the Chapter 14 is even more fun.

     

    This time, the deal is to chain iterators to compute prime numbers. This is possible thanks to an easy algorithm to compute prime numbers named Sieve of Erathostène. The following schema shows how the algorithm works:

     

    2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 |

      3   5   7   9    11    13    15    17 | 3 is prime, 

                                            | 4,6,8,10,12,14,16,18 and

                                            | 20 are multiples of 2.

                                            |

          5   7        11    13          17 | 5 is prime, 9 and 21

                                            | are multiples of 3.

                                           

              7        11    13          17 | 7 is prime, the list doesn’t

                                            | contain any multiple of 5.

                                           

                       11    13          17 | 11 is prime, the list doesn’t

                                            | contain any multiple of 7.

                             13          17 | 13 is prime, the list doesn’t

                                            | contain any multiple of 11.

                                           

                                         17 | 17 is prime, the list doesn’t

                                            | contain any multiple of 13.

    Here is the code:

     

    using System;

    using System.Collections.Generic;

    class Program {

       static public IEnumerable<int> PipelineIntRange( int begin, int end ) {

          System.Diagnostics.Debug.Assert( begin < end );

          for ( int i = begin; i <= end; i++ )

             yield return i;

       }

       static public IEnumerable<int> PipelinePrime( IEnumerable<int> input ) {

          using ( IEnumerator<int> e = input.GetEnumerator() ) {

             e.MoveNext();

             int prime = e.Current;

             // The first number of the list is a prime.

             Console.WriteLine( prime );

             if ( prime != 0 ) {

                while ( e.MoveNext() ) {

                   // Remove all multiple of the found prime.

                   if ( e.Current % prime != 0 )

                      yield return e.Current;

                }

             }

          }

       }

       const int N = 100;

       static void Main() {

          // Apply the approximation of Gauss/de la Vallée Poussin

          // to get the number of iterators.

          int N_PRIME = (int)Math.Floor( ((double)N)/Math.Log(N) );

     

          // Build a pipeline of N_PRIME PipelineIntegerRange chained with

          // a PipelineIntegerRange. Each call to PipelinePrime yield an

          // iterator object that we store.

          List<IEnumerable<int>> list = new List<IEnumerable<int>>();

          list.Add(PipelinePrime( PipelineIntRange( 2, N ) ) );

          for( int i=1 ; i<N_PRIME ; i++ )

             list.Add( PipelinePrime(list[i-1]) );

     

          // Cascade the computation among iterators by yielding every

          // numbers between 2 and N.

          foreach ( int i in list[N_PRIME-1] );

       }

    }     

     

     

    Here also it is possible to make the example more clear thanks to C#3 extension methods. This time we don’t need a list anymore and we are directly chaining filters:

     

    using System;

    using System.Collections.Generic;

    static class Program {

       static public IEnumerable<int> PipelineIntRange( int begin, int end ) {

          System.Diagnostics.Debug.Assert( begin < end );

          for ( int i = begin; i <= end; i++ )

             yield return i;

       }

       static public IEnumerable<int> PipelinePrime(

           this IEnumerable<int> input ) {

          using ( IEnumerator<int> e = input.GetEnumerator() ) {

             e.MoveNext();

             int prime = e.Current;

             // The first number of the list is a prime.

             Console.WriteLine( prime );

             if ( prime != 0 ) {

                while ( e.MoveNext() ) {

                   // Remove all multiple of the found prime.

                   if ( e.Current % prime != 0 )

                      yield return e.Current;

                }

             }

          }

       }

       const int N = 100;

       static void Main() {

          // Apply the approximation of Gauss/de la Vallée Poussin

          // to get the number of iterators.

          int N_PRIME = (int)Math.Floor( ((double)N)/Math.Log(N) );

     

          // Build a pipeline of N_PRIME PipelineIntegerRange chained with

          // a PipelineIntegerRange. Each call to PipelinePrime yield an

          // iterator object that we store.

          IEnumerable<int> pipeline = PipelineIntRange(2,N);

     

          for( int i=1 ; i<N_PRIME ; i++ )

             pipeline = pipeline.PipelinePrime();

     

          // Cascade the computation among iterators by yielding every

          // numbers between 2 and N.

          foreach ( int i in pipeline);

       }

    }     

     

    This example is a good start for understanding how to build LINQ queries dynamically.

     

    In case you want to dig more in iterators under the hood, the entire section of my book I wrote about iterators is available as an article here. I explain what the C# compiler is doing under the hood to mimic the notion of co-routine.  There is also another tricky example that shows how iterators can be used to model the notion of continuation.

     

    Apart dealing with NDepend development I enjoy being a trainer on .NET, C# and also doin design and quality consultancy with NDepend. It is an excellent way to keep contact with real developers that have real problems that can be solved with tools such as NDepend. I am working closely with a bank with thousands of developers that is migrating most of its legacy to .NET. Hence, students I cop with are motivated developers with a solid background in C++ and Java. Thanks to LINQ I am now happy to show that things like iterators, co-routine or closures and more are not just advanced geeky things but are in fact a dramatic weapon to make code more expressive. This makes the Java guys, generally reluctant to learning .NET, more open minded about what a multi-language platform bring to the mainstream development. Some could say that Ruby was the initial starter but nobody can deny that LINQ is really really elegant.

     

    Also I would like to precise that I had the chance to proof-read a great book about LINQ  (LINQ in Action, Manning Publications) co-written by my friend Fabrice Marguerie. I plan to review the book soon. For now I can’t say much more than Go buy it! Fabrice and its 2 colleagues just made an awesome work at explaining new C#3 stuff, LINQ concepts, LINQ use cases (in-memory querying, LINQ2SQL, LINQ2XML…), LINQ under the hood tricky things, the multiple ways to extend LINQ for your own needs and best practices to use LINQ.

     

     

  • Code metrics on Coupling, Dead Code, Design flaws and Re-engineering

    There is a whole range of interesting code metrics relative to coupling. The simplest ones are named Afferent Coupling (Ca) and Efferent Coupling (Ce). Basically, the Ca for a code element is the number of code elements that use it and the Ce is the number of code elements that it uses.




    You can define Ca and Ce for the graph of assemblies dependencies, the graph of namespaces dependencies, the graph of types dependencies and the graph of methods dependencies of a code base. You can also define the Ca metric on the fields of a program as the number of methods that access the field. This leads to 9 metrics all supported by the tool NDepend. We precise that when computing Ce, NDepend takes account of code elements defined in tier code such as the code of the .NET framework.

     

    With NDepend, if you wish to know which methods of your program are massively used you can write the following CQL Query :

     

    SELECT TOP 10 METHODS ORDER BY MethodCa DESC

     

    Being used a lot is not necessarily a problem. However it is still interesting to know which part of your code base is used a lot. For example, if we apply the CQL query below to the core of the .NET framework (i.e mscorlib, System, System.Core, System.Xml…) we obtain the following 10 methods:

     

    methods

    Afferent coupling at method level (MethodCa)

    System.Environment.GetResourceString(String)

    2632

    System.Object..ctor()

    2416

    System.ArgumentNullException..ctor(String)

    2368

    System.String.get_Length()

    1795

    System.Type.GetTypeFromHandle(RuntimeTypeHandle)

    1762

    System.IDisposable.Dispose()

    1119

    System.SR.GetString(String)

    1115

    System.InvalidOperationException..ctor(String)

    1101

    System.Object.ToString()

    1056

    System.ArgumentException..ctor(String)

    1010

     

     

    High Efferent Coupling and design flaws

     

    If you wish to know which types of your program are heavy users of other types you just have to write:

     

    SELECT TOP 10 TYPES ORDER BY TypeCe DESC

     

    High Ce might reveal a design problem. Types with high Ce are entangled with many other implementations. The higher the Ce, the higher the number of responsibilities the type has. If we apply the CQL query above to the core of the .NET framework we obtain the following list:

     

    types

    Efferent coupling at type level (TypeCe)

    Microsoft.CSharp.CSharpCodeGenerator

    172

    Microsoft.VisualBasic.VBCodeGenerator

    161

    System.Net.HttpWebRequest

    138

    System.Net.Sockets.Socket

    137

    System.AppDomain

    131

    System.RuntimeType

    128

    System.Xml.Xsl.XsltOld.Compiler

    125

    System.Xml.Xsl.Xslt.QilGenerator

    124

    System.Xml.Serialization.XmlSchemaImporter

    120

    System.Diagnostics.Process

    116

     

    As expected this list contains high level classes such as AppDomain, Process or Socket. This sort of classes with high Ce are needed to implement complex concepts that spawn numerous concerns. For example, by selecting the 131 types used by the AppDomain class I can tell that AppDomain is concerned by assembly security (Code Access Security, strong naming…), Windows users and file security, OS environment info, .NET Remoting, .NET Reflection, Threading and globalization.
     

    Could the AppDomain class be split into several smaller classes? I guess no because this is such an essential class. But this is the exception. Generally, classes with high Ce would be more like the CSharpCodeGenerator class which represent alone a component. I can see that the class CSharpCodeGenerator deals with a lot of implementation detail of the C# language such as exception, type casting, all sorts of members (method, field, event, property…), comment, indentation etc. This sounds good as long as each details has its own implementation the class CSharpCodeGenerator acts as a high level mediator. But if I dig further and decompile some methods I can see that the class CSharpCodeGenerator also cops with file management and contains a lot of logic (>1000 Lines of Code) to handle some CodeDOM details. This is likely an indication that this code could have been better designed with the help of several smaller collaborating classes.

     

    Afferent Coupling and Dead Code

     

    Ce values are meaningful values when it comes to assessing some design and when you have to re-engineer code. Ca values are also useful, especially when equal to 0. A Ca value equals to 0 indicates a potential dead code element. A dead code element is an element that can be discarded because it is not used by the program anymore. Pruning dead code is a necessary task to make sure that your code is rationalized. Here also the tool NDepend can help you because it knows about Ca. However things get more complicated here because there are numerous cases where a zero Ca doesn’t mean dead code. For example, entry points (i.e Main methods), class constructors or finalizers represent some methods that will always have a zero Ca. However these methods are not dead code because the CLR will call them at runtime.

     

    Here is the CQL rule that we provide by default to detect dead methods:

     

    // <Name>Potentially unused methods</Name>

    WARN IF Count > 0 IN SELECT TOP 10 METHODS WHERE

     MethodCa == 0 AND            // Ca=0 -> No Afferent Coupling -> The method 

                                  // is not used in the context of this

                                  // application.

     

     !IsPublic AND                // Public methods might be used by client 

                                  // applications of your assemblies.

     

     !IsEntryPoint AND            // Main() method is not used by-design.

     

     !IsExplicitInterfaceImpl AND // The IL code never explicitely calls 

                                  // explicit interface methods implementation.

     

     !IsClassConstructor AND      // The IL code never explicitely calls class

                                  // constructors.

     

     !IsFinalizer                 // The IL code never explicitely calls

                                  // finalizers.

     

    Notice how we consider that public methods should be not considered as dead code in the general case. This rule generally matches a lot of false positive because when statically analyzing the IL code, we can see that often overridden implementations are not statically linked. Hence to get a first evaluation of dead code it is worth adding the restricting condition AND !IsVirtual. This particular issue will be addressed by further versions of NDepend.

     

    Things gets more easy and efficient when it comes to detect dead fields and dead types. Here are the 2 CQL rules we propose by default and their particular conditions to avoid false positive:

     

    // <Name>Potentially unused fields</Name>

    WARN