Functional Programming Unit Testing – Part 4

In our previous installment, we talked about bringing the traditional xUnit tests and QuickCheck property-based tests together in a single cohesive step.  For this installment, let’s talk about test coverage.

But, before we continue, let’s get caught up to where we are today:

 

Code Coverage

Code coverage is an important metric in our design process; it describes the degree to which our source code has been tested.  Code coverage tools inspect the code directly, which makes them a form of white-box testing.  I believe a high code coverage percentage is important, although a hard-line requirement of 100% path coverage is most often unnecessary, and I would go as far as calling it evil.  However, for some applications, such as safety-critical systems, some form of 100% coverage should be considered.

What do we consider as part of the criteria when we’re calculating code coverage?

  • Function coverage
    All functions in the program called?
  • Statement coverage
    All statements in the program executed?
  • Branch coverage
    All control structures such as if/then/else evaluated to true and false?
  • Condition coverage
    All boolean sub-expressions evaluated to true and false?
  • Path coverage
    All possible routes through the program called?
  • Entry/exit coverage
    Every possible call and return of each function executed?

Some of these criteria subsume others:

  • Branch (decision) coverage includes statement coverage, since exercising every branch also exercises every statement.
  • Path coverage includes branch coverage.
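The difference between these criteria is easier to see on a small function.  The sketch below is only an illustration: shippingCost and its rule are hypothetical and not part of this series, and since Haskell has expressions rather than statements, "statement coverage" is read here as "every expression evaluated".

```haskell
module Main where

-- Hypothetical rule, used only to illustrate the coverage criteria:
-- shipping is free for members or for orders of at least 50.
shippingCost :: Bool -> Int -> Int
shippingCost isMember total =
  if isMember || total >= 50
    then 0
    else 5

main :: IO ()
main = do
  -- These two calls take both branches of the `if`, and between them every
  -- expression in shippingCost is evaluated: function, statement and branch
  -- coverage are all satisfied.
  print (shippingCost True 10)   -- then-branch
  print (shippingCost False 10)  -- else-branch
  -- Condition coverage is still incomplete at this point: `total >= 50` has
  -- only ever evaluated to False. This third call makes it True.
  print (shippingCost False 80)
```

Two tests are enough for branch coverage here, but condition coverage needs the third, which is why the criteria are worth telling apart.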

Where should we focus?  With statement, branch, and condition coverage, somewhere around 80-90% is usually enough.  Reaching 100% is often unrealistic, doesn’t by itself ensure quality, and can cost more effort than it returns.  The number we’re looking for is somewhere above 80%.

We can use the above metrics to determine how well we’re writing tests for our applications.  For many algorithms, it’s important to ensure that we have our edge cases covered, especially in safety-critical systems.  Let’s walk through an example of code coverage in Haskell.

 

Code Coverage with Haskell Program Coverage (HPC)

The Haskell Program Coverage (HPC) tool is a built-in extension to the Haskell compiler used to record and display the parts of the code that were executed during a run of your program.  With the criteria given above, we are able to record which functions, branches, and expressions, among other things, were evaluated.

The HPC tool is designed to give you the following metrics:

  • Expressions used (Function coverage)
  • Boolean coverage
    • Guard coverage
    • if conditions
    • Qualifiers
  • Alternatives used
  • Local declarations used
  • Top-level declarations used
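To make the list above concrete, here is a small sketch annotated with the metric each construct would fall under.  The functions classify, evens, and describe are hypothetical, and the comments reflect my reading of the HPC report categories rather than an authoritative mapping.

```haskell
module Main where

-- Top-level declaration: counted under "top-level declarations used".
classify :: Int -> String
classify n
  | n < 0     = "negative"      -- guard: part of boolean coverage
  | otherwise = sizeOf n        -- `otherwise` is a guard that is always True
  where
    -- Local declaration: counted under "local declarations used".
    sizeOf m = if m < 10        -- 'if' condition: part of boolean coverage
                 then "small"
                 else "large"

-- The `even x` test is a qualifier in a list comprehension.
evens :: [Int] -> [Int]
evens xs = [x | x <- xs, even x]

-- Each case branch is an alternative.
describe :: Maybe Int -> String
describe m = case m of
  Nothing -> "nothing"
  Just n  -> classify n

main :: IO ()
main = do
  -- Deliberately incomplete: the Nothing alternative, the "negative" guard
  -- and the "large" branch are never exercised by this run.
  putStrLn (describe (Just 3))
  print (evens [1, 2, 3, 4])
```

Compiling this with -fhpc and running it should show gaps in the alternatives and boolean coverage lines, since main leaves several paths untouched on purpose.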

Let’s walk through an example of how to use this tool to your advantage.  In the previous post, I showed some QuickCheck code that doesn’t reach full code coverage, so that I could show you how to improve it.  Let’s look at the example again.

First, let’s look at the implementation of the ROT13 algorithm again:

-- file Encryption.hs
module Encryption(rot13) where

import Data.Char

rot13 :: String -> String
rot13 =
  map mapRot
  where mapRot :: Char -> Char
        mapRot c | c >= 'A' && c <= 'Z' = rot 'A' c
                 | c >= 'a' && c <= 'z' = rot 'a' c
                 | otherwise            = c
        rot :: Char -> Char -> Char
        rot b c = chr $ (ord c - ord b + 13) `mod` 26 + ord b
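As a quick sanity check before turning to the properties, the algorithm can be run by hand on a few inputs.  This is a standalone sketch: rot13 is repeated so the file compiles on its own, and the sample strings are my own choice.

```haskell
module Main where

import Data.Char (chr, ord)

-- Same algorithm as above, repeated so the sketch compiles on its own.
rot13 :: String -> String
rot13 = map mapRot
  where
    mapRot c
      | c >= 'A' && c <= 'Z' = rot 'A' c
      | c >= 'a' && c <= 'z' = rot 'a' c
      | otherwise            = c
    rot b c = chr $ (ord c - ord b + 13) `mod` 26 + ord b

main :: IO ()
main = do
  putStrLn (rot13 "Hello, World!")          -- prints Uryyb, Jbeyq!
  putStrLn (rot13 (rot13 "Hello, World!"))  -- prints Hello, World!
  putStrLn (rot13 "1234 !?")                -- unchanged: only the third guard fires
```

The last call is the interesting one for coverage: without a non-alphabetic character in the input, the otherwise alternative never runs.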

 

Now, let’s look at the QuickCheck property-based tests we use to check the correctness of our algorithm.

-- file EncryptionTests.hs
import Data.Char
import Data.List
import Encryption
import Test.Framework
import Test.Framework.Providers.QuickCheck
import Test.QuickCheck

-- Generate only ASCII letters
instance Arbitrary Char where
  arbitrary   = elements (['A'..'Z'] ++ ['a'..'z'])

-- Equal: encoding the same input twice gives the same result
prop_rot13_equals s =
  rot13 s == rot13 s

-- Single is inequal to original
prop_rot13_single_notEquals s =
  rot13 s /= s

-- Double is equal to original
prop_rot13_double_equals s =
  (rot13 . rot13) s == s

-- Distribution shapes should be equal
prop_rot13_group_equals s =
  getDistro s == getDistro (rot13 s)
  where getDistro = sort . map length . group . sort

-- The closing bracket is indented so it stays part of this declaration
tests = [
  testGroup "ROT13 Tests" [
    testProperty "prop_rot13_equals" prop_rot13_equals,
    testProperty "prop_rot13_single_notEquals" prop_rot13_single_notEquals,
    testProperty "prop_rot13_double_equals" prop_rot13_double_equals,
    testProperty "prop_rot13_group_equals" prop_rot13_group_equals]
  ]

main = defaultMain tests
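The distribution property is the least obvious of the four, so here is a worked example of what getDistro computes.  The helper is lifted out of the where clause and given an explicit type for illustration; the sample strings are my own.

```haskell
module Main where

import Data.List (group, sort)

-- Same pipeline as in prop_rot13_group_equals: sort the characters, group
-- equal neighbours, take each group's size, then sort the sizes.
getDistro :: String -> [Int]
getDistro = sort . map length . group . sort

main :: IO ()
main = do
  print (getDistro "hello")                   -- [1,1,1,2]
  print (getDistro "uryyb")                   -- [1,1,1,2]; "uryyb" is rot13 "hello"
  print (getDistro "aab" == getDistro "abb")  -- True: only the shape is compared
```

Because only the sorted group sizes are compared, the property checks that ROT13 maps distinct characters to distinct characters, not which characters they become.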

 

In order for us to capture the test coverage data from HPC, we need to add the -fhpc flag to the command-line for compiling our tests such as this:

>ghc -fhpc EncryptionTests.hs --make

After instrumenting the code, we then run our code in order to capture the results.  You may have noticed that it created a .hpc folder with a .mix file.  When we run our code, we get the following results as usual.

>EncryptionTests
ROT13 Tests:
  prop_rot13_equals: [OK, passed 100 tests]
  prop_rot13_single_notEquals: [OK, passed 100 tests]
  prop_rot13_double_equals: [OK, passed 100 tests]
  prop_rot13_group_equals: [OK, passed 100 tests]

         Properties  Total
 Passed  4           4
 Failed  0           0
 Total   4           4

 

You will also note that it created a .tix file which captures the actual code coverage metrics.  Let’s now analyze the results of our run:

>hpc report encryptiontests
97% expressions used (95/97)
33% boolean coverage (1/3)
      33% guards (1/3), 1 always True, 1 unevaluated
     100% 'if' conditions (0/0)
     100% qualifiers (0/0)
66% alternatives used (2/3)
100% local declarations used (3/3)
100% top-level declarations used (8/8)

Analyzing the results, we realize we’ve made a mistake.  If you look back at our Arbitrary Char instance, we’re only generating alphabetic characters.  The problem is that we never test the part of our rot13 function that handles a non-alphabetic character.  But, when we change this, we have to be mindful that our tests will have to change as well.  Why?  Because the inequality check fails for a string with no letters in it, since rot13 leaves such a string unchanged.  Let’s make some changes and then check the results again.

instance Arbitrary Char where
  arbitrary   = elements (['A'..'Z'] ++ ['a'..'z'] ++ "!@#$%^&*()")

-- Single is inequal to original, provided there is at least one letter
prop_rot13_single_notEquals s =
  any isAlpha s ==> rot13 s /= s
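To see why the wider alphabet matters, the sketch below lists which of the three mapRot guards each generator alphabet can reach.  The Alt type and the whichAlt and altsHit helpers are hypothetical names introduced for illustration; they mirror the guards in rot13 but are not part of the original code, and this is a hand-rolled approximation of what HPC measures, not how HPC works internally.

```haskell
module Main where

import Data.List (nub, sort)

-- Hypothetical labels for the three guards of mapRot.
data Alt = Upper | Lower | Other
  deriving (Eq, Ord, Show)

-- Mirrors the guards of mapRot, returning a label instead of a character.
whichAlt :: Char -> Alt
whichAlt c
  | c >= 'A' && c <= 'Z' = Upper
  | c >= 'a' && c <= 'z' = Lower
  | otherwise            = Other

-- The distinct alternatives reachable from a generator's alphabet.
altsHit :: String -> [Alt]
altsHit = sort . nub . map whichAlt

main :: IO ()
main = do
  print (altsHit (['A'..'Z'] ++ ['a'..'z']))                  -- [Upper,Lower]
  print (altsHit (['A'..'Z'] ++ ['a'..'z'] ++ "!@#$%^&*()"))  -- [Upper,Lower,Other]
```

With letters only, one of the three alternatives is unreachable no matter how many test cases QuickCheck generates, which is the gap the first report pointed at.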

Now we can recompile our code once again as we did above and do the run once more.

>hpc report encryptiontests
100% expressions used (99/99)
66% boolean coverage (2/3)
      66% guards (2/3), 1 always True
     100% 'if' conditions (0/0)
     100% qualifiers (0/0)
100% alternatives used (3/3)
100% local declarations used (3/3)
100% top-level declarations used (8/8)

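Much better!  We now exercise every expression and every alternative in our ROT13 implementation.  Boolean coverage stays at 66% only because the remaining guard is otherwise, which HPC reports as always True since it can never evaluate to False.  We also have the ability to dig deeper into the analysis through the use of the hpc markup command.  This generates web pages that contain drill-down information about our code metrics.  Below is a sample screenshot of the results of my last run.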

[Screenshot: hpc_markup]

This tool is quite powerful for the code analysis we need to ensure that we’re writing the right kind of tests for our specifications and implementations.  Now, let’s turn our attention to the F# world.  What options do we have?

 

Code Coverage with TestDriven.NET and NCover

Once again, the TestDriven.NET support for F# saves us when it comes to code coverage.  With the integration of NCover, we have the ability to perform rather rich analytics on our code, much like we did above with HPC.  Let’s take the code from the previous post and look at the relevant parts.

#light

namespace CodeBetter.Samples

module EncryptionTests =
  open System 
  open FsCheck
  open FsCheck.Generator
  open Xunit

  open Encryption
  open ListExtensions
  open FsCheckExtensions

  type CharGenerator =
    static member Chars = 
      elements(['A'..'Z'] @
               ['a'..'z'])
  
  overwriteGenerators (typeof<CharGenerator>)
  
  let prop_rot13_equals s =
    propl (rot13 s = rot13 s)

  [<Fact>]
  let test_prop_rot13_equals() =  
    check config prop_rot13_equals

  let prop_rot13_double_equals s =
    propl ((rot13 >> rot13) s = s)

  [<Fact>]
  let test_prop_rot13_double_equals() =  
    check config prop_rot13_double_equals
    
  let prop_rot13_single_notEquals s =
    propl (rot13 s <> s)
  
  [<Fact>]
  let test_prop_rot13_single_notEquals() =  
    check config prop_rot13_single_notEquals
  
  let prop_rot13_group_equals s =
    let getDistro = ListExtensions.defaultSort >> 
                    ListExtensions.group >> 
                    List.map List.length >> 
                    ListExtensions.defaultSort
    propl (getDistro s = getDistro (rot13 s))
    
  [<Fact>]
  let test_prop_rot13_group_equals() =  
    check config prop_rot13_group_equals

 

In order to get the code metrics we need, simply right-click on the project and click Test With => Coverage.  This will bring up NCoverExplorer.  We can then browse our results and once again see our mistake.

[Screenshot: ncover_failed]

Now that we realize our mistake of not including non-alphabetic characters, let’s make two changes.  First, let’s remove the char generator, because the default should suffice.  Unlike the Haskell version, FsCheck comes with an arbitrary char instance already created.  Second, let’s make sure the prop_rot13_single_notEquals property is only checked for input that contains at least one letter, as follows:

  let prop_rot13_single_notEquals s =
    List.exists Char.IsLetter s ==> 
      propl (rot13 s <> s)

With at least one letter in the input, the ROT13 transformation is guaranteed to produce a string that differs from the original.  We can now confirm our success by once again running the Test With => Coverage option and looking at the results below.

[Screenshot: ncover_success]

 

Conclusion

Tools such as NCover and the Haskell Program Coverage tool keep us honest about our tests and give us a glaring reminder when we are not.  Combined with our traditional xUnit and property-based tests, with their saturation test generation, they make for a satisfying experience.  We’ve now covered the creation and combination of traditional xUnit tests with property-based tests and how to leverage code coverage as a tool for refining them.  There is still more to cover in this series, including refactoring.

  • http://fortysix-and-two.blogspot.com/ Kurt Schelfthout

    Matt,

    Yes, I’ll probably make a change to FsCheck at one point or another to deal with disposable generated values. I need to put my thinking cap on first…maybe a new generator combinator “dispose” would make sense. I’ll make a note of it on the issues page, so I don’t forget it…

    cheers,

    Kurt

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @Kurt

    Yes, it was pretty much a straight copy in order to get it to work. Glad to see that you see some interest here, and I think it’s huge for F# to have this technology.

    1) Ok, so noted on the propl properties. I can fix that

    2) Interesting in regards to the IRunner. I hadn’t thought about it that way in regards to the generated data. Will we see a fix for that then or will you keep it this way?

    Sure, feel free to link these posts.

    Matt

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @Stephen

    Thanks for the heads up, and it is now corrected.

    Matt

  • http://fortysix-and-two.blogspot.com/ Kurt Schelfthout

    Hi Matthew,

    Thanks for this series of posts, and thanks for giving FsCheck such a nice place in it. Glad to see the integration also works for xUnit, I’m an mbUnit user myself.

    Some (minor) remarks:

    -) FsCheck is “smart” enough to realize that a function returning bool or Lazy<bool> is a property. So you should be able to omit the “propl” combinator for most of your properties (except the one with the ==>)
    -) In your previous post in this series, I see that you copied (I assume.. :) ) the disposing of generated values from the FsCheck docs in your IRunner implementation. Since I recently ran into this myself, I’d like to add that this is not always what you want (e.g. when using the elements combinator to choose between open files: you want to close these files after _all_ tests, not every single one). Of course, in your example it doesn’t matter at all as you’re only generating Chars.
    -) Would you mind if I linked to your posts from the FsCheck codeplex page?

    thanks,

    Kurt

  • Stephan

    Hey Matthew, A very interesting post, but the screenshots aren’t displayed.