Recursing into Recursion – Memoization

Lately, I’ve been heads down on a lot of concurrency items which will hopefully come out soon.  In the mean time, I want to get back to the basics one last time with recursion.  As I posted earlier, I’ve been talking about recursion in the past couple of posts, and this one will be one last on the topic.  I’ll do my best to post the samples in both C# and F#.  As always, you can find my C# samples on MSDN Code Gallery at the Functional C# Samples.

Let’s catch up to where we are today:

Memoization

When doing heavy computations through recursion, memoization becomes a pretty important topic.  The idea behind memoization is to speed up your programs by avoiding repetitive calculations for previously processed function inputs.  Let’s go ahead and try a simple example using the ever so trite Fibonacci sequence as a quick example.

F#

let rec fibonacci n =
  if n <= 2 then 1 
  else fibonacci(n-1) + fibonacci(n-2)

C#

static int Fibonacci(int n)

{

    return n <= 2 ? 1 : Fibonacci(n – 2) + Fibonacci(n – 1);

}

But, if you call these above functions time and time again with nearly the same input, they have to go through the same computation again and again, which isn’t necessary.  So, instead, let’s try through the technique of memoization to speed up this computation.

F#

let fibonacci = 
  let t = new Dictionary<_,_>()

  let rec fibCached n = 
    if t.ContainsKey(n) then t.[n]

    elif n <= 2 then 1

    else

      let res = fibCached(n – 2) + fibCached(n – 1)

      t.Add(n, res)

      res

  fun n -> fibCached n

C#

public static Func<int, int> Fibonacci()

{

    var t = new Dictionary<int, int>();

    Func<int, int> fibCached = null;

    fibCached = n =>

    {

        if (t.ContainsKey(n)) return t[n];

        if (n <= 2) return 1;

        var result = fibCached(n – 2) + fibCached(n – 1);

        t.Add(n, result);

        return result;

    };

    return n => fibCached(n);

}

As you can see from the above code, we’re using a Dictionary<TKey, TValue> as our storage for already computed values.  Then we go ahead and compute the values if we have not seen them before.  We’ll see a pretty impressive speed boost when calling with repetitive values.  But, that’s not a generic solution, and I’d rather have the ability to take a generic function and pass any function to this and compute.  So, let’s go ahead and make this generic.

F#

let memoize (f : ‘a -> ‘b) =

  let t = new System.Collections.Generic.Dictionary<’a, ‘b>()

  fun n ->

    if t.ContainsKey(n) then t.[n]

    else

      let res = f n

      t.Add(n, res)

      res

  
let rec fibonacci= 
  memoize(fun n -> if n <= 2 then 1 else fibonacci(n – 1) + fibonacci(n – 2))

C#

public static Func<T, TResult> Memoize<T, TResult>(this Func<T, TResult> func)

{

    var t = new Dictionary<T, TResult>();

    return n =>

    {

        if (t.ContainsKey(n)) return t[n];

        
        var result = func(n);

        t.Add(n, result);

        return result;

    };

}

static int Fibonacci(int n)

{

    return n <= 2 ? 1 : Fibonacci(n – 2) + Fibonacci(n – 1);

}

Func<int, int> fibonacci = Fibonacci; 
var fibonacciMemoized = fibonacci.Memoize();

var fibFast1 = fibonacciMemoized(30);

As you can see, I was able to create a generic memoize function that takes a function as its input, checks whether the item is in the cache and if not, adds it to the cache.  Using the C# version, we created an extension method for our given Func<TArg, TResult>.  This allows us to memoize the function correctly.  Alternatively, I could have written the memoized function this way:

C#

static Func<int, int> Fibonacci()

{

    return Extensions.Memoize((int n) => n <= 2 ? 1 : Fibonacci(n – 2) + Fibonacci(n – 1));

}

var fibonacci = Fibonacci;

var fibFast1 = fibonacci(30);

To me, this reads a bit more of functional programming and the true power of it.  I basically took the above methods and combined them so that my Fibonacci function now returns the function instead taking an integer and returning an integer.

Beware of the Bad Memoization

In order to get memoization to work properly, you must remember not to include the value to the function when declaring the function.  This will cause a fresh memoization table to be created each and every time.  Such functions such as these should be avoided as below:

F#

let rec fibonacciIncorrect n =

  memoize (fun n -> if n <= 2 then 1 else fibonacciIncorrect(n – 2) + fibonacciIncorrect(n – 1)) n

C#

static int FibonacciIncorrect(int n)

{

    return Extensions.Memoize((int x) => x <= 2 ? 1 : FibonacciIncorrect(n – 2) + FibonacciIncorrect(n – 1))(n);

}

Generic Memoization Service

Due to the subtlety of problems mentioned above, it may be better to return an object instead of a function.  These can be pretty simple objects with no more functionality than to return values and discard the given collection.  Let’s go ahead and do this sample in F# first.  What I’m going to do is define the holder type for the calculated value:

type ITable<’a, ‘b> =

  inherit System.IDisposable

  abstract Item : ‘a -> ‘b with get

This defines for me an interface with a Discard method and an Item method which accepts ‘a and returns ‘b via a get accessor.  Pretty simple holder for my data.  Now, let’s define the memoize function which does the work:

let memoize f =

  let outerTable = new Dictionary<_,_>()

  { new ITable<_,_> with

      member t.Item

        with get(n)
          if outerTable.ContainsKey(n) then outerTable.[n]

          else

            let res = f n

            outerTable.Add(n, res)

            res

      member t.Dispose() =

        outerTable.Clear()

  }

What you can see from the above is that I’m creating an inner class of type ITable which I defined above.  Then I implement the given methods and properties I defined on the interface for Item and Discard().  During the Item property, I’m checking whether I have the given item.  If I do, I return it from the collection, else I execute the given function and store the result.  During my discard, I give myself the ability to reset the cached values.  Now, let’s implement the Fibonacci sequence function once again with our new function that we defined above.

let rec fibonacci =

  memoize 
    (fun n ->

      if n <= 2 then 1 else fibonacci.[n - 1] + fibonacci.[n - 2])

Now with this function, I can execute the given methods directly on the class that was returned from the memoize function.  As you can see, I am getting values from the fibonacci object and calling the function once again on itself.  I can then calculate the given numbers while using this code below:

let result = fibonacci.[10]

printfn “%i” result

That’s pretty slick, isn’t it?  Now, let’s try in C#.  It’s not going to be as elegant nor to the point.  But, let’s define our Table class first.

public class Table<T, U> : IDisposable

{

    private readonly Dictionary<T, U> dictionary;

    private readonly Func<T, U> func;

    public Table(Dictionary<T, U> dictionary, Func<T, U> func)

    {

        this.dictionary = dictionary;

        this.func = func;

    }

    public U this[T n]

    {

        get

        {

            if (dictionary.ContainsKey(n))

                return dictionary[n];

            var result = func(n);

            dictionary.Add(n, result);

            return result;

        }

    } 

    public void Dispose()

    {

        dictionary.Clear();

    }

}

Now that this is defined, then the rest of our methods should come nicely as noted here.

static Table<T, U> MemoizeTable<T, U>(Func<T, U> func)

{

    return new Table<T, U>(new Dictionary<T, U>(), func);

}


public static Table<int, int> Fibonacci

{

    get

    {

        return Memoize((int n) => n <= 2 ? 1 : Fibonacci[n - 2] + Fibonacci[n - 1]);

    }

}

var fibonacci = Fibonacci;

var fibFast1 = fibonacci[30];

Not as elegant in the F# version because C# won’t let me create an anonymous inner class as F# does.  If you look around the F# codebase, you’ll find them using this plenty to define such things that return IEnumerable<T> for Map and so on.  Not the prettiest solution in C#, as you can well imagine.

Some Notes on Memoization

Don’t forget that proper memoization requires the functions that you are writing to be pure.  What that means is that given the same input, the same output will always be calculated and returned.  Also note that none of the code I wrote was particularly thread safe, so I would definitely recommend if you were going to worry about concurrency here with our shared mutable state, that you would use the appropriate locks and so on.

Wrapping It Up

Now I was able to show you how to speed up some of those recursive algorithms using memoization.  This allows for a bit better efficiency when doing large and pure calculations.  Next time, I’m going to start going back to my concurrency topics as I have a lot yet untouched.  With my recent presentations on the matter, there is plenty to talk about.

This entry was posted in C#, F#, Functional Programming. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • https://profiles.google.com/116252364275406822988 Tom

    Thanks!! very usefull

  • http://codebetter.com/members/Matthew.Podwysocki/default.aspx Matthew.Podwysocki

    @Andries

    Understood, and yes, it’s for more than just Fibonacci and yes I should have referred to where it came as an inspiration. In the future, I’ll try to refrain from any mention of Fibonacci and Factorials.

    Matt

  • Andries

    ‘Now I was able to show you how to speed up some …’
    Would be nice to refer to: Expert F#, Chap. 8 ‘Mastering F#: Common Techniques’ page 187 to 189. And it would be mutch nicer to use an other example. Now people thing momoizing is nice but only for fibonacci.
    But, I enjoyed your posts though.

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @Justin

    Well, it has a couple of meanings. One of them is that it has no side effects as well as if I give the same input, I will get the same output, every single time. Hope that clarifies it for you.

    Matt

  • http://www.codethinked.com Justin Etheredge

    I’m glad that you used the word “pure” instead of “deterministic” because I was under the misconception that “pure” simply meant that the function had no side effects. I didn’t think it had to do with the resulting value being deterministic or not. It forced me to look up the definition, and now I have learned something new today. Thanks!