Fun with Folds

As I’ve announced earlier, and if you follow me on Twitter, I’ve been doing a bit of Haskell lately through the Real World Haskell book club that I started.  Most recently, through our learnings, we have been covering the basic fundamentals of functional programming.  The most fundamental piece to this is to understand how to apply the basic three higher-order functions, map, filter and fold.  This time, we’re covering the fold function in a neat challenge to convert functions which use explicit recursion to use folds instead.  So, what are folds anyways?

Understanding Folds

foldr What is a fold anyhow?  Simply put, a fold is a higher-order function that knows how to reduce a given data structure (typically a sequence of elements) into a single return value.  We could describe it as  doing something to each value in the list, updating an accumulator as we go, and returning the value of the accumulator when we’re finished.  There are two sorts of folds, a left fold and a right fold. The difference is the way the data is “folded”, as we will discuss here shortly.

The Left Fold

The left fold is used for folding from the left, otherwise known as the start of the list.  The basic definition of the left fold is the following in Haskell:

Left Fold
foldl :: (a -> b -> a) -> a -> [ b ] –> a
 

The left fold function takes a step function, an initial value for the accumulator, and the list itself.  The function calls the step function on the current accumulator and current element in the list, and then passes the new accumulator value to itself recursively to build up the accumulator until the list is exhausted.

To understand how this works, let’s walk through a simple example of the trite factorial, walking through each step of the calculation.  In this example, we’ll walk through each step of calculating the factorial of 3 in F#. 

List.fold_left (*) 1 [1..3] 
  = List.fold_left (*)
 (1 * 1)             (2::3::[]) 
  = List.fold_left (*) ((1 * 1) * 2)       (3::[]) 
  = List.fold_left (*)
 (((1 * 1) * 2) * 3) ([]) 
  =                    (((1 * 1) * 2) * 3)
  = 6

Now that we have a basic handle on how this is done, let’s look at the other fold function and what it can do for us.

 

The Right Fold

The other type of fold is the right fold, which folds from the right, or end of a given list.  The basic definition of a right fold is the following in Haskell:

Right fold
foldr :: (a -> b -> b) -> b -> [ a ] –> b
 

The signature looks very similar to the left fold, so in order to see the difference between the two, let’s walk through a right fold of our factorial once again.

List.fold_right (*) [1..3] 1
  = 1 *           List.fold_right (*)
 (2::3::[]) 1
  = 1 * (2 *      List.fold_right (*) (3::[]) 1)
  = 1 * (2 * (3 * List.fold_right (*)
 ([]) 1))
  = 1 * (2 * (3 * 1))
  = 6

The difference between the two should be a little bit more apparent now.  With a left fold, the empty list element is on the left, and the parentheses group to the left.  With our right fold, the 1 value is on the right, and the parentheses group to the right.

But, how useful is it?  Well, we could write any number of functions such as filter and map in terms of a right fold such as the following:

let filter’ p xs = 
  let step x ys = if p x then x::ys else ys
  List.fold_right step xs []

let map’ f xs = 
  let step x ys = f x :: ys
  List.fold_right step xs []

And that brings us to our challenge today, of rewriting other functions using folds instead of explicit recursion.  But, first, let’s look at one fold function you may already be familiar with.

The LINQ Aggregate

Sure enough, the LINQ Aggregate function is an implementation of a left fold.  Let’s look at the signature of one of the Aggregate overloads to get an idea of how it maps to our above functions:

public static TAccumulate Aggregate<TSource, TAccumulate>(
  this IEnumerable<TSource> source, 
  TAccumulate seed, 
  Func<TAccumulate, TSource, TAccumulate> func);
 

As you can see, we have our source, the seed and a function which passes the the accumulator and the current element to calculate the new accumulator value.  We could now use LINQ to implement our factorial function once again such as the following in C#:

// Left fold
Enumerable.Range(1, 3).Aggregate(1, (a, n) => a * n)); 

// Right fold by way of a left fold
Enumerable.Range(1, 3).Aggregate<int,Func<int,int>>
  (x => x, (f, n) => x => f(x * n))(1));

LINQ was built on the standard three higher-order functions of map (Select), filter (Where) and Fold (Aggregate).  Understanding the fold is probably the more difficult to understand of them, but one of the most powerful.  Bart De Smet covers more of the subject in detail in his post Folding left, right and the LINQ aggregation operator.

But, now it’s time we stop postponing and get to the challenge!

The Challenge

Question 10 in Chapter 4 of the Real World Haskell book gives a particularly interesting challenge.  How many of the following standard Prelude functions can be rewritten as folds:

  • any
  • cycle
  • words
  • unlines

Let’s take each one of these and determine whether you can take the standard definition and rewrite them using folds.  The first function is the any function which is defined here:

Applies a predicate to each item in the list to see if 
any satisfies
any :: (a -> Bool) -> [ a ] -> Bool
any _ []     = False
any p (x:xs) = p x || any p xs

Or as written in F#, might look something like this:

let rec any (p:‘a -> bool) : ‘a list -> bool = function
  | []    -> false
  | x::xs -> p x || any p xs;;

 

The logic for this function is rather straight forward:

  • If the incoming list is empty, then return false
  • Else, apply a logical-or to the predicate of the head and the recursive call to the any function to determine the result

The behavior of the function should operate as the following:

ghci> any even [1,3,5,7,8,9]
True

The question is, can we rewrite this function as a fold, and if so how?  The answer is actually quite simple.  By applying a right fold, we are able to step through each value and pass a Boolean accumulator which allows us to determine if any matched our predicate function.  We won’t bother applying the function to our item in the list if we should already be true.  Knowing this, could now rewrite the any such as the following:

any’ :: (a -> Bool) -> [ a ] -> Bool
any’ p = foldr step False
               where step x acc = acc || p x

Just as easily, we could write this in F# as well using the List.fold_right, such as this:

let any’ (p:‘a -> bool) (xs:‘a list) : bool =
  let step x acc = acc || p x
  List.fold_right step xs false

 

We can now verify the behavior of our any’ function such as the following:

> let even x = x % 2 = 0
  any’ even [1;3;5;7;8;9];;
val it : bool = true

Our next challenge is the cycle function.  This function is defined as the following:

Takes a finite list and turns it into an infinite circular one
cycle :: [ a ] -> [ a ]
cycle [] = error “Prelude.cycle: empty list”
cycle xs = xs’ where xs’ = xs ++ xs’

The behavior of the function should operate as the following:

ghci> let c = take 10 . cycle
ghci> c [1..3]
[1,2,3,1,2,3,1,2,3,1]

Since Haskell is a lazily evaluated language, such constructs as these work nicely to create an infinite list.  And since F# is not, let’s skip this example and focus on Haskell only.  Trying to replicate the code here will only cause nice StackOverflowExceptions to occur and we don’t want that.  So, how would we solve this problem using a fold?  Consider the following logic:

  • Apply a right fold with a concat operator to bind collections
  • Repeat input using the repeat function instead of recursion

Our solution might look something like this:

cycle’ :: [ a ] -> [ a ]
cycle’ xs = foldr (++) [] (repeat xs)

We can verify the behavior of our code now such as the following:

ghci> let c = take 10 . cycle’
ghci> c [1..3]
[1,2,3,1,2,3,1,2,3,1]

The next challenger is the words function.  This function is defined as the following:

Breaks a string into a list based upon whitespace
words :: String -> [String]
words s = case dropWhile isSpace s of
            “” -> []
            s’ -> w : words s”
                  where (w, s”) =
                         break isSpace s’

The behavior of the function should operate as the following:

ghci> words “foo bar”
["foo","bar"]

For those who are coming from the F# world, you may realize that Haskell treats strings as character lists, as opposed to an IEnumerable<char>.  Looking at the above code sample, how might we be able to use a fold instead of recursion here?  Consider the following logic:

  • Accumulate a list of strings as we build our result
  • If we encounter a space, we drop it, and then cons an empty to the accumulator
  • Else, we cons the character to our list.

Let’s see how that looks in code:

words’ :: String -> [String]
words’ [] = [[]]
words’ s = foldr step [[]] s
              where step c acc@(a:as’)
                      | isSpace c = [] : acc
                      | otherwise = (c:a):as’

Of course, we could write this using F# as well, although it’s not quite as elegant, due to the fact that strings are not considered character lists, but instead sequences.  Let’s go over briefly what the code might look like:

module Char =
  let is_space = System.Char.IsWhiteSpace

module String =
  let of_list (xs:char list) : string =
    new string(Array.of_list xs)

let words’ : string -> string list =
  let words” : char list -> char list list = function
    | [] -> [[]]
    |-> let step c acc =
              let (a::as) = acc
              if Char.is_space c then [] :: acc
              else (c::a)::as
            List.fold_right step s [[]] 
  Seq.to_list >> words” >> List.map String.of_list

We can verify the behavior of the function in F# such as the following:

> words’ “foo bar”;;
val it : string list = ["foo"; "bar"]

The final challenge is the unlines function.  It is defined as the following:

Joins strings together with a line terminator in between
unlines :: [String] -> String
unlines [] = []
unlines (l:ls) = l ++ ‘\n’ : unlines ls

What this function does is to take each line and concat the a cons of an ‘\n’ character plus the recursive call to unlines.  Knowing what we know from the above examples, this should also be easy.  How might we do it?  Let’s take the following logic:

  • Use a fold right to enumerate the lines
  • Create a step function which first concats the \r\n (Windows style)  and then concats the rest of the list
  • Use an empty list accumulator
unlines’ :: [String] -> String
unlines’ = foldr step []
             where step = ((++) . (++ “\r\n”))

 

 

Using some of these techniques, we could also apply this to F# as well.  Taking the List module from above, we could write the code as the following:

let flip f x y = f y x
let unlines’ (xs:string list) : string =
  let step = ((@) << (flip (@) (Seq.to_list “\r\n”)))
  List.fold_right step (List.map Seq.to_list xs) [] |> String.of_list

We can now evaluate the results of our function such as the following which should follow our expected behavior.

> unlines’ ["foo";"bar"];;
val it : string = “foo\r\nbar\r\n”

So the answer to the question is, yes, all of the above functions can be expressed in terms of a fold function.  And I think we learned a bit along the way as well.

Conclusion

In functional programming, the use of folds are extremely common.  With a little practice, as I’ve shown above, we will have an easier time understanding how to use folds instead of explicit recursion.  Indeed, many of the problems for which we use explicit recursion to solve could easily be solved with folds. 

Why use folds over explicit recursion?  Because using folds gives us well defined and explicit behavior, whereas explicit recursion may take many forms and harder to get just right.  By applying the basic three higher-order functions, we only need to learn this behavior once and then anyone who looks at our code will quickly understand our intent.  Because of this, you’ll be able to write much more concise code.

This entry was posted in C#, F#, Functional Programming, Haskell, User Groups. Bookmark the permalink. Follow any comments here with the RSS feed for this post.
  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @Daniel

    Ok, I was using QuickCheck to test my implementations such as this and found out no issues:

    prop_any’ :: [Int] -> Bool
    prop_any’ xs = any’ even xs == any even xs

    Which is why I thought it worked. Interesting case though what you found. I guess I need to re-evaluate.

    Matt

  • Daniel Kullmann

    Hi Matthew,

    I think your answer to any is wrong: There is no equivalent implementation using fold. This is because the function any stops evaluation of the input list if it finds an element for which the predicate is true. So for the following lists, any and any’ show different behaviour:
    [1,2,undefined] and
    [1..]

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @Sean,

    Interesting, and that’s usually what I refer to when I’m talking about folds is explicitly lists, but I’ll be sure to make sure of my language next time.

    Matt

  • http://podwysocki.codebetter.com Matthew.Podwysocki

    @Larry,

    Thanks for noticing. I should have checked that before I posted because I’ve run into this problem before.

    Matt

  • http://splonderzoek.blogspot.com/ Sean Leather

    Your description of a fold as “a higher-order function that knows how to reduce a given data structure (typically a sequence of elements) into a single return value” is better known as a “crush” in the literature:

    http://www.citeulike.org/user/spl/tag/crush

    For lists, the fold and the crush are the same. The generic programming library EMGM

    http://www.cs.uu.nl/wiki/GenericProgramming/EMGM

    actually has a generic crush function that can be used with many datatypes.

    http://hackage.haskell.org/packages/archive/emgm/0.2/doc/html/Generics-EMGM-Functions-Crush.html

    A fold (also called a catamorphism) is actually something that replaces each constructor in an algebraic datatype with a new value. Here’s some references.

    http://www.citeulike.org/user/spl/tag/fold

    But the Haskell libraries will try to tell you otherwise, considering the type class Foldable and various things called ‘fold’.

  • http://blog.kamil.dworakowski.name Kamil Dworakowski

    your implementation of words’ is wrong ;P

    for a start: words “” != words’ “”, moreover it doesn’t handle double spaces, I think

    words” xs = snd $ foldr f (“”, []) (‘ ‘:xs) where
    f ‘ ‘ (“”, ws) = (“”, ws)
    f ‘ ‘ (w, ws) = (“”, w:ws)
    f x a = first (x:) a

  • Larry Lary

    You’ve got some amusing emoticon substitution going on for ( b ) and ( a ) up there…