What constitutes a fold for types other than list?

MathematicalOrchid picture MathematicalOrchid · May 7, 2013 · Viewed 8.7k times · Source

Consider a single-linked list. It looks something like

data List x = Node x (List x) | End

It is natural to define a folding function such as

reduce :: (x -> y -> y) -> y -> List x -> y

In a sense, reduce f x0 replaces every Node with f and every End with x0. This is what the Prelude refers to as a fold.

Now consider a simple binary tree:

data Tree x = Leaf x | Branch (Tree x) (Tree x)

It is similarly natural to define a function such as

reduce :: (y -> y -> y) -> (x -> y) -> Tree x -> y

Notice that this reduction has a quite different character; whereas the list-based one is inherently sequential, this new tree-based one has more of a divide-and-conquer feel to it. You could even imagine throwing a few par combinators in there. (Where would you put such a thing in the list version?)

My question: Is this function still classified as a "fold", or is it something else? (And if so, what is it?)

Basically whenever anybody talks about folding, they always talk about folding lists, which is inherently sequential. I'm wondering whether "sequential" is part of the definition of what a fold is, or whether this is merely a coincidental property of the most commonly-used example of folding.

Answer

Tikhon Jelvis picture Tikhon Jelvis · May 7, 2013

A Fold for Every Occasion

We can actually come up with a generic notion of folding which can apply to a whole bunch of different types. That is, we can systematically define a fold function for lists, trees and more.

This generic notion of fold corresponds to the catamorphisms @pelotom mentioned in his comment.

Recursive Types

The key insight is that these fold functions are defined over recursive types. In particular:

data List a = Cons a (List a) | Nil
data Tree a = Branch (Tree a) (Tree a) | Leaf a

Both of these types are clearly recursive--List in the Cons case and Tree in the Branch case.

Fixed Points

Just like functions, we can rewrite these types using fixed points. Remember the definition of fix:

fix f = f (fix f)

We can actually write something very similar for types, except that it has to have an extra constructor wrapper:

newtype Fix f = Roll (f (Fix f))

Just like fix defines the fixed point of a function, this defines the fixed point of a functor. We can express all our recursive types using this new Fix type.

This allows us to rewrite List types as follows:

data ListContainer a rest = Cons a rest | Nil
type List a = Fix (ListContainer a)

Essentially, Fix allows us to nest ListContainers to arbitrary depths. So we could have:

Roll Nil
Roll (Cons 1 (Roll Nil))
Roll (Cons 1 (Roll (Cons 2 (Roll Nil))))

which correspond to [], [1] and [1,2] respectively.

Seeing that ListContainer is a Functor is easy:

instance Functor (ListContainer a) where
  fmap f (Cons a rest) = Cons a (f rest)
  fmap f Nil           = Nil

I think the mapping from ListContainer to List is pretty natural: instead of recursing explicitly, we make the recursive part a variable. Then we just use Fix to fill that variable in as appropriate.

We can write an analogous type for Tree as well.

"Unwrapping" Fixed Points

So why do we care? We can define fold for arbitrary types written using Fix. In particular:

fold :: Functor f => (f a -> a) -> (Fix f -> a)
fold h = h . fmap (fold h) . unRoll
  where unRoll (Roll a) = a

Essentially, all a fold does is unwrap the "rolled" type one layer at a time, applying a function to the result each time. This "unrolling" lets us define a fold for any recursive type and neatly and naturally generalize the concept.

For the list example, it works like this:

  1. At each step, we unwrap the Roll to get either a Cons or a Nil
  2. We recurse over the rest of the list using fmap.
    1. In the Nil case, fmap (fold h) Nil = Nil, so we just return Nil.
    2. In the Cons case, the fmap just continues the fold over the rest of the list.
  3. In the end, we get a bunch of nested calls to fold ending in a Nil--just like the standard foldr.

Comparing Types

Now lets look at the types of the two fold functions. First, foldr:

foldr :: (a -> b -> b) -> b -> [a] -> b

Now, fold specialized to ListContainer:

fold :: (ListContainer a b -> b) -> (Fix (ListContainer a) -> b)

At first, these look completely different. However, with a bit of massaging, we can show they're the same. The first two arguments to foldr are a -> b -> b and b. We have a function and a constant. We can think of b as () -> b. Now we have two functions _ -> b where _ is () and a -> b. To make life simpler, let's curry the second function giving us (a, b) -> b. Now we can write them as a single function using Either:

Either (a, b) () -> b

This is true because given f :: a -> c and g :: b -> c, we can always write the following:

h :: Either a b -> c
h (Left a) = f a
h (Right b) = g b

So now we can view foldr as:

foldr :: (Either (a, b) () -> b) -> ([a] -> b)

(We are always free to add parentheses around -> like this as long as they're right-associative.)

Now lets look at ListContainer. This type has two cases: Nil, which carries no information and Cons, which has both an a and a b. Put another way, Nil is like () and Cons is like (a, b), so we can write:

type ListContainer a rest = Either (a, rest) ()

Clearly this is the same as what I used in foldr above. So now we have:

foldr :: (Either (a, b) () -> b) -> ([a] -> b)
fold  :: (Either (a, b) () -> b) -> (List a -> b)

So, in fact, the types are isomorphic--just different ways of writing the same thing! I think that's pretty cool.

(As a side note, if you want to know more about this sort of reasoning with types, check out The Algebra of Algebraic Data Types, a nice series of blog posts about just this.)

Back to Trees

So, we've seen how we can define a generic fold for types written as fixed points. We've also seen how this corresponds directly to foldr for lists. Now lets look at your second example, the binary tree. We have the type:

data Tree a = Branch a (Tree a) (Tree a) | Leaf a

we can rewrite this using Fix by following the rules I did above: we replace the recursive part with a type variable:

data TreeContainer a rest = Branch rest rest | Leaf a
type Tree a = Fix (TreeContainer a)

Now we have a tree fold:

fold :: (TreeContainer a b -> b) -> (Tree a -> b)

Your original foldTree looks like this:

foldTree :: (b -> b -> b) -> (a -> b) -> Tree a -> b

foldTree accepts two functions; we'll combine the into one by first currying and then using Either:

foldTree :: (Either (b, b) a -> b) -> (Tree a -> b)

We can also see how Either (b, b) a is isomorphic to TreeContainer a b. Tree container has two cases: Branch, containing two bs and Leaf, containing one a.

So these fold types are isomorphic in the same way as the list example.

Generalizing

There is a clear pattern emerging. Given a normal recursive data type, we can systematically create a non-recursive version of the type, which lets us express the type as a fixed point of a functor. This means that we can mechanically come up with fold functions for all these different types--in fact, we could probably automate the entire process using GHC Generics or something like that.

In a sense, this means that we do not really have different fold functions for different types. Rather, we have a single fold function which is very polymorphic.

More

I first fully understood these ideas from a talk given by Conal Elliott. This goes into more detail and also talks about unfold, which is the dual to fold.

If you want to delve into this sort of thing even more deeply, read the fantastic "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire" paper. Among other things, this introduces the notions of "catamorphisms" and "anamorphisms" which correspond to folds and unfolds.

Algebras (and Coalgebras)

Also, I can't resist adding a plug for myself :P. You can see some interesting similarities between the way we use Either here and the way I used it when talking about algebras in another SO answer.

There is in fact a deep connection between fold and algebras; moreover, unfold--the aforementioned dual of fold--is connected to coalgebras, which are the dual of algebras. The important idea is that algebraic data types correspond to "initial algebras" which also define folds as outlined in the rest of my answer.

You can see this connection in the general type of fold:

fold :: Functor f => (f a -> a) -> (Fix f -> a)

The f a -> a term looks very familiar! Remember that an f-algebra was defined as something like:

class Functor f => Algebra f a where
  op :: f a -> a

So we can think of fold as just:

fold :: Algebra f a => Fix f -> a

Essentially, fold just lets us "summarize" structures defined using the algebra.