Consider a single-linked list. It looks something like
data List x = Node x (List x) | End
It is natural to define a folding function such as
reduce :: (x -> y -> y) -> y -> List x -> y
In a sense, reduce f x0
replaces every Node
with f
and every End
with x0
. This is what the Prelude refers to as a fold.
Now consider a simple binary tree:
data Tree x = Leaf x | Branch (Tree x) (Tree x)
It is similarly natural to define a function such as
reduce :: (y -> y -> y) -> (x -> y) -> Tree x -> y
Notice that this reduction has a quite different character; whereas the list-based one is inherently sequential, this new tree-based one has more of a divide-and-conquer feel to it. You could even imagine throwing a few par
combinators in there. (Where would you put such a thing in the list version?)
My question: Is this function still classified as a "fold", or is it something else? (And if so, what is it?)
Basically whenever anybody talks about folding, they always talk about folding lists, which is inherently sequential. I'm wondering whether "sequential" is part of the definition of what a fold is, or whether this is merely a coincidental property of the most commonly-used example of folding.
We can actually come up with a generic notion of folding which can apply to a whole bunch of different types. That is, we can systematically define a fold
function for lists, trees and more.
This generic notion of fold
corresponds to the catamorphisms @pelotom mentioned in his comment.
The key insight is that these fold
functions are defined over recursive types. In particular:
data List a = Cons a (List a) | Nil
data Tree a = Branch (Tree a) (Tree a) | Leaf a
Both of these types are clearly recursive--List
in the Cons
case and Tree
in the Branch
case.
Just like functions, we can rewrite these types using fixed points. Remember the definition of fix
:
fix f = f (fix f)
We can actually write something very similar for types, except that it has to have an extra constructor wrapper:
newtype Fix f = Roll (f (Fix f))
Just like fix
defines the fixed point of a function, this defines the fixed point of a functor. We can express all our recursive types using this new Fix
type.
This allows us to rewrite List
types as follows:
data ListContainer a rest = Cons a rest | Nil
type List a = Fix (ListContainer a)
Essentially, Fix
allows us to nest ListContainer
s to arbitrary depths. So we could have:
Roll Nil
Roll (Cons 1 (Roll Nil))
Roll (Cons 1 (Roll (Cons 2 (Roll Nil))))
which correspond to []
, [1]
and [1,2]
respectively.
Seeing that ListContainer
is a Functor
is easy:
instance Functor (ListContainer a) where
fmap f (Cons a rest) = Cons a (f rest)
fmap f Nil = Nil
I think the mapping from ListContainer
to List
is pretty natural: instead of recursing explicitly, we make the recursive part a variable. Then we just use Fix
to fill that variable in as appropriate.
We can write an analogous type for Tree
as well.
So why do we care? We can define fold
for arbitrary types written using Fix
. In particular:
fold :: Functor f => (f a -> a) -> (Fix f -> a)
fold h = h . fmap (fold h) . unRoll
where unRoll (Roll a) = a
Essentially, all a fold does is unwrap the "rolled" type one layer at a time, applying a function to the result each time. This "unrolling" lets us define a fold for any recursive type and neatly and naturally generalize the concept.
For the list example, it works like this:
Roll
to get either a Cons
or a Nil
fmap
.
Nil
case, fmap (fold h) Nil = Nil
, so we just return Nil
.Cons
case, the fmap
just continues the fold over the rest of the list.fold
ending in a Nil
--just like the standard foldr
.Now lets look at the types of the two fold functions. First, foldr
:
foldr :: (a -> b -> b) -> b -> [a] -> b
Now, fold
specialized to ListContainer
:
fold :: (ListContainer a b -> b) -> (Fix (ListContainer a) -> b)
At first, these look completely different. However, with a bit of massaging, we can show they're the same. The first two arguments to foldr
are a -> b -> b
and b
. We have a function and a constant. We can think of b
as () -> b
. Now we have two functions _ -> b
where _
is ()
and a -> b
. To make life simpler, let's curry the second function giving us (a, b) -> b
. Now we can write them as a single function using Either
:
Either (a, b) () -> b
This is true because given f :: a -> c
and g :: b -> c
, we can always write the following:
h :: Either a b -> c
h (Left a) = f a
h (Right b) = g b
So now we can view foldr
as:
foldr :: (Either (a, b) () -> b) -> ([a] -> b)
(We are always free to add parentheses around ->
like this as long as they're right-associative.)
Now lets look at ListContainer
. This type has two cases: Nil
, which carries no information and Cons
, which has both an a
and a b
. Put another way, Nil
is like ()
and Cons
is like (a, b)
, so we can write:
type ListContainer a rest = Either (a, rest) ()
Clearly this is the same as what I used in foldr
above. So now we have:
foldr :: (Either (a, b) () -> b) -> ([a] -> b)
fold :: (Either (a, b) () -> b) -> (List a -> b)
So, in fact, the types are isomorphic--just different ways of writing the same thing! I think that's pretty cool.
(As a side note, if you want to know more about this sort of reasoning with types, check out The Algebra of Algebraic Data Types, a nice series of blog posts about just this.)
So, we've seen how we can define a generic fold
for types written as fixed points. We've also seen how this corresponds directly to foldr
for lists. Now lets look at your second example, the binary tree. We have the type:
data Tree a = Branch a (Tree a) (Tree a) | Leaf a
we can rewrite this using Fix
by following the rules I did above: we replace the recursive part with a type variable:
data TreeContainer a rest = Branch rest rest | Leaf a
type Tree a = Fix (TreeContainer a)
Now we have a tree fold
:
fold :: (TreeContainer a b -> b) -> (Tree a -> b)
Your original foldTree
looks like this:
foldTree :: (b -> b -> b) -> (a -> b) -> Tree a -> b
foldTree
accepts two functions; we'll combine the into one by first currying and then using Either
:
foldTree :: (Either (b, b) a -> b) -> (Tree a -> b)
We can also see how Either (b, b) a
is isomorphic to TreeContainer a b
. Tree container has two cases: Branch
, containing two b
s and Leaf
, containing one a
.
So these fold types are isomorphic in the same way as the list example.
There is a clear pattern emerging. Given a normal recursive data type, we can systematically create a non-recursive version of the type, which lets us express the type as a fixed point of a functor. This means that we can mechanically come up with fold
functions for all these different types--in fact, we could probably automate the entire process using GHC Generics or something like that.
In a sense, this means that we do not really have different fold
functions for different types. Rather, we have a single fold
function which is very polymorphic.
I first fully understood these ideas from a talk given by Conal Elliott. This goes into more detail and also talks about unfold
, which is the dual to fold
.
If you want to delve into this sort of thing even more deeply, read the fantastic "Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire" paper. Among other things, this introduces the notions of "catamorphisms" and "anamorphisms" which correspond to folds and unfolds.
Also, I can't resist adding a plug for myself :P. You can see some interesting similarities between the way we use Either
here and the way I used it when talking about algebras in another SO answer.
There is in fact a deep connection between fold
and algebras; moreover, unfold
--the aforementioned dual of fold
--is connected to coalgebras, which are the dual of algebras. The important idea is that algebraic data types correspond to "initial algebras" which also define folds as outlined in the rest of my answer.
You can see this connection in the general type of fold
:
fold :: Functor f => (f a -> a) -> (Fix f -> a)
The f a -> a
term looks very familiar! Remember that an f-algebra was defined as something like:
class Functor f => Algebra f a where
op :: f a -> a
So we can think of fold
as just:
fold :: Algebra f a => Fix f -> a
Essentially, fold
just lets us "summarize" structures defined using the algebra.