I am getting stuck with the Wikipedia description of the predecessor function in lambda calculus.
What Wikipedia says is the following:
PRED := λn.λf.λx. n (λg.λh. h (g f)) (λu.x) (λu.u)
Can someone explain reduction processes step-by-step?
Thanks.
Ok, so the idea of Church numerals is to encode "data" using functions, right? The way that works is by representing a value by some generic operation you'd perform with it. We can therefore go in the other direction as well, which can sometimes make things clearer.
Church numerals are a unary representation of the natural numbers. So, let's use Z
to mean zero and Sn
to represent the successor of n
. Now we can count like this: Z
, SZ
, SSZ
, SSSZ
... The equivalent Church numeral takes two arguments--the first corresponding to S
, and second to Z
--then uses them to construct the above pattern. So given arguments f
and x
, we can count like this: x
, f x
, f (f x)
, f (f (f x))
...
Let's look at what PRED does.
First, it creates a lambda taking three arguments--n
is the Church numeral whose predecessor we want, of course, which means that f
and x
are the arguments to the resulting numeral, which thus means that the body of that lambda will be f
applied to x
one time fewer than n
would.
Next, it applies n
to three arguments. This is the tricky part.
The second argument, that corresponds to Z
from earlier, is λu.x
--a constant function that ignores one argument and returns x
.
The first argument, that corresponds to S
from earlier, is λgh.h (g f)
. We can rewrite this as λg. (λh.h (g f))
to reflect the fact that only the outermost lambda is being applied n
times. What this function does is take the accumulated result so far as g
and return a new function taking one argument, which applies that argument to g
applied to f
. Which is absolutely baffling, of course.
So... what's going on here? Consider the direct substitution with S
and Z
. In a non-zero number Sn
, the n
corresponds to the argument bound to g
. So, remembering that f
and x
are bound in an outside scope, we can count like this: λu.x
, λh. h ((λu.x) f)
, λh'. h' ((λh. h ((λu.x) f)) f)
... Performing the obvious reductions, we get this: λu.x
, λh. h x
, λh'. h' (f x)
... The pattern here is that a function is being passed "inward" one layer, at which point an S
will apply it, while a Z
will ignore it. So we get one application of f
for each S
except the outermost.
The third argument is simply the identity function, which is dutifully applied by the outermost S
, returning the final result--f
applied one fewer times than the number of S
layers n
corresponds to.