Why Lambda Calculus?

Functions in Intension

In the modern view of mathematical foundations, sets are first-class objects. Everything you can talk about can be encoded as a set: sets are sets, ordered pairs (a,b) are sets \{\{a\},\{a,b\}\}, and functions are sets of ordered pairs. A function, then, is nothing more than a dictionary which takes an object and, almost artificially, associates it to exactly one other object. This viewpoint is great for thinking about ‘arbitrary’ functions; permutations are great examples. But is that really all there is to it? When asked what 2+3 is, your computer does not open the +-dictionary and look up (2,3); instead it takes the input and performs a series of logical operations on it to produce the desired result. (Indeed, storing the entire +-dictionary is not even physically possible!)

And in intension, that is what you would expect a function to be: a recipe, a rule of computation, a ‘box’ which takes in some object and spits out another (depending only on the input). This completely different approach to the foundations of mathematics, where functions are first-class objects, led to what is called the lambda calculus. It provides a rich language where, unlike sets, which are mere unordered collections, functions are living, breathing objects that take arguments and can be composed. We then have a very powerful model of computation.

The Language \Lambda

Suppose \texttt{square} names the function that takes in x and spits out x^2. This ‘rule applied to object’ approach can be written symbolically in a variety of ways:

    \begin{align*} \texttt{square}(x)=x^2 \\ x\xrightarrow{\texttt{square}}x^2 \\ \texttt{square}=\lambda x .x^2 \end{align*}

The last one is due to Alonzo Church, who (quite arbitrarily) chose \lambda to denote a quantifier that specifies what object the rule should be applied to. Church’s notation has one notable advantage over the rest: the name of the function and its description are separated and made equivalent, so writing (\lambda x. x^2) is the same as writing \texttt{square}. To see how powerful this is, contrast how the two systems would express the same fact:

    \begin{align*} (\lambda x.x^2)3=9 \\ \texttt{square}(x)=x^2, \quad \texttt{square}(3)=9 \end{align*}

The latter system (which most mathematicians are used to) needs to introduce an arbitrary name to express even the simplest of facts, whereas Church’s notation reduces the name to what it should be: mere convenience. Multi-argument functions would then be written like \texttt{sum}=\lambda xy.x+y, a function that takes in two numbers and returns their sum. Frege realised that by letting the output itself be a function, every function could be made to take exactly one argument while keeping all the expressive power. Partially applying \texttt{sum} to a single argument a gives another function \texttt{sum}\, a = \lambda x. (x + a), which in turn means the original function can be written as \texttt{sum}= \lambda y .\, \texttt{sum}\, y = \lambda y.\lambda x.\, (x + y). Although originally due to Frege, this idea of partially applying functions was rediscovered by Haskell Curry and is called _currying_: it says functions X\times Y \rightarrow Z are the same as functions X \rightarrow (Y \rightarrow Z). And this should be natural, for the former correspond to elements of Z^{X\times Y} while the latter to elements of (Z^Y)^X, and the two sets are naturally isomorphic.
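To see currying in action, here is a minimal sketch in Haskell, a language whose functions are curried by design (the names \texttt{sum2}, \texttt{addThree} and \texttt{sumPair} are mine, chosen for illustration):

    -- Every Haskell function takes exactly one argument:
    -- sum2 officially has type Integer -> (Integer -> Integer).
    sum2 :: Integer -> Integer -> Integer
    sum2 x y = x + y

    -- Partial application returns a function, just like sum a = λx.(x+a).
    addThree :: Integer -> Integer
    addThree = sum2 3

    -- curry/uncurry witness the bijection between Z^(X×Y) and (Z^Y)^X.
    sumPair :: (Integer, Integer) -> Integer
    sumPair = uncurry sum2

    main :: IO ()
    main = print (addThree 4, sumPair (2, 3))  -- prints (7,5)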

In principle, these ideas could lead to a foundation for all of mathematics, solely in terms of functions and their application. The \lambda-calculus is an attempt at capturing this idea, one that eventually paved the way for modern functional programming. This formalization by Alonzo Church gave, at the very least, an elegant notation for manipulating functions.

Conventions

  • Following the shorthand used above for \texttt{sum}, we read \lambda x y z .F as \lambda x.(\lambda y.( \lambda z. F)).
  • Left associativity is the natural convention for application: Fxyz=((Fx)y)z.

Definition The set of \lambda-terms is defined inductively as

  • Variables are \lambda-terms.
  • If M,N are \lambda-terms then so is MN. Terms formed this way are called application terms.
  • If M is a \lambda-term and x is a variable then \lambda x . M is a \lambda-term. Terms formed this way are called abstraction terms.

This is a purely syntactic notion: we now have a set of all possible terms built out of variables, abstraction and application. I like to think of application explicitly as a binary operation, and to compare the definition of \lambda-terms with how one would inductively define the set of terms over a given signature with quantifiers.
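The inductive definition transcribes directly into an algebraic data type. A minimal sketch in Haskell (the type and constructor names are my own choices); later sketches will build on this Term type:

    -- λ-terms, mirroring the three clauses of the definition.
    data Term
      = Var String       -- variables
      | App Term Term    -- application terms MN
      | Lam String Term  -- abstraction terms λx.M
      deriving (Eq, Show)

    -- Example: λx.λy.x, a term we shall meet again as True.
    termTrue :: Term
    termTrue = Lam "x" (Lam "y" (Var "x"))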

A Model of Computation

The question of when two functions are the same raises profound questions. If your view is limited to the extensional perspective of functions-as-dictionaries, then it is reasonable to call two functions the same if they are the same dictionary. But what about the intensional variant, functions-as-rules? On the domain of all neutral atoms, the functions \texttt{Number of electrons} and \texttt{Number of protons} have the same behaviour as dictionaries, but they are clearly different in intension. What about \lambda x. (x+1) and \lambda x.  (x+2-1)? You know they mean the same thing, even though they appear different as rules. Lambda calculus treats them differently, and is hence a hyperintensional theory. Since \lambda-terms are finite strings of symbols, we write M\equiv N (M is identical to N) if they correspond to exactly the same string.

This is more restrictive than it should be, for \lambda x. x and \lambda y.y are not identical. Indeed, relabelling all occurrences of a bound variable should not change the meaning of the term. Call such a transformation (replacing all bound occurrences of x in M with a variable y that does not already occur in M) an \alpha-reduction. Call two terms \alpha-equivalent if there is a sequence of \alpha-reductions from one to the other. By abuse of notation, I shall use \equiv to denote this relation henceforth. We shall also assume that no term has a variable occurring as both free and bound (by applying enough \alpha-reductions to ensure this is true).

Similarly one would naturally expect \lambda x. (Mx) to be the same function as M, yet the two are syntactically very different. Hence we introduce the notion of an \eta-reduction: in a term T, we may replace a subterm of the form \lambda x.(Mx), where x does not occur free in M, with M.
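On the Term type from the sketch above, both the side condition and the reduction are easy to express; here is one way (freeIn and etaStep are my names):

    -- Does x occur free in a term?
    freeIn :: String -> Term -> Bool
    freeIn x (Var y)   = x == y
    freeIn x (App m n) = freeIn x m || freeIn x n
    freeIn x (Lam y m) = x /= y && freeIn x m

    -- η-reduce a term of the form λx.(M x), if the side condition holds.
    etaStep :: Term -> Maybe Term
    etaStep (Lam x (App m (Var y)))
      | x == y && not (freeIn x m) = Just m
    etaStep _ = Nothing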

If M and N are \lambda-terms, we write [N/x]M for the term obtained by replacing every free occurrence of x inside M with N (our convention that no variable occurs both free and bound ensures no variable of N gets captured). Thus if x does not occur free in M then [N/x]M\equiv M. The quantifier \lambda and the binary operation of application work in synchrony: application indicates that the second term acts as an argument to the first, and abstraction with \lambda binds that argument to some variable. This forms the most important computational step in \lambda-calculus, a \beta-reduction. In particular, the term (\lambda x.M )N \beta-reduces to [N/x]M. More generally for a term T, call a subterm a \beta-redex if it has the form (\lambda x. M )N. We say there is a 1-step \beta-reduction from T to T' (written T\rightarrow_{1\beta} T') if T' is obtained by replacing a single \beta-redex (\lambda x. M) N in T with [N/x]M. Write \rightarrow_{*\beta} for the reflexive transitive closure.
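Substitution and one-step \beta-reduction can be sketched on the same Term type. This naive version leans on our standing convention that no variable occurs both free and bound, so it never needs to rename bound variables (a full capture-avoiding substitution would):

    -- [N/x]M: replace every free occurrence of x in M with N.
    subst :: String -> Term -> Term -> Term
    subst x n (Var y)
      | x == y    = n
      | otherwise = Var y
    subst x n (App p q) = App (subst x n p) (subst x n q)
    subst x n (Lam y m)
      | x == y    = Lam y m             -- x is bound below this λ; stop
      | otherwise = Lam y (subst x n m)

    -- Contract one β-redex, choosing the leftmost-outermost one.
    step :: Term -> Maybe Term
    step (App (Lam x m) n) = Just (subst x n m)
    step (App m n)         = case step m of
                               Just m' -> Just (App m' n)
                               Nothing -> App m <$> step n
    step (Lam x m)         = Lam x <$> step m
    step (Var _)           = Nothing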

Terms without \beta-redexes are said to be \beta-normal or \beta-irreducible. If M\rightarrow_{*\beta}N and N is \beta-normal, we say N is a \beta-normal form of M.
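Iterating step gives a tiny evaluator: it reaches a \beta-normal form whenever one exists, and loops forever otherwise (we will meet a term with no normal form at the end of the post). A sketch:

    -- Reduce to β-normal form; diverges if the term has none.
    normalize :: Term -> Term
    normalize t = maybe t normalize (step t)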

These relations (and their equivalence closures) form an abstract rewriting system, and a very powerful model of computation. A \lambda-computation simply walks down the digraph so created, and any relevant problem can be encoded in the terms so that each step down the graph successively simplifies the problem, working towards a solution. Let us encode a very simple system as an example:

Boolean algebra

Since Boolean algebra revolves around two truth-values, one possible way to encode them is in the form of a choice:

    \[\texttt{True}=\lambda xy.x \quad \quad \texttt{False}=\lambda xy.y\]

This has the advantage that one can encode “\texttt{if} \; P \; \texttt{then} \; Q \; \texttt{else}\;R” simply as the term PQR: if P is \texttt{True} the term \beta-reduces to Q, and if P is \texttt{False} it reduces to R. The following are possible encodings of the logic gates; I encourage you to verify they work as intended.

    \begin{align*} \texttt{Not}=\lambda x. x \; \texttt{False} \; \texttt{True} \\ \texttt{And} = \lambda x y . x \; (y \; \texttt{True} \; \texttt{False})\; \texttt{False} \quad \quad \texttt{Or}=\lambda x y . x \; \texttt{True} \; (y \; \texttt{True} \; \texttt{False}) \end{align*}
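One can also test-drive these encodings with Haskell's native lambdas, since Church booleans fit the type a -> a -> a. A sketch (primed names avoid clashing with the Prelude; And and Or are inlined versions of the terms above):

    -- Church booleans as ordinary Haskell lambdas.
    tru, fls :: a -> a -> a
    tru = \x y -> x
    fls = \x y -> y

    -- Not swaps the two branches offered to its argument.
    not' :: (a -> a -> a) -> a -> a -> a
    not' b = \x y -> b y x

    and', or' :: (a -> a -> a) -> (a -> a -> a) -> a -> a -> a
    and' p q = \x y -> p (q x y) y  -- p (q tru fls) fls, inlined
    or'  p q = \x y -> p x (q x y)  -- p tru (q tru fls), inlined

    -- "if P then Q else R" is plain application P Q R:
    main :: IO ()
    main = putStrLn (and' tru (not' fls) "yes" "no")  -- prints "yes"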

Exercise: Encode \texttt{Xor}, \texttt{Implies}, \texttt{Equal}, \texttt{Equivalent}.

Consistency

A natural question that arises is the meaningfulness of this computation: it is not a very useful system if it cannot distinguish between \texttt{True} and \texttt{False}. We need some notion of consistency. Of course, \lambda-calculus is not a logical theory, so it does not make sense to ask whether it proves false statements. We can however generalise the principle of explosion and call the theory inconsistent if it equates all pairs of terms.

Since each reduction corresponds to a logical simplification, we regard two \lambda-terms as equivalent if they lie in the same connected component of the directed graph. Church and Rosser proved the remarkable result that there is a natural injection \{\beta\texttt{-irreducible terms}\}\rightarrow \{\texttt{connected components of the graph}\}, i.e. each connected component of the graph contains at most one irreducible term, to which every point in the component can be reduced. It immediately follows that distinct \beta-normal forms like \texttt{True} and \texttt{False} cannot be equated, and the model of computation is consistent. The result is called the Church-Rosser Theorem.

The Church-Rosser Theorem

An abstract rewriting system (such as \lambda-calculus) is a relation \rightarrow on a set of points, thereby forming a directed graph. We say y is a successor of x (written x \xrightarrow{*} y) if there is some finite, possibly empty, sequence of reductions from x to y (so in particular x\xrightarrow{*}x). Points x and y are connected (written x \overset{*}{\leftrightarrow} y) if they are connected in the undirected graph (formed by forgetting the directionality of \rightarrow). Points x and y are joinable (written x \downarrow y) if they have a common successor.

We say \rightarrow has the Church-Rosser property if any two connected points are joinable. In symbols, x \overset{*}{\leftrightarrow} y \Rightarrow x \downarrow y, i.e. there exists z with x \xrightarrow{*}z \text{ and } y\xrightarrow{*} z. Our task is then to show precisely that \beta-reduction has the Church-Rosser property.

We can show that the Church-Rosser property is equivalent to a simpler property, confluence. A reduction \rightarrow is confluent if for every x,y_1,y_2 such that y_1 \xleftarrow{*}x \xrightarrow{*} y_2, there is a z such that y_1 \xrightarrow{*}z\xleftarrow{*} y_2. In other words, in a confluent system all diversions eventually meet. This should be a natural property to desire in a rewrite system: think about how you simplify 1 \times 2 + 3 \times 4. There are two distinct redexes 1\times 2 and 3 \times 4, but the choice of which one you contract first does not influence the final answer.

Lemma A reduction is confluent iff it has the Church-Rosser property.

Proof The Church-Rosser property implies confluence, since a diversion y_1 \xleftarrow{*} x \xrightarrow{*} y_2 is in particular a connection y_1 \overset{*}{\leftrightarrow} y_2. Conversely, suppose \rightarrow is confluent and induct on the length of the undirected path from x to y. If the path is empty then x = y and they are joinable at x. Otherwise let y' be the penultimate point on the path, with z a common successor of x and y' by the induction hypothesis. If y \rightarrow y', then z is already a common successor of x and y. If y' \rightarrow y, then y \xleftarrow{} y' \xrightarrow{*} z is a diversion, so confluence gives a w with y \xrightarrow{*} w \xleftarrow{*} z, and x \xrightarrow{*} z \xrightarrow{*} w shows x \downarrow y. \square

Theorem (Church-Rosser) \beta-reduction is confluent.

In what follows, we reconstruct a proof due to Komori, Matsuda and Yamakawa, who improved upon Takahashi’s elegant proof based on parallel reductions. Since a diversion is created only when a term has multiple redexes, we employ the key idea of a Takahashi translation, which reduces all \beta-redexes of a term simultaneously.

Definition The Takahashi translation T^* of a term T is defined inductively on the set of terms as:

  • x^* \equiv x
  • ((\lambda x. M)N)^*\equiv[N^*/x]M^*
  • (MN)^*\equiv M^*N^*
  •  (\lambda x. M)^*\equiv \lambda x. M^*

(Earlier clauses take priority.) We write M^{n*} for n Takahashi translations successively applied to M.
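On the Term type from before, the translation transcribes directly, reusing subst from the earlier sketch; Haskell's top-down pattern matching supplies the clause priority for free. A minimal sketch (takahashi and takahashiN are my names):

    -- Takahashi translation: contract all β-redexes currently visible
    -- in the term, simultaneously. Clause order enforces the priority.
    takahashi :: Term -> Term
    takahashi (Var x)           = Var x
    takahashi (App (Lam x m) n) = subst x (takahashi n) (takahashi m)
    takahashi (App m n)         = App (takahashi m) (takahashi n)
    takahashi (Lam x m)         = Lam x (takahashi m)

    -- The n-fold translation M^{n*}.
    takahashiN :: Int -> Term -> Term
    takahashiN n t = iterate takahashi t !! n

The translation interacts well with \beta-reductions: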

Lemma M \rightarrow_{*\beta} M^*.

Proof If M is a variable, then there is a 0-step reduction from M to M^*\equiv M. Suppose the result is true for all terms of lower complexity than M. Then we have the following cases:

    • M \equiv \lambda x. P: we get M \equiv \lambda x.P \rightarrow_{*\beta} \lambda x. P^* \equiv M^* by taking a proof of P \rightarrow_{*\beta}P^* and prepending \lambda x. to each step.
    • M \equiv P Q for P not an abstraction term: then M \equiv PQ \rightarrow_{*\beta}P^* Q \rightarrow_{*\beta} P^*Q^* \equiv M^*.
    • M \equiv (\lambda x. P) Q: then M \equiv (\lambda x.P)Q \rightarrow_{*\beta} (\lambda x.P) Q^* \rightarrow_{*\beta} (\lambda x.P^*) Q^* \rightarrow_{1\beta} [Q^*/x]P^*\equiv M^*.

Corollary If m\leq n then M^{m*} \rightarrow_{*\beta} M^{n*}.

Lemma M \rightarrow_{1\beta} N \Rightarrow M^* \rightarrow_{*\beta} N^*

Proof We consider the forms M and N could take, assuming the statement holds for terms of lower complexity.

    • M \equiv \lambda x. P, in which case N \equiv \lambda x. P' for P\rightarrow_{1\beta} P'. Then the induction hypothesis says P^* \rightarrow_{*\beta} P'^*, and prepending \lambda x. to each term gives a proof of M^* \equiv \lambda x. P^* \rightarrow_{*\beta} \lambda x. P' ^* \equiv N^*.
    • M \equiv PQ for P not an abstraction term. Then we can have
      •  N \equiv P' Q  where  P \rightarrow_{1\beta} P', P' not an abstraction term: take the reduction P^* \rightarrow_{*\beta} P'^* and append Q^* at each stage.
      • N \equiv (\lambda x. P') Q where P \rightarrow_{1\beta} \lambda x. P': then M^* \equiv P^* Q^* \rightarrow_{*\beta} (\lambda x. P'^*) Q^* \rightarrow_{1\beta} [Q^*/x] P'^* \equiv N^*.
      • N \equiv PQ' where Q \rightarrow_{1\beta}Q': take the reduction Q^* \rightarrow_{*\beta} Q'^* and prepend P^* at each stage.
    • M \equiv (\lambda x.P)Q, so we can have
      • N \equiv (\lambda x. P') Q for P \rightarrow_{1\beta} P': take a proof of P^* \rightarrow_{*\beta} P'^* and replace each free occurrence of x with Q^* to get a proof of M^* \equiv [Q^*/x] P^* \rightarrow_{*\beta} [Q^*/x] P'^* \equiv N^*.
      • N \equiv (\lambda x. P) Q' for Q \rightarrow_{1\beta} Q': there are finitely many free occurrences of x in P^*, so finitely many occurrences of Q^* in M^*. Each occurrence can be reduced to Q'^* in finitely many steps, so we have M^* \equiv [Q^*/x]P^* \rightarrow_{*\beta} [Q'^*/x] P^* \equiv N^*.
      • N \equiv [Q/x] P: if Q is not an abstraction term, then it can be observed that [Q^*/x]P^* \equiv ([Q/x]P)^*. If Q \equiv \lambda y. R, then M^* \equiv [\lambda y. R^* /x] P^* can have finitely many new redexes arising from occurrences like \dots xT \dots, each of which can be reduced further to reach ([\lambda y.R/x]P)^*. (This can, and should, be formally proven by induction on the structure of P.)

Corollary M \rightarrow_{*\beta} N \Rightarrow M^{n*} \rightarrow_{*\beta} N^{n*}

These results should be intuitive, since Takahashi translations simply specify an order for \beta-reductions: you do not touch a newly created redex until you have exhausted all redexes at the base level. The same intuition suggests that every single-step \beta-reduction can be ‘extended’ to an entire Takahashi translation at that level.

Lemma M \rightarrow_{1\beta} N \Rightarrow N\rightarrow_{*\beta} M^{*}

This can be generalised to the case M \rightarrow_{n\beta} N, keeping in mind that the reductions need not all happen at the same level, since new redexes may be created at each stage. We thus have

Lemma M \rightarrow_{n\beta} N \Rightarrow N\rightarrow_{*\beta} M^{n*}

Proof By induction on n: the claim is clearly true for n=0. Assume it holds for n. Then M\rightarrow_{(n+1)\beta} N\Rightarrow M \rightarrow_{1\beta}R\rightarrow_{n\beta}N for some R. By the induction hypothesis, N\rightarrow_{*\beta} R^{n*}. Moreover, M\rightarrow_{1\beta}R\Rightarrow R\rightarrow_{*\beta}M^*\Rightarrow R^{n*}\rightarrow_{*\beta}M^{(n+1)*}. Then N \rightarrow_{*\beta} M^{(n+1)*}. \square

Theorem (Church-Rosser) If M \rightarrow_{*\beta}M_1 and M \rightarrow_{*\beta}M_2 then there is an N such that M_1 \rightarrow_{*\beta}N and M_2 \rightarrow_{*\beta}N.

Proof Suppose M \rightarrow_{m\beta} M_1 and M\rightarrow_{n\beta} M_2, and without loss of generality n\geq m. Then M_2\rightarrow_{*\beta}M^{n*}, and M_1 \rightarrow_{*\beta} M^{m*}\rightarrow_{*\beta}M^{n*}. \square

We have thus proved one of the most important results about \lambda-calculus: the computations it performs are consistent. It is immediate that each term has at most one normal form, and the job of a computation is to get there. For instance, we now have a very elegant computer that can simplify Peano-arithmetic expressions.

Encoding Peano arithmetic

We look at one possible encoding of \texttt{0} and \texttt{Succ}, the Church encoding, from which the rest of Peano arithmetic follows naturally.

    \[\texttt{0}= \lambda f x. x \quad \quad \texttt{Succ}=\lambda n. \lambda f x. f(nfx)\]

Essentially, the ‘natural number’ \texttt{n} takes a function and a value as inputs, and applies the function n times to the value. From this I can immediately write \texttt{1}=\texttt{Succ}\; \texttt{0}, and

    \[\texttt{Add}=\lambda m n. m\;\texttt{Succ}\; n \quad \quad \texttt{Mult}=\lambda mn. m \; (\texttt{Add} \; n )\; \texttt{0} \quad \quad \texttt{Exp}=\lambda mn. n \;(\texttt{Mult} \; m) \;\texttt{1}.\]
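Church numerals also embed into Haskell if we give them a polymorphic type, which needs the RankNTypes extension. A sketch; note that \texttt{Add} is written out directly rather than as m \texttt{Succ} n, since the literal form (and likewise \texttt{Exp}) would require impredicative instantiation:

    {-# LANGUAGE RankNTypes #-}

    -- A Church numeral applies a function n times to a value.
    type Church = forall a. (a -> a) -> a -> a

    zero :: Church
    zero = \_f x -> x

    suc :: Church -> Church
    suc n = \f x -> f (n f x)

    one, two :: Church
    one = suc zero
    two = suc one

    add, mult :: Church -> Church -> Church
    add  m n = \f x -> m f (n f x)  -- m applications of f after n of them
    mult m n = \f -> m (n f)        -- each of m steps performs n steps

    -- Convert to an ordinary Int to inspect results.
    toInt :: Church -> Int
    toInt n = n (+ 1) 0

    main :: IO ()
    main = print (toInt (mult two (add one two)))  -- 2 * (1 + 2) = 6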

Exercise: Encode the above directly, without referring to previously defined operations. You should discover the rather curious and simple form \texttt{Exp}=\lambda m n . n \; m.

Difficult: Encode \texttt{Pred} and hence \texttt{Minus}.

A function F:\mathbb{N}\rightarrow\mathbb{N} is said to be \lambda-computable if it has an encoding f such that f\,\texttt{n} \beta-reduces to \texttt{F(n)} for every natural number n. It turns out that the \lambda-computable functions are exactly the Turing-computable functions, so the \lambda-calculus is exactly as strong as a Turing machine. This equivalence is a theorem; the Church-Turing thesis is the stronger claim that these models capture precisely the functions computable in the intuitive sense.

Loop = Loop

The simplicity of \lambda-calculus is what makes it expressive, but also non-terminating. Not everything has a normal form: for instance, the term \texttt{loop} = (\lambda x. \, x x) (\lambda x. \, x x) is a fixed point of \beta-reduction (such programs that evaluate to themselves are sometimes termed quines). More generally, we have the powerful fixpoint combinator \texttt{Y}= \lambda f. \; ( \lambda x. \, f (x x)) \,  (\lambda x. \, f (x x)), which takes a function and returns a fixed point for it (check this!). These recursive terms however lead to problems, especially if you are Haskell Curry trying to investigate the foundations of mathematics. If \Lambda (or its sister theory, combinatory logic) provided good foundations, we would have a \lambda-term \texttt{P} that acted like logical implication: \texttt{P}xy would have to correspond to x \supset y, and \texttt{P} x x would have to be true for all x. The theorem x \supset (x \supset y) \Rightarrow x \supset y of logic would mean \texttt{P}x(\texttt{P}xy) = \texttt{P} x y would also be true.
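As an aside before the paradox: the self-application x x inside \texttt{Y} is rejected by Haskell's type checker, but a recursive newtype smuggles it back in. A sketch of Curry's combinator (Rec and y are my names), with factorial as the customary usage example:

    -- The newtype ties the knot that simple types forbid: a value of
    -- Rec a wraps a function from Rec a to a.
    newtype Rec a = In { out :: Rec a -> a }

    -- Y = λf.(λx. f (x x)) (λx. f (x x)), with x x written out r r.
    y :: (a -> a) -> a
    y f = w (In w)
      where w r = f (out r r)

    -- y builds fixed points: factorial as the fixed point of a "step".
    factorial :: Integer -> Integer
    factorial = y (\rec n -> if n <= 1 then 1 else n * rec (n - 1))

    main :: IO ()
    main = print (factorial 5)  -- 120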

Returning to foundations: for any z, we can construct the \lambda-term N = \lambda x. \texttt{P} \, x z, which checks whether x implies z. The \texttt{Y}-combinator gives a fixed point for it; call it M = \texttt{Y} N, so that M = NM. Now starting from the axiom \texttt{P} M M, we have the following sequence of simplifications:

    \begin{align*} \texttt{P}MM &= \texttt{P}M(NM) \\ &= \texttt{P}M ((\lambda x. \texttt{P} x z) M ) \\ &= \texttt{P}M(\texttt{P}Mz) \\ &= \texttt{P} Mz \\ &= NM  \\ &= M \end{align*}

But then from the same axiom we have derived both \texttt{P}Mz \;(that is, M \supset z) and M, and hence z, which was arbitrary to begin with. The system can then prove every statement and is hence utterly useless, much to Mr. Curry’s disappointment. This is an instance of Curry’s Paradox, and in words it reads “If this sentence is true, then z”. At the heart of the paradox is self-reference, which lives in \texttt{Y}, aptly called Curry’s paradoxical combinator.

Such ill-foundedness spells disaster, and a considerable part of the development of foundational theories is spent getting rid of self-reference. ZFC did it with the axiom of foundation, while those who preferred a functional world came up with a system of types. But that is the subject matter of another post.
