Why Lambda Calculus?
Functions in Intension
In the modern view of mathematical foundations, sets are first-class objects. Everything you can talk about can be encoded as a set: numbers are sets, ordered pairs are sets, and functions are sets of ordered pairs. A function, then, is nothing more than a dictionary which takes an object and associates it to exactly one other object, almost artificially. This viewpoint is great for thinking about ‘arbitrary’ functions- permutations are great examples. But is that really all there is to it? When asked what $f(x)$ is, your computer does not open the $f$-dictionary and look up $x$- instead it takes the input and performs a series of logical operations on it to produce the desired result. (Indeed, storing the entire $f$-dictionary is not even physically possible!)
And in intension, that is what you would expect a function to be- a recipe, a rule of computation, a ‘box’ which takes in some object and spits out another (depending only on the input). This completely different approach to the foundations of mathematics, where functions are first-class objects, led to what is called the $\lambda$-calculus. It provides a rich language where, unlike sets, which are inert unordered collections, functions are living, breathing objects that take arguments and can be composed. We then have a very powerful model of computation.
Suppose the rule ‘add one’ defines a function that takes in a number $x$ and spits out $x + 1$. This ‘rule applied to object’ approach can be written symbolically in a variety of ways-

$$f(x) = x + 1, \qquad x \mapsto x + 1, \qquad \lambda x.\, x + 1.$$
The last one is due to Alonzo Church, who (quite arbitrarily) chose $\lambda$ to denote a quantifier that specifies what object the rule should be applied to. Church’s notation has one notable advantage over the rest- the name of the function and its description are separated and made equivalent- writing $f = \lambda x.\, x + 1$ is the same as writing $f(x) = x + 1$. To see how powerful this is, contrast how the two systems would express the same fact:

$$(\lambda x.\, x + 1)\, 2 = 3 \qquad \text{versus} \qquad \text{“let } f(x) = x + 1 \text{; then } f(2) = 3\text{”}.$$
The latter system (which most mathematicians are used to) needs to introduce an arbitrary name to express even the simplest of facts, whereas Church’s notation reduces the name to what it should be- mere convenience. Multi-argument functions would then be written like $\lambda (x, y).\, x + y$, a function that takes in two numbers and returns their sum. Frege realised that by letting the output itself be a function, this input-output correspondence could be made one-to-one while keeping all the expressive power. Then by partially applying $\lambda (x, y).\, x + y$ to a single argument $x$, we get another function $\lambda y.\, x + y$. This in turn means that the original function can be written as $\lambda x.\, \lambda y.\, x + y$. Although originally due to Frege, this idea of partially applying functions was rediscovered by Haskell Curry and is called _currying_: it says functions $X \times Y \to Z$ are the same as functions $X \to (Y \to Z)$. And this should be natural, for the first corresponds to $Z^{X \times Y}$ while the latter is $(Z^Y)^X$.
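Currying is easy to play with in any language with first-class functions. Below is a minimal sketch in Python (the names `add`, `add_curried` and `add_one` are mine, purely for illustration):

```python
# Currying: the "tuple" version vs. the curried version of addition.

def add(x, y):                            # (X × Y) → Z
    return x + y

add_curried = lambda x: lambda y: x + y   # X → (Y → Z)

# Partially applying to one argument yields another function, λy. 1 + y:
add_one = add_curried(1)

print(add(1, 2))          # 3
print(add_curried(1)(2))  # 3
print(add_one(41))        # 42
```

Note that `add_curried(1)(2)` mirrors the left-associative application convention $f\, x\, y = (f\, x)\, y$.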
In principle, these ideas could lead to a foundation for all mathematics, solely in terms of functions and their application. The $\lambda$-calculus is an attempt at capturing this idea, one that eventually paved the way for modern functional programming. This formalization by Alonzo Church gave, at the very least, an elegant notation for manipulating functions.
- Still keeping the old convention, we choose to read $f\, x$ as $f(x)$.
- Left associativity is the natural convention to follow for application- $f\, x\, y$ means $(f\, x)\, y$.
Definition The set of $\lambda$-terms is defined inductively as
- Variables $x, y, z, \ldots$ are $\lambda$-terms.
- If $M, N$ are $\lambda$-terms then so is $(M\, N)$. Terms formed this way are called application terms.
- If $M$ is a $\lambda$-term and $x$ is a variable then $(\lambda x.\, M)$ is a $\lambda$-term. Terms formed this way are called abstraction terms.
This is a purely syntactic notion- we now have a set of all possible terms built out of variables, abstraction and application. I like to think of application explicitly as a binary operation, and compare the definition of $\lambda$-terms with how one would inductively define the set of terms for a given signature and quantifiers.
A Model of Computation
The question of when two functions are the same raises profound questions- if your view is limited to the extensional perspective of functions-as-dictionaries, then it is reasonable to call two functions the same if they are the same dictionary. But what about the intensional variant, functions-as-rules? On the domain of all neutral atoms, the functions ‘number of protons’ and ‘number of electrons’ have the same behaviour as dictionaries, but they are clearly different in intension. What about $x \mapsto x + x$ and $x \mapsto 2x$? You know they mean the same thing, even though they appear different as rules. Lambda-calculus treats them differently, and is hence a hyperintensional theory. Since $\lambda$-terms are finite strings of symbols, we write $M \equiv N$ ($M$ is identical to $N$) if they correspond to exactly the same string.
This is more restrictive than it should be- for $\lambda x.\, x$ and $\lambda y.\, y$ are not identical. Indeed, relabelling all occurrences of any bound variable should not change the meaning of the term. Call such a transformation (replacing all bound occurrences of $x$ in $M$ with a fresh variable $y$) an $\alpha$-reduction. Call two terms $\alpha$-equivalent if there is a sequence of $\alpha$-reductions from one to the other. In abuse of notation, I shall use $\equiv$ to denote this relation henceforth. We shall also assume that no term has a variable occurring as both free and bound (by applying enough $\alpha$-reductions to ensure this is true).
Similarly one would naturally expect $\lambda x.\, f\, x$ to be the same function as $f$, however the two are syntactically very different. Hence we introduce the notion of an $\eta$-reduction: if $M$ is a term in which $x$ does not occur free, we replace a subterm of the form $\lambda x.\, M\, x$ with $M$.
If $M$ and $N$ are $\lambda$-terms, we write $M[x := N]$ for the term obtained by replacing every free occurrence of $x$ inside $M$ with $N$. Thus if $x$ is bound in $M$ then $M[x := N] \equiv M$. The quantifier $\lambda$ and the binary operation of application work in synchrony- application indicates that the second term acts as an argument to the first, and abstraction with $\lambda$ binds that argument to some variable. This forms the most important computational step in $\lambda$-calculus, a $\beta$-reduction. In particular, the term $(\lambda x.\, M)\, N$ $\beta$-reduces to $M[x := N]$. More generally for a term $M$, call a subterm a $\beta$-redex if it has the form $(\lambda x.\, P)\, Q$. We say there is a $1$-step $\beta$-reduction from $M$ to $N$ (written $M \to_\beta N$) if $N$ is obtained by replacing a single $\beta$-redex $(\lambda x.\, P)\, Q$ in $M$ with $P[x := Q]$. Write $\twoheadrightarrow_\beta$ for the reflexive-transitive closure.
Terms without $\beta$-redexes are said to be $\beta$-normal or $\beta$-irreducible. If $M \twoheadrightarrow_\beta N$ and $N$ is $\beta$-normal, we say $N$ is a $\beta$-normal form of $M$.
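These definitions mechanise directly. Here is a minimal sketch in Python (the names `Var`, `App`, `Lam`, `step`, `normalize` are my own; substitution is deliberately naive, relying on the convention above that no variable occurs both free and bound, so no capture arises):

```python
from dataclasses import dataclass

# λ-terms, following the inductive definition: variables, applications, abstractions.
@dataclass(frozen=True)
class Var: name: str
@dataclass(frozen=True)
class App: fn: object; arg: object
@dataclass(frozen=True)
class Lam: var: str; body: object

def subst(m, x, n):
    """M[x := N] — naive substitution (safe under the no-capture convention)."""
    if isinstance(m, Var): return n if m.name == x else m
    if isinstance(m, App): return App(subst(m.fn, x, n), subst(m.arg, x, n))
    return m if m.var == x else Lam(m.var, subst(m.body, x, n))

def step(m):
    """Perform one leftmost β-step; return None if m is β-normal."""
    if isinstance(m, App):
        if isinstance(m.fn, Lam):                 # a β-redex (λx.P) Q
            return subst(m.fn.body, m.fn.var, m.arg)
        s = step(m.fn)
        if s is not None: return App(s, m.arg)
        s = step(m.arg)
        if s is not None: return App(m.fn, s)
    elif isinstance(m, Lam):
        s = step(m.body)
        if s is not None: return Lam(m.var, s)
    return None

def normalize(m, fuel=1000):
    """Walk down the reduction graph until a β-normal form (if reached in time)."""
    while fuel > 0:
        n = step(m)
        if n is None: return m
        m, fuel = n, fuel - 1
    raise RuntimeError("no normal form found within fuel limit")

I = Lam("x", Var("x"))
print(normalize(App(I, Var("y"))))   # Var(name='y'), since (λx. x) y →β y
```

The `fuel` parameter is a concession to the fact, discussed later, that not every term has a normal form.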
These relations (and their equivalence closures) form an abstract rewriting system, and a very powerful model of computation. A $\lambda$-computation involves simply walking down the digraph so created, and any relevant problem can be encoded in the terms in such a way that walking down the graph means successively simplifying the problem and working towards a solution. Let us encode a very simple system as an example:
Since Boolean algebra revolves around two truth-values, one possible way to encode them is in the form of a choice:

$$\mathbf{T} \equiv \lambda x.\, \lambda y.\, x \qquad \mathbf{F} \equiv \lambda x.\, \lambda y.\, y$$
This has the advantage that one can encode “if $b$ then $P$ else $Q$” simply as the term $b\, P\, Q$: if $b$ is $\mathbf{T}$ then the term automatically $\beta$-reduces to $P$. The following are possible encodings of the logic gates- I encourage you to verify they work as intended.

$$\mathrm{NOT} \equiv \lambda b.\, b\, \mathbf{F}\, \mathbf{T} \qquad \mathrm{AND} \equiv \lambda a.\, \lambda b.\, a\, b\, \mathbf{F} \qquad \mathrm{OR} \equiv \lambda a.\, \lambda b.\, a\, \mathbf{T}\, b$$
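One way to do that verification is to let Python's lambdas stand in for Church's. This sketch uses the standard Church encodings of the booleans and gates (the decoder `to_py` is my own helper):

```python
# Church booleans: a boolean is a two-way choice.
TRUE  = lambda x: lambda y: x      # λx. λy. x
FALSE = lambda x: lambda y: y      # λx. λy. y

# "if b then p else q" is just application: b(p)(q).
NOT = lambda b: b(FALSE)(TRUE)
AND = lambda a: lambda b: a(b)(FALSE)
OR  = lambda a: lambda b: a(TRUE)(b)

def to_py(b):
    """Decode a Church boolean by letting it choose between Python's booleans."""
    return b(True)(False)

print(to_py(NOT(TRUE)))              # False
print(to_py(AND(TRUE)(NOT(FALSE))))  # True
print(to_py(OR(FALSE)(FALSE)))       # False
```

Python's eager evaluation happens to perform exactly the β-reductions we would do by hand here.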
Exercise: Encode a few more gates of your choice- say $\mathrm{XOR}$, $\mathrm{NAND}$, $\mathrm{NOR}$ and $\mathrm{XNOR}$.
A natural question that arises is the meaningfulness of this computation- it is not a very useful system if it cannot distinguish between $\mathbf{T}$ and $\mathbf{F}$. We need some notion of consistency. Of course the $\lambda$-calculus is not a logical theory, so it does not make sense to ask if it proves false statements. We can however generalise the principle of explosion and call the theory inconsistent if it equates all pairs of terms.
Since each reduction corresponds to a logical simplification, we see that two $\lambda$-terms should be called equivalent if they lie in the same connected component of the directed graph. Church and Rosser proved the remarkable result that each connected component of the graph has at most one irreducible term, to which every point in the component can be reduced. It immediately follows that distinct $\beta$-normal forms like $\mathbf{T}$ and $\mathbf{F}$ cannot be equated, and the model of computation is consistent. The result is called the Church-Rosser Theorem.
The Church-Rosser Theorem
An abstract rewriting system (such as the $\lambda$-calculus) is a relation $\to$ on a set of points, thereby forming a directed graph. We say $b$ is a successor of $a$ (written $a \twoheadrightarrow b$) if there is some finite sequence of reductions from $a$ to $b$ (including the empty sequence, so that $a \twoheadrightarrow a$). Points $a$ and $b$ are connected (written $a \leftrightarrow b$) if they are connected on the undirected graph (formed by forgetting the directionality of $\to$). Points $a$ and $b$ are joinable (written $a \downarrow b$) if they have a common successor.
We say $\to$ has the Church-Rosser property if any two connected points are joinable. In symbols, ${\leftrightarrow} \subseteq {\downarrow}$, i.e. $a \leftrightarrow b \implies a \downarrow b$. Our task is then to show precisely that $\beta$-reduction has the Church-Rosser property.
We can show that the Church-Rosser property is equivalent to a simpler property, confluence. A reduction is confluent if for every $a, b, c$ such that $a \twoheadrightarrow b$ and $a \twoheadrightarrow c$, there is a $d$ such that $b \twoheadrightarrow d$ and $c \twoheadrightarrow d$. In other words, all diversions eventually meet in a confluent system. This should be a natural property to desire in your rewrite system- think about how you simplify $(1 + 2) \times (3 + 4)$: there are two distinct redexes $1 + 2$ and $3 + 4$, but the choice of which way you go does not influence the final answer.
Lemma The Church-Rosser property is equivalent to confluence, i.e. a reduction is confluent iff it has the Church-Rosser property.
Proof The following figure sums it up.
Theorem (Church-Rosser) $\beta$-reduction is confluent.
In what follows, we reconstruct a proof due to Komori, Matsuda and Yamakawa, who improved upon Takahashi’s elegant proof using the idea of parallel reductions. Since a diversion is created only when a term has multiple redexes, we employ the key idea of a Takahashi translation, which involves reducing all $\beta$-redexes in a term simultaneously.
Definition The Takahashi translation $M^*$ of a term $M$ is defined inductively on the set of terms as:

1. $x^* \equiv x$ for a variable $x$,
2. $(\lambda x.\, M)^* \equiv \lambda x.\, M^*$,
3. $((\lambda x.\, M)\, N)^* \equiv M^*[x := N^*]$,
4. $(M\, N)^* \equiv M^*\, N^*$.

(Rules with lower number are prioritized.) We write $M^{*n}$ for $n$ Takahashi translations successively applied to $M$. This interacts well with $\beta$-reductions:

Lemma $M \twoheadrightarrow_\beta M^*$ for every term $M$.
Proof If $M$ is a variable, then there is a $0$-step reduction from $M$ to $M^* \equiv M$. Suppose the result is true for all terms of lower complexity than $M$; then we have
- $M \equiv \lambda x.\, N$: then get $\lambda x.\, N \twoheadrightarrow \lambda x.\, N^*$ by taking a proof of $N \twoheadrightarrow N^*$ and prepending $\lambda x.$ to each step.
- $M \equiv P\, Q$ for $P$ not an abstraction term: then $M \equiv P\, Q \twoheadrightarrow P^*\, Q^* \equiv M^*$.
- $M \equiv (\lambda x.\, P)\, Q$: then $M \twoheadrightarrow (\lambda x.\, P^*)\, Q^* \to_\beta P^*[x := Q^*] \equiv M^*$.
Corollary If $M \to_\beta N$ then $M^* \twoheadrightarrow_\beta N^*$.
Proof We consider the forms $M$ and $N$ could have, and assume the statement is true for terms of lower complexity.
- $M \equiv \lambda x.\, P$, in which case $N \equiv \lambda x.\, Q$ for some $P \to_\beta Q$. Then the induction hypothesis says $P^* \twoheadrightarrow Q^*$, and prepending $\lambda x.$ to each term gives a proof of $M^* \twoheadrightarrow N^*$.
- $M \equiv P\, Q$ for $P$ not an abstraction term. Then we can have
    - $N \equiv P'\, Q$ where $P \to_\beta P'$, $P'$ not an abstraction term: take the reduction $P^* \twoheadrightarrow P'^*$ and append $Q^*$ at each stage.
    - $N \equiv P'\, Q$ where $P \to_\beta P'$ and $P' \equiv \lambda x.\, R$ is an abstraction term: then $M^* \equiv P^*\, Q^* \twoheadrightarrow (\lambda x.\, R^*)\, Q^* \to_\beta R^*[x := Q^*] \equiv N^*$.
    - $N \equiv P\, Q'$ where $Q \to_\beta Q'$: take the reduction $Q^* \twoheadrightarrow Q'^*$ and prepend $P^*$ at each stage.
- $M \equiv (\lambda x.\, P)\, Q$, so we can have
    - $N \equiv (\lambda x.\, P')\, Q$ for $P \to_\beta P'$: take a proof of $P^* \twoheadrightarrow P'^*$ and replace each $x$ with $Q^*$ to get a proof of $M^* \equiv P^*[x := Q^*] \twoheadrightarrow P'^*[x := Q^*] \equiv N^*$.
    - $N \equiv (\lambda x.\, P)\, Q'$ for $Q \to_\beta Q'$: there are finitely many occurrences of $x$ in $P^*$, so finitely many occurrences of $Q^*$ in $P^*[x := Q^*]$. Each occurrence can be reduced to $Q'^*$ in finitely many steps, so we have $M^* \equiv P^*[x := Q^*] \twoheadrightarrow P^*[x := Q'^*] \equiv N^*$.
    - $N \equiv P[x := Q]$: if $Q$ is not an abstraction term, then it can be observed that $M^* \equiv P^*[x := Q^*] \twoheadrightarrow (P[x := Q])^* \equiv N^*$. If $Q \equiv \lambda y.\, R$, then $P[x := Q]$ can have finitely many new redexes arising from occurrences like $(\lambda y.\, R)\, S$, each of which can be reduced further to get to $N^*$. (This can, and should, be formally proven by induction on the structure of $P$.)
These results should be intuitive since Takahashi translations simply specify an order for $\beta$-reductions, by dictating that you do not touch a newly created redex until you have exhausted all redexes at the base level. The same intuition also then says that it should be possible to ‘extend’ every single-step $\beta$-reduction to an entire Takahashi translation at that level: if $M \to_\beta N$, then $N \twoheadrightarrow M^*$.
This can be generalised to the case $M \twoheadrightarrow N$, but we have to keep in mind that all the reductions need not be on the same level if we create new redexes at each stage. Keeping that in mind, we have:

Lemma If $M \twoheadrightarrow N$ via $n$ single-step $\beta$-reductions, then $N \twoheadrightarrow M^{*n}$.
Proof By induction on $n$: clearly true for $n = 0$. Assume true for $n$; then a reduction $M \twoheadrightarrow N$ in $n + 1$ steps decomposes as $M \twoheadrightarrow N' \to_\beta N$ for some $N'$. Using the induction hypothesis, $N' \twoheadrightarrow M^{*n}$, and applying the Corollary to each step, $N'^* \twoheadrightarrow M^{*(n+1)}$. Moreover, $N \twoheadrightarrow N'^*$. Then $N \twoheadrightarrow N'^* \twoheadrightarrow M^{*(n+1)}$.
Theorem (Church-Rosser) If $M \twoheadrightarrow N_1$ and $M \twoheadrightarrow N_2$ then there is an $L$ such that $N_1 \twoheadrightarrow L$ and $N_2 \twoheadrightarrow L$.
Proof Suppose $M \twoheadrightarrow N_1$ in $m$ steps and $M \twoheadrightarrow N_2$ in $n$ steps, and wlog $m \leq n$. Then $N_1 \twoheadrightarrow M^{*m} \twoheadrightarrow M^{*n}$, and $N_2 \twoheadrightarrow M^{*n}$, so $L \equiv M^{*n}$ works.
We have thus proved one of the most important results about the $\lambda$-calculus, that the computations it does are consistent. It is immediate that each term has at most one normal form, and thus the job of the computation is to get there. For instance, we have a very elegant computer that can simplify Peano-arithmetic expressions.
Encoding Peano arithmetic
We look at one possible encoding of $0$ and the successor function $S$, the Church encoding, from which the rest of Peano arithmetic follows naturally.

$$0 \equiv \lambda f.\, \lambda x.\, x \qquad S \equiv \lambda n.\, \lambda f.\, \lambda x.\, f\, (n\, f\, x)$$
Essentially, the ‘natural number’ $n$ takes a function $f$ and a value $x$ as inputs, and applies the function $n$ times to the value: $n\, f\, x \equiv f^n\, x$. From this I can immediately write down addition and multiplication,

$$\mathrm{add} \equiv \lambda n.\, \lambda m.\, n\, S\, m \qquad \mathrm{mult} \equiv \lambda n.\, \lambda m.\, n\, (\mathrm{add}\, m)\, 0$$
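As with the booleans, Python's lambdas let us test-drive the standard Church numerals (the names and the decoder `to_int` are mine; `ADD` applies `SUCC` to $m$, $n$ times, and `MULT` adds $m$ to $0$, $n$ times):

```python
# Church numerals: n is "apply f, n times".
ZERO = lambda f: lambda x: x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))

ADD  = lambda n: lambda m: n(SUCC)(m)       # n applications of SUCC to m
MULT = lambda n: lambda m: n(ADD(m))(ZERO)  # n applications of ADD(m) to 0

def to_int(n):
    """Decode a Church numeral by counting applications of +1 to 0."""
    return n(lambda k: k + 1)(0)

TWO   = SUCC(SUCC(ZERO))
THREE = SUCC(TWO)
print(to_int(ADD(TWO)(THREE)))   # 5
print(to_int(MULT(TWO)(THREE)))  # 6
```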
Exercise: Encode the above directly, without referring to previously defined operations. You should discover the rather curious and simple form

$$\mathrm{mult} \equiv \lambda n.\, \lambda m.\, \lambda f.\, n\, (m\, f)$$
Difficult: Encode the predecessor function $\mathrm{pred}$, and hence subtraction.
A function $f : \mathbb{N}^k \to \mathbb{N}$ is said to be $\lambda$-computable if it has an encoding $F$ such that one can $\beta$-reduce $F\, n_1 \cdots n_k$ to $f(n_1, \ldots, n_k)$ (as Church numerals). Turns out the set of all $\lambda$-computable functions is exactly the set of all Turing-computable functions, so the $\lambda$-calculus is exactly as strong as a Turing machine; the Church-Turing thesis is the claim that these equivalent models capture precisely the intuitively computable functions.
Loop = Loop
The simplicity of the $\lambda$-calculus is what makes it expressive, but also non-terminating. Not everything necessarily has a normal form- for instance the term $\Omega \equiv (\lambda x.\, x\, x)\, (\lambda x.\, x\, x)$ is a fixed point of $\beta$-reduction (such programs that evaluate to themselves are termed quines). More generally, you have the powerful fixpoint combinator $Y \equiv \lambda f.\, (\lambda x.\, f\, (x\, x))\, (\lambda x.\, f\, (x\, x))$, which takes a function and returns a fixed point for it (check that $Y\, f$ and $f\, (Y\, f)$ are $\beta$-equivalent!). These recursive terms however lead to problems- especially if you are Haskell Curry, trying to investigate the foundations of mathematics. If the $\lambda$-calculus (or its sister theory, Combinatory logic) provided for good foundations, we would have a $\lambda$-term $\Rightarrow$ that acted like logical implication- the application $\Rightarrow A\, B$ would have to correspond to the proposition $A \Rightarrow B$, and $A \Rightarrow A$ would have to be true for all $A$. The theorem $\big(A \Rightarrow (A \Rightarrow B)\big) \Rightarrow (A \Rightarrow B)$ of logic would mean that whenever $A \Rightarrow (A \Rightarrow B)$ is true, $A \Rightarrow B$ would also be true.
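The fixpoint combinator really does conjure recursion out of nothing, and we can see it run. One caveat for this sketch: Python evaluates arguments eagerly, so the $Y$ combinator as written above loops forever; the standard call-by-value variant (often called the $Z$ combinator), which $\eta$-expands the self-application to delay it, works instead. The `fact` example is mine:

```python
# Call-by-value fixpoint combinator Z = λf.(λx. f(λv. x x v))(λx. f(λv. x x v)).
# (Church's Y = λf.(λx. f(x x))(λx. f(x x)) would recurse forever under
# Python's eager evaluation.)
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# A recursive function defined with no self-reference: the body receives
# "itself" as the argument `self`.
fact = Z(lambda self: lambda n: 1 if n == 0 else n * self(n - 1))

print(fact(5))  # 120
```

The delayed call `lambda v: x(x)(v)` is exactly what stands between a working fixed point and an infinite loop.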
Then for any $B$, we can construct the $\lambda$-term $\lambda x.\, (x \Rightarrow B)$, which checks if its argument implies $B$. The $Y$-combinator gives a fixed point for this, call it $C$, so that $C = C \Rightarrow B$. Now starting from the axiom $C \Rightarrow C$, we have the following sequence of simplifications:

$$C \Rightarrow C \;=\; C \Rightarrow (C \Rightarrow B) \;\vdash\; C \Rightarrow B \;=\; C$$
But then from the same axiom we have derived both $C$ and $C \Rightarrow B$, and hence $B$, which was arbitrary to begin with. The system could then prove every statement and hence be utterly useless, much to Mr. Curry’s disappointment. This is an instance of Curry’s Paradox, and in words it reads “If this sentence is true, then $B$”. At the heart of the paradox is self-reference, which lives in $Y$- aptly called Curry’s paradoxical combinator.
Such ill-foundation spells disaster, and a considerable part of the development of foundational theories is spent getting rid of self-reference. ZFC did it by the axiom of foundation, while those who preferred a functional world came up with a system of Types. But this should be the subject matter of another post.