Why Lambda Calculus?
Functions in Intension
In the modern view of mathematical foundations, sets are first-class objects. Everything you can talk about can be encoded as a set: numbers are sets, ordered pairs are sets, and functions are sets of ordered pairs. A function, then, is nothing more than a dictionary which takes an object and, almost artificially, associates it to exactly one other object. This viewpoint is great for thinking about ‘arbitrary’ functions; permutations are great examples. But is that really all there is to it? When asked what $f(x)$ is, your computer does not open a dictionary and look up $x$; instead it takes the input and performs a series of logical operations on it to convert it to the desired result. (Indeed, storing the entire dictionary is not even physically possible!)
And in intension, that is what you would expect a function to be: a recipe, a rule of computation, a ‘box’ which takes in some object and spits out another (depending only on the input). This completely different approach to the foundations of mathematics, where functions are first-class objects, led to what is called the Lambda calculus. It provides a rich language where, unlike sets, which are mere unordered collections, functions are living, breathing objects that take arguments and can be composed. We then have a very powerful model of computation.
The Language
Suppose $f$ defines a function that takes in $x$ and spits out $x + 1$. This ‘rule applied to object’ approach can be written symbolically in a variety of ways: $f : x \mapsto x + 1$, $f(x) = x + 1$, or $f = \lambda x.\, x + 1$.
The last one is due to Alonzo Church, who (quite arbitrarily) chose $\lambda$ to denote a quantifier that specifies what object the rule should be applied to. Church’s notation has one notable advantage over the rest: the name of the function and its description are separated and made equivalent; writing $f = \lambda x.\, x + 1$ is the same as writing $f(x) = x + 1$. To see how powerful this is, contrast how the two systems would express the same fact: $(\lambda x.\, x + 1)(2) = 3$, versus ‘let $f$ be the function defined by $f(x) = x + 1$; then $f(2) = 3$’.
The latter system (which most mathematicians are used to) needs to introduce an arbitrary name $f$ to express even the simplest of facts, whereas Church’s notation reduces the name to what it should be: mere convenience. Multi-argument functions would then be written like $\lambda (x, y).\, x + y$, a function that takes in two numbers and returns their sum. Frege realised that by letting the output itself be a function, this input-output correspondence could be made one-to-one while keeping all the expressive power. Then by partially applying $\lambda (x, y).\, x + y$ to a single argument $x$, we get another function $\lambda y.\, x + y$. This in turn means that the original function can be written as $\lambda x.\, \lambda y.\, x + y$. Although originally due to Frege, this idea of partially applying functions was rediscovered by Haskell Curry and is called _currying_: it says functions $X \times Y \to Z$ are the same as functions $X \to (Y \to Z)$. And this should be natural, for the first corresponds to $Z^{X \times Y}$ while the latter is in $(Z^Y)^X$.
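Currying is easy to play with in any language with first-class functions. A minimal Python sketch (the names `add2` and `add_curried` are illustrative, not from the text):

```python
# A two-argument function f : X x Y -> Z ...
def add2(x, y):
    return x + y

# ... and its curried form f : X -> (Y -> Z).
def add_curried(x):
    return lambda y: x + y

# Partial application yields a new one-argument function y |-> 3 + y.
add3 = add_curried(3)
assert add3(4) == 7
assert add_curried(3)(4) == add2(3, 4)
```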
In principle, these ideas could lead to a foundation for all of mathematics, solely in terms of functions and their application. The $\lambda$-calculus is an attempt at capturing this idea, one that eventually paved the way for modern functional programming. This formalization by Alonzo Church gave, at the very least, an elegant notation for manipulating functions.
Conventions
- Still keeping the old convention, we choose to read $M\, N$ as $M(N)$, the application of $M$ to $N$.
- Left associativity is the natural convention to follow for application: $M\, N\, P$ denotes $(M\, N)\, P$.
Definition The set of terms is defined inductively as:
- Variables are terms.
- If $M, N$ are terms then so is $(M\, N)$. Terms formed this way are called application terms.
- If $M$ is a term and $x$ is a variable then $\lambda x.\, M$ is a term. Terms formed this way are called abstraction terms.
This is a purely syntactic notion: we now have a set of all possible terms made out of variables, abstraction and application. I like to think of application explicitly as a binary relation, and compare the definition of $\lambda$-terms with how one would inductively define the set of terms for a given signature and quantifiers.
A Model of Computation
The question of when two functions are the same raises profound questions. If your view is limited to the extensional perspective of functions-as-dictionaries, then it is reasonable to call two functions the same if they are the same dictionary. But what about the intensional variant, functions-as-rules? On the domain of all neutral atoms, the functions ‘number of protons’ and ‘number of electrons’ have the same behaviour as dictionaries, but they are clearly different in intension. What about $x \mapsto x + x$ and $x \mapsto 2x$? You know they mean the same thing, even though they appear different as rules. Lambda calculus treats them differently, and is hence a hyperintensional theory. Since terms are finite strings of symbols, we write $M \equiv N$ ($M$ is identical to $N$) if they correspond to exactly the same string.
This is more restrictive than it should be, for $\lambda x.\, x$ and $\lambda y.\, y$ are not identical. Indeed, relabelling all occurrences of any bound variable should not change the meaning of the term. Call such a transformation (replacing all bound occurrences of $x$ in $M$ with a fresh variable $y$) an $\alpha$-reduction. Call two terms $\alpha$-equivalent if there is a sequence of $\alpha$-reductions from one to the other. In abuse of notation, I shall use $\equiv$ to denote this relation henceforth. We shall also assume that no term has a variable occurring as both free and bound (by applying enough $\alpha$-reductions to ensure this is true).
Similarly one would naturally expect $\lambda x.\, f\, x$ to be the same function as $f$; however, they are syntactically very different. Hence we introduce the notion of an $\eta$-reduction: if $M$ is a term, we replace some subterm of the form $\lambda x.\, F\, x$ (where $x$ does not occur free in $F$) with $F$.
If $M$ and $N$ are terms, we write $M[x := N]$ for the term obtained by replacing every free occurrence of $x$ inside $M$ with $N$. Thus if $x$ is bound in $M$ then $M[x := N] \equiv M$. The quantifier $\lambda$ and the binary relation of application work in synchrony: application indicates that the second term acts as an argument to the first, and abstraction with $\lambda$ binds that argument to some variable. This forms the most important computational step in $\lambda$-calculus, a $\beta$-reduction. In particular, the term $(\lambda x.\, M)\, N$ reduces to $M[x := N]$. More generally for a term $P$, call a subterm a redex if it has the form $(\lambda x.\, M)\, N$. We say there is a one-step $\beta$-reduction from $P$ to $Q$ (written $P \to_\beta Q$) if $Q$ is obtained by replacing a single redex $(\lambda x.\, M)\, N$ in $P$ with $M[x := N]$. Write $\twoheadrightarrow_\beta$ for the reflexive-transitive closure.
Terms without redexes are said to be normal or irreducible. If $M \twoheadrightarrow_\beta N$ and $N$ is normal, we say $N$ is a normal form of $M$.
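To make the rewriting concrete, here is a minimal Python sketch of one-step $\beta$-reduction and normalization, encoding terms as nested tuples `('var', x)`, `('lam', x, body)`, `('app', f, a)`. It sidesteps variable capture by assuming all bound names are distinct, and contracts the leftmost-outermost redex first.

```python
def subst(m, x, n):
    """M[x := N]: replace free occurrences of x in m by n (no capture handling)."""
    tag = m[0]
    if tag == 'var':
        return n if m[1] == x else m
    if tag == 'lam':
        # x is bound here, so M[x := N] is M itself
        return m if m[1] == x else ('lam', m[1], subst(m[2], x, n))
    return ('app', subst(m[1], x, n), subst(m[2], x, n))

def step(m):
    """Contract the leftmost-outermost redex; return None if m is normal."""
    if m[0] == 'app' and m[1][0] == 'lam':        # (λx. B) N  →  B[x := N]
        return subst(m[1][2], m[1][1], m[2])
    if m[0] == 'app':
        s = step(m[1])
        if s is not None:
            return ('app', s, m[2])
        s = step(m[2])
        return None if s is None else ('app', m[1], s)
    if m[0] == 'lam':
        s = step(m[2])
        return None if s is None else ('lam', m[1], s)
    return None                                   # a variable is normal

def normalize(m, fuel=1000):
    """Follow one-step reductions until a normal form (may not exist!)."""
    while fuel:
        s = step(m)
        if s is None:
            return m
        m, fuel = s, fuel - 1
    raise RuntimeError("no normal form found within fuel limit")

# (λx. x) y  reduces to  y
assert normalize(('app', ('lam', 'x', ('var', 'x')), ('var', 'y'))) == ('var', 'y')
```

The `fuel` parameter is a practical concession: as discussed later, some terms have no normal form, so a mechanical normalizer must be allowed to give up.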
These relations (and their equivalence closures) form an abstract rewriting system, and a very powerful model of computation. A computation involves simply walking down the so-created digraph, and any relevant problem could be appropriately encoded in the terms such that walking down the graph would mean successively simplifying the problem and working towards a solution. Let us encode a very simple system as an example:
Boolean algebra
Since Boolean algebra revolves around two truth-values, one possible way to encode them is in the form of a choice: $\mathsf{TRUE} \equiv \lambda x.\, \lambda y.\, x$ and $\mathsf{FALSE} \equiv \lambda x.\, \lambda y.\, y$.
This has the advantage that one can encode “if $b$ then $P$ else $Q$” simply as the term $b\, P\, Q$: if $b$ is $\mathsf{TRUE}$ then the term automatically reduces to $P$. The following are possible encodings of the logic gates: $\mathsf{AND} \equiv \lambda p.\, \lambda q.\, p\, q\, p$, $\mathsf{OR} \equiv \lambda p.\, \lambda q.\, p\, p\, q$, and $\mathsf{NOT} \equiv \lambda p.\, p\, \mathsf{FALSE}\, \mathsf{TRUE}$. I encourage you to verify they work as intended.
Exercise: Encode a few more gates, for instance $\mathsf{XOR}$, $\mathsf{NAND}$, $\mathsf{NOR}$ and $\mathsf{IMPLIES}$.
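The choice-based encoding can be tried out directly with Python lambdas. The gates below are one standard choice (a sketch, not necessarily the article's own encodings):

```python
# A Church boolean selects one of its two arguments.
TRUE  = lambda x: lambda y: x
FALSE = lambda x: lambda y: y

# "if b then p else q" is just the application  b p q
def if_then_else(b, p, q):
    return b(p)(q)

AND = lambda p: lambda q: p(q)(p)   # if p then q else p
OR  = lambda p: lambda q: p(p)(q)   # if p then p else q
NOT = lambda p: p(FALSE)(TRUE)      # if p then FALSE else TRUE

assert if_then_else(AND(TRUE)(FALSE), "yes", "no") == "no"
assert if_then_else(OR(TRUE)(FALSE), "yes", "no") == "yes"
assert NOT(FALSE) is TRUE
```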
Consistency
A natural question that arises is the meaningfulness of this computation: it is not a very useful system if it cannot distinguish between $\mathsf{TRUE}$ and $\mathsf{FALSE}$. We need some notion of consistency. Of course the $\lambda$-calculus is not a logical theory, so it does not make sense to ask if it proves false statements. We can, however, generalise the principle of explosion and call the theory inconsistent if it equates all pairs of terms.
Since each reduction corresponds to a logical simplification, we call two terms equivalent if they lie in the same connected component of the directed graph. Church and Rosser proved the remarkable result that each connected component of the graph has at most one irreducible term, to which every point in the component can be reduced; in other words, there is a natural injection from components containing a normal form into the set of normal terms. It immediately follows that distinct normal forms like $\mathsf{TRUE}$ and $\mathsf{FALSE}$ cannot be equated, and the model of computation is consistent. The result is called the Church-Rosser Theorem.
The Church-Rosser Theorem
An abstract rewriting system (such as $\beta$-reduction on $\lambda$-terms) is a relation $\to$ on a set of points, thereby forming a directed graph. We say $b$ is a successor of $a$ (written $a \twoheadrightarrow b$) if there is some finite sequence of reductions from $a$ to $b$ (including the empty sequence). Points $a$ and $b$ are connected (written $a \leftrightarrow^* b$) if they are connected in the undirected graph (formed by forgetting the directionality of $\to$). Points $a$ and $b$ are joinable (written $a \downarrow b$) if they have a common successor.
We say $\to$ has the Church-Rosser property if any two connected points are joinable. In symbols, $\forall a\, \forall b\, (a \leftrightarrow^* b \implies a \downarrow b)$. Our task, then, is precisely to show that $\beta$-reduction has the Church-Rosser property.
We can show that the Church-Rosser property is equivalent to a simpler property, confluence. A reduction is confluent if for every $a, b, c$ such that $a \twoheadrightarrow b$ and $a \twoheadrightarrow c$, there is a $d$ such that $b \twoheadrightarrow d$ and $c \twoheadrightarrow d$. In other words, all diversions eventually meet in a confluent system. This should be a natural property to desire in your rewrite system: think about how you simplify $(1 + 2) \cdot (3 + 4)$: there are two distinct redexes $1 + 2$ and $3 + 4$, but the choice of which way you go does not influence the final answer.
Lemma The Church-Rosser property is equivalent to confluence, i.e. a reduction is confluent iff it has the Church-Rosser property.
Proof The following figure sums it up.
Theorem (Church-Rosser) $\beta$-reduction is confluent.
In what follows, we reconstruct a proof due to Komori, Matsuda and Yamakawa, who improved upon Takahashi’s elegant proof using the idea of parallel reductions. Since a diversion is created only when a term has multiple redexes, we employ the key idea of a Takahashi translation, which involves reducing all redexes in a term simultaneously.
Definition The Takahashi translation $M^*$ of a term $M$ is defined inductively on the set of terms as:
1. $x^* \equiv x$ for any variable $x$,
2. $((\lambda x.\, M)\, N)^* \equiv M^*[x := N^*]$,
3. $(M\, N)^* \equiv M^*\, N^*$,
4. $(\lambda x.\, M)^* \equiv \lambda x.\, M^*$.
(Rules with a lower number are prioritized.) We write $M^{*n}$ for $n$ Takahashi translations successively applied to $M$. This interacts well with $\beta$-reductions:
Lemma $M \twoheadrightarrow M^*$.
Proof If $M$ is a variable, then there is a zero-step reduction from $M$ to $M^* \equiv M$. Suppose the result is true for all terms of lower complexity than $M$; then we have the following cases:

- $M \equiv \lambda x.\, N$: then get $M \twoheadrightarrow M^*$ by taking a proof of $N \twoheadrightarrow N^*$ and prepending $\lambda x.$ to each step.
- $M \equiv N\, P$ for $N$ not an abstraction term: then $M \equiv N\, P \twoheadrightarrow N^*\, P \twoheadrightarrow N^*\, P^* \equiv M^*$.
- $M \equiv (\lambda x.\, N)\, P$: then $M \twoheadrightarrow (\lambda x.\, N^*)\, P^* \to_\beta N^*[x := P^*] \equiv M^*$.
Corollary If $n \le m$ then $M^{*n} \twoheadrightarrow M^{*m}$.
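The four prioritized rules of the Takahashi translation transcribe directly into code. A sketch on tuple-encoded terms `('var', x)`, `('lam', x, body)`, `('app', f, a)`, again assuming bound names are distinct so capture can be ignored:

```python
def subst(m, x, n):
    """M[x := N] on tuple-encoded terms (no capture handling)."""
    tag = m[0]
    if tag == 'var':
        return n if m[1] == x else m
    if tag == 'lam':
        return m if m[1] == x else ('lam', m[1], subst(m[2], x, n))
    return ('app', subst(m[1], x, n), subst(m[2], x, n))

def takahashi(m):
    """M*: contract every redex present in m, innermost results substituted in."""
    if m[0] == 'var':                       # rule 1: x* = x
        return m
    if m[0] == 'app' and m[1][0] == 'lam':  # rule 2: ((λx.P) Q)* = P*[x := Q*]
        return subst(takahashi(m[1][2]), m[1][1], takahashi(m[2]))
    if m[0] == 'app':                       # rule 3: (P Q)* = P* Q*
        return ('app', takahashi(m[1]), takahashi(m[2]))
    return ('lam', m[1], takahashi(m[2]))   # rule 4: (λx.P)* = λx.P*

# Both redexes of  (λx.x) ((λy.y) z)  are contracted in a single pass:
I = ('lam', 'x', ('var', 'x'))
term = ('app', I, ('app', ('lam', 'y', ('var', 'y')), ('var', 'z')))
assert takahashi(term) == ('var', 'z')
```

Note how the rule ordering shows up as the order of the `if` branches: a term that matches rule 2 must not fall through to rule 3.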
Lemma If $M \to_\beta N$ then $M^* \twoheadrightarrow N^*$.
Proof We consider the forms $M$ and $N$ could have, and assume the statement is true for terms of lower complexity.

- $M \equiv \lambda x.\, P$, in which case $N \equiv \lambda x.\, Q$ for $P \to_\beta Q$. Then the induction hypothesis says $P^* \twoheadrightarrow Q^*$, and prepending $\lambda x.$ to each term gives a proof of $M^* \twoheadrightarrow N^*$.
- $M \equiv P\, Q$ for $P$ not an abstraction term. Then we can have:
  - $N \equiv P'\, Q$ where $P \to_\beta P'$, $P'$ not an abstraction term: take the reduction $P^* \twoheadrightarrow P'^*$ and append $Q^*$ at each stage.
  - $N \equiv P'\, Q$ where $P \to_\beta P'$ and $P' \equiv \lambda x.\, R$: then $M^* \equiv P^*\, Q^* \twoheadrightarrow (\lambda x.\, R^*)\, Q^* \to_\beta R^*[x := Q^*] \equiv N^*$.
  - $N \equiv P\, Q'$ where $Q \to_\beta Q'$: take the reduction $Q^* \twoheadrightarrow Q'^*$ and prepend $P^*$ at each stage.
- $M \equiv (\lambda x.\, P)\, Q$, so we can have:
  - $N \equiv (\lambda x.\, P')\, Q$ for $P \to_\beta P'$: take a proof of $P^* \twoheadrightarrow P'^*$ and replace each $x$ with $Q^*$ to get a proof of $M^* \equiv P^*[x := Q^*] \twoheadrightarrow P'^*[x := Q^*] \equiv N^*$.
  - $N \equiv (\lambda x.\, P)\, Q'$ for $Q \to_\beta Q'$: there are finitely many occurrences of $x$ in $P^*$, so finitely many occurrences of $Q^*$ in $P^*[x := Q^*]$. Each occurrence can be reduced to $Q'^*$ in finitely many steps, so we have $M^* \equiv P^*[x := Q^*] \twoheadrightarrow P^*[x := Q'^*] \equiv N^*$.
  - $N \equiv P[x := Q]$: if $Q$ is not an abstraction term, then it can be observed that $P^*[x := Q^*] \equiv (P[x := Q])^*$. If $Q \equiv \lambda y.\, R$, then $P^*[x := Q^*]$ can have finitely many new redexes arising from occurrences like $Q^*\, S$, each of which can be reduced further to get to $(P[x := Q])^* \equiv N^*$. (This can, and should, be formally proven by induction on the structure of $P$.)
Corollary If $M \twoheadrightarrow N$ then $M^{*n} \twoheadrightarrow N^{*n}$ for every $n$.
These results should be intuitive, since Takahashi translations simply specify an order for reductions, by dictating that you do not touch a newly created redex until you have exhausted all redexes at the base level. The same intuition then says that it should be possible to ‘extend’ every single-step reduction to an entire Takahashi translation at that level:
Lemma If $M \to_\beta N$ then $N \twoheadrightarrow M^*$.
This can be generalised to the case $M \twoheadrightarrow N$, but we have to keep in mind that all the reductions need not be on the same level if we create new redexes at each stage. Keeping that in mind, we have:
Lemma If $M \twoheadrightarrow N$ in $n$ steps, then $N \twoheadrightarrow M^{*n}$.
Proof By induction on $n$: clearly true for $n = 0$. Assume it is true for $n$; then any reduction $M \twoheadrightarrow N$ in $n + 1$ steps splits as $M \to_\beta N' \twoheadrightarrow N$ (in $n$ steps) for some $N'$. Using the induction hypothesis, $N \twoheadrightarrow N'^{*n}$. Moreover, $N' \twoheadrightarrow M^*$ by the previous lemma, so $N'^{*n} \twoheadrightarrow M^{*(n+1)}$ by the corollary. Then $N \twoheadrightarrow N'^{*n} \twoheadrightarrow M^{*(n+1)}$.
Theorem (Church-Rosser) If $M \twoheadrightarrow N_1$ and $M \twoheadrightarrow N_2$ then there is an $L$ such that $N_1 \twoheadrightarrow L$ and $N_2 \twoheadrightarrow L$.
Proof Suppose $M \twoheadrightarrow N_1$ in $n$ steps and $M \twoheadrightarrow N_2$ in $m$ steps, wlog $n \le m$. Then $N_1 \twoheadrightarrow M^{*n} \twoheadrightarrow M^{*m}$, and $N_2 \twoheadrightarrow M^{*m}$.
We have thus proved one of the most important results about the $\lambda$-calculus: that the computations it performs are consistent. It is immediate that each term has at most one normal form, and thus the job of the computation is to get there. For instance, we have a very elegant computer that can simplify Peano-arithmetic expressions.
Encoding Peano arithmetic
We look at one possible encoding of $0$ and the successor function $S$, the Church encoding, from which the rest of Peano arithmetic follows naturally: $0 \equiv \lambda f.\, \lambda x.\, x$ and $S \equiv \lambda n.\, \lambda f.\, \lambda x.\, f\, (n\, f\, x)$.
Essentially, the ‘natural number’ $n$ takes a function $f$ and a value $x$ as inputs, and applies the function $n$ times to the value: $n\, f\, x \equiv f^n(x)$. From this I can immediately write $\mathsf{ADD} \equiv \lambda m.\, \lambda n.\, \lambda f.\, \lambda x.\, m\, f\, (n\, f\, x)$, and $\mathsf{MULT} \equiv \lambda m.\, \lambda n.\, \lambda f.\, m\, (n\, f)$.
Exercise: Encode the above directly, without referring to previously defined operations. You should discover a rather curious and simple form.
Difficult: Encode the predecessor function, and hence subtraction.
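The Church numerals and their arithmetic can again be checked with Python lambdas; `to_int` is a hypothetical decoder added for testing:

```python
# A Church numeral n takes f and x, and applies f to x exactly n times.
ZERO = lambda f: lambda x: x
SUCC = lambda n: lambda f: lambda x: f(n(f)(x))

# ADD m n applies f a total of m + n times; MULT m n composes.
ADD  = lambda m: lambda n: lambda f: lambda x: m(f)(n(f)(x))
MULT = lambda m: lambda n: lambda f: m(n(f))

def to_int(n):
    # Decode by counting applications of an ordinary successor on 0.
    return n(lambda k: k + 1)(0)

ONE, TWO = SUCC(ZERO), SUCC(SUCC(ZERO))
assert to_int(ADD(TWO)(ONE)) == 3
assert to_int(MULT(TWO)(SUCC(TWO))) == 6
```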
A function $f$ is said to be $\lambda$-computable if it has an encoding $F$ such that one can reduce $F\, \overline{n}$ to $\overline{f(n)}$, where $\overline{n}$ denotes the numeral for $n$. It turns out that the set of all $\lambda$-computable functions is exactly the set of all Turing-computable functions, so the $\lambda$-calculus is exactly as strong as a Turing machine. This equivalence underlies the Church-Turing thesis.
Loop = Loop
The simplicity of $\lambda$-calculus is what makes it expressive, but also non-terminating. Not everything necessarily has a normal form; for instance, the term $\Omega \equiv (\lambda x.\, x\, x)\, (\lambda x.\, x\, x)$ is a fixed point of $\beta$-reduction (such programs that evaluate to themselves are termed quines). More generally, you have the powerful fixpoint combinator $Y \equiv \lambda f.\, (\lambda x.\, f\, (x\, x))\, (\lambda x.\, f\, (x\, x))$, which takes a function and returns a fixed point for it, in the sense that $Y\, F$ and $F\, (Y\, F)$ reduce to a common term (check this!). These recursive terms, however, lead to problems, especially if you are Haskell Curry trying to investigate the foundations of mathematics. If $\lambda$-calculus (or its sister theory, Combinatory logic) provided good foundations, we would have a term $\Rightarrow$ that acted like logical implication: $r \Rightarrow s$ would have to correspond to ‘$r$ implies $s$’, and $r \Rightarrow r$ would have to be true for all $r$. The theorem $r \Rightarrow r$ of logic would mean $M \Rightarrow M$ would also be true for every term $M$.
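A quick sanity check of the fixpoint idea in Python. Church's $Y$ as written loops forever under Python's eager evaluation, so this sketch uses its $\eta$-expanded variant (often called the Z combinator), which delays the self-application; `F` and `fact` are illustrative names, not from the text:

```python
# Z = λf. (λx. f (λv. x x v)) (λx. f (λv. x x v))
Z = lambda f: (lambda x: f(lambda v: x(x)(v)))(lambda x: f(lambda v: x(x)(v)))

# F expects its own fixed point as the argument `rec`.
F = lambda rec: lambda n: 1 if n == 0 else n * rec(n - 1)

# fact = Z(F) behaves as a fixed point of F: fact(n) == F(fact)(n).
fact = Z(F)
assert fact(5) == 120
```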
Then for any $s$, we can construct the term $\lambda x.\, (x \Rightarrow s)$, which checks if $x$ implies $s$. The $Y$ combinator gives a fixed point for this, call it $X$, so that $X$ and $X \Rightarrow s$ reduce to the same term. Now starting from the axiom $X \Rightarrow X$, we have the following sequence of simplifications: $X \Rightarrow X$ is the same term as $X \Rightarrow (X \Rightarrow s)$, which in logic yields $X \Rightarrow s$, which in turn is the same term as $X$.
But then from the same axiom we have derived both $X$ and $X \Rightarrow s$, and hence $s$, which was arbitrary to begin with. The system could then prove every statement and hence be utterly useless, much to Mr. Curry’s disappointment. This is an instance of Curry’s Paradox, and in words it reads “If this sentence is true, then $s$”. At the heart of the paradox is self-reference, which lives in $Y$, aptly called Curry’s paradoxical combinator.
Such ill-foundedness spells disaster, and a considerable part of the development of foundational theories is spent getting rid of self-reference. ZFC did it by the axiom of foundation, while those who preferred a functional world came up with a system of Types. But that should be the subject matter of another post.