Consistency and Cut
If I had a dollar for every time someone showed me a “proof” of $1 = 2$ with a smug smile claiming maths is broken, I’d have, well, many dollars.
For the most part, proofs in mathematics are considered the ultimate tests of truth: if you can write a proof of a statement, it must be true, and moreover, a good mathematician must back every assertion with a proof on paper. But given enough thought, this correspondence is not at all obvious, and the fact that syntax (sequences of symbols) does mirror semantics (the “truth”, whatever that is) is remarkable, something to be marvelled at. Better, it is an assertion that needs to be proven, mathematically of course. This study of meta-mathematics, of proofs and propositions as objects in themselves, was started off by David Hilbert, and revolves around the question of “what can be proven?”
We start off by moving into a meta-logic and defining our objects of interest. Most of mathematics can be done in what is called a formal system, which consists of
- A set of symbols, called the signature, further characterised as:
- Variable-symbols $x, y, z, \dots$.
- Function-symbols and predicate-symbols, which have an associated natural number called arity.
- Auxiliary logical symbols.
- A set of terms, the elements of which are finite strings of symbols defined inductively:
- Every variable symbol is a term.
- If $f$ is an $n$-ary function symbol and $t_1, \dots, t_n$ are terms, then $f(t_1, \dots, t_n)$ is a term.
- A set of predicates, or well-formed formulae (wfs) defined inductively:
- If $P$ is an $n$-ary predicate-symbol and $t_1, \dots, t_n$ are terms, then $P(t_1, \dots, t_n)$ is a predicate. Predicates formed this way are called atomic formulae.
- Appropriate application of logical connectives and quantifiers to existing predicates generates new ones.
- A set of closed (without any free-variables) wfs, the elements of which are called axioms of the system.
- A finite set of relations between wfs, the elements of which are called rules of inference.
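The inductive definitions above translate directly into data types. Here is a minimal sketch in Python; the representation, the names, and the toy signature are my own illustration, not anything fixed by the text:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    """A variable-symbol from the signature."""
    name: str

@dataclass(frozen=True)
class Fun:
    """A function-symbol applied to sub-terms."""
    symbol: str
    args: tuple  # must match the symbol's arity

# Example signature: a unary symbol S and a binary symbol +.
ARITY = {"S": 1, "+": 2}

def is_term(t) -> bool:
    """The inductive definition: every variable is a term, and f(t1, ..., tn)
    is a term when f has arity n and each ti is a term."""
    if isinstance(t, Var):
        return True
    if isinstance(t, Fun):
        return ARITY.get(t.symbol) == len(t.args) and all(is_term(a) for a in t.args)
    return False

x = Var("x")
t = Fun("+", (Fun("S", (x,)), x))    # the term S(x) + x
print(is_term(t))                     # True
print(is_term(Fun("S", (x, x))))      # False: S is unary
```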
A formal system sets up an environment for writing proofs, where the first three elements dictate what can be written: variables and terms have been defined in a way that, under the right interpretation, they correspond to concrete objects in the universe (or placeholders for them). Predicates, then, correspond to assertions that say how these objects relate to each other, and can be true or false based on the semantic truth in a universe.
Elements 4 and 5 describe a framework for writing proofs: sequences of predicates $A_1, A_2, \dots, A_n$ where $A_1$ is formed from conjunctions and disjunctions of axioms, and each pair $(A_i, A_{i+1})$ is in some rule of inference. In fact, the rules of inference can be thought of as inducing a directed graph on the set of all predicates (and conjunctions/disjunctions thereof), and then a proof of $B$ from $A$ is a path from $A$ to $B$ on this graph. The set of points reachable from the axioms should, in that case, ideally be exactly those sentences that are semantically true. Of course, you could have entirely bogus rules of inference and deduce “cheese is bad” (objectively false) from perfectly good axioms. The task, then, is to characterise the “good” rules of inference.
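The reachability picture can be made concrete with a forward-chaining closure: start from the axioms and keep applying the rules until nothing new appears. The string encoding of wfs and the single toy rule below are my own simplification for illustration:

```python
# Toy system: wfs are strings, axioms as given, and the only rule is a
# string-level modus ponens on wfs of the form "X->Y".
AXIOMS = {"A", "A->B", "B->C"}

def theorems(axioms):
    """Everything reachable from the axioms along the rule of inference."""
    derived = set(axioms)
    changed = True
    while changed:
        changed = False
        for wf in list(derived):
            if "->" in wf:
                premise, _, conclusion = wf.partition("->")
                if premise in derived and conclusion not in derived:
                    derived.add(conclusion)   # modus ponens fires
                    changed = True
    return derived

print(sorted(theorems(AXIOMS)))  # ['A', 'A->B', 'B', 'B->C', 'C']
```

Here the “good rules” question becomes: does the reachable set coincide with the semantically true sentences?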
Notation: If the rule of inference $r$ relates two wfs $A$ and $B$, it is customary to write $\frac{A}{B}\,r$, or simply $\frac{A}{B}$ when it is clear what the rule of inference is. The wf on the top is called the “premise”, while the one below is called the “consequence”, and we read it as “infer $B$ from $A$ using the rule $r$”. Since axioms are typically not inferred from anything, they are written with an empty premise as $\frac{\phantom{A}}{A}$.
A classic example of a formal system is propositional calculus, where there are no function or predicate symbols, and the logical symbols are $\neg$ and $\to$. The axioms are the families of wfs
$$A \to (B \to A),$$
$$(A \to (B \to C)) \to ((A \to B) \to (A \to C)),$$
$$(\neg B \to \neg A) \to (A \to B),$$
where $A$, $B$, $C$ can be any wfs. The rule of inference is Modus Ponens:
$$\frac{A \qquad A \to B}{B}.$$
Then, for instance, one can write a short proof of $A \to A$.
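A plausible reconstruction of that derivation, using only the first two axiom schemas and Modus Ponens:

```latex
\begin{align*}
1.\quad & (A \to ((A \to A) \to A)) \to ((A \to (A \to A)) \to (A \to A))
        && \text{axiom schema 2} \\
2.\quad & A \to ((A \to A) \to A) && \text{axiom schema 1} \\
3.\quad & (A \to (A \to A)) \to (A \to A) && \text{MP on 1, 2} \\
4.\quad & A \to (A \to A) && \text{axiom schema 1} \\
5.\quad & A \to A && \text{MP on 3, 4}
\end{align*}
```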
Proofs as trees
It is worth emphasising that $A \vdash B$ and $A \to B$ are genuinely different: the former says it is possible to deduce $B$ from $A$ in the meta-language, while the latter is a proposition saying $A$ implies $B$ in propositional calculus. The difference is subtle but important nonetheless, for consistency essentially amounts to agreement between the two.
In a similar light, the connective $\wedge$ joining the premises is an artefact of the meta-language, and to simplify notation we choose to ignore the connective and instead write $\frac{A_1 \quad A_2 \quad \cdots \quad A_n}{B}$ for the inference $\frac{A_1 \wedge A_2 \wedge \cdots \wedge A_n}{B}$. Adopting this convention, we tweak our notion of a proof ladder to get what is called a proof tree: For a wf $A$, the proof tree for $A$ is a graph in which the nodes are assigned predicates (multiple nodes might be given the same predicate), and every node can be assigned a ‘level’ (an ordinal) such that:
- level $0$ has the single node corresponding to $A$, called the terminal wf.
- there is an edge between two nodes only if they differ by exactly one level.
- each node at level $n+1$ is connected to exactly one node at level $n$.
- each node at level $n$ is connected to countably many (might be $0$) nodes at level $n+1$, called its predecessors. Nodes with no predecessors are termed initial nodes.
- there is an assignment of ordinals such that the ordinal of every node is strictly greater than that of its predecessors.
- each initial node corresponds to an axiom.
- the node with predicate $B$ has predecessors corresponding to $A_1, \dots, A_n$ only if $\frac{A_1 \quad \cdots \quad A_n}{B}$ is a rule of inference.
The assignment of ordinals in rule 5 is to prevent infinitely long proofs. Write $\vdash A$ if the wf $A$ has a proof in this fashion. These rules may be changed slightly to accommodate individual systems.
Rules of inference then cease to be simple binary relations, and the whole structure becomes a bit more complicated than a directed graph; however, it is all syntactic sugar, and ultimately still an abstract rewrite system as described at the start.
Consistency and Explosion
The rules of inference can be thought of as a way of propagating “truth-values”, or let’s say colour. Colour the axioms and their conjunctions white, and then use the rules of inference to assign the colour white to $B$ if $A$ is white and $\frac{A}{B}$ is a rule of inference. Then all the white nodes are statements that can be proven, and for the proof system to be good, these had better be semantically true (in a universe where the axioms hold). This is the idea of consistency: there should be no proof of a false statement. In most systems, this corresponds to never having proofs of both a sentence and its negation.
A formal system is inconsistent if there is a wf $A$ such that $\vdash A$ and $\vdash \neg A$, i.e. some statement and its negation are both provable. Inconsistent systems prove too many things; in fact, inconsistent systems prove everything: given any wf $B$, there is a proof of it from $A$ and $\neg A$ in any system with Modus Ponens.
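One way such a proof can go, assuming the schemas $A \to (B \to A)$ and $(\neg B \to \neg A) \to (A \to B)$ are available:

```latex
\begin{align*}
1.\quad & \neg A && \text{given} \\
2.\quad & \neg A \to (\neg B \to \neg A) && \text{instance of } A \to (B \to A) \\
3.\quad & \neg B \to \neg A && \text{MP on 1, 2} \\
4.\quad & (\neg B \to \neg A) \to (A \to B) && \text{axiom schema} \\
5.\quad & A \to B && \text{MP on 3, 4} \\
6.\quad & A && \text{given} \\
7.\quad & B && \text{MP on 6, 5}
\end{align*}
```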
This is the principle of explosion: if the system is inconsistent, every predicate and its negation are theorems, and the exercise of writing proofs becomes meaningless. A consistent system proves no false statement, and then showing consistency of any extension of propositional logic amounts to finding some unprovable predicate.
We describe the formal system PA of first-order Peano arithmetic: start with the single predicate-symbol $=$, a constant-symbol $0$, a unary function-symbol $S$ and two binary function-symbols $+$ and $\cdot$. For convenience, write $s = t$ for $=(s, t)$, $s + t$ for $+(s, t)$, and $s \cdot t$ for $\cdot(s, t)$. There is the single quantifier $\forall$, and logical connectives $\neg$ and $\to$. The axioms and rules of inference are
This system of fourteen axioms and two rules of inference is pretty powerful: it is possible to deduce important results about natural numbers, rationals (even real and complex numbers!) from these axioms. But can one deduce too much from it? Our discussion on explosion shows this is equivalent to asking if a manifestly false sentence, say $0 = S0$, is a theorem. Clearly it isn’t an axiom, and it is not possible to deduce it from the rule of Generalization. So the only way to deduce it is from some application of Modus Ponens, but how many cases will this break into? The problem is that Modus Ponens is effectively the only rule of inference in the entire system, and hence way too general. Without it you can prove nothing, and with it you can prove, well, maybe even $0 = S0$?
Let us define a much stronger system PA*: the signature is the same as that of PA, and the logical connectives are $\neg$ and $\vee$, together with the quantifiers $\forall$ and $\exists$. If we choose to use $A \to B$ as short-hand for $\neg A \vee B$, then every wf of PA can be read as a wf of PA*. We now construct the axioms of PA*: take the set of all closed terms (terms without any variables), under the rewrite rules
$$s + 0 \rightsquigarrow s, \qquad s + St \rightsquigarrow S(s + t), \qquad s \cdot 0 \rightsquigarrow 0, \qquad s \cdot St \rightsquigarrow (s \cdot t) + s.$$
These define an abstract rewriting system. Since closed terms are finite strings of symbols, the length of a term is a well-defined function of closed terms. Define functions $\ell_+$ and $\ell_\cdot$ from the set of closed terms to $\mathbb{N}$ as follows:
Intuitively, $\ell_+(t)$ is the total length of arguments of $+$ in $t$, and $\ell_\cdot(t)$ is the total length of arguments of $\cdot$. It is not too difficult to see that if the term $s$ reduces to $t$, then the pair $(\ell_\cdot, \ell_+)$ strictly decreases. Thus we have a monotone embedding from closed terms (ordered by reduction) to $\mathbb{N} \times \mathbb{N}$ ordered lexicographically. Since the latter is a well-order, the abstract rewriting system defined on the set of closed terms must be terminating. Moreover, if $\ell_+(t)$ or $\ell_\cdot(t)$ is non-zero then the term must be reducible by one of the rules, so the normal (irreducible) terms must have $\ell_+ = \ell_\cdot = 0$. Thus every irreducible term is of the form $S^n 0$. The set of irreducibles is hence $\{0, S0, SS0, \dots\}$, called the set of naturals. To see the rewrite system is confluent, it suffices to embed it in another confluent system. Indeed, it is possible to encode the entire system in $\lambda$-calculus, and since the latter is confluent (thanks to the work of Church and Rosser), the rules of arithmetic are confluent too. Thus every closed term has a unique normal form, which we call its evaluation.
For closed terms $s$ and $t$, we call the wf $s = t$ correct if the evaluations of $s$ and $t$ coincide. Call $s = t$ incorrect if they evaluate to unequal values. For the set of axioms of PA*, we take every correct wf and the negation of every incorrect wf. Since the terms can be evaluated in finite time, PA* is an axiomatic system.
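Since the axioms of PA* are generated by evaluating closed terms, the whole procedure can be sketched directly. Here is a minimal evaluator in Python, assuming the standard recursion rules for $+$ and $\cdot$; the nested-tuple representation of terms is my own:

```python
def evaluate(t):
    """Reduce a closed term to its normal form S^n 0, following
    s + 0 ~> s, s + St ~> S(s + t), s.0 ~> 0, s.St ~> (s.t) + s."""
    if t == "0":
        return "0"
    op = t[0]
    if op == "S":
        return ("S", evaluate(t[1]))
    if op == "+":
        s, u = evaluate(t[1]), evaluate(t[2])
        while u != "0":                    # s + Su ~> S(s + u)
            s, u = ("S", s), u[1]
        return s                           # s + 0 ~> s
    if op == "*":
        s, u = evaluate(t[1]), evaluate(t[2])
        acc = "0"
        while u != "0":                    # s.Su ~> (s.u) + s
            acc, u = evaluate(("+", acc, s)), u[1]
        return acc                         # s.0 ~> 0
    raise ValueError("not a closed term")

def correct(s, t):
    """The wf s = t is an axiom iff correct; its negation is an axiom
    iff incorrect -- decidable in finite time, as claimed."""
    return evaluate(s) == evaluate(t)

two   = ("S", ("S", "0"))
three = ("S", two)
print(correct(("+", two, three), ("+", three, two)))   # True:  2 + 3 = 3 + 2
print(correct(("*", two, two), three))                 # False: 2 . 2 = 3
```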
Define the rules of inference as follows:
The $\Gamma$s and $\Delta$s in the rules are called peripheral predicates, and can be empty (except in De Morgan and Cut, for obvious reasons). Anything that is not a peripheral wf is called a principal predicate.
The last rule, Cut, will prove to be of central importance when proving consistency of PA: the principle of explosion deduces any statement $B$ from $\vdash A$ and $\vdash \neg A$ by using Cut and Dilution:
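A sketch of how that derivation might run, writing each line as a list of wfs (my rendering of the rule names):

```latex
\begin{align*}
& \vdash A && \text{given} \\
& \vdash A,\ B && \text{Dilution} \\
& \vdash \neg A && \text{given} \\
& \vdash \neg A,\ B && \text{Dilution} \\
& \vdash B,\ B && \text{Cut on } A \\
& \vdash B && \text{Consolidation}
\end{align*}
```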
As Gentzen observed, Cut is essential to proving any false statement, and our strategy for proving consistency is hence Cut-elimination. Accordingly, we define some terminology and rules pertaining to it.
- The principal wf $C$ of a Cut is called the cut wf.
- The number of propositional connectives and quantifiers in $C$, plus $1$, is called the degree of the cut. (The $1$ is added to ensure every cut has a non-zero degree.)
We also make some changes to the definition of the proof-tree to accommodate the Weak rules and Cut. In particular, we demand that there is a maximal degree of the cuts that appear, and call it the degree of the proof. If there are no Cuts then the proof has degree $0$. We slightly change the assignment of ordinals: a node is assigned an ordinal equal to its predecessor if it follows from a Weak rule of inference, and an ordinal strictly greater than all of its predecessors if it follows from a Strong rule or Cut. A thread is any sequence $A_0, A_1, A_2, \dots$ starting at the terminal node (level $0$) such that each $A_{i+1}$ is a predecessor of $A_i$. Then the sequence of ordinals corresponding to the thread is non-increasing, and strictly decreases at each Strong rule or Cut; since the ordinals are well-founded, there can only be finitely many applications of Strong rules or Cut in a thread. As for the Weak rules, Consolidation can only be applied finitely many times, while Exchange splits the wfs into equivalence classes of finite size. Hence each thread can be assumed to be only finitely long. The ordinal assigned to the terminal wf is a good measure of the ‘size’ of the proof, and is called the ordinal of the proof.
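The degree count itself is easy to make concrete. A small sketch in Python; the tuple representation of wfs is my own:

```python
def degree(wf):
    """Number of propositional connectives and quantifiers in wf, plus 1,
    as in the definition of the degree of a Cut. Atoms are strings;
    compound wfs are tuples like ("not", A), ("or", A, B), ("forall", x, A)."""
    def count(f):
        if isinstance(f, str):              # atomic formula
            return 0
        op = f[0]
        if op == "not":
            return 1 + count(f[1])
        if op in ("or", "and"):
            return 1 + count(f[1]) + count(f[2])
        if op in ("forall", "exists"):
            return 1 + count(f[2])
        raise ValueError(op)
    return 1 + count(wf)                    # the +1 keeps every degree non-zero

print(degree("x=y"))                        # 1
print(degree(("or", ("not", "A"), "B")))    # 3
```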
Define a subformula of a wf $A$ inductively:
- $A$ is a subformula of $A$.
- If $\neg B$ is a subformula of $A$ then so is $B$.
- If $B \vee C$ is a subformula of $A$, then so are $B$ and $C$.
- If $B$ is a wf with free variable $x$, and $\forall x\, B$ (or $\exists x\, B$) is a subformula of $A$, then $B[t/x]$ is also a subformula of $A$ for any term $t$ free for $x$ in $B$.
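The first three clauses give a finite enumeration, which can be sketched as follows. (The quantifier clause produces one subformula per term $t$, so it is omitted here; the tuple representation is my own.)

```python
def subformulae(wf):
    """All subformulae of wf under the propositional clauses:
    wf itself; B from ("not", B); B and C from ("or", B, C)."""
    out = {wf}
    if isinstance(wf, tuple):
        op = wf[0]
        if op == "not":
            out |= subformulae(wf[1])
        elif op == "or":
            out |= subformulae(wf[1]) | subformulae(wf[2])
    return out

wf = ("or", ("not", "A"), "B")
print(sorted(subformulae(wf), key=str))
# [('not', 'A'), ('or', ('not', 'A'), 'B'), 'A', 'B']
```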
Given any node of the proof-tree whose wf contains a subformula $B$, we look at all the places where $B$ is a subformula of the nodes. Starting at that node and working upwards, this process continues through all Weak rules and Strong rules where $B$ sits inside a peripheral predicate. Since each thread is finite, $B$ must be ‘introduced’ to the proof as an axiom or as the main predicate of some Strong rule or Cut. The set of all occurrences of $B$ thus obtained is called the history of $B$. Before proceeding, we prove a small result:
Observe that no Cut was used at any point. The result allows us to assume $\neg A \vee A$ always has a proof irrespective of what $A$ is, and this will be immensely useful in proving the results about to follow. Let us use it to show that PA* is actually stronger than PA, as claimed:
Consequently, if PA* is consistent, then so is PA. This is nice because we have a lot more to work with when proving the consistency of PA*: suppose $0 = S0$ is a theorem in PA*. Then we can look at the proof tree, in particular the history of $0 = S0$. Since every thread is finite and $0 = S0$ is not an axiom, it must be introduced in the proof via a Dilution, hence must be paired with the principal predicate of the Dilution. This means $0 = S0$ occurs in the form $B,\ 0 = S0$ for some theorem $B$. But the terminal node of the proof is $0 = S0$ alone, so there must be some step where $0 = S0$ is unpaired from $B$. Inspecting the rules of inference, the only one which allows such an unpairing is Cut. Thus if we can show that every theorem in PA* has a Cut-free proof, we can immediately conclude that the system is consistent. The next result, due to Schütte, does precisely that:
What we have described is a new, interesting abstract rewriting system in which the objects are proofs and the reduction is as described. Since the degree of the proof is a decreasing quantity and every proof of non-zero degree can be reduced, the rewriting system is terminating, and so every proof can be reduced to a proof of degree $0$, i.e. a Cut-free proof. In other words, any statement that can be proven in PA* can be proven without Cuts, hence cannot be an incorrect wf like $0 = S0$. We have thus established the consistency of PA.
- I have been lazy with writing out proofs properly; in particular, the rules of inference are sensitive to the order of the wfs, so several of the inference steps written above are, strictly speaking, not single rules of inference. Each is, however, easily seen to be simply Dilution composed with Exchange, and thus throughout I have wilfully turned a blind eye towards the Weak rules. Hence everything written in the proofs is correct modulo Weak rules (and gaps in my understanding.)
- I have also not written what rule is being used at each stage of the proof, and it is now that I will employ the supreme power of saying that the task of figuring it out has been left as an exercise to the reader.