# Consistency and Cut


If I had a dollar for every time someone showed me a “proof” that maths is broken, delivered with a smug smile, I’d have, well, many dollars.

For the most part, proofs in mathematics are considered to be the ultimate tests of truth: if you can write a proof of a statement, it must be true, and moreover, a good mathematician must back every assertion with a proof on paper. But given enough thought, this correspondence is not at all obvious, and the fact that syntax (sequences of symbols) does mirror semantics (the “truth”, whatever that is) is a remarkable fact to be marvelled at. Better, it is an assertion that itself needs to be proven, mathematically of course. This study of meta-mathematics, of proofs and propositions as objects in themselves, was started off by David Hilbert, and revolves around the question of “*what can be proven?*”

#### Formalising Logic

We start off by moving into a meta-logic and defining our objects of interest. Most of mathematics can be done in what is called a *formal system*, which consists of

- A set of symbols, called the *signature*, further characterised as:
  - Variable-symbols.
  - Function-symbols and predicate-symbols, each of which has an associated natural number called its *arity*.
  - Auxiliary logical symbols.
- A set of *terms*, the elements of which are finite strings of symbols defined inductively:
  - Every variable-symbol is a term.
  - If $f$ is an $n$-ary function-symbol and $t_1, \dots, t_n$ are terms, then $f(t_1, \dots, t_n)$ is a term.
- A set of predicates, or *well-formed formulae* (wfs), defined inductively:
  - If $P$ is an $n$-ary predicate-symbol and $t_1, \dots, t_n$ are terms, then $P(t_1, \dots, t_n)$ is a predicate. Predicates formed this way are called *atomic formulae*.
  - Appropriate application of logical connectives and quantifiers to existing predicates generates new ones.
- A set of closed wfs (those without any free variables), the elements of which are called *axioms* of the system.
- A finite set of relations between wfs, the elements of which are called *rules of inference*.

A formal system sets up an environment for writing proofs, where the first three elements dictate what *can* be written: variables and terms have been defined in a way that, under the right interpretation, they correspond to concrete objects in the universe (or placeholders for them). Predicates, then, correspond to assertions about how these objects relate to each other, and can be true or false based on the semantic truth in a universe.

Elements 4 and 5 describe a framework for writing proofs: sequences of predicates $\varphi_1, \varphi_2, \dots, \varphi_n$ where $\varphi_1$ is formed from conjunctions and disjunctions of axioms, and each pair $(\varphi_i, \varphi_{i+1})$ belongs to some rule of inference. In fact, the rules of inference can be thought of as inducing a directed graph on the set of all predicates (and conjunctions/disjunctions thereof), and then a proof of $\psi$ from $\varphi$ is a path from $\varphi$ to $\psi$ on this graph. The set of points reachable from the axioms should, in that case, ideally be exactly those sentences that are semantically true. Of course, you could have entirely bogus rules of inference and deduce “cheese is bad” (objectively false) from perfectly true premises. The task, then, is to characterise the “good” rules of inference.
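This reachability picture can be sketched in a few lines of code; the string predicates and single-premise rules below are illustrative simplifications of mine, not anything from the text:

```python
# Provability as reachability: theorems are exactly the nodes reachable
# from the axioms along rules of inference. Predicates are plain strings,
# and each rule maps one premise to its consequences (a deliberately
# simplified, single-premise picture).
from collections import deque

def theorems(axioms, rules):
    """Close `axioms` under `rules` by breadth-first search."""
    known, queue = set(axioms), deque(axioms)
    while queue:
        p = queue.popleft()
        for q in rules.get(p, ()):
            if q not in known:
                known.add(q)
                queue.append(q)
    return known

rules = {"A": ["B"], "B": ["C"], "D": ["C"]}   # "D" is never reached
print(sorted(theorems({"A"}, rules)))          # ['A', 'B', 'C']
```

A “bogus” rule is then just an edge from a true node to a false one; nothing in the graph structure itself forbids it.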

**Notation:** If the rule of inference $R$ relates two wfs $\varphi$ and $\psi$, it is customary to write $\dfrac{\varphi}{\psi}\,R$, or simply $\dfrac{\varphi}{\psi}$ when it is clear what the rule of inference is. The wf on the top is called the “premise”, while the one below is called the “consequence”, and we read it as “infer $\psi$ from $\varphi$ using the rule $R$”. Since axioms are typically not inferred from anything, they are written with an empty premise as $\dfrac{\phantom{\varphi}}{\varphi}$.

#### Propositional Logic

A classic example of a formal system is propositional calculus, where there are no function or predicate symbols, and the logical symbols are $\neg$ and $\to$. The axioms are the families of wfs

$$A \to (B \to A)$$

$$(A \to (B \to C)) \to ((A \to B) \to (A \to C))$$

$$(\neg B \to \neg A) \to (A \to B)$$

where $A, B, C$ can be any wfs. The rule of inference is Modus Ponens: from $A$ and $A \to B$, infer $B$.

Then, for instance, a proof of $A \to A$ looks like

1. $A \to ((A \to A) \to A)$ (Axiom 1)
2. $(A \to ((A \to A) \to A)) \to ((A \to (A \to A)) \to (A \to A))$ (Axiom 2)
3. $(A \to (A \to A)) \to (A \to A)$ (Modus Ponens, 1 and 2)
4. $A \to (A \to A)$ (Axiom 1)
5. $A \to A$ (Modus Ponens, 3 and 4)
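As a sanity check, here is a toy proof-checker for such a Hilbert-style calculus: it verifies that every line of a candidate proof is an instance of one of the first two axiom schemas or follows from earlier lines by Modus Ponens. The tuple encoding of wfs is my own assumption; the proof being checked is the classic five-step derivation of $A \to A$:

```python
# Formulas are nested tuples: ("->", p, q) for implication; atoms are strings.
def imp(p, q):
    return ("->", p, q)

def is_imp(f):
    return isinstance(f, tuple) and f[0] == "->"

def is_ax1(f):
    # instance of  B -> (C -> B)
    return is_imp(f) and is_imp(f[2]) and f[1] == f[2][2]

def is_ax2(f):
    # instance of  (B -> (C -> D)) -> ((B -> C) -> (B -> D))
    if not (is_imp(f) and is_imp(f[1]) and is_imp(f[1][2]) and is_imp(f[2])):
        return False
    left, right = f[1], f[2]
    b, c, d = left[1], left[2][1], left[2][2]
    return (is_imp(right[1]) and is_imp(right[2])
            and right[1] == imp(b, c) and right[2] == imp(b, d))

def check(proof):
    """Each line must be an axiom instance or Modus Ponens on earlier lines."""
    proven = []
    for f in proof:
        ok = is_ax1(f) or is_ax2(f) or any(
            is_imp(g) and g[1] in proven and g[2] == f for g in proven)
        if not ok:
            return False
        proven.append(f)
    return True

A = "A"
proof = [
    imp(A, imp(imp(A, A), A)),                                # Axiom 1
    imp(imp(A, imp(imp(A, A), A)),
        imp(imp(A, imp(A, A)), imp(A, A))),                   # Axiom 2
    imp(imp(A, imp(A, A)), imp(A, A)),                        # MP, 1 and 2
    imp(A, imp(A, A)),                                        # Axiom 1
    imp(A, A),                                                # MP, 3 and 4
]
print(check(proof))  # True
```

Only the first two axiom schemas are needed for this particular derivation, so the checker omits the third.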

#### Proofs as trees

It is worth emphasising that $\varphi \vdash \psi$ and $\varphi \to \psi$ are genuinely different: the former says it is possible to deduce $\psi$ from $\varphi$ in the *meta*-language, while the latter is a proposition saying $\varphi$ implies $\psi$ in propositional calculus. The difference is subtle but important nonetheless, for consistency essentially amounts to agreement between the two.

In a similar light, the connective $\land$ is an artefact of the meta-language, and to simplify notation we choose to ignore the connective and instead write $\dfrac{\varphi \quad \psi}{\chi}$ for the inference $\dfrac{\varphi \land \psi}{\chi}$. Adopting this convention, we tweak our notion of a proof ladder to get what is called a *proof tree:* for a wf $\varphi$, the proof tree for $\varphi$ is a graph in which the nodes are assigned predicates (multiple nodes might be given the same predicate), and every node can be assigned a ‘level’ (an ordinal) such that:

- Level $0$ has a single node corresponding to $\varphi$, called the *terminal* wf.
- There is an edge between two nodes only if they differ by exactly one level.
- Each node at level $n + 1$ is connected to exactly one node at level $n$.
- Each node at level $n$ is connected to countably many (might be $0$) nodes at level $n + 1$, called its predecessors. Nodes with no predecessors are termed *initial nodes*.
- There is an assignment of ordinals such that the ordinal of every node is strictly greater than those of its predecessors.
- Each initial node corresponds to an axiom.
- The node with predicate $\psi$ has predecessors corresponding to $\psi_1, \psi_2, \dots$ only if $\dfrac{\psi_1 \quad \psi_2 \quad \cdots}{\psi}$ is a rule of inference.

The assignment of ordinals in rule 5 is to prevent infinitely long proofs. Write $\vdash \varphi$ if the wf $\varphi$ has a proof in this fashion. These rules may be changed slightly to accommodate individual systems.

Rules of inference, then, cease to be simple binary relations, and the whole structure becomes a bit more complicated than a directed graph. However, this is all syntactic sugar, and ultimately it is an abstract rewrite system as described at the start.

#### Consistency and Explosion

The rules of inference can be thought of as a way of propagating “truth-values”, or let’s say colour. Colour the axioms and their conjunctions white, and then use the rules of inference to assign the colour white to $\psi$ whenever $\varphi$ is white and $\dfrac{\varphi}{\psi}$ is a rule of inference. Then the white nodes are exactly the statements that can be proven, and for the proof system to be good, these had better be semantically true (in a universe where the axioms hold). This is the idea of *consistency*: there should be no proof of a false statement. In most systems, this corresponds to not having proofs of both a sentence and its negation.

A formal system is *inconsistent* if there is a wf $\varphi$ such that $\vdash \varphi$ and $\vdash \neg\varphi$, i.e. some statement and its negation are both provable. Inconsistent systems prove too many things; in fact, inconsistent systems prove *everything*: given any wf $\psi$, here is a proof of it from $\varphi$ and $\neg\varphi$ in any system with Modus Ponens:

1. $\neg\varphi \to (\neg\psi \to \neg\varphi)$ (Axiom 1)
2. $\neg\psi \to \neg\varphi$ (Modus Ponens with $\neg\varphi$)
3. $(\neg\psi \to \neg\varphi) \to (\varphi \to \psi)$ (Axiom 3)
4. $\varphi \to \psi$ (Modus Ponens, 2 and 3)
5. $\psi$ (Modus Ponens with $\varphi$)

This is the *principle of explosion:* if the system is inconsistent, every predicate and its negation are theorems, and the exercise of writing proofs becomes meaningless. A *consistent* system, by contrast, must leave something unprovable, so showing consistency of any extension of propositional logic amounts to finding some unprovable predicate.

#### Peano Arithmetic

We describe the formal system *PA* of first-order Peano arithmetic: start with the single predicate-symbol $=$, a constant-symbol $0$, a unary function-symbol $S$ and two binary function-symbols $+$ and $\cdot$. For convenience, write $s = t$ for $=(s, t)$, $s + t$ for $+(s, t)$, and $s \cdot t$ for $\cdot(s, t)$. There is the single quantifier $\forall$, and logical connectives $\neg$ and $\to$. The axioms and rules of inference are

- (Logic 1) $A \to (B \to A)$
- (Logic 2) $(A \to (B \to C)) \to ((A \to B) \to (A \to C))$
- (Logic 3) $(\neg B \to \neg A) \to (A \to B)$
- (Quantifier 1) $\forall x\, A(x) \to A(t)$, for any term $t$ free for $x$ in $A$
- (Quantifier 2) $\forall x\,(A \to B) \to (A \to \forall x\, B)$, where $x$ is not free in $A$
- (Equality 1) $s = t \to (s = u \to t = u)$
- (Equality 2) $s = t \to Ss = St$
- (Equality 3) $\neg(Ss = 0)$
- (Equality 4) $Ss = St \to s = t$
- (Add 1) $s + 0 = s$
- (Add 2) $s + St = S(s + t)$
- (Mult 1) $s \cdot 0 = 0$
- (Mult 2) $s \cdot St = s \cdot t + s$
- (Induction) $A(0) \to (\forall x\,(A(x) \to A(Sx)) \to \forall x\, A(x))$

together with the rules of Modus Ponens (from $A$ and $A \to B$, infer $B$) and Generalization (from $A$, infer $\forall x\, A$).

This system of fourteen axioms and two rules of inference is pretty powerful: it is possible to deduce important results about natural numbers, rationals (even real and complex numbers!) from these axioms. But can one deduce too much from it? Our discussion on explosion shows this is equivalent to asking if $0 = S0$ is a theorem. Clearly it isn’t an axiom, and it is not possible to deduce it from the rule of Generalization. So the only way to deduce $0 = S0$ is from some application of Modus Ponens; but how many cases will this break into? The problem is that Modus Ponens is essentially the only rule of inference in the entire system, hence way too general. Without it you can prove nothing, and with it you can prove, well, maybe even $0 = S0$?

Let us define a much stronger system *PA**: the signature is the same as that of *PA*, and the logical connectives are $\neg$ and $\lor$ (along with the quantifier $\forall$). If we choose to use $A \to B$ as short-hand for $\neg A \lor B$, then every wf of *PA* can be read as a wf of *PA**. We now construct the axioms of *PA**: take the set of all closed terms (terms without any variables), under the rewrite rules

$$s + 0 \rightsquigarrow s, \qquad s + St \rightsquigarrow S(s + t), \qquad s \cdot 0 \rightsquigarrow 0, \qquad s \cdot St \rightsquigarrow s \cdot t + s.$$

These define an abstract rewriting system. Since closed terms are finite strings of symbols, the length of a term is a well-defined function of closed terms. Define two functions $\rho$ and $\sigma$ from the set of closed terms to $\mathbb{N}$: intuitively, $\rho$ is the total length of the arguments of $\cdot$, and $\sigma$ is the total length of the arguments of $+$. It is not too difficult to see that if the term $t$ reduces to $t'$, then the pair assigned to $t'$ is lexicographically smaller than that assigned to $t$. Thus we have a monotone embedding from closed terms (ordered by reduction) to $\mathbb{N} \times \mathbb{N}$ ordered lexicographically. Since the latter is a well-order, the abstract rewriting system defined on the set of closed terms must be terminating. Moreover, if $\rho$ or $\sigma$ is non-zero then the term must be reducible by one of the rules, so the normal (irreducible) terms must have $\rho = \sigma = 0$. Thus every irreducible term is of the form $S(S(\cdots S(0)\cdots))$. The set of irreducibles is hence $\{0, S0, SS0, \dots\}$, called the set of *naturals*. To see the rewrite system is confluent, it suffices to embed it in another confluent system. Indeed, it is possible to encode the entire system in $\lambda$-calculus, and since the latter is confluent (thanks to the work of Church and Rosser), the rules of arithmetic are confluent too. Thus every closed term has a unique normal form, which we call its evaluation.
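The closed-term rewrite system can be prototyped directly. In the sketch below the tuple encoding is my own assumption, and the four rules are the standard recursion equations for $+$ and $\cdot$, assumed to match the ones in the text:

```python
# Closed terms: "0", ("S", t), ("+", s, t), ("*", s, t).
def step(t):
    """Apply one rewrite rule somewhere in t, or return None if t is normal."""
    if t == "0":
        return None
    op = t[0]
    if op == "S":
        s = step(t[1])
        return None if s is None else ("S", s)
    a, b = t[1], t[2]
    if op == "+":
        if b == "0":            # s + 0  ->  s
            return a
        if b[0] == "S":         # s + S(t)  ->  S(s + t)
            return ("S", ("+", a, b[1]))
    if op == "*":
        if b == "0":            # s * 0  ->  0
            return "0"
        if b[0] == "S":         # s * S(t)  ->  s * t + s
            return ("+", ("*", a, b[1]), a)
    # otherwise rewrite inside an argument
    s = step(b)
    if s is not None:
        return (op, a, s)
    s = step(a)
    return None if s is None else (op, s, b)

def evaluate(t):
    """Rewrite until a normal form (a numeral) is reached; the monotone
    measure described in the text guarantees this terminates."""
    while (s := step(t)) is not None:
        t = s
    return t

def numeral(n):
    return "0" if n == 0 else ("S", numeral(n - 1))

def value(t):
    n = 0
    while t != "0":
        t, n = t[1], n + 1
    return n

two, three = numeral(2), numeral(3)
print(value(evaluate(("+", two, ("*", two, three)))))  # 2 + 2*3 = 8
```

Every run reaches a numeral, whatever order the rules are applied in, which is exactly the termination-plus-confluence claim above in miniature.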

For closed terms $s, t$, we call the wf $s = t$ *correct* if the evaluations of $s$ and $t$ coincide. Call $s = t$ *incorrect* if they evaluate to unequal values. For the set of axioms of *PA**, we take every correct wf and the negation of every incorrect wf. Since the terms can be evaluated in finite time, *PA** is an axiomatic system.
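Deciding whether a closed equation (or its negation) is an axiom of *PA** is then a finite computation. A minimal sketch, with the same assumed tuple encoding and a direct recursive evaluator in place of step-by-step rewriting:

```python
# Closed terms: "0", ("S", t), ("+", s, t), ("*", s, t).
def evalt(t):
    """Evaluate a closed term to a natural number."""
    if t == "0":
        return 0
    if t[0] == "S":
        return evalt(t[1]) + 1
    return evalt(t[1]) + evalt(t[2]) if t[0] == "+" else evalt(t[1]) * evalt(t[2])

def is_axiom(wf):
    """s = t is an axiom iff correct; not(s = t) is an axiom iff incorrect."""
    if wf[0] == "=":
        return evalt(wf[1]) == evalt(wf[2])
    if wf[0] == "not" and wf[1][0] == "=":
        return evalt(wf[1][1]) != evalt(wf[1][2])
    return False

print(is_axiom(("=", ("+", "0", ("S", "0")), ("S", "0"))))   # 0 + 1 = 1: True
print(is_axiom(("=", "0", ("S", "0"))))                      # 0 = 1: False
```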

Define the rules of inference as follows:

The side wfs written alongside the principal ones in the rules are called *peripheral predicates*, and can be empty (except in De Morgan and Cut, for obvious reasons). Anything that is not a peripheral wf is called a *principal predicate*.

#### Eliminating Cut

The last rule, Cut, will prove to be of central importance when proving consistency of *PA*: the principle of explosion deduces any statement from a wf and its negation by using Cut and Dilution:

An observation that should be attributed to Gentzen is that Cut is essential to proving any false statement, and our strategy for proving consistency is hence Cut-elimination. Accordingly, we define some terminology and rules pertaining to it.

- The principal wf of a Cut is called the *cut wf*.
- The number of propositional connectives and quantifiers in the cut wf, plus $1$, is called the *degree of the cut*. (The $1$ is added to ensure every cut has a non-zero degree.)
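The degree bookkeeping is easy to make concrete; the tuple encoding of wfs below is an assumption of this sketch, not the text’s:

```python
# Wfs: atoms ("=", s, t); ("not", f); ("or", f, g); ("all", x, f).
def connectives(f):
    """Count propositional connectives and quantifiers in a wf."""
    if not isinstance(f, tuple) or f[0] == "=":
        return 0  # atomic formulae contribute nothing
    if f[0] == "not":
        return 1 + connectives(f[1])
    if f[0] == "or":
        return 1 + connectives(f[1]) + connectives(f[2])
    if f[0] == "all":
        return 1 + connectives(f[2])
    raise ValueError("unknown connective")

def degree(f):
    return connectives(f) + 1  # the +1 keeps every cut's degree non-zero

atom = ("=", "0", "0")
print(degree(atom))                          # 1
print(degree(("or", ("not", atom), atom)))   # 3
```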

We also make some changes to the definition of the proof-tree to accommodate the Weak rules and Cut. In particular, we demand that there is a maximal degree among the Cuts that appear, and call it the *degree of the proof*; if there are no Cuts then the proof has degree $0$. We slightly change the assignment of ordinals: a node is assigned an ordinal equal to its predecessor if it follows from a Weak rule of inference, and an ordinal strictly greater than all of its predecessors if it follows from a Strong rule or Cut. A *thread* is any sequence of nodes $\varphi_0, \varphi_1, \dots$ starting at the terminal node (level $0$) such that $\varphi_{i+1}$ is a predecessor of $\varphi_i$. Then the sequence of ordinals corresponding to the thread is non-increasing, and since the ordinals are well-founded there can only be finitely many applications of Strong rules or Cut in a thread. As for the Weak rules, Consolidation can only be applied finitely many times, while Exchange splits the wfs into equivalence classes of finite size. Hence each thread can be assumed to be only finitely long. The ordinal assigned to the terminal wf is a good measure of the ‘size’ of the proof, and is called the *ordinal of the proof.*

Define a *subformula* of a wf $\varphi$ inductively:

- $\varphi$ is a subformula of $\varphi$.
- If $\neg\psi$ is a subformula of $\varphi$ then so is $\psi$.
- If $\psi \lor \chi$ is a subformula of $\varphi$, then so are $\psi$ and $\chi$.
- If $\psi(x)$ is a wf with free variable $x$, and $\forall x\, \psi(x)$ is a subformula of $\varphi$, then $\psi(t)$ is also a subformula of $\varphi$ for any term $t$ free for $x$ in $\psi$.
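The first three clauses can be sketched directly; the quantifier clause requires substitution machinery, so this toy version (with an assumed tuple encoding) omits it:

```python
# Propositional fragment only: atoms are strings, ("not", f), ("or", f, g).
def subformulae(f):
    out = {f}                                 # f is a subformula of itself
    if isinstance(f, tuple):
        if f[0] == "not":
            out |= subformulae(f[1])          # if not-p is one, so is p
        elif f[0] == "or":
            out |= subformulae(f[1]) | subformulae(f[2])
    return out

print(len(subformulae(("or", ("not", "A"), "B"))))  # 4
```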

Given any node of the proof-tree with subformula $\psi$, we look at all the places where $\psi$ is a subformula of the nodes. Starting at that node and working upwards, this process continues through all Weak rules and Strong rules where $\psi$ lies in a peripheral predicate. Since each thread is finite, $\psi$ must be ‘introduced’ to the proof as an axiom or as the principal predicate of some rule. The set of all occurrences of $\psi$ thus obtained is called the *history* of $\psi$. Before proceeding, we prove a small result:

**Lemma** (Excluded middle) $\neg A \lor A$ is a theorem for all closed wfs $A$.

**Proof** Prove by induction on the number of connectives and quantifiers in $A$: if $A$ has no quantifiers/connectives, then $A$ is of the form $s = t$ for closed terms $s, t$. Then either $s = t$ or $\neg(s = t)$ is an axiom, hence $\neg(s = t) \lor (s = t)$ follows by Dilution. Suppose the claim is true up to $k$ quantifiers and connectives. If $A$ has $k + 1$ of them, then $A$ must be of the form

- (i) $B \lor C$: Since $B$ and $C$ must each have at most $k$ connectives/quantifiers, $\neg B \lor B$ and $\neg C \lor C$ are both theorems. Then we obtain a proof of $\neg(B \lor C) \lor (B \lor C)$ using De Morgan and Dilution.
- (ii) $\forall x\, B(x)$: Each $B(\bar{n})$ has at most $k$ quantifiers and connectives, hence $\neg B(\bar{n}) \lor B(\bar{n})$ is a theorem for every natural number $n$. Then Quantification and Infinite induction give a proof of $\neg\forall x\, B(x) \lor \forall x\, B(x)$.
- (iii) $\neg B$: $B$ has $k$ quantifiers and connectives, hence $\neg B \lor B$ is a theorem. Then, using the rule of Negation, $\neg\neg B \lor \neg B$ is also a theorem.

Observe that no Cut was used at any point. The result allows us to assume $\neg A \lor A$ always has a proof, irrespective of what $A$ is, and this will be immensely useful in proving the results about to follow. Let us use it to show that *PA** is actually stronger than *PA*, as claimed:

**Theorem** Every closed wf that is a theorem of *PA* is also a theorem of *PA**.

**Proof** Every theorem of *PA* must be either an axiom or deducible from Modus Ponens/Generalization.

Let us prove all axioms of *PA* are theorems of *PA**:

(Logic 1) $A \to (B \to A)$, i.e. $\neg A \lor (\neg B \lor A)$

Since $\neg A \lor A$ is a theorem, $\neg A \lor (\neg B \lor A)$ immediately follows from Dilution.

(Logic 2) $(A \to (B \to C)) \to ((A \to B) \to (A \to C))$, i.e. $\neg(\neg A \lor (\neg B \lor C)) \lor (\neg(\neg A \lor B) \lor (\neg A \lor C))$

(Logic 3) $(\neg B \to \neg A) \to (A \to B)$, i.e. $\neg(\neg\neg B \lor \neg A) \lor (\neg A \lor B)$

(Quantifier 1) $\forall x\, A(x) \to A(t)$, i.e. $\neg\forall x\, A(x) \lor A(t)$

(Quantifier 2) $\forall x\,(A \to B) \to (A \to \forall x\, B)$, i.e. $\neg\forall x\,(\neg A \lor B) \lor (\neg A \lor \forall x\, B)$

Write $A(t)$ for the predicate generated by substituting the term $t$ for every free occurrence of $x$ in $A$. Remember that $A$ has no free occurrence of $x$ in the axiom, so in particular $A(t)$ is just $A$.

(Equality 1) $s = t \to (s = u \to t = u)$, i.e. $\neg(s = t) \lor (\neg(s = u) \lor (t = u))$.

If $t = u$ is an axiom, then done. Else, $t$ and $u$ have different evaluations. The evaluation of $s$ can’t simultaneously be equal to the evaluations of both, so $\neg(s = t)$ or $\neg(s = u)$ is an axiom. Use Dilution.

(Equality 2-4, Add/Mult 1-4) The remaining axioms of Equality, Addition and Multiplication follow immediately from rules of evaluating closed terms.

(Induction) $A(0) \to (\forall x\,(A(x) \to A(Sx)) \to \forall x\, A(x))$, i.e.

$$\neg A(0) \lor \big(\neg \forall x\,(\neg A(x) \lor A(Sx)) \lor \forall x\, A(x)\big)$$

Write $B$ for $\neg A(0) \lor \neg \forall x\,(\neg A(x) \lor A(Sx))$.

*Claim:* $B \lor A(\bar{n})$ is a theorem for all naturals $n$.

Prove by induction: clearly true for $n = 0$. If true for $n$, then writing $C$ for the peripheral wf $B$, we have

Then we can use Quantification and Consolidation to deduce that the claim is true for all $n$. Since each $B \lor A(\bar{n})$ is a theorem, we deduce from the rule of Infinite induction that $B \lor \forall x\, A(x)$, i.e. the Induction axiom, is a theorem.

Hence all axioms of *PA* are provable in *PA**. If $A$ and $A \to B$ (i.e. $\neg A \lor B$) are theorems in *PA**, then $B$ is also a theorem, proved by an application of Cut. Similarly, if $A$ is a theorem, then looking at the proof tree of $A$ and replacing every instance of $x$ in its history by $\bar{n}$, we see that $A(\bar{n})$ is a theorem for every $n$, hence $\forall x\, A$ is a theorem by Infinite induction. Thus any theorem of *PA* deduced using Modus Ponens or Generalization is also provable in *PA**.

Consequently, if *PA** is consistent, then so is *PA*. This is nice because we have a lot more to work with when proving the consistency of *PA**: suppose $0 = S0$ is a theorem in *PA**. Then we can look at the proof tree, in particular the history of $0 = S0$. Since every thread is finite and $0 = S0$ is not an axiom, it must be introduced in the proof via a Dilution, hence must be paired with the principal predicate of the Dilution. This means it occurs in the form $B \lor (0 = S0)$ for some theorem $B$. But the terminal node of the proof is $0 = S0$ alone, so there must be some step where $0 = S0$ is unpaired from $B$. Inspecting the rules of inference, the only one which allows such an unpairing is Cut. Thus if we can show that every theorem in *PA** has a Cut-free proof, we can immediately conclude that the system is consistent. The next result, due to Schütte, does precisely that:

**Theorem** Given a proof of a theorem $A$ of positive degree, there is another proof of $A$ of lower degree.

**Proof** We proceed by transfinite induction on the ordinal of the proof: if the ordinal of the proof is $0$, it contains no Cuts and so has degree $0$.

Suppose we have shown that all proofs with smaller ordinal can be rewritten with reduced degree. Given a proof of $A$ of ordinal $\alpha$, start from the terminal node and, moving upwards, locate the first application of a non-Weak rule, i.e. a Strong rule or Cut. If it is a Strong rule, its premises have a lower ordinal, so by the induction hypothesis their proofs can be rewritten with a lower degree. This must reduce the maximal degree of Cuts in the proof, so we have a new proof with lower degree. The remaining case is if the first applied non-Weak rule is a Cut (call it the last Cut):

$$\frac{\Gamma \lor C \qquad \Delta \lor \neg C}{\Gamma \lor \Delta}$$

By the induction hypothesis, the proofs of $\Gamma \lor C$ and $\Delta \lor \neg C$ in the proof-tree can be replaced with proofs of lower degree. If we can replace the last Cut with an inference of lower degree, we will have reduced the maximal degree of all Cuts in the proof and we will be done. Let us look at what the cut wf $C$ could be:

Case 1: $C$ is an atomic formula: either $C$ or $\neg C$ is an axiom.

Wlog $\neg C$ is the axiom. Then, looking at the history of $C$ in the proof of $\Gamma \lor C$, we see that it must be introduced via a Dilution. Then simply ignoring all occurrences of $C$ in the proof of $\Gamma \lor C$, we have a proof of $\Gamma$. Diluting this we get a proof of $\Gamma \lor \Delta$. Thus we have eliminated the last Cut, and any other Cuts occurring must be in the proofs of the premises, hence have lower degree.

Case 2: $C$ is of the form $\neg B$:

Rewrite the cut as follows:

The proof of $\neg B \lor B$ supplied by the Lemma was Cut-free, and both the newly introduced cuts have degree lower than that of the last Cut.

Case 3: $C$ is of the form $\forall x\, B(x)$:

Then $\Gamma \lor \forall x\, B(x)$ and $\Delta \lor \neg\forall x\, B(x)$ are the premises of the Cut. We look at the history of the subformula $\forall x\, B(x)$ in the proof of the first premise. Since it cannot be an axiom, it must be introduced to the proof via a Dilution or Infinite induction. If the introduction was a Dilution, then simply erasing all occurrences of $\forall x\, B(x)$ gives a proof of $\Gamma$, and diluting it allows us to deduce $\Gamma \lor \Delta$ without employing the last Cut at all. Similarly, tracing the history of $\neg\forall x\, B(x)$ in the proof of the second premise, if the introduction was a Dilution then we are done. If neither of the two was introduced via a Dilution, then $\forall x\, B(x)$ must be introduced via Infinite induction and $\neg\forall x\, B(x)$ via Quantification. In particular, there must be inferences with premises $B(\bar{0}), B(\bar{1}), \dots$ (alongside peripheral wfs) and with premise $\neg B(t)$ for some closed term $t$ (which evaluates to some natural $n$). Safely replace all occurrences of $\forall x\, B(x)$ below the Infinite induction with $B(\bar{n})$: in all subsequent deductions that lead to $\Gamma \lor \forall x\, B(x)$, the predicate $\forall x\, B(x)$ must be peripheral, so we can replace all of its occurrences by $B(\bar{n})$ to get a valid proof of $\Gamma \lor B(\bar{n})$. Likewise we have a proof of $\Delta \lor \neg B(\bar{n})$ from the second premise. By the induction hypothesis, all these proofs have lower degree. Now replace the last Cut as follows:

$$\frac{\Gamma \lor B(\bar{n}) \qquad \Delta \lor \neg B(\bar{n})}{\Gamma \lor \Delta}$$

Then we have replaced the last Cut with a Cut of smaller degree.

Case 4: $C$ is of the form $B \lor D$:

Then $\Delta \lor \neg(B \lor D)$ is a premise of the cut. Trace the history of $\neg(B \lor D)$: if it was introduced by a Dilution then done, as usual. If not, then it must be introduced via De Morgan, via an inference whose premises involve $\neg B$ and $\neg D$, after which we have a proof of $\Delta \lor \neg(B \lor D)$. Replacing all occurrences of $\neg(B \lor D)$ in this proof with $\neg B$ gives us a proof of $\Delta \lor \neg B$, and likewise for $\neg D$. By hypothesis, all the above proofs have degree smaller than that of the last Cut. Then replace the last Cut in the proof as:

The number of Cuts has increased, but the degree of the proof has reduced.

This concludes the proof: every proof can be rewritten with Cuts of lower degree.

What we have described is a new, interesting abstract rewriting system in which the objects are proofs and the reduction is as described. Since the degree of the proof is a decreasing quantity and every proof of positive degree can be reduced, the rewriting system is terminating, and so every proof can be reduced to a proof of degree 0, i.e. a Cut-free proof. In other words, any statement that can be proven in *PA** can be proven without Cuts, hence cannot be the false wf $0 = S0$. We have thus established the consistency of *PA**, and hence of *PA*.

Notes:

- I have been lazy with writing out proofs properly; in particular, the rules of inference are sensitive to the order of the wfs, so some of the inferences drawn above are, strictly speaking, not rules of inference. They are, however, easily seen to be simply Dilution composed with Exchange, and thus throughout I have wilfully turned a blind eye towards the Weak rules. Hence everything written in the proofs is correct modulo Weak rules (and gaps in my understanding).
- I have also not written what rule is being used at each stage of the proof, and it is now that I will employ the supreme power of saying that the task of figuring it out has been left as an exercise to the reader.