# Consistency and Cut


If I had a dollar for every time someone showed me a “proof” that maths is broken, delivered with a smug smile, I’d have, well, many dollars.

For the most part, proofs in mathematics are considered to be the ultimate tests of truth: if you can write a proof of a statement, it must be true, and moreover, a good mathematician must back every assertion with a proof on paper. But given enough thought, this correspondence is not at all obvious, and the fact that syntax (sequences of symbols) does mirror semantics (the “truth”, whatever that is) is a remarkable fact to be marvelled at. Better, it is an assertion that itself needs to be proven, mathematically of course. This study of meta-mathematics, of proofs and propositions as objects in themselves, was started off by David Hilbert, and revolves around the question of “*what can be proven?*”

#### Formalising Logic

We start off by moving into a meta-logic and defining our objects of interest. Most of mathematics can be done in what is called a *formal system*, which consists of

- A set of symbols, called the *signature*, further characterised as:
  - Variable-symbols.
  - Function-symbols and predicate-symbols, each of which has an associated natural number called its *arity*.
  - Auxiliary logical symbols.
- A set of *terms*, the elements of which are finite strings of symbols defined inductively:
  - Every variable-symbol is a term.
  - If $f$ is an $n$-ary function-symbol and $t_1, \dots, t_n$ are terms, then $f(t_1, \dots, t_n)$ is a term.
- A set of predicates, or *well-formed formulae* (wfs), defined inductively:
  - If $P$ is an $n$-ary predicate-symbol and $t_1, \dots, t_n$ are terms, then $P(t_1, \dots, t_n)$ is a predicate. Predicates formed this way are called *atomic formulae*.
  - Appropriate application of logical connectives and quantifiers to existing predicates generates new ones.
- A set of closed wfs (those without any free variables), the elements of which are called *axioms* of the system.
- A finite set of relations between wfs, the elements of which are called *rules of inference*.

A formal system sets up an environment for writing proofs, where the first three elements dictate what *can* be written: variables and terms have been defined in a way that, under the right interpretation, they correspond to concrete objects in the universe (or placeholders for them). Predicates, then, correspond to assertions about how these objects relate to each other, and can be true or false based on the semantic truth in a universe.

Elements 4 and 5 describe a framework for writing proofs: sequences of predicates $\varphi_1, \varphi_2, \dots, \varphi_n$ where $\varphi_1$ is formed from conjunctions and disjunctions of axioms, and each pair $(\varphi_i, \varphi_{i+1})$ belongs to some rule of inference. In fact, the rules of inference can be thought of as inducing a directed graph on the set of all predicates (and conjunctions/disjunctions thereof), and then a proof of $\psi$ from $\varphi$ is a path from $\varphi$ to $\psi$ on this graph. The set of points reachable from the axioms should, in that case, ideally be exactly those sentences that are semantically true. Of course, you could have entirely bogus rules of inference and deduce “cheese is bad” (objectively false) from perfectly true premises. The task, then, is to characterise the “good” rules of inference.
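This reachability picture can be sketched in a few lines of code; the string predicates and single-premise rules below are illustrative simplifications of mine, not anything from the text:

```python
# Provability as reachability: theorems are exactly the nodes reachable
# from the axioms along rules of inference. Predicates are plain strings,
# and each rule maps one premise to its consequences (a deliberately
# simplified, single-premise picture).
from collections import deque

def theorems(axioms, rules):
    """Close `axioms` under `rules` by breadth-first search."""
    known, queue = set(axioms), deque(axioms)
    while queue:
        p = queue.popleft()
        for q in rules.get(p, ()):
            if q not in known:
                known.add(q)
                queue.append(q)
    return known

rules = {"A": ["B"], "B": ["C"], "D": ["C"]}   # "D" is never reached
print(sorted(theorems({"A"}, rules)))          # ['A', 'B', 'C']
```

A “bogus” rule is then just an edge from a true node to a false one; nothing in the graph structure itself forbids it.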

**Notation:** If the rule of inference $R$ relates two wfs $\varphi$ and $\psi$, it is customary to write $\dfrac{\varphi}{\psi}\,R$, or simply $\dfrac{\varphi}{\psi}$ when it is clear what the rule of inference is. The wf on the top is called the “premise”, while the one below is called the “consequence”, and we read it as “infer $\psi$ from $\varphi$ using the rule $R$”. Since axioms are typically not inferred from anything, they are written with an empty premise as $\dfrac{\phantom{\varphi}}{\varphi}$.

#### Propositional Logic

A classic example of a formal system is propositional calculus, where there are no function or predicate symbols, and the logical symbols are $\neg$ and $\to$. The axioms are the families of wfs

$$A \to (B \to A)$$

$$(A \to (B \to C)) \to ((A \to B) \to (A \to C))$$

$$(\neg B \to \neg A) \to (A \to B)$$

where $A, B, C$ can be any wfs. The rule of inference is Modus Ponens: from $A$ and $A \to B$, infer $B$.

Then, for instance, a proof of $A \to A$ looks like

1. $A \to ((A \to A) \to A)$ (Axiom 1)
2. $(A \to ((A \to A) \to A)) \to ((A \to (A \to A)) \to (A \to A))$ (Axiom 2)
3. $(A \to (A \to A)) \to (A \to A)$ (Modus Ponens, 1 and 2)
4. $A \to (A \to A)$ (Axiom 1)
5. $A \to A$ (Modus Ponens, 3 and 4)
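As a sanity check, here is a toy proof-checker for such a Hilbert-style calculus: it verifies that every line of a candidate proof is an instance of one of the first two axiom schemas or follows from earlier lines by Modus Ponens. The tuple encoding of wfs is my own assumption; the proof being checked is the classic five-step derivation of $A \to A$:

```python
# Formulas are nested tuples: ("->", p, q) for implication; atoms are strings.
def imp(p, q):
    return ("->", p, q)

def is_imp(f):
    return isinstance(f, tuple) and f[0] == "->"

def is_ax1(f):
    # instance of  B -> (C -> B)
    return is_imp(f) and is_imp(f[2]) and f[1] == f[2][2]

def is_ax2(f):
    # instance of  (B -> (C -> D)) -> ((B -> C) -> (B -> D))
    if not (is_imp(f) and is_imp(f[1]) and is_imp(f[1][2]) and is_imp(f[2])):
        return False
    left, right = f[1], f[2]
    b, c, d = left[1], left[2][1], left[2][2]
    return (is_imp(right[1]) and is_imp(right[2])
            and right[1] == imp(b, c) and right[2] == imp(b, d))

def check(proof):
    """Each line must be an axiom instance or Modus Ponens on earlier lines."""
    proven = []
    for f in proof:
        ok = is_ax1(f) or is_ax2(f) or any(
            is_imp(g) and g[1] in proven and g[2] == f for g in proven)
        if not ok:
            return False
        proven.append(f)
    return True

A = "A"
proof = [
    imp(A, imp(imp(A, A), A)),                                # Axiom 1
    imp(imp(A, imp(imp(A, A), A)),
        imp(imp(A, imp(A, A)), imp(A, A))),                   # Axiom 2
    imp(imp(A, imp(A, A)), imp(A, A)),                        # MP, 1 and 2
    imp(A, imp(A, A)),                                        # Axiom 1
    imp(A, A),                                                # MP, 3 and 4
]
print(check(proof))  # True
```

Only the first two axiom schemas are needed for this particular derivation, so the checker omits the third.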

#### Proofs as trees

It is worth emphasising that $\varphi \vdash \psi$ and $\varphi \to \psi$ are genuinely different: the former says it is possible to deduce $\psi$ from $\varphi$ in the *meta*-language, while the latter is a proposition saying $\varphi$ implies $\psi$ in propositional calculus. The difference is subtle but important nonetheless, for consistency essentially amounts to agreement between the two.

In a similar light, the connective $\land$ is an artefact of the meta-language, and to simplify notation we choose to ignore the connective and instead write $\dfrac{\varphi \quad \psi}{\chi}$ for the inference $\dfrac{\varphi \land \psi}{\chi}$. Adopting this convention, we tweak our notion of a proof ladder to get what is called a *proof tree:* for a wf $\varphi$, the proof tree for $\varphi$ is a graph in which the nodes are assigned predicates (multiple nodes might be given the same predicate), and every node can be assigned a ‘level’ (an ordinal) such that:

- Level $0$ has a single node corresponding to $\varphi$, called the *terminal* wf.
- There is an edge between two nodes only if they differ by exactly one level.
- Each node at level $n + 1$ is connected to exactly one node at level $n$.
- Each node at level $n$ is connected to countably many (might be $0$) nodes at level $n + 1$, called its predecessors. Nodes with no predecessors are termed *initial nodes*.
- There is an assignment of ordinals such that the ordinal of every node is strictly greater than those of its predecessors.
- Each initial node corresponds to an axiom.
- The node with predicate $\psi$ has predecessors corresponding to $\psi_1, \psi_2, \dots$ only if $\dfrac{\psi_1 \quad \psi_2 \quad \cdots}{\psi}$ is a rule of inference.

The assignment of ordinals in rule 5 is to prevent infinitely long proofs. Write $\vdash \varphi$ if the wf $\varphi$ has a proof in this fashion. These rules may be changed slightly to accommodate individual systems.

Rules of inference, then, cease to be simple binary relations, and the whole structure becomes a bit more complicated than a directed graph. However, this is all syntactic sugar, and ultimately it is an abstract rewrite system as described at the start.

#### Consistency and Explosion

The rules of inference can be thought of as a way of propagating “truth-values”, or let’s say colour. Colour the axioms and their conjunctions white, and then use the rules of inference to assign the colour white to $\psi$ whenever $\varphi$ is white and $\dfrac{\varphi}{\psi}$ is a rule of inference. Then the white nodes are exactly the statements that can be proven, and for the proof system to be good, these had better be semantically true (in a universe where the axioms hold). This is the idea of *consistency*: there should be no proof of a false statement. In most systems, this corresponds to not having proofs of both a sentence and its negation.

A formal system is *inconsistent* if there is a wf $\varphi$ such that $\vdash \varphi$ and $\vdash \neg\varphi$, i.e. some statement and its negation are both provable. Inconsistent systems prove too many things; in fact, inconsistent systems prove *everything*: given any wf $\psi$, here is a proof of it from $\varphi$ and $\neg\varphi$ in any system with Modus Ponens:

1. $\neg\varphi \to (\neg\psi \to \neg\varphi)$ (Axiom 1)
2. $\neg\psi \to \neg\varphi$ (Modus Ponens with $\neg\varphi$)
3. $(\neg\psi \to \neg\varphi) \to (\varphi \to \psi)$ (Axiom 3)
4. $\varphi \to \psi$ (Modus Ponens, 2 and 3)
5. $\psi$ (Modus Ponens with $\varphi$)

This is the *principle of explosion:* if the system is inconsistent, every predicate and its negation are theorems, and the exercise of writing proofs becomes meaningless. A *consistent* system, by contrast, must leave something unprovable, so showing consistency of any extension of propositional logic amounts to finding some unprovable predicate.

#### Peano Arithmetic

We describe the formal system *PA* of first-order Peano arithmetic: start with the single predicate-symbol $=$, a constant-symbol $0$, a unary function-symbol $S$ and two binary function-symbols $+$ and $\cdot$. For convenience, write $s = t$ for $=(s, t)$, $s + t$ for $+(s, t)$, and $s \cdot t$ for $\cdot(s, t)$. There is the single quantifier $\forall$, and logical connectives $\neg$ and $\to$. The axioms and rules of inference are

- (Logic 1) $A \to (B \to A)$
- (Logic 2) $(A \to (B \to C)) \to ((A \to B) \to (A \to C))$
- (Logic 3) $(\neg B \to \neg A) \to (A \to B)$
- (Quantifier 1) $\forall x\, A(x) \to A(t)$, for any term $t$ free for $x$ in $A$
- (Quantifier 2) $\forall x\,(A \to B) \to (A \to \forall x\, B)$, where $x$ is not free in $A$
- (Equality 1) $s = t \to (s = u \to t = u)$
- (Equality 2) $s = t \to Ss = St$
- (Equality 3) $\neg(Ss = 0)$
- (Equality 4) $Ss = St \to s = t$
- (Add 1) $s + 0 = s$
- (Add 2) $s + St = S(s + t)$
- (Mult 1) $s \cdot 0 = 0$
- (Mult 2) $s \cdot St = s \cdot t + s$
- (Induction) $A(0) \to (\forall x\,(A(x) \to A(Sx)) \to \forall x\, A(x))$

together with the rules of Modus Ponens (from $A$ and $A \to B$, infer $B$) and Generalization (from $A$, infer $\forall x\, A$).

This system of fourteen axioms and two rules of inference is pretty powerful: it is possible to deduce important results about natural numbers, rationals (even real and complex numbers!) from these axioms. But can one deduce too much from it? Our discussion on explosion shows this is equivalent to asking if $0 = S0$ is a theorem. Clearly it isn’t an axiom, and it is not possible to deduce it from the rule of Generalization. So the only way to deduce $0 = S0$ is from some application of Modus Ponens; but how many cases will this break into? The problem is that Modus Ponens is essentially the only rule of inference in the entire system, hence way too general. Without it you can prove nothing, and with it you can prove, well, maybe even $0 = S0$?

Let us define a much stronger system *PA**: the signature is the same as that of *PA*, and the logical connectives are $\neg$ and $\lor$ (along with the quantifier $\forall$). If we choose to use $A \to B$ as short-hand for $\neg A \lor B$, then every wf of *PA* can be read as a wf of *PA**. We now construct the axioms of *PA**: take the set of all closed terms (terms without any variables), under the rewrite rules

$$s + 0 \rightsquigarrow s, \qquad s + St \rightsquigarrow S(s + t), \qquad s \cdot 0 \rightsquigarrow 0, \qquad s \cdot St \rightsquigarrow s \cdot t + s.$$

These define an abstract rewriting system. Since closed terms are finite strings of symbols, the length of a term is a well-defined function of closed terms. Define two functions $\rho$ and $\sigma$ from the set of closed terms to $\mathbb{N}$: intuitively, $\rho$ is the total length of the arguments of $\cdot$, and $\sigma$ is the total length of the arguments of $+$. It is not too difficult to see that if the term $t$ reduces to $t'$, then the pair assigned to $t'$ is lexicographically smaller than that assigned to $t$. Thus we have a monotone embedding from closed terms (ordered by reduction) to $\mathbb{N} \times \mathbb{N}$ ordered lexicographically. Since the latter is a well-order, the abstract rewriting system defined on the set of closed terms must be terminating. Moreover, if $\rho$ or $\sigma$ is non-zero then the term must be reducible by one of the rules, so the normal (irreducible) terms must have $\rho = \sigma = 0$. Thus every irreducible term is of the form $S(S(\cdots S(0)\cdots))$. The set of irreducibles is hence $\{0, S0, SS0, \dots\}$, called the set of *naturals*. To see the rewrite system is confluent, it suffices to embed it in another confluent system. Indeed, it is possible to encode the entire system in $\lambda$-calculus, and since the latter is confluent (thanks to the work of Church and Rosser), the rules of arithmetic are confluent too. Thus every closed term has a unique normal form, which we call its evaluation.
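The closed-term rewrite system can be prototyped directly. In the sketch below the tuple encoding is my own assumption, and the four rules are the standard recursion equations for $+$ and $\cdot$, assumed to match the ones in the text:

```python
# Closed terms: "0", ("S", t), ("+", s, t), ("*", s, t).
def step(t):
    """Apply one rewrite rule somewhere in t, or return None if t is normal."""
    if t == "0":
        return None
    op = t[0]
    if op == "S":
        s = step(t[1])
        return None if s is None else ("S", s)
    a, b = t[1], t[2]
    if op == "+":
        if b == "0":            # s + 0  ->  s
            return a
        if b[0] == "S":         # s + S(t)  ->  S(s + t)
            return ("S", ("+", a, b[1]))
    if op == "*":
        if b == "0":            # s * 0  ->  0
            return "0"
        if b[0] == "S":         # s * S(t)  ->  s * t + s
            return ("+", ("*", a, b[1]), a)
    # otherwise rewrite inside an argument
    s = step(b)
    if s is not None:
        return (op, a, s)
    s = step(a)
    return None if s is None else (op, s, b)

def evaluate(t):
    """Rewrite until a normal form (a numeral) is reached; the monotone
    measure described in the text guarantees this terminates."""
    while (s := step(t)) is not None:
        t = s
    return t

def numeral(n):
    return "0" if n == 0 else ("S", numeral(n - 1))

def value(t):
    n = 0
    while t != "0":
        t, n = t[1], n + 1
    return n

two, three = numeral(2), numeral(3)
print(value(evaluate(("+", two, ("*", two, three)))))  # 2 + 2*3 = 8
```

Every run reaches a numeral, whatever order the rules are applied in, which is exactly the termination-plus-confluence claim above in miniature.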

For closed terms $s, t$, we call the wf $s = t$ *correct* if the evaluations of $s$ and $t$ coincide. Call $s = t$ *incorrect* if they evaluate to unequal values. For the set of axioms of *PA**, we take every correct wf and the negation of every incorrect wf. Since the terms can be evaluated in finite time, *PA** is an axiomatic system.
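Deciding whether a closed equation (or its negation) is an axiom of *PA** is then a finite computation. A minimal sketch, with the same assumed tuple encoding and a direct recursive evaluator in place of step-by-step rewriting:

```python
# Closed terms: "0", ("S", t), ("+", s, t), ("*", s, t).
def evalt(t):
    """Evaluate a closed term to a natural number."""
    if t == "0":
        return 0
    if t[0] == "S":
        return evalt(t[1]) + 1
    return evalt(t[1]) + evalt(t[2]) if t[0] == "+" else evalt(t[1]) * evalt(t[2])

def is_axiom(wf):
    """s = t is an axiom iff correct; not(s = t) is an axiom iff incorrect."""
    if wf[0] == "=":
        return evalt(wf[1]) == evalt(wf[2])
    if wf[0] == "not" and wf[1][0] == "=":
        return evalt(wf[1][1]) != evalt(wf[1][2])
    return False

print(is_axiom(("=", ("+", "0", ("S", "0")), ("S", "0"))))   # 0 + 1 = 1: True
print(is_axiom(("=", "0", ("S", "0"))))                      # 0 = 1: False
```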

Define the rules of inference as follows:

The side wfs written alongside the principal ones in the rules are called *peripheral predicates*, and can be empty (except in De Morgan and Cut, for obvious reasons). Anything that is not a peripheral wf is called a *principal predicate*.

#### Eliminating Cut

The last rule, Cut, will prove to be of central importance when proving consistency of *PA*: the principle of explosion deduces any statement from a wf and its negation by using Cut and Dilution:

An observation that should be attributed to Gentzen is that Cut is essential to proving any false statement, and our strategy for proving consistency is hence Cut-elimination. Accordingly, we define some terminology and rules pertaining to it.

- The principal wf of a Cut is called the *cut wf*.
- The number of propositional connectives and quantifiers in the cut wf, plus $1$, is called the *degree of the cut*. (The $1$ is added to ensure every cut has a non-zero degree.)
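The degree bookkeeping is easy to make concrete; the tuple encoding of wfs below is an assumption of this sketch, not the text’s:

```python
# Wfs: atoms ("=", s, t); ("not", f); ("or", f, g); ("all", x, f).
def connectives(f):
    """Count propositional connectives and quantifiers in a wf."""
    if not isinstance(f, tuple) or f[0] == "=":
        return 0  # atomic formulae contribute nothing
    if f[0] == "not":
        return 1 + connectives(f[1])
    if f[0] == "or":
        return 1 + connectives(f[1]) + connectives(f[2])
    if f[0] == "all":
        return 1 + connectives(f[2])
    raise ValueError("unknown connective")

def degree(f):
    return connectives(f) + 1  # the +1 keeps every cut's degree non-zero

atom = ("=", "0", "0")
print(degree(atom))                          # 1
print(degree(("or", ("not", atom), atom)))   # 3
```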

We also make some changes to the definition of the proof-tree to accommodate the Weak rules and Cut. In particular, we demand that there is a maximal degree among the Cuts that appear, and call it the *degree of the proof*; if there are no Cuts then the proof has degree $0$. We slightly change the assignment of ordinals: a node is assigned an ordinal equal to its predecessor if it follows from a Weak rule of inference, and an ordinal strictly greater than all of its predecessors if it follows from a Strong rule or Cut. A *thread* is any sequence of nodes $\varphi_0, \varphi_1, \dots$ starting at the terminal node (level $0$) such that $\varphi_{i+1}$ is a predecessor of $\varphi_i$. Then the sequence of ordinals corresponding to the thread is non-increasing, and since the ordinals are well-founded there can only be finitely many applications of Strong rules or Cut in a thread. As for the Weak rules, Consolidation can only be applied finitely many times, while Exchange splits the wfs into equivalence classes of finite size. Hence each thread can be assumed to be only finitely long. The ordinal assigned to the terminal wf is a good measure of the ‘size’ of the proof, and is called the *ordinal of the proof.*

Define a *subformula* of a wf $\varphi$ inductively:

- $\varphi$ is a subformula of $\varphi$.
- If $\neg\psi$ is a subformula of $\varphi$ then so is $\psi$.
- If $\psi \lor \chi$ is a subformula of $\varphi$, then so are $\psi$ and $\chi$.
- If $\psi(x)$ is a wf with free variable $x$, and $\forall x\, \psi(x)$ is a subformula of $\varphi$, then $\psi(t)$ is also a subformula of $\varphi$ for any term $t$ free for $x$ in $\psi$.
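The first three clauses can be sketched directly; the quantifier clause requires substitution machinery, so this toy version (with an assumed tuple encoding) omits it:

```python
# Propositional fragment only: atoms are strings, ("not", f), ("or", f, g).
def subformulae(f):
    out = {f}                                 # f is a subformula of itself
    if isinstance(f, tuple):
        if f[0] == "not":
            out |= subformulae(f[1])          # if not-p is one, so is p
        elif f[0] == "or":
            out |= subformulae(f[1]) | subformulae(f[2])
    return out

print(len(subformulae(("or", ("not", "A"), "B"))))  # 4
```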

Given any node of the proof-tree with subformula $\psi$, we look at all the places where $\psi$ is a subformula of the nodes. Starting at that node and working upwards, this process continues through all Weak rules and Strong rules where $\psi$ lies in a peripheral predicate. Since each thread is finite, $\psi$ must be ‘introduced’ to the proof as an axiom or as the principal predicate of some rule. The set of all occurrences of $\psi$ thus obtained is called the *history* of $\psi$. Before proceeding, we prove a small result:

**Lemma** (Excluded middle) $\neg A \lor A$ is a theorem for all closed wfs $A$.

**Proof** Prove by induction on the number of connectives and quantifiers in $A$: if $A$ has no quantifiers/connectives, then $A$ is of the form $s = t$ for closed terms $s, t$. Then either $s = t$ or $\neg(s = t)$ is an axiom, hence $\neg(s = t) \lor (s = t)$ follows by Dilution. Suppose the claim is true up to $k$ quantifiers and connectives. If $A$ has $k + 1$ of them, then $A$ must be of the form

- (i) $B \lor C$: Since $B$ and $C$ must each have at most $k$ connectives/quantifiers, $\neg B \lor B$ and $\neg C \lor C$ are both theorems. Then we obtain a proof of $\neg(B \lor C) \lor (B \lor C)$ using De Morgan and Dilution.
- (ii) $\forall x\, B(x)$: Each $B(\bar{n})$ has at most $k$ quantifiers and connectives, hence $\neg B(\bar{n}) \lor B(\bar{n})$ is a theorem for every natural number $n$. Then Quantification and Infinite induction give a proof of $\neg\forall x\, B(x) \lor \forall x\, B(x)$.
- (iii) $\neg B$: $B$ has $k$ quantifiers and connectives, hence $\neg B \lor B$ is a theorem. Then, using the rule of Negation, $\neg\neg B \lor \neg B$ is also a theorem.

Observe that no Cut was used at any point. The result allows us to assume $\neg A \lor A$ always has a proof, irrespective of what $A$ is, and this will be immensely useful in proving the results about to follow. Let us use it to show that *PA** is actually stronger than *PA*, as claimed:

**Theorem** Every closed wf that is a theorem of *PA* is also a theorem of *PA**.

**Proof** Every theorem of *PA* must be either an axiom or deducible from Modus Ponens/Generalization.

Let us prove all axioms of *PA* are theorems of *PA**:

(Logic 1) $A \to (B \to A)$, i.e. $\neg A \lor (\neg B \lor A)$

Since $\neg A \lor A$ is a theorem, $\neg A \lor (\neg B \lor A)$ immediately follows from Dilution.

(Logic 2) $(A \to (B \to C)) \to ((A \to B) \to (A \to C))$, i.e. $\neg(\neg A \lor (\neg B \lor C)) \lor (\neg(\neg A \lor B) \lor (\neg A \lor C))$

(Logic 3) $(\neg B \to \neg A) \to (A \to B)$, i.e. $\neg(\neg\neg B \lor \neg A) \lor (\neg A \lor B)$

(Quantifier 1) $\forall x\, A(x) \to A(t)$, i.e. $\neg\forall x\, A(x) \lor A(t)$

(Quantifier 2) $\forall x\,(A \to B) \to (A \to \forall x\, B)$, i.e. $\neg\forall x\,(\neg A \lor B) \lor (\neg A \lor \forall x\, B)$

Write $A(t)$ for the predicate generated by substituting the term $t$ for every free occurrence of $x$ in $A$. Remember that $A$ has no free occurrence of $x$ in the axiom, so in particular $A(t)$ is just $A$.

(Equality 1) $s = t \to (s = u \to t = u)$, i.e. $\neg(s = t) \lor (\neg(s = u) \lor (t = u))$.

If $t = u$ is an axiom, then done. Else, $t$ and $u$ have different evaluations. The evaluation of $s$ can’t simultaneously be equal to the evaluations of both, so $\neg(s = t)$ or $\neg(s = u)$ is an axiom. Use Dilution.

(Equality 2-4, Add/Mult 1-4) The remaining axioms of Equality, Addition and Multiplication follow immediately from rules of evaluating closed terms.

(Induction) $A(0) \to (\forall x\,(A(x) \to A(Sx)) \to \forall x\, A(x))$, i.e.

$$\neg A(0) \lor \big(\neg \forall x\,(\neg A(x) \lor A(Sx)) \lor \forall x\, A(x)\big)$$

Write $B$ for $\neg A(0) \lor \neg \forall x\,(\neg A(x) \lor A(Sx))$.

*Claim:* $B \lor A(\bar{n})$ is a theorem for all naturals $n$.

Prove by induction: clearly true for $n = 0$. If true for $n$, then writing $C$ for the peripheral wf $B$, we have

Then we can use Quantification and Consolidation to deduce that the claim is true for all $n$. Since each $B \lor A(\bar{n})$ is a theorem, we deduce from the rule of Infinite induction that $B \lor \forall x\, A(x)$, i.e. the Induction axiom, is a theorem.

Hence all axioms of *PA* are provable in *PA**. If $A$ and $A \to B$ (i.e. $\neg A \lor B$) are theorems in *PA**, then $B$ is also a theorem, proved by an application of Cut. Similarly, if $A$ is a theorem, then looking at the proof tree of $A$ and replacing every instance of $x$ in its history by $\bar{n}$, we see that $A(\bar{n})$ is a theorem for every $n$, hence $\forall x\, A$ is a theorem by Infinite induction. Thus any theorem of *PA* deduced using Modus Ponens or Generalization is also provable in *PA**.

Consequently, if *PA** is consistent, then so is *PA*. This is nice because we have a lot more to work with when proving the consistency of *PA**: suppose $0 = S0$ is a theorem in *PA**. Then we can look at the proof tree, in particular the history of $0 = S0$. Since every thread is finite and $0 = S0$ is not an axiom, it must be introduced in the proof via a Dilution, hence must be paired with the principal predicate of the Dilution. This means it occurs in the form $B \lor (0 = S0)$ for some theorem $B$. But the terminal node of the proof is $0 = S0$ alone, so there must be some step where $0 = S0$ is unpaired from $B$. Inspecting the rules of inference, the only one which allows such an unpairing is Cut. Thus if we can show that every theorem in *PA** has a Cut-free proof, we can immediately conclude that the system is consistent. The next result, due to Schütte, does precisely that:

**Theorem** Given a proof of a theorem $A$ of positive degree, there is another proof of $A$ of lower degree.

**Proof** We proceed by transfinite induction on the ordinal of the proof: if the ordinal of the proof is $0$, it contains no Cuts and so has degree $0$.

Suppose we have shown that all proofs with smaller ordinal can be rewritten with reduced degree. Given a proof of $A$ of ordinal $\alpha$, start from the terminal node and, moving upwards, locate the first application of a non-Weak rule, i.e. a Strong rule or Cut. If it is a Strong rule, its premises have a lower ordinal, so by the induction hypothesis their proofs can be rewritten with a lower degree. This must reduce the maximal degree of Cuts in the proof, so we have a new proof with lower degree. The remaining case is if the first applied non-Weak rule is a Cut (call it the last Cut):

$$\frac{\Gamma \lor C \qquad \Delta \lor \neg C}{\Gamma \lor \Delta}$$

By the induction hypothesis, the proofs of $\Gamma \lor C$ and $\Delta \lor \neg C$ in the proof-tree can be replaced with proofs of lower degree. If we can replace the last Cut with an inference of lower degree, we will have reduced the maximal degree of all Cuts in the proof and we will be done. Let us look at what the cut wf $C$ could be:

Case 1: $C$ is an atomic formula: either $C$ or $\neg C$ is an axiom.

Wlog $\neg C$ is the axiom. Then, looking at the history of $C$ in the proof of $\Gamma \lor C$, we see that it must be introduced via a Dilution. Then simply ignoring all occurrences of $C$ in the proof of $\Gamma \lor C$, we have a proof of $\Gamma$. Diluting this we get a proof of $\Gamma \lor \Delta$. Thus we have eliminated the last Cut, and any other Cuts occurring must be in the proofs of the premises, hence have lower degree.

Case 2: $C$ is of the form $\neg B$:

Rewrite the cut as follows:

The proof of $\neg B \lor B$ supplied by the Lemma was Cut-free, and both the newly introduced cuts have degree lower than that of the last Cut.

Case 3: $C$ is of the form $\forall x\, B(x)$:

Then $\Gamma \lor \forall x\, B(x)$ and $\Delta \lor \neg\forall x\, B(x)$ are the premises of the Cut. We look at the history of the subformula $\forall x\, B(x)$ in the proof of the first premise. Since it cannot be an axiom, it must be introduced to the proof via a Dilution or Infinite induction. If the introduction was a Dilution, then simply erasing all occurrences of $\forall x\, B(x)$ gives a proof of $\Gamma$, and diluting it allows us to deduce $\Gamma \lor \Delta$ without employing the last Cut at all. Similarly, tracing the history of $\neg\forall x\, B(x)$ in the proof of the second premise, if the introduction was a Dilution then we are done. If neither of the two was introduced via a Dilution, then $\forall x\, B(x)$ must be introduced via Infinite induction and $\neg\forall x\, B(x)$ via Quantification. In particular, there must be inferences with premises $B(\bar{0}), B(\bar{1}), \dots$ (alongside peripheral wfs) and with premise $\neg B(t)$ for some closed term $t$ (which evaluates to some natural $n$). Safely replace all occurrences of $\forall x\, B(x)$ below the Infinite induction with $B(\bar{n})$: in all subsequent deductions that lead to $\Gamma \lor \forall x\, B(x)$, the predicate $\forall x\, B(x)$ must be peripheral, so we can replace all of its occurrences by $B(\bar{n})$ to get a valid proof of $\Gamma \lor B(\bar{n})$. Likewise we have a proof of $\Delta \lor \neg B(\bar{n})$ from the second premise. By the induction hypothesis, all these proofs have lower degree. Now replace the last Cut as follows:

$$\frac{\Gamma \lor B(\bar{n}) \qquad \Delta \lor \neg B(\bar{n})}{\Gamma \lor \Delta}$$

Then we have replaced the last Cut with a Cut of smaller degree.

Case 4: $C$ is of the form $B \lor D$:

Then $\Delta \lor \neg(B \lor D)$ is a premise of the cut. Trace the history of $\neg(B \lor D)$: if it was introduced by a Dilution then done, as usual. If not, then it must be introduced via De Morgan, via an inference whose premises involve $\neg B$ and $\neg D$, after which we have a proof of $\Delta \lor \neg(B \lor D)$. Replacing all occurrences of $\neg(B \lor D)$ in this proof with $\neg B$ gives us a proof of $\Delta \lor \neg B$, and likewise for $\neg D$. By hypothesis, all the above proofs have degree smaller than that of the last Cut. Then replace the last Cut in the proof as:

The number of Cuts has increased, but the degree of the proof has reduced.

This concludes the proof: every proof can be rewritten with Cuts of lower degree.

What we have described is a new, interesting abstract rewriting system in which the objects are proofs and the reduction is as described. Since the degree of the proof is a decreasing quantity and every proof of positive degree can be reduced, the rewriting system is terminating, and so every proof can be reduced to a proof of degree 0, i.e. a Cut-free proof. In other words, any statement that can be proven in *PA** can be proven without Cuts, hence cannot be the false wf $0 = S0$. We have thus established the consistency of *PA**, and hence of *PA*.

Notes:

- I have been lazy with writing out proofs properly; in particular, the rules of inference are sensitive to the order of the wfs, so some of the inferences drawn above are, strictly speaking, not rules of inference. They are, however, easily seen to be simply Dilution composed with Exchange, and thus throughout I have wilfully turned a blind eye towards the Weak rules. Hence everything written in the proofs is correct modulo Weak rules (and gaps in my understanding).
- I have also not written what rule is being used at each stage of the proof, and it is now that I will employ the supreme power of saying that the task of figuring it out has been left as an exercise to the reader.