Computer-checked mathematical proofs
Page last updated 27 Jan 2023
Can we trust a mathematical proof simply because it is accompanied by a claim of ‘checked by computer’?
Several people have indicated to me that there cannot be any error in Gödel’s original proof of incompleteness because there are ‘computer-checked’ proofs of incompleteness - where their authors claim that the proofs have been checked as correct by computer software - the assertion being that there is no possibility that there is any error in the proof. And although anyone who lives in the real world should be deeply suspicious about claims that a complex computer program is perfect, many people seem to set great credence by the claims of these authors.
Three such computer-checked proofs of incompleteness have been analyzed and fundamental errors have been found in each of them - in each case the error renders the proof totally invalid. These errors are documented in three separate papers, for the details see Analysis of Proofs of Incompleteness. I have now added an extension to my paper PDF An Error in a Computer Verified Proof of Incompleteness by Russell O’Connor on a computer proof by Russell O’Connor, showing that the proof, although supposedly fully checked by computer, actually relies on untenable assumptions regarding “representability” that are implicit in O’Connor’s definitions in his proof. The paper PDF An Error in a Computer Verified Proof of Incompleteness by John Harrison shows that Harrison’s code fails to observe elementary mathematical principles regarding the domains of variables. The paper PDF An Error in a Computer Verified Proof of Incompleteness By Natarajan Shankar shows that the proof introduces a statement which includes a fixed non-variable value as one of its values, whereas the value should in fact be a variable value. These are elementary errors that show that the blind reliance on an assumption that simply because there is a computer code there can be no error is completely untenable.
It is worth noting that it has been claimed (Footnote: Wiedijk, F. “Formal Proof - Getting Started”, 2008. Notices of the AMS 55(11), pp 1408-1414. online at PDF Formal Proof - Getting Started.) that these three computer-checked proofs of the incompleteness theorem are among the most impressive computer formalizations of proofs to date. But each one contains a fundamental error. So, the short answer to the question, “Can we trust a mathematical proof simply because it is accompanied by a claim of ‘checked by computer’?” is a resounding “No”.
See also the article on mathematical proofs in general.
Why would anyone accept unconditionally a claim of ‘checked by computer’?
There is a tendency in some quarters to have an illogical belief that if a proof is accompanied by a claim that every step of the proof has been checked by a computer, that the proof must therefore be correct. But while it may be correct that the computer will always carry out its tasks according to its program, it can only check those aspects of a proof that it has been instructed to check. The software that is currently used for checking mathematical proofs is not intelligent, and it will only perform those tasks that it has been instructed to perform.
The failure of the computer proof checking software that was used in the three proofs that were examined indicates that proof checking software is not as advanced as its proponents would have us believe. Among them are Arkoudas (Footnote: Arkoudas, K. & Bringsjord, S. “PDF Computers, justification, and mathematical knowledge”, 2007. Minds and Machines 17(2), pp 185-202.) and Harrison, (Footnote: Harrison, J. PDF “Formal proof - theory and practice”, 2008, Notices of the American Mathematical Society, 55 pp 1395-1406.) who claim that provided one can trust the proof checking software then one can trust any proof that it checks. But the evidence is that we cannot trust the current computer checking software - and so we cannot simply rely on the computer software affirmation of a proof as a clear-cut certainty. To trust such a proof unconditionally is to trust:
- the claim by the creators of the software that the software can always detect any possible error in a proof that it checks, and
- the claim by the author that he has made no error in designing the computer software, and
- that every part of the proof has actually been subjected to a check, and
- that the author of the proof has not made any errors or unstated assumptions in his interpretation of the inputs and the generated result.
Although it should be obvious that a computer program isn’t incapable of being erroneous simply because it is a computer program, there is an irrational belief among some people that we can have a complete and utter trust in a computer program that checks a proof. Why should there be such an irrational belief ?
The uncritical use of irrelevant information in support of computer-checked proofs
At the present time, the status of computer proof checking appears to be elevated by the publication of lists of claims of computer verification of many well-established mathematical theorems. The publication of these lists is presumably intended to demonstrate the effectiveness of computerized proof checking, and to swing opinion in their favor. Presumably the authors of these lists feel that as the lists increase in size, that supports the contention that computer proof checking is past the experimental stage and is now of age.
But the reality is that virtually no-one actually checks these proofs. There is usually little reason for anyone to spend time analyzing a computer version of a proof when they are already familiar with a non-computer proof and they consider that the result is already well established and accepted. Generally speaking, people will have much more pressing work to attend to than examining such proofs in minute detail.
As Bundy (Footnote: Bundy, A. PDF “The Paradox of the Case Study”, 2004.) remarks, to actually produce such a proof for a computer checking system is a very difficult and time consuming process, so it is not surprising that the resultant proofs are themselves difficult and time consuming to examine. The fact remains that the vast majority of computer-checked proofs that have been cited in support of computer proof checking have not subjected to any in-depth critical appraisal. Despite that, there appears to be a growing credence in computer proof checking simply because of the growing list of such ‘proofs’.
The raison d’être of computer proofs
The raison d’être of computer-checked proofs should be the detection and elimination of errors that arise in imprecise informal reasoning. And while the object of the creation of proofs such as proofs of incompleteness is purportedly to demonstrate that the given proof checker is capable of checking difficult proofs, the very difficulty of the process of creation of the proof can lead to a crucial defect in the proof checking process - the bypassing of a difficult step of a proof by the introduction of an informal argument which is not subject to the proof checking process.
Now, it should be quite obvious that there is little reason in using a computer to check only the simpler parts of a proof, and which does not check the most difficult steps in the proof. But at least two of the authors of the proofs examined by me turn this concept on its head.
Shankar informally asserts that a certain proposition A is equivalent to another proposition B. He then subjects this proposition B to a check by the computer. Shankar considers that, since he has a proof of proposition B that has been computer-checked, he can claim that his proposition A is computer-checked. But this completely misses the entire point of the computer checking the proof in the first place. The crucial point is that the computer has not checked the claim of equivalence of proposition A and proposition B, so the proof as a whole is not computer-checked. Of course, that points to the obvious reason why Shankar elected to have his program operate on proposition B rather than proposition A - because there was some difficulty in proving proposition A. Difficult because it was incorrect.
Also, as noted above, I have now added an extension to my paper PDF An Error in a Computer Verified Proof of Incompleteness by Russell O’Connor on a computer proof by Russell O’Connor, showing that the proof, although supposedly fully checked by computer, actually relies on untenable assumptions regarding “representability”. In such a case, it is obviously incorrect to state that the proof has been ‘checked by computer’. This may seem blindingly obvious, but it is surprising how often you will see a proof described as ‘computer-checked’ when at least one detail in the proof has not been checked by computer. Of course, the author of such proofs know that to publish a proof and label it as ‘partially computer-checked’ will attract less attention than a claim of ‘computer-checked’.
The other crucial aspect is that the computer proof checking software should check every possible aspect of every step of the proof. If that is not done, then the full capability of a computer to check a proof has not been utilized; if there are details of the proof that can be checked, then those details should be checked.
Unless every single detail of the proof of a given statement is subject to the computer checking system, then it is simply incorrect to state that the proof has been checked by the computer checking system.
At present, computer proof systems are fairly obscure in their operations to the end observer. The situation regarding computer proof systems is somewhat reminiscent of the early days of computing, when the emphasis, for example in word processing and spreadsheets, was on achieving functionality, with operator and end user convenience very much a secondary consideration. That in itself indicates that perhaps computer proof software is not at such an advanced stage as some people would like us to believe. The goal should be to make the entire computer proof checking system as transparent as possible.
There is a further concern here which should drive the development of such computer programs in that direction. The development of software continues at an ever-increasing pace, and what is standard coding today is obsolescent tomorrow. The dangers here are obvious. In thirty years’ time will we be entirely reliant on assuming that a proof is correct simply because there was a computer proof checking proof created thirty years ago - but which is now so unreadable that no-one is prepared to spend the time to review it?
In summary, despite many claims to the contrary, the science of computer proof checking of mathematical proofs is still in its infancy, and has a long way to go before it can be accepted as a commonplace useful mathematical tool. It is to be hoped that the situation can improve. It is expected that the day will come when computer systems will have sufficient intelligence to be able to examine any mathematical proof with more rigor than any human, but on the current evidence, that day is still quite far away.
How quickly progress can be made in that direction will be dependent to a large extent on the willingness of the creators of computer proof checking systems to admit the failings of the current methodology and to commit to addressing such failings. Judging by the knee-jerk reaction of the authors of the ‘computer-checked’ proofs that contain fundamental errors, it would seem that current proponents of computer checking software prefer to duck their heads into the sand and ignore any findings of a deficiency in either their proofs or the proof checking software. Such a recalcitrant attitude does not bode well for the future of computer proof checking in the short-term - the goal should surely be that any such system does actually check every possible aspect of a proof that it is possible to check.