## Automated Reasoning

[mackenzie-proof] traces the history of computerized theorem proving. The early experiments were mainly concerned with completely automatic proofs of theorems, and work can broadly be divided into two tracks. Research in the Artificial Intelligence tradition, exemplified by [newell-simon], tried to emulate the way mathematicians actually think. Other more successful efforts, e.g. by [gilmore-pm] and by [wang-mm], simply relied on the speed of the computer, using exhaustive search for a proof in certain formal systems for (usually) first order logic. Of course even here the search wasn't completely blind: at the very least the formal systems were chosen to limit the search space. For example, the pioneering work of [prawitz-pp] was based on a tableau presentation of Gentzen's cut-free sequent systems.20

The next step forward, discussed by [prawitz-ipp] and Kanger,21 was to use free or 'meta' variables, discovering the right instantiations gradually during search. Subsequent implementations of tableaux and related calculi such as model elimination [loveland-me1] invariably used some such technique. These methods are 'global', in that the instantiations need to be applied throughout the partial proof tree. It's therefore not surprising that the modern incarnations rely heavily on Prolog technology; see the work of [stickel-pttp] or [beckert-posegga] for example. Another stream of work approaches the problem bottom-up rather than top-down, i.e. the idea is to start from the axiom level of the proof tree, generate an ever-expanding set of consequences, and aim eventually to find the goal among them. The 'inverse method' of [maslov-inverse] was presented in just this way, as a bottom-up approach to proof search in cut-free sequent calculus. These 'local' methods sacrifice goal-directedness but make instantiation cheaper and, since all variables in intermediate results are effectively universal, they allow the garnering of lemmas which can then be re-used with different instantiations; moreover, strategies such as subsumption and simplification can be applied to limit the explosion of theorems.

The most influential bottom-up method was resolution, invented by [robinson-resolution]. This relied on unification; Robinson seems to have been the first to present a unification algorithm explicitly and prove that it gives a most general unifier.22 With unification, just a single rule of inference is required, resolution, together with factoring (unifying literals in the same clause and deleting the duplicates).23 The introduction of resolution caused a renaissance of interest in automated theorem proving. Otter, directly descended from this line of research, has recently achieved impressive results, settling open problems in mathematics and producing quite new theorems [anl-new-results]. Tableaux and model elimination are more popular, but this is probably only because they are more straightforward to implement. Resolution-type systems need a certain amount of craftsmanship to deal with the sets of theorems generated in an efficient and directed way, and there are innumerable search techniques and refinements worthy of consideration.

General first order provers tend to be inefficient in restricted areas compared to specialized methods. Numerous algorithms that tackle certain areas of mathematics more efficiently have been devised, e.g. for tautology checking [bryant,stalmarck-patent], linear arithmetic [shostak-presburger] and equational reasoning [knuth-bendix], not to mention the panoply of techniques used by computer algebra systems. The NQTHM prover, developed by [boyer-acl], is a general theorem prover for quantifier-free arithmetic; since the logic is quite simple, powerful proof automation, notably for induction proofs, is feasible. Though the logic is restricted, it is amazingly expressive in the hands of a skilled practitioner, and NQTHM has been used in a wide variety of applications.

Nevertheless the achievements of fully automatic provers soon began to reach a plateau, well below the ability to prove many substantial mathematical theorems unaided. The emphasis moved to the development of 'proof assistants', 'proof checkers' and 'interactive theorem provers', which are closer to our present interests. The idea is not to prove difficult theorems automatically, but simply to assist a human in constructing a formal proof, or at the very least check its correctness. Several pioneering projects for the computer formalization of mathematics appeared in the 1970s. A notable example was the Automath effort led by [debruijn-Automath]. Here, a language for formal mathematics was designed, together with a computer system for checking the correctness of formal texts. Significant parts of mathematics were proof-checked; for example [jutting-thesis] formalized the famous book on the construction of the real number field by [landau]. The history of the project and the lessons derived from it are detailed by [nederpelt-automath]. Though the project was important and influential, the Automath system is hardly used today.

One of Automath's difficulties was that it was ahead of its time technologically, always struggling to make the computer keep pace with the ideas. It was a batch program, and this made the slowness and lack of memory of the machine more critical. Perhaps another factor contributing to its decline was that it used some rather original notational and foundational ideas (the banishment of named bound variables and the exploitation of the Curry-Howard correspondence,24 for example). The Mizar project, which began a little later, is still quite vital today, and one of the reasons may be that, as its originator [trybulec-allc] explicitly stated, it attempted 'not to depart too radically from the usual accepted practices of mathematics.' Though until recently it was not well known among the theorem proving community in the West, Mizar has been used to proof-check a very large body of mathematics, spanning pure set theory, algebra, analysis, topology, category theory and various unusual applications like mathematical puzzles and computer models. Selected Mizar articles are automatically abstracted and printed in human-readable form in the journal Formalized Mathematics.25' (in updated form).} Mizar is also a batch program, but can still proof-check its formal texts with remarkable speed.

There are reasons for dissatisfaction with very low-level proof checking à la Automath, simply because of the tedium involved and the unreadability of the resulting proofs. But fully automatic theorem provers, though they work well in some situations, are often hard to use in practice. Usually they need to be led to the required result via a carefully graded series of lemmas, each of which they can prove automatically. Choosing this series often requires intimate knowledge of the system if one is not to lead it up a blind alley. And if one is interested in the proof, a completely automated prover may not be what is wanted. The ideal seems to be a judiciously chosen combination of automation and interaction; this for example was the aim of the later part of the Semi-Automated Mathematics project [guard-sam]. Though this project produced a number of novel ideas, and is famous for ''SAM's Lemma'', a property of lattices proved automatically by the machine, it died out almost completely, again probably because of being ahead of its time technologically.

Perhaps the best methodology for combining interaction and automation was developed in Milner's Edinburgh LCF project [gordon-lcfbook]. In LCF-like systems, the ML programming language is used to define data types representing logical entities such as types, terms and theorems. A number of ML functions are provided that produce theorems; these implement primitive inference rules of the logic. The use of an abstract type of theorems with these inference rules as its only constructors ensures that theorems can only be produced this way. However the user is free to write arbitrarily complex proof procedures which ultimately decompose to these primitives. So in principle, most automatic proof procedures can be translated into LCF programs, while guaranteeing that even if the program has errors, no false 'theorems' will arise (the program may fail or produce a theorem other than the one intended, of course, but the latter is easy to guard against and both are rare). It seems in practice that this can be done reasonably efficiently; we discuss this issue in more detail later. HOL [gordon-holbook], which uses classical higher order logic rather than the LCF logic, takes this a step further by insisting that all theories be built up using special definitional extension mechanisms. These give a guarantee that consistency is preserved. Such an approach is consonant with the LCF philosophy, since it entails pushing back the burden of consistency proofs or whatever to the beginning, once and for all, such that all extensions, whether of the theory hierarchy or proof mechanisms, are correct per construction.

Some of the systems described here were not primarily developed in order to formalize textbook mathematics. For example, the HOL system was designed expressly for the purpose of verifying computer hardware. As we said above, this is a promising new application area for theorem provers, and many, e.g. NQTHM and HOL, have been applied successfully to it. This is still mathematics of a sort, and often quite a bit of pure mathematics is needed in a supporting role --- see the work of [harrison-fmsd94] for example. But the theorems proved tend to be different; shallower but more technically complex. It may be that proof techniques which work well in pure mathematics are unsuitable for verification, and vice versa. The LCF style offers the most hope here, since its inherent programmability allows multiple proof styles to be based on the same logical core.

John Harrison 96/8/13; HTML by 96/8/16