*[mackenzie-proof]* traces the history of computerized theorem proving. The
early experiments were mainly concerned with completely automatic proofs of
theorems, and work can broadly be divided into two tracks. Research in the
Artificial Intelligence tradition, exemplified by *[newell-simon]*, tried
to emulate the way mathematicians actually think. Other more successful
efforts, e.g. by *[gilmore-pm]* and by *[wang-mm]*, simply relied on
the speed of the computer, using exhaustive search for a proof in certain
formal systems for (usually) first order logic. Of course even here the search
wasn't completely blind: at the very least the formal systems were chosen to
limit the search space. For example, the pioneering work of *[prawitz-pp]*
was based on a tableau presentation of Gentzen's cut-free sequent
systems.^{20}

The next step forward, discussed by *[prawitz-ipp]* and
Kanger,^{21} was to use free or 'meta' variables, discovering the
right instantiations gradually during search. Subsequent implementations of
tableaux and related calculi such as model elimination *[loveland-me1]*
invariably used some such technique. These methods are 'global', in that the
instantiations need to be applied throughout the partial proof tree. It's
therefore not surprising that the modern incarnations rely heavily on Prolog
technology; see the work of *[stickel-pttp]* or *[beckert-posegga]* for
example. Another stream of work approaches the problem bottom-up rather than
top-down, i.e. the idea is to start from the axiom level of the proof tree,
generate an ever-expanding set of consequences, and aim eventually to find the
goal among them. The 'inverse method' of *[maslov-inverse]* was presented
in just this way, as a bottom-up approach to proof search in cut-free sequent
calculus. These 'local' methods sacrifice goal-directedness but make
instantiation cheaper and, since all variables in intermediate results are
effectively universal, they allow the garnering of lemmas which can then be
re-used with different instantiations; moreover, strategies such as subsumption
and simplification can be applied to limit the explosion of theorems.

The most influential bottom-up method was resolution, invented by
*[robinson-resolution]*. This relied on unification; Robinson seems to have
been the first to present a unification algorithm explicitly and prove that it
gives a most general unifier.^{22} With unification, just a single rule of inference, resolution, is required, together with factoring (unifying literals in the same clause and deleting the duplicates).^{23} The introduction of resolution caused a renaissance of interest in
automated theorem proving. Otter, directly descended from this line of
research, has recently achieved impressive results, settling open problems in
mathematics and producing quite new theorems *[anl-new-results]*. Tableaux
and model elimination are more popular, but this is probably only because they
are more straightforward to implement. Resolution-type systems need a certain
amount of craftsmanship to deal with the sets of theorems generated in an
efficient and directed way, and there are innumerable search techniques and
refinements worthy of consideration.
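
To make the role of unification concrete, here is a minimal sketch in OCaml (an ML-family language of the kind discussed later) of syntactic unification of first-order terms, returning a most general unifier in the sense Robinson established. The datatype and function names are purely illustrative and are not drawn from any of the systems mentioned here.

```ocaml
(* Illustrative sketch only: first-order terms and a unification function
   computing a most general unifier, represented as a list of bindings. *)
type term =
  | Var of string
  | Fn of string * term list

(* Apply a substitution, following chains of variable bindings. *)
let rec subst env tm =
  match tm with
  | Var x -> (try subst env (List.assoc x env) with Not_found -> tm)
  | Fn (f, args) -> Fn (f, List.map (subst env) args)

(* The occurs check: does variable x appear in the (already substituted)
   term tm?  This rules out cyclic bindings such as x := f(x). *)
let rec occurs x tm =
  match tm with
  | Var y -> x = y
  | Fn (_, args) -> List.exists (occurs x) args

(* Unify two terms, extending the substitution env; raise Failure if the
   terms are not unifiable.  The result is a most general unifier. *)
let rec unify env (s, t) =
  match subst env s, subst env t with
  | Var x, Var y when x = y -> env
  | Var x, t | t, Var x ->
      if occurs x t then failwith "unify: occurs check" else (x, t) :: env
  | Fn (f, fargs), Fn (g, gargs) ->
      if f = g && List.length fargs = List.length gargs
      then List.fold_left unify env (List.combine fargs gargs)
      else failwith "unify: clash"

(* Example: f(x, g(y)) and f(g(z), x) unify with x := g(z), y := z. *)
let example_mgu =
  unify [] (Fn ("f", [Var "x"; Fn ("g", [Var "y"])]),
            Fn ("f", [Fn ("g", [Var "z"]); Var "x"]))
```

Resolution uses such a unifier to match complementary literals in two clauses before forming the resolvent, and factoring uses it to collapse literals within a single clause.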

General first order provers tend to be inefficient in restricted areas compared
to specialized methods. Numerous algorithms that tackle certain areas of
mathematics more efficiently have been devised, e.g. for tautology checking
*[bryant,stalmarck-patent]*, linear arithmetic *[shostak-presburger]* and
equational reasoning *[knuth-bendix]*, not to mention the panoply of
techniques used by computer algebra systems. NQTHM, developed by
*[boyer-acl]*, is a general theorem prover for quantifier-free arithmetic;
since the logic is quite simple, powerful proof automation, notably for
induction proofs, is feasible. Though the logic is restricted, it is amazingly
expressive in the hands of a skilled practitioner, and NQTHM has been used in a
wide variety of applications.

Nevertheless the achievements of fully automatic provers soon began to reach a
plateau, well below the ability to prove many substantial mathematical theorems
unaided. The emphasis moved to the development of 'proof assistants', 'proof
checkers' and 'interactive theorem provers', which are closer to our present
interests. The idea is not to prove difficult theorems automatically, but
simply to assist a human in constructing a formal proof, or at the very least
check its correctness. Several pioneering projects for the computer
formalization of mathematics appeared in the 1970s. A notable example was the
Automath effort led by *[debruijn-Automath]*. Here, a language for formal
mathematics was designed, together with a computer system for checking the
correctness of formal texts. Significant parts of mathematics were
proof-checked; for example *[jutting-thesis]* formalized the famous book on
the construction of the real number field by *[landau]*. The history of the
project and the lessons derived from it are detailed by
*[nederpelt-automath]*. Though the project was important and influential,
the Automath system is hardly used today.

One of Automath's difficulties was that it was ahead of its time
technologically, always struggling to make the computer keep pace with the
ideas. It was a batch program, and this made the slowness and lack of memory of
the machine more critical. Perhaps another factor contributing to its decline
was that it used some rather original notational and foundational ideas (the
banishment of named bound variables and the exploitation of the Curry-Howard
correspondence,^{24} for example).
The Mizar project, which began a little later, is still quite vital today, and
one of the reasons may be that, as its originator *[trybulec-allc]*
explicitly stated, it attempted 'not to depart too radically from the usual
accepted practices of mathematics.' Though until recently it was not well known
among the theorem proving community in the West, Mizar has been used to
proof-check a very large body of mathematics, spanning pure set theory,
algebra, analysis, topology, category theory and various unusual applications
like mathematical puzzles and computer models. Selected Mizar articles are
automatically abstracted and printed in human-readable form in the journal
*Formalized Mathematics*.^{25} Mizar
is also a batch program, but can still proof-check its formal texts with
remarkable speed.

There are reasons for dissatisfaction with very low-level proof checking
*à la* Automath, simply because of the tedium involved and the unreadability
of the resulting proofs. But fully automatic theorem provers, though they work
well in some situations, are often hard to use in practice. Usually they need
to be led to the required result via a carefully graded series of lemmas, each
of which they can prove automatically. Choosing this series often requires
intimate knowledge of the system if one is not to lead it up a blind alley. And
if one is interested in the proof, a completely automated prover may not be
what is wanted. The ideal seems to be a judiciously chosen combination of
automation and interaction; this for example was the aim of the later part of
the Semi-Automated Mathematics project *[guard-sam]*. Though this project
produced a number of novel ideas, and is famous for "SAM's Lemma", a property
of lattices proved automatically by the machine, it died out almost completely,
again probably because of being ahead of its time technologically.

Perhaps the best methodology for combining interaction and automation was
developed in Milner's Edinburgh LCF project *[gordon-lcfbook]*. In LCF-like
systems, the ML programming language is used to define data types representing
logical entities such as types, terms and theorems. A number of ML functions
are provided that produce theorems; these implement primitive inference rules
of the logic. The use of an abstract type of theorems with these inference
rules as its only constructors ensures that theorems can only be produced this
way. However the user is free to write arbitrarily complex proof procedures
which ultimately decompose to these primitives. So in principle, most automatic
proof procedures can be translated into LCF programs, while guaranteeing that
even if the program has errors, no false 'theorems' will arise (the program may
fail or produce a theorem other than the one intended, of course, but the
latter is easy to guard against and both are rare). It seems in practice that
this can be done reasonably efficiently; we discuss this issue in more detail
later. HOL *[gordon-holbook]*, which uses classical higher order logic
rather than the LCF logic, takes this a step further by insisting that all
theories be built up using special definitional extension mechanisms. These
give a guarantee that consistency is preserved. Such an approach is consonant
with the LCF philosophy, since it entails pushing the burden of consistency
proofs back to the beginning, once and for all, so that all extensions,
whether of the theory hierarchy or proof mechanisms, are correct by
construction.
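
As a concrete illustration of the LCF methodology, here is a deliberately tiny kernel written in OCaml. The logic (implicational propositional formulas with two Hilbert-style axiom schemas and modus ponens) is invented purely for this example and is not the logic of LCF or HOL; the point is only that the abstract type of theorems can be inhabited solely through the primitive rules, so any proof procedure programmed outside the kernel is sound by construction.

```ocaml
(* A toy LCF-style kernel, for illustration only.  Formulas are built from
   atoms and implication; theorems form an abstract type whose only
   constructors are the primitive inference rules below. *)
type form = Atom of string | Imp of form * form

module Kernel : sig
  type thm                           (* abstract outside the kernel        *)
  val concl : thm -> form            (* theorems can be inspected          *)
  val axiom_k : form -> form -> thm  (* |- p --> (q --> p)                 *)
  val axiom_s : form -> form -> form -> thm
                                     (* |- (p-->q-->r) --> (p-->q) --> (p-->r) *)
  val mp : thm -> thm -> thm         (* from |- p --> q and |- p, get |- q *)
end = struct
  type thm = Thm of form
  let concl (Thm p) = p
  let axiom_k p q = Thm (Imp (p, Imp (q, p)))
  let axiom_s p q r =
    Thm (Imp (Imp (p, Imp (q, r)), Imp (Imp (p, q), Imp (p, r))))
  let mp (Thm pq) (Thm p') =
    match pq with
    | Imp (p, q) when p = p' -> Thm q
    | _ -> failwith "mp: theorems do not match"
end

(* A derived rule, written outside the kernel: |- p --> p via S and K.
   However it is programmed, it can only return theorems built by the
   primitives, so a bug here can cause failure but not a false theorem. *)
let imp_refl p =
  let open Kernel in
  mp (mp (axiom_s p (Imp (p, p)) p) (axiom_k p (Imp (p, p)))) (axiom_k p p)
```

A real LCF-style system has a richer term language and more primitive rules, but the same discipline applies: derived rules and decision procedures, however elaborate, ultimately decompose into calls to the kernel.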

Some of the systems described here were not primarily developed in order to
formalize textbook mathematics. For example, the HOL system was designed
expressly for the purpose of verifying computer hardware. As we said above,
this is a promising new application area for theorem provers, and many, e.g.
NQTHM and HOL, have been applied successfully to it. This is still mathematics
of a sort, and often quite a bit of pure mathematics is needed in a supporting
role --- see the work of *[harrison-fmsd94]* for example. But the theorems
proved tend to be different; shallower but more technically complex. It may be
that proof techniques which work well in pure mathematics are unsuitable for
verification, and vice versa. The LCF style offers the most hope here, since
its inherent programmability allows multiple proof styles to be based on the
same logical core.