EMPIRICAL EXPLORATIONS WITH THE LOGIC THEORY MACHINE: A CASE STUDY IN HEURISTICS




by Allen Newell, J. C. Shaw, and H. A. Simon

This is a case study in problem-solving, representing part of a program of research on complex information-processing systems. We have specified a system for finding proofs of theorems in elementary symbolic logic, and by programming a computer to these specifications, have obtained empirical data on the problem-solving process in elementary logic. The program is called the Logic Theory Machine (LT); it was devised to learn how it is possible to solve difficult problems such as proving mathematical theorems, discovering scientific laws from data, playing chess, or understanding the meaning of English prose.

The research reported here is aimed at understanding the complex processes (heuristics) that are effective in problem-solving. Hence, we are not interested in methods that guarantee solutions, but which require vast amounts of computation. Rather, we wish to understand how a mathematician, for example, is able to prove a theorem even though he does not know when he starts how, or if, he is going to succeed.

This research focuses on the pure theory of problem-solving (Newell and Simon, 1956a). Previously we specified in detail a program for the Logic Theory Machine; and we shall repeat here only as much of that specification as is needed so that the reader can understand our data. In a companion study (Newell and Shaw, 1957) we consider how computers can be programmed to execute processes of the kinds called for by LT, a problem that is interesting in its own right. Similarly, we postpone to later papers a discussion of the implications of our work for the psychological theory of human thinking and problem-solving. Other areas of application will readily occur to the reader, but here we will limit our attention to the nature of the problem-solving process itself.

Our research strategy in studying complex systems is to specify them in detail, program them for digital computers, and study their behavior empirically by running them with a number of variations and under a variety of conditions. This appears at present the only adequate means to obtain a thorough understanding of their behavior. Although the problem area with which the present system, LT, deals is fairly elementary, it provides a good example of a difficult problem: logic is a subject taught in college courses, and is difficult enough for most humans.

Our data come from a series of programs run on the JOHNNIAC, one of RAND's high-speed digital computers. We will describe the results of these runs, and analyze and interpret their implications for the problem-solving process.

The Logic Theory Machine in Operation

We shall first give a concrete picture of the Logic Theory Machine in operation. LT, of course, is a program, written for the JOHNNIAC, represented by marks on paper or holes in cards. However, we can think of LT as an actual physical machine and the operation of the program as the behavior of the machine. One can identify LT with JOHNNIAC after the latter has been loaded with the basic program, but before the input of data.

LT's task is to prove theorems in elementary symbolic logic, or more precisely, in the sentential calculus. The sentential calculus is a formalized system of mathematics, consisting of expressions built from combinations of basic symbols. Five of these expressions are taken as axioms, and there are rules of inference for generating new theorems from the axioms and from other theorems. In flavor and form elementary symbolic logic is much like abstract algebra. Normally the variables of the system are interpreted as sentences, and the axioms and rules of inference as formalizations of logical operations, e.g., deduction.
However, LT deals with the system as a purely formal mathematics, and we will have no further need of the interpretation. We need to introduce a smattering of the sentential calculus to understand LT's task.

There is postulated a set of variables p, q, r, ... A, B, C, ... with which the sentential calculus deals. These variables can be combined into expressions by means of connectives. Given any variable p, we can form the expression "not-p." Given any two variables p and q, we can form the expression "p or q," or the expression "p implies q," where "or" and "implies" are the connectives. There are other connectives, for example "and," but we will not need them here. Once we have formed expressions, these can be further combined into more complicated expressions. For example, we can form: [1]

    "(p implies not-p) implies not-p."    (2.01)

There is also given a set of expressions that are axioms. These are taken to be the universally true expressions from which theorems are to be derived by means of various rules of inference. For the sake of definiteness in our work with LT, we have employed the system of axioms, definitions, and rules that is used in the Principia Mathematica, which lists five axioms:

    (p or p) implies p                                    (1.2)
    p implies (q or p)                                    (1.3)
    (p or q) implies (q or p)                             (1.4)
    [p or (q or r)] implies [q or (p or r)]               (1.5)
    (p implies q) implies [(r or p) implies (r or q)].    (1.6)

Given some true theorems one can derive new theorems by means of three rules of inference: substitution, replacement, and detachment.

1. By the rule of substitution, any expression may be substituted for any variable in any theorem, provided the substitution is made throughout the theorem wherever that variable appears. For example, by substitution of "p or q" for "p," in the second axiom we get the new theorem: (p or q) implies [q or (p or q)].

2. By the rule of replacement, a connective can be replaced by its definition, and vice versa, in any of its occurrences. By definition "p implies q" means the same as "not-p or q." Hence the former expression can always be replaced by the latter and vice versa. For example from axiom 1.3, by replacing "implies" with "or," we get the new theorem: not-p or (q or p).

3. By the rule of detachment, if "A" and "A implies B" are theorems, then "B" is a theorem. For example, from: (p or p) implies p, and [(p or p) implies p] implies (p implies p), we get the new theorem: p implies p.

Given an expression to prove, one starts from the set of axioms and theorems already proved, and applies the various rules successively until the desired expression is produced. The proof is the sequence of expressions, each one validly derived from the previous ones, that leads from the axioms and known theorems to the desired expression. This is all the background in symbolic logic needed to observe LT in operation.

[1] For easy reference we have numbered axioms and theorems to correspond to their numbers in Principia Mathematica, 2nd ed., vol. 1, by A. N. Whitehead and B. Russell (New York, 1935).

LT "understands" expressions in symbolic logic; that is, there is a simple code for punching expressions on cards so they can be fed into the machine. We give LT the five axioms, instructing it that these are theorems it can assume to be true. LT already knows the rules of inference and the definitions: how to substitute, replace, and detach. Next we give LT a single expression, say expression 2.01, and ask LT to find a proof for it. LT works for about 10 seconds and then prints out the following proof:

       (p implies not-p) implies not-p        (theorem 2.01, to be proved)
    1. (A or A) implies A                     (axiom 1.2)
    2. (not-A or not-A) implies not-A         (subs. of not-A for A)
    3. (A implies not-A) implies not-A        (repl. of "or" with "implies")
    4. (p implies not-p) implies not-p        (subs. of p for A; QED).

Next we ask LT to prove a fairly advanced theorem (Whitehead and Russell, 1935), theorem 2.45; allowing it to use all 38 theorems proved prior to 2.45. After about 12 minutes, LT produces the following proof:

       not (p or q) implies not-p                                   (theorem 2.45, to be proved)
    1. A implies (A or B)                                           (theorem 2.2)
    2. p implies (p or q)                                           (subs. p for A, q for B in 1)
    3. (A implies B) implies (not-B implies not-A)                  (theorem 2.16)
    4. [p implies (p or q)] implies [not (p or q) implies not-p]    [subs. p for A, (p or q) for B in 3]
    5. not (p or q) implies not-p                                   (detach right side of 4, using 2; QED).

Finally, all the theorems prior to (2.31) are given to LT (a total of 28); and then LT is asked to prove:

    [p or (q or r)] implies [(p or q) or r].    (2.31)

LT works for about 23 minutes and then reports that it cannot prove (2.31), that it has exhausted its resources.

Now, what is there in this behavior of LT that needs to be explained?
The specific examples given are difficult problems for most humans, and most humans do not know what processes they use to find proofs, if they find them. There is no known simple procedure that will produce such proofs. Various methods exist for verifying whether any given expression is true or false; the best known procedure is the method of truth tables. But these procedures do not produce a proof in the meaning of Whitehead and Russell. One can invent "automatic" procedures for producing proofs. We will look at one briefly later, but these turn out to require computing times of the orders of thousands of years for the proof of (2.45). We must clarify why such problems are difficult in the first place, and then show what features of LT account for its successes and failures. These questions will occupy the rest of this study.

Problems, Algorithms, and Heuristics

In describing LT, its environment, and its behavior we will make repeated use of three concepts. The first of these is the concept of problem. Abstractly, a person is given a problem if he is given a set of possible solutions, and a test for verifying whether a given element of this set is in fact a solution to his problem.

The reason why problems are problems is that the original set of possible solutions given to the problem-solver can be very large, the actual solutions can be dispersed very widely and rarely throughout it, and the cost of obtaining each new element and of testing it can be very high. Thus the problem-solver is not really "given" the set of possible solutions; instead he is given some process for generating the elements of that set in some order. This generator has properties of its own, not usually specified in stating the problem; e.g., there is associated with it a certain cost per element produced, it may be possible to change the order in which it produces the elements, and so on. Likewise the verification test has costs and times associated with it. The problem can be solved if these costs are not too large in relation to the time and computing power available for solution.

One very special and valuable property that a generator of solutions sometimes has is a guarantee that if the problem has a solution, the generator will, sooner or later, produce it. We will call a process that has this property for some problem an algorithm for that problem.
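
The view of a problem as a generator plus a verification test can be sketched in a few lines of Python. The function `generate_and_test`, its `budget` parameter, and the toy problem are hypothetical names and choices made for illustration only. Because the generator used here enumerates every whole number in turn, it will sooner or later produce any solution that exists, which makes the process an algorithm in the sense just defined; the budget stands in for the limited time and computing power available.

```python
from itertools import count, islice

def generate_and_test(generator, test, budget):
    """Draw at most `budget` elements from `generator`, testing each one.

    Returns (solution, elements_examined); solution is None if the budget
    is used up before any element passes the test.
    """
    examined = 0
    for candidate in islice(generator, budget):
        examined += 1
        if test(candidate):
            return candidate, examined
    return None, examined

# Hypothetical toy problem: find a whole number whose square is 1369.
solution, cost = generate_and_test(count(0), lambda n: n * n == 1369, budget=1000)

# The same generator and test with too small a budget: no solution is reached.
no_solution, spent = generate_and_test(count(0), lambda n: n * n == 1369, budget=10)
```

The number of elements examined is a crude stand-in for the cost per element produced that the text associates with a generator.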
The guarantee provided by an algorithm is not an unmixed blessing, of course, since nothing has been specified about the cost or time required to produce the solutions. For example, a simple algorithm for opening a combination safe is to try all combinations, testing each one to see if it opens the safe. This algorithm is a typical problem-solving process: there is a generator that produces new combinations in some order, and there is a verifier that determines whether each new combination is in fact a solution to the problem. This search process is an algorithm because it is known that some combination will open the safe, and because the generator will exhaust all combinations in a finite interval of time. The algorithm is sufficiently expensive, however, that a combination safe can be used to protect valuables even from people who know the algorithm.

A process that may solve a given problem, but offers no guarantees of doing so, is called a heuristic [2] for that problem. This lack of a guarantee is not an unmixed evil. The cost inflicted by the lack of guarantee depends on what the process costs and what algorithms are available as alternatives. For most run-of-the-mill problems we have only heuristics, but occasionally we have both algorithms and heuristics as alternatives for solving the same problem. Sometimes, as in the problem of finding maxima for simple differentiable functions, everyone uses the algorithm of setting the first derivative equal to zero; no one sets out to examine all the points on the line one by one as if it were possible. Sometimes, as in chess, everyone plays by heuristic, since no one is able to carry out the algorithm of examining all continuations of the game to termination.

The Problem of Proving Theorems in Logic

Finding a proof for a theorem in symbolic logic can be described as selecting an element from a generated set, as shown by Fig. 1. Consider the set of all possible sequences of logic expressions; call it E. Certain of these sequences, a very small minority, will be proofs. A proof sequence satisfies the following test:

Each expression in the sequence is either
1. One of the accepted theorems or axioms, or
2. Obtainable from one or two previous expressions in the sequence by application of one of the three rules of inference.

Call the set of sequences that are proofs P. Certain of the sequences in E have the expression to be proved (call it X) as their final expression. Call this set of sequences Tx. Then, to find a proof of a given theorem X means to select an element of E that belongs to the intersection of P and Tx. The set E is given implicitly by rules for generating new sequences of logic expressions. The difficulty of proving theorems depends on the scarcity of elements in the intersection of P and Tx, relative to the number of elements in E.
Hence, it depends on the cost and speed of the available generators that produce elements of E, and on the cost and speed of making tests that determine whether an element belongs to Tx or P. The difficulty also depends on whether generators can be found that guarantee that any element they produce automatically satisfies some of the conditions. Finally, as we shall see, the difficulty depends heavily on what heuristics can be found to guide the selection.

[2] As a noun, "heuristic" is rare and generally means the art of discovery. The adjective "heuristic" is defined by Webster as: serving to discover or find out. It is in this sense that it is used in the phrase "heuristic process" or "heuristic method." For conciseness, we will use "heuristic" as a noun synonymous with "heuristic process." No other English word appears to have this meaning.

[Figure 1. Relationships between E, P, and Tx.]

A little reflection, and experience in trying to prove theorems, make it clear that proof sequences for specified theorems are rare indeed. To reveal more precisely why proving theorems is difficult, we will construct an algorithm for doing this. The algorithm will be based only on the tests and definitions given above, and not on any "deep" inferred properties of symbolic logic. Thus it will reflect the basic nature of theorem proving; that is, its nature prior to building up sophisticated proof techniques. We will call this algorithm the British Museum algorithm, in recognition of the supposed originators of procedures of this type.

The British Museum Algorithm

The algorithm constructs all possible proofs in a systematic manner, checking each time (1) to eliminate duplicates, and (2) to see if the final theorem in the proof coincides with the expression to be proved. With this algorithm the set of one-step proofs is identical with the set of axioms (i.e., each axiom is a one-step proof of itself). The set of n-step proofs is obtained from the set of (n - 1)-step proofs by making all the permissible substitutions and replacements in the expressions of the (n - 1)-step proofs, and by making all the permissible detachments of pairs of expressions as permitted by the recursive definition of proof. [3]

Figure 2 shows how the set of n-step proofs increases with n at the very start of the proof-generating process. This enumeration only extends to replacements of "or" with "implies," "implies" with "or," and negation of variables (e.g., "not-p" for "p"). No detachments and no complex substitutions (e.g., "q or r" for "p") are included. No specializations have been made (e.g., substitution of p for q in "p or q").
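
A restricted enumeration of this kind can be sketched in Python as a breadth-first generation with a duplicate check. This is only an illustration under stated assumptions: expressions are encoded as nested tuples, the names `rewrites`, `negate`, and `british_museum` are hypothetical, and the only operations are one replacement of a connective or the substitution of "not-v" for one variable v. The way steps are counted here is our own, so the step numbers it reports need not agree with those of Figure 2.

```python
def rewrites(expr):
    """Yield each expression obtained from `expr` by one replacement."""
    if isinstance(expr, str):
        return
    if expr[0] == "implies":                       # "x implies y" -> "not-x or y"
        yield ("or", ("not", expr[1]), expr[2])
    if expr[0] == "or" and not isinstance(expr[1], str) and expr[1][0] == "not":
        yield ("implies", expr[1][1], expr[2])     # "not-x or y" -> "x implies y"
    for i in range(1, len(expr)):                  # one replacement inside an argument
        for new_arg in rewrites(expr[i]):
            yield expr[:i] + (new_arg,) + expr[i + 1:]

def variables(expr):
    """Return the set of variable names occurring in `expr`."""
    if isinstance(expr, str):
        return {expr}
    return set().union(*(variables(arg) for arg in expr[1:]))

def negate(expr, var):
    """Substitute "not-var" for `var` throughout `expr`."""
    if isinstance(expr, str):
        return ("not", var) if expr == var else expr
    return (expr[0],) + tuple(negate(arg, var) for arg in expr[1:])

def british_museum(axioms, target, max_steps):
    """Return the step at which `target` first appears, or None.

    Step 1 is the set of axioms; each later step applies exactly one
    operation to an expression of the previous step.
    """
    known = set(axioms)
    frontier = list(axioms)
    for step in range(1, max_steps + 1):
        if target in frontier:
            return step
        next_frontier = []
        for expr in frontier:
            candidates = list(rewrites(expr))
            candidates += [negate(expr, v) for v in sorted(variables(expr))]
            for new in candidates:
                if new not in known:               # eliminate duplicates
                    known.add(new)
                    next_frontier.append(new)
        frontier = next_frontier
    return None

AXIOM_1_2 = ("implies", ("or", "p", "p"), "p")
THEOREM_2_01 = ("implies", ("implies", "p", ("not", "p")), ("not", "p"))
found_at = british_museum([AXIOM_1_2], THEOREM_2_01, max_steps=5)
```

Starting from axiom 1.2 alone, the sketch reaches (2.01) after one negation of a variable and one replacement. Repeated negation keeps producing new expressions, so the frontier never empties; only the `max_steps` bound stops the generation.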
If we include the specializations, which take three more steps, the algorithm will generate an (estimated) additional 600 theorems, thus providing a set of proofs of 11 steps or less containing almost 1000 theorems, none of them duplicates.

[3] A number of fussy but not fundamental points must be taken care of in constructing the algorithm. The phrase "all permissible substitutions" needs to be qualified, for there is an infinity of these. Care must be taken not to duplicate expressions that differ only in the names of their variables. We will not go into details here, but simply state that these difficulties can be removed. The essential feature in constructing the algorithm is to allow only one thing to happen in generating each new expression, i.e., one replacement, substitution of "not-p" for "p," etc.

In order to see how this algorithm would provide proofs of specified theorems, we can consider its performance on the sixty-odd theorems of chap. 2 of Principia. One theorem (2.01) is obtained in step (4) of the generation, hence is among the first 42 theorems proved. Three more (2.02, 2.03, and 2.04) are obtained in step (6), hence among the first 115. One more (2.05) is obtained in step (8), hence in the first 246. Only one more is included in the first 1000, theorem 2.07. The proofs of all the remainder require complex substitutions or detachment.

We have no way at present to estimate how many proofs must be generated to include proofs of all theorems of chap. 2 of Principia. Our best guess is that it might be a hundred million. Moreover, apart from the six theorems listed, there is no reason to suppose that the proofs of these theorems would occur early in the list. Our information is too poor to estimate more than very roughly the times required to produce such proofs by the algorithm; but we can estimate times of about 16 minutes to do the first 250 theorems of Fig. 2 [i.e., through step (8)] assuming processing times comparable with those in LT. The first part of the algorithm has an additional special property, which holds only to the point where detachment is first used: that no check for duplication is necessary. Thus the time of computing the first few thousand proofs only increases linearly with the number of theorems generated. For the theorems requiring detachments, duplication checks must be made, and the total computing time increases as the square of the number of expressions generated. At this rate it would take hundreds of thousands of years of computation to generate proofs for the theorems in chap. 2.

[Figure 2. Number of proofs generated by first few steps of British Museum algorithm. Horizontal axis: proof steps, 0 through 8.]

The nature of the problem of proving theorems is now reasonably clear. When sequences of expressions are produced by a simple and cheap (per element produced) generator, the chance that any particular sequence is the desired proof is exceedingly small. This is true even if the generator produces sequences that always satisfy the most complicated and restrictive of the solution conditions: that each is a proof of something. The set of sequences is so large, and the desired proof so rare, that no practical amount of computation suffices to find proofs by means of such an algorithm.

The Logic Theory Machine

If LT is to prove any theorems at all it must employ some devices that alter radically the order in which possible proofs are generated, and the way in which they are tested. To accomplish this, LT gives up almost all the guarantees enjoyed by the British Museum algorithm. Its procedures guarantee neither that its proposed sequences are proofs of something, nor that LT will ever find the proof, no matter how much effort is spent. However, they often generate the desired proof in a reasonable computing time.

Methods

The major type of heuristic that LT uses we call a method. As yet we have no precise definition of a method that distinguishes it from all the other types of routines in LT. Roughly, a method is a reasonably self-contained operation that, if it works, makes a major and permanent contribution toward finding a proof. It is the largest unit of organization in LT, subordinated only to the executive routines necessary to coordinate and select the methods.

THE SUBSTITUTION METHOD

This method seeks a proof for the problem expression by finding an axiom or previously proved theorem that can be transformed, by a series of substitutions for variables and replacements of connectives, into the problem expression.

THE DETACHMENT METHOD

This method attempts, using the rule of detachment, to substitute for the problem expression a new subproblem, which, if solved, will provide a proof for the problem expression. Thus, if the problem expression is B, the method of detachment searches for an axiom or theorem of the form "A implies B." If one is found, A is set up as a new subproblem. If A can be proved, then, since "A implies B" is a theorem, B will also be proved.

THE CHAINING METHODS

These methods use the transitivity of the relation of implication to create a new subproblem which, if solved, will provide a proof for the problem expression.
Thus, if the problem expression is "a implies c," the method of forward chaining searches for an axiom or theorem of the form "a implies b." If one is found, "b implies c" is set up as a new subproblem. Chaining backward works analogously: it seeks a theorem of the form "b implies c," and if one is found, "a implies b" is set up as a new subproblem.

Each of these methods is an independent unit. They are alternatives to one another, and can be used in sequence, one working on the subproblems generated by another. Each of them produces a major part of a proof. Substitution actually proves theorems, and the other three generate subproblems, which can become the intermediate expressions in a proof sequence.

These methods give no guarantee that they will work. There is no guarantee that a theorem can be found that can be used to carry out a proof by the substitution method, or a theorem that will produce a subproblem by any of the other three methods. Even if a subproblem is generated, there is no guarantee that it is part of the desired proof sequence, or even that it is part of any proof sequence (e.g., it can be false). On the other hand, the methods do guarantee that any subproblem generated is part of a sequence of expressions that ends in the desired theorem (this is one of the conditions that a sequence be a proof). The methods also guarantee that each expression of the sequence is derived by the rules of inference from the preceding ones (a second condition of proof). What is not guaranteed is that the beginning of the sequence can be completed with axioms or previously proved theorems.

There is also no guarantee that the combination of the four methods, used in any fashion whatsoever and with unlimited computing effort, comprises a sufficient set of methods to prove all theorems. In fact, we have discovered a theorem [(2.13), "p or not-not-not-p"] which the four methods of LT cannot prove. All the subproblems generated for (2.13) after a certain point are false, and therefore cannot lead to a proof.

We have yet no general theory to explain why the methods transform LT into an effective problem-solver.
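
How the detachment and chaining methods turn a problem expression into a subproblem can be illustrated with a short Python sketch. Everything named here is an assumption made for illustration: expressions are nested tuples, `match` is a simple one-way pattern match that binds the variables of a theorem to parts of the problem, and theorem variables are written in capitals so that they do not collide with problem variables. The sketch ignores replacements of connectives, which LT's substitution method also uses.

```python
def match(pattern, expr, bindings):
    """Bind variables of `pattern` so that it equals `expr`; None on failure."""
    if isinstance(pattern, str):
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        return {**bindings, pattern: expr}
    if isinstance(expr, str) or pattern[0] != expr[0]:
        return None
    for p_arg, e_arg in zip(pattern[1:], expr[1:]):
        bindings = match(p_arg, e_arg, bindings)
        if bindings is None:
            return None
    return bindings

def instantiate(pattern, bindings):
    """Apply `bindings` to `pattern`; unbound variables are left as they are."""
    if isinstance(pattern, str):
        return bindings.get(pattern, pattern)
    return (pattern[0],) + tuple(instantiate(arg, bindings) for arg in pattern[1:])

def detachment(problem, theorem):
    """Theorem "A implies B" with B matching the problem: subproblem A."""
    if theorem[0] == "implies":
        found = match(theorem[2], problem, {})
        if found is not None:
            return instantiate(theorem[1], found)
    return None

def forward_chaining(problem, theorem):
    """Problem "a implies c", theorem "a implies b": subproblem "b implies c"."""
    if problem[0] == "implies" and theorem[0] == "implies":
        found = match(theorem[1], problem[1], {})
        if found is not None:
            return ("implies", instantiate(theorem[2], found), problem[2])
    return None

def backward_chaining(problem, theorem):
    """Problem "a implies c", theorem "b implies c": subproblem "a implies b"."""
    if problem[0] == "implies" and theorem[0] == "implies":
        found = match(theorem[2], problem[2], {})
        if found is not None:
            return ("implies", problem[1], instantiate(theorem[1], found))
    return None

THEOREM_2_2 = ("implies", "A", ("or", "A", "B"))
THEOREM_2_16 = ("implies", ("implies", "A", "B"), ("implies", ("not", "B"), ("not", "A")))
PROBLEM_2_45 = ("implies", ("not", ("or", "p", "q")), ("not", "p"))

subproblem = detachment(PROBLEM_2_45, THEOREM_2_16)       # p implies (p or q)
closes_by_substitution = match(THEOREM_2_2, subproblem, {}) is not None
```

On theorem (2.45) the sketch follows the printed proof: detachment with theorem 2.16 yields the subproblem "p implies (p or q)," which matches theorem 2.2 directly. A subproblem produced this way may still contain unbound theorem variables, one of several details a fuller treatment would have to handle.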
That the methods have this effect, in conjunction with the other mechanisms to be described shortly, will be demonstrated amply in the remainder of this study. Several factors may be involved. First, the methods organize the sequences of individual processing steps into larger units that can be handled as such. Each processing step can be oriented toward the special function it performs in the unit as a whole, and the units can be manipulated and organized as entities by the higher-level routines.

Apart from their "unitizing" effect, the methods that generate subproblems work "backward" from the desired theorem to axioms or known theorems rather than "forward" as did the British Museum algorithm. Since there is only one theorem to be proved, but a number of known true theorems, the efficacy of working backward may be analogous to the ease with which a needle can find its way out of a haystack, compared with the difficulty of someone finding the lone needle in the haystack.

The Executive Routine

In LT the four methods are organized by an executive routine, whose flow diagram is shown in Fig. 3.

1. When a new problem is presented to LT, the substitution method is tried first, using all the axioms and theorems that LT has been told to assume, and that are now stored in a theorem list.

2. If substitution fails, the detachment method is tried, and as each new subproblem is created by a successful detachment, an attempt is made to prove the new subproblem by the substitution method. If substitution fails again, the subproblem is added to a subproblem list.

3. If detachment fails for all the theorems in the theorem list, the same cycle is repeated with forward chaining, and then with backward chaining: try to create a subproblem; try to prove it by the substitution method; if unsuccessful, put the new subproblem on the list. By the nature of the methods, if the substitution method ever succeeds with a single subproblem, the original theorem is proved.

4. If all the methods have been tried on the original problem and no proof has been produced, the executive routine selects the next untried subproblem from the subproblem list, and makes the same sequence of attempts with it. This process continues until (1) a proof is found, (2) the time allotted for finding a proof is used up, (3) there is no more available memory space in the machine, or (4) no untried problems remain on the subproblem list.

In the three examples cited earlier, the proof of (2.01) [(p implies not-p) implies not-p] was obtained by the substitution method directly, hence did not involve use of the subproblem list.

The proof of (2.45) [not (p or q) implies not-p] was achieved by an application of the detachment method followed by a substitution.
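
The control structure of the executive routine described in steps 1 through 4 can be sketched in Python as follows. This is a simplified illustration, not LT's program: the tuple encoding of expressions, the one-way `match`, the names `subproblems` and `executive`, and the `max_subproblems` limit (standing in for available memory) are all assumptions, the time limit is omitted, and replacements of connectives are not attempted.

```python
from collections import deque

def match(pattern, expr, bindings):
    """Bind variables of `pattern` so that it equals `expr`; None on failure."""
    if isinstance(pattern, str):
        if pattern in bindings:
            return bindings if bindings[pattern] == expr else None
        return {**bindings, pattern: expr}
    if isinstance(expr, str) or pattern[0] != expr[0]:
        return None
    for p_arg, e_arg in zip(pattern[1:], expr[1:]):
        bindings = match(p_arg, e_arg, bindings)
        if bindings is None:
            return None
    return bindings

def instantiate(pattern, bindings):
    """Apply `bindings` to `pattern`; unbound variables are left as they are."""
    if isinstance(pattern, str):
        return bindings.get(pattern, pattern)
    return (pattern[0],) + tuple(instantiate(arg, bindings) for arg in pattern[1:])

def subproblems(problem, theorems):
    """Yield subproblems: detachment first, then forward, then backward chaining."""
    implications = [t for t in theorems if t[0] == "implies"]
    for thm in implications:                                # detachment
        found = match(thm[2], problem, {})
        if found is not None:
            yield instantiate(thm[1], found)
    if isinstance(problem, str) or problem[0] != "implies":
        return
    for thm in implications:                                # forward chaining
        found = match(thm[1], problem[1], {})
        if found is not None:
            yield ("implies", instantiate(thm[2], found), problem[2])
    for thm in implications:                                # backward chaining
        found = match(thm[2], problem[2], {})
        if found is not None:
            yield ("implies", problem[1], instantiate(thm[1], found))

def executive(problem, theorems, max_subproblems=100):
    """Return "proved", "out of space", or "exhausted"."""
    def by_substitution(expr):
        return any(match(thm, expr, {}) is not None for thm in theorems)

    if by_substitution(problem):                            # step 1
        return "proved"
    untried = deque([problem])                              # the subproblem list
    seen = {problem}
    while untried:                                          # step 4
        current = untried.popleft()
        for sub in subproblems(current, theorems):          # steps 2 and 3
            if by_substitution(sub):
                return "proved"
            if sub not in seen:
                if len(seen) >= max_subproblems:
                    return "out of space"
                seen.add(sub)
                untried.append(sub)
    return "exhausted"

THEOREM_2_2 = ("implies", "A", ("or", "A", "B"))
THEOREM_2_16 = ("implies", ("implies", "A", "B"), ("implies", ("not", "B"), ("not", "A")))
PROBLEM_2_45 = ("implies", ("not", ("or", "p", "q")), ("not", "p"))
result = executive(PROBLEM_2_45, [THEOREM_2_2, THEOREM_2_16])
```

Given only theorems 2.2 and 2.16, the sketch proves (2.45) by one detachment followed by one substitution, without ever drawing a subproblem from the list. The "exhausted" outcome corresponds to LT reporting that it has nothing more to do.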
This proof required LT to create a subproblem, and to use the substitution method on it. It did not require LT ever to select any subproblem from the subproblem list, since the substitution was successful.

Figure 3. General flow diagram of LT.

Figure 4 shows the tree of subproblems corresponding to the proof of (2.45). The subproblems are given in the form of a downward branching tree. Each node is a subproblem, the original problem being the single node at the top. The lines radiating down from a node lead to the new subproblems generated from the subproblem corresponding to the node. The proof sequence is given by the dashed line; the top link was constructed by the detachment method, and the bottom link by the substitution method. The other links extending down from the original problem lead to other subproblems generated by the detachment method (but not provable by direct substitution) prior to the time LT tried the theorem that leads to the final proof.

Figure 4. Subproblem tree of proof by LT of (2.45) (all previous theorems available). Top node: not (p or q) implies not-p.

LT did not prove theorem 2.31, also mentioned earlier, and gave as its reason that it could think of nothing more to do. This means that LT had considered all subproblems on the subproblem list (there were six in this case) and had no new subproblems to work on. In none of the examples mentioned did LT terminate because of time or space limitations; however, this is the most common result in the cases where LT does not find a proof. Only rarely does LT run out of things to do.

This section has described the organization of LT in terms of methods. We have still to examine in detail why it is that this organization, in connection with the additional mechanisms to be described below, allows LT to prove theorems with a reasonable amount of computing effort.

The Matching Process

The times required to generate proofs for even the simplest theorems by the British Museum algorithm are larger than the times required by LT by factors ranging from five (for one particular theorem) to a hundred and upward. Let us consider an example from the earliest part of the generation, where we have detailed information about the algorithm.
The 79th theorem generated by the algorithm (see Fig. 2) is theorem 2.02 of Principia, one of the theorems we asked LT to prove. This theorem, "p implies (q implies p)," is generated by the algorithm in about 158 seconds with a sequence of substitutions and replacements; it is proved by LT in about 10 seconds with the method of substitution. The reason for the difference becomes apparent if we focus attention on axiom 1.3, "p implies (q or p)," from which the theorem is derived in either scheme.

Figure 5 shows the tree of proofs of the first twelve theorems obtained from (1.3) by the algorithm. The theorem 2.02 is node (9) on the tree and is obtained by substitution of "not-q" for "q" in axiom 1.3 to reach node (5); and then by replacing the "(not-q or p)" by "(q implies p)" in (5) to get (9). The 9th theorem generated from axiom 1.3 is the 79th generated from the five axioms considered together.

Figure 5. Proof tree of proof 2.02 by British Museum algorithm (using axiom 1.3).

This proof is obtained directly by LT using the following matching procedure. We compare the axiom with (9), the expression to be proved:

    p implies (q or p)           (1.3)
    p implies (q implies p).     (9)

First, by a direct comparison, LT determines that the main connectives are identical. Second, LT determines that the variables to the left of the main connectives are identical. Third, LT determines that the connectives within parentheses on the right-hand sides are different. It is necessary to replace the "or" with "implies," but in order to do this (in accordance with the definition of implies) there must be a negation sign before the variable that precedes the "or." Hence, LT first replaces the "q" on the right-hand side with "not-q" to get the required negation sign, obtaining (5). Now LT can change the "or" to "implies," and determines that the resulting expression is identical with (9).

The matching process allowed LT to proceed directly down the branch from (1) through (5) to (9) without even exploring the other branches. Quantitatively, it looked at only two expressions instead of eight, thus reducing the work of comparison by a factor of four. Actually, the saving is even greater, since the matching procedure does not deal with whole expressions, but with a single pair of elements at a time.

An important source of efficiency in the matching process is that it proceeds componentwise, obtaining at each step a feedback of the results of a substitution or replacement that can be used to guide the next step. This feedback keeps the search on the right branch of the tree of possible expressions. It is not important for an efficient search that the goal be known from the beginning; it is crucial that hints of "warmer" or "colder" occur as the search proceeds.[4] Closely related to this feedback is the fact that where LT is called on to make a substitution or replacement at any step, it can determine immediately what variable or connective to substitute or replace by direct comparison with the problem expression, and without search.

Thus far we have assumed that LT knows at the beginning that (1.3) is the appropriate axiom to use. Without this information, it would begin matching with each axiom in turn, abandoning it for the next one if the matching should prove impossible. For example, if it tries to match the theorem against axiom 1.2, it determines almost immediately (on the second test) that "p or p" cannot be made into "p" by substitution. Thus, the matching process permits LT to abandon unprofitable lines of search as well as guiding it to correct substitutions and replacements.

MATCHING IN THE SUBSTITUTION METHODS

The matching process is an essential part of the substitution method. Without it, the substitution method is just that part of the British Museum algorithm that uses only replacements and substitutions. With it, LT is able, either directly or in combination with the other methods, to prove many theorems with reasonable effort.

To obtain data on its performance, LT was given the task of proving in sequence the first 52 theorems of Principia. In each case, LT was given the axioms plus all the theorems previously proved in chap. 2 as the material from which to work (regardless of whether LT had proved the theorems itself).[5]

Of the 52 theorems, proofs were found for a total of 38 (73 per cent). These proofs were obtained by various combinations of methods, but the substitution method was an essential component of all of them. Seventeen of these proofs, almost a half, were accomplished by the substitution method alone. Subjectively evaluated, the theorems that were proved by the substitution method alone have the appearance of "corollaries" of the theorems they are derived from; they occur fairly close to them in the chapter, generally requiring three or fewer attempts at matching per theorem proved (54 attempts for 17 theorems).

The performance of the substitution method on the subproblems is somewhat different, due, we think, to the kind of selectivity implicit in the order of theorems in Principia. In 338 attempts at solving subproblems by substitution, there were 21 successes (6.2 per cent). Thus, there was about one chance in three of proving an original problem directly by the substitution method, but only about one chance in 16 of so proving a subproblem generated from the original problem.

[4] The following analogy may be instructive. Changing the symbols in a logic expression until the "right" expression is obtained is like turning the dials on a safe until the right combination is obtained. Suppose two safes, each with ten dials and ten numbers on a dial. The first safe gives a signal (a "click") when any given dial is turned to the correct number; the second safe clicks only when all ten dials are correct. Trial-and-error search will open the first, on the average, in 50 trials; the second in five billion trials.

[5] The version of LT used for seeking solutions of the 52 problems included a similarity test (see next section). Since the matching process is more important than the similarity test, we have presented the facts about matching using adjusted statistics. A notion of the sample sizes can be gained from Table 1. The sample was limited to the first 52 of the 67 theorems in chap. 2 of Principia because of memory limitations of JOHNNIAC.

MATCHING IN DETACHMENT AND CHAINING

So far the matching process has been considered only as a part of the substitution method, but it is also an essential component of the other three methods. In detachment, for example, a theorem of form "A implies B" is sought, where B is identical with the expression to be proved. The chances of finding such a theorem are negligible unless we allow some modification of B to make it match the theorem to be proved. Hence, once a theorem is selected from the theorem list, its right-hand subexpression is matched against the expression to be proved. An analogous procedure is used in the chaining methods.

We can evaluate the performance of the detachment and chaining methods with the same sample of problems used for evaluating the substitution method. However, a successful match with the former three methods generates a subproblem and does not directly prove the theorem. With the detachment method, an average of three new subproblems were generated for each application of the method; with forward chaining the average was 2.7; and with backward chaining the average was 2.2. For all the methods, this represents about one subproblem per 7½ theorems tested (the number of theorems available varied slightly).
As in the case of substitution, when these three methods were applied to the original problem, the chances of success were higher than when they were applied to subproblems. When applied to the original problem, the number of subproblems generated averaged eight to nine; when applied to subproblems derived from the original, the number of subproblems generated fell to an average of two or three.

In handling the first 52 problems in chap. 2 of Principia, 17 theorems were proved in one step, that is, in one application of substitution. Nineteen theorems were proved in two steps, 12 by detachment followed by substitution, and seven by chaining forward followed by substitution. Two others were proved in three steps. Hence, 38 theorems were proved in all. There are no two-step proofs by backward chaining, since, for two-step proofs only, if there is a proof by backward chaining, there is also one by forward chaining. In 14 cases LT failed to find a proof. Most of these unsuccessful attempts were terminated by time or space limitations. One of these 14 theorems we know LT cannot prove, and one other we believe it cannot prove. Of the remaining twelve, most of them can be proved by LT if it has sufficient time and memory (see section on subproblems, however).

Similarity Tests and Descriptions

Matching eliminates enough of the trial and error in substitutions and replacements to make LT into a successful problem solver. Matching permeates all of the methods, and without it none of them would be useful within practical amounts of computing effort. However, a large amount of search is still used in finding the correct theorems with which matching works. Returning to the performance of LT in chap. 2, we find that the over-all chances of a particular match being successful are 0.3 per cent for substitution, 13.4 per cent for detachment, 13.8 per cent for forward chaining, and 9.4 per cent for backward chaining.

The amount of search through the theorem list can be reduced by interposing a screening process that will reject any theorem for matching that has low likelihood of success. LT has such a screening device, called the similarity test. Two logic expressions are defined to be similar if both their left-hand and right-hand sides are equal with respect to (1) the maximum number of levels from the main connective to any variable; (2) the number of distinct variables; and (3) the number of variable places. Speaking intuitively, two logic expressions are "similar" if they look alike, and look alike if they are similar. Consider for example:

    (p or q) implies (q or p)     (1)
    p implies (q or p)            (2)
    r implies (m implies r)       (3)

By the definition of similarity, (2) and (3) are similar, but (1) is not similar to either (2) or (3).
In all of the methods LT applies the similarity tests to all expressions to be matched, and only applies the matching routine if the expressions are similar; otherwise it passes on to the next theorem in the theorem list. The similarity test reduces substantially the number of matchings attempted, as the numbers in Table 1 show, and correspondingly raises the probability of a match if the matching is attempted. The effect is particularly strong in substitution, where the similarity test reduces the matchings attempted by a factor of ten, and increases the probability of a successful match by a factor of ten. For the other methods attempted matchings were reduced by a factor of four or five, and the probability of a match increased by the same factor.

TABLE 1
Statistics of Similarity Tests and Matching

                       Theorems     Theorems   Theorems   Per cent similar   Per cent matched
    Method             considered   similar    matched    of theorems        of theorems
                                                          considered         similar
    Substitution         11,298       993         37           8.8                3.7
    Detachment            1,591       406        210          25.5               51.7
    Chaining forward        869       200        120          23.0               60.0
    Chaining backward       673       146         63          21.7               43.2

These figures reveal a gross, but not necessarily a net, gain in performance through the use of the similarity test. There are two reasons why all the gross gain may not be realized. First, the similarity test is only a heuristic. It offers no guarantee that it will let through only expressions that will subsequently match. The similarity test also offers no guarantee that it will not reject expressions that would match if attempted. The similarity test does not often commit this type of error (corresponding to a type II statistical error), as will be shown later. However, even rare occurrences of such errors can be costly. One example occurs in the proof of theorem 2.07:

    p implies (p or p).     (2.07)

This theorem is proved simply by substituting p for q in axiom 1.3:

    p implies (q or p).     (1.3)

However, the similarity test, because it demands equality in the number of distinct variables on the right-hand side, calls (2.07) and (1.3) dissimilar because (2.07) contains only p while (1.3) contains p and q. LT discovers the proof through chaining forward, where it checks for a direct match before creating the new subproblem, but the proof is about five times as expensive as when the similarity test is omitted.

The second reason why the gross gain will not all be realized is that the similarity test is not costless, and in fact for those theorems which pass the test the cost of the similarity test must be paid in addition to the cost of the matching. We will examine these costs in the next section when we consider the effort LT expends.
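The percentage columns of Table 1, and the reduction factors quoted with it, follow directly from the three count columns. The short check below recomputes them; the dictionary layout and the function names are choices made here for illustration only.

```python
# Counts from Table 1: (theorems considered, theorems similar, theorems matched).
TABLE_1 = {
    "substitution": (11298, 993, 37),
    "detachment": (1591, 406, 210),
    "chaining forward": (869, 200, 120),
    "chaining backward": (673, 146, 63),
}


def percentages(considered, similar, matched):
    """(per cent similar of considered, per cent matched of similar)."""
    return (round(100 * similar / considered, 1),
            round(100 * matched / similar, 1))


def reduction_factor(considered, similar):
    """How many times fewer matchings are attempted when the test is applied."""
    return considered / similar


for method, counts in TABLE_1.items():
    print(method, percentages(*counts),
          round(reduction_factor(counts[0], counts[1]), 1))
```

The reduction factor comes out a little above eleven for substitution and between roughly four and five for the other three methods, in line with the "factor of ten" and "factor of four or five" in the text.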
Experiments have been carried out with a weaker similarity test, which compares only the number of variable places on both sides of the expression. This test will not commit the particular type II error cited above, and (2.07) is proved by substitution using it. Apart from this, the modification had remarkably little effect on performance. On a sample of ten problems it admitted only 10 per cent more similar theorems and about 10 per cent more subproblems. The reason why the two tests do not differ more radically is that there is a high correlation among the descriptive measures.

Effort in LT

So far we have focused entirely on the performance characteristics of the heuristics in LT, except to point out the tremendous difference between the computing effort required by LT and by the British Museum algorithm. However, it is clear that each additional test, search, description, and the like, has its costs in computing effort as well as its gains in performance. The costs must always be balanced against the performance gains, since there are always alternative heuristics which could be added to the system in place of those being used. In this section we will analyze the computing effort used by LT. The memory space used by the various processes also constitutes a cost, but one that will not be discussed in this study.

MEASURING

LT is written in an interpretive language or pseudocode, which is described in the companion paper to this one. LT is defined in terms of a set of primitive operations, which, in turn, are defined by subroutines in JOHNNIAC machine language. These primitives provide a convenient unit of effort, and all effort measurements will be given in terms of total number of primitives executed. The relative frequencies of the different primitives are reasonably constant, and, therefore, the total number of primitives is an adequate index of effort. The average time per primitive is quite constant at about 30 milliseconds, although for very low totals (less than 1000 primitives) a figure of about 20 milliseconds seems better.
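For a feeling of what a count of primitives means in running time, the two averages just quoted can be turned into a rough conversion. The function name and the sharp switch at 1000 primitives are assumptions of this sketch; the text gives only approximate averages.

```python
def estimated_seconds(primitives):
    """Rough running time implied by a count of primitives executed.

    Uses about 20 ms per primitive below 1000 primitives and about 30 ms
    otherwise; the hard threshold is an assumption of this sketch.
    """
    ms_per_primitive = 20 if primitives < 1000 else 30
    return primitives * ms_per_primitive / 1000


for count in (500, 5000, 30000):
    print(count, "primitives is roughly", estimated_seconds(count), "seconds")
```

On these figures an effort of 30,000 primitives corresponds to about fifteen minutes of machine time.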
COMPUTING EFFORT AND PERFORMANCE

On a priori grounds we would expect the amount of computing effort required to solve a logic problem to be roughly proportional to the total number of theorems examined (i.e., tested for similarity, if there is a similarity routine; or tested for matching, if there is not) by the various methods in the course of solving the problem. In fact, this turns out to be a reasonably good predictor of effort; but the fit to data is much improved if we assign greater weight to theorems considered for detachment and chaining than to theorems considered for substitution.

Actual and predicted efforts are compared below (with the full similarity test included, and excluding theorems proved by substitution) on the assumption that the number of primitives per theorem considered is twice as great for chaining as for substitution, and three times as great for detachment. About 45 primitives are executed per theorem considered with the substitution method (hence 135 with detachment and 90 with chaining). As Table 2 shows, the estimates are generally accurate within a few per cent, except for theorem 2.06, for which the estimate is too low.

TABLE 2
Effort Statistics with "Precompute Description" Routine
Total primitives, thousands

There is an additional source of variation not shown in the theorems selected for Table 2. The descriptions used in the similarity test must be computed from the logic expressions. Since the descriptions of the theorems are used over and over again, LT computes these at the start of a problem and stores the values with the theorems, so they do not have to be computed again. However, as the number of theorems increases, the space devoted to storing the precomputed descriptions becomes prohibitive, and LT switches to recomputing them each time it needs them. With recomputation, the problem effort is still roughly proportional to the total number of theorems considered, but now the number of primitives per theorem is around 70 for the substitution method, 210 for detachment, and 140 for chaining.

Our analysis of the effort statistics shows, then, that in the first approximation the effort required to prove a theorem is proportional to the number of theorems that have to be considered before a proof is found; the number of theorems considered is an effort measure for evaluating a heuristic. A good heuristic, by securing the consideration of the "right" theorems early in the proof, reduces the expected number of theorems to be considered before a proof is found.

EVALUATION OF THE SIMILARITY TEST

As we noted in the previous section, to evaluate an improved heuristic, account must be taken of any additional computation that the improvement introduces. The net advantage may be less than the gross advantage, or the extra computing effort may actually cancel out the gross gain in selectivity. We are now in a position to evaluate the similarity routines as preselectors of theorems for matching. A number of theorems were run, first with the full similarity routine, then with the modified similarity routine (which tests only the number of variable places), and finally with no similarity test at all. We also made some comparisons with both precomputed and recomputed descriptions.

When descriptions are precomputed, the computing effort is less with the full similarity test than without it; the factor of saving ranged from 10 to 60 per cent (e.g., 3534/5206 for theorem 2.08). However, if LT must recompute the descriptions every time, the full similarity test is actually more expensive than no similarity test at all (e.g., 26,739/22,914 for theorem 2.45).

The modified similarity test fares somewhat better. For example, in proving (2.45) it requires only 18,035 primitives compared to the 22,914 for no similarity test (see the paragraph above). These comparisons involve recomputed descriptions; we have no figures for precomputed descriptions, but the additional saving appears small since there is much less to compute with the abridged than with the full test.

Thus the similarity test is rather marginal, and does not provide anything like the factors of improvement achieved by the matching process, although we have seen that the performance figures seem to indicate much more substantial gains. The reason for the discrepancy is not difficult to find. In a sense, the matching process consists of two parts. One is a testing part that locates the differences between elements and diagnoses the corrective action to be taken. The other part comprises the processes of substituting and replacing. The latter part is the major expense in a matching that works, but most of this effort is saved when the matching fails. Thus matching turns out to be inexpensive for precisely those expressions that the similarity test excludes.

Subproblems

LT can prove a great many theorems in symbolic logic.
However, there are numerous theorems that LT cannot prove, and we may describe LT as having reached a plateau in its problem solving ability. Figure 6 shows the amount of effort required for the problems LT solved out of the sample of 52. Almost all the proofs that LT found took less than 30,000 primitives of effort. Among the numerous attempts at proofs that went beyond this effort limit, only a few succeeded, and these required a total effort that was very much greater.

Figure 6. Distribution of LT's proofs by effort. Data include all proofs from attempts on the first 52 theorems in chap. 2 of Principia. (Horizontal axis: effort, thousands of primitives.)

The predominance of short proofs is even more striking than the approximate upper limit of 30,000 primitives suggests. The proofs by substitution, almost half of the total, required about 1000 primitives or less each. The effort required for the longest proof, 89,000 primitives, is some 250 times the effort required for the short proofs. We estimate that to prove the 12 additional theorems that we believe LT can prove requires the effort limit to be extended to about a million primitives.

From these data we infer that LT's power as a problem solver is largely restricted to problems of a certain class. While it is logically possible for LT to solve others by large expenditures of effort, major adjustments are needed in the program to extend LT's powers to essentially new classes of problems. We believe that this situation is typical: good heuristics produce differences in performance of large orders of magnitude, but invariably a "plateau" is reached that can be surpassed only with quite different heuristics. These new heuristics will again make differences of orders of magnitude. In this section we shall analyze LT's difficulties with those theorems it cannot prove, with a view to indicating the general type of heuristic that might extend its range of effectiveness.

The Subproblem Tree

Let us examine the proof of theorem 2.17 when all the preceding theorems are available. This is the proof that cost LT 89,000 primitives. It is reproduced below, using chaining as a rule of inference (each chaining could be expanded into two detachments, to conform strictly to the system of Principia).

    (not-q implies not-p) implies (p implies q)    (theorem 2.17, to be proved)

    1. A implies not-not-A    (theorem 2.12)
    2. p implies not-not-p    (subs. p for A in 1)
    3. (A implies B) implies [(B implies C) implies (A implies C)]    (theorem 2.06)
    4. (p implies not-not-p) implies [(not-not-p implies q) implies (p implies q)]    (subs. p for A, not-not-p for B, q for C in 3)
    5. (not-not-p implies q) implies (p implies q)    (det. 4 from 3)
    6. (not-A implies B) implies (not-B implies A)    (theorem 2.15)
    7. (not-q implies not-p) implies (not-not-p implies q)    (subs. q for A, not-p for B)
    8. (not-q implies not-p) implies (p implies q)    (chain 7 and 5; QED)

The proof is longer than either of the two given earlier. In terms of LT's methods it takes three steps instead of two or one: a forward chaining, a detachment, and a substitution. This leads to the not surprising notion, given human experience, that length of proof is an important variable in determining total effort: short proofs will be easy and long proofs difficult, and difficulty will increase more than proportionately with length of proof. Indeed, all the one-step proofs require 500 to 1500 primitives, while the number of primitives for two-step proofs ranges from 3000 to 50,000. Further, LT has obtained only six proofs longer than two steps, and these require from 10,000 to 90,000 primitives.

The significance of length of proof can be seen by comparing Fig. 7, which gives the proof tree for (2.17), with Fig. 4, which gives the proof tree for (2.45), a two-step proof. In going one step deeper in the case of (2.17), LT had to generate and examine many more subproblems. A comparison of the various statistics of the proofs confirms this statement: the problems are roughly similar in other respects (e.g., in effort per theorem considered); hence the difference in total effort can be attributed largely to the difference in number of subproblems generated.

Figure 7. Subproblem tree of proof by LT of (2.17) (all previous theorems available). Top node: (not-q implies not-p) implies (p implies q).

Let us examine some more evidence for this conclusion. Figure 8 shows the subproblem tree for the proof of (2.27) from the axioms, which is the only four-step proof LT has achieved to date. The tree reveals immediately why LT was able to find the proof. Instead of branching widely at each point, multiplying rapidly the number of subproblems to be looked at, LT in this case only generates a few subproblems at each point. It thus manages to penetrate to a depth of four steps with a reasonable amount of effort (38,367 primitives). If this tree had branched as the other two did, LT would have had to process about 250 subproblems before arriving at a proof, and the total effort would have been at least 250,000 primitives. The statistics quoted earlier on the effectiveness of subproblem generation support the general hypothesis that the number of subproblems to be examined increases more or less exponentially with the depth of the proof.

Figure 8. Subproblem tree of proof by LT of (2.27) (using the axioms). Top node: p implies [(p implies q) implies q].

The difficulty is that LT uses an algorithmic procedure to govern its generation of subproblems. Apart from a few subproblems excluded by the type II errors of the similarity test, the procedure guarantees that all subproblems that can be generated by detachment and chaining will in fact be obtained (duplications are eliminated). LT also uses an algorithm to determine the order in which it will try to solve subproblems. The subproblems are considered in order of generation, so that a proof will not be missed through failure to consider a subproblem that has been generated.

Because of these systematic principles incorporated in the executive program, and because the methods, applied to a theorem list averaging 30 expressions in length, generate a large number of subproblems, LT must find a rare sequence that leads to a proof by searching through a very large set of such sequences. For proofs of one step, this is no problem at all; for proofs of two steps, the set to be examined is still of reasonable size in relation to the computing power available. For proofs of three steps, the size of the search already presses LT against its computing limits; and if one or two additional steps are added the amount of search required to find a proof exceeds any amount of computing power that could practically be made available.

The set of subproblems generated by the Logic Theory Machine, however large it may seem, is exceedingly selective and rich in proofs compared with the set through which the British Museum algorithm searches. Hence, the latter algorithm could find proofs in a reasonable time for only the simplest theorems, while proofs for a much larger number are accessible with LT. The line dividing the possible from the impossible for any given problem-solving procedure is relatively sharp; hence a further increase in problem-solving power, comparable to that obtained in passing from the British Museum algorithm to LT, will require a corresponding enrichment of the heuristic.

Modification of the Logic Theory Machine

There are many possible ways to modify LT so that it can find proofs of more than two steps in a way which has reason and insight, instead of by brute force. First, the unit cost of processing subproblems can be substantially reduced so that a given computing effort will handle many more subproblems. (This does not, perhaps, change the "brute force" character of the process, but makes it feasible in terms of effort.) Second, LT can be modified so that it will select for processing only subproblems that have a high probability of leading to a proof. One way to do this is to screen subproblems before they are put on the subproblem list, and eliminate the unlikely ones altogether. Another way is to reduce selectively the number of subproblems generated.

For example, to reduce the number of subproblems generated, we may limit the lists of theorems available for generating them. That this approach may be effective is suggested by the statistics we have already cited, which show that the number of subproblems generated by a method per theorem examined is relatively constant (about one subproblem per seven theorems).

An impression of how the number of available theorems affects the generation of subproblems may be gained by comparing the proof trees of (2.17) (Fig. 7) and (2.27) (Fig. 8). The broad tree for (2.17) was produced with a list of twenty theorems, while the deep tree for (2.27) was produced with a list of only five theorems. The smaller theorem list in the latter case generated fewer subproblems at each application of one of the methods.

Another example of the same point is provided by two proofs of theorem 2.48 obtained with different lists of available theorems. In the one case, (2.48) was proved starting with all prior theorems on the theorem list; in the other case it was proved starting only with the axioms and theorem 2.16. We had conjectured that the proof would be more difficult to obtain under the latter conditions, since a longer proof chain would have to be constructed than under the former. In this we were wrong: with the longer theorem list, LT proved theorem 2.48 in two steps, employing 51,450 primitives of effort. With the shorter list, LT proved the theorem in three steps, but with only 18,558 primitives, one-third as many as before. Examination of the first proof shows that the many "irrelevant" theorems on the list took a great deal of processing effort. The comparison provides a dramatic demonstration of the fact that a problem solver may be encumbered by too much information, just as he may be handicapped by too little.

We have only touched on the possibilities for modifying LT, and have seen some hints in LT's current behavior about their potential effectiveness. All of the avenues mentioned earlier appear to offer worthwhile modifications of the program. We hope to report on these explorations at a later time.

Conclusion

We have provided data on the performance of a complex information processing system that is capable of finding proofs for theorems in elementary symbolic logic. We have used these data to analyze and illustrate the difference between systematic, algorithmic processes, on the one hand, and heuristic, problem-solving processes, on the other. We have shown how heuristics give the program power to solve problems in a reasonable computing time that could be solved algorithmically only in large numbers of years. Finally, we have assessed the limitations of the present program of the Logic Theory Machine and have indicated some of the directions that improvement would have to take to extend its powers to problems at new levels of difficulty.

Our explorations of the Logic Theory Machine represent a step in a program of research on complex information processing systems that is aimed at developing a theory of such systems, and applying that theory to such fields as computer programming and human learning and problem-solving.