SiCoTHEO—Simple competitive parallel theorem provers based on SETHEO

SiCoTHEO—Simple competitive parallel theorem provers based on SETHEO

Parallel Processing for Artificial Intelligence 3 J. Geller, H. Kitano and C.B. Suttner (Editors) 9 1997 Elsevier Science B.V. All rights reserved. 2...

776KB Sizes 10 Downloads 54 Views

Parallel Processing for Artificial Intelligence 3 J. Geller, H. Kitano and C.B. Suttner (Editors) 9 1997 Elsevier Science B.V. All rights reserved.

231

SiCoTHEO Simple Competitive Parallel Theorem Provers based on SETHEO* Johann Schumann

Institut fiir Informatik, Technische UniversitSt Miinchen email: s chumann@inf ormat ik. tu-muenchen, de In this paper, we present SiCoTHEO, a collection of parallel theorem provers for first order predicate logic. SiCoTHEO is based on the sequential prover SETHEO. Parallelism is exploited by competition: on each processor, an identical copy of SETHEO tries to prove the entire formula. However, certain parameters which influence SETHEO's behavior are set differently for each processor. As soon as one processor finds a proof, the entire system is stopped. Three different versions of SiCoTHEO are presented in this paper. We have used competition on completeness mode (parallel iterative deepening, SiCoTHEO-PID), on completeness bounds (a parameterized combination of bounds, SiCoTHEO-CBC), and on the search mode (top-down combined with bottom-up, SiCoTHEO-DELTA). The experimental results were obtained with a prototypical implementation, running on a network of workstations. This parallel model is fault-tolerant and does not need communication during run-time (except start and stop messages). We found that only little efficiency is gained for SiCoTHEO-PID, which reaches peak performance with only 4 processors. SiCoTHEO-CBC and SiCoTHEO-DELTA, however, showed significant speed-up, and improved performance up to the 50 processors used. 1. I n t r o d u c t i o n Automated theorem provers, like many other AI tools must explore large search spaces. This leads to long (and often too long) run-times of such systems. One possibility to reduce run-time is the exploitation of parallelism. Automated theorem proving in general seems to be suitable for parallelization, as many implemented parallel theorem provers show (see [12] for an extensive survey). Current trends lead away from special purpose, tightly coupled multiprocessor systems, often with shared memory. Much more interesting and feasible seem to be networks of workstations, connected by a local area network (e.g., Ethernet, ATM). Such a hardware configuration is readily available in many places. It features processing nodes with high processing power and (comparatively) large resources of local memory and disk space. The operating system (mostly UNIX) allows multi-tasking and multi-user operation, features *This work is supported by the Deutsche Forschungsgemeinschaft within the Sonderforschungsbereich 342, Subproject A5: PARIS (Parallelization in Inference Systems).

232 not necessarily available on a multiprocessor machine. The underlying communication principle is message passing. Common data (e.g., the formula to be proven) can be kept in file-systems which are shared between the processors (e.g., by NFS). However, the bandwidth of the connection between the workstations is comparatively low and the latency for each communication is rather high. Models of parallelism which are ideally suited for such networks of workstations must therefore obey the following requirements: small (or ideally no) necessary communication between the processors and no dependency on short latencies. A parallel model which fulfills these requirements is competition: each processor tries to solve the entire problem, using different methods or parameters. As soon as one processor finds a solution, the entire system can be stopped. In a competitive system there is no need for communication except for the start and stop messages. Competitive parallel models have been studied in various approaches (cf. [3] and [12] for competitive parallel theorem provers). A competitive parallel theorem prover based on the sequential prover SETHEO [7,6] is RCTHEO (Random Competition) [2]. Here, the search is controlled by a pseudo-random number generator which is initialized with a different number on each processor. Therefore, when the system has been started, the search space is processed in a different way on each processor. A detailed evaluation of the RCTHEO system can be found in [3]. In this paper, we will focus on models where competition is accomplished by a different setting of parameters of the proof algorithm for each processor. This is in contrast to RCTHEO which exploits parallelism by randomizing the proof algorithm. Also, we only discuss systems, which are based on one sequential prover (homogeneous competition). In a heterogeneous system, different theorem provers (e.g., OTTER, METEOR, SETHEO, ... ) could compete for a proof. ~ This paper proceeds as follows: First, we will give a short introduction to the sequential theorem prover SETHEO which is the basis for SiCoTHEO. Next, we define competition between homogeneous processes and discuss important properties of competitive theorem provers, such as efficiency, scalability, soundness and completeness. Then, we will describe which parameters are suitable for competition for SETHEO, and we will sketch the basics of the prototypical implementation of all SiCoTHEO systems on a network of workstations. Finally, we describe in detail the different SiCoTHEO provers and present results of experiments. In the conclusions, we will summarize the paper and give an outlook on future work.

2. S E T H E O

SETHEO is a sequential theorem prover for first order predicate logic. Since it is the basis of SiCoTHEO, we will shortly describe SETHEO's proof procedure and the parameters which influence the search. For details on SETHEO, the reader is referred to [7,5,6]. The proof calculus underlying SETHEO is Model Elimination [8], a sound and complete tableau based calculus. The input formula for SETHEO is a set of clauses consisting of literals (e.g., p(X), ~q(a)) which in turn are negated or unnegated atoms. An example is shown below in Figure 1.

233 The system tries to refute the given set of clauses by constructing a tableau, a tree with nodes labeled by literals of the formula, as shown in the example below. The root node of the tableau is always marked by c, comprising the empty tableau. All direct child-nodes of a given inner node belong to one clause. In order to see, if a refutation has already been found, we have to look for complementary pairs of literals. Two literals are complementary, if they have opposite signs, but are otherwise equal to each other. A branch in the tree is said to be closed, if the path from the root to the current leaf node contains at least one pair of complementa .ry literals. A formula is unsatisfiable, if and only if there exists a tableau with all its branches closed. Given a set of clauses, SETHEO tries to construct a closed Model Elimination tableau by continuously applying the following rules: S t a r t S t e p Given an empty tableau (with only the root node e), one may select an arbitra .ry clause ("start clause") and add its literals as the children of the root node to the tableau. Per default, SETHEO behaves like PROLOG and selects such start clauses with negative literals only (e.g., clause (1) in Figure 1). E x t e n s i o n S t e p Given is a tableau with an open leaf node L ("subgoal"). If there exists a literal K in one of the clauses of the formula which is unifiable with L, yielding a complementary pair of literals, then all literals of that clause may be appended to the tableau as the children of L, and the leaf K is marked closed. Note, that all other literals of that clause (if there are any) cause new open branches in the tableau. R e d u c t i o n S t e p If there exists a leaf node L and an ancestor K of it, such that L and K are unifiable and yield a complementary pair, then the leaf node can be marked as closed. The substitutions which are necessa .ry in the extension and reduction steps are applied to the entire tableau. Figure 1 shows a formula and a corresponding closed tableau. Variables are written as upper-case characters (e.g., X, Y). In this example, variable X is substituted by a, and Y by b.

Formula: ~p(X, Y) V -~q(Y, X)

p(a, b) v -~q(b, a) q(b, a) V ~p(a, b)

p(~, b) v q(b, ~) Substitution:

x\~,Y\b

A

(1)

^

(2)

A

(3)

(4)

p(a, b) q(b, ~) *

~q(b, a) p(~, b) *R

q(b, a)

-p(a, b) p(a, b) *

q(b, ~) *R

Figure 1. A closed Model Elimination tableau. An asterisk indicates a closed branch; a "*R" marks a branch closed by a reduction step.

234 Of course, there may be more than one way to construct a tableau out of a given formula, since there are many possibilities for selecting a rule or clause for extension. Some of these tableaux can eventually be closed, whereas others cannot be closed at all. This shows t h a t we have to search for a closed tableau. Formally, we can depict the search by a tree, the Model Elimination OR-search tree (OR-tree for short). Each node of the tree is labeled by a tableau, the root is the empty tableau. If we select an open node of a tableau T, we may construct new tableaux out of T by applying all possibilities for an extension or reduction step. These new tableaux become the child nodes of T. If such a rule leads to a closed tableau, we may stop with "proof found". If no extension or reduction step is possible at this point, the node will be marked by a "FAIL". Within SETHEO, the OR-tree and its tableaux are constructed in a depth-first, left-toright manner and its nodes are traversed by backtracking. This search, however, can be varied by setting parameters of SETHEO (e.g., which clause to try next for an extension step). In order to obtain completeness, we apply depth-first iterative deepening. This means that we set a certain bound which must not be violated during the search. Typical bounds in S E T H E O are the maximal depth of a tableau (A-literal depth), or the maximal number of leaves in a tableau (inference bound). If a proof cannot be found with the given bound, the bound is increased and the search starts again. In general, the size of the search space increases tremendously when the bounds are incremented. Therefore, a variety of techniques have been implemented which t.ry to reduce the size and complexity of the search space. These techniques contain pruning methods (realized as constraints), and additional inference rules in order to use intermediate results ("lemmata", "fold-up"). For a detailed description of these techniques which can be turned on and off individually via parameters see [6].

3. Parallelism by Competition Given is a sequential theorem proving algorithm 2 fl.(7~1,..., 7~) where Pi are parameters which may influence the behavior of the system and its search (e.g., completeness bounds or pruning methods). Then, a homogeneous competitive theorem prover running on P processors is defined as follows: on each processor p (1 _ p < P ) a copy of the sequential algorithm . A ( P [ , . . . , P~) tries to prove the entire given formula. Some (or all) parameters P~' are set differently for each processor p. All processors start at the same time. As soon as one processor finds a solution, it is reported to the user and the other processors are stopped ("winner-takes-all strategy"). The efficiency of the resulting competitive system strongly depends on the influence the respective parameter setting has on the search behavior of the system. The larger the difference, created by the values of 7~, the higher the probability that one processor finds a proof very fast (assuming, of course that there exists a proof). If the influence of the parameters on the search is only weak, all processes will have a run-time (i.e., the time needed to find a proof) which is quite similar. Then, the efficiency will be ve.ry low (the 2The definition of a parallel competitive system can easily be generalized to any search algorithm. An algorithm, suitable for competition takes a problem as its input and tries to solve it. If a solution exists, the algorithm eventually must terminate with a message "solution found". Furthermore, the algorithm must be controlled by parameters which influence its behavior.

235 speed-up is about 1). Good scalability and efficiency can be obtained only, if there are enough different values for a parameter, and if no good default estimation to set that parameter is known. Only then, a large number of processors can be employed reasonably. A good default estimation for a parameter will in general result in poor speed-up values for a competitive system. Then, a different model of parallelization, (e.g., partitioning as in PARTHEO [10] or SPTHEO [13]) will be appropriate. When we want to develop and evaluate competitive parallel theorem provers, we additionally have to address the following issues:

S o u n d n e s s : although our parallel theorem provers are based on the sound sequential prover SETHEO, care must be taken that we obtain only correct proofs with SiCoTHEO. This means in particular that only such combinations of parameter settings are allowed which retain the correctness of the proof algorithm.

C o m p l e t e n e s s : the entire system must be complete, i.e., we must not loose any proofs which can be found with the sequential prover. Since our system is intended to run on a network of workstations (where processors or communication links may fail), the competitive system should be complete even in cases with a reduced number of processors (fault-tolerance). This can be easily accomplished by using complete search strategies on each processor. Note that with a standard partitioning scheme, like OR-parallelism (e.g., PARTHEO [10]), a failure of processing elements would lead to an incomplete system.

In order to evaluate the parallel system, we compare the run-times of the parallel system with the sequential one for a given formula to be proven. As the sequential reference, we use SETHEO (V3.2) with its default parameters. Its run-time for a given example is denoted by Tseq. The run-time of the parallel system Tll (for the same formula) is the time until one of the processors has found a proof, Til = minp(Ta(~,~,.....~,~)). Based on the run-times, we can define the speed-up s and efficiency rl of SiCoTHEO for a given formula in the usual way: s = Tseq/TiI and ~ = s/P where P is the number of processors. In contrast to many parallel algorithms (e.g., numeric computations), the speed-up s obtained varies strongly from example to example. This is due to the complex behavior of the search algorithm which depends on the input formula. Therefore, a reasonable evaluation of a parallel system can be given only, if measurements with a large number of different examples are made. In this paper, we present a graphical representation of the ratio 711 over T~eq for each measurement; a representation which allows to make reasonable estimates of the system's behavior even in cases of varying speed-ups. In general, it is rather difficult to give a good estimation of a mean value for the speedup over a set of examples, especially in cases where the speed-up shows a high variance. Therefore, different definitions of mean values yield quite different results (see e.g. [4]). For our measurements, we give four common mean values where si is the speed-up obtained

236 with example #i: ga = ~1 ~ i si

gg = r

...-sn

Sh = v~nl/s,

(arithmetic mean) (geometric mean) (harmonic mean)

#t = ~-,i Tseq(i)/ F-,i TII(i) (waiting times)

The arithmetic mean is often too optimistic, resulting in a mean value too large, because a few large values of si are taken into account too much. On the other hand, the harmonic mean gives rather low values, because examples with speed-up values near 1 are considered too much. Often, the geometric mean is considered appropriate, yielding results between the other two mean values (s-a _> s-g > S-h). #t relates the time needed to solve all problems (one after the other) with one processor to the time needed with p processors. This measure is especially useful for applications of the theorem prover, where one proof obligation after the other is to be solved. Then #t represents the ratio of the "waiting time" for the user, before the prover has finished all examples. The run-times given in this paper are those of the SETHEO Abstract Machine, including the time to load the compiled formula. The times needed to start and stop the system are not considered here. For a discussion of these times, see Section 5. All proof attempts (sequential and parallel) have been aborted after a maximal run-time of Tm~ = 300s (for SiCoTHEO-PID, Tmax = 300s is used). All times are CPU-times and are measured on HP-750 workstations with a granularity of 1/60 seconds. 4. P a r a m e t e r Competition for S E T H E O The Model Elimination Calculus and SETHEO's proof procedure can be parameterized in several ways. Table I shows a number of typical ways for modifying the basic algorithm. For each parameter, common values are shown. Values which are default for SETHEO are given in bold-face. The selection function determines which clause and literal is to be taken next, the search mode determines in which way the OR-tree is explored. Additional inference rules ( "fold-up" and "unit-lemmata") allow SETHEO to use intermediate results (lemmas). Finally, completeness mode and completeness bound determine SETHEO's search strategy.

parameter Selection function Search mode addt'l inference rules completeness modes completeness bounds

values as in formula/random/heuristically ordered top-down/bottom-up/combination

none/fold-up/unit-lemmata i t e r a t i v e d e e p e n i n g / o t h e r fair strategies depth/#inferences/#copies/combinations

Table 1 Basic parameters for SETHEO's calculus and proof-procedure. Values shown in bold-face are default values for SETHEO.

237 Given the parameters and their possible values from Table 1, a number of different paralleltheorem provers , based on competition could be designed. , In the following, we will focus on three parallel competitive systems, based on SETHEO. Since they compete on rather simple settings of parameters, the system is called SiCoTHEO (Simple Competitive provers based on SETHEO). The three systems compete via different completeness modes ("parallel iterative deepening", SiCoTHEO-PID), via a combination of completeness bounds (SiCoTHEO-CBC), and a combination of top-down and bottomup processing (SiCoTHEO-DELTA). Before we go into details of each prover, we sketch the common prototypical implementation for all SiCoTHEO provers. 5. P r o t o t y p i c a l Implementation SiCoTHEO is running on a (possibly heterogeneous) network of UNIX workstations. The control of the proving processes, the setting of the parameters and the final assembly of the results is accomplished by the tool pmake [1]. This implementation of SiCoTHEO has been inspired by a prototypical implementation of RCTHEO [2]. Pmake is a parallel version of make, a software engineering tool used commonly to generate and compile pieces of software given their source files. Pmake exploits parallelism by exporting as many independent jobs as possible to other processors. Hereby it assumes that all files are present on all processors (e.g., via NFS). Pmake stops, as soon as all jobs are finished or an error occurs. In our case, however, we need a "winner takes all strategy" which stops the system, as soon as one job is finished. This can be accomplished easily, by adapting SETHEO so that it returns "error" (i.e., a value :/: 0), as soon as it found a proof. Then pmake aborts all actions per default. In contrast, the implementation of RCTHEO had to transfer the output generated by all provers to the master processor. There, a separate process searched for success messages. This resulted in heavy network traffic and long delays. A critical issue in using pmake is its behavior w.r.t, the load of workstations: as soon as there is activity (e.g., keyboard entries) on workstations used by pmake, the current job will be aborted (and restarted later). Therefore, the number of active processors (and even the start-up times) can vary strongly during a run of SiCoTHEO.

6. Evaluation and Results In this section we look in detail at the results, obtained with the three different versions of SiCoTHEO. The experiments have been carried out on a network of HP-750 workstations, connected via Ethernet. All formulae for the experiments have been taken from the TPTP-problem libra .ry [11].

6.1. S i C o T H E O - P I D Parallel iterative deepening is one of the simplest forms of competition: each processor explores the search space to a specific bound. Assume we have to perform iterative deepening over the A-literal depth as the completeness bound. Then processor i (1 < i _< P) explores the search space to a depth i. If that processor could not find a proof with the given bound i, it starts the search again with bound i + P, i § 2P, and so on. This parallel scheme for iterative deepening (written in a C-like notation below) assures

238 completeness with a limited number of processors: all values for the depth bound are used by a processor eventually, while no two processors work with the same bound. f o r i = 1 , 2 . . . . . P i n p a r a l l e l do on p r o c e s s o r i do f o r k = 0 , 1 , 2 . . . . do depth_bound = i + k,P; s e t h e o (depth_bound) Due to time and resource restrictions, results on SiCoTHEO-PID have been obtained by evaluating existing run-time data 3 of SETHEO. Figure 2A shows the resulting mean values of the speed-up for different numbers of processors. As can be seen immediatedly, the variance of the speed-up values is extremely high. This fact results in a high arithmetic mean, whereas the gg, .Oh and st are ve.ry close to 1. This behavior can also be seen in Figure 2B which shows the ratio of 711 over Tseq for each example, using 5 processors. The speed-up s is always _> 1 since the entire search space (which has to be searched in the sequential case) is partitioned.

A

~

s-a

21

700

.o

B

600

or

500

16

m"

T, 400

g 11

...................

300

~

200

:4-h

100

Ce 9

,P

.0.o .s

,I

.m

...........

6

1 1

.......... 1 2 3 4 5 6 7 8 9 P

0

~ , ,. ~,o 0

"

.....~~~

200 300 460 56o 660 760

T~eq

Figure 2. SiCoTHEO-PID: A: mean speed-up values for different numbers of processors P (x for ffa, o for fig, 9 for fib, o for gt). The dotted line marks linear speed-up s = P. B: parallel run-time 711 over sequential run-time Tseq for P - 5. The dotted line corresponds to s --- 1, the solid line to s -- P.

Furthermore, it is evident from Figure 2A that SiCoTHEO-PID is not scalable. The speed-up values reach a saturation level already with 4 processors. Adding more processors 3The data have been obtained by running SETHEO (V3.1) on all examples of the TPTP [11] with Tmax = 1000s [13]. For our experiments, we have selected all examples which have a run-time Tseq <_ 1000s on a HP-750.

239 does not increase the speed-up any more. This behavior is obvious, since about two third of the examples (67%) could be solved with a depth bound of 3 or 4. The number of examples which need a higher depth (and thus can sensefully occupy more processors) is rather low, as the histogram in Figure 3 shows.

1000

8O0 48% 19% 12% 10% 6OO number of samples 4OO

2OO

0

, 3

i 4

i ] l 5 6

I

I

,........__1

7

I

I

I

8

9

10

I

|

I

|

11 > 12

depth

Figure 3. Number of examples with a proof found with A-literal depth d over the depth d. Numbers are % of the total number of 858 samples.

Although in many cases, high speed-up values can be obtained, SiCoTHEO-PID should be used in applications only where deep proofs are expected. 6.2. S i C o T H E O - C B C

The completeness bound which is used for iterative deepening determines the shape of the search space and therefore has an extreme influence on the run-time the prover needs to find a proof. There exist many examples, for which a proof cannot be found using iterative deepening over the depth of the proof tree within reasonable times, whereas iterative deepening over the number of inferences almost immediatedly reveals a proof, and vice versa. 4 In general, during iterating over the depth of the proof-tree balanced trees are preferred. The growth of the search space per iteration level, however, is extremely high. On the other hand, the inference bound first tries rather unbalanced and deep trees. Here, the search often reaches areas with unmanageable search spaces. In order 4This dramatic effect can be seen clearly in e.g. [7], Table 3.

240 to level both extremes, R. Letz 5 proposed to combine the A-literal-depth bound d with the inference bound imoz. When iterating over depth d, the inference bound i m ~ is set according to imoz = d o where r/is the mean length of the clauses. For our experiments, however, we take a slightly different approach by using a quadratic polynomial:

imam, = ad 2 + fld where a,/3 E R+. 6 This polynomial approximates the structure of a tableau: a tableau (a tree) with a given depth do has do < i < #ao inferences (leaf nodes), where # is the maximal number of literals per clause. Hence, we estimate the number of inferences in the tableau to be i = x a~ for some x < # and allow the prover to search for tableaux with at most i inferences by setting ima~ "= i. A Taylor development of this formula leads to i = 1 + ~ dko(logx)k/k!. Since, in most cases, x is ve.ry close to 1, we only use the linear and quadratic terms, finally obtaining our quadratic polynomial. SiCoTHEO-CBC (ComBine Completeness bounds) explores a set of parameters (or,/3) in parallel by assigning different values to each processor. For the experiments we selected 0.1 < c~ < 1 and 0 < / 3 < 1. In our first experiment (Experiment 1) we used 50 processors with the following parameter settings: Pl" (0.1,0.0)

(0.1,0.2)

...

(0.1,0.S)

(0.2, 0.0)

(0.2, 0.2)

...

(0.2, 0.s)

(1.0,0.0}

(1.0,0.2}

...

Ps0" (1.0,0.8)

Note, that this grid does not reflect the architecture of the system. It rather represents a two-dimensional arrangement of the parameter values. For Experiment 2, Experiment 3 and Experiment 4, the number of processors was reduced to 25, 9, and 4 respectively by equally thinning out the grid. For all experiments, a total of 92 different formulae from the T P T P have been used. 48 examples show a sequential run-time Tseq of less than one second. 36 of the remaining examples have a sequential run-time which is higher than 100 seconds. Although measurements have been made with all examples, we do not present the results for those with run-times of less than one second. In that case, the resulting speed-up (ga = 1.57 for P = 50) is by far outweighted by the time, SiCoTHEO needs to export proof tasks to other processors. In a real application, this problem could be solved by the following strategy: first, start one sequential prover with a time-limit of I second. If a proof cannot be found within that time, SiCoTHEO-CBC would start exporting proof tasks to other processors. Table 2 (first group of rows) shows the mean values for all three experiments. These figures can be interpreted more easily when looking at the graphical representation of the ratio between Tseq and 7il , as shown in Figure 4. Each 9 represents a measurement with one formula. The dotted line corresponds to s = 1, the solid line to s = P, where P is the number of processors. The area above the dotted line contains examples where the parallel system is slower than the sequential prover, i.e., s < 1. Dots below the solid 5Personal communication. ~For a - 0,/~ = 1 we yield inference-bounded search, a = oo, ~ = c~ corresponds to depth-bounded search.

241 line (with a gradient of l/P) represent experiments which yield a super-linear speed-up

s>P. Figure 4 shows that even for few processors a large number of examples with superlinear speed-up exist. This encouraging fact is also reflected in Table 2 which exhibits good average speed-up values for 4 and 9 processors. For our long-running examples and P = 4 or P = 9, ~g is even larger than the number of processors. This means that in most cases, a super-linear speed-up can be accomplished. Table 2 furthermore shows that with an increasing number of processors, the speedup values are also increasing. However, for larger numbers of processors (25 or 50), the efficiency 77 = siP is decreasing. This means that SiCoTHEO-CBC obtains its peak efficiency with about 15 processors and thus is only moderately scalable.

Experiment ..

SiCoTHEO-CBC SiCoTHEO-CBC SiCoTHEO-CBC SiCoTHEO-CBC Experiment

s-a s-g S-h gt ..

4 P=4 61.21 5.92 2.12 2.06 7

,,

P=4 ,,

SiCoTHEO-DELTA SiCoTHEO-DELTA SiCoTHEO-DELTA SiCoTHEO-DELTA Experiment

s-a

SiCoTHEO-DELTA+ SiCoTHEO-DELTA+ SiCoTHEO-DELTA+ SiCoTHEO-DELTA+

s-a s-g S-h ~t

gg 8-h

gt ..

18.31 4.49 1.34 1.42 10 P=5 18.39 5.15 1.43 1.83

3 P=9 77.30 12.38 3.37 2.93 6 P=9 63.78 10.89 2.00 2.45 9 P=10 63.84 12.07 2.92 3.47

2 P=25 98.85 18.18 4.34 3.84 5

1 P=50 101.99 19.25 4.41 3.88

P=25 76.46 15.96 2.76 3.16 8 P=26 76.50 16.97 3.71 4.13

Table 2 SiCoTHEO: Mean values of speed-up for different numbers of processors P. The number of examples is 44.

6.3. S i C o T H E O - D E L T A

The third competitive system which will be considered in this paper affects the search mode of the prover. SETHEO normally performs a top-down search. Starting from a goal, Model Elimination Extension and Reduction steps are performed, until all branches of the tableau are closed. The DELTA iterator [9], on the other hand, generates small tableaux, represented as unit clauses in a bottom-up way during a preprocessing phase.

242 P - 9

300

9 9 9

:

200

.....:

P -- 25

300

...:

P - 50

300

..:

9

200

200

7il 100

,s ..."

. 9o

9

100

s

711 100

s""

~~ ~

9

o ..-'"

."

9

9

..,----

' ~ ' ~ - - ' : : - " = "~ 0 100 200 300

T~q

o 9

0~ 0

_~o~,~_,

100

200

T.~q

9

0

300

9

,

0

100

200

--,

300

T~q

Figure 4. Parallel run-time 711 over sequential run-time Tseq for SiCoTHEO-CBC and different numbers of processors.

For example, the left subtree of Fig. 1 corresponds to the clause p(a, b). These unit clauses are added to the original formula. Then, in the main proving phase, SETHEO works in its usual top-down search mode. The generated unit clauses now can be used to close open branches of the tableau much earlier, thus combining top-down with bottom-up processing. This decrease of the proof size can reduce the amount of necessary search dramatically. On the other hand, adding new clauses to the formula increases the search space. In cases, where these clauses cannot be used for the proof, the run-time to find a proof increases (or a proof cannot be found within the given time limit). Thus, adding too many (or useless) clauses has a strong negative effect. The DELTA preprocessor has various parameters to control its operation. Here, we focus on two parameters: the number of iteration levels l, and the maximally allowable term depth td. l determines how many iterations the preprocessor executes. The number of generated unit clauses increases monotonically with 1. The term depth td of a term is the maximal nesting level of function symbols in that term. E.g., td(a) -- 1, td(f(a, f(b, c))) -3. In order to avoid an excessive generation of unit clauses, the maximal term depth td of any term in a generated unit clause can be restricted. Furthermore, DELTA is configured in such a way that a maximum number of 100 unit clauses are generated to avoid excessively large formulas. For our experiments, we use competition on the parameters l and td of DELTA. The resulting formula is then processed by SETHEO, using standard parameters (iterative deepening over A-literal depth). Hence, execution time in the parallel case consists of the time needed for the bottom-up iteration TDELTAplus that needed for the subsequent topdown search Ttd. As before, the overall execution time of the abstract machine, including the time to load the formula is used. With I E { 1 , 2 , . . . , 5 } and td E ( 1 , 2 , . . . , 5 } , a total of 25 processors are used. Figure 5 shows the ratio between Tseq and TII for all examples. Again, we va.ry the number of processors (4, 9, 25). Our experiment has been carried out with the same examples as in the previous section (Tseq > ls). Table 2 (middle section, Experiments 5-7) shows the

243 numeric values for the obtained speed-up.

P=4 200

P=9

.....""""

t ....i

200

.....i.""""*

TH 1oo

..,.. .-'"

0

P=25

....

$

200

Tll .

1oo

100

200

Zseq

300

.-""

m...~..---'-~

0 L-..~-~-= ~ . . . . 0 100 200 300 9

0

:,.

Zseq

Figure 5. Parallel run-time TII over sequential run-time different numbers of processors.

0 0

:

.

i!" 100

.. 200

300

Zseq

Tseq for

SiCoTHEO-DELTA and

In general, the speed-up figures obtained with SiCoTHEO-DELTA show a similar behavior as those for SiCoTHEO-CBC. This can be seen in Figure 5 and Table 2 (Experiments 5-7). Here, however, there are several cases in which the parallel system is running slower than the sequential one. The reason for this behavior is that the additional unitclauses are not useful for the proof and increase the search space too much. This negative effect can easily be overcome by using an additional processor which runs the sequential SETHEO with default parameters. The resulting speed-up figures are shown in Table 2 (Experiments 8-10, rows marked by SiCoTHEO-DELTA+). It is obvious that in this case the speed-up will always be greater or equal to 1. Although the arithmetic mean is not influenced dramatically, we can observe a considerable increase in the geometric mean. This fact indicates a "smoothing effect" when the additional processor is used. The scalability of SiCoTHEO-DELTA is also relatively limited. This is due to the coarse controlling parameters of DELTA. The speed-up and scalability could be increased, if one succeeded in producing a greater variety of preprocessed formulas. 7. C o n c l u s i o n s In this paper, we have presented a parallel theorem prover based on the theorem prover SETHEO. Parallelism is exploited by homogeneous competition. Each processor in the network is running SETHEO and tries to prove the entire formula. However, on each processor, a different set of parameters influence the search of SETHEO. If this influence results in large variations of the run-time, good speed-up values can be obtained. In this work, we compared three different systems based on this model: SiCoTHEO-PID performs parallel iterative deepening, SiCoTHEO-CBC combines two different completeness bounds using a parameterized function, and SiCoTHEO-DELTA combines the traditional top-

244 down search of SETHEO with the bottom-up preprocessor DELTA, where the parameters of DELTA are the basis for competition. Since the search space is partitioned by the prover, the speed-up values of SiCoTHEOPID are always larger than one. However, only little efficiency could be obtained, and SiCoTHEO-PID's peak performance is reached with only 4 processors. In general, good efficiency and reasonable scalability can be obtained only, if there are enough different values for a parameter, the parameter setting strongly influences the behavior of the prover, and if there is no good default estimation for that parameter. Both the combination of search-bounds and the combination of search modes have been shown to be appropriate for competition. The scalability of both systems is still relatively limited. The implementation of SiCoTHEO, using pmake combines simplicity and high flexibility (w.r.t. the network and modifications) with good performance. In many cases, superlinear speed-up could be obtained. Future enhancements of SiCoTHEO will certainly incorporate ways to control DELTA'S behavior more subtly. Furthermore, a combination of SiCoTHEO-DELTA with SiCoTHEO-CBC and heuristics control using Neural Networks will increase the overall efficiency and scalability of SiCoTHEO substantially. Finally, experiments with SiCoTHEO can reveal, how to set the parameters of sequential SETHEO and DELTA in an optimal way. REFERENCES

1. A. de Boor. PMake - A Tutorial. Berkeley Softworks, Berkeley, CA, Janua~. 1989. 2. W. Ertel. OR-Parallel Theorem Proving with Random Competition. In Proceedings of LPAR '92, pages 226-237, St. Petersburg, Russia. Springer LNAI 624, 1992. 3. W. Ertel. Parallele Suche mit randomisiertem Wettbewerb in Inferenzsystemen. Series DISKI 25. Infix, St. Augustin, 1993. 4. W. Ertel. On the Definition of Speedup. In PARLE, Parallel Architectures and Languages Europe, pages 289-300. Springer, 1994. 5. C. Goller, R. Letz, K. Mayr, and J. Schumann. SETHEO V3.2: Recent Developments (System Abstract) . In Proc. of Conference on Automated Deduction (CADE) 12, pages 778-782. Springer LNAI 814, 1994. 6. R. Letz, K. Ma.yr, and C. Goller. Controlled Integration of the Cut Rule into Connection Tableau Calculi. Journal of Automated Reasoning, 13:297-337, 1994. 7. R. Letz, J. Schumann, S. Ba.yerl, and W. Bibel. SETHEO: A High-Performance Theorem Prover. Journal of Automated Reasoning, 8:183-212, 1992. 8. D.W. Loveland. Automated Theorem Proving: a Logical Basis. North-Holland, 1978. 9. J. Schumann. D E L T A - A Bottom-up Preprocessor for Top-Down Theorem Provers. System Abstract. In Proc. of Conference on Automated Deduction (CADE) 12, pages 774-777. Springer LNAI 814, 1994. 10. J. Schumann and R. Letz. PARTHEO: a High Performance Parallel Theorem Prover. In Proc. of Conference on Automated Deduction (CADE) 10, pages 40-56. Springer LNAI 449, 1990. 11. G. Sutcliffe, C.B. Suttner, and T. Yemenis. The TPTP Problem Library.. In Proc. of Conference on Automated Deduction (CADE) 12, pages 252-266. Springer LNAI 814,

245 1994. 12. C.B. Suttner and J. Schumann. Parallel Automated Theorem Proving. In Parallel Processing for Artificial Intelligence, pages 209-257. Elsevier, 1994. 13. C.B. Suttner. Static Partitioning with Slackness. Series DISKI. Infix, St. Augustin, 1995.

246 Johann Schumann

Johann Schumann studied computer science at the Technische Universit~it Miinchen (TUM) from 1980 to 1986. Then he joined the research group "Automated Reasoning" at the TUM and worked on the development of sequential and parallel theorem proving. In 1991 he obtained his doctoral degree with a thesis on efficient theorem provers based on an abstract machine. 1991-1992 he worked in industry as project manager in the area of network management and control systems. His current research interests include sequential and parallel theorem proving and application of automated theorem provers in the area of Software Engineering.