Performance Analysis of Common Bus Multimicroprocessor Systems

P. A. Grasso, T. S. Dillon, and K. E. Forward
Monash University, Clayton, Australia
Performance analysis of a common bus multimicroprocessor system is given, utilizing bus interference and speedup as measures. Three mathematical models are developed: a discrete-time queueing model, a Markov model, and an approximate-probability model. Comparison of the results obtained with hardware measurements and a computer simulation helps calibrate these mathematical models.
1. INTRODUCTION
Many of the currently available multiprocessor system models consider an idealized system of processor and memory modules interconnected via a crossbar switch, an example of this being the C.mmp multiprocessor. The models assume a synchronous system with the memory cycle being the basic unit of time. Important quantities in systems of this type are the memory access and rewrite times, instruction-processing time, and memory-access distribution. Measures of performance include the average number of busy memories, memory idle time, and instruction execution rate. Systems of this type are modelled in references [5, 6, 10, 13, 14, 16, 17, 18, and 19] using Markov chain techniques and other methods. As system size grows, these models become increasingly complex. Approximation models, which attempt to decrease this complexity, are considered by Baskett and Smith [5], Hoogendoorn [8], and Speelenning and Nievergelt [15]. For several reasons [3], the assumptions upon which these models are based are not applicable to the dedicated real-time multimicroprocessors in which we are interested [1, 2]. These include the facts that multimicroprocessor systems need not be synchronous, that the basic unit of time in these should be the processor clock cycle, that interference occurs while accessing data or I/O devices via the shared bus, and that the measure of performance should be the increase in execution time caused by interference. More recently [9, 11, 12, 20, 21, 22], models have appeared that consider a system of processor modules and memory modules interconnected by one or more time-shared common buses. References [9] and [11] assume exponential distributions for the time between common memory accesses and the length of each access. Hoener et al. [20] used a conditional probability for a system in which the priority order of bus access of processors is fixed and requests occur on memory-access cycle boundaries. In general, this is not the case. In addition, the models become too complex as the system size is increased, necessitating approximate solutions.

Address correspondence to P. A. Grasso, Department of Electrical Engineering, Monash University, Clayton, Australia.

The Journal of Systems and Software, 71-79 (1986). © 1986 Elsevier Science Publishing Co., Inc.
2. FORMULATION OF THE MEMORY INTERFERENCE PROBLEM

In this paper, several memory-interference models are developed for the multimicroprocessor system described in [1, 2]. Architectural features relevant to the memory-interference problem are summarized as follows. A multimicroprocessor system is assumed to consist of a number of microcomputers, common memory modules, and I/O devices interconnected via a single time-shared common bus, as shown in Figure 1. Each microprocessor has its own private memory and is capable of fully independent operation. The private memories are used for storing programs, whereas the common or global memories are used for storing data. In this way the processors are able to communicate and therefore cooperate in the execution of a single program. Common-memory modules and I/O devices are accessed (or controlled) by the processors via the common bus.

Figure 1. System architecture.

Memory interference occurs when more than one processor attempts to access the common bus at the same time. The arbitration logic, which interfaces the processors to the common bus, ensures that all but one of the requesting processors is "locked out" until the common bus is free; the rest are queued. As soon as the common bus is free, access is granted to one of the queued processors. This has the effect of reducing execution speed for queued processors and therefore results in an overall increase in execution time. Because I/O devices are mapped onto the common bus, there is no need to distinguish between bus access to memory and to I/O devices. Each of the processors operates asynchronously; however, the arbitration logic is synchronized by the clock of the currently accessing processor. To queue waiting processors, additional clock cycles are inserted into their memory cycle, thus delaying them by an integral number of clock cycles. The models, therefore, are not applicable to processors wherein the memory cycle is extended by freezing the clock. Once granted, the bus is retained for a fixed number of clock cycles equal to the time required for a single memory access.

Multiprocessor programs consist of a set of tasks related by data dependencies and can be represented by a precedence graph that is used to control the execution order of the tasks. Because some of the tasks are run in parallel, an increase in execution speed is obtained over a uniprocessor. System synchronization is obtained by ensuring that each processor does not start on its next task until the previous set of parallel executable tasks has been completed. Each of these tasks is assumed to have the following characteristics: 1) each task runs for the same large but finite time; 2) assuming no interference, the probability that a processor is accessing the common bus is r, and this is the same for each processor; 3) assuming no interference, each access to the common bus requires M clock cycles, with the first being a request cycle; and 4) requests for access to the common bus may occur in any cycle.

3. MEMORY-INTERFERENCE MODELS

Three models of memory interference are developed and presented here. The first model is an exact solution to the problem stated in the previous section, whereas the other two are approximations. Although the approximations do not give exact results, they are less complex and therefore easier to use. The exact model, however, can still be used for systems of any size. Each model calculates I, the fractional increase in execution time due to interference, and S, the program speedup as compared to a uniprocessor. These are defined as [15]

    I = (exec. time with interference / exec. time with no interference) - 1,   (1)

    S = exec. time on uniprocessor / exec. time on multiprocessor.   (2)

It can be shown [3] that

    S = N_p / (1 + I),   I = N_w / (N_p - N_w),   (3)

where N_p = number of processors within the system and N_w = average number of processors waiting for common bus access.

4. DISCRETE-TIME SINGLE-QUEUE MODEL

The model considers the queue of processors waiting for access to the common bus and uses the processor clock as the basic unit of time. In order to determine the number of active processors within the system, program execution is divided into nonaccess periods, wherein no processors are using the common bus, and access periods, wherein one or more processors are using or waiting for the common bus. Let T' and T be the average lengths of the nonaccess period and access period, respectively. These periods alternate during program execution as processors request and release the common bus. Thus, once the system is in steady state, it is only necessary to consider a single cycle that lasts for T + T' clock periods. A single processor requires the bus for a fraction of time = r. The average number of active processors, S, and the interference, I, are given by

    S = (1/r) · T/(T + T'),   (4)

    I = N_p/S - 1.   (5)
Therefore, to find I or S we need to find T and T'. First consider T', the average length of a nonaccess period. Let

    q = Prob(processor does not request common bus access on its next clock cycle | not accessing common bus this clock cycle)
      = nonaccess time / (nonaccess time + request time).   (6)

The time spent requesting the common bus is one clock cycle per access, and thus the request time is proportional to r/M. It can be shown [3] that this gives

    q = (1 - r)/(1 - r + r/M) = M(1 - r)/(r + M(1 - r)).   (7)

Thus, because the processors are independent, we have

    Prob(nonaccess period next clock cycle | nonaccess period this clock cycle) = q^(N_p).   (8)

It can readily be shown [13] that

    P(t) = Prob(nonaccess period lasts at least t clock cycles) = (q^(N_p))^t.   (9)

From this we obtain

    Prob(nonaccess period lasts exactly t clock cycles)
      = P(t) · Prob(access common memory in the (t+1)-th cycle)
      = (q^(N_p))^t (1 - q^(N_p)).   (10)

This is a geometric distribution, and because its mean is equal to T' we have

    T' = q^(N_p) / (1 - q^(N_p)).   (11)

Next we consider T, the average length of an access period. An access period begins in a clock cycle in which any processor requests access to the common bus, and in this clock cycle anywhere from 1 to N_p processors may simultaneously request access. Thus, T can be found by considering the average length of an access period started by y simultaneous requests. Summing over these we obtain

    T = Σ_{y=1}^{N_p} P_y T_y,   (12)

where P_y = Prob(access period started by y simultaneous requests) and T_y = average length of such an access period. Now

    P_y = Prob(exactly y processors out of N_p request access) / Prob(at least 1 processor requests access).

Because a given processor either does or does not request common bus access on the next clock cycle, these quantities are given by binomial distributions, and thus we have

    P_y = C(N_p, y) (1 - q)^y q^(N_p - y) / (1 - q^(N_p)),   (13)

    T_y = M + Σ_{j=0}^{N_p - y} C(N_p - y, j) (q^(M-1))^(N_p - y - j) (1 - q^(M-1))^j X_{i,j},   (14)

where X_{i,j} = average number of clock cycles to the end of an access period if there are i processors and j processors want to begin the first cycle of an access (an access period ends in the clock cycle in which no processors want to begin an access). An expression for X_{i,j} is given in [3].

5. MARKOV CHAIN MODEL

During common-memory access, the system state at the next clock cycle will depend upon the number of new memory requests during this access (that is, during the last M clock cycles). The system, therefore, is an M-th order Markov process, which makes its analysis difficult. To simplify the analysis to that of a first-order Markov process, we assume: 1) the basic unit of time is taken to be M clock cycles, which tends to synchronize the processors; thus the probability that a processor requests access to the common bus in any time unit is r; and 2) the system is synchronous, in that all clock-cycle boundaries counted modulo M are aligned and all processors begin a new cycle at this time. These two assumptions make the model independent of M. N_p states numbered from 1 to N_p can be identified, and if the system is in state i then i - 1 processors are waiting for access to the common bus. The probability that the system goes from state i to state j is P_ij, and it can be shown that [3]

    P_11 = N_p r (1 - r)^(N_p - 1) + (1 - r)^(N_p),   (15)

    P_ij = 0,  j < i - 1,   (16)

    P_ij = C(N_p - i + 1, j - i + 1) r^(j - i + 1) (1 - r)^(N_p - j)  otherwise.   (17)

From these the transition matrix can be written down, and π_i, the steady-state probability that the system is in state i, can be found. Now if the system is in state i, there are i - 1 processors waiting for access to the common bus. Thus, by summing over all states, the average number of waiting processors, N_w, can be found, and this is given by

    N_w = Σ_{i=1}^{N_p} (i - 1) π_i.   (18)

It should be noted that this model is only an approximate solution and produces results independent of M. Therefore, as memory references are synchronized, the model can be expected to yield high values of interference, particularly as M is increased. And because it involves the solution of N_p simultaneous equations (for the calculation of the π_i's), it becomes time consuming for large values of N_p.
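This model is compact enough to evaluate directly. The sketch below (Python rather than the paper's FORTRAN; names are illustrative) builds the transition matrix from the probabilities (15)-(17), finds the steady-state distribution by power iteration, and returns I from N_w; it reproduces Table 2 entries such as I = 0.1176 for N_p = 2, r = 0.4.

```python
from math import comb

def interference_markov(n_p: int, r: float) -> float:
    # State i (1..N_p) means i-1 processors wait while one access proceeds.
    P = [[0.0] * n_p for _ in range(n_p)]
    for i in range(1, n_p + 1):
        for j in range(1, n_p + 1):
            if j < i - 1:
                continue                      # queue shrinks by at most one per unit
            if i == 1 and j == 1:
                # Nobody requests, or a single request is served immediately.
                P[0][0] = (1 - r) ** n_p + n_p * r * (1 - r) ** (n_p - 1)
            else:
                k = j - i + 1                 # new requests in this time unit
                free = n_p - i + 1            # processors able to issue a request
                P[i - 1][j - 1] = comb(free, k) * r ** k * (1 - r) ** (n_p - j)
    # Steady-state probabilities by power iteration.
    pi = [1.0 / n_p] * n_p
    for _ in range(5000):
        pi = [sum(pi[i] * P[i][j] for i in range(n_p)) for j in range(n_p)]
    n_w = sum((i - 1) * p for i, p in enumerate(pi, start=1))   # average waiters
    return n_w / (n_p - n_w)                                    # I = N_w/(N_p - N_w)
```

Because the chain has only N_p states, the solve stays trivial even for the largest configuration measured here (N_p = 8).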
6. APPROXIMATE-PROBABILITY MODEL
The approximate solution developed in this section yields results that are too high, setting an upper bound on I. First, considering a two-processor system: because the processors are independent, P_I, the probability that both processors require the common bus at the same time, is approximately r^2 (this is only an approximation because it neglects the effect of interference). Because the number of extra clock cycles ranges from M, if both request in the same clock cycle, to 1, if one requests during the last access cycle of the other, T_I, the average extra time due to interference, is given by

    T_I = (M + 1)/2 clock cycles = (M + 1)/(2M) memory cycles.   (19)

Therefore, the fractional increase in time due to interference for a two-processor system is

    I = P_I T_I = ((M + 1)/(2M)) r^2.   (20)

We derive the solution for n processors under the assumption that significant interference occurs only between two processors at a time. Thus the total increase in execution time can be found by considering the increase due to interference between all combinations of processors taken two at a time, to give

    I = C(N_p, 2) ((M + 1)/(2M)) r^2.   (21)
7. HARDWARE MEASUREMENTS OF MEMORY INTERFERENCE

The memory-interference models described in the previous sections were evaluated by comparing results obtained from them with hardware measurements on the system described in [1, 2]. Instruction execution times for the 8085 vary from four clock cycles to 18 clock cycles, and thus, depending on the mix of instructions in a program, memory references for data can be assumed to occur at any clock cycle. Each memory reference requires three clock cycles, but because the arbitration logic requires one additional cycle to reset its internal state, common bus access requires four clock cycles. Because the available hardware has nine processors, interference measurements are available for N_p ranging from two to eight (processor 0 is the system controller and does not execute programs) and M = 4. Measurements were made on programs with a set number of common-memory accesses randomly distributed throughout a large program loop. The MOV M,A instruction was used for these memory accesses, and because this instruction requires seven clock cycles, only three of which are for data access, the maximum possible value of r is 0.42. In practice, results were obtained for values of r up to 0.4. Interference was found by measuring the execution time of each processor using a logic analyser on the control lines to the master processor. First the time for each processor on its own was measured, and then the time for each processor with all running was measured, allowing the increase in execution time to be found. These results were then averaged over several independent runs.

In addition to the difficulty in obtaining measurements for large values of r, several other differences exist between the actual hardware and the assumptions made in the models. These are: 1) In the real system, if a processor is not fetching data from common memory it is fetching an opcode from private memory. The minimum time for an opcode fetch is four clock cycles, and at least one opcode fetch must separate two data fetches. This will tend to cause the system to synchronize and thus reduce interference. 2) The arbitration system used in the actual hardware guarantees common bus access to all processors within a finite time; when simultaneous requests occur, access is always granted to the lower-numbered processors, which tend to execute slightly faster, and this reduces interference. 3) In general, the tasks of real programs will not have the same value of r as assumed. For the interference measurements, however, this was the case.

8. SIMULATION OF MEMORY INTERFERENCE

Due to the difficulty in obtaining hardware measurements, and also to investigate the effects of the differences between the assumptions of the models and the real hardware, a multiprocessor system was simulated for values of N_p up to eight. This simulation is written in FORTRAN and has several options that cater to the modelling of either the real hardware or the different model assumptions.

Each processor must spend a fraction of time = r accessing the common memory. For Simulation 1, which follows the assumptions of the models, common-memory requests were scheduled using

    time to next request = T clock cycles = truncated value of (ln X / ln q),   (22)

where X = random variable uniformly distributed over 0 to 1 and q = Prob(does not request in the next cycle | not accessing now) = M(1 - r)/(r + M(1 - r)).

This expression for T is obtained by noting that the probability of a request after T clock cycles, P(T), is

    P(T) = q^T (1 - q),   (23)

where P(T)/(1 - q) is a random variable distributed over 0 to 1.

A processor cannot make two consecutive common-memory accesses, and to account for this in Simulation 2, which follows the real hardware, the time to the next common-memory access is scheduled using

    T = time remaining in current opcode cycle + (ln X / ln q),   (24)

where q must be modified from Expression (9) to account for the opcode execution time.

The value of N used for each simulation run was 10,000, which ensures that the system reaches steady state, and M was taken as four clock cycles because this corresponds to the real hardware system. The figures for I are a result of averaging the increase in execution time over twenty simulation runs, corresponding to 200,000 clock cycles for each processor. Results were obtained for N_p ranging from 2 to 8 with r up to 0.4. In all cases 90% confidence intervals were obtained using the t distribution [7, p. 121].

Table 1. I from Model 1, M = 4

         Number of processors, N_p
  r      2        3        4        5        6        7        8
 .04   0.0008   0.0017   0.0027   0.0037   0.0059   0.0062   0.0076
 .08   0.0034   0.0074   0.0121   0.0177   0.0245   0.0328   0.0431
 .12   0.0079   0.0179   0.0308   0.0475   0.0698   0.0998   0.1403
 .16   0.0146   0.0343   0.0616   0.1003   0.1551   0.2318   0.3342
 .20   0.0235   0.0576   0.1081   0.1831   0.2911   0.4358   0.6106
 .24   0.0351   0.0890   0.1730   0.2997   0.4749   0.6886   0.9214
 .28   0.0494   0.1295   0.2581   0.4480   0.6910   0.9616   1.2402
 .32   0.0668   0.1802   0.3633   0.6203   0.9229   1.2402   1.5600
 .36   0.0875   0.2414   0.4864   0.8075   1.1607   1.5200   1.8800
 .40   0.1119   0.3135   0.6237   1.0025   1.4001   1.8000   2.2000

Table 2. I from Model 2

         Number of processors, N_p
  r      2        3        4        5        6        7        8
 .04   0.0008   0.0017   0.0027   0.0038   0.0049   0.0062   0.0076
 .08   0.0035   0.0075   0.0123   0.0180   0.0249   0.0333   0.0437
 .12   0.0081   0.0183   0.0314   0.0485   0.0711   0.1014   0.1421
 .16   0.0150   0.0353   0.0632   0.1024   0.1578   0.2345   0.3365
 .20   0.0244   0.0595   0.1109   0.1866   0.2946   0.4384   0.6119
 .24   0.0365   0.0920   0.1772   0.3041   0.4781   0.6900   0.9218
 .28   0.0516   0.1340   0.2636   0.4523   0.6931   0.9621   1.2402
 .32   0.0700   0.1861   0.3694   0.6237   0.9239   1.2404   1.5600
 .36   0.0919   0.2487   0.4922   0.8098   1.1610   1.5201   1.8800
 .40   0.1176   0.3217   0.6286   1.0037   1.4003   1.8000   2.2000
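The sampling rule of Expression (22) can be sketched as follows (in Python rather than the simulation's FORTRAN; names are illustrative). Truncating ln X / ln q yields the geometric distribution of Expression (23), whose mean inter-request time is q/(1 - q).

```python
import math
import random

def q_no_request(r: float, m: int) -> float:
    # Expression (7): per-clock-cycle probability of not requesting the bus.
    return m * (1 - r) / (r + m * (1 - r))

def time_to_next_request(q: float, rng: random.Random) -> int:
    # Expression (22): invert P(T >= t) = q**t with a uniform variate.
    x = 1.0 - rng.random()                 # uniform on (0, 1], avoids log(0)
    return int(math.log(x) / math.log(q))  # truncation gives P(T=t) = q**t * (1-q)
```

For the hardware's parameters (r = 0.2, M = 4), q = 16/17 and the mean scheduled gap is 16 clock cycles.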
9. COMPARISON OF RESULTS

Tables 1, 2, and 3 show the results obtained for each of the models. They give the fractional increase in execution time, I, as both N_p and r are varied. For Models 1 and 3 the value of M was 4. Model 2, which relaxes some of the assumptions of Section 2, is independent of M. Tables 4 and 5 show the results from the two simulations. The figures in Table 4 are for Simulation 1, which follows the models, and those in Table 5 are for Simulation 2, which follows the hardware. The results shown give the 90% confidence interval. Figure 2 is a plot of results for Model 1 and Simulation 1 for several values of r. The effect of the difference between the real hardware and the model can be seen in Figure 3, which is a graph of results from hardware measurements, Simulation 1, and Simulation 2 for N_p = 2. Figure 4 presents similar results for r = 0.2 and N_p from 1 to 8. Another performance measure that can be found from the models is the speedup, or average number of active processors. Figure 5 shows speedup obtained from Model 1 with M = 4. The value of M is usually set by the type of processors and memories used and is not under the control of the system designer. Thus, in the previous results M = 4 was used, because this corresponds to the existing hardware. Table 6 gives results from Model 1 for M = 2 and M = 16.

Table 3. I from Model 3, M = 4

         Number of processors, N_p
  r      2        3        4        5        6        7        8
 .04   0.0010   0.0030   0.0060   0.0100   0.0150   0.0210   0.0280
 .08   0.0040   0.0120   0.0240   0.0400   0.0600   0.0840   0.1120
 .12   0.0090   0.0270   0.0540   0.0900   0.1350   0.1890   0.2520
 .16   0.0160   0.0480   0.0960   0.1600   0.2400   0.3360   0.4480
 .20   0.0250   0.0750   0.1500   0.2500   0.3750   0.5250   0.7000
 .24   0.0360   0.1080   0.2160   0.3600   0.5400   0.7560   1.0080
 .28   0.0490   0.1470   0.2940   0.4900   0.7350   1.0290   1.3720
 .32   0.0640   0.1920   0.3840   0.6400   0.9600   1.3440   1.7920
 .36   0.0810   0.2430   0.4860   0.8100   1.2150   1.7010   2.2680
 .40   0.1000   0.3000   0.6000   1.0000   1.5000   2.1000   2.8000

Table 4. 90% Confidence Interval for I, Simulation 1

         Number of processors, N_p
  r         2              3              4              5              6              7              8
 .04  0.0007-0.0010  0.0015-0.0019  0.0025-0.0031  0.0036-0.0040  0.0048-0.0054  0.0059-0.0065  0.0076-0.0084
 .08  0.0031-0.0036  0.0073-0.0080  0.0111-0.0121  0.0173-0.0185  0.0235-0.0250  0.0320-0.0344  0.0418-0.0447
 .12  0.0071-0.0081  0.0175-0.0193  0.0305-0.0326  0.0452-0.0477  0.0687-0.0721  0.0955-0.1013  0.1339-0.1425
 .16  0.0144-0.0155  0.0335-0.0350  0.0607-0.0637  0.0964-0.1021  0.1500-0.1572  0.2272-0.2387  0.3287-0.3391
 .20  0.0222-0.0242  0.0560-0.0590  0.1042-0.1084  0.1769-0.1828  0.2829-0.2904  0.4230-0.4315  0.5909-0.6050
 .24  0.0333-0.0355  0.0854-0.0890  0.1669-0.1747  0.2899-0.3009  0.4666-0.4816  0.6669-0.6812  0.8380-0.9135
 .28  0.0479-0.0502  0.1295-0.1352  0.2519-0.2601  0.4360-0.4506  0.6714-0.6842  0.9337-0.9508  1.2046-1.2272
 .32  0.0652-0.0679  0.1735-0.1788  0.3537-0.3649  0.6048-0.6199  0.8993-0.9144  1.2137-1.2312  1.5212-1.5407
 .36  0.0869-0.0905  0.2371-0.2430  0.4734-0.4835  0.7882-0.8010  1.1357-1.1538  1.4885-1.5083  1.8238-1.8438
 .40  0.1092-0.1123  0.3071-0.3158  0.6131-0.6257  0.9817-0.9957  1.3775-1.3934  1.7610-1.7799  2.1457-2.1702

Table 5. 90% Confidence Interval for I, Simulation 2

         Number of processors, N_p
  r         2              3              4              5              6              7              8
 .04  0.0008-0.0011  0.0016-0.0020  0.0023-0.0028  0.0037-0.0041  0.0044-0.0049  0.0057-0.0065  0.0069-0.0090
 .08  0.0027-0.0033  0.0070-0.0078  0.0115-0.0125  0.0165-0.0177  0.0228-0.0242  0.0313-0.0336  0.0390-0.0420
 .12  0.0074-0.0082  0.0168-0.0181  0.0286-0.0301  0.0434-0.0456  0.0631-0.0665  0.0916-0.0956  0.1280-0.1330
 .16  0.0131-0.0142  0.0323-0.0340  0.0542-0.0581  0.0921-0.0969  0.1405-0.1470  0.2130-0.2197  0.3010-0.3110
 .20  0.0211-0.0227  0.0529-0.0550  0.0970-0.1005  0.1631-0.1707  0.2654-0.2734  0.3825-0.3939  0.5298-0.5419
 .24  0.0308-0.0322  0.0794-0.0818  0.1563-0.1610  0.2649-0.2721  0.4209-0.4313  0.6035-0.6162  0.7891-0.8000
 .28  0.0421-0.0436  0.1142-0.1170  0.2256-0.2310  0.3902-0.3970  0.6027-0.6105  0.8210-0.8323  1.0480-1.0680
 .32  0.0557-0.0577  0.1569-0.1603  0.3138-0.3203  0.5289-0.5381  0.7849-0.7951  1.0395-1.0504  1.3220-1.3340
 .36  0.0708-0.0738  0.2044-0.2085  0.4102-0.4189  0.6747-0.6818  0.9714-0.9817  1.2707-1.2836  1.5880-1.5950
 .40  0.0887-0.0907  0.2608-0.2652  0.5187-0.5260  0.8301-0.8376  1.1569-1.1671  1.5030-1.5168  1.8500-1.8680

Figure 2. Results for Model 1 and Simulation 1.

Figure 3. Results from hardware and simulations.

Figure 4. Results from hardware and simulations.

By comparing Tables 1 and 4 and from Figure 2, it can be seen that Model 1 and Simulation 1 agree very closely. This is to be expected, because Model 1 is an exact solution to the memory-interference problem as stated in the second section. Comparing Tables 1 and 2, it can be seen that Model 2 gives consistently higher values for interference than does Model 1. This is to be expected, because Model 2 is independent of M and thus its minimum waiting time is one memory cycle, whereas the minimum waiting time in Model 1 is one clock cycle. Model 3 is only a very approximate solution to the memory-interference problem; however, from Table 3 it can be seen that it gives satisfactory results, particularly for light bus loadings. As the value of M increases, the minimum waiting time for processors delayed at the common bus will decrease, and thus interference can also be expected to decrease. This is confirmed by Table 6, which shows lower interference for M = 16 than for M = 2. The effect becomes less marked at higher bus loadings, because the processors are then delayed for several memory cycles. As the value of M increases, the system can be expected to approach the continuous-time case.

The effect of the differences between the assumptions in the second section and the real hardware can be seen by comparing Tables 4 and 5 and from Figures 3 and 4. It can be seen that Simulation 1 and the three models give larger values of interference than observed in the real system, particularly at higher bus loadings. This is to be expected, because the arbitration system used in the hardware favors lower-numbered processors, causing them to appear to execute faster. Thus, the models set an upper bound on memory interference, and results obtained from them will tend to be conservative. Results of this type are useful to the system designer as an aid to determining the optimum number of processors for a given application. By using graphs such as Figure 5, conclusions on system performance can quickly be drawn. N_p × r is the interference-free bandwidth of the common bus. Because this cannot be greater than unity, the maximum value of N_p before the bus saturates is

    N_p = 1/r.   (26)

Thus, the expected maximum value of N_p for r = 0.2 is 5, and from Figure 4 we get a maximum useful N_p of 6 with a speedup of 4.5.

10. RECAPITULATION

In this paper three models of memory interference have been evaluated. These models are specifically tailored to performance analysis of time-shared bus multimicroprocessor systems and yield the system's speedup or the increase in execution time due to interference at the common bus. The first model is a discrete-time queueing-theory model and gives the most accurate results, whereas the others trade off accuracy against complexity. Comparison of the results from each of the models and a simulation of a multiprocessor system shows that all three models give reasonable results for light bus loadings. As expected, the results from the simplified models become less accurate as the bus loading increases. The results obtained from these models have also been compared with measurements on the multiprocessor system described earlier. Due to differences between model assumptions and the hardware, the results from the models do not exactly follow the hardware measurements but give an upper bound on interference; thus they yield conservative results for system speedup.
Figure 5. Speedup from Model 1.

Table 6. I from Model 1, M = 2 and M = 16

            r = 0.2             r = 0.4
  N_p    M = 2    M = 16     M = 2    M = 16
   2    0.0239   0.0232     0.1146   0.1092
   3    0.0584   0.0568     0.3172   0.3097
   4    0.1093   0.1069     0.6259   0.6217
   5    0.1846   0.1816     1.0030   1.0021
   6    0.2926   0.2897     1.4002   1.4001
   7    0.4369   0.4348     1.8000   1.8000
   8    0.6111   0.6101     2.2000   2.2000

ACKNOWLEDGMENTS

The authors would like to thank the General Manager of Telecom, the Computer Research Board, and the Monash University Special Grants Scheme for the financial support of the work reported in this paper.

REFERENCES

1. P. A. Grasso, K. E. Forward, and T. S. Dillon, Hardware Design of a Parallel Processor for Control Applications, I.E.Aust. Conf. Digital System Design, 1980, pp. 38-41.
2. P. A. Grasso, K. E. Forward, and T. S. Dillon, Operating Systems for a Dedicated Common Memory Multiprocessor System, IEE Proc. E, Comput. & Digital Tech. 129, 200-206, 1982.
3. P. A. Grasso, K. E. Forward, and T. S. Dillon, Memory Interference in Multimicroprocessor Systems with a Time-Shared Bus, IEE Proc. E, Comput. & Digital Tech. 131, 61-68, 1984.
4. H. Ashcroft, The Productivity of Several Machines under the Care of One Operator, Royal Stat. Soc. J., Series B, 12, 145-151, 1950.
5. F. Baskett and A. J. Smith, Interference in Multiprocessor Computer Systems with Interleaved Memories, Commun. ACM 19, 327-334, June 1976.
6. D. P. Bhandarkar, Analysis of Memory Interference in Multiprocessors, IEEE Trans. Computers C-24, 897-908, Sept. 1975.
7. G. S. Fishman, Principles of Discrete Event Simulation, Wiley, New York, 1978.
8. C. H. Hoogendoorn, A General Model for Memory Interference in Multiprocessors, IEEE Trans. Computers C-26, 998-1005, Oct. 1977.
9. M. A. Marsan and M. Gerla, Markov Models for Multiple Bus Multiprocessor Systems, IEEE Trans. Computers C-31, 239-248, March 1982.
10. R. C. Pearce and J. C. Majithia, Analysis of a Shared Resource MIMD Computer Organization, IEEE Trans. Computers C-27, 64-67, Jan. 1978.
11. D. A. Protopapas and E. J. Smith, Modeling and Analysis of Single and Multiple Bus Multi-Microcomputer Systems, IEEE Compcon, 471-478, 1980.
12. V. K. Ravindran and T. Thomas, Characterization of Multiple Microprocessor Networks, IEEE Compcon, 133-137, 1973.
13. K. V. Sastry and R. Y. Kain, On the Performance of Certain Multiprocessor Computer Organizations, IEEE Trans. Computers C-24, 1066-1074, Nov. 1975.
14. A. S. Sethi and N. Deo, Interference in Multiprocessor Systems with Localized Memory Access Probabilities, IEEE Trans. Computers C-28, 157-163, Feb. 1979.
15. B. Speelenning and J. Nievergelt, A Simple Model of Processor-Resource Utilization in Networks of Communicating Modules, IEEE Trans. Computers C-28, 927-929, Dec. 1979.
16. T. Lang, M. Valero, and I. Alegre, Bandwidth of Crossbar and Multiple-Bus Connections for Multiprocessor Systems, IEEE Trans. Computers C-31, 1227-1234, Dec. 1982.
17. M. Valero, J. M. Llaberia, J. Labarta, E. Sanvicente, and T. Lang, A Performance Evaluation of the Multi-Bus Network for Multiprocessor Systems, Proc. 1983 ACM Sigmetrics Conf. on Meas. and Mod. of Comput. Systems, Aug. 1983, pp. 200-206.
18. D. Towsley, An Approximate Analysis of Multiprocessor Systems, Proc. 1983 ACM Sigmetrics Conf. on Meas. and Mod. of Comput. Systems, Aug. 1983, pp. 207-213.
19. K. B. Irani and I. H. Onyuksel, A Closed-Form Solution for the Performance Analysis of Multiple-Bus Multiprocessor Systems, IEEE Trans. Computers C-33, 1004-1012, Nov. 1984.
20. S. Hoener and W. Roehder, Efficiency of a Multi-microprocessor System with Time-Shared Buses, Euromicro 1977, pp. 35-42.
21. M. Ajmone Marsan, G. Balbo, and G. Conte, Comparative Performance Analysis of Single Bus Multiprocessor Architectures, IEEE Trans. Computers C-31, 1179-1191, Dec. 1982.
22. E. Sanvicente, M. Valero, T. Lang, and I. Alegre, Exact and Approximate Models for Multiprocessor Systems with Single Bus and Distributed Memory, MIMI '82, Paris, June 1982, pp. 15-18.